Regular Expressions
A regular expression is a powerful tool for matching and manipulating text: a pattern built from ordinary characters and special characters that describes the text you want to match.
Regular expressions can be used to search, replace, extract, and validate specific patterns in text.
Regular Expression Patterns
- Literal characters: letters, digits, spaces, and so on, which match themselves directly.
- Special characters: for example the dot `.`, the asterisk `*`, the plus sign `+`, and the question mark `?`, which carry special meaning and behavior.
- Character classes: a set of characters wrapped in square brackets `[ ]`, matching any single character from that set.
- Metacharacters: for example `\d`, `\w`, and `\s`, which match particular kinds of characters such as digits, word characters, and whitespace.
- Quantifiers: for example `{n}`, `{n,}`, and `{n,m}`, which specify how many times, or within what range, the preceding element must match.
- Boundary anchors: for example `^`, `$`, `\b`, and `\B`, which match the start of the string, the end of the string, or word-boundary positions.
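To make these building blocks concrete, here is a minimal Java sketch (the class name and sample strings are invented for illustration) that combines metacharacters, quantifiers, a character class, and boundary anchors:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexBasicsDemo {
    public static void main(String[] args) {
        // \d{4}-\d{2}-\d{2}: the \d metacharacter plus {n} quantifiers match a date,
        // and ^ / $ anchor the pattern to the whole string
        Pattern date = Pattern.compile("^\\d{4}-\\d{2}-\\d{2}$");
        System.out.println(date.matcher("2024-01-31").matches()); // true

        // [A-Za-z_]\w* : a character class, the \w metacharacter, and the * quantifier
        // match an identifier; \b marks the word boundaries around it
        Pattern identifier = Pattern.compile("\\b[A-Za-z_]\\w*\\b");
        Matcher m = identifier.matcher("user_id = 42");
        while (m.find()) {
            System.out.println(m.group()); // prints "user_id"
        }
    }
}
```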
Special Characters
Character | Description
---|---
`$` | Matches the end of the input string. If the RegExp object's Multiline property is set, `$` also matches the position before `\n` or `\r`. To match the `$` character itself, use `\$`.
`( )` | Marks the start and end of a subexpression. The subexpression can be captured for later use. To match these characters, use `\(` and `\)`.
`*` | Matches the preceding subexpression zero or more times. To match the `*` character, use `\*`.
`+` | Matches the preceding subexpression one or more times. To match the `+` character, use `\+`.
`.` | Matches any single character except the newline `\n`. To match `.`, use `\.`.
`[` | Marks the start of a bracket expression. To match `[`, use `\[`.
`?` | Matches the preceding subexpression zero or one time, or marks a quantifier as non-greedy. To match the `?` character, use `\?`.
`\` | Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, `n` matches the character "n", while `\n` matches a newline; the sequence `\\` matches "\" and `\(` matches "(".
`^` | Matches the start of the input string, unless it is used inside a bracket expression, where it negates the character set. To match the `^` character itself, use `\^`.
`{` | Marks the start of a quantifier expression. To match `{`, use `\{`.
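As a quick check on these escaping rules, the short Java snippet below (sample strings invented) contrasts the special and the escaped meaning of `.` and `$`; note that in Java source each regex backslash has to be written twice:

```java
import java.util.regex.Pattern;

public class EscapeDemo {
    public static void main(String[] args) {
        // An unescaped dot matches any character, so "1x5" also matches the pattern 1.5
        System.out.println("1x5".matches("1.5"));       // true
        // \. only matches a literal dot (written "\\." inside a Java string literal)
        System.out.println("1x5".matches("1\\.5"));     // false
        System.out.println("1.5".matches("1\\.5"));     // true
        // \$ matches a literal dollar sign instead of acting as the end-of-string anchor
        System.out.println(Pattern.matches("\\$\\d+", "$100")); // true
    }
}
```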
Regular Expressions in Practice
I recently picked up a new requirement at work. The background: on the company platform's development page, users connect to different data sources and should be able to create tables on any of them, either visually or from a DDL statement, fill in some basic properties, and import/export table definitions. The mainstream data sources are currently supported, such as MySQL, PostgreSQL, GBase, Hive, Spark, and HBase.
- Visual table creation
Assemble the CREATE DDL from the table name, fields, field types, and field descriptions the user fills in, obtain a JDBC connection for the corresponding data source, execute it to create the table on the database server, and store the table's metadata in the metadata store (a simplified sketch of this flow follows below).
- DDL table creation
Obtain a JDBC connection for the corresponding data source, execute the statement to create the table on the database server, and store the table's metadata in the metadata store. The difference from visual creation is that, when saving the metadata, fieldsInfo and related information have to be parsed out of the DDL itself.
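The visual-creation path boils down to "assemble a CREATE TABLE statement, then run it over JDBC". The snippet below is only a simplified sketch of that idea, with a hypothetical Field holder and an already-obtained Connection; it is not the platform's actual code, and the COMMENT syntax varies by database:

```java
import java.sql.Connection;
import java.sql.Statement;
import java.util.List;
import java.util.stream.Collectors;

public class VisualCreateSketch {
    /** Hypothetical holder for what the user fills in on the page. */
    record Field(String name, String type, String comment) {}

    static String buildCreateDdl(String tableName, List<Field> fields) {
        // Assemble "name type COMMENT 'xxx'" fragments into one CREATE TABLE statement
        String columns = fields.stream()
                .map(f -> String.format("%s %s COMMENT '%s'", f.name(), f.type(), f.comment()))
                .collect(Collectors.joining(",\n  "));
        return "CREATE TABLE " + tableName + " (\n  " + columns + "\n)";
    }

    static void createTable(Connection connection, String ddl) throws Exception {
        try (Statement stmt = connection.createStatement()) {
            stmt.execute(ddl); // create the table on the database server
        }
        // ...then store the table's metadata in the metadata store
    }
}
```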
Parsing the fieldsInfo Field
- Run a prepared statement over the connection and derive the fields from the table structure it returns (too cumbersome, hard to read, and it wastes a connection; a sketch is shown after this list for comparison).
- Match the CREATE DDL statement with regular expressions (the focus of this article).
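For comparison, the metadata-based option would look roughly like the sketch below, reading column names and types from ResultSetMetaData. The query and helper are illustrative only; this is the approach the project decided against:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSetMetaData;
import java.util.ArrayList;
import java.util.List;

public class MetadataFieldReader {
    static List<String> readFields(Connection connection, String tableName) throws Exception {
        // A query that returns no rows but still exposes the table structure
        try (PreparedStatement ps = connection.prepareStatement(
                "SELECT * FROM " + tableName + " WHERE 1 = 0")) {
            ResultSetMetaData meta = ps.executeQuery().getMetaData();
            List<String> fields = new ArrayList<>();
            for (int i = 1; i <= meta.getColumnCount(); i++) {
                fields.add(meta.getColumnName(i) + " " + meta.getColumnTypeName(i));
            }
            return fields;
        }
    }
}
```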
Approach to Matching DDL with Regular Expressions
- Each data source has its own parser implementation class, and the implementations are created through a factory (factory pattern).
- The project structure follows domain-driven design, so a domain model is built around the parsers.
- CREATE DDL statements can contain characters that need special escaping, which has to be handled.
- Validate the DDL: only CREATE statements are accepted, to prevent SQL injection (a minimal sketch of this check follows the list).
- Implement the regular expressions that parse each field.
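The CREATE-only check can be as simple as an anchored, case-insensitive pattern. The class and method names below are hypothetical and the project's actual validation may differ; this is just a minimal sketch of the idea:

```java
import java.util.regex.Pattern;

public class DdlValidator {
    // Accept only statements that start with CREATE; anything else is rejected up front
    private static final Pattern CREATE_ONLY =
            Pattern.compile("^\\s*CREATE\\s+", Pattern.CASE_INSENSITIVE);

    public static void checkCreateOnly(String ddl) {
        if (ddl == null || !CREATE_ONLY.matcher(ddl).find()) {
            throw new IllegalArgumentException("Only CREATE statements are supported");
        }
    }
}
```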
DDL Parsing Code Implementation
// parameters (e.g. the location prefix) passed to the concrete parser via setParam
private final Map<ParamKey<?>, Object> params = new HashMap<>(4);
public static CreateDdlParser create(DbType dbType, String createDdl) {
switch (dbType) {
case HIVE:
case SPARK:
return new HiveCreateDdlParser(createDdl);
case HBASE:
return new HBaseCreateDdlParser(createDdl);
case POSTGRESQL:
case DORIS:
case GBASE:
return new JdbcTableCreateDdlParser(createDdl);
default:
throw new UnsupportedOperationException("不支持此类型: " + dbType);
}
}
private final String escapedCreateDdl;
// group(1) = quoted column name, group(2) = type with optional precision, group(3) = optional inline comment
private static final Pattern PATTERN_COLUMN = Pattern.compile(
"\"(\\w+)\"\\s+(\\w+(?:\\(\\d+(?:,\\s*\\d+)?\\))?)(?:\\s+comment\\s+'([^']*)')?", Pattern.CASE_INSENSITIVE);
private static final Pattern PATTERN_TABLE_NAME = Pattern.compile("^(\\s+)?CREATE\\s+TABLE\\s+(\\w+)", Pattern.CASE_INSENSITIVE);
public JdbcTableCreateDdlParser(String createDdl) {
this.escapedCreateDdl = escape(createDdl);
}
@Override
public TableInfo parseTableInfo(TableInfo tableInfo) {
Matcher matcher = PATTERN_COLUMN.matcher(escapedCreateDdl);
List<FieldInfo> fieldInfos = new ArrayList<>();
while (matcher.find()) {
String fieldName = matcher.group(1);
String fieldType = matcher.group(2);
String fieldDescription = matcher.group(3) != null ? matcher.group(3) : "";
if (!("PRIMARY".equalsIgnoreCase(fieldName) || "PRIMARY".equalsIgnoreCase(fieldType))
&& !"CREATE".equalsIgnoreCase(fieldName) && !"default".equalsIgnoreCase(fieldName)) {
FieldInfo fieldInfo = new FieldInfo();
fieldInfo.setFieldName(fieldName);
fieldInfo.setFieldType(fieldType);
fieldInfo.setFieldComment(fieldDescription);
fieldInfos.add(fieldInfo);
}
}
tableInfo.setFieldInfo(JSONUtils.toJsonString(fieldInfos));
return tableInfo;
}
@Override
public String getTableName() {
Matcher matcher = PATTERN_TABLE_NAME.matcher(escapedCreateDdl);
return matcher.find() ? matcher.group(2) : "";
}
public static final ParamKey<Boolean> PARAM_NEED_LOCATION_PREFIX = new ParamKey<>();
public static final ParamKey<String> PARAM_LOCATION_PREFIX = new ParamKey<>();
public static final ParamKey<String> PARAM_TABLE_PROPERTIES = new ParamKey<>();
/**
* Matches the column list
*/
private static final Pattern PATTERN_COLUMNS = Pattern.compile(
"create[^(]+table[^(]+\\((.+?)\\)\\n", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
/**
* Matches the partition column list
*/
private static final Pattern PATTERN_PARTITIONED_COLUMNS = Pattern.compile(
"partitioned\\s+by\\s+\\(([^)]+)", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
/**
* Matches a single column definition
*/
private static final Pattern PATTERN_COLUMN = Pattern.compile(
"`?([^`\\s]+)`?\\s+([\\w<>(),]+)(?:\\s+comment\\s+'([^']*)?')?,?", Pattern.CASE_INSENSITIVE);
/**
* Matches the storage location (HDFS path)
*/
private static final Pattern PATTERN_STORAGE_PATH = Pattern.compile(
"location\\s+'hdfs://[^/]+([^']+)'", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
/**
* Matches the storage format
*/
private static final Pattern PATTERN_STORAGE_FORMAT = Pattern.compile(
"stored as\\s+(.*(?:\\n\\s*'\\S*')?)", Pattern.CASE_INSENSITIVE);
/**
* Matches the field separator
*/
private static final Pattern PATTERN_SEPARATOR = Pattern.compile(
"'serialization.format'='(.*?)'", Pattern.CASE_INSENSITIVE);
/**
* Matches the compression format
*/
private static final Pattern PATTERN_COMPRESS = Pattern.compile(
"'orc.compress'='(.*?)'", Pattern.CASE_INSENSITIVE);
/**
* Matches whether the table is partitioned
*/
private static final Pattern PATTERN_PARTITIONED = Pattern.compile(
"PARTITIONED BY", Pattern.CASE_INSENSITIVE);
/**
* Matches the table type (external table)
*/
private static final Pattern PATTERN_TABLE_TYPE = Pattern.compile(
"CREATE\\s+EXTERNAL\\s+TABLE", Pattern.CASE_INSENSITIVE);
/**
* Matches the table name
*/
private static final Pattern PATTERN_TABLE_NAME = Pattern.compile(
"CREATE\\s+(?:EXTERNAL\\s+)?TABLE\\s+`?(\\w+?)`?\\s*\\(", Pattern.CASE_INSENSITIVE);
public class HBaseCreateDdlParser extends CreateDdlParser {
private static final Pattern PATTERN_KEY_VALUE = Pattern.compile("'?(\\w+)'?\\s*=>\\s*('?\\w+'?|\\[[^]]*])");
private static final Pattern PATTERN_BLOCK = Pattern.compile("\\{[^}]+}");
private static final Pattern PATTERN_TABLE_NAME = Pattern.compile("^create '([^']+)'", Pattern.CASE_INSENSITIVE);
private static final String PARAM_NAME = "NAME";
private static final String PARAM_SPLITS = "SPLITS";
private final NestedCharacterEscape escape = new NestedCharacterEscape('{', '}', (char) 0x02, (char) 0x03);
private final String createDdl;
public HBaseCreateDdlParser(String createDdl) {
this.createDdl = escape.escape(createDdl.trim());
}
@Override
public TableInfo parseTableInfo(TableInfo tableInfo) {
TableDescriptorBuilder tableBuilder = TableDescriptorBuilder.newBuilder(TableName.valueOf("t"));
List<ColumnFamilyProperty> fields = new ArrayList<>();
Matcher matcher = PATTERN_BLOCK.matcher(createDdl);
while (matcher.find()) {
Map<String, String> map = parseKeyValue(matcher.group());
String columnFamilyName = map.remove(PARAM_NAME);
if (StringUtils.isNotEmpty(columnFamilyName)) {
// column family
ColumnFamilyDescriptorBuilder builder = ColumnFamilyDescriptorBuilder.newBuilder(
Bytes.toBytes(columnFamilyName));
map.forEach((k, v) -> HBaseUtils.setValue(builder, k, v));
fields.add(HBaseUtils.copy(builder.build(), new ColumnFamilyProperty()));
} else {
String splits = map.get(PARAM_SPLITS);
// pre-split region boundaries
if (StringUtils.isNotEmpty(splits)) {
splits = splits.substring(1, splits.length() - 1);
String partition = Arrays.stream(StringUtils.split(splits, ','))
.map(String::trim)
.map(it -> StringUtils.removeStart(it, "'"))
.map(it -> StringUtils.removeEnd(it, "'"))
.collect(Collectors.joining(Constants.COMMA));
tableInfo.setPartitionInfo(JSONUtils.toJsonString(new HbasePartitionDto(partition)));
} else {
// table properties
map.forEach(tableBuilder::setValue);
}
}
}
// table properties
Map<String, String> tableProperty = HBaseUtils.copy(tableBuilder.build(), new HashMap<>());
tableInfo.setOtherInfo(JSONUtils.toJsonString(tableProperty));
// column families
tableInfo.setFieldInfo(JSONUtils.toJsonString(fields));
return tableInfo;
}
@Override
public String getTableName() {
Matcher matcher = PATTERN_TABLE_NAME.matcher(createDdl);
return matcher.find() ? matcher.group(1) : "";
}
private Map<String, String> parseKeyValue(String str) {
if (StringUtils.isEmpty(str)) {
return Collections.emptyMap();
}
Matcher matcher = PATTERN_KEY_VALUE.matcher(str);
Map<String, String> map = new HashMap<>();
while (matcher.find()) {
String key = matcher.group(1);
String value = escape.unescape(matcher.group(2));
if (value.startsWith("'") && value.endsWith("'")) {
value = value.substring(1, value.length() - 1);
}
map.put(key, value);
}
return map;
}
}
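Putting it together, typical usage of the parser looks roughly like the sketch below. The sample PostgreSQL-style DDL is invented and a no-arg TableInfo constructor is assumed; the printed results are only meant to show which pieces the regexes capture, with TableInfo, DbType, and the parser classes being the project code shown above:

```java
String ddl = "CREATE TABLE user_info (\n"
        + "  \"id\" BIGINT,\n"
        + "  \"name\" VARCHAR(64),\n"
        + "  \"balance\" NUMERIC(10, 2)\n"
        + ")";

// The factory picks the concrete parser for the data source, and the parser
// fills the TableInfo from the DDL using the regex patterns shown above
CreateDdlParser parser = CreateDdlParser.create(DbType.POSTGRESQL, ddl);
TableInfo tableInfo = parser.parseTableInfo(new TableInfo());
System.out.println(parser.getTableName());    // user_info
System.out.println(tableInfo.getFieldInfo()); // JSON array with id, name, and balance entries
```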
Import/Export Code Implementation
@SneakyThrows
public Set<String> importTable(MultipartFile[] files, Integer dataSourceId, boolean isDev, User loginUser) {
List<MultipartFile> multipartFiles = Lists.newArrayList(files);
MultipartFile sqlFile = getMultipartFile(multipartFiles,".sql");
MultipartFile excelFile = getMultipartFile(multipartFiles, XLSX);
InputStream inputStream = sqlFile.getInputStream();
CountDownLatch countDownLatch = new CountDownLatch(1);
DataSource dataSource = dataSourceService.findDataSourceOrThrow(dataSourceId);
Connection connection = dataSourceService.getSqlConnection(dataSource);
AtomicReference<Map<String, String>> ddlMap = new AtomicReference<>(new HashMap<>());
// execute the SQL statements in a background thread
executorService.execute(() -> {
try {
ddlMap.set(tableInfoService.executeSQL(inputStream,connection,dataSource));
} finally {
countDownLatch.countDown();
}
});
List<ExcelDto> tableExcelList;
switch (dataSource.getType()) {
case HIVE:
tableExcelList = EasyExcelFactory.read(excelFile.getInputStream()).head(HiveTableExcelDto.class).sheet(0).doReadSync();
break;
case HBASE:
tableExcelList = EasyExcelFactory.read(excelFile.getInputStream()).head(HBaseTableExcelDto.class).sheet(0).doReadSync();
break;
case GBASE:
case POSTGRESQL:
case DORIS:
tableExcelList = EasyExcelFactory.read(excelFile.getInputStream()).head(OtherTableExcelDto.class).sheet(0).doReadSync();
break;
default:
throw new UnsupportedOperationException();
}
countDownLatch.await();
List<TableInfo> tableDtoList = tableExcelList.stream().map(it -> {
if (ddlMap.get().containsKey(it.getTableName())) {
String ddl = ddlMap.get().get(it.getTableName());
TableInfo tableInfo = TableInfo.createByExcelDto(it,loginUser.getId(),dataSourceId,isDev,ddl);
CreateDdlParser.create(dataSource.getType(), ddl)
.setParam(HiveCreateDdlParser.PARAM_NEED_LOCATION_PREFIX, isDev)
.setParam(HiveCreateDdlParser.PARAM_LOCATION_PREFIX,tableInfo.getLocationPrefix())
.parseTableInfo(tableInfo);
ddlMap.get().remove(it.getTableName());
return tableInfo;
}
return null;
}).filter(Objects::nonNull).collect(Collectors.toList());
tableInfoService.saveBatch(tableDtoList);
return ddlMap.get().keySet();
}
private MultipartFile getMultipartFile(List<MultipartFile> multipartFiles,String suffix) {
return multipartFiles.stream()
.filter(it -> Objects.requireNonNull(it.getOriginalFilename()).contains(suffix))
.findFirst()
.orElseThrow(() -> new IllegalArgumentException("Please upload a SQL file and a basic-properties Excel file"));
}
public void exportOtherTable(Integer id, HttpServletResponse response,boolean isDev) {
String fileName = "exported-files-" + System.currentTimeMillis() + ".zip";
response.setHeader("Content-Disposition", "attachment; filename=" + fileName);
response.setContentType("application/zip");
DataSource dataSource = dataSourceService.findDataSourceOrThrow(id);
List<TableInfoDto> tableInfoList = tableInfoService.findByDataSourceCode(dataSource.getCode(), isDev);
Assert.notEmpty(tableInfoList, () -> new DataSourceException("No table records found under this data source"));
CountDownLatch countDownLatch = new CountDownLatch(1);
// generate the zip archive
try (ZipOutputStream zipOut = new ZipOutputStream(response.getOutputStream())) {
// basic properties are written to an Excel file
executorService.execute(() -> {
try {
exportBaseExcel(tableInfoList, dataSource, zipOut);
} finally {
countDownLatch.countDown();
}
});
// write the DDL statements to a .sql file
exportDDlSql(tableInfoList, zipOut, dataSource);
countDownLatch.await();
} catch (IOException e) {
logger.error("导出zip压缩失败 {}", e.getMessage());
} catch (InterruptedException e) {
logger.warn("Interrupted!", e);
Thread.currentThread().interrupt();
}
}
@SneakyThrows
private void exportDDlSql(List<TableInfoDto> tableInfoDtoList,ZipOutputStream zipOut,DataSource dataSource) {
List<String> ddlSqlList = tableInfoDtoList
.stream()
.map(TableInfoDto::getCreateDdl)
.filter(Objects::nonNull)
.collect(Collectors.toList());
String sqlFileName = "exported-sql-" + dataSource.getName() + "_" + System.currentTimeMillis() + ".sql";
File sqlFile = new File(sqlFileName);
StringJoiner joiner = new StringJoiner("\n");
ddlSqlList.forEach(joiner::add);
try (PrintWriter writer = new PrintWriter(new FileWriter(sqlFile))) {
writer.write(joiner.toString());
}
storageOperate.addToZip(zipOut, Files.newInputStream(sqlFile.toPath()), sqlFileName);
Files.delete(sqlFile.toPath());
}
@SneakyThrows
private void exportBaseExcel(List<TableInfoDto> tableInfoDtoList, DataSource dataSource,ZipOutputStream zipOut) {
String fileName = "ExportTable-" + dataSource.getName() + "_" + System.currentTimeMillis() + XLSX;
try (ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
ExcelWriter excelWriter = EasyExcelFactory.write(outputStream).build()) {
List<ExcelDto> tableBaseProperties;
WriteSheet writeSheet;
switch(dataSource.getType()) {
case HIVE:
tableBaseProperties = tableInfoDtoList.stream().map(HiveTableExcelDto::create).collect(Collectors.toList());
writeSheet = EasyExcelFactory.writerSheet(0, "hive表基本属性").head(HiveExcelDto.class).build();
break;
case HBASE:
tableBaseProperties = tableInfoDtoList.stream().map(HBaseTableExcelDto::create).collect(Collectors.toList());
writeSheet = EasyExcelFactory.writerSheet(0, "hbase表基本属性").head(HBaseExcelDto.class).build();
break;
case POSTGRESQL:
case GBASE:
case DORIS:
tableBaseProperties = tableInfoDtoList.stream().map(OtherTableExcelDto::create).collect(Collectors.toList());
writeSheet = EasyExcelFactory.writerSheet(0, "表基本属性").head(OtherTableExcelDto.class).build();
break;
default:
throw new UnsupportedOperationException();
}
excelWriter.write(tableBaseProperties, writeSheet).finish();
storageOperate.addToZip(zipOut, new ByteArrayInputStream(outputStream.toByteArray()), fileName);
}
}