Regular Expression Practice



Regular expressions

A regular expression is a powerful tool for matching and manipulating text. It is a pattern built from ordinary characters and special characters that describes the text you want to match.

Regular expressions can be used to search, replace, extract, and validate specific patterns in text.

Regular expression pattern elements

  • Literal characters: letters, digits, spaces, and so on, which simply match themselves.
  • Special characters: the dot ., star *, plus +, question mark ?, and so on, which carry special meaning.
  • Character classes: a set of characters wrapped in square brackets [ ], matching any single character from the set.
  • Metacharacters: \d, \w, \s, and so on, which match a particular kind of character such as a digit, word character, or whitespace.
  • Quantifiers: {n}, {n,}, {n,m}, and so on, which specify how many times the preceding element must match.
  • Anchors: ^, $, \b, \B, and so on, which match the start or end of the string or a word boundary. (A short example follows this list.)
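These building blocks compose directly. A short, self-contained Java example (my own, not from the original post):

import java.util.regex.Pattern;

public class RegexBasicsDemo {
    public static void main(String[] args) {
        // ^ anchors the start, \d{4} wants exactly four digits, - is a literal,
        // \w+ wants one or more word characters, $ anchors the end
        Pattern p = Pattern.compile("^\\d{4}-\\w+$");
        System.out.println(p.matcher("2023-hive").matches()); // true
        System.out.println(p.matcher("23-hive").matches());   // false: only two digits

        // character class: [aeiou] matches any single vowel
        System.out.println(Pattern.compile("[aeiou]").matcher("ddl").find()); // false
    }
}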

Special characters

$ Matches the end of the input string. If the RegExp object's Multiline property is set, $ also matches before '\n' or '\r'. To match the $ character itself, use \$.
( ) Mark the start and end of a subexpression. The subexpression can be captured for later use. To match these characters, use \( and \).
* Matches the preceding subexpression zero or more times. To match the * character, use \*.
+ Matches the preceding subexpression one or more times. To match the + character, use \+.
. Matches any single character except the newline \n. To match a literal ., use \.
[ Marks the start of a bracket expression. To match [, use \[.
? Matches the preceding subexpression zero or one time, or marks a quantifier as non-greedy. To match the ? character, use \?.
\ Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, 'n' matches the character 'n', while '\n' matches a newline; the sequence '\\' matches "\" and '\(' matches "(".
^ Matches the start of the input string, unless it appears inside a bracket expression, where it negates the character set. To match the ^ character itself, use \^.
{ Marks the start of a quantifier expression. To match {, use \{.
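Two quick Java checks of these escaping rules (examples of my own, not from the original table):

import java.util.regex.Pattern;

public class EscapeDemo {
    public static void main(String[] args) {
        // \$ and \. match the literal characters instead of anchoring / wildcarding
        Pattern price = Pattern.compile("\\$\\d+\\.\\d{2}");
        System.out.println(price.matcher("total: $19.99").find()); // true

        // Pattern.quote() escapes every special character in one go
        Pattern literal = Pattern.compile(Pattern.quote("a+b*c"));
        System.out.println(literal.matcher("x a+b*c y").find());   // true
    }
}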

Regular expressions in practice

I recently picked up a new requirement at work. The background: the platform's development page connects to different data sources, and for each of them users can create tables either visually or from a DDL statement, fill in some basic properties, and import/export table definitions. We currently support the mainstream data sources in the industry, such as MySQL, PostgreSQL, GBase, Hive, Spark, and HBase.

  1. Visual table creation
    Assemble the DDL SQL from the table name, fields, field types, and field descriptions the user fills in, obtain a JDBC connection for the chosen data source, execute the statement to create the table on the database server, and store the table's metadata in the metadata store.
  2. DDL table creation
    Same flow: obtain a JDBC connection for the chosen data source, execute the statement to create the table on the database server, and store the table's metadata in the metadata store (a minimal sketch of this shared execute step follows the list). The difference from visual creation is that when persisting the metadata we have to parse fieldsInfo and related information out of the DDL itself.
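Both creation paths share the same execute step. A minimal sketch of it, with connection handling simplified and the project-specific metadata persistence omitted:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

class DdlExecutor {
    static void createTable(String jdbcUrl, String user, String password,
                            String ddlSql) throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl, user, password);
             Statement stmt = conn.createStatement()) {
            stmt.execute(ddlSql); // create the table on the database server
        }
        // persisting the table's metadata would follow here (project-specific, omitted)
    }
}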

Parsing the fieldsInfo field

  • Run the DDL through a connection with a PreparedStatement and read the fields back from the returned table structure (too cumbersome, hard to read, and it wastes a connection).
  • Match the DDL CREATE statement with regular expressions (the focus of this article).

Approach to matching DDL with regular expressions

  1. Each data source gets its own implementation class, pulled together behind a factory.
  2. The project is structured with domain-driven design, building out a domain model.
  3. DDL statements can contain characters that need special escaping, which must be handled.
  4. Validate the DDL statement: only CREATE is allowed, to prevent SQL injection (see the sketch after this list).
  5. Implement the regular expressions that parse each field.
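A minimal sketch of what the CREATE-only check in step 4 could look like; the class name DdlValidator and the exact rules are my own assumptions, not the original implementation:

import java.util.regex.Pattern;

public final class DdlValidator {
    // ^\s* tolerates leading whitespace; CASE_INSENSITIVE accepts any casing of CREATE
    private static final Pattern CREATE_ONLY =
            Pattern.compile("^\\s*CREATE\\s", Pattern.CASE_INSENSITIVE);

    public static void checkCreateOnly(String ddl) {
        if (ddl == null || !CREATE_ONLY.matcher(ddl).find()) {
            throw new IllegalArgumentException("Only CREATE statements are supported");
        }
        // Reject piggybacked statements such as "CREATE ...; DROP TABLE ..."
        String trimmed = ddl.trim();
        int semi = trimmed.indexOf(';');
        if (semi >= 0 && semi < trimmed.length() - 1) {
            throw new IllegalArgumentException("Multiple statements are not allowed");
        }
    }
}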

DDL parsing implementation

    // Typed parameters supplied through setParam(...) (e.g. PARAM_LOCATION_PREFIX below)
    private final Map<ParamKey<?>, Object> params = new HashMap<>(4);

    public static CreateDdlParser create(DbType dbType, String createDdl) {
        switch (dbType) {
            case HIVE:
            case SPARK:
                return new HiveCreateDdlParser(createDdl);
            case HBASE:
                return new HBaseCreateDdlParser(createDdl);
            case POSTGRESQL:
            case DORIS:
            case GBASE:
                return new JdbcTableCreateDdlParser(createDdl);
            default:
                throw new UnsupportedOperationException("不支持此类型: " + dbType);
        }
    }
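Call sites then stay free of the concrete parser classes. A typical (sketched) usage:

// Dispatch on the data source type; the factory hides the concrete parser.
CreateDdlParser parser = CreateDdlParser.create(DbType.HIVE, createDdl);
String tableName = parser.getTableName();
TableInfo parsed = parser.parseTableInfo(tableInfo);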

    private final String escapedCreateDdl;

    // Matches a quoted column name followed by its type, e.g. "price" numeric(10, 2);
    // the precision part is non-capturing so only the name and type are captured
    private static final Pattern PATTERN_COLUMN = Pattern.compile(
            "\"(\\w+)\"\\s+(\\w+(?:\\(\\d+(?:,\\s*\\d+)?\\))?)", Pattern.CASE_INSENSITIVE);

    private static final Pattern PATTERN_TABLE_NAME = Pattern.compile("^(\\s+)?CREATE\\s+TABLE\\s+(\\w+)", Pattern.CASE_INSENSITIVE);

    public JdbcTableCreateDdlParser(String createDdl) {
        this.escapedCreateDdl = escape(createDdl);
    }
    @Override
    public TableInfo parseTableInfo(TableInfo tableInfo) {
        Matcher matcher = PATTERN_COLUMN.matcher(escapedCreateDdl);
        List<FieldInfo> fieldInfos = new ArrayList<>();
        while (matcher.find()) {
            String fieldName = matcher.group(1);
            String fieldType = matcher.group(2);
            // PATTERN_COLUMN captures no comment; JDBC-style DDL usually sets column
            // comments via separate COMMENT ON statements, so leave it empty here
            String fieldDescription = "";
            if (!("PRIMARY".equalsIgnoreCase(fieldName) || "PRIMARY".equalsIgnoreCase(fieldType))
                    && !"CREATE".equalsIgnoreCase(fieldName) && !"default".equalsIgnoreCase(fieldName)) {
                FieldInfo fieldInfo = new FieldInfo();
                fieldInfo.setFieldName(fieldName);
                fieldInfo.setFieldType(fieldType);
                fieldInfo.setFieldComment(fieldDescription);
                fieldInfos.add(fieldInfo);
            }
        }
        tableInfo.setFieldInfo(JSONUtils.toJsonString(fieldInfos));
        return tableInfo;
    }

    @Override
    public String getTableName() {
        Matcher matcher = PATTERN_TABLE_NAME.matcher(escapedCreateDdl);
        return matcher.find() ? matcher.group(2) : "";
    }
 
 public static final ParamKey<Boolean> PARAM_NEED_LOCATION_PREFIX = new ParamKey<>();
    public static final ParamKey<String> PARAM_LOCATION_PREFIX = new ParamKey<>();
    public static final ParamKey<String> PARAM_TABLE_PROPERTIES = new ParamKey<>();

    /**
     * Matches the column list
     */
    private static final Pattern PATTERN_COLUMNS = Pattern.compile(
            "create[^(]+table[^(]+\\((.+?)\\)\\n", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /**
     * Matches the partition column list
     */
    private static final Pattern PATTERN_PARTITIONED_COLUMNS = Pattern.compile(
            "partitioned\\s+by\\s+\\(([^)]+)", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /**
     * Matches a single column
     */
    private static final Pattern PATTERN_COLUMN = Pattern.compile(
            "`?([^`\\s]+)`?\\s+([\\w<>(),]+)(?:\\s+comment\\s+'([^']*)?')?,?", Pattern.CASE_INSENSITIVE);

    /**
     * Matches the storage location path
     */
    private static final Pattern PATTERN_STORAGE_PATH = Pattern.compile(
            "location\\s+'hdfs://[^/]+([^']+)'", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /**
     * Matches the storage format
     */
    private static final Pattern PATTERN_STORAGE_FORMAT = Pattern.compile(
            "stored as\\s+(.*(?:\\n\\s*'\\S*')?)", Pattern.CASE_INSENSITIVE);

    /**
     * Matches the field separator
     */
    private static final Pattern PATTERN_SEPARATOR = Pattern.compile(
            "'serialization.format'='(.*?)'", Pattern.CASE_INSENSITIVE);

    /**
     * Matches the compression format
     */
    private static final Pattern PATTERN_COMPRESS = Pattern.compile(
            "'orc.compress'='(.*?)'", Pattern.CASE_INSENSITIVE);

    /**
     * Matches whether the table is partitioned
     */
    private static final Pattern PATTERN_PARTITIONED = Pattern.compile(
            "PARTITIONED BY", Pattern.CASE_INSENSITIVE);

    /**
     * Matches the table type (external vs. managed)
     */
    private static final Pattern PATTERN_TABLE_TYPE = Pattern.compile(
            "CREATE\\s+EXTERNAL\\s+TABLE", Pattern.CASE_INSENSITIVE);

    /**
     * Matches the table name
     */
    private static final Pattern PATTERN_TABLE_NAME = Pattern.compile(
            "CREATE\\s+(?:EXTERNAL\\s+)?TABLE\\s+`?(\\w+?)`?\\s*\\(", Pattern.CASE_INSENSITIVE);

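The original post showed the Hive parser's parse logic only as a screenshot, which is not recoverable here. Below is a minimal reconstruction of how the patterns above could be combined (my own sketch, assuming the Hive parser keeps the escaped DDL in an escapedCreateDdl field like the JDBC parser does; not the original code):

// Pull the column list out of the CREATE statement, then each column from it.
Matcher columns = PATTERN_COLUMNS.matcher(escapedCreateDdl);
if (columns.find()) {
    Matcher column = PATTERN_COLUMN.matcher(columns.group(1));
    while (column.find()) {
        // group(1) = name, group(2) = type, group(3) = comment (may be null)
        System.out.printf("%s %s '%s'%n", column.group(1), column.group(2),
                column.group(3) == null ? "" : column.group(3));
    }
}
// Table-level attributes fall out of the remaining patterns.
Matcher location = PATTERN_STORAGE_PATH.matcher(escapedCreateDdl);
if (location.find()) {
    System.out.println("location = " + location.group(1)); // path without hdfs://host
}
boolean external = PATTERN_TABLE_TYPE.matcher(escapedCreateDdl).find();
boolean partitioned = PATTERN_PARTITIONED.matcher(escapedCreateDdl).find();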

public class HBaseCreateDdlParser extends CreateDdlParser {
    private static final Pattern PATTERN_KEY_VALUE = Pattern.compile("'?(\\w+)'?\\s*=>\\s*('?\\w+'?|\\[[^]]*])");
    private static final Pattern PATTERN_BLOCK = Pattern.compile("\\{[^}]+}");
    private static final Pattern PATTERN_TABLE_NAME = Pattern.compile("^create '([^']+)'", Pattern.CASE_INSENSITIVE);
    private static final String PARAM_NAME = "NAME";
    private static final String PARAM_SPLITS = "SPLITS";

    private final NestedCharacterEscape escape = new NestedCharacterEscape('{', '}', (char) 0x02, (char) 0x03);
    private final String createDdl;

    public HBaseCreateDdlParser(String createDdl) {
        this.createDdl = escape.escape(createDdl.trim());
    }

    @Override
    public TableInfo parseTableInfo(TableInfo tableInfo) {
        TableDescriptorBuilder tableBuilder = TableDescriptorBuilder.newBuilder(TableName.valueOf("t"));
        List<ColumnFamilyProperty> fields = new ArrayList<>();

        Matcher matcher = PATTERN_BLOCK.matcher(createDdl);
        while (matcher.find()) {
            Map<String, String> map = parseKeyValue(matcher.group());
            String columnFamilyName = map.remove(PARAM_NAME);
            if (StringUtils.isNotEmpty(columnFamilyName)) {
                // column family
                ColumnFamilyDescriptorBuilder builder = ColumnFamilyDescriptorBuilder.newBuilder(
                        Bytes.toBytes(columnFamilyName));
                map.forEach((k, v) -> HBaseUtils.setValue(builder, k, v));
                fields.add(HBaseUtils.copy(builder.build(), new ColumnFamilyProperty()));
            } else {
                String splits = map.get(PARAM_SPLITS);
                // pre-split region keys
                if (StringUtils.isNotEmpty(splits)) {
                    splits = splits.substring(1, splits.length() - 1);
                    String partition = Arrays.stream(StringUtils.split(splits, ','))
                            .map(String::trim)
                            .map(it -> StringUtils.removeStart(it, "'"))
                            .map(it -> StringUtils.removeEnd(it, "'"))
                            .collect(Collectors.joining(Constants.COMMA));
                    tableInfo.setPartitionInfo(JSONUtils.toJsonString(new HbasePartitionDto(partition)));
                } else {
                    // table properties
                    map.forEach(tableBuilder::setValue);
                }
            }
        }

        // table properties
        Map<String, String> tableProperty = HBaseUtils.copy(tableBuilder.build(), new HashMap<>());
        tableInfo.setOtherInfo(JSONUtils.toJsonString(tableProperty));

        // column families
        tableInfo.setFieldInfo(JSONUtils.toJsonString(fields));

        return tableInfo;
    }

    @Override
    public String getTableName() {
        Matcher matcher = PATTERN_TABLE_NAME.matcher(createDdl);
        return matcher.find() ? matcher.group(1) : "";
    }

    private Map<String, String> parseKeyValue(String str) {
        if (StringUtils.isEmpty(str)) {
            return Collections.emptyMap();
        }
        Matcher matcher = PATTERN_KEY_VALUE.matcher(str);
        Map<String, String> map = new HashMap<>();
        while (matcher.find()) {
            String key = matcher.group(1);
            String value = escape.unescape(matcher.group(2));
            if (value.startsWith("'") && value.endsWith("'")) {
                value = value.substring(1, value.length() - 1);
            }
            map.put(key, value);
        }
        return map;
    }
}
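With a typical HBase shell statement as input (a sample of my own), the block and key-value patterns pull out the table name, column families, and split points:

// Illustrative input; not taken from the original post.
String ddl = "create 'user', {NAME => 'cf1', VERSIONS => '3'}, "
        + "{SPLITS => ['10', '20', '30']}";
HBaseCreateDdlParser parser = new HBaseCreateDdlParser(ddl);
System.out.println(parser.getTableName()); // user
// parseTableInfo(tableInfo) would record cf1 (with VERSIONS=3) as a column
// family and 10,20,30 as the pre-split keys.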

Import/export implementation

// getInputStream() and countDownLatch.await() throw checked exceptions; @SneakyThrows
// (already used elsewhere in this class) keeps the excerpt compilable
@SneakyThrows
public Set<String> importTable(MultipartFile[] files, Integer dataSourceId, boolean isDev, User loginUser) {
        List<MultipartFile> multipartFiles = Lists.newArrayList(files);
        MultipartFile sqlFile = getMultipartFile(multipartFiles,".sql");
        MultipartFile excelFile = getMultipartFile(multipartFiles, XLSX);
        InputStream inputStream = sqlFile.getInputStream();
        CountDownLatch countDownLatch = new CountDownLatch(1);
        DataSource dataSource = dataSourceService.findDataSourceOrThrow(dataSourceId);
        Connection connection = dataSourceService.getSqlConnection(dataSource);
        AtomicReference<Map<String, String>> ddlMap = new AtomicReference<>(new HashMap<>());

        // execute the SQL statements asynchronously
        executorService.execute(() -> {
            try {
                ddlMap.set(tableInfoService.executeSQL(inputStream,connection,dataSource));
            } finally {
                countDownLatch.countDown();
            }
        });
        List<ExcelDto> tableExcelList;
        switch (dataSource.getType()) {
            case HIVE:
                tableExcelList = EasyExcelFactory.read(excelFile.getInputStream()).head(HiveTableExcelDto.class).sheet(0).doReadSync();
                break;
            case HBASE:
                tableExcelList = EasyExcelFactory.read(excelFile.getInputStream()).head(HBaseTableExcelDto.class).sheet(0).doReadSync();
                break;
            case GBASE:
            case POSTGRESQL:
            case DORIS:
                tableExcelList = EasyExcelFactory.read(excelFile.getInputStream()).head(OtherTableExcelDto.class).sheet(0).doReadSync();
                break;
            default:
                throw new UnsupportedOperationException();
        }
        countDownLatch.await();

        List<TableInfo> tableDtoList = tableExcelList.stream().map(it -> {
            if (ddlMap.get().containsKey(it.getTableName())) {
                String ddl = ddlMap.get().get(it.getTableName());
                TableInfo tableInfo = TableInfo.createByExcelDto(it,loginUser.getId(),dataSourceId,isDev,ddl);
                CreateDdlParser.create(dataSource.getType(), ddl)
                        .setParam(HiveCreateDdlParser.PARAM_NEED_LOCATION_PREFIX, isDev)
                        .setParam(HiveCreateDdlParser.PARAM_LOCATION_PREFIX,tableInfo.getLocationPrefix())
                        .parseTableInfo(tableInfo);
                ddlMap.get().remove(it.getTableName());
               return tableInfo;
            }
            return null;
        }).filter(Objects::nonNull).collect(Collectors.toList());

        tableInfoService.saveBatch(tableDtoList);
        return ddlMap.get().keySet();
    }
 private MultipartFile getMultipartFile(List<MultipartFile> multipartFiles,String suffix) {
        return multipartFiles.stream()
                .filter(it -> Objects.requireNonNull(it.getOriginalFilename()).contains(suffix))
                .findFirst()
                .orElseThrow(() -> new IllegalArgumentException("请上传sql文件和基本属性excel文件"));
    }
public void exportOtherTable(Integer id, HttpServletResponse response,boolean isDev) {
        String fileName = "exported-files-" + System.currentTimeMillis() + ".zip";
        response.setHeader("Content-Disposition", "attachment; filename=" + fileName);
        response.setContentType("application/zip");
        DataSource dataSource = dataSourceService.findDataSourceOrThrow(id);
        List<TableInfoDto> tableInfoList = tableInfoService.findByDataSourceCode(dataSource.getCode(), isDev);
        Assert.notEmpty(tableInfoList, () -> new DataSourceException("No table records under this data source"));

        CountDownLatch countDownLatch = new CountDownLatch(1);
        // build the zip archive
        try (ZipOutputStream zipOut = new ZipOutputStream(response.getOutputStream())) {
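            // NOTE: exportBaseExcel writes to zipOut from the executor thread while
            // exportDDlSql writes from this one; ZipOutputStream is not thread-safe,
            // so this assumes storageOperate.addToZip synchronizes its writes.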
            // basic properties go into an Excel sheet
            executorService.execute(() -> {
                try {
                    exportBaseExcel(tableInfoList, dataSource, zipOut);
                } finally {
                    countDownLatch.countDown();
                }
            });
            // write the DDL statements into a .sql file
            exportDDlSql(tableInfoList, zipOut, dataSource);
            countDownLatch.await();
        } catch (IOException e) {
            logger.error("导出zip压缩失败 {}", e.getMessage());
        } catch (InterruptedException e) {
            logger.warn("Interrupted!", e);
            Thread.currentThread().interrupt();
        }
    }
  @SneakyThrows
    private void exportDDlSql(List<TableInfoDto> tableInfoDtoList,ZipOutputStream zipOut,DataSource dataSource) {
        List<String> ddlSqlList = tableInfoDtoList
                .stream()
                .map(TableInfoDto::getCreateDdl)
                .filter(Objects::nonNull)
                .collect(Collectors.toList());
        String sqlFileName = "exported-sql-" + dataSource.getName() + "_" + System.currentTimeMillis() + ".sql";
        File sqlFile = new File(sqlFileName);
        StringJoiner joiner = new StringJoiner("\n");
        ddlSqlList.forEach(joiner::add);
        try (PrintWriter writer = new PrintWriter(new FileWriter(sqlFile))) {
            writer.write(joiner.toString());
        }
        storageOperate.addToZip(zipOut, Files.newInputStream(sqlFile.toPath()), sqlFileName);
        Files.delete(sqlFile.toPath());
    }

    @SneakyThrows
    private void exportBaseExcel(List<TableInfoDto> tableInfoDtoList, DataSource dataSource,ZipOutputStream zipOut) {
        String fileName = "ExportTable-" + dataSource.getName()  + "_" + System.currentTimeMillis() + XLSX;
        try (ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
             ExcelWriter excelWriter = EasyExcelFactory.write(outputStream).build()) {
            List<ExcelDto> tableBaseProperties;
            WriteSheet writeSheet;
            switch(dataSource.getType()) {
                case HIVE:
                    tableBaseProperties = tableInfoDtoList.stream().map(HiveTableExcelDto::create).collect(Collectors.toList());
                    writeSheet = EasyExcelFactory.writerSheet(0, "hive表基本属性").head(HiveExcelDto.class).build();
                    break;
                case HBASE:
                    tableBaseProperties = tableInfoDtoList.stream().map(HBaseTableExcelDto::create).collect(Collectors.toList());
                     writeSheet = EasyExcelFactory.writerSheet(0, "hbase表基本属性").head(HBaseExcelDto.class).build();
                    break;
                case POSTGRESQL:
                case GBASE:
                case DORIS:
                    tableBaseProperties = tableInfoDtoList.stream().map(OtherTableExcelDto::create).collect(Collectors.toList());
                    writeSheet = EasyExcelFactory.writerSheet(0, "表基本属性").head(OtherTableExcelDto.class).build();
                    break;
                default:
                    throw new UnsupportedOperationException();
            }
            excelWriter.write(tableBaseProperties, writeSheet).finish();
            storageOperate.addToZip(zipOut, new ByteArrayInputStream(outputStream.toByteArray()), fileName);
        }
    }

Title: Regular Expression Practice
Author: xiaohugg
URL: https://xiaohugg.top/articles/2023/09/06/1693972473383.html
