How to strip line comments from lines that contain multiple strings and comment characters



I want to parse KConf files that contain single-line comments introduced by the # character. An example of such a file can be found below.

https://github.com/torvalds/linux/blob/master/arch/x86/Kconfig

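I know the single-line test string looks almost random, but it should contain most (if not all) variants of nested hashes and strings, as well as quotes inside the comment that do not introduce a string.

The regex engine I am currently using is the Java-based Groovy engine.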

Test string

Lorem "ipsum # \" dolor" sit amet, 'consectetur # \' adipiscing' elit. Maecenas 'suscipit #mollis' quam, non #bibendum 'elit # eleifend "in. Duis # convallis" luctus nunc, ac luctus lectus dapibus at.

Expected result

Lorem "ipsum # \" dolor" sit amet, 'consectetur # \' adipiscing' elit. Maecenas 'suscipit #mollis' quam, non

(with leading whitespace)

#bibendum 'elit # eleifend "in. Duis # convallis" luctus nunc, ac luctus lectus dapibus at.

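First, I have escaped your string so that it can be stored in a JavaScript variable (you did not seem to indicate a language, so I am assuming JS):

var str = 'Lorem "ipsum # " dolor" sit amet, \'consectetur # \' adipiscing\' elit. Maecenas \'suscipit#mollis\' quam, non #bibendum \'elit # eleifend "in. Duis # convallis" luctus nunc, ac luctus lectus dapibus at.';

To remove everything starting at a " #" (a space followed by a hash) that is followed by a non-space character: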

str.replace(/ #[^ ].*/, '');
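For readers on the asker's Java/Groovy stack, the same idea can be sketched in Java. The class and method names below are made up for illustration. The pattern does not look at quotes at all, so the sketch also shows where it cuts too early: when a quoted string itself contains a space followed by a hash and a non-space character.

```java
import java.util.regex.Pattern;

final class NaiveHashStrip {
    // Same pattern as the JavaScript one-liner: a space, a hash, one
    // non-space character, then everything up to the end of the line.
    private static final Pattern NAIVE = Pattern.compile(" #[^ ].*");

    // Hypothetical helper; replaceFirst mirrors JavaScript's replace()
    // when it is called with a non-global regex.
    static String strip(String line) {
        return NAIVE.matcher(line).replaceFirst("");
    }

    public static void main(String[] args) {
        // No " #x" inside the quoted string: the trailing comment is removed.
        System.out.println(strip("say 'a#b' done #note"));  // say 'a#b' done
        // " #b" inside the quoted string: the line is cut inside the string.
        System.out.println(strip("say 'a #b' done #note")); // say 'a
    }
}
```

The second call is why the quote-aware patterns in the other answers are considerably longer.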

Finally, your second expected result makes no sense at all.

Of course, all of this would be helped by a proper description.

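Based on the limited information, this regex might work.
That said, trying to separate embedded hashes from comments seems rather ad hoc and complicated.
I did not have time to test it; I just cut and pasted some regex fragments together.
Note that it should be used in multi-line mode. Everything is geared toward parsing a single line,
i.e. nothing in the regex spans lines.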

 #  (?-s)^(?:"[^"\\\n]*(?:\\.[^"\\\n]*)*"|'[^'\\\n]*(?:\\.[^'\\\n]*)*'|[^#"'\s]+|(?<=[^\s#])\#+|[^\S\n]+(?!\#))*(?:[^\S\n]+|^)(\#.*)$
 #  "(?-s)^(?:\"[^\"\\\\\\n]*(?:\\\\.[^\"\\\\\\n]*)*\"|'[^'\\\\\\n]*(?:\\\\.[^'\\\\\\n]*)*'|[^#\"'\\s]+|(?<=[^\\s#])\\#+|[^\\S\\n]+(?!\\#))*(?:[^\\S\\n]+|^)(\\#.*)$"
 (?-s)                   # Modifier, no dot-all
 ^                       # Beginning of line
 (?:
      "                       # Double quotes
      [^"\\\n]*
      (?: \\ . [^"\\\n]* )*
      "
   |                        # or
      '                       # Single quotes
      [^'\\\n]*
      (?: \\ . [^'\\\n]* )*
      '
   |                        # or
      [^#"'\s]+               # Not hash, quotes, whitespace
   |                        # or
      (?<= [^\s#] )           # Preceded by a character, but not hash or whitespace
      \#+                     # Embedded hashes
   |                        # or
      [^\S\n]+                # Whitespace (non-newline)
      (?! \# )                # Not followed by hash
 )*
 (?: [^\S\n]+ | ^ )      # Whitespace (non-newline) or BOL
 ( \# .* )               # (1), hash comment
 $                       # End of line
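As a rough sketch of how this pattern could be applied from Java, the snippet below compiles the compact form with `Pattern.MULTILINE` and cuts the line at the start of capture group 1 (the hash comment). The class name `TrailingCommentFinder` and the helper `stripComment` are assumptions made for illustration; the answer itself only supplies the pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TrailingCommentFinder {
    // Compact form of the pattern above, escaped for a Java string literal.
    private static final Pattern COMMENT = Pattern.compile(
        "(?-s)^(?:\"[^\"\\\\\\n]*(?:\\\\.[^\"\\\\\\n]*)*\""   // double-quoted string
        + "|'[^'\\\\\\n]*(?:\\\\.[^'\\\\\\n]*)*'"             // single-quoted string
        + "|[^#\"'\\s]+"                                      // plain token
        + "|(?<=[^\\s#])\\#+"                                 // hash embedded in a token
        + "|[^\\S\\n]+(?!\\#))*"                              // whitespace not before a hash
        + "(?:[^\\S\\n]+|^)(\\#.*)$",                         // group 1: the comment
        Pattern.MULTILINE);

    // Hypothetical helper: returns the line up to (not including) the comment.
    // Whitespace in front of the hash is kept.
    static String stripComment(String line) {
        Matcher m = COMMENT.matcher(line);
        return m.find() ? line.substring(0, m.start(1)) : line;
    }

    public static void main(String[] args) {
        String line = "config X \"a # b\" # real comment";
        // The brackets make the retained trailing space visible.
        System.out.println("[" + stripComment(line) + "]");
    }
}
```

Because only the comment is captured, the code has to slice the input itself rather than use a `$1` replacement.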

Raw regex:

^((?:\\.|("|')(?:(?!\2|\\|[\r\n]).|\\.)*\2|[^#'"\r\n])+)#.+

Replace with $1

Example:

String re = "^((?:\\\\.|(\"|')(?:(?!\\2|\\\\|[\\r\\n]).|\\\\.)*\\2|[^#'\"\\r\\n])+)#.+";
String line = "Lorem \"ipsum # \\\" dolor\" sit amet, 'consectetur # \\' adipiscing' elit. Maecenas 'suscipit #mollis' quam, non #bibendum 'elit # eleifend \"in. Duis # convallis\" luctus nunc, ac luctus lectus dapibus at.";
String uncommented = line.replaceAll(re, "$1");
//=> Lorem "ipsum # \" dolor" sit amet, 'consectetur # \' adipiscing' elit. Maecenas 'suscipit #mollis' quam, non

RegEx101 Demo

IDEOne Demo

Breakdown:

^                         # Beginning of line
  (                       # Beginning of 1st capture group
    (?:                   # Non-capture group 1
      \\.                 # Match an escaped character
    |
      ("|')               # Or, a quote (and capture it in 2nd capture group),
      (?:                 # Non-capture group 2
        (?!\2|\\|[\r\n]). # Followed by any character except relevant quote, backslash or newline
      |
        \\.               # Or an escaped character
      )*                  # Close of non-capture group 2 and repeat as many times
      \2                  # Close the quoted part
    |
      [^#'"\r\n]          # Any non-hash, single/double quote, newline characters
    )+                    # Close of non-capture group 1 and repeat as many times
  )                       # Close capture group 1
  #.+                     # Match comments
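To process a whole file rather than one line, the same pattern can presumably be compiled with `Pattern.MULTILINE` so that `^` anchors at every line start. The sketch below assumes that setup; the class name `KconfigCommentStripper` and the sample lines are invented for illustration and are not taken from the real Kconfig file.

```java
import java.util.regex.Pattern;

final class KconfigCommentStripper {
    // The regex from the breakdown above, escaped for a Java string literal.
    // MULTILINE lets ^ match at the start of every line of the input.
    private static final Pattern TRAILING_COMMENT = Pattern.compile(
        "^((?:\\\\.|(\"|')(?:(?!\\2|\\\\|[\\r\\n]).|\\\\.)*\\2|[^#'\"\\r\\n])+)#.+",
        Pattern.MULTILINE);

    // Hypothetical helper: keeps group 1 (the code part) of every matching line.
    static String strip(String text) {
        return TRAILING_COMMENT.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        String text = String.join("\n",
            "config FOO # trailing comment",
            "\tprompt \"uses # inside a string\" # another comment",
            "\tdefault y");
        System.out.println(strip(text));
    }
}
```

Two properties of the pattern are worth knowing when trying this: whitespace in front of the hash stays in the output, and because group 1 needs at least one character before the hash, a line that begins with `#` is left untouched.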
