I want to parse KConf files that contain single-line comments introduced by the # character. You can find an example of such a file below.
https://github.com/torvalds/linux/blob/master/arch/x86/Kconfig
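I know the single-line test string looks almost random, but it should contain most (if not all) variants of nested hashes and strings, as well as quotes inside comments that do not introduce a string.
The regex engine I am currently using is the Java-based Groovy engine.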
Test string
Lorem "ipsum # \" dolor" sit amet, 'consectetur # \' adipiscing' elit. Maecenas 'suscipit #mollis' quam, non #bibendum 'elit # eleifend "in. Duis # convallis" luctus nunc, ac luctus lectus dapibus at.
Desired result
Lorem "ipsum # \" dolor" sit amet, 'consectetur # \' adipiscing' elit. Maecenas 'suscipit #mollis' quam, non
or (with the leading whitespace)
#bibendum 'elit # eleifend "in. Duis # convallis" luctus nunc, ac luctus lectus dapibus at.
First, I have escaped your string so it can be stored in a variable in JavaScript (you did not seem to indicate a language, so I am assuming JS):
var str = 'Lorem "ipsum # \\" dolor" sit amet, \'consectetur # \\\' adipiscing\' elit. Maecenas \'suscipit#mollis\' quam, non #bibendum \'elit # eleifend "in. Duis # convallis" luctus nunc, ac luctus lectus dapibus at.';
Then remove everything from the first " #" that is not followed by a space:
str.replace(/ #[^ ].*/, '');
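Since the question mentions the Java-based Groovy engine, here is a rough Java port of the same one-liner. It is only a sketch: the class name `NaiveHashStripper` and the shortened sample lines are made up for illustration. Note that the string in this answer reads 'suscipit#mollis', while the question's test string has a space before that hash, and the pattern behaves differently on the two.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the name is not part of the original answer.
class NaiveHashStripper {
    // Space, hash, then one non-space character, then the rest of the line:
    // the same pattern as the JavaScript snippet above.
    private static final Pattern COMMENT = Pattern.compile(" #[^ ].*");

    static String strip(String line) {
        // replaceFirst mirrors JS String.replace with a non-global regex.
        return COMMENT.matcher(line).replaceFirst("");
    }

    public static void main(String[] args) {
        // Shortened sample with the answer's spelling ('suscipit#mollis').
        System.out.println(strip("Maecenas 'suscipit#mollis' quam, non #bibendum 'elit # eleifend"));
        // prints: Maecenas 'suscipit#mollis' quam, non

        // Same sample with the question's spelling ('suscipit #mollis').
        System.out.println(strip("Maecenas 'suscipit #mollis' quam, non #bibendum 'elit # eleifend"));
        // prints: Maecenas 'suscipit
    }
}
```

The second call shows that the pattern does not know about quotes: a hash inside a quoted string that is preceded by a space and followed by a non-space is treated as the start of a comment.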
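Finally, your second desired result makes no sense at all.
Of course, all of this would benefit from a proper description.
Based on the limited information, this regex might work.
Trying to separate embedded hashes from comments seems a bit overcomplicated, though.
I did not have time to test it; I cut and pasted some regex fragments together.
Note that it should be used in multi-line mode. Everything is geared towards parsing one line at a time,
i.e. nothing in the regex spans lines.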
# (?-s)^(?:"[^"\\\n]*(?:\\.[^"\\\n]*)*"|'[^'\\\n]*(?:\\.[^'\\\n]*)*'|[^#"'\s]+|(?<=[^\s#])\#+|[^\S\n]+(?!\#))*(?:[^\S\n]+|^)(\#.*)$
# "(?-s)^(?:\"[^\"\\\\\\n]*(?:\\\\.[^\"\\\\\\n]*)*\"|'[^'\\\\\\n]*(?:\\\\.[^'\\\\\\n]*)*'|[^#\"'\\s]+|(?<=[^\\s#])\\#+|[^\\S\\n]+(?!\\#))*(?:[^\\S\\n]+|^)(\\#.*)$"
(?-s) # Modifier, no dot-all
^ # Beginning of line
(?:
" # Double quotes
[^"\\\n]*
(?: \\ . [^"\\\n]* )*
"
| # or
' # Single quotes
[^'\\\n]*
(?: \\ . [^'\\\n]* )*
'
| # or
[^#"'\s]+ # Not hash, quotes, whitespace
| # or
(?<= [^\s#] ) # Preceded by a character, but not hash or whitespace
\#+ # Embedded hashes
| # or
[^\S\n]+ # Whitespaces (non-newline)
(?! \# ) # Not followed by hash
)*
(?: [^\S\n]+ | ^ ) # Whitespaces (non-newline) or BOL
( \# .* ) # (1), hash comment
$ # End of line
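As a minimal sketch of how the pattern above might be used from Java (and therefore Groovy), the snippet below compiles it with `Pattern.MULTILINE` and collects capture group 1, the comment, from every line that has one. The class name `TrailingCommentFinder`, the method `comments` and the sample input are assumptions made for illustration; only the pattern comes from the answer. It has only been tried on short lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class; only the pattern itself comes from the answer.
class TrailingCommentFinder {
    private static final Pattern COMMENT = Pattern.compile(
          "(?-s)^(?:\"[^\"\\\\\\n]*(?:\\\\.[^\"\\\\\\n]*)*\""  // double-quoted string
        + "|'[^'\\\\\\n]*(?:\\\\.[^'\\\\\\n]*)*'"              // single-quoted string
        + "|[^#\"'\\s]+"                                       // not hash, quotes, whitespace
        + "|(?<=[^\\s#])\\#+"                                  // embedded hashes
        + "|[^\\S\\n]+(?!\\#))*"                               // blanks not followed by a hash
        + "(?:[^\\S\\n]+|^)(\\#.*)$",                          // group 1: the comment
        Pattern.MULTILINE);                                    // ^ and $ match per line

    // Returns the comment of every line that contains one, in order.
    static List<String> comments(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = COMMENT.matcher(text);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "config X86 # first\n"
                    + "default \"a # b\" # second\n"
                    + "bool \"x # y\"\n"
                    + "# third";
        System.out.println(comments(text));
        // prints: [# first, # second, # third]
    }
}
```

The third sample line has a hash only inside a quoted string, so it yields no match, which is the behaviour the quote alternatives are there to provide.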
Raw regex:
^((?:\\.|("|')(?:(?!\2|\\|[\r\n]).|\\.)*\2|[^#'"\r\n])+)#.+
Replace with $1.
Example:
String re = "^((?:\\\\.|(\"|')(?:(?!\\2|\\\\|[\\r\\n]).|\\\\.)*\\2|[^#'\"\\r\\n])+)#.+";
String line = "Lorem \"ipsum # \\\" dolor\" sit amet, 'consectetur # \\' adipiscing' elit. Maecenas 'suscipit #mollis' quam, non #bibendum 'elit # eleifend \"in. Duis # convallis\" luctus nunc, ac luctus lectus dapibus at.";
// Keep only capture group 1, i.e. everything before the comment
String uncommented = line.replaceAll(re, "$1");
//=> Lorem "ipsum # \" dolor" sit amet, 'consectetur # \' adipiscing' elit. Maecenas 'suscipit #mollis' quam, non
RegEx101 demo
Ideone demo
Breakdown:
^ # Beginning of line
( # Beginning of 1st capture group
(?: # Non-capture group 1
\\. # Match an escaped character
|
("|') # Or, a quote (and capture it in 2nd capture group),
(?: # Non-capture group 2
(?!\2|\\|[\r\n]). # Followed by any character except the relevant quote, a backslash, or a newline
|
\\. # Or an escaped character
)* # Close of non-capture group 2 and repeat as many times
\2 # Close the quoted part
|
[^#'"\r\n] # Any character that is not a hash, single/double quote, or newline
)+ # Close of non-capture group 1 and repeat as many times
) # Close capture group 1
#.+ # Match comments
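The example above handles a single line. For a whole file, one possible approach is to compile the same pattern with `Pattern.MULTILINE`, so that `^` anchors at every line start. This is only a sketch: the class name `KconfigCommentStripper` and the Kconfig-like sample text are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; only the pattern comes from the answer above.
class KconfigCommentStripper {
    private static final Pattern TRAILING_COMMENT = Pattern.compile(
        "^((?:\\\\.|(\"|')(?:(?!\\2|\\\\|[\\r\\n]).|\\\\.)*\\2|[^#'\"\\r\\n])+)#.+",
        Pattern.MULTILINE); // let ^ match at the start of every line

    static String strip(String text) {
        // Keep group 1 (the text before the comment) and drop the rest of the line.
        return TRAILING_COMMENT.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        String text = "config X86_32 # 32-bit only\n"
                    + "\tdefault \"a # b\" # trailing\n"
                    + "# whole-line comment\n"
                    + "\tbool \"no comment\"";
        System.out.println(strip(text));
    }
}
```

Two things to be aware of when running this. The blank before each removed comment stays in the output, because it belongs to group 1. And a line that consists only of a comment is left untouched, because group 1 needs at least one character before the hash (the `+` quantifier); such lines would need separate handling.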