What is this Java statement containing a regular expression supposed to do?

I'm working on some legacy Java code and came across the following statement:

Pattern lineWithCommentP2 = Pattern.compile("//(.[^<>]+?)(\\R|$)", Pattern.CASE_INSENSITIVE);
Matcher m = lineWithCommentP2.matcher(s);
s = m.replaceAll("<span class=\"cip\">//$1</span>$2");

According to the comment in the code, it is supposed to replace any lines of text of the form

text1//text2
text3//text4

with

text1<span class="cip">//text2</span>
text3<span class="cip">//text4</span>

However, when I test it, I see it replacing them with

text1<span class="cip">//text2
</span>
text3<span class="cip">//text4
</span>

(It adds a line break after text2 and text4, before the closing </span>.)

I haven't been able to tweak the regex to avoid the extra line break. Any idea why this happens and how to fix it?

Thanks.

Update: to reproduce, create a text file with the following content:

<p>test statement </p>
<pre class="code">public class TestClass{   
public static void main(String[] args){
statement1; //1
stement2(); //2
}
}
</pre>
<p>test stmt</p>

Then run the following code:

byte[] ba = Files.readAllBytes(Paths.get("c:\\temp\\test.txt"));
String s = new String(ba);
Pattern lineWithCommentP2 = Pattern.compile("//(.[^<>]+?)(\\R|$)", Pattern.CASE_INSENSITIVE);
Matcher m = lineWithCommentP2.matcher(s);
s = m.replaceAll("<span class=\"cip\">//$1</span>$2");
Files.write(Paths.get("c:\\temp\\test2.txt"), s.getBytes(), StandardOpenOption.CREATE);

This produces the following in test2.txt:

<p>test statement </p>
<pre class="code">public class TestClass{   
public static void main(String[] args){
statement1; <span class="cip">//1
</span>
stement2(); <span class="cip">//2
</span>
}
}
</pre>
<p>test stmt</p>

Your regex breaks down as follows:

//            Match '//'
(             Start capture group 1
.             Match any character, except line breaks
[^<>]+?       Match any character, except `<` and `>`, one or more times, reluctantly
)             End capture group 1
(             Start capture group 2
\R            Match a line break, e.g. `\r`, `\n`, or `\r\n`
|             OR
$             Match end of input
)             End capture group 2
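A small standalone check of the two facts this breakdown relies on, assuming Java 8 or later (where `\R` is supported): `\R` consumes `\r\n` as a single line break, and while `.` refuses to match `\r`, a negated class such as `[^<>]` accepts it. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

public class LineBreakDemo {
    // Returns true if the ENTIRE input matches the given regex.
    static boolean fullyMatches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // \R treats the two-character sequence \r\n as ONE line break
        System.out.println(fullyMatches("\\R", "\r\n"));   // true
        // . does not match a line terminator such as \r (no DOTALL flag)
        System.out.println(fullyMatches(".", "\r"));       // false
        // a negated class like [^<>] does match \r
        System.out.println(fullyMatches("[^<>]", "\r"));   // true
    }
}
```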

Your text looks like this (with the line breaks written out):

...\r\n
statement1; //1\r\n
stement2(); //2\r\n
...

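Since capture group 1 is one character followed by one or more characters, group 1 always matches 2 or more characters. Since the quantifier is reluctant, it stops matching as soon as the rest of the pattern can be satisfied.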
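To see the two-character minimum in isolation, here is a hypothetical check (the names `MinLengthDemo` and `firstGroup1` are illustrative only) that runs the original pattern against a one-character and a two-character comment at the very end of the input. The `CASE_INSENSITIVE` flag is left out because it does not affect this pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MinLengthDemo {
    // The original pattern from the question, without the flag
    static final Pattern ORIGINAL = Pattern.compile("//(.[^<>]+?)(\\R|$)");

    // Returns group 1 of the first match, or null if there is no match.
    static String firstGroup1(String input) {
        Matcher m = ORIGINAL.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Only one character follows //, so group 1 cannot reach its minimum
        System.out.println(firstGroup1("x; //1"));   // null
        // Two characters: . takes "1", [^<>]+? takes "2", then $ matches
        System.out.println(firstGroup1("x; //12"));  // 12
    }
}
```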

For the `//1` comment, the rest of the pattern is satisfied as soon as group 1 reaches that minimum, so you get:

Group 0: "//1\r\n"
Group 1: "1\r", with . matching "1" and [^<>]+? matching "\r"
Group 2: "\n", with \R matching "\n"
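The following sketch reproduces those group values with the original pattern on an in-memory string instead of a file. `GroupDemo`, `show()` and `groups()` are hypothetical names; `show()` only makes `\r` and `\n` visible so the captured text can be inspected.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // The original pattern from the question, flag included
    static final Pattern ORIGINAL =
            Pattern.compile("//(.[^<>]+?)(\\R|$)", Pattern.CASE_INSENSITIVE);

    // Replaces \r and \n with visible escape sequences.
    static String show(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    // Returns groups 0..2 of the first match (line breaks made visible), or null.
    static String[] groups(String input) {
        Matcher m = ORIGINAL.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { show(m.group(0)), show(m.group(1)), show(m.group(2)) };
    }

    public static void main(String[] args) {
        String[] g = groups("statement1; //1\r\nstement2(); //2\r\n");
        System.out.println("group 0: " + g[0]); // //1\r\n
        System.out.println("group 1: " + g[1]); // 1\r
        System.out.println("group 2: " + g[2]); // \n
    }
}
```

Because group 1 swallows the `\r`, the replacement puts `</span>` between `\r` and `\n`, which is exactly the stray line break seen in test2.txt.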

Solution

To fix this, remove the `.`, and make sure group 1 cannot match line breaks by adding `\v` (vertical whitespace) to the list of excluded characters:

"//([^<>\\v]+?)(\\R|$)"
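A minimal sketch of the corrected replacement applied to the same two comment lines, assuming Java 8 or later (needed for `\v` and `\R`). `FixedRegexDemo` and `highlightComments` are illustrative names, not part of the original code. Note that the quotes inside the replacement string must be escaped as `\"` in Java source.

```java
import java.util.regex.Pattern;

public class FixedRegexDemo {
    // Group 1 may not contain <, > or any vertical whitespace (\v),
    // so it can never swallow the \r of a \r\n line break.
    static final Pattern FIXED = Pattern.compile("//([^<>\\v]+?)(\\R|$)");

    // Wraps each // comment in a span; $2 re-emits the original line break.
    static String highlightComments(String s) {
        return FIXED.matcher(s).replaceAll("<span class=\"cip\">//$1</span>$2");
    }

    public static void main(String[] args) {
        String input = "statement1; //1\r\nstement2(); //2\r\n";
        System.out.print(highlightComments(input));
    }
}
```

Each `</span>` now comes before the line break, and `\R` captures the full `\r\n` in group 2, so the original line endings are carried over unchanged.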

FYI: since the regex contains no letters to match, the CASE_INSENSITIVE flag has no effect. Specifying it is pointless and misleading, so remove it.
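If you want to confirm that the flag does nothing here, a quick comparison should show identical results with and without it. This is a sketch: `FlagDemo` is a hypothetical name and the sample input is made up, with one comment containing letters in both cases.

```java
import java.util.regex.Pattern;

public class FlagDemo {
    static final String REPLACEMENT = "<span class=\"cip\">//$1</span>$2";

    // Runs the same regex with and without CASE_INSENSITIVE and compares the results.
    static boolean flagMakesNoDifference(String regex, String input) {
        String withFlag = Pattern.compile(regex, Pattern.CASE_INSENSITIVE)
                .matcher(input).replaceAll(REPLACEMENT);
        String withoutFlag = Pattern.compile(regex)
                .matcher(input).replaceAll(REPLACEMENT);
        return withFlag.equals(withoutFlag);
    }

    public static void main(String[] args) {
        String input = "statement1; //1\r\nstement2(); //TODO Fix\r\n";
        System.out.println(flagMakesNoDifference("//([^<>\\v]+?)(\\R|$)", input)); // true
    }
}
```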
