I have a multi-line string whose parts are separated by a set of different delimiters:
A Z DelimiterB B X DelimiterA (C DelimiterA D) DelimiterB (E DelimiterA F) DelimiterB G DelimiterA H
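I need to split the string on the delimiters, but if some words are inside brackets, the bracketed group should be extracted as a single token, even if it contains a delimiter. I need the parts extracted as follows: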
A Z
DelimiterB
B X
DelimiterA
(C DelimiterA D) (extract with brackets)
DelimiterB
(E DelimiterA F)
DelimiterB
G
DelimiterA
H
Currently, I am using this expression to split on the delimiters:
(((?<=DelimiterA)|(?=DelimiterA))|((?<=DelimiterB)|(?=DelimiterB)))
I tried the following, but it does not work. How can I make it work?
((?=()|(?<=))|(((?<=DelimiterA)|(?=DelimiterA))|((?<=DelimiterB)|(?=DelimiterB))))
Java code:
String txt = "A DelimiterB B DelimiterA (C DelimiterA D) DelimiterB (E DelimiterA F) DelimiterB G DelimiterA H";
String[] texts = txt.split("((?=()|(?<=))|(((?<=DelimiterA)|(?=DelimiterA))|((?<=DelimiterB)|(?=DelimiterB))))");
for (String word : texts) {
    System.out.println(word);
}
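IMO, matching is easier than splitting
Since the delimiters are also needed in the output, I suggest matching the patterns we want instead. Based on the example given, there are the following patterns to capture: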
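(C DelimiterA D) - a bracketed group containing a word, a delimiter and a word, which is "\(\w+ (DelimiterA|DelimiterB) \w+\)"
DelimiterB - a whole delimiter, which is "(DelimiterA|DelimiterB)"
B, B X - one or more words that are not delimiters

How to check whether a word is a delimiter
We can require that there is no delimiter on either side of the whitespace between two words, using negative lookbehind and negative lookahead (the regex "not" operators), which gives "\w+((?<!(DelimiterA|DelimiterB))\s(?!(DelimiterA|DelimiterB))\w+)*"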
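To see what the lookarounds contribute, the word pattern can be tried on its own. The following is only a minimal sketch: the class name WordRunSketch and the helper wordRuns are made up for illustration, and it uses a plain Matcher loop rather than anything specific to this answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
public class WordRunSketch {
    // The third pattern from the list above, with backslashes doubled for a Java string literal.
    private static final Pattern WORD_RUN = Pattern.compile(
            "\\w+((?<!(DelimiterA|DelimiterB))\\s(?!(DelimiterA|DelimiterB))\\w+)*");

    // Collects every match of the word pattern, in order of appearance.
    public static List<String> wordRuns(String text) {
        List<String> runs = new ArrayList<>();
        Matcher m = WORD_RUN.matcher(text);
        while (m.find()) {
            runs.add(m.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        // prints [A Z, DelimiterB, B X]
        System.out.println(wordRuns("A Z DelimiterB B X"));
    }
}
```

Note that a delimiter is itself made of word characters, so this pattern alone still returns it as a token. The lookahead stops a run of words before a delimiter, and the lookbehind stops a delimiter from absorbing the words after it.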
import java.util.Scanner;

public class SplitWithCustomDelimiter {
    public static void main(String[] args) {
        String txt = "A Z DelimiterB B X DelimiterA (C DelimiterA D) DelimiterB (E DelimiterA F) DelimiterB G DelimiterA H";
        // Scanner can read from other sources as well (File, InputStream, ...).
        // Note: Scanner.findAll requires Java 9 or later.
        try (Scanner scanner = new Scanner(txt)) {
            scanner.findAll(
                    // bracketed group: word, delimiter, word
                    "\\(\\w+ (DelimiterA|DelimiterB) \\w+\\)" +
                    // a delimiter on its own
                    "|(DelimiterA|DelimiterB)" +
                    // one or more words with no delimiter in between
                    "|\\w+((?<!(DelimiterA|DelimiterB))\\s(?!(DelimiterA|DelimiterB))\\w+)*"
                )
                .map(matchResult -> matchResult.group())
                .forEach(System.out::println);
        }
    }
}
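If Scanner.findAll is not available (it was added in Java 9), or the tokens are needed as a list rather than printed, the same three alternatives can probably be driven with Pattern and Matcher instead. This is a sketch under those assumptions; the names MatchTokensSketch, DELIM and tokens are hypothetical and not part of the code above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variant of the approach above, for illustration only.
public class MatchTokensSketch {
    // Delimiter alternatives, taken from the example in the question.
    private static final String DELIM = "DelimiterA|DelimiterB";

    // Same three alternatives as above, in the same order.
    private static final Pattern TOKEN = Pattern.compile(
            "\\(\\w+ (" + DELIM + ") \\w+\\)"      // bracketed group
            + "|(" + DELIM + ")"                    // a delimiter on its own
            + "|\\w+((?<!(" + DELIM + "))\\s(?!(" + DELIM + "))\\w+)*"); // run of words

    // Returns all tokens in order; characters that match no alternative
    // (here, the single spaces between tokens) are skipped by find().
    public static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String txt = "A Z DelimiterB B X DelimiterA (C DelimiterA D) DelimiterB (E DelimiterA F) DelimiterB G DelimiterA H";
        for (String token : tokens(txt)) {
            System.out.println(token);
        }
    }
}
```

Compiling the pattern once and keeping the delimiter list in a single constant means a new delimiter only has to be added in one place.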