Get text within brackets along with splitting on delimiters in regex (Java)



I have a multi-line string that is separated by a set of different delimiters:

A Z DelimiterB B X DelimiterA (C DelimiterA D) DelimiterB (E DelimiterA F) DelimiterB G DelimiterA H

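I need to split the string on the delimiters, but if some words are inside brackets, the whole bracketed group should be extracted as a single token, even if it contains a delimiter. I need the tokens extracted as follows: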

A Z
DelimiterB
B X
DelimiterA
(C DelimiterA D) (extract with brackets)
DelimiterB
(E DelimiterA F)
DelimiterB
G
DelimiterA
H

Currently, I am using this expression to split on the delimiters (keeping the delimiters as tokens):

(((?<=DelimiterA)|(?=DelimiterA))|((?<=DelimiterB)|(?=DelimiterB)))

I tried the following, but it does not work. How can I make it work?

((?=()|(?<=))|(((?<=DelimiterA)|(?=DelimiterA))|((?<=DelimiterB)|(?=DelimiterB))))

Java code:

String txt = "A DelimiterB B DelimiterA (C DelimiterA D) DelimiterB (E DelimiterA F) DelimiterB G DelimiterA H";
String[] texts = txt.split("((?=()|(?<=))|(((?<=DelimiterA)|(?=DelimiterA))|((?<=DelimiterB)|(?=DelimiterB))))");
for (String word : texts) {
    System.out.println(word);
}
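One plausible reason the attempt misbehaves, offered as a sketch rather than a confirmed diagnosis: an unescaped `(` or `)` is read as grouping syntax, so `(?=()|(?<=))` becomes a lookahead that succeeds at every position. The class and method names below are hypothetical; only the regex fragments come from the question.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; only the regex fragments come from the question.
final class SplitAttemptDemo {
    // The delimiter lookarounds from the expression that already works.
    static final String DELIMS =
            "(((?<=DelimiterA)|(?=DelimiterA))|((?<=DelimiterB)|(?=DelimiterB)))";

    // Brackets left unescaped, as in the attempt: "(?=()" opens a lookahead
    // whose first alternative is an empty group, so it matches everywhere.
    static List<String> splitUnescaped(String text) {
        return Arrays.asList(text.split("((?=()|(?<=))|" + DELIMS + ")"));
    }

    // Literal brackets, escaped for a Java string literal as \\( and \\).
    static List<String> splitEscaped(String text) {
        return Arrays.asList(text.split("((?=\\()|(?<=\\))|" + DELIMS + ")"));
    }

    public static void main(String[] args) {
        System.out.println(splitUnescaped("A (B)"));
        System.out.println(splitEscaped("X DelimiterA (C DelimiterA D)"));
    }
}
```

Escaping alone does not seem to be enough: the escaped variant still cuts at the delimiter inside the brackets, because the delimiter lookarounds have no notion of being inside a bracketed group.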

IMO, matching is easier than splitting

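Since the "delimiters" are also needed in the output, I suggest matching the patterns we want instead of splitting. From the given example, we have the following patterns to capture.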

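  1. (C DelimiterA D) - brackets containing a word, a delimiter, and a word,
    which is "\(\w+ (DelimiterA|DelimiterB) \w+\)"
  2. DelimiterB - a whole delimiter
    "(DelimiterA|DelimiterB)"
  3. B X - one or more words that are not delimiters
    How do we check that a word is not a delimiter?
    We can check that there is no delimiter on either side of the whitespace between words (see the regex "not" operators, i.e. negative lookbehind and negative lookahead), i.e. "\w+((?<!(DelimiterA|DelimiterB))\s(?!(DelimiterA|DelimiterB))\w+)*"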
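Putting the three patterns together (Scanner.findAll requires Java 9 or later; note that backslashes must be doubled inside a Java string literal):

import java.util.Scanner;
import java.util.regex.MatchResult;

public class SplitWithCustomDelimiter {
    public static void main(String[] args) {
        String txt = "A Z DelimiterB B X DelimiterA (C DelimiterA D) DelimiterB (E DelimiterA F) DelimiterB G DelimiterA H";
        // Scanner can also read from other sources (File, InputStream, ...)
        try (Scanner scanner = new Scanner(txt)) {
            scanner.findAll(
                    // 1. bracketed group: word, delimiter, word
                    "\\(\\w+ (DelimiterA|DelimiterB) \\w+\\)"
                    // 2. a delimiter on its own
                    + "|(DelimiterA|DelimiterB)"
                    // 3. one or more words with no delimiter between them
                    + "|\\w+((?<!(DelimiterA|DelimiterB))\\s(?!(DelimiterA|DelimiterB))\\w+)*")
                .map(MatchResult::group)
                .forEach(System.out::println);
        }
    }
}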
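
Where Scanner.findAll is not available (it was added in Java 9), roughly the same idea can be sketched with Pattern and Matcher. This is a minimal sketch under assumptions, not the answer's exact patterns. The class and method names are hypothetical. The bracket pattern is loosened to `\([^()]*\)`, which assumes brackets are balanced and never nested. For comparison, a split-based variant is included: it splits on whitespace next to a delimiter and uses the guard `(?![^()]*\))` to skip positions that sit inside a bracketed group, under the same assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; names are illustrative only.
final class BracketAwareTokenizer {
    private static final String DELIM = "(?:DelimiterA|DelimiterB)";

    // Alternatives are tried in order: bracketed group, delimiter, word run.
    private static final Pattern TOKEN = Pattern.compile(
            "\\([^()]*\\)"                                   // (anything without brackets)
            + "|\\b" + DELIM + "\\b"                         // a delimiter as a whole word
            + "|\\w+(?:\\s+(?!" + DELIM + "\\b)\\w+)*");     // words, stopping before a delimiter

    // Whitespace adjacent to a delimiter, unless a ')' follows before any '('.
    private static final Pattern SEPARATOR = Pattern.compile(
            "(?:\\s+(?=" + DELIM + ")|(?<=" + DELIM + ")\\s+)(?![^()]*\\))");

    // Match-based approach: collect every token the pattern finds.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Split-based approach: cut around delimiters that are outside brackets.
    static List<String> splitOutsideBrackets(String text) {
        return Arrays.asList(SEPARATOR.split(text));
    }

    public static void main(String[] args) {
        String txt = "A Z DelimiterB B X DelimiterA (C DelimiterA D) "
                + "DelimiterB (E DelimiterA F) DelimiterB G DelimiterA H";
        tokenize(txt).forEach(System.out::println);
        splitOutsideBrackets(txt).forEach(System.out::println);
    }
}
```

On the sample string both methods should yield the eleven tokens listed in the question. The matching version tends to be easier to extend, since each token type is a separate alternative, while the split version depends on the bracket guard and on delimiters always being surrounded by whitespace.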
