Java: Pattern matcher unexpectedly returns new lines

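I have a use case where I have to split a sentence using any escaped/unescaped character as the delimiter. So far, the unescaped/escaped characters we have are: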

" " (space),"\t","|", "\|",";","\;","," etc

So far it has been using a regex, defined as:

String delimiter = " ";
String regex = "(?:\\\\.|[^"+ delimiter +"\\\\]++)*";

The input string is:

String input = "234|Tamarind|something interesting ";

Now, below is the code that splits and prints:

// pattern was not defined in the posted snippet; assumed to be compiled from regex above
Pattern pattern = Pattern.compile( regex );
List<String> matchList = new ArrayList<>();
Matcher regexMatcher = pattern.matcher( input );
while ( regexMatcher.find() )
{
    matchList.add( regexMatcher.group() );
}
System.out.println( "Unescaped/escaped test result with size: " + matchList.size() );
matchList.stream().forEach( System.out::println );

However, some extra strings (new lines) are unexpectedly being stored, so the output looks like:

Unescaped/escaped test result with size: 5
234|Tamarind|something
interesting
.

Is there a better way to do this, so that there aren't any extra strings?

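The fix is simple: make sure each match consumes at least one character. Because the outer group is quantified with *, the pattern can also match the empty string, so find() returns a zero-length match at every delimiter and at the end of the input, and those empty strings are the extra entries. You can remove the ++ quantifier and replace * with +.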
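To see where the extra entries come from, here is a minimal sketch that runs both patterns over the question's input and prints the collected matches. The class name EmptyMatchDemo and the helper findAll are illustrative and not part of the original post. Pattern.DOTALL is an assumption for the original pattern, since the question does not show how it was compiled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyMatchDemo {
    // Hypothetical helper: collects every match of regex found in input.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "234|Tamarind|something interesting ";

        // Original pattern: the outer * allows zero-length matches,
        // so empty strings are collected as well (5 matches, 3 of them empty).
        System.out.println(findAll("(?:\\\\.|[^ \\\\]++)*", input));

        // Fixed pattern: + requires at least one character per match (2 matches).
        System.out.println(findAll("(?:\\\\.|[^ \\\\])+", input));
    }
}
```

The empty strings in the first list are what show up as blank lines when each element is printed with println.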

Full Java demo:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String delimiter = " ";
// One or more of: a backslash-escaped character, or any character that is neither the delimiter nor a backslash
String regex = "(?:\\\\.|[^"+ delimiter +"\\\\])+";
// System.out.println(regex); // => (?:\\.|[^ \\])+
Pattern pattern = Pattern.compile(regex, Pattern.DOTALL);
String input = "234|Tamarind|something interesting ";
List<String> matchList = new ArrayList<>();
Matcher regexMatcher = pattern.matcher( input );
while ( regexMatcher.find() )
{
    // System.out.println("'"+regexMatcher.group()+"'");
    matchList.add( regexMatcher.group() );
}
System.out.println( "Unescaped/escaped test result with size: " + matchList.size() );
matchList.stream().forEach( System.out::println );

Output:

Unescaped/escaped test result with size: 2
234|Tamarind|something
interesting
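The question also lists delimiters such as "|" and ";", which are not safe to drop into a character class unescaped in every case (for example "]" or "^"). As a rough sketch, and not something the original answer covers, the pattern could be built by a small helper that escapes the delimiter first. The names DelimiterSplitDemo, buildRegex, and split are hypothetical, and the sketch assumes a single-character delimiter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DelimiterSplitDemo {
    // Hypothetical helper: builds the answer's pattern for one delimiter character.
    static String buildRegex(char delimiter) {
        // Letters and digits are used as-is, because a backslash in front of them
        // would change their meaning (e.g. \t, \d, \1). Any other character is
        // backslash-escaped so it is taken literally inside the character class.
        String d = Character.isLetterOrDigit(delimiter)
                ? String.valueOf(delimiter)
                : "\\" + delimiter;
        return "(?:\\\\.|[^" + d + "\\\\])+";
    }

    // Hypothetical helper: returns the tokens between unescaped delimiters.
    static List<String> split(String input, char delimiter) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile(buildRegex(delimiter), Pattern.DOTALL).matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("234|Tamarind|something interesting ", ' '));
        System.out.println(split("a\\|b|c", '|'));   // input text: a\|b|c
        System.out.println(split("x;y\\;z", ';'));   // input text: x;y\;z
    }
}
```

Note that an escaped delimiter stays in its token together with the backslash (for example a\|b). Removing the backslash would be a separate step.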
