Matcher does not find overlapping words

I am trying to take a String:

String s = "This is a String!";

and return all 2-word pairs in that string. Namely:

{"this is", "is a", "a String"}

But right now, all I can get it to return is:

{"this is", "a String"}

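How can I define my while loop so that it accounts for the missing overlapping words? My code is below. (Really, I would be happy if it just returned an int representing how many such word pairs it found.)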

int count = 0;
while(matcher.find()) {
    count += 1;
}

Thanks, all.

I like the two answers already posted, counting the words and subtracting one, but if you just need a regex to find overlapping matches:

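import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Java string literals need double quotes and escaped backslashes
Pattern pattern = Pattern.compile("\\S+ \\S+");
Matcher matcher = pattern.matcher(inputString);
int matchCount = 0;
boolean found = matcher.find();
while (found) {
  matchCount += 1;
  // find(int) resets the matcher and searches again from the given index,
  // here one character after the point where the last match began
  found = matcher.find(matcher.start() + 1);
}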

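Actually, you need to be a little smarter than simply adding 1, because trying this on "the force" will match "he force" and then "e force" after the first match. Of course, this is overkill for counting words, but it could be useful if the regex is more complicated than this one.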
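One way to avoid restarting in the middle of a word is to match only at word starts and capture the pair inside a lookahead, so that no input is consumed and matches can overlap. The following is a minimal sketch under that assumption, not the answerer's own code; the class name `OverlappingPairs` and the method `findPairs` are hypothetical names chosen for illustration, and words are assumed to be separated by exactly one space.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class OverlappingPairs {
    // (?<!\S) : the position must not be preceded by a non-space character,
    //           so a match can only begin at the start of a word.
    // (?=(...)): the pair is captured inside a lookahead, so the match itself
    //           is zero-width and the next search can reuse the second word.
    private static final Pattern PAIR =
            Pattern.compile("(?<!\\S)(?=(\\S+ \\S+))");

    static List<String> findPairs(String input) {
        List<String> pairs = new ArrayList<>();
        Matcher matcher = PAIR.matcher(input);
        while (matcher.find()) {
            pairs.add(matcher.group(1)); // the captured two-word pair
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(findPairs("This is a String!")); // [This is, is a, a String!]
        System.out.println(findPairs("the force"));         // [the force]
    }
}
```

The number of pairs is then simply `findPairs(input).size()`. With this pattern, "the force" yields only "the force", since positions inside a word are rejected by the lookbehind.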

Run a for loop from i = 0 to (number of words - 2); words i and i+1 then make up one 2-word string.

String[] splitString = string.split(" ");
for(int i = 0; i < splitString.length - 1; i++) {
    System.out.println(splitString[i] + " " + splitString[i+1]);
}
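The loop above can be wrapped into a small method that returns the pairs instead of printing them. This is a sketch with a few assumptions of my own: the names `WordPairs` and `pairsOf` are hypothetical, and the input is trimmed and split on `\\s+` rather than on a single space, so that leading, trailing, or repeated whitespace does not produce empty "words".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
class WordPairs {
    static List<String> pairsOf(String sentence) {
        List<String> pairs = new ArrayList<>();
        String trimmed = sentence.trim();
        if (trimmed.isEmpty()) {
            return pairs; // no words, so no pairs
        }
        // Assumption: any run of whitespace separates two words.
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i < words.length - 1; i++) {
            pairs.add(words[i] + " " + words[i + 1]);
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(pairsOf("This is a String!")); // [This is, is a, a String!]
    }
}
```

For a sentence with n words (n at least 1) the returned list has n - 1 entries, so `pairsOf(s).size()` also gives the count the question asks for.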

The number of 2-word strings in a sentence is simply the number of words minus one.

int numOfWords = string.split(" ").length - 1;

Total number of pairs = total number of words - 1

And you already know how to count the total number of words.

I tried it using groups in the pattern.

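String s = "this is a String";
// group 1 = first word, group 2 = the space, group 3 = second word
Pattern pat = Pattern.compile("([^ ]+)( )([^ ]+)");
Matcher mat = pat.matcher(s);
boolean check = mat.find();
while(check){
    System.out.println(mat.group());
    // restart the search at the beginning of the second word (group 3),
    // so that word can also be the first word of the next pair
    check = mat.find(mat.start(3));
}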

From the pattern ([^ ]+)( )([^ ]+):
group(0) is the whole match, ([^ ]+)( )([^ ]+)
group(1) is the first ([^ ]+), the first word
group(2) is ( ), the space between the words
group(3) is the second ([^ ]+), the second word
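To see what each group holds for a concrete input, the first match can be inspected directly. This is only an illustrative sketch; `GroupDemo` and `describeFirstMatch` are hypothetical names, not part of the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class GroupDemo {
    static String describeFirstMatch(String input) {
        Matcher mat = Pattern.compile("([^ ]+)( )([^ ]+)").matcher(input);
        if (!mat.find()) {
            return "no match";
        }
        // start(3) is the index where the second word begins; searching
        // again from there is what lets consecutive pairs overlap.
        return "group(0)=[" + mat.group(0) + "]"
                + " group(1)=[" + mat.group(1) + "]"
                + " group(2)=[" + mat.group(2) + "]"
                + " group(3)=[" + mat.group(3) + "]"
                + " start(3)=" + mat.start(3);
    }

    public static void main(String[] args) {
        // prints: group(0)=[this is] group(1)=[this] group(2)=[ ] group(3)=[is] start(3)=5
        System.out.println(describeFirstMatch("this is a String"));
    }
}
```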
