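So I am trying to get sequences of up to five consecutive words. I have this input:
Pacific Ocean is the largest of the Earth's oceanic divisions
The output should look like this: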
Pacific
Pacific Ocean
Pacific Ocean is
Pacific Ocean is the
Pacific Ocean is the largest
Ocean
Ocean is
Ocean is the
Ocean is the largest
Ocean is the largest of
is
is the
is the largest
is the largest of
is the largest of the
the
the largest
the largest of
the largest of the
the largest of the Earth's
largest
largest of
largest of the
largest of the Earth's
largest of the Earth's oceanic
of
of the
of the Earth's
of the Earth's oceanic
of the Earth's oceanic divisions
the
the Earth's
the Earth's oceanic
the Earth's oceanic divisions
Earth's
Earth's oceanic
Earth's oceanic divisions
oceanic
oceanic divisions
divisions
My attempt:
public void getComb(String line) {
    String words[] = line.split(" ");
    int count = 0;
    for (int i = 0; i < words.length; i++) {
        String word = "";
        int m = i;
        while (count < 5) {
            count++;
            word += " " + words[m];
            System.out.println(word);
            m++;
        }
    }
}
But the output is wrong! Output:
Pacific
Pacific Ocean
Pacific Ocean is
Pacific Ocean is the
Pacific Ocean is the largest
How do I fix this?
Use a nested for loop instead of the while loop, and advance the starting word in the outer loop:
public static void getComb(String line) {
    String words[] = line.split(" ");
    for (int i = 0; i < words.length; i++) {
        String word = "";
        for (int w = i; w < ((i + 5 < words.length) ? (i + 5) : words.length); w++) {
            word += " " + words[w];
            System.out.println(word);
        }
    }
}
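Note the ((i + 5 < words.length) ? (i + 5) : words.length) in the condition of the inner for loop. It is needed so that you never access elements past the end of the array when fewer than five words remain; without it, you would get an ArrayIndexOutOfBoundsException.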
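The same bound can be written with Math.min, which some find easier to read than the ternary. The sketch below is one possible variant, not the answer's exact code: the class name WordSequences and the maxLen parameter are illustrative assumptions, and it collects the sequences into a list (joined with String.join) so that they do not start with the leading space that word += " " + ... produces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WordSequences {
    // Returns every run of 1..maxLen consecutive words, grouped by starting word.
    public static List<String> getComb(String line, int maxLen) {
        String[] words = line.trim().split("\\s+");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            // Math.min keeps the end index inside the array when fewer than maxLen words remain.
            int end = Math.min(i + maxLen, words.length);
            for (int j = i + 1; j <= end; j++) {
                // Join words[i..j-1] with single spaces, so there is no leading space.
                result.add(String.join(" ", Arrays.copyOfRange(words, i, j)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "Pacific Ocean is the largest of the Earth's oceanic divisions";
        for (String s : getComb(line, 5)) {
            System.out.println(s);
        }
    }
}
```

For the ten-word example sentence this yields 40 lines: five for each of the first six starting words, then 4, 3, 2 and 1 for the last four.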
Move the count = 0 statement inside the outer loop so the count is reset for every starting word:
public void getComb(String line) {
    String words[] = line.split(" ");
    for (int i = 0; i < words.length; i++) {
        int count = 0; // RESET COUNT
        String word = "";
        int m = i;
        while (count < 5 && m < words.length) { // NO EXCEPTION with 'm' limit
            count++;
            word += " " + words[m];
            System.out.println(word);
            m++;
        }
    }
}
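Formally, you want to find the n-grams of size 1, 2, 3, 4 and 5 in a string. The ShingleFilter class in the Apache Lucene library can be used for this. From the JavaDoc:
A ShingleFilter constructs shingles (token n-grams) from a token stream. In other words, it creates combinations of tokens as a single token. For example, the sentence "please divide this sentence into shingles" might be tokenized into shingles "please divide", "divide this", "this sentence", "sentence into", and "into shingles".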
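ShingleFilter operates on a Lucene TokenStream, so using it means setting up an analyzer chain. To show what "shingles of size min..max" means without that dependency, here is a plain-Java sketch. It is not Lucene's implementation: the class Shingles and its minSize/maxSize parameters are hypothetical names, chosen only to loosely mirror ShingleFilter's minimum and maximum shingle size settings, and tokens are assumed to be separated by whitespace.

```java
import java.util.ArrayList;
import java.util.List;

public class Shingles {
    // Hypothetical helper: all runs of minSize..maxSize consecutive tokens, grouped by start token.
    public static List<String> shingles(String text, int minSize, int maxSize) {
        String[] tokens = text.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.length; start++) {
            StringBuilder sb = new StringBuilder();
            // Grow the shingle one token at a time, stopping at maxSize or the end of the input.
            for (int size = 1; size <= maxSize && start + size <= tokens.length; size++) {
                if (size > 1) {
                    sb.append(' ');
                }
                sb.append(tokens[start + size - 1]);
                if (size >= minSize) {
                    out.add(sb.toString());
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Bigrams only, matching the JavaDoc example.
        System.out.println(shingles("please divide this sentence into shingles", 2, 2));
        // prints [please divide, divide this, this sentence, sentence into, into shingles]
    }
}
```

Calling shingles(line, 1, 5) on the Pacific Ocean sentence gives the same 40 sequences the question asks for.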
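Try the following, a modified version of Andy's code. The loop bound uses words.length rather than a hard-coded 10, so it also works for sentences that do not have exactly ten words:
public void getComb(String line)
{
    String words[] = line.split(" ");
    for (int i = 0; i < words.length; i++)
    {
        int count = 0; //******* RESET COUNT *****//
        String word = "";
        int m = i;
        while (count < 5 && m < words.length) // stay inside the array
        {
            count++;
            word += " " + words[m];
            System.out.println(word);
            m++;
        }
    }
}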