I am trying to use the Illinois Chunker one sentence at a time. The entry point it provides is, more or less, the following snippet:
public class ChunksAndPOSTags {
    public static void main(String[] args) {
        String filename = null;
        try {
            filename = args[0];
            if (args.length > 1) throw new Exception();
        }
        catch (Exception e) {
            System.err.println("usage: java edu.illinois.cs.cogcomp.lbj.chunk.ChunksAndPOSTags <input file>");
            System.exit(1);
        }
        Chunker chunker = new Chunker();
        Parser parser = new PlainToTokenParser(
            new WordSplitter(new SentenceSplitter(filename)));
        String previous = "";
        for (Word w = (Word) parser.next(); w != null; w = (Word) parser.next()) {
            String prediction = chunker.discreteValue(w);
            if (prediction.startsWith("B-") ||
                prediction.startsWith("I-") &&
                !previous.endsWith(prediction.substring(2)))
                System.out.print("[" + prediction.substring(2) + " ");
            System.out.print("(" + w.partOfSpeech + " " + w.form + ") ");
            if (!prediction.equals("O") &&
                (w.next == null ||
                 chunker.discreteValue(w.next).equals("O") ||
                 chunker.discreteValue(w.next).startsWith("B-") ||
                 !chunker.discreteValue(w.next).endsWith(prediction.substring(2))))
                System.out.print("] ");
            if (w.next == null)
                System.out.println();
            previous = prediction;
        }
    }
}
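How can we modify the above so that it processes one sentence at a time instead of taking a text file?
You should create your own Parser implementation that simply returns your string (your "one sentence at a time").
Here is sample code: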
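import LBJ2.parse.Parser;
import LBJ2.nlp.Sentence;

// Stands in for SentenceSplitter: serves one in-memory sentence instead of reading a file.
public class FakeSentenceSplitter implements Parser {
    private final String sentenceText;
    // Tracks whether the sentence has already been handed out.
    private boolean consumed = false;

    public FakeSentenceSplitter(String sentenceText) {
        this.sentenceText = sentenceText;
    }

    public Object next() {
        // Return the sentence once, then null, so the token loop can terminate.
        if (consumed) return null;
        consumed = true;
        return new Sentence(sentenceText);
    }

    public void reset() {
        // Allow the same sentence to be served again.
        consumed = false;
    }

    public void close() {
    }
}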
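If you are not yet using the LBJ2 package, you can download it here.
After that, use the new sentence splitter in this line (note that the variable passed in now holds the sentence text itself rather than a file name):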
Parser parser = new PlainToTokenParser(
    new WordSplitter(new FakeSentenceSplitter(filename)));
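One detail worth checking in a single-sentence parser like this: the loop in the question stops only when next() returns null, so the source should probably hand out its sentence once and then signal end of input. Below is a minimal, library-free sketch of that contract. SentenceSource, OneShotSource and drain are hypothetical names made up for this illustration; SentenceSource only mirrors the three methods (next, reset, close) used above and is not the LBJ2 interface itself.

```java
import java.util.ArrayList;
import java.util.List;

class OneShotParserDemo {
    // Hypothetical stand-in for a parser interface with next/reset/close (assumption, not LBJ2).
    interface SentenceSource {
        Object next();
        void reset();
        void close();
    }

    // Serves a single in-memory sentence once, then reports end of input with null.
    static final class OneShotSource implements SentenceSource {
        private final String sentenceText;
        private boolean consumed = false;

        OneShotSource(String sentenceText) {
            this.sentenceText = sentenceText;
        }

        public Object next() {
            if (consumed) return null; // nothing left to serve
            consumed = true;
            return sentenceText;
        }

        public void reset() {
            consumed = false; // re-arm so the sentence can be served again
        }

        public void close() {
            // no resources to release
        }
    }

    // Same loop shape as in the question: read items until next() returns null.
    static List<Object> drain(SentenceSource source) {
        List<Object> items = new ArrayList<>();
        for (Object item = source.next(); item != null; item = source.next()) {
            items.add(item);
        }
        return items;
    }

    public static void main(String[] args) {
        SentenceSource source = new OneShotSource("The quick brown fox jumps.");
        System.out.println(drain(source).size()); // prints 1
        source.reset();
        System.out.println(drain(source)); // prints [The quick brown fox jumps.]
    }
}
```

If next() returned the sentence unconditionally, drain would never finish; the consumed flag is what lets a consumer written like the loop in the question reach its end.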