How to use the Illinois Chunker with sentences as input



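I am trying to use the Illinois Chunker one sentence at a time. The entry point it provides is, as far as I can tell, the following code snippet: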

public class ChunksAndPOSTags {
    public static void main(String[] args) {
        String filename = null;
        try {
            filename = args[0];
            if (args.length > 1) throw new Exception();
        }
        catch (Exception e) {
            System.err.println("usage: java edu.illinois.cs.cogcomp.lbj.chunk.ChunksAndPOSTags <input file>");
            System.exit(1);
        }
        Chunker chunker = new Chunker();
        Parser parser = new PlainToTokenParser(
            new WordSplitter(new SentenceSplitter(filename)));
        String previous = "";
        for (Word w = (Word) parser.next(); w != null; w = (Word) parser.next()) {
            String prediction = chunker.discreteValue(w);
            if (prediction.startsWith("B-") ||
                prediction.startsWith("I-") &&
                !previous.endsWith(prediction.substring(2)))
                System.out.print("[" + prediction.substring(2) + " ");
            System.out.print("(" + w.partOfSpeech + " " + w.form + ") ");
            if (!prediction.equals("O") &&
                (w.next == null                                 || 
                 chunker.discreteValue(w.next).equals("O")      || 
                 chunker.discreteValue(w.next).startsWith("B-") ||
                 !chunker.discreteValue(w.next).endsWith(prediction.substring(2))))
                System.out.print("] ");
            if (w.next == null)
                System.out.println();
            previous = prediction;
        }
    }
}

How can we modify the code above so that it takes one sentence at a time instead of a text file?

You should create your own Parser implementation that simply returns your string (your "one sentence at a time") wrapped in a Sentence.

Here is some sample code:

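import LBJ2.parse.Parser;
import LBJ2.nlp.Sentence;

// Parser that yields one sentence, then null so that downstream parsers stop.
public class FakeSentenceSplitter implements Parser {
    private final String sentenceText;
    private boolean consumed = false;

    public FakeSentenceSplitter(String sentenceText) {
        this.sentenceText = sentenceText;
    }

    // Returns the sentence on the first call and null afterwards.
    public Object next() {
        if (consumed) return null;
        consumed = true;
        return new Sentence(sentenceText);
    }

    // Allows the same sentence to be read again.
    public void reset() {
        consumed = false;
    }

    // Nothing to release.
    public void close() {
    }
}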

If you do not already have the LBJ2 package, you can download it here.

After that, use the new sentence splitter in this line, passing the sentence text where the file name used to go:

Parser parser = new PlainToTokenParser(
        new WordSplitter(new FakeSentenceSplitter(sentenceText)));
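
One detail worth checking in any parser of this kind: the consuming loop only stops when `next()` returns null, so a one-sentence parser should hand out its sentence once and report end of input afterwards. The sketch below illustrates that contract without needing LBJ2 on the classpath. `SimpleParser`, `OneShotSentenceParser` and `OneShotDemo` are hypothetical names made up for this illustration, and the interface is only a stand-in that assumes the same `next()` / `reset()` / `close()` shape as `LBJ2.parse.Parser`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for LBJ2.parse.Parser so the sketch compiles on its own.
interface SimpleParser {
    Object next();   // next item, or null once the input is exhausted
    void reset();
    void close();
}

// Yields a single sentence exactly once, then signals end of input with null.
class OneShotSentenceParser implements SimpleParser {
    private final String sentenceText;
    private boolean consumed = false;

    OneShotSentenceParser(String sentenceText) {
        this.sentenceText = sentenceText;
    }

    public Object next() {
        if (consumed) return null;
        consumed = true;
        return sentenceText;
    }

    // Makes the sentence available again.
    public void reset() { consumed = false; }

    public void close() { }
}

class OneShotDemo {
    // Builds one fresh parser per sentence and drains it, as a downstream consumer would.
    static List<String> collect(String... sentences) {
        List<String> seen = new ArrayList<>();
        for (String s : sentences) {
            SimpleParser parser = new OneShotSentenceParser(s);
            for (Object item = parser.next(); item != null; item = parser.next()) {
                seen.add((String) item);
            }
            parser.close();
        }
        return seen;
    }

    public static void main(String[] args) {
        for (String s : collect("The dog barked.", "It rained all day.")) {
            System.out.println(s);
        }
    }
}
```

With the real library, the same idea would presumably mean constructing a new sentence parser (and the `WordSplitter` / `PlainToTokenParser` chain around it) for each sentence you want to chunk, while reusing a single `Chunker` instance.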
