Printing the first sentence of each paragraph in Java



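I have a text file and want to print the first sentence of each paragraph. Paragraphs are separated by a newline character, i.e. "\n".

Looking at BreakIterator, I thought I could use getLineInstance() for this, but it seems to give me an iterator over individual words: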

import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public String[] extractFirstSentences() {
    BreakIterator boundary = BreakIterator.getLineInstance(Locale.US);
    boundary.setText(getText());
    List<String> sentences = new ArrayList<String>();
    int start = boundary.first();
    int end = boundary.next();
    // Collect the non-empty text between each pair of consecutive boundaries
    while (end != BreakIterator.DONE) {
        String sentence = getText().substring(start, end).trim();
        if (!sentence.isEmpty()) {
            sentences.add(sentence);
        }
        start = end;
        end = boundary.next();
    }
    return sentences.toArray(new String[sentences.size()]);
}
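A small sketch that makes the observation concrete: it collects the raw segments produced by getLineInstance() for a two-sentence string. The class name LineSegmentsDemo and the sample text are illustrative only. Line boundaries mark places where a line may be wrapped, which in ordinary English prose falls after each word and its trailing space, so the segments come out word by word rather than sentence by sentence.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class LineSegmentsDemo {

    // Returns the trimmed, non-empty text between consecutive line boundaries.
    // getLineInstance() marks line-wrap opportunities, not sentence ends.
    static List<String> lineSegments(String text) {
        BreakIterator it = BreakIterator.getLineInstance(Locale.US);
        it.setText(text);
        List<String> segments = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end).trim();
            if (!segment.isEmpty()) {
                segments.add(segment);
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        // Prints one entry per word, which matches the behaviour described in the question
        System.out.println(lineSegments("Hello world. Second sentence."));
    }
}
```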

Am I using getLineInstance() incorrectly, or is there another way to do what I want?
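One possible approach, sketched under the assumption that each paragraph is a single line separated by "\n": BreakIterator also offers getSentenceInstance(), whose boundaries fall at sentence ends rather than at wrap points, so the first boundary after first() closes the first sentence of the paragraph. The class and method names (FirstSentenceExtractor, firstSentences) are hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class FirstSentenceExtractor {

    // Returns the first sentence of every non-blank paragraph.
    // Assumption: paragraphs are separated by "\n", as stated in the question.
    static List<String> firstSentences(String text) {
        BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.US);
        List<String> result = new ArrayList<>();
        for (String paragraph : text.split("\n")) {
            if (paragraph.trim().isEmpty()) {
                continue; // skip blank lines between paragraphs
            }
            sentences.setText(paragraph);
            int start = sentences.first();
            // For non-empty text, next() returns a real boundary (never DONE here)
            int end = sentences.next();
            result.add(paragraph.substring(start, end).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "First one. Second one.\nAnother para! More.\n\nThird? Yes.";
        for (String sentence : firstSentences(text)) {
            System.out.println(sentence);
        }
    }
}
```

Sentence boundaries come from the JDK's built-in rules for the given locale, so they are a heuristic: text containing abbreviations such as "Dr." may be split earlier than a human reader would expect.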

How about this as an alternative:

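import java.util.ArrayList;
import java.util.List;

public String[] extractFirstSentences() {
    String myText = getText();
    String[] paragraphs = myText.split("\n");
    List<String> result = new ArrayList<String>();
    for (String paragraph : paragraphs) {
        String trimmed = paragraph.trim();
        if (trimmed.isEmpty()) {
            continue; // skip blank lines between paragraphs
        }
        // Split after '.', '?' or '!' followed by whitespace; the lookbehind keeps the punctuation
        result.add(trimmed.split("(?<=[.?!])\\s+")[0]);
    }
    return result.toArray(new String[result.size()]);
}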
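Since the question starts from a text file, here is a sketch of reading it and printing the first sentence of each paragraph, assuming the file is UTF-8 and that every non-blank line is one paragraph. The class and method names (FirstSentencePrinter, firstSentencesOfFile) are hypothetical. The split uses the regular expression (?<=[.?!])\s+, whose lookbehind keeps the original '.', '?' or '!' instead of always appending a period.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class FirstSentencePrinter {

    // Reads the file line by line (assumption: one line = one paragraph, UTF-8)
    // and returns the first sentence of each non-blank line.
    static List<String> firstSentencesOfFile(Path file) throws IOException {
        List<String> result = new ArrayList<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String paragraph = line.trim();
            if (paragraph.isEmpty()) {
                continue; // skip blank lines
            }
            // Split after '.', '?' or '!' that is followed by whitespace;
            // a line with no terminator is returned whole
            result.add(paragraph.split("(?<=[.?!])\\s+")[0]);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java FirstSentencePrinter input.txt
        for (String sentence : firstSentencesOfFile(Paths.get(args[0]))) {
            System.out.println(sentence);
        }
    }
}
```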
