How to get all the text between two headings in a Word document (.doc or .docx)



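How can I get all the text between two headings, or all the text under a specific heading? For example:

"Heading ABC"

"Heading XYZ"
This is the content under the XYZ heading
test..

"Subheading or Heading 2 of XYZ"
XYZ heading continues

"Heading 123" Content under heading 123

I want to get all the content under the XYZ heading, including its subheading, up to the point where the next heading (123) appears. The file can be .doc or .docx.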

You can use the NPOI library to read Word documents. Here is some sample code to get you started.

// HWPFDocument reads the binary .doc format.
using System.IO;
using System.Text;
using NPOI.HWPF;
using NPOI.HWPF.Extractor;

public string ReadAllTextFromWordDocFile(string fileName)
{
    // Open the file as a raw stream; the using block disposes it on exit.
    using (FileStream stream = File.OpenRead(fileName))
    {
        var document = new HWPFDocument(stream);
        var wordExtractor = new WordExtractor(document);
        var docText = new StringBuilder();
        // ParagraphText yields one string per paragraph.
        foreach (string text in wordExtractor.ParagraphText)
        {
            docText.AppendLine(text.Trim());
        }
        return docText.ToString();
    }
}

Play around with it a bit.

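You may also want to take a look at DocX. There is a basic example here. The MagicText property of each paragraph may help you identify the headings.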

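    // Uses Word automation (Microsoft.Office.Interop.Word), so Word must be installed.
    // Collects the paragraphs from the heading whose text matches headingText
    // up to the next paragraph that uses the style headingStyle.
    private void DocReader(string fileLocation, string headingText, string headingStyle)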
    {
        Microsoft.Office.Interop.Word.Application word = new Microsoft.Office.Interop.Word.Application();
        object miss = System.Reflection.Missing.Value;
        object path = fileLocation;
        object readOnly = true;
        Microsoft.Office.Interop.Word.Document docs = word.Documents.Open(ref path, ref miss, ref readOnly, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss);
        string totaltext = "";
        int ind = 0;
        bool flag = false;
        int paraCount = docs.Paragraphs.Count;
        // Paragraphs is 1-based, so the loop has to include paraCount itself.
        for (int i = 1; i <= paraCount; i++)
        {
            Microsoft.Office.Interop.Word.Style style = docs.Paragraphs[i].get_Style() as Microsoft.Office.Interop.Word.Style;
            if (style != null && style.NameLocal.Equals(headingStyle))
            {
                flag = false;
                // Range.Text ends with a paragraph mark ('\r'); strip it before comparing.
                if (docs.Paragraphs[i].Range.Text.ToString().TrimEnd('\r').ToUpper() == headingText.ToUpper())
                {
                    ind++;
                    flag = true;
                }
            }
            if (flag && ind >= 1)
                totaltext += " \r\n " + docs.Paragraphs[i].Range.Text.ToString();
        }
        if (totaltext == "") { totaltext = "No such data found!"; }
        richTextBox1.Text = totaltext;
        docs.Close();
        word.Quit();
    }
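
The approaches above stop at the next paragraph that has the same heading style. If the goal is "everything under XYZ, subheadings included, until the next heading of the same or a higher level", the selection logic could look roughly like the sketch below. It is library-independent: it assumes you have already read the paragraphs into (style name, text) pairs with whichever library you use, and that the styles are named "Heading 1", "Heading 2", and so on (the names in an English Word installation). `HeadingSectionExtractor`, `HeadingLevel` and `ExtractSection` are hypothetical names made up for this illustration, not part of NPOI, DocX or Interop.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class HeadingSectionExtractor
{
    // Hypothetical helper: "Heading 1" -> 1, "Heading 2" -> 2, anything else -> 0.
    // Assumes English built-in style names.
    public static int HeadingLevel(string styleName)
    {
        const string prefix = "Heading ";
        if (styleName != null
            && styleName.StartsWith(prefix, StringComparison.OrdinalIgnoreCase)
            && int.TryParse(styleName.Substring(prefix.Length), out int level))
        {
            return level;
        }
        return 0;
    }

    // Returns the matching heading plus everything below it, stopping at the
    // next heading whose level is the same or higher (a smaller number).
    public static string ExtractSection(
        IEnumerable<(string Style, string Text)> paragraphs, string headingText)
    {
        int sectionLevel = 0; // 0 means "not inside the requested section yet"
        var result = new StringBuilder();

        foreach (var (style, text) in paragraphs)
        {
            int level = HeadingLevel(style);
            string trimmed = (text ?? string.Empty).Trim();

            if (sectionLevel == 0)
            {
                // Look for the heading that opens the section.
                if (level > 0 && string.Equals(trimmed, headingText,
                        StringComparison.OrdinalIgnoreCase))
                {
                    sectionLevel = level;
                    result.AppendLine(trimmed);
                }
            }
            else
            {
                // A heading at the same or a higher level closes the section.
                if (level > 0 && level <= sectionLevel)
                    break;
                result.AppendLine(trimmed);
            }
        }
        return result.ToString();
    }
}
```

With the document from the question, calling `ExtractSection(paragraphs, "Heading XYZ")` would keep the "Heading 2" subheading and its text, and stop at "Heading 123" provided that heading uses "Heading 1" like XYZ does.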
