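I want to read data, such as strings, from a .docx file in C# code. I have looked through several questions but could not work out which approach to use.
I am trying to use ApplicationClass Application = new ApplicationClass();
but I get
Error:
The type 'Microsoft.Office.Interop.Word.ApplicationClass' has no constructors defined
I want to get the full text from my docx file, not individual words!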
foreach (FileInfo f in docFiles)
{
    Application wo = new Application();
    object nullobj = Missing.Value;
    object file = f.FullName;
    Document doc = wo.Documents.Open(ref file, .... . . ref nullobj);
    doc.Activate();
    doc. == ??
}
How can I get the full text from a docx file?
This is what I use to extract the full text from a docx file:
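// ZipFile.Read comes from the DotNetZip library (Ionic.Zip), not System.IO.Compression.
using System.IO;
using System.Xml;
using Ionic.Zip;

using (ZipFile zip = ZipFile.Read(filename))
using (MemoryStream stream = new MemoryStream())
{
    // Extract the main document part into memory
    zip["word/document.xml"].Extract(stream);
    stream.Seek(0, SeekOrigin.Begin);
    XmlDocument xmldoc = new XmlDocument();
    xmldoc.Load(stream);
    // InnerText concatenates every text node in the document
    string PlainTextContent = xmldoc.DocumentElement.InnerText;
}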
Try the Word.Application interface instead of ApplicationClass.
See: Understanding Office Primary Interop Assembly Classes and Interfaces
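To fill in the `doc. == ??` line from the question, here is a minimal sketch of one way to get the whole text through the interop API. It assumes Word is installed and the project references Microsoft.Office.Interop.Word. The names WordInteropTextReader and ReadAllText are made up for illustration.

```csharp
using Word = Microsoft.Office.Interop.Word;

public static class WordInteropTextReader   // hypothetical helper name
{
    public static string ReadAllText(string docxPath)
    {
        // Instantiate through the Application interface, not ApplicationClass.
        Word.Application app = new Word.Application();
        Word.Document doc = null;
        try
        {
            // With C# 4 or later, optional COM parameters can simply be omitted,
            // so no Missing.Value placeholders are needed.
            doc = app.Documents.Open(FileName: docxPath, ReadOnly: true);

            // Content is a Range over the main document story; Text is its plain text.
            return doc.Content.Text;
        }
        finally
        {
            // Cast to _Document/_Application so the Close/Quit methods are chosen
            // rather than the identically named events.
            if (doc != null) ((Word._Document)doc).Close(SaveChanges: false);
            ((Word._Application)app).Quit();
        }
    }
}
```

Word separates paragraphs in Range.Text with a carriage return ('\r'), so you may want to normalize line endings afterwards. Starting a Word process per file is slow; if you loop over many files as in the question, it is probably better to create the Application once outside the loop.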
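The .docx format, like the other Microsoft Office file formats ending in "x", is just a ZIP package that you can open, modify, and compress again.
So use an Office Open XML library like this one.
Enjoy.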
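The linked library is not named in the text here. As one example of such a library, this is a rough sketch using the Open XML SDK, assuming the DocumentFormat.OpenXml NuGet package is installed. OpenXmlTextReader and ReadAllText are hypothetical names, not part of the SDK.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class OpenXmlTextReader   // hypothetical helper name
{
    public static string ReadAllText(string docxPath)
    {
        // Second argument false = open the package read-only.
        using (WordprocessingDocument doc = WordprocessingDocument.Open(docxPath, false))
        {
            Body body = doc.MainDocumentPart.Document.Body;

            // Join paragraph by paragraph so that line breaks survive;
            // body.InnerText alone would run all paragraphs together.
            return string.Join(Environment.NewLine,
                body.Descendants<Paragraph>().Select(p => p.InnerText));
        }
    }
}
```

This only covers the main document body; headers, footers, and footnotes live in separate parts of the package and are not read here.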
Make sure you are targeting .NET Framework 4.5 or later; the ZipFile class used below was introduced in that version.
using NUnit.Framework;

[TestFixture]
public class GetDocxInnerTextTestFixture
{
    private string _inputFilepath = @"../../TestFixtures/TestFiles/input.docx";

    [Test]
    public void GetDocxInnerText()
    {
        string documentText = DocxInnerTextReader.GetDocxInnerText(_inputFilepath);
        Assert.IsNotNull(documentText);
        Assert.IsTrue(documentText.Length > 0);
    }
}
using System.IO;
using System.IO.Compression;
using System.Xml;

public static class DocxInnerTextReader
{
    public static string GetDocxInnerText(string docxFilepath)
    {
        // Extract next to the source file, replacing any previous extraction
        string folder = Path.GetDirectoryName(docxFilepath);
        string extractionFolder = Path.Combine(folder, "extraction");
        if (Directory.Exists(extractionFolder))
            Directory.Delete(extractionFolder, true);
        ZipFile.ExtractToDirectory(docxFilepath, extractionFolder);

        // The main document text lives in word/document.xml inside the package
        string xmlFilepath = Path.Combine(extractionFolder, "word", "document.xml");
        var xmldoc = new XmlDocument();
        xmldoc.Load(xmlFilepath);
        return xmldoc.DocumentElement.InnerText;
    }
}
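First, you need to add a few assembly references to the project:
    System.Xml
    System.IO.Compression.FileSystem
Second, make sure your class imports the namespaces with the following using directives:
    using System.IO;
    using System.IO.Compression;
    using System.Xml;
Then you can use the following code: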
public string DocxToString(string docxPath)
{
    // Temporary extraction directory next to the source file
    string extractDir = Path.Combine(Path.GetDirectoryName(docxPath), Path.GetFileName(docxPath) + ".tmp");
    // Delete any old extraction directory
    if (Directory.Exists(extractDir)) Directory.Delete(extractDir, true);
    // Extract all media and XML parts of the package into the extraction directory
    ZipFile.ExtractToDirectory(docxPath, extractDir);
    XmlDocument xmldoc = new XmlDocument();
    // Load the extracted XML file that contains the document text
    xmldoc.Load(Path.Combine(extractDir, "word", "document.xml"));
    // Delete the extraction directory
    Directory.Delete(extractDir, true);
    // Return all text of the document from the XML
    return xmldoc.DocumentElement.InnerText;
}
Enjoy...
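A variation on the same idea that skips the temporary directory: a minimal sketch that reads word/document.xml straight out of the ZIP package and keeps paragraph breaks. It assumes .NET Framework 4.5 or later with references to System.IO.Compression and System.IO.Compression.FileSystem. DocxZipTextReader and ReadAllText are hypothetical names chosen for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Xml;

public static class DocxZipTextReader   // hypothetical helper name
{
    // Namespace URI of the WordprocessingML main document elements (the w: prefix).
    private const string WordNs =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static string ReadAllText(string docxPath)
    {
        using (ZipArchive archive = ZipFile.OpenRead(docxPath))
        {
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found in " + docxPath);

            // Load the XML directly from the compressed entry; nothing is written to disk.
            var xmldoc = new XmlDocument();
            using (Stream stream = entry.Open())
                xmldoc.Load(stream);

            var ns = new XmlNamespaceManager(xmldoc.NameTable);
            ns.AddNamespace("w", WordNs);

            // Collect the text of each w:p (paragraph) element separately,
            // so paragraphs are not glued together as with a single InnerText call.
            var paragraphs = new List<string>();
            foreach (XmlNode p in xmldoc.SelectNodes("//w:p", ns))
                paragraphs.Add(p.InnerText);

            return string.Join(Environment.NewLine, paragraphs);
        }
    }
}
```

Usage would be something like string text = DocxZipTextReader.ReadAllText(@"C:\docs\input.docx"); where the path is just a placeholder. Because no files are extracted, there is nothing to clean up afterwards, and it also works when the folder containing the document is read-only.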