在docx中替换文本，并使用python docx保存更改后的文件

我正试图使用python docx模块来替换文件中的一个单词，并保存新文件，但需要注意的是，新文件的格式必须与旧文件完全相同，但要替换单词。我该怎么做？

docx模块有一个savedocx，它接受7个输入：

文件
岩心支柱
appprops
内容类型
网络设置
单词关系
输出

除了替换的单词外，我如何保持原始文件中的所有内容不变？

这对我有效：

rep = {'a': 'b'}  # lookup dictionary, replace `a` with `b`
def docx_replace(old_file,new_file,rep):
    zin = zipfile.ZipFile (old_file, 'r')
    zout = zipfile.ZipFile (new_file, 'w')
    for item in zin.infolist():
        buffer = zin.read(item.filename)
        if (item.filename == 'word/document.xml'):
            res = buffer.decode("utf-8")
            for r in rep:
                res = res.replace(r,rep[r])
            buffer = res.encode("utf-8")
        zout.writestr(item, buffer)
    zout.close()
    zin.close()

看起来，Docx for Python并不意味着存储一个完整的带有图像、标头…的Docx，但仅包含文档的内部内容。所以没有简单的方法可以做到这一点。

然而，以下是你可以做到的方法：

首先，看看docx标签wiki:

它解释了如何解压缩docx文件：以下是典型文件的样子：

+--docProps
|  +  app.xml
|    core.xml
+  res.log
+--word //this folder contains most of the files that control the content of the document
|  +  document.xml //Is the actual content of the document
|  +  endnotes.xml
|  +  fontTable.xml
|  +  footer1.xml //Containst the elements in the footer of the document
|  +  footnotes.xml
|  +--media //This folder contains all images embedded in the word
|  |    image1.jpeg
|  +  settings.xml
|  +  styles.xml
|  +  stylesWithEffects.xml
|  +--theme
|  |    theme1.xml
|  +  webSettings.xml
|  --_rels
|       document.xml.rels //this document tells word where the images are situated
+  [Content_Types].xml
--_rels
     .rels

Docx只获取文档的一部分，方法为opendocx

def opendocx(file):
    '''Open a docx file, return a document XML tree'''
    mydoc = zipfile.ZipFile(file)
    xmlcontent = mydoc.read('word/document.xml')
    document = etree.fromstring(xmlcontent)
    return document

它只获取document.xml文件。

我建议你做的是：

使用**opendocx获取文档的内容*
用advReplace方法替换document.xml
以zip形式打开docx，并用新的xml内容替换document.xml内容
关闭并输出压缩文件（将其重命名为output.docx）

如果你安装了node.js，请注意，我已经开发了DocxGenJS，它是docx文档的模板引擎，该库正在积极开发中，很快将作为节点模块发布。

您在这里使用docx模块吗？

如果是，那么docx模块已经公开了replace、advReplace等方法，这些方法可以帮助您完成任务。有关公开方法的更多详细信息，请参阅源代码。

from docx import Document
file_path = 'C:/tmp.docx'
document = Document(file_path)
def docx_replace(doc_obj, data: dict):
    """example: data=dict(order_id=123), result: {order_id} -> 123"""
    for paragraph in doc_obj.paragraphs:
        for key, val in data.items():
            key_name = '{{{}}}'.format(key)
            if key_name in paragraph.text:
                paragraph.text = paragraph.text.replace(key_name, str(val))
    for table in doc_obj.tables:
        for row in table.rows:
            for cell in row.cells:
                docx_replace(cell, data)
docx_replace(document, dict(order_id=123, year=2018, payer_fio='payer_fio', payer_fio1='payer_fio1'))
document.save(file_path)

上述方法的问题是它们丢失了现有的格式。请看我的答案，它执行替换并保留格式。

还有python-docx-template，它允许在docx模板中使用jinja2样式的模板。这是文档的链接

我在这里分叉了一个python docx的repo，它保留了docx文件中所有预先存在的数据，包括格式化。希望这就是你想要的。

除了@ramil之外，在将一些字符作为字符串值放入XML之前，还必须对它们进行转义，所以这对我很有效：

def escape(escapee):
  escapee = escapee.replace("&", "&amp;")
  escapee = escapee.replace("<", "&lt;")
  escapee = escapee.replace(">", "&gt;")
  escapee = escapee.replace(""", "&quot;")
  escapee = escapee.replace("'", "&apos;")
return escapee

我们可以使用python docx在docx上保存图像。docx将图像检测为段落。但对于这一段，案文是空的。所以你可以这样使用。paragraphs = document.paragraphs for paragraph in paragraphs: if paragraph.text == '': continue

相关内容

最新更新

热门标签：