lxml: insert a new node into the tree that contains the parent node's content



I have this tree:

<TEI>
<teiHeader/>
<text>
<body>
<div type="chapter">
<p rend="b"><pb n="1"/>lorem ipsum...</p>
<p rend="b">lorem <pb n="2"/> ipsum2...</p>
<p>lorem ipsum3...</p>
</div>
<div type="chapter">
<p>lorem ipsum4...</p>
<p rend="b">lorem ipsum5...</p>
<p rend="b"><pb n="3"/> lorem ipsum6...</p>
</div>
</body>
</text>
</TEI>

and I want to change every

<p rend="b">lorem ipsum...</p>

to

<p><hi rend="b">lorem ipsum...</hi></p>

The problem is that all the <pb n="X"/> tags get removed.

I tried this (root = the XML tree above):

from lxml import etree

parser = etree.XMLParser(ns_clean=True, remove_blank_text=True)
root = etree.fromstring(root, parser)
for item in root.findall(".//p[@rend='b']"):
    # font_variant[variant] is set elsewhere in my code (not shown here)
    hi = etree.SubElement(item, "hi", rend=font_variant[variant])
    hi.text = ''.join(item.itertext())
print(etree.tostring(root, pretty_print=True, xml_declaration=True))

and I get, for example for the first <p/>:

<p><pb n="1"/>lorem ipsum...<hi rend="b"> lorem ipsum...</hi></p>

<pb n="1"/> is missing.

Can you help me?

If I understand you correctly, you are probably looking for something like this:

for p in root.xpath('//p[@rend="b"]'):
    # clone the old <p>, children included
    old = etree.fromstring(etree.tostring(p))
    # rename the clone to <hi>
    old.tag = "hi"
    # create a new, empty <p>
    new = etree.fromstring('<p/>')
    # append the clone to the new element
    new.append(old)
    new.tail = "\n"
    # replace the old <p> with the new element
    p.getparent().replace(p, new)
