How to remove child nodes from an XML document when they are not in a list



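I have an XML document with several levels of nested nodes. I want to check every node in the tree and remove it only if neither the node itself nor any of its child nodes is in a list.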

<node1>
<xxx>
stuff
</xxx>
<subnode2>
<yyy>
stuf2
</yyy>
</subnode2>
</node1>

My problem is that if "yyy" is in the don't-remove list but its parent is not, yyy still gets cleared.

import xml.etree.ElementTree as ET
document = ET.parse("foo.xml")
root = document.getroot()
#list of nodes
toRemove = root.findall('.//')
#list of tags that shouldn't be removed
dontRemove = ['xxx','yyy']
#take element from root and compare it with "dont remove it", if it's present remove from removing list
for element in list(toRemove):
    string = str(element)
    string = string.split(" ")
    string = string[1].replace("'", '')
    print(string)
    removed = 0
    for i in range(len(dontRemove)):
        if dontRemove[i] in string and removed == 0:
            toRemove.remove(element)
            removed = 1
#removing:
for i in range(len(toRemove)):
    toRemove[i].clear()
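The behavior described above is consistent with how `Element.clear()` works: it resets the element, dropping its attributes, its text, and all of its subelements. The following minimal sketch (the inline XML is a shortened copy of the sample, used here only for illustration) shows that clearing the parent also discards a protected child:

```python
import xml.etree.ElementTree as ET

# Shortened version of the sample document, inlined for illustration.
root = ET.fromstring("<node1><subnode2><yyy>stuf2</yyy></subnode2></node1>")

subnode2 = root.find("subnode2")
subnode2.clear()  # drops attributes, text, and every child, including <yyy>

remaining = root.find(".//yyy")
print(remaining)  # None
```

So taking "yyy" out of the removal list is not enough on its own; the parent has to be kept as well.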

You can check recursively whether an element should be removed: if it contains at least one "non-removable" child element, it should not be.

dontRemove = ['xxx','yyy']
elements_to_remove = []
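def should_not_be_removed(parent):
    # a protected tag is always kept
    if parent.tag in dontRemove:
        return True
    nonremovable_child_found = False
    # visit every child (no early exit) so all removable ones get collected
    for child in parent:
        if should_not_be_removed(child):
            nonremovable_child_found = True
    # nothing protected below this element, so it can be removed
    if not nonremovable_child_found:
        elements_to_remove.append(parent)
    return nonremovable_child_found
should_not_be_removed(root)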

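After this recursive call starting at the root, elements_to_remove contains the list of elements that are not themselves listed in dontRemove and have no descendant with a tag listed there.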
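The list only identifies the elements; they still have to be detached from the tree. One possible way to do that is sketched below. ElementTree elements do not keep a reference to their parent, so the sketch builds a child-to-parent lookup first. The helper name `remove_elements` and the tiny inline document are assumptions made for illustration and are not part of the original answer.

```python
import xml.etree.ElementTree as ET


def remove_elements(root, elements):
    """Detach each element in `elements` from its parent under `root`."""
    # Elements do not store a parent reference, so build a lookup table.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for element in elements:
        parent = parent_of.get(element)
        if parent is not None:  # the root has no parent and is left alone
            parent.remove(element)


# Hypothetical miniature document, just to exercise the helper.
root = ET.fromstring("<node1><xxx>keep</xxx><subnode3>remove</subnode3></node1>")
elements_to_remove = [root.find("subnode3")]  # stand-in for the collected list
remove_elements(root, elements_to_remove)

print(ET.tostring(root, encoding="unicode"))
# <node1><xxx>keep</xxx></node1>
```

Note that `remove()` also drops the tail text that follows the removed element, which may matter if whitespace between siblings is significant.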

I also extended your XML to cover more test cases; please check whether this is what you meant:

<node1>
<xxx>
don't remove
</xxx>
<subnode2>
<yyy>
don't remove
</yyy>
</subnode2>
<subnode3>
remove
</subnode3>
<subnode4>
<xxx>
don't remove
</xxx>
<abc>
remove
</abc>
</subnode4>
</node1>
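As a quick check, the recursive approach can be run against this extended document. The sketch below is a self-contained variant that passes the result list as a parameter instead of using a global; the names `collect_removable`, `dont_remove`, and `XML` are chosen here for illustration.

```python
import xml.etree.ElementTree as ET

XML = """<node1>
<xxx>don't remove</xxx>
<subnode2><yyy>don't remove</yyy></subnode2>
<subnode3>remove</subnode3>
<subnode4><xxx>don't remove</xxx><abc>remove</abc></subnode4>
</node1>"""

dont_remove = {"xxx", "yyy"}


def collect_removable(element, found):
    """Return True if `element` must be kept; append removable elements to `found`."""
    if element.tag in dont_remove:
        return True
    keep = False
    # Visit every child so that all removable descendants are collected.
    for child in element:
        if collect_removable(child, found):
            keep = True
    if not keep:
        found.append(element)
    return keep


root = ET.fromstring(XML)
found = []
collect_removable(root, found)
tags = [element.tag for element in found]
print(tags)  # ['subnode3', 'abc']
```

Only subnode3 and abc are collected: subnode2 and subnode4 are kept because each has a protected child.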
