Returning values from a recursive XML traversal



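I'm working with a very deeply nested XML file in which the paths are essential to understanding the data. This answer let me print each path and its value: Python xml absolute path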

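What I can't figure out is how to output the results in a more usable form. I'm trying to build a data frame that lists each path and its value.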

For example, using the sample from the linked answer:

<A>
  <B>foo</B>
  <C>
    <D>On</D>
  </C>
  <E>Auto</E>
  <F>
    <G>
      <H>shoo</H>
      <I>Off</I>
    </G>
  </F>
</A>

from lxml import etree

root = etree.XML(your_xml_string)

def print_path_of_elems(elem, elem_path=""):
    for child in elem:
        if len(child) == 0 and child.text:
            # leaf node with text => print
            print("%s/%s, %s" % (elem_path, child.tag, child.text))
        else:
            # node with child elements => recurse
            print_path_of_elems(child, "%s/%s" % (elem_path, child.tag))

print_path_of_elems(root, "/" + root.tag)

The output looks like this:

/A/B, foo
/A/C/D, On
/A/E, Auto
/A/F/G/H, shoo
/A/F/G/I, Off

I believe yield is the right technique, but I'm not getting anywhere. My current attempt does not return the nested values:

from lxml import etree

root = etree.XML(your_xml_string)

def yield_path_of_elems(elem, elem_path=""):
    for child in elem:
        if len(child) == 0 and child.text:
            ylddict = {'Path': elem_path, 'Value': child.text}
            yield ylddict
        else:
            # node with child elements => recurse
            yield_path_of_elems(child, "%s/%s" % (elem_path, child.tag))

for i in yield_path_of_elems(root):
    # print for simplicity in this example; otherwise turn into a DataFrame and concat
    print(i)

From experimenting, it seems that the recursion stops working as soon as I use yield or return.
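The symptom can be reproduced without any XML. The sketch below uses a hypothetical `countdown` generator that makes the same mistake as the attempt above: it calls itself recursively but never iterates over the result, so only the first level's value comes out.

```python
import types

def countdown(n):
    """Hypothetical example: meant to yield n, n-1, ..., 1."""
    if n > 0:
        yield n
        # Calling a generator function only creates a generator object.
        # Nothing iterates over it here, so its values are discarded.
        countdown(n - 1)

gen = countdown(3)
print(isinstance(gen, types.GeneratorType))  # True
print(list(gen))  # [3]  (the recursive levels never run)
```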

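You need to pass the values produced by the recursive call back up to the original caller; as written, the recursive call just creates a generator that nobody iterates over. So change: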

yield_path_of_elems(child, "%s/%s" % (elem_path, child.tag))

to:

yield from yield_path_of_elems(child, "%s/%s" % (elem_path, child.tag))
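Putting it together, here is a minimal runnable sketch of the corrected generator. It makes a few assumptions that differ from the question: it uses the standard-library `xml.etree.ElementTree` instead of lxml (iteration over children behaves the same way), it tests for leaves with `len(child) == 0`, and it stores the full path including `child.tag` in `Path` (the original attempt stored only the parent's path). The embedded `XML` string is the sample from the question.

```python
import xml.etree.ElementTree as ET

XML = """<A>
  <B>foo</B>
  <C><D>On</D></C>
  <E>Auto</E>
  <F><G><H>shoo</H><I>Off</I></G></F>
</A>"""

def yield_path_of_elems(elem, elem_path=""):
    """Yield a {'Path': ..., 'Value': ...} dict for every leaf that has text."""
    for child in elem:
        child_path = "%s/%s" % (elem_path, child.tag)
        if len(child) == 0 and child.text:
            # Leaf node with text: report its full path and value.
            yield {"Path": child_path, "Value": child.text}
        else:
            # Node with child elements: hand the recursive results up to our caller.
            yield from yield_path_of_elems(child, child_path)

root = ET.fromstring(XML)
rows = list(yield_path_of_elems(root, "/" + root.tag))
for row in rows:
    print(row)
```

Because `rows` is a list of dicts with the same keys, it should be directly usable as `pandas.DataFrame(rows)` if pandas is available, which avoids building and concatenating one DataFrame per value.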

This is analogous to writing return recursive_call(...) in an ordinary recursive function. Note that yield from requires Python 3.3 or later.
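On interpreters without `yield from` (for example Python 2, which the original `print` statement suggests), the same delegation can be written as an explicit loop that re-yields each item. For plain iteration like this the two forms behave the same; `yield from` additionally forwards `send()`, `throw()` and the sub-generator's return value, none of which this traversal needs. The function name `iter_leaves` and the tuple output are illustrative choices, not part of the original code.

```python
import xml.etree.ElementTree as ET

def iter_leaves(elem, elem_path=""):
    """Same traversal as the yield-from version, written with an explicit loop."""
    for child in elem:
        child_path = elem_path + "/" + child.tag
        if len(child) == 0 and child.text:
            yield child_path, child.text
        else:
            # Equivalent here to: yield from iter_leaves(child, child_path)
            for item in iter_leaves(child, child_path):
                yield item

root = ET.fromstring("<A><B>foo</B><C><D>On</D></C></A>")
print(list(iter_leaves(root, "/A")))
# [('/A/B', 'foo'), ('/A/C/D', 'On')]
```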
