I need some help understanding how to use Python's ElementTree to iterate over this XML string:
b'<?xml version="1.0" ?><BrowseNodeLookupResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2011-08-01"><OperationRequest><HTTPHeaders><Header Name="UserAgent" Value="Python-urllib/3.5"/></HTTPHeaders><RequestId>54e05f2a-e792-11e5-8694-85b3fa7a9fcf</RequestId><Arguments><Argument Name="AWSAccessKeyId" Value="xxxxx"/><Argument Name="AssociateTag" Value="xxx-21"/><Argument Name="BrowseNodeId" Value="2844434031"/><Argument Name="Operation" Value="BrowseNodeLookup"/><Argument Name="Service" Value="AWSECommerceService"/><Argument Name="Signature" Value="cf1A3M8S30Y32EdxVVoBljYUNrt4ZiqgvM+/B1uPrDg="/><Argument Name="Timestamp" Value="2016-03-11T14:05:38Z"/><Argument Name="Version" Value="2011-08-01"/></Arguments><RequestProcessingTime>0.005945883</RequestProcessingTime></OperationRequest><BrowseNodes><Request><IsValid>True</IsValid><BrowseNodeLookupRequest><BrowseNodeId>2844434031</BrowseNodeId></BrowseNodeLookupRequest></Request><BrowseNode><BrowseNodeId>2844434031</BrowseNodeId><Name>Categorie</Name><IsCategoryRoot>1</IsCategoryRoot><Children><BrowseNode><BrowseNodeId>2892859031</BrowseNodeId><Name>Donna</Name></BrowseNode><BrowseNode><BrowseNodeId>2892862031</BrowseNodeId><Name>Uomo</Name></BrowseNode><BrowseNode><BrowseNodeId>2892857031</BrowseNodeId><Name>Bambine e ragazze</Name></BrowseNode><BrowseNode><BrowseNodeId>2892858031</BrowseNodeId><Name>Bambini e ragazzi</Name></BrowseNode><BrowseNode><BrowseNodeId>1739205031</BrowseNodeId><Name>Prima infanzia</Name></BrowseNode><BrowseNode><BrowseNodeId>2892860031</BrowseNodeId><Name>Abbigliamento specifico e altre marche</Name></BrowseNode></Children><Ancestors><BrowseNode><BrowseNodeId>2844433031</BrowseNodeId><Name>Abbigliamento</Name></BrowseNode></Ancestors></BrowseNode></BrowseNodes></BrowseNodeLookupResponse>'
In the output I need the ID and the name of each child browse node:
children : 2892859031
name : Donna
children : 2892862031
name : Uomo
children : 2892857031
name : Bambine e ragazze
...
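Can someone help me write a small Python script to parse this XML?
If you still want to do this with Python's ElementTree, this prints essentially the information you are looking for. It uses recursion.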
#!/usr/bin/env python3
import xml.etree.ElementTree as ET

root = ET.fromstring(b'<?xml version="1.0" ?><BrowseNodeLookupResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2011-08-01"><OperationRequest><HTTPHeaders><Header Name="UserAgent" Value="Python-urllib/3.5"/></HTTPHeaders><RequestId>54e05f2a-e792-11e5-8694-85b3fa7a9fcf</RequestId><Arguments><Argument Name="AWSAccessKeyId" Value="xxxxx"/><Argument Name="AssociateTag" Value="xxx-21"/><Argument Name="BrowseNodeId" Value="2844434031"/><Argument Name="Operation" Value="BrowseNodeLookup"/><Argument Name="Service" Value="AWSECommerceService"/><Argument Name="Signature" Value="cf1A3M8S30Y32EdxVVoBljYUNrt4ZiqgvM+/B1uPrDg="/><Argument Name="Timestamp" Value="2016-03-11T14:05:38Z"/><Argument Name="Version" Value="2011-08-01"/></Arguments><RequestProcessingTime>0.005945883</RequestProcessingTime></OperationRequest><BrowseNodes><Request><IsValid>True</IsValid><BrowseNodeLookupRequest><BrowseNodeId>2844434031</BrowseNodeId></BrowseNodeLookupRequest></Request><BrowseNode><BrowseNodeId>2844434031</BrowseNodeId><Name>Categorie</Name><IsCategoryRoot>1</IsCategoryRoot><Children><BrowseNode><BrowseNodeId>2892859031</BrowseNodeId><Name>Donna</Name></BrowseNode><BrowseNode><BrowseNodeId>2892862031</BrowseNodeId><Name>Uomo</Name></BrowseNode><BrowseNode><BrowseNodeId>2892857031</BrowseNodeId><Name>Bambine e ragazze</Name></BrowseNode><BrowseNode><BrowseNodeId>2892858031</BrowseNodeId><Name>Bambini e ragazzi</Name></BrowseNode><BrowseNode><BrowseNodeId>1739205031</BrowseNodeId><Name>Prima infanzia</Name></BrowseNode><BrowseNode><BrowseNodeId>2892860031</BrowseNodeId><Name>Abbigliamento specifico e altre marche</Name></BrowseNode></Children><Ancestors><BrowseNode><BrowseNodeId>2844433031</BrowseNodeId><Name>Abbigliamento</Name></BrowseNode></Ancestors></BrowseNode></BrowseNodes></BrowseNodeLookupResponse>')

# AMAZON_STR is the namespace prefix that ElementTree puts in front of every tag name.
# It comes from the xmlns attribute of BrowseNodeLookupResponse (it could be extracted
# from root.tag instead, in case it changes in the future).
AMAZON_STR = "{http://webservices.amazon.com/AWSECommerceService/2011-08-01}"

# Recursive function that deals with a node
def get_data_from(xml_node):
    # Loop over the children of the current node
    for child in xml_node:
        # To skip the top "Categorie" node (it has 5 children), we only check nodes with exactly 2 children
        if len(child) == 2:
            # We only look into the content of BrowseNode nodes
            if child.tag == AMAZON_STR + "BrowseNode":
                # Looping over the children of a BrowseNode node,
                # we simply print the contents (.text)
                for c in child:
                    if c.tag == AMAZON_STR + "BrowseNodeId":
                        print("children : " + c.text)
                    if c.tag == AMAZON_STR + "Name":
                        print("name : " + c.text)
        # Recurse so that nested elements (Children, Ancestors, ...) are visited too
        get_data_from(child)

# Finally call the function on the top node (root)
get_data_from(root)
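Output (note that the last pair, Abbigliamento, comes from the <Ancestors> section, not from <Children>):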
$ ./test_script
children : 2892859031
name : Donna
children : 2892862031
name : Uomo
children : 2892857031
name : Bambine e ragazze
children : 2892858031
name : Bambini e ragazzi
children : 1739205031
name : Prima infanzia
children : 2892860031
name : Abbigliamento specifico e altre marche
children : 2844433031
name : Abbigliamento
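If you only need the entries under <Children> (the recursive script also prints the <Ancestors> entry), a shorter, non-recursive sketch is to let ElementTree's limited XPath support do the walking. It also derives the namespace from root.tag instead of hard-coding it, as the comment in the script suggests. The DATA bytes are a trimmed excerpt of the response in the question, and the names child_nodes and the "aws" prefix are made up for this example:

```python
import xml.etree.ElementTree as ET

# Trimmed excerpt of the response from the question (OperationRequest and most
# child nodes removed); in practice pass the full bytes object instead.
DATA = (
    b'<?xml version="1.0" ?>'
    b'<BrowseNodeLookupResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2011-08-01">'
    b'<BrowseNodes><BrowseNode><BrowseNodeId>2844434031</BrowseNodeId><Name>Categorie</Name>'
    b'<Children>'
    b'<BrowseNode><BrowseNodeId>2892859031</BrowseNodeId><Name>Donna</Name></BrowseNode>'
    b'<BrowseNode><BrowseNodeId>2892862031</BrowseNodeId><Name>Uomo</Name></BrowseNode>'
    b'</Children>'
    b'<Ancestors><BrowseNode><BrowseNodeId>2844433031</BrowseNodeId>'
    b'<Name>Abbigliamento</Name></BrowseNode></Ancestors>'
    b'</BrowseNode></BrowseNodes></BrowseNodeLookupResponse>'
)


def child_nodes(xml_bytes):
    """Return (BrowseNodeId, Name) pairs for the nodes under <Children> only."""
    root = ET.fromstring(xml_bytes)
    # root.tag looks like "{uri}BrowseNodeLookupResponse"; keep only the uri,
    # so the script keeps working if the API version in the namespace changes.
    uri = root.tag[1:].split("}", 1)[0]
    ns = {"aws": uri}  # "aws" is an arbitrary prefix used only in the paths below
    pairs = []
    # Match BrowseNode elements that are direct children of a Children element;
    # the BrowseNode inside Ancestors is never matched.
    for node in root.findall(".//aws:Children/aws:BrowseNode", ns):
        pairs.append((node.findtext("aws:BrowseNodeId", namespaces=ns),
                      node.findtext("aws:Name", namespaces=ns)))
    return pairs


if __name__ == "__main__":
    for node_id, name in child_nodes(DATA):
        print("children : " + node_id)
        print("name : " + name)
```

On Python 3.8 and later, root.findall(".//{*}Children/{*}BrowseNode") matches the same elements using a namespace wildcard, so no prefix mapping is needed at all.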
Addendum: the content of the XML string is much easier to follow once it is indented:
<?xml version="1.0" ?>
<BrowseNodeLookupResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2011-08-01">
  <OperationRequest>
    <HTTPHeaders>
      <Header Name="UserAgent" Value="Python-urllib/3.5"/>
    </HTTPHeaders>
    <RequestId>54e05f2a-e792-11e5-8694-85b3fa7a9fcf</RequestId>
    <Arguments>
      <Argument Name="AWSAccessKeyId" Value="xxxxx"/>
      <Argument Name="AssociateTag" Value="xxx-21"/>
      <Argument Name="BrowseNodeId" Value="2844434031"/>
      <Argument Name="Operation" Value="BrowseNodeLookup"/>
      <Argument Name="Service" Value="AWSECommerceService"/>
      <Argument Name="Signature" Value="cf1A3M8S30Y32EdxVVoBljYUNrt4ZiqgvM+/B1uPrDg="/>
      <Argument Name="Timestamp" Value="2016-03-11T14:05:38Z"/>
      <Argument Name="Version" Value="2011-08-01"/>
    </Arguments>
    <RequestProcessingTime>0.005945883</RequestProcessingTime>
  </OperationRequest>
  <BrowseNodes>
    <Request>
      <IsValid>True</IsValid>
      <BrowseNodeLookupRequest>
        <BrowseNodeId>2844434031</BrowseNodeId>
      </BrowseNodeLookupRequest>
    </Request>
    <BrowseNode>
      <BrowseNodeId>2844434031</BrowseNodeId>
      <Name>Categorie</Name>
      <IsCategoryRoot>1</IsCategoryRoot>
      <Children>
        <BrowseNode>
          <BrowseNodeId>2892859031</BrowseNodeId>
          <Name>Donna</Name>
        </BrowseNode>
        <BrowseNode>
          <BrowseNodeId>2892862031</BrowseNodeId>
          <Name>Uomo</Name>
        </BrowseNode>
        <BrowseNode>
          <BrowseNodeId>2892857031</BrowseNodeId>
          <Name>Bambine e ragazze</Name>
        </BrowseNode>
        <BrowseNode>
          <BrowseNodeId>2892858031</BrowseNodeId>
          <Name>Bambini e ragazzi</Name>
        </BrowseNode>
        <BrowseNode>
          <BrowseNodeId>1739205031</BrowseNodeId>
          <Name>Prima infanzia</Name>
        </BrowseNode>
        <BrowseNode>
          <BrowseNodeId>2892860031</BrowseNodeId>
          <Name>Abbigliamento specifico e altre marche</Name>
        </BrowseNode>
      </Children>
      <Ancestors>
        <BrowseNode>
          <BrowseNodeId>2844433031</BrowseNodeId>
          <Name>Abbigliamento</Name>
        </BrowseNode>
      </Ancestors>
    </BrowseNode>
  </BrowseNodes>
</BrowseNodeLookupResponse>
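An indented view like this does not have to be produced by hand. A minimal sketch using the standard library's xml.dom.minidom, which re-serializes the document with one element per line and leaves the default namespace untouched. DATA is again a trimmed excerpt of the response, and the helper name pretty is made up for this example:

```python
import xml.dom.minidom

# Trimmed excerpt of the response from the question; the full bytes object
# can be passed in exactly the same way.
DATA = (
    b'<?xml version="1.0" ?>'
    b'<BrowseNodeLookupResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2011-08-01">'
    b'<BrowseNodes><BrowseNode><BrowseNodeId>2844434031</BrowseNodeId><Name>Categorie</Name>'
    b'<Children><BrowseNode><BrowseNodeId>2892859031</BrowseNodeId><Name>Donna</Name></BrowseNode></Children>'
    b'</BrowseNode></BrowseNodes></BrowseNodeLookupResponse>'
)


def pretty(xml_bytes, indent="  "):
    """Return the XML as an indented string, one element per line."""
    # parseString accepts bytes; toprettyxml emits the <?xml ... ?> line first
    # and keeps text-only elements such as <Name>Donna</Name> on one line.
    return xml.dom.minidom.parseString(xml_bytes).toprettyxml(indent=indent)


if __name__ == "__main__":
    print(pretty(DATA))
```

On Python 3.9+ ET.indent() can indent an ElementTree in place as well, but writing it back with ET.tostring() turns the default namespace into an ns0: prefix unless a prefix is registered first, which is why minidom is used here.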