Python XML parser with ElementTree for the Amazon API



I need some help understanding how to use ElementTree in Python to iterate over this XML string:

b'<?xml version="1.0" ?><BrowseNodeLookupResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2011-08-01"><OperationRequest><HTTPHeaders><Header Name="UserAgent" Value="Python-urllib/3.5"/></HTTPHeaders><RequestId>54e05f2a-e792-11e5-8694-85b3fa7a9fcf</RequestId><Arguments><Argument Name="AWSAccessKeyId" Value="xxxxx"/><Argument Name="AssociateTag" Value="xxx-21"/><Argument Name="BrowseNodeId" Value="2844434031"/><Argument Name="Operation" Value="BrowseNodeLookup"/><Argument Name="Service" Value="AWSECommerceService"/><Argument Name="Signature" Value="cf1A3M8S30Y32EdxVVoBljYUNrt4ZiqgvM+/B1uPrDg="/><Argument Name="Timestamp" Value="2016-03-11T14:05:38Z"/><Argument Name="Version" Value="2011-08-01"/></Arguments><RequestProcessingTime>0.005945883</RequestProcessingTime></OperationRequest><BrowseNodes><Request><IsValid>True</IsValid><BrowseNodeLookupRequest><BrowseNodeId>2844434031</BrowseNodeId></BrowseNodeLookupRequest></Request><BrowseNode><BrowseNodeId>2844434031</BrowseNodeId><Name>Categorie</Name><IsCategoryRoot>1</IsCategoryRoot><Children><BrowseNode><BrowseNodeId>2892859031</BrowseNodeId><Name>Donna</Name></BrowseNode><BrowseNode><BrowseNodeId>2892862031</BrowseNodeId><Name>Uomo</Name></BrowseNode><BrowseNode><BrowseNodeId>2892857031</BrowseNodeId><Name>Bambine e ragazze</Name></BrowseNode><BrowseNode><BrowseNodeId>2892858031</BrowseNodeId><Name>Bambini e ragazzi</Name></BrowseNode><BrowseNode><BrowseNodeId>1739205031</BrowseNodeId><Name>Prima infanzia</Name></BrowseNode><BrowseNode><BrowseNodeId>2892860031</BrowseNodeId><Name>Abbigliamento specifico e altre marche</Name></BrowseNode></Children><Ancestors><BrowseNode><BrowseNodeId>2844433031</BrowseNodeId><Name>Abbigliamento</Name></BrowseNode></Ancestors></BrowseNode></BrowseNodes></BrowseNodeLookupResponse>'

In the output I need the child browse node IDs and names:

children : 2892859031
name : Donna
children : 2892862031
name : Uomo
children : 2892857031
name : Bambine e ragazze
...

Could someone help me write a small Python script to parse this XML?

If you still want to do this with Python's ElementTree, the script below prints essentially the information you are looking for. It uses recursion.

#!/usr/bin/env python3
import xml.etree.ElementTree as ET
root=ET.fromstring(b'<?xml version="1.0" ?><BrowseNodeLookupResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2011-08-01"><OperationRequest><HTTPHeaders><Header Name="UserAgent" Value="Python-urllib/3.5"/></HTTPHeaders><RequestId>54e05f2a-e792-11e5-8694-85b3fa7a9fcf</RequestId><Arguments><Argument Name="AWSAccessKeyId" Value="xxxxx"/><Argument Name="AssociateTag" Value="xxx-21"/><Argument Name="BrowseNodeId" Value="2844434031"/><Argument Name="Operation" Value="BrowseNodeLookup"/><Argument Name="Service" Value="AWSECommerceService"/><Argument Name="Signature" Value="cf1A3M8S30Y32EdxVVoBljYUNrt4ZiqgvM+/B1uPrDg="/><Argument Name="Timestamp" Value="2016-03-11T14:05:38Z"/><Argument Name="Version" Value="2011-08-01"/></Arguments><RequestProcessingTime>0.005945883</RequestProcessingTime></OperationRequest><BrowseNodes><Request><IsValid>True</IsValid><BrowseNodeLookupRequest><BrowseNodeId>2844434031</BrowseNodeId></BrowseNodeLookupRequest></Request><BrowseNode><BrowseNodeId>2844434031</BrowseNodeId><Name>Categorie</Name><IsCategoryRoot>1</IsCategoryRoot><Children><BrowseNode><BrowseNodeId>2892859031</BrowseNodeId><Name>Donna</Name></BrowseNode><BrowseNode><BrowseNodeId>2892862031</BrowseNodeId><Name>Uomo</Name></BrowseNode><BrowseNode><BrowseNodeId>2892857031</BrowseNodeId><Name>Bambine e ragazze</Name></BrowseNode><BrowseNode><BrowseNodeId>2892858031</BrowseNodeId><Name>Bambini e ragazzi</Name></BrowseNode><BrowseNode><BrowseNodeId>1739205031</BrowseNodeId><Name>Prima infanzia</Name></BrowseNode><BrowseNode><BrowseNodeId>2892860031</BrowseNodeId><Name>Abbigliamento specifico e altre marche</Name></BrowseNode></Children><Ancestors><BrowseNode><BrowseNodeId>2844433031</BrowseNodeId><Name>Abbigliamento</Name></BrowseNode></Ancestors></BrowseNode></BrowseNodes></BrowseNodeLookupResponse>')
# AMAZON_STR is the namespace prefix that ElementTree prepends to every tag name.
# It comes from the xmlns attribute of BrowseNodeLookupResponse (and could be
# extracted from it, in case it changes in the future).
AMAZON_STR = "{http://webservices.amazon.com/AWSECommerceService/2011-08-01}"
# Recursive function that deals with a node
def get_data_from(xml_node):
    # Loop over the children of current node
    for child in xml_node:
        # To skip the top-level "Categorie" node (which has more than two children),
        # we only look at nodes that have exactly 2 children
        if len(child) == 2:
            # We only look into the content of BrowseNode nodes
            if child.tag == AMAZON_STR + "BrowseNode":
                # Looping over the children of a BrowseNode node,
                # we simply print the contents (.text)
                for c in child:
                    if c.tag == AMAZON_STR + "BrowseNodeId":
                        print("children : " + c.text)
                    if c.tag == AMAZON_STR + "Name":
                        print("name : " + c.text)
        get_data_from(child)
# Finally call the function on the top node (root)
get_data_from(root)

Output:

$ ./test_script
children : 2892859031
name : Donna
children : 2892862031
name : Uomo
children : 2892857031
name : Bambine e ragazze
children : 2892858031
name : Bambini e ragazzi
children : 1739205031
name : Prima infanzia
children : 2892860031
name : Abbigliamento specifico e altre marche
children : 2844433031
name : Abbigliamento
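
Note that the last pair printed above is the node under Ancestors, not a child, because the two-children check matches it as well. If only the nodes under Children are wanted, a namespace-aware path may be simpler than recursion. The following is a minimal sketch, not the original answer's method: the helper names namespace_of and child_browse_nodes and the aws prefix are made up for illustration, the sample XML is a shortened copy of the response in the question, and it assumes the response always declares a default namespace on the root element.

```python
import xml.etree.ElementTree as ET

# Shortened sample with the same structure as the response in the question
XML = b"""<?xml version="1.0" ?>
<BrowseNodeLookupResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2011-08-01">
  <BrowseNodes>
    <BrowseNode>
      <BrowseNodeId>2844434031</BrowseNodeId>
      <Name>Categorie</Name>
      <Children>
        <BrowseNode><BrowseNodeId>2892859031</BrowseNodeId><Name>Donna</Name></BrowseNode>
        <BrowseNode><BrowseNodeId>2892862031</BrowseNodeId><Name>Uomo</Name></BrowseNode>
      </Children>
      <Ancestors>
        <BrowseNode><BrowseNodeId>2844433031</BrowseNodeId><Name>Abbigliamento</Name></BrowseNode>
      </Ancestors>
    </BrowseNode>
  </BrowseNodes>
</BrowseNodeLookupResponse>"""


def namespace_of(element):
    """Return the namespace URI embedded in an element's tag, or '' if none."""
    tag = element.tag  # looks like "{uri}LocalName" when a namespace is set
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def child_browse_nodes(root):
    """Return (id, name) pairs for BrowseNode elements directly under Children."""
    # "aws" is an arbitrary prefix chosen here; it only has to match the paths below
    ns = {"aws": namespace_of(root)}
    pairs = []
    for node in root.findall(".//aws:Children/aws:BrowseNode", ns):
        node_id = node.findtext("aws:BrowseNodeId", namespaces=ns)
        name = node.findtext("aws:Name", namespaces=ns)
        pairs.append((node_id, name))
    return pairs


root = ET.fromstring(XML)
for node_id, name in child_browse_nodes(root):
    print("children : " + node_id)
    print("name : " + name)
```

Reading the namespace from root.tag avoids hard-coding the version date in the URI, and restricting the path to Children leaves the Ancestors entry out.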

Appendix: the content of the XML string is easier to understand once it is indented:

<?xml version="1.0" ?>
<BrowseNodeLookupResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2011-08-01">
    <OperationRequest>
        <HTTPHeaders>
            <Header Name="UserAgent" Value="Python-urllib/3.5"/>
        </HTTPHeaders>
        <RequestId>54e05f2a-e792-11e5-8694-85b3fa7a9fcf</RequestId>
        <Arguments>
            <Argument Name="AWSAccessKeyId" Value="xxxxx"/>
            <Argument Name="AssociateTag" Value="xxx-21"/>
            <Argument Name="BrowseNodeId" Value="2844434031"/>
            <Argument Name="Operation" Value="BrowseNodeLookup"/>
            <Argument Name="Service" Value="AWSECommerceService"/>
            <Argument Name="Signature" Value="cf1A3M8S30Y32EdxVVoBljYUNrt4ZiqgvM+/B1uPrDg="/>
            <Argument Name="Timestamp" Value="2016-03-11T14:05:38Z"/>
            <Argument Name="Version" Value="2011-08-01"/>
        </Arguments>
        <RequestProcessingTime>0.005945883</RequestProcessingTime>
    </OperationRequest>
    <BrowseNodes>
        <Request>
            <IsValid>True</IsValid>
            <BrowseNodeLookupRequest>
                <BrowseNodeId>2844434031</BrowseNodeId>
            </BrowseNodeLookupRequest>
        </Request>
        <BrowseNode>
            <BrowseNodeId>2844434031</BrowseNodeId>
            <Name>Categorie</Name>
            <IsCategoryRoot>1</IsCategoryRoot>
            <Children>
                <BrowseNode>
                    <BrowseNodeId>2892859031</BrowseNodeId>
                    <Name>Donna</Name>
                </BrowseNode>
                <BrowseNode>
                    <BrowseNodeId>2892862031</BrowseNodeId>
                    <Name>Uomo</Name>
                </BrowseNode>
                <BrowseNode>
                    <BrowseNodeId>2892857031</BrowseNodeId>
                    <Name>Bambine e ragazze</Name>
                </BrowseNode>
                <BrowseNode>
                    <BrowseNodeId>2892858031</BrowseNodeId>
                    <Name>Bambini e ragazzi</Name>
                </BrowseNode>
                <BrowseNode>
                    <BrowseNodeId>1739205031</BrowseNodeId>
                    <Name>Prima infanzia</Name>
                </BrowseNode>
                <BrowseNode>
                    <BrowseNodeId>2892860031</BrowseNodeId>
                    <Name>Abbigliamento specifico e altre marche</Name>
                </BrowseNode>
            </Children>
            <Ancestors>
                <BrowseNode>
                    <BrowseNodeId>2844433031</BrowseNodeId>
                    <Name>Abbigliamento</Name>
                </BrowseNode>
            </Ancestors>
        </BrowseNode>
    </BrowseNodes>
</BrowseNodeLookupResponse>
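
An indented view like the one above can be produced from the raw string with the standard library. This is only a sketch of one possible way to do it: the answer does not say which tool was used, the helper name indent_xml is made up here, and the sample is a single node taken from the response.

```python
import xml.dom.minidom


def indent_xml(raw, indent="    "):
    """Return an indented text rendering of an XML document given as bytes or str."""
    return xml.dom.minidom.parseString(raw).toprettyxml(indent=indent)


# One BrowseNode from the response, on a single line as the API returns it
sample = (b'<?xml version="1.0" ?><BrowseNode>'
          b'<BrowseNodeId>2892859031</BrowseNodeId><Name>Donna</Name>'
          b'</BrowseNode>')

pretty = indent_xml(sample)
print(pretty)
```

The same call works on the full response bytes; the indent argument controls the width of each nesting level.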
