Suppose we have the following XML file:
<XML Data>
<Record>
<Service>
<Product id="A"></Product>
<Product id="B"></Product>
<Product id="C"></Product>
</Service>
</Record>
<Record>
<Service>
<Product id="A"></Product>
<Product id="B"></Product>
<Product id="Y"></Product>
</Service>
</Record>
<Record>
<Service>
<Product id="U"></Product>
</Service>
</Record>
</XML Data>
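As you can see, each record represents a client but has no unique identifier. Each service has multiple products.
I want to get all products that are sold together with product A. So I am trying to get a list like this: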
ServiceID
B
C
Y
I have been using:
import xml.etree.ElementTree as ET
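According to the official documentation, you can select elements by attribute with [@attrib='value']. While testing, I replaced your <XML Data> and </XML Data> tags with <Data> and </Data>, because a tag name cannot contain a space. Example code: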
from xml.etree import ElementTree as ET

data = ET.parse(r"/path/to/your/input.xml")
root = data.getroot()

# select every Product element whose id attribute equals 'A'
for product in root.findall("./Record/Service/Product[@id='A']"):
    print(product.attrib["id"])
    print(product.text)
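To see what that predicate matches without needing a file on disk, here is a minimal sketch that parses the sample data from an inline string. The `SAMPLE` and `matched_ids` names are only illustrative, and the root tag is assumed to be renamed to `<Data>` as described above.

```python
from xml.etree import ElementTree as ET

# Sample data from the question, with the root tag renamed to <Data> (assumption).
SAMPLE = """<Data>
  <Record><Service>
    <Product id="A"></Product>
    <Product id="B"></Product>
    <Product id="C"></Product>
  </Service></Record>
  <Record><Service>
    <Product id="A"></Product>
    <Product id="B"></Product>
    <Product id="Y"></Product>
  </Service></Record>
  <Record><Service>
    <Product id="U"></Product>
  </Service></Record>
</Data>"""

root = ET.fromstring(SAMPLE)

# The predicate keeps only Product elements whose id attribute is 'A'.
matches = root.findall("./Record/Service/Product[@id='A']")
matched_ids = [product.get("id") for product in matches]
print(matched_ids)  # ['A', 'A'] : one match per service that contains A
```

Note that this returns only the A elements themselves, not their siblings, so it does not yet answer the question.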
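Edit
After rereading your question, I noticed that you first want to check whether a product with id A exists in a service, and only then store the IDs (unique and sorted), so I modified the code: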
from xml.etree import ElementTree as ET

data = ET.parse(r"/path/to/your/input.xml")
root = data.getroot()

product_ids = set()
for service in root.findall("./Record/Service"):
    list_contains_a = False
    # iterate once to identify if list contains product with ID = 'A'
    for product in service.findall("./Product"):
        if product.attrib["id"] == "A":
            list_contains_a = True
    # if list contains product with ID = 'A', iterate second time and fetch IDs
    if list_contains_a:
        for product in service.findall("./Product"):
            if product.attrib["id"] == "A":
                continue
            # add to set to prevent duplicates
            product_ids.add(product.attrib["id"])

ret_list = ["ServiceID"] + sorted(product_ids)
print(ret_list)
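The same logic can be written more compactly by collecting each service's IDs once. This is only a sketch: the helper name `products_sold_with` and the inline `SAMPLE` string are hypothetical, and the root tag is again assumed to be `<Data>`.

```python
from xml.etree import ElementTree as ET

# Sample data from the question, with the root tag renamed to <Data> (assumption).
SAMPLE = """<Data>
  <Record><Service>
    <Product id="A"></Product>
    <Product id="B"></Product>
    <Product id="C"></Product>
  </Service></Record>
  <Record><Service>
    <Product id="A"></Product>
    <Product id="B"></Product>
    <Product id="Y"></Product>
  </Service></Record>
  <Record><Service>
    <Product id="U"></Product>
  </Service></Record>
</Data>"""


def products_sold_with(root, target_id):
    """Return the sorted, unique IDs of products sharing a service with target_id."""
    related = set()
    for service in root.findall("./Record/Service"):
        # Collect the IDs of this service once, skipping products without an id.
        ids = [p.get("id") for p in service.findall("./Product")
               if p.get("id") is not None]
        if target_id in ids:
            related.update(i for i in ids if i != target_id)
    return sorted(related)


root = ET.fromstring(SAMPLE)
result = products_sold_with(root, "A")
print(result)  # ['B', 'C', 'Y']
```

Using `get("id")` instead of `attrib["id"]` avoids a `KeyError` if a Product element happens to have no id attribute.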