I'm using BeautifulSoup to parse an XML file, so elements are looked up by tag name. Can I also filter on a value inside the tag when searching?
data_tags = soup.find_all('Data')
for tag in data_tags:
    text = tag.get_text()
Data is the tag name, but can I also pick out a value inside that tag to match on? For example, something like this:
data_tags = soup.find_all("Data", name='ObjectClass')
for tag in data_tags:
    text = tag.get_text()
    print(text)
I tried this, but got the following error: TypeError: Tag.find_all() got multiple values for argument 'name'
Here is a sample of the XML:
<Document>
<Data Name="ObjectClass">computer</Data>
<Data Name="AttributeLDAPDisplayName">ms-Mcs-AdmPwdExpirationTime</Data>
<Data Name="ObjectClass">computer</Data>
<Data Name="AttributeLDAPDisplayName">ms-Mcs-AdmPwdExpirationTime</Data>
</Document>
I only want the elements where Name="ObjectClass".
This returns only the Data tags with Name="ObjectClass". It relies on external libraries (pip install bs4 lxml):
from bs4 import BeautifulSoup
xml = '''
<Document>
<Data Name="ObjectClass">computer</Data>
<Data Name="AttributeLDAPDisplayName">ms-Mcs-AdmPwdExpirationTime</Data>
<Data Name="ObjectClass">computer</Data>
<Data Name="AttributeLDAPDisplayName">ms-Mcs-AdmPwdExpirationTime</Data>
<Other Name="ObjectClass">other</Other>
</Document>
'''
soup = BeautifulSoup(xml, 'xml')
for data in soup.find_all('Data', Name='ObjectClass'):
    print(data.get_text())
Output:
computer
computer
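Note the capitalisation: Name, not name. XML attribute names are case-sensitive, and the first parameter of find_all() is itself called name, so passing name= as a keyword alongside the positional tag name raises the TypeError from the question.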
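To make the keyword clash concrete, here is a rough sketch in plain Python. The function below is a hypothetical stand-in that only copies the general parameter layout of find_all() (a first parameter called name, an attrs dict, and extra keywords treated as attribute filters). It is not BeautifulSoup's actual implementation.

```python
def find_all(name=None, attrs=None, **kwargs):
    """Hypothetical stand-in mimicking the parameter layout of find_all()."""
    filters = dict(attrs or {})
    filters.update(kwargs)  # extra keywords become attribute filters
    return name, filters


# Capitalised keyword: no clash, it ends up in **kwargs as an attribute filter.
print(find_all("Data", Name="ObjectClass"))

# Lowercase keyword: collides with the first positional parameter.
try:
    find_all("Data", name="ObjectClass")
except TypeError as exc:
    print("TypeError:", exc)

# An explicit attrs dict avoids the clash even for an attribute called "name".
print(find_all("Data", attrs={"name": "ObjectClass"}))
```

The real find_all() also accepts an attrs dictionary, so if the XML attribute were a lowercase name, soup.find_all('Data', attrs={'name': 'ObjectClass'}) should be the way around the error.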
A one-liner that needs no external library :-)
import xml.etree.ElementTree as ET
xml = '''
<Document>
<Data Name="ObjectClass">computer1</Data>
<Data Name="AttributeLDAPDisplayName">ms-Mcs-AdmPwdExpirationTime</Data>
<Data Name="ObjectClass">compute2r</Data>
<Data Name="AttributeLDAPDisplayName">ms-Mcs-AdmPwdExpirationTime</Data>
<Other Name="ObjectClass">other</Other>
</Document>
'''
root = ET.fromstring(xml)
object_class_data = [x.text for x in root.findall('.//Data[@Name="ObjectClass"]')]
print(object_class_data)
Output:
['computer1', 'compute2r']
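If more than one Name value is of interest, a possible variation (a sketch only, reusing the sample XML from this answer; the variable name values_by_name is made up for illustration) is to walk the Data elements once and group their text by the Name attribute:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

xml = '''
<Document>
<Data Name="ObjectClass">computer1</Data>
<Data Name="AttributeLDAPDisplayName">ms-Mcs-AdmPwdExpirationTime</Data>
<Data Name="ObjectClass">compute2r</Data>
<Data Name="AttributeLDAPDisplayName">ms-Mcs-AdmPwdExpirationTime</Data>
<Other Name="ObjectClass">other</Other>
</Document>
'''

root = ET.fromstring(xml)

# Map each Name attribute value to the list of texts found under it.
values_by_name = defaultdict(list)
for elem in root.iter('Data'):  # only <Data> elements, so <Other> is skipped
    values_by_name[elem.get('Name')].append(elem.text)

print(values_by_name['ObjectClass'])
```

The XPath predicate in the one-liner is shorter when only a single attribute value is needed; the grouping loop avoids re-scanning the tree for every value.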