I am using lxml to parse an HTML string, for example:
<tr>
<td>111</td>
<td>222</td>
<td>20201208</td>
<td></td>
<td>26</td>
<td>1431</td>
<td></td>
</tr>
The result of html.xpath is:
["111","222","20201208","26","1431"]
My question is: can I get a result like this instead, with empty strings for the empty cells?
["111","222","20201208","","26","1431",""]
Is there an option in lxml that does this?
I use the following code to get the elements:
tds=tr.xpath(".//td/text()")
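The XPath step text() selects text nodes only, and an empty <td> has no text node, so it contributes nothing to the result. Iterate over the <td> elements instead and read each one's text. Here is how to do it with ElementTree or lxml (the code is the same; only the import differs):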
import xml.etree.ElementTree as ET
from lxml import etree
xml = '''<tr>
<td>111</td>
<td>222</td>
<td>20201208</td>
<td></td>
<td>26</td>
<td>1431</td>
<td></td>
</tr>'''
# Standard-library ElementTree: td.text is None for an empty cell, so fall back to ''
root1 = ET.fromstring(xml)
data = [td.text or '' for td in root1.findall('.//td')]
print(data)
# lxml: same API, same result
root2 = etree.fromstring(xml)
data = [td.text or '' for td in root2.findall('.//td')]
print(data)
Output:
['111', '222', '20201208', '', '26', '1431', '']
['111', '222', '20201208', '', '26', '1431', '']
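One caveat: td.text only holds the text before a cell's first child element, so a cell such as <td><b>26</b></td> would also come back as ''. If your real table may contain nested markup, a minimal sketch along these lines joins all text inside each cell. The helper name cell_texts and the sample row are made up for illustration; itertext() is available in both ElementTree and lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample row: one cell has nested markup, two cells are empty.
row = ET.fromstring("<tr><td>111</td><td></td><td><b>26</b> units</td><td/></tr>")

def cell_texts(tr):
    """Return one string per <td>, using '' for cells with no text at all."""
    # itertext() yields every text fragment inside the cell, in document order;
    # an empty cell yields nothing, so the join produces ''.
    return ["".join(td.itertext()) for td in tr.findall(".//td")]

print(cell_texts(row))
```

With lxml specifically, calling td.xpath("string()") on each cell should give the same per-cell string, which keeps you closer to the original XPath-based code.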