I am trying to get the preceding sibling of the currently selected node, but I'm not sure what I'm doing wrong.
Here is the HTML snippet:
source = """
<div class="zg_itemImmersion">
<div class="zg_rankDiv"><span class="zg_rankNumber">10.</span></div>
<div class="zg_itemWrapper" style="height:285px">
<div class="zg_image">
<div class="zg_itemImageImmersion"><a href="
http://www.amazon.com/Oral-B-Action-Replacement-Electric-Toothbrush/dp/B000AUIFCA/ref=zg_mw_8517148011_10"><img src="http://ecx.images-amazon.com/images/I/41RHKIQXnhL._SL160_SL150_.jpg" alt="Oral-B Floss Action Replacement Elect..." title="Oral-B Floss Action Replacement Elect..."/></a></div>
</div>
</div>
"""
If the href contains the ASIN B000AUIFCA, I want to get the rankNumber.
from lxml import html
source1 = html.fromstring(source)
links = source1.xpath('//div[@class="zg_itemImmersion"]//div[@class="zg_itemImageImmersion"]/a[contains(@href,"B000AUIFCA")]/@href')
The above gives me the correct link, which contains the ASIN I need, B000AUIFCA:
['\n\n\n\n\n\n\nhttp://www.amazon.com/Oral-B-Action-Replacement-Electric-Toothbrush/dp/B000AUIFCA/ref=zg_mw_8517148011_10/191-4138574-0525467']
Now, if the ASIN in ('//span[@class="zg_rankNumber"]//a//@href') is B000AUIFCA, I want to get the rank "10" from the preceding sibling [span class="zg_rankNumber"].
What I'm using:
link2 = source1.xpath('//div[@class="zg_itemImmersion"]//div[@class="zg_itemImageImmersion"]/a[contains(@href,"B000AUIFCA")]/preceding-sibling::*/text()')
But it returns an empty list.
You can use the following XPath:
//div[@class="zg_itemImmersion"]
[.//div[@class="zg_itemImageImmersion"]/a[contains(@href,"B000AUIFCA")]]
//span[@class="zg_rankNumber"]
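The XPath first finds the "zg_itemImmersion" div that contains a link whose href includes the target ASIN, B000AUIFCA, and then returns the 'zg_rankNumber' span from inside that div.
Your original expression returns nothing because preceding-sibling only looks at nodes that share the same parent. The a element is the only child of its "zg_itemImageImmersion" div, so it has no siblings at all; the span sits in a different branch of the tree.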
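For reference, here is a minimal sketch of how that XPath could be run with lxml. It assumes a trimmed copy of the snippet from the question (the img attributes are shortened and a closing </div> is added for the outer div), and the names tree, rank_xpath, ranks, rank and siblings are only illustrative.

```python
from lxml import html

# Trimmed copy of the question's snippet (assumption: img attributes
# shortened, outer div closed explicitly).
source = """
<div class="zg_itemImmersion">
<div class="zg_rankDiv"><span class="zg_rankNumber">10.</span></div>
<div class="zg_itemWrapper" style="height:285px">
<div class="zg_image">
<div class="zg_itemImageImmersion"><a href="
http://www.amazon.com/Oral-B-Action-Replacement-Electric-Toothbrush/dp/B000AUIFCA/ref=zg_mw_8517148011_10"><img src="x.jpg" alt="Oral-B"/></a></div>
</div>
</div>
</div>
"""

tree = html.fromstring(source)

# Select the outer div only if it contains the matching link,
# then descend to the rank span and take its text.
rank_xpath = (
    '//div[@class="zg_itemImmersion"]'
    '[.//div[@class="zg_itemImageImmersion"]/a[contains(@href, "B000AUIFCA")]]'
    '//span[@class="zg_rankNumber"]/text()'
)
ranks = tree.xpath(rank_xpath)

# The span text is "10."; drop whitespace and the trailing dot.
rank = ranks[0].strip().rstrip(".") if ranks else None
print(rank)

# For comparison: the link has no preceding siblings, which is why
# the expression in the question comes back empty.
siblings = tree.xpath('//a[contains(@href, "B000AUIFCA")]/preceding-sibling::*')
print(siblings)
```

If the page lists several products, the same expression should return only the rank of the item whose link matches, since the predicate is applied to each "zg_itemImmersion" div separately.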