Trying to select a preceding node using lxml XPath



I am trying to get the preceding sibling of the currently selected node, but I am not sure what I am doing wrong.

Here is the HTML snippet:

source = """
    <div class="zg_itemImmersion">
    <div class="zg_rankDiv"><span class="zg_rankNumber">10.</span></div>
    <div class="zg_itemWrapper" style="height:285px">
       <div class="zg_image">
          <div class="zg_itemImageImmersion"><a  href="
             http://www.amazon.com/Oral-B-Action-Replacement-Electric-Toothbrush/dp/B000AUIFCA/ref=zg_mw_8517148011_10"><img src="http://ecx.images-amazon.com/images/I/41RHKIQXnhL._SL160_SL150_.jpg" alt="Oral-B Floss Action Replacement Elect..." title="Oral-B Floss Action Replacement Elect..."/></a></div>
       </div>
    </div>
"""

What I want is the rankNumber, provided the href contains the ASIN B000AUIFCA.

from lxml import html 
source1 = html.fromstring(source)
links = source1.xpath('//div[@class="zg_itemImmersion"]//div[@class="zg_itemImageImmersion"]/a[contains(@href,"B000AUIFCA")]/@href')

The above gives me the correct link, which contains the ASIN I need, B000AUIFCA:

['\n\n\n\n\n\n\nhttp://www.amazon.com/Oral-B-Action-Replacement-Electric-Toothbrush/dp/B000AUIFCA/ref=zg_mw_8517148011_10/191-4138574-0525467']

Now, if the ASIN in ('//span[@class="zg_rankNumber"]//a//@href') is B000AUIFCA, I want to get the rank "10" from the preceding sibling [span class="zg_rankNumber"].

What I am using:

link2 = source1.xpath('//div[@class="zg_itemImmersion"]//div[@class="zg_itemImageImmersion"]/a[contains(@href,"B000AUIFCA")]/preceding-sibling::*/text()')

But it returns an empty list.

You can use the following XPath:

//div[@class="zg_itemImmersion"]
     [.//div[@class="zg_itemImageImmersion"]/a[contains(@href,"B000AUIFCA")]]
//span[@class="zg_rankNumber"]

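The XPath first finds the "zg_itemImmersion" div that contains a link whose href includes the target ASIN, B000AUIFCA. It then returns the 'zg_rankNumber' span from inside that div. Your original expression returns nothing because preceding-sibling only looks at earlier children of the same parent, and in this snippet the a element is the only child of its "zg_itemImageImmersion" div. The span is not a sibling of the link; it sits in a separate branch under "zg_rankDiv".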
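As a minimal sketch of how this could be run with lxml: the HTML below is a trimmed copy of the snippet from the question (the img element is omitted and the outer div is closed), and the names tree, rank_xpath, ranks and rank are only illustrative. Appending /text() to get the string rather than the element, and stripping the trailing dot, are my additions.

```python
from lxml import html

# Trimmed version of the snippet from the question (img omitted for brevity).
source = """
<div class="zg_itemImmersion">
  <div class="zg_rankDiv"><span class="zg_rankNumber">10.</span></div>
  <div class="zg_itemWrapper" style="height:285px">
    <div class="zg_image">
      <div class="zg_itemImageImmersion"><a href="
        http://www.amazon.com/Oral-B-Action-Replacement-Electric-Toothbrush/dp/B000AUIFCA/ref=zg_mw_8517148011_10">link</a></div>
    </div>
  </div>
</div>
"""

tree = html.fromstring(source)

# Select the item container that holds the matching link,
# then descend into its rank span and take the text node.
rank_xpath = (
    '//div[@class="zg_itemImmersion"]'
    '[.//div[@class="zg_itemImageImmersion"]/a[contains(@href, "B000AUIFCA")]]'
    '//span[@class="zg_rankNumber"]/text()'
)

ranks = tree.xpath(rank_xpath)                        # ['10.']
rank = ranks[0].rstrip(".") if ranks else None        # '10'
print(rank)
```

If the page holds several "zg_itemImmersion" blocks, the predicate keeps only the one whose link matches, so ranks should contain a single entry for that item.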
