Double selection with XPath in Scrapy

I want to extract data using XPath with Scrapy. Here is my code:

def parse(self, response):
    Coords = []
    for sel in response.xpath('//*[@id="pitch"]/image[contains(@class,"success")]'):
        item = PogbaItem()
        item['x'] = sel.xpath('@x').extract()
        item['y'] = sel.xpath('@y').extract()
        item['x'] = sel.xpath('@x1').extract()
        item['y'] = sel.xpath('@y1').extract()
        Coords.append(item)
    return Coords

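The problem is that the HTML contains two different elements: the first (image) has the attributes x and y, while the other (line) has the attributes x1 and y1. I am trying to combine them into a single final CSV, but I cannot find the right XPath. How can I solve this?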

Update: two examples from the HTML:

<image class="pitch-object timer-1-40 success" x="331.172" y="84.678" width="30" height="30" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="/sites/fourfourtwo.com/modules/custom/statzone/files/icons/successful_clearance.png"></image>
<line class="pitch-object timer-2-84 success" marker-end="url(#smallblue)" x1="453.076" y1="199.169" x2="509.104" y2="216.676" style="stroke:blue;stroke-width:3"></line>

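As I understand it, you need to take the x attribute as the x value if it exists and fall back to the x1 attribute otherwise, and the same goes for y. This is how I would approach it: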

item['x'] = sel.xpath('@x').extract_first() or sel.xpath('@x1').extract_first()
item['y'] = sel.xpath('@y').extract_first() or sel.xpath('@y1').extract_first()

Alternatively, you can use a pure XPath solution:

item['x'] = sel.xpath('(@x|@x1)').extract_first()
item['y'] = sel.xpath('(@y|@y1)').extract_first()

Also, since you need to handle both line and image elements, you should adjust your main expression accordingly:

//*[@id="pitch"]/*[contains(@class,"success")]

Or:

//*[@id="pitch"]/*[(self::image or self::line) and contains(@class,"success")]
