How to correctly define the selector in response.css and the yield in Scrapy



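I'm new to Scrapy, and there is one thing I have been trying for two days without success. I'm practicing extracting information about the football players listed at https://sofifa.com/. I took the code example from https://docs.scrapy.org/ and edited it as shown below. The piece of information I'm practicing on is OVA.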

Does anyone know how I should correctly define the element "span.something…" in the code below?

Many thanks, James.

import scrapy


class ToScrapeCSSSpider(scrapy.Spider):
    name = "player-css"
    start_urls = [
        'https://sofifa.com/players?type=all&tm%5B0%5D=1&r=210024&set=true',
    ]

    def parse(self, response):
        for playerInfor in response.css("div.card"):
            yield {
                'OVA': playerInfor.css("span.bp3-tag p::bp3-tag p").extract()
            }
        next_page_url = response.css("li.next > a::attr(href)").extract_first()
        if next_page_url is not None:
            yield scrapy.Request(response.urljoin(next_page_url))

Use response.css("tbody.list") instead of response.css("div.card")

With response.css("tbody.list") the data is easy to extract, but when I use response.css("div.card"), the result is a few empty lists mixed in with the expected output.

for playerInfor in response.css("tbody.list"):
    print(playerInfor.css('td.col.col-oa.col-sort span::text').getall())

Output

['87', '84', '84', '82', '80', '80', '80', '80', '79', '79', '79', '79', '79', '78', '77', '77', '77', '76', '76', '76', '75', '75', '74', '74', '73', '72', '72', '70', '62', '62', '60', '58', '56']

Another approach

def parse(self, response):
    mydata = response.css('tbody.list td.col.col-oa.col-sort span::text').extract()
    yield {
        "OVA": mydata
    }

Output of mydata

['87', '84', '84', '82', '80', '80', '80', '80', '79', '79', '79', '79', '79', '78', '77', '77', '77', '76', '76', '76', '75', '75', '74', '74', '73', '72', '72', '70', '62', '62', '60', '58', '56']
