My Scrapy shell commands work, but the output is empty



I tested the code in the Scrapy shell and it works fine.

fetch('https://www.livescores.com/?tz=3')
response.css('div.dh')
gununMaclari = response.css('div.dh')
gununMaclari.css('span.hh span.ih span.kh::text').get()
gununMaclari.css('span.hh span.jh span.kh::text').get()

These commands show me the home and away teams, and if I use getall() I can get all of the home and away values. But when I run the spider code further below, the output is empty. That's the part I can't solve. Can anyone help me find the problem? Thanks.
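For reference, the getall() variants I mean look roughly like this in the shell (same selectors as above; the class names are just whatever the page used at the time):

gununMaclari = response.css('div.dh')
gununMaclari.css('span.hh span.ih span.kh::text').getall()  # all home team names
gununMaclari.css('span.hh span.jh span.kh::text').getall()  # all away team names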

import scrapy
from scrapy.crawler import CrawlerRunner


class LivescoresTodayList(scrapy.Spider):
    name = 'todayMatcheslist'
    custom_settings = {'CONCURRENT_REQUESTS': '1'}

    def start_requests(self):
        yield scrapy.Request('https://www.livescores.com/?tz=3')

    def parse(self, response):
        for gununMaclari in response.css('div.dh'):
            yield {
                'Home': gununMaclari.css('span.hh span.ih span.kh::text').get(),
                'Away': gununMaclari.css('span.hh span.jh span.kh::text').get()
            }


runnerTodayList = CrawlerRunner(settings={
    "FEEDS": {
        "todayMatcheslist.json": {"format": "json", "overwrite": True},
    },
})
runnerTodayList.crawl(LivescoresTodayList)

Read this.

The spider itself is fine. If you use CrawlerRunner, you need to configure logging and settings yourself and start the reactor.

CrawlerProcess example:

import scrapy
from scrapy.crawler import CrawlerProcess

class LivescoresTodayList(scrapy.Spider):
    name = 'todayMatcheslist'
    custom_settings = {'CONCURRENT_REQUESTS': '1'}

    def start_requests(self):
        yield scrapy.Request('https://www.livescores.com/?tz=3')

    def parse(self, response):
        for gununMaclari in response.css('div.dh'):
            yield {
                'Home': gununMaclari.css('span.hh span.ih span.kh::text').get(),
                'Away': gununMaclari.css('span.hh span.jh span.kh::text').get()
            }


process = CrawlerProcess(settings={
    "FEEDS": {
        "todayMatcheslist.json": {"format": "json", "overwrite": True},
    },
})
process.crawl(LivescoresTodayList)
process.start()

CrawlerRunner example:

import scrapy
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from twisted.internet import reactor

class LivescoresTodayList(scrapy.Spider):
    name = 'todayMatcheslist'
    custom_settings = {'CONCURRENT_REQUESTS': '1'}

    def start_requests(self):
        yield scrapy.Request('https://www.livescores.com/?tz=3')

    def parse(self, response):
        for gununMaclari in response.css('div.dh'):
            yield {
                'Home': gununMaclari.css('span.hh span.ih span.kh::text').get(),
                'Away': gununMaclari.css('span.hh span.jh span.kh::text').get()
            }


# CrawlerRunner leaves logging and the reactor to you
configure_logging({'LOG_FORMAT': '%(levelname)s: %(message)s'})
runnerTodayList = CrawlerRunner(settings={
    "FEEDS": {
        "todayMatcheslist.json": {"format": "json", "overwrite": True},
    },
})
d = runnerTodayList.crawl(LivescoresTodayList)
d.addBoth(lambda _: reactor.stop())  # stop the reactor once the crawl finishes
reactor.run()
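Either way, once the crawl finishes you can sanity-check that the feed is no longer empty by reading the JSON file back (a minimal check, assuming the feed path configured above):

import json

with open('todayMatcheslist.json', encoding='utf-8') as f:
    matches = json.load(f)

print(len(matches), 'matches scraped')
print(matches[:3])  # first few Home/Away items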
