XPath problem when looping over tags



I have this code and I'm trying to download these papers, but the loop only prints the first element.

    import scrapy
    from urllib.parse import urljoin

    class SimpleSpider(scrapy.Spider):
        name = "simple"
        start_urls = ['https://jmedicalcasereports.biomedcentral.com/articles?query=COVID-19&searchType=journalSearch&tab=keyword']

        def parse(self, response):
            for book in response.xpath('//*[@id="main-content"]/div/main/div[2]/ol'):
                title = response.xpath('/li[3]/article/h3/a/text()').get()
                link = urljoin(
                    'https://jmedicalcasereports.biomedcentral.com/', response.xpath('/li[3]/article/ul/li[2]/a/@href').get()
                )
                yield {
                    'Title': title,
                    'file_urls': [link]
                }

I tried CSS selectors and then XPath; the problem seems to be in the loop code.

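First, in the third line of your parse method (the title line), response should be changed to book, so that the query runs relative to the current loop item instead of the whole page:

    title = book.xpath('.//a/text()').get()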
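
To illustrate why this matters, here is a minimal sketch that reproduces the pitfall without Scrapy. It uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy selectors, and the markup is a hypothetical, simplified version of the listing page (an assumption, not the real HTML). The API differs from Scrapy's .xpath(...).get(), but the distinction between querying the whole document and querying relative to the loop item is the same.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the listing page markup.
HTML = """
<ol>
  <li class="c-listing__item"><h3><a href="/articles/1">First paper</a></h3></li>
  <li class="c-listing__item"><h3><a href="/articles/2">Second paper</a></h3></li>
  <li class="c-listing__item"><h3><a href="/articles/3">Third paper</a></h3></li>
</ol>
"""

root = ET.fromstring(HTML)

# Iterate over the individual items, not over the single <ol> container.
items = root.findall(".//li[@class='c-listing__item']")

# Pitfall: querying the whole document inside the loop ignores the loop
# variable, so every iteration returns the same first match.
wrong = [root.find(".//a").text for _ in items]

# Fix: querying relative to each item returns that item's own link text.
right = [item.find(".//a").text for item in items]

print(wrong)
print(right)
```

In the Scrapy version, the same idea is expressed by calling book.xpath('.//a/text()') with a leading dot, rather than response.xpath(...).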

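Second, the XPath in the second line (the for loop) is wrong: it selects the list container rather than the individual list items, so the result is incorrect. Here is my code. I hope this helps.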

    def parse(self, response):
        for book in response.xpath('//li[@class = "c-listing__item"]'):
            title = book.xpath('.//a/text()').get()
            link = urljoin(
                'https://jmedicalcasereports.biomedcentral.com/', book.xpath('.//a/@href').get()
            )
            yield {
                'Title': title,
                'file_urls': [link]
            }

The output is:

    {'Title': 'Presentation of COVID-19 infection with bizarre behavior and encephalopathy: a case report', 'file_urls': ['https://jmedicalcasereports.biomedcentral.com/articles/10.1186/s13256-021-02851-0']}
    2022-04-17 21:54:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jmedicalcasereports.biomedcentral.com/articles?query=COVID-19&searchType=journalSearch&tab=keyword>
    {'Title': 'Dysentery as the only presentation of COVID-19 in a child: a\xa0case report', 'file_urls': ['https://jmedicalcasereports.biomedcentral.com/articles/10.1186/s13256-021-02672-1']}
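
The file_urls values in this output come from urljoin. As a small sketch of how that call behaves, the snippet below joins the base URL with two href shapes. Both shapes are assumptions for illustration; the page may only ever use one of them.

```python
from urllib.parse import urljoin

BASE = "https://jmedicalcasereports.biomedcentral.com/"

# Assumed href shapes; the real page may use only one of them.
root_relative = "/articles/10.1186/s13256-021-02851-0"
absolute = "https://jmedicalcasereports.biomedcentral.com/articles/10.1186/s13256-021-02851-0"

# A root-relative href is resolved against the base URL's host.
joined_relative = urljoin(BASE, root_relative)

# An already absolute href is returned unchanged.
joined_absolute = urljoin(BASE, absolute)

print(joined_relative)
print(joined_absolute)
```

Either way the result is the same full article URL. Scrapy responses also provide response.urljoin(href), which resolves against the URL of the response and avoids hard-coding the base.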
