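I'm extracting some information from a product page and want to get the image link from the img tag, but the tag has a srcset attribute containing multiple links, and I don't know how to get that data with Scrapy.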
HTML:
<img width="768" height="1152" alt="Top com brilho - Preto - SENHORA | H&M PT" class="Top com brilho - Preto - SENHORA | H&M PT" srcset="//lp2.hm.com/hmgoepprod?set=quality[79],source[/e4/e9/e4e96ab4841af66083ba521c17c1c18a8e300426.jpg],origin[dam],category[ladies_tops_vests],type[DESCRIPTIVESTILLLIFE],res[y],hmver[1]&call=url[file:/product/main] 396w,
//lp2.hm.com/hmgoepprod?set=quality[79],source[/e4/e9/e4e96ab4841af66083ba521c17c1c18a8e300426.jpg],origin[dam],category[ladies_tops_vests],type[DESCRIPTIVESTILLLIFE],res[w],hmver[1]&call=url[file:/product/main] 564w,
//lp2.hm.com/hmgoepprod?set=quality[79],source[/e4/e9/e4e96ab4841af66083ba521c17c1c18a8e300426.jpg],origin[dam],category[ladies_tops_vests],type[DESCRIPTIVESTILLLIFE],res[s],hmver[1]&call=url[file:/product/main] 657w,
//lp2.hm.com/hmgoepprod?set=quality[79],source[/e4/e9/e4e96ab4841af66083ba521c17c1c18a8e300426.jpg],origin[dam],category[ladies_tops_vests],type[DESCRIPTIVESTILLLIFE],res[m],hmver[1]&call=url[file:/product/main] 820w" sizes="(max-width: 767px) 100vw, 50vw" src="//lp2.hm.com/hmgoepprod?set=quality[79],source[/e4/e9/e4e96ab4841af66083ba521c17c1c18a8e300426.jpg],origin[dam],category[ladies_tops_vests],type[DESCRIPTIVESTILLLIFE],res[m],hmver[1]&call=url[file:/product/main]">
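Is there a way to get all of the links, or to build a list of them?
Check whether the site loads its data through JSON or JavaScript, since that affects how Scrapy can handle it. Open Inspect Element in the browser and check whether the following XPath selects all of the image links: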
//div[@class = 'product-detail-main-image-container']/img/@src
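I managed to get it working with the following code:
# Read the raw srcset attribute: comma-separated "URL width" candidates.
srcset = response.xpath('/html/body/main/div[2]/div[2]/div[1]/figure[1]/div/img/@srcset').get()
# The URLs contain no whitespace, so the first whitespace-separated token is the first URL.
# [2:] drops the leading "//" of the protocol-relative URL.
self.img = srcset.split()[0][2:]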
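To get every link rather than only the first, the srcset string can be split into its candidates. The following is a minimal sketch, not a Scrapy feature: `parse_srcset` is a hypothetical helper, the base URL is a placeholder, and it assumes each candidate carries a width or density descriptor (as in the markup above) and that the URLs contain no whitespace. Splitting on commas alone would not work here, because the H&M URLs themselves contain commas.

```python
import re
from urllib.parse import urljoin

# One candidate = URL (no whitespace) + descriptor such as "396w" or "2x".
# Assumption: every candidate has a descriptor, as in the markup above.
CANDIDATE = re.compile(r"(\S+)\s+(\d+(?:\.\d+)?[wx])\s*,?")


def parse_srcset(srcset, base_url):
    """Return [(absolute_url, descriptor), ...] for a srcset string."""
    return [
        (urljoin(base_url, url), descriptor)
        for url, descriptor in CANDIDATE.findall(srcset or "")
    ]


# Shortened sample in the same shape as the H&M attribute.
sample = (
    "//lp2.hm.com/hmgoepprod?set=quality[79],res[y] 396w,\n"
    "//lp2.hm.com/hmgoepprod?set=quality[79],res[m] 820w"
)
# base_url is a placeholder; in a spider it would be response.url.
for url, width in parse_srcset(sample, "https://example.com/product"):
    print(width, url)
```

In a spider, the string passed in would be the result of the `@srcset` XPath with `.get()`, and `response.url` would serve as the base so that the protocol-relative `//lp2.hm.com/...` links become absolute.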