I want to change the filenames of downloaded images from the hash I currently get to the image's alt text or something similar.
from scrapy.contrib.pipeline.images import ImagesPipeline
from scrapy.exceptions import DropItem
from scrapy.http import Request


class DocosPipeline(object):
    def process_item(self, item, spider):
        return item


class DocosImagesPipeline(ImagesPipeline):
    def get_media_requests(self, item, info):
        # Schedule one download per image URL on the item.
        for image_url in item['image_urls']:
            yield Request(image_url)

    def item_completed(self, results, item, info):
        # Keep only the paths of the downloads that succeeded.
        image_paths = [x['path'] for ok, x in results if ok]
        if not image_paths:
            raise DropItem("Item contains no images")
        item['image_paths'] = image_paths
        return item
I tried overriding the image_key method, but I can't seem to get it right. This is the method:
def image_key(self, url):
    image_guid = hashlib.sha1(url).hexdigest()
    return 'full/%s.jpg' % (image_guid)
I'm really stuck here; any help would be greatly appreciated.
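To make the naming logic concrete, here is a minimal standalone sketch (plain Python, no Scrapy required) of the hash-based key quoted above next to a name built from alt text. `hashed_key` and `alt_key` are hypothetical helper names, not part of Scrapy, and how the alt text reaches the pipeline is outside this sketch. It also assumes Python 3, where `hashlib.sha1` needs bytes rather than a string.

```python
import hashlib
import posixpath
import re
from urllib.parse import urlparse


def hashed_key(url):
    """Mimic the hash-based scheme: SHA-1 of the URL, stored under full/."""
    # hashlib needs bytes on Python 3, so encode the URL first.
    image_guid = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return "full/%s.jpg" % image_guid


def alt_key(alt_text, url):
    """Hypothetical helper: name the file after the alt text, with fallbacks."""
    # Last path segment of the URL, ignoring any query string.
    basename = posixpath.basename(urlparse(url).path)
    # Reuse the URL's extension; assume .jpg when there is none.
    ext = posixpath.splitext(basename)[1] or ".jpg"
    # Collapse anything other than letters, digits, "_" and "-" into "-".
    slug = re.sub(r"[^A-Za-z0-9_-]+", "-", alt_text or "").strip("-").lower()
    if slug:
        return "full/%s%s" % (slug, ext)
    if basename:
        # No usable alt text: fall back to the name used in the URL.
        return "full/%s" % basename
    # Nothing usable at all: fall back to the hash.
    return hashed_key(url)
```

Note that two images sharing the same alt text would map to the same key and overwrite each other, so a real pipeline would probably need to add something unique to the name, such as a short hash suffix.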
I'm not sure where you put the image_key method, but the code below works fine for me:
class MyImagesPipeline(ImagesPipeline):

    # Name download version
    def image_key(self, url):
        image_guid = url.split('/')[-1]
        return 'full/%s' % (image_guid)

    def get_media_requests(self, item, info):