Scrapy: File (unknown-error): Error downloading image from <GET http://www.xxx.jpg> referred in <None>: 'splash'



I want to use Splash to download images with Scrapy. When I run my code, I get the following error:

2019-04-09 11:09:32 [scrapy.pipelines.files] WARNING: File (unknown-error): Error downloading image from <GET https://www.xxxxx.jpg> referred in <None>: 'splash'

I tried using SplashRequest, but it failed. What should I do? Here is my code:

    def get_media_requests(self, item, info):
        try:
            for image_url in item['image']:
                yield SplashRequest(image_url,endpoint='render.html' )
        except:
            pass

According to the documentation, SplashRequest takes two arguments: url and self.parse_result. Everything else is optional:

    yield SplashRequest(url, self.parse_result,
        args={
            # optional; parameters passed to Splash HTTP API
            'wait': 0.5,
            # 'url' is prefilled from request url
            # 'http_method' is set to 'POST' for POST requests
            # 'body' is set to request body for POST requests
        },
        endpoint='render.json', # optional; default is render.html
        splash_url='<url>',     # optional; overrides SPLASH_URL
        slot_policy=scrapy_splash.SlotPolicy.PER_DOMAIN,  # optional
    )

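In your code, you did not pass the self.parse_result argument. You need to pass your parsing method as the callback. For example, if your parsing method is called parse, use: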

    yield SplashRequest(image_url, self.parse, endpoint='render.html')
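Separately from the callback, the bare except: pass in the question's get_media_requests would hide any error raised while iterating over the URLs. A minimal sketch of a helper that makes skipped values visible is shown below. The name iter_image_urls is hypothetical, and the sketch assumes that the item behaves like a dict and that item['image'] holds either a single URL string or a list of URLs.

```python
import logging
from typing import Any, Iterator, Mapping

logger = logging.getLogger(__name__)


def iter_image_urls(item: Mapping[str, Any], field: str = "image") -> Iterator[str]:
    """Yield the usable image URLs stored under `field`, logging anything skipped.

    Hypothetical helper: the field name and the accepted URL schemes are
    assumptions, not something the original question specifies.
    """
    value = item.get(field)
    if value is None:
        logger.warning("item has no %r field", field)
        return
    # Accept a single URL string as well as a list of URLs.
    urls = [value] if isinstance(value, str) else value
    for url in urls:
        if isinstance(url, str) and url.startswith(("http://", "https://")):
            yield url
        else:
            # Log instead of silently swallowing the problem.
            logger.warning("skipping invalid image URL: %r", url)
```

Inside get_media_requests, the loop could then read for image_url in iter_image_urls(item): followed by the yield, with no try/except needed around it.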
