Unable to use proxies in a Scrapy project



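I have been trying to crawl a website that appears to have identified and blocked my IP; it now responds with 429 Too Many Requests.

I installed scrapy-proxies from https://github.com/aivarsk/scrapy-proxies and followed the instructions given there. I got a list of proxies from http://www.gatherproxy.com/. Here is what my settings.py and proxylist.txt look like: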

settings.py

BOT_NAME = 'project'
SPIDER_MODULES = ['project.spiders']
NEWSPIDER_MODULE = 'project.spiders'
# Retry many times since proxies often fail
RETRY_TIMES = 10
# Retry on most error codes since proxies fail for different reasons
RETRY_HTTP_CODES = [429, 500, 503, 504, 400, 403, 404, 408]
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': 90,
    'scrapy_proxies.RandomProxy': 100,
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
}
PROXY_LIST = "filepathproxylist.txt"
USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36'
CONCURRENT_REQUESTS = 1
DOWNLOAD_DELAY = 2
PROXY_MODE = 0
DOWNLOAD_HANDLERS = {'s3': None}
EXTENSIONS = {
    'scrapy.telnet.TelnetConsole': None
}

proxylist.txt

http://195.208.172.20:8080
http://154.119.56.179:9999
http://124.12.50.43:8088
http://61.7.168.232:52136
http://122.193.188.236:8118

However, when I run my crawler, I get the following message:

[scrapy.proxies] DEBUG: Proxy user pass not found

I tried googling this specific message but could not find any solution.

Any help would be greatly appreciated. Thanks in advance.

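I suggest you write your own middleware that sets the IP:PORT explicitly, like the one below, and put this proxies.py middleware file in your project's middleware folder: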

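class ProxiesMiddleware(object):
    def __init__(self, settings):
        # Settings are accepted but not used in this minimal version
        pass

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this to build the middleware instance
        return cls(crawler.settings)

    def process_request(self, request, spider):
        # Replace IP:PORT with the address of the proxy you want to use
        request.meta['proxy'] = "http://IP:PORT"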

Then add the ProxiesMiddleware line to your settings.py:

DOWNLOADER_MIDDLEWARES = {
    'yourproject.middleware.proxies.ProxiesMiddleware': 400,
}
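
If you want to rotate through the proxies in your proxylist.txt instead of hard-coding a single one, the same idea can be extended roughly as follows. This is only a sketch, not what scrapy-proxies does internally: the class name RandomProxiesMiddleware and the helper load_proxies are made up for illustration, and it assumes the PROXY_LIST setting holds the path to a file with one proxy URL per line.

```python
import random


def load_proxies(lines):
    """Return the non-empty, non-comment lines as a list of proxy URLs."""
    stripped = (line.strip() for line in lines)
    return [s for s in stripped if s and not s.startswith("#")]


class RandomProxiesMiddleware:
    """Hypothetical downloader middleware: picks a random proxy per request."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("proxy list is empty")
        self.proxies = list(proxies)

    @classmethod
    def from_crawler(cls, crawler):
        # Assumption: PROXY_LIST in settings.py is the path to proxylist.txt
        path = crawler.settings.get("PROXY_LIST")
        with open(path) as f:
            return cls(load_proxies(f))

    def process_request(self, request, spider):
        # Scrapy's built-in HttpProxyMiddleware honors request.meta['proxy']
        request.meta["proxy"] = random.choice(self.proxies)
        # Returning None (implicitly) lets Scrapy keep processing the request
```

It would be registered in DOWNLOADER_MIDDLEWARES the same way as ProxiesMiddleware above, with the module path adjusted to wherever you save it. Free proxies fail often, so keeping RETRY_TIMES high, as in your settings, still makes sense with this approach.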
