Scrapy error when trying to connect to MongoDB: TypeError: __init__() missing 2 required positional arguments: 'mongo_uri' and 'mongo_db'



This is using the Scrapy CLI. I'm currently running Scrapy on an Ubuntu 18 server with MongoDB and pymongo installed, but whenever I try to run my spider with scrapy crawl event_spider, I get the following runtime error: TypeError: __init__() missing 2 required positional arguments: 'mongo_uri' and 'mongo_db'

In my settings.py file I have defined:

ITEM_PIPELINES = {
    'astro_events.pipelines.MongoDBPipeline': 300,
}

MONGODB_SERVER = "localhost"
MONGODB_PORT = 27017
MONGODB_DB = "astro_events"
MONGODB_COLLECTION = "events"
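
If there is any doubt that these values are being picked up, they can be inspected with the same settings object Scrapy hands to its components. A quick sketch (run from the project root so Scrapy can find scrapy.cfg):

from scrapy.utils.project import get_project_settings

# Resolve the project settings exactly as Scrapy would.
settings = get_project_settings()
print(settings.get('MONGODB_SERVER'))   # expected: localhost
print(settings.getint('MONGODB_PORT'))  # expected: 27017
print(settings.get('MONGODB_DB'))       # expected: astro_events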

In my pipelines.py I have:

import pymongo
from scrapy.exceptions import DropItem
import logging
from itemadapter import ItemAdapter

class MongoDBPipeline(object):
    collection_name = 'events'

    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db

        @classmethod
        def from_crawler(cls, crawler):
            ## pull in information from settings.py
            return cls(
                mongo_uri=crawler.settings.get('MONGODB_SERVER'),
                mongo_db=crawler.settings.get('MONGODB_DB')
            )

    def open_spider(self, spider):
        ## initializing spider
        ## opening db connection
        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        ## clean up when spider is closed
        self.client.close()

    def process_item(self, item, spider):
        ## how to handle each post
        self.db[self.collection_name].insert(dict(item))
        logging.debug("Post added to MongoDB!")
        return item

What I'm wondering is whether I'm getting this error because MongoDB isn't set up correctly on my server, or because there's a mistake in my settings or pipeline files. I've verified that MongoDB is running on 127.0.0.1 (localhost) on port 27017. I'm just a bit at a loss. Let me know if I need to post more information. I tested my spider before implementing the database part and it ran fine, so I think I can rule the spider itself out.
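
For anyone who wants to rule the database out the same way, connectivity can be checked independently of Scrapy with a few lines of pymongo (a minimal sketch; assumes pymongo 3.x and a default local mongod):

import pymongo

# Standalone connectivity check: raises ServerSelectionTimeoutError
# quickly if mongod is not reachable on localhost:27017.
client = pymongo.MongoClient("localhost", 27017, serverSelectionTimeoutMS=2000)
client.admin.command("ping")            # cheap round-trip to the server
print(client.server_info()["version"])  # prints the server version string
client.close()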

Edit: here is the exact error it throws:

Unhandled error in Deferred:
2020-11-09 23:34:19 [twisted] CRITICAL: Unhandled error in Deferred:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/scrapy/crawler.py", line 192, in crawl
    return self._crawl(crawler, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/scrapy/crawler.py", line 196, in _crawl
    d = crawler.crawl(*args, **kwargs)
  File "/home/jcmq6b/.local/lib/python3.6/site-packages/twisted/internet/defer.py", line 1613, in unwindGenerator
    return _cancellableInlineCallbacks(gen)
  File "/home/jcmq6b/.local/lib/python3.6/site-packages/twisted/internet/defer.py", line 1529, in _cancellableInlineCallbacks
    _inlineCallbacks(None, g, status)
--- <exception caught here> ---
  File "/home/jcmq6b/.local/lib/python3.6/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/usr/local/lib/python3.6/dist-packages/scrapy/crawler.py", line 87, in crawl
    self.engine = self._create_engine()
  File "/usr/local/lib/python3.6/dist-packages/scrapy/crawler.py", line 101, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "/usr/local/lib/python3.6/dist-packages/scrapy/core/engine.py", line 70, in __init__
    self.scraper = Scraper(crawler)
  File "/usr/local/lib/python3.6/dist-packages/scrapy/core/scraper.py", line 71, in __init__
    self.itemproc = itemproc_cls.from_crawler(crawler)
  File "/usr/local/lib/python3.6/dist-packages/scrapy/middleware.py", line 53, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/usr/local/lib/python3.6/dist-packages/scrapy/middleware.py", line 35, in from_settings
    mw = create_instance(mwcls, settings, crawler)
  File "/usr/local/lib/python3.6/dist-packages/scrapy/utils/misc.py", line 173, in create_instance
    instance = objcls(*args, **kwargs)
builtins.TypeError: __init__() missing 2 required positional arguments: 'mongo_uri' and 'mongo_db'

2020-11-09 23:34:19 [twisted] CRITICAL:
Traceback (most recent call last):
  File "/home/jcmq6b/.local/lib/python3.6/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/usr/local/lib/python3.6/dist-packages/scrapy/crawler.py", line 87, in crawl
    self.engine = self._create_engine()
  File "/usr/local/lib/python3.6/dist-packages/scrapy/crawler.py", line 101, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "/usr/local/lib/python3.6/dist-packages/scrapy/core/engine.py", line 70, in __init__
    self.scraper = Scraper(crawler)
  File "/usr/local/lib/python3.6/dist-packages/scrapy/core/scraper.py", line 71, in __init__
    self.itemproc = itemproc_cls.from_crawler(crawler)
  File "/usr/local/lib/python3.6/dist-packages/scrapy/middleware.py", line 53, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/usr/local/lib/python3.6/dist-packages/scrapy/middleware.py", line 35, in from_settings
    mw = create_instance(mwcls, settings, crawler)
  File "/usr/local/lib/python3.6/dist-packages/scrapy/utils/misc.py", line 173, in create_instance
    instance = objcls(*args, **kwargs)
TypeError: __init__() missing 2 required positional arguments: 'mongo_uri' and 'mongo_db'
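
The last two frames are the telling ones: Scrapy's create_instance helper only calls a component's from_crawler classmethod if that attribute actually exists on the class; when it doesn't, it falls back to instantiating the class directly with no extra arguments, which is exactly the bare objcls(*args, **kwargs) call that raises the TypeError. Roughly (a simplified paraphrase for illustration, not the exact Scrapy source):

def create_instance(objcls, settings, crawler, *args, **kwargs):
    # Preferred: let the component pull what it needs from the crawler.
    if crawler and hasattr(objcls, 'from_crawler'):
        return objcls.from_crawler(crawler, *args, **kwargs)
    # Next best: build it from the settings object alone.
    if hasattr(objcls, 'from_settings'):
        return objcls.from_settings(settings, *args, **kwargs)
    # Fallback: a bare call, so __init__ receives no mongo_uri/mongo_db.
    return objcls(*args, **kwargs)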

Just for anyone else who stumbles over the same problem: remember that indentation matters when it comes to Python:

import pymongo
from scrapy.exceptions import DropItem
import logging
from itemadapter import ItemAdapter

class MongoDBPipeline(object):
    collection_name = 'events'

    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db

    '''
    The problem was that the following decorator and method were declared
    inside the __init__ method (due to the wrong indentation in the original
    code). That made from_crawler a closure local to __init__ instead of a
    method on the class, so the decorator never attached it to the class.
    Hence the error.
    '''
    @classmethod
    def from_crawler(cls, crawler):
        return cls(
            mongo_uri=crawler.settings.get('MONGODB_SERVER'),
            mongo_db=crawler.settings.get('MONGODB_DB')
        )

    def open_spider(self, spider):
        ## initializing spider
        ## opening db connection
        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        ## clean up when spider is closed
        self.client.close()

    def process_item(self, item, spider):
        ## how to handle each post
        self.db[self.collection_name].insert(dict(item))
        logging.debug("Post added to MongoDB!")
        return item
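
To see why the indentation is fatal, here is a stripped-down illustration (hypothetical class names, nothing Scrapy-specific): the mis-indented version binds from_crawler to a local name inside __init__, so it never becomes an attribute of the class and hasattr(cls, 'from_crawler') is False.

class Broken:
    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db

        @classmethod
        def from_crawler(cls, crawler):  # local to __init__; never attached to the class
            return cls("unused", "unused")

class Fixed:
    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):      # a real classmethod on the class
        return cls("localhost", "astro_events")

print(hasattr(Broken, "from_crawler"))  # False -> Scrapy falls back to Broken() with no args
print(hasattr(Fixed, "from_crawler"))   # True  -> Scrapy calls Fixed.from_crawler(crawler)

One unrelated note while touching process_item: pymongo 3.x deprecates Collection.insert() in favor of insert_one() (and it is removed in pymongo 4), so self.db[self.collection_name].insert_one(dict(item)) is the more future-proof call.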
