Scrapy spider stops working on Django after implementing WebSockets with Channels (You cannot call this from an async context)



I'm opening a new question because I'm running into an issue with Scrapy and Channels in my Django application, and I would appreciate it if someone could point me in the right direction.

I'm using Channels because I want to retrieve the scraping status from the Scrapyd API in real time, without resorting to setInterval polling, since this will be a SaaS service potentially used by many users.

I have implemented Channels correctly, and if I run:

python manage.py runserver

I can see that the system is now correctly using ASGI:

System check identified no issues (0 silenced).
September 01, 2020 - 15:12:33
Django version 3.0.7, using settings 'seotoolkit.settings'
Starting ASGI/Channels version 2.4.0 development server at http://127.0.0.1:8000/
Quit the server with CONTROL-C.

Furthermore, the client and server connect correctly over WebSocket:

WebSocket HANDSHAKING /crawler/22/ [127.0.0.1:50264]
connected {'type': 'websocket.connect'}
WebSocket CONNECT /crawler/22/ [127.0.0.1:50264]

So far so good. The problem comes when I run Scrapy through the Scrapyd API:

2020-09-01 15:31:25 [scrapy.core.scraper] ERROR: Error processing {'url': 'https://www.example.com'}
Traceback (most recent call last):
File "/Users/Andrea/anaconda3/envs/DjangoScrape/lib/python3.6/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "/Users/Andrea/anaconda3/envs/DjangoScrape/lib/python3.6/site-packages/scrapy/utils/defer.py", line 157, in f
return deferred_from_coro(coro_f(*coro_args, **coro_kwargs))
File "/private/var/folders/qz/ytk7wml54zd6rssxygt512hc0000gn/T/crawler-1597767314-spxv81dy.egg/webspider/pipelines.py", line 67, in process_item
File "/Users/Andrea/anaconda3/envs/DjangoScrape/lib/python3.6/site-packages/django/db/models/manager.py", line 82, in manager_method
return getattr(self.get_queryset(), name)(*args, **kwargs)
File "/Users/Andrea/anaconda3/envs/DjangoScrape/lib/python3.6/site-packages/django/db/models/query.py", line 411, in get
num = len(clone)
File "/Users/Andrea/anaconda3/envs/DjangoScrape/lib/python3.6/site-packages/django/db/models/query.py", line 258, in __len__
self._fetch_all()
File "/Users/Andrea/anaconda3/envs/DjangoScrape/lib/python3.6/site-packages/django/db/models/query.py", line 1261, in _fetch_all
self._result_cache = list(self._iterable_class(self))
File "/Users/Andrea/anaconda3/envs/DjangoScrape/lib/python3.6/site-packages/django/db/models/query.py", line 57, in __iter__
results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
File "/Users/Andrea/anaconda3/envs/DjangoScrape/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 1150, in execute_sql
cursor = self.connection.cursor()
File "/Users/Andrea/anaconda3/envs/DjangoScrape/lib/python3.6/site-packages/django/utils/asyncio.py", line 24, in inner
raise SynchronousOnlyOperation(message)
django.core.exceptions.SynchronousOnlyOperation: You cannot call this from an async context - use a thread or sync_to_async.

I think the error message is pretty clear: You cannot call this from an async context - use a thread or sync_to_async. I suppose that by enabling ASGI there is some conflict with the Scrapy library that prevents it from working properly.

Unfortunately, I can't understand the reason behind this, nor where I should use "a thread or sync_to_async" as suggested.

Note that the WebSocket is only used to check the crawl status, and nothing else.

Could anyone try to explain the reason behind this incompatibility and give me some hints on how to overcome it? I've spent a lot of time searching for an answer but couldn't find one.

Thank you very much.

You can solve this error by going to your pipelines.py file and importing sync_to_async from asgiref.sync:

from asgiref.sync import sync_to_async

After importing sync_to_async, you need to use it as a decorator on the function that stores data in the database.

For example:

from itemadapter import ItemAdapter
from crawler.models import Movie
from asgiref.sync import sync_to_async

class MovieSpiderPipeline:
    @sync_to_async
    def process_item(self, item, spider):
        movie = Movie(**item)
        movie.save()
        return item
