Edit: I have found that this only happens on Windows machines. Everything works fine on Linux servers.
I am running a Scrapy crawler inside a Celery task, but I keep getting this error. Any idea what I am doing wrong?
[2021-08-18 11:28:42,294: INFO/MainProcess] Connected to sqla+sqlite:///celerydb.sqlite
[2021-08-18 11:28:42,313: INFO/MainProcess] celery@NP45086 ready.
[2021-08-18 09:46:58,330: INFO/MainProcess] Received task: app_celery.scraping_process_cli[e94dc192-e10e-4921-ad0c-bb932be9b568]
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Program Files\Python374\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Program Files\Python374\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
ModuleNotFoundError: No module named 'app_celery'
[2021-08-18 09:46:58,773: INFO/MainProcess] Task app_celery.scraping_process_cli[e94dc192-e10e-4921-ad0c-bb932be9b568] succeeded in 0.4380000000091968s: None
My app_celery code is as follows:
from celery import Celery
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

app = Celery('app_celery', backend=..., broker=...)

def scrape_data():
    process = CrawlerProcess(get_project_settings())
    crawler = process.create_crawler(spider_cls)
    process.crawl(spider_cls, **kwargs)
    process.start()

@app.task(name='app_celery.scraping_process_cli', time_limit=1200, max_retries=3)
def scraping_process_cli(company_id):
    import multiprocessing
    a = multiprocessing.Process(target=scrape_data())
    a.start()
    a.join()
I am running Celery as:
celery -A app_celery worker -c 4 -n worker1 --pool threads
- Change directory into your Scrapy project's root before doing step 2.
- You have to tell CrawlerProcess() to load settings.py (see the code below).
import os

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

def scrape_data():
    # get_project_settings() locates scrapy.cfg relative to the current
    # working directory, so chdir into the project root first
    os.chdir("/path/to/your/scrapy/project_root")
    process = CrawlerProcess(get_project_settings())
    process.crawl('spider_name')
    process.start()
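For completeness, here is a minimal sketch of how the fixed scrape_data() could be exercised through the Celery task from the question. The task and its company_id argument come from the question; using 123 as the id and waiting on the result are illustrative assumptions.

from app_celery import scraping_process_cli

# Enqueue the task on the worker started with:
#   celery -A app_celery worker -c 4 -n worker1 --pool threads
result = scraping_process_cli.delay(123)  # 123 is a hypothetical company_id

# Optionally block until the crawl finishes (within the task's time_limit)
result.get(timeout=1200)

delay() only enqueues the task and returns an AsyncResult; the chdir and settings loading happen inside the worker when scrape_data() actually runs, and get() is needed only if the caller wants to wait for the crawl to finish.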