We have a Django application that needs to fetch a large amount of data using Celery. There are 20 or so Celery workers running every few minutes. We're running on Google Kubernetes Engine with a Redis queue on Cloud Memorystore.
The Redis instance we use for Celery keeps filling up, even though according to Flower the queues are empty. This eventually results in a full Redis database and Celery throwing errors.
In Flower I can see tasks coming in and going out, and I've scaled up the workers to the point where the queues are now always empty.
If I run redis-cli --bigkeys, I see:
# Scanning the entire keyspace to find biggest keys as well as
# average sizes per key type. You can use -i 0.1 to sleep 0.1 sec
# per 100 SCAN commands (not usually needed).
[00.00%] Biggest set found so far '_kombu.binding.my-queue-name-queue' with 1 members
[00.00%] Biggest list found so far 'default' with 611 items
[00.00%] Biggest list found so far 'my-other-queue-name-queue' with 44705 items
[00.00%] Biggest set found so far '_kombu.binding.celery.pidbox' with 19 members
[00.00%] Biggest list found so far 'my-queue-name-queue' with 727179 items
[00.00%] Biggest set found so far '_kombu.binding.celeryev' with 22 members
-------- summary -------
Sampled 12 keys in the keyspace!
Total key length in bytes is 271 (avg len 22.58)
Biggest list found 'my-queue-name-queue' has 727179 items
Biggest set found '_kombu.binding.celeryev' has 22 members
4 lists with 816144 items (33.33% of keys, avg size 204036.00)
0 hashs with 0 fields (00.00% of keys, avg size 0.00)
0 strings with 0 bytes (00.00% of keys, avg size 0.00)
0 streams with 0 entries (00.00% of keys, avg size 0.00)
8 sets with 47 members (66.67% of keys, avg size 5.88)
0 zsets with 0 members (00.00% of keys, avg size 0.00)
If I inspect the queue with LRANGE, I see lots of objects like this:
"{"body": "W1syNDQ0NF0sIHsicmVmZXJlbmNlX3RpbWUiOiBudWxsLCAibGF0ZXN0X3RpbWUiOiBudWxsLCAicm9sbGluZyI6IGZhbHNlLCAidGltZWZyYW1lIjogIjFkIiwgIl9udW1fcmV0cmllcyI6IDF9LCB7ImNhbGxiYWNrcyI6IG51bGwsICJlcnJiYWNrcyI6IG51bGwsICJjaGFpbiI6IG51bGwsICJjaG9yZCI6IG51bGx9XQ==", "content-encoding": "utf-8", "content-type": "application/json", "headers": {"lang": "py", "task": "MyDataCollectorClass", "id": "646910fc-f9db-48c3-b5a9-13febbc00bde", "shadow": null, "eta": "2019-08-20T02:31:05.113875+00:00", "expires": null, "group": null, "retries": 0, "timelimit": [null, null], "root_id": "beeff557-66be-451d-9c0c-dc622ca94493", "parent_id": "374d8e3e-92b5-423e-be58-e043999a1722", "argsrepr": "(24444,)", "kwargsrepr": "{'reference_time': None, 'latest_time': None, 'rolling': False, 'timeframe': '1d', '_num_retries': 1}", "origin": "gen1@celery-my-queue-name-worker-6595bd8fd8-8vgzq"}, "properties": {"correlation_id": "646910fc-f9db-48c3-b5a9-13febbc00bde", "reply_to": "e55a31ed-cbba-3d79-9ffc-c19a29e77aac", "delivery_mode": 2, "delivery_info": {"exchange": "", "routing_key": "my-queue-name-queue"}, "priority": 0, "body_encoding": "base64", "delivery_tag": "a83074a5-8787-49e3-bb7d-a0e69ba7f599"}}"
We use django-celery-result to store results, so these shouldn't be going in there, and we're using a separate Redis instance for Django's cache.
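For context, the results side is wired up more or less like this in settings.py (a simplified sketch with placeholder names, not our exact config):

INSTALLED_APPS = [
    # ...
    "django_celery_results",  # stores task results in the Django database
]

CELERY_RESULT_BACKEND = "django-db"                      # django-celery-results backend
CELERY_BROKER_URL = "redis://<memorystore-host>:6379/0"  # placeholder broker address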
If I clear Redis with FLUSHALL, it slowly fills up again.
I'm a bit stuck on where to go next. I don't know Redis very well - maybe there's something I can do to inspect the data and see what's filling it up? Maybe Flower isn't reporting correctly? Maybe Celery keeps completed tasks around for a while, even though we use the Django DB for results?
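Would something like this be a sensible way to poke at the data? (a rough redis-py sketch; the connection details are placeholders, and MEMORY USAGE needs Redis 4+):

import redis

r = redis.Redis(host="localhost", port=6379, db=0)  # placeholder connection details

# Walk the whole keyspace and report each key's type, element count and approximate size.
for key in r.scan_iter(count=100):
    key_type = r.type(key).decode()
    if key_type == "list":
        size = r.llen(key)
    elif key_type == "set":
        size = r.scard(key)
    else:
        size = "n/a"
    print(key.decode(), key_type, size, r.memory_usage(key), "bytes")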
Thanks loads for any help.
It sounds like Redis isn't set up to delete completed items, or to report and delete failed items - in other words, it may be putting tasks on the list but never taking them off.
Have a look at these PyPI packages: rq, django-rq, django-rq-scheduler
You can read a bit about how it's supposed to work here: https://python-rq.org/docs/
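A minimal rq example looks something like this (a sketch that assumes a local Redis and a hypothetical count_words task; finished jobs carry a result_ttl, so they get cleaned out of Redis automatically):

from redis import Redis
from rq import Queue

from my_app.tasks import count_words  # hypothetical task function

q = Queue(connection=Redis())  # default queue on the local Redis
job = q.enqueue(count_words, "some text to count", result_ttl=500)  # keep the result for 500s
print(job.id, job.get_status())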
This seems to be a known (or deliberate) issue with Celery, with various solutions/workarounds proposed: https://github.com/celery/celery/issues/436
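If you stick with Celery, the usual housekeeping knobs are along these lines (a general sketch, not necessarily the exact workaround from that issue - check the thread for what applies to your setup):

from celery import Celery

app = Celery("my_project")  # hypothetical app name

app.conf.update(
    result_expires=3600,            # expire stored results after an hour
    event_queue_ttl=5,              # let celeryev event messages expire quickly (seconds)
    event_queue_expires=60,         # drop idle celeryev queues (seconds)
    worker_send_task_events=False,  # only turn events on while Flower actually needs them
)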