Locust workers immediately go "missing" when the job starts



I'm running Locust (locust==2.8.6) on Python 3.10. It runs on Kubernetes via AWS EKS, in distributed mode, and I'm trying to set up 1 master and 5 workers.

Master pod start command:

command: ["locust"]
args: ["-f","$filename","--headless","--users=$clients","--spawn-rate=$hatch-rate","--run-time=$run-time","--only-summary","--master","--expect-workers=$num_slaves"]

And the worker start command:

command: ["locust"]
args: ["-f","$filename","--worker","--master-host=locust-master$task_id"]

In fact, from a worker pod I can run telnet locust-master1 5557 and confirm connectivity. (In this case, $task_id=1.)

I see the following logs in the master pod:

[2022-04-27 22:53:16,969] locust-master1--1-z2lr8/INFO/root: Waiting for workers to be ready, 0 of 5 connected
[2022-04-27 22:53:17,109] locust-master1--1-z2lr8/INFO/locust.runners: Client 'locust-slave1-tt7n5_fec1320a406b42319f3088bd9a7c181c' reported as ready. Currently 1 clients ready to swarm.
[2022-04-27 22:53:17,147] locust-master1--1-z2lr8/INFO/locust.runners: Client 'locust-slave1-qv7kt_011dbeb9f15d452f935c5643fb463632' reported as ready. Currently 2 clients ready to swarm.
[2022-04-27 22:53:17,261] locust-master1--1-z2lr8/INFO/locust.runners: Client 'locust-slave1-ks5wb_356fcf54ac2644e4badc684e3846520c' reported as ready. Currently 3 clients ready to swarm.
[2022-04-27 22:53:17,354] locust-master1--1-z2lr8/INFO/locust.runners: Client 'locust-slave1-cbkbd_2c90cedde5224e1e9cf47bbb543b9097' reported as ready. Currently 4 clients ready to swarm.
[2022-04-27 22:53:17,364] locust-master1--1-z2lr8/INFO/locust.runners: Client 'locust-slave1-xfvsz_196bba3928c5491e896acd411798d48d' reported as ready. Currently 5 clients ready to swarm.
[2022-04-27 22:53:17,970] locust-master1--1-z2lr8/INFO/locust.main: Run time limit set to 5400 seconds
[2022-04-27 22:53:17,971] locust-master1--1-z2lr8/INFO/locust.main: Starting Locust 2.8.6
[2022-04-27 22:53:17,971] locust-master1--1-z2lr8/INFO/locust.runners: Sending spawn jobs of 50 users at 0.50 spawn rate to 5 ready clients
[2022-04-27 22:53:17,977] locust-master1--1-z2lr8/INFO/locust_submit_judgments: Locust Startup: job_id: 1434194
[2022-04-27 22:53:18,376] locust-master1--1-z2lr8/INFO/locust.runners: Worker locust-slave1-cbkbd_2c90cedde5224e1e9cf47bbb543b9097 failed to send heartbeat, setting state to missing.
[2022-04-27 22:53:20,384] locust-master1--1-z2lr8/INFO/locust.runners: Worker locust-slave1-qv7kt_011dbeb9f15d452f935c5643fb463632 failed to send heartbeat, setting state to missing.
[2022-04-27 22:53:20,385] locust-master1--1-z2lr8/INFO/locust.runners: Worker locust-slave1-ks5wb_356fcf54ac2644e4badc684e3846520c failed to send heartbeat, setting state to missing.
[2022-04-27 22:53:22,391] locust-master1--1-z2lr8/INFO/locust.runners: Worker locust-slave1-tt7n5_fec1320a406b42319f3088bd9a7c181c failed to send heartbeat, setting state to missing.
[2022-04-27 22:53:22,391] locust-master1--1-z2lr8/INFO/locust.runners: Worker locust-slave1-xfvsz_196bba3928c5491e896acd411798d48d failed to send heartbeat, setting state to missing.
[2022-04-27 22:53:22,392] locust-master1--1-z2lr8/INFO/locust.runners: The last worker went missing, stopping test.
[2022-04-27 22:53:22,392] locust-master1--1-z2lr8/INFO/locust_submit_judgments: Locust Teardown: sending query messages to Results DB

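So I do see the workers register themselves, but as soon as the test starts, the master pod says the workers failed to send a heartbeat and sets them to missing. If I run the master pod without --headless, I can open the web UI and start the job manually. I see the same problem: when I start the job manually, the same heartbeat messages appear.

On the worker pods, I see my debug startup logs and nothing that hints at a problem.

I couldn't find a guide online on how to set up distributed Locust (other than from when it was called locustio and was at version 0.x), and things have changed a lot since then.

What needs to be set up here? I'm not sure what code to include without pasting many lines of setup code. I'm trying to test against Postgres, so I'm considering following https://docs.locust.io/en/stable/testing-other-systems.html, but in all the examples there they wrap attributes, which is a departure from the code I inherited.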

Have you checked CPU utilization? We had a similar situation: when the VM's CPU usage was at 100%, the workers had no chance to send a heartbeat at all.
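A rough way to check for CPU saturation from inside a worker is sketched below. It uses only the standard library, the helper names load_per_core and looks_saturated are hypothetical, and the threshold of 1.0 is an assumption rather than a Locust setting. Note that inside a container these figures describe the whole node, not the pod's CPU limit, so treat the result as a hint only.

```python
import os


def load_per_core():
    """Return the 1-minute load average divided by the number of visible cores.

    os.getloadavg() is only available on Unix-like systems.
    """
    one_minute, _five, _fifteen = os.getloadavg()
    cores = os.cpu_count() or 1  # cpu_count() may return None
    return one_minute / cores


def looks_saturated(threshold=1.0):
    """Return True if the load per core is at or above the (assumed) threshold."""
    return load_per_core() >= threshold


if __name__ == "__main__":
    print(f"load per core: {load_per_core():.2f}, saturated: {looks_saturated()}")
```

If this reports sustained saturation while the test ramps up, giving the workers more CPU or running fewer users per worker is worth trying before looking elsewhere.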

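Depending on how the Postgres test is implemented, you may need to make sure you are using gevent correctly. See the note in the documentation:

It is important that the protocol libraries you use can be monkey-patched by gevent.

In my case, I used a custom test class for Snowflake and ran into the same problem because the requests were blocking. Adding monkey patching fixed it.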
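The effect described above can be reproduced without Locust. The following is a minimal sketch, assuming gevent is installed. The heartbeat greenlet here is only a stand-in for whatever the worker runs in the background, not Locust's actual implementation, and the names heartbeat and beats_during are hypothetical. It counts how many heartbeats get through while a call is in progress, once with a sleep that gevent has not patched (standing in for a blocking client library) and once with the patched time.sleep.

```python
from gevent import monkey

monkey.patch_all()  # patch the standard library before other imports use it

import time

import gevent

# The original, unpatched sleep stands in for a client library call
# that blocks the whole process instead of yielding to gevent.
unpatched_sleep = monkey.get_original("time", "sleep")


def heartbeat(beats, interval=0.05):
    """Record a timestamp every `interval` seconds, yielding in between."""
    while True:
        beats.append(time.monotonic())
        gevent.sleep(interval)


def beats_during(call, duration=0.3):
    """Return how many heartbeats were recorded while `call(duration)` ran."""
    beats = []
    hb = gevent.spawn(heartbeat, beats)
    gevent.sleep(0)  # let the heartbeat greenlet record its first beat
    before = len(beats)
    call(duration)
    during = len(beats) - before  # measure before giving control back
    hb.kill()
    return during


if __name__ == "__main__":
    print("blocking call:   ", beats_during(unpatched_sleep))
    print("cooperative call:", beats_during(time.sleep))
```

The blocking call lets no heartbeats through, while the patched call lets several through. One caveat: monkey patching only affects libraries that go through Python's standard socket and threading modules. Drivers implemented as C extensions (psycopg2 is one example) are not made cooperative by monkey.patch_all() alone and need their own gevent integration.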
