I have a set of MessageConsumer workers, each with a different responsibility: HTTP calls, CRUD against Mongo/Redis, API calls, and so on.
They all share the same structure:
class MessageConsumer
  include Celluloid
  def perform(sqs_message)
    # Do something
  end
end
I have a file [worker-name].rb for each worker, like this:
Celluloid::Actor[:pool] = MessageConsumer.pool
while @still_running
  sqs_message = @queue.receive_message(start_options)
  if sqs_message
    Celluloid::Actor[:pool].async.perform(sqs_message)
  else
    # sleep for a while as there's nothing in the queue.
    sleep rand(2..6)
  end
end
@queue.receive_message receives a message from Amazon SQS, and the loop passes that message to a worker.
We run a set of [worker-name].rb processes on each server:
pgrep -fl ruby
14885 ruby bin/worker_http # two processes
15890 ruby bin/worker_http # ^^^
17956 ruby bin/worker_api
19734 ruby bin/worker_mongo
22637 ruby bin/worker_redis
Problem: after the processes have been running for a while (once the threads have been busy), I frequently get "No live threads left. Deadlock?".
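I am using ruby 2.0.0p451 (2014-02-24 revision 45167) [x86_64-linux] on the servers. I am not sure whether this is an MRI-related issue; maybe I need to switch to JRuby. Interestingly, though, I don't see this problem reported often, so I suspect it is an issue with my implementation.
Any ideas?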
It seems that passing a limit (timeout) to join resolves this:
[1] pry(main)> task = Thread.new { Thread.stop }
=> #<Thread:0x000055918c669018@(pry):1 sleep_forever>
[2] pry(main)> task.join
fatal: No live threads left. Deadlock?
2 threads, 2 sleeps current:0x000055918bc08ef0 main thread:0x000055918bc08ef0
* #<Thread:0x000055918bc3cbf8 sleep_forever>
rb_thread_t:0x000055918bc08ef0 native:0x00007efd46186080 int:0
(pry):2:in `join'
(pry):2:in `__pry__'
/home/mifrill/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/pry-0.11.3/lib/pry/pry_instance.rb:355:in `eval'
/home/mifrill/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/pry-0.11.3/lib/pry/pry_instance.rb:355:in `evaluate_ruby'
/home/mifrill/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/pry-0.11.3/lib/pry/pry_instance.rb:323:in `handle_line'
/home/mifrill/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/pry-0.11.3/lib/pry/pry_instance.rb:243:in `block (2 levels) in eval'
/home/mifrill/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/pry-0.11.3/lib/pry/pry_instance.rb:242:in `catch'
/home/mifrill/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/pry-0.11.3/lib/pry/pry_instance.rb:242:in `block in eval'
/home/mifrill/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/pry-0.11.3/lib/pry/pry_instance.rb:241:in `catch'
/home/mifrill/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/pry-0.11.3/lib/pry/pry_instance.rb:241:in `eval'
/home/mifrill/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/pry-0.11.3/lib/pry/repl.rb:77:in `block in repl'
/home/mifrill/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/pry-0.11.3/lib/pry/repl.rb:67:in `loop'
/home/mifrill/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/pry-0.11.3/lib/pry/repl.rb:67:in `repl'
/home/mifrill/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/pry-0.11.3/lib/pry/repl.rb:38:in `block in start'
/home/mifrill/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/pry-0.11.3/lib/pry/input_lock.rb:61:in `__with_ownership'
/home/mifrill/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/pry-0.11.3/lib/pry/input_lock.rb:79:in `with_ownership'
/home/mifrill/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/pry-0.11.3/lib/pry/repl.rb:38:in `start'
/home/mifrill/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/pry-0.11.3/lib/pry/repl.rb:13:in `start'
/home/mifrill/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/pry-0.11.3/lib/pry/pry_class.rb:192:in `start'
/home/mifrill/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/pry-0.11.3/lib/pry/cli.rb:116:in `start'
/home/mifrill/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/pry-0.11.3/bin/pry:12:in `<top (required)>'
/home/mifrill/.rbenv/versions/2.5.1/bin/pry:23:in `load'
/home/mifrill/.rbenv/versions/2.5.1/bin/pry:23:in `<main>'
* #<Thread:0x000055918c669018@(pry):1 sleep_forever>
rb_thread_t:0x000055918c742dd0 native:0x00007efd4377e700 int:0
depended by: tb_thread_id:0x000055918bc08ef0
(pry):1:in `stop'
(pry):1:in `block in __pry__'
from (pry):2:in `join'
[3] pry(main)> task.join 1
=> nil
In lib/celluloid/actor.rb:
def join(actor, timeout = nil)
  actor.thread.join(timeout)
  actor
end
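The behaviour from the pry session can also be reproduced in a plain script, without pry or Celluloid. The following is only a minimal sketch using core Ruby threads; the variable names and the 0.2 second limit are arbitrary choices for illustration, not something taken from the code above.

```ruby
# Minimal sketch: Thread#join with a limit gives up after the limit
# instead of waiting forever on a thread that never wakes up.

# This thread parks itself with Thread.stop and nothing ever wakes it.
stopped = Thread.new { Thread.stop }

# Wait at most 0.2 seconds (arbitrary value chosen for this sketch).
# Thread#join returns nil when the limit elapses before the thread ends.
result = stopped.join(0.2)
puts result.inspect

# Clean up the parked thread so the script can finish normally.
stopped.kill
stopped.join
```

A nil result only says that the thread was still running or sleeping when the limit ran out, so the caller still has to decide what to do with a worker that never finished.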
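This is not a (final) answer to the question, but it may help others who land here to understand their own problem.
Add Thread.new { sleep } to your source code.
This does not solve the problem!
It only helps, sometimes, to get an error message after the "bad thing" has happened.
"Deadlock?" is only a guess on Ruby's part. In many cases it is really a "dead end": there is an earlier error that points to the real cause, and you never see it because Ruby pulled the emergency brake.
As long as one useless Thread is still alive, Ruby will not exit so quickly.
And again: this is not a solution. It is only a debugging aid.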
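As a rough sketch of how that debugging aid might be wired into a worker script: the helper name start_keepalive_thread is hypothetical (it is not part of Ruby or Celluloid), and where exactly to call it is an assumption.

```ruby
# Debugging aid only, as described above: one extra thread that does
# nothing but sleep, so the process still has a thread that is not
# waiting on anything else.
# The method name is hypothetical, chosen for this sketch.
def start_keepalive_thread
  Thread.new { sleep } # sleeps until it is killed; performs no work
end

keepalive = start_keepalive_thread

# ... the pool setup and the receive loop would go here ...

puts "keepalive thread alive: #{keepalive.alive?}"
```

Remove the call once the underlying error has been found. While the keepalive thread is running, a worker that is genuinely stuck may hang silently instead of aborting the process.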