No live threads left. Deadlock?



I have a set of MessageConsumer workers, each with a different responsibility: HTTP calls, Mongo/Redis CRUD, API calls, and so on.

They all share the same structure:

class MessageConsumer
  include Celluloid
  def perform(sqs_message)
    # Do something
  end
end

I have one [worker-name].rb file per worker, like this:

Celluloid::Actor[:pool] = MessageConsumer.pool
while @still_running
  sqs_message = @queue.receive_message(start_options)
  if sqs_message
    Celluloid::Actor[:pool].async.perform(sqs_message)
  else
    # sleep for a while as there's nothing in the queue.
    sleep rand(2..6)
  end
end

@queue.receive_message receives a message from Amazon SQS, and the message is then passed to a worker.

We run a set of these [worker-name].rb scripts on each server:

pgrep -fl ruby
14885 ruby bin/worker_http # two processes
15890 ruby bin/worker_http # ^^^
17956 ruby bin/worker_api
19734 ruby bin/worker_mongo
22637 ruby bin/worker_redis

Problem: after the processes have been running for a while (once the threads have been busy), I frequently get "No live threads left. Deadlock?"

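On the servers I am using ruby 2.0.0p451 (2014-02-24 revision 45167) [x86_64-linux]. I am not sure whether this is an MRI-related problem; maybe I need to switch to JRuby. Interestingly, though, I don't see this issue reported very often, so I suspect it may be a problem with my implementation.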

Any ideas?

It seems that passing a limit (a timeout) to join avoids this error:

     [1] pry(main)> task = Thread.new { Thread.stop }
     => #<Thread:0x000055918c669018@(pry):1 sleep_forever>
     [2] pry(main)> task.join
     fatal: No live threads left. Deadlock?
     2 threads, 2 sleeps current:0x000055918bc08ef0 main thread:0x000055918bc08ef0
     * #<Thread:0x000055918bc3cbf8 sleep_forever>
        rb_thread_t:0x000055918bc08ef0 native:0x00007efd46186080 int:0
        (pry):2:in `join'
        (pry):2:in `__pry__'
        /home/mifrill/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/pry-0.11.3/lib/pry/pry_instance.rb:355:in `eval'
        /home/mifrill/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/pry-0.11.3/lib/pry/pry_instance.rb:355:in `evaluate_ruby'
        /home/mifrill/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/pry-0.11.3/lib/pry/pry_instance.rb:323:in `handle_line'
        /home/mifrill/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/pry-0.11.3/lib/pry/pry_instance.rb:243:in `block (2 levels) in eval'
        /home/mifrill/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/pry-0.11.3/lib/pry/pry_instance.rb:242:in `catch'
        /home/mifrill/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/pry-0.11.3/lib/pry/pry_instance.rb:242:in `block in eval'
        /home/mifrill/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/pry-0.11.3/lib/pry/pry_instance.rb:241:in `catch'
        /home/mifrill/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/pry-0.11.3/lib/pry/pry_instance.rb:241:in `eval'
        /home/mifrill/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/pry-0.11.3/lib/pry/repl.rb:77:in `block in repl'
        /home/mifrill/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/pry-0.11.3/lib/pry/repl.rb:67:in `loop'
        /home/mifrill/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/pry-0.11.3/lib/pry/repl.rb:67:in `repl'
        /home/mifrill/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/pry-0.11.3/lib/pry/repl.rb:38:in `block in start'
        /home/mifrill/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/pry-0.11.3/lib/pry/input_lock.rb:61:in `__with_ownership'
        /home/mifrill/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/pry-0.11.3/lib/pry/input_lock.rb:79:in `with_ownership'
        /home/mifrill/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/pry-0.11.3/lib/pry/repl.rb:38:in `start'
        /home/mifrill/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/pry-0.11.3/lib/pry/repl.rb:13:in `start'
        /home/mifrill/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/pry-0.11.3/lib/pry/pry_class.rb:192:in `start'
        /home/mifrill/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/pry-0.11.3/lib/pry/cli.rb:116:in `start'
        /home/mifrill/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/pry-0.11.3/bin/pry:12:in `<top (required)>'
        /home/mifrill/.rbenv/versions/2.5.1/bin/pry:23:in `load'
        /home/mifrill/.rbenv/versions/2.5.1/bin/pry:23:in `<main>'
     * #<Thread:0x000055918c669018@(pry):1 sleep_forever>
        rb_thread_t:0x000055918c742dd0 native:0x00007efd4377e700 int:0
         depended by: tb_thread_id:0x000055918bc08ef0
        (pry):1:in `stop'
        (pry):1:in `block in __pry__'
     from (pry):2:in `join'
     [3] pry(main)> task.join 1
     => nil

In lib/celluloid/actor.rb:

  def join(actor, timeout = nil)
    actor.thread.join(timeout)
    actor
  end
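
The pry session above can be condensed into a standalone script. This is only a sketch of the plain-Ruby behaviour, not of Celluloid itself: the variable names are made up for illustration, and the one-second limit is an arbitrary choice.

```ruby
# A thread that stops itself and is never woken up again.
stuck = Thread.new { Thread.stop }

# Thread#join with a limit returns nil when the limit expires before the
# thread finishes, and returns the thread itself otherwise. Calling join
# without a limit here is what produced the fatal error in the pry session.
result = stuck.join(1)

if result.nil?
  puts "join timed out; thread status: #{stuck.status.inspect}"
else
  puts "thread finished"
end

# Clean up the stopped thread so it does not linger.
stuck.kill
```

A nil return only says that the wait timed out; the thread is still stuck, so the caller has to decide what to do with it.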

This is not the (final) answer to the question, but it may help others who land here to understand their problem.

Add Thread.new { sleep } to your source code.

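This does not solve the problem!

It only helps, sometimes, to get an error message after the "bad thing" has happened.

The "Deadlock?" is only a guess. In many cases it is a dead end: there is an error behind it that points to the real cause, but you never see it because Ruby has pulled the emergency brake.

As long as one otherwise useless Thread is still alive, Ruby will not exit so quickly.

Once again: this is not a solution, just a debugging aid.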
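
As a minimal sketch of where such a keep-alive thread might go, the snippet below starts it before any real work begins. The helper name start_keep_alive_thread is hypothetical and is not part of Celluloid or Ruby; the placement at the top of the worker script is an assumption.

```ruby
# Hypothetical helper (not part of Celluloid): starts a thread that does
# nothing but sleep forever. Debugging aid only, it fixes nothing.
def start_keep_alive_thread
  Thread.new { sleep }
end

keep_alive = start_keep_alive_thread

# Wait until the thread has actually parked itself in `sleep`.
sleep 0.01 until keep_alive.status == "sleep"

puts "keep-alive thread alive: #{keep_alive.alive?}"

# ... start the pool and the receive loop here ...
```

Remove the thread again once the underlying error has been found, since a process that would otherwise abort may now simply hang.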
