Java deadlock during a synchronized block on a local resource



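I'm seeing a problem where multiple threads deadlock on the same line of code. I cannot reproduce it locally or in any test, but thread dumps from production show it quite clearly.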

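I don't understand why the threads would be blocked on the synchronized line below, since nothing else synchronizes on that object, either in the call stack or in any other thread. Does anyone know what is going on, or how I can reproduce the problem? (I am currently trying with 15 threads that all hit trim() in a loop while 2000 tasks are processed through my queue, but I have not been able to reproduce it.)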
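A reproduction attempt like the one described above could be structured roughly as follows. This is only a sketch: StressHarness and its run method are hypothetical names, not part of the original code base, and the lambdas assume Java 8 or later (the production JVM mentioned in this post is 1.7). It releases all worker threads at the same instant and reports whether they finished within a timeout, so a hang shows up as a false return value rather than a stuck test.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for hammering a method from many threads at once.
final class StressHarness {
    /**
     * Runs action `iterations` times on each of `threads` threads.
     * Returns true if every thread finished within the timeout, false otherwise.
     */
    static boolean run(int threads, int iterations, Runnable action, long timeoutMillis) {
        // Daemon threads, so a worker stuck in an endless loop cannot keep the JVM alive.
        ExecutorService pool = Executors.newFixedThreadPool(threads, r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        });
        CountDownLatch start = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                try {
                    start.await();                 // line all workers up first
                    for (int j = 0; j < iterations; j++) {
                        action.run();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        start.countDown();                         // release all workers together
        try {
            return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow();                    // interrupt anything still running
        }
    }
}
```

With the class from this post, a call might look like StressHarness.run(15, 2000, queue::trim, 10000), where queue is an instance that other threads are filling at the same time. A race of this kind may still need many runs, or production-like load, before it shows up.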

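In the thread dump below, I think the multiple threads shown in the "locked" state may be a manifestation of this Java bug: http://bugs.java.com/view_bug.do?bug_id=8047816, in which jstack reports threads in the wrong state. (I'm using JDK version 1.7.0_51.)

Cheers!

Here is a view of the threads in the thread dump: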

"xxx>Job Read-3" daemon prio=10 tid=0x00002aca001a6800 nid=0x6a3b waiting for monitor entry [0x0000000052ec4000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at com.mycompany.collections.CustomQueue.remove(CustomQueue.java:101)
    - locked <0x00002aae6465a650> (a java.util.ArrayDeque)
    at com.mycompany.collections.CustomQueue.trim(CustomQueue.java:318)
    at com.mycompany.collections.CustomQueue.itemProcessed(CustomQueue.java:302)
    at com.mycompany.collections.CustomQueue.trackCompleted(CustomQueue.java:147)
    at java.util.concurrent.ThreadPoolExecutor.afterExecute(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
   Locked ownable synchronizers:
    - <0x00002aaf5f9c2680> (a java.util.concurrent.ThreadPoolExecutor$Worker)
"xxx>Job Read-2" daemon prio=10 tid=0x00002aca001a5000 nid=0x6a3a waiting for monitor entry [0x0000000052d83000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at com.mycompany.collections.CustomQueue.remove(CustomQueue.java:101)
    - locked <0x00002aae6465a650> (a java.util.ArrayDeque)
    at com.mycompany.collections.CustomQueue.trim(CustomQueue.java:318)
    at com.mycompany.collections.CustomQueue.itemProcessed(CustomQueue.java:302)
    at com.mycompany.collections.CustomQueue.trackCompleted(CustomQueue.java:147)
    at java.util.concurrent.ThreadPoolExecutor.afterExecute(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
   Locked ownable synchronizers:
    - <0x00002aaf5f9ed518> (a java.util.concurrent.ThreadPoolExecutor$Worker)
"xxx>Job Read-1" daemon prio=10 tid=0x00002aca00183000 nid=0x6a39 waiting for monitor entry [0x0000000052c42000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at com.mycompany.collections.CustomQueue.remove(CustomQueue.java:101)
    - waiting to lock <0x00002aae6465a650> (a java.util.ArrayDeque)
    at com.mycompany.collections.CustomQueue.trim(CustomQueue.java:318)
    at com.mycompany.collections.CustomQueue.itemProcessed(CustomQueue.java:302)
    at com.mycompany.collections.CustomQueue.trackCompleted(CustomQueue.java:147)
    at java.util.concurrent.ThreadPoolExecutor.afterExecute(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
   Locked ownable synchronizers:
    - <0x00002aaf5f9ecde8> (a java.util.concurrent.ThreadPoolExecutor$Worker)

"xxx>Job Read-0" daemon prio=10 tid=0x0000000006a83000 nid=0x6a36 waiting for monitor entry [0x000000005287f000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at com.mycompany.collections.CustomQueue.remove(CustomQueue.java:101)
    - waiting to lock <0x00002aae6465a650> (a java.util.ArrayDeque)
    at com.mycompany.collections.CustomQueue.trim(CustomQueue.java:318)
    at com.mycompany.collections.CustomQueue.itemProcessed(CustomQueue.java:302)
    at com.mycompany.collections.CustomQueue.trackCompleted(CustomQueue.java:147)
    at java.util.concurrent.ThreadPoolExecutor.afterExecute(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

Here is the extracted Java code, showing where the problem occurs:

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

public class Deadlock {
    final Deque<Object> delegate = new ArrayDeque<>();
    final long maxSize = Long.MAX_VALUE;
    private final AtomicLong totalExec = new AtomicLong();
    private final Map<Object, AtomicLong> totals = new HashMap<>();
    private final Map<Object, Deque<Long>> execTimes = new HashMap<>();

    public void trim() {
        // Possible optimization is evicting in chunks, segmenting by arrival time
        while (this.totalExec.longValue() > this.maxSize) {
            final Object t = this.delegate.peek();
            final Deque<Long> execTime = this.execTimes.get(t);
            final Long exec = execTime.peek();
            if (exec != null && this.totalExec.longValue() - exec > this.maxSize) {
                // If job started inside of window, remove and re-loop
                remove();
            } else {
                // Otherwise exit the loop
                break;
            }
        }
    }

    public Object remove() {
        Object removed;
        synchronized (this.delegate) { // 4 threads deadlocking on this line!
            removed = this.delegate.pollFirst();
        }
        if (removed != null) {
            itemRemoved(removed);
        }
        return removed;
    }

    public void itemRemoved(final Object t) {
        // Decrement total and queue
        final AtomicLong catTotal = this.totals.get(t);
        if (catTotal != null) {
            if (!this.execTimes.get(t).isEmpty()) {
                final Long exec = this.execTimes.get(t).pollFirst();
                if (exec != null) {
                    catTotal.addAndGet(-exec);
                    this.totalExec.addAndGet(-exec);
                }
            }
        }
    }
}

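From the documentation for HashMap:

Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.

(Emphasis theirs.)

You are reading and writing the Map without synchronization.

I see no reason to assume your code is thread-safe.

My guess is that you have an infinite loop in trim, caused by this lack of thread safety.

Entering a synchronized block is relatively slow, so a thread dump will probably always show at least a few threads waiting to acquire the lock.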

Your first thread is holding the lock while waiting on pollFirst.

"xxx>Job Read-3" daemon prio=10 tid=0x00002aca001a6800 nid=0x6a3b waiting for monitor entry [0x0000000052ec4000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at com.mycompany.collections.CustomQueue.remove(CustomQueue.java:101)
    - locked <0x00002aae6465a650> (a java.util.ArrayDeque)
    at com.mycompany.collections.CustomQueue.trim(CustomQueue.java:318)

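The other threads are waiting to acquire the lock. You would need to provide the entire thread dump to determine which thread holds the lock at 0x0000000052ec4000, which is what is preventing the pollFirst call from returning.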

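For a deadlock to occur, you need at least two threads, each holding one lock while trying to acquire a lock held by the other, and the code you posted doesn't appear to do that. The bug you pointed to might apply, but as I read it, it is a cosmetic issue: the threads have not "locked" the object in question (the ArrayDeque), they are waiting to acquire its lock. If there were a deadlock, you should see a "deadlock" message in your logs, naming the two threads that are blocking each other.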
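To make the definition above concrete, here is a minimal sketch of a genuine two-lock deadlock together with a programmatic check for it. DeadlockProbe and its methods are names made up for this illustration. The check relies on ThreadMXBean.findDeadlockedThreads() from java.lang.management, which returns null when no threads are deadlocked. The lambdas assume Java 8 or later.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical helper: creates a real two-lock deadlock and reports it.
final class DeadlockProbe {
    /** Describes each deadlocked thread; returns an empty array if there is none. */
    static String[] describeDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads();    // null when no deadlock exists
        if (ids == null) {
            return new String[0];
        }
        ThreadInfo[] infos = mx.getThreadInfo(ids);
        String[] out = new String[infos.length];
        for (int i = 0; i < infos.length; i++) {
            ThreadInfo ti = infos[i];
            out[i] = (ti == null)
                    ? "(thread no longer alive)"
                    : ti.getThreadName() + " waits for " + ti.getLockName()
                            + " held by " + ti.getLockOwnerName();
        }
        return out;
    }

    /** Starts two daemon threads that take the same two locks in opposite order. */
    static void startDeadlockedPair() {
        Object a = new Object();
        Object b = new Object();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        spawn(a, b, bothHoldFirst, "demo-1");
        spawn(b, a, bothHoldFirst, "demo-2");
    }

    private static void spawn(Object first, Object second, CountDownLatch latch, String name) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                latch.countDown();
                try {
                    latch.await();                  // wait until both threads hold their first lock
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {
                    // never reached: the other thread holds `second` and wants `first`
                }
            }
        }, name);
        t.setDaemon(true);                          // do not keep the JVM alive
        t.start();
    }
}
```

The code in the question takes only one monitor (the ArrayDeque), so it cannot form this kind of cycle by itself. A check like describeDeadlocks() would be expected to return an empty array for it.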

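I don't believe the thread dump indicates a deadlock. It only tells you how many threads were waiting on the monitor at the time of the dump. Since only one thread can own the monitor at any given moment, that is not surprising.

What behavior are you seeing in your application that leads you to believe there is a deadlock? A lot is missing from the code, in particular where the objects in the delegate Deque come from. My guess is that you don't have an outright deadlock, but some other problem that looks like one.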

Thanks for the replies here. It is clear that the problem was the non-thread-safe use of multiple collections.

To fix it, I synchronized the trim method, replaced the HashMap with a ConcurrentHashMap, and replaced the ArrayDeque with a LinkedBlockingDeque. (Concurrent collections FTW!)
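A rough sketch of what that fix might look like is below. It is not the actual production class: TrimmedQueueSketch, the add method, the accessors, and the null guards are assumptions added for illustration, since the post does not show how items enter the queue. The lambdas assume Java 8 or later.

```java
import java.util.Deque;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative only: mirrors the structure of the class in the question.
final class TrimmedQueueSketch {
    private final Deque<Object> delegate = new LinkedBlockingDeque<>();
    private final Map<Object, AtomicLong> totals = new ConcurrentHashMap<>();
    private final Map<Object, Deque<Long>> execTimes = new ConcurrentHashMap<>();
    private final AtomicLong totalExec = new AtomicLong();
    private final long maxSize;

    TrimmedQueueSketch(long maxSize) {
        this.maxSize = maxSize;
    }

    // Hypothetical insertion path; the original post does not show one.
    void add(Object key, long exec) {
        execTimes.computeIfAbsent(key, k -> new ConcurrentLinkedDeque<>()).addLast(exec);
        totals.computeIfAbsent(key, k -> new AtomicLong()).addAndGet(exec);
        totalExec.addAndGet(exec);
        delegate.addLast(key);
    }

    // synchronized: only one thread at a time runs the check-then-remove loop.
    synchronized void trim() {
        while (totalExec.get() > maxSize) {
            Object head = delegate.peekFirst();
            if (head == null) {
                break;                              // queue drained by another caller
            }
            Deque<Long> times = execTimes.get(head);
            Long exec = (times == null) ? null : times.peekFirst();
            if (exec != null && totalExec.get() - exec > maxSize) {
                remove();
            } else {
                break;
            }
        }
    }

    // LinkedBlockingDeque is thread-safe, so no explicit monitor is needed here.
    Object remove() {
        Object removed = delegate.pollFirst();
        if (removed != null) {
            itemRemoved(removed);
        }
        return removed;
    }

    private void itemRemoved(Object key) {
        AtomicLong catTotal = totals.get(key);
        Deque<Long> times = execTimes.get(key);
        if (catTotal != null && times != null) {
            Long exec = times.pollFirst();
            if (exec != null) {
                catTotal.addAndGet(-exec);
                totalExec.addAndGet(-exec);
            }
        }
    }

    long totalExec() { return totalExec.get(); }

    int size() { return delegate.size(); }
}
```

Each collection is now safe to use from several threads, but the updates in itemRemoved still touch two maps and a counter in separate steps, so they are not atomic as a group.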

A further planned enhancement is to replace the two separate Maps with a single Map that holds a custom object, so that the operations in itemRemoved stay atomic.
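One way the single-Map idea could look is sketched below. CategoryStats, StatsRegistrySketch, and all method names are hypothetical. The per-key total and the per-key queue of execution times live in one object whose methods are synchronized, so the two always change together. The lambda assumes Java 8 or later.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical value object: per-key total and execution times guarded by one monitor.
final class CategoryStats {
    private long total;
    private final Deque<Long> execTimes = new ArrayDeque<>();

    synchronized void record(long exec) {
        execTimes.addLast(exec);
        total += exec;
    }

    /** Removes the oldest execution time and returns it, or null if none is recorded. */
    synchronized Long removeOldest() {
        Long exec = execTimes.pollFirst();
        if (exec != null) {
            total -= exec;
        }
        return exec;
    }

    synchronized long total() {
        return total;
    }
}

// Hypothetical registry replacing the separate `totals` and `execTimes` maps.
final class StatsRegistrySketch {
    private final Map<Object, CategoryStats> stats = new ConcurrentHashMap<>();
    private final AtomicLong totalExec = new AtomicLong();

    void itemAdded(Object key, long exec) {
        stats.computeIfAbsent(key, k -> new CategoryStats()).record(exec);
        totalExec.addAndGet(exec);
    }

    void itemRemoved(Object key) {
        CategoryStats s = stats.get(key);
        if (s == null) {
            return;                                 // unknown key: nothing to undo
        }
        Long exec = s.removeOldest();
        if (exec != null) {
            totalExec.addAndGet(-exec);
        }
    }

    long totalFor(Object key) {
        CategoryStats s = stats.get(key);
        return (s == null) ? 0L : s.total();
    }

    long totalExec() {
        return totalExec.get();
    }
}
```

In this sketch the global totalExec counter is still updated in a separate step from the per-key state. If the two must never be observed out of step, a wider lock would be needed.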
