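I'm seeing a problem where multiple threads deadlock on the same line of code. I can't reproduce it locally or in any test, but thread dumps from production show it very clearly.
I don't understand why the threads would be blocked on the synchronized line below, since there is no other synchronization on that object anywhere in the call stack or in any other thread. Does anyone know what is going on, or how I could reproduce this? (I'm currently trying with 15 threads all hitting trim() in a loop while processing 2000 tasks through my queue, but I can't reproduce it.)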
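The call path in the dumps (ThreadPoolExecutor.afterExecute, then trackCompleted, itemProcessed and trim) suggests the tracking hook runs on each worker thread after every task. A reproduction harness with that shape might look like the sketch below. It assumes Java 8+, and `TrackingExecutor`, `completed` and `run` are hypothetical names; the counter in afterExecute only stands in for the real call into CustomQueue.trackCompleted().

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class TrackingExecutor extends ThreadPoolExecutor {
    // Hypothetical stand-in for CustomQueue.trackCompleted(): here it only counts completions.
    final AtomicInteger completed = new AtomicInteger();

    TrackingExecutor(int threads) {
        super(threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
    }

    @Override
    protected void afterExecute(Runnable r, Throwable t) {
        super.afterExecute(r, t);
        // Runs on the worker thread after each task, matching the afterExecute frame in the dumps.
        completed.incrementAndGet();
    }

    /** Submits {@code tasks} no-op jobs to {@code threads} workers and returns how many completed. */
    static int run(int threads, int tasks) {
        TrackingExecutor pool = new TrackingExecutor(threads);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> { /* simulated job work */ });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return pool.completed.get();
    }

    public static void main(String[] args) {
        System.out.println("completed=" + run(15, 2000));
    }
}
```

Replacing the counter with a call into the real queue gives the 15-thread, 2000-task setup described above. A clean run proves little, though: corruption from unsynchronized HashMap access depends on timing, so it may simply not show up on a given machine.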
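In the thread dump below, I think the fact that several threads are shown in the "locked" state may be a manifestation of this Java bug: http://bugs.java.com/view_bug.do?bug_id=8047816, where jstack reports threads in the wrong state. (I'm using JDK 1.7.0_51.)
Cheers!
Below are the threads as they appear in the thread dump: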
"xxx>Job Read-3" daemon prio=10 tid=0x00002aca001a6800 nid=0x6a3b waiting for monitor entry [0x0000000052ec4000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.mycompany.collections.CustomQueue.remove(CustomQueue.java:101)
- locked <0x00002aae6465a650> (a java.util.ArrayDeque)
at com.mycompany.collections.CustomQueue.trim(CustomQueue.java:318)
at com.mycompany.collections.CustomQueue.itemProcessed(CustomQueue.java:302)
at com.mycompany.collections.CustomQueue.trackCompleted(CustomQueue.java:147)
at java.util.concurrent.ThreadPoolExecutor.afterExecute(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Locked ownable synchronizers:
- <0x00002aaf5f9c2680> (a java.util.concurrent.ThreadPoolExecutor$Worker)
"xxx>Job Read-2" daemon prio=10 tid=0x00002aca001a5000 nid=0x6a3a waiting for monitor entry [0x0000000052d83000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.mycompany.collections.CustomQueue.remove(CustomQueue.java:101)
- locked <0x00002aae6465a650> (a java.util.ArrayDeque)
at com.mycompany.collections.CustomQueue.trim(CustomQueue.java:318)
at com.mycompany.collections.CustomQueue.itemProcessed(CustomQueue.java:302)
at com.mycompany.collections.CustomQueue.trackCompleted(CustomQueue.java:147)
at java.util.concurrent.ThreadPoolExecutor.afterExecute(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Locked ownable synchronizers:
- <0x00002aaf5f9ed518> (a java.util.concurrent.ThreadPoolExecutor$Worker)
"xxx>Job Read-1" daemon prio=10 tid=0x00002aca00183000 nid=0x6a39 waiting for monitor entry [0x0000000052c42000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.mycompany.collections.CustomQueue.remove(CustomQueue.java:101)
- waiting to lock <0x00002aae6465a650> (a java.util.ArrayDeque)
at com.mycompany.collections.CustomQueue.trim(CustomQueue.java:318)
at com.mycompany.collections.CustomQueue.itemProcessed(CustomQueue.java:302)
at com.mycompany.collections.CustomQueue.trackCompleted(CustomQueue.java:147)
at java.util.concurrent.ThreadPoolExecutor.afterExecute(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Locked ownable synchronizers:
- <0x00002aaf5f9ecde8> (a java.util.concurrent.ThreadPoolExecutor$Worker)
"xxx>Job Read-0" daemon prio=10 tid=0x0000000006a83000 nid=0x6a36 waiting for monitor entry [0x000000005287f000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.mycompany.collections.CustomQueue.remove(CustomQueue.java:101)
- waiting to lock <0x00002aae6465a650> (a java.util.ArrayDeque)
at com.mycompany.collections.CustomQueue.trim(CustomQueue.java:318)
at com.mycompany.collections.CustomQueue.itemProcessed(CustomQueue.java:302)
at com.mycompany.collections.CustomQueue.trackCompleted(CustomQueue.java:147)
at java.util.concurrent.ThreadPoolExecutor.afterExecute(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
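Here is the extracted Java code, showing where the problem occurs:
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

public class Deadlock {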
final Deque<Object> delegate = new ArrayDeque<>();
final long maxSize = Long.MAX_VALUE;
private final AtomicLong totalExec = new AtomicLong();
private final Map<Object, AtomicLong> totals = new HashMap<>();
private final Map<Object, Deque<Long>> execTimes = new HashMap<>();
public void trim() {
//Possible optimization is evicting in chunks, segmenting by arrival time
while (this.totalExec.longValue() > this.maxSize) {
final Object t = this.delegate.peek();
final Deque<Long> execTime = this.execTimes.get(t);
final Long exec = execTime.peek();
if (exec != null && this.totalExec.longValue() - exec > this.maxSize) {
//If Job Started Inside of Window, remove and re-loop
remove();
}
else {
//Otherwise exit the loop
break;
}
}
}
public Object remove() {
Object removed;
synchronized (this.delegate) { //4 Threads deadlocking on this line !
removed = this.delegate.pollFirst();
}
if (removed != null) {
itemRemoved(removed);
}
return removed;
}
public void itemRemoved(final Object t) {
//Decrement Total & Queue
final AtomicLong catTotal = this.totals.get(t);
if (catTotal != null) {
if (!this.execTimes.get(t).isEmpty()) {
final Long exec = this.execTimes.get(t).pollFirst();
if (exec != null) {
catTotal.addAndGet(-exec);
this.totalExec.addAndGet(-exec);
}
}
}
}
}
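From the documentation for HashMap:
Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.
(Emphasis theirs.)
You are reading and writing the Maps without any synchronization.
I see no reason to believe your code is thread-safe.
I suspect you have an infinite loop in trim() caused by this lack of thread safety.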
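Entering a contended synchronized block is relatively slow, so a thread dump will likely always show at least a few threads waiting to acquire the lock.
Your first thread is holding the lock while waiting for pollFirst to return.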
"xxx>Job Read-3" daemon prio=10 tid=0x00002aca001a6800 nid=0x6a3b waiting for monitor entry [0x0000000052ec4000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.mycompany.collections.CustomQueue.remove(CustomQueue.java:101)
- locked <0x00002aae6465a650> (a java.util.ArrayDeque)
at com.mycompany.collections.CustomQueue.trim(CustomQueue.java:318)
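The other threads are waiting to acquire the lock. You would need to post the entire thread dump to determine which thread holds the monitor that "xxx>Job Read-3" is waiting for, since that is what is preventing the pollFirst call from returning.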
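If a full dump is hard to obtain, the same ownership information can be read in-process through java.lang.management: ThreadInfo.getLockOwnerName() names the thread that owns the monitor a blocked thread is waiting to enter. A minimal sketch, assuming Java 8+; the class and method names are hypothetical:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class MonitorOwner {
    /** Name of the thread owning the monitor that thread {@code threadId} is blocked on, or null. */
    static String ownerOf(long threadId) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        ThreadInfo info = mx.getThreadInfo(threadId);
        // getLockOwnerName() returns null if the thread is not blocked on a lock owned by another thread.
        return (info == null) ? null : info.getLockOwnerName();
    }

    /** Makes a helper thread block on a monitor held by the caller and reports who owns it. */
    static String blockWaiterAndReportOwner() {
        final Object lock = new Object();
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                // entered only after the caller releases the lock
            }
        }, "waiter");
        String owner;
        synchronized (lock) {
            waiter.start();
            // Spin until the waiter is actually BLOCKED trying to enter the monitor we hold.
            while (waiter.getState() != Thread.State.BLOCKED) {
                Thread.yield();
            }
            owner = ownerOf(waiter.getId());
        }
        try {
            waiter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return owner;
    }

    public static void main(String[] args) {
        // Prints the name of the thread running main.
        System.out.println("owner = " + blockWaiterAndReportOwner());
    }
}
```

Logging ownerOf(...) for the stuck workers from a watchdog thread would show directly whether one worker is waiting on another, without relying on how jstack labels the frames.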
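To get a deadlock, you need at least two threads, each holding a lock on one object while trying to lock another that the other thread holds (typically because they acquire the same two locks in opposite orders), and the code you posted doesn't appear to do that. The bug you point to may well be occurring, but as I read it, it is a cosmetic issue: the threads are not "locked", they are waiting to acquire the lock on the object in question (the ArrayDeque). If there were a deadlock, you should see a "deadlock" message in the thread dump output, naming the threads that are blocking each other.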
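For contrast, here is a minimal sketch of a genuine monitor deadlock (Java 8+, hypothetical class name): two threads take the same two locks in opposite orders, and ThreadMXBean.findMonitorDeadlockedThreads() then reports the cycle. A dump of this state would show each thread holding one lock while waiting for the other, which is not the pattern in the dumps above, where every thread is waiting on the same single ArrayDeque.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LockOrderDeadlock {
    /** Starts two threads that lock a and b in opposite orders; returns the ids the JVM reports as deadlocked. */
    static long[] provokeAndDetect() {
        final Object a = new Object();
        final Object b = new Object();
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread t1 = new Thread(() -> lockBoth(a, b, bothHoldFirstLock), "t1");
        Thread t2 = new Thread(() -> lockBoth(b, a, bothHoldFirstLock), "t2");
        // Daemon threads, so the permanently stuck pair does not keep the JVM alive.
        t1.setDaemon(true);
        t2.setDaemon(true);
        t1.start();
        t2.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids;
        // findMonitorDeadlockedThreads() returns null while no monitor deadlock cycle exists.
        while ((ids = mx.findMonitorDeadlockedThreads()) == null) {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return new long[0];
            }
        }
        return ids;
    }

    private static void lockBoth(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await();          // wait until the other thread holds its first lock
            } catch (InterruptedException e) {
                return;
            }
            synchronized (second) {
                // never entered: each thread waits for the lock the other one holds
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + provokeAndDetect().length);
    }
}
```

The CountDownLatch only makes the interleaving deterministic for the demo; in production code the same lock-order inversion deadlocks only occasionally, which is what makes it hard to reproduce.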
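I don't believe the thread dump indicates a deadlock. It just tells you how many threads were waiting on the monitor at the moment the dump was taken. Since only one thread can own a monitor at any given moment, that is not surprising.
What behavior in your application leads you to believe there is a deadlock? A lot is missing from the code, in particular where the objects in the delegate Deque come from. My guess is that you don't have an outright deadlock but some other problem that looks like one.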
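Thanks for the responses here. It's clear the problem was the non-thread-safe use of multiple collections.
To fix it, I synchronized the trim method, replaced HashMap with ConcurrentHashMap, and replaced ArrayDeque with LinkedBlockingDeque (concurrent collections FTW!).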
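A sketch of that fix, reconstructed from the description rather than from the actual code (Java 8+): the collection types are swapped and trim() is synchronized. The class name, the constructor and the add() helper used to populate the queue are hypothetical, the synchronized block in remove() is dropped because LinkedBlockingDeque locks internally, and the null guards in trim() are additions for the empty-queue case.

```java
import java.util.Deque;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.atomic.AtomicLong;

public class ConcurrentCustomQueue {
    final Deque<Object> delegate = new LinkedBlockingDeque<>();
    final long maxSize;
    private final AtomicLong totalExec = new AtomicLong();
    private final Map<Object, AtomicLong> totals = new ConcurrentHashMap<>();
    private final Map<Object, Deque<Long>> execTimes = new ConcurrentHashMap<>();

    ConcurrentCustomQueue(long maxSize) {
        this.maxSize = maxSize;
    }

    // Hypothetical helper (not in the original code): records a finished item and its execution time.
    public void add(Object t, long exec) {
        totals.computeIfAbsent(t, k -> new AtomicLong()).addAndGet(exec);
        execTimes.computeIfAbsent(t, k -> new LinkedBlockingDeque<>()).addLast(exec);
        totalExec.addAndGet(exec);
        delegate.addLast(t);
    }

    // synchronized so only one thread at a time runs the check-then-remove loop.
    public synchronized void trim() {
        while (totalExec.get() > maxSize) {
            Object t = delegate.peek();
            Deque<Long> execTime = (t == null) ? null : execTimes.get(t);
            Long exec = (execTime == null) ? null : execTime.peek();
            if (exec != null && totalExec.get() - exec > maxSize) {
                remove();      // job started inside the window: remove and re-loop
            } else {
                break;         // otherwise exit the loop
            }
        }
    }

    public Object remove() {
        Object removed = delegate.pollFirst();   // LinkedBlockingDeque is internally locked
        if (removed != null) {
            itemRemoved(removed);
        }
        return removed;
    }

    private void itemRemoved(Object t) {
        AtomicLong catTotal = totals.get(t);
        Deque<Long> times = execTimes.get(t);
        if (catTotal != null && times != null) {
            Long exec = times.pollFirst();
            if (exec != null) {
                catTotal.addAndGet(-exec);
                totalExec.addAndGet(-exec);
            }
        }
    }

    long totalExec() { return totalExec.get(); }

    int size() { return delegate.size(); }

    public static void main(String[] args) {
        ConcurrentCustomQueue q = new ConcurrentCustomQueue(10);
        for (String job : new String[] {"a", "b", "c", "d"}) {
            q.add(job, 5);
        }
        q.trim();
        System.out.println(q.size() + " items, totalExec=" + q.totalExec());
    }
}
```

Each collection is now safe for individual calls, but a sequence of calls spanning delegate, totals and execTimes is only atomic because trim() holds the object's monitor; a remove() invoked from anywhere else can still interleave with it.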
A further planned enhancement is to replace the two separate Maps with a single Map holding a custom object, so that the operations in itemRemoved stay atomic.
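One possible shape for that change, as a sketch (Java 8+; `SingleMapTracker`, `CategoryStats` and their methods are hypothetical): each category's running total and exec-time deque live in one object whose methods are synchronized, so the itemRemoved counterpart updates both in a single atomic step.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

public class SingleMapTracker {
    /** Hypothetical per-category record replacing the separate totals and execTimes maps. */
    static final class CategoryStats {
        private long total;                                  // sum of exec times still tracked
        private final Deque<Long> execTimes = new ArrayDeque<>();

        synchronized void add(long exec) {
            execTimes.addLast(exec);
            total += exec;
        }

        /** Removes the oldest exec time and adjusts the total in one atomic step; null if empty. */
        synchronized Long removeOldest() {
            Long exec = execTimes.pollFirst();
            if (exec != null) {
                total -= exec;
            }
            return exec;
        }

        synchronized long total() {
            return total;
        }
    }

    private final ConcurrentMap<Object, CategoryStats> stats = new ConcurrentHashMap<>();
    private final AtomicLong totalExec = new AtomicLong();

    public void record(Object category, long exec) {
        stats.computeIfAbsent(category, k -> new CategoryStats()).add(exec);
        totalExec.addAndGet(exec);
    }

    /** Counterpart of itemRemoved(): one lookup, one atomic update of the category record. */
    public void itemRemoved(Object category) {
        CategoryStats s = stats.get(category);
        if (s != null) {
            Long exec = s.removeOldest();
            if (exec != null) {
                totalExec.addAndGet(-exec);
            }
        }
    }

    long categoryTotal(Object category) {
        CategoryStats s = stats.get(category);
        return (s == null) ? 0 : s.total();
    }

    long totalExec() {
        return totalExec.get();
    }

    public static void main(String[] args) {
        SingleMapTracker tracker = new SingleMapTracker();
        tracker.record("jobA", 3);
        tracker.record("jobA", 4);
        tracker.record("jobB", 10);
        tracker.itemRemoved("jobA");              // drops the oldest jobA time (3)
        System.out.println(tracker.categoryTotal("jobA") + " " + tracker.totalExec());
    }
}
```

The global totalExec is still a separate AtomicLong here, so a reader can briefly observe it out of step with the per-category totals; if that matters, it would have to be updated under a lock shared with the per-category updates.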