JVM 安全点暂停,但仅当代码在方法中时,而不是 GC 参与



UPDATE:这个问题至少可以通过-XX:+UseCountedLoopSafepoints和等于或大于JDK 1.8.0_92的JVM版本的组合来粗略解决(使用_161测试)。

奥里格: 包括 JDK 1.8.0 示例运行,但它也发生在 JDK 9.0.4 上。

我一直在研究一个奇怪的错误,它似乎表现为 JVM 停止所有线程,其方式与安全点暂停非常一致。 VM 操作有时被列为"无 VM 操作"。 GC 没有参与减速,正如-XX选项输出和jvisualVM所证实的那样。

这一切都在下面的一个 100 行类中。 我已经戏剧性地配对了代码,以根据简单的boolean设置显示问题。 寻找MAKE_THE_JVM_PAUSE_UNEXPECTEDLYboolean.

我正在创建一个 5000x5000 的boolean数组,并在第 66 列随机访问前 10 行。 真正的代码导致了同样的问题。

这个例子的算法本身是荒谬的。 这只是一个高度配对的例子。

用于运行此选项的选项位于标题注释中。

package conway;
/**
* Run this with the following java VM options:
*
*      -verbosegc
*      -XX:+PrintGCApplicationStoppedTime
*      -XX:+PrintSafepointStatistics
*      -XX:PrintSafepointStatisticsCount=1
*      -XX:+PrintGCDetails
*      -XX:+PerfDisableSharedMem
*/
public class Conway
{
public static void main(String[] args)
{
//-----------------------[HEARTBEAT]-----------------------
new Thread(() ->
{
long startTime = System.currentTimeMillis();
long lastTime = startTime;
for (int count=0;; count++)
{
// Time stamp....
long now = System.currentTimeMillis();
System.out.printf("%d-->Delta time from Start[%dms], from last[%dms]n",
count, now-startTime, now-lastTime);
System.out.flush();
lastTime = now;
// Waste time without sleeping...
for (long start = System.currentTimeMillis();
System.currentTimeMillis() - start < 200;);
}
}).start();
//-----------------------[COMPUTATION]-----------------------
one = new boolean[5000][5000];
boolean MAKE_THE_JVM_PAUSE_UNEXPECTEDLY = true;
for (;;)
{
if (MAKE_THE_JVM_PAUSE_UNEXPECTEDLY)
{
/**
* CALLING sweep() will cause all threads to periodically pause...
*/
sweep();
}
else
{
/**
* THIS ONE RUNS FINE.  Exact same contents as in sweep() below...
*/
for (int y=0; y<5000; y++)
{
for (int x=0; x<5000; x++)
{
int living=0;
for (int yy=y-1; yy <= y+1; yy++)
for (int xx=x-1; xx <= x+1; xx++)
{
if (one[randomInt(0, 10)][66])
living++;
}
} // for(x...)
} // for(y...)
}
} // for(;;)
}
public static void sweep()
{
for (int y=0; y<5000; y++)
{
for (int x=0; x<5000; x++)
{
int living=0;
for (int yy=y-1; yy <= y+1; yy++)
for (int xx=x-1; xx <= x+1; xx++)
{
if (one[randomInt(0,10)][66])
living++;
}
} // for(x...)
} // for(y...)
}
public static int randomInt(int first, int last)
{
return (int)(((last-first+1)*Math.random())+first);
}
public static boolean[][] one;
}

现在,当我在MAKE_THE_JVM_PAUSE_UNEXPECTEDLYboolean设置为true的情况下运行这个东西时,它会导致奇怪的减速。

请注意,问题首先出现在检测信号 49 和 50 之间。 检测信号来自代码中的第一个(匿名启动)线程,算法本身使用主类线程,并由如下所示的行标识:

11-->Delta time from Start[2210ms], from last[200ms]

我自己终止了运行。

[GC (Allocation Failure) [PSYoungGen: 512K->384K(1024K)] 512K->384K(523776K), 0.0004807 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
0.053: ParallelGCFailedAllocation       [       5          0              0    ]      [     0     0     0     0     0    ]  0   
Total time for which application threads were stopped: 0.0005755 seconds
[GC (Allocation Failure) [PSYoungGen: 896K->480K(1024K)] 896K->488K(523776K), 0.0003818 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
0.063: ParallelGCFailedAllocation       [       5          0              0    ]      [     0     0     0     0     0    ]  0   
Total time for which application threads were stopped: 0.0004564 seconds
[GC (Allocation Failure) [PSYoungGen: 992K->496K(1024K)] 1000K->536K(523776K), 0.0004566 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
0.072: ParallelGCFailedAllocation       [      10          1              1    ]      [     0     0     0     0     0    ]  0   
Total time for which application threads were stopped: 0.0007704 seconds
[GC (Allocation Failure) [PSYoungGen: 1008K->480K(1536K)] 1048K->660K(524288K), 0.0009025 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
0.082: ParallelGCFailedAllocation       [      10          0              0    ]      [     0     0     0     0     0    ]  0   
Total time for which application threads were stopped: 0.0009905 seconds
[GC (Allocation Failure) [PSYoungGen: 1499K->499K(1536K)] 1679K->1006K(524288K), 0.0006638 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
0.102: ParallelGCFailedAllocation       [      11          1              1    ]      [     0     0     0     0     0    ]  0   
Total time for which application threads were stopped: 0.0009292 seconds
[GC (Allocation Failure) [PSYoungGen: 1518K->992K(2560K)] 2025K->2018K(525312K), 0.0007172 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
0.103: ParallelGCFailedAllocation       [      11          0              1    ]      [     0     0     0     0     0    ]  0   
Total time for which application threads were stopped: 0.0007980 seconds
[GC (Allocation Failure) [PSYoungGen: 2526K->1507K(3072K)] 3552K->3515K(525824K), 0.0009654 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
0.107: ParallelGCFailedAllocation       [      11          0              1    ]      [     0     0     0     0     1    ]  0   
Total time for which application threads were stopped: 0.0010672 seconds
[GC (Allocation Failure) [PSYoungGen: 3041K->2547K(5120K)] 5049K->5128K(527872K), 0.0010321 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
0.108: ParallelGCFailedAllocation       [      11          0              2    ]      [     0     0     0     0     1    ]  0   
Total time for which application threads were stopped: 0.0011153 seconds
[GC (Allocation Failure) [PSYoungGen: 5102K->3072K(5632K)] 7684K->7718K(528384K), 0.0012587 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
0.109: ParallelGCFailedAllocation       [      11          0              1    ]      [     0     0     0     0     1    ]  0   
Total time for which application threads were stopped: 0.0013348 seconds
[GC (Allocation Failure) [PSYoungGen: 5630K->4608K(7680K)] 10276K->10382K(530432K), 0.0016113 secs] [Times: user=0.03 sys=0.00, real=0.00 secs] 
vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
0.111: ParallelGCFailedAllocation       [      11          0              1    ]      [     0     0     0     0     1    ]  0   
Total time for which application threads were stopped: 0.0016921 seconds
0-->Delta time from Start[0ms], from last[0ms]
[GC (Allocation Failure) [PSYoungGen: 7678K->3328K(8704K)] 13452K->13341K(531456K), 0.0013899 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
0.113: ParallelGCFailedAllocation       [      11          0              1    ]      [     0     0     1     0     1    ]  0   
Total time for which application threads were stopped: 0.0024590 seconds
[GC (Allocation Failure) [PSYoungGen: 6399K->3360K(11264K)] 16412K->16419K(534016K), 0.0011903 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
0.115: ParallelGCFailedAllocation       [      11          1              2    ]      [     0     0     0     0     1    ]  0   
Total time for which application threads were stopped: 0.0013046 seconds
[GC (Allocation Failure) [PSYoungGen: 8477K->5536K(11264K)] 21537K->21642K(534016K), 0.0020710 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
0.117: ParallelGCFailedAllocation       [      11          0              1    ]      [     0     0     0     0     2    ]  0   
Total time for which application threads were stopped: 0.0021520 seconds
1-->Delta time from Start[210ms], from last[210ms]
2-->Delta time from Start[410ms], from last[200ms]
3-->Delta time from Start[610ms], from last[200ms]
4-->Delta time from Start[810ms], from last[200ms]
5-->Delta time from Start[1010ms], from last[200ms]
vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
1.119: no vm operation                  [      11          2              2    ]      [     0     0     0     0     0    ]  2   
Total time for which application threads were stopped: 0.0001547 seconds
6-->Delta time from Start[1210ms], from last[200ms]
7-->Delta time from Start[1410ms], from last[200ms]
8-->Delta time from Start[1610ms], from last[200ms]
9-->Delta time from Start[1810ms], from last[200ms]
10-->Delta time from Start[2010ms], from last[200ms]
11-->Delta time from Start[2210ms], from last[200ms]
12-->Delta time from Start[2410ms], from last[200ms]
13-->Delta time from Start[2610ms], from last[200ms]
14-->Delta time from Start[2810ms], from last[200ms]
15-->Delta time from Start[3010ms], from last[200ms]
vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
3.120: no vm operation                  [      11          2              2    ]      [     0     0     0     0     0    ]  2   
Total time for which application threads were stopped: 0.0000670 seconds
16-->Delta time from Start[3210ms], from last[200ms]
17-->Delta time from Start[3410ms], from last[200ms]
18-->Delta time from Start[3610ms], from last[200ms]
19-->Delta time from Start[3810ms], from last[200ms]
vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
4.084: EnableBiasedLocking              [      11          2              2    ]      [     0     0     0     0     0    ]  2   
Total time for which application threads were stopped: 0.0000737 seconds
20-->Delta time from Start[4010ms], from last[200ms]
21-->Delta time from Start[4210ms], from last[200ms]
22-->Delta time from Start[4410ms], from last[200ms]
23-->Delta time from Start[4610ms], from last[200ms]
24-->Delta time from Start[4810ms], from last[200ms]
25-->Delta time from Start[5010ms], from last[200ms]
26-->Delta time from Start[5210ms], from last[200ms]
27-->Delta time from Start[5410ms], from last[200ms]
28-->Delta time from Start[5610ms], from last[200ms]
29-->Delta time from Start[5810ms], from last[200ms]
vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
6.084: no vm operation                  [      11          2              2    ]      [     0     0     0     0     0    ]  2   
Total time for which application threads were stopped: 0.0000621 seconds
30-->Delta time from Start[6010ms], from last[200ms]
31-->Delta time from Start[6210ms], from last[200ms]
32-->Delta time from Start[6410ms], from last[200ms]
33-->Delta time from Start[6610ms], from last[200ms]
34-->Delta time from Start[6810ms], from last[200ms]
vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
7.085: no vm operation                  [      11          2              2    ]      [     0     0     0     0     0    ]  2   
Total time for which application threads were stopped: 0.0000605 seconds
35-->Delta time from Start[7010ms], from last[200ms]
36-->Delta time from Start[7210ms], from last[200ms]
37-->Delta time from Start[7410ms], from last[200ms]
38-->Delta time from Start[7610ms], from last[200ms]
39-->Delta time from Start[7810ms], from last[200ms]
40-->Delta time from Start[8010ms], from last[200ms]
41-->Delta time from Start[8210ms], from last[200ms]
42-->Delta time from Start[8410ms], from last[200ms]
43-->Delta time from Start[8610ms], from last[200ms]
44-->Delta time from Start[8810ms], from last[200ms]
45-->Delta time from Start[9010ms], from last[200ms]
46-->Delta time from Start[9210ms], from last[200ms]
47-->Delta time from Start[9410ms], from last[200ms]
48-->Delta time from Start[9610ms], from last[200ms]
49-->Delta time from Start[9810ms], from last[200ms]
vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
10.087: no vm operation                  [      11          2              2    ]      [  3009     0  3009     0     0    ]  2   
Total time for which application threads were stopped: 3.0091028 seconds
50-->Delta time from Start[12994ms], from last[3184ms]
51-->Delta time from Start[13194ms], from last[200ms]
52-->Delta time from Start[13394ms], from last[200ms]
53-->Delta time from Start[13594ms], from last[200ms]
54-->Delta time from Start[13794ms], from last[200ms]
55-->Delta time from Start[13994ms], from last[200ms]
56-->Delta time from Start[14194ms], from last[200ms]
57-->Delta time from Start[14394ms], from last[200ms]
58-->Delta time from Start[14594ms], from last[200ms]
59-->Delta time from Start[14794ms], from last[200ms]
60-->Delta time from Start[14994ms], from last[200ms]
61-->Delta time from Start[15194ms], from last[200ms]
62-->Delta time from Start[15394ms], from last[200ms]
63-->Delta time from Start[15594ms], from last[200ms]
64-->Delta time from Start[15794ms], from last[200ms]
65-->Delta time from Start[15994ms], from last[200ms]
vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
16.098: no vm operation                  [      11          2              2    ]      [  1221     0  1221     0     0    ]  2   
Total time for which application threads were stopped: 1.2213130 seconds
66-->Delta time from Start[17217ms], from last[1223ms]
67-->Delta time from Start[17417ms], from last[200ms]
68-->Delta time from Start[17617ms], from last[200ms]
69-->Delta time from Start[17817ms], from last[200ms]
70-->Delta time from Start[18017ms], from last[200ms]
71-->Delta time from Start[18217ms], from last[200ms]
72-->Delta time from Start[18417ms], from last[200ms]
73-->Delta time from Start[18617ms], from last[200ms]
74-->Delta time from Start[18817ms], from last[200ms]
75-->Delta time from Start[19017ms], from last[200ms]
76-->Delta time from Start[19217ms], from last[200ms]
77-->Delta time from Start[19417ms], from last[200ms]
78-->Delta time from Start[19617ms], from last[200ms]
79-->Delta time from Start[19817ms], from last[200ms]
80-->Delta time from Start[20017ms], from last[200ms]
Process finished with exit code 1

现在,当我在MAKE_THE_JVM_PAUSE_UNEXPECTEDLYboolean设置为false的情况下运行此东西时,它不会在任何安全点上减慢速度。

为什么将此代码放在方法中会导致此问题?

由于安全点大约每秒发生一次,因此可能是GuaranteedSafepointInterval在起作用,即它由计时器触发。

我还没有测试过你的代码,但是一旦触发,很可能需要很长时间才能到达安全点,因为有一种优化可以忽略 JIT 已知在有限时间内终止的循环中的安全点轮询或调用其他方法,这些方法将在其体内具有安全点轮询。问题在于,对于 CPU 密集型纯计算循环来说,"有限时间量"仍然可能很大,从而导致安全点停滞。在 java 8 和 9 中,可以通过-XX:+UseCountedLoopSafepoints禁用此功能,但性能会略有下降。Java 10 将引入优化,以平衡有限循环中的安全点延迟和每个循环轮询 (JDK-8186027)。

最新更新