Suppose we call Thread.sleep(1) in a loop that iterates n times (here and below, Java 11):
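    import java.util.concurrent.TimeUnit;
    import org.openjdk.jmh.annotations.*;

    @State(Scope.Thread)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    @Fork(jvmArgsAppend = {"-Xms1g", "-Xmx1g"})
    public class ThreadSleep1Benchmark {
        // Number of 1 ms sleeps per benchmark invocation (the "n" in the text).
        @Param({"5", "10", "50"})
        long delay;

        @Benchmark
        public int sleep() throws Exception {
            for (int i = 0; i < delay; i++) {
                Thread.sleep(1);
            }
            // Return a value so the JIT cannot treat the method as dead code.
            return hashCode();
        }
    }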
The benchmark produces the following results:
    Benchmark                    (delay)  Mode  Cnt   Score   Error  Units
    ThreadSleep1Benchmark.sleep        5  avgt   50   6,552 ± 0,071  ms/op
    ThreadSleep1Benchmark.sleep       10  avgt   50  13,343 ± 0,227  ms/op
    ThreadSleep1Benchmark.sleep       50  avgt   50  68,059 ± 1,441  ms/op
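Here we see that sleep() takes more than n milliseconds, whereas intuitively we would expect about n, because on each iteration the current thread sleeps for 1 millisecond. This example demonstrates the cost of putting a thread to sleep and waking it up again.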
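The per-call overhead can also be observed outside JMH. The following is a minimal sketch, not part of the original benchmark: the class name SleepCost and the method averageSleepMillis are made up for illustration, and a plain System.nanoTime() loop is far less rigorous than JMH, so the number it prints is only a rough indication.

```java
// Hypothetical helper for illustration only; not part of the original benchmark.
class SleepCost {

    /** Returns the average wall-clock duration of one Thread.sleep(1) call, in milliseconds. */
    static double averageSleepMillis(int iterations) throws InterruptedException {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            Thread.sleep(1);
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1_000_000.0 / iterations;
    }

    public static void main(String[] args) throws InterruptedException {
        averageSleepMillis(100); // crude warm-up, result discarded
        System.out.printf("average Thread.sleep(1): %.3f ms%n", averageSleepMillis(1000));
    }
}
```

Any amount by which the printed average exceeds 1 ms is the per-call cost that accumulates over the n iterations of the benchmark loop.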
Now let's modify the benchmark:
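    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import org.openjdk.jmh.annotations.*;

    @State(Scope.Thread)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    @Fork(jvmArgsAppend = {"-Xms1g", "-Xmx1g"})
    public class ThreadSleep2Benchmark {
        private final ExecutorService executor = Executors.newFixedThreadPool(1);

        // Written by the background thread, read by the benchmark thread.
        volatile boolean flag;

        @Param({"5", "10", "50"})
        long delay;

        // Raise the flag and start the background timer before every invocation.
        @Setup(Level.Invocation)
        public void setUp() {
            flag = true;
            startThread();
        }

        @TearDown(Level.Trial)
        public void tearDown() {
            executor.shutdown();
        }

        @Benchmark
        public int sleep() throws Exception {
            // Sleep in 1 ms steps until the background thread clears the flag.
            while (flag) {
                Thread.sleep(1);
            }
            return hashCode();
        }

        private void startThread() {
            executor.submit(() -> {
                try {
                    // Clear the flag after `delay` milliseconds.
                    Thread.sleep(delay);
                    flag = false;
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException(e);
                }
            });
        }
    }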
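Here we start a background thread that waits n milliseconds and then clears the flag while the sleep() method is iterating in the while (flag) loop. Since the flag is cleared after a delay of n milliseconds, we expect the while loop to iterate about n times.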
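The expectation of about n iterations can be checked directly by counting them. The sketch below is a simplified stand-alone variant of the benchmark, written under assumptions: the class FlagLoopCount and its run method are hypothetical names, and timing is done with System.nanoTime() rather than JMH. One thing follows from the structure of the loop alone: if each Thread.sleep(1) takes longer than 1 ms, the loop completes fewer than n iterations before the flag is cleared.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-alone variant of ThreadSleep2Benchmark, for illustration only.
class FlagLoopCount {
    private static volatile boolean flag;

    /** Returns {iterations of the while loop, elapsed nanoseconds}. */
    static long[] run(long delayMillis) throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            flag = true;
            executor.submit(() -> {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    flag = false; // always clear the flag so the loop below terminates
                }
            });
            long iterations = 0;
            long start = System.nanoTime();
            while (flag) {
                Thread.sleep(1);
                iterations++;
            }
            return new long[] {iterations, System.nanoTime() - start};
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        for (long delay : new long[] {5, 10, 50}) {
            long[] result = run(delay);
            System.out.printf("delay=%d ms: %d iterations, %.3f ms elapsed%n",
                    delay, result[0], result[1] / 1_000_000.0);
        }
    }
}
```

Comparing the printed iteration count with the delay shows whether the loop really executes n sleeps or stops earlier because the flag was cleared in the meantime.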
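We again see the cost of Thread.sleep(1), but for a delay of 5 and 10 the results look almost the same as before, while for a delay of 50 they are noticeably lower. Note that the difference is not linear: it is about 0.1 ms for 5, about 1.2 ms for 10, and about 13 ms for 50.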
    Benchmark                    (delay)  Mode  Cnt   Score   Error  Units
    ThreadSleep2Benchmark.sleep        5  avgt   50   6,760 ± 0,070  ms/op
    ThreadSleep2Benchmark.sleep       10  avgt   50  12,496 ± 0,050  ms/op
    ThreadSleep2Benchmark.sleep       50  avgt   50  54,727 ± 0,599  ms/op
The results on Java 18 are similar:
    Benchmark                    (delay)  Mode  Cnt   Score   Error  Units
    ThreadSleep1Benchmark.sleep        5  avgt   50   6,609 ± 0,105  ms/op
    ThreadSleep1Benchmark.sleep       10  avgt   50  13,233 ± 0,148  ms/op
    ThreadSleep1Benchmark.sleep       50  avgt   50  66,017 ± 0,714  ms/op
    ThreadSleep2Benchmark.sleep        5  avgt   50   6,740 ± 0,067  ms/op
    ThreadSleep2Benchmark.sleep       10  avgt   50  12,400 ± 0,112  ms/op
    ThreadSleep2Benchmark.sleep       50  avgt   50  53,836 ± 0,250  ms/op
So my question is: is the lower cost in ThreadSleep2Benchmark the work of the compiler (loop unrolling, etc.), or does it come from the way I iterate the loop?
UPD
On Linux I get the following results:
Java 11
    Benchmark                    (delay)  Mode  Cnt   Score   Error  Units
    ThreadSleep1Benchmark.sleep        5  avgt   50   5.597 ± 0.038  ms/op
    ThreadSleep1Benchmark.sleep       10  avgt   50  11.263 ± 0.069  ms/op
    ThreadSleep1Benchmark.sleep       50  avgt   50  56.079 ± 0.267  ms/op

    Benchmark                    (delay)  Mode  Cnt   Score   Error  Units
    ThreadSleep2Benchmark.sleep        5  avgt   50   5.600 ± 0.032  ms/op
    ThreadSleep2Benchmark.sleep       10  avgt   50  10.558 ± 0.052  ms/op
    ThreadSleep2Benchmark.sleep       50  avgt   50  50.625 ± 0.049  ms/op
Java 18
    Benchmark                    (delay)  Mode  Cnt   Score   Error  Units
    ThreadSleep1Benchmark.sleep        5  avgt   50   5.581 ± 0.041  ms/op
    ThreadSleep1Benchmark.sleep       10  avgt   50  11.069 ± 0.067  ms/op
    ThreadSleep1Benchmark.sleep       50  avgt   50  55.719 ± 0.602  ms/op

    Benchmark                    (delay)  Mode  Cnt   Score   Error  Units
    ThreadSleep2Benchmark.sleep        5  avgt   50   5.574 ± 0.035  ms/op
    ThreadSleep2Benchmark.sleep       10  avgt   50  10.918 ± 0.035  ms/op
    ThreadSleep2Benchmark.sleep       50  avgt   50  50.823 ± 0.055  ms/op
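If you want finer control over pausing a Java thread, take a look at LockSupport.parkNanos. On Linux you get a resolution of 50 µs by default. For more information, including how to tune it, see https://hazelcast.com/blog/locksupport-parknanos-under-the-hood-and-the-curious-case-of-parking/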
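As a rough illustration of that suggestion, the sketch below measures how long a short LockSupport.parkNanos call actually takes. The class ParkNanosCost and the method averageParkMicros are hypothetical names, the 100 µs request is an arbitrary choice, and the measured value depends on the operating system and its timer settings, so treat the output as indicative only.

```java
import java.util.concurrent.locks.LockSupport;

// Hypothetical helper for illustration only.
class ParkNanosCost {

    /** Returns the average wall-clock duration of one parkNanos(requestedNanos) call, in microseconds. */
    static double averageParkMicros(long requestedNanos, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            // parkNanos may return early (spuriously), so single calls are not exact.
            LockSupport.parkNanos(requestedNanos);
        }
        return (System.nanoTime() - start) / 1_000.0 / iterations;
    }

    public static void main(String[] args) {
        averageParkMicros(100_000, 100); // crude warm-up, result discarded
        System.out.printf("average parkNanos(100 us): %.1f us%n",
                averageParkMicros(100_000, 1000));
    }
}
```

The gap between the requested 100 µs and the printed average gives an idea of the wake-up granularity on the machine where it runs.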