我使用windows机器来获取logback+slf4j 的性能结果
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import ch.qos.logback.classic.Level;
public class LogPerformanceAnalyser {
private static final Logger LOG =
LoggerFactory.getLogger(LogPerformanceAnalyser.class);
public LogPerformanceAnalyser() {
((ch.qos.logback.classic.Logger) LOG).setLevel(Level.ERROR);
}
public long getTimeWithCheck() {
long startTime = System.currentTimeMillis();
for (int i = 0; i < Integer.MAX_VALUE; i++) {
if (LOG.isDebugEnabled()) {
LOG.debug("This log is {} check", "with");
}
}
return System.currentTimeMillis() - startTime;
}
public long getTimeWithoutCheck() {
long startTime = System.currentTimeMillis();
for (int i = 0; i < Integer.MAX_VALUE; i++) {
LOG.debug("This log is {} check", "without");
}
return System.currentTimeMillis() - startTime;
}
}
备注-1.调试关闭2.跑10次得到平均成绩。
性能结果为:
Total Time getTimeWithoutCheck: 26900 ms
Total Time getTimeWithCheck : 22536 ms
结果是,在记录之前进行检查可以为217亿个日志节省约3.5秒。
如果我将Logger更改为非静态:
private final Logger LOG =
LoggerFactory.getLogger(LogPerformanceAnalyser.class);
我得到以下信息:
Total Time getTimeWithoutCheck: 37095 ms
Total Time getTimeWithCheck : 47006 ms
有人能解释一下吗?
首先,您需要正确的基准测试。在Java世界中,JMH是事实上的基准测试标准。
基准:
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
public class LogBench {
private static final Logger LOG = LoggerFactory.getLogger(LogBench.class);
private final Logger localLog = LoggerFactory.getLogger(LogBench.class);
@Benchmark
public long baseline() {
return 0;
}
@Benchmark
public void getTimeWithCheck() {
if (LOG.isTraceEnabled()) {
LOG.trace("This log is {} check", "with");
}
}
@Benchmark
public void getTimeWithoutCheck() {
LOG.trace("This log is {} check", "without");
}
@Benchmark
public void getTimeWithCheckBenchLocal() {
if (localLog.isTraceEnabled()) {
localLog.trace("This log is {} check", "with");
}
}
@Benchmark
public void getTimeWithoutCheckLocal() {
localLog.trace("This log is {} check", "without");
}
}
我将调试更改为跟踪,以避免强制转换可能产生的影响。
结果
Benchmark Mode Samples Score Score error Units
o.o.j.s.LogBench.baseline avgt 3 0.539 0.047 ns/op
o.o.j.s.LogBench.getTimeWithCheck avgt 3 1.030 0.083 ns/op
o.o.j.s.LogBench.getTimeWithCheckLocal avgt 3 1.637 0.571 ns/op
o.o.j.s.LogBench.getTimeWithoutCheck avgt 3 1.140 0.112 ns/op
o.o.j.s.LogBench.getTimeWithoutCheckLocal avgt 3 1.628 0.311 ns/op
您可以看到,条件检查在这里是无用的,但静态版本比本地版本快1.6倍。让我们开始探究getTimeWithCheckLocal和getTimeWithCheck之间的区别。
静态LOG程序集
Beginning of getTimeWithCheck:
[Verified Entry Point]
0x00007f06b920b440: mov DWORD PTR [rsp-0x14000],eax
0x00007f06b920b447: push rbp
0x00007f06b920b448: sub rsp,0x20 ;*synchronization entry
; - org.openjdk.jmh.samples.LogBench::getTimeWithoutCheck@-1 (line 65)
execution:
0x00007f06b920b44c: movabs r10,0x76d8bdfd0 ; {oop(a 'ch/qos/logback/classic/Logger')}
0x00007f06b920b456: mov r11d,DWORD PTR [r10+0x28] ;*getfield loggerContext
0x00007f06b920b45a: mov r9d,DWORD PTR [r12+r11*8+0x60] ;*getfield turboFilterList
0x00007f06b920b45f: mov r8d,DWORD PTR [r12+r9*8+0x10] ;*getfield array
...
非静态LOG程序集
[Verified Entry Point]
0x00007f1c592111e0: mov DWORD PTR [rsp-0x14000],eax
0x00007f1c592111e7: push rbp
0x00007f1c592111e8: sub rsp,0x30 ;*synchronization entry
; - org.openjdk.jmh.samples.LogBench::getTimeWithoutCheckLocal@-1 (line 77)
0x00007f1c592111ec: mov r11d,DWORD PTR [rsi+0xc] ;*getfield localLog
0x00007f1c592111f0: mov r10d,DWORD PTR [r12+r11*8+0x8] ; implicit exception: dispatches to 0x00007f1c59211281
0x00007f1c592111f5: cmp r10d,0xf80154ad ; {metadata('ch/qos/logback/classic/Logger')}
0x00007f1c592111fc: jne 0x00007f1c5921123c
execution:
0x00007f1c592111fe: lea r8,[r12+r11*8] ;*invokeinterface debug
0x00007f1c59211202: mov ecx,DWORD PTR [r8+0x28] ;*getfield loggerContext
您可以注意到,在第二个实验中,JIT必须执行Logger字段值的额外加载:lea r9,[r12+r10*8]
。
让我们使用perfasm探查器再次运行基准测试
0.04% 0.04% │↗ 0x00007f6c25229320: mov r10d,DWORD PTR [r8+0xc] ;*getfield localLog
││ ; - org.openjdk.jmh.samples.LogBench::getTimeWithoutCheckLocal@1 (line 77)
││ ; - org.openjdk.jmh.samples.generated.LogBench_getTimeWithoutCheckLocal_jmhTest::getTimeWithoutCheckLocal_avgt_jmhStub@14 (line 163)
6.80% 7.29% ││ 0x00007f6c25229324: mov r11d,DWORD PTR [r12+r10*8+0x8]
││ ; implicit exception: dispatches to 0x00007f6c252294a5
0.02% ││ 0x00007f6c25229329: cmp r11d,0xf80197b1 ; {metadata('ch/qos/logback/classic/Logger')}
││ 0x00007f6c25229330: jne 0x00007f6c2522939b
││ 0x00007f6c25229332: lea r9,[r12+r10*8] ;*invokeinterface debug
││ 0x00007f6c25229336: mov ecx,DWORD PTR [r9+0x28] ;*getfield loggerContext
正如你所看到的,这个额外的加载不是免费的。这是因为存在许多更改最终变量的方法,所以对字段执行这种优化是不安全的。
由于实验特性jvm有一个特殊的选项-XX:+TrustFinalNonStaticFields
,所以它必须与-XX:+UnlockExperimentalVMOptions
一起使用。如果你用这个选项运行基准测试,你会看到另一个结果:
Benchmark Mode Cnt Score Error Units
LogBench.baseline avgt 3 2.124 ± 0.907 ns/op
LogBench.getTimeWithCheck avgt 3 0.695 ± 0.231 ns/op
LogBench.getTimeWithCheckBenchLocal avgt 3 1.608 ± 0.140 ns/op
LogBench.getTimeWithoutCheck avgt 3 0.675 ± 0.075 ns/op
LogBench.getTimeWithoutCheckLocal avgt 3 1.613 ± 0.176 ns/op
结果很奇怪,尽管现在没有额外的局部变量加载,内联被破坏,asm代码包含直接调用:
0x00007f2355205d33: call 0x00007f2355046020 ; OopMap{off=120}
;*invokespecial filterAndLog_1
结论
- JVM不能信任最终字段,因此在基准测试中每次都必须从内存中加载它(但在99.999999%的应用程序中这不会是问题)
- JVM有一个实验选项
TrustFinalNonStaticFields
,它似乎很不稳定,因为它破坏了CHA优化
结果或多或少与字段访问为总运行时间贡献10000毫秒的假设一致。没有支票,你就有一次开销,有了支票,你就会有两次开销。
差异如此之大仍然令人惊讶。