slf4j API的logback的性能问题



我使用windows机器来获取logback+slf4j 的性能结果

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import ch.qos.logback.classic.Level;
public class LogPerformanceAnalyser {
private static final Logger LOG =
LoggerFactory.getLogger(LogPerformanceAnalyser.class);
public LogPerformanceAnalyser() {
((ch.qos.logback.classic.Logger) LOG).setLevel(Level.ERROR);
}
public long getTimeWithCheck() {
long startTime = System.currentTimeMillis();
for (int i = 0; i < Integer.MAX_VALUE; i++) {
if (LOG.isDebugEnabled()) {
LOG.debug("This log is {} check", "with");
}
}
return System.currentTimeMillis() - startTime;
}
public long getTimeWithoutCheck() {
long startTime = System.currentTimeMillis();
for (int i = 0; i < Integer.MAX_VALUE; i++) {
LOG.debug("This log is {} check", "without");
}
return System.currentTimeMillis() - startTime;
}
}

备注-1.调试关闭2.跑10次得到平均成绩。

性能结果为:

Total Time getTimeWithoutCheck: 26900 ms
Total Time getTimeWithCheck : 22536 ms

结果是,在记录之前进行检查可以为217亿个日志节省约3.5秒。

如果我将Logger更改为非静态:

private final Logger LOG = 
LoggerFactory.getLogger(LogPerformanceAnalyser.class);

我得到以下信息:

Total Time getTimeWithoutCheck: 37095 ms
Total Time getTimeWithCheck : 47006 ms

有人能解释一下吗?

首先,您需要正确的基准测试。在Java世界中,JMH是事实上的基准测试标准。

基准:

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
public class LogBench {
private static final Logger LOG = LoggerFactory.getLogger(LogBench.class);
private final Logger localLog = LoggerFactory.getLogger(LogBench.class);

@Benchmark
public long baseline() {
return 0;
}
@Benchmark
public void getTimeWithCheck() {
if (LOG.isTraceEnabled()) {
LOG.trace("This log is {} check", "with");
}
}
@Benchmark
public void getTimeWithoutCheck() {
LOG.trace("This log is {} check", "without");
}
@Benchmark
public void getTimeWithCheckBenchLocal() {
if (localLog.isTraceEnabled()) {
localLog.trace("This log is {} check", "with");
}
}
@Benchmark
public void getTimeWithoutCheckLocal() {
localLog.trace("This log is {} check", "without");
}
}

我将调试更改为跟踪,以避免强制转换可能产生的影响。

结果

Benchmark                                     Mode   Samples        Score  Score error    Units
o.o.j.s.LogBench.baseline                     avgt         3        0.539        0.047    ns/op
o.o.j.s.LogBench.getTimeWithCheck             avgt         3        1.030        0.083    ns/op
o.o.j.s.LogBench.getTimeWithCheckLocal        avgt         3        1.637        0.571    ns/op
o.o.j.s.LogBench.getTimeWithoutCheck          avgt         3        1.140        0.112    ns/op
o.o.j.s.LogBench.getTimeWithoutCheckLocal     avgt         3        1.628        0.311    ns/op

您可以看到,条件检查在这里是无用的,但静态版本比本地版本快1.6倍。让我们开始探究getTimeWithCheckLocal和getTimeWithCheck之间的区别。

静态LOG程序集

Beginning of getTimeWithCheck:
[Verified Entry Point]
0x00007f06b920b440: mov    DWORD PTR [rsp-0x14000],eax
0x00007f06b920b447: push   rbp
0x00007f06b920b448: sub    rsp,0x20           ;*synchronization entry
; - org.openjdk.jmh.samples.LogBench::getTimeWithoutCheck@-1 (line 65)
execution:
0x00007f06b920b44c: movabs r10,0x76d8bdfd0    ;   {oop(a 'ch/qos/logback/classic/Logger')}
0x00007f06b920b456: mov    r11d,DWORD PTR [r10+0x28] ;*getfield loggerContext
0x00007f06b920b45a: mov    r9d,DWORD PTR [r12+r11*8+0x60]  ;*getfield turboFilterList
0x00007f06b920b45f: mov    r8d,DWORD PTR [r12+r9*8+0x10] ;*getfield array
...     

非静态LOG程序集

[Verified Entry Point]
0x00007f1c592111e0: mov    DWORD PTR [rsp-0x14000],eax
0x00007f1c592111e7: push   rbp
0x00007f1c592111e8: sub    rsp,0x30           ;*synchronization entry
; - org.openjdk.jmh.samples.LogBench::getTimeWithoutCheckLocal@-1 (line 77)
0x00007f1c592111ec: mov    r11d,DWORD PTR [rsi+0xc]  ;*getfield localLog
0x00007f1c592111f0: mov    r10d,DWORD PTR [r12+r11*8+0x8] ; implicit exception: dispatches to 0x00007f1c59211281
0x00007f1c592111f5: cmp    r10d,0xf80154ad    ;   {metadata('ch/qos/logback/classic/Logger')}
0x00007f1c592111fc: jne    0x00007f1c5921123c
execution:
0x00007f1c592111fe: lea    r8,[r12+r11*8]     ;*invokeinterface debug
0x00007f1c59211202: mov    ecx,DWORD PTR [r8+0x28]  ;*getfield loggerContext                                                                                     

您可以注意到,在第二个实验中,JIT必须执行Logger字段值的额外加载:lea r9,[r12+r10*8]

让我们使用perfasm探查器再次运行基准测试

0.04%    0.04%  │↗  0x00007f6c25229320: mov    r10d,DWORD PTR [r8+0xc]  ;*getfield localLog
││                                                ; - org.openjdk.jmh.samples.LogBench::getTimeWithoutCheckLocal@1 (line 77)
││                                                ; - org.openjdk.jmh.samples.generated.LogBench_getTimeWithoutCheckLocal_jmhTest::getTimeWithoutCheckLocal_avgt_jmhStub@14 (line 163)
6.80%    7.29%  ││  0x00007f6c25229324: mov    r11d,DWORD PTR [r12+r10*8+0x8]
││                                                ; implicit exception: dispatches to 0x00007f6c252294a5
0.02%  ││  0x00007f6c25229329: cmp    r11d,0xf80197b1    ;   {metadata(&apos;ch/qos/logback/classic/Logger&apos;)}
││  0x00007f6c25229330: jne    0x00007f6c2522939b
││  0x00007f6c25229332: lea    r9,[r12+r10*8]     ;*invokeinterface debug
││  0x00007f6c25229336: mov    ecx,DWORD PTR [r9+0x28]  ;*getfield loggerContext

正如你所看到的,这个额外的加载不是免费的。这是因为存在许多更改最终变量的方法,所以对字段执行这种优化是不安全的。

由于实验特性jvm有一个特殊的选项-XX:+TrustFinalNonStaticFields,所以它必须与-XX:+UnlockExperimentalVMOptions一起使用。如果你用这个选项运行基准测试,你会看到另一个结果:

Benchmark                            Mode  Cnt  Score   Error  Units
LogBench.baseline                    avgt    3  2.124 ± 0.907  ns/op
LogBench.getTimeWithCheck            avgt    3  0.695 ± 0.231  ns/op
LogBench.getTimeWithCheckBenchLocal  avgt    3  1.608 ± 0.140  ns/op
LogBench.getTimeWithoutCheck         avgt    3  0.675 ± 0.075  ns/op
LogBench.getTimeWithoutCheckLocal    avgt    3  1.613 ± 0.176  ns/op

结果很奇怪,尽管现在没有额外的局部变量加载,内联被破坏,asm代码包含直接调用:

0x00007f2355205d33: call   0x00007f2355046020  ; OopMap{off=120}
;*invokespecial filterAndLog_1

结论

  • JVM不能信任最终字段,因此在基准测试中每次都必须从内存中加载它(但在99.999999%的应用程序中这不会是问题)
  • JVM有一个实验选项TrustFinalNonStaticFields,它似乎很不稳定,因为它破坏了CHA优化

结果或多或少与字段访问为总运行时间贡献10000毫秒的假设一致。没有支票,你就有一次开销,有了支票,你就会有两次开销。

差异如此之大仍然令人惊讶。

最新更新