Why is there a performance difference between LongStream reduce and sum?



I am using LongStream.rangeClosed to test the performance of summing numbers. When I measured it with JMH, I got the results below.

import java.util.concurrent.TimeUnit;
import java.util.stream.LongStream;
import java.util.stream.Stream;

import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Fork(value = 1, jvmArgs = {"-Xms4G", "-Xmx4G"})
@State(Scope.Benchmark)
@Warmup(iterations = 10, time = 10)
@Measurement(iterations = 10, time = 10)
public class ParallelStreamBenchmark {
    private static final long N = 10000000L;

    // Boxed Stream<Long>, sequential
    @Benchmark
    public long sequentialSum() {
        return Stream.iterate(1L, i -> i + 1).limit(N).reduce(0L, Long::sum);
    }

    // Boxed Stream<Long>, parallel
    @Benchmark
    public long parallelSum() {
        return Stream.iterate(1L, i -> i + 1).limit(N).parallel().reduce(0L, Long::sum);
    }

    // Primitive LongStream, explicit reduce
    @Benchmark
    public long rangedReduceSum() {
        return LongStream.rangeClosed(1, N).reduce(0, Long::sum);
    }

    // Primitive LongStream, built-in sum()
    @Benchmark
    public long rangedSum() {
        return LongStream.rangeClosed(1, N).sum();
    }

    @Benchmark
    public long parallelRangedReduceSum() {
        return LongStream.rangeClosed(1, N).parallel().reduce(0L, Long::sum);
    }

    @Benchmark
    public long parallelRangedSum() {
        return LongStream.rangeClosed(1, N).parallel().sum();
    }

    // Requests a GC after every single benchmark invocation
    @TearDown(Level.Invocation)
    public void tearDown() {
        System.gc();
    }
}

Benchmark                                        Mode  Cnt   Score   Error  Units
ParallelStreamBenchmark.parallelRangedReduceSum  avgt   10   7.895 ± 0.450  ms/op
ParallelStreamBenchmark.parallelRangedSum        avgt   10   1.124 ± 0.165  ms/op
ParallelStreamBenchmark.rangedReduceSum          avgt   10   6.832 ± 0.165  ms/op
ParallelStreamBenchmark.rangedSum                avgt   10  21.564 ± 0.831  ms/op

The only difference between rangedReduceSum and rangedSum is that the latter uses the built-in sum() function. Why is the performance difference so large?

I have verified that sum() ultimately calls reduce(0, Long::sum). Isn't that the same as calling reduce(0, Long::sum) directly, as rangedReduceSum does?
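For reference, here is a minimal sketch that checks the two pipelines return the same value, so any timing gap is not caused by different results. The class name SumVsReduce and its helper methods are hypothetical and only serve this illustration; N matches the constant used in the benchmark.

```java
import java.util.stream.LongStream;

public class SumVsReduce {
    // Same upper bound as N in the benchmark (assumption: taken from the question).
    static final long N = 10_000_000L;

    // Explicit reduction with identity 0 and Long::sum as the accumulator.
    static long viaReduce(long n) {
        return LongStream.rangeClosed(1, n).reduce(0, Long::sum);
    }

    // Built-in terminal operation.
    static long viaSum(long n) {
        return LongStream.rangeClosed(1, n).sum();
    }

    public static void main(String[] args) {
        long expected = N * (N + 1) / 2; // closed form for 1 + 2 + ... + N
        System.out.println(viaReduce(N) == expected); // prints true
        System.out.println(viaSum(N) == expected);    // prints true
    }
}
```

Both calls produce 50000005000000 for N = 10000000, which only confirms functional equivalence. It says nothing about how the JIT compiles either call site.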

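I ran the same benchmark as the OP and could reproduce exactly the same results: rangedSum is about 3 times slower than rangedReduceSum. But things got interesting when I changed the warmup to only one iteration: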

# Benchmark: test.ParallelStreamBenchmark.rangedReduceSum
# Warmup Iteration   1: 3.619 ms/op
Iteration   1: 3.931 ms/op
Iteration   2: 3.927 ms/op
Iteration   3: 3.834 ms/op
Iteration   4: 4.006 ms/op
Iteration   5: 4.605 ms/op
Iteration   6: 6.454 ms/op
Iteration   7: 6.466 ms/op
Iteration   8: 6.328 ms/op
Iteration   9: 6.370 ms/op
Iteration  10: 6.244 ms/op

# Benchmark: test.ParallelStreamBenchmark.rangedSum
# Warmup Iteration   1: 3.971 ms/op
Iteration   1: 4.034 ms/op
Iteration   2: 3.970 ms/op
Iteration   3: 3.957 ms/op
Iteration   4: 4.024 ms/op
Iteration   5: 4.278 ms/op
Iteration   6: 19.302 ms/op
Iteration   7: 19.132 ms/op
Iteration   8: 19.189 ms/op
Iteration   9: 18.842 ms/op
Iteration  10: 18.292 ms/op

Benchmark                                Mode  Cnt   Score    Error  Units
ParallelStreamBenchmark.rangedReduceSum  avgt   10   5.216 ±  1.871  ms/op
ParallelStreamBenchmark.rangedSum        avgt   10  11.502 ± 11.879  ms/op

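Both benchmarks slow down significantly after the 5th iteration. rangedReduceSum goes from about 4 ms/op to about 6.4 ms/op, while rangedSum goes from about 4 ms/op to about 19 ms/op. If the warmup iterations are counted as iterations too, this would explain the original numbers: with 10 warmup iterations, the slowdown has already set in before measurement starts. It looks like a bug in the benchmark library, which does not seem to play well with GC. But as the warning says, benchmark results in a case like this are for reference only.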
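One way to cross-check whether the slowdown comes from the harness is to time the two pipelines by hand and print every run. The following is a rough sketch under stated assumptions, not a replacement for JMH: the class ManualTiming, the method timeEach, and the gcBetweenRuns flag are all hypothetical names introduced here, and a plain System.nanoTime loop is itself subject to JIT and GC noise.

```java
import java.util.function.LongSupplier;
import java.util.stream.LongStream;

public class ManualTiming {
    static final long N = 10_000_000L;

    // Written on every run so the JIT cannot discard the computed result.
    static volatile long sink;

    // Runs the task `iterations` times and returns each run's elapsed time in ms.
    // If gcBetweenRuns is true, a GC is requested after each run, outside the
    // timed region (mimicking the per-invocation tearDown of the benchmark).
    static double[] timeEach(LongSupplier task, int iterations, boolean gcBetweenRuns) {
        double[] millis = new double[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            sink = task.getAsLong();
            millis[i] = (System.nanoTime() - start) / 1_000_000.0;
            if (gcBetweenRuns) {
                System.gc();
            }
        }
        return millis;
    }

    static void report(String name, double[] millis) {
        for (int i = 0; i < millis.length; i++) {
            System.out.printf("%s run %d: %.3f ms%n", name, i + 1, millis[i]);
        }
    }

    public static void main(String[] args) {
        report("reduce", timeEach(() -> LongStream.rangeClosed(1, N).reduce(0, Long::sum), 20, true));
        report("sum", timeEach(() -> LongStream.rangeClosed(1, N).sum(), 20, true));
    }
}
```

Running it once with gcBetweenRuns set to true and once with false gives a crude indication of whether the explicit System.gc() calls are related to the slowdown. The numbers will not be directly comparable to the JMH scores.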
