我正在对几种算法进行基准测试，这些算法执行第k个最近邻问题的变体。当反复运行排序数据的算法时，我看到了令人不安的结果。

更新

似乎数据本身并没有被缓存，正如人们所说的那样。我运行了相同的n = 50000测试，但每次运行三个测试时都会在两个ArrayLists中生成随机点。我担心的加速还是发生了。这似乎是JITC的特性。

输出:

Stopping :: Brute Force. Time (ms): 21716
Stopping :: Sorted 'Smart' Brute Force. Time (ms): 24014
Stopping :: Quickselect. Time (ms): 17655
Stopping :: Brute Force. Time (ms): 22034
Stopping :: Sorted 'Smart' Brute Force. Time (ms): 23975
Stopping :: Quickselect. Time (ms): 18438
Stopping :: Brute Force. Time (ms): 20097
Stopping :: Sorted 'Smart' Brute Force. Time (ms): 15677
Stopping :: Quickselect. Time (ms): 14399
Stopping :: Brute Force. Time (ms): 20457
Stopping :: Sorted 'Smart' Brute Force. Time (ms): 15141
Stopping :: Quickselect. Time (ms): 14146
Stopping :: Brute Force. Time (ms): 20143
Stopping :: Sorted 'Smart' Brute Force. Time (ms): 15834
Stopping :: Quickselect. Time (ms): 14084
Stopping :: Brute Force. Time (ms): 20173
Stopping :: Sorted 'Smart' Brute Force. Time (ms): 15170
Stopping :: Quickselect. Time (ms): 13745
Stopping :: Brute Force. Time (ms): 19625
Stopping :: Sorted 'Smart' Brute Force. Time (ms): 14924
Stopping :: Quickselect. Time (ms): 15972
Stopping :: Brute Force. Time (ms): 19388
Stopping :: Sorted 'Smart' Brute Force. Time (ms): 15209
Stopping :: Quickselect. Time (ms): 13639
Stopping :: Brute Force. Time (ms): 19420
Stopping :: Sorted 'Smart' Brute Force. Time (ms): 14779
Stopping :: Quickselect. Time (ms): 13798
Stopping :: Brute Force. Time (ms): 19390
Stopping :: Sorted 'Smart' Brute Force. Time (ms): 15078
Stopping :: Quickselect. Time (ms): 13548

原始查询

背景/细节:

我有两个未排序的n大小的点数组列表- A和B。对于A中的每个点p，检查B中有多少个点与p的距离为d

算法:

蛮力
- 对于A中的每个点P1，检查b中的每个点P2
分类智能蛮力

按点的x值对C排序

扫描B，直到到达第一个P2。x大于(P1)x + d)

QuickSelect

创建一个临时数组列表C，并将B深拷贝到其中。

按点的x值对C排序

对于A中的每个点P1，通过C进行二分搜索，直到找到x值在P1范围内的点P2。x±d.

从找到的点向左和向右扫描。当到达d范围以外的点时停止扫描。

示例:

public void quickSelect(ArrayList<Point> field1, ArrayList<Point> input2, int distance){
    ArrayList<Point> field2 = new ArrayList<Point>();
    deepCopy(field2, input2);
    startTimer("Starting :: Quickselect.n");
    Collections.sort(field2, new ComparePoints());
    for(int i = 0; i < field1.size(); ++i){
        //Find pivot
        ...
        //Scan to its left until out of bounds.
        ...
        //Scan to its right until out of bounds.
        ...
    }
    endTimer("Stopping :: Quickselect. ");
    field2.clear();
}

其中deepCopy为:

private void deepCopy(ArrayList<Point> field1, ArrayList<Point> input1){
    for(int i = 0; i < input1.size(); ++i){
        field1.add(input1.get(i).getLocation());
    }
}

测试方法:

public static void main(String[] args){
    int points = 50000;
    ArrayList<Point> field1 = generator.makeGraph(points);
    ArrayList<Point> field2 = generator.makeGraph(points);
    GraphTester tester = new GraphTester(field1);
    int maxDistance = 300;
    for(int i = 0; i < 10; ++i){
        tester.bruteForce(field1, field2, maxDistance);
    }
    for(int i = 0; i < 10; ++i){
        tester.sortedBruteForce(field1, field2, maxDistance);
    }
    for(int i = 0; i < 10; ++i){
        tester.quickSelect(field1, field2, maxDistance);
    }
}

结果:

Stopping :: Brute Force. Time (ms): 22851
Stopping :: Brute Force. Time (ms): 22482
Stopping :: Brute Force. Time (ms): 20690
Stopping :: Brute Force. Time (ms): 21073
Stopping :: Brute Force. Time (ms): 20860
Stopping :: Brute Force. Time (ms): 21311
Stopping :: Brute Force. Time (ms): 20847
Stopping :: Brute Force. Time (ms): 21000
Stopping :: Brute Force. Time (ms): 20503
Stopping :: Brute Force. Time (ms): 21342
Stopping :: Sorted 'Smart' Brute Force. Time (ms): 23083
Stopping :: Sorted 'Smart' Brute Force. Time (ms): 22616
Stopping :: Sorted 'Smart' Brute Force. Time (ms): 15881
Stopping :: Sorted 'Smart' Brute Force. Time (ms): 15323
Stopping :: Sorted 'Smart' Brute Force. Time (ms): 15930
Stopping :: Sorted 'Smart' Brute Force. Time (ms): 15360
Stopping :: Sorted 'Smart' Brute Force. Time (ms): 16072
Stopping :: Sorted 'Smart' Brute Force. Time (ms): 15601
Stopping :: Sorted 'Smart' Brute Force. Time (ms): 16952
Stopping :: Sorted 'Smart' Brute Force. Time (ms): 15950
Stopping :: Quickselect. Time (ms): 18202
Stopping :: Quickselect. Time (ms): 18109
Stopping :: Quickselect. Time (ms): 14685
Stopping :: Quickselect. Time (ms): 14401
Stopping :: Quickselect. Time (ms): 14052
Stopping :: Quickselect. Time (ms): 14782
Stopping :: Quickselect. Time (ms): 14175
Stopping :: Quickselect. Time (ms): 14187
Stopping :: Quickselect. Time (ms): 13870
Stopping :: Quickselect. Time (ms): 14601

正如你所看到的，排序智能蛮力和快速选择算法开始的计算时间很高，但随着测试的重复，稳定到一个更好的数字。

虽然我正在清除临时数组列表，但我觉得排序后的数组列表以某种方式被JVM保留，并被重用/重新映射到新的数组列表指针。

我的问题是双重的，那么:

这看起来像是缓存的情况吗?
我能保证数据没有被缓存吗?
- 如果我每次生成图形时随机化点，我敢打赌会有更多的控制，但我不能确定算法是相当匹配的。

我不明白你为什么要深克隆数据。

这看起来像是缓存的情况吗?

不，这是JITC在你运行的东西上变得更好了。每个重要的Java基准测试都需要预热，在此期间收集统计信息，然后编译和大量优化代码的相关部分(HotSpot)。

更复杂的算法需要更多的预热。

我能保证数据没有被缓存吗?

我看不到任何缓存。谁会这么做?JITC无法理解你在做什么。如果它知道你一直在计算同样的东西，那么除了一次以外的所有时间都将是零。

如果我在每次生成图形时随机化点，我敢打赌会有更多的控制，但这样我就不能确定算法是公平匹配的。

如果你有这样的感觉，就去做吧。我猜你宁愿相信你的基准而不去做。

在注释后更新

Warmup:常见的情况是服务器运行了好几个小时(至少)，一直在做几乎相同的操作。对于这样的服务器，在最初几分钟内测量性能是没有意义的。另一种需要预热的情况是测量或优化非常快的操作，例如String。hashCode，通常会重复数千次。你的情况是一个长时间的计算似乎是一个例外。

尽管如此，我写的关于热身的东西是对你的观察的一个合理的解释。无论如何，Quickselect在所有的运行中都是赢家，所以选择是明确的。事实上，它后来得到了改善，这对胜利者来说是一个很好的奖励。接下来的位置不太清楚，但22851 vs 23083并不是真正的胜利，因为测量误差更大(因此可能是深度复制的成本)。如果您想了解更多信息(1次迭代应该足够了)，请重新运行基准测试几次(使用新的JVM)。

Java基准测试:确保对象在超出作用域后不被重用

原始查询

在注释后更新

相关内容

最新更新

热门标签：