效率:HashMap检查包含Key()与始终put()

如果我有一个HashMap，并且我想插入有很多重复项的键值对，最好这样做吗：if (!map.containsKey(key)) map.put(key, value);还是简单地执行map.put(key,value);，而不管密钥/值对是否已经存在？

几乎可以肯定的是，这并没有真正的区别。首先获取正确的代码。它的简介(你似乎没有做过，或者你知道这个案例(。不要随意优化/混淆东西，因为这似乎是个好主意。

通常，最大的性能打击将是加载和编译代码。因此，最短、最简单的代码可能已经是最优的了。

如果您不关心重复，只需执行map.put(key, value);即可。这样，就不需要进行额外的O(1(查找(包括对key进行哈希处理，然后检查它是否存在(。

在最坏的情况下，这可以为重复插入完全相同的(key, value)对的情况节省n-1的额外查找。

您应该使用方法putIfAbsent。如果您担心覆盖该值。此方法在Java 8以后版本中可用。

它具有与put相同的时间复杂度。

如果您可以使用最新的值进行覆盖，那么只需使用put即可。

put和contain都有相同的渐近运行时间，因此使用两者在最大值时将增加2倍，这是恒定的。所以你的渐近运行时不会改变。但是，由于put只会覆盖任何现有值(甚至返回旧值，如果是新值，则返回null(，因此它的速度是原来的2倍。

Map::put用相同的键覆盖条目，因此在不检查映射是否已包含该键的情况下使用.put会更高效。

我也处于类似的情况，决定编写一个快速的小测试来最终决定：

private static void TestMapAccess(int loops) {
HashMap<Integer, Integer> mapA = new HashMap<Integer, Integer>();
HashMap<Integer, Integer> mapB = new HashMap<Integer, Integer>();
HashMap<Integer, Integer> mapC = new HashMap<Integer, Integer>();

for (int i=0; i<10000; i++) {
mapA.put(i, 1);
mapB.put(i, 1);
mapC.put(i, 1);
}

System.out.println("=====" + loops + "=====");
System.out.println("-----------------------------------------------------------");
System.out.println("run: containsKey()");
long startTime = System.nanoTime();

for (int i=0; i<loops; i++) {
int rand = ThreadLocalRandom.current().nextInt(0,10050);
if (!mapA.containsKey(rand)) {
mapA.put(rand, 1);
}
}
long estimatedTime = System.nanoTime() - startTime;
System.out.println("estimatedTime: " + estimatedTime / 1_000_000_000.);

System.out.println("-----------------------------------------------------------");
System.out.println("run: just put it");
startTime = System.nanoTime();

for (int i=0; i<loops; i++) {
int rand = ThreadLocalRandom.current().nextInt(0,10050);
mapB.put(rand, 1);
}
estimatedTime = System.nanoTime() - startTime;
System.out.println("estimatedTime: " + estimatedTime / 1_000_000_000.);


System.out.println("-----------------------------------------------------------");
System.out.println("run: put if absent");
startTime = System.nanoTime();

for (int i=0; i<loops; i++) {
int rand = ThreadLocalRandom.current().nextInt(0,10050);
mapC.putIfAbsent(rand, 1);
}
estimatedTime = System.nanoTime() - startTime;
System.out.println("estimatedTime: " + estimatedTime / 1_000_000_000.);
}

在各种循环参数值下运行的输出：

=====100000=====
-----------------------------------------------------------
run: containsKey()
estimatedTime: 0.019388125
-----------------------------------------------------------
run: just put it
estimatedTime: 0.011162
-----------------------------------------------------------
run: put if absent
estimatedTime: 0.012437
=====1000000=====
-----------------------------------------------------------
run: containsKey()
estimatedTime: 0.03442175
-----------------------------------------------------------
run: just put it
estimatedTime: 0.030184333
-----------------------------------------------------------
run: put if absent
estimatedTime: 0.01281975
=====100000000=====
-----------------------------------------------------------
run: containsKey()
estimatedTime: 1.734175125
-----------------------------------------------------------
run: just put it
estimatedTime: 0.855194542
-----------------------------------------------------------
run: put if absent
estimatedTime: 0.798932875
=====2000000000=====
-----------------------------------------------------------
run: containsKey()
estimatedTime: 14.871508583
-----------------------------------------------------------
run: just put it
estimatedTime: 17.005798667
-----------------------------------------------------------
run: put if absent
estimatedTime: 15.617718958

我跑了很多次，结果只有细微的变化。实际上，大多数时候，当循环只有100000时，containsKey的速度最慢，是2倍或更多。当你将循环增加几个数量级时，它会慢慢变平，甚至变得更有效率。最初我只测试前两个案例，如果没有，就不测试put，我猜测这是由于java垃圾收集。根据我在网上的发现，当你put((覆盖一个现有的值时，它会删除它，垃圾收集器需要清理它，这似乎否定了不进行containsKey检查的任何效率

有趣的是，使用putIfAbsent，我们可以看到containsKey和put之间的混合。我想，在一天结束时，最高循环值的3个案例之间只有几秒钟的时间，但一直都是这样。

相关内容

最新更新

热门标签：