Why does the order in my HashSet never change?

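I am storing strings (long sentences) in a HashSet, and I am trying to shuffle them on every run of the program to get the sentences in a random order, but that is not happening.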

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ThreadLocalRandom;

public class testshuffle {
    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) {
            run();
        }
    }

    public static void run() {
        ArrayList<String> list = new ArrayList<>();
        Set<String> set = new HashSet<>();
        list.add("Alexandria And Mimy are good people");
        list.add("Bob And Alexandria are better than Mimy");
        list.add("Camelia And Johanness are better than Bob And Alexandria");
        // Shuffle the list, then copy its elements into the set
        Collections.shuffle(list, ThreadLocalRandom.current());
        set.addAll(list);
        System.out.println(set);
    }
}

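I know that the order of a HashSet is not guaranteed. With Integer or Double elements, the returned hash codes can make the elements come out sorted.

But here I am using strings, and the output is: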

[Alexandria And Mimy are good people, Bob And Alexandria are better than Mimy, Camelia And Johanness are better than Bob And Alexandria]
[Alexandria And Mimy are good people, Bob And Alexandria are better than Mimy, Camelia And Johanness are better than Bob And Alexandria]
[Alexandria And Mimy are good people, Bob And Alexandria are better than Mimy, Camelia And Johanness are better than Bob And Alexandria]
[Alexandria And Mimy are good people, Bob And Alexandria are better than Mimy, Camelia And Johanness are better than Bob And Alexandria]
[Alexandria And Mimy are good people, Bob And Alexandria are better than Mimy, Camelia And Johanness are better than Bob And Alexandria]
[Alexandria And Mimy are good people, Bob And Alexandria are better than Mimy, Camelia And Johanness are better than Bob And Alexandria]
[Alexandria And Mimy are good people, Bob And Alexandria are better than Mimy, Camelia And Johanness are better than Bob And Alexandria]
.
.
.
[Alexandria And Mimy are good people, Bob And Alexandria are better than Mimy, Camelia And Johanness are better than Bob And Alexandria]

Please do not mark this as a duplicate, because it differs from the cases I found here.

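HashSet uses the computed hash codes to place these strings into buckets.

According to the String hashCode() contract, two equal strings have the same hash code within the same JVM. This means that as long as the contents of the strings stay the same, their hash codes stay the same as well.

That said, the actual hashCode() implementation, as well as the way HashSet maps hash codes to buckets, may differ from one JVM version to another and/or from one JVM vendor to another. So do not rely on it entirely, even if it appears to behave predictably in your case.

String hashCode() JavaDoc: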

/**
 * Returns a hash code for this string. The hash code for a
 * {@code String} object is computed as
 * <blockquote><pre>
 * s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
 * </pre></blockquote>
 * using {@code int} arithmetic, where {@code s[i]} is the
 * <i>i</i>th character of the string, {@code n} is the length of
 * the string, and {@code ^} indicates exponentiation.
 * (The hash value of the empty string is zero.)
 *
 * @return a hash code value for this object.
 */
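To make the formula concrete, here is a minimal sketch that recomputes it by hand and compares the result with String.hashCode(). The class and method names (StringHashSketch, manualHash) are made up for illustration and are not part of the JDK.

```java
final class StringHashSketch {
    // Evaluates s[0]*31^(n-1) + ... + s[n-1] with Horner's rule.
    // int overflow wraps around, which matches the "int arithmetic" in the JavaDoc.
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String sentence = "Alexandria And Mimy are good people";
        // The hand-computed value agrees with the built-in one
        System.out.println(manualHash(sentence) == sentence.hashCode()); // true
        // 'B' = 66, 'o' = 111, 'b' = 98: 66*31*31 + 111*31 + 98
        System.out.println(manualHash("Bob")); // 66965
    }
}
```

Because the value depends only on the characters, the same sentence yields the same hash code on every run, which is the first ingredient of the stable order seen in the question.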

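"The order of a HashSet is not guaranteed"

That is not entirely true: which order? If you mean the natural order (1<2, a

If you change the code to the following: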

list.add("Alexandria");
list.add("Bob");
list.add("Camelia");

The result is:

[Bob, Camelia, Alexandria]
[Bob, Camelia, Alexandria]
[Bob, Camelia, Alexandria]

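See? No alphabetical order!

This adds to the other answers and comments, but it seems the OP still does not understand, so I will try to give an example.

The structure of a HashSet is an array of buckets. A bucket contains zero, one, or several elements of the set. If a bucket holds several elements, they are stored in a linked list inside that bucket.

(Note that this is a simplification: HashSet is more complex than this and can start using trees under certain conditions.)

When an element is added to a HashSet, the bucket used to store it is chosen deterministically from the element's hashCode.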
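As a rough sketch of what such a deterministic function can look like, the code below mirrors the spreading step and index mask used by HashMap (which backs HashSet) in OpenJDK 8 and later. This is an implementation detail, not a documented guarantee, and the names BucketIndexSketch, spread, and bucketIndex are hypothetical. It assumes the bucket count is a power of two, with 16 being OpenJDK's default table size.

```java
final class BucketIndexSketch {
    // Mixes the high bits of the hash code into the low bits
    // (the approach taken by OpenJDK 8+ HashMap; assumed here, not specified by the API)
    static int spread(int hashCode) {
        return hashCode ^ (hashCode >>> 16);
    }

    // Picks a bucket; bucketCount is assumed to be a power of two
    static int bucketIndex(Object element, int bucketCount) {
        return spread(element.hashCode()) & (bucketCount - 1);
    }

    public static void main(String[] args) {
        // Single-character strings have small hash codes: 65, 66, 67
        System.out.println(bucketIndex("A", 16)); // 1
        System.out.println(bucketIndex("B", 16)); // 2
        System.out.println(bucketIndex("C", 16)); // 3
    }
}
```

Nothing in this computation involves randomness or insertion order, so the same element always lands in the same bucket for a given table size.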

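So, suppose the HashSet has 7 buckets, b1 through b7.

Suppose you add 3 elements, A, B, and C, to the HashSet.

Imagine that the deterministic function used to choose the bucket returns

b1 for A
b2 for B
b3 for C

So you end up with a structure like this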

[
b1 -> A,
b2 -> B,
b3 -> C,
b4 -> <empty>
b5 -> <empty>
b6 -> <empty>
b7 -> <empty>
]

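When iterating, the HashSet does not iterate randomly. It simply goes from one bucket to the next and always prints A, then B, then C. Since the function that chooses the bucket is deterministic, A, B, and C will always be stored in b1, b2, and b3 respectively, regardless of the insertion order.

That is why you always get the same order.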
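The effect can be checked with a small experiment. The sketch below inserts the strings "A", "B", and "C" in different orders and records the iteration order each time. The class and method names are hypothetical, and the claim that these three strings occupy three different buckets is an assumption about OpenJDK's HashSet with its default table size.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

final class DistinctBucketsDemo {
    // Builds a HashSet from the given insertion order and returns its iteration order
    static List<String> iterationOrder(List<String> insertionOrder) {
        return new ArrayList<>(new HashSet<>(insertionOrder));
    }

    public static void main(String[] args) {
        // Different insertion orders, same elements
        System.out.println(iterationOrder(Arrays.asList("A", "B", "C")));
        System.out.println(iterationOrder(Arrays.asList("C", "B", "A")));
        System.out.println(iterationOrder(Arrays.asList("B", "C", "A")));
    }
}
```

When each element sits in its own bucket, all three lines print the same sequence, just like the 100 identical lines in the question.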

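Now, suppose A, B, and C have the same hash code. Or at least, suppose the function that finds the bucket from the hash code returns the same bucket for A, B, and C: b3.

If you insert A, then B, then C, you end up with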

[
b1 -> <empty>,
b2 -> <empty>,
b3 -> A -> B -> C
b4 -> <empty>
b5 -> <empty>
b6 -> <empty>
b7 -> <empty>
]

But if you insert C, then B, then A, you end up with

[
b1 -> <empty>,
b2 -> <empty>,
b3 -> C -> B -> A
b4 -> <empty>
b5 -> <empty>
b6 -> <empty>
b7 -> <empty>
]

So when iterating over the HashSet, the order will differ depending on the insertion order.
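This collision case can be reproduced with real strings. The strings "Aa", "BB", and "C#" all have the hash code 2112 under the documented String formula, so they necessarily share a bucket. The sketch below uses hypothetical class and method names, and the exact order inside a bucket is an OpenJDK implementation detail rather than something the API promises.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SameBucketDemo {
    // Inserts the elements one by one and returns the resulting iteration order
    static List<String> iterationOrder(String... insertionOrder) {
        Set<String> set = new HashSet<>();
        for (String s : insertionOrder) {
            set.add(s);
        }
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        // All three strings collide: 65*31+97 == 66*31+66 == 67*31+35 == 2112
        System.out.println("Aa".hashCode() + " " + "BB".hashCode() + " " + "C#".hashCode());
        // With colliding elements the iteration order follows the insertion order
        System.out.println(iterationOrder("Aa", "BB", "C#"));
        System.out.println(iterationOrder("C#", "BB", "Aa"));
    }
}
```

Here the two printed sets differ, which shows that the stable order in the question is a side effect of how those particular sentences hash, not a property you can count on.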

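TL;DR: a HashSet is free to order its elements however it wants, so you should not rely on the order of the elements in a HashSet. Just use your list directly, since it is shuffled and provides an ordering guarantee.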
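A minimal sketch of that advice, using the sentences from the question: shuffle a copy of the list and print the list itself. The class and method names are hypothetical. If duplicates also need to be removed, wrapping the shuffled list in a LinkedHashSet is one option, since that class iterates in insertion order; this is a suggestion beyond what the answer itself states.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ThreadLocalRandom;

final class ShuffledSentences {
    // Returns a shuffled copy so the caller's list is left untouched
    static List<String> shuffledCopy(List<String> sentences) {
        List<String> copy = new ArrayList<>(sentences);
        Collections.shuffle(copy, ThreadLocalRandom.current());
        return copy;
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList(
                "Alexandria And Mimy are good people",
                "Bob And Alexandria are better than Mimy",
                "Camelia And Johanness are better than Bob And Alexandria");
        List<String> shuffled = shuffledCopy(sentences);
        // The list keeps whatever order the shuffle produced
        System.out.println(shuffled);
        // Optional: drop duplicates while keeping the shuffled order
        Set<String> unique = new LinkedHashSet<>(shuffled);
        System.out.println(unique);
    }
}
```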
