C hash function to reduce collisions

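I am using this hash function, but I am getting a lot of collisions. The idea is to add up the ASCII values of the characters and return the sum. Is there a way to optimize this function, or another function I could use, to reduce the number of collisions?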

int hash(char* s)
{
    int hash = 0;
    while(*s)
    {
        hash = hash + *s;
        s++;
    }
    return hash;
}

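A 32-bit int has a range of more than 4 billion values. (If your int is 64 bits, the range is far larger.) But your code only adds up the value of each character in the string, so it never gets anywhere near the upper limit. All of your hash codes will be small numbers, crowded into the low end of the possible values, which increases the chance of collisions.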
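To put a number on this: for a string of ASCII characters, the sum can never exceed 127 times the string length. Here is a minimal sketch of that bound; the names additive_hash and additive_hash_upper_bound are mine, not from the question, and the bound assumes plain 7-bit ASCII input.

```c
#include <string.h>

/* The additive hash from the question, returning an unsigned result. */
unsigned int additive_hash(const char *s)
{
    unsigned int hash = 0;
    while (*s) {
        hash += (unsigned char)*s;
        s++;
    }
    return hash;
}

/* Largest value additive_hash can return for an ASCII string of the
   same length as s (assumption: every character is at most 127). */
unsigned int additive_hash_upper_bound(const char *s)
{
    return 127u * (unsigned int)strlen(s);
}
```

For a 7-character key such as "foo bar" the bound is 889, so every 7-character ASCII key is squeezed into fewer than a thousand distinct values.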

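That is why a good algorithm is more complicated than this.

Here is an article that came up in a quick Google search.

"foo bar" and "bar foo" hash to the same value, right? Implement it so that both the ASCII value and the character's position in the string go into the hash; my naive guess is that this will reduce collisions significantly.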

int hash(char* s)
{
    int hash = 0;
    int pos = 0;
    while(*s)
    {
        pos++;
        hash += (*s * pos);
        s++;
    }
    return hash;
}

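Try this and see if it helps. I don't have much theoretical background behind this answer.

Edit: as pointed out below, you probably want the hash to be an unsigned int. I tested this on codechef.com; here are the source and the results: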

#include <stdio.h>

unsigned int hash(const char* s);
unsigned int hash2(const char* s);

int main(void) {
    unsigned int temp1 = hash("foo bar");
    unsigned int temp2 = hash("bar foo");
    printf("temp1 is %u and temp2 is %u\n", temp1, temp2);
    temp1 = hash2("foo bar");
    temp2 = hash2("bar foo");
    printf("temp1 is %u and temp2 is %u\n", temp1, temp2);
    return 0;
}

/* Plain sum of the character values: order is ignored. */
unsigned int hash(const char* s)
{
    unsigned int hash = 0;
    while(*s)
    {
        hash = hash + *s;
        s++;
    }
    return hash;
}

/* Each character is weighted by its 1-based position in the string. */
unsigned int hash2(const char* s)
{
    unsigned int hash = 0;
    int pos = 0;
    while(*s)
    {
        pos++;
        hash += (*s * pos);
        s++;
    }
    return hash;
}

With the output:

temp1 is 665 and temp2 is 665

temp1 is 2655 and temp2 is 2715

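Yes, your "hash" function will produce collisions for strings made up of the same letters, such as "rail safety" and "fairy tales". That is because you only use addition, which is commutative.

You could use something like this, which involves a prime number as a factor.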
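A quick way to see the commutativity problem is to compare the sums of two anagrams directly. This is only a sketch; sum_hash and same_sum_hash are illustrative names of mine, and sum_hash is the question's function with an unsigned result.

```c
/* The additive hash from the question: order of characters is ignored. */
unsigned int sum_hash(const char *s)
{
    unsigned int hash = 0;
    while (*s) {
        hash += (unsigned char)*s;
        s++;
    }
    return hash;
}

/* Returns 1 if the two strings collide under sum_hash, 0 otherwise. */
int same_sum_hash(const char *a, const char *b)
{
    return sum_hash(a) == sum_hash(b);
}
```

Any two strings that are permutations of each other, such as "rail safety" and "fairy tales" or "foo bar" and "bar foo", make same_sum_hash return 1.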

unsigned long int hashBetter(const char* s)
{
    unsigned long int hash = 1234567890ul;
    while(*s)
    {
        hash = (*s + hash) * 4294967291ul;
        s++;
    }
    return hash;
}
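If you want to use this in an actual table, the hash still has to be reduced to a slot number. Below is a rough sketch of how that might look. The helper bucket_index and the name hash_better are my own, and I read each byte as unsigned char, because plain char may be signed and bytes above 127 would otherwise be added as negative values.

```c
#include <stddef.h>

/* Same scheme as hashBetter: add the byte, then multiply by a prime.
   Difference (my assumption, not the original code): bytes are read
   as unsigned char so the result does not depend on char signedness. */
unsigned long hash_better(const char *s)
{
    unsigned long hash = 1234567890ul;
    while (*s) {
        hash = ((unsigned char)*s + hash) * 4294967291ul;
        s++;
    }
    return hash;
}

/* Hypothetical helper: map a key to a slot in a table with nbuckets
   slots. nbuckets must be greater than zero. */
size_t bucket_index(const char *key, size_t nbuckets)
{
    return (size_t)(hash_better(key) % nbuckets);
}
```

Because the multiplication is applied after every character, the order of the characters matters, so "foo bar" and "bar foo" no longer hash to the same value. Note that unsigned long is 32 bits on some platforms and 64 bits on others, so the actual hash values are not portable between them.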

Or you bring in a CRC, which spreads the input data widely across the available range of possible hash values:

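unsigned long int hashGood(const char* s)
{
    unsigned long int hash = 1234567890ul;
    while(*s)
    {
        /* crc() is not defined here; it stands for a CRC update step
           that folds one more byte into the running value. */
        hash = crc(hash, *s);
        s++;
    }
    return hash;
}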
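The crc function is not spelled out above, so here is one possible way to fill it in: a bitwise CRC-32 update using the common reflected polynomial 0xEDB88320. This is my assumption about what crc could be, not necessarily what the answer had in mind. Unlike hashGood, the wrapper hash_crc (also my name) uses the standard CRC-32 start value and final inversion instead of 1234567890, so the result can be checked against published CRC-32 values.

```c
#include <stdint.h>

/* Assumed implementation of crc(): fold one byte into a running
   CRC-32 value, one bit at a time (reflected polynomial 0xEDB88320). */
uint32_t crc(uint32_t state, unsigned char byte)
{
    state ^= byte;
    for (int bit = 0; bit < 8; bit++) {
        if (state & 1u)
            state = (state >> 1) ^ 0xEDB88320u;
        else
            state >>= 1;
    }
    return state;
}

/* Hash a NUL-terminated string with standard CRC-32 conventions:
   start from all ones and invert the final value. */
uint32_t hash_crc(const char *s)
{
    uint32_t hash = 0xFFFFFFFFu;
    while (*s) {
        hash = crc(hash, (unsigned char)*s);
        s++;
    }
    return hash ^ 0xFFFFFFFFu;
}
```

With these conventions hash_crc("123456789") gives 0xCBF43926, the usual CRC-32 check value. The bit-by-bit loop is slow compared with a table-driven version, but it keeps the sketch short.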
