c-你们有没有一个简单的散列函数可以取前三个字母

我在cs50拼写器工作。这是我的散列函数。它运行良好。但我认为这不是很好，因为有很多if的情况。我试着为前两个字母有撇号的每个单词(例如，a'，B'，C'(制作一个不同的表格。

所以我用了水桶(26 x 26 x 26(+26=17602；

unsigned int hash(const char *word)
{
/*
this is HASHING algorithms in which we return value with checking first three letter
make sure every word set to lowercase so we can easily convert as integer value or alphabetical index
and also set every first element before first letter to be a place for every apostrophe that comes at second word
[A'=0] [Aaa=1] [Aab=2] ... [Aay=25] [Aaz=26] ... [B'=677] [Baa=678] --- base 676 number
n = ( 677*first(word ) 26*second(word) + third(word) ) + 1;
*/
int hash_value = 0;
if(strlen(word) <= 1)                                       // if there's only one word calculate that word, store to element after apostrophe
{
hash_value = ((677 * (tolower(word[0]) - 97)) + 1);
return hash_value;
}
else if(!isalpha(word[1]))                                  // if second word contain apostrophe just calucalte first word
{
hash_value = (677 * (tolower(word[0]) - 97));
return hash_value;
}
else if(strlen(word) <= 2)                                  // if there's two word calculate that two word, store to element after apostrophe
{
hash_value = ((677 * (tolower(word[0]) - 97)) + (27 * (tolower(word[1]) - 97))) + 1;
return hash_value;
}
else if(!isalpha(word[2]))                                  // if third word contain apostrophe just calucalte first and two word
{
hash_value = ((677 * (tolower(word[0]) - 97)) + (27 * (tolower(word[1]) - 97))) + 1;
return hash_value;
}
else
{
hash_value = ((677 * (tolower(word[0]) - 97)) + (27 * (tolower(word[1]) - 97)) + (tolower(word[2]) - 97)) + 1;
return hash_value;
}
}

它实际上运行得很好。

./speller texts/lalaland.txt
WORDS MISSPELLED:     955
WORDS IN DICTIONARY:  143091
WORDS IN TEXT:        17756
TIME IN load:         0.02
TIME IN check:        0.02
TIME IN size:         0.00
TIME IN unload:       0.00
TIME IN TOTAL:        0.05

我只是不喜欢我使用很多ELSE..IF条件的方式。所以，也许你想帮助我更好的代码(使用前三个字母(。

在某些方面，您正在使用if语句实现for循环。像这样的想法

int hash = 0;
for (char *current = word; *current != ''; current++) {
if (current == &word[2]) break;
hash += hash(*current);
}

可能允许您避免if语句，因为现在索引处理的末尾在循环中重复。

我不知道我是否知道这个散列的逻辑，但：

如果参数不是大写字母，则

tolower返回相同的字符。无需检查是否为alpha

unsigned int hash1(const char *word)
{
unsigned int result;
result = (!!word[0]) * (677 * (tolower(word[0]) - 97));
result += (word[0] && word[1]) * (27 * (tolower(word[1]) - 97));
result += (word[0] && word[1] && word[2]) * (tolower(word[2]) - 97) + 1;
return result;
}

只需测试isalpha()并利用isalpha(0)为false
只有3个char，几个if()就足够了。

_Static_assert(isalpha(0) == 0, "isalpha(0) unexpectedly true");
unsigned int hashf(const char *word) {
// best to use unsigned char access for tolower() to avoid UB.
const unsigned char *uword = word;
unsigned hash = 677u * (tolower(uword[0]) - 'a') + 1;
if (isalpha(uword[0]) && isalpha(uword[1])) {
hash += 27u * (tolower(uword[1]) - 'a');
if (isalpha(uword[2])) hash += tolower(uword[2]) - 'a';
}
return hash;  // see below
}

简化的可能性：不需要- 97或- 'a'。我们正在生成散列，但需要最后的模。

#define BUCKET_N 17602u
return hash % BUCKET_N;

IAC，一个由bucket大小进行最后模运算的函数，可以实现健壮的散列。考虑OP的hash("~")。OP的代码假设字符值-97在0到25的范围内，但是当一个字符是á时会发生什么？注意[0 CHAR_MAX]之外的角色值。

相关内容

最新更新

热门标签：