Counting the number of letters in a file in C

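I'm trying to write a program that reads from a file and counts the occurrences of each alphabetic character in it. Below is what I have so far, but the counts it returns (stored in the counters array) are higher than expected.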

void count_letters(const char *filename, int counters[26]) {
    FILE* in_file = fopen(filename, "r");
    const char ALPHABET[] = "abcdefghijklmnopqrstuvwxyz";
    if(in_file == NULL){
        printf("Error(count_letters): Could not open file %s\n",filename);
        return;
    }
    char line[200];
    while(fgets(line, sizeof(line),in_file) != NULL){ //keep reading lines until there's nothing left to read
        for(int pos = 0; pos < sizeof(line); pos++){//iterate through each character in line...
            if(isalpha(line[pos])){//skip checking and increment position if current char is not alphabetical
                for(int i = 0; i < 26; i++){//...for each character in the alphabet
                    if(tolower(line[pos]) == tolower(ALPHABET[i]))//upper case and lower case are counted as same
                        counters[i]++;    // increment the current element in counters for each match in the line
                }
            }
        }
    }
    fclose(in_file);
    return;
}

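In for(int pos = 0; pos < sizeof(line); pos++), sizeof(line) evaluates to the size of the entire array line, not to the part that the most recent fgets call filled. So after a long line has been read, the loop recounts the leftover characters still sitting in the array whenever a shorter line is read.

Change the loop so that it iterates only over the part of line that the most recent fgets call filled. You can exit the loop as soon as you see the null character.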
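A minimal sketch of that fix, with the per-line work pulled out into a helper. The name count_line is hypothetical (it is not in the question), and it assumes the caller has already zeroed counters:

```c
#include <ctype.h>

/* Hypothetical helper, not part of the original code: counts the letters
   in one NUL-terminated line. counters must hold 26 ints that the caller
   has already initialized. */
void count_line(const char *line, int counters[26]) {
    const char ALPHABET[] = "abcdefghijklmnopqrstuvwxyz";
    /* Stop at the '\0' that fgets writes, not at the size of the buffer. */
    for (int pos = 0; line[pos] != '\0'; pos++) {
        /* Convert to unsigned char before calling the <ctype.h> functions. */
        unsigned char ch = (unsigned char)line[pos];
        if (!isalpha(ch))
            continue;
        for (int i = 0; i < 26; i++) {
            if (tolower(ch) == ALPHABET[i])
                counters[i]++;
        }
    }
}
```

Inside the while(fgets(...)) loop you would then call count_line(line, counters) instead of running the pos loop up to sizeof(line).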

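I have a simpler solution (you have a lot of loops ;)). Reading input line by line is preferable in most cases, but since you are simply counting characters here, I don't think this is one of them, and it ends up adding complexity. This answer also assumes ASCII character encoding, which the C standard does not guarantee, as pointed out in a comment on another answer. You can modify it to use your char ALPHABET as needed for ultimate portability.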

#include <stdio.h>
#include <stdlib.h>   // for exit
#include <ctype.h>
#define NUM_LETTERS 26
int main(void)
{
    FILE* in_file = fopen("/path/to/my/file.txt", "r");
    if (in_file == NULL) exit(EXIT_FAILURE);
    unsigned charCounts[NUM_LETTERS] = {0};
    int curChar;
    // rather than reading line-by-line, read one character at a time
    while ((curChar = fgetc(in_file)) != EOF)
    {
        // only proceed if it is a letter
        if (isalpha(curChar))
        {
            // this is bad if not using ASCII, instead you'd need another
            // loop to check your ALPHABET, but increment the count here
            (charCounts[tolower(curChar) - 'a'])++;
        }
    }
    fclose(in_file);
    // print out the results
    for (int i=0; i<NUM_LETTERS; i++)
    {
        // 'A'+i also assumes ASCII encoding
        printf("%c: %u\n", 'A'+i, charCounts[i]);
    }
}

Demo using stdin instead of a file.
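If you do want the portable variant, the ALPHABET lookup could be sketched roughly like this. letter_index is a hypothetical name, and the approach (strchr over the alphabet string) is just one way to avoid relying on contiguous letter codes:

```c
#include <ctype.h>
#include <limits.h>
#include <string.h>

/* Hypothetical helper: maps a character (as returned by fgetc) to an index
   0..25, or -1 if it is not one of the 26 letters in ALPHABET. It does not
   assume that the letters have consecutive codes. */
int letter_index(int ch) {
    static const char ALPHABET[] = "abcdefghijklmnopqrstuvwxyz";
    /* Reject EOF, other out-of-range values, and '\0'. The '\0' check
       matters because strchr would otherwise match the string terminator. */
    if (ch <= 0 || ch > UCHAR_MAX)
        return -1;
    const char *hit = strchr(ALPHABET, tolower(ch));
    return hit ? (int)(hit - ALPHABET) : -1;
}
```

In the read loop you would then write int i = letter_index(curChar); and increment charCounts[i] only when i >= 0.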

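You have an error in for(int pos = 0; pos < sizeof(line); ... . It assumes that all 200 positions in the array hold valid characters, but that would only be true if every line filled the whole buffer. You should count only the characters in the initialized part of the string, whose length differs from line to line: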

for(int pos = 0; pos < strlen(line); ...

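The innermost loop isn't needed either, because the alphabetic characters will most likely have consecutive ASCII codes:

if(isalpha(line[pos]))
    counters[tolower(line[pos]) - 'a']++;

I'm assuming counters has been initialized to 0 beforehand. If not, the array must be initialized before counting.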
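Put together, the per-line counting might look like the sketch below. count_line_direct is a hypothetical name, and the code assumes ASCII (or any encoding where 'a'..'z' are consecutive). It uses an explicit range check rather than isalpha so that a locale-specific letter can never index outside the array:

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: counts the letters in one NUL-terminated line by
   indexing directly into counters. Assumes 'a'..'z' have consecutive
   codes (true for ASCII) and that the caller has zeroed counters. */
void count_line_direct(const char *line, int counters[26]) {
    size_t len = strlen(line);  /* computed once, not on every iteration */
    for (size_t pos = 0; pos < len; pos++) {
        int lower = tolower((unsigned char)line[pos]);
        if (lower >= 'a' && lower <= 'z')
            counters[lower - 'a']++;
    }
}
```

A caller can declare int counters[26] = {0}; to get the zero-initialized array this relies on.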

You don't need to use fgets: the character functions work just as fast, because the stream does its own buffering.

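#include <stdio.h>
#include <string.h>
#include <ctype.h>

#define NLETTERS    ('z' - 'a' + 1)

/* Counts the letters read from fi into counter, which must have room for
   NLETTERS elements. Returns 0 on success, 1 if either argument is NULL. */
int countLetters(FILE *fi, size_t *counter)
{
    int ch;
    if(fi && counter)
    {
        /* zero all NLETTERS elements, not just the first one */
        memset(counter, 0, sizeof(*counter) * NLETTERS);
        while((ch = fgetc(fi)) != EOF)
        {
            if(isalpha(ch))
            {
                counter[tolower(ch) - 'a']++;
            }
        }
        return 0;
    }
    return 1;
}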
