c - Program that counts the occurrence of each word given by a text file which path is on arguments doesn't work

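Edit: I found that strtok is not getting the strings right for some reason. How can I fix this?

I have an exercise that says I must find the number of occurrences of each word, using calloc, and display the count next to the word (for example, the word "World" found 2 times in the text is displayed as: World 2). My algorithm doesn't work and the output is nonsense, for example: "1,78c1,78"

Here is my algorithm. I don't know what is wrong with it.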

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    FILE *file = fopen(argv[1], "r");
    if (!file) {
        perror("File opening failed");
        return 1;
    }
    int size = 1024;
    int *wordcount = calloc(size, sizeof(int));

    char **words = calloc(size, sizeof(char*));
    char line[1024];
    int p = 0;
    int i = 0;
    while (fgets(line, sizeof(line), file) != NULL) {
        char *word = strtok(line, " ");
        while (word != NULL) {
            for (int j = 0; j < i; j++) {
                if (strcmp(words[j], word) == 0) {
                    p = 1;
                    wordcount[j]++;
                    break;
                }
            }
            if (p == 0) {
                words[i] = strdup(word);
                wordcount[i]++;
                i++;
            }
            p = 0;
            word = strtok(NULL, " ");
        }
    }
    for (int i = 0; i < size; i++) {
        if (words[i] != NULL) {
            printf("%s %d", words[i], wordcount[i]);
            printf("\n");
        }
    }
    fclose(file);
    for (int i = 0; i < size; i++) {
        if (words[i] != NULL) {
            free(words[i]);
        }
    }
    free(words);
    free(wordcount);
    return 0;
}

This is one of the inputs I was given to run the tests with.

Computer science is the study of computation, automation, and information.[1] Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to practical disciplines (including the design and implementation of hardware and software).[2][3][4] Computer science is generally considered an area of academic research and distinct from computer programming.[5]

Check whether fopen succeeded

FILE *file = fopen(argv[1], "r");
if (!file) {
    perror("File opening failed");
    return 1;
}

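strtok does not allocate new memory

strtok returns a pointer into the string it is tokenizing, line. You are storing pointers into line in words. line is a buffer that is reused on every call to fgets. Each time a line is read, the contents of line change, and so do the words that the entries of words point to.

For example, given...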

The quick brown fox
jumped over the lazy grey dog.

After your first iteration, line and words look like this...

line = "The0quick0brown0fox\n"
words  0^  1^    2^    3^

strtok inserts null bytes into the string to separate the tokens, shown here as 0. words[0] is "The", words[1] is "quick", and so on.

line = "jumped0over0the0lazy0grey0dog.\n"
words  0^  1^    2^    3^
       4^     5^   6^  7^   8^   9^

words[4] is now "jumped", but so is words[0]. words[1] is "ed", words[2] is "r", and so on.

Instead, copy the word before putting the pointer in words.

if (p == 0) {
    words[i] = strdup(word);  /* store a copy, not a pointer into line */
    wordcount[i]++;
}

This technique of reading a line into a buffer and then copying the parts you need is a common way to read and parse files.
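To make the difference concrete, here is a minimal sketch. The struct token_pair and the helper take_first_token are hypothetical names made up for this illustration; only strtok and strdup come from the code above. It assumes a POSIX system, since strdup was not part of ISO C before C23.

```c
#define _POSIX_C_SOURCE 200809L  /* strdup is POSIX (ISO C only since C23) */
#include <stdlib.h>
#include <string.h>

/* Hypothetical pair for illustration: two views of the same first token. */
struct token_pair {
    char *alias;  /* points into the caller's buffer, nothing allocated */
    char *copy;   /* separate heap allocation, the caller must free it */
};

/* Tokenize `line` once and return both an aliasing pointer and a copy
 * of the first token. Both members are NULL if the line has no token. */
struct token_pair take_first_token(char *line) {
    struct token_pair t = { NULL, NULL };
    t.alias = strtok(line, " \n");     /* pointer into line itself */
    if (t.alias != NULL) {
        t.copy = strdup(t.alias);      /* independent copy of the token */
    }
    return t;
}
```

If line first holds "The quick brown fox\n" and is then overwritten with the next line, alias reads as the new contents while copy still holds "The". That is why the copy is what belongs in words.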

Check what you are comparing

for (int j = 0; j < i; j++) {
    if (strcmp(words[i], word) == 0) {
        p = 1;
        wordcount[j]++;
        break;
    }
}

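The intent is to check whether you have seen this word before. You are looping over j, but you are comparing against words[i]. At that point words[i] has not been filled in yet: it is still the zeroed slot left by calloc, so it can never match a word you stored earlier (and passing it to strcmp is not valid). It is as if you had never seen the word.

Compare against words[j] instead.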
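One way to make this kind of slip harder is to move the lookup into its own function, so the only index in scope is the loop variable. A minimal sketch, where find_word is a hypothetical name and not part of the original program:

```c
#include <string.h>

/* Hypothetical helper: return the index of `word` among the first
 * `count` entries of `words`, or -1 if it has not been stored yet. */
int find_word(char *const *words, int count, const char *word) {
    for (int j = 0; j < count; j++) {
        if (strcmp(words[j], word) == 0) {  /* index with j, the loop variable */
            return j;
        }
    }
    return -1;
}
```

The return value also replaces the p flag: a result of -1 means the word is new.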

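Don't walk off the end of the array

You always increment i, even when you don't add a new entry to words. That means i is a count of the number of words seen, not the length of words.

Once you have seen a word twice, for (int j=0; j < i; j++) { will walk past the entries you actually stored in words.

Instead, only increment i when you see a new word. i is also a poor variable name here; it is best reserved for loop counters. What it holds is the next index to use in words, so use something descriptive like next_words_idx.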
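A sketch of that bookkeeping, under some assumptions: struct word_table and record_word are hypothetical names introduced here, the two arrays mirror words and wordcount from the question, and strdup is taken from POSIX. The capacity check is an addition that the original code does not have.

```c
#define _POSIX_C_SOURCE 200809L  /* strdup is POSIX (ISO C only since C23) */
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the two parallel arrays from the question. */
struct word_table {
    char **words;
    int *wordcount;
    int next_words_idx;  /* next free slot, equal to the number of distinct words */
    int capacity;        /* number of slots in words and wordcount */
};

/* Count one occurrence of `word`.
 * Returns 0 on success, -1 if the table is full or strdup fails. */
int record_word(struct word_table *t, const char *word) {
    for (int j = 0; j < t->next_words_idx; j++) {
        if (strcmp(t->words[j], word) == 0) {
            t->wordcount[j]++;       /* seen before: the index does not move */
            return 0;
        }
    }
    if (t->next_words_idx >= t->capacity) {
        return -1;                   /* never write past the arrays */
    }
    char *copy = strdup(word);
    if (copy == NULL) {
        return -1;
    }
    t->words[t->next_words_idx] = copy;
    t->wordcount[t->next_words_idx] = 1;
    t->next_words_idx++;             /* advance only for a new word */
    return 0;
}
```

Because next_words_idx moves only when a slot is filled, every index below it refers to a stored word, and the lookup loop never touches an empty entry.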

Strip the newline

If you have...

one two three
one two

...you get two entries for "two": "two" and "two\n". You need to strip the trailing newline so they match.
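One common way to do this is to cut the line at its first newline before tokenizing. A minimal sketch, where strip_newline is a hypothetical helper name; strcspn is the standard library function that returns the length of the leading run containing none of the listed characters:

```c
#include <string.h>

/* Hypothetical helper: cut `line` at its first '\n', if there is one.
 * When there is no newline, strcspn returns strlen(line), so the
 * assignment just rewrites the existing terminator. */
void strip_newline(char *line) {
    line[strcspn(line, "\n")] = '\0';
}
```

An alternative with the same effect on matching is to pass " \n" as the delimiter string to strtok, so that the newline is treated as a separator and never ends up inside a token.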
