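I have a very large text file that I want to run word analysis on. Besides word counts I may look for other information too, but I have left that out for simplicity. In this text file, blocks of text are separated by asterisks "*". The code below scans the text file and prints the number of characters and words as it should, but I want to reset the counters every time an asterisk is encountered and store all the results in some kind of table. I'm not too worried about how to build the table; what I'm unsure about is how to loop the same counting code over each block of text between asterisks.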
Maybe with a for loop like this:
for (arr = strstr(arr, "*"); arr; arr = strstr(arr + strlen("*"), "*"))
Sample text file:
=-=-=-=-=-=-=-=-=-=-=-=-=-=-
I have a sentence. I have two sentences now.
*
I have another sentence. And another.
*
I'd like to count the amount of words and characters from the asterisk above this
one until the next asterkisk, not including the count from the last one.
*
...
...
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
(EOF)
Desired output:
*# #words #alphaChar
----------------------------
1 9 34
-----------------------------
2 5 30
-----------------------------
3 28 124
...
...
What I have tried:
#include <ctype.h>   // isalpha
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    int characterCount = 0;
    int counterPosition, wordCount = 0, alphaCount = 0;
    int i = 0;

    // input file
    FILE *file = fopen("test.txt", "r");
    if (file == NULL)
    {
        printf("Cannot find the file.\n");
        return 1;
    }

    // Count total number of characters in file
    while (1)
    {
        counterPosition = fgetc(file);
        if (counterPosition == EOF)
            break;
        ++characterCount;
    }
    rewind(file); // Sends the pointer to the beginning of the file

    // Allocate on the heap: the file may be far too large for a stack array
    char *arr = malloc(characterCount);
    if (arr == NULL)
    {
        fclose(file);
        return 1;
    }
    while (i < characterCount && fscanf(file, "%c", &arr[i]) != EOF) // Scan until the end of file.
        i++; // increment, storing each character in a unique position
    fclose(file);

    for (i = 0; i < characterCount; i++)
    {
        if (arr[i] == ' ') // count words (by counting spaces)
            wordCount++;
        if (isalpha((unsigned char)arr[i])) // count letters only
            alphaCount++;
    } // end for loop
    printf("word count is %d and alpha count is %d\n", wordCount, alphaCount);

    free(arr);
    return 0;
}
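Since you have the complete file text in the array arr[], you need to split the string arr using * as the delimiter. You can use strtok() to split the string, with * as the delimiter, and then perform the word-count and character-count operations on each token. Note that strtok() works on NUL-terminated strings, so allocate one extra byte for arr and set arr[characterCount] = '\0' before the first call. Read this link to learn more about strtok().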