Reading a text file with fgets and strtok() - C



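I'm trying to use fgets() to read text line by line from stdin and store it in the variable "text". However, when I use strtok() to split the words, it only works for the first few lines and then stops. What should I change so that it goes through the whole text?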


#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define WORD_BUFFER_SIZE 50
#define TEXT_SIZE 200
int main(void) {
    char stopWords[TEXT_SIZE][WORD_BUFFER_SIZE];
    char word[WORD_BUFFER_SIZE];
    int numberOfWords = 0;

    while(scanf("%s", word) == 1){
        if (strcmp(word, "====") == 0){
            break;
        }
        strcpy(stopWords[numberOfWords], word);
        numberOfWords++;
    }
    char *buffer = malloc(sizeof(WORD_BUFFER_SIZE)*TEXT_SIZE);
    char *text = malloc(sizeof(WORD_BUFFER_SIZE)*TEXT_SIZE);

    while(fgets(buffer, WORD_BUFFER_SIZE*TEXT_SIZE, stdin) != NULL){
        strcat(text, buffer);
    }

    char *k;
    k = strtok(text, " ");
    while (k != NULL) {
        printf("%s\n", k);
        k = strtok(NULL, " ");
    }

}
char *buffer = malloc(sizeof(WORD_BUFFER_SIZE)*TEXT_SIZE);
char *text = malloc(sizeof(WORD_BUFFER_SIZE)*TEXT_SIZE);

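sizeof(WORD_BUFFER_SIZE) is a constant: WORD_BUFFER_SIZE expands to the integer literal 50, so the expression is simply the size of an int (typically 4). You probably meant WORD_BUFFER_SIZE * TEXT_SIZE. Better still, you can determine the file size and calculate exactly how much memory you need.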

char *text = malloc(...)
strcat(text, buffer);

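text is uninitialized, so it has no null terminator. strcat needs to find where text ends before it can append to it. Unlike strcpy, which overwrites the destination, strcat appends to an existing string, so before using it you must set text[0] = '\0'.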

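#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    // determine the input size; this only works when stdin is redirected
    // from a regular file (fseek/ftell typically fail on pipes and terminals)
    if (fseek(stdin, 0, SEEK_END) != 0)
    { printf("not using a file!\n"); return 0; }
    long filesize = ftell(stdin);
    rewind(stdin);
    if (filesize <= 0)
    { printf("not using a file!\n"); return 0; }
    char word[1000] = { 0 };
    //while (scanf("%s", word) == 1)
    //    if (strcmp(word, "====") == 0)
    //        break;
    char* text = malloc((size_t)filesize + 1);   // +1 for the terminating '\0'
    if (!text)
        return 0;
    text[0] = '\0';   // strcat needs an empty, terminated string to start from
    while (fgets(word, sizeof(word), stdin) != NULL)
        strcat(text, word);
    char* k;
    k = strtok(text, " ");
    while (k != NULL)
    {
        printf("%s\n", k);
        k = strtok(NULL, " ");
    }
    free(text);
    return 0;
}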

According to the information you provided in the comments, the input text is longer than 800 bytes.

However, the line

char *text = malloc(sizeof(WORD_BUFFER_SIZE)*TEXT_SIZE);

is equivalent to the following (assuming sizeof(int) is 4, as on most platforms):

char *text = malloc(800);

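You only allocated 800 bytes of storage for text, which is not enough to store the entire input. Attempting to store more than 800 bytes causes a buffer overflow, which invokes undefined behavior.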

If you want to store the entire input in text, you must make sure that text is large enough.

However, this may not be necessary. Depending on your requirements, processing one line at a time may be sufficient, like this:

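char buffer[1024];   // must be an array (not a malloc'd pointer) so that sizeof buffer is its full size

while( fgets( buffer, sizeof buffer, stdin ) != NULL )
{
    // also split on '\n' so that the last word of each line has no trailing newline
    char *k = strtok( buffer, " \n" );
    while ( k != NULL )
    {
        printf( "%s\n", k );
        k = strtok( NULL, " \n" );
    }
}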

In this case, you don't need the array text at all. You only need the array buffer to hold the current line.

Since you did not provide any sample input, I was unable to test the code above.


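EDIT: Based on your comments on this answer, your main problem seems to be how to read all of the input from stdin and store it as a string without knowing the length of the input in advance.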

A common solution is to allocate an initial buffer and double its size every time it becomes full. You can use the function realloc for this:

#include <stdio.h>
#include <stdlib.h>
int main( void )
{
    char *buffer;
    size_t buffer_size = 1024;
    size_t input_size = 0;
    //allocate initial buffer
    buffer = malloc( buffer_size );
    if ( buffer == NULL )
    {
        fprintf( stderr, "allocation error!\n" );
        exit( EXIT_FAILURE );
    }
    //continuously fill the buffer with input, and
    //grow buffer as necessary
    for (;;) //infinite loop, equivalent to while(1)
    {
        //we must leave room for the terminating null character
        size_t to_read = buffer_size - input_size - 1;
        size_t ret;
        ret = fread( buffer + input_size, 1, to_read, stdin );
        input_size += ret;
        if ( ret != to_read )
        {
            //we have finished reading from input
            break;
        }
        //buffer was filled entirely (except for the space
        //reserved for the terminating null character), so
        //we must grow the buffer
        {
            void *temp;
            buffer_size *= 2;
            temp = realloc( buffer, buffer_size );
            if ( temp == NULL )
            {
                fprintf( stderr, "allocation error!\n" );
                exit( EXIT_FAILURE );
            }
            buffer = temp;
        }
    }
    //make sure that `fread` did not stop because of an
    //error (it should only stop because of end-of-file)
    if ( ferror(stdin) )
    {
        fprintf( stderr, "input error!\n" );
        exit( EXIT_FAILURE );
    }
    //add terminating null character
    buffer[input_size++] = '\0';
    //shrink buffer to required size
    {
        void *temp;
        temp = realloc( buffer, input_size );
        if ( temp == NULL )
        {
            fprintf( stderr, "allocation error!\n" );
            exit( EXIT_FAILURE );
        }
        buffer = temp;
    }
    //the entire contents is now stored in "buffer" as a
    //string, and can be printed
    printf( "contents of buffer:\n%s\n", buffer );
    free( buffer );
}

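The code above assumes that the input is terminated by an end-of-file condition, which will be the case if the input is redirected or piped from a file. When typing input interactively, end-of-file is signaled with Ctrl+D on Linux or Ctrl+Z on Windows.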

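On second thought, rather than having one big string for the entire file as in your code, it may be better to have an array of char * pointing to individual strings, each representing one line. For example, lines[0] would be the string for the first line and lines[1] the string for the second line. That way, you can easily use strstr to find the " ==== " delimiter and strchr to find the individual words on each line, while still keeping all lines in memory for further processing.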

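In this case, I don't recommend using strtok, because that function is destructive: it modifies the string by replacing the delimiters with null characters. If you need the strings for further processing, as you mentioned in the comments, this is probably not what you want. That is why I recommend using strchr instead.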

If a reasonable maximum number of lines is known at compile time, the solution is fairly simple:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_LINE_LENGTH 1024
#define MAX_LINES 1024
int main( void )
{
    char *lines[MAX_LINES];
    int num_lines = 0;
    char buffer[MAX_LINE_LENGTH];
    //read one line per loop iteration
    while ( fgets( buffer, sizeof buffer, stdin ) != NULL )
    {
        int line_length = strlen( buffer );
        //make sure there is room for another line
        if ( num_lines == MAX_LINES )
        {
            fprintf( stderr, "too many input lines!\n" );
            exit( EXIT_FAILURE );
        }
        //verify that entire line was read in
        if ( buffer[line_length-1] != '\n' )
        {
            //treat end-of-file as equivalent to newline character
            if ( !feof( stdin ) )
            {
                fprintf( stderr, "input line exceeds maximum line length!\n" );
                exit( EXIT_FAILURE );
            }
        }
        else
        {
            //remove newline character from string
            buffer[--line_length] = '\0';
        }
        //allocate memory for new string and add to array
        lines[num_lines] = malloc( line_length + 1 );
        //verify that "malloc" succeeded
        if ( lines[num_lines] == NULL )
        {
            fprintf( stderr, "allocation error!\n" );
            exit( EXIT_FAILURE );
        }
        //copy line to newly allocated buffer
        strcpy( lines[num_lines], buffer );
        //increment counter
        num_lines++;
    }
    //All input lines have now been successfully read in, so
    //we can now do something with them.
    //handle one line per loop iteration
    for ( int i = 0; i < num_lines; i++ )
    {
        char *p, *q;
        //attempt to find the " ==== " marker
        p = strstr( lines[i], " ==== " );
        if ( p == NULL )
        {
            printf( "Warning: skipping line because unable to find \" ==== \".\n" );
            continue;
        }
        //skip the " ==== " marker (6 characters)
        p += 6;
        //split tokens on remainder of line using "strchr"
        while ( ( q = strchr( p, ' ') ) != NULL )
        {
            printf( "found token: %.*s\n", (int)(q-p), p );
            p = q + 1;
        }
        //output last token
        printf( "found token: %s\n", p );
    }
    //cleanup allocated memory
    for ( int i = 0; i < num_lines; i++ )
    {
        free( lines[i] );
    }
}

When the program above is run with the following input

first line before deliminator ==== first line after deliminator
second line before deliminator ==== second line after deliminator

it produces the following output:

found token: first
found token: line
found token: after
found token: deliminator
found token: second
found token: line
found token: after
found token: deliminator

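However, if no reasonable maximum number of lines is known at compile time, then the array lines must also be designed to grow, in the same way as buffer in the previous program. The same applies to the maximum line length.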
