Problem manipulating strings stored from file I/O in C



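My approach is to read each character from the file and keep a count, so that when an illegal character is hit I record the string length and the number of strings seen with that length. Now I am trying to build strings from the characters I read in and store them in an array. It almost works, but when I try to add 2 strings together I cannot get past an abort and seg fault in the case where 2 strings of the same length are read in. If you don't mind giving me some feedback, I marked where I'm having the problem with a comment (line 129 of my code)... When it is finished, I would like to print the strings of each length.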

Here is the text file I am using to test:

Tomorrow, and tomorrow, and tomorrow,
To the last syllable of recorded time;

Source code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/*
*this program reads in a text file from the command line
*then counts and stores the number of words of all lengths
*/
#define LENGTH 34
#define WORD_BUFFER 750
int strLengths[LENGTH],lengthsCopy[LENGTH];
char *array[WORD_BUFFER][LENGTH];
char strings[LENGTH];
int counter = 0;
int ch,tester;
//sorts the output of string lengths printing the largest amounts first
void sort()
{
int max_val =0;
int i,j,temp,val;
//create copy
for (i=0; i < LENGTH; i++)
{
lengthsCopy[i] = strLengths[i];
}
//for loop finds the max value in the array elements
for(i=0; i<LENGTH; i++)
{
if(lengthsCopy[i] > max_val)
max_val = lengthsCopy[i];
}
printf("max val in the array is %d\n",max_val);
//prints the max value,decrements,checks,prints, rinse repeat...
//iterates until the max is 0
while(max_val!=0)
{
//checks all elements
for(i=LENGTH-1; i > 0; i--)
{
//print when max val is found
if(lengthsCopy[i] == max_val)
{
temp = i;
printf("Count[%02d]=%02d;\n",i,max_val);
//check for doubles
for(j=LENGTH-1; j > 0; j--)
{
//if double is found that is not the original, print
if(lengthsCopy[j] == max_val && temp != j)
{
printf("Count[%02d]=%02d;\n",j,max_val);
//erase value
lengthsCopy[j] = 0;
}
}
}
}
max_val--;
}
}
//print all array elements that are not zero, they represent the count of word lengths
void printList()
{
int i,val;
for(i=1; i<=LENGTH;i++)
{
if(strLengths[i] > 0)
{
val = strLengths[i];
printf("Count[%02d]=%02d;\n",i,val);
}
}
}
int main (int argc, char *argv[])
{
//error message if input file is not passed
if(argc < 2)
{
printf("You have to give me a file!\n");
exit(1);
}
FILE *text = fopen(argv[1], "r");
//error message if no contents in the file
if(text == NULL)
{
printf("No content to read in %s. \n", argv[1]);
exit(1);
}
//iterate through text until end of file
ch = fgetc(text);
int strPoint =0;
while(ch != EOF)
{
//if illegal char is met, add a count to the array value of current counter
//set counter back to 0
//scan next char
if(ch==' '||ch==','||ch=='('||ch==')'||ch==';'||ch=='\n')
{
if(array[counter][0] == NULL)//if length not defined yet
{
array[counter][0] = strings;//add current string build to the array
printf("%s\n",array[counter][0] );
}
else if(array[counter][0] != NULL && strings[0] != '\0')
{//else length is defined add to text bank
printf("else if reached\n");
printf("%s\n",strings );
printf("%lu\n",strlen(array[counter][0]) );
int arrayptr = strlen(*array[counter]);
printf("ptr %d",arrayptr);
/* next line aborts / seg_faults */
strncat(*array[counter],strings,strlen(strings)); 
}
strLengths[counter]++;
counter = 0;
ch = fgetc(text);
memset(strings, 0, sizeof(strings));//clear stringBuild
strPoint =0;
}
//else a legal character, increase counter, scan next char
else
{
strings[strPoint] = ch;
printf("string build %c\n",strings[strPoint]);
counter++;
strPoint++;
ch = fgetc(text);
}
}
fclose(text);
printf("stored string %s\n",array[3][0] );
printList();
//call sort
sort();
exit(0);
}

From what I can tell from your code, your main problem is a misunderstanding of what is happening here:

array[counter][0] = strings;//add current string build to the array

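You are setting the pointer array[counter][0] to the address of strings. You have only one strings variable, so every array[counter][0] points to the same thing (every row of array you fill ends up pointing at whatever strings holds last). It also means that the later strncat(*array[counter], strings, ...) call appends strings to itself. The source and destination overlap, which is undefined behavior, and that would explain the abort or seg fault you see.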

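Using strncat with strlen(strings) as the bound is not wrong in itself, since strncat always nul-terminates the result, but note that it can carry a performance penalty for long buffers. You may have other logic problems, but they are obscured by the awkward layout of the code and the non-standard use of an array of pointers to arrays of char.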

Feedback

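Try to simplify your implementation. If your main concern is storing the words read from the file, along with the length of each word for sorting, you can simply store the words in a 2D array of char and call strlen each time you need a length, or, for the cost of one int per word, you can use a simple struct to associate each word's length with the word itself, e.g.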

typedef struct {
    char word[LENGTH];
    int len;
} wordinfo;

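Then you simply create an array of structs (e.g. wordinfo words[WORD_BUFFER];) and store each word in words[x].word and its length in words[x].len. If you would rather not use a struct, just declare a 2D array (e.g. char words[WORD_BUFFER][LENGTH];) and store the words there. (For the cost of one int per word, typically 4 bytes plus any padding, and if storage is not a concern, you save the overhead of repeated strlen calls by storing the length you already have from reading the characters.)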

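You could also declare a pointer to an array of LENGTH char (e.g. char (*array)[LENGTH];) and dynamically allocate storage for WORD_BUFFER of them with array = malloc (sizeof *array * WORD_BUFFER); (you can use calloc instead to initialize all allocated bytes to zero). That is a fine option, but dynamic allocation does not appear to be your goal.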

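Also, avoid global variables. They are almost never needed, and they increase the risk of name collisions and of values being overwritten. Declare your variables local to main() and pass them as parameters as needed. For example, with the struct implementation, you could write the sort by length and the print as follows, passing a pointer to the array of structs and the number of elements filled as parameters: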

/* simple insertion sort on len (descending) */
void sort (wordinfo *a, int n)
{
    int i, j;
    wordinfo v;
    for (i = 1; i < n; i++) {
        v = a[i];
        j = i;
        while (j > 0 && a[j - 1].len < v.len ) {
            a[j] = a[j - 1];
            j -= 1;
        }
        a[j] = v;
    }
}
/* tabular print of words read */
void printlist (wordinfo *a, int n)
{
    int i;
    for (i = 0; i < n; i++)
        printf ("  %-34s  (%d-chars)\n", a[i].word, a[i].len);
}

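(note: unless it is required for homework, do not write or use your own sort. C provides qsort, which is efficient and well tested. Just write a compare function that compares two elements of whatever you need sorted and let qsort do the work)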

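Finally, the logic for reading each character from the file does not need to be complicated at all. Just read a character, check it, and take whatever action is appropriate. The only added complexity comes from the tests that keep you within LENGTH characters and WORD_BUFFER words, so you never write beyond the bounds of your storage. Even with the struct implementation, with the declarations and initialization: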

int c, len = 0, maxndx = 0, ndx = 0;
wordinfo words[WORD_BUFFER] = {{ .word = "", .len = 0 }};

you can reduce the read logic in main to simply:

while (ndx < WORD_BUFFER && (c = fgetc (fp)) != EOF) {
    if (len + 1 == LENGTH ||        /* check if full or c matches */
        c==' ' || c==',' || c=='(' || c==')' || c==';' || c=='\n') {
        if (len) {                          /* if we started a word */
            if (len > words[maxndx].len)    /* check if longest  */
                maxndx = ndx;               /* update max index  */
            words[ndx].len = len;           /* set words[x].len  */
            words[ndx++].word[len] = 0;     /* nul-terminate word */
            len = 0;                        /* reset length */
        }
    }
    else
        words[ndx].word[len++] = c; /* assign c to words[x].word[len] */
}

(note: maxndx simply holds the index (ndx) of the struct holding the longest word, or the first of them if several words share the same maximum length)

Putting it all together, you could boil the code down to:

#include <stdio.h>
#define LENGTH 34
#define WORD_BUFFER 750
typedef struct {
    char word[LENGTH];
    int len;
} wordinfo;
/* simple insertion sort on len (descending) */
void sort (wordinfo *a, int n)
{
    int i, j;
    wordinfo v;
    for (i = 1; i < n; i++) {
        v = a[i];
        j = i;
        while (j > 0 && a[j - 1].len < v.len ) {
            a[j] = a[j - 1];
            j -= 1;
        }
        a[j] = v;
    }
}
/* tabular print of words read */
void printlist (wordinfo *a, int n)
{
    int i;
    for (i = 0; i < n; i++)
        printf ("  %-34s  (%d-chars)\n", a[i].word, a[i].len);
}
int main (int argc, char **argv) {
    int c, len = 0, maxndx = 0, ndx = 0;
    wordinfo words[WORD_BUFFER] = {{ .word = "", .len = 0 }};
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
    if (!fp) {  /* validate file open for reading */
        fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
        return 1;
    }
    /* read each char and store in words[x].word up to 'ndx' words.
     * save the length of each word in words[x].len.
     */
    while (ndx < WORD_BUFFER && (c = fgetc (fp)) != EOF) {
        if (len + 1 == LENGTH ||        /* check if full or c matches */
            c==' ' || c==',' || c=='(' || c==')' || c==';' || c=='\n') {
            if (len) {                          /* if we started a word */
                if (len > words[maxndx].len)    /* check if longest  */
                    maxndx = ndx;               /* update max index  */
                words[ndx].len = len;           /* set words[x].len  */
                words[ndx++].word[len] = 0;     /* nul-terminate word */
                len = 0;                        /* reset length */
            }
        }
        else
            words[ndx].word[len++] = c; /* assign c to words[x].word[len] */
    }
    if (len) {      /* store a final word not followed by a delimiter */
        if (len > words[maxndx].len)
            maxndx = ndx;
        words[ndx].len = len;
        words[ndx++].word[len] = 0;
    }
    if (fp != stdin) fclose (fp);       /* close file if not stdin */
    printf ("\nlongest word: '%s'  (%d-chars)\n\n",
            words[maxndx].word, words[maxndx].len);
    printf ("words read from file:\n\n");
    printlist (words, ndx);     /* print words in order read */
    sort (words, ndx);
    printf ("\nwords sorted by length:\n\n");
    printlist (words, ndx);     /* print words sorted by length */
    return 0;
}

(note: the program expects the name of the file to read as its first argument, or it reads from stdin by default if no argument is given)

Example Use/Output

$ ./bin/rdstrings3 <dat/tomorrow.txt

longest word: 'Tomorrow'  (8-chars)

words read from file:

  Tomorrow                            (8-chars)
  and                                 (3-chars)
  tomorrow                            (8-chars)
  and                                 (3-chars)
  tomorrow                            (8-chars)
  To                                  (2-chars)
  the                                 (3-chars)
  last                                (4-chars)
  syllable                            (8-chars)
  of                                  (2-chars)
  recorded                            (8-chars)
  time                                (4-chars)

words sorted by length:

  Tomorrow                            (8-chars)
  tomorrow                            (8-chars)
  tomorrow                            (8-chars)
  syllable                            (8-chars)
  recorded                            (8-chars)
  last                                (4-chars)
  time                                (4-chars)
  and                                 (3-chars)
  and                                 (3-chars)
  the                                 (3-chars)
  To                                  (2-chars)
  of                                  (2-chars)

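Look things over and let me know if you have any questions. Whether you use the struct and store len, or just call strlen when needed, is entirely up to you.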
