C - Extracting comma-separated strings from a file into an array - segmentation fault



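I need to read different comma-separated strings from a file and store them in an array.

I have the following code, which I put together after reading various questions online.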

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main (){
    int N = 200; // Number of sequences
    int L = 1000; // length of sequences
    char Nseq[N][L];
    FILE *myfile;
    char *token;
    const char s[2] = ",";
    char line[300];
    char* filename = "pathofile.txt";
    int n = 0;
    myfile = fopen(filename, "r");
    if (myfile == NULL) {printf("could not open file %s", filename); exit(0);}
    while (fgets(line, sizeof(line), myfile) != NULL){
        token = strtok(line, s);
        while (token != NULL){
            strcpy(Nseq[n], token);
            printf("%s\t%u\n", token, n);
            token = strtok(NULL, s);
            n++;
        }
    }
    fclose(myfile);
    for (int n=0; n<100; n++){
        printf ("%s\t%u\n", Nseq[n], n);}
}

My file looks like this (it contains 200 sequences):

AAAGCCGCCAAAGUAGGCGG G,aaagccgccaauaggcgg,auagcccgccaauaggccgg,auagcaccgccauaaggccg,aaagccccgccaaauaaggcgg aaagccgccaaauaaggccgg,aaagccgccaaauaaggccg,aaagccccgcca-auaaggcgg,aaagaccgccaaaaaaggccgg,aaagcacgccaaaoaaggccggaaagcccgcaaaaggccaaaagggcgg,aa agccgcgaaaaggcg,aagcaccgccauaaggcg ugugagggcgg,aaagaccgccaaaagogcgg,aaagccgccaaaagugcggg,aaagccccgccaaa agggcgg,aaagaccgccaaaguaggcgg ggcgg,AAACCGCCCAAAUAGGCGG,aaagccgccaaaagcgg g,aaagrcgccaaaggcgg,aaagcgccaaaagggcgg,aaagccgcccaccggcgg cgg、cacugccggccaagugcggg、ucaauugccgcaaguggcgg、ucaeuugccggccaagugcgcggg,uuuaaggcgcacaugcgcgug,UUAAGGCCGCACAUUCGGCCGGG,uuaaccccgcacaaucggccgg,uuaagcccgcacaoucggccggs,uuaaagccccgaugcgcggg,UUaaccccccgaaucgcgcgcggg gcacaucggccggg,uaaggcccgcacauucggccgg,uaag,uaaggcgcacauucgcggg,uaaggcccacauucggccggg,UAagccccgcacauugccggg guggccggg,uaggccgcaaguccccggg,gauggccggcagcccccgcggg,gauggcgccgcgagccccccgcgg,gaucgccgcgccggcagcccccccggg,高高高高,高高,UAUCGCCGGCACCGUACCGGCGGG,AUUAGGGCCGCCAUAACGGCGG,auuagccgccaaacggcgg,auuauggccgccuaugcgcggg,guguugcgugcccuuaagggg,gugucgugcccccuuaggcg,guguggcgugccugcuuaaggccg,古古古古cuuaaggcg,GUGUUGCGUGCCGCCUUAAGGCG,guguugccugcccuuaagcgg,guugcgcgcgcuuaagggg,cuguugcgogccgccuua agggg cuuacggcggg,guugugcccgccuucgcggg,guugugcc gccuuacgcggg cagccuacggcgug,

When I run the code, I get:

AAAGCCGCCAAAGUAGGCGG    0
AAAGCCGCCAAAGUAGGCGG    1
AAAGCCGCCAAAGUAGGCGG    2
AAAGCCGCCAAAGUAGGCGG    3
AAAGCCGCCAAAGUAGGCGG    4
AAAGCCGCCAAAGUAGGCGG    5
AAAGCCCGCCAAAGAAGGCGG   6
AAAGCCCGCCAAAGAAGGCGG   7
AAAGCCCGCCAAAGAAGGCGG   8
AAAGCCCGGCCAAAGAAGGCGG  9
AAAGCCCGCCAAAGUAGGCGG   10
AAAGCCCGCCAAAGUAGGCGG   11
AAAGCCCGCCAGAAGUAGGCGG  12
AAAGCCCGCCAAAGUAG       13
GCGG    14
AAAGCCCGCCAAAGUAGGCGG   15
AAAGCACCGCCAAUGGGCGG    16
AAAGCACCGCCAAUAGGCGG    17
AAAGCACCGCCAAUAGGCGG    18
AUAGCACCGCCAAUAGGCGG    19
AUAGCACCGCCAAUAGGCGG    20
AUAGCACCGCCAGUAGGCGG    21
AUAGCACCGCCAAUAGGCGG    22
AAAGCACCGCCAAAUAAGGCGGG 23
AAAGCACCGCCAAAUAAGGCGGG 24
AAAGCACCGCCAAAUAGGCGGG  25
AAAGCACCGCCAAAUAAGGCGG  26
AAAGCACCGCCAAAUAAGGCGG  27
AAAGCACC        28
GCCAAAUAAGGCGG  29
AAAGCACCGCCAAAUAAGGCGG  30
AAAGCACCGCCAAAUAAGGCGG  31
AAAGCACCGCCAAAUAAGGCGG  32
AAAGCACCGCCAAAUAAGGCGG  33
AAAGCACCGCCAAAUAAGGCGG  34
AAAGCACCGCCAAAUAAGGCGG  35
AAAGCACCGCCAAAUAAGGCGG  36
AAAGCACCGCCAAAUAAGGCGG  37
AAAGCACCGCCAAAUAAGGCGG  38
AAAGCACCGCCAAAUAAGGCGG  39
AAAGCACCGCCAAAUAAGGCGG  40
AAAGCACCGCCAAAUAAGGCGG  41
AAAGCACC        42
GCCAAAUAAGGCGG  43
AAAGCACGGCCAAAUAAGGCGG  44
AAAGCACCGCCAAAUAAGGCGG  45
AAAGCACCGCCAAUAAGGCGG   46
AAAGCACCGCCAAAAGUCGAGGCGG       47
AAAGCACCGCCAAAAUGUGAGGCGG       48
AAAGCACCGCCAAAUGUGAGGCGG        49
AAAGCACCGCCAAAAUGGUGAGGCGG      50
AAAGCACCGCCAAAAGUGAGGCGG        51
AAAGCACCGCCAAAAGUGAGGCGG        52
AAAGCACCGCCAAAAGUGAGGCGG        53
AAAGCACCGCCAAAAGUGAGGCGG        54
AAAGCACCGCCA    55
AAAGUAAGGCGG    56
AAAGACCGCCAAAAGUAAGGCGG 57
AAAGCACCGCCAAAAGUAAGGCGG        58
AAAGCACCGCCAAAAGUAAGGCGG        59
AAAGCACCGCCAAAGUUAAGGCGG        60
AAAGCACCGCCAAAGUAAGGCGG 61
AAAGCACCGCCAAAGUAAGGCGG 62
AAAGCACCGCCAAAGUAAGGCGG 63
UAACGCCGGCCAACUAGGGCGG  64
AACAGCCCGGCCAAAUAGGGCGG 65
AAAGCCGCCAAACUGGCGG     66
AAAGCCGCCAAACUGGCGG     67
AAACCGCCCAAAUAGGCGG     68
AAAGCCGC        69
CCAAAUAGGCGG    70
AAAGCCGCCCAAAUAGGCGG    71
AAAGCCGCCAAAUAGGCGG     72
AAAGCCGCCAAAUAGGCGG     73
AAAGCCGCCCAAAUAGGCGG    74
AAAUCCGCCCAAAUAGGCGG    75
UAAAGCCGCCCUAAAUAGGCGG  76
AAAGCCGCGCAAAUAGGCGG    77
AAAGCCGCCCCAAAUAGGCGG   78
AAAGCCGCCCCAAAUAGGCGG   79
AAAGCCGCCCAAAUAGGCGUG   80
AAAGCCGCCCAAAUAGGCGG    81
AAAGCCGCCCAAAUAGGCGG    82
AAAGCCGCCCAAAUAGGCGG    83
AAAGCCGCCC      84
AAAUAGGCGG      85
AAAGCCGCCAAAUAGGCGG     86
AAAGCCGCCAAAUAGGCGG     87
AAAGCCGCCAAAUAGGCGG     88
AAAGCCGCCCAAAUAGGCGG    89
AAAGCCGCCAAAUGGCGGA     90
AAAGCCGCCAACCGGCGG      91
AAAGCCGCCAACCGGCGG      92
AAAGCCGCCAACCGGCGG      93
AAAGCCGUCAACCGGCGG      94
AAAGCCGCCAACCGGCGG      95
AAAGCCGCCAACCGGCGG      96
AAAGCGCCAACCGGCGG       97
AAAGCCGCCAACCGGCGG      98
AAAGCCGCCAACCGGCGG      99
AAAGCCGCCAACCGGCG       100
G       101
CACUGCCGGCCAAGUCGGCGG   102
CAUUGCCGGCCAAGUCGGCGG   103
CACUGCCGGCCAAGUCGGCGG   104
CAUGCCGGCCAAGUCGGCGG    105
CACUCCGGCCAAGUCGGCGG    106
CACUGCCGGCCAAGUCGGCGG   107
CACUGCCGGACCAAGUCGGCGG  108
CACUGCCGGCCAAGUCGGCGG   109
UCAAUUGCCGGCCAAGUCGGCGG 110
UCAAUUGCCGGCCAAGUCGGCGG 111
UUUAAGGCCGCACAUGCGGCCGUG        112
UUAAGGCCGGAAACAUUCGGCCGUG       113
UUAAGGCCGCACAUUCGGCCGGG 114
UUAAGGCCGCACAUUCGGCCGGG 115
UUAAGGCCGCACAUUCGGCCGGG 116
UUAAAAGGCCGACAUUGCGGCCGGG       117
UUAAAGGCCGACAUUGCGGCCGGG        118
UUAAGUCCGCACAUUCGGCCGGG 119
UUAAGGCCGCACAUUCGGCCGGG 120
UUAAGGCCGCACAUUCGGCCGGG 121
UUAAGGCCGCACAUUCGGCCGGG 122
UUAAGGCCGCACAUUCGGCCGGG 123
UUAAGGCCGCACAUCGGCCGGG  124
UAAGGCCGCACAUUCGGCCGGG  125
UAAGGCCGCACAUUCGGCCGGG  126
UAAGGCCGGC      127
ACAUUCGGCCGGG   128
UAAGGCCGCACAUUCGGCCGGG  129
UAAGGCCGCACAUUCGGCCGGG  130
UAAGGCCGCACAUUCGGCCGGG  131
UAAGGCCGCACAUUCGGCCGGG  132
UAAGGCCGCACAUUCGGCCGGG  133
UAAGGCCGCACAUGUCGGCCGGGU        134
UAAGGCCGCACAUUCGGCCGGG  135
UAAGGCCGCACAUUCGGCCGGG  136
UAGGCCGCAAGUCGGCCGGG    137
UAGGCCGCAAGCGGCCGGG     138
UAGGCCGCAAGCGGCCGGG     139
UAGGCCGCAAGCGGCCGGG     140
UAGGCCGCAAGUCGGCCG      141
GG      142
UAGGCCGCAAGUCGGCCGGG    143
UAGGCCGCAAGUCGGCCGGG    144
UAGGCCGCAAGUCGGCCGGG    145
GAUCGGCCGGCAGCCUCCCGGCGG        146
GAUCGGCCGGCAGCCUCCCGGCGG        147
GAUCGGCCGGCAGCCUCCCGGCGG        148
GAUCGGCCGGCAGCCUCCCGGCGG        149
GAUCGGCCCGGCAGCCUCCCGGCGG       150
GAUCGGCCCGGCAGCCUCCCGGCGG       151
GAUCGGCCGGCAGCCGUACCGGCGG       152
AGAUCGGCCGGCAGCCGUACCGGCGG      153
GAUCGGCCGGCAGCCGUACCGGCGG       154
UA      155
UCGGCCGGCACCGUACCGGGGG  156
UAUCGGCCGGCACCGUACCGGCGGG       157
UAUCGGCGGCACCGUACCGGCGGG        158
UAUCGGCCGGCACCGUACCGGCGGG       159
UAUCGCCGGCACCGUACCGGCGGG        160
AUUAGGGCCGCCAUAACGGCGG  161
AUUAGGGCCGCCAAUAACGGCGG 162
AUUAGGGCCGCCUAUAACGGCGG 163
GUGUUGCGUGCCGCCUUAAGGCG 164
GUGUUGCGUGCGCCUUAAGGCG  165
GUGUUGCGUGCCGCCUUAAGGCG 166
GUGUUGCGUGCCGCCUUAAGGCG 167
GUGUUGCG        168
UGCCGCCUUAAGGCG 169
GUGUUGCGUGCCGCCUUAAGGCG 170
GUGUUGCGUGCCGCCUUAAGGCG 171
GUGUUGCGUGCCGUCUGAAGGCG 172
GUGUUGCGUGCCGCCUUAAUGCG 173
GUGUUGCGUGCCGCCUUAAGGCG 174
GUGUUGCGUGCCGCCUUAAGGCG 175
GUGUUGCGUGCCGCCUUAAGGCG 176
GUGUUGCGUGCCGCCUUAAGGCG 177
GUGUUGCGUGCCGCCUUAAGGCG 178
GUGUUGCGUGCCGCCUUAAGGCG 179
GUGUUGCGUGCCGCCCUUAAGGCG        180
GUGUUGCGUGCCGCCUUA      181
AGGCG   182
GUGUUGCGUGCCGCCUUAAGGCG 183
GUGUUGCGUGCGCCUUAAGGCG  184
CUGUUGCGUGCCGCCUUAAGGCG 185
CUGUUGCGUGCCGCCUUAAGGCG 186
CUGUUGCGUGCCGCCUUAAGGCG 187
GUGUUGCGUGCCGCCGUUACAGGCG       188
GUGUUGCGUGCCGCCGUUACAGGCG       189
GUGUUGCGUGCCGCCUUAAGGCG 190
GUGUUGCGUGCCGCCUUAAGGCG 191
GUGUUGCGUGCCGCCUUAAGGCG 192
UUGGUCCGCCUUACGGCGGG    193
UUGGUCCGCCUUACGGCGGG    194
UUGGUCCG        195
CCUUACGGCGGG    196
UUGGUCCGCCUUACGGCGGG    197
UUCGUCCGCCUUACGGCGGG    198
GUUGUAGCCCGCCUUCGGCGGG  199
GUUGUUGCCGCCUUACGGCGG   200
GUUGUUGCCGCCUUACGGCGG   201
GUUGUUGCCGCCUUACGGCGG   202
GUUGUUGCCGCCUGACGGCGG   203
GUUGUUGCCGCCUGACGGCGG   204
GUUGUUGCCGCCUGACGGCGG   205
GUUGUUGCCGCCUGACGGCGG   206
GUUGUUUGCCGCCUGACGGCGG  207
GUUCCUUGCCAGCCUUACGGCGG 208
Segmentation fault (core dumped)

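I know I still have to deal with the newline problem, but I don't understand why I get a segmentation fault. The code seems to work, yet it crashes before reaching the end of the file. Do you know what is causing this? Thanks.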

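Because of the (arbitrary) size of your read buffer, some sequences get split in two. The program therefore sees more than 200 of them, and the array is too small to hold them all.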
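To see the splitting in isolation, here is a minimal sketch. The helper name `count_tokens` is made up for illustration, and it assumes `tmpfile()` is usable on your system. Unlike the code in the question, it also treats `'\n'` as a delimiter so that a trailing newline is not counted as a token.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: writes `text` to a temporary file, reads it back with
   fgets through a buffer of `buf_size` bytes, and counts the tokens that
   strtok finds. Returns -1 if the file or the buffer cannot be created. */
static int count_tokens(const char *text, size_t buf_size)
{
    assert(buf_size >= 2);               /* room for one character plus '\0' */

    FILE *fp = tmpfile();
    if (fp == NULL) return -1;
    fputs(text, fp);
    rewind(fp);

    char *buf = malloc(buf_size);
    if (buf == NULL) { fclose(fp); return -1; }

    int count = 0;
    /* fgets stops after buf_size - 1 characters even in the middle of a
       sequence; the rest of that sequence arrives in the next chunk and
       is counted as a separate token. */
    while (fgets(buf, (int)buf_size, fp) != NULL) {
        for (char *tok = strtok(buf, ",\n"); tok != NULL; tok = strtok(NULL, ",\n"))
            count++;
    }

    free(buf);
    fclose(fp);
    return count;
}
```

With the input "AAAA,BBBB,CCCC\n", a 64-byte buffer yields 3 tokens, while an 8-byte buffer yields 4 because "BBBB" is cut into "BB" and "BB". The same thing happens with `char line[300]` whenever a line of the file is longer than 299 characters.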

char Nseq[N][L]; // N = 200;

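This means Nseq can hold at most 200 sequences, but in your code n reaches at least 208 inside the while loop, so strcpy writes past the end of the array. To avoid this, either make N larger than 208 (for example, N = 300), or bound n. Since n is incremented inside the inner strtok loop, that loop needs the same check; for the outer loop the condition looks like this: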

while (fgets(line, sizeof(line), myfile) && n < 200) {...}
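A bounded version of the inner loop could be sketched as follows. The names `store_tokens`, `MAX_SEQ` and `MAX_LEN` are placeholders rather than anything from the question, and the capacities are kept small on purpose. `snprintf` is used instead of `strcpy` so that an overlong token is truncated instead of overflowing its row.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_SEQ 4    /* hypothetical capacity; the question uses 200  */
#define MAX_LEN 32   /* hypothetical row length; the question uses 1000 */

/* Copies the comma-separated tokens of `line` into seqs[n], seqs[n+1], ...
   and stops as soon as `capacity` rows are filled. Returns the new value
   of n. Note that strtok modifies `line` in place. */
static int store_tokens(char *line, char seqs[][MAX_LEN], int n, int capacity)
{
    assert(line != NULL && seqs != NULL);

    for (char *tok = strtok(line, ",\n");
         tok != NULL && n < capacity;        /* the bound lives in the inner loop */
         tok = strtok(NULL, ",\n")) {
        snprintf(seqs[n], MAX_LEN, "%s", tok);  /* truncates instead of overflowing */
        n++;
    }
    return n;
}
```

Calling it once per line read by fgets, and passing the returned n into the next call, keeps n from ever exceeding the capacity. Tokens that do not fit are silently dropped, so this prevents the crash but does not solve the splitting itself.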

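If you want to read everything in the file regardless of how many sequences it contains, you can use a pointer to pointer, char **Nseq, and grow it with realloc for each token found inside the while loop: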

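char **Nseq = NULL; // must start as NULL so the first realloc behaves like malloc
while (fgets(line, sizeof(line), myfile)){
    token = strtok(line, s);
    while (token != NULL){
        // grow the pointer array by one slot
        Nseq = realloc(Nseq, sizeof(char *) * (n+1));
        if(!Nseq) {return -1;}
        // allocate exactly enough room for this token and its terminator
        Nseq[n] = malloc(strlen(token) + 1);
        if(!Nseq[n]) {return -1;}
        strcpy(Nseq[n], token);
        printf("%s\t%d\n", token, n);
        n++;
        token = strtok(NULL, s);
    }
}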
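The realloc approach removes the 200-entry limit, but sequences are still cut wherever the 300-byte line buffer ends. One possible way around that, sketched here under the assumption that both commas and newlines separate sequences, is to read character by character and build each token in a growing buffer. The names `read_tokens` and `free_tokens` are invented for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Frees an array returned by read_tokens. */
static void free_tokens(char **tokens, size_t count)
{
    for (size_t i = 0; i < count; i++) free(tokens[i]);
    free(tokens);
}

/* Reads tokens separated by ',' or '\n' from fp, whatever their length.
   Empty tokens are skipped. Stores the number of tokens in *count and
   returns a malloc'd array of malloc'd strings. Returns NULL with
   *count == 0 if allocation fails or the input holds no tokens. */
static char **read_tokens(FILE *fp, size_t *count)
{
    assert(fp != NULL && count != NULL);

    char **tokens = NULL;   /* array of finished tokens        */
    size_t n = 0;           /* number of finished tokens       */
    char *cur = NULL;       /* token currently being built     */
    size_t len = 0, cap = 0;
    int c;

    *count = 0;
    do {
        c = fgetc(fp);
        if (c == ',' || c == '\n' || c == EOF) {
            if (len > 0) {                        /* close the current token */
                cur[len] = '\0';
                char **grown = realloc(tokens, (n + 1) * sizeof *tokens);
                if (grown == NULL) { free(cur); free_tokens(tokens, n); return NULL; }
                tokens = grown;
                tokens[n++] = cur;
                cur = NULL; len = 0; cap = 0;
            }
        } else {
            if (len + 1 >= cap) {                 /* keep room for c and '\0' */
                size_t new_cap = cap ? cap * 2 : 32;
                char *grown = realloc(cur, new_cap);
                if (grown == NULL) { free(cur); free_tokens(tokens, n); return NULL; }
                cur = grown;
                cap = new_cap;
            }
            cur[len++] = (char)c;
        }
    } while (c != EOF);

    *count = n;
    return tokens;
}
```

Typical use would be to open the file with fopen, call read_tokens(myfile, &count), loop over the returned array, and finally pass it to free_tokens. Doubling the per-token capacity keeps the number of realloc calls low; growing the pointer array one slot at a time mirrors the snippet above and could be replaced by doubling as well.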
