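I need to read different comma-separated strings from a file and store them in an array.
I have the following code, which I put together from reading various questions online.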
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main() {
    int N = 200;   // Number of sequences
    int L = 1000;  // length of sequences
    char Nseq[N][L];
    FILE *myfile;
    char *token;
    const char s[2] = ",";
    char line[300];
    char *filename = "pathofile.txt";
    int n = 0;
    myfile = fopen(filename, "r");
    if (myfile == NULL) { printf("could not open file %s\n", filename); exit(EXIT_FAILURE); }
    while (fgets(line, sizeof(line), myfile) != NULL) {
        token = strtok(line, s);
        while (token != NULL) {
            strcpy(Nseq[n], token);
            printf("%s\t%d\n", token, n);
            token = strtok(NULL, s);
            n++;
        }
    }
    fclose(myfile);
    for (int n = 0; n < 100; n++) {
        printf("%s\t%d\n", Nseq[n], n);
    }
}
My file looks like this (it contains 200 sequences):
AAAGCCGCCAAAGUAGGCGG G,aaagccgccaauaggcgg,auagcccgccaauaggccgg,auagcaccgccauaaggccg,aaagccccgccaaauaaggcgg aaagccgccaaauaaggccgg,aaagccgccaaauaaggccg,aaagccccgcca-auaaggcgg,aaagaccgccaaaaaaggccgg,aaagcacgccaaaoaaggccggaaagcccgcaaaaggccaaaagggcgg,aa agccgcgaaaaggcg,aagcaccgccauaaggcg ugugagggcgg,aaagaccgccaaaagogcgg,aaagccgccaaaagugcggg,aaagccccgccaaa agggcgg,aaagaccgccaaaguaggcgg ggcgg,AAACCGCCCAAAUAGGCGG,aaagccgccaaaagcgg g,aaagrcgccaaaggcgg,aaagcgccaaaagggcgg,aaagccgcccaccggcgg cgg,cacugccggccaagugcggg,ucaauugccgcaaguggcgg,ucaeuugccggccaagugcgcggg,uuuaaggcgcacaugcgcgug,UUAAGGCCGCACAUUCGGCCGGG,uuaaccccgcacaaucggccgg,uuaagcccgcacaoucggccggs,uuaaagccccgaugcgcggg,UUaaccccccgaaucgcgcgcggg gcacaucggccggg,uaaggcccgcacauucggccgg,uaag,uaaggcgcacauucgcggg,uaaggcccacauucggccggg,UAagccccgcacauugccggg guggccggg,uaggccgcaaguccccggg,gauggccggcagcccccgcggg,gauggcgccgcgagccccccgcgg,gaucgccgcgccggcagcccccccggg,高高高高,高高,UAUCGCCGGCACCGUACCGGCGGG,AUUAGGGCCGCCAUAACGGCGG,auuagccgccaaacggcgg,auuauggccgccuaugcgcggg,guguugcgugcccuuaagggg,gugucgugcccccuuaggcg,guguggcgugccugcuuaaggccg,古古古古cuuaaggcg,GUGUUGCGUGCCGCCUUAAGGCG,guguugccugcccuuaagcgg,guugcgcgcgcuuaagggg,cuguugcgogccgccuua agggg cuuacggcggg,guugugcccgccuucgcggg,guugugcc gccuuacgcggg cagccuacggcgug,
And when I run the code, I get:
AAAGCCGCCAAAGUAGGCGG 0
AAAGCCGCCAAAGUAGGCGG 1
AAAGCCGCCAAAGUAGGCGG 2
AAAGCCGCCAAAGUAGGCGG 3
AAAGCCGCCAAAGUAGGCGG 4
AAAGCCGCCAAAGUAGGCGG 5
AAAGCCCGCCAAAGAAGGCGG 6
AAAGCCCGCCAAAGAAGGCGG 7
AAAGCCCGCCAAAGAAGGCGG 8
AAAGCCCGGCCAAAGAAGGCGG 9
AAAGCCCGCCAAAGUAGGCGG 10
AAAGCCCGCCAAAGUAGGCGG 11
AAAGCCCGCCAGAAGUAGGCGG 12
AAAGCCCGCCAAAGUAG 13
GCGG 14
AAAGCCCGCCAAAGUAGGCGG 15
AAAGCACCGCCAAUGGGCGG 16
AAAGCACCGCCAAUAGGCGG 17
AAAGCACCGCCAAUAGGCGG 18
AUAGCACCGCCAAUAGGCGG 19
AUAGCACCGCCAAUAGGCGG 20
AUAGCACCGCCAGUAGGCGG 21
AUAGCACCGCCAAUAGGCGG 22
AAAGCACCGCCAAAUAAGGCGGG 23
AAAGCACCGCCAAAUAAGGCGGG 24
AAAGCACCGCCAAAUAGGCGGG 25
AAAGCACCGCCAAAUAAGGCGG 26
AAAGCACCGCCAAAUAAGGCGG 27
AAAGCACC 28
GCCAAAUAAGGCGG 29
AAAGCACCGCCAAAUAAGGCGG 30
AAAGCACCGCCAAAUAAGGCGG 31
AAAGCACCGCCAAAUAAGGCGG 32
AAAGCACCGCCAAAUAAGGCGG 33
AAAGCACCGCCAAAUAAGGCGG 34
AAAGCACCGCCAAAUAAGGCGG 35
AAAGCACCGCCAAAUAAGGCGG 36
AAAGCACCGCCAAAUAAGGCGG 37
AAAGCACCGCCAAAUAAGGCGG 38
AAAGCACCGCCAAAUAAGGCGG 39
AAAGCACCGCCAAAUAAGGCGG 40
AAAGCACCGCCAAAUAAGGCGG 41
AAAGCACC 42
GCCAAAUAAGGCGG 43
AAAGCACGGCCAAAUAAGGCGG 44
AAAGCACCGCCAAAUAAGGCGG 45
AAAGCACCGCCAAUAAGGCGG 46
AAAGCACCGCCAAAAGUCGAGGCGG 47
AAAGCACCGCCAAAAUGUGAGGCGG 48
AAAGCACCGCCAAAUGUGAGGCGG 49
AAAGCACCGCCAAAAUGGUGAGGCGG 50
AAAGCACCGCCAAAAGUGAGGCGG 51
AAAGCACCGCCAAAAGUGAGGCGG 52
AAAGCACCGCCAAAAGUGAGGCGG 53
AAAGCACCGCCAAAAGUGAGGCGG 54
AAAGCACCGCCA 55
AAAGUAAGGCGG 56
AAAGACCGCCAAAAGUAAGGCGG 57
AAAGCACCGCCAAAAGUAAGGCGG 58
AAAGCACCGCCAAAAGUAAGGCGG 59
AAAGCACCGCCAAAGUUAAGGCGG 60
AAAGCACCGCCAAAGUAAGGCGG 61
AAAGCACCGCCAAAGUAAGGCGG 62
AAAGCACCGCCAAAGUAAGGCGG 63
UAACGCCGGCCAACUAGGGCGG 64
AACAGCCCGGCCAAAUAGGGCGG 65
AAAGCCGCCAAACUGGCGG 66
AAAGCCGCCAAACUGGCGG 67
AAACCGCCCAAAUAGGCGG 68
AAAGCCGC 69
CCAAAUAGGCGG 70
AAAGCCGCCCAAAUAGGCGG 71
AAAGCCGCCAAAUAGGCGG 72
AAAGCCGCCAAAUAGGCGG 73
AAAGCCGCCCAAAUAGGCGG 74
AAAUCCGCCCAAAUAGGCGG 75
UAAAGCCGCCCUAAAUAGGCGG 76
AAAGCCGCGCAAAUAGGCGG 77
AAAGCCGCCCCAAAUAGGCGG 78
AAAGCCGCCCCAAAUAGGCGG 79
AAAGCCGCCCAAAUAGGCGUG 80
AAAGCCGCCCAAAUAGGCGG 81
AAAGCCGCCCAAAUAGGCGG 82
AAAGCCGCCCAAAUAGGCGG 83
AAAGCCGCCC 84
AAAUAGGCGG 85
AAAGCCGCCAAAUAGGCGG 86
AAAGCCGCCAAAUAGGCGG 87
AAAGCCGCCAAAUAGGCGG 88
AAAGCCGCCCAAAUAGGCGG 89
AAAGCCGCCAAAUGGCGGA 90
AAAGCCGCCAACCGGCGG 91
AAAGCCGCCAACCGGCGG 92
AAAGCCGCCAACCGGCGG 93
AAAGCCGUCAACCGGCGG 94
AAAGCCGCCAACCGGCGG 95
AAAGCCGCCAACCGGCGG 96
AAAGCGCCAACCGGCGG 97
AAAGCCGCCAACCGGCGG 98
AAAGCCGCCAACCGGCGG 99
AAAGCCGCCAACCGGCG 100
G 101
CACUGCCGGCCAAGUCGGCGG 102
CAUUGCCGGCCAAGUCGGCGG 103
CACUGCCGGCCAAGUCGGCGG 104
CAUGCCGGCCAAGUCGGCGG 105
CACUCCGGCCAAGUCGGCGG 106
CACUGCCGGCCAAGUCGGCGG 107
CACUGCCGGACCAAGUCGGCGG 108
CACUGCCGGCCAAGUCGGCGG 109
UCAAUUGCCGGCCAAGUCGGCGG 110
UCAAUUGCCGGCCAAGUCGGCGG 111
UUUAAGGCCGCACAUGCGGCCGUG 112
UUAAGGCCGGAAACAUUCGGCCGUG 113
UUAAGGCCGCACAUUCGGCCGGG 114
UUAAGGCCGCACAUUCGGCCGGG 115
UUAAGGCCGCACAUUCGGCCGGG 116
UUAAAAGGCCGACAUUGCGGCCGGG 117
UUAAAGGCCGACAUUGCGGCCGGG 118
UUAAGUCCGCACAUUCGGCCGGG 119
UUAAGGCCGCACAUUCGGCCGGG 120
UUAAGGCCGCACAUUCGGCCGGG 121
UUAAGGCCGCACAUUCGGCCGGG 122
UUAAGGCCGCACAUUCGGCCGGG 123
UUAAGGCCGCACAUCGGCCGGG 124
UAAGGCCGCACAUUCGGCCGGG 125
UAAGGCCGCACAUUCGGCCGGG 126
UAAGGCCGGC 127
ACAUUCGGCCGGG 128
UAAGGCCGCACAUUCGGCCGGG 129
UAAGGCCGCACAUUCGGCCGGG 130
UAAGGCCGCACAUUCGGCCGGG 131
UAAGGCCGCACAUUCGGCCGGG 132
UAAGGCCGCACAUUCGGCCGGG 133
UAAGGCCGCACAUGUCGGCCGGGU 134
UAAGGCCGCACAUUCGGCCGGG 135
UAAGGCCGCACAUUCGGCCGGG 136
UAGGCCGCAAGUCGGCCGGG 137
UAGGCCGCAAGCGGCCGGG 138
UAGGCCGCAAGCGGCCGGG 139
UAGGCCGCAAGCGGCCGGG 140
UAGGCCGCAAGUCGGCCG 141
GG 142
UAGGCCGCAAGUCGGCCGGG 143
UAGGCCGCAAGUCGGCCGGG 144
UAGGCCGCAAGUCGGCCGGG 145
GAUCGGCCGGCAGCCUCCCGGCGG 146
GAUCGGCCGGCAGCCUCCCGGCGG 147
GAUCGGCCGGCAGCCUCCCGGCGG 148
GAUCGGCCGGCAGCCUCCCGGCGG 149
GAUCGGCCCGGCAGCCUCCCGGCGG 150
GAUCGGCCCGGCAGCCUCCCGGCGG 151
GAUCGGCCGGCAGCCGUACCGGCGG 152
AGAUCGGCCGGCAGCCGUACCGGCGG 153
GAUCGGCCGGCAGCCGUACCGGCGG 154
UA 155
UCGGCCGGCACCGUACCGGGGG 156
UAUCGGCCGGCACCGUACCGGCGGG 157
UAUCGGCGGCACCGUACCGGCGGG 158
UAUCGGCCGGCACCGUACCGGCGGG 159
UAUCGCCGGCACCGUACCGGCGGG 160
AUUAGGGCCGCCAUAACGGCGG 161
AUUAGGGCCGCCAAUAACGGCGG 162
AUUAGGGCCGCCUAUAACGGCGG 163
GUGUUGCGUGCCGCCUUAAGGCG 164
GUGUUGCGUGCGCCUUAAGGCG 165
GUGUUGCGUGCCGCCUUAAGGCG 166
GUGUUGCGUGCCGCCUUAAGGCG 167
GUGUUGCG 168
UGCCGCCUUAAGGCG 169
GUGUUGCGUGCCGCCUUAAGGCG 170
GUGUUGCGUGCCGCCUUAAGGCG 171
GUGUUGCGUGCCGUCUGAAGGCG 172
GUGUUGCGUGCCGCCUUAAUGCG 173
GUGUUGCGUGCCGCCUUAAGGCG 174
GUGUUGCGUGCCGCCUUAAGGCG 175
GUGUUGCGUGCCGCCUUAAGGCG 176
GUGUUGCGUGCCGCCUUAAGGCG 177
GUGUUGCGUGCCGCCUUAAGGCG 178
GUGUUGCGUGCCGCCUUAAGGCG 179
GUGUUGCGUGCCGCCCUUAAGGCG 180
GUGUUGCGUGCCGCCUUA 181
AGGCG 182
GUGUUGCGUGCCGCCUUAAGGCG 183
GUGUUGCGUGCGCCUUAAGGCG 184
CUGUUGCGUGCCGCCUUAAGGCG 185
CUGUUGCGUGCCGCCUUAAGGCG 186
CUGUUGCGUGCCGCCUUAAGGCG 187
GUGUUGCGUGCCGCCGUUACAGGCG 188
GUGUUGCGUGCCGCCGUUACAGGCG 189
GUGUUGCGUGCCGCCUUAAGGCG 190
GUGUUGCGUGCCGCCUUAAGGCG 191
GUGUUGCGUGCCGCCUUAAGGCG 192
UUGGUCCGCCUUACGGCGGG 193
UUGGUCCGCCUUACGGCGGG 194
UUGGUCCG 195
CCUUACGGCGGG 196
UUGGUCCGCCUUACGGCGGG 197
UUCGUCCGCCUUACGGCGGG 198
GUUGUAGCCCGCCUUCGGCGGG 199
GUUGUUGCCGCCUUACGGCGG 200
GUUGUUGCCGCCUUACGGCGG 201
GUUGUUGCCGCCUUACGGCGG 202
GUUGUUGCCGCCUGACGGCGG 203
GUUGUUGCCGCCUGACGGCGG 204
GUUGUUGCCGCCUGACGGCGG 205
GUUGUUGCCGCCUGACGGCGG 206
GUUGUUUGCCGCCUGACGGCGG 207
GUUCCUUGCCAGCCUUACGGCGG 208
Segmentation fault (core dumped)
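I know I still have to deal with the line breaks, but I don't understand why I get a segmentation fault. It seems to work, yet it never reaches the end of the file. Do you know what causes this? Thanks.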
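Because of the (arbitrary) size of your read buffer, you are splitting some sequences in two, so the program sees more than 200 of them and the array is too small to hold them all. Writing past the end of the array is undefined behavior, and here it shows up as the segmentation fault.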
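To see the effect of the buffer size in isolation, here is a small sketch (not part of the original answer). The helper `count_tokens` is hypothetical: it repeats the question's `fgets`/`strtok` loop over an already opened file with a configurable buffer size and only counts the tokens instead of storing them.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Re-run the question's fgets/strtok loop over f with a read buffer of
   bufsize bytes and return how many tokens it produces (-1 on error).
   fgets returns at most bufsize - 1 characters per call, so a longer
   line comes back in pieces and every piece boundary becomes an extra
   token boundary. A line ending in ",\n" also yields a "\n" token. */
int count_tokens(FILE *f, int bufsize)
{
    char *buf = malloc((size_t)bufsize);
    if (buf == NULL)
        return -1;
    int n = 0;
    rewind(f);                             /* start from the beginning each time */
    while (fgets(buf, bufsize, f) != NULL) {
        for (char *tok = strtok(buf, ","); tok != NULL; tok = strtok(NULL, ","))
            n++;
    }
    free(buf);
    return n;
}
```

For a temporary file containing `"aaaa,bbbb\n"`, `count_tokens(f, 300)` returns 2, while `count_tokens(f, 4)` returns 5 (`"aaa"`, `"a"`, `"b"`, `"bbb"`, `"\n"`): the data is the same, only the buffer changed.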
char Nseq[N][L]; // N = 200;
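This means `Nseq` can hold at most 200 sequences. But in your code, `n` (inside the `while` loop) reaches at least 208. To avoid the problem, you can define `N` to be larger than 208 (for example, `N = 300`), or you must add a bound on `n` to the loop conditions. The bound is needed in both loops, because a single line can yield several tokens:
while (n < N && fgets(line, sizeof(line), myfile) != NULL) {...}
while (token != NULL && n < N) {...}   // inner loop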
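A minimal sketch of the bounded version as a standalone function, assuming the caller owns a fixed-size array as in the question. The name `read_bounded` and the macro `SEQ_LEN` are hypothetical; `SEQ_LEN` stands in for `L`. Unlike the question's code, it also uses `'\n'` as a delimiter so stored tokens don't keep the line break.

```c
#include <stdio.h>
#include <string.h>

#define SEQ_LEN 1000  /* assumed maximum sequence length, matching L in the question */

/* Store comma-separated tokens from f in seqs, writing at most max_seq
   entries, and return how many were stored. '\n' is also treated as a
   delimiter, so tokens do not keep the line break. */
int read_bounded(FILE *f, char seqs[][SEQ_LEN], int max_seq)
{
    char line[300];
    int n = 0;
    /* n is checked in both loops: one line can hold many tokens. */
    while (n < max_seq && fgets(line, sizeof line, f) != NULL) {
        for (char *tok = strtok(line, ",\n");
             tok != NULL && n < max_seq;
             tok = strtok(NULL, ",\n")) {
            /* snprintf truncates overlong tokens instead of overflowing */
            snprintf(seqs[n], SEQ_LEN, "%s", tok);
            n++;
        }
    }
    return n;
}
```

It would be called as `int count = read_bounded(myfile, Nseq, 200);` with `char Nseq[200][SEQ_LEN];`. This only prevents the overflow: a sequence cut by the 300-byte buffer is still stored as two entries.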
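If you want to read all of the text in the file, you can use a pointer to pointer, `char **Nseq`, and grow it with `realloc` for every token the inner `while` loop finds: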
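char **Nseq = NULL;  // must start as NULL so the first realloc behaves like malloc
while (fgets(line, sizeof(line), myfile)) {
    token = strtok(line, s);
    while (token != NULL) {
        // grow through a temporary pointer so the old array is not lost if realloc fails
        char **tmp = realloc(Nseq, sizeof(char *) * (n + 1));
        if (!tmp) { return -1; }
        Nseq = tmp;
        Nseq[n] = malloc(strlen(token) + 1);
        if (!Nseq[n]) { return -1; }
        strcpy(Nseq[n], token);
        printf("%s\t%d\n", token, n);
        n++;
        token = strtok(NULL, s);
    }
}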
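Neither fix joins the sequences that are broken across lines. One possible approach (a different technique from the answers above, shown as a sketch) is to read the file character by character with `fgetc`, drop all whitespace, and only then split on commas. This assumes whitespace never occurs inside a real sequence. The helpers `read_sequences` and `free_sequences` are hypothetical names.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Free an array of n strings returned by read_sequences. */
void free_sequences(char **seqs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        free(seqs[i]);
    free(seqs);
}

/* Read the whole file, dropping every whitespace character so that a
   sequence broken across several lines is joined again, then split the
   text on commas. On success stores a malloc'ed array of malloc'ed strings
   in *out, its length in *count, and returns 0; returns -1 if memory
   runs out. Empty fields (e.g. after a trailing comma) are skipped. */
int read_sequences(FILE *f, char ***out, size_t *count)
{
    size_t len = 0, cap = 256;
    char *text = malloc(cap);
    if (text == NULL)
        return -1;

    int c;
    while ((c = fgetc(f)) != EOF) {
        if (isspace(c))
            continue;                      /* '\n', '\r', ' ', '\t', ... */
        if (len + 1 == cap) {              /* keep room for the final '\0' */
            char *tmp = realloc(text, cap * 2);
            if (tmp == NULL) {
                free(text);
                return -1;
            }
            text = tmp;
            cap *= 2;
        }
        text[len++] = (char)c;
    }
    text[len] = '\0';

    char **seqs = NULL;
    size_t n = 0;
    for (char *tok = strtok(text, ","); tok != NULL; tok = strtok(NULL, ",")) {
        /* Grow through a temporary so the old array is not lost on failure. */
        char **tmp = realloc(seqs, (n + 1) * sizeof *seqs);
        if (tmp == NULL)
            goto fail;
        seqs = tmp;
        seqs[n] = malloc(strlen(tok) + 1);
        if (seqs[n] == NULL)
            goto fail;
        strcpy(seqs[n], tok);
        n++;
    }
    free(text);
    *out = seqs;
    *count = n;
    return 0;

fail:
    free_sequences(seqs, n);
    free(text);
    return -1;
}
```

Usage would look like `char **Nseq; size_t n; if (read_sequences(myfile, &Nseq, &n) == 0) { /* use Nseq[0..n-1] */ free_sequences(Nseq, n); }`. Reading the whole file first keeps the splitting logic independent of any buffer size, at the cost of holding the file in memory.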