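I have a file containing 20 million records, and I allocate the space to hold it with malloc. The problem is that I would like to make this more general, without hard-coding 20 million in the for statement and in the malloc call. Is there a way to generalize the code to any file size? If I hand it a bigger file, it should still be able to read it. How can I do that?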
main.c
#include <stdlib.h>

struct Fields {
    int i;
    char f1[20];
    int f2;
    float f3;
};

int main(void) {
    struct Fields *files;
    files = malloc(sizeof(struct Fields) * 20000000);
    // I have to generalize this 20000000
    for (size_t n = 0; n < 20000000; n++) {
        // code
    }
    free(files);
}
To read a line-based text file (such as a CSV file), you can follow this pseudocode:
// Capacity of the allocated array, number of elements actually allocated
size_t current_capacity = 100000;

// Number of elements to increase capacity by if needed
size_t const capacity_increment = 100000;

// Current size of the array, the number of initialized elements in the array
size_t size = 0;

// Initial allocation
struct Fields *records = malloc(current_capacity * sizeof *records);
if (records == NULL)
{
    // TODO: Handle error!
    exit(EXIT_FAILURE);
}

while (read_line_from_file(file_pointer, line_buffer))
{
    // Is the current array full?
    if (size >= current_capacity)
    {
        // Increase the capacity of the array
        current_capacity += capacity_increment;

        // And reallocate the array
        struct Fields *temp_records = realloc(records, current_capacity * sizeof *records);
        if (temp_records == NULL)
        {
            // TODO: Handle error!
            exit(EXIT_FAILURE);
        }

        records = temp_records;
    }

    records[size++] = parse_csv_line(line_buffer);
}

// Unless there was an error reading the file, all records have been read
// from the file.
// The number of records read into the array is in the size variable.

// Just for debugging:
printf("The number of records in the file was %zu\n", size);

// You can now use the size variable for further loops,
// as in for (size_t i = 0; i < size; ++i) { ... }
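The pseudocode above leaves the line reader and the CSV line parser undefined. Here is a minimal sketch of what they might look like, named here read_line_from_file and parse_csv_line. It assumes each line has the form `i,f1,f2,f3` with no quoting or embedded commas, which the question does not actually specify. The signatures also differ a little from the pseudocode: the buffer size is passed explicitly, and the parser fills a struct through a pointer so it can report a malformed line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct Fields {
    int i;
    char f1[20];
    int f2;
    float f3;
};

// Reads one line (including its '\n', if it fits) into buffer.
// Returns 1 on success, 0 at end of file or on a read error.
int read_line_from_file(FILE *fp, char *buffer, size_t buffer_size)
{
    assert(fp != NULL && buffer != NULL && buffer_size > 0);
    return fgets(buffer, (int)buffer_size, fp) != NULL;
}

// Parses a line of the assumed form "i,f1,f2,f3" into *out.
// Returns 1 if all four fields were converted, 0 otherwise.
int parse_csv_line(const char *line, struct Fields *out)
{
    assert(line != NULL && out != NULL);
    // %19[^,] stores at most 19 characters up to the next comma,
    // which leaves room for the terminating '\0' in f1[20]
    return sscanf(line, "%d,%19[^,],%d,%f",
                  &out->i, out->f1, &out->f2, &out->f3) == 4;
}
```

With these, the loop body would become something like `if (parse_csv_line(line_buffer, &records[size])) ++size;`, so that lines which fail to parse are skipped instead of stored.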
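In the worst case, growing the array in fixed steps of 100000 wastes the memory of 99999 records. But if the real file has at least 20 million records, that is well under 5% of the total (about 0.5%). You can tune the increment to find a good balance between performance and potentially wasted space.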
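One common way to tune this trade-off, not part of the answer above, is to double the capacity instead of adding a fixed increment, and then shrink the array once the whole file has been read. The number of realloc calls then grows with the logarithm of the record count (for 20 million records, roughly 16 instead of about 200), at the cost of up to half the array being unused until the final shrink. The sketch below is only an illustration: RecordArray, record_array_push, record_array_shrink and the starting capacity of 1024 are names and values chosen for this example, and overflow of the capacity computation is not handled.

```c
#include <assert.h>
#include <stdlib.h>

struct Fields {
    int i;
    char f1[20];
    int f2;
    float f3;
};

struct RecordArray {
    struct Fields *data;
    size_t size;      // number of records stored
    size_t capacity;  // number of records allocated
};

// Appends one record, doubling the capacity when the array is full.
// Returns 1 on success, 0 on allocation failure (the array is left unchanged).
int record_array_push(struct RecordArray *arr, struct Fields record)
{
    assert(arr != NULL && arr->size <= arr->capacity);
    if (arr->size == arr->capacity) {
        size_t new_capacity = arr->capacity ? arr->capacity * 2 : 1024;
        // realloc(NULL, n) behaves like malloc(n), so an empty array works too
        struct Fields *temp = realloc(arr->data, new_capacity * sizeof *temp);
        if (temp == NULL)
            return 0;
        arr->data = temp;
        arr->capacity = new_capacity;
    }
    arr->data[arr->size++] = record;
    return 1;
}

// Gives the unused capacity back once the whole file has been read.
void record_array_shrink(struct RecordArray *arr)
{
    assert(arr != NULL);
    if (arr->size == 0 || arr->size == arr->capacity)
        return;
    struct Fields *temp = realloc(arr->data, arr->size * sizeof *temp);
    if (temp != NULL) {  // if shrinking fails, the old block is still valid
        arr->data = temp;
        arr->capacity = arr->size;
    }
}
```

Start from `struct RecordArray arr = {NULL, 0, 0};`, call record_array_push once per parsed record, call record_array_shrink after the loop, and release the memory with `free(arr.data)` when done.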
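It depends on whether you are dealing with a binary file or a text file, but in both cases you should split up the reading. I give an example for a binary file, which I read as raw bytes, one chunk at a time. You can do the same with a text file, line by line. In both cases you can reuse the same buffer for every packet you read, and the reading has to happen in a loop: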
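// Needs <fcntl.h>, <unistd.h>, <stdio.h> and <stdlib.h>.
// fsize() is not a standard function: it stands for a helper returning the file size in bytes.
size_t const capacity = sizeof(struct Fields) * 100000; // Buffer size in bytes, choose a capacity
char *buffer = malloc(capacity);
size_t fileSize = fsize(filename);
size_t location = 0;
int file = open(filename, O_RDONLY);
if (buffer == NULL || file == -1) {
    // TODO: Handle error!
    exit(EXIT_FAILURE);
}
while (location < fileSize)
{
    size_t remaining = fileSize - location;
    ssize_t result = read(file, buffer, remaining < capacity ? remaining : capacity);
    if (result <= 0) {
        // -1 signals an error, 0 an unexpected end of file
        printf("Error while reading the file : %s\n", filename);
        break;
    }
    // Do something with the packet: the first result bytes of buffer
    location += (size_t)result; // read() may return fewer bytes than requested
}
close(file);
free(buffer);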
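The same chunked loop can also be written with the standard C fread, which counts whole records instead of bytes and does not need the file size up front. This is only a sketch. It assumes the file holds raw struct Fields values written by a program built with the same compiler and platform (so padding and byte order match), and the name process_records_in_chunks and the f2 sum used as the "do something" step are made up for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

struct Fields {
    int i;
    char f1[20];
    int f2;
    float f3;
};

// Reads the file chunk_records records at a time, reusing one buffer.
// Adds up the f2 field of every record into *f2_sum as a stand-in for real work.
// Returns the number of records read, or (size_t)-1 if the buffer cannot be allocated.
size_t process_records_in_chunks(FILE *fp, size_t chunk_records, long long *f2_sum)
{
    assert(fp != NULL && f2_sum != NULL && chunk_records > 0);
    struct Fields *buffer = malloc(chunk_records * sizeof *buffer);
    if (buffer == NULL)
        return (size_t)-1;

    size_t total = 0;
    size_t got;
    // fread returns the number of complete records read;
    // 0 means end of file or an error (tell them apart with ferror(fp))
    while ((got = fread(buffer, sizeof *buffer, chunk_records, fp)) > 0) {
        for (size_t k = 0; k < got; ++k)
            *f2_sum += buffer[k].f2;  // do something with each record
        total += got;
    }

    free(buffer);
    return total;
}
```

Open the file in binary mode, for example `fopen(filename, "rb")`, before passing it in. Memory use stays at chunk_records records no matter how large the file is.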