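I have been trying to read a gzip file in C using the zlib-based I/O functions. The file is large: 12 GB compressed and ~260 GB uncompressed, so I am not prepared to gunzip it to disk and work from there.
Specifically, I am using the following code to read and write through fixed-size buffers -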
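#include <stdio.h>
#include <stdlib.h>
#include <zlib.h>

#define windowBits 15
#define ENABLE_ZLIB_GZIP 32
#define CHUNK 0x4000

/* Run a zlib call and exit with a diagnostic if it returns a negative status. */
#define CALL_ZLIB(x) {                                                  \
    int status;                                                         \
    status = x;                                                         \
    if (status < 0) {                                                   \
        fprintf(stderr, "%s:%d: %s returned a bad status of %d.\n",     \
                __FILE__, __LINE__, #x, status);                        \
        exit(EXIT_FAILURE);                                             \
    }                                                                   \
}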
int main ()
{
    const char * file_name = "test.gz";
    FILE * file;
    z_stream strm = {0};
    unsigned char in[CHUNK];
    unsigned char out[CHUNK];

    strm.zalloc = Z_NULL;
    strm.zfree = Z_NULL;
    strm.opaque = Z_NULL;
    strm.next_in = in;
    strm.avail_in = 0;
    CALL_ZLIB (inflateInit2 (& strm, windowBits | ENABLE_ZLIB_GZIP));

    /* Open the file. */
    file = fopen (file_name, "rb");
    while (1) {
        int bytes_read;

        bytes_read = fread (in, sizeof (char), sizeof (in), file);
        strm.avail_in = bytes_read;
        do {
            unsigned have;
            strm.avail_out = CHUNK;
            strm.next_out = out;
            CALL_ZLIB (inflate (& strm, Z_NO_FLUSH));
            have = CHUNK - strm.avail_out;
            fwrite (out, sizeof (unsigned char), have, stdout);
        }
        while (strm.avail_out == 0);
        if (feof (file)) {
            inflateEnd (& strm);
            break;
        }
    }
    return 0;
}
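The code reads and writes the gzip data through the buffers you specify up front. The buffer size is fixed at some value (0x4000 in the case above).
The problem now is that I cannot increase this buffer size beyond a certain value (I can use 3276008 as the buffer size, but not 32760008). To read 12 GB of compressed data, I assumed I would need a very large buffer. As specified in my edits, though, this looks like some kind of DATA_ERROR, not a BUFFER error... so it is not a buffer error after all!
Is there any way to read the entire 12 GB compressed file using the zlib functions above?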
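Edit #1
The error code returned by inflate is reported by the CALL_ZLIB macro, which I regrettably left out at first. When I run with a buffer size of 0x4000, I get the following error. I have also added the CALL_ZLIB macro to the code above for your reference.
Error msg:
parser.c:96: inflate(&strm, Z_NO_FLUSH) returned a bad status of -3
This clearly looks like a Z_DATA_ERROR.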
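Edit #2
I tried passing a negative windowBits value to inflateInit2(), but that did not solve any of my problems. inflate() initially reads my file correctly and displays all my data the way I want: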
0x55b0 [0x40]: event: 3
.
. ... raw event: size 64 bytes
. 0000: 03 00 00 00 00 00 40 00 18 03 00 00 18 03 00 00 ......@.........
. 0010: 4d 6f 64 65 6d 4d 61 6e 61 67 65 72 00 00 00 00 ModemManager....
. 0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
. 0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0 0 0x55b0 [0x40]: PERF_RECORD_COMM: ModemManager:792/792
0x55f0 [0x40]: event: 7
.
. ... raw event: size 64 bytes
. 0000: 07 00 00 00 00 00 40 00 19 03 00 00 01 00 00 00 ......@.........
. 0010: 19 03 00 00 01 00 00 00 00 00 00 00 00 00 00 00 ................
. 0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
. 0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0 0 0x55f0 [0x40]: PERF_RECORD_FORK(793:793):(1:1)
0x5630 [0x40]: event: 3
.
But after a while, the displayed output becomes garbled and I can no longer read anything from it.
0x4d68 [0x38]: ........... 001 0..
0 0 00 00 00 0 00 000 00 ze 64s
. 0000: 07 00 00 00 00 00 40 00 19 03 00 00 01 00 00 00 .. 00 0 event: size 64 bytes
. 0000: 03 00 00 00 si sisizsiz4s
. 0000: 07 00 00 00 00 00 40 00 19 0....
. 0030: 00 00 00 00 00 00 00 00 00 00 00 00 ..@.@. 0010: 19 03 00 00 [0x38]: ........... 001 0..
0 0 00 00 00 0 00 000 00 ze 64s
. 0000: 07 00 00 00 00 00 40 00 100 00 00 00 00 ..............0 0 0x4d28 [0x40]: PERF_RECORD_FORK(135:135):(2:62)
0x4d68 [0x38]: ........... 001 0..
0 0 00 00 00 0 00 000 00 00 00 00: PERORD_FORK(135:135):(2:2)
This eventually terminates with the error I showed in Edit #1.
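I solved the problem.
The basic problem was that I was not re-initializing the strm.next_in member of the z_stream inside the loop. inflate advances next_in as it consumes input, so after the first iteration it no longer pointed at the start of the in buffer that fread had just refilled. From then on inflate was reading the wrong bytes, and I got the errors described above.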
I modified the code to -
strm.next_in = in;
strm.avail_in = 0;
CALL_ZLIB(inflateInit2 (&strm, windowBits | ENABLE_ZLIB_GZIP));
file = fopen(filename, "rb");
while(1)
{
    int bytes_read;
    strm.next_in = in; // added this line
    bytes_read = fread(in, sizeof(char), sizeof(in), file);
    strm.avail_in = bytes_read;
    do
    {
        unsigned have;
        strm.avail_out = CHUNK;
        strm.next_out = out;