How do I use Commons Compress to tar a file that is too large and causes an out-of-memory crash?



In the code below, if I give Apache Commons Compress a file that is several gigabytes in size, it crashes because it exhausts all of my memory.

Can I have it read and then write small portions of the file at a time? I have been looking into chunking, but I am not sure how to do it so that the file is put back together correctly after the pieces are written into the .tar format.

What is the best way to support files of any size here?

try (FileOutputStream fileOutputStream = new FileOutputStream("output.tar");
     BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(fileOutputStream);
     GzipCompressorOutputStream gzipOutputStream = new GzipCompressorOutputStream(bufferedOutputStream);
     TarArchiveOutputStream tarArchiveOutputStream = new TarArchiveOutputStream(gzipOutputStream)) {

    tarArchiveOutputStream.setBigNumberMode(TarArchiveOutputStream.BIGNUMBER_POSIX);
    tarArchiveOutputStream.setLongFileMode(TarArchiveOutputStream.LONGFILE_GNU);

    File currentFile = new File("Huge_MultiGB_File.txt");
    String relativeFilePath = currentFile.getPath();
    TarArchiveEntry tarEntry = new TarArchiveEntry(currentFile, relativeFilePath);
    tarEntry.setSize(currentFile.length());

    tarArchiveOutputStream.putArchiveEntry(tarEntry);
    // reads the entire multi-GB file into memory at once, which exhausts the heap
    tarArchiveOutputStream.write(IOUtils.toByteArray(new FileInputStream(currentFile)));
    tarArchiveOutputStream.closeArchiveEntry();
}

You have to read a small portion of the file and write it to the output in a loop, instead of first reading the whole file into memory with IOUtils.

It is done more or less like this:

FileInputStream source = new FileInputStream(....somefile);
// tarArchiveOutputStream is already prepared for writing (the archive entry has been put)
byte[] buff = new byte[1024 * 10]; // 10 KB buffer
int numBytesRead = -1;             // number of bytes read per iteration

while ((numBytesRead = source.read(buff)) > 0) {
    // while the source has bytes, read from the source and write
    // the same number of bytes to the tar output stream
    tarArchiveOutputStream.write(buff, 0, numBytesRead);
}
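For reference, here is a minimal self-contained sketch that combines the two pieces: the try-with-resources setup from the question plus the streaming copy loop from the answer. The class name TarLargeFile, the input file name, and the 10 KB buffer size are placeholders, and the output is named output.tar.gz here only because the stream is gzip-compressed; the key point is that only one buffer-sized chunk of the file is held in memory at any time.

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;

public class TarLargeFile {
    public static void main(String[] args) throws IOException {
        File currentFile = new File("Huge_MultiGB_File.txt"); // placeholder input file
        try (FileOutputStream fileOutputStream = new FileOutputStream("output.tar.gz");
             BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(fileOutputStream);
             GzipCompressorOutputStream gzipOutputStream = new GzipCompressorOutputStream(bufferedOutputStream);
             TarArchiveOutputStream tarArchiveOutputStream = new TarArchiveOutputStream(gzipOutputStream)) {

            tarArchiveOutputStream.setBigNumberMode(TarArchiveOutputStream.BIGNUMBER_POSIX);
            tarArchiveOutputStream.setLongFileMode(TarArchiveOutputStream.LONGFILE_GNU);

            TarArchiveEntry tarEntry = new TarArchiveEntry(currentFile, currentFile.getPath());
            tarEntry.setSize(currentFile.length());
            tarArchiveOutputStream.putArchiveEntry(tarEntry);

            // Stream the file in fixed-size chunks instead of loading it all at once.
            try (BufferedInputStream source = new BufferedInputStream(new FileInputStream(currentFile))) {
                byte[] buff = new byte[1024 * 10]; // 10 KB buffer
                int numBytesRead;
                while ((numBytesRead = source.read(buff)) > 0) {
                    tarArchiveOutputStream.write(buff, 0, numBytesRead);
                }
            }

            tarArchiveOutputStream.closeArchiveEntry();
        }
    }
}

Commons Compress also ships org.apache.commons.compress.utils.IOUtils.copy(InputStream, OutputStream), which performs the same buffered copy, so the inner loop can be replaced by a single copy(source, tarArchiveOutputStream) call if you prefer.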