File compression in Java (Hadoop DefaultCodec) - how to make it human readable

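I have a file that was compressed with org.apache.hadoop.io.compress.DefaultCodec, and I want to restore it to its original format, which is a JSON-formatted string.

I'm not quite sure how to use the DefaultCodec documentation to do this. Can someone give me an example of what this would look like? Here is what I have so far, and I don't know whether I'm on the right track...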

// Grab my file (it's on S3)
S3Object fileOnS3 = s3Service.getObject("mys3bucket", "myfilename");
DefaultCodec codec = new DefaultCodec();
Decompressor decompressor = codec.createDecompressor();
// Does the following line create an input stream that decodes DefaultCodec data into uncompressed form?
CompressionInputStream is = codec.createInputStream(fileOnS3.getDataInputStream(), decompressor);
// Also, I have no idea what to do from here.

I want to store the uncompressed version in a String variable, since I know the file is a small single line.

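I would try one of the following:

1. Decompress the file using the HDFS shell command -text and a Unix shell redirect, like so:
hadoop dfs -text /path/on/hdfs/ > /local/path/for/local/raw/file
2. Load the file as input with SequenceFileInputFormat, and use an identity mapper (and zero reducers) to write it out with TextOutputFormat.

I would go with the first option, especially since you say the input file is a small string. If you want this file in a String variable, you can either load the decompressed file (which seems unnecessarily expensive) or store the output of the -text command in a String right away (skipping the part after the >).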
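
If you would rather stay in Java, here is a minimal sketch of reading a compressed stream to the end and turning it into a String. It rests on assumptions I can't verify from the question: that the object on S3 is a raw DefaultCodec stream (which, as far as I know, is plain zlib/DEFLATE data that java.util.zip can read) and not a SequenceFile, and that the JSON is UTF-8. The class name DeflateToString and the methods readCompressed and compress are made up for illustration, and compress exists only to stand in for the S3 object so the example runs on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

class DeflateToString {

    // Reads a zlib/DEFLATE-compressed stream to the end and decodes it as UTF-8.
    // Assumption: the input is a raw zlib stream, not a SequenceFile container.
    static String readCompressed(InputStream compressed) throws IOException {
        try (InputStream in = new InflaterInputStream(compressed);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    // Illustration only: produces zlib-format bytes that stand in for the S3 object.
    static byte[] compress(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String json = "{\"key\":\"value\"}";
        byte[] packed = compress(json);
        // In the question's setting this stream would come from fileOnS3.getDataInputStream().
        String restored = readCompressed(new ByteArrayInputStream(packed));
        System.out.println(restored);
    }
}
```

The same read loop should work on the CompressionInputStream from your snippet. I believe DefaultCodec needs a Configuration set on it (for example via setConf) before it can create decompressors or streams, but check that against the Hadoop version you are using.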
