How can I read a gzip file that contains JSON content and then write that content to a text file?
import gzip
import json

with open('.../notebooks/decompressed.txt', 'wb') as f_out:
    with gzip.open(".../2020-04/statuses.log.2020-04-01-00.gz", 'rb') as f_in:
        data = f_in.read()
        json.dumps(data)  # fails here: data is a bytes object
Error: Object of type bytes is not JSON serializable
Screenshot of the decompressed .txt (first 2 lines): [image not included]
If the log content is already serialized JSON, you only need to write the decompressed data out as-is.
import gzip

with gzip.open('.../2020-04/statuses.log.2020-04-01-00.gz', 'rb') as fin:
    with open('.../notebooks/decompressed.txt', 'wb') as fout:
        data = fin.read()
        fout.write(data)
If the file is large, import the shutil module and replace the read() and write() calls with:
shutil.copyfileobj(fin, fout)
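Put together, the streaming variant might look like the sketch below. The function name decompress_gzip and the temporary sample file are made up for illustration; only gzip.open and shutil.copyfileobj come from the answer above.

```python
import gzip
import os
import shutil
import tempfile


def decompress_gzip(src_path, dst_path):
    """Stream-decompress src_path into dst_path without reading it all into memory."""
    with gzip.open(src_path, 'rb') as fin, open(dst_path, 'wb') as fout:
        # copyfileobj copies in chunks, so memory use stays bounded
        shutil.copyfileobj(fin, fout)


# Demo on a throwaway file (hypothetical sample data, not the real log)
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'sample.log.gz')
dst = os.path.join(workdir, 'decompressed.txt')
sample = b'{"id": 1}\n{"id": 2}\n'
with gzip.open(src, 'wb') as f:
    f.write(sample)

decompress_gzip(src, dst)
with open(dst, 'rb') as f:
    result = f.read()
```

Both files are opened in binary mode, so the bytes are copied unchanged and no encoding has to be guessed.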
If you want to load the JSON into an object and then re-serialize it:
import gzip
import json

with gzip.open('.../2020-04/statuses.log.2020-04-01-00.gz', 'rb') as fin:
    with open('.../notebooks/decompressed.txt', 'w') as fout:
        obj = json.load(fin)
        json.dump(obj, fout)
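For what it's worth, the error in the question comes from passing the raw bytes to json.dumps, which serializes Python objects and does not parse anything. A small sketch of the difference, using a made-up one-document sample file rather than the real log:

```python
import gzip
import json
import os
import tempfile

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'sample.json.gz')
dst = os.path.join(workdir, 'decompressed.txt')

# Hypothetical sample: a single JSON document, gzip-compressed
with gzip.open(src, 'wb') as f:
    f.write(b'{"user": "alice", "count": 3}')

with gzip.open(src, 'rb') as fin:
    raw = fin.read()              # bytes, not a parsed object

try:
    json.dumps(raw)               # the call from the question
    dumps_failed = False
except TypeError:
    dumps_failed = True           # bytes cannot be serialized to JSON

obj = json.loads(raw)             # json.loads accepts bytes on Python 3.6+
with open(dst, 'w') as fout:
    json.dump(obj, fout, indent=2)    # indent is optional, for readability
```

So parse with json.load or json.loads first, and only then call json.dump or json.dumps on the resulting object.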
If the log file is a series of JSON structures, one per line, try:
import gzip
import json

with gzip.open('.../2020-04/statuses.log.2020-04-01-00.gz', 'rb') as fin:
    for line in fin:
        obj = json.loads(line)
        # next do something with obj
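As a rough sketch of the line-by-line approach wrapped in a reusable generator: the name iter_json_lines, the sample records, and the choice to skip blank lines are my own assumptions, not something the log format guarantees.

```python
import gzip
import json
import os
import tempfile


def iter_json_lines(path):
    """Yield one parsed object per non-empty line of a gzip-compressed file."""
    # 'rt' makes gzip decode the bytes to text; UTF-8 is assumed here
    with gzip.open(path, 'rt', encoding='utf-8') as fin:
        for line in fin:
            line = line.strip()
            if line:                  # skip blank lines
                yield json.loads(line)


# Demo on a throwaway file (hypothetical sample records)
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'sample.log.gz')
with gzip.open(src, 'wt', encoding='utf-8') as f:
    f.write('{"id": 1, "text": "first"}\n\n{"id": 2, "text": "second"}\n')

ids = [obj['id'] for obj in iter_json_lines(src)]
```

Only one line is held in memory at a time, so this should scale to large logs as long as each individual record is small.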
If the JSON is too large to deserialize in one go, try ijson, which lets you iterate over a huge JSON structure.
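A minimal ijson sketch, assuming the file holds one big top-level JSON array and that ijson is installed (it is a third-party package, e.g. pip install ijson). The function name iter_array_items and the sample data are hypothetical.

```python
import gzip
import json
import os
import tempfile


def iter_array_items(path):
    """Yield the elements of a top-level JSON array one at a time."""
    import ijson  # third-party; imported here so the rest loads without it

    with gzip.open(path, 'rb') as fin:
        # 'item' is the ijson prefix for each element of a top-level array
        for obj in ijson.items(fin, 'item'):
            yield obj


# Demo file (hypothetical): a small array standing in for a huge one
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'big.json.gz')
with gzip.open(src, 'wt', encoding='utf-8') as f:
    json.dump([{"id": 1}, {"id": 2}, {"id": 3}], f)
```

You would then loop with something like "for obj in iter_array_items(src): ..." and handle each element as it arrives. If the structure is nested differently, the prefix passed to ijson.items has to be adjusted to match.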