I have about 700 MB of data in the following format:
{"hash":"b2f405b1589efd8b013869d1d5e605367643db20844572ea7bf788f8575c38d6","block_timestamp":"2020-05-08 13:21:33 UTC","addresses":["3E17PiWGJqP8945KRZHuPdsFSU59othGEQ"]}
{"hash":"6609073b5979d768933f2ea7d4f1723d07c03a3e08f48adff21b9f1d79cee164","block_timestamp":"2020-05-08 13:39:39 UTC","addresses":["3CfewsC7Xjp2oJSBT2zUQkYSXfzo2nuGha"]}
{"hash":"5c7d95f903ea505d9ab82d1090944780c00e91d343ae66e94610bff1d614f90f","block_timestamp":"2020-04-05 23:19:30 UTC","addresses":["1ztVt2xwNwgzH3W9SJ2nMgPMuZpUg8m5w"]}
{"hash":"7eb120e9b50dbc25f13415b3c899efe2cfaf870a7f49995aa6e3b672a1992e56","block_timestamp":"2020-04-08 05:41:51 UTC","addresses":["1HckjUpRGcrrRAtFaaCAUaGjsPx9oYmLaZ"]}
{"hash":"be202b37aa218461827138ff32e3dfa74945808f3ecb574fb5287e99c8ae6a33","block_timestamp":"2020-04-04 09:53:28 UTC","addresses":["3Jk8HaC8Sjq6Ufig9NkWFoFcfzC5a3CNyL"]}
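The data is currently saved as a JSON file, and I want to convert it to CSV. The usual Python and Bash approaches do not give me the correct result. (Note that there is no comma between two lines, only a newline.)
I want a CSV with the header: hash, block_timestamp, and addresses. How can I do this?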
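The simplest way is a Bash one-liner. For example, if your file is named tt, just run:
awk -F'"' 'BEGIN {print "hash,block_timestamp,addresses"} {print $4 "," $8 "," $12}' tt
This splits each line on double quotes rather than on colons or commas, because the timestamp itself contains colons. It assumes every line has the same three keys in the same order and exactly one address.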
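Alternatively, if you really want to do this in Python, read the file line by line, parse each line with json.loads, and print the row in CSV format (this is what ranka47 does in their answer). The catch is that, for large files, this is much slower.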
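I think this is what you want. It assumes that the addresses key of each JSON object holds exactly one address.
Note: iterating over the file makes Python read one line at a time, so reading line by line lets it handle large data.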
import json
# Convert line by line so the whole file is never held in memory
with open("input.txt", "r") as fp, open("output.txt", "w") as ofp:
    ofp.write("hash,block_timestamp,addresses\n")
    for line in fp:
        json_obj = json.loads(line)
        # Assumes exactly one address per record
        ofp.write(json_obj["hash"] + "," + json_obj["block_timestamp"] + "," + json_obj["addresses"][0] + "\n")
# Print the result to check it
with open("output.txt") as fp:
    for line in fp:
        print(line, end="")
Output:
hash,block_timestamp,addresses
b2f405b1589efd8b013869d1d5e605367643db20844572ea7bf788f8575c38d6,2020-05-08 13:21:33 UTC,3E17PiWGJqP8945KRZHuPdsFSU59othGEQ
6609073b5979d768933f2ea7d4f1723d07c03a3e08f48adff21b9f1d79cee164,2020-05-08 13:39:39 UTC,3CfewsC7Xjp2oJSBT2zUQkYSXfzo2nuGha
5c7d95f903ea505d9ab82d1090944780c00e91d343ae66e94610bff1d614f90f,2020-04-05 23:19:30 UTC,1ztVt2xwNwgzH3W9SJ2nMgPMuZpUg8m5w
7eb120e9b50dbc25f13415b3c899efe2cfaf870a7f49995aa6e3b672a1992e56,2020-04-08 05:41:51 UTC,1HckjUpRGcrrRAtFaaCAUaGjsPx9oYmLaZ
be202b37aa218461827138ff32e3dfa74945808f3ecb574fb5287e99c8ae6a33,2020-04-04 09:53:28 UTC,3Jk8HaC8Sjq6Ufig9NkWFoFcfzC5a3CNyL
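If some records turn out to hold more than one address, indexing with [0] silently drops the rest. Below is a minimal sketch of one way around that, assuming it is acceptable to join the addresses with a semicolon in a single column. The separator, the function names record_to_row and convert, and the file paths are illustrative choices, not part of the answer above.

```python
import csv
import json


def record_to_row(record, sep=";"):
    """Flatten one parsed JSON record into a list of three CSV fields."""
    # Join every address so none is lost; the ";" separator is an assumption.
    addresses = sep.join(record.get("addresses", []))
    return [record["hash"], record["block_timestamp"], addresses]


def convert(in_path, out_path):
    """Stream a line-delimited JSON file into a CSV file, one row per line."""
    with open(in_path) as src, open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(["hash", "block_timestamp", "addresses"])
        for line in src:
            if line.strip():  # skip blank lines
                writer.writerow(record_to_row(json.loads(line)))
```

Calling convert("input.txt", "output.csv") would then produce the same rows as above for single-address records. Using csv.writer instead of string concatenation also means a field is quoted automatically if it ever contains a comma.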
If you have pandas, you can do the following. Note that this loads the whole file into memory:
import json
import pandas as pd
# Parse each line separately, since the file holds one JSON object per line
with open("input.txt", "r") as f:
    contents = [json.loads(line) for line in f]
df = pd.DataFrame(contents)
df.to_csv("output.csv", index=False)
This can be done by deserializing each line and passing the resulting dict to csv.DictWriter.
import csv
import json
with open('data.json') as jf, open('data.csv', 'w', newline='') as f:
    # Handle the first row individually because we need to work out the
    # column headings
    line = next(jf)
    dict_ = json.loads(line)
    writer = csv.DictWriter(f, dict_.keys())
    writer.writeheader()
    writer.writerow(dict_)
    # Loop through the rest of the file
    for line in jf:
        dict_ = json.loads(line)
        writer.writerow(dict_)
If you don't want a header row, you can simplify the code to use csv.writer:
import csv
import json
with open('data.json') as jf, open('data.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for line in jf:
        dict_ = json.loads(line)
        # Values are written in the key order of each JSON object
        writer.writerow(dict_.values())
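Both snippets take the column order from whatever keys each line happens to have, and they write the addresses list using its Python representation. Here is a hedged sketch of a variant that fixes the columns up front and writes only the first address, assuming every record should produce the three columns named in the question. FIELDS and jsonl_to_csv are hypothetical names; restval and extrasaction are standard csv.DictWriter parameters.

```python
import csv
import json

# Column order requested in the question; adjust if your data differs.
FIELDS = ["hash", "block_timestamp", "addresses"]


def jsonl_to_csv(lines, out_file):
    """Write one CSV row per JSON line, with a fixed header and column order."""
    writer = csv.DictWriter(
        out_file,
        fieldnames=FIELDS,
        restval="",             # value used when a key is missing
        extrasaction="ignore",  # silently drop keys that are not in FIELDS
    )
    writer.writeheader()
    for line in lines:
        record = json.loads(line)
        # Assumption: keep only the first address, or "" if the list is empty.
        addresses = record.get("addresses") or [""]
        record["addresses"] = addresses[0]
        writer.writerow(record)
```

It can be called as jsonl_to_csv(jf, f) with the two file objects opened exactly as in the snippets above.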
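Actually, I think the simplest way is:
- Open the file in a powerful editor, since your file is large.
- Replace all dict keys with empty text, e.g. replace "hash": with nothing, and so on.
- Also replace {, }, [, ] with empty text.
- What is left is well-formed CSV.
Haha, no code needed.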
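For reference, the same replacements can also be scripted instead of done by hand. This is a rough sketch, assuming every line has exactly the three keys shown in the question and that no value contains braces or brackets; strip_json_syntax is a made-up name. Unlike json.loads, it does no real parsing, so it would break on data that violates those assumptions.

```python
# The literal key prefixes to delete, as they appear in the question's data.
KEYS = ('"hash":', '"block_timestamp":', '"addresses":')


def strip_json_syntax(line):
    """Turn one JSON line into a CSV row by deleting keys and brackets."""
    for key in KEYS:
        line = line.replace(key, "")
    for ch in "{}[]":
        line = line.replace(ch, "")
    return line.strip()
```

The fields keep their double quotes, which CSV readers treat as ordinary quoted fields.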