I have the following CSV file:
Topic,Characteristics,Total
Population and dwellings,Population-2016,183314
Population and dwellings,Population-2011,175779
Population and dwellings,Population percentage change,4.3
Age characteristics,0 to 14 years,30670
Age characteristics,0 to 4 years,9275
Age characteristics,5 to 9 years,10475
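I want to output a JSON file where each unique "Topic" is a key and the value is a dictionary of "Characteristics": "Total" pairs, i.e. the output would be: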
{
    "Population and dwellings": {
        "Population-2016": 183314,
        "Population-2011": 175779,
        "Population percentage change": 4.3
    },
    "Age characteristics": {
        "0 to 14 years": 30670,
        "0 to 4 years": 9275,
        "5 to 9 years": 10475
    }
}
How can I do this correctly? Everything I have tried so far ends up overwriting the earlier entries. Any help would be greatly appreciated. Thanks.
You can use the csv module to read the file and the dict.setdefault method to group the elements:
import csv

out = {}
with open("your_file.csv", "r", newline="") as f_in:
    reader = csv.reader(f_in)
    next(reader)  # skip headers
    for topic, characteristics, total in reader:
        # setdefault creates the inner dict the first time a topic is seen
        out.setdefault(topic, {})[characteristics] = float(total)

print(out)
Prints (reformatted here for readability):
{
    "Population and dwellings": {
        "Population-2016": 183314.0,
        "Population-2011": 175779.0,
        "Population percentage change": 4.3,
    },
    "Age characteristics": {
        "0 to 14 years": 30670.0,
        "0 to 4 years": 9275.0,
        "5 to 9 years": 10475.0,
    },
}
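The desired output in the question keeps whole numbers as integers (183314 rather than 183314.0), whereas float(total) turns every value into a float. Below is a minimal sketch, assuming each Total is either an integer or a decimal number, that tries int() first and falls back to float(). parse_number and group_rows are hypothetical helper names, and io.StringIO stands in for the real file so the snippet runs on its own:

```python
import csv
import io

# Sample data copied from the question; io.StringIO stands in for the real file.
CSV_TEXT = """Topic,Characteristics,Total
Population and dwellings,Population-2016,183314
Population and dwellings,Population-2011,175779
Population and dwellings,Population percentage change,4.3
Age characteristics,0 to 14 years,30670
Age characteristics,0 to 4 years,9275
Age characteristics,5 to 9 years,10475
"""


def parse_number(text):
    """Hypothetical helper: return an int for whole numbers, otherwise a float."""
    try:
        return int(text)
    except ValueError:
        # e.g. "4.3" is not a valid int literal, so fall back to float
        return float(text)


def group_rows(f_in):
    """Group rows into {topic: {characteristics: total}} using dict.setdefault."""
    out = {}
    reader = csv.reader(f_in)
    next(reader)  # skip the header row
    for topic, characteristics, total in reader:
        # setdefault returns the existing inner dict for this topic,
        # or inserts and returns a new empty one the first time it appears
        out.setdefault(topic, {})[characteristics] = parse_number(total)
    return out


out = group_rows(io.StringIO(CSV_TEXT))
print(out["Population and dwellings"]["Population-2016"])  # 183314
print(out["Population and dwellings"]["Population percentage change"])  # 4.3
```

If the real file might contain values such as "1,234" or empty cells, parse_number would need extra handling; this sketch only covers the formats shown in the question.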
To get JSON from the out dictionary, you can do:
import json
print(json.dumps(out, indent=4))
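json.dumps only returns a string. Since the question asks for a JSON file, json.dump can write the dictionary straight to an open file object instead. A minimal sketch, assuming a hypothetical output path output.json in the system temp directory and using a literal copy of out so it runs standalone:

```python
import json
import os
import tempfile

# Literal copy of the grouped dictionary so this snippet runs on its own;
# in practice this is the `out` built from the CSV file.
out = {
    "Population and dwellings": {
        "Population-2016": 183314,
        "Population-2011": 175779,
        "Population percentage change": 4.3,
    },
    "Age characteristics": {
        "0 to 14 years": 30670,
        "0 to 4 years": 9275,
        "5 to 9 years": 10475,
    },
}

# Hypothetical output location; replace with the path you actually want.
path = os.path.join(tempfile.gettempdir(), "output.json")

with open(path, "w", encoding="utf-8") as f_out:
    # json.dump serializes directly into the file; indent=4 pretty-prints it
    json.dump(out, f_out, indent=4)

# Read the file back to check that the round trip preserves the data.
with open(path, encoding="utf-8") as f_in:
    loaded = json.load(f_in)

print(loaded == out)  # True
```

Because Python dictionaries preserve insertion order, the topics appear in the file in the same order as they first appear in the CSV.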
My solution is similar to Andrej's, but I use defaultdict to simplify the code.
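import collections
import csv
import json

# A missing topic gets a new inner defaultdict; a missing characteristic starts at 0.0
out = collections.defaultdict(lambda: collections.defaultdict(float))
with open("data.csv", newline="") as stream:
    next(stream)  # Skip the header
    reader = csv.reader(stream)
    for topic, characteristics, total in reader:
        # += adds the values together if a (topic, characteristics) pair repeats
        out[topic][characteristics] += float(total)

print(json.dumps(out, indent=4))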
Output:
{
    "Population and dwellings": {
        "Population-2016": 183314.0,
        "Population-2011": 175779.0,
        "Population percentage change": 4.3
    },
    "Age characteristics": {
        "0 to 14 years": 30670.0,
        "0 to 4 years": 9275.0,
        "5 to 9 years": 10475.0
    }
}
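The two answers behave differently when the same Topic/Characteristics pair appears on more than one row: setdefault with plain assignment keeps the last value, while the defaultdict(float) version adds the values together. The sketch below makes that visible on hypothetical input with a repeated row (the value 100 is made up for illustration); with the data in the question, where every pair is unique, both approaches produce the same result.

```python
import collections
import csv
import io

# Hypothetical input in which one (Topic, Characteristics) pair appears twice.
CSV_TEXT = """Topic,Characteristics,Total
Age characteristics,0 to 4 years,9275
Age characteristics,0 to 4 years,100
"""


def with_setdefault(text):
    out = {}
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    for topic, characteristics, total in reader:
        # Plain assignment: a repeated pair overwrites the earlier value
        out.setdefault(topic, {})[characteristics] = float(total)
    return out


def with_defaultdict(text):
    out = collections.defaultdict(lambda: collections.defaultdict(float))
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    for topic, characteristics, total in reader:
        # Augmented assignment: a repeated pair is summed
        out[topic][characteristics] += float(total)
    return out


print(with_setdefault(CSV_TEXT))  # {'Age characteristics': {'0 to 4 years': 100.0}}
print(dict(with_defaultdict(CSV_TEXT)["Age characteristics"]))  # {'0 to 4 years': 9375.0}
```

Which behavior is right depends on the data: summing suits files where a pair can legitimately be split across rows, while overwriting (or raising an error) suits files where a repeat indicates a mistake.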