How do I create a two-level dictionary from a file?

I have a csv file that looks like this (the real one is larger):

country;company1;company2;company3
finland;30;30;40
sweden;20;30;50
norway;10;20;70

What is the easiest way to read this file so that I end up with a dictionary like this (a dictionary of dictionaries):

{ 'company1': {'finland': 30, 'sweden': 20, 'norway': 10},
'company2': {'finland': 30, 'sweden': 30, 'norway': 20},
... 
}

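I first tried to build a separate list from the first line, i.e. the list of companies, and then create a dictionary from it. But I ran into problems when I tried to read the lines after the first one and create the inner dictionaries inside the dictionary I had already created.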

Sorry if this is badly explained, I'm new to coding!

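@fsimonjetz's answer is great if you are already using pandas in this project. If not, pulling it in just for this task is overkill, because we can parse and transpose the data with simple logic.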

import csv
from collections import defaultdict

output = defaultdict(dict)
# newline='' is what the csv module documentation recommends for file objects
with open('path/to/your.csv', newline='') as f:
    reader = csv.DictReader(f, delimiter=';')
    companies = reader.fieldnames[1:]  # every column except 'country'
    for line in reader:
        country = line['country']
        for company in companies:
            output[company][country] = line[company]
            # or output.setdefault(company, {})[country] = line[company]
            # if you want 'output' to be a "normal" dict instead of defaultdict

print(dict(output))  # or just print(output) if you don't mind seeing the defaultdict repr

Output

{'company1': {'finland': '30', 'sweden': '20', 'norway': '10'}, 
'company2': {'finland': '30', 'sweden': '30', 'norway': '20'}, 
'company3': {'finland': '40', 'sweden': '50', 'norway': '70'}}
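
The values above are strings because the csv module returns every field as text, while the question asks for integers. One possible way to get them is to convert while filling the dictionary. This is only a minimal sketch: it assumes every cell after the first column holds a valid integer, and it uses io.StringIO with the sample data from the question in place of the real file so that it runs on its own.

```python
import csv
import io
from collections import defaultdict

# Sample data from the question; io.StringIO stands in for the real file.
sample = io.StringIO(
    "country;company1;company2;company3\n"
    "finland;30;30;40\n"
    "sweden;20;30;50\n"
    "norway;10;20;70\n"
)

output = defaultdict(dict)
reader = csv.DictReader(sample, delimiter=';')
companies = reader.fieldnames[1:]  # every column except 'country'
for line in reader:
    country = line['country']
    for company in companies:
        # int() raises ValueError on empty or non-numeric cells
        # (assumed not to occur in this sketch)
        output[company][country] = int(line[company])

result = dict(output)
print(result['company1'])  # {'finland': 30, 'sweden': 20, 'norway': 10}
```

If the real file can contain empty or non-numeric cells, the int() call would need a try/except or a check before converting.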

One way is to use pandas, which is a good idea if you need to work with tabular data:

>>> import pandas as pd
>>> df = pd.read_csv('path/to/your.csv', delimiter=';', index_col='country')
>>> df.to_dict()
{'company1': {'finland': 30, 'sweden': 20, 'norway': 10},
'company2': {'finland': 30, 'sweden': 30, 'norway': 20},
'company3': {'finland': 40, 'sweden': 50, 'norway': 70}}

I think using an OrderedDict will help a lot. You could do it like this:

import csv
from collections import OrderedDict

with open('file.csv', newline='') as f:
    reader = csv.reader(f, delimiter=';')
    list_companies = next(reader)  # ['country', 'company1', 'company2', ...]
    companies_dict = OrderedDict()
    for company in list_companies[1:]:  # We forget about 'country'
        companies_dict[company] = {}  # We initialize the companies' dicts in order
    for country_values in reader:  # For every line after the first one
        country = country_values[0]  # We get the country at the beginning of every line
        for countries_dict, value in zip(companies_dict.values(), country_values[1:]):
            countries_dict[country] = value  # And set the value for every company in order

print(dict(companies_dict))
# {'company1': {'finland': '30', 'sweden': '20', 'norway': '10'}, ...}

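The zip function may be new to you. It takes two (or more) iterables and returns an iterator that groups the elements at the same position into tuples. For example: list(zip(['finland', 'sweden', 'england'], [30, 30, 40])) == [('finland', 30), ('sweden', 30), ('england', 40)]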
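
To make the pairing concrete, here is a small sketch with hand-written rows that mirror the sample file (the variable names are only illustrative). It shows zip lining up a header with one row, and zip(*rows), which transposes rows into columns.

```python
header = ['country', 'company1', 'company2', 'company3']
row = ['finland', '30', '30', '40']

# zip yields tuples lazily and stops at the shortest input,
# so wrap it in list() to see the pairs.
pairs = list(zip(header[1:], row[1:]))
print(pairs)  # [('company1', '30'), ('company2', '30'), ('company3', '40')]

# Unpacking a list of rows with * passes each row as a separate argument,
# so zip groups the first items together, then the second items, and so on.
rows = [
    ['finland', '30', '30', '40'],
    ['sweden', '20', '30', '50'],
]
columns = list(zip(*rows))
print(columns[0])  # ('finland', 'sweden')
print(columns[1])  # ('30', '20')
```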

This may not fit your purpose exactly, but I believe it is a good enough way to achieve what you want.
