Convert a multi-header CSV to a nested dictionary in Python



The content of my CSV file is:

[user@localhost ~]$ cat tempcsv.csv
info,info,auth,req,req
name,desc,username,key1,key2
a,alphabet,admin,1,team

Here, the CSV has two header rows.
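Put another way, each column is identified by a pair: a top-level header from the first row and a field name from the second. The following is a minimal standard-library sketch of that pairing. It assumes the sample data is read from an in-memory string rather than from tempcsv.csv. The resulting tuples match the keys that pandas produces in the to_dict() output that follows.

```python
import csv
import io

# Sample data copied from tempcsv.csv (assumption: read from a string, not the file)
csv_text = """info,info,auth,req,req
name,desc,username,key1,key2
a,alphabet,admin,1,team
"""

reader = csv.reader(io.StringIO(csv_text))
header = next(reader)     # first header row: group names
subheader = next(reader)  # second header row: field names

# Pair the two header rows column by column into (group, field) tuples
column_keys = list(zip(header, subheader))
print(column_keys)
```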

Reading the CSV with pandas:

>>> import pandas as pd
>>> pd.read_csv('tempcsv.csv', sep=',', header=[0,1], encoding = "utf-8-sig", skipinitialspace=True, tupleize_cols=True)
(info, name) (info, desc) (auth, username)  (req, key1) (req, key2)
0            a     alphabet            admin            1        team
>>> df = pd.read_csv('tempcsv.csv', sep=',', header=[0,1], encoding = "utf-8-sig", skipinitialspace=True, tupleize_cols=True)
>>> df.to_dict()
{(u'req', u'key1'): {0: 1}, (u'req', u'key2'): {0: u'team'}, (u'info', u'name'): {0: u'a'}, (u'auth', u'username'): {0: u'admin'}, (u'info', u'desc'): {0: u'alphabet'}}

Now, when I convert the DataFrame to a dictionary, I get a list of dictionaries whose keys are tuples:

>>> df.to_dict('records')
[{(u'req', u'key1'): 1, (u'req', u'key2'): u'team', (u'info', u'name'): u'a', (u'auth', u'username'): u'admin', (u'info', u'desc'): u'alphabet'}]

Converting them into a proper nested dictionary by hand:

>>> result = []
>>> row_data = {}
>>> for row in df.to_dict('records'):
...     for key,value in row.iteritems():
...             row_data.setdefault(key[0], {})[key[1]] = value
...
>>> row_data
{u'info': {u'name': u'a', u'desc': u'alphabet'}, u'req': {u'key2': u'team', u'key1': 1}, u'auth': {u'username': u'admin'}}

This is my expected output:

{u'info': {u'name': u'a', u'desc': u'alphabet'}, u'req': {u'key2': u'team', u'key1': 1}, u'auth': {u'username': u'admin'}}
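The manual loop gives the expected result for this one-row file. However, row_data is created only once, outside the loop. With more than one data row, every row would be merged into the same dictionary, and result is never filled. The following minimal Python 3 sketch builds one nested dict per record. The helper name nest_record is hypothetical, and the records are typed in by hand to mirror the to_dict('records') output rather than produced by pandas.

```python
def nest_record(record):
    """Turn {(group, field): value} into {group: {field: value}} for one row."""
    nested = {}
    for (group, field), value in record.items():
        nested.setdefault(group, {})[field] = value
    return nested

# Hand-written records shaped like df.to_dict('records') (assumption: not produced by pandas here)
records = [
    {('info', 'name'): 'a', ('info', 'desc'): 'alphabet',
     ('auth', 'username'): 'admin', ('req', 'key1'): 1, ('req', 'key2'): 'team'},
]

# A fresh nested dict per row, so rows never overwrite each other
result = [nest_record(r) for r in records]
print(result)
```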

So my question is: is there a way in Python to convert the DataFrame directly into this nested dictionary?

Alternatively, is there a way to do the same thing with the csv module?

Pandas version:

>>> import pandas as pd
>>> pd.__version__
'0.14.1'

Tested with pandas 2.0.3:

import pandas as pd
from io import StringIO
csv_data = """info,info,auth,req,req
name,desc,username,key1,key2
a,alphabet,admin,1,team"""
csv_stream = StringIO(csv_data)
df = pd.read_csv(csv_stream, header=[0, 1])
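df.columns = pd.MultiIndex.from_tuples(df.columns)  # header=[0, 1] already yields a MultiIndex; this line is redundant but harmless
formatted_dict = {}
# Only the first data row ([0]) is converted; each column key is an (outer, inner) tuple
for (outer_key, inner_key), value in df.to_dict(orient='records')[0].items():
    formatted_dict.setdefault(outer_key, {})[inner_key] = value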
print(formatted_dict)

Output:

{'info': {'name': 'a', 'desc': 'alphabet'}, 'auth': {'username': 'admin'}, 'req': {'key1': 1, 'key2': 'team'}}

I don't think pandas can parse a CSV like that, but you can always use the built-in csv module and parse the data yourself, for example:

import csv
import collections

with open("tempcsv.csv", "rb") as f:  # on Python 3.x use: open("tempcsv.csv", "r", newline="")
    reader = csv.reader(f)  # create a CSV reader
    header = next(reader)  # collect the primary header
    subheader = next(reader)  # collect the subheader
    rows = []
    for row in reader:  # iterate over the rest of the CSV file
        parsed_row = collections.defaultdict(dict)  # use a dictionary factory
        for i, v in enumerate(header):  # iterate over the primary header fields
            # update each in the factory using the primary->secondary header map
            parsed_row[v].update({subheader[i]: row[i]})
        rows.append(parsed_row)

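This creates a list, rows, that contains every data row with the two header levels merged. Each entry is a collections.defaultdict, shown here as a plain dict. For your data, a row looks like this: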

{'info': {'name': 'a', 'desc': 'alphabet'},
'auth': {'username': 'admin'},
'req': {'key1': '1', 'key2': 'team'}}
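With the csv module every value comes back as a string (note '1' above, versus 1 from pandas). The following Python 3 sketch is self-contained and makes two assumptions. First, the function name parse_multiheader_csv is hypothetical. Second, it converts plain non-negative integer strings to int to mimic pandas' type inference. It reads from any file-like object, here an in-memory string, and uses zip, so it silently truncates rows that are shorter than the headers.

```python
import csv
import io

def parse_multiheader_csv(f):
    """Parse a CSV with two header rows into a list of nested dicts, one per data row."""
    reader = csv.reader(f)
    header = next(reader)     # top-level keys, e.g. info/auth/req
    subheader = next(reader)  # second-level keys, e.g. name/desc/username
    rows = []
    for row in reader:
        nested = {}
        for group, field, value in zip(header, subheader, row):
            # Assumption: turn plain digit strings into int; everything else stays a string
            nested.setdefault(group, {})[field] = int(value) if value.isdigit() else value
        rows.append(nested)
    return rows

csv_text = "info,info,auth,req,req\nname,desc,username,key1,key2\na,alphabet,admin,1,team\n"
parsed = parse_multiheader_csv(io.StringIO(csv_text))
print(parsed)
```

For the real file, the same function could be called as parse_multiheader_csv(f) inside with open("tempcsv.csv", newline="") as f:.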
