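DataFrame from a list of nested dicts, where a value in the outer dict gives the column name and the nested dict's keys and values give the index and the values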



I have a data structure that looks like this:


my_structure = [{'description': 'description',
'network_element': 'network-elem1',
'data_json': {'2018-01-31 00:00:00': 10860,
'2018-02-28 00:00:00': 11530,
'2018-03-31 00:00:00': 11530,
'2018-04-30 00:00:00': 8100,
'2018-05-31 00:00:00': 5060,
'2018-06-30 00:00:00': 4470,
'2018-07-31 00:00:00': 4390,
'2018-08-31 00:00:00': 6620,
'2018-09-30 00:00:00': 3070,
'2018-10-31 00:00:00': 18670,
'2018-11-30 00:00:00': 19880,
'2018-12-31 00:00:00': 4700}},
{'description': 'description',
'network_element': 'network-elem-2',
'data_json': {'2015-01-01 00:00:00': 92, '2016-01-01 00:00:00': 109}},
{'description': 'description',
'network_element': 'network-elem3',
'data_json': {'2018-01-31 00:00:00': 0,
'2018-02-28 00:00:00': 0,
'2018-03-31 00:00:00': 0,
'2018-04-30 00:00:00': 0,
'2018-05-31 00:00:00': 0,
'2018-06-30 00:00:00': 0,
'2018-07-31 00:00:00': 0,
'2018-08-31 00:00:00': 1000,
'2018-09-30 00:00:00': 0,
'2018-10-31 00:00:00': 0,
'2018-11-30 00:00:00': 7230,
'2018-12-31 00:00:00': 28630}},
{'description': 'description',
'network_element': 'network-elem...',
'data_json': {'2015-01-01 00:00:00': 264, '2016-01-01 00:00:00': 37}},
{'description': 'description',
'network_element': 'network-elem5',
'data_json': {'2018-01-31 00:00:00': 69220,
'2018-02-28 00:00:00': 80120,
'2018-03-31 00:00:00': 80800,
'2018-04-30 00:00:00': 60560,
'2018-05-31 00:00:00': 35250,
'2018-06-30 00:00:00': 0,
'2018-07-31 00:00:00': 290,
'2018-08-31 00:00:00': 0,
'2018-09-30 00:00:00': 540,
'2018-10-31 00:00:00': 69350,
'2018-11-30 00:00:00': 59410,
'2018-12-31 00:00:00': 70670}},
{'description': 'descr',
'network_element': 'network-elem',
'data_json': {'2015-01-01 00:00:00': 498, '2016-01-01 00:00:00': 526}},
.....

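So basically it is a list of dicts, each of which contains another dict.

I want to create a DataFrame from it in which the values of network_element are the columns. The keys of the nested dict should be my index, and the nested dict's values should be my values.

At the moment I use two list comprehensions, then build the df and transpose it: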

import pandas as pd

# One pass for the column names, a second pass for the nested dicts
columns = [elem["network_element"] for elem in my_structure]
df_data = [elem["data_json"] for elem in my_structure]
result = pd.DataFrame(df_data, index=columns).T.sort_index()

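But I don't think this is a very good solution, because I split the data into two lists. I am looking for a pandas solution that does this in a single loop.

Doing it with loc like this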

df = pd.DataFrame()
for elem in my_structure:
    df.loc[elem["data_json"].keys(), elem["network_element"]] = elem["data_json"].values()

gives me a KeyError:

KeyError: "None of [Index .... ] are in the [index]"
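For comparison, here is a minimal sketch of a single-loop variant that does not go through loc at all. It assumes the error comes from asking loc for row labels that do not yet exist in the empty frame's index. The `sample` list and its values are made up for illustration; it only mirrors the shape of my_structure.

```python
import pandas as pd

# Hypothetical, shortened stand-in for my_structure (values are invented).
sample = [
    {"description": "description",
     "network_element": "elem-a",
     "data_json": {"2018-02-28 00:00:00": 20, "2018-01-31 00:00:00": 10}},
    {"description": "description",
     "network_element": "elem-b",
     "data_json": {"2015-01-01 00:00:00": 92, "2016-01-01 00:00:00": 109}},
]

# Single pass: collect one Series per network element, keyed by column name.
series_by_column = {}
for elem in sample:
    series_by_column[elem["network_element"]] = pd.Series(elem["data_json"])

# A frame built from a dict of Series aligns on the union of their labels;
# dates that an element does not have become NaN.
df = pd.DataFrame(series_by_column).sort_index()
print(df)
```

Because the columns are aligned on the union of all dates, elements with different date ranges can sit in the same frame, at the cost of NaN gaps (and float columns wherever a gap occurs).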

Is there a simple solution for this? Any help is appreciated :) Thanks in advance!

Output of the suggested pd.DataFrame.from_dict(....):

                     ne1     ne2    ne3    ne4    ne5    ne6    ne7    ne8
2015-01-01 00:00:00  92      264    498    1086   1022   116    713    40
2016-01-01 00:00:00  109     37     526    1177   1168   123    733    40
                     ne9     ne10    ne11    ne12   ne13    ne14    ne15
2015-01-01 00:00:00  123     61      21      159    14      37      756
2016-01-01 00:00:00  117     115     23      160    8       22      777
                     ne16
2015-01-01 00:00:00  132
2016-01-01 00:00:00  124

Would something like this work:

pd.DataFrame.from_dict({elem['network_element']: elem['data_json'] for elem in my_structure})

I can't test it, because the my_structure you posted isn't large enough.

Edit: if you want the data as rows, you can pass orient='index'.
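A small self-contained sketch of both orientations may help to see the difference. The `sample` list below is a made-up, shortened stand-in for my_structure. Converting the index with pd.to_datetime is an optional extra step that the question did not ask for.

```python
import pandas as pd

# Hypothetical, shortened stand-in for my_structure (values are invented).
sample = [
    {"description": "description",
     "network_element": "elem-a",
     "data_json": {"2018-01-31 00:00:00": 10, "2018-02-28 00:00:00": 20}},
    {"description": "description",
     "network_element": "elem-b",
     "data_json": {"2015-01-01 00:00:00": 92, "2016-01-01 00:00:00": 109}},
]

# One dict comprehension: outer key = column name, inner dict = date -> value.
nested = {elem["network_element"]: elem["data_json"] for elem in sample}

# Default orient="columns": network elements become columns, dates the index.
by_column = pd.DataFrame.from_dict(nested)
by_column.index = pd.to_datetime(by_column.index)  # optional: real timestamps
by_column = by_column.sort_index()

# orient="index": network elements become rows, dates the columns.
by_row = pd.DataFrame.from_dict(nested, orient="index")

print(by_column)
print(by_row)
```

One thing to keep in mind with this approach: if two entries share the same network_element, the dict comprehension keeps only the last one.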
