我有一个"字典中的字典;看起来是这样的:
lis = [{'Health and Welfare Plan + Change Notification': {'evidence_capture': 'null',
'test_result_justification': 'null',
'latest_test_result_date': 'null',
'last_updated_by': 'null',
'test_execution_status': 'Not Started',
'test_result': 'null'}},
{'Health and Welfare Plan + Computations': {'evidence_capture': 'null',
'test_result_justification': 'null',
'latest_test_result_date': 'null',
'last_updated_by': 'null',
'test_execution_status': 'Not Started',
'test_result': 'null'}},
{'Health and Welfare Plan + Data Agreements': {'evidence_capture': 'null',
'test_result_justification': 'Due to the Policy',
'latest_test_result_date': '2019-10-02',
'last_updated_by': 'null',
'test_execution_status': 'In Progress',
'test_result': 'null'}},
{'Health and Welfare Plan + Data Elements': {'evidence_capture': 'null',
'test_result_justification': 'xxx',
'latest_test_result_date': '2019-10-02',
'last_updated_by': 'null',
'test_execution_status': 'In Progress',
'test_result': 'null'}},
{'Health and Welfare Plan + Data Quality Monitoring': {'evidence_capture': 'null',
'test_result_justification': 'xxx',
'latest_test_result_date': '2019-08-09',
'last_updated_by': 'null',
'test_execution_status': 'Completed',
'test_result': 'xxx'}},
{'Health and Welfare Plan + HPU Source Reliability': {'evidence_capture': 'null',
'test_result_justification': 'xxx.',
'latest_test_result_date': '2019-10-02',
'last_updated_by': 'null',
'test_execution_status': 'In Progress',
'test_result': 'null'}},
{'Health and Welfare Plan + Lineage': {'evidence_capture': 'null',
'test_result_justification': 'null',
'latest_test_result_date': 'null',
'last_updated_by': 'null',
'test_execution_status': 'Not Started',
'test_result': 'null'}},
{'Health and Welfare Plan + Metadata': {'evidence_capture': 'null',
'test_result_justification': 'Valid',
'latest_test_result_date': '2020-07-02',
'last_updated_by': 'null',
'test_execution_status': 'Completed',
'test_result': 'xxx'}},
{'Health and Welfare Plan + Usage Reconciliation': {'evidence_capture': 'null',
'test_result_justification': 'Test out of scope',
'latest_test_result_date': '2019-10-02',
'last_updated_by': 'null',
'test_execution_status': 'In Progress',
'test_result': 'null'}}]
我想将列表转换为如下的数据帧:
evidence_capture last_updated_by latest_test_result_date test_execution_status test_result test_result_justification test_category
Change Notification null null null Not Started null null Health and Welfare Plan
Computations null null null Not Started null null Health and Welfare Plan
Data Agreements null null 2019-10-02 In Progress null Due to the Policy Health and Welfare Plan
Data Elements null null 2019-10-02 In Progress null xxx Health and Welfare Plan
Data Quality Monitoring null null 2019-08-09 Completed xxx xxx Health and Welfare Plan
HPU Source Reliability null null 2019-10-02 In Progress null xxx. Health and Welfare Plan
Lineage null null null Not Started null null Health and Welfare Plan
Metadata null null 2020-07-02 Completed xxx Valid Health and Welfare Plan
Usage Reconciliation null null 2019-10-02 In Progress null Test out of scope Health and Welfare Plan
我构建数据帧的代码是使用for循环逐列连接记录。之后对列名进行处理,然后对其进行转置;健康和福利计划";从每个行索引中删除,但作为新列追加。
df3 = pd.DataFrame(lis[0])
for i in range(1, len(lis)):
df3 = pd.concat([df3, pd.DataFrame(lis[i])], axis=1)
df3.columns = [col.split(' + ')[1] for col in df3.columns]
df3 = df3.T
df3['test_category'] = 'Health and Welfare Plan'
print(df3)
该代码能够产生最终输出;昂贵的";for循环和数据帧concat的函数。所以我想知道是否有更好的方法来输出相同的结果?
让我们进行dictcomp来压平字典列表
pd.DataFrame({k.split(' + ')[1]: v for d in lis for k, v in d.items()}).T
evidence_capture test_result_justification latest_test_result_date last_updated_by test_execution_status test_result
Change Notification null null null null Not Started null
Computations null null null null Not Started null
Data Agreements null Due to the Policy 2019-10-02 null In Progress null
Data Elements null xxx 2019-10-02 null In Progress null
Data Quality Monitoring null xxx 2019-08-09 null Completed xxx
HPU Source Reliability null xxx. 2019-10-02 null In Progress null
Lineage null null null null Not Started null
Metadata null Valid 2020-07-02 null Completed xxx
Usage Reconciliation null Test out of scope 2019-10-02 null In Progress null
您可以根据字典值构建数据帧,并通过'+'进行拆分来手动设置索引:
df = pd.DataFrame([list(e.values())[0] for e in lis])
df.index = map(lambda x: x.split('+')[-1].strip(), [list(e.keys())[0] for e in lis])
print(df)
输出:
evidence_capture test_result_justification ... test_execution_status test_result
Change Notification null null ... Not Started null
Computations null null ... Not Started null
Data Agreements null Due to the Policy ... In Progress null
Data Elements null xxx ... In Progress null
Data Quality Monitoring null xxx ... Completed xxx
HPU Source Reliability null xxx. ... In Progress null
Lineage null null ... Not Started null
Metadata null Valid ... Completed xxx
Usage Reconciliation null Test out of scope ... In Progress null
[9 rows x 6 columns]
编辑:在一个循环中完成所有操作可能会更快:
data = []
ind = []
for e in lis:
data.append(list(e.values())[0])
ind.append(list(e.keys())[0].split('+')[1].strip())
df = pd.DataFrame(data, index=ind)
您仍然可以使用df['test_category'] = 'Health and Welfare Plan'
在末尾添加新列
我修改了您的部分代码以获得相同的结果。使用字典操作应该更快
import pandas as pd
dic = {}
for e in lis:
dic[list(e.keys())[0]] = list(e.values())[0]
df3 = pd.DataFrame(dic)
df3.columns = [col.split('+')[-1].strip() for col in df3.columns]
df3 = df3.T
df3['test_category'] = 'Health and Welfare Plan'
print(df3)