How to flatten a nested list of dictionaries into multiple rows



I have a column in a pandas DataFrame that looks like this:

col1         list_of_dictionaries
1           [{'id': 1,'tid': 1,'measure': 'time','i_id': 0,'type': 'time','time': '2000-06-19T05:08:11Z'},{'id': 2,'tid': 2,'measure': 'time','i_id': 1,'type': 'time','time': '2000-06-19T05:08:11Z'},{'id': 3,'tid': 3,'measure': 'time','i_id': 2,'type': 'time','time': '2000-06-19T05:08:11Z'},{'id': 4,'tid': 4,'measure': 'time','i_id': 1,'type': 'time','time': '2000-06-19T05:08:11Z','status': {'calendar': 0, 'business': 0}}]

How can I flatten the list of dictionaries within the same DataFrame so that it looks like this?

col1    id   tid   measure i_id  type    time                 status.calendar     status.business                
1       1    1      time    0     time   2000-06-19T05:08:11Z    0                         0  
1       2    2      time    1     time   2000-06-19T05:08:11Z    0                         0
1       3    3      time    2     time   2000-06-19T05:08:11Z    0                         0
1       4    4      time    1     time   2000-06-19T05:08:11Z    0                         0

I want to keep the original data and expand it in place, creating an additional row each time the column names repeat.

I tried calling json_normalize on the column, but got an error:

AttributeError: 'str' object has no attribute 'values'
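
One possible cause of this error, offered as an assumption rather than a confirmed diagnosis, is that the cells hold the string representation of a list instead of an actual list of dicts. The sketch below checks the cell types and parses string cells with ast.literal_eval before normalizing. The sample data and the helper name parse_cell are hypothetical and only for illustration.

```python
import ast

import pandas as pd

# Hypothetical sample: the list of dicts is stored as a *string* (an assumption about the cause)
df = pd.DataFrame({
    "col1": [1],
    "list_of_dictionaries": [
        "[{'id': 1, 'type': 'time'}, "
        "{'id': 2, 'type': 'time', 'status': {'calendar': 0, 'business': 0}}]"
    ],
})


def parse_cell(cell):
    """Turn a string cell into a Python object; leave other cells untouched."""
    return ast.literal_eval(cell) if isinstance(cell, str) else cell


# Inspect what the column actually contains before normalizing
print(df["list_of_dictionaries"].map(type).unique())

df["list_of_dictionaries"] = df["list_of_dictionaries"].map(parse_cell)
records = df.loc[0, "list_of_dictionaries"]

# With real lists of dicts, json_normalize can expand the nested 'status' dict
flat = pd.json_normalize(records)
print(flat)
```

If the printed types already show list, the strings are not the problem and the cause lies elsewhere.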

Edit:

According to Spyder, x is a tuple:
[{'id':

You can un-nest in pure Python and then use json_normalize:

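import pandas as pd
# Pair each col1 value with every dict in that row's list ('lod' is the list-of-dicts column)
ids, x = zip(*[(id_, value) for id_, sub in zip(df['col1'], df.lod.values.tolist())
                            for value in sub])
ndf = pd.json_normalize(list(x))  # pd.io.json.json_normalize is deprecated since pandas 1.0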
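
As a minimal end-to-end sketch of this un-nesting approach, the example below builds a small hypothetical frame, flattens it, and then attaches the collected ids as a col1 column. The last step is an assumption about what the asker wants, since the snippet above collects ids but does not use them. The column name lod and the sample values are illustrative only.

```python
import pandas as pd

# Hypothetical sample data; 'lod' holds one list of dicts per row
df = pd.DataFrame({
    "col1": [1, 2],
    "lod": [
        [{"id": 1, "type": "time"},
         {"id": 2, "type": "time", "status": {"calendar": 0, "business": 0}}],
        [{"id": 3, "type": "time"}],
    ],
})

# One (col1, dict) pair per dict, so each dict remembers its parent row
pairs = [(id_, value) for id_, sub in zip(df["col1"], df["lod"].tolist())
         for value in sub]
ids, x = zip(*pairs)

# Expand the dicts (nested keys become 'status.calendar', 'status.business')
ndf = pd.json_normalize(list(x))

# Put the parent identifier back as the first column
ndf.insert(0, "col1", list(ids))
print(ndf)
```

Rows whose dicts have no status key get NaN in the status columns; whether to fill those with 0 depends on the data.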

Here is one way to do it:

import pandas as pd

df = pd.DataFrame([{"tt":[{'id': 1,'tid': 1,'measure': 'time','i_id': 0,'type': 'time','time': '2000-06-19T05:08:11Z'},{'id': 2,'tid': 2,'measure': 'time','i_id': 1,'type': 'time','time': '2000-06-19T05:08:11Z'},{'id': 3,'tid': 3,'measure': 'time','i_id': 2,'type': 'time','time': '2000-06-19T05:08:11Z'},{'id': 4,'tid': 4,'measure': 'time','i_id': 1,'type': 'time','time': '2000-06-19T05:08:11Z','status': {'calendar': 0, 'business': 0}}], "col1":0}, {"tt":[{'id': 5,'tid': 1,'measure': 'time','i_id': 0,'type': 'time','time': '2000-06-19T05:08:11Z'},{'id': 6,'tid': 2,'measure': 'time','i_id': 1,'type': 'time','time': '2000-06-19T05:08:11Z'},{'id': 7,'tid': 3,'measure': 'time','i_id': 2,'type': 'time','time': '2000-06-19T05:08:11Z'},{'id': 8,'tid': 4,'measure': 'time','i_id': 1,'type': 'time','time': '2000-06-19T05:08:11Z','status': {'calendar': 0, 'business': 0}}], "col1":1}])
res = df["tt"].values
# Add the parent row's col1 value to each dict (this mutates the dicts in place)
for i, elem in enumerate(res):
    for dic in elem:
        dic["col1"] = df["col1"].iloc[i]
# Collect all dicts in one list first; appending to a DataFrame row by row is slow
store = []
for x in res:
    store.extend(x)
# Now use json_normalize to expand the nested keys and create the DataFrame
df2 = pd.json_normalize(store)
# Optional cleanup: fill the missing status values with 0 and cast them to int
df2.fillna(0, inplace=True)
for col in ["status.business", "status.calendar"]:
    df2[col] = df2[col].astype(int)
print(df2)

Output (column order may differ depending on the pandas version):

   col1  i_id  id measure  status.business  status.calendar  tid                  time  type
0     0     0   1    time                0                0    1  2000-06-19T05:08:11Z  time
1     0     1   2    time                0                0    2  2000-06-19T05:08:11Z  time
2     0     2   3    time                0                0    3  2000-06-19T05:08:11Z  time
3     0     1   4    time                0                0    4  2000-06-19T05:08:11Z  time
4     1     0   5    time                0                0    1  2000-06-19T05:08:11Z  time
5     1     1   6    time                0                0    2  2000-06-19T05:08:11Z  time
6     1     2   7    time                0                0    3  2000-06-19T05:08:11Z  time
7     1     1   8    time                0                0    4  2000-06-19T05:08:11Z  time
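
As an alternative to the loops above, and not part of either answer, the same result can be sketched with DataFrame.explode (available since pandas 0.25) followed by json_normalize. The sample frame below is hypothetical and shortened for readability.

```python
import pandas as pd

# Hypothetical sample data shaped like the question's column
df = pd.DataFrame({
    "col1": [1, 2],
    "list_of_dictionaries": [
        [{"id": 1, "type": "time"},
         {"id": 2, "type": "time", "status": {"calendar": 0, "business": 0}}],
        [{"id": 3, "type": "time"}],
    ],
})

# explode: one row per dict, with col1 repeated for each element of the list
exploded = df.explode("list_of_dictionaries").reset_index(drop=True)

# Flatten each dict into columns; nested keys become dotted column names
flat = pd.json_normalize(exploded["list_of_dictionaries"].tolist())

# Both frames share the same 0..n-1 index, so they can be joined side by side
result = pd.concat([exploded[["col1"]], flat], axis=1)
print(result)
```

This keeps col1 aligned with each flattened dict without mutating the original dicts. It assumes every cell is a non-empty list of dicts; empty lists or missing values would need handling before json_normalize.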
