I have a pandas DataFrame in which one column holds a dictionary of dictionaries. This is my data:
| ca | cb |
|:---|:---:|
| 1 | cat1:{paws:1 , hair:2} ,dog1:{paws:3 , hair:5} |
| 2 | cat2:{paws:1 , hair:2} ,dog2:{paws:3 , hair:5} |
| 3 | cat3:{paws:1 , hair:2} ,dog3:{paws:3 , hair:5} |
| 4 | cat4:{paws:1 , hair:2} ,dog4:{paws:3 , hair:5} |
What I want is:
| ca | animal | paws | hair |
|:----:| -----:| -----:| -----:|
| 1 | cat1 | 1 | 2 |
| 1 | dog1 | 3 | 5 |
| 2 | cat2 | 1 | 2 |
| 2 | dog2 | 3 | 5 |
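What is the fastest way to do this?
I found a solution:
I reproduced a demo version of your DataFrame from the following dictionary, with this result: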
import pandas

data = {
    "ca": [1, 2],
    "cb": [
        {"cat1": {"paws": 1, "hair": 2}, "dog1": {"paws": 3, "hair": 5}},
        {"cat2": {"paws": 1, "hair": 2}, "dog2": {"paws": 3, "hair": 5}},
    ],
}
df = pandas.DataFrame(data)
df
   ca                                                 cb
0   1  {'cat1': {'paws': 1, 'hair': 2}, 'dog1': {'paw...
1   2  {'cat2': {'paws': 1, 'hair': 2}, 'dog2': {'paw...
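Next, I need to unpack the first level of the dictionary, that is, pull the cat and dog entries out into their own columns in one step.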
first_level = pandas.concat([df.drop(['cb'], axis=1), df['cb'].apply(pandas.Series)], axis=1)
first_level
ca cat1 dog1 cat2 dog2
0 1 {'paws': 1, 'hair': 2} {'paws': 3, 'hair': 5} NaN NaN
1 2 NaN NaN {'paws': 1, 'hair': 2} {'paws': 3, 'hair': 5}
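Since the question asks about speed, here is a minimal sketch of the same first-level expansion that avoids `apply(pandas.Series)`, which builds one Series per row and is usually slow on large frames. It assumes every value in `cb` is a plain dict; the names `expanded` and `first_level_alt` are illustrative and not part of the answer above.

```python
import pandas

df = pandas.DataFrame({
    "ca": [1, 2],
    "cb": [
        {"cat1": {"paws": 1, "hair": 2}, "dog1": {"paws": 3, "hair": 5}},
        {"cat2": {"paws": 1, "hair": 2}, "dog2": {"paws": 3, "hair": 5}},
    ],
})

# Build the wide frame from the list of outer dicts in one constructor call.
# Reusing df.index keeps the rows aligned with the original frame.
expanded = pandas.DataFrame(df["cb"].tolist(), index=df.index)

# Same shape as first_level: ca plus one column per animal key, NaN where absent.
first_level_alt = pandas.concat([df.drop(columns="cb"), expanded], axis=1)
print(first_level_alt)
```

The result should have the same columns as `first_level`, so the later steps can be applied to it unchanged.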
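The key point here is that you need to apply the melt function, which turns the column names into values and gives each one its own row. Assign the result back so the next step works on the melted frame:
first_level = first_level.melt(id_vars=["ca"]).dropna()
first_level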
ca variable value
0 1 cat1 {'paws': 1, 'hair': 2}
2 1 dog1 {'paws': 3, 'hair': 5}
5 2 cat2 {'paws': 1, 'hair': 2}
7 2 dog2 {'paws': 3, 'hair': 5}
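To make the melt step easier to follow in isolation, here is a small sketch on a toy frame with scalar values instead of dicts. The frame `wide` and its contents are made up for illustration; `var_name` is a standard `melt` parameter that names the resulting column directly (here `animal`, matching the desired output) instead of the default `variable`.

```python
import pandas

# Toy wide frame: each row only has a value under one of the animal columns.
wide = pandas.DataFrame({
    "ca": [1, 2],
    "cat1": ["a", None],
    "cat2": [None, "b"],
})

# melt stacks the non-id columns into (animal, value) pairs, one per row;
# dropna then removes the pairs that had no value in the wide frame.
long = wide.melt(id_vars=["ca"], var_name="animal").dropna()
print(long)
```

The surviving rows keep their index labels from the melted frame, which is why the index in the answer's output has gaps (0, 2, 5, 7).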
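The rest is straightforward: using the same apply function, I can expand this inner dictionary into columns as well, and the problem is solved: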
second_level = pandas.concat([first_level.drop(['value'], axis=1), first_level['value'].apply(pandas.Series)], axis=1)
second_level
ca variable paws hair
0 1 cat1 1 2
2 1 dog1 3 5
5 2 cat2 1 2
7 2 dog2 3 5
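As a possible alternative when speed matters, the whole reshaping can be sketched with a plain Python comprehension that flattens the nested dicts into one record per animal before pandas is involved. This is not the method used above; it assumes every `cb` value is a dict of dicts, and the names `rows` and `result` are illustrative.

```python
import pandas

df = pandas.DataFrame({
    "ca": [1, 2],
    "cb": [
        {"cat1": {"paws": 1, "hair": 2}, "dog1": {"paws": 3, "hair": 5}},
        {"cat2": {"paws": 1, "hair": 2}, "dog2": {"paws": 3, "hair": 5}},
    ],
})

# One record per (ca, animal) pair; the inner dict supplies the paws/hair fields.
rows = [
    {"ca": ca, "animal": animal, **attrs}
    for ca, cb in zip(df["ca"], df["cb"])
    for animal, attrs in cb.items()
]

# The column is already named "animal" and the index is a clean 0..n-1 range.
result = pandas.DataFrame(rows)
print(result)
```

This skips the intermediate wide frame entirely, so no NaN cells are created and dropped. Whether it is actually faster on your data is worth timing rather than assuming.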