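I have a pandas DataFrame with many columns. I want to create a new DataFrame with only two columns: the first should contain each distinct value that appears in one particular column of the original DataFrame, and the second should hold all the remaining data from the rows that have that value.
For example, my input DataFrame looks like this: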
            Name               Menu      City
0    Foo Burgers  Burgers and Fries  New York
1       Cheesy's      Cheeseburgers  New York
2  Buggy Burgers     Insect Burgers    London
3        Fry Guy              Fries    London
4      Beermania               Beer    Munich
In code:
import pandas as pd

df = pd.DataFrame([["Foo Burgers", "Burgers and Fries", "New York"],
                   ["Cheesy's", "Cheeseburgers", "New York"],
                   ["Buggy Burgers", "Insect Burgers", "London"],
                   ["Fry Guy", "Fries", "London"],
                   ["Beermania", "Beer", "Munich"]],
                  columns=["Name", "Menu", "City"])
How can I conveniently transform this DataFrame into the following target structure?
       City  Restaurants
0  New York  [{"Name": "Foo Burgers", "Menu": "Burgers and Fries"}, {"Name": "Cheesy's", "Menu": "Cheeseburgers"}]
1    London  [{"Name": "Buggy Burgers", "Menu": "Insect Burgers"}, {"Name": "Fry Guy", "Menu": "Fries"}]
2    Munich  [{"Name": "Beermania", "Menu": "Beer"}]
In code:
goal_df = pd.DataFrame(
    [["New York", [{"Name": "Foo Burgers", "Menu": "Burgers and Fries"},
                   {"Name": "Cheesy's", "Menu": "Cheeseburgers"}]],
     ["London", [{"Name": "Buggy Burgers", "Menu": "Insect Burgers"},
                 {"Name": "Fry Guy", "Menu": "Fries"}]],
     ["Munich", [{"Name": "Beermania", "Menu": "Beer"}]]],
    columns=["City", "Restaurants"])
You can use to_dict(orient='records') inside a groupby().apply:
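# Drop the key column, group the remaining columns by City,
# and turn each group into a list of {column: value} dicts
(df.drop('City', axis=1)
   .groupby(df['City'])
   .apply(lambda x: x.to_dict(orient='records'))
   .reset_index(name='Restaurants')
)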
Output:
City Restaurants
0 London [{'Name': 'Buggy Burgers', 'Menu': 'Insect Bur...
1 Munich [{'Name': 'Beermania', 'Menu': 'Beer'}]
2 New York [{'Name': 'Foo Burgers', 'Menu': 'Burgers and ...
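By default groupby sorts the group keys, which is why London comes first in the output above while goal_df starts with New York. Below is a minimal sketch of a variant, assuming you want the cities in order of first appearance: it builds one dict per row up front, collects them per city with agg(list) instead of apply, and passes sort=False to keep the original city order. The intermediate name records is just an illustrative variable.

```python
import pandas as pd

df = pd.DataFrame(
    [["Foo Burgers", "Burgers and Fries", "New York"],
     ["Cheesy's", "Cheeseburgers", "New York"],
     ["Buggy Burgers", "Insect Burgers", "London"],
     ["Fry Guy", "Fries", "London"],
     ["Beermania", "Beer", "Munich"]],
    columns=["Name", "Menu", "City"],
)

# One {column: value} dict per row, built from every column except the key
records = df.drop(columns="City").to_dict(orient="records")

result = (
    df.assign(Restaurants=records)                 # object column: one dict per row
      .groupby("City", sort=False)["Restaurants"]  # keep first-appearance order
      .agg(list)                                   # gather each city's dicts into a list
      .reset_index()                               # City back to a regular column
)

print(result["City"].tolist())  # ['New York', 'London', 'Munich']
```

Leaving out sort=False gives the alphabetical City order shown in the output above; rows within each city keep their original order either way.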