Group a DataFrame by the values of a specific column

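I have a pandas DataFrame with many columns. I want to create a new DataFrame with only two columns. The first column should contain every distinct value that appears in a specific column of the original DataFrame. The second column should contain all the other data from the original DataFrame whose rows match that value.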

For example, my input DataFrame looks like this:

    Name            Menu                City
0   Foo Burgers     Burgers and Fries   New York
1   Cheesy's        Cheeseburgers       New York
2   Buggy Burgers   Insect Burgers      London
3   Fry Guy         Fries               London
4   Beermania       Beer                Munich

In code:

import pandas as pd

df = pd.DataFrame([["Foo Burgers", "Burgers and Fries", "New York"],
                   ["Cheesy's", "Cheeseburgers", "New York"],
                   ["Buggy Burgers", "Insect Burgers", "London"],
                   ["Fry Guy", "Fries", "London"],
                   ["Beermania", "Beer", "Munich"]],
                  columns=["Name", "Menu", "City"])

How can I conveniently convert this DataFrame into the following target structure?

    City        Restaurants
0   New York    [{"Name": "Foo Burgers", "Menu": "Burgers and Fries"}, {"Name": "Cheesy's", "Menu": "Cheeseburgers"}]
1   London      [{"Name": "Buggy Burgers", "Menu": "Insect Burgers"}, {"Name": "Fry Guy", "Menu": "Fries"}]
2   Munich      [{"Name": "Beermania", "Menu": "Beer"}]

In code:

goal_df = pd.DataFrame([["New York", [{"Name": "Foo Burgers", "Menu": "Burgers and Fries"},
                                      {"Name": "Cheesy's", "Menu": "Cheeseburgers"}]],
                        ["London", [{"Name": "Buggy Burgers", "Menu": "Insect Burgers"},
                                    {"Name": "Fry Guy", "Menu": "Fries"}]],
                        ["Munich", [{"Name": "Beermania", "Menu": "Beer"}]]],
                       columns=["City", "Restaurants"])

You can combine groupby().apply with to_dict:

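(df.drop('City', axis=1)                          # keep only the columns to nest (Name, Menu)
   .groupby(df['City'])                           # group those rows by the original City values
   .apply(lambda x: x.to_dict(orient='records'))  # one list of {column: value} dicts per city
   .reset_index(name='Restaurants')               # turn the resulting Series into two columns
)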

Output:

       City                                        Restaurants
0    London  [{'Name': 'Buggy Burgers', 'Menu': 'Insect Bur...
1    Munich            [{'Name': 'Beermania', 'Menu': 'Beer'}]
2  New York  [{'Name': 'Foo Burgers', 'Menu': 'Burgers and ...
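If you would rather use groupby().agg than apply, one option is to turn each row's remaining columns into a dict first and then collect those dicts per group with agg(list). This is a minimal sketch, not the answerer's code, assuming a reasonably recent pandas (1.x or 2.x). Two choices here are additions of mine: sort=False keeps the cities in first-appearance order (New York, London, Munich), so the rows line up with goal_df instead of the alphabetical order in the output above, and the display.max_colwidth option only affects printing, so the lists are shown without the "..." truncation.

```python
import pandas as pd

# Input frame from the question
df = pd.DataFrame([["Foo Burgers", "Burgers and Fries", "New York"],
                   ["Cheesy's", "Cheeseburgers", "New York"],
                   ["Buggy Burgers", "Insect Burgers", "London"],
                   ["Fry Guy", "Fries", "London"],
                   ["Beermania", "Beer", "Munich"]],
                  columns=["Name", "Menu", "City"])

# One {"Name": ..., "Menu": ...} dict per row, aligned to df's index
records = pd.Series(df.drop(columns="City").to_dict(orient="records"),
                    index=df.index)

# Collect the dicts per city; sort=False keeps first-appearance order
# instead of sorting the group keys alphabetically
result = (records.groupby(df["City"], sort=False)
                 .agg(list)
                 .reset_index(name="Restaurants"))

pd.set_option("display.max_colwidth", None)  # print full lists, no "..."
print(result)
```

Building the dicts before grouping means the aggregation step only has to gather values into a list, which is why a plain agg(list) is enough here.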
