Filtering and merging a DataFrame with Pandas in Python



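I have a dataframe and need to work out who owns which books so that we can send them notifications. I'm having trouble merging the data into the required format.

Existing dataframe

Book                   Owner
The Alchemist          marry
To Kill a Mockingbird  john
Lord of the Flies      abel
Catcher in the Ry      marry
Alabama                julia;marry
Invisible Man          john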

You need split, explode, and groupby.agg:

(df.assign(Owner=lambda d: d['Owner'].str.split(';'))
.explode('Owner')
.groupby('Owner', as_index=False, sort=False).agg(', '.join)
)

NB. If you need plural forms in the column headers, add .add_suffix('s') or .rename(columns={'Book': 'Books', 'Owner': 'Owners'})

Output:

   Owner                                       Book
0  marry  The Alchemist, Catcher in the Ry, Alabama
1   john       To Kill a Mockingbird, Invisible Man
2   abel                          Lord of the Flies
3  julia                                    Alabama
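To see what each step of the chain above contributes, here is a minimal self-contained sketch that runs the same split, explode, and groupby steps one at a time. The sample data is copied from the question; the intermediate variable names (split_owners, exploded, result) are only illustrative, and reset_index is used here instead of as_index=False.

```python
import pandas as pd

df = pd.DataFrame({
    'Book': ['The Alchemist', 'To Kill a Mockingbird', 'Lord of the Flies',
             'Catcher in the Ry', 'Alabama', 'Invisible Man'],
    'Owner': ['marry', 'john', 'abel', 'marry', 'julia;marry', 'john'],
})

# Step 1: 'julia;marry' becomes the list ['julia', 'marry']
split_owners = df.assign(Owner=df['Owner'].str.split(';'))

# Step 2: one row per (Book, Owner) pair, so 'Alabama' now appears twice
exploded = split_owners.explode('Owner')

# Step 3: collect each owner's books into a single comma-separated string,
# keeping owners in order of first appearance (sort=False)
result = (
    exploded.groupby('Owner', sort=False)['Book']
            .agg(', '.join)
            .reset_index()
)

print(exploded)
print(result)
```

The exploded frame is the useful intermediate: once every row holds exactly one owner, any ordinary groupby aggregation applies.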

Let's try something new:

s = df['Owner'].str.get_dummies(';')
(s.T @ df['Book'].add(', ')).str.rstrip(', ')

Result:

abel                             Lord of the Flies
john          To Kill a Mockingbird, Invisible Man
julia                                      Alabama
marry    The Alchemist, Catcher in the Ry, Alabama
dtype: object
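The matrix product above is compact but a little opaque. As a rough illustration of what it computes, the sketch below builds the same 0/1 indicator table with str.get_dummies and then joins the books for each owner column explicitly. The names indicator and books_by_owner are mine, and this explicit version is an assumed equivalent of the one-liner, not the answer's own code.

```python
import pandas as pd

df = pd.DataFrame({
    'Book': ['The Alchemist', 'To Kill a Mockingbird', 'Lord of the Flies',
             'Catcher in the Ry', 'Alabama', 'Invisible Man'],
    'Owner': ['marry', 'john', 'abel', 'marry', 'julia;marry', 'john'],
})

# One column per owner, one row per book; 1 means "this owner owns this book"
indicator = df['Owner'].str.get_dummies(sep=';')
print(indicator)

# For each owner column, keep the books whose indicator is set and join them
books_by_owner = pd.Series({
    owner: ', '.join(df.loc[indicator[owner].astype(bool), 'Book'])
    for owner in indicator.columns
})
print(books_by_owner)
```

In the one-liner, multiplying a book title by 1 keeps it and multiplying by 0 yields an empty string, so summing down each owner column concatenates exactly the titles selected here.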

This isn't the fastest approach, but it is an easy one to understand.

import pandas as pd

# Set up the example dataframe
data = {'Book': ['The Alchemist', 'To Kill a Mockingbird', 'Lord of the Flies', 'Catcher in the Ry', 'Alabama', 'Invisible Man'], 'Owner': ['marry', 'john', 'abel', 'marry', 'julia;marry', 'john']}
df = pd.DataFrame(data)
# Work on a copy, and turn each string of names into a list of names
df2 = df.copy()
df2['Owner'] = df2['Owner'].apply(lambda x: x.split(";"))
# Get the unique set of owners
unique_owners = {single_owner for owners_list in df2['Owner'] for single_owner in owners_list}
# Gives a set -> {'abel', 'john', 'julia', 'marry'}
# For a single owner, slice the dataframe down to the rows that list that owner
df2[['marry' in row for row in df2['Owner']]]
# Select only the books, not the names
df2[['marry' in row for row in df2['Owner']]]['Book']
# Convert the books to a list. Alternative: ", ".join(df2[['marry' in row for row in df2['Owner']]]['Book']) turns all the books into a single string.
df2[['marry' in row for row in df2['Owner']]]['Book'].to_list()
# Set up data storage
names = []
books = []
# Iterate through the unique owners set, collecting each owner's books
for single_owner in unique_owners:
    names.append(single_owner)
    books.append(df2[[single_owner in row for row in df2['Owner']]]['Book'].to_list())
new_df2 = pd.DataFrame({'Owner': names, 'Books': books})
new_df2
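Since the stated goal is to notify each owner, a per-owner list of books can be turned into message text with a short loop. The sketch below is only an illustration: build_notification and its message wording are hypothetical, and the grouping uses explode with agg(list) rather than the manual loop above.

```python
import pandas as pd

df = pd.DataFrame({
    'Book': ['The Alchemist', 'To Kill a Mockingbird', 'Lord of the Flies',
             'Catcher in the Ry', 'Alabama', 'Invisible Man'],
    'Owner': ['marry', 'john', 'abel', 'marry', 'julia;marry', 'john'],
})

# Series indexed by owner (sorted alphabetically), each value a list of books
books_per_owner = (
    df.assign(Owner=df['Owner'].str.split(';'))
      .explode('Owner')
      .groupby('Owner')['Book']
      .agg(list)
)

def build_notification(owner, books):
    """Hypothetical helper: format one notification message for an owner."""
    return f"Hello {owner}, you are listed as the owner of: {', '.join(books)}"

messages = [build_notification(owner, books)
            for owner, books in books_per_owner.items()]

for message in messages:
    print(message)
```

Keeping the books as a list rather than a joined string makes it easier to count them or format them differently per message.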
