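I have a dataframe and need to work out which books each person owns, so that we can send them notifications. I'm having trouble merging the data into the format I need.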
Existing dataframe:
Book                   Owner
The Alchemist          marry
To Kill a Mockingbird  john
Lord of the Flies      abel
Catcher in the Ry      marry
Alabama                julia;marry
Invisible Man          john
You need split, explode and groupby.agg:
(df.assign(Owner=lambda d: d['Owner'].str.split(';'))
.explode('Owner')
.groupby('Owner', as_index=False, sort=False).agg(', '.join)
)
NB. If you want plural column headers, add .add_suffix('s') or .rename(columns={'Book': 'Books', 'Owner': 'Owners'}).
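A self-contained sketch of the chain above with .add_suffix('s') appended, rebuilding the sample frame from the question (the variable name out is illustrative):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Book': ['The Alchemist', 'To Kill a Mockingbird', 'Lord of the Flies',
             'Catcher in the Ry', 'Alabama', 'Invisible Man'],
    'Owner': ['marry', 'john', 'abel', 'marry', 'julia;marry', 'john'],
})

out = (df.assign(Owner=lambda d: d['Owner'].str.split(';'))  # 'julia;marry' -> ['julia', 'marry']
         .explode('Owner')                                   # one row per (book, owner) pair
         .groupby('Owner', as_index=False, sort=False).agg(', '.join)
         .add_suffix('s'))                                   # 'Owner' -> 'Owners', 'Book' -> 'Books'

print(out.columns.tolist())  # ['Owners', 'Books']
```

add_suffix renames every column, so use the explicit rename mapping instead if only some headers should change.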
Output:
Owner Book
0 marry The Alchemist, Catcher in the Ry, Alabama
1 john To Kill a Mockingbird, Invisible Man
2 abel Lord of the Flies
3 julia Alabama
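Since the goal is to notify each owner, a list per owner may be handier than one joined string. The sketch below assumes that; it swaps ', '.join for list and converts the result to a dict (the print is only a stand-in for whatever notification step you actually use):

```python
import pandas as pd

df = pd.DataFrame({
    'Book': ['The Alchemist', 'To Kill a Mockingbird', 'Lord of the Flies',
             'Catcher in the Ry', 'Alabama', 'Invisible Man'],
    'Owner': ['marry', 'john', 'abel', 'marry', 'julia;marry', 'john'],
})

# One row per (book, owner) pair
pairs = df.assign(Owner=df['Owner'].str.split(';')).explode('Owner')

# Keep each owner's books as a Python list; groupby sorts owners alphabetically
books_by_owner = pairs.groupby('Owner')['Book'].agg(list).to_dict()

for owner, books in books_by_owner.items():
    # Placeholder for the real notification step
    print(f"{owner}: {books}")
```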
Let's try something different:
s = df['Owner'].str.get_dummies(';')
(s.T @ df['Book'].add(', ')).str.rstrip(', ')
Result:
abel Lord of the Flies
john To Kill a Mockingbird, Invisible Man
julia Alabama
marry The Alchemist, Catcher in the Ry, Alabama
dtype: object
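To see why the matrix product works, here is a sketch that inspects the intermediate indicator frame. str.get_dummies(';') gives one 0/1 column per owner. Inside the product, 1 * 'title, ' keeps the string and 0 * 'title, ' yields '', so summing along each owner's row concatenates that owner's books:

```python
import pandas as pd

df = pd.DataFrame({
    'Book': ['The Alchemist', 'To Kill a Mockingbird', 'Lord of the Flies',
             'Catcher in the Ry', 'Alabama', 'Invisible Man'],
    'Owner': ['marry', 'john', 'abel', 'marry', 'julia;marry', 'john'],
})

# One indicator column per name, sorted alphabetically
s = df['Owner'].str.get_dummies(';')
print(s.columns.tolist())  # ['abel', 'john', 'julia', 'marry']
print(s.loc[4].tolist())   # row 4 ('Alabama', 'julia;marry'): [0, 0, 1, 1]

# owners x rows  @  rows -> one concatenated string per owner
res = (s.T @ df['Book'].add(', ')).str.rstrip(', ')
print(res['marry'])  # The Alchemist, Catcher in the Ry, Alabama
```

Note that rstrip(', ') strips any trailing commas or spaces, so a title that itself ended in one of those characters would be trimmed too.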
This isn't the fastest approach, but it is easy to follow.
import pandas as pd

# Set up the example dataframe
data = {'Book': ['The Alchemist', 'To Kill a Mockingbird', 'Lord of the Flies', 'Catcher in the Ry', 'Alabama', 'Invisible Man'],
        'Owner': ['marry', 'john', 'abel', 'marry', 'julia;marry', 'john']}
df = pd.DataFrame(data)

# Work on a copy so the original df is left untouched
df2 = df.copy()

# Turn each string of names into a list of names
df2['Owner'] = df2['Owner'].apply(lambda x: x.split(";"))

# Get the unique owners (sorted, because a set's iteration order is not guaranteed)
unique_owners = sorted({single_owner for owners_list in df2['Owner'] for single_owner in owners_list})
# -> ['abel', 'john', 'julia', 'marry']

# For a single owner, keep only the rows whose owner list contains that name
df2[['marry' in row for row in df2['Owner']]]

# Select only the books, not the names
df2[['marry' in row for row in df2['Owner']]]['Book']

# Convert the books to a list. Alternative: ", ".join(df2[['marry' in row for row in df2['Owner']]]['Book']) turns all the books into a single string.
df2[['marry' in row for row in df2['Owner']]]['Book'].to_list()

# Repeat that for every owner with a plain loop
names = []
books = []
for single_owner in unique_owners:
    names.append(single_owner)
    books.append(df2[[single_owner in row for row in df2['Owner']]]['Book'].to_list())

new_df2 = pd.DataFrame({'Owner': names, 'Books': books})
new_df2
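The loop above rescans every row once per owner. If that becomes a concern, a single pass with collections.defaultdict builds the same owner-to-books mapping. This is a swapped-in plain-Python technique, not the approach from the answer above; owners come out in order of first appearance rather than sorted:

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({
    'Book': ['The Alchemist', 'To Kill a Mockingbird', 'Lord of the Flies',
             'Catcher in the Ry', 'Alabama', 'Invisible Man'],
    'Owner': ['marry', 'john', 'abel', 'marry', 'julia;marry', 'john'],
})

# Walk the rows once, appending each book to every owner named in its row
books_by_owner = defaultdict(list)
for book, owners in zip(df['Book'], df['Owner']):
    for owner in owners.split(';'):
        books_by_owner[owner].append(book)

new_df2 = pd.DataFrame({'Owner': list(books_by_owner),
                        'Books': list(books_by_owner.values())})
print(new_df2)
```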