Help needed with the startswith() function in a Pandas DataFrame



I have a Name column in my DataFrame that contains several names.

DataFrame:

import pandas as pd

df = pd.DataFrame({'Name': ['Brailey, Mr. William Theodore Ronald', 'Roger Marie Bricoux',
                            'Mr. Roderick Robert Crispin',
                            'Cunningham', 'Mr. Alfred Fleming']})
print(df)

Output:

                                   Name
0  Brailey, Mr. William Theodore Ronald
1                   Roger Marie Bricoux
2           Mr. Roderick Robert Crispin
3                            Cunningham
4                    Mr. Alfred Fleming

I wrote a row-classification function: if I pass it a row (a name), it should return the class.

mus = ['Brailey, Mr. William Theodore Ronald', 'Roger Marie Bricoux', 'John Frederick Preston Clarke']

def classify_role(row):
    # row is one DataFrame row (a Series); compare its Name value to the list
    if row.loc['Name'] in mus:
        return 'musician'

Calling the function:

is_brailey = df['Name'].str.startswith('Brailey')
print(classify_role(df[is_brailey].iloc[0]))

This should print "musician", but the output shows a different class. I think I wrote something wrong in classify_role(); it must be this line: if row.loc['Name'] in mus:
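A minimal sketch, assuming the corrected Name column used above: a function that receives a whole row can be applied to every row at once with DataFrame.apply(..., axis=1), instead of selecting one row with startswith() first. The explicit `return None` is only added for readability; the original function returns None implicitly when no match is found.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Brailey, Mr. William Theodore Ronald', 'Roger Marie Bricoux',
                            'Mr. Roderick Robert Crispin', 'Cunningham', 'Mr. Alfred Fleming']})

mus = ['Brailey, Mr. William Theodore Ronald', 'Roger Marie Bricoux',
       'John Frederick Preston Clarke']

def classify_role(row):
    # row is a Series holding one DataFrame row; look up its 'Name' value
    if row.loc['Name'] in mus:
        return 'musician'
    return None  # no category found

# One row, selected by prefix as in the question
is_brailey = df['Name'].str.startswith('Brailey')
print(classify_role(df[is_brailey].iloc[0]))  # musician

# Every row: axis=1 passes each row (as a Series) to the function
df['type'] = df.apply(classify_role, axis=1)
print(df)
```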

Summary: if I put a person's name in startswith(), it should return musician.

Edit: If you want to test whether values are in a list, you can create a dictionary and test membership with Series.isin:

mus = ['Brailey, Mr. William Theodore Ronald', 'Roger Marie Bricoux',
       'John Frederick Preston Clarke']
cat1 = ['Mr. Alfred Fleming', 'Cunningham']
d = {'musician': mus, 'category': cat1}
for k, v in d.items():
    # rows whose Name is in the list v get category k
    df.loc[df['Name'].isin(v), 'type'] = k
print(df)
                                   Name      type
0  Brailey, Mr. William Theodore Ronald  musician
1                   Roger Marie Bricoux  musician
2           Mr. Roderick Robert Crispin       NaN
3                            Cunningham  category
4                    Mr. Alfred Fleming  category
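An alternative to the isin loop (a different technique, not part of the answer above): invert the dictionary into a name-to-category lookup and use Series.map. This sketch assumes exact, whole-name matches, the same as isin; the helper name name_to_type is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Brailey, Mr. William Theodore Ronald', 'Roger Marie Bricoux',
                            'Mr. Roderick Robert Crispin', 'Cunningham', 'Mr. Alfred Fleming']})

mus = ['Brailey, Mr. William Theodore Ronald', 'Roger Marie Bricoux',
       'John Frederick Preston Clarke']
cat1 = ['Mr. Alfred Fleming', 'Cunningham']
d = {'musician': mus, 'category': cat1}

# Invert {category: [names]} into {name: category}; if a name is listed
# under several categories, the last one wins (as in the loc loop)
name_to_type = {name: k for k, names in d.items() for name in names}

# map() looks up each Name in the dict; names not found become NaN
df['type'] = df['Name'].map(name_to_type)
print(df)
```

One lookup per row replaces one boolean mask per category, which can be simpler when there are many categories.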

If you want to keep a function, change your solution like this:

mus = ['Brailey, Mr. William Theodore Ronald', 'Roger Marie Bricoux',
       'John Frederick Preston Clarke']

def classify_role(row):
    # with Series.apply, row is a single Name value (a string), not a whole row
    if row in mus:
        return 'musician'

df['type'] = df['Name'].apply(classify_role)
print(df)
                                   Name      type
0  Brailey, Mr. William Theodore Ronald  musician
1                   Roger Marie Bricoux  musician
2           Mr. Roderick Robert Crispin      None
3                            Cunningham      None
4                    Mr. Alfred Fleming      None
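A possible generalization of the function-based version, sketched under assumptions: the hypothetical classify_name checks a dictionary of categories (the hypothetical roles mapping) and returns a hypothetical default label 'unknown' instead of None. Unlike the loc loop, the first matching category wins here.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Brailey, Mr. William Theodore Ronald', 'Roger Marie Bricoux',
                            'Mr. Roderick Robert Crispin', 'Cunningham', 'Mr. Alfred Fleming']})

# Hypothetical mapping of category -> list of full names (illustration only)
roles = {
    'musician': ['Brailey, Mr. William Theodore Ronald', 'Roger Marie Bricoux',
                 'John Frederick Preston Clarke'],
    'category': ['Mr. Alfred Fleming', 'Cunningham'],
}

def classify_name(name, mapping=roles, default='unknown'):
    # Series.apply passes each value (here a string), not a whole row
    for role, names in mapping.items():
        if name in names:
            return role  # first matching category wins
    return default

df['type'] = df['Name'].apply(classify_name)
print(df)
```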

You can also pass the values as a tuple to Series.str.startswith; the solution can then be extended to match more categories via a dictionary:

d = {'musician': ['Brailey, Mr. William Theodore Ronald'],
     'cat1': ['Roger Marie Bricoux', 'Cunningham']}
for k, v in d.items():
    # str.startswith accepts a tuple of prefixes and matches any of them
    df.loc[df['Name'].str.startswith(tuple(v)), 'type'] = k
print(df)
                                   Name      type
0  Brailey, Mr. William Theodore Ronald  musician
1                   Roger Marie Bricoux      cat1
2           Mr. Roderick Robert Crispin       NaN
3                            Cunningham      cat1
4                    Mr. Alfred Fleming       NaN
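Since the question is really about startswith(), note that the tuple can hold short prefixes rather than full names. A minimal sketch with hypothetical prefixes (the prefixes dict is illustrative, not from the answer); matching is case-sensitive, and if a row matches prefixes under several keys, the later key overwrites the earlier one.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Brailey, Mr. William Theodore Ronald', 'Roger Marie Bricoux',
                            'Mr. Roderick Robert Crispin', 'Cunningham', 'Mr. Alfred Fleming']})

# Hypothetical prefixes: only the start of each name is needed
prefixes = {
    'musician': ('Brailey', 'Roger'),
    'cat1': ('Cunningham', 'Mr. Alfred'),
}

for k, v in prefixes.items():
    # rows whose Name starts with any prefix in v get category k;
    # the 'type' column is created on the first assignment, other rows are NaN
    df.loc[df['Name'].str.startswith(v), 'type'] = k
print(df)
```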
