Splitting a full name into first, middle and last name in pandas, but I'm stuck on replace



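I'm trying to split the name into parts, keeping the first and last names, and then use replace so that the first name is removed from the last-name column (for single-word names), and whatever is left after removing the first and last names goes into the middle-name column.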

df['owner1_first_name'] = df['owner1_name'].str.split().str[0].astype(str, errors='ignore')
# Passing a whole Series as the pattern to str.replace is what fails here
df['owner1_last_name'] = df['owner1_name'].str.split().str[-1].str.replace(df['owner1_first_name'], "").astype(str, errors='ignore')
df['owner1_middle_name'] = df['owner1_name'].str.replace(df['owner1_first_name'], "").str.replace(df['owner1_last_name'], "").astype(str, errors='ignore')

The problem is that I can't use str.replace this way. I get the error "TypeError: 'Series' objects are mutable, thus they cannot be hashed".

Is there any alternative syntax in pandas for what I'm trying to achieve?

My desired output is:

Full name = THOMAS MARY D in column owner1_name

I want:

owner1_first_name = THOMAS
owner1_middle_name = MARY
owner1_last_name = D

I think you need mask, replacing with an empty string where the values in the two columns are the same:

import pandas as pd
df = pd.DataFrame({'owner1_name':['THOMAS MARY D', 'JOE Long', 'MARY Small']})
splitted = df['owner1_name'].str.split()
df['owner1_first_name'] = splitted.str[0]
df['owner1_last_name'] = splitted.str[-1]
df['owner1_middle_name'] = splitted.str[1]
# Blank the middle name where it is just a copy of the last name (two-word names)
df['owner1_middle_name'] = (df['owner1_middle_name']
                             .mask(df['owner1_middle_name'] == df['owner1_last_name'], ''))
print(df)
     owner1_name owner1_first_name owner1_last_name owner1_middle_name
0  THOMAS MARY D            THOMAS                D               MARY
1       JOE Long               JOE             Long                   
2     MARY Small              MARY            Small  
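Comparing the middle token with the last name also blanks a genuine middle name that happens to equal the surname (e.g. a hypothetical `ANN LEE LEE`). A sketch of an alternative, assuming a middle name should exist only when the name has at least three tokens, keys `Series.where` on the token count from `str.len()` instead of on the values:

```python
import pandas as pd

# Illustrative data; 'ANN LEE LEE' is a made-up name where middle == last
df = pd.DataFrame({'owner1_name': ['THOMAS MARY D', 'JOE Long', 'ANN LEE LEE']})
splitted = df['owner1_name'].str.split()
df['owner1_first_name'] = splitted.str[0]
df['owner1_last_name'] = splitted.str[-1]
# Keep the second token only where the name has 3 or more tokens;
# every other row gets an empty string
df['owner1_middle_name'] = splitted.str[1].where(splitted.str.len() >= 3, '')
print(df)
```

Here `JOE Long` still gets an empty middle name, while `ANN LEE LEE` keeps `LEE`.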

Which is the same as:
splitted = df['owner1_name'].str.split()
df['owner1_first_name'] = splitted.str[0]
df['owner1_last_name'] = splitted.str[-1]
middle = splitted.str[1] 
df['owner1_middle_name'] = middle.mask(middle == df['owner1_last_name'], '')
print (df)
     owner1_name owner1_first_name owner1_last_name owner1_middle_name
0  THOMAS MARY D            THOMAS                D               MARY
1       JOE Long               JOE             Long                   
2     MARY Small              MARY            Small                   

EDIT:

For replace, one possible solution is apply with axis=1:

df = pd.DataFrame({'owner1_name':['THOMAS MARY-THOMAS', 'JOE LongJOE', 'MARY Small']})
splitted = df['owner1_name'].str.split()
df['a'] = splitted.str[0]
df['b'] = splitted.str[-1]
df['c'] = df.apply(lambda x: x['b'].replace(x['a'], ''), axis=1)
print (df)
          owner1_name       a            b      c
0  THOMAS MARY-THOMAS  THOMAS  MARY-THOMAS  MARY-
1         JOE LongJOE     JOE      LongJOE   Long
2          MARY Small    MARY        Small  Small
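The same row-wise `replace` can be sketched without `apply` by pairing columns `a` and `b` with `zip` in a list comprehension. This is a common alternative syntax; whether it is faster on a particular DataFrame is an assumption worth measuring rather than a given.

```python
import pandas as pd

df = pd.DataFrame({'owner1_name': ['THOMAS MARY-THOMAS', 'JOE LongJOE', 'MARY Small']})
splitted = df['owner1_name'].str.split()
df['a'] = splitted.str[0]
df['b'] = splitted.str[-1]
# Pair each row's values and call Python's str.replace once per row,
# the same per-row operation the apply(axis=1) version performs
df['c'] = [b.replace(a, '') for a, b in zip(df['a'], df['b'])]
print(df)
```

Like the `apply` version, this removes the first name wherever it appears as a substring, which is why `MARY-THOMAS` becomes `MARY-`.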

The exact code that does what I asked for in the question, in three lines, is:

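df['owner1_first_name'] = df['owner1_name'].str.split().str[0]
# Last token with the first name removed from it (empties it for single-word names)
df['owner1_last_name'] = df.apply(
    lambda x: x['owner1_name'].split()[-1].replace(x['owner1_first_name'], ''), axis=1)
# Remove the first and last names from the full name, then trim leftover spaces
df['owner1_middle_name'] = df.apply(
    lambda x: x['owner1_name'].replace(x['owner1_first_name'], '')
                              .replace(x['owner1_last_name'], '').strip(), axis=1)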
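Because `str.replace` removes the first and last names wherever they occur as substrings, a last name like `D` would also be cut out of a middle name containing that letter (e.g. a hypothetical `JOE DAVID D`). A sketch that works on whole tokens instead, assuming everything between the first and last token counts as the middle name, slices the split lists with `.str[1:-1]` and joins them back with `.str.join`:

```python
import pandas as pd

# Illustrative data, including a made-up name where substring replace would misfire
df = pd.DataFrame({'owner1_name': ['THOMAS MARY D', 'JOE DAVID D', 'MARY SMALL', 'MARY']})
tokens = df['owner1_name'].str.split()
df['owner1_first_name'] = tokens.str[0]
# Last token, but only when the name has at least two tokens
df['owner1_last_name'] = tokens.str[-1].where(tokens.str.len() >= 2, '')
# All tokens between the first and the last; empty for names with < 3 tokens
df['owner1_middle_name'] = tokens.str[1:-1].str.join(' ')
print(df)
```

Since no substring replacement happens, `DAVID` survives intact, and a four-word name would keep both inner words as its middle name.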

只需更改您的作业并使用另一个变量:

split = df['owner1_name'].str.split()
df['owner1_first_name'] = split.str[0]
df['owner1_middle_name'] = split.str[1]
df['owner1_last_name'] = split.str[-1]
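
This might help! The tricky part here is filling in the middle name correctly, which you can do with this code:

splitted = df['Contact_Name'].str.split()
df['First_Name'] = splitted.str[0]
df['Last_Name'] = splitted.str[-1]
# Only rows with exactly three tokens get a middle name; the other rows become NaN
df['Middle_Name'] = df['Contact_Name'].loc[splitted.str.len() == 3].str.split(expand=True)[1]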

I like using the expand parameter. It returns a new DataFrame with columns named 0, 1 and 2, which you can rename in one line:

col_names = ['owner1_first_name', 'owner1_middle_name', 'owner1_last_name']
df.owner1_name.str.split(expand=True).rename(columns=dict(enumerate(col_names)))

Note that this code breaks if someone has four names. It's better to split in two steps: split(n=1, expand=True), then rsplit(n=1, expand=True).
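The two-step split can be sketched as follows, assuming the first token is the first name, the last token is the last name, and everything in between is the middle name. `str.split(n=1, expand=True)` peels off the first name, then `str.rsplit(n=1, expand=True)` on the remainder peels off the last name. The sample names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'owner1_name': ['THOMAS MARY D', 'ANNA MARIA LUISA ROSSI']})

# Step 1: split once from the left -> column 0 = first name, column 1 = the rest
first_rest = df['owner1_name'].str.split(n=1, expand=True)
df['owner1_first_name'] = first_rest[0]

# Step 2: split the rest once from the right -> column 0 = middle, column 1 = last
middle_last = first_rest[1].str.rsplit(n=1, expand=True)
df['owner1_middle_name'] = middle_last[0]
df['owner1_last_name'] = middle_last[1]
print(df[['owner1_first_name', 'owner1_middle_name', 'owner1_last_name']])
```

The four-word name keeps `MARIA LUISA` together as its middle name. Rows with only two words would end up with the surname in the middle column (and `None` as last name), so those would still need handling such as the `str.len()` check used in other answers.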
