df.col.replace() vs df.col.str.replace()



I have a requirement to replace `"` with an empty string (that is, remove the double quotes). So I tried the following:
Method 1

test.Name = test.Name.replace('"', '')
test_label.name = test_label.name.replace('"', '')

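Both DataFrames hold the same values, so if I look at the difference between the two columns, I should get an empty set. To my surprise, it was not empty. I tried this: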

set(test.Name) - set(test_label.name)
{'Assaf Khalil, Mrs. Mariana (Miriam")"',
'Cotterill, Mr. Henry Harry""',
'Coutts, Mrs. William (Winnie Minnie" Treanor)"',
'Daly, Miss. Margaret Marcella Maggie""',
'Dean, Miss. Elizabeth Gladys Millvina""',
'Hocking, Miss. Ellen Nellie""',
'Johnston, Master. William Arthur Willie""',
'Johnston, Mrs. Andrew G (Elizabeth Lily" Watson)"',
'Katavelas, Mr. Vassilios (Catavelas Vassilios")"',
'Khalil, Mrs. Betros (Zahie Maria" Elias)"',
'Lindeberg-Lind, Mr. Erik Gustaf (Mr Edward Lingrey")"',
'McCarthy, Miss. Catherine Katie""',
'Moubarek, Mrs. George (Omine Amenia" Alexander)"',
'Nakid, Mrs. Said (Waika Mary" Mowad)"',
'Nourney, Mr. Alfred (Baron von Drachstedt")"',
'Riihivouri, Miss. Susanna Juhantytar Sanni""',
'Riordan, Miss. Johanna Hannah""',
'Rosenshine, Mr. George (Mr George Thorne")"',
'Thomas, Mrs. Alexander (Thamine Thelma")"',
'Wells, Mrs. Arthur Henry (Addie" Dart Trevaskis)"',
'Wheeler, Mr. Edwin Frederick""',
'Willer, Mr. Aaron (Abi Weller")"'}

I can still see `"` in the values, which means the replacement did not work. So I tried another approach.
Method 2

test.Name = test.Name.str.replace('"', '', regex=False)
test_label.name = test_label.name.str.replace('"', '', regex=False)
set(test.Name) - set(test_label.name)
set()

The second method returned the result I expected. So my question is: why does df.col.replace() have no effect?

By inspection, we can determine the types of df.Name and df.Name.str:

print(type(df.Name)) # <class 'pandas.core.series.Series'>  
print(type(df.Name.str)) # <class 'pandas.core.strings.StringMethods'>  

The pandas documentation for Series and StringMethods then gives the signatures of their respective replace methods:

Series.str.replace(self, pat, repl, n=-1, case=None, flags=0, regex=True)
Series.replace(self, to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad')

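Note that, in the signatures above, the regex parameter defaults to False for Series.replace and to True for Series.str.replace. With regex=False, Series.replace only replaces cells whose entire value equals to_replace, so a lone '"' never matches a name that merely contains a quote. Therefore, if you want both functions to give the result you expect, i.e. remove the double quotes, you must set the regex parameter of Series.replace to True.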
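To make the whole-value versus substring distinction concrete, here is a minimal sketch. The three-element Series `s` is made up for illustration and is not part of the original data; `regex` is passed explicitly in every call so that the result does not depend on any default.

```python
import pandas as pd

# Illustrative data: one cell that is exactly '"', one that contains quotes,
# and one without any quotes.
s = pd.Series(['"', 'Henry "Harry"', 'plain'])

# regex=False: only a cell whose whole value equals '"' is replaced.
whole = s.replace('"', '', regex=False)

# regex=True: the pattern is applied inside each string.
sub = s.replace('"', '', regex=True)

# .str.replace always operates on substrings, with or without regex.
via_str = s.str.replace('"', '', regex=False)

print(whole.tolist())    # ['', 'Henry "Harry"', 'plain']
print(sub.tolist())      # ['', 'Henry Harry', 'plain']
print(via_str.tolist())  # ['', 'Henry Harry', 'plain']
```

Only the first cell changes under `Series.replace(..., regex=False)`, which is why Method 1 in the question left the quoted names untouched.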

Below is an example that compares the results of Series.replace with regex = False, Series.replace with regex = True, and Series.str.replace:

import pandas as pd
data = { 
'Name': 
[
'Assaf Khalil, Mrs. Mariana (Miriam")"',
'Cotterill, Mr. Henry Harry""',
'Coutts, Mrs. William (Winnie Minnie" Treanor)"',
'Daly, Miss. Margaret Marcella Maggie""',
'Dean, Miss. Elizabeth Gladys Millvina""',
'Hocking, Miss. Ellen Nellie""',
'Johnston, Master. William Arthur Willie""',
'Johnston, Mrs. Andrew G (Elizabeth Lily" Watson)"',
'Katavelas, Mr. Vassilios (Catavelas Vassilios")"',
'Khalil, Mrs. Betros (Zahie Maria" Elias)"',
'Lindeberg-Lind, Mr. Erik Gustaf (Mr Edward Lingrey")"',
'McCarthy, Miss. Catherine Katie""',
'Moubarek, Mrs. George (Omine Amenia" Alexander)"',
'Nakid, Mrs. Said (Waika Mary" Mowad)"',
'Nourney, Mr. Alfred (Baron von Drachstedt")"',
'Riihivouri, Miss. Susanna Juhantytar Sanni""',
'Riordan, Miss. Johanna Hannah""',
'Rosenshine, Mr. George (Mr George Thorne")"',
'Thomas, Mrs. Alexander (Thamine Thelma")"',
'Wells, Mrs. Arthur Henry (Addie" Dart Trevaskis)"',
'Wheeler, Mr. Edwin Frederick""',
'Willer, Mr. Aaron (Abi Weller")"'
]
}
df1 = pd.DataFrame.from_dict(data)
df2 = pd.DataFrame.from_dict(data)
df3 = pd.DataFrame.from_dict(data)
df1.Name = df1.Name.replace('"', '', regex = True)
df2.Name = df2.Name.replace('"', '', regex = False)
df3.Name = df3.Name.str.replace('"', '')
print("df1 equals df2?:", df1.equals(df2))
print("df1 equals df3?:", df1.equals(df3))
print(set(df1.Name) - set(df2.Name))
print(set(df1.Name) - set(df3.Name))

Output:

df1 equals df2?: False
df1 equals df3?: True
{'Moubarek, Mrs. George (Omine Amenia Alexander)', 'McCarthy, Miss. Catherine Katie', 'Cotterill, Mr. Henry Harry', 'Katavelas, Mr. Vassilios (Catavelas Vassilios)', 'Coutts, Mrs. William (Winnie Minnie Treanor)', 'Hocking, Miss. Ellen Nellie', 'Wheeler, Mr. Edwin Frederick', 'Thomas, Mrs. Alexander (Thamine Thelma)', 'Johnston, Mrs. Andrew G (Elizabeth Lily Watson)', 'Dean, Miss. Elizabeth Gladys Millvina', 'Willer, Mr. Aaron (Abi Weller)', 'Nourney, Mr. Alfred (Baron von Drachstedt)', 'Wells, Mrs. Arthur Henry (Addie Dart Trevaskis)', 'Assaf Khalil, Mrs. Mariana (Miriam)', 'Daly, Miss. Margaret Marcella Maggie', 'Johnston, Master. William Arthur Willie', 'Riihivouri, Miss. Susanna Juhantytar Sanni', 'Rosenshine, Mr. George (Mr George Thorne)', 'Nakid, Mrs. Said (Waika Mary Mowad)', 'Riordan, Miss. Johanna Hannah', 'Lindeberg-Lind, Mr. Erik Gustaf (Mr Edward Lingrey)', 'Khalil, Mrs. Betros (Zahie Maria Elias)'}
set()
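Because the default of regex for Series.str.replace has differed between pandas releases (True in the signature quoted above, False from pandas 2.0 onward), it may be safer to spell the argument out. The following is a sketch under that assumption; `strip_double_quotes` is a hypothetical helper name, not something from the question or from pandas.

```python
import pandas as pd


def strip_double_quotes(col: pd.Series) -> pd.Series:
    """Remove every double quote from a string column (hypothetical helper)."""
    # regex=False is passed explicitly so the call does not rely on the
    # default of the installed pandas version.
    return col.str.replace('"', '', regex=False)


# Two names taken from the data above.
names = pd.Series([
    'Cotterill, Mr. Henry Harry""',
    'Willer, Mr. Aaron (Abi Weller")"',
])
cleaned = strip_double_quotes(names)
print(cleaned.tolist())
# ['Cotterill, Mr. Henry Harry', 'Willer, Mr. Aaron (Abi Weller)']
```

Since '"' has no special meaning in a regular expression, regex=True would give the same result here; the explicit argument mainly matters for patterns that contain characters such as `.` or `(`.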
