Pandas not replacing characters in a large DataFrame



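This code works fine when I create a small test DataFrame, but when I try to use it after importing a large Excel file, it does not replace the characters.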

import pandas as pd
df = pd.DataFrame({'A':[1,2,3],
'B':[4,5,6],
'C':['`f;','d:','sda`sd'],
'D':['s','d;','d;p`'],
'E':[5,3,6],
'F':[7,4,3]})
df.replace({'`':''}, regex=True)

The result is as expected:

   A  B      C    D  E  F
0  1  4     f;    s  5  7
1  2  5     d:   d;  3  4
2  3  6  sdasd  d;p  6  3
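The small example relies on regex=True: without it, DataFrame.replace only matches cells whose entire value equals the key, so a backtick inside a longer string is left alone. A minimal sketch contrasting the two modes (the column values here are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'C': ['`f;', '`', 'sda`sd']})

# Without regex=True, replace() only matches cells whose whole value is '`'.
exact = df.replace({'`': ''})
# With regex=True, '`' is treated as a pattern and removed inside each string.
partial = df.replace({'`': ''}, regex=True)

print(exact['C'].tolist())    # ['`f;', '', 'sda`sd']
print(partial['C'].tolist())  # ['f;', '', 'sdasd']
```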

However, when I load a large Excel file:

import pandas as pd
excel_file = f'C:testfile.xlsx'
df = pd.read_excel(excel_file,sheet_name='Details', dtype=str)
df.iloc[20831].loc['Group Number']

Result:

'008513L-0005 `'

Then I run replace:

df.replace({'`':''}, regex=True)
df.iloc[20831].loc['Group Number']

Result:

'008513L-0005 `'
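Assuming the snippets run exactly as shown, a likely cause is that DataFrame.replace returns a new DataFrame instead of modifying df (unless inplace=True is passed). In the small test the returned frame is simply displayed, which looks like success, while in the large-file test df is inspected again without the result ever being assigned. A minimal sketch of the difference, using a hypothetical one-column frame that reuses the value from the question:

```python
import pandas as pd

# Hypothetical stand-in for the 'Group Number' column read from Excel.
df = pd.DataFrame({'Group Number': ['008513L-0005 `', 'ABC']})

# replace() returns a NEW DataFrame; without assignment, df is unchanged.
df.replace({'`': ''}, regex=True)
print(repr(df.loc[0, 'Group Number']))  # '008513L-0005 `'

# Assign the result back so df itself holds the cleaned values.
df = df.replace({'`': ''}, regex=True)
print(repr(df.loc[0, 'Group Number']))  # '008513L-0005 '

# Quick check: count cells in the column that still contain a backtick.
remaining = df['Group Number'].str.contains('`', regex=False).sum()
print(remaining)  # 0
```

Passing inplace=True instead of reassigning has the same effect on df.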

We can use `string.punctuation` to do this:

import string  # We import the string library.

def remove_punctuation(text):  # We define our function.
    # Keep only characters that are not in string.punctuation.
    # Note: this strips ALL ASCII punctuation (e.g. ';', ':', '-'), not just '`'.
    no_punc = "".join([i for i in text if i not in string.punctuation])
    return no_punc

# We apply our function to the corresponding columns of our data set.
df['C'] = df['C'].apply(remove_punctuation)
df['D'] = df['D'].apply(remove_punctuation)
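A vectorized variant of the same idea, swapping the per-character loop for Series.str.translate with a table built by str.maketrans, is sketched below. This is an alternative technique, not the answer's code; the sample data is hypothetical. It also shows a table that deletes only the backtick, since string.punctuation would also remove the '-' in a value like '008513L-0005 `'. Unlike apply with a plain Python function, the .str accessor skips missing values instead of raising.

```python
import string

import pandas as pd

# Hypothetical sample data, including a missing value.
df = pd.DataFrame({'C': ['`f;', 'd:', 'sda`sd', None],
                   'D': ['s', 'd;', 'd;p`', '008513L-0005 `']})

# Translation table that deletes every character in string.punctuation.
all_punct = str.maketrans('', '', string.punctuation)
# Translation table that deletes only the backtick.
backtick_only = str.maketrans('', '', '`')

cols = ['C', 'D']
# Apply the table column by column; missing values stay missing.
stripped_all = df[cols].apply(lambda s: s.str.translate(all_punct))
stripped_tick = df[cols].apply(lambda s: s.str.translate(backtick_only))

print(stripped_all['D'].tolist())   # ['s', 'd', 'dp', '008513L0005 ']
print(stripped_tick['D'].tolist())  # ['s', 'd;', 'd;p', '008513L-0005 ']
```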

