Remove the spaces after commas between strings in a pandas DataFrame column



df1

ID                       Col
1       new york, london school of economics, america
2       california & washington,  harvard university, america

The expected output is:

df1

ID                       Col
1       new york,london school of economics,america
2       california & washington,harvard university,america

My attempt was:

df1[Col].apply(lambda x : x.str.replace(", ","", regex=True))

I suggest using the regular expression ,\s+ which captures one or more consecutive whitespace characters after a comma, as in "washington,  harvard".

import pandas as pd

df = pd.DataFrame({'ID': [1, 2], 'Col': ['new york,           london school of economics,  america',
'california & washington,  harvard university, america']}).set_index('ID')
# Replace a comma followed by one or more whitespace characters with a bare comma
df.Col = df.Col.str.replace(r',\s+', ',', regex=True)
print(df)
Col
ID                                                   
1         new york,london school of economics,america
2   california & washington,harvard university,ame...

You can use str.replace(', ', ",") directly instead of a lambda function. However, this only works when there is exactly one space after each ",".
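
To make that limitation concrete, here is a minimal sketch comparing the literal replacement with a regex one. The variable names (s, literal, pattern) are only illustrative, and the sample strings are the ones from the question.

```python
import pandas as pd

s = pd.Series([
    "new york, london school of economics, america",          # one space after each comma
    "california & washington,  harvard university, america",  # two spaces after the first comma
])

# Literal replacement: each ", " becomes ",", so one of the two spaces survives.
literal = s.str.replace(", ", ",", regex=False)

# Regex replacement: a comma followed by one or more whitespace characters becomes ",".
pattern = s.str.replace(r",\s+", ",", regex=True)

print(literal.tolist())
print(pattern.tolist())
```

The literal version leaves "washington, harvard" with one space in the second row, while the regex version removes all of them.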

As АлексейР mentioned, (r',\s+', ",", regex=True) is needed to capture any extra whitespace after ",".

Reference: https://pandas.pydata.org/docs/reference/api/pandas.Series.str.replace.html

Example:

import pandas as pd
data_ = ['new york, london school of economics, america', 'california & washington,  harvard university, america']
df1 = pd.DataFrame(data_)
df1.columns = ['Col']
df1.index.name = 'ID'
df1.index = df1.index + 1
df1['Col'] = df1['Col'].str.replace(r',\s+', ",", regex=True)
print(df1)

Result:

Col
ID                                                   
1         new york,london school of economics,america
2   california & washington,harvard university,ame...

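If you call apply on the DataFrame and specify the axis, it works, because each row is then passed to the lambda as a Series: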

df.apply(lambda x: x.str.replace(', ',',',regex=True),axis=1)
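
The following sketch shows why the attempt in the question fails while the axis version runs. The two-row frame and the names series_apply_failed, row_wise and direct are made up for illustration. Note that this literal pattern still removes only a single space after each comma.

```python
import pandas as pd

df = pd.DataFrame({"Col": ["a, b", "c, d"]}, index=pd.Index([1, 2], name="ID"))

# Series.apply hands each cell (a plain Python str) to the function,
# and str objects have no .str accessor, so this raises AttributeError.
try:
    df["Col"].apply(lambda x: x.str.replace(", ", ","))
    series_apply_failed = False
except AttributeError:
    series_apply_failed = True

# DataFrame.apply with axis=1 hands each row to the function as a Series,
# so the .str accessor is available.
row_wise = df.apply(lambda x: x.str.replace(", ", ",", regex=False), axis=1)

# The vectorized call needs no apply at all.
direct = df["Col"].str.replace(", ", ",", regex=False)

print(series_apply_failed)
print(row_wise)
print(direct)
```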

You can split the string on ',', strip the extra whitespace from each piece, and join the list back together.

df1['Col'] = df1['Col'].apply(lambda x: ",".join([w.strip() for w in x.split(',')]))
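
A self-contained sketch of the split/strip/join approach, using the data from the question. The helper name squeeze_commas is hypothetical. Unlike the regex replacement, strip() also removes whitespace before a comma and at both ends of the string, which may or may not be what you want.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"Col": ["new york, london school of economics, america",
             "california & washington,  harvard university, america"]},
    index=pd.Index([1, 2], name="ID"),
)

def squeeze_commas(text: str) -> str:
    """Hypothetical helper: split on commas, trim each piece, rejoin."""
    return ",".join(part.strip() for part in text.split(","))

# Assign back to the column so df1 stays a DataFrame.
df1["Col"] = df1["Col"].apply(squeeze_commas)
print(df1["Col"].tolist())
```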

Hope this helps.
