df1
ID Col
1 new york, london school of economics, america
2 california & washington, harvard university, america
Expected output:
df1
ID Col
1 new york,london school of economics,america
2 california & washington,harvard university,america
My attempt:
df1[Col].apply(lambda x : x.str.replace(", ","", regex=True))
I suggest using the regular expression ,\s+ which matches a comma followed by one or more consecutive whitespace characters, as in washington, harvard
import pandas as pd

df = pd.DataFrame({'ID': [1, 2], 'Col': ['new york, london school of economics, america',
'california & washington, harvard university, america']}).set_index('ID')
# \s+ matches one or more whitespace characters after the comma
df.Col = df.Col.str.replace(r',\s+', ',', regex=True)
print(df)
Col
ID
1 new york,london school of economics,america
2 california & washington,harvard university,ame...
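You can use str.replace(', ', ',') instead of the lambda function. However, this only works when exactly one space follows each ','.
As АлексейР mentioned, (r',\s+', ',', regex=True) is needed to catch any additional whitespace after the ','.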
Reference: https://pandas.pydata.org/docs/reference/api/pandas.Series.str.replace.html
Example:
import pandas as pd
data_ = ['new york, london school of economics, america', 'california & washington, harvard university, america']
df1 = pd.DataFrame(data_)
df1.columns = ['Col']
df1.index.name = 'ID'
df1.index = df1.index + 1
df1['Col'] = df1['Col'].str.replace(r',\s+', ",", regex=True)
print(df1)
Result:
Col
ID
1 new york,london school of economics,america
2 california & washington,harvard university,ame...
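To illustrate the difference between the literal replacement and the regex replacement discussed above, here is a minimal sketch. It uses plain str.replace and re.sub from the standard library as stand-ins for Series.str.replace with regex=False and regex=True; the sample strings are made up for illustration.

```python
import re

samples = [
    "new york, london",     # one space after the comma
    "new york,   london",   # three spaces after the comma
    "new york,\tlondon",    # a tab after the comma
]

# Literal replacement: removes only a single space that directly follows a comma.
literal = [s.replace(", ", ",") for s in samples]

# Regex replacement: removes any run of whitespace that follows a comma.
pattern = [re.sub(r",\s+", ",", s) for s in samples]

print(literal)
print(pattern)
```

Only the regex version normalizes all three samples to the same string; the literal version leaves the extra spaces and the tab in place.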
If you specify the axis, it works:
df.apply(lambda x: x.str.replace(', ',',',regex=True),axis=1)
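As a sketch of why the attempt in the question fails: Series.apply hands each cell to the function as a plain Python str, which has no .str accessor, whereas DataFrame.apply hands over a whole Series. The one-row frame below is a cut-down version of the question's data, and the variable names are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"Col": ["new york, london school of economics, america"]},
    index=pd.Index([1], name="ID"),
)

# Series.apply: x is a str, so x.str raises AttributeError.
try:
    df1["Col"].apply(lambda x: x.str.replace(", ", ","))
    attempt_failed = False
except AttributeError:
    attempt_failed = True

# DataFrame.apply: x is a Series (here one row), so .str is available.
via_apply = df1.apply(lambda x: x.str.replace(", ", ",", regex=False), axis=1)

# Calling .str.replace on the column directly needs no apply at all.
direct = df1["Col"].str.replace(", ", ",", regex=False)

print(attempt_failed)
print(via_apply)
print(direct)
```

Since the pattern here is a literal string, regex=False is passed to make that explicit.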
You can split the string on ',', strip the extra whitespace from each piece, and join the list back together.
df1['Col'] = df1['Col'].apply(lambda x: ",".join([w.strip() for w in x.split(',')]))
Hope this helps.
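The split, strip and join approach can also be written as a small named function. This is only a sketch: normalize_commas is a hypothetical helper name, not part of pandas. Unlike the ,\s+ pattern, it also removes whitespace before a comma and at both ends of the string.

```python
def normalize_commas(text: str) -> str:
    """Remove whitespace around every comma (hypothetical helper)."""
    # Split on commas, trim each piece, then glue the pieces back together.
    return ",".join(part.strip() for part in text.split(","))


cleaned = normalize_commas("california & washington ,  harvard university, america")
print(cleaned)
```

With a DataFrame like the one in the question, this could be applied per cell with df1['Col'].apply(normalize_commas).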