I have the data shown below.
,name,link,address
0,Aasia Steel Industrial Group,http://www.aasiasteel.com/,"
Address
1
Saudi Arabia
"
1,ADES,http://investors.adihgroup.com/,"
Address
Al-Kifah Tower
King Fahad Road
Dhahran
Saudi Arabia
"
2,AEC,https://www.aecl.com,"
Address
King Khalid International Airport.
Industrial Estate P.O.Box 90916,
Riyadh 11623,
Saudi Arabia
"
There is a lot of unwanted whitespace. I tried the following, but it did not clean my cells.
df['address']=df.address.str.strip()
In the console, the address column is printed like this:
\nAddress\r\n\r\n\t\t\t\t\t\t\t\t\t\tAl-Kifah
df['address'].apply(lambda x:' '.join(x.split()))
If the column contains values other than strings, we can use:
df['address'].apply(lambda x:' '.join(x.split()) if hasattr(x,'lower') else x)
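To see why `str.strip()` alone was not enough, here is a minimal sketch on a single cell. The sample string is an assumption modeled on the console output in the question, not the real data. `strip()` only trims the two ends of the string, while `split()` with no arguments breaks on every run of whitespace (spaces, tabs, `\r`, `\n`), so joining the pieces with one space collapses everything in between.

```python
# Hypothetical cell, modeled on the console output shown in the question.
cell = "\nAddress\r\n\r\n\t\t\tAl-Kifah Tower\r\n\t\t\tKing Fahad Road\r\n"

stripped = cell.strip()             # trims only the leading/trailing whitespace
collapsed = " ".join(cell.split())  # collapses every internal whitespace run

print(repr(stripped))   # inner \r\n and \t characters are still there
print(repr(collapsed))  # 'Address Al-Kifah Tower King Fahad Road'
```

On a whole column, `df['address'].str.split().str.join(' ')` should give the same result without a lambda. Either way, assign the result back to the column, since neither call modifies the DataFrame in place.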
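Since your address cells contain line breaks, it is better to split on line breaks. The following solution also uses the strip() method to remove leading and trailing whitespace from each line.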
def format_address(address):
    slines = address.splitlines()  # split the cell into lines
    slines = [l.strip() for l in slines]  # remove leading/trailing whitespace
    slines = [l for l in slines if l]  # drop all empty strings
    return ",".join(slines)  # join the remaining lines with commas

df.address = df.address.apply(format_address)
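As a quick check of the line-based approach, here is a self-contained sketch run on one sample cell. `clean_address` and its `label` parameter are hypothetical names, not part of the answer above. The sketch also assumes you want to drop the repeated "Address" label line that starts every cell; remove that condition if you want to keep it.

```python
def clean_address(cell, label="Address"):
    """Join the non-empty lines of a cell with commas, skipping the label line."""
    lines = [line.strip() for line in cell.splitlines()]
    # Keep only lines that have content and are not the (assumed) label line.
    return ",".join(line for line in lines if line and line != label)


# Sample cell copied from row 1 of the data in the question.
sample = "\nAddress\nAl-Kifah Tower\nKing Fahad Road\nDhahran\nSaudi Arabia\n"
result = clean_address(sample)
print(result)  # Al-Kifah Tower,King Fahad Road,Dhahran,Saudi Arabia
```

This would be applied the same way, with `df['address'] = df['address'].apply(clean_address)`. Note that it assumes every cell is a string, so a missing value such as NaN would need to be handled first.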