How to remove spaces between digits in a Pandas DataFrame while keeping the spaces between names in the same column



I want to clean the DataFrame so that cells containing only numbers have no spaces, while cells containing names stay unchanged.

Author
07 07 34    
08 26 20 
08 26 20 
Tata Smith
Jhon Doe
08 26 22
3409243

This is my failed attempt:

df.loc[df["Author"].str.isdigit(), "Author"] = df["Author"].strip()

How should I approach this?

You may want to use a regex.

import pandas as pd
import re
import io

# Create a sample dataframe
df = pd.read_csv(io.StringIO('Author\n 07 07 34    \n 08 26 20 \n 08 26 20 \n Tata Smith\n Jhon Doe\n 08 26 22\n 3409243'))
# Use regex: select the cells that contain only digits and spaces
mask = df['Author'].str.fullmatch(r'[\d ]*')
df.loc[mask, 'Author'] = df.loc[mask, 'Author'].str.replace(' ', '')
# You can also do the same treatment by the following line
# df['Author'] = df['Author'].apply(lambda s: s.replace(' ', '') if re.match(r'[\d ]*$', s) else s)

Output:

Author
070734
082620
082620
Tata Smith
Jhon Doe
082622
3409243
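
The same masking idea can be wrapped in a small helper. This is only a sketch under a few assumptions: the function name `strip_spaces_in_numeric_cells` is made up for illustration, `\s` is used instead of a literal space so tabs are covered too, and `na=False` is passed so that missing values are simply left alone.

```python
import pandas as pd


def strip_spaces_in_numeric_cells(series: pd.Series) -> pd.Series:
    """Remove whitespace from cells made up only of digits and whitespace."""
    # na=False: missing values count as "no match" and are left untouched
    mask = series.str.fullmatch(r"[\d\s]+", na=False)
    cleaned = series.copy()
    # Only the matching cells are rewritten; names keep their spaces
    cleaned[mask] = series[mask].str.replace(r"\s+", "", regex=True)
    return cleaned


df = pd.DataFrame(
    {"Author": ["07 07 34    ", "08 26 20 ", "Tata Smith", "Jhon Doe", None, "3409243"]}
)
df["Author"] = strip_spaces_in_numeric_cells(df["Author"])
print(df)
```

Because the values stay strings, leading zeros such as the one in `070734` are preserved.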

How about this?

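import pandas as pd
df = pd.read_csv('two.csv')
# remove spaces on a copy of the column
df['Author_clean'] = df['Author'].str.replace(" ", "")
# try conversion to numeric; strings that are not numbers become NaN
# (note: numeric conversion drops leading zeros, e.g. "070734" becomes 70734.0)
df['Author_clean'] = pd.to_numeric(df['Author_clean'], errors='coerce')
# fill missing vals with original strings (assign back rather than using chained inplace=True)
df['Author_clean'] = df['Author_clean'].fillna(df['Author'])
print(df.head(10))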

Output:

         Author Author_clean
0  07 07 34          70734.0
1     08 26 20       82620.0
2     08 26 20       82620.0
3    Tata Smith   Tata Smith
4      Jhon Doe     Jhon Doe
5      08 26 22      82622.0
6       3409243    3409243.0
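
As the output shows, converting to numeric turns `07 07 34` into `70734.0`, so the leading zero is lost. If the values are identifiers rather than quantities, a possible variant is to use `pd.to_numeric` only as a test and keep the de-spaced strings. This is a sketch with an inline sample frame in place of `two.csv`; the names `no_spaces` and `is_number` are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"Author": ["07 07 34", "08 26 20", "Tata Smith", "Jhon Doe", "3409243"]}
)

# Remove spaces on a copy; the original column is left as-is
no_spaces = df["Author"].str.replace(" ", "", regex=False)
# Use to_numeric only to decide which cells look like numbers
is_number = pd.to_numeric(no_spaces, errors="coerce").notna()
# Keep the de-spaced string where it is numeric, the original text otherwise
df["Author_clean"] = no_spaces.where(is_number, df["Author"])
print(df)
```

Note that `pd.to_numeric` also accepts strings such as `1e5`, so a digits-only regex is the stricter test if such values can occur.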
