Replace all instances of a standalone "." in a pandas DataFrame

Beginner question.

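I have a DataFrame derived from an Excel file, with a column I'll call "input". This column contains floats (e.g. 7.4, 8.1, 2.2, ...). However, it also contains some bad values: strings, which are easy to filter out, and standalone instances of "." or "&", which I'm finding difficult. I want to clean the column so that it contains only numeric float values.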

I've used the following approach on other columns, but I can't use it here, because if I strip out the "." instances, my floats get broken too:

for col in [col for col in new_df.columns if col.startswith("input")]:
    new_df[col] = new_df[col].str.replace(r',| |-|\^|\+|#|j|0|\.', '', regex=True)
    new_df[col] = pd.to_numeric(new_df[col], errors='raise')
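Any pattern that deletes "." deletes the decimal point from every float as well (and an unescaped `.` in a regex matches any character at all). A minimal sketch of one alternative, assuming the bad cells contain nothing but "." or "&" (possibly with surrounding spaces): anchor the pattern with `^` and `$` so it only matches a whole cell, and replace those cells with NaN. The sample values are hypothetical; only the column name `input` comes from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mimicking the question's "input" column
df = pd.DataFrame({"input": ["7.4", "8.1", ".", "&", " . ", "2.2"]})

# ^ and $ anchor the match to the entire cell, so "7.4" is never touched.
# Inside [.&] the dot is literal. Because the replacement value is not a
# string, pandas replaces the whole matching cell with NaN.
df["input"] = df["input"].replace(r"^\s*[.&]\s*$", np.nan, regex=True)

# The remaining strings are plain numbers, so a strict conversion works.
df["input"] = pd.to_numeric(df["input"])
print(df["input"].tolist())
```

Real floats such as "7.4" keep their decimal point because the anchored pattern cannot match a cell that contains anything besides ".", "&" and whitespace.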

I also tried the following, but it replaces every value in the column with None:

import re

for index, row in new_df.iterrows():
    col_input = row['input']
    if re.match(r'^-?\d+(?:\.\d+)$', str(col_input)) is None:
        new_df["input"] = None
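The loop above empties the whole column because `new_df["input"] = None` assigns to every row, not only the current one. A vectorized sketch of the same idea, assuming the goal is to keep only cells that look like numbers: build a boolean mask with `Series.str.fullmatch` and keep matching cells with `Series.where`. The sample data is hypothetical, and unlike the question's pattern this one makes the decimal part optional (`?`) so integers such as "7" also pass.

```python
import pandas as pd

# Hypothetical sample with the kinds of values described in the question
new_df = pd.DataFrame({"input": [7.4, "8.1", ".", "&", "foo", -2.2]})

# fullmatch requires the entire cell to match, so "." and "&" fail.
# astype(str) lets real float cells (as read from Excel) be tested too.
pattern = r"-?\d+(?:\.\d+)?"
mask = new_df["input"].astype(str).str.fullmatch(pattern)

# where() keeps values where mask is True and puts NaN elsewhere,
# so only the failing rows change, then everything becomes float64.
new_df["input"] = pd.to_numeric(new_df["input"].where(mask))
print(new_df["input"].tolist())
```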

How can I get rid of these dots?

Thanks!

You can simply use pandas.to_numeric and pass errors='coerce', with no loop at all:

from io import StringIO
import pandas as pd
s = """input
7.4
8.1
2.2
foo
foo.bar
baz/foo"""
df = pd.read_csv(StringIO(s))
df['input'] = pd.to_numeric(df['input'], errors='coerce')

print(df)
# Output:
#    input
# 0    7.4
# 1    8.1
# 2    2.2
# 3    NaN
# 4    NaN
# 5    NaN

df.dropna(inplace=True)
print(df)
# Output:
#    input
# 0    7.4
# 1    8.1
# 2    2.2
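Data read from Excel often arrives as an object column that mixes real floats with strings, rather than the all-string column `read_csv` produces above. A short check, assuming such a mixed column (the values are hypothetical), that `errors='coerce'` also turns standalone "." and "&" cells into NaN:

```python
import pandas as pd

# Hypothetical mix of floats and strings, as an Excel import may produce
s = pd.Series([7.4, 8.1, ".", "&", "foo", 2.2], name="input")

# Anything pandas cannot parse as a number becomes NaN instead of raising
cleaned = pd.to_numeric(s, errors="coerce")

# Drop the entries that failed to convert
cleaned = cleaned.dropna()
print(cleaned.tolist())  # [7.4, 8.1, 2.2]
```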

If you need to clean several mixed columns, use:

cols = ['input', ...] # put here the name of the columns concerned
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')
df.dropna(subset=cols, inplace=True)
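A small two-column illustration of the snippet above, with hypothetical column names `input` and `input2`. `DataFrame.apply` forwards `errors="coerce"` to `pd.to_numeric` for each column, and `dropna(subset=cols)` uses the default `how='any'`, so a row is dropped if any of the listed columns failed to convert:

```python
import pandas as pd

# Hypothetical frame with two mixed columns and one column left alone
df = pd.DataFrame({
    "input": ["7.4", ".", "2.2", "8.1"],
    "input2": ["1.0", "3.5", "&", "4.2"],
    "label": ["a", "b", "c", "d"],
})

cols = ["input", "input2"]
# Each listed column is converted separately; unparseable cells become NaN
df[cols] = df[cols].apply(pd.to_numeric, errors="coerce")

# Rows 1 and 2 each have a NaN in one listed column, so both are dropped
df.dropna(subset=cols, inplace=True)
print(df)
```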

