Function to remove multiple rows using df.drop

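I want to create a function that lets me pass in a list of values and deletes any row whose value in a given column is in that list. I will use the following DataFrame as an example: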

import pandas as pd

data = {'Name': ['Tom', 'nick', 'krish', 'jack'],
        'Age': [20, 21, 19, 18]}
sample = pd.DataFrame(data)

I want to delete any row whose "Age" column contains one of the following values.

remove_these = [20,21]

This is what I have so far:

def rem_out (df,column,x):
    df.drop(df[df['column'] == x].index, inplace = True)
    return df

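In my function, "df" is the DataFrame, "column" is the name of the column whose values should be checked, and "x" is the list of values. Being able to pass a list of values to my function is important, because I will be removing hundreds of values from my data.

When I run my function like this: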

rem_out(sample, Age, remove_these)

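I get an error saying that Age is not defined. How do I specify the column of interest so that the values in the list are removed from the DataFrame?

Ideally, my function would remove the first and second rows.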

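There are ~3 issues, 2 of which come down to the difference between variables and strings. In Python, if you write a "bare" word, it is either a keyword such as def or else, or a name that refers to a function, a variable and so on. In your case: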

def rem_out (df,column,x):
    df.drop(df[df['column'] == x].index, inplace = True)
    return df

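Here, column is a name that refers to whatever was passed to the function. 'column', however, is the literal string "column". So whatever you pass to the function is ignored, and a column literally named "column" is looked up instead, which is not what you want. The quotes there need to be removed.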

rem_out(sample, Age, remove_these)

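Here rem_out, sample and remove_these are "bare", and they do refer to a function, a DataFrame and a list, respectively; all good. But Age is bare too, so Python looks for something already named Age, whereas you need the string "Age" for it to be looked up as a column.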
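To see the name-versus-string difference in isolation, here is a small sketch using the question's sample frame. The variable names ages and bare_name_error are only for illustration:

```python
import pandas as pd

sample = pd.DataFrame({'Name': ['Tom', 'nick', 'krish', 'jack'],
                       'Age': [20, 21, 19, 18]})

# The string 'Age' is a value that pandas can look up as a column label.
ages = sample['Age'].tolist()

# The bare name Age is a variable lookup. Nothing called Age exists,
# so Python raises NameError before pandas is ever involved.
try:
    sample[Age]
    bare_name_error = None
except NameError as exc:
    bare_name_error = str(exc)

print(ages)
print(bare_name_error)
```

The same reasoning applies to the call rem_out(sample, Age, remove_these): the arguments are evaluated before the function runs, so the error appears at the call site.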

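Lastly,

df[column] == x

tests the column for equality with x, which is a list, but that is not what is needed; you want to know whether each column value is in that list, not whether it is equal to the list. So you need .isin there.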
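A minimal sketch of the difference, assuming the question's sample frame. The same-length list below is made up purely to show that == lines values up by position, while .isin tests membership:

```python
import pandas as pd

sample = pd.DataFrame({'Name': ['Tom', 'nick', 'krish', 'jack'],
                       'Age': [20, 21, 19, 18]})

# == against a list of the same length compares position by position:
# 20 vs 21, 21 vs 20, 19 vs 19, 18 vs 18.
positional = (sample['Age'] == [21, 20, 19, 18]).tolist()

# .isin asks, for each row, whether its value appears anywhere in the list.
membership = sample['Age'].isin([21, 20]).tolist()

print(positional)   # [False, False, True, True]
print(membership)   # [True, True, False, False]
```

With a list whose length differs from the column's, such as remove_these here, the positional comparison has nothing sensible to line up, which is another reason == is the wrong tool for this job.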

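Overall:

def rem_out(df, column, to_remove):
    return df.drop(df[df[column].isin(to_remove)].index)

new = rem_out(sample, "Age", remove_these)

should do the trick. The inplace=True argument is also removed, because it is rarely useful. With this change, .drop returns a new DataFrame, which rem_out in turn returns and which is assigned at the call site.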
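As a quick check of what "returns a new DataFrame" means in practice, the following sketch runs the function on the question's data and confirms that the original frame is left alone:

```python
import pandas as pd

def rem_out(df, column, to_remove):
    # Without inplace=True, drop returns a new frame; df itself is not modified.
    return df.drop(df[df[column].isin(to_remove)].index)

sample = pd.DataFrame({'Name': ['Tom', 'nick', 'krish', 'jack'],
                       'Age': [20, 21, 19, 18]})
remove_these = [20, 21]

new = rem_out(sample, 'Age', remove_these)

print(len(sample))           # 4, the original still has every row
print(new['Name'].tolist())  # ['krish', 'jack']
print(new.index.tolist())    # [2, 3], the original index labels are kept
```

If you want the surviving rows renumbered from 0, calling new.reset_index(drop=True) afterwards is one option.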

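All the explanations are given in the other answers and in my comment. You can also skip drop entirely and filter the DataFrame with a boolean mask: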

def rem_out (df, column, x):
    # x can be a list as well as a single value
    m = df[column].isin(x if isinstance(x, list) else [x])
    return df[~m]

sample = rem_out(sample, 'Age', remove_these)

Output:

>>> sample
    Name  Age
2  krish   19
3   jack   18
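The isinstance check in that version is what allows a single value to be passed as well. A short sketch of that case, reusing the same function on a fresh copy of the sample data (the value 19 is just an example):

```python
import pandas as pd

def rem_out(df, column, x):
    # Wrap a single value in a list so that .isin always receives a list.
    m = df[column].isin(x if isinstance(x, list) else [x])
    return df[~m]

sample = pd.DataFrame({'Name': ['Tom', 'nick', 'krish', 'jack'],
                       'Age': [20, 21, 19, 18]})

without_19 = rem_out(sample, 'Age', 19)          # single value
without_both = rem_out(sample, 'Age', [20, 21])  # list of values

print(without_19['Name'].tolist())    # ['Tom', 'nick', 'jack']
print(without_both['Name'].tolist())  # ['krish', 'jack']
```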

For your code, use this:

def rem_out (df,column,x):
    df.drop(df[df[column].isin(x)].index, inplace = True)
    return df

newdf = rem_out(sample, 'Age', remove_these)

You can use the function the way you have it. Personally, I would just use something simpler.

df = sample[~sample['Age'].isin(remove_these)]
print(df)

~ negates the boolean mask, so it acts like a "not in" filter.
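To make the effect of ~ visible, here is a small sketch that prints the mask before and after negation, assuming the same sample data:

```python
import pandas as pd

sample = pd.DataFrame({'Name': ['Tom', 'nick', 'krish', 'jack'],
                       'Age': [20, 21, 19, 18]})
remove_these = [20, 21]

mask = sample['Age'].isin(remove_these)  # True where the row should go
keep = ~mask                             # True where the row should stay

print(mask.tolist())  # [True, True, False, False]
print(keep.tolist())  # [False, False, True, True]

df = sample[keep]
print(df['Age'].tolist())  # [19, 18]
```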
