Deleting rows from a pandas DataFrame with a for loop and if statements

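I'm trying to clean a dataset, but I'm running into a problem where "Red" doesn't seem to be recognised, and I'm not sure I've written the function correctly. Ideally, I want to drop rows based on a length tolerance for each colour, and I'd like to write a function for it: pass in a colour, an upper tolerance and a lower tolerance, and have the out-of-tolerance rows removed from the dataset.

Thanks!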

import pandas as pd

df = pd.DataFrame(
    {
        "Colour": [
            "Red", "Red", "Red", "Red", "Red",
            "Blue", "Blue", "Blue",
            "Green", "Green", "Green",
        ],
        "Length": [14, 15, 16, 20, 15, 15, 18, 17, 15, 19, 18],
    }
)

def tolerance_drop(Colour, Upper, Lower):
    for i in range(0, len(df)):
        if (df.loc[i, "Colour"] == Colour) & (df.loc[i, "Length"] > Upper):
            df.drop([i])
        elif (df.loc[i, "Colour"] == Colour) & (df.loc[i, "Length"] < Lower):
            df.drop([i])
        else:
            break

# should remove 2 red rows giving 9 remaining rows
tolerance_drop("Red", 19.150, 14.5)
print(df)

Output:
It simply prints the DataFrame unchanged; no rows are deleted.

Avoid explicit loops when you can use pandas' vectorized operations instead.

A simple filter:

In [466]: df = df[~((df.Colour == 'Red') & ((df.Length > 19.150) | (df.Length < 14.5)))]
In [467]: df
Out[467]: 
    Colour  Length
1      Red      15
2      Red      16
4      Red      15
5     Blue      15
6     Blue      18
7     Blue      17
8    Green      15
9    Green      19
10   Green      18
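Because the question asks for a function, here is a minimal sketch that wraps the same filter in one. The signature `tolerance_drop(df, colour, upper, lower)` is an assumption (passing the DataFrame in explicitly is my choice, not the asker's), and the function returns a new DataFrame rather than modifying a global.

```python
import pandas as pd

def tolerance_drop(df: pd.DataFrame, colour: str, upper: float, lower: float) -> pd.DataFrame:
    """Return a copy of df without the `colour` rows whose Length lies outside [lower, upper]."""
    out_of_tolerance = (df["Colour"] == colour) & (
        (df["Length"] > upper) | (df["Length"] < lower)
    )
    return df[~out_of_tolerance]  # keep everything that is NOT out of tolerance

df = pd.DataFrame(
    {
        "Colour": ["Red"] * 5 + ["Blue"] * 3 + ["Green"] * 3,
        "Length": [14, 15, 16, 20, 15, 15, 18, 17, 15, 19, 18],
    }
)
df = tolerance_drop(df, "Red", 19.150, 14.5)  # reassign: the function does not modify df in place
print(df)
```

Rows of other colours are never touched, and lengths exactly equal to `lower` or `upper` are kept, matching the strict `>` / `<` comparisons in the filter.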

As pointed out in the comments, there are better ways to do this.

But if you are learning and want to know why your function doesn't work, try this:
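Since the question mentions a tolerance for each colour, one way to extend the vectorized filter is a lookup table of limits. This is a sketch under stated assumptions: the `tolerances` dict and the Blue limits `(15.5, 17.5)` are made up for illustration, and colours missing from the table are always kept.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Colour": ["Red"] * 5 + ["Blue"] * 3 + ["Green"] * 3,
        "Length": [14, 15, 16, 20, 15, 15, 18, 17, 15, 19, 18],
    }
)

# Hypothetical per-colour tolerances: colour -> (lower, upper)
tolerances = {"Red": (14.5, 19.15), "Blue": (15.5, 17.5)}

# Look up each row's limits; colours without an entry get (-inf, inf), i.e. always kept.
lower = df["Colour"].map({c: limits[0] for c, limits in tolerances.items()}).fillna(float("-inf"))
upper = df["Colour"].map({c: limits[1] for c, limits in tolerances.items()}).fillna(float("inf"))

in_tolerance = (df["Length"] >= lower) & (df["Length"] <= upper)
cleaned = df[in_tolerance]
print(cleaned)
```

All colours are handled in a single pass, so adding a colour only means adding an entry to the dict.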

def tolerance_drop(Colour, Upper, Lower):
    # Assumes the global df still has its default 0..n-1 index, so i is also the row label.
    for i in range(0, len(df)):
        if df.loc[i, "Colour"] == Colour and (df.loc[i, "Length"] > Upper or df.loc[i, "Length"] < Lower):
            df.drop([i], inplace=True)  # modify the global df directly

tolerance_drop("Red", 19.150, 14.5)
print(df)

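In your version, the break statement exits the for loop the moment it is reached, i.e. at the first row that matches neither condition (row 1 here), so the remaining rows are never checked. You don't want it.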
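A stripped-down trace in plain Python, using just the Length values of the five Red rows, shows the effect: the else branch runs at row 1, and break ends the loop before row 3 (Length 20) is ever examined.

```python
red_lengths = [14, 15, 16, 20, 15]  # Length values of the Red rows in the question
visited = []
for i, length in enumerate(red_lengths):
    visited.append(i)
    if length > 19.150 or length < 14.5:
        pass  # the original code would call df.drop([i]) here
    else:
        break  # leaves the whole for loop, not just this iteration
print(visited)  # [0, 1]
```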

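In Python, & is a bitwise operator. It happens to give the right answer on two plain booleans, but for combining scalar conditions the idiomatic choice is and / or (which also short-circuit); & is what you need for element-wise conditions on whole Series, as in the filter above.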
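A small illustration of the difference. Note that & binds more tightly than comparison operators, so each comparison needs its own parentheses (the question's code already has them).

```python
import pandas as pd

# Scalars: & and `and` agree on two booleans, but `and` short-circuits.
print((3 > 2) & (1 < 0), (3 > 2) and (1 < 0))  # False False

# Series: & combines element-wise; `and` tries bool(Series) and raises.
s = pd.Series([14, 15, 20])
both = (s > 14.5) & (s < 19.15)
print(both.tolist())  # [False, True, False]
try:
    (s > 14.5) and (s < 19.15)
except ValueError as exc:
    print("ValueError:", exc)
```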

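When you drop a row, the result is not automatically stored back in the same variable: drop returns a new DataFrame and leaves the original untouched unless you pass inplace=True (or assign the returned frame back).

Output: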
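A minimal demonstration on a made-up three-row frame:

```python
import pandas as pd

df = pd.DataFrame({"Colour": ["Red", "Red", "Blue"], "Length": [14, 15, 18]})

result = df.drop([0])        # drop() returns a new DataFrame...
print(len(df), len(result))  # 3 2  -> ...and df itself still has all 3 rows

df = df.drop([0])            # option 1: assign the result back to df
# df.drop([0], inplace=True) # option 2: modify df in place (the call returns None)
print(len(df))               # 2
```

Option 2 is commented out because row 0 has already been dropped by option 1; running both would raise a KeyError.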

    Colour  Length
1      Red      15
2      Red      16
4      Red      15
5     Blue      15
6     Blue      18
7     Blue      17
8    Green      15
9    Green      19
10   Green      18
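If you want to keep a loop for learning purposes, a somewhat sturdier sketch is to collect the labels first and drop them once. The name `tolerance_drop_loop` and the pass-in/return-out signature are my own assumptions. It iterates with `DataFrame.iterrows`, so it uses the actual index labels instead of assuming 0..n-1, and it does not mutate the frame while looping. It is still much slower than the vectorized filter on large data.

```python
import pandas as pd

def tolerance_drop_loop(df: pd.DataFrame, colour: str, upper: float, lower: float) -> pd.DataFrame:
    """Loop-based variant: collect the labels to drop, then drop them all at once."""
    to_drop = []
    for label, row in df.iterrows():  # yields the real index labels, whatever they are
        if row["Colour"] == colour and (row["Length"] > upper or row["Length"] < lower):
            to_drop.append(label)
    return df.drop(to_drop)  # new DataFrame; the caller decides whether to reassign

# Same data as the question, but with a non-default index to show labels are handled.
df = pd.DataFrame(
    {
        "Colour": ["Red"] * 5 + ["Blue"] * 3 + ["Green"] * 3,
        "Length": [14, 15, 16, 20, 15, 15, 18, 17, 15, 19, 18],
    },
    index=list("abcdefghijk"),
)
df = tolerance_drop_loop(df, "Red", 19.150, 14.5)
print(df)
```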
