How to do conditional selection on DataFrame rows with mixed values

I am trying to use conditional selection on the rows/columns of interest in the following dataset:

import pandas as pd
already_read = [("Il nome della rosa","Umberto Eco", 1980), 
("L'amore che ti meriti","Daria Bignardi", 2014), 
("Memorie dal sottsuolo", " Fëdor Dostoevskij", 1864), 
("Oblomov", "Ivan Alexandrovich Goncharov ", '/')]
index = range(1,5,1)
data = pd.DataFrame(already_read, columns = ["Books'Title", "Authors", "Publishing Year"], index = index)
data

in the following way:

data[(data['Publishing Year'] >= 1850) & (data['Publishing Year'] <= 1950)]

As you can see, the column I am selecting on contains mixed data (int and str), and indeed, after running the code I get this error:

TypeError: '>=' not supported between instances of 'str' and 'int'

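If possible, since I am taking my first steps in Python, could you suggest some way to run the code so that the string values are excluded or read as integers, perhaps by implementing an *if statement* (or some other method)?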

Thanks

One approach is to use apply on the column with a custom function. Something like this:

def check_int(x):
    # Only ints are compared against the year range; anything else (e.g. '/') is excluded.
    if isinstance(x, int):
        return (x >= 1850) & (x <= 1950)
    return False

data[data['Publishing Year'].apply(lambda x: check_int(x))]

Here, check_int returns False for every value that is not an int and applies the range check only to ints. So we get:

data['Publishing Year'].apply(lambda x: check_int(x))
1    False
2    False
3     True
4    False
Name: Publishing Year, dtype: bool

Next, we use this boolean pd.Series to select from data:

data[data['Publishing Year'].apply(lambda x: check_int(x))]
             Books'Title             Authors Publishing Year
3  Memorie dal sottsuolo   Fëdor Dostoevskij            1864
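As a possible alternative to the custom function, the "read them as integers or exclude them" idea from the question can also be sketched with pd.to_numeric and Series.between. This is only a minimal sketch, not part of the answer above: it assumes that every non-numeric entry (such as '/') should simply be dropped from the selection, and the names years, mask and selected are made up for illustration.

```python
import pandas as pd

already_read = [
    ("Il nome della rosa", "Umberto Eco", 1980),
    ("L'amore che ti meriti", "Daria Bignardi", 2014),
    ("Memorie dal sottsuolo", " Fëdor Dostoevskij", 1864),
    ("Oblomov", "Ivan Alexandrovich Goncharov ", "/"),
]
data = pd.DataFrame(
    already_read,
    columns=["Books'Title", "Authors", "Publishing Year"],
    index=range(1, 5),
)

# Assumption: values that cannot be parsed as numbers (here '/') should be
# treated as missing; errors="coerce" turns them into NaN instead of raising.
years = pd.to_numeric(data["Publishing Year"], errors="coerce")

# between() includes both endpoints by default, and NaN never falls inside
# the range, so the '/' row is excluded automatically.
mask = years.between(1850, 1950)
selected = data[mask]
print(selected)
```

Note that the original column is left untouched here; only the temporary years Series is converted, so the '/' placeholder is still visible in data afterwards.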
