I'm trying to use conditional selection on the rows/columns of interest in the following dataset:
import pandas as pd

already_read = [("Il nome della rosa", "Umberto Eco", 1980),
                ("L'amore che ti meriti", "Daria Bignardi", 2014),
                ("Memorie dal sottsuolo", " Fëdor Dostoevskij", 1864),
                ("Oblomov", "Ivan Alexandrovich Goncharov ", '/')]  # '/' is a string, not a year
index = range(1, 5, 1)
data = pd.DataFrame(already_read, columns=["Books'Title", "Authors", "Publishing Year"], index=index)
data
like this:
data[(data['Publishing Year'] >= 1850) & (data['Publishing Year'] <= 1950)]
As you can see, the column I'm filtering on contains mixed data (int and str), and in fact, after running the code I get this error:
TypeError: '>=' not supported between instances of 'str' and 'int'
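A minimal reproduction may help show where the error comes from. This sketch assumes only the "Publishing Year" values from the question (the variable names `years` and `error_message` are illustrative): because one value is the string '/', pandas stores the column with the generic `object` dtype, and the element-wise comparison fails as soon as it reaches '/'.

```python
import pandas as pd

# Only the "Publishing Year" values from the question's dataset
years = pd.Series([1980, 2014, 1864, '/'], index=range(1, 5), name="Publishing Year")

# Mixing ints with the string '/' makes pandas store the column as generic Python objects
print(years.dtype)  # object

# Comparing the column with an int compares each element in turn;
# the element '/' cannot be compared with 1850, so a TypeError is raised
error_message = None
try:
    years >= 1850
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```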
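If possible (I'm just taking my first steps in Python), could you suggest some ways to run this code so that it either excludes the string values or reads them as integers, perhaps by implementing an *if statement*? (Or some other method?)
Thanks!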
One way is to use apply on the 'Publishing Year' column (i.e. Series.apply) together with a custom function. Something like this:
def check_int(x):
    # Only int values are range-checked; anything else (e.g. the string '/') is excluded
    if isinstance(x, int):
        return 1850 <= x <= 1950
    return False

data[data['Publishing Year'].apply(lambda x: check_int(x))]
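To see what the function does element by element, here is a small sketch that calls it on single values and then on the column. It assumes only the year values from the question (`years` and `mask` are illustrative names); it also shows that apply accepts the function directly, so the lambda wrapper is optional.

```python
import pandas as pd

def check_int(x):
    # Only ints are range-checked; any other type (e.g. the string '/') is rejected
    if isinstance(x, int):
        return 1850 <= x <= 1950
    return False

# apply() calls the function once per element, like these single calls
print(check_int(1864))  # True
print(check_int(1980))  # False
print(check_int('/'))   # False

years = pd.Series([1980, 2014, 1864, '/'], index=range(1, 5), name="Publishing Year")

# Passing the function itself gives the same mask as the lambda version
mask = years.apply(check_int)
print(mask.equals(years.apply(lambda x: check_int(x))))  # True
```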
The check_int function returns False for every value that is not an int, and evaluates the range condition only for ints. Applying it to the column gives:
data['Publishing Year'].apply(lambda x: check_int(x))
1 False
2 False
3 True
4 False
Name: Publishing Year, dtype: bool
Next, we use this pd.Series of booleans to select rows from data:
data[data['Publishing Year'].apply(lambda x: check_int(x))]
Books'Title Authors Publishing Year
3 Memorie dal sottsuolo Fëdor Dostoevskij 1864
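The question also asks about reading the values as numbers instead. A different technique from the one above, sketched here under the assumption that non-numeric entries such as '/' should simply count as missing, is pd.to_numeric with errors='coerce' followed by Series.between (inclusive on both ends by default). The names `years` and `selected` are illustrative.

```python
import pandas as pd

already_read = [("Il nome della rosa", "Umberto Eco", 1980),
                ("L'amore che ti meriti", "Daria Bignardi", 2014),
                ("Memorie dal sottsuolo", " Fëdor Dostoevskij", 1864),
                ("Oblomov", "Ivan Alexandrovich Goncharov ", '/')]
data = pd.DataFrame(already_read,
                    columns=["Books'Title", "Authors", "Publishing Year"],
                    index=range(1, 5))

# Convert the column to numbers; values that cannot be parsed ('/') become NaN
years = pd.to_numeric(data['Publishing Year'], errors='coerce')

# between() keeps 1850 <= year <= 1950; NaN is never inside the range
selected = data[years.between(1850, 1950)]
print(selected)
```

Unlike the apply approach, this stays vectorized and would also accept numeric strings such as '1864', which is a design choice worth checking against your data.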