It seems that in Pandas you can do either of the following:
age_is_null = pd.isnull(titanic_survival["age"])
age_is_null = titanic_survival["age"].isnull()
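So both seem to exist: a function in the pandas module and a method on the DataFrame class (in another module).
Coming from an Obj-C background, I find this confusing. Why have both?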
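pd.isnull works on many kinds of input (scalars and array-likes such as lists and NumPy arrays, not only pandas objects), for example: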
>>> import pandas as pd
>>> import numpy as np
>>> pd.isnull(np.array([1, 2]))
array([False, False], dtype=bool)
>>> pd.isnull([1, 2])
array([False, False], dtype=bool)
whereas df.isnull is a member function bound to a DataFrame object. So you would use pd.isnull whenever creating a DataFrame first would be expensive.
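To make the comparison concrete, here is a minimal sketch showing that the two spellings agree on a pandas object, and that the function form also takes plain Python and NumPy values. The `age` Series is a made-up stand-in for `titanic_survival["age"]` (a column selected that way is a Series), not the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for titanic_survival["age"]; the values are invented.
age = pd.Series([22.0, np.nan, 38.0, None], name="age")

via_function = pd.isnull(age)   # module-level function
via_method = age.isnull()       # method bound to the Series

# Both return a boolean Series with the same index and values.
print(via_function.equals(via_method))

# The function form needs no pandas object at all.
print(pd.isnull([1.0, np.nan, None]))  # plain list
print(pd.isnull(np.nan))               # single scalar
```

On a Series or DataFrame you already have, the method reads more naturally and chains well; the function is the one to reach for when the input is a list, an array, or a single value.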
Timings:
In [30]: %timeit pd.isnull([1,2])
The slowest run took 8.93 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 9.19 µs per loop
In [33]: %timeit pd.DataFrame([1,2]).isnull()
The slowest run took 6.42 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 202 µs per loop
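If you are not in IPython, a rough equivalent of the measurement above can be sketched with the standard library's `timeit` module. The variable names and the loop count `n` are arbitrary choices for illustration, and the absolute numbers will depend on your machine and pandas version.

```python
import timeit

import pandas as pd

data = [1, 2]
n = 1000  # number of calls per measurement (arbitrary)

# Average time per call for the function applied directly to the list.
t_function = timeit.timeit(lambda: pd.isnull(data), number=n) / n

# Average time per call when a DataFrame is built first, then .isnull() is called.
t_method = timeit.timeit(lambda: pd.DataFrame(data).isnull(), number=n) / n

print(f"pd.isnull(list):              {t_function * 1e6:.1f} µs per call")
print(f"pd.DataFrame(list).isnull():  {t_method * 1e6:.1f} µs per call")
```

Note that the second measurement includes the cost of constructing the DataFrame, which is exactly the overhead the answer is pointing at.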