Can numpy/pandas handle boolean operators applied to null values?

When I use the standard Python boolean operators and/or/not, a nice feature is that they treat None the way I would logically expect. That is, not only

(True and True) == True
(True and False) == False

but also

(True and None) == None
(False and None) == False
(True or None) == True
(False or None) == None

This follows the logic that, for example, if A is False and B is unknown, then (A and B) must still be False, while (A or B) is unknown.
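The rule described above is usually called three-valued (Kleene) logic. As a minimal sketch, it can be written out explicitly in plain Python. The helper names kleene_and, kleene_or and kleene_not are made up for illustration, and the sketch assumes the inputs are plain Python True, False or None. It also shows two cases where the built-in operators do not follow that rule: when None is the left operand, and with not.

```python
from typing import Optional

# Hypothetical helpers (not part of Python, numpy or pandas) that spell out
# three-valued logic with None standing for "unknown".
# Assumes inputs are plain Python True, False or None.

def kleene_and(x: Optional[bool], y: Optional[bool]) -> Optional[bool]:
    if x is False or y is False:
        return False          # one False decides the result
    if x is None or y is None:
        return None           # otherwise an unknown operand leaves it unknown
    return True

def kleene_or(x: Optional[bool], y: Optional[bool]) -> Optional[bool]:
    if x is True or y is True:
        return True           # one True decides the result
    if x is None or y is None:
        return None
    return False

def kleene_not(x: Optional[bool]) -> Optional[bool]:
    return None if x is None else (not x)

# The built-in operators agree with the helpers when the bool comes first...
builtin_bool_first = (False and None)   # False
# ...but not when None comes first, and not for "not":
builtin_none_first = (None and False)   # None, although logically False
builtin_not_none = (not None)           # True, although logically unknown

print(kleene_and(None, False), builtin_none_first)  # False None
print(kleene_not(None), builtin_not_none)           # None True
```

So and/or only give the expected answers here because the left operand is always a real boolean; the explicit helpers do not depend on operand order.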

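I need to perform boolean operations on Pandas DataFrames with missing data, and I would like to use the same logic. For boolean logic on numpy arrays and Pandas Series, we need to use the bitwise operators &/|/~. Pandas seems to behave partly the same as and/or/not and partly differently. In short, it appears to return False where the value should logically be unknown.

For example: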

a = pd.Series([True,False,True,False])
b = pd.Series([True,True,None,None])

Then we get

> a & b
0     True
1    False
2    False
3    False
dtype: bool

and

> a | b
0     True
1     True
2     True
3    False

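I would expect the output of a & b to be the Series [True,False,None,False], and the output of a | b to be the Series [True,True,True,None]. The actual results match what I expect, except that False is returned wherever the value should be missing.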

Finally, ~b just gives a TypeError:

TypeError: bad operand type for unary ~: 'NoneType'

This seems odd, since & and | at least partially work.

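Is there a better way to implement boolean logic in this situation? Is this a bug in Pandas?

Similar tests with numpy arrays only give type errors, so I assume Pandas is handling the logic itself here.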

You may want something like this:

c = pd.Series([x and y for x,y in zip(a,b)])
print(c)

Output:

0     True
1    False
2     None
3    False

Accordingly, for the second expression:

d = pd.Series([x or y for x,y in zip(a,b)])
print(d)

Output:

0    True
1    True
2    True
3    None

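See here for the and and & operations.

If you want to and two columns a and b of a DataFrame df, one way is to define a function and apply it to df row by row: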

df = pd.DataFrame({'a':[True,False,True,False], 'b':[True,True,None,None]})
def and_(row):
    return row['a'] and row['b']
df.loc[:, 'a_and_b'] = df.apply(and_, axis=1)
print(df)

Output:

       a     b a_and_b
0   True  True    True
1  False  True   False
2   True  None    None
3  False  None   False
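As a possible alternative to the row-by-row approaches above, newer pandas versions have a nullable "boolean" dtype whose &, | and ~ operators follow three-valued logic and represent missing values as pd.NA. The sketch below assumes pandas 1.0 or later. The variable names and_result, or_result and not_b are only illustrative.

```python
import pandas as pd

# Same data as in the question, but stored with the nullable "boolean" dtype
# (assumes pandas >= 1.0). None is stored as pd.NA.
a = pd.Series([True, False, True, False], dtype="boolean")
b = pd.Series([True, True, None, None], dtype="boolean")

and_result = a & b   # True, False, <NA>, False
or_result = a | b    # True, True, True, <NA>
not_b = ~b           # False, False, <NA>, <NA>

print(and_result)
print(or_result)
print(not_b)
```

This keeps the operation vectorized and does not depend on which operand holds the missing value. Note that the missing entries come back as pd.NA rather than None.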
