Pandas Series.notna() returns True for a Series that contains NaN



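I have a DataFrame that I subset to create a Series, and before running further code I check whether that Series contains any NaN. However, Series that contain NaN are still being passed to the subsequent code.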

Example DataFrame:

sample classification          type  similar  ...  respiraroty activity antimicrobial molecular details               Litrature
7     M41      predicted       melanin  melanin  ...                   NaN           NaN               NaN  10.1073/pnas.211433198
17    RB1      predicted       melanin  melanin  ...                   NaN           NaN               NaN  10.1073/pnas.211433198
21    RB1      predicted       melanin  melanin  ...                   NaN           NaN               NaN  10.1073/pnas.211433198
32   RB17      predicted  NRPS,melanin  melanin  ...                   NaN           NaN               NaN  10.1073/pnas.211433198

I subset the DataFrame by the similar column and then check whether the Description column contains any NaNs:

for compound in compound_list:
    subset_compound = antismash.loc[antismash['similar'] == compound]
    if subset_compound['Description'].notna().any():
        print(subset_compound['Description'])
# just one output example
3                                       NaN
12                                      NaN
25                                      NaN
33                                      NaN
43                                      NaN
49                                      NaN
51                                      NaN
54                                      NaN
76                                      NaN
82     aids in osmotic stress survivability
84                                      NaN
90                                      NaN
94                                      NaN
97                                      NaN
100                                     NaN
106                                     NaN
119    aids in osmotic stress survivability
129                                     NaN

Checking the boolean result of notna().any():

print(subset_compound['Description'].notna().any())
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True

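The if statement lets through subsets whose Description column contains NaN. As I understand it, subset_compound['Description'] yields a pandas Series. What causes notna() not to see the NaNs in the Series?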

  1. pandas.notna(obj): detects non-missing values for an array-like object.

    import numpy as np
    import pandas as pd

    array = np.array([[1, np.nan, 3], [4, 5, np.nan]])
    pd.notna(array)
    # array([[ True, False,  True], [ True,  True, False]])
    
  2. Series.any() / DataFrame.any(): returns whether any element is True, potentially over an axis.

    pd.Series([False, False]).any()
    # False
    pd.Series([True, False]).any()
    # True
    

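The problem in your code is that notna() returns True for every element that is not NaN, and any() returns True if at least one of those results is True. Taken together, your code checks "is there any value other than NaN?", which is True as soon as a single entry is filled in. To fix this in your code: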

for compound in compound_list:
    subset_compound = antismash.loc[antismash['similar'] == compound]
    if subset_compound['Description'].isna().any():
        print(subset_compound['Description'])
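
The question does not state explicitly whether subsets containing NaN should be printed or skipped, so it may help to compare the candidate checks side by side. The following is a minimal sketch, assuming a small made-up DataFrame: the rows of `antismash`, the values in `compound_list`, and the `results` dictionary are hypothetical and are not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's data; all values are invented for illustration.
antismash = pd.DataFrame({
    "similar": ["melanin", "melanin", "ectoine", "ectoine", "geosmin"],
    "Description": [
        np.nan,
        np.nan,
        "aids in osmotic stress survivability",
        np.nan,
        "placeholder description",
    ],
})
compound_list = ["melanin", "ectoine", "geosmin"]

results = {}
for compound in compound_list:
    # Select the Description values for the rows matching this compound.
    desc = antismash.loc[antismash["similar"] == compound, "Description"]
    results[compound] = {
        "notna_any": bool(desc.notna().any()),  # at least one non-NaN value
        "isna_any": bool(desc.isna().any()),    # at least one NaN
        "notna_all": bool(desc.notna().all()),  # no NaN at all
    }
    print(compound, results[compound])
```

In this sketch the mixed "ectoine" subset passes both `notna().any()` and `isna().any()`, which is the behaviour seen in the question. If the intent is to run the later code only on subsets with no missing descriptions, `notna().all()` is the check that expresses it; `isna().any()` selects the opposite case, subsets that do contain a NaN.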

df[col].notna().any() is True if the column has any non-NaN value. It is False only when every value in the column is NaN.

=== Executing: ===
import numpy as np
import pandas as pd
df1 = pd.DataFrame({'a': range(5), 'b': [np.nan for _ in range(5)]})
print(df1['a'].notna().any(), df1['b'].notna().any())
=== Output: ===
True False
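
The run above covers a column with no NaN and a column that is entirely NaN. A short sketch of the mixed case, which is what the Description column in the question looks like, might be the following (the frame `df2` and its column `c` are hypothetical, added only for illustration):

```python
import numpy as np
import pandas as pd

# Made-up mixed column: one real value surrounded by NaN.
df2 = pd.DataFrame({"c": [np.nan, "x", np.nan]})

mixed_any = bool(df2["c"].notna().any())  # one non-NaN value is enough
mixed_all = bool(df2["c"].notna().all())  # requires every value to be non-NaN
n_missing = int(df2["c"].isna().sum())    # count of NaN entries
print(mixed_any, mixed_all, n_missing)
# True False 2
```

Counting the missing entries with `isna().sum()` can be a quick way to confirm how many NaN values a subset actually holds before deciding which check to use.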
