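I have a data frame that I am subsetting to create a Series. Before running further code, I check whether that Series contains any NaN. However, Series that contain NaN are still being passed on to the subsequent code.
Sample of the data frame: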
sample classification type similar ... respiraroty activity antimicrobial molecular details Litrature
7 M41 predicted melanin melanin ... NaN NaN NaN 10.1073/pnas.211433198
17 RB1 predicted melanin melanin ... NaN NaN NaN 10.1073/pnas.211433198
21 RB1 predicted melanin melanin ... NaN NaN NaN 10.1073/pnas.211433198
32 RB17 predicted NRPS,melanin melanin ... NaN NaN NaN 10.1073/pnas.211433198
I subset the data frame by the similar column and then check whether the Description column contains any NaNs:
for compound in compound_list:
    subset_compound = antismash.loc[antismash['similar'] == compound]
    if subset_compound['Description'].notna().any():
        print(subset_compound['Description'])
# just one output example
3 NaN
12 NaN
25 NaN
33 NaN
43 NaN
49 NaN
51 NaN
54 NaN
76 NaN
82 aids in osmotic stress survivability
84 NaN
90 NaN
94 NaN
97 NaN
100 NaN
106 NaN
119 aids in osmotic stress survivability
129 NaN
When I check the boolean result of notna().any():
print(subset_compound['Description'].notna().any())
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
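The if statement lets through subsets that have NaN in the Description column. As far as I understand, subset_compound['Description'] returns a pandas Series. What causes notna() not to see the NaNs in the Series?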
- pandas.notna(obj): Detect non-missing values for an array-like object.
    array = np.array([[1, np.nan, 3], [4, 5, np.nan]])
    pd.notna(array)
    # array([[ True, False,  True], [ True,  True, False]])
- DataFrame.any(): Return whether any element is True, potentially over an axis.
    pd.Series([False, False]).any()  # False
    pd.Series([True, False]).any()   # True
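The problem in your code is that notna() returns True for every value that is not NaN, and any() returns True if the result of notna() contains at least one True. Taken together, your code checks "is there any value that is not NaN?". To fix this in your code: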
for compound in compound_list:
    subset_compound = antismash.loc[antismash['similar'] == compound]
    if subset_compound['Description'].isna().any():
        print(subset_compound['Description'])
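To make the distinction concrete, here is a minimal sketch on a made-up Series (the values are illustrative, not taken from the question's data). It compares the four combinations of notna()/isna() with any()/all(). If the goal is to run the later code only when the Series has no NaN at all, notna().all() (or, equivalently, not isna().any()) is probably the check you want.

```python
import numpy as np
import pandas as pd

# Hypothetical Series: mostly NaN with one real value, similar to the question's output
s = pd.Series([np.nan, "aids in osmotic stress survivability", np.nan])

has_some_value = s.notna().any()  # True: at least one value is not NaN
has_some_nan = s.isna().any()     # True: at least one value is NaN
is_complete = s.notna().all()     # False: not every value is present
is_all_nan = s.isna().all()       # False: not every value is NaN

print(has_some_value, has_some_nan, is_complete, is_all_nan)
```

Only is_complete answers "is this Series free of NaN?"; has_some_value is the check the original loop was performing.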
df[col].notna().any() is True if the column has any non-NaN value. It is False only when every value in the column is NaN.
=== Executing: ===
import numpy as np
import pandas as pd
df1 = pd.DataFrame({'a': range(5), 'b': [np.nan for _ in range(5)]})
print(df1['a'].notna().any(), df1['b'].notna().any())
=== Output: ===
True False
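Applied to the loop from the question, a sketch might look like the following. The data frame here is a small hypothetical stand-in for antismash (the rows and the "ectoine" and "pigment" values are invented for illustration), and it assumes the intent is to keep only compounds whose Description column has no NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the `antismash` data frame from the question
antismash = pd.DataFrame({
    "similar": ["melanin", "melanin", "ectoine", "ectoine"],
    "Description": ["pigment", "pigment", np.nan,
                    "aids in osmotic stress survivability"],
})
compound_list = ["melanin", "ectoine"]

complete = []  # compounds whose Description column contains no NaN
for compound in compound_list:
    subset_compound = antismash.loc[antismash["similar"] == compound]
    # notna().all() is True only when every Description in the subset is present
    if subset_compound["Description"].notna().all():
        complete.append(compound)

print(complete)
```

Note that all() on an empty Series returns True, so a compound with no matching rows would also pass this check; add an explicit emptiness test if that matters.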