Convert all non-NaN values to 1 across multiple columns using a list comprehension

Here is some data:

import numpy as np
import pandas as pd

test = pd.DataFrame([[np.nan, "cat", "mouse", "tiger"],
                     ["tiger", "dog", "elephant", "mouse"],
                     ["cat", np.nan, "giraffe", "cat"],
                     [np.nan, np.nan, "ant", "ant"]],
                    columns=["animal1", "animal2", "animal3", "animal4"])

I want to convert all the NaNs to 0 and every response (non-NaN value) to 1.

# First, convert all NaNs to 0 and make everything a string
test = test.fillna(0)
test = test.astype(str)
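To see why the later comparisons are against the string '0' rather than the number 0, here is a minimal sketch of this preprocessing on the sample data. The `dtype=object` argument is an addition for this sketch; it keeps the columns as plain object columns, which is what pandas 2.x infers for this data anyway.

```python
import numpy as np
import pandas as pd

# Sample data from the question (dtype=object added for this sketch)
test = pd.DataFrame([[np.nan, "cat", "mouse", "tiger"],
                     ["tiger", "dog", "elephant", "mouse"],
                     ["cat", np.nan, "giraffe", "cat"],
                     [np.nan, np.nan, "ant", "ant"]],
                    columns=["animal1", "animal2", "animal3", "animal4"],
                    dtype=object)

# fillna(0) writes the integer 0 where NaN was; astype(str) then turns it into the string '0'
test = test.fillna(0).astype(str)

# The former NaN cells now hold the string '0', not the number 0
print(repr(test.loc[0, "animal1"]))  # '0'
```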

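Then I create a list of the columns of interest (pointless in this small example, where only 2 columns are selected, but my real data has many):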

op = test.iloc[:,0:2].columns.tolist()

I thought I could do this:

test[op] = [1 if x != '0' else 0 for x in test[op]]

But it doesn't work: it converts everything to 1.

Then I tried doing it manually, one column at a time, and that does work:

test['animal1'] = [1 if x != '0' else 0 for x in test['animal1']]

Does anyone know why the latter works but the former doesn't? Any guidance would be much appreciated.

EDIT/UPDATE: SeaBean provided a solution that works (thanks!!). I'm still curious why my approach only works when I handle one column at a time (manually).

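You can use .notna() and convert the result to 0/1 with .astype(), as follows (run it on the original DataFrame, before the fillna/astype(str) step, because .notna() needs to see the actual NaNs):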

test.notna().astype(int)

Result:

   animal1  animal2  animal3  animal4
0        0        1        1        1
1        1        1        1        1
2        1        0        1        1
3        0        0        1        1
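Once the NaNs have been replaced by '0', .notna() can no longer detect them. If you want to keep the question's preprocessing, a vectorized comparison against '0' does the same job for all columns in `op` at once. A minimal sketch, assuming `op` holds the columns of interest as in the question (`dtype=object` is again an addition for this sketch):

```python
import numpy as np
import pandas as pd

test = pd.DataFrame([[np.nan, "cat", "mouse", "tiger"],
                     ["tiger", "dog", "elephant", "mouse"],
                     ["cat", np.nan, "giraffe", "cat"],
                     [np.nan, np.nan, "ant", "ant"]],
                    columns=["animal1", "animal2", "animal3", "animal4"],
                    dtype=object)

# The question's preprocessing: NaN -> 0 -> '0'
test = test.fillna(0).astype(str)
op = test.iloc[:, 0:2].columns.tolist()

# ne("0") compares every cell of the selected columns at once (True where the cell is not '0');
# astype(int) maps True/False to 1/0
test[op] = test[op].ne("0").astype(int)
print(test)
```

One caveat of the placeholder approach: a genuine value of '0' in the data would also be mapped to 0.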

Edit

Why your approach works on one column at a time but not on several columns:

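When you process one column at a time, you put e.g. test['animal1'] in the list comprehension, so you iterate over the elements of that column's Pandas Series. This does what you expect.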

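However, when you do it across multiple columns by putting test[op] in the list comprehension, test[op] is a DataFrame, not a Pandas Series. Iterating over a DataFrame gives you only its column labels. You can see this by trying the following list comprehension: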

[x for x in test[op]]

This gives:

['animal1', 'animal2']

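So in the multi-column list comprehension, x != '0' is always True, because x is a column label ('animal1' or 'animal2'), never the string '0'. That is why you get all 1s.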
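Following that explanation, the per-column comprehension that already works can simply be run once for each label in `op`. A minimal sketch, reusing the question's names (`test`, `op`) and preprocessing (`dtype=object` is an addition for this sketch):

```python
import numpy as np
import pandas as pd

test = pd.DataFrame([[np.nan, "cat", "mouse", "tiger"],
                     ["tiger", "dog", "elephant", "mouse"],
                     ["cat", np.nan, "giraffe", "cat"],
                     [np.nan, np.nan, "ant", "ant"]],
                    columns=["animal1", "animal2", "animal3", "animal4"],
                    dtype=object)
test = test.fillna(0).astype(str)
op = test.iloc[:, 0:2].columns.tolist()

# Iterating over a DataFrame yields its column labels, not its cell values
print([x for x in test[op]])  # ['animal1', 'animal2']

# Fix: loop over the labels yourself, then run the comprehension on each column,
# which is a Series and therefore yields the cell values
for col in op:
    test[col] = [1 if x != "0" else 0 for x in test[col]]
print(test)
```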

You can use .isna() and invert the result with ~:

print(~test.isna())
   animal1  animal2  animal3  animal4
0    False     True     True     True
1     True     True     True     True
2     True    False     True     True
3    False    False     True     True

If you want 0s and 1s instead, multiply by 1:

print((~test.isna())*1)
   animal1  animal2  animal3  animal4
0        0        1        1        1
1        1        1        1        1
2        1        0        1        1
3        0        0        1        1
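Since the question only wants to convert the columns listed in `op`, the same mask can be computed on that subset and assigned back. A minimal sketch, assuming it runs on the original DataFrame (NaNs still present, no fillna) and that `op` selects the first two columns as in the question:

```python
import numpy as np
import pandas as pd

test = pd.DataFrame([[np.nan, "cat", "mouse", "tiger"],
                     ["tiger", "dog", "elephant", "mouse"],
                     ["cat", np.nan, "giraffe", "cat"],
                     [np.nan, np.nan, "ant", "ant"]],
                    columns=["animal1", "animal2", "animal3", "animal4"])

op = test.columns[0:2].tolist()

# ~isna() is the same Boolean mask as notna(); multiplying by 1 turns True/False into 1/0
mask = (~test[op].isna()) * 1

# Only the selected columns are overwritten; animal3 and animal4 keep their strings
test[op] = mask
print(test)
```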
