I want to change the NA values in multiple columns when a particular condition is met. A sample dataset is given below.
    Pool Area  Pool Quality  Pool Type  Pool Condition  Pool Finish
0   800        Good          A          Good            Gunite
1   400        Good          C          Good            Vinyl
2   485        Good          B          Good            Fibreglass
3   360        Poor          C          Poor            Vinyl
4   0          NaN           NaN        NaN             NaN
5   600        Best          A          Best            Gunite
6   500        Best          B          Best            Fibreglass
7   0          NaN           NaN        NaN             NaN
8   750        Best          A          Best            Gunite
9   900        Best          A          Best            Gunite
10  0          NaN           NaN        NaN             NaN
11  900        Best          A          Best            Gunite
12  400        Poor          C          Poor            Fibreglass
13  0          NaN           NaN        NaN             NaN
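In the data above, I want to replace the NaN values with 'No Pool' in the rows where the 'Pool Area' column is 0.
I know I can do this with the np.where function, and I tried the code below.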
df[['Pool Quality', 'Pool Type', 'Pool Condition', 'Pool Finish']] = np.where(df['Pool Area']==0, 'No Pool', df[['Pool Quality', 'Pool Type', 'Pool Condition', 'Pool Finish']])
It did not work.
When I tried it on a single column, it worked fine (see the code below).
df['Pool Quality'] = np.where(df['Pool Area']==0, 'No Pool', df['Pool Quality'])
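But when I try to handle multiple columns at once, it does not work.
Below is the error I get.
ValueError: operands could not be broadcast together with shapes (2919,) () (2919,5)
Note: the error message above is taken from my actual dataset, which has 2919 rows and 81 columns.
I don't know what is wrong with my code. Please help.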
Use boolean indexing:
m = df['Pool Area'].eq(0)
df.loc[m] = df.loc[m].fillna('No Pool')
# or
# df[m] = df[m].fillna('No Pool')
# or to limit to given columns
# cols = ['Pool Quality', 'Pool Type', 'Pool Condition', 'Pool Finish']
# df.loc[m, cols] = df.loc[m, cols].fillna('No Pool')
Updated df:
    Pool Area  Pool Quality  Pool Type  Pool Condition  Pool Finish
0   800        Good          A          Good            Gunite
1   400        Good          C          Good            Vinyl
2   485        Good          B          Good            Fibreglass
3   360        Poor          C          Poor            Vinyl
4   0          No Pool       No Pool    No Pool         No Pool
5   600        Best          A          Best            Gunite
6   500        Best          B          Best            Fibreglass
7   0          No Pool       No Pool    No Pool         No Pool
8   750        Best          A          Best            Gunite
9   900        Best          A          Best            Gunite
10  0          No Pool       No Pool    No Pool         No Pool
11  900        Best          A          Best            Gunite
12  400        Poor          C          Poor            Fibreglass
13  0          No Pool       No Pool    No Pool         No Pool
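As for why the original np.where call fails: the condition df['Pool Area']==0 is one-dimensional with shape (n,), while the multi-column selection is two-dimensional with shape (n, k). NumPy lines shapes up from the last dimension, so it ends up comparing n against the number of columns and gives up. If you would rather stay with np.where, one way around this is to reshape the mask into a column vector of shape (n, 1). The following is only a sketch on a cut-down toy frame (the small two-column df is made up for illustration, not your real data):

```python
import numpy as np
import pandas as pd

# Toy frame for illustration only: a subset of the question's columns and rows.
df = pd.DataFrame({
    "Pool Area": [800, 0, 485, 0],
    "Pool Quality": ["Good", np.nan, "Good", np.nan],
    "Pool Type": ["A", np.nan, "B", np.nan],
})
cols = ["Pool Quality", "Pool Type"]

# A mask of shape (4,) cannot broadcast against a (4, 2) block,
# but a column vector of shape (4, 1) can: it is stretched across the columns.
m = df["Pool Area"].eq(0).to_numpy()[:, None]

# Rows where the mask is True get 'No Pool'; the other rows keep their values.
df[cols] = np.where(m, "No Pool", df[cols])
```

Be aware that this version overwrites every value in the matching rows, not only the NaN ones, so it is not quite equivalent to the fillna approach. If a row with Pool Area 0 could hold real data in those columns, fillna is the safer choice.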