Applying map across several columns in pandas



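I am trying to apply a map across several columns in pandas to flag when data is invalid. When the data in my df['Count'] column is invalid, I want to set my df['Value'], df['Lower Confidence Interval'], df['Upper Confidence Interval'] and df['Denominator'] columns to -1.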

Here is a sample of the DataFrame:

Count   Value       Lower Confidence Interval  Upper Confidence Interval  Denominator
121743  54.15758428 53.95153779                54.36348867                224794
280     91.80327869 88.18009411                94.38654088                305
430     56.95364238 53.39535553                60.44152684                755
970     70.54545455 68.0815009                 72.89492873                1375
nan             
70      28.57142857 23.27957213                34.52488678                245
125     62.5        55.6143037                 68.91456314                200

Currently, I am trying:

set_minus_1s = {np.nan: -1, '*': -1, -1: -1}

Then:

df[['Value', 'Count', 'Lower Confidence Interval', 'Upper Confidence Interval', 'Denominator']] = df['Count'].map(set_minus_1s)

and I get the error:

ValueError: Must have equal len keys and value when setting with an iterable

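Is there a way to chain the column references so that map is called once, rather than writing a separate line for each column that applies the set_minus_1s dictionary as a map?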

I think you can use where or mask to replace every row in which the result of map is not null:

val = df['Count'].map(set_minus_1s)
print (val)
0    NaN
1    NaN
2    NaN
3    NaN
4   -1.0
5    NaN
6    NaN
Name: Count, dtype: float64
cols =['Value','Count','Lower Confidence Interval','Upper Confidence Interval','Denominator']
df[cols] = df[cols].where(val.isnull(), val, axis=0)
print (df)
      Count      Value  Lower Confidence Interval  Upper Confidence Interval  
0  121743.0  54.157584                  53.951538                  54.363489   
1     280.0  91.803279                  88.180094                  94.386541   
2     430.0  56.953642                  53.395356                  60.441527   
3     970.0  70.545455                  68.081501                  72.894929   
4      -1.0  -1.000000                  -1.000000                  -1.000000   
5      70.0  28.571429                  23.279572                  34.524887   
6     125.0  62.500000                  55.614304                  68.914563   
   Denominator  
0     224794.0  
1        305.0  
2        755.0  
3       1375.0  
4         -1.0  
5        245.0  
6        200.0  

cols = ['Value', 'Count', 'Lower Confidence Interval', 'Upper Confidence Interval', 'Denominator']
df[cols] = df[cols].mask(val.notnull(), val, axis=0)
print (df)
      Count      Value  Lower Confidence Interval  Upper Confidence Interval  
0  121743.0  54.157584                  53.951538                  54.363489   
1     280.0  91.803279                  88.180094                  94.386541   
2     430.0  56.953642                  53.395356                  60.441527   
3     970.0  70.545455                  68.081501                  72.894929   
4      -1.0  -1.000000                  -1.000000                  -1.000000   
5      70.0  28.571429                  23.279572                  34.524887   
6     125.0  62.500000                  55.614304                  68.914563   
   Denominator  
0     224794.0  
1        305.0  
2        755.0  
3       1375.0  
4         -1.0  
5        245.0  
6        200.0  
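
As a possible alternative to where and mask, the invalid rows can also be selected with a boolean mask and overwritten in a single .loc assignment. The sketch below is only an illustration, not part of the original answer: the frame is abbreviated from the sample in the question, and the rows containing '*' and -1 are hypothetical additions so that every key of set_minus_1s is exercised.

```python
import numpy as np
import pandas as pd

# Illustrative data: values abbreviated from the question's sample.
# The '*' and -1 rows are hypothetical, added to cover all sentinel values.
df = pd.DataFrame({
    'Count': [121743, 280, np.nan, '*', -1, 70],
    'Value': [54.16, 91.80, np.nan, 10.0, 20.0, 28.57],
    'Lower Confidence Interval': [53.95, 88.18, np.nan, 9.0, 19.0, 23.28],
    'Upper Confidence Interval': [54.36, 94.39, np.nan, 11.0, 21.0, 34.52],
    'Denominator': [224794, 305, np.nan, 50, 60, 245],
})

cols = ['Value', 'Count', 'Lower Confidence Interval',
        'Upper Confidence Interval', 'Denominator']

# True where Count is missing or holds one of the sentinel values.
invalid = df['Count'].isna() | df['Count'].isin(['*', -1])

# Overwrite all target columns on the invalid rows in one assignment.
df.loc[invalid, cols] = -1
print(df)
```

This avoids building the intermediate mapped Series, at the cost of spelling out the invalid values in the mask instead of in the dictionary.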
