How do I get numpy select to recognize my column variable?



I ran a decision tree and want to label each bin by its prediction. I extracted the unique predictions from the prediction column like this:

test_df3_dummies['dt_predictions'].unique()
array([0.00617504, 0.00834542, 0.02429166, 0.01016155, 0.00258616,
0.44985403, 0.05977463, 0.08333904])

So I did the following to build a bin column from the predictions:

condition = [(test_df3_dummies['dt_predictions'] == 0.00617504)
,(test_df3_dummies['dt_predictions'] == 0.00834542)
,(test_df3_dummies['dt_predictions'] == 0.02429166)
,(test_df3_dummies['dt_predictions'] == 0.01016155)
,(test_df3_dummies['dt_predictions'] == 0.00258616)
,(test_df3_dummies['dt_predictions'] == 0.44985403)
,(test_df3_dummies['dt_predictions'] == 0.05977463)
,(test_df3_dummies['dt_predictions'] == 0.08333904)]
replace = [1,2,3,4,5,6,7,8]
test_df3_dummies['dt_bins'] = np.select(condition, replace, default = 0)

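But none of the values are matched, and every row gets the default. pandas cuts off the last two digits, so I tried it this way, and it failed as well. Is there a trick to using np.select with float64 values?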

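This is a job for factorize:

test_df3_dummies = test_df3_dummies.sort_values('dt_predictions')
# factorize returns (codes, uniques); the codes are the integer bin labels, starting at 0
codes, uniques = test_df3_dummies['dt_predictions'].factorize()

test_df3_dummies['dt_bins'] = codes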
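
A minimal sketch of the factorize approach on a toy frame, in case it helps to see it end to end. The DataFrame `df` and its values are made up for illustration; only the column name comes from the question. Passing `sort=True` numbers the bins in ascending order of prediction without sorting the frame first, and adding 1 makes the labels start at 1 as in the question.

```python
import pandas as pd

# Hypothetical toy data standing in for test_df3_dummies
df = pd.DataFrame({"dt_predictions": [0.0833, 0.0026, 0.4499, 0.0026, 0.0833]})

# pd.factorize returns (codes, uniques); sort=True orders the uniques ascending
codes, uniques = pd.factorize(df["dt_predictions"], sort=True)

# Shift the 0-based codes so the bins run 1, 2, 3, ...
df["dt_bins"] = codes + 1

print(df)
```

No float comparison against hand-typed literals is involved, so display rounding does not matter here.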

Alternatively, use groupby.ngroup:

# test_df3_dummies = test_df3_dummies.sort_values('dt_predictions')  # if necessary
# ngroup is a method and must be called; add 1 so the bins start at 1
test_df3_dummies['dt_bins'] = test_df3_dummies.groupby('dt_predictions').ngroup() + 1
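
As for the np.select attempt in the question: a likely reason it matched nothing is that the array shown by unique() is rounded for display (NumPy prints 8 digits of precision by default), so the typed literals are not exactly equal to the stored float64 values. If np.select is still preferred, comparing within a tolerance should work. The sketch below uses made-up values and an `atol` of 1e-8, both of which are assumptions to adapt to the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical values: the stored floats carry more digits than the display shows
df = pd.DataFrame({"dt_predictions": [0.006175041234, 0.449854029876, 0.006175041234]})

# Literals as they would be copied from the rounded printout
printed = [0.00617504, 0.44985403]
labels = [1, 2]

# Exact equality: the stored values differ past the 8th decimal, so nothing matches
exact = [df["dt_predictions"] == p for p in printed]
df["bins_exact"] = np.select(exact, labels, default=0)

# Tolerance-based comparison: np.isclose treats values within atol as equal
close = [np.isclose(df["dt_predictions"], p, rtol=0, atol=1e-8) for p in printed]
df["bins_close"] = np.select(close, labels, default=0)

print(df)
```

The tolerance has to stay smaller than the gap between any two distinct predictions, otherwise neighbouring bins would be merged.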
