How to select the top 10 features from a list of features and a list of weight coefficients (logistic regression)



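I want to select the top 5 features in a logistic regression model. I have two arrays: one with all the feature names, and another with the coefficients from model.coef_, where model = LogisticRegression().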

feature_list = ['ball', 'cat', 'apple',....,] # this has 108 elements
coefficents = lr.coef_  
print(coefficents[0])

This prints the following:

[ 2.07587361e-04  5.59531750e-04  0.00000000e+00  0.00000000e+00
-5.16353886e-02 ......  1.66633057e-02]   #this also has 108 elements

When I try to sort the coefficient values, I get different values:

sorted_index = np.argsort(coefficents[0])
print(sorted_index)
[ 22  91  42  15  52  31  16  32  86 .... 17 106]   #this has 108 values

How do I get the correct top 5 most important features from these two arrays?

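argsort sorts in ascending order, but you want descending order (highest first).

To sort in descending order, negate the coefficients before calling argsort. Here is a simple example: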

import numpy as np
feature_list = ['ball', 'cat', 'apple', 'house', 'tree', 'school', 'child']
coeff = np.array([0.7, 0.3, 0.8, 0.2, 0.4, 0.1, 0.9])
# negate the coeff. to sort them in descending order
idx = (-coeff).argsort()
# map index to feature list
desc_feature = [feature_list[i] for i in idx]
# select the top 5 features
top_feature = desc_feature[:5]
print(top_feature)

This gives you the top features:

['child', 'apple', 'ball', 'tree', 'cat']
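
One caveat: the example above ranks by the signed coefficient, so a feature with a large negative weight ends up at the bottom. If "important" is taken to mean "largest effect in either direction", ranking by absolute value may be closer to what you want. Below is a minimal sketch of that idea. The helper name top_k_features, its use_abs flag, and the sample coefficients are hypothetical and not from the question, and it assumes the features are on comparable scales (for example, standardized), since otherwise coefficient magnitudes are not directly comparable.

```python
import numpy as np

# Hypothetical stand-ins for the question's feature_list and lr.coef_.
feature_list = ['ball', 'cat', 'apple', 'house', 'tree', 'school', 'child']
# Shape (1, n_features), like coef_ of a binary LogisticRegression.
coef = np.array([[0.7, -0.95, 0.8, 0.2, -0.4, 0.1, 0.9]])


def top_k_features(names, weights, k=5, use_abs=True):
    """Return the k (name, weight) pairs with the largest weights.

    With use_abs=True the ranking uses |weight|, so strongly negative
    coefficients count as important; the signed weight is still returned.
    """
    weights = np.asarray(weights, dtype=float).ravel()
    if len(names) != weights.size:
        raise ValueError("names and weights must have the same length")
    scores = np.abs(weights) if use_abs else weights
    # argsort is ascending, so reverse it and keep the first k indices.
    order = np.argsort(scores)[::-1][:k]
    return [(names[i], float(weights[i])) for i in order]


# coef[0] selects the single row of coefficients, as in the question.
top5 = top_k_features(feature_list, coef[0], k=5)
print(top5)
```

With the question's variables, the call would look like top_k_features(feature_list, lr.coef_[0], k=5). Passing use_abs=False reproduces the signed, descending ranking from the example above.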
