Decoding a pandas DataFrame

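I have an encoded DataFrame. I encoded it with scikit-learn's LabelEncoder, built a machine learning model, and made some predictions. But now I cannot decode the values in the resulting pandas DataFrame. I tried inverse_transform from the documentation several times, but each time I get an error such as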

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

This is what my DataFrame looks like:

    0   147 14931   9   0   0   1   0   0   0   4   ... 0   0   242 677 0   94  192 27  169 20
    1   146 14955   15  1   0   0   0   0   0   0   ... 0   1   63  42  0   94  192 27  169 20
    2   145 15161   25  1   0   0   0   1   0   5   ... 0   0   242 677 0   94  192 27  169 20

Here is the code I used to encode it, in case it is relevant:

labelEncoder = preprocessing.LabelEncoder()
for col in b.columns:
    b[col] = labelEncoder.fit_transform(b[col])

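The column names are not important. I also tried a lambda function that was shown in another question here, but it still does not work. What am I doing wrong? Thanks for your help!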

Edit: After implementing Vivek Kumar's code, I get the following error:

KeyError: 'Predicted_Values'

This is a column I added to the DataFrame just to hold the predicted values. I add it like this:

b = pd.concat([X_test, y_test], axis=1)  # features and actual predicted values
b['Predicted_Values'] = y_predict

This is how I drop the column that goes on the y axis from the DataFrame and fit the estimator:

from sklearn import tree
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
X = b.drop(['Activity_Profile'],axis=1)
y = b['Activity_Profile']
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = 0.3, random_state=0)
model = tree.DecisionTreeClassifier()
model = model.fit(X_train, y_train)

You can look at my answer here to see the correct way to use LabelEncoder on multiple columns:

Why does sklearn preprocessing LabelEncoder inverse_transform apply from only one column?

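The explanation is that LabelEncoder only supports one-dimensional input. So you need a separate LabelEncoder object for each column, and that object can then be used to inverse-transform only that particular column.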
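To make the problem concrete, here is a minimal sketch of what happens when one encoder is refit on every column, as in the question's loop. The two-column frame and the names `df`, `shared`, and `wrong` are made up for illustration and are not from the question.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical two-column frame, used only for illustration.
df = pd.DataFrame({"color": ["red", "blue", "red"],
                   "size": ["S", "M", "L"]})

shared = LabelEncoder()
encoded = df.copy()
for col in df.columns:
    # Every fit_transform call discards what was learned from the previous column.
    encoded[col] = shared.fit_transform(df[col])

# After the loop, only the classes of the last column ("size") are left.
remaining_classes = list(shared.classes_)

# Decoding "color" with the shared encoder therefore maps the codes
# onto the "size" classes instead of the original colors.
wrong = list(shared.inverse_transform(encoded["color"]))
print(remaining_classes, wrong)
```

In this small case the decode does not even raise an error; it just returns labels from the wrong column, which is why each column needs its own fitted encoder.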

You can use a dictionary of LabelEncoder objects to transform multiple columns. Something like this:

labelencoder_dict = {}
for col in b.columns:
    labelEncoder = preprocessing.LabelEncoder()
    b[col] = labelEncoder.fit_transform(b[col])
    labelencoder_dict[col]=labelEncoder

To decode, you can simply use:

for col in b.columns:
    b[col] = labelencoder_dict[col].inverse_transform(b[col])
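As a quick sanity check, the encode and decode loops can be run together on a toy frame. This is only a sketch: the data and the names `original`, `encoders`, and `decoded` are assumptions for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical data standing in for the real DataFrame.
original = pd.DataFrame({"color": ["red", "blue", "red"],
                         "size": ["S", "M", "L"]})

# Encode: one fitted LabelEncoder per column, stored under the column name.
encoders = {}
encoded = original.copy()
for col in original.columns:
    enc = LabelEncoder()
    encoded[col] = enc.fit_transform(original[col])
    encoders[col] = enc

# Decode: look up the encoder that was fitted on that same column.
decoded = encoded.copy()
for col in encoded.columns:
    decoded[col] = encoders[col].inverse_transform(encoded[col])

print(encoded)
print(decoded)
```

Keeping the dictionary around (or pickling it next to the model) is what makes decoding possible later, since the integer codes alone carry no information about the original labels.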

Update:

Now that you have added the column that is used as y, here is how to decode it (assuming you have added the 'Predicted_Values' column to the DataFrame):

for col in b.columns:
    # Skip the predicted column here; no encoder was fitted under that name
    if col != 'Predicted_Values':
        b[col] = labelencoder_dict[col].inverse_transform(b[col])
# Use the original `y (Activity_Profile)` encoder on predicted data
b['Predicted_Values'] = labelencoder_dict['Activity_Profile'].inverse_transform(
                                                      b['Predicted_Values'])
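For reference, the whole flow (encode, fit, predict, decode) might look roughly like the sketch below. The toy data and the `weather` column are invented, and the model is fit and evaluated on the same rows only to keep the example short; the real code would use the train/test split from the question.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data; only 'Activity_Profile' and 'Predicted_Values'
# are names taken from the question.
raw = pd.DataFrame({
    "weather": ["sun", "rain", "sun", "rain"],
    "Activity_Profile": ["walk", "read", "walk", "read"],
})

# One encoder per column, kept in a dictionary.
encoders = {}
b = raw.copy()
for col in raw.columns:
    encoders[col] = LabelEncoder()
    b[col] = encoders[col].fit_transform(raw[col])

X = b.drop(["Activity_Profile"], axis=1)
y = b["Activity_Profile"]
model = DecisionTreeClassifier(random_state=0).fit(X, y)

# Predictions are encoded with the same codes as 'Activity_Profile'.
b["Predicted_Values"] = model.predict(X)

for col in b.columns:
    # The predicted column has no encoder of its own, so reuse the target's.
    key = "Activity_Profile" if col == "Predicted_Values" else col
    b[col] = encoders[key].inverse_transform(b[col])

print(b)
```

The KeyError in the question comes from looking up 'Predicted_Values' in the dictionary, where no encoder was ever stored under that name; mapping it to the 'Activity_Profile' encoder avoids that.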
