sklearn Imputer() returns features that do not work with the fit function



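I have a feature matrix with missing NaN values, so I need to impute those missing values first. However, the last line complains and throws the following error: Expected sequence or array-like, got Imputer(axis=0, copy=True, missing_values='NaN', strategy='mean', verbose=0). I checked, and the reason seems to be that train_fea_imputed is not an np.array but a sklearn.preprocessing Imputer object. How can I fix this?

By the way, if I use train_fea_imputed = imp.fit_transform(train_fea), the code works fine, but train_fea_imputed comes back as an array that is one dimension smaller than train_fea.
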
    import pandas as pd
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.preprocessing import Imputer
    imp = Imputer(missing_values='NaN', strategy='mean', axis=0)
    train_fea_imputed = imp.fit(train_fea)
    # train_fea_imputed = imp.fit_transform(train_fea)
    rf = RandomForestClassifier(n_estimators=5000,n_jobs=1, min_samples_leaf = 3)
    rf.fit(train_fea_imputed, train_label)

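Update: I changed to

    imp = Imputer(missing_values='NaN', strategy='mean', axis=1)

and now the dimension problem no longer occurs. I think there is some inherent problem in the imputing function. I will come back to this once I finish the project.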

In scikit-learn, initialising the model, training the model, and getting the predictions are separate steps. In your case you have:

import numpy as np
from sklearn.preprocessing import Imputer
train_fea = np.array([[1,1,0],[0,0,1],[1,np.nan,0]])
train_fea
array([[  1.,   1.,   0.],
       [  0.,   0.,   1.],
       [  1.,  nan,   0.]])
#initialise the model
imp = Imputer(missing_values='NaN', strategy='mean', axis=0)
#train the model
imp.fit(train_fea)
#get the predictions
train_fea_imputed = imp.transform(train_fea)
train_fea_imputed
array([[ 1. ,  1. ,  0. ],
       [ 0. ,  0. ,  1. ],
       [ 1. ,  0.5,  0. ]])
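
The key point is that fit() hands back the imputer object itself, while transform() (or fit_transform()) hands back the imputed array. Here is a minimal sketch of the full pipeline under some assumptions: it uses sklearn.impute.SimpleImputer, which replaces Imputer in newer scikit-learn releases and always imputes column-wise, and train_label is made-up data for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer  # assumption: scikit-learn >= 0.20

train_fea = np.array([[1, 1, 0], [0, 0, 1], [1, np.nan, 0]])
train_label = np.array([0, 1, 0])  # hypothetical labels, for illustration only

imp = SimpleImputer(missing_values=np.nan, strategy="mean")

# fit() learns the column means and returns the imputer itself, not an array.
# Passing this object to rf.fit() is what triggers the error in the question.
fitted = imp.fit(train_fea)

# transform() is the step that returns the imputed data as a NumPy array.
train_fea_imputed = imp.transform(train_fea)

# A small forest is enough for a sketch; the question used n_estimators=5000.
rf = RandomForestClassifier(n_estimators=10, random_state=0)
rf.fit(train_fea_imputed, train_label)
```

Calling imp.fit_transform(train_fea) combines the two imputer steps into one call and also returns an array.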

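I think axis=1 is not correct in this case, because you want to take the mean over the values of each feature vector/column (axis=0), not over each row (axis=1).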
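
To see why the axis matters, here is a rough sketch using plain NumPy rather than the imputer itself. The data is hypothetical (column 0 is a height in cm, column 1 is a weight in kg) and is chosen only so the two results differ visibly.

```python
import numpy as np

# Hypothetical data: column 0 is height in cm, column 1 is weight in kg.
X = np.array([[170.0, 60.0],
              [180.0, 80.0],
              [160.0, np.nan]])

col_means = np.nanmean(X, axis=0)  # one mean per feature (column)
row_means = np.nanmean(X, axis=1)  # one mean per sample (row)

# Positions of the missing entries.
rows, cols = np.where(np.isnan(X))

# Column-wise fill: the missing weight becomes the mean of the other weights.
by_column = X.copy()
by_column[rows, cols] = col_means[cols]

# Row-wise fill: the missing weight becomes the mean of that sample's other
# features, which here is just its height.
by_row = X.copy()
by_row[rows, cols] = row_means[rows]

print(by_column[2, 1])  # 70.0
print(by_row[2, 1])     # 160.0
```

The row-wise fill mixes values from unrelated features, which is why column-wise imputation is usually what you want for a feature matrix.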
