Error when preprocessing data for machine learning



I am trying to apply preprocessing to my training data. I also tried the reshape function, but that did not help. I get the following error:

ValueError: Found input variables with inconsistent numbers of samples: [34, 12700]

Here is my code:

import pandas as pd
import numpy as np
from sklearn import preprocessing,neighbors
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
df=pd.read_csv('train.csv')
df.drop(['ID'],1,inplace=True)

X=np.array(df.drop(['label'],1))
y=np.array(df['label'])
print(X.shape)

X = preprocessing.StandardScaler().fit(X)
X=X.mean_

X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2)
clf = RandomForestRegressor(n_estimators=1900,max_features='log2',max_depth=25)
clf.fit(X_train,y_train)
accuracy=clf.score(X_test,y_test)
print(accuracy)

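The problem is these two statements: X = preprocessing.StandardScaler().fit(X) followed by X = X.mean_

fit() returns the fitted scaler object, not the scaled data, and mean_ is that scaler's array of per-column means. After these two lines, X holds only one mean per column: 34 values instead of 12700 rows. y still has 12700 entries, which is why train_test_split reports inconsistent numbers of samples: [34, 12700].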
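
To make the shape mismatch concrete, here is a minimal sketch. It uses synthetic random data as a stand-in for train.csv (an assumption; only the 12700 x 34 shape is taken from the error message), and the variable names fitted, means and X_scaled are illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the question's data: 12700 samples, 34 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(12700, 34))

fitted = StandardScaler().fit(X)  # fit() returns the scaler object itself, not data
means = fitted.mean_              # one mean per column
X_scaled = fitted.transform(X)    # the standardized data, same shape as X

print(type(fitted).__name__)  # StandardScaler
print(means.shape)            # (34,)
print(X_scaled.shape)         # (12700, 34)
```

Passing means in place of X is what gives train_test_split 34 "samples" against 12700 labels.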

To actually transform the data, use the following code:

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X)
X = scaler.transform(X)
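
For context, this is one way the whole script could look with the fix applied. It is a sketch under assumptions, not the asker's exact setup: the data is synthetic, the forest is much smaller than the 1900 trees in the question so that it runs quickly, and the scaler is fitted on the training split only (a common choice to keep test data out of the fitted statistics, and a slight departure from the snippet above, which fits on all of X).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for train.csv (assumption): 500 samples, 34 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 34))
y = X[:, 0] * 2.0 + rng.normal(scale=0.1, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Smaller forest than in the question, only to keep the sketch fast.
clf = RandomForestRegressor(
    n_estimators=50, max_features='log2', max_depth=25, random_state=0
)
clf.fit(X_train, y_train)

# For a regressor, score() returns the R^2 coefficient, not an accuracy.
r2 = clf.score(X_test, y_test)
print(r2)
```

Note that X keeps its two-dimensional (samples, features) shape throughout, so it always has the same number of rows as y has entries.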

For more details, see the scikit-learn documentation for StandardScaler.
