Splitting samples into training and test data in Python

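I'm new to programming and I'm working on a machine learning problem in Python. I'm trying to split my dataset into training and test sets, as shown in the code below, but I get the following error, which I haven't been able to resolve even after searching Google and other sites: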

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
#Load up the training dataset
df = pd.read_excel('Trainind data_2002.xls')
df.head()
df['training'] = np.random.uniform(0, 1, len(df)) <= .70
colsfeatures = ['c2', 'c3', 'c4', 'c5', 'c7', 'ndvi', 'vi7']
colclass = ['class']
train, test = df[df['training'] == True, df['training'] == False]
trainingMatrix = train.as_matrix(colsfeatures)
classMatrix = train.as_matrix(colclass)
rfc = RandomForestClassifier(n_estimators=100, n_jobs=2)
rfc.fit(traningMatrix, classMatrix)
testMatrix = test.as_matrix(colsfeatures)
result = rfc.predict(testMatrix)
test['predictions'] = result
test.head()

Error: TypeError: 'Series' objects are mutable, thus they cannot be hashed

Could anyone please help? I would be very grateful.

Have you tried train_test_split?

from sklearn.model_selection import train_test_split
# Pass your own dataset (here the question's df); test_size=0.2 is only an example value
train, test = train_test_split(df, test_size=0.2)
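For context on the error itself: it most likely comes from the line `df[df['training'] == True, df['training'] == False]`. The comma makes the key a tuple of two Series, and pandas tries to hash that tuple as if it were a column label. Indexing once per mask avoids it. Below is a minimal sketch of both the mask-based split and the `train_test_split` route. It assumes a synthetic DataFrame in place of the Excel file (the random values and the three class labels are made up for illustration) and uses `to_numpy()` because `as_matrix` was removed in pandas 1.0. Note also that the question's code spells `traningMatrix` differently from `trainingMatrix`, which would raise a NameError once the split works.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Excel sheet (assumption: the real data has these columns).
rng = np.random.default_rng(0)
colsfeatures = ['c2', 'c3', 'c4', 'c5', 'c7', 'ndvi', 'vi7']
df = pd.DataFrame(rng.random((200, len(colsfeatures))), columns=colsfeatures)
df['class'] = rng.integers(0, 3, len(df))  # hypothetical labels 0, 1, 2

# Option 1: keep the random-mask idea, but index the frame once per mask.
mask = rng.random(len(df)) <= 0.70
train_a, test_a = df[mask], df[~mask]

# Option 2: let scikit-learn do the split (20% held out for testing).
train, test = train_test_split(df, test_size=0.2, random_state=0)
test = test.copy()  # work on a copy so adding a column does not trigger a warning

# Fit on the training rows; the target is passed as a 1-D array.
rfc = RandomForestClassifier(n_estimators=100, n_jobs=2, random_state=0)
rfc.fit(train[colsfeatures].to_numpy(), train['class'].to_numpy())

# Predict on the held-out rows and store the result next to the features.
test['predictions'] = rfc.predict(test[colsfeatures].to_numpy())
print(test.head())
```

Either option gives two DataFrames; `train_test_split` additionally shuffles the rows and gives an exact split ratio, whereas the random mask only approximates 70/30.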
