Splitting samples into training and test data in Python



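I'm new to programming and I'm working on a machine learning problem in Python. I'm trying to split my dataset into training and test sets, as shown in the code below, and I get the following error, which I haven't been able to resolve even after searching Google and other sites: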

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
#Load up the training dataset
df = pd.read_excel('Trainind data_2002.xls')
df.head()
df['training'] = np.random.uniform(0, 1, len(df)) <= .70
colsfeatures = ['c2', 'c3', 'c4', 'c5', 'c7', 'ndvi', 'vi7']
colclass = ['class']
train, test = df[df['training'] == True, df['training'] == False]
trainingMatrix = train.as_matrix(colsfeatures)
classMatrix = train.as_matrix(colclass)
rfc = RandomForestClassifier(n_estimators=100, n_jobs=2)
rfc.fit(traningMatrix, classMatrix)
testMatrix = test.as_matrix(colsfeatures)
result = rfc.predict(testMatrix)
test['predictions'] = result
test.head()

Error: TypeError: 'Series' objects are mutable, thus they cannot be hashed

Could anyone please help me? I would be very grateful.
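The error most likely comes from the line `train, test = df[df['training'] == True, df['training'] == False]`. Putting two boolean Series inside a single pair of square brackets passes them to pandas as one tuple, which pandas then tries to look up as a column label, and a Series cannot be hashed for that lookup (the exact wording of the message varies with the pandas version). A minimal sketch of the mask-based split is shown below. The small random DataFrame is a hypothetical stand-in for the Excel file, which is not available here, and it keeps only a few of the question's column names.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for 'Trainind data_2002.xls' (assumption: random values).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "c2": rng.random(10),
    "ndvi": rng.random(10),
    "class": rng.integers(0, 2, 10),
})

# Boolean mask: True marks rows assigned to the training set (roughly 70%).
mask = rng.uniform(0, 1, len(df)) <= 0.70

# Index the DataFrame once per mask instead of passing both masks in one [].
train = df[mask]
test = df[~mask]

print(len(train), len(test))
```

Each row ends up in exactly one of the two frames, so their lengths add up to `len(df)`. Note that the 70/30 proportion is only approximate with this approach, because every row is assigned independently at random.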

Have you tried train_test_split?

from sklearn.model_selection import train_test_split
# df is your DataFrame; test_size=0.2 holds out 20% of the rows for testing
train, test = train_test_split(df, test_size=0.2)
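Putting this together with the classifier from the question, the whole workflow could look roughly like the sketch below. It is an illustration under a few assumptions rather than a drop-in fix: the data is synthetic (random features and a random binary label standing in for the Excel file), `test_size=0.3` mirrors the 70/30 split the question was aiming for, and the `random_state` values are added only to make the run repeatable. Columns are selected directly instead of calling `as_matrix`, which was removed in pandas 1.0.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

cols_features = ["c2", "c3", "c4", "c5", "c7", "ndvi", "vi7"]

# Synthetic stand-in data (assumption): 100 rows of random features
# and a random binary label in place of the real spreadsheet.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((100, len(cols_features))), columns=cols_features)
df["class"] = rng.integers(0, 2, len(df))

# Hold out 30% of the rows for testing; random_state makes the split repeatable.
train, test = train_test_split(df, test_size=0.3, random_state=0)

rfc = RandomForestClassifier(n_estimators=100, n_jobs=2, random_state=0)
# Pass the label as a 1-D Series; a 2-D single-column frame triggers a shape warning.
rfc.fit(train[cols_features], train["class"])

# Copy before adding a column so pandas does not warn about writing to a slice.
test = test.copy()
test["predictions"] = rfc.predict(test[cols_features])
print(test.head())
```

With real data, replace the synthetic DataFrame with the result of `pd.read_excel(...)`. If the classes are imbalanced, passing `stratify=df["class"]` to `train_test_split` keeps the class proportions similar in both parts.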
