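Every time I run the same code, which trains on the training set, predicts values for the test set, and then computes the correlation coefficient and the MSE, the values change. Is that normal? Why does it happen?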
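import pandas as pd
from sklearn import tree
from sklearn.metrics import mean_squared_error
from scipy.stats import linregress

training = pd.read_csv('csvfile1.csv')  # training data set
testing = pd.read_csv('csvfile2.csv')  # test set loaded from a different file

# Features are every column except the target, so the target does not leak into X
X_train = training.drop(columns=['target_column'])
target_vec = training['target_column']  # target

# Assumes the test file has the same feature columns, plus the true values in 'predicted_value'
X_test = testing[X_train.columns]
true = testing['predicted_value']  # for comparison after predicting the target

# 'squared_error' is the current name of the former 'mse' criterion (scikit-learn >= 1.0)
regression = tree.DecisionTreeRegressor(criterion='squared_error', splitter='best')
model = regression.fit(X_train, target_vec)
output = model.predict(X_test)  # predictions on the test set, not the training set
print(linregress(output, true))  # printing out rvalue
print(mean_squared_error(true, output))  # MSE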
Try this:
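import random
import numpy as np
random.seed(123)  # seeds Python's built-in generator
np.random.seed(123)  # seeds NumPy's global generator, which scikit-learn uses when random_state is not set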
The number itself does not matter; the point is that it sets the pseudo-random generator to the same starting point on every run.
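As a quick illustration of what seeding does, the sketch below reseeds Python's built-in generator and checks that the same draws come back. The helper name `draw` is made up for this example.

```python
import random

def draw(seed, n=3):
    # Hypothetical helper: reseed the generator, then draw n pseudo-random floats
    random.seed(seed)
    return [random.random() for _ in range(n)]

first = draw(123)
second = draw(123)
other = draw(456)

print(first == second)  # compares two runs that used the same seed
print(first == other)   # compares runs that used different seeds
```

The seed only fixes the starting point of the sequence, so any value works as long as it is the same on every run.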
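My guess is that you are splitting the data into training and test sets at random, so the split is different on every run. Next time, please show us your code. Posts without it tend to get downvoted, and seeing the code makes it easier for us to help you.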
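If the variation does come from a random split, or from the tree itself, one way to pin it down is scikit-learn's `random_state` parameter. The tree is a possible source too: the scikit-learn documentation notes that features are randomly permuted at each split even with `splitter='best'`. What follows is a minimal sketch, not the asker's actual pipeline: the synthetic data stands in for the CSV files, and `run_once` is a name invented here.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data standing in for the CSV files (an assumption for this sketch)
rng = np.random.RandomState(0)
X = rng.rand(200, 5)
y = X[:, 0] * 3.0 + rng.normal(scale=0.1, size=200)

def run_once(seed):
    # Hypothetical helper: split, fit, and score with every random step pinned to `seed`
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    model = DecisionTreeRegressor(random_state=seed)
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    mse = mean_squared_error(y_test, predictions)
    r = np.corrcoef(predictions, y_test)[0, 1]  # Pearson correlation coefficient
    return mse, r

mse_a, r_a = run_once(123)
mse_b, r_b = run_once(123)
print(mse_a == mse_b, r_a == r_b)  # compares the scores of two identically seeded runs
```

Passing `random_state` explicitly keeps the reproducibility local to each call, rather than relying on a global seed that other code could reset.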