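My predictions should take both positive and negative values. I am using a two-stage prediction approach. The steps are as follows: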
1) Split the data into three sets (training, test, and out-of-sample).
2) Train several base regressors on the training set;
the regressors are different types of tree models, such as gradient boosted trees.
3) Use each trained regressor to predict on the test set.
4) Use the predictions from step 3 as inputs to train an SVM, which
serves as the second-level model.
5) Use each base regressor to predict on the out-of-sample data.
6) Feed the step 5 outputs into the model fitted in step 4 to
predict the final result on the out-of-sample data.
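For reference, the six steps above can be sketched end to end roughly as follows. This is only a minimal illustration on synthetic data, not the actual setup: the data, the choice of two base regressors, the split sizes, and the SVR hyperparameters are all placeholder assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.svm import SVR

# Synthetic data (assumption): a target that takes both signs.
rng = np.random.RandomState(0)
X = rng.normal(size=(300, 5))
y = X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.normal(size=300)

# Step 1: three disjoint sets (training, test, out-of-sample).
x_train, y_train = X[:100], y[:100]
x_test, y_test = X[100:200], y[100:200]
x_oos, y_oos = X[200:], y[200:]

# Step 2: fit the base regressors on the training set only.
models = [
    GradientBoostingRegressor(n_estimators=50, random_state=0),
    RandomForestRegressor(n_estimators=50, random_state=0),
]
for model in models:
    model.fit(x_train, y_train)

# Step 3: base predictions on the test set, one column per base model.
level1_test = np.column_stack([m.predict(x_test) for m in models])

# Step 4: train the second-level SVR on those predictions.
# Placeholder hyperparameters, not the ones from the question.
blender = SVR(kernel="rbf", C=1.0)
blender.fit(level1_test, y_test)

# Step 5: base predictions on the out-of-sample set.
level1_oos = np.column_stack([m.predict(x_oos) for m in models])

# Step 6: final prediction from the second-level model.
y_pred = blender.predict(level1_oos)
print(level1_test.shape, level1_oos.shape, y_pred.shape)
```

The second-level model only ever sees the base models' predictions, so its inputs have one column per base regressor.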
My response variable (the prediction target) should take both positive and negative values, but in step 6 all of the predictions are positive.
1) Here is a sample of the step 5 output, using only 3 base learners.
The different learners clearly produce both positive and negative values:
>> array([[ 6.72144956e-04,  1.56136199e-03,  1.58553265e-04],
          [-4.63248063e-04,  4.95401301e-04,  1.10566458e-04],
          ...
          [ 1.48747688e-03, -1.11622013e-03, -7.57807887e-05]])
2) Here is the output of step 6. All the values are positive, but the real values
clearly include both positive and negative ones.
>> array([ 4.56349996e-04,  4.43408819e-04, ...
           4.36207927e-04])
Here is the Python code for the blending model:
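import numpy as np
from sklearn.svm import SVR

# Methods of the blending class (class definition not shown).
def fit(self):
    # Collect each base model's predictions on the test set.
    y_test_i = []
    for model in self.models:
        y_test_i.append(model.predict(self.x_test))
    # Transpose to shape (n_samples, n_models): one column per base model.
    y_test_i = np.array(y_test_i).T
    parameters = {
        "kernel": "rbf",
        "C": 0.001,
        # Older scikit-learn treated gamma=0 as 1/n_features;
        # newer releases spell that gamma="auto".
        "gamma": 0,
    }
    # Second-level model, trained on the base models' test-set predictions.
    self.blender = SVR(**parameters)
    self.blender.fit(y_test_i, self.y_test.values)

def pred(self):
    # Collect each base model's predictions on the out-of-sample set.
    y_oos_i = []
    for model in self.models:
        y_oos_i.append(model.predict(self.x_oos))
    y_oos_i = np.array(y_oos_i).T
    # Final prediction from the second-level model.
    y_pred = self.blender.predict(y_oos_i)
    return y_pred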
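My question is: how do I debug this situation? By the way, each set has about 800 data points, and about 20 features are used as input to the second stage.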
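Most likely, your training data at some stage is entirely non-negative.
To debug, I suggest counting the number of non-negative and negative values in every training and test set at each stage (something like this):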
print("N nonnegatives: %d" % np.count_nonzero(Y_train >= 0))
print("N negatives: %d" % np.count_nonzero(Y_train < 0))
In particular, the training or test data used in step 4 looks suspect.
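To run the same check at every stage in one pass, a small helper along these lines may be convenient. It is only a sketch: `sign_counts` and `report_signs` are hypothetical names, and the two demo arrays reuse the rows visible in the question's step 5 and step 6 samples (the elided rows are left out).

```python
import numpy as np

def sign_counts(values):
    """Return (n_nonnegative, n_negative) for any array-like of numbers."""
    arr = np.asarray(values, dtype=float).ravel()
    n_nonneg = int(np.count_nonzero(arr >= 0))
    n_neg = int(np.count_nonzero(arr < 0))
    return n_nonneg, n_neg

def report_signs(stages):
    """Print the sign counts for each named array in a dict of stage -> values."""
    for name, values in stages.items():
        n_nonneg, n_neg = sign_counts(values)
        print("%-14s nonnegative=%d negative=%d" % (name, n_nonneg, n_neg))

# Demo on the rows visible in the question (elided rows omitted).
step5_sample = [
    [6.72144956e-04, 1.56136199e-03, 1.58553265e-04],
    [-4.63248063e-04, 4.95401301e-04, 1.10566458e-04],
    [1.48747688e-03, -1.11622013e-03, -7.57807887e-05],
]
step6_sample = [4.56349996e-04, 4.43408819e-04, 4.36207927e-04]

report_signs({"step 5 output": step5_sample, "step 6 output": step6_sample})
```

In the real pipeline, the dict would hold the targets and predictions for the training, test, and out-of-sample sets (for example `self.y_test.values` and the array passed to `self.blender.fit`), so the stage where negatives disappear stands out.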