Probabilistic SVM, regression

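I have implemented a probabilistic SVM (at least I think so) for binary classes. Now I want to extend this approach to regression and try it on the Boston dataset. Unfortunately, my algorithm seems to get stuck. The code I am currently running looks like this: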

from sklearn import decomposition
from sklearn import svm
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings("ignore")
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_boston
boston = load_boston()
X = boston.data
y = boston.target
inputs_train, inputs_test, targets_train, targets_test = train_test_split(X, y, test_size=0.33, random_state=42)
def plotting():
    param_C = [0.01, 0.1]
    param_grid = {'C': param_C, 'kernel': ['poly', 'rbf'], 'gamma': [0.1, 0.01]}
    clf = GridSearchCV(svm.SVR(), cv = 5, param_grid= param_grid)
    clf.fit(inputs_train, targets_train)
    clf = SVR(C=clf.best_params_['C'], cache_size=200, class_weight=None, coef0=0.0,
              decision_function_shape='ovr', degree=5, gamma=clf.best_params_['gamma'],
              kernel=clf.best_params_['kernel'],
              max_iter=-1, probability=True, random_state=None, shrinking=True,
              tol=0.001, verbose=False)
    clf.fit(inputs_train, targets_train)
    a = clf.predict(inputs_test[0])
    print(a)

plotting()

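Can anyone tell me what is wrong with this approach? It is not that I get an error message (I know, I have suppressed them above), but the code never stops running. Any suggestions would be greatly appreciated.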

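There are several issues with your code.

First, what takes forever is the first clf.fit (i.e. the grid search), which is why you saw no change when setting max_iter and tol in the second clf.fit.
Second, the clf=SVR() part will not work, because:
- you have to import it; SVR is not recognized as written
- you pass a bunch of illegal arguments (decision_function_shape, probability, random_state, etc.); check the documentation for the arguments SVR accepts
Third, you don't need to fit again explicitly with the best parameters; just ask for refit=True in the GridSearchCV definition and then use clf.best_estimator_ for predictions (edit after a comment: plain clf.predict works as well).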
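
One quick way to check the second point without opening the docs is to ask the estimator itself which parameters it accepts. The snippet below is only a sketch: the `attempted` list is my own selection of names taken from the question's SVR call, not an exhaustive audit.

```python
from sklearn.svm import SVR

# Parameter names that SVR's constructor actually accepts.
accepted = set(SVR().get_params())

# A selection of the names passed to SVR() in the question (illustrative list).
attempted = ["C", "degree", "gamma", "kernel", "max_iter", "tol",
             "decision_function_shape", "probability", "random_state"]

# Anything not in `accepted` would make SVR(...) raise a TypeError.
rejected = [name for name in attempted if name not in accepted]

print(sorted(accepted))
print("not accepted by SVR:", rejected)
```

The rejected names are classifier-only options (they belong to SVC), which hints at where the original call was copied from.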

So, with all of this moved out of any function definition, here is a working version of your code:

from sklearn.svm import SVR
# other imports as-is
# data loading & splitting as-is
param_C = [0.01, 0.1]
param_grid = {'C': param_C, 'kernel': ['poly', 'rbf'], 'gamma': [0.1, 0.01]}
clf = GridSearchCV(SVR(degree=5, max_iter=10000), cv=5, param_grid=param_grid, refit=True)
clf.fit(inputs_train, targets_train)
a = clf.best_estimator_.predict(inputs_test[0])
# a = clf.predict(inputs_test[0]) will also work 
print(a)
# [ 21.89849792]

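Apart from degree, all the other admissible argument values you were using are actually the respective defaults, so the only arguments you really need in the SVR definition are degree and max_iter.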

You will get a couple of warnings (not errors), namely after fitting:

/databricks/python/lib/python3.5/site-packages/sklearn/svm/base.py:220: ConvergenceWarning: Solver terminated early (max_iter=10000). Consider pre-processing your data with StandardScaler or MinMaxScaler.

and after predicting:

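/databricks/python/lib/python3.5/site-packages/sklearn/utils/validation.py:395: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample. DeprecationWarning)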

which already contain some advice on what to do next...
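
Following both pieces of advice could look roughly like the sketch below: scale the features inside a pipeline so the grid search re-fits the scaler on each fold, and reshape a single sample to 2-D before predicting. This is not the code tested above. It assumes a recent scikit-learn, where load_boston is no longer available, so synthetic data from make_regression stands in for the Boston set. The grid values and the names `pipe`, `search` and `single_sample` are illustrative choices of mine.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (assumption): 13 features, like the Boston set.
X, y = make_regression(n_samples=200, n_features=13, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

# Scaling lives inside the pipeline, so it is fitted on training folds only.
pipe = make_pipeline(StandardScaler(), SVR())

# make_pipeline names each step after its lowercased class, hence "svr__".
param_grid = {
    "svr__C": [0.1, 1.0],
    "svr__kernel": ["poly", "rbf"],
    "svr__gamma": [0.1, 0.01],
}
search = GridSearchCV(pipe, param_grid=param_grid, cv=5, refit=True)
search.fit(X_train, y_train)

# A single sample must be 2-D: one row, n_features columns.
single_sample = X_test[0].reshape(1, -1)
prediction = search.predict(single_sample)

print(search.best_params_)
print(prediction)
```

With scaled inputs the solver typically converges far sooner, so a max_iter cap may not be needed at all; whether that holds on your data is something to verify rather than assume.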

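Last but not least: a probabilistic classifier (i.e. one that produces probabilities instead of hard labels) is a valid thing, but a "probabilistic" regression model is not...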
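
To make the distinction concrete, here is a small sketch on synthetic data (the dataset and variable names are illustrative, not from the question): SVC with probability=True exposes predict_proba and returns per-class probabilities, whereas SVR has no such method because its output is a continuous value rather than a class.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC, SVR

# Toy binary classification problem (assumption, for illustration only).
X, y = make_classification(n_samples=100, n_features=5, random_state=0)

# probability=True enables predict_proba on the classifier.
clf = SVC(probability=True, random_state=0).fit(X, y)
proba = clf.predict_proba(X[:3])

print(proba)                             # one row per sample, one column per class
print(np.allclose(proba.sum(axis=1), 1.0))
print(hasattr(SVR(), "predict_proba"))   # the regressor offers no such method
```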

Tested with Python 3.5 and scikit-learn 0.18.1.
