TypeError: 'TimeSeriesSplit' object is not iterable



I am running a grid search over an SVR model with a time-series split. My code is:

from sklearn.svm import SVR
from sklearn.grid_search import GridSearchCV
from sklearn.model_selection import TimeSeriesSplit
from sklearn import svm
from sklearn.preprocessing import MinMaxScaler
from sklearn import preprocessing as pre
X_feature = X_feature.reshape(-1, 1)
y_label = y_label.reshape(-1,1)
param = [{'kernel': ['rbf'], 'gamma': [1e-2, 1e-3, 1e-4, 1e-5],
'C': [1, 10, 100, 1000]},
{'kernel': ['poly'], 'C': [1, 10, 100, 1000], 'degree': [1, 2, 3, 4]}] 

reg = SVR(C=1)
timeseries_split = TimeSeriesSplit(n_splits=3)
clf = GridSearchCV(reg, param, cv=timeseries_split, scoring='neg_mean_squared_error')

X= pre.MinMaxScaler(feature_range=(0,1)).fit(X_feature)
scaled_X = X.transform(X_feature)

y = pre.MinMaxScaler(feature_range=(0,1)).fit(y_label)
scaled_y = y.transform(y_label)

clf.fit(scaled_X,scaled_y )

My scaled y data is:

[[0.11321139]
[0.07218848]
...
[0.64844211]
[0.4926122 ]
[0.4030334 ]]

My scaled X data is:

[[0.2681013 ]
[0.03454225]
[0.02062136]
...
[0.92857565]
[0.64930691]
[0.20325924]]

However, I get the error message

TypeError: 'TimeSeriesSplit' object is not iterable

My full traceback is:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-4403e696bf0d> in <module>()
19 
20 
---> 21 clf.fit(scaled_X,scaled_y )
~/anaconda3_501/lib/python3.6/site-packages/sklearn/grid_search.py in fit(self, X, y)
836 
837         """
--> 838         return self._fit(X, y, ParameterGrid(self.param_grid))
839 
840 
~/anaconda3_501/lib/python3.6/site-packages/sklearn/grid_search.py in _fit(self, X, y, parameter_iterable)
572                                     self.fit_params, return_parameters=True,
573                                     error_score=self.error_score)
--> 574                 for parameters in parameter_iterable
575                 for train, test in cv)
576 
~/anaconda3_501/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in __call__(self, iterable)
777             # was dispatched. In particular this covers the edge
778             # case of Parallel used with an exhausted iterator.
--> 779             while self.dispatch_one_batch(iterator):
780                 self._iterating = True
781             else:
~/anaconda3_501/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in dispatch_one_batch(self, iterator)
618 
619         with self._lock:
--> 620             tasks = BatchedCalls(itertools.islice(iterator, batch_size))
621             if len(tasks) == 0:
622                 # No more tasks available in the iterator: tell caller to stop.
~/anaconda3_501/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in __init__(self, iterator_slice)
125 
126     def __init__(self, iterator_slice):
--> 127         self.items = list(iterator_slice)
128         self._size = len(self.items)
129 
~/anaconda3_501/lib/python3.6/site-packages/sklearn/grid_search.py in <genexpr>(.0)
573                                     error_score=self.error_score)
574                 for parameters in parameter_iterable
--> 575                 for train, test in cv)
576 
577         # Out is a list of triplet: score, estimator, n_test_samples
TypeError: 'TimeSeriesSplit' object is not iterable

I am not sure why this happens; I suspect it occurs when I call fit on the last line. Any help would be appreciated.

First, you are mixing incompatible modules. grid_search is the old module, now deprecated, and it does not work with the classes in model_selection.

Instead of:

from sklearn.grid_search import GridSearchCV

do this:

from sklearn.model_selection import GridSearchCV
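To illustrate the incompatibility, here is a minimal sketch, assuming a recent scikit-learn and a toy array `X` that is not part of the question. The old module loops over `cv` directly (`for train, test in cv` in the traceback), but a `TimeSeriesSplit` instance is a splitter object, not an iterable; only its `split()` method yields the folds.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(8).reshape(-1, 1)  # toy data, purely illustrative
tscv = TimeSeriesSplit(n_splits=3)

# The splitter object itself does not support iteration...
try:
    iter(tscv)
    splitter_is_iterable = True
except TypeError:
    splitter_is_iterable = False

# ...but split(X) returns a generator of (train_indices, test_indices) pairs.
folds = [(train.tolist(), test.tolist()) for train, test in tscv.split(X)]
for train, test in folds:
    print("train:", train, "test:", test)
```

Each training window ends before its test window starts, which is the point of using this splitter for time series.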

Second, you only need to pass TimeSeriesSplit(n_splits=3) to the cv parameter, like this:

timeseries_split = TimeSeriesSplit(n_splits=3)
clf = GridSearchCV(reg, param, cv=timeseries_split, scoring='neg_mean_squared_error')

There is no need to call split() yourself. The grid search calls it internally.
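Putting both points together, a corrected version might look like the sketch below. It is not the asker's exact setup: the data is synthetic (the real `X_feature` and `y_label` are not shown in the question), the parameter grid is reduced to keep the run short, and the target is flattened with `ravel()` because `SVR` expects a 1-D `y`.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR

# Synthetic stand-in for the question's X_feature / y_label (assumption).
rng = np.random.RandomState(0)
X_feature = rng.rand(60, 1)
y_label = np.sin(4 * X_feature) + 0.1 * rng.randn(60, 1)

scaled_X = MinMaxScaler(feature_range=(0, 1)).fit_transform(X_feature)
scaled_y = MinMaxScaler(feature_range=(0, 1)).fit_transform(y_label)

# Reduced grid (assumption) so the sketch finishes quickly.
param = [
    {"kernel": ["rbf"], "gamma": [1e-2, 1e-3], "C": [1, 10]},
    {"kernel": ["poly"], "C": [1, 10], "degree": [1, 2]},
]

# Pass the splitter object itself; GridSearchCV calls split() internally.
clf = GridSearchCV(
    SVR(),
    param,
    cv=TimeSeriesSplit(n_splits=3),
    scoring="neg_mean_squared_error",
)
clf.fit(scaled_X, scaled_y.ravel())  # SVR expects a 1-D target
print(clf.best_params_)
```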

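You cannot take the length of a generator: a generator does not hold enough information to compute a length, it only keeps its current state. At line 579 of your grid_search.py file, the code tries to find the length of a generator. You need to convert the generator to a list to find its length, so you can do the following:

n_folds = list(n_folds)

before you do this:

n_folds = len(cv)

If you want to keep it as a generator, see:

How to len(generator())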
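The generator behaviour described above can be checked with a short sketch, assuming a toy array `X` that is not from the question. It also shows `get_n_splits()`, which reports the number of folds without consuming anything.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(10).reshape(-1, 1)  # toy data, purely illustrative
tscv = TimeSeriesSplit(n_splits=3)

gen = tscv.split(X)  # a generator of (train, test) index pairs

# len() is not defined for generators and raises TypeError.
try:
    len(gen)
    gen_has_len = True
except TypeError:
    gen_has_len = False

# Materialising the generator as a list makes len() available.
folds = list(gen)
n_folds = len(folds)

# The splitter can also report the fold count directly.
n_splits = tscv.get_n_splits()
print(gen_has_len, n_folds, n_splits)
```

Note that converting to a list consumes the generator, so `folds` should be reused afterwards instead of `gen`.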
