I'm getting an error from the following code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.neighbors import KernelDensity
from sklearn.decomposition import PCA
from sklearn.grid_search import GridSearchCV
from sklearn import linear_model, mixture, decomposition, datasets
# load the data
digits = load_digits()
data = digits.data
pca = PCA(n_components=15, whiten=False)
data = pca.fit_transform(digits.data)
gmm = mixture.GMM()
# use grid search cross-validation
params = {'gmm__n_components':(2, 3)}
grid = GridSearchCV(gmm, params)
grid.fit(data)
Error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-9-07b1b825ee22> in <module>()
22
23 grid = GridSearchCV(gmm, params)
---> 24 grid.fit(data)
25
C:\Anaconda2\lib\site-packages\sklearn\grid_search.pyc in fit(self, X, y)
802
803 """
--> 804 return self._fit(X, y, ParameterGrid(self.param_grid))
805
806
C:\Anaconda2\lib\site-packages\sklearn\grid_search.pyc in _fit(self, X, y, parameter_iterable)
551 self.fit_params, return_parameters=True,
552 error_score=self.error_score)
--> 553 for parameters in parameter_iterable
554 for train, test in cv)
555
C:\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.pyc in __call__(self, iterable)
802 self._iterating = True
803
--> 804 while self.dispatch_one_batch(iterator):
805 pass
806
C:\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.pyc in dispatch_one_batch(self, iterator)
660 return False
661 else:
--> 662 self._dispatch(tasks)
663 return True
664
C:\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.pyc in _dispatch(self, batch)
568
569 if self._pool is None:
--> 570 job = ImmediateComputeBatch(batch)
571 self._jobs.append(job)
572 self.n_dispatched_batches += 1
C:\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.pyc in __init__(self, batch)
181 # Don't delay the application, to avoid keeping the input
182 # arguments in memory
--> 183 self.results = batch()
184
185 def get(self):
C:\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.pyc in __call__(self)
70
71 def __call__(self):
---> 72 return [func(*args, **kwargs) for func, args, kwargs in self.items]
73
74 def __len__(self):
C:\Anaconda2\lib\site-packages\sklearn\cross_validation.pyc in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters, error_score)
1518
1519 if parameters is not None:
-> 1520 estimator.set_params(**parameters)
1521
1522 start_time = time.time()
C:\Anaconda2\lib\site-packages\sklearn\base.pyc in set_params(self, **params)
259 'Check the list of available parameters '
260 'with `estimator.get_params().keys()`.' %
--> 261 (name, self))
262 sub_object = valid_params[name]
263 sub_object.set_params(**{sub_name: value})
ValueError: Invalid parameter gmm for estimator GMM(covariance_type='diag', init_params='wmc', min_covar=0.001,
n_components=1, n_init=1, n_iter=100, params='wmc', random_state=None,
thresh=None, tol=0.001, verbose=0). Check the list of available parameters with `estimator.get_params().keys()`.
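However, I found similar code in the scikit-learn documentation that works fine (see below), while the code above gives me this error. The only difference is the algorithm. Does that make a difference? How can I fix this? Thanks.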
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.neighbors import KernelDensity
from sklearn.decomposition import PCA
from sklearn.grid_search import GridSearchCV
# load the data
digits = load_digits()
data = digits.data
# project the 64-dimensional data to a lower dimension
pca = PCA(n_components=15, whiten=False)
data = pca.fit_transform(digits.data)
# use grid search cross-validation to optimize the bandwidth
params = {'bandwidth': np.logspace(-1, 1, 20)}
grid = GridSearchCV(KernelDensity(), params)
grid.fit(data)
print("best bandwidth: {0}".format(grid.best_estimator_.bandwidth))
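I see two problems with your code.
First, because you are passing a single estimator to GridSearchCV, the parameter names in the parameter grid should not start with gmm__. That prefix only makes sense when the parameter belongs to a named estimator nested inside another one. Removing it gets you past the error quoted above. You can change the parameter grid assignment as follows: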
params = {'n_components':(2, 3)}
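To see where the double-underscore prefix does and does not apply, here is a small sketch that lists the parameter names of a bare estimator and of the same estimator nested in a Pipeline. It uses KernelDensity from your working example rather than GMM, and the step names "pca" and "kde" are just labels I picked for illustration.

```python
from sklearn.decomposition import PCA
from sklearn.neighbors import KernelDensity
from sklearn.pipeline import Pipeline

# A bare estimator exposes its parameters under their plain names.
kde = KernelDensity()
plain_keys = sorted(kde.get_params().keys())

# Inside a Pipeline, each step's parameters are prefixed with "<step name>__".
# The step names "pca" and "kde" are arbitrary labels chosen for this sketch.
pipe = Pipeline([("pca", PCA(n_components=15)), ("kde", KernelDensity())])
nested_keys = sorted(pipe.get_params().keys())

print("bare estimator:", plain_keys)
print("pipeline:", nested_keys)
```

So a grid for the bare estimator would use 'bandwidth', while a grid for the pipeline would use 'kde__bandwidth'. The same rule applies to your mixture model: passed on its own, its grid key is just 'n_components'.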
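But once you get past that error, you'll hit a second problem. GMM.score() returns an array (one log probability per sample) rather than a single score value. In this respect it differs from what sklearn does for KMeans, KernelDensity, PCA, and so on (see the discussion of this issue here: https://github.com/scikit-learn/scikit-learn/issues/2473). The array of scores from GMM causes GridSearchCV to throw an error, because it expects a single value. The example you cited from the sklearn website uses KernelDensity, so it does not run into this problem.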
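I'd suggest using another algorithm whose score function matches what GridSearchCV expects, such as KMeans or KernelDensity. Alternatively, you can run gmm.fit() separately for each n_components value you want to test and compare the results in whatever way makes the most sense to you.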
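As a rough sketch of the second option, the loop below fits one model per n_components value and compares them on held-out data. Note the assumptions: it targets a newer scikit-learn release in which sklearn.mixture.GaussianMixture has replaced GMM and sklearn.model_selection has replaced the older cross-validation modules, so it will not run unchanged on the version shown in your traceback. The 75/25 split, the fixed random_state, and the choice of held-out log-likelihood and BIC as comparison criteria are mine, not something your code prescribes.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

# Same preprocessing as in the question: project the digits to 15 dimensions.
digits = load_digits()
data = PCA(n_components=15, whiten=False).fit_transform(digits.data)

# Hold out part of the data so the models are compared on unseen samples.
# (The 75/25 split is an arbitrary choice for this sketch.)
train, test = train_test_split(data, test_size=0.25, random_state=0)

results = {}
for n in (2, 3):
    gmm = GaussianMixture(n_components=n, covariance_type="diag", random_state=0)
    gmm.fit(train)
    results[n] = {
        # GaussianMixture.score returns a single number:
        # the mean per-sample log-likelihood.
        "heldout_loglik": gmm.score(test),
        # BIC on the training data; lower is better.
        "bic": gmm.bic(train),
    }

# Pick the value with the highest held-out log-likelihood.
best_n = max(results, key=lambda n: results[n]["heldout_loglik"])

for n, r in results.items():
    print("n_components={0}: held-out log-lik={1:.3f}, BIC={2:.1f}".format(
        n, r["heldout_loglik"], r["bic"]))
print("best n_components by held-out log-likelihood: {0}".format(best_n))
```

Which criterion to trust is up to you. Held-out log-likelihood is closest to what GridSearchCV would have computed, while BIC penalizes extra components and does not need a separate test set.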