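I am running into the following problem with GridSearchCV: when I use n_jobs > 1, it gives me a parallel error. At the same time, n_jobs > 1 works fine with single models such as RandomForestClassifier.
Here is a minimal example that reproduces the error: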
import numpy as np
from sklearn import ensemble, model_selection

train = np.random.rand(100, 10)
targ = np.random.randint(0, 2, 100)
clf = ensemble.RandomForestClassifier(n_jobs=2)
clf.fit(train, targ)
Out[349]: RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=None, max_features='auto', max_leaf_nodes=None,
min_impurity_split=1e-07, min_samples_leaf=1,
min_samples_split=2, min_weight_fraction_leaf=0.0,
n_estimators=10, n_jobs=2, oob_score=False, random_state=None,
verbose=0, warm_start=False)
This example works fine.
The following, however, does not work:
clf = ensemble.RandomForestClassifier()
param_grid = {'n_estimators': [10, 20]}
grid_s = model_selection.GridSearchCV(clf, param_grid=param_grid, n_jobs=-1, verbose=1)
grid_s.fit(train, targ)
It fails with the following error:
Fitting 3 folds for each of 2 candidates, totalling 6 fits
ImportErrorTraceback (most recent call last)
<ipython-input-351-b8bb45396026> in <module>()
2 param_grid = {'n_estimators': [10,20]}
3 grid_s= model_selection.GridSearchCV(clf, param_grid=param_grid_gb,n_jobs=-1,verbose=1)
----> 4 grid_s.fit(train, targ)
/root/anaconda3/envs/python2/lib/python2.7/site-packages/sklearn/model_selection/_search.pyc in fit(self, X, y, groups)
943 train/test set.
944 """
--> 945 return self._fit(X, y, groups, ParameterGrid(self.param_grid))
946
947
/root/anaconda3/envs/python2/lib/python2.7/site-packages/sklearn/model_selection/_search.pyc in _fit(self, X, y, groups, parameter_iterable)
562 return_times=True, return_parameters=True,
563 error_score=self.error_score)
--> 564 for parameters in parameter_iterable
565 for train, test in cv_iter)
566
/root/anaconda3/envs/python2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __call__(self, iterable)
726 self._aborting = False
727 if not self._managed_backend:
--> 728 n_jobs = self._initialize_backend()
729 else:
730 n_jobs = self._effective_n_jobs()
/root/anaconda3/envs/python2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in _initialize_backend(self)
538 try:
539 return self._backend.configure(n_jobs=self.n_jobs, parallel=self,
--> 540 **self._backend_args)
541 except FallbackToBackend as e:
542 # Recursively initialize the backend in case of requested fallback.
/root/anaconda3/envs/python2/lib/python2.7/site-packages/sklearn/externals/joblib/_parallel_backends.pyc in configure(self, n_jobs, parallel, **backend_args)
297 if already_forked:
298 raise ImportError(
--> 299 '[joblib] Attempting to do parallel computing '
300 'without protecting your import on a system that does '
301 'not support forking. To use parallel-computing in a '
ImportError: [joblib] Attempting to do parallel computing without protecting your import on a system that does not support forking. To use parallel-computing in a script, you must protect your main loop using "if __name__ == '__main__'". Please see the joblib documentation on Parallel for more information
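I think you are using Windows. You need to wrap the grid search in a function and then call it under if __name__ == '__main__':. With n_jobs=-1, joblib's Parallel decides how many worker processes to start, and this does not always work on Windows: Windows cannot fork, so each worker starts fresh and re-imports your main script, and any unprotected top-level code (including the grid search itself) would run again inside every worker.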
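To make "n_jobs=-1 determines the number of jobs" concrete, here is a hypothetical helper (resolve_n_jobs is not a joblib function) that mirrors the rule described in the joblib documentation for Parallel: -1 means all CPUs, values below -1 mean n_cpus + 1 + n_jobs, and 0 is rejected. It is a sketch for intuition only; with a process-based backend, every resolved job above 1 is a separate worker process, which is why the import guard matters.

```python
import os


def resolve_n_jobs(n_jobs, n_cpus=None):
    """Hypothetical helper mirroring joblib's documented n_jobs rule."""
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    if n_jobs is None:
        # None means "unset"; outside a parallel_backend context it acts as 1.
        return 1
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs < 0:
        # -1 -> all CPUs, -2 -> all CPUs but one, never fewer than 1.
        return max(n_cpus + 1 + n_jobs, 1)
    # Positive values are used as given.
    return n_jobs


if __name__ == "__main__":
    # e.g. resolve_n_jobs(-2, n_cpus=8) -> 7
    print(resolve_n_jobs(-1), "worker(s) for n_jobs=-1 on this machine")
```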
Try wrapping the grid search in a function:
def somefunction():
    clf = ensemble.RandomForestClassifier()
    param_grid = {'n_estimators': [10, 20]}
    grid_s = model_selection.GridSearchCV(clf, param_grid=param_grid, n_jobs=-1, verbose=1)
    grid_s.fit(train, targ)
    return grid_s

if __name__ == '__main__':
    grid_s = somefunction()
Or:
if __name__ == '__main__':
    clf = ensemble.RandomForestClassifier()
    param_grid = {'n_estimators': [10, 20]}
    grid_s = model_selection.GridSearchCV(clf, param_grid=param_grid, n_jobs=-1, verbose=1)
    grid_s.fit(train, targ)
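Why the guard helps can be simulated without joblib at all. Python's multiprocessing "spawn" start method (the only one available on Windows) re-runs the parent script in each worker under the module name __mp_main__ instead of __main__. The stdlib-only sketch below uses a hypothetical toy script and helper names (SCRIPT, run_search, execute_as) and runs the same script both ways with runpy.run_path, counting how often the "search" is launched. It illustrates the mechanism; it is not joblib's own code.

```python
import os
import runpy
import tempfile

# A toy "user script": the expensive work lives in a function and is only
# *called* under the __main__ guard.
SCRIPT = '''
calls = []

def run_search():
    calls.append("search")  # stands in for grid_s.fit(train, targ)

if __name__ == "__main__":
    run_search()
'''


def execute_as(run_name):
    """Run SCRIPT under the given module name; return how often run_search ran."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "user_script.py")
        with open(path, "w") as fh:
            fh.write(SCRIPT)
        namespace = runpy.run_path(path, run_name=run_name)
    return len(namespace["calls"])


if __name__ == "__main__":
    print("launched directly:", execute_as("__main__"))           # 1
    print("re-imported by a worker:", execute_as("__mp_main__"))  # 0
```

Without the guard, the second call would also launch the search, which is the recursive launch the joblib error message is protecting against.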
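Maybe this is still relevant to someone!
I have only tried this with Anaconda on a Windows 10 machine.
I ran into the same problem in an environment that contained the following code: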
parameters = [{'C': [1, 10, 100, 1000], 'kernel': ['linear']}, {'C': [1, 10, 100, 1000], 'kernel': ['rbf'], 'gamma': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]}]
grid_search = GridSearchCV(estimator = classifier, param_grid = parameters, scoring = 'accuracy', cv = 10, n_jobs = -1)
grid_search = grid_search.fit(X_train, y_train)
best_accuracy = grid_search.best_score_
best_parameters = grid_search.best_params_
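A grid like this launches far more jobs than the two-candidate grid in the question. The traceback above shows GridSearchCV handing one task per (parameter set, CV split) pair to joblib's Parallel. The sketch below counts those tasks, assuming GridSearchCV's documented grid expansion (each dict is a Cartesian product, and a list of dicts is concatenated); count_candidates and count_fits are hypithetical helper names, not scikit-learn API.

```python
from math import prod


def count_candidates(param_grid):
    """Number of parameter combinations in a GridSearchCV-style grid."""
    if isinstance(param_grid, dict):
        param_grid = [param_grid]
    # Each dict expands to the product of its value-list lengths.
    return sum(prod(len(values) for values in grid.values()) for grid in param_grid)


def count_fits(param_grid, n_folds):
    """Total fits = candidates x CV folds (one fit per task)."""
    return count_candidates(param_grid) * n_folds


answer_grid = [
    {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
    {'C': [1, 10, 100, 1000], 'kernel': ['rbf'],
     'gamma': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]},
]
question_grid = {'n_estimators': [10, 20]}

if __name__ == "__main__":
    print(count_candidates(answer_grid), count_fits(answer_grid, 10))  # 40 400
    print(count_fits(question_grid, 3))  # 6, matching "totalling 6 fits"
```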
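I couldn't find much on the internet, so I thought maybe I should update joblib. Surprise: it was not installed in that particular environment at all. After installing and updating it, everything ran fine with both n_jobs = -1 and n_jobs = 2.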
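To check from inside the environment in question whether joblib is importable, and which version, a small sketch could look like this. It assumes Python 3.8+ (for importlib.metadata); package_status is a hypothetical helper name. Note that the traceback above comes from scikit-learn's bundled copy (sklearn/externals/joblib), which this check does not look at.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version


def package_status(name):
    """Return the installed version of `name`, or None if it cannot be imported."""
    if importlib.util.find_spec(name) is None:
        return None
    try:
        return version(name)
    except PackageNotFoundError:
        # Importable but without distribution metadata (e.g. a stdlib module).
        return "unknown"


if __name__ == "__main__":
    print("joblib:", package_status("joblib"))
```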
What worked for me was changing the parallel backend:
from sklearn.utils import parallel_backend

with parallel_backend('multiprocessing'):  # or 'threading'
    grid_s.fit(train, targ)  # your GridSearchCV code goes here
See here for the accepted backend values.
Answer taken from here.
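One reason the 'threading' backend sidesteps the Windows problem is that threads run inside the existing process, so no worker ever re-imports the script. The sketch below is a stdlib analogy using concurrent.futures, not joblib itself: it records the process ID each task ran in and shows they all match the parent's. The trade-off is that Python threads share the GIL, so pure-Python CPU-bound work may not speed up.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def worker_pid(_):
    # Report which OS process this task ran in.
    return os.getpid()


def pids_under_threads(n_tasks=4, n_workers=2):
    """Run tasks on a thread pool and collect the process IDs they used."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return set(pool.map(worker_pid, range(n_tasks)))


if __name__ == "__main__":
    # All tasks share the parent's process, so nothing is re-imported.
    print(pids_under_threads() == {os.getpid()})  # True
```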