"index N is out of bounds for axis 0 with size N" when running KMeans in parallel, while sequential KMeans works fine



I am trying to run the scikit-learn implementation of KMeans in parallel, but I keep getting the following error message:

Traceback (most recent call last):
  File "run_kmeans.py", line 114, in <module>
    kmeans = KMeans(n_clusters=2048, n_jobs=-1).fit(descriptors)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/cluster/k_means_.py", line 889, in fit
    return_n_iter=True)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/cluster/k_means_.py", line 362, in k_means
    for seed in seeds)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/parallel.py", line 768, in __call__
    self.retrieve()
  File "/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/parallel.py", line 719, in retrieve
    raise exception
sklearn.externals.joblib.my_exceptions.JoblibIndexError: JoblibIndexError
_________________________________________________________________________
Multiprocessing exception:
..........................................................................
IndexError: index 11683 is out of bounds for axis 0 with size 11683

When I run KMeans with n_jobs=1, i.e. sequentially, I get no error and everything works. With n_jobs=-1, however, I keep getting the error above.

This is the code I am using:

kmeans = KMeans(n_clusters=2048, n_jobs=-1).fit(descriptors)

descriptors is a numpy array of shape (11683, 128).


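Am I doing something wrong, or is this a bug in the KMeans implementation?

What should I do instead (e.g. use MiniBatchKMeans)?

PS: I am running this on a 64-bit Ubuntu 16.04 machine with 4 GB of RAM and an Intel Core i7-4700HQ at 2.40 GHz.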

This problem can be fixed by converting the input data to float64, for example with descriptors.astype(np.float64).

https://github.com/scikit-learn/scikit-learn/issues/8583
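
A minimal sketch of that conversion, using only NumPy. The random array is a stand-in for the real descriptors, and its float32 dtype is an assumption, since the question does not state the original dtype. as_float64 is a hypothetical helper name chosen for illustration.

```python
import numpy as np


def as_float64(descriptors):
    """Return the data as a C-contiguous float64 array (copies only if needed)."""
    return np.ascontiguousarray(descriptors, dtype=np.float64)


# Stand-in data with the same shape as in the question.
# Assumption: the original array was not float64 (float32 is used here).
rng = np.random.RandomState(0)
descriptors = rng.rand(11683, 128).astype(np.float32)

descriptors64 = as_float64(descriptors)
print(descriptors.dtype, "->", descriptors64.dtype)

# With the scikit-learn version from the question, the fit call would then be:
# kmeans = KMeans(n_clusters=2048, n_jobs=-1).fit(descriptors64)
```

The conversion produces a new array and leaves the original untouched, so it is the converted array that has to be passed to fit. For a (11683, 128) array the float64 copy takes roughly 12 MB, which should fit comfortably in 4 GB of RAM.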
