Online ICA learning for large datasets in scikit
I have a large dataset and I am trying to learn Gabor-like filters from images. When the dataset gets too large, I run into a memory error. So far I have the following code:

import numpy
from sklearn.feature_extraction.image import extract_patches_2d
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.decomposition import FastICA
def extract_dictionary(image, patches_size=(16, 16), projection_dimensions=25, previous_dictionary=None):
    """
    Gets a higher-dimensional ICA projection of the image.
    """
    patches = extract_patches_2d(image, patches_size)
    # LIMIT is a module-level cap on the number of patches, to control memory use
    patches = numpy.reshape(patches, (patches.shape[0], -1))[:LIMIT]
    # center and whiten each feature
    patches -= patches.mean(axis=0)
    patches /= numpy.std(patches, axis=0)
    #dico = MiniBatchDictionaryLearning(n_atoms=projection_dimensions, alpha=1, n_iter=500)
    #fit = dico.fit(patches)
    ica = FastICA(n_components=projection_dimensions)
    ica.fit(patches)
    return ica

When LIMIT is large, I get a memory error. Is there an online (incremental) alternative to ICA in scikit or some other Python package?
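As a side note, one way to bound memory before even considering an online estimator is to sample a fixed number of patches instead of extracting all of them and slicing with `LIMIT`. A minimal sketch using the `max_patches` parameter of `extract_patches_2d` (the random image is just a stand-in for real data):

```python
import numpy as np
from sklearn.feature_extraction.image import extract_patches_2d

rng = np.random.RandomState(0)
image = rng.rand(64, 64)  # stand-in for a real grayscale image

# max_patches caps memory use by randomly sampling patches
# instead of materializing every overlapping 16x16 window
patches = extract_patches_2d(image, (16, 16), max_patches=500, random_state=0)
patches = patches.reshape(patches.shape[0], -1)
print(patches.shape)  # (500, 256)
```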

No. Do you really need ICA filters? Have you tried the online `MiniBatchDictionaryLearning` or `MiniBatchKMeans`?
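For illustration, a minimal sketch of streaming patch batches through `MiniBatchDictionaryLearning.partial_fit`, so only one batch is in memory at a time (the random batches stand in for the whitened patches from the question; parameter values are arbitrary):

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.RandomState(0)
n_features = 16 * 16  # flattened 16x16 patches, as in the question
dico = MiniBatchDictionaryLearning(n_components=25, batch_size=100, random_state=0)

for _ in range(5):  # stream batches instead of holding all patches in memory
    batch = rng.randn(200, n_features)
    # same per-batch centering/whitening as in the question
    batch -= batch.mean(axis=0)
    batch /= batch.std(axis=0)
    dico.partial_fit(batch)

print(dico.components_.shape)  # (25, 256)
```

The learned dictionary atoms in `dico.components_` play the same role as the ICA filters: each row can be reshaped back to a 16x16 filter.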

Also, while `RandomizedPCA` is not strictly online, it can handle medium-to-large datasets if the number of components to extract is small.
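Note that in current scikit-learn releases `RandomizedPCA` has been folded into `PCA(svd_solver='randomized')`, and there is also `IncrementalPCA`, which is genuinely online via `partial_fit`. A minimal sketch of the incremental variant (random batches stand in for real patch data):

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.RandomState(0)
ipca = IncrementalPCA(n_components=25)

for _ in range(5):
    # each batch must have at least n_components samples
    batch = rng.randn(200, 256)
    ipca.partial_fit(batch)

print(ipca.components_.shape)  # (25, 256)
```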
