I am using scikit-learn for Nystroem kernel approximation. The main code is:
from time import time

import numpy as np
from sklearn import pipeline, svm
from sklearn.kernel_approximation import RBFSampler, Nystroem

# data_train, targets_train, data_test and targets_test are loaded
# earlier in the script (not shown here).

feature_map_fourier = RBFSampler(gamma=0.5, random_state=1)
feature_map_nystroem = Nystroem(gamma=0.5, random_state=1)
fourier_approx_svm = pipeline.Pipeline([("feature_map", feature_map_fourier),
                                        ("svm", svm.LinearSVC(C=4))])
nystroem_approx_svm = pipeline.Pipeline([("feature_map", feature_map_nystroem),
                                         ("svm", svm.LinearSVC(C=4))])

# fit and predict using the two approximate kernel SVMs:
sample_sizes = np.arange(1, 20)
print(sample_sizes)
fourier_scores = []
nystroem_scores = []
fourier_times = []
nystroem_times = []

for D in sample_sizes:
    avgtime = 0.0
    avgscore = 0.0
    avgftime = 0.0
    avgfscore = 0.0
    ns = []  # per-run Nystroem scores for this D
    fs = []  # per-run Fourier scores for this D
    for i in range(0, 10):
        # Rebuild both pipelines with a different random seed on each run.
        feature_map_fourier = RBFSampler(gamma=0.5, random_state=i)
        feature_map_nystroem = Nystroem(gamma=0.5, random_state=i)
        fourier_approx_svm = pipeline.Pipeline([("feature_map", feature_map_fourier),
                                                ("svm", svm.LinearSVC(C=1))])
        nystroem_approx_svm = pipeline.Pipeline([("feature_map", feature_map_nystroem),
                                                 ("svm", svm.LinearSVC(C=1))])

        nystroem_approx_svm.set_params(feature_map__n_components=D)
        nystroem_approx_svm.fit(data_train, targets_train)
        fourier_approx_svm.set_params(feature_map__n_components=D)
        fourier_approx_svm.fit(data_train, targets_train)

        # Only the scoring (prediction) step is timed.
        start = time()
        fourier_score = fourier_approx_svm.score(data_test, targets_test)
        avgftime += time() - start
        avgfscore += fourier_score

        start = time()
        nystroem_score = nystroem_approx_svm.score(data_test, targets_test)
        avgtime += time() - start
        avgscore += nystroem_score

        # Store the individual scores (not the running sums) so that the
        # standard deviations below are meaningful.
        ns.append(nystroem_score)
        fs.append(fourier_score)

    print('Nystroem ' + str(np.std(ns)))
    print('Fourier ' + str(np.std(fs)))
    nystroem_times.append(avgtime / 10.0)
    nystroem_scores.append(avgscore / 10.0)
    fourier_times.append(avgftime / 10.0)
    fourier_scores.append(avgfscore / 10.0)
I get the following error when I try to run this code:
C:\Users\t-sujain\Documents\LDKL BaseLine\Nystreom>forestNormalized_kernel_approx.py
522910
[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
Traceback (most recent call last):
  File "C:\Users\t-sujain\Documents\LDKL BaseLine\Nystreom\forestNormalized_kernel_approx.py", line 70, in <module>
    nystroem_approx_svm.fit(data_train, targets_train)
  File "F:\Python27\lib\site-packages\sklearn\pipeline.py", line 126, in fit
    Xt, fit_params = self._pre_transform(X, y, **fit_params)
  File "F:\Python27\lib\site-packages\sklearn\pipeline.py", line 116, in _pre_transform
    Xt = transform.fit_transform(Xt, y, **fit_params_steps[name])
  File "F:\Python27\lib\site-packages\sklearn\base.py", line 364, in fit_transform
    return self.fit(X, y, **fit_params).transform(X)
  File "F:\Python27\lib\site-packages\sklearn\kernel_approximation.py", line 470, in transform
    gamma=self.gamma)
  File "F:\Python27\lib\site-packages\sklearn\metrics\pairwise.py", line 808, in pairwise_kernels
    return func(X, Y, **kwds)
  File "F:\Python27\lib\site-packages\sklearn\metrics\pairwise.py", line 345, in rbf_kernel
    K = euclidean_distances(X, Y, squared=True)
  File "F:\Python27\lib\site-packages\sklearn\metrics\pairwise.py", line 148, in euclidean_distances
    XX = X.multiply(X).sum(axis=1)
  File "F:\Python27\lib\site-packages\scipy\sparse\compressed.py", line 251, in multiply
    return self._binopt(other,'_elmul_')
  File "F:\Python27\lib\site-packages\scipy\sparse\compressed.py", line 676, in _binopt
    data = np.empty(maxnnz, dtype=upcast(self.dtype,other.dtype))
MemoryError
I am using Cygwin on a system with 100 GB of RAM, so the machine should not be running out of memory. Can anyone help me?
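As per the discussion in the comments: this crash is caused by a quadratic over-allocation in the transform method when sparse data is passed as input. It has been fixed in the master branch, after the 0.14.1 release.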
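To get a feel for why a quadratic allocation can fail even on a machine with 100 GB of RAM, here is a rough back-of-the-envelope sketch. It rests on assumptions that are not stated in the post: that the `522910` printed by the script is the number of samples, and that the over-allocated buffer behaves like a dense `n_samples x n_samples` array of 8-byte floats. The actual allocation inside the library may differ in detail.

```python
# Assumption: 522910 is the number of samples. The script prints this
# value, but the post does not say what it represents.
n_samples = 522910
n_components = 19        # largest value of D used in the question
bytes_per_float64 = 8


def to_gib(n_bytes):
    """Convert a byte count to GiB."""
    return n_bytes / float(2 ** 30)


# Hypothetical quadratic buffer: one float64 per pair of samples.
quadratic_gib = to_gib(n_samples * n_samples * bytes_per_float64)

# What the Nystroem output needs: one float64 per (sample, component) pair.
linear_gib = to_gib(n_samples * n_components * bytes_per_float64)

print("n_samples x n_samples float64 array:    %.1f GiB" % quadratic_gib)
print("n_samples x n_components float64 array: %.3f GiB" % linear_gib)
```

Under these assumptions the quadratic buffer is far larger than 100 GB, while the transformed feature matrix itself fits comfortably in memory, which is consistent with a fix that removes the over-allocation rather than one that requires more RAM.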
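Also note: using an RBF kernel on high-dimensional sparse input is unlikely to help much. Sparse matrix representations are typically used for sparse, high-dimensional data such as bag-of-words features for text documents. For such data, a linear kernel is usually as good as or better than a non-linear one, so the Nystroem method is probably of little use in this case.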
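One way to check this on your own data is to fit a plain `LinearSVC` directly on the sparse matrix and compare it with the Nystroem pipeline. The sketch below uses a small synthetic sparse matrix as a stand-in for the real data; the shapes, density, labels and `n_components=100` are all assumptions made for illustration, and it targets a recent scikit-learn release (it imports `sklearn.model_selection`), not the 0.14 version from the question.

```python
import numpy as np
from scipy import sparse
from sklearn.kernel_approximation import Nystroem
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Synthetic stand-in for sparse, high-dimensional data (assumed shapes).
X = sparse.random(2000, 5000, density=0.01, format="csr", random_state=0)
rng = np.random.RandomState(0)
w = rng.randn(X.shape[1])
y = (X.dot(w) > 0).astype(int)  # labels from a random linear rule

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Baseline: linear SVM trained directly on the sparse input.
linear_svm = LinearSVC(C=1).fit(X_train, y_train)
linear_score = linear_svm.score(X_test, y_test)

# Same classifier on top of a Nystroem approximation of the RBF kernel.
nystroem_svm = Pipeline([
    ("feature_map", Nystroem(gamma=0.5, n_components=100, random_state=0)),
    ("svm", LinearSVC(C=1)),
]).fit(X_train, y_train)
nystroem_score = nystroem_svm.score(X_test, y_test)

print("linear SVM accuracy:   %.3f" % linear_score)
print("Nystroem SVM accuracy: %.3f" % nystroem_score)
```

Because the labels here come from a linear rule, this toy setup favors the linear model by construction; the point is only the comparison pattern, which you would rerun with `data_train`, `targets_train`, `data_test` and `targets_test` from the question.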