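I am trying to fit a biclustering model, but it fails with an error saying the array contains infs or NaNs, even though I scanned the array with pd.isnull(DataFile).sum() and found nothing.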
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
from sklearn.datasets import samples_generator as sg
from sklearn.cluster.bicluster import SpectralCoclustering
from sklearn.metrics import consensus_score
DataFile = pd.read_csv("DatafilledProp.csv", sep='\t')
DataFile.drop(DataFile.columns[[0, 1]], axis=1, inplace=True)
plt.matshow(DataFile.as_matrix(), cmap=plt.cm.Blues)
plt.title("Original TransMapping")
data, row_idx, col_idx = sg._shuffle(DataFile.as_matrix(), random_state=0)
plt.matshow(data, cmap=plt.cm.Blues)
plt.title("Shuffled dataset")
plt.show()
Features=DataFile.values
model = SpectralCoclustering(n_clusters=10, random_state=0)
model.fit(Features)
This is the error I get:
  File "C:\Program Files (x86)\Microsoft Visual Studio 11.0\Common7\IDE\Extensions\Microsoft\Python Tools for Visual Studio\2.1\visualstudio_py_util.py", line 106, in exec_file
    exec_code(code, file, global_variables)
  File "C:\Program Files (x86)\Microsoft Visual Studio 11.0\Common7\IDE\Extensions\Microsoft\Python Tools for Visual Studio\2.1\visualstudio_py_util.py", line 82, in exec_code
    exec(code_obj, global_variables)
  File "D:\ClusteringDemo\DataPreparation.py\DataPreparation.py\Model.py", line 19, in <module>
    model.fit(Features)
  File "C:\Users\vinay.sawant\AppData\Local\Continuum\Anaconda\lib\site-packages\sklearn\cluster\bicluster\spectral.py", line 126, in fit
    self._fit(X)
  File "C:\Users\vinay.sawant\AppData\Local\Continuum\Anaconda\lib\site-packages\sklearn\cluster\bicluster\spectral.py", line 275, in _fit
    u, v = self._svd(normalized_data, n_sv, n_discard=1)
  File "C:\Users\vinay.sawant\AppData\Local\Continuum\Anaconda\lib\site-packages\sklearn\cluster\bicluster\spectral.py", line 139, in _svd
    **kwargs)
  File "C:\Users\vinay.sawant\AppData\Local\Continuum\Anaconda\lib\site-packages\sklearn\utils\extmath.py", line 299, in randomized_svd
    Q = randomized_range_finder(M, n_random, n_iter, random_state)
  File "C:\Users\vinay.sawant\AppData\Local\Continuum\Anaconda\lib\site-packages\sklearn\utils\extmath.py", line 226, in randomized_range_finder
    Q, R = linalg.qr(Y, mode='economic')
  File "C:\Users\vinay.sawant\AppData\Local\Continuum\Anaconda\lib\site-packages\scipy\linalg\decomp_qr.py", line 127, in qr
    a1 = numpy.asarray_chkfinite(a)
  File "C:\Users\vinay.sawant\AppData\Local\Continuum\Anaconda\lib\site-packages\numpy\lib\function_base.py", line 613, in asarray_chkfinite
    "array must not contain infs or NaNs")
ValueError: array must not contain infs or NaNs
Press any key to continue .
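This has already been answered here: https://stackoverflow.com/a/42764378/2649309
It may be a problem with the PCA implementation in scikit-learn 0.18.1.
See the bug report at https://github.com/scikit-learn/scikit-learn/issues/7568
The workaround described there is to use PCA with svd_solver='full'. So try this code: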
pipe = [('pca', PCA(whiten=True, svd_solver='full')),
        ('clf', lm)]
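For context, here is a minimal sketch of how that workaround could be wired into a scikit-learn Pipeline. The snippet above does not define lm, so this assumes it is a LinearRegression estimator, and the random data is purely illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

# Assumption: `lm` is a linear model; the original snippet does not define it.
lm = LinearRegression()

# svd_solver='full' requests an exact SVD instead of the randomized solver.
pipe = Pipeline([
    ("pca", PCA(whiten=True, svd_solver="full")),
    ("clf", lm),
])

# Illustrative data only: 50 samples, 5 features, a noiseless linear target.
rng = np.random.RandomState(0)
X = rng.rand(50, 5)
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0])

pipe.fit(X, y)
predictions = pipe.predict(X)
print(predictions.shape)
```

Note that this only changes which SVD solver PCA uses. It would not help if the input itself contains NaN or inf values.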
I was able to solve the problem.
pd.isnull(DataFile).sum() only checks for NaN values, as shown below:
import numpy as np
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4], [np.nan, 6]])
df
Out[12]:
     0  1
0    1  2
1    3  4
2  NaN  6
pd.isnull(df).sum()
Out[13]:
0    1
1    0
dtype: int64
But it does not check for inf, which, according to the error message, may also be present.
df3 = pd.DataFrame([[1, 2], [3, 4], [np.inf, 6]])
pd.isnull(df3).sum()
Out[23]:
0    0
1    0
dtype: int64
So I suspect the error is caused by inf rather than NaN. np.isinf does detect it:
import numpy as np
np.isinf(df3).sum()
Out[25]:
0    1
1    0
dtype: int64
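Putting the two checks together, a pre-fit sanity check might look like the sketch below. It assumes every column is numeric. The helper name count_non_finite and the sample frame are made up for illustration, and whether dropping the affected rows is appropriate depends on your data.

```python
import numpy as np
import pandas as pd


def count_non_finite(frame):
    """Return per-column counts of NaN and +/-inf values (hypothetical helper)."""
    values = frame.to_numpy(dtype=float)
    return pd.DataFrame(
        {
            "nan": np.isnan(values).sum(axis=0),
            "inf": np.isinf(values).sum(axis=0),
        },
        index=frame.columns,
    )


# Illustrative data only: one NaN in column 0, one inf in column 1.
sample = pd.DataFrame([[1, 2], [np.nan, 4], [5, np.inf]])
report = count_non_finite(sample)
print(report)

# One possible cleanup: turn +/-inf into NaN, then drop the affected rows.
cleaned = sample.replace([np.inf, -np.inf], np.nan).dropna()
assert np.isfinite(cleaned.to_numpy()).all()
```

Running the same report on DataFile before calling model.fit should show which columns hold the offending values.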