I wrote the following code for hierarchical clustering, but I get the error below. Can you help me?
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the Mall dataset with pandas
dataset = pd.read_csv("https://raw.githubusercontent.com/akbarhusnoo/Chronic-Kidney-Disease-Prediction/main/chronic_kidney_disease.csv", na_values=["?"])
catCols = dataset.select_dtypes("object").columns
catCols = list(set(catCols))
for i in catCols:
    dataset.replace({i: {'?': np.nan}}, regex=False, inplace=True)
dataset.dropna(how='all')
X = dataset.iloc[:, [3,4]].values
# Using the dendrogram to find the optimal number of clusters
import scipy.cluster.hierarchy as sch
dendrogram = sch.dendrogram(sch.linkage(X, method='ward' ))
plt.title('Dendrogram')
plt.xlabel('C')
plt.ylabel('Euclidean distances')
plt.show()
# Fitting the hierarchical clustering to the mall dataset
from sklearn.cluster import AgglomerativeClustering
hc = AgglomerativeClustering(n_clusters=5, affinity = 'euclidean', linkage = 'ward')
Y_hc = hc.fit_predict(X)
# Visualising the clusters
Dataset: https://raw.githubusercontent.com/akbarhusnoo/Chronic-Kidney-Disease-Prediction/main/chronic_kidney_disease.csv
ValueError Traceback (most recent call last)
<ipython-input-30-2c6a60c0a6d0> in <module>
12
13 import scipy.cluster.hierarchy as sch
---> 14 dendrogram = sch.dendrogram(sch.linkage(X, method='ward' ))
15 plt.title('Dendrogram')
16 plt.xlabel('C')
~\anaconda3\lib\site-packages\scipy\cluster\hierarchy.py in linkage(y, method, metric, optimal_ordering)
1063
1064 if not np.all(np.isfinite(y)):
-> 1065 raise ValueError("The condensed distance matrix must contain only "
1066 "finite values.")
1067
ValueError: The condensed distance matrix must contain only finite values.
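The input dataset contains question marks, which cause the affected columns to be read as strings rather than numbers.
You should either convert the question marks to NaN after reading the CSV, or remove them from the input CSV file itself (an empty cell in a CSV is interpreted as NaN, so replacing every ",?," with ",," can be quite effective).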
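As a minimal sketch of the first option, the question marks can be turned into NaN while the file is being parsed by passing na_values to read_csv. The three-row CSV and its column names below are a made-up stand-in for the real file, used only so the snippet runs without a download.

```python
import io

import pandas as pd

# Hypothetical three-row sample standing in for the real CSV;
# "?" marks a missing value, as in the dataset from the question.
sample_csv = io.StringIO(
    "age,bp,sg\n"
    "48,80,1.020\n"
    "7,?,1.020\n"
    "62,80,?\n"
)

# na_values=["?"] tells read_csv to treat "?" cells as NaN while parsing,
# so the affected columns are parsed as floats instead of strings.
sample = pd.read_csv(sample_csv, na_values=["?"])

print(sample.dtypes)        # dtype of each column after parsing
print(sample.isna().sum())  # number of NaN cells per column
```

Checking dtypes right after loading is a quick way to see whether any column was still read as object (string) because of leftover non-numeric cells.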
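Once that is done, you can drop the rows that contain NaN. Note:
- Some rows have NaN in only one column. Use dropna(how='any') rather than dropna(how='all') to make sure those rows are dropped as well.
- By default, dropna() does not operate in place (this is the default for most operations in current versions of pandas). Assign the result back to dataset, or pass inplace=True.
So, to drop the rows with NaN, use
dataset = dataset.dropna(how='any')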
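The two points above can be illustrated on a toy frame. This is only a sketch: the frame df and its columns are invented for the example and are not part of the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: row 1 has a single NaN, row 2 is entirely NaN.
df = pd.DataFrame({"a": [1.0, 2.0, np.nan], "b": [4.0, np.nan, np.nan]})

only_all_nan_dropped = df.dropna(how="all")  # drops row 2 only
any_nan_dropped = df.dropna(how="any")       # drops rows 1 and 2

# dropna returns a new DataFrame; df itself still has all three rows
# unless the result is assigned back (or inplace=True is used).
print(len(df), len(only_all_nan_dropped), len(any_nan_dropped))  # prints: 3 2 1
```

With how='all', the row that has just one missing value survives, which is why NaN can still reach the linkage call in the question's code.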
Try a different linkage method instead of "ward" (for example "single", "complete", "average", or "weighted"):
---> 14 dendrogram = sch.dendrogram(sch.linkage(X, method='ward'))
The Ward computation may produce infs or NaNs...
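Before switching methods, it may be worth checking whether the input already contains non-finite values, since the traceback shows linkage testing its input with np.isfinite before raising. The following is a rough sketch of such a check; the frame df and the names finite_rows and X_clean are hypothetical and only serve the illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for two columns of the dataset.
df = pd.DataFrame({"x": [1.0, 2.0, np.nan, 4.0], "y": [0.5, np.inf, 1.5, 2.5]})

X = df.to_numpy(dtype=float)

# True for rows where every entry is finite (neither NaN nor +/-inf).
finite_rows = np.isfinite(X).all(axis=1)
X_clean = X[finite_rows]

print(f"{(~finite_rows).sum()} of {len(X)} rows contain non-finite values")
```

If the count is non-zero for the real X, the error most likely comes from the input rather than from the linkage method, and an array filtered like X_clean should get past that particular check.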