scikit-learn TypeError: only integer arrays with one element can be converted to an index

I get the following error when calling cosine_similarity:

numerator = sum(a*b for a,b in zip(x,y))
TypeError: only integer arrays with one element can be converted to an index

I am trying to build a keyword-keyword co-occurrence matrix from the document-keyword matrix returned by CountVectorizer.

I suspect cosine_similarity does not like the data type I am passing, but I am not sure what exactly the problem is. Here, n is of type scipy.sparse.csc.csc_matrix and y is of type scipy.sparse.csr.csr_matrix.

documents = (
    "The sky is blue",
    "The sun is bright",
    "The sun in the sky is bright",
    "We can see the shining sun, the bright sun"
)
countvectorizer = CountVectorizer()
y =  countvectorizer.fit_transform(documents)
n  = y.T.dot(y) 
x = n.tocsr()
x = x.toarray()
numpy.fill_diagonal(x, 0) 
result = cosine_similarity(x, "None")

The following snippet runs with sklearn's cosine_similarity and returns an answer that looks reasonable.

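import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = (
    "The sky is blue",
    "The sun is bright",
    "The sun in the sky is bright",
    "We can see the shining sun, the bright sun"
)
countvectorizer = CountVectorizer()
# Sparse document-term matrix (documents x terms)
y = countvectorizer.fit_transform(documents)
# Term-term co-occurrence counts, converted to a dense array
x = y.T.dot(y).toarray()
# Zero the diagonal (each term's co-occurrence with itself)
np.fill_diagonal(x, 0)
# Note: distance_metrics()['cosine'] is sklearn's cosine distance (1 - similarity),
# so cosine_similarity is imported directly instead
result = cosine_similarity(x, x)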
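
As a quick sanity check, here is a minimal sketch (assuming NumPy, SciPy and scikit-learn are installed) comparing `cosine_similarity` with the `'cosine'` entry of `distance_metrics()`. The small matrix `x` is made up for illustration and is not taken from the question. In scikit-learn the `'cosine'` entry is a distance, so it should equal 1 minus the similarity. The sketch also passes a sparse matrix, since the question wondered whether the input type was the problem.

```python
import numpy as np
from scipy import sparse
from sklearn.metrics.pairwise import cosine_similarity, distance_metrics

# Toy co-occurrence-style matrix (made-up values, for illustration only).
x = np.array([[0, 2, 1],
              [2, 0, 3],
              [1, 3, 0]], dtype=float)

# Pairwise cosine similarity between the rows of x.
sim = cosine_similarity(x)

# The 'cosine' entry of distance_metrics() is a cosine *distance*.
dist = distance_metrics()["cosine"](x, x)

# cosine_similarity also accepts SciPy sparse input and returns
# a dense array by default.
sim_sparse = cosine_similarity(sparse.csr_matrix(x))

# Manual cosine similarity between rows 0 and 1.
manual = x[0] @ x[1] / (np.linalg.norm(x[0]) * np.linalg.norm(x[1]))

print(np.allclose(sim + dist, 1.0))
print(np.allclose(sim, sim_sparse))
print(np.isclose(sim[0, 1], manual))
```

If a similarity is what you want, import `cosine_similarity` directly, or compute `1 - dist` when starting from the distance.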
