How to use feature hashing correctly in Python

I have many arrays of the same dimension, for example

x = np.array([3,2,0,4,5,2,1...]) #the dimension of the vectors is above 50000 
y = np.array([1,3,4,2,4,1,4...])

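What I want to do is use feature hashing to reduce the dimensionality of these vectors (even though there will be collisions). The lower-dimensional vectors can then be used in a classifier.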

What I tried is

from sklearn.feature_extraction import FeatureHasher
hasher = FeatureHasher()
hash_vector = hasher.transform(x)

However, it seems FeatureHasher cannot be used directly like this; it fails with AttributeError: 'matrix' object has no attribute 'items'

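So what should I do next to get feature hashing working? If I am missing something, can anyone tell me what it is? Or is there another way to do feature hashing more efficiently?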

The argument to the transform method must be an iterable of samples, not a single sample; see http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.FeatureHasher.html .

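However, there are further problems with your code: you are not passing input_type when constructing the hasher, so it defaults to dict, that is, "dictionaries over (feature_name, value)" (hence the need for items).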
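To illustrate the two points above, here is a minimal sketch of the input that the default input_type="dict" expects: an iterable of samples, each one a dict mapping feature names to values. The feature names and n_features=8 are made up for the example and have nothing to do with your data.

```python
from sklearn.feature_extraction import FeatureHasher

# n_features=8 is an arbitrary small value, chosen only for readability.
hasher = FeatureHasher(n_features=8, input_type="dict")

# transform takes an iterable of samples; each sample is a dict of
# {feature_name: value}. These names are hypothetical.
samples = [
    {"dog": 1, "cat": 2},
    {"dog": 3, "fish": 1},
]

hashed = hasher.transform(samples)  # SciPy sparse matrix, one row per sample
print(hashed.shape)  # (2, 8)
```

FeatureHasher is stateless, so transform can be called without fitting first.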

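In any case, there is no input type that lets the hasher accept the "unnamed" features you seem to want to pass to transform. That is not how feature hashing works.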
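If each position in your vectors really is a distinct feature, one possible workaround is to name the features yourself, for instance by using the column index as the name. This is only a sketch under that assumption: the random data, the helper to_named, and n_features=1024 are all invented for illustration, and whether the hashed result is useful for your classifier is something you would have to check.

```python
import numpy as np
from sklearn.feature_extraction import FeatureHasher

# Made-up stand-in for the real data: 4 vectors with 50000 entries each.
rng = np.random.default_rng(0)
X = rng.integers(0, 6, size=(4, 50000))


def to_named(row):
    """Hypothetical helper: turn a dense row into {index_as_string: value}.

    Feature names must be strings; zero entries are skipped because they
    would contribute nothing to the hashed output.
    """
    return {str(i): float(v) for i, v in enumerate(row) if v != 0}


# n_features=1024 is an arbitrary target dimensionality.
hasher = FeatureHasher(n_features=1024, input_type="dict")
X_hashed = hasher.transform([to_named(row) for row in X])

print(X_hashed.shape)  # (4, 1024)
```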

You could consider a different approach to dimensionality reduction, for example http://scipy-lectures.github.io/advanced/scikit-learn/#dimension-reduction-with-principal-component-analysis...
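As a rough sketch of the PCA route, assuming scikit-learn's PCA class and random stand-in data (the sizes and n_components=10 are arbitrary and much smaller than your real vectors, to keep the example quick):

```python
import numpy as np
from sklearn.decomposition import PCA

# Made-up data: 100 samples with 500 features each.
rng = np.random.default_rng(0)
X = rng.integers(0, 6, size=(100, 500)).astype(float)

# n_components is the target dimensionality; it cannot exceed
# min(n_samples, n_features).
pca = PCA(n_components=10)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)  # (100, 10)
```

Unlike FeatureHasher, PCA has to be fitted on the data first, and the fitted pca object should then be reused to transform any new samples.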
