How to run K-NN on a bag-of-words model



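I have a training set and a test set (of equal size). I have built a bag-of-words model and I'm trying to run K-nearest neighbors on it, but I'm not sure how to do the fitting.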

Bag-of-words model:

from sklearn.feature_extraction.text import CountVectorizer
bow_vectorizer = CountVectorizer(max_features=100, stop_words='english')
bow = bow_vectorizer.fit(TrainData)
print(bow_vectorizer.vocabulary_)
bowTrain = bow_vectorizer.fit_transform(TrainData)
bowTest = bow_vectorizer.fit_transform(TestData)

I'm trying to run KNN on the bag-of-words model, but I'm not sure what to put in the "knn.fit" call:

from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors = 3)
knn.fit(bowTrain, ???? )
predict = knn.predict(bowTest[0:5000])

Answer:

from sklearn.feature_extraction.text import CountVectorizer
bow_vectorizer = CountVectorizer(max_features=100, stop_words='english')
X_train = TrainData
#y_train = your array of labels goes here (one label per training document, same order as X_train)
bowVect = bow_vectorizer.fit(X_train)

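You should fit the vectorizer once, on the training data, and reuse it for the test data. Refitting it (which is what fit_transform(TestData) does) builds a new vocabulary, so the test-set columns would no longer correspond to the same words as the training-set columns.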

bowTrain = bowVect.transform(X_train)
bowTest = bowVect.transform(TestData)
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors = 3)
knn.fit(bowTrain, y_train )
predict = knn.predict(bowTest[0:5000])
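To tie this together, here is a small end-to-end sketch on made-up data. The documents and the "spam"/"ham" labels are hypothetical; y_train is simply one label per training document, in the same order as the rows of bowTrain, which is what the second argument of knn.fit expects. The second half wraps the same two steps in make_pipeline, an optional design choice that guarantees the vectorizer is fitted only on the training data and reused unchanged at prediction time.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical training documents and their labels (one label per document)
train_docs = [
    "win a free prize now",
    "free money click now",
    "claim your free prize",
    "meeting moved to monday",
    "lunch with the team monday",
    "project meeting notes attached",
]
y_train = ["spam", "spam", "spam", "ham", "ham", "ham"]
test_docs = ["free prize waiting", "team meeting on monday"]

# Explicit version, mirroring the answer: fit on training data only
bow_vectorizer = CountVectorizer(max_features=100)
bowTrain = bow_vectorizer.fit_transform(train_docs)
bowTest = bow_vectorizer.transform(test_docs)  # same columns as bowTrain

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(bowTrain, y_train)  # features, then one label per row
predict = knn.predict(bowTest)
print(predict)  # ['spam' 'ham']

# Equivalent pipeline version: vectorizer + classifier fitted together
pipe = make_pipeline(CountVectorizer(max_features=100),
                     KNeighborsClassifier(n_neighbors=3))
pipe.fit(train_docs, y_train)
print(pipe.predict(test_docs))  # ['spam' 'ham']
```

Both versions use Euclidean distance on raw counts (the KNeighborsClassifier default); on real text you may want to try other distance metrics or TF-IDF weighting, but that is a separate choice from the fitting question.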
