ValueError: Iterable over raw text documents expected, string object received. Predicting new test data using tfidf and feature selection



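I built a model with the sklearn Naive Bayes classifier. I need to know how to predict the class of a sentence that comes from user input.

When I just hard-code the sentence, it works fine. It looks like this: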

new_sentence = ['its so broken']
# Transform the sentence into a TF-IDF matrix using the vectorizer fitted on the training data
new_testdata_tfidf = tfidf.transform(new_sentence)
# Apply the chi2 feature selection that I use after TF-IDF, so the same features are kept or removed
fit_feature_selection = selection.transform(new_testdata_tfidf)
# Then predict it. The predicted class is -1, which is the correct answer
predicted = classifier.predict(fit_feature_selection)

I need to type the text in by hand as input, so I used it like this:

new_sentence = input[('')] 
# I type in the same sentence: its so broken
# Transform the sentence into a TF-IDF matrix using the vectorizer fitted on the training data
new_testdata_tfidf = tfidf.transform(new_sentence)
# Apply the chi2 feature selection that I use after TF-IDF, so the same features are kept or removed
fit_feature_selection = selection.transform(new_testdata_tfidf)
predicted = classifier.predict(fit_feature_selection)

But it gives me this output:

  File "C:\Users\Myfile\OneDrive\Desktop\model.py", line 170, in <module>
    new_testdata_tfidf= tfidf.transform(new_sentence) 
  File "E:\anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 1898, in transform
    X = super().transform(raw_documents)
  File "E:\anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 1265, in transform
    "Iterable over raw text documents expected, "
ValueError: Iterable over raw text documents expected, string object received.

How can I fix this? Any help is really appreciated.

Have you tried passing the new sentence as a list?

new_testdata_tfidf = tfidf.transform([new_sentence])

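Your first example passes a list containing one string element, while the second passes just a bare string. transform expects an iterable of documents, so a bare string is rejected with this ValueError.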
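To make the difference concrete, here is a minimal sketch in plain Python. The helper name as_documents is hypothetical (it is not part of sklearn); it only wraps a bare string in a list so that later code always receives a collection of documents.

```python
def as_documents(text_or_texts):
    """Return a list of documents, wrapping a bare string in a list.

    Hypothetical helper for illustration only; not part of scikit-learn.
    """
    if isinstance(text_or_texts, str):
        # A bare string is one document, not a collection of documents.
        return [text_or_texts]
    return list(text_or_texts)


# Iterating over a bare string yields single characters, not documents.
chars = list("its so broken")

# Iterating over the wrapped value yields the whole sentence as one document.
docs = as_documents("its so broken")
```

With something like this, tfidf.transform(as_documents(new_sentence)) would accept either a single typed sentence or a list of sentences.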

If you are trying to get a list of strings from new_sentence = input[('')] in your code, you may need to replace it with

new_sentence = [input()]
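For reference, here is a rough end-to-end sketch of the same flow. The training sentences, the labels, k=3 and the function name predict_sentence are all made up for illustration; only the object names tfidf, selection and classifier come from the question, and I am assuming they are a TfidfVectorizer, a SelectKBest with chi2 and a MultinomialNB.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB

# Toy training data invented for this sketch (-1 = negative, 1 = positive).
train_texts = [
    "its so broken",
    "totally broken and useless",
    "works great",
    "really great product",
]
train_labels = [-1, -1, 1, 1]

# Fit the three steps on the training data, in the order the question uses them.
tfidf = TfidfVectorizer()
X_train = tfidf.fit_transform(train_texts)

selection = SelectKBest(chi2, k=3)  # k=3 is an arbitrary choice for the toy data
X_train_selected = selection.fit_transform(X_train, train_labels)

classifier = MultinomialNB()
classifier.fit(X_train_selected, train_labels)


def predict_sentence(sentence):
    """Predict the class of one sentence (hypothetical helper)."""
    # Wrap the single string in a list: transform expects an iterable of documents.
    features = tfidf.transform([sentence])
    selected = selection.transform(features)
    return classifier.predict(selected)[0]


# Passing the bare string reproduces the error from the question.
try:
    tfidf.transform("its so broken")
    raised = False
except ValueError:
    raised = True

# Passing the sentence through the helper works.
label = predict_sentence("its so broken")
```

In the real script the last line would be something like predict_sentence(input()), since input() returns a single string.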

Hope this helps.
