NotFittedError: Vocabulary not fitted or provided when extracting features with TF-IDF and n-gram models



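I'm trying to extract features from a CSV file of text data that has two columns, "Label" and "text_stemmed". A few days ago the project ran fine and produced output, but now it raises an error. I've looked for a solution but couldn't find one. I'm a Python beginner, so any help is appreciated.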

My code:

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import TfidfVectorizer
import random
df= pd.read_csv('updated1.csv', encoding='UTF-8')
df.head()
df.loc[df["Label"]=='Acquittal',"Label",]=0
df.loc[df["Label"]=='Convictal',"Label",]=1
df_x=df["text_stemmed"]
df_y=df["Label"]
cv = TfidfVectorizer(min_df=1,stop_words='english')
x_train, x_test, y_train, y_test = train_test_split(df_x, df_y, test_size=0.2, random_state=4)
cv = TfidfVectorizer(min_df=1,stop_words='english')
x_traincv = cv.fit_transform(["Hi How are you How are you doing","Hi what's up","Wow that's awesome"])

cv1 = TfidfVectorizer(min_df=1, ngram_range=(1,1), stop_words='english')
x_traincv=cv1.fit_transform(x_train)
cv1 = TfidfVectorizer(min_df=1,stop_words='english', ngram_range = ('1,1'))
a=x_traincv.toarray()
a
cv1.inverse_transform(a)
```

Error:

```
NotFittedError                            Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_5228/2571295384.py in <module>
----> 1 cv1.inverse_transform(a)
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in inverse_transform(self, X)
1270             List of arrays of terms.
1271         """
-> 1272         self._check_vocabulary()
1273         # We need CSR format for fast row manipulations.
1274         X = check_array(X, accept_sparse='csr')
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in _check_vocabulary(self)
470             self._validate_vocabulary()
471             if not self.fixed_vocabulary_:
--> 472                 raise NotFittedError("Vocabulary not fitted or provided")
473 
474         if len(self.vocabulary_) == 0:
NotFittedError: Vocabulary not fitted or provided
```
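The error can be reproduced without the CSV file. The sketch below (toy sentences only, not the asker's data) fits one `TfidfVectorizer` and then calls `inverse_transform` on a second, freshly constructed one, which is exactly the situation the traceback points at. `check_is_fitted` from `sklearn.utils.validation` is shown as an optional way to check an object before using it.

```python
from sklearn.exceptions import NotFittedError
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.utils.validation import check_is_fitted

docs = ["Hi How are you How are you doing", "Hi what's up", "Wow that's awesome"]

fitted = TfidfVectorizer(stop_words="english")
X = fitted.fit_transform(docs)  # fit_transform learns vocabulary_

unfitted = TfidfVectorizer(stop_words="english")  # new object, never fitted

raised = False
try:
    # Same call as in the question, but on the vectorizer that was never fitted
    unfitted.inverse_transform(X)
except NotFittedError as exc:
    raised = True
    print("NotFittedError:", exc)

# Raises NotFittedError for an unfitted estimator, returns None for a fitted one
check_is_fitted(fitted)
```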

You are re-creating the TfidfVectorizer here:

```python
cv1 = TfidfVectorizer(min_df=1, ngram_range=(1,1), stop_words='english')
x_traincv=cv1.fit_transform(x_train)
cv1 = TfidfVectorizer(min_df=1,stop_words='english', ngram_range = ('1,1'))
```

The second instance is what ends up in `cv1`, and that instance is never fitted, so it has no vocabulary.
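A minimal sketch of the fix: construct the vectorizer once, fit it, and call `inverse_transform` on that same object. Separately, `ngram_range=('1,1')` in the question is a string rather than a tuple of two integers; it should be `(1, 1)`. The three `x_train` strings below are hypothetical stand-ins for the stemmed text in `updated1.csv`.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in for x_train (the real rows come from updated1.csv)
x_train = pd.Series([
    "court acquit defend lack evid",
    "defend convict murder sentenc",
    "appeal court uphold convict",
])

# Create the vectorizer once; ngram_range is a tuple of ints, not the string '1,1'
cv1 = TfidfVectorizer(min_df=1, ngram_range=(1, 1), stop_words="english")
x_traincv = cv1.fit_transform(x_train)  # fit and transform with the same object

# Works because cv1 is the instance that was fitted (sparse or dense input is accepted)
terms_per_doc = cv1.inverse_transform(x_traincv)
for terms in terms_per_doc:
    print(sorted(terms))
```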

Which data do you want to use to train the model?
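A common answer is to fit the vectorizer on the training split only, then reuse that same fitted object to `transform` (not `fit_transform`) the test split, so both matrices share one vocabulary. The sketch below assumes a small hypothetical DataFrame in place of `updated1.csv`, labels already encoded as 0/1, and uses `ngram_range=(1, 2)` to add bigrams. `MultinomialNB` is used only because the question already imports it.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data standing in for updated1.csv (0 = Acquittal, 1 = Convictal)
df = pd.DataFrame({
    "text_stemmed": [
        "court acquit defend lack evid", "jury acquit accus",
        "acquit charg dismiss", "defend found guilti acquit",
        "defend convict murder sentenc", "jury convict accus theft",
        "convict upheld appeal", "sentenc prison convict",
        "acquit insuffici proof", "convict fraud sentenc",
    ],
    "Label": [0, 0, 0, 0, 1, 1, 1, 1, 0, 1],
})

x_train, x_test, y_train, y_test = train_test_split(
    df["text_stemmed"], df["Label"], test_size=0.2, random_state=4
)

# Learn the vocabulary (unigrams and bigrams) from the training split only
vectorizer = TfidfVectorizer(min_df=1, ngram_range=(1, 2), stop_words="english")
x_train_tfidf = vectorizer.fit_transform(x_train)
# Reuse the same fitted vectorizer on the test split
x_test_tfidf = vectorizer.transform(x_test)

model = MultinomialNB()
model.fit(x_train_tfidf, y_train)
predictions = model.predict(x_test_tfidf)
print("test accuracy:", model.score(x_test_tfidf, y_test))
```

Refitting a new vectorizer on the test split would give it a different vocabulary and a different number of columns, which the trained model cannot use.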
