Problem implementing sentiment analysis on the IMDB movie review data



I am implementing sentiment analysis on the IMDB movie review dataset, and I get a ValueError when making predictions with LinearSVC().

import re
import nltk

# stop is the collection of stopwords (defined elsewhere)

trainset,testset=dataloader(r'C:\Users\kkk\Desktop\nlp\aclImdb_v1\aclImdb')
trainset["text"]=trainset["text"].apply(lambda x:' '.join([word for word in x.split() if word not in (stop)] ))
trainset.iloc[1]["text"]
testset["text"]=testset["text"].apply(lambda x:' '.join([word for word in x.split() if word not in (stop)] ))
trainset["text"]=trainset["text"].apply(lambda x:x.lower())
replacebyspace=re.compile(r'[/(){}\[\]|@,;]')
badwords=re.compile('[^0-9a-z #+_]')     
testset["text"]=testset["text"].apply(lambda x:re.sub(replacebyspace," ",x))
trainset["text"]=trainset["text"].apply(lambda x:re.sub(replacebyspace," ",x))
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score
from sklearn.feature_extraction.text import TfidfVectorizer
nltk.download('wordnet')
stemmer = nltk.stem.WordNetLemmatizer()
tokenizer = nltk.tokenize.TreebankWordTokenizer()
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
trainfeatures=vectorizer.fit_transform(trainset["text"])
testfeatures=vectorizer.fit_transform(testset["text"])
model=LinearSVC()
model.fit(trainfeatures,trainset["sentiment"])
pred=model.predict(testfeatures)

I expected the model to work, but I get this error:

Traceback (most recent call last):
  File "<ipython-input-65-e537b07a6a6a>", line 3, in <module>
    pred=model.predict(testfeatures)
  File "C:\Users\kkk\Anaconda3\lib\site-packages\sklearn\linear_model\base.py", line 281, in predict
    scores = self.decision_function(X)
  File "C:\Users\kk\Anaconda3\lib\site-packages\sklearn\linear_model\base.py", line 262, in decision_function
    % (X.shape[1], n_features))
ValueError: X has 1860172 features per sample; expecting 1906325

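Replace this line:

testfeatures=vectorizer.fit_transform(testset["text"])

with this:

testfeatures=vectorizer.transform(testset["text"])

Calling fit_transform on the test set refits the vectorizer and builds a new vocabulary from the test text. That is why the test matrix has 1860172 columns while the model was trained on 1906325. transform reuses the vocabulary learned from the training set, so the column counts match.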
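
The fit-on-train, transform-on-test pattern can be sketched on a toy corpus. This is only an illustration, assuming scikit-learn is installed; the texts, labels, and variable names below are made up and stand in for trainset["text"], trainset["sentiment"], and testset["text"].

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Hypothetical toy data standing in for the IMDB train/test frames.
train_texts = ["a great movie", "a terrible movie", "great acting", "terrible plot"]
train_labels = [1, 0, 1, 0]
test_texts = ["great plot twist", "terrible acting overall"]

vectorizer = TfidfVectorizer(ngram_range=(1, 2))

# Learn the vocabulary and IDF weights from the training text only.
train_features = vectorizer.fit_transform(train_texts)

# Reuse that vocabulary for the test text; n-grams not seen in training are ignored.
test_features = vectorizer.transform(test_texts)

# Both matrices now have the same number of columns, so predict() accepts the input.
assert train_features.shape[1] == test_features.shape[1]

model = LinearSVC()
model.fit(train_features, train_labels)
pred = model.predict(test_features)

print(train_features.shape, test_features.shape, pred)
```

Another way to avoid this mistake is to wrap the vectorizer and the classifier in a sklearn.pipeline.Pipeline, since the pipeline fits the vectorizer only in fit() and calls transform at prediction time.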
