R - Neural network with character data



I'm using the neuralnet package to run a basic sentiment-analysis experiment.

The structure of my data is as follows:
'data.frame':   4442 obs. of  2 variables:
 $ comment_text: chr  "really briliant apptit's intuitive and informative giving all the information you could need and seemingly very accurate." "will not connect to gpstapp does not connect to gps no matter how long i have it on. i have gps set on high ac"| __truncated__ "wish this would interest more with google now to provide weekly or monthly summaries." "uselesstdoes not talk to gps on the phone. 20 minute run no data." ...
 $ rating      : int  5 1 5 1 4 5 4 3 4 5 ...

I split this data into a training set and a test set and ran the neural network as follows:

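# Split into training and test sets (nnsenti has 4442 rows, so the
# test set ends at nrow(nnsenti) rather than at row 4443)
senti_train <- nnsenti[1:3499, ]
senti_test <- nnsenti[3500:nrow(nnsenti), ]
library(neuralnet)
neuralmodel <- neuralnet(rating ~ comment_text, data = senti_train)
plot(neuralmodel)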

After running this, I get the following error:

Error in neurons[[i]] %*% weights[[i]] : 
requires numeric/complex matrix/vector arguments

How can I fix this? Being able to use the text itself as input is the important part.

I have since tokenized the text data, done some text cleaning with the tm package, and updated my code as follows:

library(tm)

# Build a tm corpus from the comments and store it back in the data frame
nnsenti$comment_text <- VCorpus(VectorSource(nnsenti$comment_text))

# Text cleaning
nnsenti$comment_text <- tm_map(nnsenti$comment_text, content_transformer(tolower))
nnsenti$comment_text <- tm_map(nnsenti$comment_text, removeNumbers)
nnsenti$comment_text <- tm_map(nnsenti$comment_text, removePunctuation)
nnsenti$comment_text <- tm_map(nnsenti$comment_text, removeWords, stopwords('english'))
nnsenti$comment_text <- tm_map(nnsenti$comment_text, removeWords, c('please', 'sad')) # additional words
nnsenti$comment_text <- tm_map(nnsenti$comment_text, stripWhitespace)

# Split into training and test sets
senti_train <- nnsenti[1:3499, ]
senti_test <- nnsenti[3500:nrow(nnsenti), ]
library(neuralnet)
neuralmodel <- neuralnet(rating ~ comment_text, data = senti_train)

Now I get this error:

Error in model.frame.default(formula.reverse, data) : 
  invalid type (list) for variable 'comment_text'
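The second error comes from model.frame(), which the traceback shows neuralnet calling. A minimal base-R reproduction, assuming (as the message suggests) that comment_text now holds a list instead of a character vector; the toy data frame df and its values are hypothetical:

```r
# Toy data frame whose text column is a list, standing in for a column
# that holds a tm corpus (an assumption about nnsenti)
df <- data.frame(rating = c(5L, 1L))
df$comment_text <- list("great app", "will not connect")

# model.frame() only accepts atomic columns, so a list column is rejected
msg <- tryCatch(model.frame(rating ~ comment_text, data = df),
                error = function(e) conditionMessage(e))
print(msg)

# Converting back to a plain character vector removes this particular error,
# although neuralnet still needs numeric inputs (the first error above)
df$comment_text <- vapply(df$comment_text, as.character, character(1))
str(model.frame(rating ~ comment_text, data = df))
```

So tokenizing and cleaning alone does not help: the column has to end up as numeric predictors.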

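It looks like you haven't normalized your data. At a minimum, everything you feed into a neural network must be numeric, and ideally it should also be scaled into a fixed range (usually [-1, 1] or [0, 1]).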
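A minimal sketch of scaling a numeric vector into [0, 1] or [-1, 1] with min-max scaling in base R. The helper rescale() is hypothetical, written only for this example; the input is the first ten ratings from the str() output in the question:

```r
# Min-max scaling: map a numeric vector linearly onto [lo, hi].
# rescale() is a hypothetical helper, not part of any package used above.
rescale <- function(x, lo = 0, hi = 1) {
  rng <- range(x, na.rm = TRUE)
  # Constant input: every value maps to lo (avoids dividing by zero)
  if (rng[1] == rng[2]) return(rep(lo, length(x)))
  lo + (x - rng[1]) * (hi - lo) / (rng[2] - rng[1])
}

ratings <- c(5, 1, 5, 1, 4, 5, 4, 3, 4, 5)  # first ten ratings shown by str()
scaled01 <- rescale(ratings)          # values in [0, 1]
scaled11 <- rescale(ratings, -1, 1)   # values in [-1, 1]
print(scaled01)
print(scaled11)
```

For a real train/test split, the minimum and maximum should come from the training set and be reused on the test set, so both are scaled the same way.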

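You can convert the text into numbers with one-hot encoding. Normalize numeric values (such as the rating) by dividing them by a maximum value: for example, if the maximum rating is 10, divide all ratings by 10. For ratings on a 1 to 5 scale, as the sample values above suggest, divide by 5.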
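A minimal base-R sketch of that advice, using a binary bag-of-words encoding (one 0/1 column per vocabulary word, i.e. a one-hot indicator per word) plus dividing the rating by its maximum. The three comments and their ratings are hypothetical stand-ins, and the divisor 5 assumes a 1 to 5 rating scale. On the real cleaned corpus you would more likely build the same kind of matrix with tm's DocumentTermMatrix():

```r
# Hypothetical mini-corpus standing in for the cleaned comment_text column
comments <- c("really brilliant app intuitive informative",
              "will not connect to gps",
              "useless does not talk to gps")
ratings <- c(5L, 1L, 1L)

# Split each comment into lowercase word tokens, dropping empty strings
tokens <- strsplit(tolower(comments), "[^a-z]+")
tokens <- lapply(tokens, function(t) unique(t[nzchar(t)]))

# Vocabulary: every distinct word across all comments
vocab <- sort(unique(unlist(tokens)))

# Binary bag-of-words matrix: row = comment, column = word, 1 if present
X <- t(vapply(tokens, function(t) as.numeric(vocab %in% t),
              numeric(length(vocab))))
colnames(X) <- vocab

# Normalise the target by its maximum (assumed 5) so it lies in (0, 1]
train <- data.frame(X, rating = ratings / 5)

# The formula lists every word column explicitly
predictors <- setdiff(names(train), "rating")
f <- as.formula(paste("rating ~", paste(predictors, collapse = " + ")))

# Fit only if neuralnet is installed
if (requireNamespace("neuralnet", quietly = TRUE)) {
  set.seed(1)
  nn <- neuralnet::neuralnet(f, data = train, hidden = 2)
}
```

With thousands of comments the vocabulary gets large, so dropping rare words (for example with tm's removeSparseTerms()) keeps the input layer manageable.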
