r - Fatal Error in tm (text mining) document term matrix creation

tm throws an error when I try to create a document-term matrix:

library(tm)
data(crude)
#control parameters
dtm.control <- list(
    tolower           = TRUE, 
    removePunctuation = TRUE,
    removeNumbers     = TRUE,
    stopWords         = stopwords("english"),
    stemming          = TRUE, # false for sentiment
    wordLengths       = c(3, "inf"))
dtm <- DocumentTermMatrix(corp, control = dtm.control)

Error:

Error in simple_triplet_matrix(i = i, j = j, v = as.numeric(v), nrow = length(allTerms),  :
  'i, j, v' different lengths
In addition: Warning messages:
1: In mclapply(unname(content(x)), termFreq, control) :
  all scheduled cores encountered errors in user code
2: In simple_triplet_matrix(i = i, j = j, v = as.numeric(v), nrow = length(allTerms),  :
  NAs introduced by coercion

What am I doing wrong? Also:

I am using these tutorials:

Basic Text Mining
Text Mining in R

Is there a better or more recent walkthrough?

You might consider making a few changes to the code, especially to removeStopWords and to how the corpus is created. The following worked for me:

library(tm)
data("crude")
#control parameters
dtm.control <- list(
  tolower           = TRUE, 
  removePunctuation = TRUE,
  removeNumbers     = TRUE,
  removestopWords   = TRUE,
  stemming          = TRUE, # false for sentiment
  wordLengths       = c(3, "inf"))
corp <- Corpus(VectorSource(crude))
dtm <- DocumentTermMatrix(corp, control = dtm.control)
> inspect(dtm)
<<DocumentTermMatrix (documents: 20, terms: 848)>>
Non-/sparse entries: 1877/15083
Sparsity           : 89%
Maximal term length: 16
Weighting          : term frequency (tf)
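
For comparison, here is a minimal sketch that uses the option names documented in ?termFreq, which is where the control options for DocumentTermMatrix are described. There the stop-word option is spelled stopwords (all lowercase), and wordLengths is a numeric vector, so Inf rather than the string "inf". The sketch assumes the SnowballC package is installed for stemming = TRUE, and it passes crude directly, since data("crude") already loads a corpus. The options(mc.cores = 1) line is an assumption on my part: it makes mclapply run serially, so if your tm version calls mclapply, the underlying error should be reported directly instead of the generic "all scheduled cores" warning. I have not confirmed that any of this is the cause of the error in the question.

```r
library(tm)
data("crude")   # a corpus of 20 Reuters articles that ships with tm

# Optional: run serially so an error inside termFreq is reported directly.
# Harmless if the installed tm version does not use mclapply.
options(mc.cores = 1)

# Control options spelled as documented in ?termFreq
dtm.control <- list(
  tolower           = TRUE,
  removePunctuation = TRUE,
  removeNumbers     = TRUE,
  stopwords         = TRUE,        # TRUE = default stop-word list for the document language
  stemming          = TRUE,        # needs SnowballC; false for sentiment
  wordLengths       = c(3, Inf))   # numeric bounds, not c(3, "inf")

dtm <- DocumentTermMatrix(crude, control = dtm.control)
inspect(dtm)
```

If stop-word removal takes effect, I would expect the term count to come out lower than in the output above, so comparing the two is a quick way to check whether an option name was recognized.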
