
R - How to get all possible two-word combinations and their frequencies without the tm package

Updated: 2023-09-13
Original English title: r - How to get all possible 2 words combinations with their frequency without tm package

I have text like this:

dat<-c("this is my farm this is my land")

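I want to get all possible two-word combinations and their frequencies. I can't use the tm package, so any other solution would be appreciated. The output should look like this: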

two words freq
this is     2
is my       2
my farm     1
my land     1

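You can generate the combinations by splitting dat into words and then pasting together each pair of consecutive words. gregexpr can then be used to count how many times each pair occurs.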

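# Split the text into individual words
temp = unlist(strsplit(dat, " "))
# Paste each pair of consecutive words together and keep the distinct pairs
temp2 = unique(sapply(2:length(temp), function(i)
  paste(temp[(i-1):i], collapse = " ")))
# Count occurrences of each pair in the original text
# (fixed = TRUE matches literally; note these are substring matches, not whole words)
sapply(temp2, function(x)
  length(unlist(gregexpr(pattern = x, text = dat, fixed = TRUE))))
#  this is     is my   my farm farm this   my land 
#        2         2         1         1         1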
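A possible alternative, sketched under the assumption that words are separated by whitespace: pair each word with the word that follows it and let base R's table() do the counting. Because this counts word-aligned pairs rather than substring matches, a pattern such as "is my" cannot accidentally match inside a longer word, and no regular expressions are built from the data. The column names two_words and freq are chosen here only to mirror the requested output.

```r
dat <- c("this is my farm this is my land")

# Split on runs of whitespace into a vector of words
words <- unlist(strsplit(dat, "\\s+"))

# Pair word i with word i + 1: all words but the last, with all words but the first
pairs <- paste(head(words, -1), tail(words, -1))

# table() counts each distinct pair; sort so the most frequent pairs come first
counts <- sort(table(pairs), decreasing = TRUE)

# Reshape into a two-column data frame like the requested output
result <- data.frame(two_words = names(counts),
                     freq = as.integer(counts),
                     stringsAsFactors = FALSE)
print(result)
```

Unlike the expected output in the question, this also reports "farm this", since that pair does occur in the text.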

Or, for three-word combinations:

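# Split the text into individual words
temp = unlist(strsplit(dat, " "))
# Paste each run of three consecutive words together and keep the distinct ones
temp2 = unique(sapply(3:length(temp), function(i)
  paste(temp[(i-2):i], collapse = " ")))
# Count occurrences in the original text (literal substring matches)
sapply(temp2, function(x)
  length(unlist(gregexpr(pattern = x, text = dat, fixed = TRUE))))
#  this is my   is my farm my farm this farm this is   is my land 
#           2            1            1            1            1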
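The two-word and three-word snippets differ only in the window size, so they can be folded into one function that takes n as a parameter. The sketch below uses a hypothetical helper named ngram_freq (not from any package) and counts word-aligned n-grams with table(), assuming whitespace-separated words.

```r
# Hypothetical helper: frequency of every sequence of n consecutive words
ngram_freq <- function(text, n = 2) {
  words <- unlist(strsplit(text, "\\s+"))
  if (length(words) < n) {
    # Too few words to form even one n-gram
    return(data.frame(ngram = character(0), freq = integer(0)))
  }
  # Starting position of every n-word window
  starts <- seq_len(length(words) - n + 1)
  grams <- vapply(starts, function(i)
    paste(words[i:(i + n - 1)], collapse = " "), character(1))
  # Count each distinct n-gram
  counts <- table(grams)
  data.frame(ngram = names(counts), freq = as.integer(counts),
             stringsAsFactors = FALSE)
}

dat <- c("this is my farm this is my land")
ngram_freq(dat, 2)
ngram_freq(dat, 3)
```

vapply is used instead of sapply so the result is guaranteed to be a character vector.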
