Using Quanteda KWIC results to count how often "not" occurs (R)



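I am trying to count how many times a keyword appears right after "not" in a text column that holds a large number of comments, in order to measure sentiment. To capture the word that follows a negation word, I used Quanteda's kwic and built a dtm of keywords from the post-keyword window. My problem is that the KWIC data frame is smaller than the original data frame, so I cannot match the counts back to the corresponding rows.

I have this: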

library(dplyr)
library(quanteda)

# Sample comments, one per row
text_column <- c("not safe", "not safe and not listening", "not safe never patient",
                 "safe", "not welcoming", "nice people",
                 "corporate culture school tacos", "successful words words coding",
                 "not scary")
test.df <- as.data.frame(text_column)

# Negation words to look for
notwords <- c("not", "never", "don't", "seldom", "won't")

# Dictionary of positive "safety" terms (glob patterns)
dictionary(list(possafety = c("open", "open-minded", "listen*", "safe*", "patien*",
                              "underst*", "willing to help", "helpful", "tight-knit",
                              "hear*", "engage*", "support*", "comfortable", "belong*",
                              "welcom*", "inclu*", "value", "respect*",
                              "always someone you can go to for questions", "accept*")))

You can use stringr::str_count from the stringr package, with paste(notwords, collapse = "|") building the pattern:

test.df$notpossafe <- stringr::str_count(test.df$text_column, 
paste(notwords, collapse = "|"))

Output:

#                      text_column notpossafe
# 1                       not safe          1
# 2     not safe and not listening          2
# 3         not safe never patient          2
# 4                           safe          0
# 5                  not welcoming          1
# 6                    nice people          0
# 7 corporate culture school tacos          0
# 8  successful words words coding          0
# 9                      not scary          1
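
Note that the count above includes every negation word, whether or not a safety term follows it ("not scary" counts as 1), and that the pattern also matches inside longer words. If the goal is to count only negations that are immediately followed by a dictionary term, one possible sketch in base R is shown below. The names safe_stems, pattern and not_then_safe are made up for illustration, and safe_stems is only a small subset of the possafety entries, with the glob wildcards rewritten by hand as regex prefixes.

```r
text_column <- c("not safe", "not safe and not listening", "not safe never patient",
                 "safe", "not welcoming", "nice people",
                 "corporate culture school tacos", "successful words words coding",
                 "not scary")
notwords <- c("not", "never", "don't", "seldom", "won't")

# Assumption: an illustrative subset of the possafety stems, as regex prefixes
safe_stems <- c("safe", "listen", "patien", "welcom")

# A negation word at a word boundary, whitespace, then a word starting with a stem
pattern <- paste0("\\b(", paste(notwords, collapse = "|"), ")\\s+(",
                  paste(safe_stems, collapse = "|"), ")")

# One count per original row, so the result lines up with test.df
matches <- gregexpr(pattern, text_column, perl = TRUE)
not_then_safe <- lengths(regmatches(text_column, matches))
```

Because the result has one element per input string, it can be assigned directly as a new column of test.df, which sidesteps the row mismatch with the KWIC data frame.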
