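I'm trying to count how many times certain keywords appear directly after a negation word such as "not" in a large set of text comments, as a way to measure sentiment. To capture the words that follow a negation word, I used quanteda's kwic() and built a dtm of the keywords from the post-keyword window of the KWIC result. My problem is that the KWIC data frame is smaller than the original data frame, so I can't match the hits back to the corresponding rows.
I have this: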
library(dplyr)
library(quanteda)
# Sample comments, one per row
text_column <- c("not safe", "not safe and not listening", "not safe never patient",
                 "safe", "not welcoming", "nice people",
                 "corporate culture school tacos", "successful words words coding",
                 "not scary")
test.df <- data.frame(text_column = text_column, stringsAsFactors = FALSE)
# Negation words to look for
notwords <- c("not", "never", "don't", "seldom", "won't")
# Glob-style patterns for positive "safety" vocabulary
dictionary(list(possafety = c("open", "open-minded", "listen*", "safe*", "patien*", "underst*", "willing to help", "helpful", "tight-knit", "hear*", "engage*", "support*", "comfortable", "belong*", "welcom*", "inclu*", "value", "respect*", "always someone you can go to for questions", "accept*")))
You can use stringr::str_count from the stringr package together with paste and collapse = "|":
# Count the negation words in each row; the pattern is "not|never|don't|seldom|won't"
test.df$notpossafe <- stringr::str_count(test.df$text_column,
                                         paste(notwords, collapse = "|"))
Output:
#                     text_column notpossafe
# 1                      not safe          1
# 2    not safe and not listening          2
# 3        not safe never patient          2
# 4                          safe          0
# 5                 not welcoming          1
# 6                   nice people          0
# 7 corporate culture school tacos          0
# 8  successful words words coding          0
# 9                     not scary          1
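Note that the str_count call above counts every negation word, whether or not a safety keyword follows it, so "not scary" is counted as well. If the goal is to count only a negation word followed directly by one of the dictionary patterns, one possible sketch is below. It assumes the keyword must be the very next word, converts the glob patterns to regex by hand, and uses names (possafety, key_regex, neg_regex, pattern) that are made up for this illustration. It returns one count per input row, so the result always lines up with test.df.

```r
library(stringr)

text_column <- c("not safe", "not safe and not listening", "not safe never patient",
                 "safe", "not welcoming", "nice people",
                 "corporate culture school tacos", "successful words words coding",
                 "not scary")
test.df <- data.frame(text_column = text_column, stringsAsFactors = FALSE)

notwords <- c("not", "never", "don't", "seldom", "won't")

# Same glob patterns as the dictionary in the question (hypothetical variable name)
possafety <- c("open", "open-minded", "listen*", "safe*", "patien*", "underst*",
               "willing to help", "helpful", "tight-knit", "hear*", "engage*",
               "support*", "comfortable", "belong*", "welcom*", "inclu*", "value",
               "respect*", "always someone you can go to for questions", "accept*")

# Turn a trailing glob "*" into the regex "\w*" (assumes "*" only appears at the end)
key_regex <- sub("\\*$", "\\\\w*", possafety)

# negation word, whitespace, then a keyword, all anchored on word boundaries
neg_regex <- paste(notwords, collapse = "|")
pattern <- paste0("\\b(?:", neg_regex, ")\\s+(?:",
                  paste(key_regex, collapse = "|"), ")\\b")

# str_count is vectorised over rows, so there is no row mismatch to resolve
test.df$notpossafe <- str_count(test.df$text_column,
                                regex(pattern, ignore_case = TRUE))
test.df
```

With this pattern, "not scary" gets 0 because "scary" is not in the keyword list, while "not safe never patient" gets 2. Allowing intervening words (for example "not very safe") would need a looser pattern and is not covered by this sketch.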