for loop: select users who use specific words more than x times in R



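I have a data frame (df) that contains user_names and the text for each user. I also have a separate set of important words. I want to create a for loop that iterates over each user and counts how often the important words appear in their text.

Data: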

important_words = c("marcus", "yesterday", "democrat", "republican", "trump", "hillary")
df$user_names 
[1] "marc12"
[2] "jon"
[3] "67han"
[4] "XXmark"
[5] "mark"
[6] "mark"
df$text
[1] "hi my name is marcus and i am a republican"
[2] "i support hillary"
[3] "go trump!"
[4] "tomorrow i will vote democrat"
[5] "i don't think so"
[6] "yesterday was ok"

We can extract all important_words for each value of user_names and count the number of unique important words each user has.

library(dplyr)
library(stringr)
# '\\b' is a regex word boundary; a single '\b' in an R string is a backspace
df %>%
  group_by(user_names) %>%
  summarise(unique_imp_word = n_distinct(unlist(str_extract_all(tolower(text),
    str_c('\\b', tolower(important_words), '\\b', collapse = "|")))))
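Since the question asks for a for loop and a threshold, here is a minimal base R sketch of that variant. It is not part of the answer above: the threshold `x`, the `counts` vector and the `selected` result are names assumed for illustration, and the data frame is rebuilt from the sample shown in the question. Note that it counts every occurrence of an important word, whereas the dplyr version counts distinct words.

```r
# Sample data rebuilt from the question (assumed layout: one row per message)
df <- data.frame(
  user_names = c("marc12", "jon", "67han", "XXmark", "mark", "mark"),
  text = c("hi my name is marcus and i am a republican",
           "i support hillary",
           "go trump!",
           "tomorrow i will vote democrat",
           "i don't think so",
           "yesterday was ok"),
  stringsAsFactors = FALSE
)
important_words <- c("marcus", "yesterday", "democrat",
                     "republican", "trump", "hillary")

# Single alternation pattern with word boundaries on both sides
pattern <- paste0("\\b(", paste(tolower(important_words), collapse = "|"), ")\\b")

x <- 1  # hypothetical threshold: keep users with more than x matches

users <- unique(df$user_names)
counts <- setNames(integer(length(users)), users)

for (u in users) {
  # All texts written by this user, lower-cased for case-insensitive matching
  user_text <- tolower(df$text[df$user_names == u])
  # Every important-word occurrence across those texts
  hits <- unlist(regmatches(user_text, gregexpr(pattern, user_text, perl = TRUE)))
  counts[u] <- length(hits)
}

# Users whose total number of important-word occurrences exceeds x
selected <- names(counts)[counts > x]
print(counts)
print(selected)
```

With the sample data and `x <- 1`, only marc12 exceeds the threshold, because that text contains both "marcus" and "republican". To count distinct words instead, replace `length(hits)` with `length(unique(hits))`.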
