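I have a data frame (df) that contains user_names and text for each user. I also have a separate vector of important words (important_words). I want to write a for loop that goes through each user and counts how often the important words appear in that user's text.
Data: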
important_words = c("marcus", "yesterday", "democrat", "republican", "trump", "hillary")
df$user_names
[1] "marc12"
[2] "jon"
[3] "67han"
[4] "XXmark"
[5] "mark"
[6] "mark"
df$text
[1] "hi my name is marcus and i am a republican"
[2] "i support hillary"
[3] "go trump!"
[4] "tomorrow i will vote democrat"
[5] "i don't think so"
[6] "yesterday was ok"
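To make the example reproducible, the data shown above can be rebuilt roughly like this. It is only a sketch: the post shows the printed columns, so the assumption here is that df is an ordinary data.frame with two character columns.

```r
# Vector of important words, as given in the question.
important_words <- c("marcus", "yesterday", "democrat", "republican", "trump", "hillary")

# Assumed structure: a plain data.frame with character columns
# (the original post only shows the printed values).
df <- data.frame(
  user_names = c("marc12", "jon", "67han", "XXmark", "mark", "mark"),
  text = c(
    "hi my name is marcus and i am a republican",
    "i support hillary",
    "go trump!",
    "tomorrow i will vote democrat",
    "i don't think so",
    "yesterday was ok"
  ),
  stringsAsFactors = FALSE
)
```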
We can extract all important_words for each user_names value and count the number of unique important words each user has.
library(dplyr)
library(stringr)
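# Word boundaries must be written as "\\b" in an R string; "\b" is a backspace.
pattern <- str_c("\\b", tolower(important_words), "\\b", collapse = "|")
df %>%
  group_by(user_names) %>%
  summarise(unique_imp_word = n_distinct(unlist(str_extract_all(tolower(text), pattern))))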
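Since the question asked for a for loop, the same count can also be sketched in base R. The helper name count_unique_important is hypothetical (it is not part of the original answer or of any package), and it assumes df has the user_names and text columns shown in the question.

```r
# Hypothetical helper (not from the original answer): counts the distinct
# important words per user with an explicit for loop, using base R only.
count_unique_important <- function(df, important_words) {
  # One alternation pattern with word boundaries; note the doubled backslash.
  pattern <- paste0("\\b", tolower(important_words), "\\b", collapse = "|")
  users <- unique(df$user_names)
  result <- setNames(integer(length(users)), users)
  for (user in users) {
    # Pool all lower-cased texts written by this user.
    texts <- tolower(df$text[df$user_names == user])
    # gregexpr() finds every match; regmatches() extracts the matched words.
    hits <- unlist(regmatches(texts, gregexpr(pattern, texts, perl = TRUE)))
    result[[user]] <- length(unique(hits))
  }
  result
}
```

Calling count_unique_important(df, important_words) returns a named integer vector with one entry per distinct user, so the two "mark" rows are pooled before counting.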