Simplest way to flag and count occurrences of strings in free-form text in R



Given these two R data frames, one holding free-form text and one holding arbitrary keywords:

library(dplyr)  # provides rename()

# Free-form text, one phrase per row
df <- as.data.frame(c(
  "I have a social media account on Twitter",
  "I love cheese recipes on Facebook",
  "I love cheese recipes on Pinterest",
  "I am a social media marketer on Instagram who loves social media",
  "I love posting cheese recipes on social media",
  "Conspiracy theories are logical fallacies"
)) |>
  rename(phrase = 1)

# Keywords to look for in each phrase
keyword_df <- as.data.frame(c(
  "social media",
  "cheese recipe",
  "tinfoil hat"
))

What is the simplest way to produce a result like this?

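phrase                                                            social_media  cheese_recipe  tinfoil_hat
I have a social media account on Twitter                                     1              0            0
I love cheese recipes on Facebook                                            0              1            0
I love cheese recipes on Pinterest                                           0              1            0
I am a social media marketer on Instagram who loves social media             2              0            0
I love posting cheese recipes on social media                                1              1            0
Conspiracy theories are logical fallacies                                    0              0            0
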
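# Requires dplyr (%>%, mutate) and stringr.
# For each keyword, count its matches in every phrase, then add the
# counts to df as one column per keyword.
df %>%
  mutate(as.data.frame(lapply(
    setNames(nm = keyword_df[[1]]),
    function(z) lengths(stringr::str_extract_all(phrase, z))
  )))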
#                                                             phrase social.media cheese.recipe tinfoil.hat
# 1                         I have a social media account on Twitter            1             0           0
# 2                                I love cheese recipes on Facebook            0             1           0
# 3                               I love cheese recipes on Pinterest            0             1           0
# 4 I am a social media marketer on Instagram who loves social media            2             0           0
# 5                    I love posting cheese recipes on social media            1             1           0
# 6                        Conspiracy theories are logical fallacies            0             0           0
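The output above names the columns social.media, cheese.recipe and tinfoil.hat, while the question asks for underscores. A minimal sketch of one possible variation follows. It assumes stringr is installed, rebuilds the question's data with explicit column names, and swaps in stringr::str_count() with fixed() so that keywords are matched literally rather than as regular expressions. The names keywords, counts and result are illustrative only and do not come from the original post.

```r
library(stringr)

# Same phrases and keywords as in the question (names chosen for illustration).
df <- data.frame(phrase = c(
  "I have a social media account on Twitter",
  "I love cheese recipes on Facebook",
  "I love cheese recipes on Pinterest",
  "I am a social media marketer on Instagram who loves social media",
  "I love posting cheese recipes on social media",
  "Conspiracy theories are logical fallacies"
))
keywords <- c("social media", "cheese recipe", "tinfoil hat")

# Name each keyword with its underscored form so the result columns
# come out as social_media, cheese_recipe, tinfoil_hat.
named_keywords <- setNames(keywords, gsub(" ", "_", keywords))

# One integer vector of counts per keyword; fixed() treats the keyword
# as a literal string instead of a regular expression.
counts <- lapply(named_keywords, function(kw) str_count(df$phrase, fixed(kw)))

# Bind the count columns to the original phrases.
result <- cbind(df, as.data.frame(counts))
print(result)
```

If only a flag is needed rather than a count, comparing a count column with zero (for example result$social_media > 0) gives a logical vector per phrase.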
