Given these two R data frames, one holding free-form text and the other an arbitrary set of keywords:
library(dplyr)  # provides rename()

# Free-form phrases, one per row, in a column named `phrase`
df <- as.data.frame(c(
  "I have a social media account on Twitter",
  "I love cheese recipes on Facebook",
  "I love cheese recipes on Pinterest",
  "I am a social media marketer on Instagram who loves social media",
  "I love posting cheese recipes on social media",
  "Conspiracy theories are logical fallacies"
)) |>
  rename(phrase = 1)

# Keywords to count within each phrase
keyword_df <- as.data.frame(c(
  "social media",
  "cheese recipe",
  "tinfoil hat"
))
What is the simplest way to produce this result?
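phrase | social_media | cheese_recipe | tinfoil_hat |
---|---|---|---|
I have a social media account on Twitter | 1 | 0 | 0 |
I love cheese recipes on Facebook | 0 | 1 | 0 |
I am a social media marketer on Instagram who loves social media | 2 | 0 | 0 |
I love posting cheese recipes on social media | 1 | 1 | 0 |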
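# Needs dplyr (for %>% and mutate) and stringr to be installed.
# Each keyword is used as a regular expression and counted per phrase.
df %>%
  mutate(as.data.frame(lapply(
    setNames(nm = keyword_df[[1]]),
    function(z) lengths(stringr::str_extract_all(phrase, z))
  )))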
# phrase social.media cheese.recipe tinfoil.hat
# 1 I have a social media account on Twitter 1 0 0
# 2 I love cheese recipes on Facebook 0 1 0
# 3 I love cheese recipes on Pinterest 0 1 0
# 4 I am a social media marketer on Instagram who loves social media 2 0 0
# 5 I love posting cheese recipes on social media 1 1 0
# 6 Conspiracy theories are logical fallacies 0 0 0
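Note that the output above has `social.media`-style names, because `as.data.frame()` sanitizes column names, while the question asks for `social_media`. The following is a minimal base-R sketch of one way to get underscore names, assuming the keywords should be matched as literal strings rather than as regular expressions. The helper `count_fixed` and the vectors `phrases` and `keywords` are hypothetical names introduced here for illustration; they are not part of the original question or answer.

```r
# Same data as in the question, kept as plain character vectors
phrases <- c(
  "I have a social media account on Twitter",
  "I love cheese recipes on Facebook",
  "I love cheese recipes on Pinterest",
  "I am a social media marketer on Instagram who loves social media",
  "I love posting cheese recipes on social media",
  "Conspiracy theories are logical fallacies"
)
keywords <- c("social media", "cheese recipe", "tinfoil hat")

# Hypothetical helper: count non-overlapping literal occurrences of
# `keyword` in each element of `text`. gregexpr() returns -1 when
# there is no match, so only positive positions are counted.
count_fixed <- function(text, keyword) {
  hits <- gregexpr(keyword, text, fixed = TRUE)
  vapply(hits, function(h) sum(h > 0L), integer(1))
}

# One column of counts per keyword (a 6 x 3 integer matrix)
counts <- vapply(
  keywords,
  function(k) count_fixed(phrases, k),
  integer(length(phrases))
)

# Replace spaces with underscores to match the requested column names
colnames(counts) <- gsub(" ", "_", keywords, fixed = TRUE)

# check.names = FALSE keeps the names exactly as set above
result <- data.frame(phrase = phrases, counts, check.names = FALSE)
print(result)
```

With `fixed = TRUE`, characters such as `.` or `+` in a keyword are matched literally, which may or may not be what you want; drop it if the keywords are meant to be patterns.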