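I am collecting vaccination records from different countries. I want to add one column per vaccine used across those countries, with 1 or 0 indicating whether a country uses that vaccine.
Preview of the columns for the "vaccines" dataset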
list_4
OUTPUT
[1] "Pfizer/BioNTech" "Sputnik V" "Oxford/AstraZeneca" "Moderna" "Sinopharm/Beijing" "Sinovac" "Sinopharm/Wuhan"
[8] "Covaxin" "EpiVacCorona" "Johnson&Johnson"
[1] "character"
I got the final result with the hard-coded code below. Is there a way to use list_4 directly to achieve the same result?
data_6 <- data_5 %>%
mutate("Pfizer/BioNTech" = ifelse(str_detect(vaccines, "Pfizer/BioNTech"), 1, 0)) %>%
mutate("Sputnik V" = ifelse(str_detect(vaccines, "Sputnik V"), 1, 0)) %>%
mutate("Oxford/AstraZeneca" = ifelse(str_detect(vaccines, "Oxford/AstraZeneca"), 1, 0)) %>%
mutate("Moderna" = ifelse(str_detect(vaccines, "Moderna"), 1, 0)) %>%
mutate("Sinopharm/Beijing" = ifelse(str_detect(vaccines, "Sinopharm/Beijing"), 1, 0)) %>%
mutate("Sinovac" = ifelse(str_detect(vaccines, "Sinovac"), 1, 0)) %>%
mutate("Sinopharm/Wuhan" = ifelse(str_detect(vaccines, "Sinopharm/Wuhan"), 1, 0)) %>%
mutate("Covaxin" = ifelse(str_detect(vaccines, "Covaxin"), 1, 0)) %>%
mutate("EpiVacCorona" = ifelse(str_detect(vaccines, "EpiVacCorona"), 1, 0)) %>%
mutate("Johnson&Johnson" = ifelse(str_detect(vaccines, "Johnson&Johnson"), 1, 0))
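Perhaps the OP is looking for a way to get dummy columns that indicate, for each row, whether a given vaccine from the vaccine vector appears in that row's string.
A small reprex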
strings <- c('Oxford/AstraZeneca, Sputnik V', 'Moderna, Oxford/AstraZeneca')
vaccines <- c('Oxford/AstraZeneca', 'Sputnik V', 'Moderna')
df <- tibble::tibble(strings)
df
# A tibble: 2 × 1
strings
<chr>
1 Oxford/AstraZeneca, Sputnik V
2 Moderna, Oxford/AstraZeneca
Answer
library(purrr)
library(dplyr)
library(stringr)
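df %>%
  # For each vaccine name, str_detect() gives TRUE/FALSE per row; unary + turns that into 1/0.
  mutate(map_dfc(vaccines, ~+str_detect(strings, .x)) %>%
           # Name the new columns after the vaccines.
           set_names(vaccines))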
# A tibble: 2 × 4
  strings                       `Oxford/AstraZeneca` `Sputnik V` Moderna
  <chr>                                        <int>       <int>   <int>
1 Oxford/AstraZeneca, Sputnik V                    1           1       0
2 Moderna, Oxford/AstraZeneca                      1           0       1
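
The same idea can be applied to the names used in the question. The following is only a sketch: the real data_5 is not shown, so the data_5 and list_4 below are small hypothetical stand-ins, and the country column is invented for illustration. It also wraps each name in fixed(), an optional change so that the vaccine names are matched literally rather than interpreted as regular expressions.

```r
library(dplyr)
library(purrr)
library(stringr)

# Hypothetical stand-ins for the asker's objects (assumed shapes, not the real data).
list_4 <- c("Pfizer/BioNTech", "Sputnik V", "Moderna")
data_5 <- tibble(
  country  = c("A", "B"),
  vaccines = c("Moderna, Pfizer/BioNTech", "Sputnik V")
)

# set_names(list_4) names the vector by its own values, so map() returns
# a named list with one integer vector (1/0 per row) for each vaccine.
flags <- map(
  set_names(list_4),
  ~ as.integer(str_detect(data_5$vaccines, fixed(.x)))
)

# Bind the indicator columns onto the original data.
data_6 <- bind_cols(data_5, as_tibble(flags))
data_6
```

Because the list is named before the columns are built, no separate renaming step is needed afterwards, and adding a vaccine to list_4 adds a column without touching the rest of the code.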