Programmatically tally multiple-choice entries in a single column of an R data frame



Survey data often contains multiple-choice columns in which the entries are separated by commas, for example:

library("tidyverse")
my_survey <- tibble(
  id = 1:5,
  question.1 = 1:5,
  question.2 = c("Bus", "Bus, Walk, Cycle", "Cycle", "Bus, Cycle", "Walk")
)

It would be nice to have a function, multiple_choice_tally, that tallies the unique responses to a question:

my_survey %>%
  multiple_choice_tally(question = question.2)
### OUTPUT:
# A tibble: 3 x 2
  response count
  <chr>    <int>
1 Bus          3
2 Walk         2
3 Cycle        3

What is the most efficient and flexible way to build multiple_choice_tally, without any hard-coding?

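We can use separate_rows from the tidyr package to expand the contents of question.2, so that each comma-separated entry gets its own row. Since you are using the tidyverse, tidyr is already loaded by library("tidyverse") and we don't have to load it again. my_survey2 is the final output.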

my_survey2 <- my_survey %>%
  separate_rows(question.2) %>%
  count(question.2) %>%
  rename(response = question.2, count = n)
my_survey2
# A tibble: 3 × 2
  response count
  <chr>    <int>
1 Bus          3
2 Cycle        3
3 Walk         2
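One caveat worth sketching: the default `sep` of separate_rows is the regular expression "[^[:alnum:].]+", which splits on any run of non-alphanumeric characters, spaces included. That is fine for one-word answers such as "Bus", but an answer like "Park and ride" would be broken into three rows. The example below is a minimal sketch, assuming a hypothetical `travel` tibble with a hypothetical `mode` column (neither is from the question), that passes an explicit comma-based separator instead.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical data (not from the question): one answer contains spaces
travel <- tibble(
  id = 1:3,
  mode = c("Bus, Park and ride", "Park and ride", "Bus")
)

travel_tally <- travel %>%
  # Split on a comma plus any following whitespace, not on every space
  separate_rows(mode, sep = ",\\s*") %>%
  # Count each distinct answer and name the count column directly
  count(mode, name = "count") %>%
  rename(response = mode)

travel_tally
```

With this separator, "Park and ride" stays intact as a single response.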

Update: designing a function

We can convert the code above into a function, as follows.

multiple_choice_tally <- function(survey.data, question){
  # Capture the unquoted column name so it can be unquoted with !! below
  question <- enquo(question)
  survey.data2 <- survey.data %>%
    separate_rows(!!question) %>%
    count(!!question) %>%
    # Rename the two resulting columns regardless of the question's name
    setNames(., c("response", "count"))
  return(survey.data2)
}
my_survey %>%
  multiple_choice_tally(question = question.2)
# A tibble: 3 x 2
  response count
  <chr>    <int>
1 Bus          3
2 Cycle        3
3 Walk         2
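For readers on rlang 0.4.0 or later, the enquo() and !! pair can also be written with the embrace operator `{{ }}`. The following is a sketch of that variant rather than a replacement for the function above; the name `multiple_choice_tally_embrace` is made up for illustration, and the explicit `sep` is an added assumption that answers are separated by a comma and optional whitespace.

```r
library(dplyr)
library(tidyr)
library(tibble)

my_survey <- tibble(
  id = 1:5,
  question.1 = 1:5,
  question.2 = c("Bus", "Bus, Walk, Cycle", "Cycle", "Bus, Cycle", "Walk")
)

# Hypothetical variant: {{ question }} captures and unquotes in one step
multiple_choice_tally_embrace <- function(survey.data, question) {
  survey.data %>%
    # Assumed separator: a comma followed by optional whitespace
    separate_rows({{ question }}, sep = ",\\s*") %>%
    # Naming the grouping expression renames the column to "response"
    count(response = {{ question }}, name = "count")
}

embrace_result <- my_survey %>%
  multiple_choice_tally_embrace(question = question.2)
embrace_result
```

The column name is still passed unquoted, so the call site looks the same as before.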

My current solution to this problem is as follows:

multiple_choice_tally <- function(survey.data, question) {
  ## Require a sym for the RHS of !!response := if_else
  question_as_quo <- enquo(question)
  question_as_string <- quo_name(question_as_quo)
  target_question <- rlang::sym(question_as_string)
  ## Collate unique responses to the question
  unique_responses <- survey.data %>%
    select(!!target_question) %>%
    na.omit() %>%
    .[[1]] %>%
    strsplit(",") %>%
    unlist() %>%
    trimws() %>%
    unique()
  ## Extract responses to question
  question_tally <- survey.data %>%
    select(!!target_question) %>%
    na.omit()
  ## Iteratively create a column for each unique response
  invisible(lapply(unique_responses,
                   function(response) {
                     question_tally <<- question_tally %>%
                       mutate(!!response := if_else(str_detect(!!target_question, response), TRUE, FALSE))
                   }))
  ## Gather into tidy form (funs() is deprecated, so sum is passed directly)
  question_tally %>%
    summarise_if(is.logical, sum) %>%
    gather(response, value = count)
}

It can then be used as follows:

library("tidyverse")
library("rlang")
library("stringr")
my_survey <- tibble(
  id = 1:5,
  question.1 = 1:5,
  question.2 = c("Bus", "Bus, Walk, Cycle", "Cycle", "Bus, Cycle", "Walk")
)
my_survey %>%
  multiple_choice_tally(question = question.2)
### OUTPUT:
# A tibble: 3 x 2
  response count
  <chr>    <int>
1 Bus          3
2 Walk         2
3 Cycle        3
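One thing to watch with this approach: str_detect treats each response as a regular expression and matches substrings, so a response such as "Walk" would also be counted for an answer like "Walk to station", and responses containing regex metacharacters may misbehave. As a rough cross-check, the tally can be computed from the split tokens themselves. The sketch below uses only base R; the `answers` vector is a copy of question.2 with one NA appended for illustration.

```r
# Same answers as question.2, plus a missing value for illustration
answers <- c("Bus", "Bus, Walk, Cycle", "Cycle", "Bus, Cycle", "Walk", NA)

# Drop missing answers, split on literal commas, and trim whitespace
tokens <- trimws(unlist(strsplit(answers[!is.na(answers)], ",", fixed = TRUE)))

# Count exact tokens; no pattern matching is involved
tally <- as.data.frame(
  table(response = tokens),
  responseName = "count",
  stringsAsFactors = FALSE
)
tally
```

Because whole tokens are counted, an answer can never be credited to a response that merely appears inside it.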
