Suppose I have the following data:
d <- tibble::tribble(
~sit_comfy_sofa_1, ~sit_comfy_sofa_2, ~sit_comfy_sofa_3, ~sit_comfy_sofa_4, ~sit_comfy_couch_1, ~sit_comfy_couch_2, ~sit_comfy_couch_3, ~sit_comfy_couch_4, ~sit_comfy_settee_1, ~sit_comfy_settee_2, ~sit_comfy_settee_3, ~sit_comfy_settee_4,
1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L,
0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L,
0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L,
0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L
)
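This tibble has three column "categories": one for _sofa_, one for _couch_, and one for _settee_. I am trying to go through each category and build a new variable whose value depends on which column within that category is == 1.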
I wrote this function to try to do that:
cleaning_fcn <- function(.df, .x){
  .df %>%
    mutate(!!sym(paste0("explain_", .x)) := case_when(
      !!sym(paste0("sit_comfy_", .x, "_1")) == 1 ~ "Just better",
      !!sym(paste0("sit_comfy_", .x, "_2")) == 1 ~ "Nice shape",
      !!sym(paste0("sit_comfy_", .x, "_3")) == 1 ~ "Like the color",
      !!sym(paste0("sit_comfy_", .x, "_4")) == 1 ~ "Nice material"),
      !!sym(paste0("explain_", .x)) := factor(!!sym(paste0("explain_", .x)),
                                              levels = c("Just better", "Nice shape",
                                                         "Like the color", "Nice material")))
}
However, when I call it, I end up with a tibble three times the size of the original.
require(tidyverse)
purrr::map_dfr(
.x = tidyselect::all_of(c("sofa", "couch", "settee")),
.f = ~ cleaning_fcn(.df = d, .x))
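Can anyone see where I'm going wrong?
Essentially, I want to achieve the same thing as the code below, but ideally as a function (and with less repetition overall):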
d <- d %>%
  mutate(explain_sofa = case_when(
    sit_comfy_sofa_1 == 1 ~ "Just better",
    sit_comfy_sofa_2 == 1 ~ "Nice shape",
    sit_comfy_sofa_3 == 1 ~ "Like the color",
    sit_comfy_sofa_4 == 1 ~ "Nice material"),
    explain_sofa = factor(explain_sofa, levels = c("Just better", "Nice shape",
                                                   "Like the color", "Nice material")))
d <- d %>%
  mutate(explain_couch = case_when(
    sit_comfy_couch_1 == 1 ~ "Just better",
    sit_comfy_couch_2 == 1 ~ "Nice shape",
    sit_comfy_couch_3 == 1 ~ "Like the color",
    sit_comfy_couch_4 == 1 ~ "Nice material"),
    explain_couch = factor(explain_couch, levels = c("Just better", "Nice shape",
                                                     "Like the color", "Nice material")))
d <- d %>%
  mutate(explain_settee = case_when(
    sit_comfy_settee_1 == 1 ~ "Just better",
    sit_comfy_settee_2 == 1 ~ "Nice shape",
    sit_comfy_settee_3 == 1 ~ "Like the color",
    sit_comfy_settee_4 == 1 ~ "Nice material"),
    explain_settee = factor(explain_settee, levels = c("Just better", "Nice shape",
                                                       "Like the color", "Nice material")))
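With map_dfr, you are creating a list of data frames, one per category, which are then bound together by row. Each of those data frames is a full copy of d with one extra column, so the final data frame has three times as many rows as the original. One option is to use purrr::reduce, which passes the result of each call on to the next one: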
library(tidyverse)
purrr::reduce(.x = c("sofa", "couch", "settee"), .f = cleaning_fcn, .init = d)
#> # A tibble: 4 × 15
#> sit_comfy_sofa_1 sit_comfy_sofa_2 sit_comfy_sofa_3 sit_comfy_sofa_4
#> <int> <int> <int> <int>
#> 1 1 0 0 0
#> 2 0 0 0 1
#> 3 0 1 0 0
#> 4 0 0 1 0
#> # ℹ 11 more variables: sit_comfy_couch_1 <int>, sit_comfy_couch_2 <int>,
#> # sit_comfy_couch_3 <int>, sit_comfy_couch_4 <int>, sit_comfy_settee_1 <int>,
#> # sit_comfy_settee_2 <int>, sit_comfy_settee_3 <int>,
#> # sit_comfy_settee_4 <int>, explain_sofa <fct>, explain_couch <fct>,
#> # explain_settee <fct>
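
To make the difference between the two approaches concrete, here is a minimal sketch that reuses the data and cleaning_fcn from the question (copied in so it runs on its own). The names cats, pieces, by_reduce and by_hand are only illustrative and do not appear in the original post. It first shows what the map step produces before row-binding, then checks that reduce is the same as nesting the calls by hand.

```r
library(tidyverse)

# Data and helper copied from the question so the snippet is self-contained.
d <- tibble::tribble(
  ~sit_comfy_sofa_1, ~sit_comfy_sofa_2, ~sit_comfy_sofa_3, ~sit_comfy_sofa_4, ~sit_comfy_couch_1, ~sit_comfy_couch_2, ~sit_comfy_couch_3, ~sit_comfy_couch_4, ~sit_comfy_settee_1, ~sit_comfy_settee_2, ~sit_comfy_settee_3, ~sit_comfy_settee_4,
  1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L,
  0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L,
  0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L,
  0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L
)

cleaning_fcn <- function(.df, .x){
  .df %>%
    mutate(!!sym(paste0("explain_", .x)) := case_when(
      !!sym(paste0("sit_comfy_", .x, "_1")) == 1 ~ "Just better",
      !!sym(paste0("sit_comfy_", .x, "_2")) == 1 ~ "Nice shape",
      !!sym(paste0("sit_comfy_", .x, "_3")) == 1 ~ "Like the color",
      !!sym(paste0("sit_comfy_", .x, "_4")) == 1 ~ "Nice material"),
      !!sym(paste0("explain_", .x)) := factor(!!sym(paste0("explain_", .x)),
                                              levels = c("Just better", "Nice shape",
                                                         "Like the color", "Nice material")))
}

cats <- c("sofa", "couch", "settee")  # illustrative name

# map() always starts from the original d, so it returns one full copy of d
# per category, each with a single new explain_* column.
pieces <- purrr::map(cats, ~ cleaning_fcn(.df = d, .x))
length(pieces)                 # one data frame per category
nrow(dplyr::bind_rows(pieces)) # stacking the copies gives 3 * nrow(d) rows

# reduce() starts from .init = d and feeds each result into the next call,
# so the columns accumulate on the same four rows.
by_reduce <- purrr::reduce(cats, cleaning_fcn, .init = d)

# The same thing written out as nested calls.
by_hand <- cleaning_fcn(cleaning_fcn(cleaning_fcn(d, "sofa"), "couch"), "settee")

all.equal(by_reduce, by_hand)
dim(by_reduce)
```

Because cleaning_fcn takes the data frame as its first argument and the category as its second, it already matches the (accumulator, element) signature that reduce expects, which is why it can be passed as .f without a wrapper.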