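I'm working with school data, and I need to duplicate the rows for schools that serve a wider grade span than just ES or HS. Some example code to illustrate my problem: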
library(dplyr)  # provides tibble(), %>%, filter(), mutate()

# data
schools <- tibble(name = c("school a", "school b", "school c", "school z"),
                  type = c("es", "NA", "hs", "es"),
                  gslo = c("01", "08", "09", "KG"),
                  gshi = c("12", "12", "12", "05"))
schools
name type gslo gshi
<chr> <chr> <chr> <chr>
1 school a es 01 12
2 school b NA 08 12
3 school c hs 09 12
4 school z es KG 05
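where gslo and gshi are the lowest and highest grades served, respectively. In the US, these schools would be classified as high, middle, or elementary schools, i.e. type.
Some schools serve more than just the elementary grades, but are currently counted only as type == "es".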
schools_attempt <- schools %>%
# add row based on condition and change type
# not generalized
rbind(schools %>% filter(gslo == "01", gshi == "12") %>% mutate(type = "hs"))
> schools_attempt
# A tibble: 5 x 4
name type gslo gshi
<chr> <chr> <chr> <chr>
1 school a es 01 12
2 school b NA 08 12
3 school c hs 09 12
4 school z es KG 05
5 school a hs 01 12
This works, but it isn't general. Is there a way to avoid a big case statement? Note the changed school type classification (es -> hs):
schools_want <- tibble(name = c("school a", "school b", "school c", "school z", "school a"),
type = c("es", "NA", "hs", "es", "hs"),
gslo = c("01", "08", "09", "KG", "01"),
gshi = c("12", "12", "12", "05", "12"))
> schools_want
# A tibble: 5 x 4
name type gslo gshi
<chr> <chr> <chr> <chr>
1 school a es 01 12
2 school b NA 08 12
3 school c hs 09 12
4 school z es KG 05
5 school a hs 01 12
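Thanks!
As a general approach, this may be enough: if a school starts at grade 9 or above, it's a high school; if it ends before grade 9, it's an elementary school; otherwise it serves both, and we can split it into two rows.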
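library(dplyr)
library(tidyr)  # separate_rows() comes from tidyr

schools %>%
  # convert grades to numbers, treating kindergarten ("KG") as grade 0;
  # as.numeric("KG") gives NA with a coercion warning, which if_else() replaces with 0
  mutate(across(gslo:gshi, ~ if_else(.x == "KG", 0, as.numeric(.x))),
         type2 = case_when(
           gslo >= 9 ~ "hs",
           gshi <= 8 ~ "es",
           TRUE ~ "hs, es"
         )) %>%
  # one row per type: "hs, es" becomes two rows
  separate_rows(type2)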
# A tibble: 6 x 5
name type gslo gshi type2
<chr> <chr> <dbl> <dbl> <chr>
1 school a es 1 12 hs
2 school a es 1 12 es
3 school b NA 8 12 hs
4 school b NA 8 12 es
5 school c hs 9 12 hs
6 school z es 0 5 es
Edit: If you want to keep the gslo/gshi columns as they are, add .names = "{.col}_num" as an argument to the across() call, and use gslo_num and gshi_num in case_when().
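For reference, here is a sketch of what that edited version might look like in full. It reuses the schools data from the question; the suppressWarnings() wrapper is my own addition (an assumption, not part of the answer above) to silence the coercion warning that as.numeric() raises on "KG".

```r
library(dplyr)
library(tidyr)

# same example data as in the question
schools <- tibble(name = c("school a", "school b", "school c", "school z"),
                  type = c("es", "NA", "hs", "es"),
                  gslo = c("01", "08", "09", "KG"),
                  gshi = c("12", "12", "12", "05"))

result <- schools %>%
  # write numeric copies to gslo_num / gshi_num; gslo and gshi stay character
  mutate(across(gslo:gshi,
                ~ if_else(.x == "KG", 0, suppressWarnings(as.numeric(.x))),
                .names = "{.col}_num"),
         type2 = case_when(
           gslo_num >= 9 ~ "hs",
           gshi_num <= 8 ~ "es",
           TRUE ~ "hs, es"
         )) %>%
  # one row per type: "hs, es" becomes two rows
  separate_rows(type2)

result
```

The original gslo and gshi values (including "KG" and the leading zeros) are preserved, and the helper columns can be dropped afterwards with select(-ends_with("_num")) if they are not needed.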