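I need some help solving this problem.
I am working with a large dataset that contains more than 20 binary cancer outcomes (cancer_{cancertype}) and the corresponding ages ({cancertype}_age). Some individuals are missing cancer phenotype information. If the cancer phenotype is missing, I want to set the age variable for that cancer type to NA. I have been trying to implement this with mutate(across()), but I am having trouble specifying the appropriate arguments.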
# load tidyverse lib
library(tidyverse)
# Set seed for reproducibility
set.seed(42)
# generate dataframe
cancer_ds <- data.frame(
  id = 1000:1009,
  cancer_a = rep(0:1, length = 10),
  cancer_b = c(rep(0, 3), NA, NA, 1, NA, rep(1, 3)),
  cancer_c = c(rep(0:1, each = 2, len = 6), rep(NA, 4)),
  a_age = sample(30:60, 10, FALSE),
  b_age = sample(30:60, 10, FALSE),
  c_age = sample(30:60, 10, FALSE)
)
cancer_ds
cancer_list <- paste("cancer",letters[seq(1:3)], sep = "_" )
cancer_list
# attempted code
out_ds <- cancer_ds %>%
mutate(across(ends_with("age"), ~replace(is.na(cancer_list)))
# expected output dataset
out_ds_exp <- cancer_ds %>%
  mutate(b_age = ifelse(b_age %in% c("43", "49", "47"), NA, b_age),
         c_age = ifelse(c_age %in% c("49", "31", "37", "32"), NA, c_age))
out_ds_exp
Thanks for your help!
Here is one option.
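cancer_ds %>%
  # Swap "<group>_<what>" names (e.g. "a_age") to "<what>_<group>" ("age_a")
  rename_with(~ str_replace_all(.x, "([a-z])_([a-z]{2,})", "\\2_\\1")) %>%
  # Wide to long: one row per id and group, with columns `cancer` and `age`
  pivot_longer(-id, names_to = c(".value", "grp"), names_sep = "_") %>%
  # Blank out age wherever the cancer phenotype is missing
  mutate(age = if_else(is.na(cancer), NA_integer_, age)) %>%
  # Long back to wide
  pivot_wider(names_from = grp, values_from = c(cancer, age))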
## A tibble: 10 x 7
# id cancer_a cancer_b cancer_c age_a age_b age_c
# <int> <dbl> <dbl> <dbl> <int> <int> <int>
# 1 1000 0 0 0 46 33 54
# 2 1001 1 0 0 34 54 56
# 3 1002 0 0 1 30 34 33
# 4 1003 1 NA 1 54 NA 34
# 5 1004 0 NA 0 39 NA 42
# 6 1005 1 1 0 33 55 57
# 7 1006 0 NA NA 47 NA NA
# 8 1007 1 1 NA 60 44 NA
# 9 1008 0 1 NA 44 32 NA
#10 1009 1 1 NA 36 38 NA
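Explanation: We first use rename_with to fix the inconsistent column names: you have both "<what>_<group>" (e.g. "cancer_a") and "<group>_<what>" (e.g. "a_age"). After that, it is a simple matter of reshaping multiple paired columns from wide to long. Then, wherever cancer is NA, we replace the age values with NA, and finally reshape from long back to wide.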
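If you would rather stay close to the mutate(across()) attempt from the question and skip the reshaping, the following is a minimal sketch of one way to do that. It assumes every age column is named "<type>_age" and has a matching "cancer_<type>" column. It uses cur_column() to get the name of the column currently being processed and the .data pronoun to look up the paired cancer column. This is an alternative to the approach above, not a replacement for it, and it keeps the original a_age / b_age / c_age column names.

```r
library(dplyr)

# Same toy data as in the question
set.seed(42)
cancer_ds <- data.frame(
  id = 1000:1009,
  cancer_a = rep(0:1, length = 10),
  cancer_b = c(rep(0, 3), NA, NA, 1, NA, rep(1, 3)),
  cancer_c = c(rep(0:1, each = 2, len = 6), rep(NA, 4)),
  a_age = sample(30:60, 10, FALSE),
  b_age = sample(30:60, 10, FALSE),
  c_age = sample(30:60, 10, FALSE)
)

out_ds <- cancer_ds %>%
  mutate(across(
    ends_with("_age"),
    # For column "b_age", build the name "cancer_b" and set age to NA
    # wherever that cancer column is NA (assumes the paired column exists)
    ~ replace(
      .x,
      is.na(.data[[paste0("cancer_", sub("_age$", "", cur_column()))]]),
      NA
    )
  ))

out_ds
```

Because the pairing is derived from the column names, the same call should cover all 20+ cancer types without listing them, as long as the naming convention holds.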