r - Set specific values in one set of variables to NA based on another set of variables



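I need some help with this problem.

I am working with a large dataset that contains more than 20 binary cancer outcomes (cancer_{cancertype}) and the corresponding ages ({cancertype}_age). Some individuals are missing cancer phenotype information. If the cancer phenotype is missing, I want to set the age variable for that cancer type to NA. I have been trying to do this with mutate(across()), but I am having trouble specifying the right arguments.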

# load tidyverse lib
library(tidyverse)
# Set seed for reproducibility
set.seed(42)
# generate dataframe
cancer_ds <- data.frame(
  id = 1000:1009,
  cancer_a = rep(0:1, length.out = 10),
  cancer_b = c(rep(0, 3), NA, NA, 1, NA, rep(1, 3)),
  cancer_c = c(rep(0:1, each = 2, length.out = 6), rep(NA, 4)),
  a_age = sample(30:60, 10, FALSE),
  b_age = sample(30:60, 10, FALSE),
  c_age = sample(30:60, 10, FALSE)
)
cancer_ds
cancer_list <- paste("cancer", letters[1:3], sep = "_")
cancer_list
# attempted code (does not work: replace() is missing its arguments)
out_ds <- cancer_ds %>%
  mutate(across(ends_with("age"), ~replace(is.na(cancer_list))))
# expected output dataset
out_ds_exp <- cancer_ds %>%
  mutate(b_age = ifelse(b_age %in% c("43", "49", "47"), NA, b_age),
         c_age = ifelse(c_age %in% c("49", "31", "37", "32"), NA, c_age))
out_ds_exp

Thanks for your help!

Here is one option.

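cancer_ds %>%
  # Swap "<group>_<what>" names such as "a_age" into "age_a"
  rename_with(~ str_replace_all(.x, "([a-z])_([a-z]{2,})", "\\2_\\1")) %>%
  # Wide to long: one row per id and group, with columns cancer and age
  pivot_longer(-id, names_to = c(".value", "grp"), names_sep = "_") %>%
  # Blank out age wherever the cancer phenotype is missing
  mutate(age = if_else(is.na(cancer), NA_integer_, age)) %>%
  # Long back to wide
  pivot_wider(names_from = grp, values_from = c(cancer, age))
# A tibble: 10 x 7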
#      id cancer_a cancer_b cancer_c age_a age_b age_c
#   <int>    <dbl>    <dbl>    <dbl> <int> <int> <int>
# 1  1000        0        0        0    46    33    54
# 2  1001        1        0        0    34    54    56
# 3  1002        0        0        1    30    34    33
# 4  1003        1       NA        1    54    NA    34
# 5  1004        0       NA        0    39    NA    42
# 6  1005        1        1        0    33    55    57
# 7  1006        0       NA       NA    47    NA    NA
# 8  1007        1        1       NA    60    44    NA
# 9  1008        0        1       NA    44    32    NA
#10  1009        1        1       NA    36    38    NA

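Explanation: we first use rename_with to fix the inconsistent column names: you have both "<what>_<group>" (e.g. "cancer_a") and "<group>_<what>" (e.g. "a_age"). After that, it is a simple matter of reshaping multiple paired columns from wide to long. We then replace the age values with NA wherever cancer is NA, and reshape back from long to wide. Note that the age columns come out named age_a, age_b and age_c rather than a_age, b_age and c_age.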
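If you would rather stay close to the mutate(across()) attempt from the question and keep the original column names, something along these lines may work. This is only a sketch: it assumes every age column is named <type>_age and has a partner named cancer_<type>, and the toy data frame and the names toy and out are made up for illustration.

```r
library(dplyr)

# Toy data with hypothetical values, following the question's naming scheme:
# cancer_<type> for the phenotype and <type>_age for the matching age.
toy <- data.frame(
  id       = 1:4,
  cancer_a = c(0, 1, NA, 1),
  cancer_b = c(NA, 0, 1, NA),
  a_age    = c(40L, 51L, 37L, 60L),
  b_age    = c(45L, 33L, 58L, 49L)
)

out <- toy %>%
  mutate(across(
    ends_with("_age"),
    # cur_column() gives the current column name, e.g. "a_age";
    # strip "_age" and prepend "cancer_" to look up the paired phenotype column.
    ~ replace(.x, is.na(get(paste0("cancer_", sub("_age$", "", cur_column())))), NA)
  ))

out
```

Here a_age is set to NA only in row 3 and b_age in rows 1 and 4, which are the rows where the paired cancer column is missing. The lookup depends entirely on the naming convention, so a column that breaks the pattern would make get() fail.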
