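I am looking for a function to repeat the following operation. I have a large dataset p in which each row corresponds to one patient. Each patient has dates for multiple MRI scans and dates for multiple clinical follow-ups, with up to 20 follow-ups of each type per patient.
The column names below were generated automatically in REDCap, and I admit they are unnecessarily long.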
MRI dates
p$mr_daterd corresponds to the first MRI scan, the one used at diagnosis. Each subsequent MRI scan is denoted p$mr_daterd_fu1_v1 (first MRI follow-up), p$mr_daterd_fu1_v2 (second MRI follow-up), p$mr_daterd_fu1_v2_v3 (third MRI follow-up) ... p$mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15_v16_v17_v18_v19_v20 (20th MRI follow-up).
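Clinical follow-up
p$date_contact_1 corresponds to the first clinical follow-up. Each subsequent clinical follow-up is denoted p$date_contact_2 (second clinical follow-up), p$date_contact_3 (third clinical follow-up) ... p$date_contact_20 (20th clinical follow-up).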
Thus the final suffix _vX of an MRI column and the suffix _x of a clinical column refer to the same follow-up number, in sequence.
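I need to replace NA/missing dates in the clinical follow-ups specifically. That is, if p$date_contact_x is NA/missing, it should be replaced with the date of the corresponding xth MRI follow-up, à la p$date_contact_x = ifelse(is.na(p$date_contact_x), p$mr_daterd_vx, p$date_contact_x). However, rather than defining each follow-up date this way, I would like a function that repeats this for every follow-up.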
I am using dplyr, so a solution compatible with that package is preferred.
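One caveat with the ifelse() pattern, sketched here on two made-up Date vectors (contact and mri are illustrative names, not columns of p): base R's ifelse() drops the Date class and returns plain numbers, whereas dplyr's coalesce() keeps the result as dates.

```r
library(dplyr)

# Made-up vectors for illustration only; not taken from p.
contact <- as.Date(c("2012-05-01", NA))
mri     <- as.Date(c("2012-04-20", "2013-02-11"))

# ifelse() takes its attributes from the test, so the Date class is lost.
via_ifelse <- ifelse(is.na(contact), mri, contact)
class(via_ifelse)

# coalesce() fills the missing entry and keeps the Date class.
via_coalesce <- coalesce(contact, mri)
class(via_coalesce)
```

For a single pair of columns, coalesce(date_contact_x, mr_daterd_vx) is therefore probably the safer building block.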
Example
  id date_contact_3 date_contact_7 mr_daterd_fu1_v2_v3 mr_daterd_fu1_v2_v3_v4_v5_v6_v7
1 10           <NA>           <NA>                <NA>                      2009-03-16
2 14           <NA>           <NA>          2012-03-09                            <NA>
Expected output
  id date_contact_3 date_contact_7 mr_daterd_fu1_v2_v3 mr_daterd_fu1_v2_v3_v4_v5_v6_v7
1 10           <NA>     2009-03-16                <NA>                      2009-03-16
2 14     2012-03-09           <NA>          2012-03-09                            <NA>
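The example above can be rebuilt as a small data frame and filled by hand for follow-ups 3 and 7. This is only a sketch (toy and filled are illustrative names), and it shows exactly the per-column repetition that a general solution should remove.

```r
library(dplyr)

# Toy data mirroring the example: two patients, follow-ups 3 and 7 only.
toy <- data.frame(
  id = c(10, 14),
  date_contact_3 = rep(as.Date(NA), 2),
  date_contact_7 = rep(as.Date(NA), 2),
  mr_daterd_fu1_v2_v3 = as.Date(c(NA, "2012-03-09")),
  mr_daterd_fu1_v2_v3_v4_v5_v6_v7 = as.Date(c("2009-03-16", NA))
)

# One coalesce() per follow-up: missing clinical dates take the MRI date.
filled <- toy %>%
  mutate(
    date_contact_3 = coalesce(date_contact_3, mr_daterd_fu1_v2_v3),
    date_contact_7 = coalesce(date_contact_7, mr_daterd_fu1_v2_v3_v4_v5_v6_v7)
  )

filled
```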
Data sample including all 20 MRI and clinical follow-ups for 9 patients
p <- structure(list(id = 32:40, mr_daterd = structure(c(15271, 12958,
15236, 12467, 12958, 15125, 12958, 11541, 13696), class = "Date"),
mr_daterd_fu1_v1 = structure(c(15716, 15785, 15391, 16307,
15764, 15474, 15932, 11765, 13976), class = "Date"), mr_daterd_fu1_v2 = structure(c(16086,
16504, 15758, NA, 16602, 15836, 16652, 12169, 14389), class = "Date"),
mr_daterd_fu1_v2_v3 = structure(c(16451, NA, 16097, NA, NA,
16209, NA, 12538, 14821), class = "Date"), mr_daterd_fu1_v2_v3_v4 = structure(c(17323,
NA, 16511, NA, NA, 16564, NA, 12888, 15146), class = "Date"),
mr_daterd_fu1_v2_v3_v4_v5 = structure(c(18130, NA, 17974,
NA, NA, 17365, NA, 13241, 15496), class = "Date"), mr_daterd_fu1_v2_v3_v4_v5_v6 = structure(c(NA,
NA, NA, NA, NA, NA, NA, 13732, 16232), class = "Date"), mr_daterd_fu1_v2_v3_v4_v5_v6_v7 = structure(c(NA,
NA, NA, NA, NA, NA, NA, NA, 17308), class = "Date"), mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8 = structure(c(NA,
NA, NA, NA, NA, NA, NA, 15243, NA), class = "Date"), mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9 = structure(c(NA,
NA, NA, NA, NA, NA, NA, 15693, NA), class = "Date"), mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10 = structure(c(NA,
NA, NA, NA, NA, NA, NA, 16421, NA), class = "Date"), mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15_v16 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15_v16_v17 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15_v16_v17_v18 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15_v16_v17_v18_v19 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15_v16_v17_v18_v19_v20 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), date_contact_1 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), date_contact_2 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), date_contact_3 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), date_contact_4 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), date_contact_5 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), date_contact_6 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), date_contact_7 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), date_contact_8 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), date_contact_9 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), date_contact_10 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), date_contact_11 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), date_contact_12 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), date_contact_13 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), date_contact_14 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), date_contact_15 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), date_contact_16 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), date_contact_17 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), date_contact_18 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), date_contact_19 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), date_contact_20 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date")), row.names = c(NA,
-9L), class = "data.frame")
Here is a dplyr approach:
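library(dplyr)
library(stringr)

p %>%
  mutate(across(
    starts_with("date_contact"),
    ~ {
      # Follow-up number from the current column name, e.g. "date_contact_7" -> "v7"
      vn <- paste0("v", str_extract(cur_column(), "(?<=_)\\d+$"))
      # The MRI column whose name ends in that suffix
      x <- cur_data_all() %>%
        select(starts_with("mr_daterd") & ends_with(vn)) %>%
        pull()
      # Keep the clinical date where present, otherwise use the MRI date
      coalesce(., x)
    }
  ))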
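As an alternative sketch, the column pairing can be made explicit by building each MRI column name from the follow-up number. This avoids cur_data_all(), which newer dplyr releases deprecate. The helpers mri_col() and fill_contacts() are hypothetical names introduced here, and the naming rule is an assumption read off the column names in the data sample.

```r
library(dplyr)

# Hypothetical helper: name of the k-th MRI follow-up column,
# e.g. k = 1 -> "mr_daterd_fu1_v1", k = 3 -> "mr_daterd_fu1_v2_v3".
mri_col <- function(k) {
  if (k == 1) return("mr_daterd_fu1_v1")
  paste0("mr_daterd_fu1_", paste0("v", 2:k, collapse = "_"))
}

# Hypothetical helper: fill each date_contact_k from its MRI counterpart,
# skipping any pair whose columns are not both present in df.
fill_contacts <- function(df, n_followups = 20) {
  for (k in seq_len(n_followups)) {
    contact <- paste0("date_contact_", k)
    mri <- mri_col(k)
    if (all(c(contact, mri) %in% names(df))) {
      df[[contact]] <- coalesce(df[[contact]], df[[mri]])
    }
  }
  df
}

# Small made-up data frame to exercise the helpers (not the real p).
toy <- data.frame(
  id = 1:2,
  mr_daterd_fu1_v1 = as.Date(c("2013-01-11", "2013-03-21")),
  mr_daterd_fu1_v2_v3 = as.Date(c("2015-01-16", NA)),
  date_contact_1 = as.Date(c(NA, "2013-04-02")),
  date_contact_3 = rep(as.Date(NA), 2)
)

filled <- fill_contacts(toy)
```

With the real data this would be called as p %>% fill_contacts(). Existing clinical dates are left untouched, and only the missing ones take the MRI date.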
Output
id mr_daterd mr_daterd_fu1_v1 mr_daterd_fu1_v2 mr_daterd_fu1_v2_v3 mr_daterd_fu1_v2_v3_v4 mr_daterd_fu1_v2_v3_v4_v5 mr_daterd_fu1_v2_v3_v4_v5_v6 mr_daterd_fu1_v2_v3_v4_v5_v6_v7 mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8 mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9 mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10 mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11 mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12 mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13 mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14 mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15 mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15_v16 mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15_v16_v17 mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15_v16_v17_v18 mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15_v16_v17_v18_v19 mr_daterd_fu1_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15_v16_v17_v18_v19_v20 date_contact_1 date_contact_2 date_contact_3 date_contact_4 date_contact_5 date_contact_6 date_contact_7 date_contact_8 date_contact_9 date_contact_10 date_contact_11 date_contact_12 date_contact_13 date_contact_14 date_contact_15 date_contact_16 date_contact_17 date_contact_18 date_contact_19 date_contact_20
1 32 2011-10-24 2013-01-11 2014-01-16 2015-01-16 2017-06-06 2019-08-22 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> 2013-01-11 2014-01-16 2015-01-16 2017-06-06 2019-08-22 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 33 2005-06-24 2013-03-21 2015-03-10 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> 2013-03-21 2015-03-10 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 34 2011-09-19 2012-02-21 2013-02-22 2014-01-27 2015-03-17 2019-03-19 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> 2012-02-21 2013-02-22 2014-01-27 2015-03-17 2019-03-19 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
4 35 2004-02-19 2014-08-25 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> 2014-08-25 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5 36 2005-06-24 2013-02-28 2015-06-16 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> 2013-02-28 2015-06-16 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
6 37 2011-05-31 2012-05-14 2013-05-11 2014-05-19 2015-05-09 2017-07-18 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> 2012-05-14 2013-05-11 2014-05-19 2015-05-09 2017-07-18 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
7 38 2005-06-24 2013-08-15 2015-08-05 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> 2013-08-15 2015-08-05 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
8 39 2001-08-07 2002-03-19 2003-04-27 2004-04-30 2005-04-15 2006-04-03 2007-08-07 <NA> 2011-09-26 2012-12-19 2014-12-17 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> 2002-03-19 2003-04-27 2004-04-30 2005-04-15 2006-04-03 2007-08-07 <NA> 2011-09-26 2012-12-19 2014-12-17 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
9 40 2007-07-02 2008-04-07 2009-05-25 2010-07-31 2011-06-21 2012-06-05 2014-06-11 2017-05-22 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> 2008-04-07 2009-05-25 2010-07-31 2011-06-21 2012-06-05 2014-06-11 2017-05-22 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>