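I'm looking for a solution, preferably a tidy one. I have longitudinal data with 4 time points in long format, and occasionally my variables of interest (var1, var2, ...) have missing values at different times. I'm looking for a way to merge the t1 and t2 row values for each id, only under the following conditions:
- For the columns starting with var, if there is an NA at t1, use the corresponding value from t2.
- Likewise, if t2 has missing values, try to fill them with the t1 values.
- If both are NA, keep them as NA.
- If the values at t1 and t2 differ, leave them as they are.
- Ignore the values at t3 and t4.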
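I think I'm on the right track with some combination of the group_by, starts_with, and summarise functions, but I can't get the exact desired output. Several other threads here are similar, but they don't meet the requirement of restricting the fill to specific time points while ignoring the others.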
Sample data:
df1
# A tibble: 16 x 4
id time var1 var2
<chr> <chr> <dbl> <dbl>
1 A1 t1 NA NA
2 A1 t2 2 3
3 A1 t3 2 2
4 A1 t4 3 2
5 A2 t1 1 2
6 A2 t2 NA NA
7 A2 t3 2 NA
8 A2 t4 1 2
9 A3 t1 2 3
10 A3 t2 NA 1
11 A3 t3 NA NA
12 A3 t4 2 2
13 A4 t1 NA NA
14 A4 t2 NA 1
15 A4 t3 1 1
16 A4 t4 1 1
Desired output:
df2
# A tibble: 16 x 4
id time var1 var2
<chr> <chr> <dbl> <dbl>
1 A1 t1 2 3
2 A1 t2 2 3
3 A1 t3 2 2
4 A1 t4 3 2
5 A2 t1 1 2
6 A2 t2 1 2
7 A2 t3 2 NA
8 A2 t4 1 2
9 A3 t1 2 3
10 A3 t2 2 1
11 A3 t3 NA NA
12 A3 t4 2 2
13 A4 t1 NA 1
14 A4 t2 NA 1
15 A4 t3 1 1
16 A4 t4 1 1
Reprex dataset:
df1 <- structure(list(id = c("A1", "A1", "A1", "A1", "A2", "A2", "A2",
"A2", "A3", "A3", "A3", "A3", "A4", "A4", "A4", "A4"), time = c("t1",
"t2", "t3", "t4", "t1", "t2", "t3", "t4", "t1", "t2", "t3", "t4",
"t1", "t2", "t3", "t4"), var1 = c(NA, 2, 2, 3, 1, NA, 2, 1, 2,
NA, NA, 2, NA, NA, 1, 1), var2 = c(NA, 3, 2, 2, 2, NA, NA, 2,
3, 1, NA, 2, NA, 1, 1, 1)), class = c("spec_tbl_df", "tbl_df",
"tbl", "data.frame"), row.names = c(NA, -16L), spec = structure(list(
cols = list(id = structure(list(), class = c("collector_character",
"collector")), time = structure(list(), class = c("collector_character",
"collector")), var1 = structure(list(), class = c("collector_double",
"collector")), var2 = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
df2 <- structure(list(id = c("A1", "A1", "A1", "A1", "A2", "A2", "A2",
"A2", "A3", "A3", "A3", "A3", "A4", "A4", "A4", "A4"), time = c("t1",
"t2", "t3", "t4", "t1", "t2", "t3", "t4", "t1", "t2", "t3", "t4",
"t1", "t2", "t3", "t4"), var1 = c(2, 2, 2, 3, 1, 1, 2, 1, 2,
2, NA, 2, NA, NA, 1, 1), var2 = c(3, 3, 2, 2, 2, 2, NA, 2, 3,
1, NA, 2, 1, 1, 1, 1)), class = c("spec_tbl_df", "tbl_df", "tbl",
"data.frame"), row.names = c(NA, -16L), spec = structure(list(
cols = list(id = structure(list(), class = c("collector_character",
"collector")), time = structure(list(), class = c("collector_character",
"collector")), var1 = structure(list(), class = c("collector_double",
"collector")), var2 = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
Any help would be greatly appreciated!
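We can separate out the t1 and t2 rows, use tidyr::fill(..., .direction = "updown") within each id to apply the logic you want, and then bind the result back to the untouched t3 and t4 rows: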
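library(dplyr)
library(tidyr)

bind_rows(
  # t1/t2 rows: within each id, fill an NA from the other of the two rows
  df1 %>%
    filter(time %in% c("t1", "t2")) %>%
    group_by(id) %>%
    fill(starts_with("var"), .direction = "updown") %>%
    ungroup(),
  # t3/t4 rows pass through unchanged
  df1 %>%
    filter(time %in% c("t3", "t4"))
) %>%
  arrange(id, time)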
# A tibble: 16 x 4
id time var1 var2
<chr> <chr> <dbl> <dbl>
1 A1 t1 2 3
2 A1 t2 2 3
3 A1 t3 2 2
4 A1 t4 3 2
5 A2 t1 1 2
6 A2 t2 1 2
7 A2 t3 2 NA
8 A2 t4 1 2
9 A3 t1 2 3
10 A3 t2 2 1
11 A3 t3 NA NA
12 A3 t4 2 2
13 A4 t1 NA 1
14 A4 t2 NA 1
15 A4 t3 1 1
16 A4 t4 1 1
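
If you would rather not split and re-bind the data, the same rules could also be written as a grouped mutate over the var columns. This is only a sketch under some assumptions: fill_t1_t2 is a hypothetical helper name (not part of dplyr or tidyr), each id is assumed to have at most one t1 row and one t2 row, and the sample data is rebuilt as a plain data.frame so the block runs on its own.

```r
library(dplyr)

# Sample data copied from the question (one row per id/time is assumed).
df1 <- data.frame(
  id   = rep(c("A1", "A2", "A3", "A4"), each = 4),
  time = rep(c("t1", "t2", "t3", "t4"), times = 4),
  var1 = c(NA, 2, 2, 3, 1, NA, 2, 1, 2, NA, NA, 2, NA, NA, 1, 1),
  var2 = c(NA, 3, 2, 2, 2, NA, NA, 2, 3, 1, NA, 2, NA, 1, 1, 1)
)

# Hypothetical helper: for one id's values x, fill an NA at t1 from t2 and
# an NA at t2 from t1. Non-missing values and all other time points are
# left untouched.
fill_t1_t2 <- function(x, time) {
  v1 <- x[match("t1", time)]  # original t1 value (NA if there is no t1 row)
  v2 <- x[match("t2", time)]  # original t2 value (NA if there is no t2 row)
  x[time == "t1" & is.na(x)] <- v2
  x[time == "t2" & is.na(x)] <- v1
  x
}

result <- df1 %>%
  group_by(id) %>%
  mutate(across(starts_with("var"), ~ fill_t1_t2(.x, time))) %>%
  ungroup()
```

Because the helper only ever looks at the t1 and t2 positions, the original row order is preserved and no arrange() step is needed afterwards. The fill() approach above is shorter; this variant may be easier to adapt if the rule for combining the two time points changes.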