R - Long-format time series data: merge the rows for specific time points while ignoring each participant's other time points



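I'm looking for a solution, preferably a tidyverse one. I have longitudinal data in long format with 4 time points, and the variables I'm interested in (var1, var2, …) occasionally have missing values at different times. I'm looking for a way to merge the t1 and t2 row values for each id, only under the following conditions: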

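  • For columns starting with var, if the value at t1 is NA, use the corresponding value from t2.
  • Likewise, if the value at t2 is missing, fill it with the t1 value.
  • If both are NA, leave them as NA.
  • If t1 and t2 hold different values, leave them unchanged.
  • Ignore the values at t3 and t4.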
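Read column by column, these rules look like a mutual coalesce between the t1 and t2 values. A minimal sketch of that reading, using dplyr::coalesce on made-up vectors (t1 and t2 here are hypothetical standalone vectors, not columns of my data):

```r
library(dplyr)

# Hypothetical t1/t2 values of one variable for five participants
t1 <- c(NA, 1, 2, NA, 2)
t2 <- c(2, NA, NA, NA, 5)

# Each time point keeps its own value and only borrows when it is NA
new_t1 <- coalesce(t1, t2)
new_t2 <- coalesce(t2, t1)
```

Under this reading, the fourth pair stays NA at both time points, and the last pair (2 and 5) stays different.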

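I think I'm on the right track with a combination of group_by, starts_with, and summarise, but I can't get the exact desired output. Several other threads here are similar, but none of them handle the requirement of targeting specific time points while ignoring the others.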

Sample data:

df1
# A tibble: 16 x 4
id    time   var1  var2
<chr> <chr> <dbl> <dbl>
1 A1    t1       NA    NA
2 A1    t2        2     3
3 A1    t3        2     2
4 A1    t4        3     2
5 A2    t1        1     2
6 A2    t2       NA    NA
7 A2    t3        2    NA
8 A2    t4        1     2
9 A3    t1        2     3
10 A3    t2       NA     1
11 A3    t3       NA    NA
12 A3    t4        2     2
13 A4    t1       NA    NA
14 A4    t2       NA     1
15 A4    t3        1     1
16 A4    t4        1     1

Desired output:

df2
# A tibble: 16 x 4
id    time   var1  var2
<chr> <chr> <dbl> <dbl>
1 A1    t1        2     3
2 A1    t2        2     3
3 A1    t3        2     2
4 A1    t4        3     2
5 A2    t1        1     2
6 A2    t2        1     2
7 A2    t3        2    NA
8 A2    t4        1     2
9 A3    t1        2     3
10 A3    t2        2     1
11 A3    t3       NA    NA
12 A3    t4        2     2
13 A4    t1       NA     1
14 A4    t2       NA     1
15 A4    t3        1     1
16 A4    t4        1     1

Reprex data:

df1 <- structure(list(id = c("A1", "A1", "A1", "A1", "A2", "A2", "A2", 
"A2", "A3", "A3", "A3", "A3", "A4", "A4", "A4", "A4"), time = c("t1", 
"t2", "t3", "t4", "t1", "t2", "t3", "t4", "t1", "t2", "t3", "t4", 
"t1", "t2", "t3", "t4"), var1 = c(NA, 2, 2, 3, 1, NA, 2, 1, 2, 
NA, NA, 2, NA, NA, 1, 1), var2 = c(NA, 3, 2, 2, 2, NA, NA, 2, 
3, 1, NA, 2, NA, 1, 1, 1)), class = c("spec_tbl_df", "tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -16L), spec = structure(list(
cols = list(id = structure(list(), class = c("collector_character", 
"collector")), time = structure(list(), class = c("collector_character", 
"collector")), var1 = structure(list(), class = c("collector_double", 
"collector")), var2 = structure(list(), class = c("collector_double", 
"collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), skip = 1), class = "col_spec"))
df2 <- structure(list(id = c("A1", "A1", "A1", "A1", "A2", "A2", "A2", 
"A2", "A3", "A3", "A3", "A3", "A4", "A4", "A4", "A4"), time = c("t1", 
"t2", "t3", "t4", "t1", "t2", "t3", "t4", "t1", "t2", "t3", "t4", 
"t1", "t2", "t3", "t4"), var1 = c(2, 2, 2, 3, 1, 1, 2, 1, 2, 
2, NA, 2, NA, NA, 1, 1), var2 = c(3, 3, 2, 2, 2, 2, NA, 2, 3, 
1, NA, 2, 1, 1, 1, 1)), class = c("spec_tbl_df", "tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -16L), spec = structure(list(
cols = list(id = structure(list(), class = c("collector_character", 
"collector")), time = structure(list(), class = c("collector_character", 
"collector")), var1 = structure(list(), class = c("collector_double", 
"collector")), var2 = structure(list(), class = c("collector_double", 
"collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), skip = 1), class = "col_spec"))

Any help would be greatly appreciated!

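We can split off the t1 and t2 rows and use tidyr::fill(..., .direction = "updown") to apply the logic you want to them, then bind the result back to the t3 and t4 rows and sort everything into the original order: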

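library(dplyr)
library(tidyr)

bind_rows(
  # t1/t2 rows: fill each NA from the other time point within the same id
  df1 %>%
    group_by(id) %>%
    filter(time %in% c("t1", "t2")) %>%
    fill(c("var1", "var2"), .direction = "updown"),
  # t3/t4 rows: left untouched
  df1 %>%
    filter(time %in% c("t3", "t4"))
) %>%
  arrange(id, time) %>%
  ungroup()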
# A tibble: 16 x 4
id    time   var1  var2
<chr> <chr> <dbl> <dbl>
1 A1    t1        2     3
2 A1    t2        2     3
3 A1    t3        2     2
4 A1    t4        3     2
5 A2    t1        1     2
6 A2    t2        1     2
7 A2    t3        2    NA
8 A2    t4        1     2
9 A3    t1        2     3
10 A3    t2        2     1
11 A3    t3       NA    NA
12 A3    t4        2     2
13 A4    t1       NA     1
14 A4    t2       NA     1
15 A4    t3        1     1
16 A4    t4        1     1
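
If you would rather not split and re-bind the data, one possible variant is to give the t1 and t2 rows a shared grouping key so that fill() only works across those two rows. This is a sketch rather than a tested drop-in: grp is a hypothetical helper column name, starts_with("var") assumes all target columns share that prefix, and df1 is rebuilt from the values in the question so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Same values as df1 in the question
df1 <- tibble(
  id   = rep(c("A1", "A2", "A3", "A4"), each = 4),
  time = rep(c("t1", "t2", "t3", "t4"), times = 4),
  var1 = c(NA, 2, 2, 3, 1, NA, 2, 1, 2, NA, NA, 2, NA, NA, 1, 1),
  var2 = c(NA, 3, 2, 2, 2, NA, NA, 2, 3, 1, NA, 2, NA, 1, 1, 1)
)

result <- df1 %>%
  # t1 and t2 share the key "t12"; t3 and t4 each keep their own key,
  # so they end up in single-row groups that fill() cannot change
  group_by(id, grp = if_else(time %in% c("t1", "t2"), "t12", time)) %>%
  fill(starts_with("var"), .direction = "updown") %>%
  ungroup() %>%
  select(-grp)
```

Because nothing is filtered out or re-bound, the rows stay in their original order and no arrange() step is needed.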
