Collapsing daily longitudinal data into monthly observations by ID in R



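I have more than 100 rows per subject, each representing a daily observation. I want to collapse the columns into monthly observations by subject ID (i.e. each ID keeps multiple rows, with the data summarised once every 30 rows (days)).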

How can I specify this kind of grouping by number of days using dplyr?

It is also worth noting that the total number of days differs between subjects.

Edit: data sample below

df<-structure(list(ID = structure(c(100087, 100087, 100087, 100087, 
100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087, 
100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087)), time = structure(c(0, 1, 2, 3, 
                                                   4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19)), 
BMI = structure(c(20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 
20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 
20.06, 20.06, 20.06, 20.06, 20.06)), Dis = structure(c(0, 
                      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), 
Drug1 = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1)), Drug2 = structure(c(1, 
            1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1))), row.names = c(NA, 
                                                                                      -20L), class = c("tbl_df", "tbl", "data.frame"))

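I would use group_by on ID and a new times variable, in which you can define bins of 30 rows with time %/% 30. Since your example data only has a few rows, I set the bin width to 5 instead. Because each subject has a different number of time points, we need to record time_first and time_last for each bin and then rewrite times as "x-y", where x and y are the first and last time in that bin.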
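As a quick illustration of how the binning works, here is a minimal sketch on a made-up time vector (not the OP's data). Integer division by the bin width gives every day in the same window the same bin index. It assumes time starts at 0 and has no gaps; if time started at 1, (time - 1) %/% 5 would be the equivalent.

```r
# Hypothetical time vector: 12 consecutive days, starting at 0
time <- 0:11

# Integer division by the bin width (5 here, 30 for monthly bins)
bin <- time %/% 5
bin
#> [1] 0 0 0 0 0 1 1 1 1 1 2 2
```

The last bin is shorter than the others (days 10 and 11 only), which is what happens when a subject's total number of days is not a multiple of the bin width.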

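In the across call you need to specify how the data should be aggregated; below I chose mean. If you want, say, the mean of BMI but the max of Drug1, you need to specify each column in a separate function call.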

library(dplyr)
df %>% 
  group_by(ID, times = time %/% 5) %>% 
  summarise(across(BMI:Drug2, mean),
            time_first = first(time),
            time_last = last(time)
  ) %>% 
  ungroup() %>% 
  mutate(times = paste0(time_first, "-", time_last)) %>% 
  select(-c(time_first, time_last))
#> `summarise()` has grouped output by 'ID'. You can override using the `.groups`
#> argument.
#> # A tibble: 4 × 6
#>       ID times   BMI   Dis Drug1 Drug2
#>    <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 100087 0-4    20.1     0     1     1
#> 2 100087 5-9    20.1     0     1     1
#> 3 100087 10-14  20.1     0     1     1
#> 4 100087 15-19  20.1     0     1     1
# OP's data
df <- structure(list(ID = structure(c(100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087)), time = structure(c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19)), BMI = structure(c(20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06)), Dis = structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), Drug1 = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)), Drug2 = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1))), row.names = c(NA, -20L), class = c("tbl_df", "tbl", "data.frame"))

Created on 2022-09-27 by the reprex package (v0.3.0)
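For the case mentioned above where different columns need different summaries, a sketch along the following lines might work. The toy data frame, its values, and the names toy, bin and monthly are made up for illustration and are not the OP's data. It uses two subjects with different numbers of days, so it also shows that the bins are formed within each ID.

```r
library(dplyr)

# Hypothetical toy data: subject 1 has 7 days, subject 2 has 12 days
toy <- tibble(
  ID    = c(rep(1, 7), rep(2, 12)),
  time  = c(0:6, 0:11),
  BMI   = c(rep(20, 7), rep(25, 12)),
  Drug1 = c(0, 0, 1, 0, 0, 0, 0, rep(0, 12))
)

monthly <- toy %>%
  group_by(ID, bin = time %/% 5) %>%   # use 30 instead of 5 for monthly bins
  summarise(
    BMI        = mean(BMI),            # average BMI within the bin
    Drug1      = max(Drug1),           # 1 if the drug was taken on any day
    time_first = first(time),
    time_last  = last(time),
    .groups    = "drop"                # return an ungrouped tibble
  ) %>%
  mutate(times = paste0(time_first, "-", time_last)) %>%
  select(ID, times, BMI, Drug1)

monthly
```

Setting .groups = "drop" replaces the separate ungroup() step and suppresses the grouping message from summarise().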
