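I have more than 100 rows per subject, each representing a daily observation. I want to collapse the rows by subject ID into monthly observations (i.e. several rows per ID, with the data summarised every 30 rows (days)).
How can I specify this kind of day-based grouping with dplyr?
It is also worth noting that the total number of days differs between subjects.
Edit: data sample below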
df<-structure(list(ID = structure(c(100087, 100087, 100087, 100087,
100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087,
100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087)), time = structure(c(0, 1, 2, 3,
4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19)),
BMI = structure(c(20.06, 20.06, 20.06, 20.06, 20.06, 20.06,
20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06,
20.06, 20.06, 20.06, 20.06, 20.06)), Dis = structure(c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)),
Drug1 = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1)), Drug2 = structure(c(1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1))), row.names = c(NA,
-20L), class = c("tbl_df", "tbl", "data.frame"))
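I would use group_by on ID and a new times variable, in which you can define bins of 30 rows with time %/% 30. Since your example data has only a few rows, I set the bin width to 5 instead. Because each subject has a different number of time points, we record time_first and time_last for each bin and then rewrite times as "x-y", where x and y are the first and last time in that bin.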
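In the across call you need to specify how to aggregate the data; below I chose mean. If you want, say, the mean of BMI and the max of Drug1, you need to specify each column in its own function call.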
library(dplyr)
df %>%
  group_by(ID, times = time %/% 5) %>%
  summarise(across(BMI:Drug2, mean),
            time_first = first(time),
            time_last = last(time)
  ) %>%
  ungroup() %>%
  mutate(times = paste0(time_first, "-", time_last)) %>%
  select(-c(time_first, time_last))
#> `summarise()` has grouped output by 'ID'. You can override using the `.groups`
#> argument.
#> # A tibble: 4 × 6
#>       ID times   BMI   Dis Drug1 Drug2
#>    <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 100087 0-4    20.1     0     1     1
#> 2 100087 5-9    20.1     0     1     1
#> 3 100087 10-14  20.1     0     1     1
#> 4 100087 15-19  20.1     0     1     1
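If different columns need different summaries (for example the mean of BMI but the max of Drug1), one way to write it is a separate expression per column inside summarise. The following is only a sketch: the toy data frame, its values, and the helper column bin are made up for illustration, and a bin width of 5 is assumed as above.

```r
library(dplyr)

# Toy data (hypothetical values): one subject, 10 daily rows
toy <- tibble(
  ID    = rep(1, 10),
  time  = 0:9,
  BMI   = c(20, 20, 21, 21, 22, 22, 23, 23, 24, 24),
  Drug1 = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 1)
)

monthly <- toy %>%
  # bin is a helper column: integer division puts 5 consecutive days in one bin
  group_by(ID, bin = time %/% 5) %>%
  summarise(
    BMI   = mean(BMI),                             # average BMI within the bin
    Drug1 = max(Drug1),                            # 1 if the drug was taken on any day
    times = paste0(first(time), "-", last(time)),  # label such as "0-4"
    .groups = "drop"                               # return an ungrouped tibble
  ) %>%
  select(-bin)

monthly
```

Setting .groups = "drop" also silences the regrouping message that summarise prints otherwise.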
# OP's data
df <- structure(list(ID = structure(c(100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087)), time = structure(c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19)), BMI = structure(c(20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06)), Dis = structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), Drug1 = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)), Drug2 = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1))), row.names = c(NA, -20L), class = c("tbl_df", "tbl", "data.frame"))
Created on 2022-09-27 by the reprex package (v0.3.0)
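A possible variation, in case time does not start at 0 for every subject or has gaps: bin by row position within each ID rather than by the value of time. This is a sketch under assumptions, not something taken from the question's data: the toy data frame, the column names bin and n_days, and the bin width of 3 are all invented for illustration (use 30 for monthly bins).

```r
library(dplyr)

# Toy data (hypothetical): two subjects with different numbers of days,
# and subject 2 does not start at time 0
toy <- tibble(
  ID   = c(rep(1, 7), rep(2, 4)),
  time = c(0:6, 10:13),
  BMI  = c(rep(20, 7), rep(25, 4))
)

binned <- toy %>%
  group_by(ID) %>%
  arrange(time, .by_group = TRUE) %>%            # make sure rows are in day order per subject
  mutate(bin = (row_number() - 1) %/% 3) %>%     # every 3 consecutive rows share a bin
  group_by(ID, bin) %>%
  summarise(
    BMI    = mean(BMI),
    n_days = n(),                                # how many rows fell into this bin
    times  = paste0(first(time), "-", last(time)),
    .groups = "drop"
  )

binned
```

Because subjects have different numbers of days, the last bin of a subject can contain fewer rows than the others; keeping n_days makes it easy to spot or filter those partial bins.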