I have a dataset containing mean arterial pressure (MAP) over time for multiple participants. Here is an example data frame:
df=structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L), Time = structure(1:14, .Label = c("11:02:00",
"11:03:00", "11:04:00", "11:05:00", "11:06:00", "11:07:00", "11:08:00",
"13:30:00", "13:31:00", "13:32:00", "13:33:00", "13:34:00", "13:35:00",
"13:36:00"), class = "factor"), MAP = c(90.27999878, 84.25, 74.81999969,
80.87000275, 99.38999939, 81.51000214, 71.51000214, 90.08999634,
88.75, 84.72000122, 83.86000061, 94.18000031, 98.54000092, 51
)), class = "data.frame", row.names = c(NA, -14L))
I have binned the data, for example MAP 40-60, 60-80, and 80-100, and added a unique flag (1, 2, or 3) in an extra column, map_bin. This is my code so far:
library(dplyr)

# Mean arterial pressure
# Bin 1 = 40-60; Bin 2 = 60-80; Bin 3 = 80-100 (upper bound exclusive)
map_bin <- c("1", "2", "3")
output <- as_tibble(df) %>%
  mutate(map_bin = case_when(
    MAP >= 40 & MAP < 60 ~ map_bin[1],
    MAP >= 60 & MAP < 80 ~ map_bin[2],
    MAP >= 80 & MAP < 100 ~ map_bin[3]
  ))
For each ID, I would like to calculate, in another column, the total time that MAP spent in each bin. I would like to get the following output:
ID  Time      MAP          map_bin  map_bin_dur
1   11:02:00  90.27999878  3        5
1   11:03:00  84.25        3        5
1   11:04:00  74.81999969  2        2
1   11:05:00  80.87000275  3        5
1   11:06:00  99.38999939  3        5
1   11:07:00  81.51000214  3        5
1   11:08:00  71.51000214  2        2
2   13:30:00  90.08999634  3        6
2   13:31:00  88.75        3        6
2   13:32:00  84.72000122  3        6
2   13:33:00  83.86000061  3        6
2   13:34:00  94.18000031  3        6
2   13:35:00  98.54000092  3        6
2   13:36:00  51           1        1
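If the readings in your Time column are always 1 minute apart, the number of rows per ID and map_bin equals the duration in minutes, so you can use add_count: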
library(dplyr)
output <- output %>% add_count(ID, map_bin, name = 'map_bin_dur')
output
# ID Time MAP map_bin map_bin_dur
# <int> <fct> <dbl> <chr> <int>
# 1 1 11:02:00 90.3 3 5
# 2 1 11:03:00 84.2 3 5
# 3 1 11:04:00 74.8 2 2
# 4 1 11:05:00 80.9 3 5
# 5 1 11:06:00 99.4 3 5
# 6 1 11:07:00 81.5 3 5
# 7 1 11:08:00 71.5 2 2
# 8 2 13:30:00 90.1 3 6
# 9 2 13:31:00 88.8 3 6
#10 2 13:32:00 84.7 3 6
#11 2 13:33:00 83.9 3 6
#12 2 13:34:00 94.2 3 6
#13 2 13:35:00 98.5 3 6
#14 2 13:36:00 51 1 1
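If the readings are not always 1 minute apart, counting rows no longer gives a duration. One possible approach, shown here only as a sketch, is to weight each reading by the time until the next reading for the same ID and sum those weights per bin. The `readings` tibble below is hypothetical data with uneven gaps (it is not the data from the question), the helper columns `clock` and `step_min` are names chosen for illustration, and treating the last reading of each ID as lasting 1 minute is an assumption you may want to change.

```r
library(dplyr)

# Hypothetical readings with uneven gaps between timestamps.
readings <- tibble(
  ID      = c(1L, 1L, 1L, 1L, 2L, 2L),
  Time    = c("11:02:00", "11:03:00", "11:06:00", "11:07:00",
              "13:30:00", "13:32:00"),
  map_bin = c("3", "3", "2", "3", "3", "1")
)

result <- readings %>%
  # Parse the clock times; the date that gets attached cancels out in differences.
  mutate(clock = as.POSIXct(as.character(Time), format = "%H:%M:%S", tz = "UTC")) %>%
  group_by(ID) %>%
  arrange(clock, .by_group = TRUE) %>%
  # Minutes until the next reading for the same ID.
  mutate(step_min = as.numeric(difftime(lead(clock), clock, units = "mins")),
         # Assumption: the last reading of each ID lasts 1 minute.
         step_min = coalesce(step_min, 1)) %>%
  # Total minutes per ID and bin, repeated on every row of that group.
  group_by(ID, map_bin) %>%
  mutate(map_bin_dur = sum(step_min)) %>%
  ungroup() %>%
  select(-clock, -step_min)

result
```

With evenly spaced 1-minute readings this gives the same totals as add_count, so it can be read as a generalisation of the answer above rather than a replacement for it.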