r - Ignore count resets at specified points in a time series

  • Keywords: time series, r, time-series


I have a dataframe like this (edit: added the grouping variable measurement_type):

data <- data.frame(
  ID = as.factor(c(rep(1, 10),
                   rep(2, 10))),
  measurement_type = as.factor(c(rep("type_1", 5),
                                 rep("type_2", 5),
                                 rep("type_1", 5),
                                 rep("type_2", 5))),
  measurement_time = as.POSIXct(c("2014-06-17 04:00:00",
                                  "2014-06-17 11:52:00",
                                  "2014-06-17 18:58:00",
                                  "2014-06-18 02:05:00",
                                  "2014-06-18 08:00:00",
                                  "2014-06-17 05:27:00",
                                  "2014-06-17 11:10:00",
                                  "2014-06-17 17:02:00",
                                  "2014-06-17 23:56:00",
                                  "2014-06-18 07:01:00",
                                  "2014-07-03 16:01:00",
                                  "2014-07-03 19:19:00",
                                  "2014-07-03 23:55:00",
                                  "2014-07-04 08:08:00",
                                  "2014-07-04 13:55:00",
                                  "2014-07-03 22:12:00",
                                  "2014-07-04 08:59:00",
                                  "2014-07-04 14:10:00",
                                  "2014-07-04 17:00:00",
                                  "2014-07-04 23:00:00")),
  amount = c(350, 470, 310, 470, 650,
             175, 275, 45, 255, 395,
             130, 460, 540, 790, 69,
             80, 210, 58, 147, 326),
  entry_time = as.POSIXct(c(rep("2014-06-17 01:53:00", 10),
                            rep("2014-07-03 14:35:00", 10))))

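Subjects with ID 1 and ID 2 enter at the given entry_time, and their cumulative amounts are then measured at specific measurement_times. However, every day at noon the amount is reset to zero and counting starts again from zero. What I want is that, after the noon break (and thus the reset to zero), the newly started cumulative amount keeps being added on top of the cumulative amount reached before noon, separately for each value of the grouping variable measurement_type.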
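A minimal base-R sketch of the desired behaviour for a single ID/measurement_type series, assuming the counter resets exactly at 12:00, never at midnight, and that rows are sorted by measurement_time. The helper name `continue_cumsum` is hypothetical, and the times are treated as UTC only to keep the example deterministic. The idea is to label each reading with the noon-to-noon window it falls in, turn the cumulative readings back into increments inside each window, and then take one running total.

```r
# Hypothetical helper: continue a cumulative count across daily noon resets.
# Assumes `time` is POSIXct (UTC), sorted ascending, and the counter restarts at 12:00.
continue_cumsum <- function(time, amount) {
  # Shift back 12 h so each noon-to-noon window maps to a single calendar date
  period <- format(time - 12 * 3600, "%Y-%m-%d", tz = "UTC")
  # First reading in a window is its own increment; later ones are differences
  increments <- ave(amount, period, FUN = function(a) c(a[1], diff(a)))
  cumsum(increments)
}

# ID 1, type_1 readings from the question
time <- as.POSIXct(c("2014-06-17 04:00:00", "2014-06-17 11:52:00",
                     "2014-06-17 18:58:00", "2014-06-18 02:05:00",
                     "2014-06-18 08:00:00"), tz = "UTC")
amount <- c(350, 470, 310, 470, 650)
result <- continue_cumsum(time, amount)
print(result)  # 350 470 780 940 1120
```

Here the 18:58 reading starts a new window, so 310 is added to 470, while the readings at 02:05 and 08:00 the next morning stay in that same window and only contribute their differences.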

Update

Thanks to @Istrel, using the provided answer I get almost the correct output:

library(tidyverse)

data %>% as_tibble() %>%
  # Flag readings taken at or after 12:00 --> `pm` column
  mutate(pm = as.integer(format(measurement_time, "%H")) >= 12) %>%
  # Calendar date (year_month_day) of each reading
  mutate(date_fct = format(measurement_time, "%Y_%m_%d")) %>%
  # Group by ID, measurement_type, date and `pm`
  group_by(ID, measurement_type, date_fct, pm) %>%
  # Turn cumsum into actual values
  mutate(amount_act = amount - lag(amount, default = 0)) %>%
  # Cumsum over ID and measurement_type
  ungroup() %>%
  group_by(ID, measurement_type) %>%
  mutate(amount_cums = cumsum(amount_act)) %>%
  ungroup() %>%
  select(-c(pm, date_fct, amount_act))

Output

# A tibble: 20 x 6
   ID    measurement_type measurement_time    amount entry_time          amount_cums
   <fct> <fct>            <dttm>               <dbl> <dttm>                    <dbl>
 1 1     type_1           2014-06-17 04:00:00    350 2014-06-17 01:53:00         350
 2 1     type_1           2014-06-17 11:52:00    470 2014-06-17 01:53:00         470
 3 1     type_1           2014-06-17 18:58:00    310 2014-06-17 01:53:00         780
 4 1     type_1           2014-06-18 02:05:00    470 2014-06-17 01:53:00        1250
 5 1     type_1           2014-06-18 08:00:00    650 2014-06-17 01:53:00        1430
 6 1     type_2           2014-06-17 05:27:00    175 2014-06-17 01:53:00         175
 7 1     type_2           2014-06-17 11:10:00    275 2014-06-17 01:53:00         275
 8 1     type_2           2014-06-17 17:02:00     45 2014-06-17 01:53:00         320
 9 1     type_2           2014-06-17 23:56:00    255 2014-06-17 01:53:00         530
10 1     type_2           2014-06-18 07:01:00    395 2014-06-17 01:53:00         925
11 2     type_1           2014-07-03 16:01:00    130 2014-07-03 14:35:00         130
12 2     type_1           2014-07-03 19:19:00    460 2014-07-03 14:35:00         460
13 2     type_1           2014-07-03 23:55:00    540 2014-07-03 14:35:00         540
14 2     type_1           2014-07-04 08:08:00    790 2014-07-03 14:35:00        1330
15 2     type_1           2014-07-04 13:55:00     69 2014-07-03 14:35:00        1399
16 2     type_2           2014-07-03 22:12:00     80 2014-07-03 14:35:00          80
17 2     type_2           2014-07-04 08:59:00    210 2014-07-03 14:35:00         290
18 2     type_2           2014-07-04 14:10:00     58 2014-07-03 14:35:00         348
19 2     type_2           2014-07-04 17:00:00    147 2014-07-03 14:35:00         437
20 2     type_2           2014-07-04 23:00:00    326 2014-07-03 14:35:00         616

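As you can see, the cumulative sum is updated correctly across the noon break. For overnight cases, however, the code treats midnight as another reset and adds the full amount measured after midnight to the total from before midnight. There is no count reset at midnight: the reading taken after midnight is still cumulative since the last noon, so only its increase should be added. For example, in row 10 the code adds the full value 395 to the amount_cums of 530 (row 9), whereas it should add only the difference from the previous reading (395 - 255 = 140), so the correct amount_cums for row 10 is 670.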
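To see why rows 9 and 10 are combined incorrectly, here is a small base-R check on the ID 1, type_2 readings (times treated as UTC, an assumption made only for this sketch). The (date, pm) key used by the current code puts 23:56 and 07:01 into different groups, although both belong to the same noon-to-noon counting window, so the 07:01 reading is treated as a fresh start instead of contributing only its 140 increase.

```r
# ID 1, type_2 readings from the question (UTC assumed for this sketch)
time <- as.POSIXct(c("2014-06-17 05:27:00", "2014-06-17 11:10:00",
                     "2014-06-17 17:02:00", "2014-06-17 23:56:00",
                     "2014-06-18 07:01:00"), tz = "UTC")
amount <- c(175, 275, 45, 255, 395)

# Key used by the current code: calendar date plus an AM/PM flag
day_pm_key <- paste(format(time, "%Y_%m_%d", tz = "UTC"),
                    as.integer(format(time, "%H", tz = "UTC")) >= 12)
# Key matching the counter: the noon-to-noon window (time shifted back 12 h)
noon_key <- format(time - 12 * 3600, "%Y_%m_%d", tz = "UTC")

print(data.frame(time, amount, day_pm_key, noon_key))

# Row 10 should add only the difference to the previous reading (row 9)
increment <- amount[5] - amount[4]   # 395 - 255 = 140
expected_row10 <- 530 + increment    # 530 is amount_cums of row 9 -> 670
```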

Do you know how I could modify your code?

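I can suggest the following strategy. First, group the data by ID, date (year_month_day) and an AM/PM time marker. Then convert the cumulative sums back into raw values within each group. Finally, recompute the cumulative sum grouped by ID (and measurement_type).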

The solution could look like this:

library(tidyverse)
dat_alt <- data %>% as_tibble() %>%
  # Flag readings taken at or after 12:00 --> `pm` column
  mutate(pm = as.integer(format(measurement_time, "%H")) >= 12) %>%
  # Calendar date (year_month_day) of each reading
  mutate(date_fct = format(measurement_time, "%Y_%m_%d")) %>%
  # Group by ID, measurement_type, date and `pm`
  group_by(ID, measurement_type, date_fct, pm) %>%
  # Turn cumsum into actual values
  mutate(amount_act = amount - lag(amount, default = 0)) %>%
  # Cumsum over ID and measurement_type
  ungroup() %>%
  group_by(ID, measurement_type) %>%
  mutate(amount_cums = cumsum(amount_act)) %>%
  ungroup() %>%
  select(-c(pm, date_fct, amount_act))
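One possible modification, sketched under the assumption that the counter always resets at 12:00 and never at midnight: replace the (date_fct, pm) grouping with a single key for the noon-to-noon window, obtained by shifting measurement_time back 12 hours before taking the date. The column name `period` is introduced only for this sketch, the data are rebuilt compactly in UTC (entry_time is left out because the calculation does not use it), and the pipeline needs dplyr.

```r
library(dplyr)

# Data from the question, rebuilt compactly; tz = "UTC" is an assumption of this sketch
data <- data.frame(
  ID = factor(rep(1:2, each = 10)),
  measurement_type = factor(rep(rep(c("type_1", "type_2"), each = 5), times = 2)),
  measurement_time = as.POSIXct(c(
    "2014-06-17 04:00:00", "2014-06-17 11:52:00", "2014-06-17 18:58:00",
    "2014-06-18 02:05:00", "2014-06-18 08:00:00", "2014-06-17 05:27:00",
    "2014-06-17 11:10:00", "2014-06-17 17:02:00", "2014-06-17 23:56:00",
    "2014-06-18 07:01:00", "2014-07-03 16:01:00", "2014-07-03 19:19:00",
    "2014-07-03 23:55:00", "2014-07-04 08:08:00", "2014-07-04 13:55:00",
    "2014-07-03 22:12:00", "2014-07-04 08:59:00", "2014-07-04 14:10:00",
    "2014-07-04 17:00:00", "2014-07-04 23:00:00"), tz = "UTC"),
  amount = c(350, 470, 310, 470, 650, 175, 275, 45, 255, 395,
             130, 460, 540, 790, 69, 80, 210, 58, 147, 326)
)

dat_fixed <- data %>%
  as_tibble() %>%
  arrange(ID, measurement_type, measurement_time) %>%
  # Noon-to-noon counting window: shift back 12 h, then take the calendar date
  mutate(period = format(measurement_time - 12 * 3600, "%Y_%m_%d", tz = "UTC")) %>%
  group_by(ID, measurement_type, period) %>%
  # Turn the cumulative readings back into increments within each window
  mutate(amount_act = amount - dplyr::lag(amount, default = 0)) %>%
  ungroup() %>%
  group_by(ID, measurement_type) %>%
  # One running total per ID and measurement_type; midnight no longer splits a window
  mutate(amount_cums = cumsum(amount_act)) %>%
  ungroup() %>%
  select(-c(period, amount_act))

print(dat_fixed, n = 20)
```

Under these assumptions row 10 comes out as 670, and the other series that run past midnight (for example rows 4 and 14) change in the same way. If the real data stay in a local time zone, subtracting 12 * 3600 seconds can move the boundary by an hour around daylight-saving transitions, which is one reason the sketch uses UTC.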

Latest update