r-按月份划分的前瞻性滚动窗口中元素的总和



我有以下数据。框架有列:Id,Month,有

library(dplyr)
dt <- read.table(header = TRUE, text = '
Id  Month    have   want
1   01-Jan-2018  1.000000000000000   1.234567901220000 
1   01-Feb-2018  0.200000000000000   0.234567901233000 
1   01-Mar-2018  0.030000000000000   0.034567901234400 
1   01-Apr-2018  0.004000000000000   0.004567901234550 
1   01-May-2018  0.000500000000000   0.000567901234566 
1   01-Jun-2018  0.000060000000000   0.000067901234566 
1   01-Jul-2018  0.000007000000000   0.000007901234566 
1   01-Aug-2018  0.000000800000000   0.000000901234566 
1   01-Sep-2018  0.000000090000000   0.000000101234566 
1   01-Oct-2018  0.000000010000000   0.000000011234566 
1   01-Nov-2018  0.000000001100000   0.000000001234566 
1   01-Dec-2018  0.000000000120000   0.000000000134566 
1   01-Jan-2019  0.000000000013000   0.000000000014566 
1   01-Feb-2019  0.000000000001400   0.000000000001566 
1   01-Mar-2019  0.000000000000150   0.000000000000166 
1   01-Apr-2019  0.000000000000016   0.000000000000016 
2   01-Jan-2018 1337.00 1338.00
2   01-Feb-2018 1.00    1.00
3   01-Jan-2018  5.000000000000000000    5.000000000000000 
') %>% mutate(Month=as.Date(Month, format='%d-%b-%Y')

我想在一个12个月的前瞻性滚动窗口中按月份以编程方式计算元素的总和,并按Id分组,如列want所示。如果滚动观察窗口小于12个月,则应忽略缺失的元素。

对于奖励积分,该解决方案还将允许错过几个月,例如:

dt <- read.table(header = TRUE, text = '
Id  Month    have   want
1   01-Jan-18    1.000000000000000   1.200000000000000 
1   01-Dec-18    0.200000000000000   0.230000000000000 
1   01-Jan-19    0.030000000000000   0.030000000000000 
') %>% mutate(Month=as.Date(Month, format='%d-%b-%Y')

我尝试过不同的解决方案,例如zoo包中的rollapplyr()和runner包中的一些函数,但它似乎不能满足我的需求。

您可以将zoorollaplypartial = TRUE一起使用

library(dplyr)
dt %>%
group_by(Id) %>%
tidyr::complete(Month = seq(min(Month), max(Month), "month")) %>%
mutate(result = zoo::rollapply(have, 12, sum, na.rm = TRUE, 
align = 'left', partial = TRUE)) -> result
result

如果您有每个Id的每月数据,如共享示例中所示,则可以删除complete步骤。

我建议在这种情况下使用runner包。runner函数使您能够及时计算出具有完全控制的滚动窗口。k是窗口长度,lag是窗口的滞后,在idx中指定窗口所依赖的索引列。

library(runner)
dt %>%
group_by(Id) %>%
mutate(want2 = runner(
.,
f = function(x) sum(x$have),
k = 12,    # or "12 months"
lag = -11, # or "-11 months"
idx = Month)
)
# # A tibble: 19 x 5
# # Groups:   Id [3]
#    Id Month          have     want    want2
# <int> <date>        <dbl>    <dbl>    <dbl>
#   1     1 2018-01-01 1.00e+ 0 1.23e+ 0 1.00e+ 0
#   2     1 2018-02-01 2.00e- 1 2.35e- 1 2.00e- 1
#   3     1 2018-03-01 3.00e- 2 3.46e- 2 3.00e- 2
#   4     1 2018-04-01 4.00e- 3 4.57e- 3 4.00e- 3
#   5     1 2018-05-01 5.00e- 4 5.68e- 4 5.00e- 4
#   6     1 2018-06-01 6.00e- 5 6.79e- 5 6.00e- 5

最新更新