R递归+上一个依赖值的滞后



我读了很多关于滞后或累积的文章,但仍然不知道如何处理我的问题。我已经计算出了我的简化数据,如下所示,以及我对这个的变通方法的尝试

library(tidyverse)
library(lubridate)
currDt = ymd(20211130)
polyData = structure(list(CCDATE = structure(c(18716, 18715, 18713, 18712, 8895, 18498, 18690, 18689, 18688, 18323), class = "Date"),
CRDATE = structure(c(19080, 18898, 18896, 18895, 18895, 18862, 19054, 19053, 19052, 18687), class = "Date"),
mREGNO = c("BBB", "BBB", "BBB", "BBB", "BBB", "BBB", "DDD", "DDD", "DDD", "DDD"),
SCHDRNUM = c(7711858, 7710223, 7709517, 7709318, 7709172, 7487762, 7686649, 7684567, 7683420, 7374917),
MCCDATE = structure(c(18686, 18686, 18685, 18684, 18864, 18467, 18662, 18661, 18660, 18294), class = "Date")),
row.names = c(NA, -10L),
class = c("tbl_df", "tbl", "data.frame"))
# sort accordingly for row processing next step
# by REGNO and latest CHDRNUM
polyData = polyData %>%
arrange(mREGNO, -SCHDRNUM)
n = polyData %>%
group_by(mREGNO) %>%
tally() %>%
ungroup() %>%
summarise(n = max(n))
n = n$n
polyData_backUp = polyData
# get carried backward CCDATE minus 1 month
# first round is temporary create columns for later use
polyData = polyData %>%
mutate(# temporary Keep for latest tran of every mREGNO
Keep = case_when(is.na(lag(mREGNO)) ~ 1, # latest tran or data
mREGNO != lag(mREGNO) ~ 1, # latest tran of every mREGNo
TRUE ~ 0),

# temporary Carried_MCCDATE for latest and second latest tran of every mREGNO
Carried_MCCDATE = case_when(is.na(lag(mREGNO)) ~ MCCDATE, # latest tran of data
mREGNO != lag(mREGNO) ~ MCCDATE, # latest tran of every mREGNo
lag(Keep) == 1 ~ lag(MCCDATE), # if latest tran is ok then only carry backwards
TRUE ~ currDt)) # temporary as at date
# keep calculate base on the updated info for every rows
for (i in 2:n) {
polyData = polyData %>%
mutate(Keep = case_when(is.na(lag(mREGNO)) ~ 1, # latest tran or data
mREGNO != lag(mREGNO) ~ 1, # latest tran of every mREGNo
CCDATE <= Carried_MCCDATE ~ 1, # previous CCDATE must have at least one month gap
TRUE ~ 0),

Carried_MCCDATE = case_when(is.na(lag(mREGNO)) ~ MCCDATE, # latest tran of data
mREGNO != lag(mREGNO) ~ MCCDATE, # latest tran of every mREGNo
lag(Keep) == 1 ~ lag(MCCDATE), # if latest tran is ok then only carry backwards
TRUE ~ lag(Carried_MCCDATE))) # if latest tran is not ok then will carry from carried
}

从CCDATE到MCCDATE的列基本上是按mREGNO和SCHDRNUM(降序(排序的原始数据;Carried_MCCDATE中要保留的列是我想要的理想结果。

标准如下,我已经在excel中计算出了它,并尝试在R中滞后,但它不起作用。原因是我需要同时处理Carried_MCDATE和另一个依赖列中的递归公式。

对于这个小样本,我设法用一个循环来完成它,但我是否可以通过使用tidyverse来实现这一点?因为我的实际数据要大得多。

The Carried_MCCDATE formula would be:
1) If first row of different mREGNO then = MCCDATE
2) If not first row and previous row's keep = 1 then = previous row's MCCDATE
3) Else = previous row's Carried_MCCDATE
Excel formula for cell F1 =IF(C2<>C1,E2,IF(G1=1,E1,F1))
The Keep formula would be:
1) If first row of different mREGNO then = 1
2) If CCDATE <= Carried_MCCDATE then = 1
3) Else = 0
Excel formula for cell G1 =IF(C2<>C1,1,IF(A2<=F2,1,0))

请提前帮助并感谢您!!

假设输入是polyData_backUp,我们可以使用purrr中的accumulate

library(dplyr)
library(lubridate)
library(purrr)    

polyData_backUp  %>%
group_by(mREGNO) %>%
mutate(carried = as_date(accumulate(2:n(), .init = first(MCCDATE),
function(carried, i) if (CCDATE[i-1] <= carried) MCCDATE[i-1] else carried))) %>%
ungroup

给予:

# A tibble: 10 x 6
CCDATE     CRDATE     mREGNO SCHDRNUM MCCDATE    carried   
<date>     <date>     <chr>     <dbl> <date>     <date>    
1 2021-03-30 2022-03-29 BBB     7711858 2021-02-28 2021-02-28
2 2021-03-29 2021-09-28 BBB     7710223 2021-02-28 2021-02-28
3 2021-03-27 2021-09-26 BBB     7709517 2021-02-27 2021-02-28
4 2021-03-26 2021-09-25 BBB     7709318 2021-02-26 2021-02-28
5 1994-05-10 2021-09-25 BBB     7709172 2021-08-25 2021-02-28
6 2020-08-24 2021-08-23 BBB     7487762 2020-07-24 2021-08-25
7 2021-03-04 2022-03-03 DDD     7686649 2021-02-04 2021-02-04
8 2021-03-03 2022-03-02 DDD     7684567 2021-02-03 2021-02-04
9 2021-03-02 2022-03-01 DDD     7683420 2021-02-02 2021-02-04
10 2020-03-02 2021-03-01 DDD     7374917 2020-02-02 2021-02-04

最新更新