r语言 - 时间序列数据中观测值的上次出现次数和出现次数



我有这样的数据

Date        ID  Observation
10-01-2020  1   0
11-01-2020  2   0
12-01-2020  3   1
13-01-2020  1   1
14-01-2020  4   0
15-01-2020  3   0
16-01-2020  7   0
17-01-2020  4   1
18-01-2020  1   0

有 1000 个唯一 ID,时间段为 6 个月。观察是二进制的,要么是 0 要么是 1。我需要对每行的相应 ID 进行最后的正 (1( 观察,以及在此之前对 ID 的数字积极观察。例

Date       ID   Observation Number_of_1_Obs Last_Observed_1
10-01-2020  1   0   0   
11-01-2020  2   0   0   
12-01-2020  3   1   0   
13-01-2020  1   1   0   
14-01-2020  4   0   0   
15-01-2020  3   0   1   12-01-2020
16-01-2020  7   0   0   
17-01-2020  4   1   0   
18-01-2020  1   0   1   13-01-2020

在此处输入图像描述

使用dplyr我们可以group_byID计算滞后值 1 并replaceNumber_of_1_Obs == 1DateObservation == 1的位置。

library(dplyr)
df %>%
group_by(ID) %>%
mutate(Number_of_1_Obs = lag(cumsum(Observation), default = 0), 
Last_Observed_1 = ifelse(Number_of_1_Obs == 1, Date[Observation == 1], NA))

#  Date          ID Observation Number_of_1_Obs Last_Observed_1
#  <chr>      <int>       <int>           <dbl> <chr>          
#1 10-01-2020     1           0               0 NA             
#2 11-01-2020     2           0               0 NA             
#3 12-01-2020     3           1               0 NA             
#4 13-01-2020     1           1               0 NA             
#5 14-01-2020     4           0               0 NA             
#6 15-01-2020     3           0               1 12-01-2020     
#7 16-01-2020     7           0               0 NA             
#8 17-01-2020     4           1               0 NA             
#9 18-01-2020     1           0               1 13-01-2020     

数据

df <- structure(list(Date = c("10-01-2020", "11-01-2020", "12-01-2020", 
"13-01-2020", "14-01-2020", "15-01-2020", "16-01-2020", "17-01-2020", 
"18-01-2020"), ID = c(1L, 2L, 3L, 1L, 4L, 3L, 7L, 4L, 1L), Observation = c(0L, 
0L, 1L, 1L, 0L, 0L, 0L, 1L, 0L)), class = "data.frame", row.names = c(NA, -9L))

使用非等值连接并选择最新 obs 的data.table选项:

DT[, c("Number_of_1_Obs", "Last_Observed_1") :=
DT[Observation==1L][DT, on=.(ID, Date<Date), mult="last", 
.(fcoalesce(x.Observation, 0L), x.Date)]
]

输出:

Date ID Observation Number_of_1_Obs Last_Observed_1
1: 2020-01-10  1           0               0            <NA>
2: 2020-01-11  2           0               0            <NA>
3: 2020-01-12  3           1               0            <NA>
4: 2020-01-13  1           1               0            <NA>
5: 2020-01-14  4           0               0            <NA>
6: 2020-01-15  3           0               1      2020-01-12
7: 2020-01-16  7           0               0            <NA>
8: 2020-01-17  4           1               0            <NA>
9: 2020-01-18  1           0               1      2020-01-13

数据:

library(data.table)
DT <- fread("Date        ID  Observation
10-01-2020  1   0
11-01-2020  2   0
12-01-2020  3   1
13-01-2020  1   1
14-01-2020  4   0
15-01-2020  3   0
16-01-2020  7   0
17-01-2020  4   1
18-01-2020  1   0")
DT[, Date := as.IDate(Date, format="%d-%m-%Y")]

最新更新