我有这样的数据
Date ID Observation
10-01-2020 1 0
11-01-2020 2 0
12-01-2020 3 1
13-01-2020 1 1
14-01-2020 4 0
15-01-2020 3 0
16-01-2020 7 0
17-01-2020 4 1
18-01-2020 1 0
有 1000 个唯一 ID,时间段为 6 个月。观察是二进制的,要么是 0 要么是 1。我需要对每行的相应 ID 进行最后的正 (1( 观察,以及在此之前对 ID 的数字积极观察。例
Date ID Observation Number_of_1_Obs Last_Observed_1
10-01-2020 1 0 0
11-01-2020 2 0 0
12-01-2020 3 1 0
13-01-2020 1 1 0
14-01-2020 4 0 0
15-01-2020 3 0 1 12-01-2020
16-01-2020 7 0 0
17-01-2020 4 1 0
18-01-2020 1 0 1 13-01-2020
在此处输入图像描述
使用dplyr
我们可以group_by
ID
计算滞后值 1 并replace
Number_of_1_Obs == 1
Date
Observation == 1
的位置。
library(dplyr)
df %>%
group_by(ID) %>%
mutate(Number_of_1_Obs = lag(cumsum(Observation), default = 0),
Last_Observed_1 = ifelse(Number_of_1_Obs == 1, Date[Observation == 1], NA))
# Date ID Observation Number_of_1_Obs Last_Observed_1
# <chr> <int> <int> <dbl> <chr>
#1 10-01-2020 1 0 0 NA
#2 11-01-2020 2 0 0 NA
#3 12-01-2020 3 1 0 NA
#4 13-01-2020 1 1 0 NA
#5 14-01-2020 4 0 0 NA
#6 15-01-2020 3 0 1 12-01-2020
#7 16-01-2020 7 0 0 NA
#8 17-01-2020 4 1 0 NA
#9 18-01-2020 1 0 1 13-01-2020
数据
df <- structure(list(Date = c("10-01-2020", "11-01-2020", "12-01-2020",
"13-01-2020", "14-01-2020", "15-01-2020", "16-01-2020", "17-01-2020",
"18-01-2020"), ID = c(1L, 2L, 3L, 1L, 4L, 3L, 7L, 4L, 1L), Observation = c(0L,
0L, 1L, 1L, 0L, 0L, 0L, 1L, 0L)), class = "data.frame", row.names = c(NA, -9L))
使用非等值连接并选择最新 obs 的data.table
选项:
DT[, c("Number_of_1_Obs", "Last_Observed_1") :=
DT[Observation==1L][DT, on=.(ID, Date<Date), mult="last",
.(fcoalesce(x.Observation, 0L), x.Date)]
]
输出:
Date ID Observation Number_of_1_Obs Last_Observed_1
1: 2020-01-10 1 0 0 <NA>
2: 2020-01-11 2 0 0 <NA>
3: 2020-01-12 3 1 0 <NA>
4: 2020-01-13 1 1 0 <NA>
5: 2020-01-14 4 0 0 <NA>
6: 2020-01-15 3 0 1 2020-01-12
7: 2020-01-16 7 0 0 <NA>
8: 2020-01-17 4 1 0 <NA>
9: 2020-01-18 1 0 1 2020-01-13
数据:
library(data.table)
DT <- fread("Date ID Observation
10-01-2020 1 0
11-01-2020 2 0
12-01-2020 3 1
13-01-2020 1 1
14-01-2020 4 0
15-01-2020 3 0
16-01-2020 7 0
17-01-2020 4 1
18-01-2020 1 0")
DT[, Date := as.IDate(Date, format="%d-%m-%Y")]