r语言 - 有条件的组的最后日期 - r - Last date by group with conditions 小贝子编程网

这个问题是这个问题的后续问题，但每个idPerson可以有多个decision == "d"。有多种idPerson，但有一个足以解释这个问题。idAppt嵌套在idPerson中。考虑此数据框。

idPerson idAppt decision date      
1 A             1 a        2021-09-10
2 A             1 b        2021-09-11
3 A             1 c        2021-09-12
4 A             1 d        2021-09-13
5 A             2 a        2021-09-20
6 A             2 b        2021-09-21
7 A             3 a        2021-09-10
8 A             3 b        2021-09-11
9 A             4 a        2021-09-21
10 A             4 b        2021-09-22
11 A             4 c        2021-09-23
12 A             4 d        2021-09-24
13 A             5 a        2021-09-10
14 A             5 b        2021-09-11
15 A             6 a        2021-10-10
16 A             6 b        2021-10-11

我想构建一个date2列来回复这些条件：

对于给定的idAppt，如果decision == "a"晚于任何其他日期，当decision == "d"同一idPerson时，报告该idPersondecision == "d"时的最新值date(最接近的之前)。例如，在组idAppt == 2中，decision == "a"的日期晚于组idAppt == 1的decision == "d"日期，因此date2应2021-09-13。这同样适用于组idAppt == 6，但这里有两个更早的decision == "d"(第 4 行和第 12 行)。在这种情况下，date2应该是2021-10-10之前最接近的，即2021-09-23.
当给定idAppt没有早于decision == "a"date的decision == "d"date时，取给定idPerson中最早的。

这给出了以下所需的输出：

idPerson idAppt decision date       date2       
1 A             1 a        2021-09-10 2021-09-10
2 A             1 b        2021-09-11 2021-09-10
3 A             1 c        2021-09-12 2021-09-10
4 A             1 d        2021-09-13 2021-09-10
5 A             2 a        2021-09-20 2021-09-13 #<- correspond to value of row 4
6 A             2 b        2021-09-21 2021-09-13  
7 A             3 a        2021-09-10 2021-09-10 
8 A             3 b        2021-09-11 2021-09-10
9 A             4 a        2021-09-21 2021-09-13
10 A             4 b        2021-09-22 2021-09-13
11 A             4 c        2021-09-23 2021-09-13
12 A             4 d        2021-09-24 2021-09-13
13 A             5 a        2021-09-11 2021-09-10 #<- earliest value because 2021-09-10 is earlier than 2021-09-13
14 A             5 b        2021-09-12 2021-09-10
15 A             6 a        2021-10-10 2021-09-24 #<- correspond to value of row 12
16 A             6 b        2021-10-11 2021-09-24

<小时 />

数据

df <- structure(list(idPerson = c("A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "A", "A", "A", "A"), idAppt = c(1L, 
1L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 6L, 6L), 
decision = c("a", "b", "c", "d", "a", "b", "a", "b", "a", 
"b", "c", "d", "a", "b", "a", "b"), date = structure(c(18880, 
18881, 18882, 18883, 18890, 18891, 18880, 18881, 18891, 18892, 
18893, 18894, 18881, 18882, 18910, 18911), class = "Date")), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -16L))
EO <- structure(list(idPerson = c("A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "A", "A", "A", "A"), idAppt = c(1L, 
1L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 6L, 6L), 
decision = c("a", "b", "c", "d", "a", "b", "a", "b", "a", 
"b", "c", "d", "a", "b", "a", "b"), date = structure(c(18880, 
18881, 18882, 18883, 18890, 18891, 18880, 18881, 18891, 18892, 
18893, 18894, 18881, 18882, 18910, 18911), class = "Date"), 
date2 = c("2021-09-10", "2021-09-10", "2021-09-10", "2021-09-10", 
"2021-09-13", "2021-09-13", "2021-09-10", "2021-09-10", "2021-09-13", 
"2021-09-13", "2021-09-13", "2021-09-13", "2021-09-10", "2021-09-10", 
"2021-09-24", "2021-09-24")), row.names = c(NA, -16L), class = c("tbl_df", 
"tbl", "data.frame"))

使用data.table滚动连接：

library(data.table)
setDT(df)
# rolling join between decision "d" and "a"
df[decision == "a", date2 := df[decision == "d"][.SD, on = .(idPerson, date), x.date, roll = Inf]]
# set non-matching rows for decision "a" to min(date)
df[decision == "a" & is.na(date2), date2 := min(date), by = idPerson]
# replace other NA by last observation carried forward
setnafill(df, type = "locf", cols = "date2")
idPerson idAppt decision       date      date2
1:        A      1        a 2021-09-10 2021-09-10
2:        A      1        b 2021-09-11 2021-09-10
3:        A      1        c 2021-09-12 2021-09-10
4:        A      1        d 2021-09-13 2021-09-10
5:        A      2        a 2021-09-20 2021-09-13
6:        A      2        b 2021-09-21 2021-09-13
7:        A      3        a 2021-09-10 2021-09-10
8:        A      3        b 2021-09-11 2021-09-10
9:        A      4        a 2021-09-21 2021-09-13
10:        A      4        b 2021-09-22 2021-09-13
11:        A      4        c 2021-09-23 2021-09-13
12:        A      4        d 2021-09-24 2021-09-13
13:        A      5        a 2021-09-11 2021-09-10
14:        A      5        b 2021-09-12 2021-09-10
15:        A      6        a 2021-10-10 2021-09-24
16:        A      6        b 2021-10-11 2021-09-24

"idAppt"的相关性并不完全清楚，因为日期的比较似乎是在idPerson中进行的。

这是我解决问题的方法，尽管它看起来有点复杂：

library(dplyr)
df %>%
group_by(idPerson) %>%
mutate(d_date = list(date[decision == "d"]), min_date_person = min(date)) %>% 
group_by(idPerson, idAppt) %>%
mutate(date3 = unlist(map(d_date, (x){
dates <- date[decision == "a"] - x
w <- which.min(dates[dates > 0])
ifelse(is.null(w), NA, w)
})),
date2 = if_else(is.na(date3), min_date_person, do.call("c", map(d_date, ~ unique(.x[date3]))))) %>% 
ungroup() %>% 
select(1:4, date2)

# A tibble: 16 × 5
idPerson idAppt decision date       date2     
<chr>     <int> <chr>    <date>     <date>    
1 A             1 a        2021-09-10 2021-09-10
2 A             1 b        2021-09-11 2021-09-10
3 A             1 c        2021-09-12 2021-09-10
4 A             1 d        2021-09-13 2021-09-10
5 A             2 a        2021-09-20 2021-09-13
6 A             2 b        2021-09-21 2021-09-13
7 A             3 a        2021-09-10 2021-09-10
8 A             3 b        2021-09-11 2021-09-10
9 A             4 a        2021-09-21 2021-09-13
10 A             4 b        2021-09-22 2021-09-13
11 A             4 c        2021-09-23 2021-09-13
12 A             4 d        2021-09-24 2021-09-13
13 A             5 a        2021-09-11 2021-09-10
14 A             5 b        2021-09-12 2021-09-10
15 A             6 a        2021-10-10 2021-09-24
16 A             6 b        2021-10-11 2021-09-24

r语言 - 有条件的组的最后日期

相关内容

最新更新

热门标签：