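I'm looking for a straightforward way to group by ID and then keep only the entries whose dates are at least 1 month apart, i.e. each kept entry should be at least 1 month after the previous one: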
ID DATE
A15217177635833 25-08-2018
A15217177635833 28-06-2018
A15217177635833 05-05-2018
A15217177635833 30-05-2019
F15039820795577 22-08-2017
F15039820795577 15-06-2017
F15039820795577 15-08-2018
F15039820795577 25-08-2018
F15039820795577 15-08-2018
Expected output:
ID DATE
A15217177635833 05-05-2018
A15217177635833 28-06-2018 (it's more than 1 month after 05-05-2018)
A15217177635833 25-08-2018
A15217177635833 30-05-2019
F15039820795577 15-06-2017
F15039820795577 22-08-2017
F15039820795577 15-08-2018
I'd like to do this with group_by and filter (dplyr) or with the apply family, but any other approach is fine too.
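In dplyr, you can convert DATE to a Date object, arrange by ID and DATE, and keep the entries that are more than one month after the previous entry. Note that months() comes from the lubridate package, so lubridate has to be loaded as well.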
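library(dplyr)
library(lubridate)  # provides months() and %m+%

df %>%
  mutate(DATE = as.Date(DATE, "%d-%m-%Y")) %>%   # factor -> Date
  arrange(ID, DATE) %>%
  group_by(ID) %>%
  # keep the first row of each ID, then rows more than 1 month after the previous row;
  # %m+% rolls month-end dates back (e.g. Jan 31 + 1 month = Feb 28) instead of giving NA
  filter(row_number() == 1 | DATE > lag(DATE) %m+% months(1))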
# ID DATE
# <fct> <date>
#1 A15217177635833 2018-05-05
#2 A15217177635833 2018-06-28
#3 A15217177635833 2018-08-25
#4 A15217177635833 2019-05-30
#5 F15039820795577 2017-06-15
#6 F15039820795577 2017-08-22
#7 F15039820795577 2018-08-15
Data
df <- structure(list(ID = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L), .Label = c("A15217177635833", "F15039820795577"), class = "factor"),
DATE = structure(c(5L, 6L, 1L, 7L, 4L, 2L, 3L, 5L, 3L), .Label = c("05-05-2018",
"15-06-2017", "15-08-2018", "22-08-2017", "25-08-2018", "28-06-2018",
"30-05-2019"), class = "factor")), class = "data.frame", row.names = c(NA, -9L))
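Both answers compare each date with the immediately preceding row, even when that row was itself dropped. For the sample data this yields the expected output, but if the intent is "more than 1 month after the last entry that was kept", the two readings can differ. The sketch below illustrates that alternative reading under stated assumptions: keep_spaced() is a hypothetical helper (not part of dplyr or base R), it assumes the dates are already sorted, and it borrows the 365/12-day month from the base R answer.

```r
# Hypothetical helper (assumption, not from either answer): keep a date only if it lies
# more than `min_days` after the last date that was KEPT, not merely the previous row.
# Assumes `dates` is a sorted Date vector; a "month" is taken as 365/12 days.
keep_spaced <- function(dates, min_days = 365 / 12) {
  keep <- logical(length(dates))
  last_kept <- NULL
  for (i in seq_along(dates)) {
    if (is.null(last_kept) || as.numeric(dates[i] - last_kept) > min_days) {
      keep[i] <- TRUE
      last_kept <- dates[i]
    }
  }
  keep
}

d <- as.Date(c("2018-01-01", "2018-01-20", "2018-02-10"))

# previous-row rule (what lag()/diff() compare against): gaps are 19 and 21 days
prev_row <- c(TRUE, as.numeric(diff(d)) > 365 / 12)
prev_row        # TRUE FALSE FALSE

# last-kept rule: 2018-02-10 is 40 days after 2018-01-01, so it is kept
keep_spaced(d)  # TRUE FALSE TRUE
```

Inside the dplyr pipeline, after arrange(ID, DATE) and group_by(ID), the helper could be applied per group as filter(keep_spaced(DATE)).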
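Base R, with a "month" defined as the number of days in a year divided by 12 (365/12 ≈ 30.42 days). DATE has to be converted to Date first; on the raw factor, order() and diff() would not operate on calendar dates: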
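# DATE is a factor in the sample data; convert it to Date before sorting/differencing
df2 <- transform(df, DATE = as.Date(DATE, "%d-%m-%Y"))

data.frame(do.call("rbind", lapply(split(df2, df2$ID), function(x) {
  x <- x[order(x$DATE), ]
  # keep the first row, then every row more than 365/12 days after the previous row
  # (also works for an ID with a single row, where diff() returns an empty vector)
  x[c(TRUE, as.numeric(diff(x$DATE)) > 365 / 12), ]
})),
row.names = NULL)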
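The two answers use different definitions of a month: the dplyr version adds one calendar month to the previous date, while the base R version uses a fixed 365/12 ≈ 30.42 days. The sketch below works through the day gaps for ID A15217177635833 and then shows one pair of dates (chosen for illustration, not taken from the question) where the two rules disagree. It uses base R's seq() with by = "month" as a stand-in for lubridate's calendar-month arithmetic, which is an assumption made to keep the example dependency-free.

```r
# Day gaps between consecutive sorted dates of ID A15217177635833
a <- as.Date(c("05-05-2018", "28-06-2018", "25-08-2018", "30-05-2019"), "%d-%m-%Y")
gaps <- as.numeric(diff(a))
gaps                       # 54 58 278, all above 365/12, so every row is kept
365 / 12                   # 30.41667

# A pair of dates where the two month definitions disagree (illustrative only)
from <- as.Date("2018-02-01")
to   <- as.Date("2018-03-02")
one_month_later <- seq(from, by = "month", length.out = 2)[2]   # "2018-03-01"
to > one_month_later                   # TRUE: calendar-month rule keeps it
as.numeric(to - from) > 365 / 12       # FALSE: 29 days is below 30.42
```

Which rule is appropriate depends on whether "1 month" should follow the calendar (varying 28 to 31 days) or be a fixed length.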