I am trying to count the rows in df1, which contains the dates of events
df1 = data.frame(date = c("2021-07-31", "2021-08-01", "2021-08-12", "2021-08-14"))
that fall within the start and end dates in df2
df2 = data.frame(Id = c(1,2),
Start = c("2021-06-01", "2021-08-01"),
End = c("2021-08-15", "2021-09-15"))
In this example, the output would look like
Id Start End Count
1 1 2021-06-01 2021-08-15 3
2 2 2021-08-01 2021-09-15 3
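I tried the approaches from similar questions, "How to get a count between two dates in R?" and "Count rows if date falls within date range for all dates in series in R", but without success.
Any help or suggestions would be appreciated. Thanks a lot.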
Note: shouldn't the count for Id 1 in the expected output be 4?
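A quick base R check supports a count of 4 for Id 1. This is only a sketch: it assumes both the start and the end date are inclusive, and the names dates, in_range_1 and in_range_2 are made up for illustration.

```r
# Event dates from the question, converted to Date class
dates <- as.Date(c("2021-07-31", "2021-08-01", "2021-08-12", "2021-08-14"))

# Id 1 range: 2021-06-01 to 2021-08-15 (assumed inclusive on both ends)
in_range_1 <- dates >= as.Date("2021-06-01") & dates <= as.Date("2021-08-15")

# Id 2 range: 2021-08-01 to 2021-09-15 (assumed inclusive on both ends)
in_range_2 <- dates >= as.Date("2021-08-01") & dates <= as.Date("2021-09-15")

# Number of events falling inside each range
sum(in_range_1)
sum(in_range_2)
```

All four event dates lie between 2021-06-01 and 2021-08-15, so the first sum is 4. Only 2021-07-31 falls outside the second range, so the second sum is 3.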
You can group_by your data and sum the dates that fall %within% an interval, as follows:
df1 = data.frame(date = c("2021-07-31", "2021-08-01", "2021-08-12", "2021-08-14"))
df2 = data.frame(Id = c(1,2),
Start = c("2021-06-01", "2021-08-01"),
End = c("2021-08-15", "2021-09-15"))
library(dplyr)
library(lubridate)
df2 %>%
group_by(Id) %>%
mutate(Count = sum(as_date(df1$date) %within% lubridate::interval(Start, End)))
#> # A tibble: 2 × 4
#> # Groups: Id [2]
#> Id Start End Count
#> <dbl> <chr> <chr> <int>
#> 1 1 2021-06-01 2021-08-15 4
#> 2 2 2021-08-01 2021-09-15 3
Created on 2022-07-12 by the reprex package (v2.0.1)
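Using data.table::between in outer.
# f(i, j) is TRUE when the date in row i of df1 lies between Start and End in row j of df2
f <- Vectorize(function(i, j) data.table::between(df1[i, 1L], df2[j, 2], df2[j, 3]))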
transform(df2, count=colSums(outer(seq_len(nrow(df1)), seq_len(nrow(df2)), f)))
# Id Start End count
# 1 1 2021-06-01 2021-08-15 4
# 2 2 2021-08-01 2021-09-15 3
Note that "Date" format is required, so you may need to do this first:
df1[] <- lapply(df1, as.Date)
df2[-1] <- lapply(df2[-1], as.Date)
Data:
df1 <- structure(list(date = structure(c(18839, 18840, 18851, 18853), class = "Date")), row.names = c(NA,
-4L), class = "data.frame")
df2 <- structure(list(Id = c(1, 2), Start = structure(c(18779, 18840
), class = "Date"), End = structure(c(18854, 18885), class = "Date")), row.names = c(NA,
-2L), class = "data.frame")
Or with base R:
df2$Count <- apply(df2, 1, function(x) sum(as.Date(df1$date) %in% seq(as.Date(x["Start"]), as.Date(x["End"]), by = "1 day")))
Output:
Id Start End Count
1 1 2021-06-01 2021-08-15 4
2 2 2021-08-01 2021-09-15 3