I have the following two data frames:
> Reaction_per_park_per_day_3
Park Date
14st NE - Coventry 2019-05-08
14st NE - Coventry 2019-05-15
14st NE - Coventry 2019-08-09
14st NE - Coventry 2019-08-22
Airways Park 2018-11-27
Airways Park 2020-12-16
Airways Park 2020-12-24
Arbour Lake East 2017-01-02
Arbour Lake East 2017-01-03
Arbour Lake East 2017-01-07
Arbour Lake East 2017-01-08
> Reports_per_park_per_day_3
Park Month
14st NE - Coventry 2019-05-16
14st NE - Coventry 2019-05-17
14st NE - Coventry 2019-08-14
Airways Park 2021-04-02
Arbour Lake East 2017-01-04
Arbour Lake East 2017-02-04
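I want to add a column (Number_AC) to the Reports_per_park_per_day_3 data frame that counts, for each report, how many events in the Reaction_per_park_per_day_3 data frame occurred in the same park before that report's date. So I would like the Reports_per_park_per_day_3 data frame to look like this: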
Park Month Number_AC
14st NE - Coventry 2019-05-16 2
14st NE - Coventry 2019-05-17 2
14st NE - Coventry 2019-08-14 3
Airways Park 2021-04-02 2
Arbour Lake East 2017-01-04 2
Arbour Lake East 2017-02-04 4
I tried the following, but it does not work because it returns 0 events for every row:
> library(dplyr)
> Reports_per_park_per_day_3 <- Reports_per_park_per_day_3 %>%
left_join( Reaction_per_park_per_day_3, by="Park" ) %>%
filter( Date <= Month ) %>%
group_by( Park, Month) %>%
summarize(Number_AC = sum(Month <= Date & Month >= Date), .groups = "drop") %>%
distinct
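You can do what you want with merge. With by = NULL, merge builds the full cross join (every reaction row paired with every report row) before anything is filtered, so keep in mind that this solution is not the fastest and may cause memory problems if the data frames are large.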
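library(dplyr)
Reaction_per_park_per_day_3 %>%
  # cross join: pair every reaction with every report
  merge(Reports_per_park_per_day_3, by = NULL) %>%
  # keep same-park pairs where the reaction is on or before the report date
  filter(Date <= Month, Park.x == Park.y) %>%
  select(Park = Park.x, Month, Date) %>%
  # one row per report, with the number of matching reactions
  count(Park, Month, name = "Number_AC")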
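If the memory cost of the cross join is a concern, one alternative is to skip the join and count per report row instead. The following is only a minimal base R sketch, not part of the answer above: it rebuilds the sample data from the question (assuming the date columns are converted with as.Date) and uses mapply to count, for each report, the reactions in the same park dated on or before it.

```r
# Sample data copied from the question; as.Date conversion is an assumption
Reaction_per_park_per_day_3 <- data.frame(
  Park = rep(c("14st NE - Coventry", "Airways Park", "Arbour Lake East"),
             times = c(4, 3, 4)),
  Date = as.Date(c("2019-05-08", "2019-05-15", "2019-08-09", "2019-08-22",
                   "2018-11-27", "2020-12-16", "2020-12-24",
                   "2017-01-02", "2017-01-03", "2017-01-07", "2017-01-08")),
  stringsAsFactors = FALSE
)
Reports_per_park_per_day_3 <- data.frame(
  Park = rep(c("14st NE - Coventry", "Airways Park", "Arbour Lake East"),
             times = c(3, 1, 2)),
  Month = as.Date(c("2019-05-16", "2019-05-17", "2019-08-14",
                    "2021-04-02", "2017-01-04", "2017-02-04")),
  stringsAsFactors = FALSE
)

# For each report row, count reactions in the same park on or before its date
Reports_per_park_per_day_3$Number_AC <- mapply(
  function(park, month) {
    sum(Reaction_per_park_per_day_3$Park == park &
          Reaction_per_park_per_day_3$Date <= month)
  },
  Reports_per_park_per_day_3$Park,
  Reports_per_park_per_day_3$Month,
  USE.NAMES = FALSE
)

Reports_per_park_per_day_3
```

With this sample data the counts come out as 2, 2, 3, 3, 2, 4. Note that Airways Park gets 3, since all three of its reactions precede 2021-04-02, rather than the 2 shown in the question's expected table. This version never holds more than one comparison vector in memory at a time, at the price of one pass over the reactions per report.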
You can do a full_join and, for each combination of Park and Month, count the number of Date values that are on or before Month.
library(dplyr)
Reaction_per_park_per_day_3 %>%
full_join(Reports_per_park_per_day_3, by = 'Park') %>%
group_by(Park, Month) %>%
summarise(Number_AC = sum(Date <= Month), .groups = 'drop')
# Park Month Number_AC
# <chr> <chr> <int>
#1 14stNE-Coventry 2019-05-16 2
#2 14stNE-Coventry 2019-05-17 2
#3 14stNE-Coventry 2019-08-14 3
#4 AirwaysPark 2021-04-02 3
#5 ArbourLakeEast 2017-01-04 2
#6 ArbourLakeEast 2017-02-04 4
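One edge case worth checking is a report for a park that has no reactions at all: after the join its Date is NA, so sum(Date <= Month) would return NA rather than 0. The sketch below is a variation under stated assumptions, not the answer's own code: it uses small made-up data frames (reactions, reports, and "Hypothetical Park" are illustrative names), converts the dates with as.Date instead of comparing strings, joins from the reports side with left_join, and passes na.rm = TRUE.

```r
library(dplyr)

# Illustrative data only; "Hypothetical Park" has no reactions
reactions <- data.frame(
  Park = c("Airways Park", "Airways Park", "Arbour Lake East"),
  Date = as.Date(c("2018-11-27", "2020-12-16", "2017-01-02")),
  stringsAsFactors = FALSE
)
reports <- data.frame(
  Park  = c("Airways Park", "Hypothetical Park"),
  Month = as.Date(c("2019-01-01", "2020-01-01")),
  stringsAsFactors = FALSE
)

result <- reports %>%
  # start from the reports so every report survives the join
  left_join(reactions, by = "Park") %>%
  group_by(Park, Month) %>%
  # na.rm = TRUE turns "no reactions for this park" into a count of 0
  summarise(Number_AC = sum(Date <= Month, na.rm = TRUE), .groups = "drop")

result
```

Here Airways Park gets 1 and the park without reactions gets 0. Starting from the reports side also avoids the extra row with a missing Month that a full_join would produce for parks that appear only in the reactions data.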