Summing over overlapping groups in R

My data frame consists of monthly weather data for a given location, as shown below:

set.seed(123)
dat <- data.frame(Year  = rep(1980:1985, each = 12),
                  Month = rep(1:12, times = 6),
                  value = runif(12 * 6))

I divide the year into seasons as follows:

s1 <- c(11, 12, 1, 2) #  season 1 consists of month 11, 12, 1 and 2 i.e. cuts across years
s2 <- c(3, 4, 5) # season 2 consists of month 3, 4, 5
s3 <- c(6, 7, 8, 9, 10) # season 3 consists of month 6, 7, 8, 9, 10

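Taking 1980 as an example: season 1 runs from November to December 1979 plus January to February 1980, season 2 runs from March to May 1980, and season 3 runs from June to October 1980.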

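However, season 1 of 1980 is incomplete, because it only has months 1 and 2 and is missing months 11 and 12 of 1979. In contrast, seasons 1 to 3 of 1985 are all complete, so I do not need months 11 and 12 of 1985, since they contribute to season 1 of 1986.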

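With this background, I want to sum the monthly values of each season by year, so that the data frame is in Year-Season format rather than Year-Month format. There will be no value for season 1 of 1980, because it is missing months.
For a season whose months span two calendar years, I do not know how to sum the individual months.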

library(dplyr)

season_list <- list(s1, s2, s3)
temp_list <- list()
for (s in seq_along(season_list)) {

  season_ref <- season_list[[s]]

  if (sum(diff(season_ref) < 0) != 0) {  # check if season cuts across years

    # how do I sum across years for this exception?
    # (so far I only filter the months; the result is not stored)
    dat %>%
      dplyr::filter(Month %in% season_ref)

  } else {

    # if season does not cut across years, simply filter the months in each year and add
    temp_list[[s]] <-
      dat %>%
      dplyr::filter(Month %in% season_ref) %>%
      dplyr::group_by(Year) %>%
      dplyr::summarise(season_value = sum(value)) %>%
      dplyr::mutate(season = s)
  }
}

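Assuming you want to sum the values within each season, compute Season and endYear (the year in which the season ends) for every row, then sum within those groups. Adding the logical `Month %in% 11:12` to Year pushes November and December into the following year's season 1.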

dat %>%
  group_by(endYear = Year + (Month %in% 11:12),
           Season = 1 * (Month %in% s1) +
                    2 * (Month %in% s2) +
                    3 * (Month %in% s3)) %>%
  summarize(value = sum(value), .groups = "drop")

giving:

# A tibble: 19 x 3
   endYear Season value
     <int>  <dbl> <dbl>
 1    1980      1 1.08
 2    1980      2 2.23
 3    1980      3 2.47
 4    1981      1 2.66
 5    1981      2 1.25
 6    1981      3 2.91
 7    1982      1 3.00
 8    1982      2 1.43
 9    1982      3 3.50
10    1983      1 1.48
11    1983      2 0.693
12    1983      3 1.49
13    1984      1 1.82
14    1984      2 1.29
15    1984      3 1.77
16    1985      1 2.03
17    1985      2 1.47
18    1985      3 3.31
19    1986      1 1.38
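
Note that this result still contains the two partial seasons: season 1 of 1980 (January and February only) and season 1 of 1986 (November and December 1985 only). The question asks for those to be left out. One possible way to do that, sketched below, is to count the months in each group and keep only the groups whose count matches the length of the season. The names `season_len`, `n_months` and `complete_seasons` are illustrative and not part of the original answer; the data and season definitions are repeated so the snippet runs on its own.

```r
library(dplyr)

# Same example data and season definitions as in the question
set.seed(123)
dat <- data.frame(Year  = rep(1980:1985, each = 12),
                  Month = rep(1:12, times = 6),
                  value = runif(12 * 6))

s1 <- c(11, 12, 1, 2)
s2 <- c(3, 4, 5)
s3 <- c(6, 7, 8, 9, 10)

# Illustrative helper: number of months each season should contain,
# indexed by season number (1, 2, 3)
season_len <- c(length(s1), length(s2), length(s3))

complete_seasons <- dat %>%
  group_by(endYear = Year + (Month %in% 11:12),
           Season = 1 * (Month %in% s1) +
                    2 * (Month %in% s2) +
                    3 * (Month %in% s3)) %>%
  # keep the month count next to the sum so partial seasons can be detected
  summarize(n_months = n(), value = sum(value), .groups = "drop") %>%
  # drop any season that does not have all of its months
  filter(n_months == season_len[Season]) %>%
  select(-n_months)

complete_seasons
```

With the example data this should drop the rows for season 1 of 1980 and season 1 of 1986 and leave the other seasons unchanged. The check assumes every month belongs to exactly one season, as it does here.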
