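My dataset has daily river flow rate measurements for every day of the year from 1967 to 2022. I want to find the annual mean flow for years that start and end in April (e.g. the mean flow for April 1967 to April 1968, April 1968 to April 1969, April 1969 to April 1970, etc.).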
Here is a sample of my data:
# A tibble: 20,100 x 7
   river year  season month   date       flow_rate quality
   <chr> <fct> <fct>  <fct>   <date>         <dbl> <chr>
 1 wylye 1967  Winter January 1967-01-01      6.67 Good
 2 wylye 1967  Winter January 1967-01-02      6.39 Good
 3 wylye 1967  Winter January 1967-01-03      6.32 Good
 4 wylye 1967  Winter January 1967-01-04      6.34 Good
 5 wylye 1967  Winter January 1967-01-05      6.37 Good
 6 wylye 1967  Winter January 1967-01-06      6.45 Good
 7 wylye 1967  Winter January 1967-01-07      6.65 Good
 8 wylye 1967  Winter January 1967-01-08      6.54 Good
 9 wylye 1967  Winter January 1967-01-09      6.53 Good
10 wylye 1967  Winter January 1967-01-10      6.62 Good
# ... with 20,090 more rows
I have seen code that people use to find the mean over certain months within a single year (July to October), for example:
df %>%
  mutate(date = as.Date(date),
         day = day(date),
         month = month(date),
         year = year(date)) %>%
  filter(between(month, 7, 10) |
           day >= 7 & month == 6 |
           day <= 9 & month == 11) %>%
  group_by(year) %>%
  summarise(tmax = mean(tmax, na.rm = TRUE))
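However, I have not found code that lets me look at the mean over an annual cycle that spans two calendar years (1967-1968, 1968-1969, 1969-1970, etc.). Any help would be appreciated :)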
Here is one approach using lubridate that shifts the year so that it matches your intervals:
df <- tibble::tribble(
  ~date,        ~flow_rate,
  "1967-01-01", 1,
  "1967-04-01", 2,
  "1968-01-01", 3,
  "1968-04-01", 4
)
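library(dplyr)
library(lubridate)

df_new <- df %>%
  mutate(date = ymd(date),
         # January-March belong to the interval that began the previous April,
         # so subtract 1 from the calendar year for those months. A fixed
         # 90-day offset would misplace 31 March in leap years.
         year_shift = year(date) - (month(date) < 4),
         label = paste("April", year_shift, "-", "March", year_shift + 1)) %>%
  group_by(label) %>%
  summarize(flow_rate = mean(flow_rate)) %>%
  ungroup()

df_new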
#> # A tibble: 3 × 2
#>   label                   flow_rate
#>   <chr>                       <dbl>
#> 1 April 1966 - March 1967       1
#> 2 April 1967 - March 1968       2.5
#> 3 April 1968 - March 1969       4
Created on 2022-01-25 by the reprex package (v2.0.1)
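To adapt this to the full dataset from the question, something along the following lines may work. It is only a sketch: it assumes the `river`, `date` and `flow_rate` columns shown in the question's sample, uses made-up flow values, and the names `flows`, `start_year`, `mean_flow` and `n_days` are illustrative. It assigns each day to an April-to-March interval by comparing the month directly, groups by river as well, and counts the days in each interval.

```r
library(dplyr)
library(lubridate)

# Made-up stand-in for the question's data: two full April-March intervals
dates <- seq(ymd("2019-04-01"), ymd("2021-03-31"), by = "day")
flows <- tibble::tibble(
  river     = "wylye",
  date      = dates,
  flow_rate = ifelse(dates < ymd("2020-04-01"), 1, 3)  # illustrative values
)

annual <- flows %>%
  # Year in which the April-March interval starts:
  # January-March are counted with the previous calendar year
  mutate(start_year = year(date) - (month(date) < 4)) %>%
  group_by(river, start_year) %>%
  summarise(
    mean_flow = mean(flow_rate, na.rm = TRUE),  # ignore missing measurements
    n_days    = n(),                            # days contributing to the mean
    .groups   = "drop"
  )

annual
```

The `n_days` column is there to help spot incomplete intervals. In the question's data the record starts on 1967-01-01, so the first interval (April 1966 - March 1967) would only contain January to March and may be worth dropping before comparing years.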