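I have daily time series of rainfall presence/absence for 400 stations covering 60 years. The data look like this; in the second column, 1 means rain was recorded and 0 means it was not: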
Date Rainfall
---------------------
1981-01-01 0
1981-01-02 0
1981-01-03 0
1981-01-04 1
1981-01-05 0
1981-01-06 1
1981-01-07 1
1981-01-08 1
1981-01-09 0
1981-01-10 0
1981-01-11 1
1981-01-12 1
1981-01-13 1
1981-01-14 1
1981-01-15 1
1981-01-16 0
.......... .
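Now I want to compute, for each year, the number of wet spells (runs of 3 or more consecutive rain days) and the longest run of consecutive wet days in that year. If rain is received on 3 or more consecutive days (any number), I count that as a single event.
My output should look like this: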
Year No of consecutive wet-days longest consecutive wet-days
1981 2 5
.
.
How can we do this in R? If I can solve it for one station, I can iterate over all the stations in R.
Thanks in advance for your help :)
Another possible solution (thanks to @DarrenTsai, whose comment improved this solution):
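library(tidyverse)
library(lubridate)
df %>%
  group_by(Year = year(ymd(Date))) %>%
  # keep each year's run-length encoding in a list-column
  mutate(x = list(rle(Rainfall))) %>%
  summarise(
    # number of wet runs lasting 3 or more days
    ncons = sum(x[[1]]$lengths >= 3 & x[[1]]$values == 1),
    # longest wet run, or 0 if the year has no wet days
    longest = ifelse(sum(x[[1]]$values == 1) == 0, 0,
                     max(x[[1]]$lengths[x[[1]]$values == 1]))
  )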
#> # A tibble: 2 × 3
#> Year ncons longest
#> <dbl> <int> <int>
#> 1 1981 2 5
#> 2 1982 2 4
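To make it easier to see what rle() contributes here, the following is a minimal base-R sketch run on the 16 sample days from the question. The names rain, wet, n_events and longest are only illustrative and are not part of the answer above.

```r
# The 16 sample days from the question (1 = rain, 0 = no rain)
rain <- c(0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0)

# rle() collapses the series into runs of identical values
r <- rle(rain)
r$lengths  # length of each run, wet or dry
r$values   # the value (0 or 1) that each run repeats

# Keep only the wet runs, then summarise them
wet <- r$lengths[r$values == 1]
n_events <- sum(wet >= 3)                         # runs of 3+ wet days
longest  <- if (length(wet) > 0) max(wet) else 0  # guard against a dry series
```

On this sample the wet runs have lengths 1, 3 and 5, which gives the 2 events and the longest run of 5 shown in the question's expected output.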
You can use rle to create the events.
library(dplyr)
df <- df %>%
  mutate(event = with(rle(Rainfall), rep(seq_along(lengths), lengths)))
Date Rainfall event
1 1981-01-01 0 1
2 1981-01-02 0 1
3 1981-01-03 0 1
4 1981-01-04 1 2
5 1981-01-05 0 3
6 1981-01-06 1 4
7 1981-01-07 1 4
8 1981-01-08 1 4
9 1981-01-09 0 5
10 1981-01-10 0 5
11 1981-01-11 1 6
12 1981-01-12 1 6
13 1981-01-13 1 6
14 1981-01-14 1 6
15 1981-01-15 1 6
16 1981-01-16 0 7
With that in place you can count the consecutive days in each rain event.
df %>% filter(Rainfall == 1) %>% group_by(event) %>% tally()
# A tibble: 3 x 2
event n
<int> <int>
1 2 1
2 4 3
3 6 5
Some further wrangling by Year, counting the events that last 3 days or more, will give you the summary you are after.
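One way that remaining step might look is sketched below. It assumes the column names Date and Rainfall from the question and rebuilds the 16-day sample so that it runs on its own. It also assumes event ids are created within each year, so a spell that crosses New Year is split in two; the names result, days, n_events and longest are illustrative.

```r
library(dplyr)

# Sample data from the question, rebuilt so the sketch is self-contained
df <- data.frame(
  Date = as.Date("1981-01-01") + 0:15,
  Rainfall = c(0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0)
)

result <- df %>%
  mutate(Year = format(Date, "%Y")) %>%
  group_by(Year) %>%
  # event ids restart every year (assumption: spells are cut at year boundaries)
  mutate(event = with(rle(Rainfall), rep(seq_along(lengths), lengths))) %>%
  filter(Rainfall == 1) %>%
  # one row per wet event, with its length in days
  group_by(Year, event) %>%
  summarise(days = n(), .groups = "drop") %>%
  group_by(Year) %>%
  summarise(
    n_events = sum(days >= 3),  # events lasting 3 or more days
    longest = max(days)         # longest wet run of the year
  )

result
```

Note that a year without any rain day drops out of the result, because the filter() step removes all of its rows.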
A simple solution using only {base} R functions, notably diff and tapply. The summary statistics are assigned to the year in which each event starts.
# simulated data: ten years of random daily presence/absence
date <- seq(as.Date("2000/1/1"), as.Date("2010/1/1"), "days")
rainfall <- sample(c(0, 1), length(date), replace = TRUE)
df <- data.frame(date, rainfall)
# start date of each event: the first day, plus every 0 -> 1 transition
events <- df$date[which(c(1, diff(df$rainfall)) > 0)]
# wet days per event (0 for the first event if the series starts dry)
event.lengths <- tapply(df$rainfall, cumsum(c(1, diff(df$rainfall) > 0)), sum)
df.events <- data.frame(events, event.lengths)
# yearly summaries, keyed by the year in which each event starts
Total_rain_days <- tapply(df.events$event.lengths, format(df.events$events, format = "%Y"), sum)
Max_consecutive_rain_days <- tapply(df.events$event.lengths, format(df.events$events, format = "%Y"), max)
Year <- names(Total_rain_days)
output <- data.frame(Year, Total_rain_days, Max_consecutive_rain_days)
output
> output
Year Total_rain_days Max_consecutive_rain_days
2000 2000 192 8
2001 2001 175 9
2002 2002 196 13
2003 2003 193 9
2004 2004 164 11
2005 2005 183 7
2006 2006 196 10
2007 2007 176 7
2008 2008 179 7
2009 2009 178 9
2010 2010 1 1
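The table above reports total rain days rather than the number of spells of 3 or more days that the question asks for. A minimal base-R sketch of that count is given below, using the same diff and tapply idea on the question's 16-day sample instead of random data. The names starts, id, spell_len and spell_year are illustrative, and spells are assigned to the year in which they start, as in the answer above.

```r
# Sample data from the question
rain <- c(0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0)
date <- as.Date("1981-01-01") + seq_along(rain) - 1

# A wet spell starts wherever the series goes 0 -> 1
# (prepending 0 also catches a series that starts wet)
is_start <- diff(c(0, rain)) == 1
starts <- which(is_start)

# Spell id of every wet day; dry days are dropped
id <- cumsum(is_start)[rain == 1]
spell_len <- as.vector(tapply(rain[rain == 1], id, sum))
spell_year <- format(date[starts], "%Y")

# Per-year count of spells of 3+ days, and the longest spell
n_events <- tapply(spell_len >= 3, spell_year, sum)
longest <- tapply(spell_len, spell_year, max)
output <- data.frame(
  Year = names(n_events),
  n_events = as.vector(n_events),
  longest = as.vector(longest)
)
output
```

This sketch does not handle a series with no wet days at all; that case would need a separate check.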
Another possible solution, although I think better ones have already been posted:
library(lubridate)
library(dplyr)
library(tibble)
rainfall_data <- tibble::tribble(
~ date, ~rainfall,
"1981-01-01", 0,
"1981-01-02", 0,
"1981-01-03", 0,
"1981-01-04", 1,
"1981-01-05", 0,
"1981-01-06", 1,
"1981-01-07", 1,
"1981-01-08", 1,
"1981-01-09", 0,
"1981-01-10", 0,
"1981-01-11", 1,
"1981-01-12", 1,
"1981-01-13", 1,
"1981-01-14", 1,
"1981-01-15", 1,
"1981-01-16", 0
)
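rainfall_data %>%
  mutate(
    # spell id: increases by one on every dry day
    spell = cumsum(rainfall == 0),
    # running count of wet days within each spell
    csum = ave(rainfall, spell, FUN = cumsum),
    # total length of the spell, repeated on each of its rows
    event = ave(csum, spell, FUN = max)
  ) %>%
  # keep wet days of qualifying spells, then one row per spell
  # (distinct on spell, not event, so equal-length spells are not merged)
  filter(rainfall == 1, event >= 3) %>%
  distinct(spell, .keep_all = TRUE) %>%
  group_by(year = year(ymd(date))) %>%
  summarise(
    No_of_consecutive_wet_days = n(),
    longest_consecutive_wet_days = max(event)
  )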
#> # A tibble: 1 × 3
#> year No_of_consecutive_wet_days longest_consecutive_wet_days
#> <dbl> <int> <dbl>
#> 1 1981 2 5
Created on 2022-07-06 by the reprex package (v2.0.1)
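The question also mentions iterating over all 400 stations. A rough sketch of how any of the per-station approaches could be applied to many stations is given below. It assumes the stations are held in a named list of data frames with Date and Rainfall columns; the helper wet_spell_summary(), the list stations and the two toy stations A and B are hypothetical and only there for illustration.

```r
# Hypothetical helper: per-year wet-spell summary for a single station
wet_spell_summary <- function(df, min_len = 3) {
  years <- format(as.Date(df$Date), "%Y")
  rows <- lapply(split(df$Rainfall, years), function(x) {
    r <- rle(x)
    wet <- r$lengths[r$values == 1]  # lengths of wet runs in this year
    data.frame(
      n_events = sum(wet >= min_len),
      longest = if (length(wet) > 0) max(wet) else 0L
    )
  })
  data.frame(Year = names(rows), do.call(rbind, rows), row.names = NULL)
}

# Two toy stations (assumed layout: a named list of data frames)
stations <- list(
  A = data.frame(
    Date = as.Date("1981-01-01") + 0:15,
    Rainfall = c(0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0)
  ),
  B = data.frame(
    Date = as.Date("1981-01-01") + 0:15,
    Rainfall = rep(0, 16)
  )
)

# Apply the helper to every station, then stack the results
results <- lapply(stations, wet_spell_summary)
all_stations <- do.call(
  rbind,
  Map(function(id, res) cbind(station = id, res), names(results), results)
)
all_stations
```

Like the first answer, this splits the series by calendar year, so a spell that runs across New Year is counted separately in each year.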