Counting consecutive wet days per year in a daily time series in R



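I have daily time series of rainfall presence/absence for 400 stations covering 60 years. The data look like this; in the second column, 1 means rain was recorded (presence) and 0 means no rain (absence):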

Date         Rainfall
---------------------
1981-01-01   0
1981-01-02   0
1981-01-03   0
1981-01-04   1
1981-01-05   0
1981-01-06   1
1981-01-07   1
1981-01-08   1
1981-01-09   0
1981-01-10   0
1981-01-11   1
1981-01-12   1
1981-01-13   1
1981-01-14   1
1981-01-15   1
1981-01-16   0
..........   .

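For each year I now want to count the number of consecutive wet-day events lasting 3 or more days, and find the longest run of consecutive wet days in that year. If rain is received on 3 or more consecutive days (any number of days from 3 upward), I treat that run as a single event.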
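The event definition above is a run-length question: each maximal run of 1s is one wet spell, and it counts as an event when its length is at least 3. A minimal base R sketch on the 16 example days using `rle()` (the vector `rain` is just a copy of the Rainfall column shown above):

```r
# Rainfall column of the 16 example days shown above
rain <- c(0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0)

r <- rle(rain)                          # runs of identical values
wet_runs <- r$lengths[r$values == 1]    # lengths of the wet runs only

wet_runs                # 1 3 5
sum(wet_runs >= 3)      # 2 events of 3+ days
max(wet_runs)           # longest wet run: 5 days
```

These match the 2 and 5 in the expected output row for 1981; the answers below repeat this per year.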

My output would look like this:

Year      No of consecutive wet-days   longest consecutive wet-days
1981      2                            5
.
.

How can I do this in R? If I can solve it for one station, I can iterate over all the stations in R.
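One way to go from one station to 400 is to wrap the per-year logic in a function and apply it to each station with `split()` and `lapply()`. This is a sketch under assumptions: all stations are stacked in one long data frame with a `Station` column, and `wet_spell_summary()`, `all_data` and the two toy stations are hypothetical names and data made up for illustration. In this version a wet run that crosses 31 December is cut at the year boundary.

```r
# Hypothetical helper: per-year summary for ONE station.
# Counts wet runs of >= min_len days and finds the longest wet run;
# runs crossing 31 Dec are split at the year boundary.
wet_spell_summary <- function(dates, rain, min_len = 3) {
  years <- format(as.Date(dates), "%Y")
  per_year <- lapply(split(rain, years), function(x) {
    r <- rle(x)
    wet <- r$lengths[r$values == 1]
    data.frame(n_events = sum(wet >= min_len),
               longest  = if (length(wet) > 0) max(wet) else 0L)
  })
  out <- do.call(rbind, per_year)
  data.frame(Year = as.integer(names(per_year)), out, row.names = NULL)
}

# Hypothetical long-format data: two toy stations, 16 days each
all_data <- data.frame(
  Station  = rep(c("A", "B"), each = 16),
  Date     = rep(seq(as.Date("1981-01-01"), by = "day", length.out = 16), 2),
  Rainfall = c(c(0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0),
               c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0))
)

# Apply the helper to each station and stack the results
by_station <- lapply(split(all_data, all_data$Station), function(d)
  cbind(Station = d$Station[1], wet_spell_summary(d$Date, d$Rainfall)))
result <- do.call(rbind, by_station)
result
```

Replacing the toy `all_data` with the real 400-station table gives one row per station and year.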

Thanks in advance for your help :)

Another possible solution (thanks to @DarrenTsai, whose comment improved it):

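library(tidyverse)
library(lubridate)

df %>% 
  group_by(Year = year(ymd(Date))) %>%
  # store the run-length encoding of each year's 0/1 series
  mutate(x = list(rle(Rainfall))) %>% 
  summarise(
    # wet runs (value 1) lasting 3 or more days
    ncons = sum(x[[1]]$lengths >= 3 & x[[1]]$values == 1),
    # longest wet run; 0 if the year has no wet day at all
    longest = ifelse(sum(x[[1]]$values == 1) == 0, 0,
                     max(x[[1]]$lengths[x[[1]]$values == 1]))
  )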
#> # A tibble: 2 × 3
#>    Year ncons longest
#>   <dbl> <int>   <int>
#> 1  1981     2       5
#> 2  1982     2       4

You can use rle to create events.

library(dplyr)

# give every run of identical Rainfall values its own event number
df <- df %>%
  mutate(event = with(rle(Rainfall), rep(seq_along(lengths), lengths)))
         Date Rainfall event
1  1981-01-01        0     1
2  1981-01-02        0     1
3  1981-01-03        0     1
4  1981-01-04        1     2
5  1981-01-05        0     3
6  1981-01-06        1     4
7  1981-01-07        1     4
8  1981-01-08        1     4
9  1981-01-09        0     5
10 1981-01-10        0     5
11 1981-01-11        1     6
12 1981-01-12        1     6
13 1981-01-13        1     6
14 1981-01-14        1     6
15 1981-01-15        1     6
16 1981-01-16        0     7

You can then count the number of consecutive days in each rain event.

df %>% filter(Rainfall == 1) %>% group_by(event) %>% tally()
# A tibble: 3 x 2
  event     n
  <int> <int>
1     2     1
2     4     3
3     6     5

Further wrangling by Year, and counting the rain events that last 3 or more days, will give you the summary you expect.
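A minimal sketch of that last wrangling step, written in base R so it runs without dplyr (the 16 example days are re-created as `df`, with the column names from the question; `wet`, `ev` and `out` are illustrative names). Each wet event is reduced to its start date and length, events are attributed to the year they start in, and only events of 3 or more days are counted:

```r
# the 16 example days from the question
df <- data.frame(
  Date     = seq(as.Date("1981-01-01"), by = "day", length.out = 16),
  Rainfall = c(0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0)
)
# one event number per run of identical values, as in the answer above
df$event <- with(rle(df$Rainfall), rep(seq_along(lengths), lengths))

wet <- df[df$Rainfall == 1, ]
ev <- data.frame(
  start = wet$Date[!duplicated(wet$event)],   # first day of each wet event
  n     = as.vector(table(wet$event))         # its length in days
)
ev$Year <- as.integer(format(ev$start, "%Y"))

# per start year: number of events of 3+ days, and the longest wet event
n_events <- tapply(ev$n >= 3, ev$Year, sum)
longest  <- tapply(ev$n, ev$Year, max)
out <- data.frame(Year     = as.integer(names(n_events)),
                  n_events = as.vector(n_events),
                  longest  = as.vector(longest))
out
```

Attributing each event to its start year means a spell running over New Year is counted once, in the year it began; splitting the series by year before calling `rle()` would instead cut it into two pieces.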

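A simple solution using only {base} R functions, in particular diff and tapply. The summary statistics are attributed to the year in which each event starts.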

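# simulated example: random 0/1 daily rainfall for 2000-2010
date <- seq(as.Date("2000/1/1"), as.Date("2010/1/1"), "days")
rainfall <- sample(c(0, 1), length(date), replace = TRUE)
df <- data.frame(date, rainfall)

# an event starts at every 0 -> 1 switch (day 1 is always treated as a start;
# if day 1 is dry, that first segment simply has length 0)
events <- df$date[which(c(1, diff(df$rainfall)) > 0)]
# wet days in each segment between consecutive starts = length of that wet run
event.lengths <- tapply(df$rainfall, cumsum(c(1, diff(df$rainfall) > 0)), sum)
df.events <- data.frame(events, event.lengths)

# summarise by the year in which each event starts
Total_rain_days <- tapply(df.events$event.lengths, format(df.events$events, format = "%Y"), sum)
Max_consecutive_rain_days <- tapply(df.events$event.lengths, format(df.events$events, format = "%Y"), max)
Year <- names(Total_rain_days)
output <- data.frame(Year, Total_rain_days, Max_consecutive_rain_days)
output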
     Year Total_rain_days Max_consecutive_rain_days
2000 2000             192                         8
2001 2001             175                         9
2002 2002             196                        13
2003 2003             193                         9
2004 2004             164                        11
2005 2005             183                         7
2006 2006             196                        10
2007 2007             176                         7
2008 2008             179                         7
2009 2009             178                         9
2010 2010               1                         1
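This answer reports total wet days rather than the number of events of at least 3 days that the question asks for. A small sketch of adding that count with the same `diff()`/`tapply()` construction, run on the 16 example days instead of random data so the result can be checked (`starts`, `year`, `No_of_events` and `res` are new, illustrative names):

```r
date <- seq(as.Date("1981-01-01"), by = "day", length.out = 16)
rainfall <- c(0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0)

# TRUE where a segment starts: day 1, then every 0 -> 1 switch
starts <- c(1, diff(rainfall)) > 0
event.lengths <- as.vector(tapply(rainfall, cumsum(starts), sum))
year <- format(date[starts], "%Y")

# a dry day 1 yields a zero-length segment, which never reaches the 3-day threshold
No_of_events <- tapply(event.lengths >= 3, year, sum)
Max_consecutive_rain_days <- tapply(event.lengths, year, max)
res <- data.frame(Year = names(No_of_events),
                  No_of_events = as.vector(No_of_events),
                  Max_consecutive_rain_days = as.vector(Max_consecutive_rain_days))
res
```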

Another possible solution, although I think better ones have already been proposed:

library(lubridate)
library(dplyr)
library(tibble)

rainfall_data <- tibble::tribble(
~ date, ~rainfall,
"1981-01-01",   0,
"1981-01-02",   0,
"1981-01-03",   0,
"1981-01-04",   1,
"1981-01-05",   0,
"1981-01-06",   1,
"1981-01-07",   1,
"1981-01-08",   1,
"1981-01-09",   0,
"1981-01-10",   0,
"1981-01-11",   1,
"1981-01-12",   1,
"1981-01-13",   1,
"1981-01-14",   1,
"1981-01-15",   1,
"1981-01-16",   0
)

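rainfall_data %>%
  mutate(
    run = cumsum(rainfall == 0),              # a new run group starts at every dry day
    csum = ave(rainfall, run, FUN = cumsum),  # running count of wet days in the group
    event = ave(csum, run, FUN = max)         # length of the wet run the row belongs to
  ) %>%
  # keep wet days in runs of 3+ days (drops the dry day that opens each group)
  filter(rainfall == 1, event >= 3) %>%
  # one row per run, keyed on the run so equal-length runs stay separate
  distinct(run, .keep_all = TRUE) %>%
  group_by(year = year(ymd(date))) %>%
  summarise(
    No_of_consecutive_wet_days = n(),
    longest_consecutive_wet_days = max(event)
  )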
#> # A tibble: 1 × 3
#>    year No_of_consecutive_wet_days longest_consecutive_wet_days
#>   <dbl>                      <int>                        <dbl>
#> 1  1981                          2                            5

Created on 2022-07-06 by the reprex package (v2.0.1)
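Why the run group `cumsum(rainfall == 0)` is a safer key than the run length when counting events: two separate wet spells of the same length share the same length value, so deduplicating on length merges them. A minimal base R check on a hypothetical 8-day series (`ave(..., FUN = sum)` stands in for the cumsum-then-max pair; since rainfall is 0/1, both give the wet-run length for each row):

```r
rain <- c(1, 1, 1, 0, 1, 1, 1, 0)        # hypothetical: two separate 3-day wet runs
run_id  <- cumsum(rain == 0)             # same grouping key as the ave() answer
run_len <- ave(rain, run_id, FUN = sum)  # wet days in the run containing each row
keep <- rain == 1 & run_len >= 3         # wet rows belonging to runs of 3+ days

n_by_length <- length(unique(run_len[keep]))  # dedupe on length: 1 (merges the runs)
n_by_run    <- length(unique(run_id[keep]))   # dedupe on run id: 2 (correct)
```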
