如何查找R中具有相同值的三个或三个以上连续日期



我试图找到3天或更长时间内值相同的时段例如,如果1月1日、2日和3日的值都为2,则应将它们包括在内,但如果1月2日的值为3,则不应将它们都包括在内。

到目前为止,我已经尝试了几种方法,但没有成功!如有任何帮助,我们将不胜感激!

再版:

library("dplyr")
#Goal: include all values with values of 2 or less for 5 consecutive days and allow for a "cushion" period of values of 2 to 5 for up to 3 days
data <- data.frame(Date = c("2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04", "2000-01-05", "2000-01-06", "2000-01-07", "2000-01-08", "2000-01-09", "2000-01-10", "2000-01-11", "2000-01-12", "2000-01-13", "2000-01-14", "2000-01-15", "2000-01-16", "2000-01-17", "2000-01-18", "2000-01-19", "2000-01-20", "2000-01-21", "2000-01-22", "2000-01-23", "2000-01-24", "2000-01-25", "2000-01-26", "2000-01-27", "2000-01-28", "2000-01-29", "2000-01-30"),
Value = c(2,2,2,5,2,2,1,0,1,8,7,7,7,5,2,3,4,5,7,2,6,6,6,6,2,0,3,4,0,1))
head(data)                   
#Goal: values should include dates from 2000-01-01 to 2000-01-03, 2000-01-11 to 2000-01-13, and 2000-01-21 to 2000-01-24
#My attempt so far but it doesn't work 
attempt1 <- data %>% 
group_by(group_id = as.integer(gl(n(),3,n()))) %>% #3 day chunks
filter(Value == Value) %>%  #looking for the values being the same inbetween, but this doesn't work for that
ungroup() %>% 
select(-group_id)
head(attempt1)

使用rle:

rl <- rle(data$Value)
data[rep(rl$lengths>=3,rl$lengths),]
Date Value
1  2000-01-01     2
2  2000-01-02     2
3  2000-01-03     2
11 2000-01-11     7
12 2000-01-12     7
13 2000-01-13     7
21 2000-01-21     6
22 2000-01-22     6
23 2000-01-23     6
24 2000-01-24     6

或使用dplyr:

library(dplyr)
data %>% filter(rep(rle(Value)$length>=3,rle(Value)$length))
Date Value
1  2000-01-01     2
2  2000-01-02     2
3  2000-01-03     2
4  2000-01-11     7
5  2000-01-12     7
6  2000-01-13     7
7  2000-01-21     6
8  2000-01-22     6
9  2000-01-23     6
10 2000-01-24     6

您可以使用data.table包中的rleid创建一个临时变量。

data %>%
group_by(data.table::rleid(Value)) %>%
filter(n() >= 3) %>%
ungroup() %>%
select(Date, Value)
#> # A tibble: 10 x 2
#>    Date       Value
#>    <chr>      <dbl>
#>  1 2000-01-01     2
#>  2 2000-01-02     2
#>  3 2000-01-03     2
#>  4 2000-01-11     7
#>  5 2000-01-12     7
#>  6 2000-01-13     7
#>  7 2000-01-21     6
#>  8 2000-01-22     6
#>  9 2000-01-23     6
#> 10 2000-01-24     6

或者,如果你想避免使用另一个包,你可以等效地进行

data %>%
group_by(temp = cumsum(c(1, diff(Value) != 0))) %>%
filter(n() > 2) %>%
ungroup() %>%
select(-temp)
#> # A tibble: 10 x 2
#>    Date       Value
#>    <chr>      <dbl>
#>  1 2000-01-01     2
#>  2 2000-01-02     2
#>  3 2000-01-03     2
#>  4 2000-01-11     7
#>  5 2000-01-12     7
#>  6 2000-01-13     7
#>  7 2000-01-21     6
#>  8 2000-01-22     6
#>  9 2000-01-23     6
#> 10 2000-01-24     6

创建于2022-09-12,reprex v2.0.2

相关内容

  • 没有找到相关文章

最新更新