我试图找到3天或更长时间内值相同的时段例如,如果1月1日、2日和3日的值都为2,则应将它们包括在内,但如果1月2日的值为3,则不应将它们都包括在内。
到目前为止,我已经尝试了几种方法,但没有成功!如有任何帮助,我们将不胜感激!
再版:
library("dplyr")
#Goal: include all values with values of 2 or less for 5 consecutive days and allow for a "cushion" period of values of 2 to 5 for up to 3 days
data <- data.frame(Date = c("2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04", "2000-01-05", "2000-01-06", "2000-01-07", "2000-01-08", "2000-01-09", "2000-01-10", "2000-01-11", "2000-01-12", "2000-01-13", "2000-01-14", "2000-01-15", "2000-01-16", "2000-01-17", "2000-01-18", "2000-01-19", "2000-01-20", "2000-01-21", "2000-01-22", "2000-01-23", "2000-01-24", "2000-01-25", "2000-01-26", "2000-01-27", "2000-01-28", "2000-01-29", "2000-01-30"),
Value = c(2,2,2,5,2,2,1,0,1,8,7,7,7,5,2,3,4,5,7,2,6,6,6,6,2,0,3,4,0,1))
head(data)
#Goal: values should include dates from 2000-01-01 to 2000-01-03, 2000-01-11 to 2000-01-13, and 2000-01-21 to 2000-01-24
#My attempt so far but it doesn't work
attempt1 <- data %>%
group_by(group_id = as.integer(gl(n(),3,n()))) %>% #3 day chunks
filter(Value == Value) %>% #looking for the values being the same inbetween, but this doesn't work for that
ungroup() %>%
select(-group_id)
head(attempt1)
使用rle
:
rl <- rle(data$Value)
data[rep(rl$lengths>=3,rl$lengths),]
Date Value
1 2000-01-01 2
2 2000-01-02 2
3 2000-01-03 2
11 2000-01-11 7
12 2000-01-12 7
13 2000-01-13 7
21 2000-01-21 6
22 2000-01-22 6
23 2000-01-23 6
24 2000-01-24 6
或使用dplyr
:
library(dplyr)
data %>% filter(rep(rle(Value)$length>=3,rle(Value)$length))
Date Value
1 2000-01-01 2
2 2000-01-02 2
3 2000-01-03 2
4 2000-01-11 7
5 2000-01-12 7
6 2000-01-13 7
7 2000-01-21 6
8 2000-01-22 6
9 2000-01-23 6
10 2000-01-24 6
您可以使用data.table
包中的rleid
创建一个临时变量。
data %>%
group_by(data.table::rleid(Value)) %>%
filter(n() >= 3) %>%
ungroup() %>%
select(Date, Value)
#> # A tibble: 10 x 2
#> Date Value
#> <chr> <dbl>
#> 1 2000-01-01 2
#> 2 2000-01-02 2
#> 3 2000-01-03 2
#> 4 2000-01-11 7
#> 5 2000-01-12 7
#> 6 2000-01-13 7
#> 7 2000-01-21 6
#> 8 2000-01-22 6
#> 9 2000-01-23 6
#> 10 2000-01-24 6
或者,如果你想避免使用另一个包,你可以等效地进行
data %>%
group_by(temp = cumsum(c(1, diff(Value) != 0))) %>%
filter(n() > 2) %>%
ungroup() %>%
select(-temp)
#> # A tibble: 10 x 2
#> Date Value
#> <chr> <dbl>
#> 1 2000-01-01 2
#> 2 2000-01-02 2
#> 3 2000-01-03 2
#> 4 2000-01-11 7
#> 5 2000-01-12 7
#> 6 2000-01-13 7
#> 7 2000-01-21 6
#> 8 2000-01-22 6
#> 9 2000-01-23 6
#> 10 2000-01-24 6
创建于2022-09-12,reprex v2.0.2