R - Rows until/after departure from increasing order (last ascending cycle)



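My data resemble multiple time series with timestamps. They are organized by group and have a cyclic component: time increases within each cycle, and cycles are delimited by abrupt breaks (i.e., decreases) in this increasing pattern. I want to keep only the rows up to, or from, the last break in the increasing trend (the last ascending cycle).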

Some minimal synthetic data:

df <- data.frame(
  group = c(rep("A", 10), rep("B", 10), rep("C", 10)),
  time  = c(c(1:3, 2, 3:6, 5, 6), c(1:2, 1, 3, 7, 6:10), c(4, 3, 6, 4, 6, 7, 6, 8:10))
)

What I mean by the last change in the increasing trend:

library(dplyr)

# Just exemplifying the last change in the monotonically increasing trend
df %>%
dplyr::group_by(group) %>%
dplyr::mutate(
row_num = dplyr::row_number(),
time_order = dplyr::case_when(time - dplyr::lag(time, n = 1) >= 0 ~ "increase",  
time - dplyr::lag(time, n = 1) < 0 ~ "decrease",
TRUE ~ "increase"),
where_split = dplyr::if_else(dplyr::last(which(time_order == "decrease")) == row_num, "here", NA_character_)
) %>%
print(n = Inf)
#> # A tibble: 30 x 5
#> # Groups:   group [3]
#>    group  time row_num time_order where_split
#>    <chr> <dbl>   <int> <chr>      <chr>      
#>  1 A         1       1 increase   <NA>       
#>  2 A         2       2 increase   <NA>       
#>  3 A         3       3 increase   <NA>       
#>  4 A         2       4 decrease   <NA>       
#>  5 A         3       5 increase   <NA>       
#>  6 A         4       6 increase   <NA>       
#>  7 A         5       7 increase   <NA>       
#>  8 A         6       8 increase   <NA>       
#>  9 A         5       9 decrease   here       
#> 10 A         6      10 increase   <NA>       
#> 11 B         1       1 increase   <NA>       
#> 12 B         2       2 increase   <NA>       
#> 13 B         1       3 decrease   <NA>       
#> 14 B         3       4 increase   <NA>       
#> 15 B         7       5 increase   <NA>       
#> 16 B         6       6 decrease   here       
#> 17 B         7       7 increase   <NA>       
#> 18 B         8       8 increase   <NA>       
#> 19 B         9       9 increase   <NA>       
#> 20 B        10      10 increase   <NA>       
#> 21 C         4       1 increase   <NA>       
#> 22 C         3       2 decrease   <NA>       
#> 23 C         6       3 increase   <NA>       
#> 24 C         4       4 decrease   <NA>       
#> 25 C         6       5 increase   <NA>       
#> 26 C         7       6 increase   <NA>       
#> 27 C         6       7 decrease   here       
#> 28 C         8       8 increase   <NA>       
#> 29 C         9       9 increase   <NA>       
#> 30 C        10      10 increase   <NA>

Created on 2022-05-17 by the reprex package (v2.0.1)

For ease of verification, here are my solutions:

# All rows until last change in trend, by group
check_until <- 
df %>%
dplyr::group_by(group) %>%
dplyr::mutate(
row_num = dplyr::row_number(),
time_order = dplyr::case_when(time - dplyr::lag(time, n = 1) >= 0 ~ "increase",  
time - dplyr::lag(time, n = 1) < 0 ~ "decrease",
TRUE ~ "increase")) %>%
dplyr::slice(1:dplyr::last(which(time_order == "decrease"))) %>%
dplyr::select(-c(row_num, time_order))
# All rows after last change in trend, by group
check_after <- 
df %>%
group_by(group) %>%
dplyr::mutate(
row_num = dplyr::row_number(),
time_order = dplyr::case_when(time - lag(time, n = 1) >= 0 ~ "increase",  
time - lag(time, n = 1) < 0 ~ "decrease",
TRUE ~ "increase")) %>%
dplyr::slice(dplyr::last(which(time_order == "decrease")):max(row_num)) %>%
dplyr::select(-c(row_num, time_order)) 

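My solutions work, but they seem too verbose and inefficient. I believe there are more elegant solutions. Any insights are welcome, and I am also open to data.table solutions.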

Both can be done with cumsum + diff + slice (or slice_max).
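To see why this combination works, it may help to trace it on the time values of group A alone. The sketch below uses base R only; the names is_drop and run_id are illustrative and do not appear in the original answer.

```r
# time values of group A from the example data
time <- c(1, 2, 3, 2, 3, 4, 5, 6, 5, 6)

# TRUE wherever a value is smaller than the one before it (a break in the trend)
is_drop <- diff(time) < 0

# prepend 1 for the first row, then accumulate: every break starts a new run
run_id <- cumsum(c(1, is_drop))
run_id
#> [1] 1 1 1 2 2 2 2 2 3 3

# which.max() returns the FIRST position of the maximum,
# i.e. the row where the last ascending cycle starts
last_break <- which.max(run_id)

until_rows <- time[seq_len(last_break)]    # rows 1..9, including the break row
after_rows <- time[run_id == max(run_id)]  # rows of the last run: 5 6
```

One caveat, under the assumption that a group might contain no decrease at all: run_id is then constant, which.max() returns 1, and the "until" selection keeps only the first row, while the "after" selection keeps the whole group.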

(1) All rows until the last change in trend:

df %>%
group_by(group) %>%
slice(1:which.max(cumsum(c(1, diff(time) < 0)))) %>%
ungroup()
# A tibble: 22 × 2
#    group  time
#    <chr> <dbl>
#  1 A         1
#  2 A         2
#  3 A         3
#  4 A         2
#  5 A         3
#  6 A         4
#  7 A         5
#  8 A         6
#  9 A         5
# 10 B         1
# 11 B         2
# 12 B         1
# 13 B         3
# 14 B         7
# 15 B         6
# 16 C         4
# 17 C         3
# 18 C         6
# 19 C         4
# 20 C         6
# 21 C         7
# 22 C         6

(2) All rows after the last change in trend:

df %>%
group_by(group) %>%
slice_max(cumsum(c(1, diff(time) < 0))) %>%
ungroup()
# A tibble: 11 × 2
#    group  time
#    <chr> <dbl>
#  1 A         5
#  2 A         6
#  3 B         6
#  4 B         7
#  5 B         8
#  6 B         9
#  7 B        10
#  8 C         6
#  9 C         8
# 10 C         9
# 11 C        10
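Since the question is also open to data.table, the same run-id idea could be sketched as follows. This is not part of the original answer; the helper column run and the object names dt, until_dt and after_dt are hypothetical, and the data is rebuilt here so the block runs on its own.

```r
library(data.table)

# same synthetic data as in the question
dt <- data.table(
  group = rep(c("A", "B", "C"), each = 10),
  time  = c(1:3, 2, 3:6, 5, 6,
            1:2, 1, 3, 7, 6:10,
            4, 3, 6, 4, 6, 7, 6, 8:10)
)

# run id per group: increases by one at every decrease in time
dt[, run := cumsum(c(1L, diff(time) < 0)), by = group]

# (1) rows up to and including the first row of the last run
until_dt <- dt[, .SD[seq_len(which.max(run))], by = group][, run := NULL][]

# (2) rows belonging to the last run
after_dt <- dt[, .SD[run == max(run)], by = group][, run := NULL][]
```

On the example data, until_dt should have 22 rows and after_dt 11 rows, matching the dplyr results above.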
