R - Keep the first and last dates of a group in dplyr after using rle() to find consecutive instances



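To explain: I have a dataset of game results in chronological order. Each row shows the team name, the opponent, the date, and whether the team won. I want to group at two levels (team and opponent) and see how many games in a row a team has won. That part I can do. What I would like to add is to also keep the first and last date of that streak.

Here is some sample code to work with: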

library(tidyverse)

test <- data.frame(date = c(1:10),
                   team = c(rep(c("red", "blue"), 5)),
                   opponent = c("black", "white", "black", "white", "white",
                                "black", "white", "white", "black", "white"),
                   result = c(1, 1, 1, 0, 0, 1, 0, 0, 1, 1))

test %>%
  group_by(team, opponent) %>%
  # Running count of wins within each run; losses are set to 0
  mutate(consec_wins = ifelse(result == 0, 0, sequence(rle(result)$lengths))) %>%
  # Longest winning streak per team/opponent pair
  summarise(consec_wins = max(consec_wins))
Output:

# A tibble: 4 × 3
# Groups:   team [2]
  team  opponent consec_wins
  <chr> <chr>          <dbl>
1 blue  black              1
2 blue  white              1
3 red   black              3
4 red   white              0

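This code correctly finds that red beat black three times in a row, but it does not say when the streak started or ended. I tried adding first() and last() inside summarise(), but they are evaluated at the group level (team and opponent), not over just the span of the streak.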
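To make the gap concrete, here is a minimal base R sketch of what rle() and sequence() compute for a single team/opponent pair, and how a per-run id would let dates be summarised per streak rather than per group. The vectors and the names run_id, position_in_run, and best are illustrative assumptions, not part of the original code.

```r
# Illustrative data: one team/opponent pair, already in date order
date   <- 11:17
result <- c(1, 0, 0, 1, 1, 1, 0)

runs <- rle(result)                 # lengths 1 2 3 1, values 1 0 1 0

# Position of each game within its run (what sequence() contributes)
position_in_run <- sequence(runs$lengths)

# One id per run; games in the same streak share an id
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# First and last date of every run
run_start <- tapply(date, run_id, min)
run_end   <- tapply(date, run_id, max)

# Longest winning run (the first one, if several tie)
wins <- which(runs$values == 1)
best <- wins[which.max(runs$lengths[wins])]

streak <- c(consec_wins = runs$lengths[best],
            start = run_start[[best]],
            end = run_end[[best]])
streak
# consec_wins 3, start 14, end 16
```

Because the dates are aggregated by run_id instead of by the whole pair, the start and end belong to the streak itself.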

I hope that is enough to go on. Thanks!

Let me know if this works. I used data.table::rleid() to identify the records that belong to each unique streak.

library(dplyr)
library(data.table)

test <- data.frame(date = c(1:10),
                   team = c(rep(c("red", "blue"), 5)),
                   opponent = c("black", "white", "black", "white", "white",
                                "black", "white", "white", "black", "white"),
                   result = c(1, 1, 1, 0, 0, 1, 0, 0, 1, 1))

output <- test %>%
  group_by(team, opponent) %>%
  # Running count of wins within each run; losses are set to 0
  mutate(consec_wins = ifelse(result == 0, 0, sequence(rle(result)$lengths))) %>%
  # Id per winning run; 0L keeps both branches integer for strict if_else()
  mutate(win_id = if_else(result == 0, 0L, data.table::rleid(result))) %>%
  # Date range of each run (all losses share win_id 0, so for a pair
  # with no wins min/max span its losses)
  group_by(team, opponent, win_id) %>%
  mutate(min = min(date),
         max = max(date)) %>%
  # Keep the row with the longest streak for each team/opponent pair
  group_by(team, opponent) %>%
  arrange(desc(consec_wins), desc(result)) %>%
  slice(1) %>%
  select(team, opponent, consec_wins, min, max)

output
#> # A tibble: 4 x 5
#> # Groups:   team, opponent [4]
#>   team  opponent consec_wins   min   max
#>   <chr> <chr>          <dbl> <int> <int>
#> 1 blue  black              1     6     6
#> 2 blue  white              1     2     2
#> 3 red   black              3     1     9
#> 4 red   white              0     5     7

Created on 2023-04-07 with reprex v2.0.2
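As an alternative sketch (not the method used in the answer), the same idea can be written with dplyr and base rle() only, by collapsing each run to one row before picking the longest winning run. The helper run_ids() and the column names won, start, and end are hypothetical names introduced here. Dropping pairs with no wins, instead of reporting them with a streak of 0, is a choice made for this sketch.

```r
library(dplyr)

test <- data.frame(
  date = 1:10,
  team = rep(c("red", "blue"), 5),
  opponent = c("black", "white", "black", "white", "white",
               "black", "white", "white", "black", "white"),
  result = c(1, 1, 1, 0, 0, 1, 0, 0, 1, 1)
)

# Hypothetical helper: label each run of identical values 1, 2, 3, ...
run_ids <- function(x) {
  r <- rle(x)
  rep(seq_along(r$lengths), r$lengths)
}

streaks <- test %>%
  arrange(date) %>%
  group_by(team, opponent) %>%
  # Number the runs of identical results within each team/opponent pair
  mutate(run_id = run_ids(result)) %>%
  # One row per run: its outcome, length, and date range
  group_by(team, opponent, run_id) %>%
  summarise(won = first(result) == 1,
            consec_wins = n(),
            start = min(date),
            end = max(date),
            .groups = "drop") %>%
  # Keep winning runs only, then the longest one per pair
  filter(won) %>%
  group_by(team, opponent) %>%
  slice_max(consec_wins, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(team, opponent, consec_wins, start, end)

streaks
```

On the sample data this reports a 3-game streak for red against black running from date 1 to date 9, and red against white does not appear because that pair has no wins.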
