Suppose I have the following data:
library(dplyr)
tibble(
id = rep(c("A", "B"), each = 3) %>% rep(2)
)
#> # A tibble: 12 x 1
#> id
#> <chr>
#> 1 A
#> 2 A
#> 3 A
#> 4 B
#> 5 B
#> 6 B
#> 7 A
#> 8 A
#> 9 A
#> 10 B
#> 11 B
#> 12 B
If I want to number the rows within each group, I would usually do the following:
tibble(
id = rep(c("A", "B"), each = 3) %>% rep(2)
) %>%
group_by(id) %>%
mutate(sequence_group = seq_along(id))
#> # A tibble: 12 x 2
#> # Groups: id [2]
#> id sequence_group
#> <chr> <int>
#> 1 A 1
#> 2 A 2
#> 3 A 3
#> 4 B 1
#> 5 B 2
#> 6 B 3
#> 7 A 4
#> 8 A 5
#> 9 A 6
#> 10 B 4
#> 11 B 5
#> 12 B 6
However, I would like the count to restart every time the group changes. This is the expected output:
#> # A tibble: 12 x 2
#> id sequence_group
#> <chr> <int>
#> 1 A 1
#> 2 A 2
#> 3 A 3
#> 4 B 1
#> 5 B 2
#> 6 B 3
#> 7 A 1
#> 8 A 2
#> 9 A 3
#> 10 B 1
#> 11 B 2
#> 12 B 3
Any suggestions?
Using data.table helper functions:
library(data.table)
# df holds the tibble from the question
df$sequence_group <- rowid(rleid(df$id))
df
# id sequence_group
# <chr> <int>
# 1 A 1
# 2 A 2
# 3 A 3
# 4 B 1
# 5 B 2
# 6 B 3
# 7 A 1
# 8 A 2
# 9 A 3
# 10 B 1
# 11 B 2
# 12 B 3
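To see why this one-liner works, it may help to look at the two helpers separately. The sketch below rebuilds the `id` column from the question as a plain vector; the variable name `run` is only illustrative.

```r
library(data.table)

# Same values as the id column in the question
id <- rep(rep(c("A", "B"), each = 3), 2)

# rleid() gives every run of consecutive identical values its own integer id
run <- rleid(id)
run
#> [1] 1 1 1 2 2 2 3 3 3 4 4 4

# rowid() numbers the rows within each distinct value of its argument,
# so applied to the run ids it restarts at 1 whenever a new run begins
rowid(run)
#> [1] 1 2 3 1 2 3 1 2 3 1 2 3
```

Because the second block of "A" rows gets a different run id than the first, it is counted separately, which is exactly what `group_by(id)` alone cannot do.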
More in line with a dplyr workflow:
df %>%
group_by(tmp = rleid(id)) %>%
mutate(sequence_group = seq_along(id)) %>%
ungroup() %>%
select(-tmp)
# Or simply
df <- df %>% mutate(sequence_group = rowid(rleid(id)))
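If you would rather not load data.table at all, a possible alternative is `consecutive_id()`, which was added in dplyr 1.1.0 and plays the same role as `rleid()`. This is a minimal sketch assuming that dplyr version or later is installed; the names `run` and `result` are just placeholders.

```r
library(dplyr)

# Rebuild the example data from the question
df <- tibble(id = rep(rep(c("A", "B"), each = 3), 2))

result <- df %>%
  # consecutive_id() increments each time id differs from the previous row
  group_by(run = consecutive_id(id)) %>%
  # number the rows within each run
  mutate(sequence_group = row_number()) %>%
  ungroup() %>%
  select(-run)

result$sequence_group
#> [1] 1 2 3 1 2 3 1 2 3 1 2 3
```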
Finally, using only base R:
df$sequence_group <- unlist(lapply(rle(df$id)$lengths, seq_len))
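The base R line can be unpacked as follows. This is only a sketch on a plain vector with the same values as `df$id`; it also shows `sequence()`, a base R function that should give the same result as the `unlist(lapply(...))` combination here.

```r
# Same values as the id column in the question
id <- rep(rep(c("A", "B"), each = 3), 2)

# rle() compresses the vector into runs: their lengths and their values
runs <- rle(id)
runs$lengths
#> [1] 3 3 3 3
runs$values
#> [1] "A" "B" "A" "B"

# seq_len() on each run length yields 1..n per run; unlist() concatenates them
seq_lapply <- unlist(lapply(runs$lengths, seq_len))

# sequence() does the same concatenation in one call
seq_short <- sequence(runs$lengths)
seq_short
#> [1] 1 2 3 1 2 3 1 2 3 1 2 3
```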
Does this work:
library(dplyr)
df %>%
  mutate(rids = rep(seq_along(rle(id)$values), rle(id)$lengths)) %>%
  group_by(rids) %>%
  mutate(sequence_group = row_number()) %>%
  ungroup() %>%
  select(-rids)
# A tibble: 12 x 2
# id sequence_group
# <chr> <int>
# 1 A 1
# 2 A 2
# 3 A 3
# 4 B 1
# 5 B 2
# 6 B 3
# 7 A 1
# 8 A 2
# 9 A 3
# 10 B 1
# 11 B 2
# 12 B 3
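The `rids` column above is a run id built from `rle()`. As a rough sketch of the same idea without `rle()`, the run id can also be derived by marking the positions where the value changes and taking a cumulative sum. The names `starts` and `run` are illustrative, and the sketch assumes `id` contains no `NA` values.

```r
# Same values as the id column in the question
id <- rep(rep(c("A", "B"), each = 3), 2)

# TRUE wherever a value differs from the one before it;
# the first element always starts a new run
starts <- c(TRUE, id[-1] != id[-length(id)])

# The cumulative sum of the change points gives one id per run
run <- cumsum(starts)
run
#> [1] 1 1 1 2 2 2 3 3 3 4 4 4

# ave() applies seq_along() within each run id, restarting the count per run
sequence_group <- ave(seq_along(id), run, FUN = seq_along)
sequence_group
#> [1] 1 2 3 1 2 3 1 2 3 1 2 3
```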