Suppose I have a sequence of year-weeks:
s <- c('2020 WK 01', '2021 WK 41', '2021 WK 42', '2021 WK 43', '2021 WK 45')
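I want to show it to the user in a plot title, but the resulting title is too long. My idea is to join adjacent year-weeks with a hyphen. For example, this is the result I expect: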
title <- "2020 WK 01, 2021 WK 41 - 43, 2021 WK 45"
Is there an idiomatic way to do this in R?
Here is a base R option:
# Extract the week number (the backslash must be doubled inside an R string)
week_number <- as.numeric(sub('.*WK\\s+', '', s))
# Start a new group whenever the gap to the previous week is greater than 1.
# For a group of consecutive weeks, append the last week number to the first value.
unname(tapply(s, cumsum(c(TRUE, diff(week_number) > 1)), function(x) {
  if(length(x) > 1) paste(x[1], sub('.*WK\\s+', '', x[length(x)]), sep = '-')
  else x
}))
#[1] "2020 WK 01" "2021 WK 41-43" "2021 WK 45"
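The question asks for a single title string, while the code above returns a character vector. A minimal sketch of the last step, assuming the grouped labels are stored in a variable (the name `labels` is only illustrative):

```r
# Grouped labels, as returned by the base R snippet above
labels <- c("2020 WK 01", "2021 WK 41-43", "2021 WK 45")

# toString() is shorthand for paste(labels, collapse = ", ")
title <- toString(labels)
title
#[1] "2020 WK 01, 2021 WK 41-43, 2021 WK 45"
```

The result can then be passed to the `main` argument of `plot()` or to `ggtitle()`.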
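The code above works well when consecutive week numbers always belong to the same year, but it returns incorrect output if the input spans multiple years, because it does not take the year value into account. For example, '2020 WK 40' followed by '2021 WK 41' would be merged into one range. We can extend the same logic to include the year value. I have used the tidyverse libraries since they are easy to use.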
library(dplyr)
library(tidyr)
s = c('2020 WK 40', '2021 WK 41', '2021 WK 42', '2021 WK 43', '2022 WK 44')
tibble(s) %>%
  # Split into integer YEAR and WEEK_NUM columns, keeping the original string
  separate(s, c('YEAR', 'WEEK_NUM'), sep = '\\s*WK\\s*',
           convert = TRUE, remove = FALSE) %>%
  arrange(YEAR, WEEK_NUM) %>%
  # A group is a run of consecutive weeks within the same year
  group_by(YEAR, group = cumsum(c(TRUE, diff(WEEK_NUM) > 1))) %>%
  summarise(title = if(n() > 1) paste(first(s), last(WEEK_NUM), sep = '-') else s) %>%
  pull(title)
#[1] "2020 WK 40" "2021 WK 41-43" "2022 WK 44"
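For comparison, the same year-aware grouping can be sketched in base R alone. The helper name `collapse_weeks` is hypothetical and not part of the original answer. It assumes every element has the form 'YYYY WK NN', and it does not merge runs that cross a year boundary (for example WK 52 followed by WK 01 of the next year).

```r
# Hypothetical helper: collapse runs of consecutive weeks within the same year
collapse_weeks <- function(s) {
  if (length(s) == 0) return(character(0))
  year <- as.integer(sub("\\s*WK.*", "", s))
  week <- as.integer(sub(".*WK\\s*", "", s))
  # Sort by year, then week, so that runs are adjacent
  ord <- order(year, week)
  s <- s[ord]; year <- year[ord]; week <- week[ord]
  # A new run starts at the first element, when the year changes,
  # or when the week does not directly follow the previous one
  new_run <- c(TRUE, diff(year) != 0 | diff(week) != 1)
  out <- tapply(seq_along(s), cumsum(new_run), function(i) {
    if (length(i) > 1) {
      # Take the last week from the original string to keep zero padding
      paste(s[i[1]], sub(".*WK\\s*", "", s[i[length(i)]]), sep = "-")
    } else {
      s[i]
    }
  })
  as.vector(out)  # drop names and dimensions
}

s <- c('2020 WK 40', '2021 WK 41', '2021 WK 42', '2021 WK 43', '2022 WK 44')
collapse_weeks(s)
#[1] "2020 WK 40" "2021 WK 41-43" "2022 WK 44"
```

Because the end of each range is taken from the original string rather than from the parsed integer, a range such as '2021 WK 01' to '2021 WK 03' keeps its leading zero.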