I have a status_df with an id and a status for each stage:
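Your attempt was on the right track, but you closed the any/all parentheses too early, before the comparison (==). Also, since you only need one row per id, you can use summarise instead of mutate, which also removes the need for select.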
library(dplyr)

status_df %>%
  group_by(id) %>%
  summarise(final_status = case_when(any(status == "Pending") ~ "Pending",
                                     any(status == "Rejected") ~ "Rejected",
                                     all(status == "Approved") ~ "Approved"))
# id final_status
#* <int> <chr>
#1 15 Pending
#2 16 Rejected
#3 20 Approved
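We can use summarise instead of mutate, because mutate returns an output column of the same length as the input column; it is meant for creating or modifying columns, not for summarising.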
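To make the length difference concrete, here is a minimal sketch. The data frame df and the column any_pending are made up for illustration and are not part of the question's data.

```r
library(dplyr)

# A small hypothetical data frame, made up for illustration.
df <- data.frame(id = c(1L, 1L, 2L),
                 status = c("Pending", "Approved", "Approved"))

# mutate() keeps every input row and repeats the group result on each of them.
mutated <- df %>%
  group_by(id) %>%
  mutate(any_pending = any(status == "Pending")) %>%
  ungroup()
nrow(mutated)
# [1] 3

# summarise() collapses each group to a single row.
summarised <- df %>%
  group_by(id) %>%
  summarise(any_pending = any(status == "Pending"), .groups = "drop")
nrow(summarised)
# [1] 2
```

With mutate you would still need a distinct or select step afterwards to get one row per id, which is the extra work summarise avoids.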
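Alternatively, a simpler option is to convert status to a factor with levels specified in a custom (priority) order, drop the unused levels (droplevels), and, after grouping by 'id', select the first of the remaining levels.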
library(dplyr)

status_df %>%
  group_by(id) %>%
  summarise(final_status = first(levels(droplevels(factor(status,
              levels = c("Pending", "Rejected", "Approved"))))),
            .groups = 'drop')
Output:
# A tibble: 3 x 2
# id final_status
# <int> <chr>
#1 15 Pending
#2 16 Rejected
#3 20 Approved
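To see why the factor approach works, it may help to walk through it for a single id in base R. This is only a sketch: the vector below copies the statuses of id 16 from the sample data, and remaining and final_status are illustrative names.

```r
# Statuses for a single id (id 16 in the sample data).
status <- c("Approved", "Rejected", "Not Sent", "Not Sent")

# Levels are listed in priority order; "Not Sent" is not a level, so it becomes NA.
f <- factor(status, levels = c("Pending", "Rejected", "Approved"))

# droplevels() removes "Pending", which does not occur for this id.
remaining <- levels(droplevels(f))
remaining
# [1] "Rejected" "Approved"

# The first remaining level is the highest-priority status present.
final_status <- remaining[1]
final_status
# [1] "Rejected"
```

Because the levels keep their custom order after droplevels, taking the first one gives the same priority as the case_when chain (Pending, then Rejected, then Approved).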
In the OP's code, any(status) returns NA; any should instead be wrapped around a logical vector, i.e. any(status == "Pending"). Also, as mentioned above, it should be summarise rather than mutate.
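A quick base R check of the parenthesis placement, as a sketch on the statuses of id 16 (the names wrong and is_rejected are illustrative):

```r
status <- c("Approved", "Rejected", "Not Sent", "Not Sent")

# Parenthesis closed too early: any() receives a character vector, coerces it
# to logical (with a warning) and returns NA, so the comparison is NA as well.
wrong <- suppressWarnings(any(status)) == "Rejected"
wrong
# [1] NA

# Comparing first gives a logical vector, which is what any() and all() expect.
is_rejected <- status == "Rejected"
is_rejected
# [1] FALSE  TRUE FALSE FALSE

any(is_rejected)
# [1] TRUE

all(status == "Approved")
# [1] FALSE
```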
status_df <- structure(list(id = c(15L, 15L, 16L, 16L, 16L, 16L, 20L, 20L,
20L), stage = c(1L, 2L, 1L, 2L, 3L, 4L, 1L, 2L, 3L), status = c("Pending",
"Not Sent", "Approved", "Rejected", "Not Sent", "Not Sent", "Approved",
"Approved", "Approved")), class = "data.frame", row.names = c(NA,
-9L))