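So, I have a dataset with observations of X individuals, with several rows per individual. For each row I have assigned a classification (the variable clinical_significance), which takes three values in order of priority: definite disease, possible, colonization. I now want a single row per individual carrying the "highest" classification found across that individual's rows: definite disease if present, otherwise possible, otherwise colonization. Any good suggestions on how to approach this?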
For example, in the sample below I want ID #23 to end up with the clinical significance "definite disease", because that ranks above "possible".
id id_row number_of_samples species_ny clinical_significa…
18 1 2 MAC possible
18 2 2 MAC possible
20 1 2 scrofulaceum possible
20 2 2 scrofulaceum possible
23 1 2 MAC possible
23 2 2 MAC definite disease
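The priority rule described above can be expressed without any packages by looking up each label's position in the ordered vector with base R's match(); the smallest position is the highest priority. A minimal sketch follows. highest_class() is a hypothetical helper name, not something from the question, and labels outside the three known values are ignored (the function returns NA if nothing matches).

```r
# Priority order from the question: earlier position = higher priority
priority <- c("definite disease", "possible", "colonization")

# Hypothetical helper: returns the highest-priority label present in x
highest_class <- function(x, levels = priority) {
  ranks <- match(x, levels)               # position of each value in the priority order
  if (all(is.na(ranks))) return(NA_character_)
  levels[min(ranks, na.rm = TRUE)]        # smallest position = highest priority
}

highest_class(c("possible", "definite disease"))  # rows of ID 23
highest_class(c("possible", "possible"))          # rows of ID 18
```

Applied per ID, the first call gives "definite disease" and the second gives "possible", which is the result the question asks for.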
Reproducible example:
df <- structure(
list(
id = c("18", "18", "20", "20", "23", "23"),
id_row = c("1","2", "1", "2", "1", "2"),
number_of_samples = c("2", "2", "2","2", "2", "2"),
species_ny = c("MAC", "MAC", "scrofulaceum", "scrofulaceum", "MAC", "MAC"),
clinical_significance = c("possible", "possible", "possible", "possible", "possible", "definite disease")
),
row.names = c(NA, -6L), class = c("data.frame")
)
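The idea is to convert clinical_significance into a factor, which is stored as integers rather than as characters (i.e. 1 = definite disease, 2 = possible, 3 = colonization). Then, for each ID, keep the row with the lowest number.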
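library(dplyr)

df_prio <- df |>
  mutate(
    # the level order defines the priority: definite disease (1) < possible (2) < colonization (3)
    fct_clin_sig = factor(
      clinical_significance,
      levels = c("definite disease", "possible", "colonization")
    )
  ) |>
  group_by(id) |>
  # keep exactly one row per id: the one with the smallest factor code
  slice_min(fct_clin_sig, with_ties = FALSE) |>
  ungroup()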
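To see why the factor approach works, and to get the same one-row-per-id result in base R, the sketch below rebuilds a reduced copy of the example data (only id, id_row and clinical_significance; the names df_example, fct and df_prio_base are illustrative). It relies on order() leaving unresolved ties in their original order, so among equally ranked rows the first one in the data is kept, analogous to slice_min(..., with_ties = FALSE).

```r
# Reduced copy of the question's example data
df_example <- data.frame(
  id = c("18", "18", "20", "20", "23", "23"),
  id_row = c("1", "2", "1", "2", "1", "2"),
  clinical_significance = c("possible", "possible", "possible", "possible",
                            "possible", "definite disease"),
  stringsAsFactors = FALSE
)

fct <- factor(df_example$clinical_significance,
              levels = c("definite disease", "possible", "colonization"))
as.integer(fct)  # 2 2 2 2 2 1: the integer codes the ranking relies on

# Sort by id, then by priority code, so the best row of each id comes first;
# then keep only the first row per id
ord <- order(df_example$id, as.integer(fct))
df_sorted <- df_example[ord, ]
df_prio_base <- df_sorted[!duplicated(df_sorted$id), ]
df_prio_base
```

This returns one row each for IDs 18, 20 and 23, with ID 23 keeping its "definite disease" row (id_row 2).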
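I fixed it with:
library(dplyr)

df <- df %>%
  group_by(id) %>%
  # if any row of this id is "definite disease", use that for every row; otherwise keep each row's own value
  mutate(clinical_significance_new = ifelse(any(clinical_significance == "definite disease"), "definite disease", as.character(clinical_significance)))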
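The fix above only promotes "definite disease": an ID whose rows are "possible" and "colonization" would keep both values. A sketch of one way to generalize it to all three levels in base R: rank each row with match(), take the per-id minimum rank with ave(), and map that rank back to its label. ID 31 below is hypothetical, added only to show the colonization case; a label outside the priority vector yields NA for its whole ID.

```r
priority <- c("definite disease", "possible", "colonization")

# IDs 18 and 23 follow the question's data; ID 31 is a hypothetical extra case
df_example <- data.frame(
  id = c("18", "18", "23", "23", "31", "31"),
  clinical_significance = c("possible", "possible",
                            "possible", "definite disease",
                            "colonization", "possible"),
  stringsAsFactors = FALSE
)

# Rank of each row's label (1 = highest priority)
rank <- match(df_example$clinical_significance, priority)

# Smallest rank within each id, repeated on every row of that id
best_rank <- ave(rank, df_example$id, FUN = min)

# Map the best rank back to its label
df_example$clinical_significance_new <- priority[best_rank]
df_example
```

Unlike slice_min(), this keeps every row and adds a column, matching the shape of the mutate() fix.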