Prioritizing a classification across rows with the same id

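I have a dataset with observations for many individuals, with several rows per individual. Each row is assigned a classification (variable clinical_significance) that takes one of three values, in order of priority: definite disease, possible, colonization. Now I want only one row per individual, carrying the "highest classification" across that individual's rows: definite disease if present, otherwise possible, and then colonization. Any good suggestions for how to do this?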

For example, in the data below I want clinical_significance for every row of ID #23 to be "definite disease", because that ranks higher than "possible".

id   id_row number_of_samples  species_ny   clinical_significa…
18     1         2                  MAC            possible           
18     2         2                  MAC            possible           
20     1         2                  scrofulaceum   possible           
20     2         2                  scrofulaceum   possible           
23     1         2                  MAC            possible           
23     2         2                  MAC            definite disease

A reproducible example:

df <- structure(
  list(
    id = c("18", "18", "20", "20", "23", "23"),
    id_row = c("1", "2", "1", "2", "1", "2"),
    number_of_samples = c("2", "2", "2", "2", "2", "2"),
    species_ny = c("MAC", "MAC", "scrofulaceum", "scrofulaceum", "MAC", "MAC"),
    clinical_significance = c("possible", "possible", "possible", "possible", "possible", "definite disease")
  ),
  row.names = c(NA, -6L), class = c("data.frame")
)

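The idea is to convert clinical_significance into a factor, which is stored as integers rather than characters (i.e. 1 = definite disease, 2 = possible, 3 = colonization). Then, for each id, take the row with the lowest number.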

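library(dplyr)

df_prio <- df |>
  mutate(
    # Level order encodes priority: the first level is the highest class.
    fct_clin_sig = factor(
      clinical_significance,
      levels = c("definite disease", "possible", "colonization")
    )
  ) |>
  group_by(id) |>
  # with_ties = FALSE keeps a single row per id even when several rows
  # share the same class (the default would return all tied rows).
  slice_min(fct_clin_sig, with_ties = FALSE)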
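
To make the integer coding concrete, here is a minimal base R sketch of the same idea on the example values from the question. The names `prio`, `codes`, and `best` are chosen here for illustration only and do not come from the answer.

```r
# Priority order, highest first (same levels as in the answer).
prio <- c("definite disease", "possible", "colonization")

# id and clinical_significance columns of the example data.
id <- c("18", "18", "20", "20", "23", "23")
clinical_significance <- c("possible", "possible", "possible",
                           "possible", "possible", "definite disease")

# A factor stores each value as its integer position in `levels`.
fct <- factor(clinical_significance, levels = prio)
codes <- as.integer(fct)
print(codes)  # codes are 2 2 2 2 2 1

# The smallest code within an id is that id's highest-priority class.
best_code <- tapply(codes, id, min)
best <- setNames(prio[best_code], names(best_code))
print(best)
```

Here `best` is a named character vector with one entry per id, so id 23 maps to "definite disease" while ids 18 and 20 stay at "possible".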

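I fixed it with:

library(dplyr)

df <- df %>%
  group_by(id) %>%
  mutate(
    # If any row of this id is "definite disease", use that for every row
    # of the id; otherwise keep each row's own value.
    clinical_significance_new =
      if (any(clinical_significance == "definite disease")) {
        "definite disease"
      } else {
        as.character(clinical_significance)
      }
  )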
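
The fix above only promotes "definite disease". A possible generalization that also lets "possible" outrank "colonization" is sketched below, assuming dplyr is installed. The vector `prio` and the two rows for id "30" are made up here for illustration and are not part of the original data.

```r
library(dplyr)

# The first six rows are the example data from the question; the two rows
# for id "30" are hypothetical, added to show "possible" vs "colonization".
df <- data.frame(
  id = c("18", "18", "20", "20", "23", "23", "30", "30"),
  clinical_significance = c("possible", "possible", "possible", "possible",
                            "possible", "definite disease",
                            "colonization", "possible")
)

# Priority order, highest first.
prio <- c("definite disease", "possible", "colonization")

df_new <- df %>%
  group_by(id) %>%
  mutate(
    # match() gives each value's position in `prio`; the smallest position
    # within the group is the highest-priority class for that id.
    clinical_significance_new = prio[min(match(clinical_significance, prio))]
  ) %>%
  ungroup()

print(df_new)
```

This keeps every row and writes the per-id top class into a new column. To get one row per id instead, the result could be reduced afterwards, for example with `distinct(id, clinical_significance_new)`.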
