R data frame: remove duplicates / choose which duplicate to remove



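I have a data frame with duplicate rows by identifying ID, but some columns differ between the duplicates. I want to keep the row (or duplicate) that carries the extra bits of information. The structure of df is as follows: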

id <- c("3235453", "3235453", "21354315", "21354315", "2121421")
Plan_name<- c("angers", "strasbourg",  "Benzema", "angers", "montpellier")
service_line<- c("", "AMRS", "", "Therapy", "")
treatment<-c("", "MH", "", "MH", "")
df <- data.frame (id, Plan_name, treatment, service_line)

As you can see, some IDs appear more than once, but I want to keep the second duplicate, because it has more information in `treatment` and `service_line`.

I tried using

df[duplicated(df[,c(1,3)]),]

but it doesn't work, because it returns an empty df. Any suggestions?
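To see why that attempt returns nothing: `df[, c(1, 3)]` selects `id` and `treatment`, and no (`id`, `treatment`) pair occurs twice, so `duplicated()` is `FALSE` on every row and the subset is empty. The base R snippet below rebuilds the example data and only illustrates this; it is not a fix. Note also that `duplicated()` flags every occurrence after the first, so `df[!duplicated(df$id), ]` on its own would keep the first, less informative rows.

```r
# Rebuild the example data from the question
id <- c("3235453", "3235453", "21354315", "21354315", "2121421")
Plan_name <- c("angers", "strasbourg", "Benzema", "angers", "montpellier")
service_line <- c("", "AMRS", "", "Therapy", "")
treatment <- c("", "MH", "", "MH", "")
df <- data.frame(id, Plan_name, treatment, service_line)

# Columns 1 and 3 are id and treatment; no (id, treatment) pair repeats,
# so duplicated() returns FALSE for every row.
flags <- duplicated(df[, c(1, 3)])
print(flags)

# Subsetting with an all-FALSE logical vector yields a zero-row data frame.
print(nrow(df[flags, ]))

# Duplicates on id alone are detected, but only the later occurrences are flagged.
print(duplicated(df$id))
```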

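Perhaps you want something like this: first replace all blanks with NA, then sort each `id` group by `treatment` (NA values sort last) and use slice() to take the first row of each group:

library(dplyr)
df %>%
  # turn empty strings into NA in every column except id and Plan_name
  mutate(across(-c(id, Plan_name), ~ ifelse(. == "", NA, .))) %>%
  group_by(id) %>%
  # within each id, rows with a non-missing treatment come first
  arrange(treatment, .by_group = TRUE) %>%
  # keep the first row of each id group
  slice(1)

  id       Plan_name   treatment service_line
  <chr>    <chr>       <chr>     <chr>       
1 2121421  montpellier NA        NA          
2 21354315 angers      MH        Therapy     
3 3235453  strasbourg  MH        AMRS        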

Try with

library(dplyr)
df %>%
filter(if_all(treatment:service_line, ~ .x != ""))

with output

        id  Plan_name treatment service_line
1  3235453 strasbourg        MH         AMRS
2 21354315     angers        MH      Therapy

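If we also need to keep ids that are not duplicated, even when their fields are blank:

df %>% 
  group_by(id) %>%
  # keep singleton ids, or rows where both fields are non-empty
  filter(n() == 1 | if_all(treatment:service_line, ~ .x != "")) %>%
  ungroup()

with output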

# A tibble: 3 × 4
id       Plan_name   treatment service_line
<chr>    <chr>       <chr>     <chr>       
1 3235453  strasbourg  "MH"      "AMRS"      
2 21354315 angers      "MH"      "Therapy"   
3 2121421  montpellier ""        ""          
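A base R sketch of the same idea without dplyr, assuming that "more information" means more non-blank values in `treatment` and `service_line`: count the non-blank fields per row, sort so the richest row of each `id` comes first, then drop the later duplicates with `duplicated()`. Because `order()` keeps ties in their original order, ids with no duplicate (such as `2121421`) are kept even when their fields are blank. Unlike the `if_all()` filter, this also keeps one row for an id whose duplicates are all partially filled.

```r
# Rebuild the example data from the question
id <- c("3235453", "3235453", "21354315", "21354315", "2121421")
Plan_name <- c("angers", "strasbourg", "Benzema", "angers", "montpellier")
service_line <- c("", "AMRS", "", "Therapy", "")
treatment <- c("", "MH", "", "MH", "")
df <- data.frame(id, Plan_name, treatment, service_line)

# Number of non-blank informational fields per row
# (assumption: this count is what "more information" means)
info <- rowSums(df[, c("treatment", "service_line")] != "")

# Most informative rows first; ties keep their original order
ord <- df[order(-info), ]

# Keep the first (i.e. most informative) row for each id
res <- ord[!duplicated(ord$id), ]
print(res)
```

The result keeps original rows 2, 4 and 5 (their row names are preserved).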
