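I have a data frame that contains duplicate rows based on the id column, but some of the other columns differ between the duplicates. I want to keep the row of each duplicate set that carries the extra information. The structure of df is as follows: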
id <- c("3235453", "3235453", "21354315", "21354315", "2121421")
Plan_name <- c("angers", "strasbourg", "Benzema", "angers", "montpellier")
service_line <- c("", "AMRS", "", "Therapy", "")
treatment <- c("", "MH", "", "MH", "")
df <- data.frame(id, Plan_name, treatment, service_line)
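As you can see, the id column has duplicates, but I want to keep the second row of each duplicate pair because it has more information in treatment
and service_line
.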
I tried using
df[duplicated(df[,c(1,3)]),]
, but it does not work because it returns an empty df. Any suggestions?
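For context, here is a small sketch of why the attempt above comes back empty. duplicated() only flags a row when the selected columns repeat an earlier row exactly, and no (id, treatment) pair repeats in this data. The names dup_pairs and dup_ids are illustrative only. Flagging on id alone does pick the second row of each pair, but it relies on the fuller row coming second and it drops ids that appear only once.

```r
# Example data from the question (stringsAsFactors set explicitly for older R versions)
df <- data.frame(
  id = c("3235453", "3235453", "21354315", "21354315", "2121421"),
  Plan_name = c("angers", "strasbourg", "Benzema", "angers", "montpellier"),
  treatment = c("", "MH", "", "MH", ""),
  service_line = c("", "AMRS", "", "Therapy", ""),
  stringsAsFactors = FALSE
)

# Columns 1 and 3 are id and treatment; every (id, treatment) pair is unique,
# so nothing is flagged and the subset is empty
dup_pairs <- duplicated(df[, c(1, 3)])
df[dup_pairs, ]

# On id alone, the second occurrence of each repeated id is flagged
dup_ids <- duplicated(df$id)
df[dup_ids, ]
```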
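Maybe you want something like this: first replace all blanks with NA, then arrange by treatment
and slice()
, taking the first row of each group (arrange() sorts NA last, so the filled row comes first):
library(dplyr)
df %>%
mutate(across(-c(id, Plan_name),~ifelse(.=="", NA, .))) %>%
group_by(id) %>%
arrange(treatment, .by_group = TRUE) %>%
slice(1)
id Plan_name treatment service_line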
<chr> <chr> <chr> <chr>
1 2121421 montpellier NA NA
2 21354315 angers MH Therapy
3 3235453 strasbourg MH AMRS
Try with
library(dplyr)
df %>%
filter(if_all(treatment:service_line, ~ .x != ""))
Output:
id Plan_name treatment service_line
1 3235453 strasbourg MH AMRS
2 21354315 angers MH Therapy
If we also need to keep the ids that have blanks but are not duplicated:
df %>%
group_by(id) %>%
filter(n() == 1|if_all(treatment:service_line, ~ .x != "")) %>%
ungroup
Output:
# A tibble: 3 × 4
id Plan_name treatment service_line
<chr> <chr> <chr> <chr>
1 3235453 strasbourg "MH" "AMRS"
2 21354315 angers "MH" "Therapy"
3 2121421 montpellier "" ""
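One caveat with the filter above: if every row of a duplicated id had at least one blank, that id would vanish, and if two rows of an id were both complete, both would be kept. A possible alternative, offered only as a sketch, is to count the non-empty fields per row and keep the single best row per id with slice_max(). The helper column n_filled and the name result are illustrative, not part of the original answers.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  id = c("3235453", "3235453", "21354315", "21354315", "2121421"),
  Plan_name = c("angers", "strasbourg", "Benzema", "angers", "montpellier"),
  treatment = c("", "MH", "", "MH", ""),
  service_line = c("", "AMRS", "", "Therapy", ""),
  stringsAsFactors = FALSE
)

result <- df %>%
  # Count how many of the information columns are non-empty in each row
  mutate(n_filled = rowSums(across(treatment:service_line, ~ .x != ""))) %>%
  group_by(id) %>%
  # Keep exactly one row per id: the one with the most filled fields
  slice_max(n_filled, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(-n_filled)

result
```

With with_ties = FALSE, ties are broken by row order, so exactly one row per id survives even when no row is complete.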