R - Filter duplicated data using conditions on two other columns



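I want to keep only the groups (by id) that have consecutive, identical Unsuppressed rows in type 1 before switching to type 2. My data looks like this: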

data <- data.frame(
  id = c(215, 215, 215, 215, 297, 297, 297, 297, 297, 297, 317, 317, 317, 382, 382, 382, 459, 459, 459),
  type = c(1, 1, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 2, 1, 1, 2, 1, 2, 2),
  status = c("Unsuppressed", "Unsuppressed", "Unsuppressed", "Unsuppressed", "Unsuppressed", "Suppressed", "Unsuppressed", "Unsuppressed", "Suppressed", "Suppressed", "Unsuppressed", "Unsuppressed", "Unsuppressed", "Unsuppressed", "Unsuppressed", "Unsuppressed", "Unsuppressed", "Unsuppressed", "Suppressed")
)
data
    id type       status
1  215    1 Unsuppressed
2  215    1 Unsuppressed
3  215    2 Unsuppressed
4  215    2 Unsuppressed
5  297    1 Unsuppressed
6  297    1   Suppressed
7  297    1 Unsuppressed
8  297    2 Unsuppressed
9  297    2   Suppressed
10 297    2   Suppressed
11 317    1 Unsuppressed
12 317    1 Unsuppressed
13 317    2 Unsuppressed
14 382    1 Unsuppressed
15 382    1 Unsuppressed
16 382    2 Unsuppressed
17 459    1 Unsuppressed
18 459    2 Unsuppressed
19 459    2   Suppressed

I am trying

library(tidyverse)
library(data.table)
data1 <- data %>%
group_by(id) %>%
mutate(Seq = map(type, ~seq.int(.x, .x + 1L))) %>%
mutate(Flag = map_lgl(Seq, ~all(.x %in% type))) %>%
filter(Flag) %>%
select(-Seq, -Flag) %>%
ungroup()
data1  
1   215     1 Unsuppressed
2   215     1 Unsuppressed
3   297     1 Unsuppressed
4   297     1 Suppressed
5   297     1 Unsuppressed
6   317     1 Unsuppressed
7   317     1 Unsuppressed
8   382     1 Unsuppressed
9   382     1 Unsuppressed
10   459     1 Unsuppressed

But the expected output is

id      type     status
215     1       Unsuppressed
215     1       Unsuppressed
215     2       Unsuppressed
317     1       Unsuppressed
317     1       Unsuppressed
317     2       Unsuppressed
382     1       Unsuppressed
382     1       Unsuppressed
382     2       Unsuppressed

You can use group_by + filter:

library(dplyr)
data %>%
group_by(id) %>%
filter(all(status == 'Unsuppressed') & sum(type == 1) > 1 & 
(type == 1 | row_number() == match(2, type))) %>%
ungroup
#    id  type status      
#  <dbl> <dbl> <chr>       
#1   215     1 Unsuppressed
#2   215     1 Unsuppressed
#3   215     2 Unsuppressed
#4   317     1 Unsuppressed
#5   317     1 Unsuppressed
#6   317     2 Unsuppressed
#7   382     1 Unsuppressed
#8   382     1 Unsuppressed
#9   382     2 Unsuppressed

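This keeps only the groups in which every row has the value 'Unsuppressed' and which have more than one row with type = 1. Within those groups, it selects all rows with type = 1 and the first row with type = 2.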
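
To see why each id is kept or dropped, the group-level conditions inside the filter can be computed one at a time. This is only a sketch on the question's sample data; the column names all_unsuppressed, n_type1, first_type2_row and kept are made up here for illustration and are not part of the original answer.

```r
library(dplyr)

# Sample data from the question
data <- data.frame(
  id = c(215, 215, 215, 215, 297, 297, 297, 297, 297, 297,
         317, 317, 317, 382, 382, 382, 459, 459, 459),
  type = c(1, 1, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 2, 1, 1, 2, 1, 2, 2),
  status = c("Unsuppressed", "Unsuppressed", "Unsuppressed", "Unsuppressed",
             "Unsuppressed", "Suppressed", "Unsuppressed", "Unsuppressed",
             "Suppressed", "Suppressed", "Unsuppressed", "Unsuppressed",
             "Unsuppressed", "Unsuppressed", "Unsuppressed", "Unsuppressed",
             "Unsuppressed", "Unsuppressed", "Suppressed")
)

# One row per id, showing each group-level condition used in the filter
# (illustrative column names, not from the original answer)
checks <- data %>%
  group_by(id) %>%
  summarise(
    all_unsuppressed = all(status == "Unsuppressed"), # no Suppressed row anywhere in the group
    n_type1 = sum(type == 1),                         # number of type 1 rows
    first_type2_row = match(2, type),                 # position of the first type 2 row in the group
    .groups = "drop"
  ) %>%
  mutate(kept = all_unsuppressed & n_type1 > 1)

print(checks)

# ids that pass both group-level conditions: 215, 317, 382
checks$id[checks$kept]
```

Here 297 and 459 fail the all(status == 'Unsuppressed') check, and 459 also has only one type 1 row, so only 215, 317 and 382 remain. The row-level part of the filter, type == 1 | row_number() == match(2, type), then trims each surviving group down to its type 1 rows plus the first type 2 row.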
