r - Remove a group if consecutive values occur at the end of the group id



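I am trying to filter the group ids whose column b has at least two consecutive 'yes' values, but at least one more value should follow that run. I want to drop a group id if its consecutive b == 'yes' values occur at the end of b for that id. Each group id must also start with 'yes'. For example, in id 2 the first row must be dropped because its b value starts with 'no' rather than 'yes' before the first two consecutive 'yes'.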

data<-data.frame(id=c(1,1,1, 1,2,2,2,3,3,3, 3,4,4,4, 5,5,5,5), a=c(1,1,1,1,1,2,1,1,2,2,1,1,1,2,1,1,1,2),
b=c("yes", "yes","no","no","no", "yes", "yes","no","yes","yes","no", "yes","yes","yes","yes","no","yes","yes" ))

Current data:
id a   b
1 1 yes
1 1 yes
1 1  no
1 1  no
2 1  no
2 2 yes
2 1 yes
3 1  no
3 2 yes
3 2 yes
3 1  no
4 1 yes
4 1 yes
4 2 yes
5 1 yes
5 1  no
5 1 yes
5 2 yes

The output should be:

id a   b
1 1 yes
1 1 yes
1 1  no
1 1  no
3 2 yes
3 2 yes
3 1  no
4 1 yes
4 1 yes
4 2 yes

My current attempt:

library(dplyr)

data1 <- data %>%
  group_by(id) %>%
  filter(any(with(rle(b == 'yes'), lengths[values] > 1))) %>%
  ungroup()

But I can't get the desired output. Can anyone please help?
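To see why the attempt keeps everything, the same rle() condition can be evaluated once per id. This is a base R sketch; the variable name `has_yes_run` is illustrative and not part of the original code.

```r
data <- data.frame(
  id = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5),
  a  = c(1, 1, 1, 1, 1, 2, 1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 2),
  b  = c("yes", "yes", "no", "no", "no", "yes", "yes", "no", "yes",
         "yes", "no", "yes", "yes", "yes", "yes", "no", "yes", "yes"),
  stringsAsFactors = FALSE
)

# The attempt's condition, evaluated per id:
# "does this group contain at least one run of two or more 'yes'?"
has_yes_run <- tapply(data$b, data$id, function(x) {
  r <- rle(x == "yes")
  any(r$lengths[r$values] > 1)
})
has_yes_run
```

Every id, including id 5 (yes, no, yes, yes), contains a run of at least two 'yes', so filter() keeps all groups. The condition also ignores where the run sits and never removes individual leading 'no' rows, which is why the result is unchanged.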

I would suggest the following:

library(dplyr)
data<-data.frame(id=c(1,1,1, 1,2,2,2,3,3,3, 3,4,4,4), a=c(1,1,1,1,1,2,1,1,2,2,1,1,1,2),
b=c("yes", "yes","no","no","no", "yes", "yes","no","yes","yes","no", "yes","yes","yes"))

data %>%
  group_by(id) %>%
  # create indicators for two consecutive 'yes'
  mutate(prev_b = lag(b, 1),
         two_yes = b == 'yes' & prev_b == 'yes') %>%
  # create indicators for the leading run of 'no' rows
  mutate(ones = 1,
         position = cumsum(ones),
         prev_no = cumsum(ifelse(b == 'no', 1, 0)),
         leading_no = position == prev_no) %>%
  # create indicator for the final record
  mutate(next_b = lead(b, 1),
         last_record = is.na(next_b)) %>%
  # combine indicators at group level
  mutate(group_end_two_yes = any(two_yes & last_record),
         group_leading_no = any(leading_no)) %>%
  # flag groups that start with 'no' and end with two 'yes'
  mutate(drop_group = group_end_two_yes & group_leading_no) %>%
  # drop flagged groups and the leading 'no' rows of the remaining groups
  filter(!drop_group,
         !leading_no) %>%
  # select initial columns
  select(id, a, b)
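The same rule (drop a group that starts with 'no' and ends in two or more 'yes'; otherwise drop only its leading 'no' rows) can also be expressed with rle(), closer to the question's original attempt. This is a minimal base R sketch, assuming rows are already in order within each id; the helper `filter_group` is a hypothetical name.

```r
data <- data.frame(
  id = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4),
  a  = c(1, 1, 1, 1, 1, 2, 1, 1, 2, 2, 1, 1, 1, 2),
  b  = c("yes", "yes", "no", "no", "no", "yes", "yes",
         "no", "yes", "yes", "no", "yes", "yes", "yes"),
  stringsAsFactors = FALSE
)

# Apply the rule to the rows of a single id
filter_group <- function(g) {
  runs <- rle(g$b)
  n_runs <- length(runs$values)
  starts_with_no <- runs$values[1] == "no"
  ends_with_two_yes <- runs$values[n_runs] == "yes" && runs$lengths[n_runs] >= 2
  # Drop the whole group if it starts with 'no' and ends in two or more 'yes'
  if (starts_with_no && ends_with_two_yes) {
    return(g[0, ])
  }
  # Otherwise drop only the leading run of 'no' rows (if any)
  if (starts_with_no) g[-seq_len(runs$lengths[1]), ] else g
}

result <- do.call(rbind, lapply(split(data, data$id), filter_group))
rownames(result) <- NULL
result
```

Note that id 5 from the question's data (yes, no, yes, yes) starts with 'yes', so this rule, like the dplyr version, would keep it, whereas the expected output omits id 5; reproducing that would need an additional condition.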
