I am working with the R programming language.
I have the following dataset:
id = c(1,1,1,1,2,2,2,2,2,3,3,3,3,3,3,3)
col1 = c(0,0,1,1,0,0,1,0,0,1,1,0,1,0,1,0)
col2 = c("A", "B", "A","A", "B", "A","A", "B", "A","A", "B", "A","A", "B", "A", "B")
my_data = data.frame(id, col1, col2)
my_data$row_num = 1:nrow(my_data)
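For each unique id, whenever col1 = 1 or col2 = "A", I want to delete all remaining rows that occur after this condition (i.e. keep the rows up to and including the first occurrence).
I found this question here (how to filter out rows per group after a condition occurs), which has an answer to a similar problem. I tried to adapt that answer to my problem: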
library(dplyr)
my_data %>%
group_by(id) %>%
slice(seq_len(which((col1 == 1) | (col2 == "A"))[1]))
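Can someone confirm whether I did this correctly? I am not sure whether I inserted the OR statement correctly in the slice() function.
Thanks!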
You can filter on row_number(), using which() to find the first row in each group where the condition occurs:
id = c(1,1,1,1,2,2,2,2,2,3,3,3,3,3,3,3)
col1 = c(0,0,1,1,0,0,1,0,0,1,1,0,1,0,1,0)
col2 = c("A", "B", "A","A", "B", "A","A", "B", "A","A", "B", "A","A", "B", "A", "B")
my_data = data.frame(id, col1, col2)
my_data$row_num = 1:nrow(my_data)
library(dplyr)
my_data %>%
group_by(id) %>%
filter(row_number() <= min(which((col1 == 1) | (col2 == "A"))))
#> # A tibble: 4 × 4
#> # Groups: id [3]
#> id col1 col2 row_num
#> <dbl> <dbl> <chr> <int>
#> 1 1 0 A 1
#> 2 2 0 B 5
#> 3 2 0 A 6
#> 4 3 1 A 10
Created on 2023-01-20 with reprex v2.0.2
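One caveat with the answer above: if a group contains no row where the condition holds, min(which(...)) is called on an empty vector, which returns Inf with a warning. A possible variant that avoids this is sketched below. It counts, within each group, how many earlier rows already met the condition and keeps rows while that count is zero. The extra group with id 4 and the names extra, with_extra and result are made up for illustration and are not part of the original data.

```r
library(dplyr)

id   <- c(1,1,1,1,2,2,2,2,2,3,3,3,3,3,3,3)
col1 <- c(0,0,1,1,0,0,1,0,0,1,1,0,1,0,1,0)
col2 <- c("A","B","A","A","B","A","A","B","A","A","B","A","A","B","A","B")
my_data <- data.frame(id, col1, col2)
my_data$row_num <- 1:nrow(my_data)

# Hypothetical extra group (id 4) in which the condition never occurs
extra <- data.frame(id = 4, col1 = c(0, 0), col2 = c("B", "B"), row_num = 17:18)
with_extra <- rbind(my_data, extra)

result <- with_extra %>%
  group_by(id) %>%
  # lag() shifts the condition down one row, so cumsum() counts how many
  # earlier rows in the group met it; keep rows while that count is zero
  filter(cumsum(lag(col1 == 1 | col2 == "A", default = FALSE)) == 0) %>%
  ungroup()

print(result)
```

With this data, groups 1 to 3 keep the same rows as in the output above, and the made-up group 4 is kept in full because the condition never occurs in it.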
Not sure whether there is some tidyverse magic that does the job, but here is a "dumb" approach: split the dataset by id and loop over each part:
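# Empty data frame with the same column names as my_data
filtered_data <- data.frame(matrix(NA, nrow=0, ncol=4))
colnames(filtered_data) <- colnames(my_data)
rows_added <- 0
for(id in unique(my_data$id)) {
  relevant_data <- my_data[my_data$id == id,]
  # Loop over the rows of this group only (not over all of my_data)
  for(row in seq_len(nrow(relevant_data))) {
    rows_added <- rows_added + 1
    filtered_data[rows_added,] <- relevant_data[row,]
    # Stop copying rows for this id once the condition has occurred
    jump_condition <- relevant_data[row, "col1"] == 1 | relevant_data[row, "col2"] == "A"
    if(jump_condition) {
      break
    }
  }
}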
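For comparison, the same rule can probably be expressed in base R without an explicit loop. This is only a sketch: cond, prior and filtered are names introduced here for illustration, and ave() is used to run a per-group running count of earlier rows that met the condition.

```r
id   <- c(1,1,1,1,2,2,2,2,2,3,3,3,3,3,3,3)
col1 <- c(0,0,1,1,0,0,1,0,0,1,1,0,1,0,1,0)
col2 <- c("A","B","A","A","B","A","A","B","A","A","B","A","A","B","A","B")
my_data <- data.frame(id, col1, col2)
my_data$row_num <- 1:nrow(my_data)

# TRUE on every row where the stop condition holds
cond <- my_data$col1 == 1 | my_data$col2 == "A"

# Per id: number of earlier rows in the group that met the condition
prior <- ave(as.integer(cond), my_data$id,
             FUN = function(x) cumsum(c(0L, head(x, -1))))

# Keep rows up to and including the first occurrence in each group
filtered <- my_data[prior == 0, ]
print(filtered)
```

A group in which the condition never occurs has prior equal to 0 on every row, so all of its rows are kept.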