R - Remove the non-last rows per group that meet a specific condition



I have the following data frame df (dput below):

group indicator value
1     A     FALSE     2
2     A     FALSE     1
3     A     FALSE     2
4     A      TRUE     4
5     B     FALSE     5
6     B     FALSE     1
7     B      TRUE     3

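Within each group, I want to remove every row with indicator == FALSE except the last such row. In df this means rows 1, 2 and 5 should be removed, because they are not the last FALSE row in their group. Here is the desired output: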

group indicator value
1     A     FALSE     2
2     A      TRUE     4
3     B     FALSE     1
4     B      TRUE     3

So I was wondering if anyone knows how to remove the non-last rows that meet a specific condition in R?


dput of df:

df <- structure(list(group = c("A", "A", "A", "A", "B", "B", "B"), 
indicator = c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE
), value = c(2, 1, 2, 4, 5, 1, 3)), class = "data.frame", row.names = c(NA, 
-7L))

Filter with last(which()) to find the row number of the last FALSE row in each group:

library(dplyr)
df %>%
group_by(group) %>%
filter(indicator | row_number() == last(which(!indicator))) %>%
ungroup()
# A tibble: 4 × 3
group indicator value
<chr> <lgl>     <dbl>
1 A     FALSE         2
2 A     TRUE          4
3 B     FALSE         1
4 B     TRUE          3
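
To see why the filter condition works, it can help to evaluate its pieces by hand for a single group. The following is a minimal sketch using the indicator values of group A from the example; the names false_positions, last_false and keep are only illustrative, and seq_along() stands in for row_number() outside of a dplyr verb.

```r
library(dplyr)

# indicator values for group A in the example data
indicator <- c(FALSE, FALSE, FALSE, TRUE)

# Positions of the FALSE rows within the group: 1, 2, 3
false_positions <- which(!indicator)

# Position of the last FALSE row: 3
last_false <- last(false_positions)

# Keep a row if it is TRUE, or if it is the last FALSE row
keep <- indicator | seq_along(indicator) == last_false
keep  # FALSE FALSE TRUE TRUE
```

Inside filter(), the same expression is evaluated once per group, so the positions are relative to each group rather than to the whole data frame.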

You can do this with lead, checking whether the indicator of the following row is TRUE:

library(tidyverse)
df <- structure(list(group = c("A", "A", "A", "A", "B", "B", "B"), 
indicator = c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE
), value = c(2, 1, 2, 4, 5, 1, 3)), class = "data.frame", row.names = c(NA, 
                       -7L))
df |> 
group_by(group) |> 
mutate(slicer = if_else(lead(indicator) == FALSE, 1, 0)) |> 
mutate(slicer = if_else(is.na(slicer), 0, slicer)) |> 
filter(slicer == 0) |> 
select(-slicer)
#> # A tibble: 4 × 3
#> # Groups:   group [2]
#>   group indicator value
#>   <chr> <lgl>     <dbl>
#> 1 A     FALSE         2
#> 2 A     TRUE          4
#> 3 B     FALSE         1
#> 4 B     TRUE          3
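
The same idea can probably be written more compactly by giving lead() a default value for the last row of each group, which removes the need for the helper column. This is only a sketch under the assumptions that indicator contains no NA values and that TRUE rows come last in each group, as in the example; the name result is illustrative.

```r
library(dplyr)

df <- data.frame(
  group     = c("A", "A", "A", "A", "B", "B", "B"),
  indicator = c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE),
  value     = c(2, 1, 2, 4, 5, 1, 3)
)

# Keep a row when the next row in its group is TRUE.
# default = TRUE makes the last row of each group count as
# "followed by TRUE", so it is always kept.
result <- df |>
  group_by(group) |>
  filter(lead(indicator, default = TRUE)) |>
  ungroup()

result$value  # 2 4 1 3
```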

Another approach:

library(dplyr)
df %>%
group_by(group) %>%
slice_max(cumsum(!indicator))

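Note: although this approach covers the example shown and the OP's clarification that TRUE always comes last, it will not work for sequences such as T, F, F, T, where you would want to keep every T and not only the ones that come after the last F.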

Output:

# A tibble: 4 x 3
# Groups:   group [2]
group indicator value
<chr> <lgl>     <dbl>
1 A     FALSE         2
2 A     TRUE          4
3 B     FALSE         1
4 B     TRUE          3
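
The caveat about sequences such as T, F, F, T can be checked on a small made-up example. In the sketch below, df2 is hypothetical data that is not part of the question; it compares the slice_max() approach with a filter on last(which()).

```r
library(dplyr)

# Hypothetical single-group data where a TRUE row precedes the FALSE rows
df2 <- data.frame(
  group     = c("A", "A", "A", "A"),
  indicator = c(TRUE, FALSE, FALSE, TRUE),
  value     = c(1, 2, 3, 4)
)

# cumsum(!indicator) is 0, 1, 2, 2, so only rows 3 and 4 reach the maximum
# and the leading TRUE row is dropped
by_slice <- df2 %>%
  group_by(group) %>%
  slice_max(cumsum(!indicator)) %>%
  ungroup()

# The filter-based condition keeps every TRUE row plus the last FALSE row
by_filter <- df2 %>%
  group_by(group) %>%
  filter(indicator | row_number() == last(which(!indicator))) %>%
  ungroup()

sort(by_slice$value)   # 3 4
sort(by_filter$value)  # 1 3 4
```

If TRUE rows can appear before FALSE rows in your data, the filter-based condition is likely the safer choice.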

Here are a few alternatives:

The "dumb" solution

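# Loop over the rows: keep TRUE rows and the last FALSE row of each group.
# Column names are lowercase to match df (group, indicator).
should_be_kept <- logical(nrow(df))
for (row in seq_len(nrow(df))) {
  if (df[row, "indicator"]) {
    should_be_kept[row] <- TRUE
  } else if (row == max(which(!df$indicator & df$group == df[row, "group"]))) {
    should_be_kept[row] <- TRUE
  } else {
    should_be_kept[row] <- FALSE
  }
}
df[should_be_kept, ]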

A solution that uses a custom function to find the last FALSE indicator in each group

rows_to_keep <- logical(nrow(df)) # We create a TRUE/FALSE vector with one entry for each row of df
rows_to_keep[df$indicator] <- TRUE # If indicator is TRUE, we mark that row as "selectable"
get_last_false_in_group <- function(df, group) {
  # Index of the last row for which the condition inside which() is met
  max(which(df$group == group & !df$indicator))
}
# The following chunk does a group-by-group search for the last FALSE indicator. There's probably some apply magic that simplifies this but I'm too dumb to come up with it.
groups <- levels(factor(df$group))
for (current_group in groups) {
  rows_to_keep[get_last_false_in_group(df, current_group)] <- TRUE
}
# Now that our rows_to_keep vector is ready, we can filter the corresponding rows and get the intended result:
df[rows_to_keep, ]

With the data.table package, the calls to max(which(…)) can be replaced with calls to the last function.
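
As a rough sketch of that substitution, the helper from the previous solution could be rewritten as follows. It assumes the data.table package is installed and that every group contains at least one FALSE row, as in the example; only the helper changes, and the rest of the loop stays the same.

```r
library(data.table)

df <- data.frame(
  group     = c("A", "A", "A", "A", "B", "B", "B"),
  indicator = c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE),
  value     = c(2, 1, 2, 4, 5, 1, 3)
)

# Same helper as above, with data.table::last() in place of max()
get_last_false_in_group <- function(df, group) {
  data.table::last(which(df$group == group & !df$indicator))
}

# Start from the TRUE rows, then add the last FALSE row of each group
rows_to_keep <- df$indicator
for (current_group in unique(df$group)) {
  rows_to_keep[get_last_false_in_group(df, current_group)] <- TRUE
}

df[rows_to_keep, ]  # rows 3, 4, 6 and 7 of the original df
```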
