r - dplyr::filter rows using lag/lead to match character values across multiple columns



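I am trying to filter rows from a data frame (df) based on the contents of specific columns (col1, col2, col3) in the next row.

This question is close, but it only uses a single column for the lag.

Most posts that show how to filter with lag/lead use numeric columns; in my case, all the columns are text.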

df <- tibble::tribble(
~col1,  ~col2,     ~col3,          ~Effect,
"Jim", "Walk", "optionA",      "col1×col2",
"Jim", "Walk", "optionA", "col1×col2×col2",
"Jim",  "Run", "optionB",           "col1",
"Jim",  "Run", "optionB",      "col1×col2",
"Jim",  "Run", "optionB", "col1×col2×col2",
"Joe", "Walk", "optionA",           "col1",
"Joe", "Walk", "optionA",      "col1×col2",
"Joe",  "Run", "optionB", "col1×col2×col2"
)

I want to drop a row if the next row is identical to it in every column except Effect.

The final data frame should look like this:

df_result <- tibble::tribble(
~col1,  ~col2,     ~col3,          ~Effect,
"Jim", "Walk", "optionA", "col1×col2×col2",
"Jim",  "Run", "optionB", "col1×col2×col2",
"Joe", "Walk", "optionA",      "col1×col2",
"Joe",  "Run", "optionB", "col1×col2×col2"
)

Does anyone have any suggestions? If possible, I would like an answer that uses the tidyverse.

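We can use distinct. Because distinct keeps the first occurrence of each combination, the rows are reversed first so that the last row of each group is the one retained.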

library(dplyr)
df %>%
  slice(rev(row_number())) %>%
  distinct(across(col1:col3), .keep_all = TRUE)

-output

# A tibble: 4 x 4
col1  col2  col3    Effect        
<chr> <chr> <chr>   <chr>         
1 Joe   Run   optionB col1×col2×col2
2 Joe   Walk  optionA col1×col2     
3 Jim   Run   optionB col1×col2×col2
4 Jim   Walk  optionA col1×col2×col2
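Note that the output above comes back in reversed row order. If the original order matters, one option is to reverse the rows a second time after deduplicating. This is a minimal sketch on the question's sample data; the name `result_ordered` is only illustrative.

```r
library(dplyr)

# Sample data from the question
df <- tibble::tribble(
  ~col1,  ~col2,     ~col3,          ~Effect,
  "Jim", "Walk", "optionA",      "col1×col2",
  "Jim", "Walk", "optionA", "col1×col2×col2",
  "Jim",  "Run", "optionB",           "col1",
  "Jim",  "Run", "optionB",      "col1×col2",
  "Jim",  "Run", "optionB", "col1×col2×col2",
  "Joe", "Walk", "optionA",           "col1",
  "Joe", "Walk", "optionA",      "col1×col2",
  "Joe",  "Run", "optionB", "col1×col2×col2"
)

result_ordered <- df %>%
  slice(rev(row_number())) %>%                       # reverse so the last row of each combination comes first
  distinct(across(col1:col3), .keep_all = TRUE) %>%  # keep the first occurrence, i.e. the original last one
  slice(rev(row_number()))                           # reverse again to restore the original ordering

result_ordered
```

On this data the result has the same rows, in the same order, as the df_result shown in the question.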

Or use nchar to keep, within each group, the row whose Effect string is longest

df %>%
  group_by(across(col1:col3)) %>%
  slice(which.max(nchar(Effect))) %>%
  ungroup()

A tidyverse solution could be

library(dplyr)
df %>%
  group_by(across(-Effect)) %>%
  slice_tail(n = 1) %>%
  ungroup()

This returns

# A tibble: 4 x 4
col1  col2  col3    Effect        
<chr> <chr> <chr>   <chr>         
1 Jim   Run   optionB col1×col2×col2
2 Jim   Walk  optionA col1×col2×col2
3 Joe   Run   optionB col1×col2×col2
4 Joe   Walk  optionA col1×col2 

You can try duplicated with the fromLast = TRUE option, as follows

df[!duplicated(df[-4], fromLast = TRUE), ]
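For completeness, here is a sketch of the lead()-based approach the question asks about. It keeps a row when it is the last row, or when at least one of col1:col3 differs from the same column in the next row. It assumes the key columns contain no missing values and a dplyr version that provides if_any() (1.0.4 or later); the name `result_lead` is only illustrative. Unlike the distinct, group_by and duplicated answers, this only compares adjacent rows, so a combination that reappears later in non-consecutive rows would be kept each time. On the sample data the two behaviours coincide.

```r
library(dplyr)

# Sample data from the question
df <- tibble::tribble(
  ~col1,  ~col2,     ~col3,          ~Effect,
  "Jim", "Walk", "optionA",      "col1×col2",
  "Jim", "Walk", "optionA", "col1×col2×col2",
  "Jim",  "Run", "optionB",           "col1",
  "Jim",  "Run", "optionB",      "col1×col2",
  "Jim",  "Run", "optionB", "col1×col2×col2",
  "Joe", "Walk", "optionA",           "col1",
  "Joe", "Walk", "optionA",      "col1×col2",
  "Joe",  "Run", "optionB", "col1×col2×col2"
)

result_lead <- df %>%
  filter(
    row_number() == n() |                     # always keep the final row (it has no next row)
      if_any(col1:col3, ~ .x != lead(.x))     # keep if any key column changes in the next row
  )

result_lead
```

Because the comparison is `!=` on character vectors, it works for text columns just as it would for numeric ones.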
