I am trying to filter rows from a data frame (df) based on the contents of specific columns (col1, col2 and col3) in the following row.
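This question is close, but it only uses a single column for the lag.
Most posts that show how to filter with lag/lead use numeric columns; in my case the columns are all text.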
df <- tibble::tribble(
  ~col1, ~col2,  ~col3,     ~Effect,
  "Jim", "Walk", "optionA", "col1×col2",
  "Jim", "Walk", "optionA", "col1×col2×col2",
  "Jim", "Run",  "optionB", "col1",
  "Jim", "Run",  "optionB", "col1×col2",
  "Jim", "Run",  "optionB", "col1×col2×col2",
  "Joe", "Walk", "optionA", "col1",
  "Joe", "Walk", "optionA", "col1×col2",
  "Joe", "Run",  "optionB", "col1×col2×col2"
)
I want to drop a row if the next row is identical to it in every column except Effect.
The final data frame should look like this:
df_result <- tibble::tribble(
  ~col1, ~col2,  ~col3,     ~Effect,
  "Jim", "Walk", "optionA", "col1×col2×col2",
  "Jim", "Run",  "optionB", "col1×col2×col2",
  "Joe", "Walk", "optionA", "col1×col2",
  "Joe", "Run",  "optionB", "col1×col2×col2"
)
Does anyone have any suggestions? If possible, I would like an answer that uses the tidyverse.
We can use distinct
library(dplyr)
df %>%
  slice(rev(row_number())) %>%
  distinct(across(col1:col3), .keep_all = TRUE)
-output
# A tibble: 4 x 4
  col1  col2  col3    Effect
  <chr> <chr> <chr>   <chr>
1 Joe   Run   optionB col1×col2×col2
2 Joe   Walk  optionA col1×col2
3 Jim   Run   optionB col1×col2×col2
4 Jim   Walk  optionA col1×col2×col2
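Note that this result comes back in reverse row order, because distinct() keeps the first occurrence of each combination and the rows were reversed beforehand. If the original order matters, one possible sketch is to reverse once more at the end. This is not part of the answer above: it assumes the df from the question, spells out the three key columns by name, and res_ordered is just an illustrative name.

```r
library(dplyr)

# Sample data copied from the question
df <- tibble::tribble(
  ~col1, ~col2,  ~col3,     ~Effect,
  "Jim", "Walk", "optionA", "col1×col2",
  "Jim", "Walk", "optionA", "col1×col2×col2",
  "Jim", "Run",  "optionB", "col1",
  "Jim", "Run",  "optionB", "col1×col2",
  "Jim", "Run",  "optionB", "col1×col2×col2",
  "Joe", "Walk", "optionA", "col1",
  "Joe", "Walk", "optionA", "col1×col2",
  "Joe", "Run",  "optionB", "col1×col2×col2"
)

res_ordered <- df %>%
  slice(rev(row_number())) %>%                      # reverse, so the last row of each combination comes first
  distinct(col1, col2, col3, .keep_all = TRUE) %>%  # keep the first occurrence of each combination
  slice(rev(row_number()))                          # reverse again to restore the original row order

res_ordered
```

On the sample data this should give the same rows, in the same order, as df_result in the question.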
Or use nchar
df %>%
  group_by(across(col1:col3)) %>%
  slice(which.max(nchar(Effect))) %>%
  ungroup()
A tidyverse solution could be
library(dplyr)
df %>%
  group_by(across(-Effect)) %>%
  slice_tail(n = 1) %>%
  ungroup()
This returns
# A tibble: 4 x 4
  col1  col2  col3    Effect
  <chr> <chr> <chr>   <chr>
1 Jim   Run   optionB col1×col2×col2
2 Jim   Walk  optionA col1×col2×col2
3 Joe   Run   optionB col1×col2×col2
4 Joe   Walk  optionA col1×col2
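The grouped and distinct()-based approaches keep the last row of each col1/col2/col3 combination wherever it occurs in the data, not only among consecutive rows. If the intent is literally to compare each row with the next one, as the question is phrased, a minimal sketch with dplyr::lead() could look like the following. It assumes the df from the question and that the key columns contain no NA values; res_lead is just an illustrative name.

```r
library(dplyr)

# Sample data copied from the question
df <- tibble::tribble(
  ~col1, ~col2,  ~col3,     ~Effect,
  "Jim", "Walk", "optionA", "col1×col2",
  "Jim", "Walk", "optionA", "col1×col2×col2",
  "Jim", "Run",  "optionB", "col1",
  "Jim", "Run",  "optionB", "col1×col2",
  "Jim", "Run",  "optionB", "col1×col2×col2",
  "Joe", "Walk", "optionA", "col1",
  "Joe", "Walk", "optionA", "col1×col2",
  "Joe", "Run",  "optionB", "col1×col2×col2"
)

res_lead <- df %>%
  filter(
    is.na(lead(col1)) |     # the last row has no successor, so keep it
      col1 != lead(col1) |  # otherwise keep the row only if at least one
      col2 != lead(col2) |  # key column differs from the same column
      col3 != lead(col3)    # in the next row
  )

res_lead
```

On the sample data this keeps the same four rows as df_result, in the original order, because every combination occupies one contiguous run of rows. The two ideas would only diverge if a combination reappeared later in the data.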
You can try duplicated with the fromLast = TRUE option, as follows:
df[!duplicated(df[-4], fromLast = TRUE), ]
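df[-4] relies on Effect being the fourth column. A slightly more defensive variant, shown only as a sketch under the assumption that the key columns are named col1, col2 and col3 as in the question, selects them by name instead (key_cols and res_base are names introduced here for illustration):

```r
# Sample data copied from the question
df <- tibble::tribble(
  ~col1, ~col2,  ~col3,     ~Effect,
  "Jim", "Walk", "optionA", "col1×col2",
  "Jim", "Walk", "optionA", "col1×col2×col2",
  "Jim", "Run",  "optionB", "col1",
  "Jim", "Run",  "optionB", "col1×col2",
  "Jim", "Run",  "optionB", "col1×col2×col2",
  "Joe", "Walk", "optionA", "col1",
  "Joe", "Walk", "optionA", "col1×col2",
  "Joe", "Run",  "optionB", "col1×col2×col2"
)

# Columns that define a duplicate (hypothetical helper variable)
key_cols <- c("col1", "col2", "col3")

# fromLast = TRUE marks every row that has a later row with the same keys,
# so negating it keeps only the last row of each combination
res_base <- df[!duplicated(df[key_cols], fromLast = TRUE), ]

res_base
```

Because rows are only dropped, never reordered, the result stays in the original row order.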