I have a dataset similar to the following:
df <- data.frame(animal_1 = c("cat", "dog", "mouse", "squirrel"),
                 predation_type = c("eats", "eats", "eaten by", "eats"),
                 animal_2 = c("mouse", "squirrel", "cat", "nuts"))
> df
  animal_1 predation_type animal_2
1      cat           eats    mouse
2      dog           eats squirrel
3    mouse       eaten by      cat
4 squirrel           eats     nuts
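I'm looking for code that identifies rows 1 and 3 as duplicates, because they describe the same phenomenon (a cat eats a mouse, or a mouse is eaten by a cat). I don't even know how to phrase what kind of duplicate I'm looking for, so I'm hoping someone can help. I tried combining the text into a single column (i.e. "catmouse", "dogsquirrel", etc.) and then reversing the letters, but that quickly proved too complicated.
Any help you can provide would be greatly appreciated.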
tidyverse
df <- data.frame(animal_1 = c("cat", "dog", "mouse", "squirrel"),
                 predation_type = c("eats", "eats", "eaten by", "eats"),
                 animal_2 = c("mouse", "squirrel", "cat", "nuts"))
library(tidyverse)
df %>%
  rowwise() %>%
  mutate(duplicates = str_c(sort(c_across(c(1, 3))), collapse = "")) %>%
  group_by(duplicates) %>%
  mutate(duplicates = n() > 1) %>%
  ungroup()
#> # A tibble: 4 x 4
#>   animal_1 predation_type animal_2 duplicates
#>   <chr>    <chr>          <chr>    <lgl>
#> 1 cat      eats           mouse    TRUE
#> 2 dog      eats           squirrel FALSE
#> 3 mouse    eaten by       cat      TRUE
#> 4 squirrel eats           nuts     FALSE
Created on 2022-01-17 by the reprex package (v2.0.1)
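For larger data, the same flag can probably be computed without rowwise(). The following is only a sketch in base R, not part of the answer above: it assumes the two animal columns are character vectors (hence the explicit stringsAsFactors = FALSE), and pair_key is a hypothetical helper name introduced here for illustration.

```r
# Same toy data as in the question
df <- data.frame(animal_1 = c("cat", "dog", "mouse", "squirrel"),
                 predation_type = c("eats", "eats", "eaten by", "eats"),
                 animal_2 = c("mouse", "squirrel", "cat", "nuts"),
                 stringsAsFactors = FALSE)

# pmin()/pmax() compare element-wise, so each row's pair of names is put
# into a fixed alphabetical order without looping over the rows
pair_key <- data.frame(first  = pmin(df$animal_1, df$animal_2),
                       second = pmax(df$animal_1, df$animal_2))

# Flag a row if its unordered pair occurs in any other row:
# duplicated() marks later copies, fromLast = TRUE marks earlier ones
df$duplicates <- duplicated(pair_key) | duplicated(pair_key, fromLast = TRUE)
df
```

Because the two sorted names stay in separate columns, no pasted string key is needed.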
Removing duplicates
library(tidyverse)
df %>%
  filter(!duplicated(map2(animal_1, animal_2, ~str_c(sort(c(.x, .y)), collapse = ""))))
#>   animal_1 predation_type animal_2
#> 1      cat           eats    mouse
#> 2      dog           eats squirrel
#> 3 squirrel           eats     nuts
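One caveat with keys built using collapse = "": two different pairs can in principle paste to the same string. The sketch below uses made-up names (not from the question's data) and base paste(), which behaves like str_c() for this purpose, to show the collision and one possible way around it. The "|" separator is an assumption and has to be a character that never occurs in the names.

```r
# Two different made-up pairs that paste to the same key without a separator
key_no_sep <- c(paste(sort(c("ab", "c")), collapse = ""),
                paste(sort(c("a", "bc")), collapse = ""))
duplicated(key_no_sep)   # the second pair is wrongly treated as a repeat

# With a separator that does not appear in any name, the keys stay distinct
key_sep <- c(paste(sort(c("ab", "c")), collapse = "|"),
             paste(sort(c("a", "bc")), collapse = "|"))
duplicated(key_sep)
```

With the animal names in the question this does not happen, so the results shown above are unaffected.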
You can sort() each row of the data frame so that duplicated() becomes useful.
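# Keep only the two animal columns
newdf <- df[, c("animal_1", "animal_2")]
for (i in seq_len(nrow(df))) {
  # Sort the two names within the row so (mouse, cat) becomes (cat, mouse)
  newdf[i, ] <- sort(c(df$animal_1[i], df$animal_2[i]))
}
# duplicated() on the data frame compares both columns of a row together
newdf[!duplicated(newdf), ]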
  animal_1 animal_2
1      cat    mouse
2      dog squirrel
4     nuts squirrel
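Note that sorting treats each pair as unordered, so a hypothetical row "mouse eats cat" would also count as a duplicate of "cat eats mouse", even though it describes a different event. If direction matters, one option is to rewrite every row as "predator eats prey" first. This is a sketch only: it assumes predation_type takes just the two values "eats" and "eaten by" and that the columns are character vectors; the names flipped and canon are introduced here for illustration.

```r
# Same toy data as in the question
df <- data.frame(animal_1 = c("cat", "dog", "mouse", "squirrel"),
                 predation_type = c("eats", "eats", "eaten by", "eats"),
                 animal_2 = c("mouse", "squirrel", "cat", "nuts"),
                 stringsAsFactors = FALSE)

# Assumption: "eaten by" is the only value that reverses the direction
flipped <- df$predation_type == "eaten by"

# Canonical form: the predator always comes first, the prey second
canon <- data.frame(predator = ifelse(flipped, df$animal_2, df$animal_1),
                    prey     = ifelse(flipped, df$animal_1, df$animal_2))

# Keep the first row of every distinct predator/prey relation
df[!duplicated(canon), ]
```

On the example data this keeps rows 1, 2 and 4, the same rows as the sorting approach, while still retaining all three original columns.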