A special case of flexibly finding duplicates across columns in R

I have a dataset similar to the following:

df <- data.frame(animal_1 = c("cat", "dog", "mouse", "squirrel"),
predation_type = c("eats", "eats", "eaten by", "eats"),
animal_2 = c("mouse", "squirrel", "cat", "nuts"))
> df
animal_1 predation_type animal_2
1      cat           eats    mouse
2      dog           eats squirrel
3    mouse       eaten by      cat
4 squirrel           eats     nuts

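I am looking for code that identifies rows 1 and 3 as duplicates, because they describe the same phenomenon (a cat eats a mouse, or a mouse is eaten by a cat). I don't even know how to phrase what kind of duplicate case I'm looking for, so I hope someone can help. I tried combining the text into a single column (i.e. "catmouse", "dogsquirrel", etc.) and then reversing the letters, but that quickly proved too complicated.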

Any help you can provide would be greatly appreciated.

tidyverse

df <- data.frame(animal_1 = c("cat", "dog", "mouse", "squirrel"),
predation_type = c("eats", "eats", "eaten by", "eats"),
animal_2 = c("mouse", "squirrel", "cat", "nuts"))
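library(tidyverse)
df %>% 
  rowwise() %>% 
  # columns 1 and 3 are animal_1 and animal_2; sorting them gives an order-independent key
  # (the "_" separator keeps pairs such as "ab"/"c" and "a"/"bc" from producing the same key)
  mutate(duplicates = str_c(sort(c_across(c(1, 3))), collapse = "_")) %>% 
  group_by(duplicates) %>% 
  # a key shared by more than one row marks all of those rows as duplicates
  mutate(duplicates = n() > 1) %>% 
  ungroup()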
#> # A tibble: 4 x 4
#>   animal_1 predation_type animal_2 duplicates
#>   <chr>    <chr>          <chr>    <lgl>     
#> 1 cat      eats           mouse    TRUE      
#> 2 dog      eats           squirrel FALSE     
#> 3 mouse    eaten by       cat      TRUE      
#> 4 squirrel eats           nuts     FALSE

Created on 2022-01-17 by the reprex package (v2.0.1)

Removing the duplicates


library(tidyverse)
df %>% 
  # build the same order-independent key per row and keep only the first row for each key
  filter(!duplicated(map2_chr(animal_1, animal_2, ~ str_c(sort(c(.x, .y)), collapse = "_"))))
#>   animal_1 predation_type animal_2
#> 1      cat           eats    mouse
#> 2      dog           eats squirrel
#> 3 squirrel           eats     nuts

Created on 2022-01-17 by the reprex package (v2.0.1)
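
One caveat with the sorted key: it ignores predation_type, so "cat eats mouse" and "mouse eats cat" would also be treated as duplicates. If the direction matters, a possible alternative is to rewrite every row in the active form (predator, prey) before comparing. The following is only a base R sketch; it assumes predation_type takes no values other than "eats" and "eaten by", and the names passive, predator, prey and canonical are made up for illustration.

```r
df <- data.frame(animal_1 = c("cat", "dog", "mouse", "squirrel"),
                 predation_type = c("eats", "eats", "eaten by", "eats"),
                 animal_2 = c("mouse", "squirrel", "cat", "nuts"),
                 stringsAsFactors = FALSE)

# Assumption: predation_type is either "eats" or "eaten by".
passive <- df$predation_type == "eaten by"

# Rewrite each row as "predator eats prey", swapping the animals for passive rows.
predator <- ifelse(passive, df$animal_2, df$animal_1)
prey     <- ifelse(passive, df$animal_1, df$animal_2)
canonical <- data.frame(predator, prey)

# Flag every member of a duplicated group, not only the later occurrences.
df$duplicates <- duplicated(canonical) | duplicated(canonical, fromLast = TRUE)
df
```

For the example data this flags rows 1 and 3, the same rows as the sorted-key approach, but a hypothetical "mouse eats cat" row would stay unflagged.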

You can sort() the animal names within each row of the data frame so that duplicated() becomes useful.

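newdf = df[, c('animal_1', 'animal_2')]
for (i in seq_len(nrow(df))){
  # sort the two names within the row so (mouse, cat) becomes (cat, mouse)
  newdf[i, ] = sort(c(df$animal_1[i], df$animal_2[i]))
}
# duplicated() on the whole data frame compares complete rows, i.e. pairs
newdf[!duplicated(newdf), ]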
animal_1 animal_2
1      cat    mouse
2      dog squirrel
4     nuts squirrel
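
The row-by-row loop can probably be avoided, since pmin() and pmax() also work element-wise on character vectors. A minimal sketch of the same idea without a loop (the names keys, first, second and deduped are illustrative, not from the answer above):

```r
df <- data.frame(animal_1 = c("cat", "dog", "mouse", "squirrel"),
                 predation_type = c("eats", "eats", "eaten by", "eats"),
                 animal_2 = c("mouse", "squirrel", "cat", "nuts"),
                 stringsAsFactors = FALSE)

# Alphabetically smaller name first, larger name second, for every row at once.
keys <- data.frame(first  = pmin(df$animal_1, df$animal_2),
                   second = pmax(df$animal_1, df$animal_2))

# duplicated() on a data frame compares whole rows, so each pair is checked as a unit.
deduped <- df[!duplicated(keys), ]
deduped
```

Unlike the loop version, this keeps the original columns (including predation_type) and leaves the animal names in their original order; only the helper keys are sorted.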
