r - Combining dplyr's mutate function with a lookup across the whole table



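I'm very new to R, and especially to the tidyverse. I'm trying to write a script that we can use to rewrite a taxonomic list. We already have one that uses a lot of for loops and if statements, and I'd like to simplify it with the tidyverse, but I'm a bit stuck on how to do that.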

I have a table that looks like this (heavily simplified):

taxon_list <- tibble(name = c("cockroach", "cockroach2", "grasshopper", "spider", "lobster", "insect", "crustacea", "arachnid"),
                     Id = c(445, 448, 446, 778, 543, 200, 400, 300),
                     parent_ID = c(200, 200, 200, 300, 400, 200, 400, 300),
                     rank = c("genus", "genus", "genus", "genus", "genus", "order", "order", "order")
                     )

+-------------+-----+-----------+----------+
|    name     | Id  | parent_ID |   rank   |
+=============+=====+===========+==========+
| cockroach   | 445 | 200       | genus    |
| cockroach2  | 448 | 200       | genus    |
| grasshopper | 446 | 200       | genus    |
| spider      | 778 | 300       | genus    |
| lobster     | 543 | 400       | genus    |
| insect      | 200 | 200       | order    |
| crustacea   | 400 | 400       | order    |
| arachnid    | 300 | 300       | order    |
+-------------+-----+-----------+----------+

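Now I'd like to rearrange it so that I get a new column holding the order that matches each row's parent_ID (that is, find the row whose Id equals this row's parent_ID and write that row's name into the order column). The end result should look like this: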

+-------------+------------+------+-----------+
|    name     |    order   |  Id  | parent_ID |
+=============+============+======+===========+
| cockroach   |  insect    |  445 |       200 |
| cockroach2  |  insect    |  448 |       200 |
| grasshopper |  insect    |  446 |       200 |
| spider      |  arachnid  |  778 |       300 |
| lobster     |  crustacea |  543 |       400 |
+-------------+------------+------+-----------+

I tried combining mutate with an ifelse statement, but that only puts NA in the order column.

The tibble is named taxon_list.

taxon_list %>%
  mutate(order = ifelse(parent_ID == Id, name, NA))

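I know this can't work, because the comparison only checks each row's parent_ID against that same row's Id; it doesn't search the whole dataset for the matching row (which is what I did before with all the for loops). Maybe someone can point me in the right direction?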

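One approach is to filter each rank into its own data frame, keep only the needed columns with select, and then merge the two. This works here because every order row has a parent_ID equal to its own Id, so parent_ID can serve as the join key.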

library(tidyverse)  # load first: tibble(), filter(), select() and %>% come from here

df <- tibble(name = c("cockroach", "cockroach2", "grasshopper", "spider", "lobster", "insect", "crustacea", "arachnid"),
             Id = c(445, 448, 446, 778, 543, 200, 400, 300),
             parent_ID = c(200, 200, 200, 300, 400, 200, 400, 300),
             rank = c("genus", "genus", "genus", "genus", "genus", "order", "order", "order"))

# Lookup table: one row per order, with its name renamed to `order`
df_order <- df %>%
  filter(rank == "order") %>%
  select(order = name, parent_ID)

# Genus rows, joined to their order through the shared parent_ID
df_genus <- df %>%
  filter(rank == "genus") %>%
  select(name, Id, parent_ID) %>%
  merge(df_order, by = "parent_ID")

Result:

  parent_ID        name  Id     order
1       200   cockroach 445    insect
2       200  cockroach2 448    insect
3       200 grasshopper 446    insect
4       300      spider 778  arachnid
5       400     lobster 543 crustacea
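
If you would rather stay inside a single mutate call, as the question's title suggests, a lookup with base R's match() may be enough. The following is only a sketch under two assumptions that hold for the sample data but may not hold for a real taxonomy: every Id is unique, and each genus points directly at its order (no intermediate ranks). The names via_match, via_join, order_lookup and order_Id are made up for this illustration. The second half shows the same lookup written as a dplyr left_join that keys on the order's own Id instead of its parent_ID.

```r
library(dplyr)

taxon_list <- tibble(
  name      = c("cockroach", "cockroach2", "grasshopper", "spider",
                "lobster", "insect", "crustacea", "arachnid"),
  Id        = c(445, 448, 446, 778, 543, 200, 400, 300),
  parent_ID = c(200, 200, 200, 300, 400, 200, 400, 300),
  rank      = c("genus", "genus", "genus", "genus", "genus",
                "order", "order", "order")
)

# match() gives, for each parent_ID, the position of the row whose Id
# equals it; indexing `name` by those positions returns the parent's name.
# Assumes Id is unique (match() only reports the first hit).
via_match <- taxon_list %>%
  mutate(order = name[match(parent_ID, Id)]) %>%
  filter(rank == "genus") %>%
  select(name, order, Id, parent_ID)

# The same lookup as a join: a table of order rows keyed by their own Id.
order_lookup <- taxon_list %>%
  filter(rank == "order") %>%
  select(order = name, order_Id = Id)

via_join <- taxon_list %>%
  filter(rank == "genus") %>%
  left_join(order_lookup, by = c("parent_ID" = "order_Id")) %>%
  select(name, order, Id, parent_ID)

print(via_match)
```

Unlike merge, left_join keeps genus rows whose parent is missing from the lookup (their order becomes NA) and preserves the original row order, which can make gaps in the taxonomy easier to spot.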
