r - Create a column in a data frame that contains information from other columns



I want to generate a column in a large data frame that contains information from other columns. Here is a very small reproducible example:

tax <- data.frame(
Family = c("Brassicacae", "Pinaceae", "Rosaceae", "Liliaceae"), 
Genus = c("NA" ,"Pinus", "NA", "Lilia"),
Species = c("NA" ,"Pinus_sylvestris", "NA", "Calochortus nuttallii"))

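I want to create a column named tax_rank. Where the classification reaches species, the value should be species; where it stops at a higher rank, the value should be genus or family, as in the following output: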

tax <- data.frame(
Family = c("Brassicacae", "Pinaceae", "Rosaceae", "Liliaceae"), 
Genus = c("NA" ,"Pinus", "NA", "Lilia"),
Species = c("NA" ,"Pinus_sylvestris", "NA", "Calochortus nuttallii"),
tax_rank = c("family" ,"species", "family", "species"))

But I want to do this automatically on a large dataset. Is that possible with dplyr? Thanks

In base R, you can apply max.col to the non-NA indicator and set ties.method = "last" to pick the last non-NA column in each row.

names(tax)[max.col(!is.na(tax), ties.method = "last")]
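To see why this one-liner works, it may help to look at the intermediate steps. The following is only a sketch on the toy data with real NA values; the names present and idx are mine, not part of the answer.

```r
# Same toy data as in the question, but with real NA values
tax <- data.frame(
  Family  = c("Brassicacae", "Pinaceae", "Rosaceae", "Liliaceae"),
  Genus   = c(NA, "Pinus", NA, "Lilia"),
  Species = c(NA, "Pinus_sylvestris", NA, "Calochortus nuttallii")
)

# Logical matrix: TRUE where a rank is filled in
present <- !is.na(tax)

# max.col() treats TRUE as 1 and FALSE as 0; with ties.method = "last"
# it returns the index of the right-most TRUE in each row
idx <- max.col(present, ties.method = "last")

# Map the column index back to the column name
names(tax)[idx]
```

This relies on the columns being ordered from the highest rank (Family) to the lowest (Species).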

This can be translated into dplyr:

library(dplyr)
tax %>% 
mutate(tax_rank = names(tax)[max.col(!is.na(tax), ties.method = "last")])
#        Family Genus               Species tax_rank
# 1 Brassicacae  <NA>                  <NA>   Family
# 2    Pinaceae Pinus      Pinus_sylvestris  Species
# 3    Rosaceae  <NA>                  <NA>   Family
# 4   Liliaceae Lilia Calochortus nuttallii  Species
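In a real data set there are probably more columns than the three ranks, and the question asked for lower-case values. A possible variation, assuming a hypothetical extra column Sample and a helper vector rank_cols that I made up for illustration, restricts max.col to the rank columns and lower-cases the result:

```r
tax <- data.frame(
  Sample  = c("s1", "s2", "s3", "s4"),  # hypothetical non-rank column
  Family  = c("Brassicacae", "Pinaceae", "Rosaceae", "Liliaceae"),
  Genus   = c(NA, "Pinus", NA, "Lilia"),
  Species = c(NA, "Pinus_sylvestris", NA, "Calochortus nuttallii")
)

# Rank columns, ordered from highest to lowest rank (assumed names)
rank_cols <- c("Family", "Genus", "Species")

# Only look at the rank columns when searching for the last non-NA value
present <- !is.na(tax[rank_cols])
tax$tax_rank <- tolower(rank_cols[max.col(present, ties.method = "last")])

tax
```

The same expression can be placed inside mutate() if you prefer to stay in a dplyr pipeline.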

Data (note that I converted "NA" to NA):

tax <- data.frame(
Family = c("Brassicacae", "Pinaceae", "Rosaceae", "Liliaceae"), 
Genus = c(NA ,"Pinus", NA, "Lilia"),
Species = c(NA ,"Pinus_sylvestris", NA, "Calochortus nuttallii"))
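On a large data set you would not retype the data by hand. One way to do the conversion, sketched here on the original toy data under the assumption that the literal string "NA" never stands for a real value, is:

```r
# Toy data as posted in the question, with "NA" stored as text
tax <- data.frame(
  Family  = c("Brassicacae", "Pinaceae", "Rosaceae", "Liliaceae"),
  Genus   = c("NA", "Pinus", "NA", "Lilia"),
  Species = c("NA", "Pinus_sylvestris", "NA", "Calochortus nuttallii")
)

# Replace every cell equal to the string "NA" with a real missing value
tax[tax == "NA"] <- NA

# Count missing values per column to check the result
colSums(is.na(tax))
```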

First, the data frame should contain real NA values rather than the character string "NA":

tax <- data.frame(
Family = c("Brassicacae", "Pinaceae", "Rosaceae", "Liliaceae"), 
Genus = c(NA ,"Pinus",NA, "Lilia"),
Species = c(NA,"Pinus_sylvestris", NA, "Calochortus nuttallii"))

Then the column you want can be created like this:

tax %>% mutate(tax_rank = ifelse(!is.na(Species), "species", ifelse(!is.na(Genus), "genus", "family")))

This is the output:

       Family Genus               Species tax_rank
1 Brassicacae  <NA>                  <NA>   family
2    Pinaceae Pinus      Pinus_sylvestris  species
3    Rosaceae  <NA>                  <NA>   family
4   Liliaceae Lilia Calochortus nuttallii  species
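Nested ifelse() calls get hard to read once there are more ranks. As an alternative sketch (not what the answer above uses), dplyr's case_when() expresses the same logic as a list of conditions checked in order. The fifth row is a hypothetical addition to show the genus case.

```r
library(dplyr)

tax <- data.frame(
  Family  = c("Brassicacae", "Pinaceae", "Rosaceae", "Liliaceae", "Fagaceae"),
  Genus   = c(NA, "Pinus", NA, "Lilia", "Quercus"),  # last row is hypothetical
  Species = c(NA, "Pinus_sylvestris", NA, "Calochortus nuttallii", NA)
)

tax <- tax %>%
  mutate(tax_rank = case_when(
    !is.na(Species) ~ "species",  # most specific rank is checked first
    !is.na(Genus)   ~ "genus",
    TRUE            ~ "family"    # fallback when neither is present
  ))

tax
```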

Using base R

tax$tax_rank <- apply(tax, 1, function(x) tail(names(x)[!is.na(x)], 1))

Output:

> tax
       Family Genus               Species tax_rank
1 Brassicacae  <NA>                  <NA>   Family
2    Pinaceae Pinus      Pinus_sylvestris  Species
3    Rosaceae  <NA>                  <NA>   Family
4   Liliaceae Lilia Calochortus nuttallii  Species
