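I need to recode a factor variable with almost 90 levels. They are trait names from a database, which I then need to pivot into a dataset for analysis. Is there a way to do this automatically, without typing out every OldName = NewName pair?
This is how I do it with dplyr when there are fewer levels: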
df$TraitName <- recode_factor(df$TraitName, 'Old Name' = "new.name")
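My idea was to use a key data frame with one column of old names and one column of the corresponding new names, but I don't know how to feed it to recode.
You can easily create a named vector from a lookup table and pass it to recode using splicing.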
library(tidyverse)
# test data
df <- tibble(TraitName = c("a", "b", "c"))
# Make a lookup table from your own data.
# You'll bind your two columns here instead.
# Keep the column order (old names first, new names second) so deframe() works.
# The column names don't matter.
lookup <- tibble(old = c("a", "b", "c"), new = c("aa", "bb", "cc"))
# Convert to named vector and splice it within the recode
df <-
  df |>
  mutate(TraitNameRecode = recode_factor(TraitName, !!!deframe(lookup)))
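To make the splicing step more concrete, here is a minimal sketch of what deframe() hands to recode_factor(), alongside a forcats alternative. The data and the names key, key_rev, via_recode and via_forcats are made up for illustration. Note that fct_recode() expects pairs in the opposite direction (new = "old"), so the names and values are swapped for it.

```r
library(dplyr)
library(tibble)
library(forcats)

# Hypothetical lookup table: old level names first, new names second
lookup <- tibble(old = c("a", "b", "c"), new = c("aa", "bb", "cc"))
df <- tibble(TraitName = factor(c("a", "b", "c", "a")))

# deframe() turns a two-column tibble into a named vector:
# names are taken from the first column, values from the second
key <- deframe(lookup)

# fct_recode() wants new = "old", so build the reversed named vector
key_rev <- setNames(lookup$old, lookup$new)

df <- df |>
  mutate(
    via_recode  = recode_factor(TraitName, !!!key),
    via_forcats = fct_recode(TraitName, !!!key_rev)
  )
```

Both new columns should hold the same recoded factor. With roughly 90 levels, the only thing that changes is the number of rows in the lookup table.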
Another way is to use a lookup table, a join, and coalesce (to get the first non-NA value):
my_data <- data.frame(letters = letters[1:6])
levels_to_change <- data.frame(letters = letters[4:5],
new_letters = LETTERS[4:5])
library(dplyr)
my_data %>%
left_join(levels_to_change) %>%
mutate(new = coalesce(new_letters, letters))
Result:
Joining, by = "letters"
  letters new_letters new
1       a        <NA>   a
2       b        <NA>   b
3       c        <NA>   c
4       d           D   D
5       e           E   E
6       f        <NA>   f
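Since the question is about a factor, a small variation on the join approach may help. This sketch assumes the key columns are character (convert with as.character() first if they are factors). It names the join key explicitly, which avoids the "Joining, by" message, drops the helper column, and converts the result back to a factor. The name result is only for illustration.

```r
library(dplyr)

my_data <- data.frame(letters = letters[1:6])
levels_to_change <- data.frame(letters = letters[4:5],
                               new_letters = LETTERS[4:5])

result <- my_data %>%
  # explicit key, so dplyr does not have to guess the join column
  left_join(levels_to_change, by = "letters") %>%
  # take the new name where one exists, otherwise keep the old value
  mutate(new = factor(coalesce(new_letters, letters))) %>%
  # the helper column is no longer needed
  select(-new_letters)
```

Values without a match in the lookup table keep their original name, so the lookup table only needs to list the levels that actually change.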