r - Is there a way to set the variable labels of a data frame using the map family?



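I have a dataset and an accompanying data dictionary. I want to use the data dictionary to set the variable labels of the dataset. I tried an explicit for loop, but it seems rather slow. Is there a way to achieve the same thing with the tidyverse map family?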

library(tidyverse)
mydata <- tibble(
  a_1 = c(20, 22, 13, 14, 44),
  a_2 = c(42, 13, 32, 31, 14),
  b = c("male", "female", "male", "female", "male"),
  c = c("Primary", "secondary", "Tertiary", "Primary", "Secondary")
)
dictionary <- tibble(
  variable = c("a", "b", "c"),
  label = c("Age", "Gender", "Education"),
  type = c("mselect", "select", "select")
)

variables <- names(mydata)

for (var in variables) {
  vm <- unique(str_remove_all(var, "_.*")) # Take care of the variables with _
  varlbl <- filter(dictionary, variable == vm) %>%
    select(label) %>%
    pull()

  attr(mydata[[var]], "label") <- varlbl
}

#---- Map the variable labels using map
#

Base R

mydata[] <- Map(
  function(x, lbl) if (!is.na(lbl)) `attr<-`(x, "label", lbl) else x,
  mydata,
  dictionary$label[ match(gsub("_.*", "", names(mydata)), dictionary$variable) ])
str(mydata)
# tibble [5 x 4] (S3: tbl_df/tbl/data.frame)
#  $ a_1: num [1:5] 20 22 13 14 44
#   ..- attr(*, "label")= chr "Age"
#  $ a_2: num [1:5] 42 13 32 31 14
#   ..- attr(*, "label")= chr "Age"
#  $ b  : chr [1:5] "male" "female" "male" "female" ...
#   ..- attr(*, "label")= chr "Gender"
#  $ c  : chr [1:5] "Primary" "secondary" "Tertiary" "Primary" ...
#   ..- attr(*, "label")= chr "Education"
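The second argument to Map above is a label vector aligned with the column names. Here is a minimal sketch of how that lookup works on its own, using the names and dictionary from the question plus a hypothetical extra column `d_1` (not in the original data) to show the unmatched case:

```r
# Column names as in the question, plus a hypothetical "d_1" with no dictionary entry
nms <- c("a_1", "a_2", "b", "c", "d_1")
dictionary <- data.frame(
  variable = c("a", "b", "c"),
  label    = c("Age", "Gender", "Education"),
  stringsAsFactors = FALSE
)

stems <- gsub("_.*", "", nms)               # strip "_" and everything after it
idx   <- match(stems, dictionary$variable)  # dictionary row per column, NA if absent
lbls  <- dictionary$label[idx]              # labels aligned with nms

print(data.frame(column = nms, stem = stems, label = lbls))
```

The NA that an unmatched column such as `d_1` receives is why the Map version checks `!is.na(lbl)` before setting the attribute.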

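The reassignment with mydata[] <- is intentional, and a bit of a hack: if we did mydata <- (without the brackets), Map would return a list and the "frame" attributes would be lost. With mydata[] <-, however, the contents (the columns) are replaced with the new data, the replacement is done list-wise, and the class and frame attributes of mydata are preserved.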

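I use this often when I want to (for example) convert a subset of columns to something else. I might do somedata[3:6] <- lapply(somedata[3:6], as.numeric), which I think is more readable than other ways of getting the same effect.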
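A minimal sketch of the difference between the two assignment forms, using a small hypothetical data.frame (`somedata` and its columns are made up for illustration):

```r
# Hypothetical example data: every column stored as character
somedata <- data.frame(id = c("1", "2"), x = c("3.5", "4"), y = c("7", "8.25"),
                       stringsAsFactors = FALSE)

as_list <- lapply(somedata, as.numeric)  # assigning this directly yields a plain list
class(as_list)

kept <- somedata
kept[] <- lapply(kept, as.numeric)       # [] replaces the columns, keeps the data.frame
class(kept)

# The same idiom applied to a subset of columns only
somedata[2:3] <- lapply(somedata[2:3], as.numeric)
str(somedata)
```

The subset form leaves the `id` column untouched while converting `x` and `y`.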

purrr

library(dplyr) # just for %>% here
library(purrr)
mydata <- map2_dfc(
mydata,
dictionary$label[ match(gsub("_.*", "", names(mydata)), dictionary$variable) ],
~ `attr<-`(.x, "label", .y))

In both, I'm using a shortcut "cheat": these two are equivalent:

{
  attr(x, "label") <- "something"
  x
}
## is equivalent to
{
  `attr<-`(x, "label", "something")
}

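since both return the updated x. This is a little code golf and a little aesthetics (it reduces the need for semicolons and braces), but you can easily convert to the more conventional (first) form if you prefer.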
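The equivalence can be checked directly. A minimal sketch, where `conventional` and `shortcut` are hypothetical names used only for this comparison:

```r
x <- c(20, 22, 13)

# Conventional form: set the attribute on a copy, then return the copy
conventional <- local({
  y <- x
  attr(y, "label") <- "something"
  y
})

# Functional form: `attr<-` returns the object with the attribute set
shortcut <- `attr<-`(x, "label", "something")

identical(conventional, shortcut)
```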

The labelVector package is another option. It is not as fast as Map, but it is a little easier on the eyes (at least I think so):

library(labelVector)
idx <- match(gsub("_.*", "", names(mydata)), dictionary$variable)
var_label <- dictionary$label[idx]
names(var_label) <- names(mydata)
mydata <- set_label(mydata, .dots = var_label)
