My data looks like this:
aList <- list(a1 = c("apple", "banana", "orange", "strawberry", "cherry"),
              a2 = c("banana", "cherry", "apple"),
              a3 = c("apple", "strawberry", "pineapple"),
              a4 = c("raspberry", "strawberry", "apple"),
              a5 = c("pineapple", "lemon", "orange", "banana", "apple"),
              a6 = c("lemon", "apple", "blueberry"),
              a7 = c("watermelon", "apple", "banana", "mango"),
              a8 = c("mango", "cherry", "apple", "lemon"),
              a9 = c("orange", "banana", "strawberry"),
              a10 = c("mango", "strawberry"))
I want to turn it into a "vertical" format, like the result you get when you run this code:
vertical_data <- list()
for (x in names(aList)) {
  for (y in aList[[x]]) {
    if (is.null(vertical_data[[y]])) {
      vertical_data[[y]] <- x
    } else {
      vertical_data[[y]] <- c(x, vertical_data[[y]])
    }
  }
}
vertical_data
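I want each entry to tell me which list elements a particular fruit comes from.
This is easy enough with a double for loop. However, when I do the same thing with nested lapply calls, the list (i.e., vertical_data) does not appear to be modified at all. Why? The reason I want to use the apply functions is that I expect them to be faster. My actual dataset will have thousands of items and "fruits", and the for loops would take too long.
I would really appreciate your help.
Thanks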
We can use split on the unlisted data:
split(rep(names(aList), lengths(aList)), unlist(aList))
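To make the one-liner above easier to follow, here is a minimal sketch of its intermediate values. The list small and the names owners, fruits, and result are made up for illustration and are not part of the original data.

```r
# Hypothetical two-element list, used only for illustration
small <- list(a1 = c("apple", "banana"), a2 = c("banana", "cherry"))

# One list name per fruit: each name is repeated by the length of its element
owners <- rep(names(small), lengths(small))   # "a1" "a1" "a2" "a2"

# All fruits flattened into a single character vector
# (unname() drops the auto-generated names such as a11, a12)
fruits <- unname(unlist(small))               # "apple" "banana" "banana" "cherry"

# Group the list names by fruit
result <- split(owners, fruits)
result
```

Here result$banana holds both "a1" and "a2", because banana appears in both elements.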
Another option is to stack the list into a two-column data.frame and then split:
with(stack(aList), split(as.character(ind), values))
#$apple
#[1] "a1" "a2" "a3" "a4" "a5" "a6" "a7" "a8"
#$banana
#[1] "a1" "a2" "a5" "a7" "a9"
#$blueberry
#[1] "a6"
#$cherry
#[1] "a1" "a2" "a8"
#$lemon
#[1] "a5" "a6" "a8"
#$mango
#[1] "a7" "a8" "a10"
#$orange
#[1] "a1" "a5" "a9"
#$pineapple
#[1] "a3" "a5"
#$raspberry
#[1] "a4"
#$strawberry
#[1] "a1" "a3" "a4" "a9" "a10"
#$watermelon
#[1] "a7"
Or, as @rawr mentioned:
unstack(stack(aList)[2:1])
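One detail worth checking is how these results relate to the loop in the question. The loop prepends each new name and creates entries in order of first appearance, while split keeps the names in list order and sorts the groups by fruit. The sketch below, on a made-up list called small, suggests that the two agree once those orderings are normalized; all variable names here are illustrative.

```r
# Hypothetical small list, used only for illustration
small <- list(a1 = c("apple", "banana"), a2 = c("banana", "apple", "cherry"))

# The loop from the question (c(x, NULL) is just x, so no is.null() check is needed)
loop_result <- list()
for (x in names(small)) {
  for (y in small[[x]]) {
    loop_result[[y]] <- c(x, loop_result[[y]])
  }
}

# The split-based version
split_result <- split(rep(names(small), lengths(small)), unlist(small))

# Undo the prepending and sort the entries by fruit name before comparing
normalized <- lapply(loop_result, rev)[sort(names(loop_result))]
same <- identical(normalized, split_result)
same
```

If the reversed order produced by the loop matters, applying rev to each element of the split result restores it.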
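As for assignment inside lapply versus a for loop, the difference comes down to environments. A for loop runs in the environment where it is written (here, the global environment), so the assignment modifies the object there. In lapply, the function body runs in its own environment, so a plain <- only creates a local copy that is discarded when the function returns. To change the outer object, you would have to use <<- (not recommended) or specify the global environment as the target of the assignment: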
vertical_data <- list()
# <<- assigns to the vertical_data found in the enclosing (here: global) environment
lapply(names(aList), function(x) {
  lapply(aList[[x]], function(y) {
    if (is.null(vertical_data[[y]])) {
      vertical_data[[y]] <<- x
    } else {
      vertical_data[[y]] <<- c(x, vertical_data[[y]])
    }
  })
})
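The scoping difference can be seen in isolation with a tiny sketch. The variable counter and the helper names after_local and after_loop are made up for illustration.

```r
counter <- 0

# Inside the function, <- creates a new local counter; the outer one is untouched
invisible(lapply(1:3, function(i) counter <- counter + i))
after_local <- counter    # still 0

# A for loop has no function frame of its own, so it updates counter directly
for (i in 1:3) counter <- counter + i
after_loop <- counter     # 6
```

This is the same thing that happens to vertical_data in the nested lapply version from the question: each call modifies its own local copy and then throws it away.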
We can use enframe to convert the named list into a data frame, and then split name by value.
library(magrittr)  # provides the %>% pipe used below
tibble::enframe(aList) %>% tidyr::unnest(value) %>% {split(.$name, .$value)}
#$apple
#[1] "a1" "a2" "a3" "a4" "a5" "a6" "a7" "a8"
#$banana
#[1] "a1" "a2" "a5" "a7" "a9"
#$blueberry
#[1] "a6"
#$cherry
#[1] "a1" "a2" "a8"
#$lemon
#[1] "a5" "a6" "a8"
#$mango
#[1] "a7" "a8" "a10"
#$orange
#[1] "a1" "a5" "a9"
#$pineapple
#[1] "a3" "a5"
#$raspberry
#[1] "a4"
#$strawberry
#[1] "a1" "a3" "a4" "a9" "a10"
#$watermelon
#[1] "a7"