How does the performance of these two methods of getting a single value from a tibble, using a row index and a column name, compare?
library(tidyverse) # or minimally, `library(tibble)`
# 10000 rows and 50 columns of random values
tibble_text <- paste0(
  "tibble(",
  paste0("'col", 1:50, "' = rnorm(10000)", collapse = ", "),
  ")"
)
my_tibble <- eval(parse(text = tibble_text))
Get the row, then the value:
i <- 542
my_tibble[i,]$col18
Get the column, then the value:
i <- 542
my_tibble$col18[i]
Retrieving the column first (my_tibble$col18[i]) is much faster:
# I chose to randomize the column index, in case
# something sneaky was happening under the hood.
# Row first, then column
{
  ptm <- proc.time()
  for (i in 1:10000) {
    eval(parse(text = paste0("my_tibble[i,]$col", sample(1:50, 1))))
  }
  proc.time() - ptm
}
#    user  system elapsed
#    2.53    0.00    2.52

# Column first, then row
{
  ptm <- proc.time()
  for (i in 1:10000) {
    eval(parse(text = paste0("my_tibble$col", sample(1:50, 1), "[i]")))
  }
  proc.time() - ptm
}
#    user  system elapsed
#    0.33    0.00    0.33
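Both loops above also pay for eval(parse(...)) on every iteration. As a rough sketch of the same comparison without that overhead, the column name can be picked at random up front and looked up with `[[`. The names bench_tibble, col_names, row_first_time and col_first_time are made up for this example, and no timings are shown because they depend on the machine and package versions:

```r
library(tibble)

set.seed(42)

# Same shape as above (10000 rows, 50 columns), built from a named list
# instead of a parsed string. `bench_tibble` is a hypothetical name.
cols <- setNames(
  replicate(50, rnorm(10000), simplify = FALSE),
  paste0("col", 1:50)
)
bench_tibble <- as_tibble(cols)

# One randomly chosen column name per lookup, drawn before timing starts
col_names <- sample(names(bench_tibble), 10000, replace = TRUE)

# Row first: subset to a one-row tibble, then pull the column out of it
row_first_time <- system.time(
  for (i in 1:10000) bench_tibble[i, ][[col_names[i]]]
)

# Column first: pull the column vector, then index into it
col_first_time <- system.time(
  for (i in 1:10000) bench_tibble[[col_names[i]]][i]
)

print(row_first_time)
print(col_first_time)
```

Drawing the column names before the timed loops keeps the cost of sample() out of both measurements, so the difference that remains should come from the two access patterns themselves.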
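I think this is mostly because a tibble is not built as a matrix, but as an object containing columns. When you get the row first, you get a one-row tibble with all 50 columns, and only then ask for the value in the chosen column. When you do it the other way around, you get the column, which is basically just a vector, and then its i-th value.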
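To make that explanation concrete, here is a small sketch on a toy tibble. The names small_tibble, row_first and col_first are invented for this example and do not come from the question:

```r
library(tibble)

set.seed(1)

# Tiny stand-in for the 10000 x 50 tibble used in the benchmark
small_tibble <- tibble(col1 = rnorm(5), col2 = rnorm(5))

# A tibble is stored as a list with one element per column
print(is.list(small_tibble))

# Row first: this builds a new one-row tibble that still has every column
row_first <- small_tibble[3, ]
print(class(row_first))
print(dim(row_first))

# Column first: this returns the column as a plain numeric vector
col_first <- small_tibble$col2
print(class(col_first))

# Both routes end at the same value
print(identical(row_first$col2, col_first[3]))
```

The row-first route has to subset every column to build the intermediate tibble, which is the extra work the benchmark is picking up.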