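I am iterating over every row and computing Value from columns A through E, using the column name stored in WhichCol for that row. It works, but this step takes a long time on 50,000 rows. Is there a more efficient way?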
library(data.table)
df<-structure(list(Id = 1:10, A = c(73L, 61L, 46L, 26L, 18L, 29L,
88L, 18L, 56L, 81L), B = c(68L, 49L, 27L, 10L, 37L, 72L, 71L,
60L, 52L, 62L), C = c(98L, 59L, 76L, 46L, 46L, 31L, 77L, 83L,
51L, 6L), D = c(40L, 18L, 27L, 18L, 72L, 95L, 87L, 29L, 35L,
80L), E = c(74L, 87L, 27L, 98L, 54L, 91L, 100L, 71L, 13L, 15L
), WhichCol = c("A", "C", "E", "B", "A", "D", "A", "C", "E",
"B"), Value = c(73L, 59L, 27L, 10L, 18L, 95L, 88L, 83L, 13L,
62L)), .Names = c("Id", "A", "B", "C", "D", "E", "WhichCol",
"Value"), class = "data.frame")
setDT(df)
df[["Value"]]<-sapply(1:nrow(df), function(x){ df[x, get(WhichCol)] })
The sample data here already includes the Value column, but that column is exactly what I want to compute.
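Instead of looping over every row, you can use the fact that for each value of WhichCol you already know which column you want (for example, for every row where WhichCol == "A", take column A). Grouping by WhichCol means get() is called once per distinct column name rather than once per row.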
df[, ValueNew := get(unique(WhichCol)), by = WhichCol]
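To make the grouping step concrete, here is a minimal sketch on a tiny made-up table (the table dt and its values are hypothetical, not the question's data). It uses .BY$WhichCol, which within each group holds the single group value, in place of unique(WhichCol); both should pick the same column. It assumes all candidate columns share one type, as they do in the question.

```r
library(data.table)

# Hypothetical three-row table with the same layout as the question's data
dt <- data.table(
  A = c(1L, 2L, 3L),
  B = c(10L, 20L, 30L),
  WhichCol = c("A", "B", "A")
)

# One get() call per distinct WhichCol value:
#   group "A" (rows 1 and 3) reads column A, group "B" (row 2) reads column B
dt[, ValueNew := get(.BY$WhichCol), by = WhichCol]

print(dt$ValueNew)
```

Rows 1 and 3 take their value from A and row 2 takes its value from B.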
I ran a small speed test on 10,000 rows (the sample data repeated 1,000 times):
n <- 1000
df <- rbindlist(rep(list(df), n))
# over unique WhichCol
system.time(df[, ValueNew := get(unique(WhichCol)), by = WhichCol])
user system elapsed
0.002 0.000 0.001
system.time(df[["Value2"]]<-sapply(1:nrow(df), function(x){ df[x, get(WhichCol)] }))
user system elapsed
5.445 0.021 5.472
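As a side note, a different vectorized technique that is not part of the approach above is base R matrix indexing: build a two-column (row, column) index and pull all values in one step. The sketch below uses a small made-up data frame (the names df_small, cols, m and value are hypothetical) and assumes the candidate columns all have the same type, since as.matrix() would otherwise coerce them. It has not been timed against the grouped version.

```r
# Hypothetical three-row data frame with the same layout as the question's data
df_small <- data.frame(
  A = c(1L, 2L, 3L),
  B = c(10L, 20L, 30L),
  WhichCol = c("A", "B", "A"),
  stringsAsFactors = FALSE
)

cols <- c("A", "B")               # candidate columns, in matrix column order
m <- as.matrix(df_small[, cols])  # integer matrix of the candidate values

# Row i is paired with the position of WhichCol[i] within cols
idx <- cbind(seq_len(nrow(df_small)), match(df_small$WhichCol, cols))
value <- m[idx]

print(value)
```

If WhichCol contains a name that is not in cols, match() returns NA and the corresponding value is NA.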
I hope this helps.