R: An efficient way to iterate over rows in a data.table and replace the row-wise get() column lookup



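I am trying to iterate over each row and compute Value from columns A through E, where WhichCol holds the name of the column to read for that row. It works, but this step takes a long time on 50,000 rows. Is there a more efficient way?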

library(data.table)
df<-structure(list(Id = 1:10, A = c(73L, 61L, 46L, 26L, 18L, 29L, 
88L, 18L, 56L, 81L), B = c(68L, 49L, 27L, 10L, 37L, 72L, 71L, 
60L, 52L, 62L), C = c(98L, 59L, 76L, 46L, 46L, 31L, 77L, 83L, 
51L, 6L), D = c(40L, 18L, 27L, 18L, 72L, 95L, 87L, 29L, 35L, 
80L), E = c(74L, 87L, 27L, 98L, 54L, 91L, 100L, 71L, 13L, 15L
), WhichCol = c("A", "C", "E", "B", "A", "D", "A", "C", "E", 
"B"), Value = c(73L, 59L, 27L, 10L, 18L, 95L, 88L, 83L, 13L, 
62L)), .Names = c("Id", "A", "B", "C", "D", "E", "WhichCol", 
"Value"), class = "data.frame")
setDT(df)
df[["Value"]]<-sapply(1:nrow(df), function(x){ df[x, get(WhichCol)] })

The sample data here already includes the Value column, but that is exactly the result I want to compute.

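Instead of looping over every row, you can use the fact that each value of WhichCol tells you which column you want (for example, for every row where WhichCol == "A", take column A). Grouping by WhichCol means get() is called once per distinct column name rather than once per row.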

df[, ValueNew := get(unique(WhichCol)), by = WhichCol]
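To make the grouping idea concrete, here is a minimal sketch on a small made-up table. The names `dt` and `n_groups` and all the values are assumptions for illustration only, not part of the question's data.

```r
library(data.table)

# Hypothetical toy table: column names mirror the question, values are made up
dt <- data.table(
  Id = 1:4,
  A = c(1L, 2L, 3L, 4L),
  B = c(10L, 20L, 30L, 40L),
  WhichCol = c("A", "B", "B", "A")
)

# get() runs once per group, so the number of lookups equals the
# number of distinct column names, not the number of rows
n_groups <- uniqueN(dt$WhichCol)

# Within each group WhichCol has a single distinct value, so
# unique(WhichCol) is the one column name to read for that group
dt[, Value := get(unique(WhichCol)), by = WhichCol]

print(dt)
```

With 50,000 rows and five candidate columns, this should amount to five grouped lookups instead of 50,000 single-row subsets.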

I ran a small speed test, repeating the sample data 1,000 times (10,000 rows in total):

n <- 1000
df <- rbindlist(rep(list(df), n))
# over unique WhichCol
system.time(df[, ValueNew := get(unique(WhichCol)), by = WhichCol])
user  system elapsed 
0.002   0.000   0.001 
system.time(df[["Value2"]]<-sapply(1:nrow(df), function(x){ df[x, get(WhichCol)] }))
user  system elapsed 
5.445   0.021   5.472 
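
Timing aside, it may be worth confirming that the grouped assignment returns the same values as a per-row lookup. The following is a minimal sketch on a made-up table; `dt`, `cols`, `m`, and `expected` are illustrative names, and the cross-check uses base R matrix indexing, which is a different technique from the grouped get() shown above.

```r
library(data.table)

# Hypothetical three-row table, values chosen for illustration
dt <- data.table(
  A = c(73L, 61L, 46L),
  B = c(68L, 49L, 27L),
  C = c(98L, 59L, 76L),
  WhichCol = c("A", "C", "B")
)

# Grouped lookup, as in the answer
dt[, ValueNew := get(unique(WhichCol)), by = WhichCol]

# Independent cross-check: index a matrix with (row, column) pairs
cols <- c("A", "B", "C")
m <- as.matrix(dt[, cols, with = FALSE])
expected <- m[cbind(seq_len(nrow(dt)), match(dt$WhichCol, cols))]

# Both approaches should agree row by row
stopifnot(all(dt$ValueNew == expected))
```

If the two results ever disagree, a likely cause is a WhichCol value that does not match any of the column names listed in `cols`.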

I hope this helps.
