I have an overview table: a list of item counts, actual costs, and predicted costs.
library(data.table)
myData <- data.table("itemCount" = c(3000, 20, 50, 9),
"cost" = c(120, 118, 165, 93),
"prediction" = c(120, 100, 150, 120))
Then I compute the per-row and overall profit:
myData[, "profit" := cost/prediction]
total <- myData[, .(itemsTotal = sum(itemCount),
costTotal = sum(cost),
predictionTotal = sum(prediction))][
, "profit" := costTotal/predictionTotal
]
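Now, for each row, I want to compute what the overall profit would be if that row were excluded from the analysis. For example, if the second row were missing: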
myData$diffinProfit <- NA
myDataEx <- myData[- 2, ]
totalEx <- myDataEx[, .(itemsTotal = sum(itemCount),
costTotal = sum(cost),
predictionTotal = sum(prediction))][
, "profit" := costTotal/predictionTotal
]
So I wrote a for loop to do this:
myData$diffinProfit <- NA
for(observation in seq_along(length(myData)-1)){
myDataEx <- myData[- observation, ]
totalEx <- myDataEx[, .(itemsTotal = sum(itemCount),
costTotal = sum(cost),
predictionTotal = sum(prediction))][
, "profit" := costTotal/predictionTotal
]
myData$diffinProfit[[observation]] <- totalEx$profit
}
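However, I only get a result for the first observation. How can I fix the for loop? Is there a way to use an apply function instead? I was thinking of mapply, or perhaps a purrr function?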
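The first problem you are running into is that length(myData) reports the number of columns, not the number of rows. On top of that, seq_along() applied to a single number always returns just 1, which is why only the first row gets filled in. But I think we can do this without a for loop (although sapply is much the same thing under the hood).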
myData[, otherProfit := sapply(seq_len(.N), function(z) sum(cost[-z])/sum(prediction[-z]))]
myData
# itemCount cost prediction profit otherProfit
# <num> <num> <num> <num> <num>
# 1: 3000 120 120 1.000 1.0162162
# 2: 20 118 100 1.180 0.9692308
# 3: 50 165 150 1.100 0.9735294
# 4: 9 93 120 0.775 1.0891892
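If you would rather keep the loop, a minimal sketch of a repaired version follows. It assumes the only changes needed are iterating over seq_len(nrow(myData)) and writing each result with data.table's set(); the helper name profitEx is introduced here purely for illustration.

```r
library(data.table)

myData <- data.table(itemCount  = c(3000, 20, 50, 9),
                     cost       = c(120, 118, 165, 93),
                     prediction = c(120, 100, 150, 120))

# Pre-allocate a numeric column so each iteration fills in one value
myData[, diffinProfit := NA_real_]

# seq_len(nrow(myData)) yields 1, 2, ..., up to the number of rows
for (observation in seq_len(nrow(myData))) {
  # Drop the current row
  myDataEx <- myData[-observation]
  # Overall cost/prediction ratio without that row (profitEx is an illustrative name)
  profitEx <- myDataEx[, sum(cost) / sum(prediction)]
  # Write the result into the current row by reference
  set(myData, i = observation, j = "diffinProfit", value = profitEx)
}

myData
```

The diffinProfit column should then hold the same values as otherProfit in the output above.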
Mathematically, though, this can be done without any loop at all:
sumcost <- sum(myData$cost)
sumpred <- sum(myData$prediction)
myData[, profit2 := (sumcost-cost)/(sumpred-prediction)]
myData
# itemCount cost prediction profit otherProfit profit2
# <num> <num> <num> <num> <num> <num>
# 1: 3000 120 120 1.000 1.0162162 1.0162162
# 2: 20 118 100 1.180 0.9692308 0.9692308
# 3: 50 165 150 1.100 0.9735294 0.9735294
# 4: 9 93 120 0.775 1.0891892 1.0891892
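I'm not going to benchmark this on 4 rows, but I would be surprised if this second, "vectorized" approach were not more efficient than the sapply or for-loop alternatives above.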
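To check that claim on something larger than 4 rows, a rough sketch could look like the following. The table bigData, its size n, the seed, and the column names loopProfit and vecProfit are all made up for illustration; the timings will vary by machine, so none are quoted here.

```r
library(data.table)

set.seed(1)   # arbitrary seed, only so the run is reproducible
n <- 2000L    # hypothetical number of rows

# Synthetic data with the same two columns the calculation needs
bigData <- data.table(cost       = runif(n, 50, 200),
                      prediction = runif(n, 50, 200))

# Row-by-row approach: recompute both sums for every excluded row
time_sapply <- system.time(
  bigData[, loopProfit := sapply(seq_len(.N),
                                 function(z) sum(cost[-z]) / sum(prediction[-z]))]
)

# Vectorized approach: compute each sum once, then subtract the row's own value
time_vec <- system.time(
  bigData[, vecProfit := (sum(cost) - cost) / (sum(prediction) - prediction)]
)

# Both approaches should agree up to floating-point tolerance
all.equal(bigData$loopProfit, bigData$vecProfit)

# Elapsed seconds for each approach
c(sapply = time_sapply[["elapsed"]], vectorized = time_vec[["elapsed"]])
```

The sapply version does work proportional to the square of the row count, while the vectorized version touches each value only a constant number of times, so the gap should widen as n grows.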
You can use a row ID together with data.table's native .GRP group counter:
library(data.table)
myData <- data.table("itemCount" = c(3000, 20, 50, 9),
"cost" = c(120, 118, 165, 93),
"prediction" = c(120, 100, 150, 120))
myData[, "profit" := cost/prediction]
# assign row ids
myData[, ID := .I]
# loop over each row and take all values that are not in the current row
# .GRP is a group identifier and since you loop over all rows, there are as many groups as rows
myData[, total_profit_excl := myData[ID != .GRP, sum(cost) / sum(prediction)],
by = ID]
myData
#> itemCount cost prediction profit ID total_profit_excl
#> 1: 3000 120 120 1.000 1 1.0162162
#> 2: 20 118 100 1.180 2 0.9692308
#> 3: 50 165 150 1.100 3 0.9735294
#> 4: 9 93 120 0.775 4 1.0891892
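The comparison ID != .GRP above works because every row is its own group and the groups are visited in row order, so the group counter happens to equal the row ID. A small sketch of how .I and .GRP differ once groups span several rows (the table demo and its columns label, value, row_id, and grp are invented for this illustration):

```r
library(data.table)

# Hypothetical table in which the label "b" appears on two rows
demo <- data.table(label = c("b", "a", "b", "c"), value = 1:4)

# .I is the row number within the whole table
demo[, row_id := .I]

# .GRP is the group counter, numbered in order of first appearance
demo[, grp := .GRP, by = label]

demo
# row_id is 1, 2, 3, 4 while grp is 1, 2, 1, 3
```

So if the ID column were not simply 1 to N in row order, .GRP and ID would stop lining up and the filter would exclude the wrong rows.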