r - Can I use APPLY here instead of a FOR LOOP to speed this up?



Stock net value * stock weight / number of stock types + bond * bond weight / number of bond types + cash * cash weight / number of cash types


Matching method: for example, any row in BackTest.table with a date before 2008-7-15 is matched to the first row of Weight.table (which covers the dates from 2008-5-01 to 2008-7-15).

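Here is part of BackTest.table to make this easier to picture. Its columns are: Date, stock type 1, stock type 2, ..., bond type 1, bond type 2, ..., cash type 4. (The type numbers here are only an example.) In practice they match the type counts in Weight.table.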

         Date s1         s2           s3 s4 s5 b1 b2 b3 b4 b5 b6 b7 c1 c2 c3 c4
2  2008-07-01  0 -3.0158124 -0.055652040  1  0  0  0  0  0  0  0  0  0  0  0  0
3  2008-07-02  0  0.3838345 -0.119046476  1  0  0  0  0  0  0  0  0  0  0  0  0
4  2008-07-03  0  2.7602604  0.009611965  1  0  0  0  0  0  0  0  0  0  0  0  0
5  2008-07-04  0 -0.5370067 -0.009611041  1  0  0  0  0  0  0  0  0  0  0  0  0
6  2008-07-05  0  0.0000000  0.000000000  1  0  0  0  0  0  0  0  0  0  0  0  0
7  2008-07-06  0  0.0000000  0.000000000  1  0  0  0  0  0  0  0  0  0  0  0  0
8  2008-07-07  0  5.1583803  0.032680681  1  0  0  0  0  0  0  0  0  0  0  0  0
9  2008-07-08  0  0.8500539  0.048044124  1  0  0  0  0  0  0  0  0  0  0  0  0
10 2008-07-09  0  3.6352579  0.048981473  1  0  0  0  0  0  0  0  0  0  0  0  0
11 2008-07-10  0 -1.5689846  0.052797297  1  0  0  0  0  0  0  0  0  0  0  0  0
12 2008-07-11  0 -0.6688334  0.045093882  1  0  0  0  0  0  0  0  0  0  0  0  0
13 2008-07-12  0  0.0000000  0.000000000  1  0  0  0  0  0  0  0  0  0  0  0  0
14 2008-07-13  0  0.0000000  0.000000000  1  0  0  0  0  0  0  0  0  0  0  0  0
15 2008-07-14  0  1.0436299  0.033565414  1  0  0  0  0  0  0  0  0  0  0  0  0
16 2008-07-15  0 -3.8589001  0.004793450  1  0  0  0  0  0  0  0  0  0  0  0  0
17 2008-07-16  0 -4.0513392  0.034511187  1  0  0  0  0  0  0  0  0  0  0  0  0
18 2008-07-17  0 -1.0070062  0.009583134  1  0  0  0  0  0  0  0  0  0  0  0  0
19 2008-07-18  0  3.5303394  0.014373323  1  0  0  0  0  0  0  0  0  0  0  0  0
20 2008-07-19  0  0.0000000  0.000000000  1  0  0  0  0  0  0  0  0  0  0  0  0
21 2008-07-20  0  0.0000000  0.000000000  1  0  0  0  0  0  0  0  0  0  0  0  0


        Date  Stock Numbers1   Bond Number2   Cash Number3
1 2008-04-30 0.0642        5 0.7858       2 0.1500       2
2 2008-07-15 0.0801        5 0.7699       2 0.1500       2
3 2008-07-31 0.0727        6 0.7773       2 0.1500       1
4 2008-10-31 0.1373        4 0.7127       2 0.1500       1
5 2008-11-30 0.1457        3 0.7144       2 0.1399       2
6 2009-01-31 0.1791        5 0.7242       2 0.0967       1

Below is the dput output for the head of Weight.table and of BackTest.table:

structure(list(Date = structure(c(13999, 14075, 14091, 14183, 
14213, 14275), class = "Date"), Stock = c(0.0642, 0.0801, 0.0727, 
0.1373, 0.1457, 0.1791), Numbers1 = c(5L, 5L, 6L, 4L, 3L, 5L), 
Bond = c(0.7858, 0.7699, 0.7773, 0.7127, 0.7144, 0.7242), 
Number2 = c(2L, 2L, 2L, 2L, 2L, 2L), Cash = c(0.15, 0.15, 
0.15, 0.15, 0.1399, 0.0967), Number3 = c(2L, 2L, 1L, 1L, 
2L, 1L)), row.names = c(NA, 6L), class = "data.frame")
structure(list(Date = structure(c(14061, 14062, 14063, 14064, 
14065, 14066), class = "Date"), s1 = c(0, 0, 0, 0, 0, 0), s2 = c(-3.01581241943634, 
0.383834486785705, 2.76026041158503, -0.537006711952127, 0, 0
), s3 = c(-0.0556520404148886, -0.119046476128297, 0.00961196497399089, 
-0.00961104116408056, 0, 0), s4 = c(1, 1, 1, 1, 1, 1), s5 = c(0, 
0, 0, 0, 0, 0), b1 = c(0, 0, 0, 0, 0, 0), b2 = c(0, 0, 0, 0, 
0, 0), b3 = c(0, 0, 0, 0, 0, 0), b4 = c(0, 0, 0, 0, 0, 0), b5 = c(0, 
0, 0, 0, 0, 0), b6 = c(0, 0, 0, 0, 0, 0), b7 = c(0, 0, 0, 0, 
0, 0), c1 = c(0, 0, 0, 0, 0, 0), c2 = c(0, 0, 0, 0, 0, 0), c3 = c(0, 
0, 0, 0, 0, 0), c4 = c(0, 0, 0, 0, 0, 0)), row.names = 2:7, class = "data.frame")

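However, the loop takes a long time to produce what I want, so I tried sapply instead, but the results are different. It looks as if the apply version does not go through the if/else logic?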

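To get the values, I set up a counter k. Whenever a date in BackTest.table matches a date in Weight.table, k = k + 1, so the calculation moves on to the next row of Weight.table and uses the new weights to compute the net value.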


k <- 1
for (t in 1:nrow(BackTest.table)) {
  if (BackTest.table[t, 1] %in% Weight.table[, 1] == FALSE) {
    # Not a date listed in Weight.table: keep using the weights in row k
    NetReturnPt.table[t, 2] <- sum(BackTest.table[t, 2:ncol(BackTest.table)] *
      c(rep(Weight.table[k, 2] / Weight.table[k, 3], Weight.table[k, 3]),
        rep(Weight.table[k, 4] / Weight.table[k, 5], Weight.table[k, 5]),
        rep(Weight.table[k, 6] / Weight.table[k, 7], Weight.table[k, 7])
      ), na.rm = TRUE)
  } else {
    # Date listed in Weight.table: use row k once more, then move to the next row
    NetReturnPt.table[t, 2] <- sum(BackTest.table[t, 2:ncol(BackTest.table)] *
      c(rep(Weight.table[k, 2] / Weight.table[k, 3], Weight.table[k, 3]),
        rep(Weight.table[k, 4] / Weight.table[k, 5], Weight.table[k, 5]),
        rep(Weight.table[k, 6] / Weight.table[k, 7], Weight.table[k, 7])
      ), na.rm = TRUE)
    k <- k + 1
  }
}
dput(head(NetReturnPt.table[, 2]))
[1] -0.026597604  0.016239878  0.048405161  0.005821428  0.012840000  0.012840000
dput(NetReturnPt.table[20:25, 2])
[1]  0.016020000  0.073282388  0.014539880  0.003858773  0.065490672 -0.003378064

The APPLY version, which stops giving the correct values after the first few rows:

k <- 1
TestApply <- function(t) {
  if (BackTest.table[t, 1] %in% Weight.table[, 1] == FALSE) {
    NetReturnPt.table[t, 2] <- sum(BackTest.table[t, 2:ncol(BackTest.table)] *
      c(rep(Weight.table[k, 2] / Weight.table[k, 3], Weight.table[k, 3]),
        rep(Weight.table[k, 4] / Weight.table[k, 5], Weight.table[k, 5]),
        rep(Weight.table[k, 6] / Weight.table[k, 7], Weight.table[k, 7])
      ), na.rm = TRUE)
  } else {
    NetReturnPt.table[t, 2] <- sum(BackTest.table[t, 2:ncol(BackTest.table)] *
      c(rep(Weight.table[k, 2] / Weight.table[k, 3], Weight.table[k, 3]),
        rep(Weight.table[k, 4] / Weight.table[k, 5], Weight.table[k, 5]),
        rep(Weight.table[k, 6] / Weight.table[k, 7], Weight.table[k, 7])
      ), na.rm = TRUE)
    k <- k + 1
  }
}
test.result <- sapply(1: nrow(BackTest.table), function(t) TestApply(t))
[1] -0.026597604  0.016239878  0.048405161  0.005821428  0.012840000  0.012840000
[1]  0.012840000  0.058735697  0.011653687  0.003092799  0.052490651 -0.002707512

You can see that the first few values are the same as the ones from the FOR LOOP. That is why I wonder whether the if/else logic is being skipped.

Thank you for your time, and thanks to Steven Lee for showing me a better way to present my code.


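The reason your code doesn't work is that k is now a local variable inside the function that sapply calls. sapply calls TestApply once for each row of BackTest.table, but k is still 1 at the start of every call, because k <- k + 1 only creates a local copy and has no effect outside TestApply.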
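A minimal sketch of that scoping behaviour, using a hypothetical helper bump_local (not part of the question's code): the assignment creates a new local k, so the global k never changes.

```r
k <- 1

# Hypothetical helper: reads the global k, then assigns to a *local* k
bump_local <- function() {
  k <- k + 1  # the right-hand side sees the global k; the result is stored locally
  k           # return the local value
}

results <- sapply(1:3, function(i) bump_local())
print(results)  # every call starts again from the global k
print(k)        # the global k is untouched
```

Here results is 2 2 2 and k is still 1 afterwards, which is the same reason TestApply keeps using the first row of Weight.table.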

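One way to deal with this is to use k <<- k + 1, which performs the assignment in a parent environment (specifically, the first parent environment in which a variable k is defined). I think this would work, but it is neither an elegant nor a safe solution. In general, a function should affect its environment only through its return value (changing k is called a "side effect" and is generally discouraged).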
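To make the contrast concrete, here is a small sketch on made-up data. The names bump_global, is_match and k_vec are hypothetical. The first part shows `<<-` updating the global counter; the second derives the same kind of counter from the data with cumsum, so no function has to modify anything outside itself.

```r
# Part 1: `<<-` changes the k defined in the enclosing (here: global) environment
k <- 1
bump_global <- function() {
  current <- k
  k <<- k + 1  # side effect: modifies the global k
  current
}
seen <- sapply(1:3, function(i) bump_global())
print(seen)  # 1 2 3
print(k)     # 4

# Part 2: the same kind of counter without side effects.
# is_match marks rows whose date appears in the weight table (toy values).
is_match <- c(FALSE, TRUE, FALSE, TRUE)
# k increases on the row *after* a match, as in the original loop
k_vec <- 1 + cumsum(c(0, head(is_match, -1)))
print(k_vec)  # 1 1 2 2
```

The second form gives the row index for every date at once, which is what a vectorised calculation needs.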



sum(BackTest.table[t, 2:ncol(BackTest.table)] *
  c(rep(Weight.table[k, 2] / Weight.table[k, 3], Weight.table[k, 3]),
    rep(Weight.table[k, 4] / Weight.table[k, 5], Weight.table[k, 5]),
    rep(Weight.table[k, 6] / Weight.table[k, 7], Weight.table[k, 7])
  ), na.rm = TRUE)

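This code is evil. It is confusing, it is hard to tell what it does (it seems to be computing a weighted average, but there are faster ways to do that, such as weighted.mean), and it depends on the number of columns in BackTest.table. I would try to clean up this mess first.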
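As a rough illustration on made-up numbers (x and w are hypothetical, not taken from the question's tables): when the per-column weights sum to 1 and there are no missing values, the weighted sum in the snippet gives the same result as weighted.mean.

```r
# Toy row of returns and per-column weights (hypothetical values)
x <- c(1, 0, 2, 1)
w <- c(0.1, 0.1, 0.5, 0.3)  # sums to 1

by_hand <- sum(x * w)
by_wmean <- weighted.mean(x, w)

print(by_hand)
print(by_wmean)
```

The two can differ when the weights do not sum to 1, or when NA values are dropped with na.rm = TRUE, because weighted.mean divides by the sum of the weights it actually uses.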

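Also note that the vector built from the rep calls is recomputed t times, once per row of BackTest.table. You could instead expand each row of Weight.table once, up front, because the expansion depends only on k and is identical for every date that uses the same weights.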
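A base-R sketch of that idea on toy data. The tables weights and returns and their column names are invented for illustration. It assumes the type counts in every weight row add up to the number of return columns, and that a weight row applies to dates after its own date up to and including the next weight date, as the question describes.

```r
# Toy data (hypothetical): two stock columns, one bond column, one cash column
weights <- data.frame(
  Date  = as.Date(c("2008-04-30", "2008-07-02")),
  Stock = c(0.2, 0.4), N1 = c(2L, 2L),
  Bond  = c(0.5, 0.4), N2 = c(1L, 1L),
  Cash  = c(0.3, 0.2), N3 = c(1L, 1L)
)
returns <- data.frame(
  Date = as.Date("2008-07-01") + 0:3,
  s1 = c(1, 2, 3, 4), s2 = c(0, 1, 0, 1),
  b1 = c(2, 2, 2, 2), c1 = c(1, 1, 1, 1)
)

# Expand each weight row into per-column weights once (one matrix row per period)
w_mat <- t(sapply(seq_len(nrow(weights)), function(i) {
  with(weights[i, ], c(rep(Stock / N1, N1), rep(Bond / N2, N2), rep(Cash / N3, N3)))
}))

# Period index for each return date: number of weight dates strictly before it
k <- findInterval(as.numeric(returns$Date), as.numeric(weights$Date),
                  left.open = TRUE)

# Weighted sum per row, computed for all dates at once
net <- rowSums(as.matrix(returns[, -1]) * w_mat[k, , drop = FALSE], na.rm = TRUE)
print(unname(net))
```

With these numbers the first two dates use the first weight row and the last two use the second, giving 1.4, 1.6, 1.6 and 2.0.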



Here is one suggested approach. As January suggested, this should be straightforward once you restructure your data.


library(dplyr)
library(tidyr)

Weight.table.tidy <- Weight.table %>%
  # Renaming here so the first character represents the class, and the second
  #  character represents (w)eight or (n)umber.
  rename(s_w = "Stock", s_n = "Numbers1",
         b_w = "Bond",  b_n = "Number2",
         c_w = "Cash",  c_n = "Number3") %>%
  gather(col, val, -Date) %>%
  separate("col", c("class", "stat")) %>%
  spread(stat, val)
#         Date class n      w
#1  2008-07-01     b 2 0.7858
#2  2008-07-01     c 2 0.1500
#3  2008-07-01     s 5 0.0642
#4  2008-07-04     b 2 0.7699
#5  2008-07-04     c 2 0.1500
#6  2008-07-04     s 5 0.0801


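BackTest.table.tidy <- BackTest.table %>%
  gather(type, val, -Date) %>%
  # Split e.g. "s1" into class "s" and num "1"
  separate("type", c("class", "num"), sep = 1) %>%
  left_join(Weight.table.tidy, by = c("Date", "class")) %>%
  group_by(class, num) %>%
  # Carry the most recent weights forward to the dates in between
  fill(n, w) %>% ungroup()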
## A tibble: 6 x 6
#  Date       class num     val     n     w
#  <date>     <chr> <chr> <dbl> <dbl> <dbl>
#1 2008-07-01 b     1         0     2 0.786
#2 2008-07-02 b     1         0     2 0.786
#3 2008-07-03 b     1         0     2 0.786
#4 2008-07-04 b     1         0     2 0.770
#5 2008-07-05 b     1         0     2 0.770
#6 2008-07-06 b     1         0     2 0.770


BackTest.table.tidy %>%
  mutate(val_wtd = val * w / n) %>%
  # Sum the weighted values per date (the total ends up in column n)
  count(Date, wt = val_wtd)
