我昨天写了一个类似的问题,但随着时间的推移,问题变得更加复杂,所以我想再问一次。
在以下情况下,我有一个关于计算回报的问题。对于每个ID,如果Date=EventDate,我希望计算"事件日期";买入并持有回报;从活动日期前5天到活动日期后5天。
为了更具体地使用下表,我想计算每个ID在事件日期(-5~+5天(前后的回报率,其中回报率为,例如,ID=1时的(9/10-1(、(12/10-1(、(14/10-1(,(17/10-1(和(16/10-1((也就是说,事件日期前六天的价格是回报率计算中的分母。(
+----+------------+-------+------------+
| ID | Date | Price | EventDate |
+----+------------+-------+------------+
| 1 | 2011-03-06 | 10 | NA |
| 1 | 2011-03-07 | 9 | NA |
| 1 | 2011-03-08 | 12 | NA |
| 1 | 2011-03-09 | 14 | NA |
| 1 | 2011-03-10 | 15 | NA |
| 1 | 2011-03-11 | 17 | NA |
| 1 | 2011-03-12 | 12 | 2011-03-12 |
| 1 | 2011-03-13 | 14 | NA |
| 1 | 2011-03-14 | 17 | NA |
| 1 | 2011-03-15 | 14 | NA |
| 1 | 2011-03-16 | 17 | NA |
| 1 | 2011-03-17 | 16 | NA |
| 1 | 2011-03-18 | 15 | NA |
| 1 | 2011-03-19 | 16 | NA |
| 1 | 2011-03-20 | 17 | NA |
| 1 | 2011-03-21 | 18 | NA |
| 1 | 2011-03-22 | 11 | NA |
| 1 | 2011-03-23 | 15 | NA |
| 1 | 2011-03-24 | 12 | 2011-03-24 |
| 1 | 2011-03-25 | 13 | NA |
| 1 | 2011-03-26 | 15 | NA |
| 2 | 2011-06-11 | 48 | NA |
| 2 | 2011-06-12 | 49 | NA |
| 2 | 2011-06-13 | 50 | NA |
| 2 | 2011-06-14 | 57 | NA |
| 2 | 2011-06-15 | 60 | NA |
| 2 | 2011-06-16 | 49 | NA |
| 2 | 2011-06-17 | 64 | NA |
| 2 | 2011-06-18 | 63 | NA |
| 2 | 2011-06-19 | 67 | 2011-06-19 |
| 2 | 2011-06-20 | 70 | NA |
| 2 | 2011-06-21 | 58 | NA |
| 2 | 2011-06-22 | 65 | NA |
| 2 | 2011-06-23 | 57 | NA |
| 2 | 2011-06-24 | 55 | NA |
| 2 | 2011-06-25 | 57 | NA |
| 2 | 2011-06-26 | 60 | NA |
+----+------------+-------+------------+
最后,我希望在下表中添加一个新的专栏。
+----+------------+-------+------------+---------------+
| ID | Date | Price | EventDate | BuyHoldReturn |
+----+------------+-------+------------+---------------+
| 1 | 2011-03-06 | 10 | NA | NA |
| 1 | 2011-03-07 | 9 | NA | -0.1 |
| 1 | 2011-03-08 | 12 | NA | 0.2 |
| 1 | 2011-03-09 | 14 | NA | 0.4 |
| 1 | 2011-03-10 | 15 | NA | 0.5 |
| 1 | 2011-03-11 | 17 | NA | 0.7 |
| 1 | 2011-03-12 | 12 | 2011-03-12 | 0.2 |
| 1 | 2011-03-13 | 14 | NA | 0.4 |
| 1 | 2011-03-14 | 17 | NA | 0.7 |
| 1 | 2011-03-15 | 14 | NA | 0.4 |
| 1 | 2011-03-16 | 17 | NA | 0.7 |
| 1 | 2011-03-17 | 16 | NA | 0.6 |
| 1 | 2011-03-18 | 15 | NA | NA |
| 1 | 2011-03-19 | 16 | NA | 0.066666667 |
| 1 | 2011-03-20 | 17 | NA | 0.133333333 |
| 1 | 2011-03-21 | 18 | NA | 0.2 |
| 1 | 2011-03-22 | 11 | NA | -0.266666667 |
| 1 | 2011-03-23 | 15 | NA | 0 |
| 1 | 2011-03-24 | 12 | 2011-03-24 | -0.2 |
| 1 | 2011-03-25 | 13 | NA | -0.133333333 |
| 1 | 2011-03-26 | 15 | NA | 0 |
| 2 | 2011-06-11 | 48 | NA | NA |
| 2 | 2011-06-12 | 49 | NA | NA |
| 2 | 2011-06-13 | 50 | NA | NA |
| 2 | 2011-06-14 | 57 | NA | 0.14 |
| 2 | 2011-06-15 | 60 | NA | 0.2 |
| 2 | 2011-06-16 | 49 | NA | -0.02 |
| 2 | 2011-06-17 | 64 | NA | 0.28 |
| 2 | 2011-06-18 | 63 | NA | 0.26 |
| 2 | 2011-06-19 | 67 | 2011-06-19 | 0.34 |
| 2 | 2011-06-20 | 70 | NA | 0.4 |
| 2 | 2011-06-21 | 58 | NA | 0.16 |
| 2 | 2011-06-22 | 65 | NA | 0.3 |
| 2 | 2011-06-23 | 57 | NA | 0.14 |
| 2 | 2011-06-24 | 55 | NA | 0.1 |
| 2 | 2011-06-25 | 57 | NA | NA |
| 2 | 2011-06-26 | 60 | NA | NA |
+----+------------+-------+------------+---------------+
我有一个使用下面代码的想法,但不知道如何计算活动日期前后的11次买入和持有回报。
data<-data%>%
group_by(ID)%>%
mutate(BuyHoldReturn=ifelse(Date==EventDate, ....
提前感谢!
您可以按如下方式使用dplyr
:
# ensure Date and EventDate are Date columns
df <- df %>% mutate(across(c(Date,EventDate), ~as.Date(.x)))
left_join(
select(df,ID, Date),
inner_join(df %>% select(-EventDate),filter(df,!is.na(EventDate)) %>% distinct(ID, EventDate), by="ID") %>%
filter(abs(EventDate-Date)<=6) %>%
group_by(ID, EventDate) %>%
mutate(BuyHoldReturn = c(NA,Price[-1]/Price[1]-1)),
by=c("ID", "Date")
)
输出:
ID Date Price EventDate BuyHoldReturn
1 1 2011-03-06 10 2011-03-12 NA
2 1 2011-03-07 9 2011-03-12 -0.10000000
3 1 2011-03-08 12 2011-03-12 0.20000000
4 1 2011-03-09 14 2011-03-12 0.40000000
5 1 2011-03-10 15 2011-03-12 0.50000000
6 1 2011-03-11 17 2011-03-12 0.70000000
7 1 2011-03-12 12 2011-03-12 0.20000000
8 1 2011-03-13 14 2011-03-12 0.40000000
9 1 2011-03-14 17 2011-03-12 0.70000000
10 1 2011-03-15 14 2011-03-12 0.40000000
11 1 2011-03-16 17 2011-03-12 0.70000000
12 1 2011-03-17 16 2011-03-12 0.60000000
13 1 2011-03-18 15 2011-03-12 0.50000000
14 1 2011-03-18 15 2011-03-24 NA
15 1 2011-03-19 16 2011-03-24 0.06666667
16 1 2011-03-20 17 2011-03-24 0.13333333
17 1 2011-03-21 18 2011-03-24 0.20000000
18 1 2011-03-22 11 2011-03-24 -0.26666667
19 1 2011-03-23 15 2011-03-24 0.00000000
20 1 2011-03-24 12 2011-03-24 -0.20000000
21 1 2011-03-25 13 2011-03-24 -0.13333333
22 1 2011-03-26 15 2011-03-24 0.00000000
23 2 2011-06-11 NA <NA> NA
24 2 2011-06-12 NA <NA> NA
25 2 2011-06-13 50 2011-06-19 NA
26 2 2011-06-14 57 2011-06-19 0.14000000
27 2 2011-06-15 60 2011-06-19 0.20000000
28 2 2011-06-16 49 2011-06-19 -0.02000000
29 2 2011-06-17 64 2011-06-19 0.28000000
30 2 2011-06-18 63 2011-06-19 0.26000000
31 2 2011-06-19 67 2011-06-19 0.34000000
32 2 2011-06-20 70 2011-06-19 0.40000000
33 2 2011-06-21 58 2011-06-19 0.16000000
34 2 2011-06-22 65 2011-06-19 0.30000000
35 2 2011-06-23 57 2011-06-19 0.14000000
36 2 2011-06-24 55 2011-06-19 0.10000000
37 2 2011-06-25 57 2011-06-19 0.14000000
38 2 2011-06-26 NA <NA> NA
这可以使用data.table
进行类似操作
library(data.table)
setDT(df)
df[,(c("Date", "EventDate")):=lapply(.SD, as.Date), .SDcols=c("Date", "EventDate")]
df[,!c("EventDate")][unique(df[!is.na(EventDate), .(ID, EventDate)]), on="ID", allow.cartesian=T][
abs(EventDate-Date)<=6][,BuyHoldReturn:=c(NA,Price[-1]/Price[1]-1), .(ID,EventDate)][
df[,.(ID,Date)], on=.(ID,Date)
]
以上两种方法都使用笛卡尔连接,然后进行过滤。如果你有一个非常大的表,并且你想提高速度,你也可以使用data.table
非等连接来隔离每个ID
的EventDate
(s(之前和之后的行
- 加载库并将数据设置为data.table
library(data.table)
setDT(dt)
- 创建一个事件帧,它只有ID和events
events = unique(df[!is.na(EventDate),.(ID,EventDate)])
ID EventDate
<int> <IDat>
1: 1 2011-03-12
2: 1 2011-03-24
3: 2 2011-06-19
- 向框架中添加一些辅助列(这有助于保留非等联接中的列
events[, eDate:=EventDate]
df[, `:=`(s=Date-6, e=Date+6)]
- 使用非等联接:要求ID匹配,EventDate介于上面创建的下限和上限之间(即
s
和e
(
bhr = events[df, on=.(ID, EventDate>=s, EventDate<=e), nomatch=0]
- 通过
ID
和EventDate
生成BuyHoldReturn
列
bhr = bhr[, .(Date, BuyHoldReturn=c(NA, Price[-1]/Price[1] -1)), by = .(ID,eDate)]
- 合并回原始帧以保留具有
NA
的行,并选择具有所需名称和排列的列
bhr[df,on=.(ID,Date),.(ID,Date,Price,EventDate=i.EventDate,BuyHoldReturn)]
输入:
structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), Date = c("2011-03-06",
"2011-03-07", "2011-03-08", "2011-03-09", "2011-03-10", "2011-03-11",
"2011-03-12", "2011-03-13", "2011-03-14", "2011-03-15", "2011-03-16",
"2011-03-17", "2011-03-18", "2011-03-19", "2011-03-20", "2011-03-21",
"2011-03-22", "2011-03-23", "2011-03-24", "2011-03-25", "2011-03-26",
"2011-06-11", "2011-06-12", "2011-06-13", "2011-06-14", "2011-06-15",
"2011-06-16", "2011-06-17", "2011-06-18", "2011-06-19", "2011-06-20",
"2011-06-21", "2011-06-22", "2011-06-23", "2011-06-24", "2011-06-25",
"2011-06-26"), Price = c(10L, 9L, 12L, 14L, 15L, 17L, 12L, 14L,
17L, 14L, 17L, 16L, 15L, 16L, 17L, 18L, 11L, 15L, 12L, 13L, 15L,
48L, 49L, 50L, 57L, 60L, 49L, 64L, 63L, 67L, 70L, 58L, 65L, 57L,
55L, 57L, 60L), EventDate = c(NA, NA, NA, NA, NA, NA, "2011-03-12",
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "2011-03-24", NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, "2011-06-19", NA, NA, NA,
NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, -37L
))