我有一个关于面板数据集的问题。我的数据集看起来是这样的:没有变量"eventidentifierdummy"。现在我要创建变量"eventidentifierdummy"所以它看起来像这样:
是否有办法为每个符号的事件编号?例如,定义的每个事件窗口,例如,在-4到4之间,每个符号出现几次,这是编号的吗?在某种程度上,作为事件和公司的附加标识符,使变量"eventidentifierdummy";看起来像这样?
Symbol Date Close Time Event Window Eventidentifierdummy
AAPL 09/03/2020 66,542503 16:25:00 NA NA NA
AAPL 09/03/2020 71,334999 16:26:00 NA -4 1
AAPL 09/03/2020 68,857498 16:27:00 NA -3 1
AAPL 09/03/2020 62,057499 16:28:00 NA -2 1
AAPL 09/03/2020 69,4925 16:29:00 NA -1 1
AAPL 09/03/2020 60,552502 16:30:00 1 0 1
AAPL 09/03/2020 63,215 16:31:00 NA 1 1
AAPL 10/03/2020 61,6675 09:30:00 NA 2 1
AAPL 10/03/2020 61,195 09:31:00 NA 3 1
AAPL 10/03/2020 57,310001 09:32:00 NA 4 1
AAPL 10/03/2020 56,092499 09:33:00 NA NA NA
AAPL 15/03/2020 65,535603 15:45:00 NA NA NA
AAPL 15/03/2020 66,357545 15:46:00 NA NA NA
AAPL 15/03/2020 62,852345 15:47:00 NA -4 2
AAPL 15/03/2020 64,057325 15:48:00 NA -3 2
AAPL 16/03/2020 66,494545 09:30:00 NA -2 2
AAPL 16/03/2020 63,557967 09:31:00 1 -1 2
AAPL 16/03/2020 64,415454 09:32:00 NA 0 2
AAPL 16/03/2020 62,2357 09:33:00 NA 1 2
AAPL 16/03/2020 64,4576 09:34:00 NA 2 2
AAPL 16/03/2020 59,457579 09:35:00 NA 3 2
AAPL 16/03/2020 58,092470 09:36:00 NA 4 2
VISA 05/03/2020 186,960007 16:26:00 NA NA NA
VISA 05/03/2020 184,360001 16:27:00 NA -4 1
VISA 05/03/2020 171,130005 16:28:00 NA -3 1
VISA 05/03/2020 182,600006 16:29:00 NA -2 1
VISA 05/03/2020 172,949997 16:30:00 NA -1 1
VISA 06/03/2020 160,080002 09:32:00 1 0 1
VISA 06/03/2020 175,830002 09:33:00 NA 1 1
VISA 06/03/2020 152,009995 09:34:00 NA 2 1
VISA 06/03/2020 157,889999 09:35:00 NA 3 1
VISA 06/03/2020 148,479996 09:36:00 NA 4 1
VISA 06/03/2020 152,25 09:37:00 NA NA NA
VISA 06/03/2020 146,830002 09:38:00 NA NA NA
VISA 20/03/2020 192,203826 16:12:00 NA NA NA
VISA 20/03/2020 193,293752 16:13:00 NA -4 2
VISA 20/03/2020 192,204726 16:14:00 NA -3 2
VISA 20/03/2020 192,2396 16:15:00 NA -2 2
VISA 20/03/2020 194,185620 16:16:00 NA -1 2
VISA 20/03/2020 196,614289 16:17:00 1 0 2
VISA 20/03/2020 197,826200 16:18:00 NA 1 2
VISA 21/03/2020 197,49176 09:29:00 NA 2 2
VISA 21/03/2020 197,239230 09:30:00 NA 3 2
VISA 21/03/2020 198,2300 09:31:00 NA 4 2
VISA 21/03/2020 198,230028 09:32:00 NA NA NA
VISA 21/03/2020 197,247020 09:33:00 NA NA NA
不幸的是,我不知道如何解决这个问题....
每个符号也有不同的事件,例如有时有两个事件,有时有三个事件。例如,我为符号" apl "在上述情况下。示例数据集显示了两个符号,但在我的整个数据集中,有超过1000个不同的符号,它们各自的事件和相关的事件窗口。
我已经尝试使用命令"count"来解决它,但是它不工作....
Companies$Window <- as.numeric(Companies$Window)
setDT(Companies)[,Window:= count(Window)][]
你能帮我创建虚拟变量&;eventidentifierdummy&;吗?
提前感谢!
PS:这是我的数据的输出():
> dput(Eventdummy)
structure(list(V1 = c("Symbol", "AAPL", "AAPL", "AAPL", "AAPL",
"AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL",
"AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL",
"AAPL", "AAPL", "VISA", "VISA", "VISA", "VISA", "VISA", "VISA",
"VISA", "VISA", "VISA", "VISA", "VISA", "VISA", "VISA", "VISA",
"VISA", "VISA", "VISA", "VISA", "VISA", "VISA", "VISA", "VISA",
"VISA", "VISA"), V2 = c("Date", "09/03/2020", "09/03/2020", "09/03/2020",
"09/03/2020", "09/03/2020", "09/03/2020", "09/03/2020", "10/03/2020",
"10/03/2020", "10/03/2020", "10/03/2020", "15/03/2020", "15/03/2020",
"15/03/2020", "15/03/2020", "16/03/2020", "16/03/2020", "16/03/2020",
"16/03/2020", "16/03/2020", "16/03/2020", "16/03/2020", "05/03/2020",
"05/03/2020", "05/03/2020", "05/03/2020", "05/03/2020", "06/03/2020",
"06/03/2020", "06/03/2020", "06/03/2020", "06/03/2020", "06/03/2020",
"06/03/2020", "20/03/2020", "20/03/2020", "20/03/2020", "20/03/2020",
"20/03/2020", "20/03/2020", "20/03/2020", "21/03/2020", "21/03/2020",
"21/03/2020", "21/03/2020", "21/03/2020"), V3 = c("Close", "66,542503",
"71,334999", "68,857498", "62,057499", "69,4925", "60,552502",
"63,215", "61,6675", "61,195", "57,310001", "56,092499", "65,535603",
"66,357545", "62,852345", "64,057325", "66,494545", "63,557967",
"64,415454", "62,2357", "64,4576", "59,457579", "58,092470",
"186,960007", "184,360001", "171,130005", "182,600006", "172,949997",
"160,080002", "175,830002", "152,009995", "157,889999", "148,479996",
"152,25", "146,830002", "192,203826", "193,293752", "192,204726",
"192,2396", "194,185620", "196,614289", "197,826200", "197,49176",
"197,239230", "198,2300", "198,230028", "197,247020"), V4 = c("Time",
"16:25:00", "16:26:00", "16:27:00", "16:28:00", "16:29:00", "16:30:00",
"16:31:00", "09:30:00", "09:31:00", "09:32:00", "09:33:00", "15:45:00",
"15:46:00", "15:47:00", "15:48:00", "09:30:00", "09:31:00", "09:32:00",
"09:33:00", "09:34:00", "09:35:00", "09:36:00", "16:26:00", "16:27:00",
"16:28:00", "16:29:00", "16:30:00", "09:32:00", "09:33:00", "09:34:00",
"09:35:00", "09:36:00", "09:37:00", "09:38:00", "16:12:00", "16:13:00",
"16:14:00", "16:15:00", "16:16:00", "16:17:00", "16:18:00", "09:29:00",
"09:30:00", "09:31:00", "09:32:00", "09:33:00"), V5 = c("Event",
NA, NA, NA, NA, NA, "1", NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, "1", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "1", NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, "1", NA, NA, NA, NA, NA,
NA), V6 = c("Window", NA, "-4", "-3", "-2", "-1", "0", "1", "2",
"3", "4", NA, NA, NA, "-4", "-3", "-2", "-1", "0", "1", "2",
"3", "4", NA, "-4", "-3", "-2", "-1", "0", "1", "2", "3", "4",
NA, NA, NA, "-4", "-3", "-2", "-1", "0", "1", "2", "3", "4",
NA, NA), V7 = c("Eventidentifierdummy", NA, "1", "1", "1", "1",
"1", "1", "1", "1", "1", NA, NA, NA, "2", "2", "2", "2", "2",
"2", "2", "2", "2", NA, "1", "1", "1", "1", "1", "1", "1", "1",
"1", NA, NA, NA, "2", "2", "2", "2", "2", "2", "2", "2", "2",
NA, NA)), class = "data.frame", row.names = c(NA, -47L))
您可以使用rle
创建这样的序列,并将其应用于每个Symbol
。
library(dplyr)
create_num <- function(x) {
val <- with(rle(is.na(x)), rep(cumsum(values), lengths))
val[is.na(x)] <- NA
val
}
df %>%
group_by(Symbol) %>%
mutate(result = create_num(Window)) %>%
ungroup
返回-
Symbol Date Close Time Event Window Eventidentifierdummy result
1 AAPL 09/03/2020 66,542503 16:25:00 NA NA NA NA
2 AAPL 09/03/2020 71,334999 16:26:00 NA -4 1 1
3 AAPL 09/03/2020 68,857498 16:27:00 NA -3 1 1
4 AAPL 09/03/2020 62,057499 16:28:00 NA -2 1 1
5 AAPL 09/03/2020 69,4925 16:29:00 NA -1 1 1
6 AAPL 09/03/2020 60,552502 16:30:00 1 0 1 1
7 AAPL 09/03/2020 63,215 16:31:00 NA 1 1 1
8 AAPL 10/03/2020 61,6675 09:30:00 NA 2 1 1
9 AAPL 10/03/2020 61,195 09:31:00 NA 3 1 1
10 AAPL 10/03/2020 57,310001 09:32:00 NA 4 1 1
11 AAPL 10/03/2020 56,092499 09:33:00 NA NA NA NA
12 AAPL 15/03/2020 65,535603 15:45:00 NA NA NA NA
13 AAPL 15/03/2020 66,357545 15:46:00 NA NA NA NA
14 AAPL 15/03/2020 62,852345 15:47:00 NA -4 2 2
15 AAPL 15/03/2020 64,057325 15:48:00 NA -3 2 2
16 AAPL 16/03/2020 66,494545 09:30:00 NA -2 2 2
17 AAPL 16/03/2020 63,557967 09:31:00 1 -1 2 2
18 AAPL 16/03/2020 64,415454 09:32:00 NA 0 2 2
19 AAPL 16/03/2020 62,2357 09:33:00 NA 1 2 2
20 AAPL 16/03/2020 64,4576 09:34:00 NA 2 2 2
21 AAPL 16/03/2020 59,457579 09:35:00 NA 3 2 2
22 AAPL 16/03/2020 58,092470 09:36:00 NA 4 2 2
23 VISA 05/03/2020 186,960007 16:26:00 NA NA NA NA
24 VISA 05/03/2020 184,360001 16:27:00 NA -4 1 1
25 VISA 05/03/2020 171,130005 16:28:00 NA -3 1 1
26 VISA 05/03/2020 182,600006 16:29:00 NA -2 1 1
27 VISA 05/03/2020 172,949997 16:30:00 NA -1 1 1
28 VISA 06/03/2020 160,080002 09:32:00 1 0 1 1
29 VISA 06/03/2020 175,830002 09:33:00 NA 1 1 1
30 VISA 06/03/2020 152,009995 09:34:00 NA 2 1 1
31 VISA 06/03/2020 157,889999 09:35:00 NA 3 1 1
32 VISA 06/03/2020 148,479996 09:36:00 NA 4 1 1
33 VISA 06/03/2020 152,25 09:37:00 NA NA NA NA
34 VISA 06/03/2020 146,830002 09:38:00 NA NA NA NA
35 VISA 20/03/2020 192,203826 16:12:00 NA NA NA NA
36 VISA 20/03/2020 193,293752 16:13:00 NA -4 2 2
37 VISA 20/03/2020 192,204726 16:14:00 NA -3 2 2
38 VISA 20/03/2020 192,2396 16:15:00 NA -2 2 2
39 VISA 20/03/2020 194,185620 16:16:00 NA -1 2 2
40 VISA 20/03/2020 196,614289 16:17:00 1 0 2 2
41 VISA 20/03/2020 197,826200 16:18:00 NA 1 2 2
42 VISA 21/03/2020 197,49176 09:29:00 NA 2 2 2
43 VISA 21/03/2020 197,239230 09:30:00 NA 3 2 2
44 VISA 21/03/2020 198,2300 09:31:00 NA 4 2 2
45 VISA 21/03/2020 198,230028 09:32:00 NA NA NA NA
46 VISA 21/03/2020 197,247020 09:33:00 NA NA NA NA
创建另一个名为result
的列,以便可以比较答案。可以用Eventidentifierdummy
代替result
使用data.table
进行分组,并将事件的开始标识为-4值:
library( data.table )
setDT( companies )
companies[ !is.na(Window),
Eventidentifierdummy := cumsum(Window == -4),
by = Symbol ]