r语言 - 如何添加一个事件标识符作为一个虚拟变量面板数据集?



我有一个关于面板数据集的问题。我的数据集看起来是这样的:没有变量"eventidentifierdummy"。现在我要创建变量"eventidentifierdummy"所以它看起来像这样:

是否有办法为每个符号的事件编号?例如,定义的每个事件窗口,例如,在-4到4之间,每个符号出现几次,这是编号的吗?在某种程度上,作为事件和公司的附加标识符,使变量"eventidentifierdummy";看起来像这样?

Symbol  Date        Close       Time      Event  Window    Eventidentifierdummy
AAPL    09/03/2020  66,542503   16:25:00    NA    NA        NA
AAPL    09/03/2020  71,334999   16:26:00    NA    -4        1
AAPL    09/03/2020  68,857498   16:27:00    NA    -3        1
AAPL    09/03/2020  62,057499   16:28:00    NA    -2        1
AAPL    09/03/2020  69,4925     16:29:00    NA    -1        1
AAPL    09/03/2020  60,552502   16:30:00    1      0        1
AAPL    09/03/2020  63,215      16:31:00    NA     1        1 
AAPL    10/03/2020  61,6675     09:30:00    NA     2        1
AAPL    10/03/2020  61,195      09:31:00    NA     3        1
AAPL    10/03/2020  57,310001   09:32:00    NA     4        1  
AAPL    10/03/2020  56,092499   09:33:00    NA    NA        NA
AAPL    15/03/2020  65,535603   15:45:00    NA    NA        NA
AAPL    15/03/2020  66,357545   15:46:00    NA    NA        NA
AAPL    15/03/2020  62,852345   15:47:00    NA    -4        2
AAPL    15/03/2020  64,057325   15:48:00    NA    -3        2
AAPL    16/03/2020  66,494545   09:30:00    NA    -2        2
AAPL    16/03/2020  63,557967   09:31:00    1     -1        2
AAPL    16/03/2020  64,415454   09:32:00    NA     0        2
AAPL    16/03/2020  62,2357     09:33:00    NA     1        2
AAPL    16/03/2020  64,4576     09:34:00    NA     2        2
AAPL    16/03/2020  59,457579   09:35:00    NA     3        2
AAPL    16/03/2020  58,092470   09:36:00    NA     4        2
VISA    05/03/2020  186,960007  16:26:00    NA    NA        NA
VISA    05/03/2020  184,360001  16:27:00    NA    -4        1
VISA    05/03/2020  171,130005  16:28:00    NA    -3        1
VISA    05/03/2020  182,600006  16:29:00    NA    -2        1
VISA    05/03/2020  172,949997  16:30:00    NA    -1        1
VISA    06/03/2020  160,080002  09:32:00    1      0        1
VISA    06/03/2020  175,830002  09:33:00    NA     1        1   
VISA    06/03/2020  152,009995  09:34:00    NA     2        1
VISA    06/03/2020  157,889999  09:35:00    NA     3        1
VISA    06/03/2020  148,479996  09:36:00    NA     4        1
VISA    06/03/2020  152,25      09:37:00    NA    NA        NA
VISA    06/03/2020  146,830002  09:38:00    NA    NA        NA
VISA    20/03/2020  192,203826  16:12:00    NA    NA        NA
VISA    20/03/2020  193,293752  16:13:00    NA    -4        2
VISA    20/03/2020  192,204726  16:14:00    NA    -3        2
VISA    20/03/2020  192,2396    16:15:00    NA    -2        2
VISA    20/03/2020  194,185620  16:16:00    NA    -1        2
VISA    20/03/2020  196,614289  16:17:00    1      0        2
VISA    20/03/2020  197,826200  16:18:00    NA     1        2
VISA    21/03/2020  197,49176   09:29:00    NA     2        2
VISA    21/03/2020  197,239230  09:30:00    NA     3        2
VISA    21/03/2020  198,2300    09:31:00    NA     4        2
VISA    21/03/2020  198,230028  09:32:00    NA    NA        NA
VISA    21/03/2020  197,247020  09:33:00    NA    NA        NA

不幸的是,我不知道如何解决这个问题....

每个符号也有不同的事件,例如有时有两个事件,有时有三个事件。例如,我为符号" apl "在上述情况下。示例数据集显示了两个符号,但在我的整个数据集中,有超过1000个不同的符号,它们各自的事件和相关的事件窗口。

我已经尝试使用命令"count"来解决它,但是它不工作....

Companies$Window <- as.numeric(Companies$Window)
setDT(Companies)[,Window:= count(Window)][]

你能帮我创建虚拟变量&;eventidentifierdummy&;吗?

提前感谢!

PS:这是我的数据的输出():

> dput(Eventdummy)
structure(list(V1 = c("Symbol", "AAPL", "AAPL", "AAPL", "AAPL", 
"AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", 
"AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", 
"AAPL", "AAPL", "VISA", "VISA", "VISA", "VISA", "VISA", "VISA", 
"VISA", "VISA", "VISA", "VISA", "VISA", "VISA", "VISA", "VISA", 
"VISA", "VISA", "VISA", "VISA", "VISA", "VISA", "VISA", "VISA", 
"VISA", "VISA"), V2 = c("Date", "09/03/2020", "09/03/2020", "09/03/2020", 
"09/03/2020", "09/03/2020", "09/03/2020", "09/03/2020", "10/03/2020", 
"10/03/2020", "10/03/2020", "10/03/2020", "15/03/2020", "15/03/2020", 
"15/03/2020", "15/03/2020", "16/03/2020", "16/03/2020", "16/03/2020", 
"16/03/2020", "16/03/2020", "16/03/2020", "16/03/2020", "05/03/2020", 
"05/03/2020", "05/03/2020", "05/03/2020", "05/03/2020", "06/03/2020", 
"06/03/2020", "06/03/2020", "06/03/2020", "06/03/2020", "06/03/2020", 
"06/03/2020", "20/03/2020", "20/03/2020", "20/03/2020", "20/03/2020", 
"20/03/2020", "20/03/2020", "20/03/2020", "21/03/2020", "21/03/2020", 
"21/03/2020", "21/03/2020", "21/03/2020"), V3 = c("Close", "66,542503", 
"71,334999", "68,857498", "62,057499", "69,4925", "60,552502", 
"63,215", "61,6675", "61,195", "57,310001", "56,092499", "65,535603", 
"66,357545", "62,852345", "64,057325", "66,494545", "63,557967", 
"64,415454", "62,2357", "64,4576", "59,457579", "58,092470", 
"186,960007", "184,360001", "171,130005", "182,600006", "172,949997", 
"160,080002", "175,830002", "152,009995", "157,889999", "148,479996", 
"152,25", "146,830002", "192,203826", "193,293752", "192,204726", 
"192,2396", "194,185620", "196,614289", "197,826200", "197,49176", 
"197,239230", "198,2300", "198,230028", "197,247020"), V4 = c("Time", 
"16:25:00", "16:26:00", "16:27:00", "16:28:00", "16:29:00", "16:30:00", 
"16:31:00", "09:30:00", "09:31:00", "09:32:00", "09:33:00", "15:45:00", 
"15:46:00", "15:47:00", "15:48:00", "09:30:00", "09:31:00", "09:32:00", 
"09:33:00", "09:34:00", "09:35:00", "09:36:00", "16:26:00", "16:27:00", 
"16:28:00", "16:29:00", "16:30:00", "09:32:00", "09:33:00", "09:34:00", 
"09:35:00", "09:36:00", "09:37:00", "09:38:00", "16:12:00", "16:13:00", 
"16:14:00", "16:15:00", "16:16:00", "16:17:00", "16:18:00", "09:29:00", 
"09:30:00", "09:31:00", "09:32:00", "09:33:00"), V5 = c("Event", 
NA, NA, NA, NA, NA, "1", NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, "1", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "1", NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, "1", NA, NA, NA, NA, NA, 
NA), V6 = c("Window", NA, "-4", "-3", "-2", "-1", "0", "1", "2", 
"3", "4", NA, NA, NA, "-4", "-3", "-2", "-1", "0", "1", "2", 
"3", "4", NA, "-4", "-3", "-2", "-1", "0", "1", "2", "3", "4", 
NA, NA, NA, "-4", "-3", "-2", "-1", "0", "1", "2", "3", "4", 
NA, NA), V7 = c("Eventidentifierdummy", NA, "1", "1", "1", "1", 
"1", "1", "1", "1", "1", NA, NA, NA, "2", "2", "2", "2", "2", 
"2", "2", "2", "2", NA, "1", "1", "1", "1", "1", "1", "1", "1", 
"1", NA, NA, NA, "2", "2", "2", "2", "2", "2", "2", "2", "2", 
NA, NA)), class = "data.frame", row.names = c(NA, -47L))

您可以使用rle创建这样的序列,并将其应用于每个Symbol

library(dplyr)
create_num <- function(x) {
val <- with(rle(is.na(x)), rep(cumsum(values), lengths))
val[is.na(x)] <- NA
val  
}
df %>%
group_by(Symbol) %>%
mutate(result = create_num(Window)) %>%
ungroup 

返回-

Symbol       Date      Close     Time Event Window Eventidentifierdummy result
1    AAPL 09/03/2020  66,542503 16:25:00    NA     NA                   NA     NA
2    AAPL 09/03/2020  71,334999 16:26:00    NA     -4                    1      1
3    AAPL 09/03/2020  68,857498 16:27:00    NA     -3                    1      1
4    AAPL 09/03/2020  62,057499 16:28:00    NA     -2                    1      1
5    AAPL 09/03/2020    69,4925 16:29:00    NA     -1                    1      1
6    AAPL 09/03/2020  60,552502 16:30:00     1      0                    1      1
7    AAPL 09/03/2020     63,215 16:31:00    NA      1                    1      1
8    AAPL 10/03/2020    61,6675 09:30:00    NA      2                    1      1
9    AAPL 10/03/2020     61,195 09:31:00    NA      3                    1      1
10   AAPL 10/03/2020  57,310001 09:32:00    NA      4                    1      1
11   AAPL 10/03/2020  56,092499 09:33:00    NA     NA                   NA     NA
12   AAPL 15/03/2020  65,535603 15:45:00    NA     NA                   NA     NA
13   AAPL 15/03/2020  66,357545 15:46:00    NA     NA                   NA     NA
14   AAPL 15/03/2020  62,852345 15:47:00    NA     -4                    2      2
15   AAPL 15/03/2020  64,057325 15:48:00    NA     -3                    2      2
16   AAPL 16/03/2020  66,494545 09:30:00    NA     -2                    2      2
17   AAPL 16/03/2020  63,557967 09:31:00     1     -1                    2      2
18   AAPL 16/03/2020  64,415454 09:32:00    NA      0                    2      2
19   AAPL 16/03/2020    62,2357 09:33:00    NA      1                    2      2
20   AAPL 16/03/2020    64,4576 09:34:00    NA      2                    2      2
21   AAPL 16/03/2020  59,457579 09:35:00    NA      3                    2      2
22   AAPL 16/03/2020  58,092470 09:36:00    NA      4                    2      2
23   VISA 05/03/2020 186,960007 16:26:00    NA     NA                   NA     NA
24   VISA 05/03/2020 184,360001 16:27:00    NA     -4                    1      1
25   VISA 05/03/2020 171,130005 16:28:00    NA     -3                    1      1
26   VISA 05/03/2020 182,600006 16:29:00    NA     -2                    1      1
27   VISA 05/03/2020 172,949997 16:30:00    NA     -1                    1      1
28   VISA 06/03/2020 160,080002 09:32:00     1      0                    1      1
29   VISA 06/03/2020 175,830002 09:33:00    NA      1                    1      1
30   VISA 06/03/2020 152,009995 09:34:00    NA      2                    1      1
31   VISA 06/03/2020 157,889999 09:35:00    NA      3                    1      1
32   VISA 06/03/2020 148,479996 09:36:00    NA      4                    1      1
33   VISA 06/03/2020     152,25 09:37:00    NA     NA                   NA     NA
34   VISA 06/03/2020 146,830002 09:38:00    NA     NA                   NA     NA
35   VISA 20/03/2020 192,203826 16:12:00    NA     NA                   NA     NA
36   VISA 20/03/2020 193,293752 16:13:00    NA     -4                    2      2
37   VISA 20/03/2020 192,204726 16:14:00    NA     -3                    2      2
38   VISA 20/03/2020   192,2396 16:15:00    NA     -2                    2      2
39   VISA 20/03/2020 194,185620 16:16:00    NA     -1                    2      2
40   VISA 20/03/2020 196,614289 16:17:00     1      0                    2      2
41   VISA 20/03/2020 197,826200 16:18:00    NA      1                    2      2
42   VISA 21/03/2020  197,49176 09:29:00    NA      2                    2      2
43   VISA 21/03/2020 197,239230 09:30:00    NA      3                    2      2
44   VISA 21/03/2020   198,2300 09:31:00    NA      4                    2      2
45   VISA 21/03/2020 198,230028 09:32:00    NA     NA                   NA     NA
46   VISA 21/03/2020 197,247020 09:33:00    NA     NA                   NA     NA

创建另一个名为result的列,以便可以比较答案。可以用Eventidentifierdummy代替result

使用data.table进行分组,并将事件的开始标识为-4值:

library( data.table )
setDT( companies )
companies[ !is.na(Window),
Eventidentifierdummy := cumsum(Window == -4),
by = Symbol ]

最新更新