我有一个大型的公司年度账面市值比率数据集。我需要通过以下逻辑将这些转换为月度数据:公司i (stock_id)从6月y + 1到5月y + 2的BtoM等于y年的BtoM。我如何应用这个逻辑并在一个巨大的数据集中捕获输出(我有n只股票从1968年到2018年的数据)?非常感谢每一个帮助!复制代码:
library(tidyverse)
Date <- as.Date(c('1994-12-01', '1995-12-01', '1996-12-01', '1994-12-01', '1995-12-01', '1996-
12-01'))
stock_id <- c('80482', '80482', '80482', '80483', '80483', '80483')
BtoM <- as.numeric(c('0.0111', '0.0079', '0.0293', '0.671', '0.721', '0.219'))
Book_to_Market <- data.frame(Date, stock_id, BtoM)
Book_to_Market <- Book_to_Market %>%
mutate(stock_id = as.integer(stock_id))
给了:
Date stock_id BtoM
1 1994-12-01 80482 0.0111
2 1995-12-01 80482 0.0079
3 1996-12-01 80482 0.0293
4 1994-12-01 80483 0.6710
5 1995-12-01 80483 0.7210
6 1996-12-01 80483 0.2190
我想要的输出是这样的:
Date_2 stock_id_2 BtoM_2
1 1995-06-01 80482 0.0111
2 1995-07-01 80482 0.0111
3 1995-08-01 80482 0.0111
4 1995-09-01 80482 0.0111
5 1995-10-01 80482 0.0111
6 1995-11-01 80482 0.0111
7 1995-12-01 80482 0.0111
8 1996-01-01 80482 0.0111
9 1996-02-01 80482 0.0111
10 1996-03-01 80482 0.0111
11 1996-04-01 80482 0.0111
12 1996-05-01 80482 0.0111
13 1995-06-01 80483 0.6710
14 1995-07-01 80483 0.6710
15 1995-08-01 80483 0.6710
16 1995-09-01 80483 0.6710
17 1995-10-01 80483 0.6710
18 1995-11-01 80483 0.6710
19 1995-12-01 80483 0.6710
20 1996-01-01 80483 0.6710
21 1996-02-01 80483 0.6710
22 1996-03-01 80483 0.6710
23 1996-04-01 80483 0.6710
24 1996-05-01 80483 0.6710
library(tidyverse)
Book_to_Market <- data.frame(
date =
as.Date(
c(
'1994-12-01',
'1995-12-01',
'1996-12-01',
'1994-12-01',
'1995-12-01',
'1996-12-01'
)
),
stock_id = c('80482', '80482', '80482', '80483', '80483', '80483'),
BtoM = c('0.0111', '0.0079', '0.0293', '0.671', '0.721', '0.219')
) %>%
mutate(stock_id = as.integer(stock_id),
BtoM = as.numeric(BtoM))
Book_to_Market %>%
rowwise() %>%
mutate(month = list(seq.Date(date - months(6), date + months(5), by = 'month'))) %>%
unnest(cols = c(month)) %>%
select(-date)
#> # A tibble: 72 x 3
#> stock_id BtoM month
#> <int> <dbl> <date>
#> 1 80482 0.0111 1994-06-01
#> 2 80482 0.0111 1994-07-01
#> 3 80482 0.0111 1994-08-01
#> 4 80482 0.0111 1994-09-01
#> 5 80482 0.0111 1994-10-01
#> 6 80482 0.0111 1994-11-01
#> 7 80482 0.0111 1994-12-01
#> 8 80482 0.0111 1995-01-01
#> 9 80482 0.0111 1995-02-01
#> 10 80482 0.0111 1995-03-01
#> # ... with 62 more rows
library(dplyr)
library(lubridate) # for `ymd()` and `month()`
# generate list of dates for each stock_id:
data.frame(Date = rep(seq.Date(as.Date("1995-06-01"),
as.Date("1996-05-01"),
by = "month"), by = 2),
stock_id = rep(c(80482, 80483), each = 12)) %>%
# link back to a Dec 1st with particular year, based on current month
mutate(rounded_date = ymd(
paste(year(Date) - if_else(month(Date) >= 6, 1, 2), 1201))) %>%
# join to source data
left_join(src %>% mutate(Date = ymd(Date)),
by = c("rounded_date" = "Date", "stock_id"))
结果
Date stock_id rounded_date BtoM
1 1995-06-01 80482 1994-12-01 0.0111
2 1995-07-01 80482 1994-12-01 0.0111
3 1995-08-01 80482 1994-12-01 0.0111
4 1995-09-01 80482 1994-12-01 0.0111
5 1995-10-01 80482 1994-12-01 0.0111
6 1995-11-01 80482 1994-12-01 0.0111
7 1995-12-01 80482 1994-12-01 0.0111
8 1996-01-01 80482 1994-12-01 0.0111
9 1996-02-01 80482 1994-12-01 0.0111
10 1996-03-01 80482 1994-12-01 0.0111
11 1996-04-01 80482 1994-12-01 0.0111
12 1996-05-01 80482 1994-12-01 0.0111
13 1995-06-01 80483 1994-12-01 0.6710
14 1995-07-01 80483 1994-12-01 0.6710
15 1995-08-01 80483 1994-12-01 0.6710
16 1995-09-01 80483 1994-12-01 0.6710
17 1995-10-01 80483 1994-12-01 0.6710
18 1995-11-01 80483 1994-12-01 0.6710
19 1995-12-01 80483 1994-12-01 0.6710
20 1996-01-01 80483 1994-12-01 0.6710
21 1996-02-01 80483 1994-12-01 0.6710
22 1996-03-01 80483 1994-12-01 0.6710
23 1996-04-01 80483 1994-12-01 0.6710
24 1996-05-01 80483 1994-12-01 0.6710