How to build a real-time data frame in R



Suppose I have two data frames, as follows:

df1 = structure(list(Date = c("2000-01-05", "2000-02-03", "2000-03-02", 
"2000-03-30", "2000-04-13", "2000-05-11", "2000-06-08", "2000-07-06", 
"2000-09-14", "2000-10-19", "2000-11-02", "2000-12-14", "2001-02-01", 
"2001-03-01", "2001-04-11", "2001-05-10", "2001-06-07", "2001-06-21", 
"2001-07-05", "2001-08-30", "2001-10-11", "2001-11-08", "2001-12-06"
)), row.names = c(NA, 23L), class = "data.frame")
         Date
1  2000-01-05
2  2000-02-03
3  2000-03-02
4  2000-03-30
5  2000-04-13
6  2000-05-11
7  2000-06-08
8  2000-07-06
9  2000-09-14
10 2000-10-19
11 2000-11-02
12 2000-12-14
13 2001-02-01
14 2001-03-01
15 2001-04-11
16 2001-05-10
17 2001-06-07
18 2001-06-21
19 2001-07-05
20 2001-08-30
21 2001-10-11
22 2001-11-08
23 2001-12-06

df2 = structure(list(Date = structure(c(10987, 11016, 11047, 11077, 
11108, 11138, 11169, 11200, 11230, 11261, 11291, 11322, 11353, 
11381, 11412, 11442, 11473, 11503, 11534, 11565, 11595, 11626, 
11656, 11687), class = "Date"), x = c(3.04285714285714, 3.27571428571429, 
3.5104347826087, 3.685, 3.92, 4.29454545454545, 4.30857142857143, 
4.41913043478261, 4.59047619047619, 4.76272727272727, 4.82909090909091, 
4.82684210526316, 4.75590909090909, 4.9925, 4.78136363636364, 
5.06421052631579, 4.65363636363636, 4.53952380952381, 4.50545454545454, 
4.49130434782609, 3.9865, 3.97130434782609, 3.50727272727273, 
3.33888888888889)), row.names = c(NA, 24L), class = "data.frame")
         Date        x
1  2000-01-31 3.042857
2  2000-02-29 3.275714
3  2000-03-31 3.510435
4  2000-04-30 3.685000
5  2000-05-31 3.920000
6  2000-06-30 4.294545
7  2000-07-31 4.308571
8  2000-08-31 4.419130
9  2000-09-30 4.590476
10 2000-10-31 4.762727
11 2000-11-30 4.829091
12 2000-12-31 4.826842
13 2001-01-31 4.755909
14 2001-02-28 4.992500
15 2001-03-31 4.781364
16 2001-04-30 5.064211
17 2001-05-31 4.653636
18 2001-06-30 4.539524
19 2001-07-31 4.505455
20 2001-08-31 4.491304
21 2001-09-30 3.986500
22 2001-10-31 3.971304
23 2001-11-30 3.507273
24 2001-12-31 3.338889

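Now, what I want to do is create a real-time data frame, i.e. one that contains only the data from df2 that was already available at each date in df1. For example, on 2000-01-05 (the first row of df1), no data from df2 is available, because 2000-01-31 (the first row of df2) falls after 2000-01-05. On 2000-02-03 (the second row of df1), however, the observation from 2000-01-31 (the first row of df2) is available. The same reasoning should apply to every row. The result should look like this: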

         Date        y
1  2000-01-05       NA
2  2000-02-03  3.042857
3  2000-03-02  3.275714
4  2000-03-30  3.275714
5  2000-04-13  3.510435
6  2000-05-11  3.685000
....

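The rule is: for each date in df1, take only the most recent observation from df2 that was available on that date.

Can anyone help me?

Thanks!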

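You can complete the dates in df2 (expand it to one row per day, carrying the last observation forward) and then join.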

library(dplyr)
library(tidyr)
# create a dataframe with all the days, not just the snapshots,
# and carry the last observed x forward to the following days
df2_complete <- df2 %>% 
  complete(Date = seq.Date(min(Date), max(Date), by = "day")) %>% 
  fill(x, .direction = "down")
# convert df1$Date from character to Date class, then join on Date
df1 %>% 
  mutate(Date = as.Date(Date)) %>% 
  left_join(df2_complete, by = "Date")

Which gives:

         Date        x
1  2000-01-05       NA
2  2000-02-03 3.042857
3  2000-03-02 3.275714
4  2000-03-30 3.275714
5  2000-04-13 3.510435
6  2000-05-11 3.685000
....
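
As an alternative that avoids expanding df2 to daily rows, the same "last available observation" lookup can probably be done in base R with findInterval(). The sketch below is only an illustration, not part of the answer above: it uses the first few rows of the question's data (with x as printed, i.e. rounded), and the names idx and result are made up for the example. It assumes that a df2 observation dated on the same day as a df1 date counts as available, which matches the behaviour of the join above.

```r
# First rows of the question's data (x values as printed, i.e. rounded)
df1 <- data.frame(Date = c("2000-01-05", "2000-02-03", "2000-03-02",
                           "2000-03-30", "2000-04-13", "2000-05-11"))
df2 <- data.frame(
  Date = as.Date(c("2000-01-31", "2000-02-29", "2000-03-31", "2000-04-30")),
  x    = c(3.042857, 3.275714, 3.510435, 3.685000)
)

# findInterval() needs the lookup vector sorted in increasing order
df2 <- df2[order(df2$Date), ]

# For each df1 date: index of the last df2 date that is <= it (0 if none)
idx <- findInterval(as.numeric(as.Date(df1$Date)), as.numeric(df2$Date))

# Index 0 means nothing had been published yet, so turn it into NA
idx[idx == 0] <- NA

result <- data.frame(Date = as.Date(df1$Date), y = df2$x[idx])
print(result)
```

Unlike the completed-dates join, this lookup also returns the last df2 value for df1 dates that fall after the final date in df2, rather than NA; whether that is desirable depends on the use case.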
