我想为每月的面板数据创建滞后变量。我的示例数据集如下所示。
Time ID Value1
Jan-14 A 12
Feb-14 A 14
Mar-14 A 15
Apr-14 A 18
May-14 A 10
Jun-14 A 12
Jul-14 A 13
Aug-14 A 14
Jan-14 B 32
Feb-14 B 14
Mar-14 B 15
Apr-14 B 18
May-14 B 20
Jun-14 B 12
Jul-14 B 13
Aug-14 B 14
我想为每个ID创建2个新的lag-1和lag-1变量。我做了下面的事情,但它只适用于ID:A,其余ID滞后于以前的ID。
library(zoo)
library(plm)
df["Time1"] <- as.yearmon(df$Time,format="%b-%y")
pdf1 <-pdata.frame(df,index=c("ID","Time1"))
pdf1$lag1_Value1<-lag((pdf1$Value1), 1)
但理想情况下,我需要我的输出如下。
Time ID Value1 lag1.Value1 lag2_Value1
Jan-14 A 12 NA NA
Feb-14 A 14 12 NA
Mar-14 A 15 14 12
Apr-14 A 18 15 14
May-14 A 10 18 15
Jun-14 A 12 10 18
Jul-14 A 13 12 10
Aug-14 A 14 13 12
Jan-14 B 32 NA NA
Feb-14 B 14 32 NA
Mar-14 B 15 14 32
Apr-14 B 18 15 14
May-14 B 20 18 15
Jun-14 B 12 20 18
Jul-14 B 13 12 20
Aug-14 B 14 13 12
下面的代码中给出了我的示例数据集。
df=structure(list(Time = structure(c(4L, 3L, 7L, 1L, 8L, 6L, 5L,
2L, 4L, 3L, 7L, 1L, 8L, 6L, 5L, 2L), .Label = c("Apr-14", "Aug-14",
"Feb-14", "Jan-14", "Jul-14", "Jun-14", "Mar-14", "May-14"), class ="factor"),
ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"),
Value1 = c(12L, 14L, 15L, 18L, 10L, 12L, 13L, 14L, 32L, 14L,
15L, 18L, 20L, 12L, 13L, 14L), Time1 = structure(c(2014,
2014.08333333333, 2014.16666666667, 2014.25, 2014.33333333333,
2014.41666666667, 2014.5, 2014.58333333333, 2014, 2014.08333333333,
2014.16666666667, 2014.25, 2014.33333333333, 2014.41666666667,
2014.5, 2014.58333333333), class = "yearmon")), row.names = c(NA,
-16L), .Names = c("Time", "ID", "Value1", "Time1"), class = "data.frame")
谢谢你的帮助。
使用dplyr
:
library(dplyr)
df %>% group_by(ID) %>%
mutate(lag1.Value1 = lag(Value1), lag2_Value1 = lag(lag1.Value1))
输出
Source: local data frame [16 x 6]
Groups: ID [2]
# Time ID Value1 Time1 lag1.Value1 lag2_Value1
# <fctr> <fctr> <int> <dbl> <int> <int>
#1 Jan-14 A 12 2014.000 NA NA
#2 Feb-14 A 14 2014.083 12 NA
#3 Mar-14 A 15 2014.167 14 12
#4 Apr-14 A 18 2014.250 15 14
#5 May-14 A 10 2014.333 18 15
#6 Jun-14 A 12 2014.417 10 18
#7 Jul-14 A 13 2014.500 12 10
#8 Aug-14 A 14 2014.583 13 12
#9 Jan-14 B 32 2014.000 NA NA
#10 Feb-14 B 14 2014.083 32 NA
#11 Mar-14 B 15 2014.167 14 32
#12 Apr-14 B 18 2014.250 15 14
#13 May-14 B 20 2014.333 18 15
#14 Jun-14 B 12 2014.417 20 18
#15 Jul-14 B 13 2014.500 12 20
#16 Aug-14 B 14 2014.583 13 12
使用data.table
library(data.table)
setDT(df)[, lag1.Value1:= shift(Value1), ID][, lag2_Value1 := shift(lag1.Value1), ID]
使用data.table
,我们可以在一个步骤中做到这一点,因为shift
采用一个'n'序列
library(data.table)
setDT(df)[, paste0("lag", 1:2, "_Value1") := shift(Value1, 1:2), by = ID]
df
# Time ID Value1 Time1 lag1_Value1 lag2_Value1
# 1: Jan-14 A 12 2014.000 NA NA
# 2: Feb-14 A 14 2014.083 12 NA
# 3: Mar-14 A 15 2014.167 14 12
# 4: Apr-14 A 18 2014.250 15 14
# 5: May-14 A 10 2014.333 18 15
# 6: Jun-14 A 12 2014.417 10 18
# 7: Jul-14 A 13 2014.500 12 10
# 8: Aug-14 A 14 2014.583 13 12
# 9: Jan-14 B 32 2014.000 NA NA
#10: Feb-14 B 14 2014.083 32 NA
#11: Mar-14 B 15 2014.167 14 32
#12: Apr-14 B 18 2014.250 15 14
#13: May-14 B 20 2014.333 18 15
#14: Jun-14 B 12 2014.417 20 18
#15: Jul-14 B 13 2014.500 12 20
#16: Aug-14 B 14 2014.583 13 12