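I have data for a single variable, and I want the difference between its current level and the average of the same variable for the same month over the previous 5 years.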
library(tidyverse)
library(data.table)
library(lubridate)
MWE <- as.data.table(ggplot2::economics) %>%
.[,c("pce","psavert","uempmed","unemploy"):=NULL]
> MWE
date pop
1: 1967-07-01 198712.0
2: 1967-08-01 198911.0
3: 1967-09-01 199113.0
4: 1967-10-01 199311.0
5: 1967-11-01 199498.0
---
570: 2014-12-01 319746.2
571: 2015-01-01 319928.6
572: 2015-02-01 320074.5
573: 2015-03-01 320230.8
574: 2015-04-01 320402.3
I can compute the mean by month, but I am struggling to refer to the current row in order to apply a condition like year(date) < year(currentline) & year(date) >= year(currentline)-6
MWE_2 <- MWE[,MeanPastYears:=mean(pop),by=month(date)]
My desired output is
date pop avg_5yrs
1: 1967-07-01 198712.0 NA
2: 1967-08-01 198911.0 NA
3: 1967-09-01 199113.0 NA
4: 1967-10-01 199311.0 NA
5: 1967-11-01 199498.0 NA
---
570: 2014-12-01 319746.2 313013.8
571: 2015-01-01 319928.6 313192.1
572: 2015-02-01 320074.5 313350.7
573: 2015-03-01 320230.8 313511.2
574: 2015-04-01 320402.3 313640.3
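Inside data.table's [ ], columns can be indexed as vectors. So for each row we first build a logical vector, year(date) < year(date[..I]) & year(date) >= year(date[..I]) - 6, which is TRUE when a date falls inside the interval. Note that the bound - 6, taken from the question, spans the six preceding years; use - 5 for a window of exactly five. Rows with no earlier years get NaN, because mean() of an empty vector is NaN. We then take the mean of pop over that vector, grouped by month: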
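# Add a helper column holding the year of each observation
MWE[,
year:=year(date)
][,
# For each row ..I in a month group, average pop over the years in [year[..I] - 6, year[..I])
avg_5yrs := sapply(1:.N, function(..I) mean(pop[year < year[..I] & year >= year[..I] - 6])), by=month(date)
][, year:=NULL][]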
date pop avg_5yrs
1: 1967-07-01 198712.0 NaN
2: 1967-08-01 198911.0 NaN
3: 1967-09-01 199113.0 NaN
4: 1967-10-01 199311.0 NaN
5: 1967-11-01 199498.0 NaN
---
570: 2014-12-01 319746.2 311845.5
571: 2015-01-01 319928.6 312028.1
572: 2015-02-01 320074.5 312192.6
573: 2015-03-01 320230.8 312357.4
574: 2015-04-01 320402.3 312498.1
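To make the window logic easier to check by hand, here is a minimal sketch on a small toy table. The table toy and the columns yr, avg_roll and diff are hypothetical names introduced only for this illustration. It uses a strict five-year window (- 5), and it also shows one possible alternative, assuming the rows are sorted by date and no year is missing within a month: lag pop by one year with shift() and take a rolling mean with frollmean(). This is a sketch under those assumptions, not necessarily what the original answer intended.

```r
library(data.table)

# Toy data (hypothetical): one January value per year, 2000 to 2007
toy <- data.table(
  date = as.IDate(sprintf("%d-01-01", 2000:2007)),
  pop  = seq(10, 80, by = 10)
)

# Approach 1: explicit window on a helper year column.
# `yr < yr[i] & yr >= yr[i] - 5` keeps the five years before row i.
toy[, yr := year(date)]
toy[, avg_5yrs := sapply(seq_len(.N), function(i) {
  mean(pop[yr < yr[i] & yr >= yr[i] - 5])
}), by = month(date)]

# Approach 2: lag by one year within each month, then take a 5-value rolling mean.
# Assumes rows are sorted by date and no year is missing within a month.
toy[, avg_roll := frollmean(shift(pop), 5), by = month(date)]

# Difference between the current level and the trailing average
toy[, diff := pop - avg_roll]
toy[, yr := NULL]

print(toy)
```

The two approaches differ at the start of the series: the first averages whatever earlier years exist (and gives NaN when there are none), while the second stays NA until five full prior years are available, which is closer to the NA rows in the desired output of the question.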