R data.table: computing the average of the previous five years



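I have data for a single variable, and I want the difference between its current level and the average of the same variable for the same month over the previous 5 years.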

library(tidyverse)
library(data.table)
library(lubridate)
MWE <- as.data.table(ggplot2::economics) %>%
.[,c("pce","psavert","uempmed","unemploy"):=NULL]
> MWE
date      pop
1: 1967-07-01 198712.0
2: 1967-08-01 198911.0
3: 1967-09-01 199113.0
4: 1967-10-01 199311.0
5: 1967-11-01 199498.0
---                    
570: 2014-12-01 319746.2
571: 2015-01-01 319928.6
572: 2015-02-01 320074.5
573: 2015-03-01 320230.8
574: 2015-04-01 320402.3

I can do this by month, but I am struggling to refer to the current row in order to do something like year(date) < year(currentline) & year(date) >= year(currentline)-6

MWE_2 <- MWE[,MeanPastYears:=mean(pop),by=month(date)]

The output I want is:

date      pop      avg_5yrs
1: 1967-07-01 198712.0     NA
2: 1967-08-01 198911.0     NA
3: 1967-09-01 199113.0     NA
4: 1967-10-01 199311.0     NA
5: 1967-11-01 199498.0     NA
---                    
570: 2014-12-01 319746.2   313013.8
571: 2015-01-01 319928.6   313192.1
572: 2015-02-01 320074.5   313350.7
573: 2015-03-01 320230.8   313511.2
574: 2015-04-01 320402.3   313640.3

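Columns inside [ can be indexed as vectors, so for each row we first build a logical vector year(date) < year(date[..I]) & year(date) >= year(date[..I]) - 6, which is TRUE when the date falls inside the interval, and then take the mean of pop over those rows, grouped by month. Note that this condition, taken from the question, spans the six preceding years (year - 6 through year - 1); replace 6 with 5 for a strict five-year window. Rows with no earlier years in the window get NaN rather than NA, because mean() of an empty vector is NaN: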

# year is a temporary helper column; ..I is the row index within each month group
MWE[,
  year := year(date)
][,
  avg_5yrs := sapply(1:.N, function(..I) mean(pop[year < year[..I] & year >= year[..I] - 6])),
  by = month(date)
][, year := NULL][]
date      pop avg_5yrs
1: 1967-07-01 198712.0      NaN
2: 1967-08-01 198911.0      NaN
3: 1967-09-01 199113.0      NaN
4: 1967-10-01 199311.0      NaN
5: 1967-11-01 199498.0      NaN
---                             
570: 2014-12-01 319746.2 311845.5
571: 2015-01-01 319928.6 312028.1
572: 2015-02-01 320074.5 312192.6
573: 2015-03-01 320230.8 312357.4
574: 2015-04-01 320402.3 312498.1
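
As a possible alternative, here is a minimal sketch using a lagged rolling mean rather than the sapply approach above. It assumes the table is sorted by date and has no missing months, so that within each calendar-month group consecutive rows are exactly one year apart. It uses shift() and frollmean() from data.table (frollmean needs data.table 1.12.0 or later). The column name diff_5yrs is only an illustrative choice for the difference the question asks about.

```r
library(data.table)

# Same data as in the question: monthly US population from ggplot2::economics
MWE <- as.data.table(ggplot2::economics)[, .(date, pop)]

# Assumption: rows sorted by date, no gaps, so rows within a month group
# are one year apart
setorder(MWE, date)

# shift(pop) excludes the current year; frollmean(..., 5) then averages the
# five values before it (NA until five prior years are available)
MWE[, avg_5yrs := frollmean(shift(pop), 5), by = month(date)]

# Difference between the current level and the 5-year same-month average
MWE[, diff_5yrs := pop - avg_5yrs]

tail(MWE)
```

Because the window is defined by row position and not by comparing years, this version gives NA (not NaN) for the early rows and avoids the per-row sapply loop. If the data could have gaps, the year-comparison approach is the safer choice.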
