上一个问题的答案是:由于R版本,在R中外推时间序列数据对我来说不起作用。
我有一个数据帧NEI_othertier1_long
,看起来像这样:
state pollutant Sector Fuel description year value
AK Ammonia Refining Diesel industrial 2008 1.18
AK Ammonia Refining Diesel industrial 2009 NA
AK Ammonia Refining Diesel industrial 2010 NA
AK Ammonia Refining Diesel industrial 2011 5.76
AK Ammonia Refining Diesel industrial 2012 NA
AK Ammonia Refining Diesel industrial 2013 NA
AK Ammonia Refining Diesel industrial 2014 5.83
AK Ammonia Refining Diesel industrial 2015 NA
AK Ammonia Refining Diesel industrial 2016 NA
AK Ammonia Refining Diesel industrial 2017 8.96
AK Ammonia Refining Diesel industrial 2018 NA
AK Ammonia Refining Diesel industrial 2019 NA
我有2008年、2011年、2014年和2017年的数值。我已经能够使用以下代码成功地线性插值2009-2016:
NEI_othertier1_long %>%
dplyr::mutate( value = na.approx(value, na.rm = FALSE, rule = 2) ) -> NEI_othertier1_interpolated
但插值法将2017年的数值提前到2018年和2019年。我想从前几年线性推断2018年和2019年的数值。
我有R版本3.5.2(无法更新(,所以无法安装latticeExtra
,Hmisc
依赖它来使用approxExtrap
功能。
感谢您的帮助!
dput(head(NEI_othertier1_long;(,污染物=c("氨"、"氨","氨";,"氨"氨"氨"氨"氨";,"氨"氨"氨"(,CEDS_ Sector=c("1A1b_;,"1A1b_"1A1b_"1A1b_;,"1A1b_"1A1b_;,"1A1b_"1A1b_"1A1b_;,"1A1b_"1A1b_"1A1b_,燃料=c("柴油油";,"diesel_oil"diesel_oil"diesel_oil"diesel_oil"diesel_oil";,"diesel_oil"diesel_oil"diesel_oil"diesel_oil"diesel_oil"diesel_oil";),tier1_description=c("FUEL COMB.INDUSTRIAL","FUEL COMB.INDUSTRIAL";,"燃料梳。"工业"燃料梳。"工业"燃料梳。"工业";,"燃料梳。"工业"燃料梳。"工业";,"燃料梳。"工业"燃料梳。"工业"燃料梳。"工业";,"燃料梳。"工业";,"燃料梳。INDUSTRIAL"(,单位=c("TON";,"TON"TON"TON"TON"TON";,"TON"TON"TON"(,年份=2008:2019,排放量=c(1.18,NA,NA,5.76,NA,NA,5.83,NA,NA,8.96,NA(,行名称=c(NA,-12L(,类=c("grouped_df","tbl_df","tbl","data.frame"(,组=结构(列表(状态="AK";,污染物=";氨";,CEDS_ Sector=";"1A1b_;,燃料="CEDS_;diesel_oil";,tier1_ description=";燃料梳。"工业";,单位=";TON"。rows=list(1:12((,row.names=c(NA,-1L(,class=c("tbl_df","tbl","data.frame"(,.drop=TRUE(
approxExtrap
只是approx
的包装器,因此您可以复制函数定义并使用它。
NEI_othertier1_long %>% dplyr::mutate(x = approxExtrap(year, value, year, na.rm = TRUE)$y)
如果你找不到approxExtrap
,这是它:
approxExtrap <- function (x, y, xout, method = "linear", n = 50, rule = 2, f = 0,
ties = "ordered", na.rm = FALSE)
{
if (is.list(x)) {
y <- x[[2]]
x <- x[[1]]
}
if (na.rm) {
d <- !is.na(x + y)
x <- x[d]
y <- y[d]
}
d <- !duplicated(x)
x <- x[d]
y <- y[d]
d <- order(x)
x <- x[d]
y <- y[d]
w <- approx(x, y, xout = xout, method = method, n = n, rule = 2,
f = f, ties = ties)$y
r <- range(x)
d <- xout < r[1]
if (any(is.na(d)))
stop("NAs not allowed in xout")
if (any(d))
w[d] <- (y[2] - y[1])/(x[2] - x[1]) * (xout[d] - x[1]) +
y[1]
d <- xout > r[2]
n <- length(y)
if (any(d))
w[d] <- (y[n] - y[n - 1])/(x[n] - x[n - 1]) * (xout[d] -
x[n - 1]) + y[n - 1]
list(x = xout, y = w)
}