How to convert this data into a time series for ARIMA forecasting


 s
      X   Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
1  2012 24.78 26.82 29.75 31.19 31.87 31.00 28.48 27.39 27.08 26.55 24.36 23.62
2  2013 24.82 26.04 28.83 30.85 32.44 29.70 27.86 27.66 27.73 27.00 24.87 22.94
3  2014 24.01 25.75 29.08 31.83 31.23 33.08 29.88 28.14 27.40 27.11 25.38 24.37
4  2015 24.60 26.11 29.19 30.71 32.69 28.90 29.21 28.24 27.58 27.82 25.37 24.71
5  2016 25.20 27.62 29.51 31.86 32.34 28.64 27.98 28.36 27.12 26.51 25.69 25.12
6  2017 25.28 26.88 29.55 31.88 32.74 29.89 28.41 27.60 27.72 27.23 25.43 24.08
7  2018 24.84 26.47 29.40 31.20 31.10 30.28 28.30 27.33 27.55 27.40 26.98 24.77
8  2019 23.73 26.75 29.57 31.59 32.53 31.30 29.48 27.78 27.54 27.05 25.44 24.46
9  2020 25.41 26.75 29.30 31.37 32.98 30.05 28.23 27.53 27.68 27.01 25.57 22.86
10 2021 24.70 25.90 29.62 31.42 31.68 30.17 28.13 28.08 27.68 27.29 25.59 23.16

How can I convert it to a time series for forecasting?

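You can use the pivot_longer() function from the tidyr package to reshape it into long format, one row per year and month. The ts() function can then turn the value column into a time series.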

# recreate the original data
data1 <- structure(list(X=c(2012,2013,2014,2015,2016,2017,2018,2019,2020,2021),
               Jan=c(24.78,24.82,24.01,24.6,25.2,25.28,24.84,23.73,25.41,24.7),
               Feb=c(26.82,26.04,25.75,26.11,27.62,26.88,26.47,26.75,26.75,25.9),
               Mar=c(29.75,28.83,29.08,29.19,29.51,29.55,29.4,29.57,29.3,29.62),
               Apr=c(31.19,30.85,31.83,30.71,31.86,31.88,31.2,31.59,31.37,31.42),
               May=c(31.87,32.44,31.23,32.69,32.34,32.74,31.1,32.53,32.98,31.68),
               Jun=c(31,29.7,33.08,28.9,28.64,29.89,30.28,31.3,30.05,30.17),
               Jul=c(28.48,27.86,29.88,29.21,27.98,28.41,28.3,29.48,28.23,28.13),
               Aug=c(27.39,27.66,28.14,28.24,28.36,27.6,27.33,27.78,27.53,28.08),
               Sep=c(27.08,27.73,27.4,27.58,27.12,27.72,27.55,27.54,27.68,27.68),
               Oct=c(26.55,27,27.11,27.82,26.51,27.23,27.4,27.05,27.01,27.29),
               Nov=c(24.36,24.87,25.38,25.37,25.69,25.43,26.98,25.44,25.57,25.59),
               Dec=c(23.62,22.94,24.37,24.71,25.12,24.08,24.77,24.46,22.86,23.16)),
          row.names=c(NA,-10L),
          class=c("tbl_df","tbl","data.frame"))
# pivot to longer format
library(tidyr)
data2 <- pivot_longer(data1,-X,values_to='value')
# convert to monthly timeseries starting at Jan 2012 ending at Dec 2021
timeseries <- ts(data2$value,start=2012,end=2021+11/12,frequency=12)
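
Since the question mentions ARIMA, a seasonal model could then be fitted with arima() from base R. The sketch below rebuilds the same series directly from the question's values, without tidyr, so that it runs on its own. The model orders are an illustrative assumption, not a tuned choice; in practice they would be picked from diagnostics such as ACF/PACF plots or an information criterion.

```r
# Monthly values from the question, Jan 2012 to Dec 2021, in time order
# (one line per year).
values <- c(
  24.78, 26.82, 29.75, 31.19, 31.87, 31.00, 28.48, 27.39, 27.08, 26.55, 24.36, 23.62,
  24.82, 26.04, 28.83, 30.85, 32.44, 29.70, 27.86, 27.66, 27.73, 27.00, 24.87, 22.94,
  24.01, 25.75, 29.08, 31.83, 31.23, 33.08, 29.88, 28.14, 27.40, 27.11, 25.38, 24.37,
  24.60, 26.11, 29.19, 30.71, 32.69, 28.90, 29.21, 28.24, 27.58, 27.82, 25.37, 24.71,
  25.20, 27.62, 29.51, 31.86, 32.34, 28.64, 27.98, 28.36, 27.12, 26.51, 25.69, 25.12,
  25.28, 26.88, 29.55, 31.88, 32.74, 29.89, 28.41, 27.60, 27.72, 27.23, 25.43, 24.08,
  24.84, 26.47, 29.40, 31.20, 31.10, 30.28, 28.30, 27.33, 27.55, 27.40, 26.98, 24.77,
  23.73, 26.75, 29.57, 31.59, 32.53, 31.30, 29.48, 27.78, 27.54, 27.05, 25.44, 24.46,
  25.41, 26.75, 29.30, 31.37, 32.98, 30.05, 28.23, 27.53, 27.68, 27.01, 25.57, 22.86,
  24.70, 25.90, 29.62, 31.42, 31.68, 30.17, 28.13, 28.08, 27.68, 27.29, 25.59, 23.16
)
timeseries <- ts(values, start = c(2012, 1), frequency = 12)

# Illustrative seasonal ARIMA(0,0,1)(0,1,1)[12]. The orders are an assumption
# made for this sketch; they were not selected by any diagnostic.
fit <- arima(timeseries, order = c(0, 0, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

# Forecast the 12 months after the end of the data, with standard errors.
fc <- predict(fit, n.ahead = 12)
fc$pred
fc$se
```

The seasonal difference is there because the data show a clear yearly pattern; the forecast package's auto.arima() is a common way to choose the orders automatically, if installing a package is acceptable.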

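We assume the question is how to convert a data frame of the form shown in the Note at the end of this answer into a ts object. In particular, we assume that the only NAs are at the beginning, if the series does not start in January, and/or at the end, if it does not end in December. We also show how to relax that assumption easily.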

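1) This alternative is a single line of code. After dropping the year column, transpose the data with t, unravel it into a vector with c, and then specify the appropriate start year, s[1, 1], and a frequency of 12. We assume that if the series does not start in January or end in December then it starts and/or ends with NAs, so we remove them with na.omit. If we knew that it starts in January and ends in December, we could optionally drop na.omit from the code below. No packages are used.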

(If there are internal NAs as well, use na.trim from the zoo package in place of na.omit.)

na.omit(ts(c(t(s[, -1])), start = s[1, 1], frequency = 12))
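
To see what na.omit does to the start and end of the series, here is a small sketch on a hypothetical two-year data frame, toy (not part of the question's data), that begins in March 2020 and ends in November 2021. It uses base R only.

```r
# Hypothetical toy input in the same layout as s: year in the first column,
# then Jan..Dec. The series starts in Mar 2020 and ends in Nov 2021, so the
# first two cells and the last cell are NA.
vals <- matrix(c(NA, NA, 3:12,
                 13:23, NA),
               nrow = 2, byrow = TRUE,
               dimnames = list(NULL, month.abb))
toy <- data.frame(X = 2020:2021, vals)

# Same one-liner as above, applied to toy instead of s.
tser <- na.omit(ts(c(t(toy[, -1])), start = toy[1, 1], frequency = 12))

start(tser)   # year and month of the first non-missing value
end(tser)     # year and month of the last non-missing value
length(tser)  # number of observations kept
```

Because t() puts the months in rows and c() reads a matrix column by column, the values come out in time order: all of the first year, then all of the second.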

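2) Another approach is to convert the input s to a zoo object and then use as.ts. First use read.zoo to convert s to a 12-column zoo object, then melt that into a long data frame with columns corresponding to year, month and value. Read that back into a zoo object with a yearmon index and trim it. Finally, convert it to a ts object. Although longer, this approach avoids dealing explicitly with the start value and frequency, and it works nicely with pipes. If we knew that the series starts in January and ends in December, we could omit the na.trim step.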

library(zoo)
# given a data frame, x, with year in first column and abbreviated
#   month in second column return a yearmon object
to_ym <- function(x) as.yearmon(paste(x[[1]], x[[2]]), "%Y %b")
s |>
  read.zoo() |>
  fortify.zoo(melt = TRUE) |>
  read.zoo(index = 1:2, FUN = to_ym) |>
  na.trim() |>
  as.ts()

Note

s <- structure(list(X = 2012:2021, Jan = c(24.78, 24.82, 24.01, 24.6, 
25.2, 25.28, 24.84, 23.73, 25.41, 24.7), Feb = c(26.82, 26.04, 
25.75, 26.11, 27.62, 26.88, 26.47, 26.75, 26.75, 25.9), Mar = c(29.75, 
28.83, 29.08, 29.19, 29.51, 29.55, 29.4, 29.57, 29.3, 29.62), 
    Apr = c(31.19, 30.85, 31.83, 30.71, 31.86, 31.88, 31.2, 31.59, 
    31.37, 31.42), May = c(31.87, 32.44, 31.23, 32.69, 32.34, 
    32.74, 31.1, 32.53, 32.98, 31.68), Jun = c(31, 29.7, 33.08, 
    28.9, 28.64, 29.89, 30.28, 31.3, 30.05, 30.17), Jul = c(28.48, 
    27.86, 29.88, 29.21, 27.98, 28.41, 28.3, 29.48, 28.23, 28.13
    ), Aug = c(27.39, 27.66, 28.14, 28.24, 28.36, 27.6, 27.33, 
    27.78, 27.53, 28.08), Sep = c(27.08, 27.73, 27.4, 27.58, 
    27.12, 27.72, 27.55, 27.54, 27.68, 27.68), Oct = c(26.55, 
    27, 27.11, 27.82, 26.51, 27.23, 27.4, 27.05, 27.01, 27.29
    ), Nov = c(24.36, 24.87, 25.38, 25.37, 25.69, 25.43, 26.98, 
    25.44, 25.57, 25.59), Dec = c(23.62, 22.94, 24.37, 24.71, 
    25.12, 24.08, 24.77, 24.46, 22.86, 23.16)), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10"))
