Processing math: 100%

时间序列分析:ARIMA 建模示例

1. 一般流程

  对时间序列的分析和建模一般有一下几个步骤:

  1. 查看时间序列的图像:
    1.1. 如果具有趋势,可能需要对序列进行差分;
    1.2. 如果方差随时间变化,可能需要对序列进行转换,如 log 变换。先进行 log 变换再做差分的变换称为 Log-Return。
  2. 查看 ACF 图像,判断滑动平均的阶数。
  3. 查看 PACF 图像,判断自回归阶数。
  4. 选择若干组参数进行建模,可以使用如下指标来选择模型:
    • SSE:评价拟合质量;
    • AIC:综合评价拟合质量和模型复杂度;
    • Ljung-Box Q 统计量:判断残差是否有自相关。
  5. 进行估计。

2. 一个例子

  数据可以在这里下载到。

2.1. 绘制图像

  首先读取数据并绘制图像如图 1 所示:

birth.data <- read.csv("./data/daily-total-female-births-CA.csv")
birth.num <- birth.data$births
birth.date <- as.Date(birth.data$date, "%Y-%m-%d")
plot(birth.num ~ birth.date, type='l')
birth.data <- read.csv("./data/daily-total-female-births-CA.csv") birth.num <- birth.databirthsbirth.date<as.Date(birth.datadate, "%Y-%m-%d") plot(birth.num ~ birth.date, type='l')
birth.data <- read.csv("./data/daily-total-female-births-CA.csv")
birth.num <- birth.data$births
birth.date <- as.Date(birth.data$date, "%Y-%m-%d")
plot(birth.num ~ birth.date, type='l')

图 1

可以看到数据似乎有一些趋势。对序列进行 Ljung-Box 检验

Box.test(birth.num, lag=log(length(birth.num)))
Box.test(birth.num, lag=log(length(birth.num)))
Box.test(birth.num, lag=log(length(birth.num)))

结果为

Box-Pierce test
data: birth.num
X-squared = 36.391, df = 5.8999, p-value = 2.088e-06
Box-Pierce test data: birth.num X-squared = 36.391, df = 5.8999, p-value = 2.088e-06
  Box-Pierce test

data:  birth.num
X-squared = 36.391, df = 5.8999, p-value = 2.088e-06

可见 p=2.088×106<0.05,拒绝零假设,认为序列存在自相关。

2.2. 进行差分

  为了移除趋势,对序列进行差分,绘制图像如图 2 所示:

birth.num.diff <- diff(birth.num)
plot(birth.num.diff ~ birth.date[1:364], type='l')
birth.num.diff <- diff(birth.num) plot(birth.num.diff ~ birth.date[1:364], type='l')
birth.num.diff <- diff(birth.num)
plot(birth.num.diff ~ birth.date[1:364], type='l')

图 2

可见趋势基本消失了。再进行 Ljung-Box 检验

Box.test(birth.num.diff, lag=log(length(birth.num.diff)))
Box.test(birth.num.diff, lag=log(length(birth.num.diff)))
Box.test(birth.num.diff, lag=log(length(birth.num.diff)))

结果为

Box-Pierce test
data: birth.num.diff
X-squared = 78.094, df = 5.8972, p-value = 7.661e-15
Box-Pierce test data: birth.num.diff X-squared = 78.094, df = 5.8972, p-value = 7.661e-15
  Box-Pierce test

data:  birth.num.diff
X-squared = 78.094, df = 5.8972, p-value = 7.661e-15

可见 p=7.661×1015<0.05,拒绝零假设,序列仍存在自相关。

2.3. 查看 ACF 和 PACF

  绘制 ACF 和 PACF 图像如图 3、图 4 所示:

acf(birth.num.diff, 50)
pacf(birth.num.diff, 50)
acf(birth.num.diff, 50) pacf(birth.num.diff, 50)
acf(birth.num.diff, 50)
pacf(birth.num.diff, 50)

图 3

图 4

可见 ACF 在滞后为 1 处截断,PACF 像是在衰减,也像是在延迟为 7 处截断。

2.4. 模型比较

  结合分析,使用以下 4 组模型进行拟合:

params <- list(c(0, 1, 1), c(0, 1, 2), c(7, 1, 1), c(7, 1, 2))
results <- data.frame()
for (i in 1:length(params)) {
param <- params[[i]]
model <- arima(birth.num, order=param)
test <- Box.test(model$residuals, lag=log(length(model$residuals)))
result = data.frame(model=paste("ARIMA(", param[1], ",", param[2], ",", param[3], ")", sep=""),
AIC=model$aic, SSE=sum(resid(model)^2), P.value=test$p.value)
results <- rbind(results, result)
}
params <- list(c(0, 1, 1), c(0, 1, 2), c(7, 1, 1), c(7, 1, 2)) results <- data.frame() for (i in 1:length(params)) { param <- params[[i]] model <- arima(birth.num, order=param) test <- Box.test(modelresiduals,lag=log(length(modelresiduals))) result = data.frame(model=paste("ARIMA(", param[1], ",", param[2], ",", param[3], ")", sep=""), AIC=modelaic,SSE=sum(resid(model)2),P.value=testp.value) results <- rbind(results, result) }
params <- list(c(0, 1, 1), c(0, 1, 2), c(7, 1, 1), c(7, 1, 2))
results <- data.frame()
for (i in 1:length(params)) {
    param <- params[[i]]
    model <- arima(birth.num, order=param)
    test <- Box.test(model$residuals, lag=log(length(model$residuals)))
    result = data.frame(model=paste("ARIMA(", param[1], ",", param[2], ",", param[3], ")", sep=""), 
                        AIC=model$aic, SSE=sum(resid(model)^2), P.value=test$p.value)
    results <- rbind(results, result)
}

结果为

model AIC SSE P.value
ARIMA(0,1,1) 2462.221 18148.46 0.5333604
ARIMA(0,1,2) 2459.571 17914.65 0.9859227
ARIMA(7,1,1) 2464.883 17584.39 0.9999899
ARIMA(7,1,2) 2466.666 17574.06 0.9999929
model AIC SSE P.value ARIMA(0,1,1) 2462.221 18148.46 0.5333604 ARIMA(0,1,2) 2459.571 17914.65 0.9859227 ARIMA(7,1,1) 2464.883 17584.39 0.9999899 ARIMA(7,1,2) 2466.666 17574.06 0.9999929
model           AIC         SSE         P.value
ARIMA(0,1,1)    2462.221    18148.46    0.5333604
ARIMA(0,1,2)    2459.571    17914.65    0.9859227
ARIMA(7,1,1)    2464.883    17584.39    0.9999899
ARIMA(7,1,2)    2466.666    17574.06    0.9999929

选择 AIC 最小的模型,即 ARIMA(0,1,2)

2.5. 最终模型

  最后使用 astsa 包的 sarima 函数进行拟合:

library(astsa)
sarima(birth.num, 0, 1, 2, 0, 0, 0)
library(astsa) sarima(birth.num, 0, 1, 2, 0, 0, 0)
library(astsa)

sarima(birth.num, 0, 1, 2, 0, 0, 0)

得到系数为

Coefficients:
ma1 ma2 constant
-0.8511 -0.1113 0.015
s.e. 0.0496 0.0502 0.015
Coefficients: ma1 ma2 constant -0.8511 -0.1113 0.015 s.e. 0.0496 0.0502 0.015
Coefficients:
          ma1      ma2  constant
      -0.8511  -0.1113     0.015
s.e.   0.0496   0.0502     0.015

故最终的模型为

(1B)Xt=0.015+et0.8511et10.1113et2

Xt=Xt1+0.015+et0.8511et10.1113et2