1 首先将题目的数据读入R的环境,绘制原始时序图,观察序列特征
train <- read.csv("~/Desktop/train.csv", head = TRUE, sep = ";")
train_ts <- ts(train, start = 1993, freq = 12)
plot(train_ts[, 2], type = "l", lwd = 2, col = "blue", xlab = "year", ylab = "train data",
main = "Number of trains running from 1993 to 1997", ylim = c(1100, 1350))
2 可以看出原始序列非平稳,呈现出近似线性趋势,所以选择一阶差分,差分后再次绘制新的时序图
# 1st-order differentiation
d_train = ts(diff(train_ts[, 2]))
plot(d_train, type = "l", col = "red", xlab = "year", ylab = expression(Delta ~
"# of trains"), main = "First order differentiation of train numbers")
3 时序图显示,差分后序列在均值附近比较稳定地波动.此时可以计算差分后序列得均值和方差
# The differentiated series has been stable, next we calculate mean and
# variance of d_train
mean_d_train = mean(d_train) # -0.09830508
var_d_train = var(d_train) # 3255.08
4 绘制一阶差分序列的自相关和偏自相关图
par(mfrow = c(1, 2))
acf(d_train, 20, xlim = c(1, 20), main = "ACF figure(d=1)")
pacf(d_train, main = "PACF figure(d=1)")
5 一阶差分的PACF拖尾,考虑MA模型拟合1阶差分后序列,利用\( \hat{ | \rho_k|} \leq \frac{1}{\sqrt{n}}\sqrt{1+2\sum_{i=1}^{q}\hat{\rho}_i^2} \) ,判断\( {\hat{ | \rho_k|}} \)的截尾性,\( M=\lfloor\sqrt{57} \rfloor=7 \)
6 根据计算结果对时间序列进行拟合,从Box检验结果可以看出,残差p=0.4742大于0.05,即残差序列属于白噪声序列.同时,从残差的自相关图看出残差项不存在滞后性
fit <- arima(train[, 2], order = c(0, 1, 3))
Box.test(fit$residuals)
##
## Box-Pierce test
##
## data: fit$residuals
## X-squared = 0.5121, df = 1, p-value = 0.4742
tsdiag(fit)
fit
##
## Call:
## arima(x = train[, 2], order = c(0, 1, 3))
##
## Coefficients:
## ma1 ma2 ma3
## -1.060 0.384 -0.248
## s.e. 0.138 0.152 0.178
##
## sigma^2 estimated as 1250: log likelihood = -295.1, aic = 598.1
7 可以写出ARIMA(0,1,3)的函数表达式为\[ (1-B)x_t =\epsilon_t(1+1.06B-0.38B^2+0.25B^3) \] 8 预测未来一年的火车数量变化趋势
ts.plot(train_ts[, 2], type = "l", lwd = 2, col = "blue", xlab = "year", ylab = "train data",
main = "Prediction of train numbers in 1998", xlim = c(1993, 1999), ylim = c(1100,
1350))
train.pred <- predict(fit, 12)
lines(ts(train.pred$pred, start = 1998, freq = 12), col = "red")
lines(ts(train.pred$pred + 2 * train.pred$se, start = 1998, freq = 12), col = "red",
lty = 3)
lines(ts(train.pred$pred - 2 * train.pred$se, start = 1998, freq = 12), col = "red",
lty = 3)