AR(2) model: \[ Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \varepsilon_t \] where \(\varepsilon_t\) is white noise with variance \(\sigma_e^2\). The conditional sum-of-squares function is:
\[S_c(\phi_1,\phi_2) = \sum_{t=3}^{n} [(Y_t-\mu) - \phi_1(Y_{t-1}-\mu) - \phi_2(Y_{t-2}-\mu)]^2\]
The implementation in R:
AR2LS = function(Y){
  stopifnot(is.vector(Y))
  n = length(Y)
  # build the lagged regressors explicitly
  Yt_1 = c(NA, Y[-n])                # Y_{t-1}
  Yt_2 = c(NA, NA, Y[-c(n - 1, n)])  # Y_{t-2}
  m = lm(Y ~ Yt_1 + Yt_2)
  ret = list()
  ret$coefficients = m$coefficients
  m_ = mean(Y)
  phi1 = ret$coefficients[2]  # slope on Y_{t-1}; coefficient 1 is the intercept
  phi2 = ret$coefficients[3]  # slope on Y_{t-2}
  # conditional sum of squares S_c(phi1, phi2)
  ret$cond_SS = 0
  for (t in 3:n){
    ret$cond_SS = ret$cond_SS +
      ((Y[t] - m_) - phi1*(Y[t-1] - m_) - phi2*(Y[t-2] - m_))^2
  }
  n_c = n - 3 + 1  # number of terms in the sum
  ret$est_std = sqrt(ret$cond_SS/(n_c - 2))  # noise standard deviation estimate
  return(ret)
}
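As a quick check, one can simulate an AR(2) series and fit it. A minimal sketch, assuming arima.sim for the data generation and a series length of 30 (the length used for the first experiment is not stated in the text):

set.seed(1)
Y = as.vector(arima.sim(model = list(ar = c(0.1, 0.5)), n = 30))
fit = AR2LS(Y)
fit$coefficients  # intercept, phi1, phi2
fit$est_std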
With the choice of parameters \(\phi_1 = 0.1\) and \(\phi_2 = 0.5\), we have:
The fitted model:
## (Intercept) Yt_1 Yt_2
## 0.49749162 -0.01693975 0.15124382
## [1] "Conditional sum squared: 2.23659213986202"
## [1] "Estimation of STD: 2.9329639378976"
We can see that the estimated coefficients differ considerably from the original parameters.
We repeat the experiment 100 times; these are the histograms of \(\hat{\phi}_1\) and \(\hat{\phi}_2\):
## [1] "For phi1: mean= -0.004507012473987 Standard error= 0.291791919054609"
## [1] "For phi2: mean= 0.0224131438882782 Standard error= 0.208758149698439"
Now we try the estimation with n=100:
## [1] "For phi1: mean= -0.0120800459553815 Standard error= 0.130945324700548"
## [1] "For phi2: mean= 0.0896249025000784 Standard error= 0.109600880444313"
We can see that the estimation improves significantly when n is increased to 100, as indicated by the standard errors roughly halving. This is explained by the law of large numbers: the time series reveals more information as the sample size grows.
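A minimal sketch of how the replication could be organized (run_experiment is a hypothetical helper, and the reported "standard errors" are the empirical standard deviations across replications):

run_experiment = function(n, reps = 100, phi = c(0.1, 0.5)){
  est = t(replicate(reps, {
    Y = as.vector(arima.sim(model = list(ar = phi), n = n))
    AR2LS(Y)$coefficients[2:3]  # the two slope estimates
  }))
  c(mean_phi1 = mean(est[, 1]), se_phi1 = sd(est[, 1]),
    mean_phi2 = mean(est[, 2]), se_phi2 = sd(est[, 2]))
}
run_experiment(n = 30)
run_experiment(n = 100)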
We have: \[ \begin{aligned} f_{Y_T,Y_{T-1},\dots,Y_1}(y_T,y_{T-1},\dots,y_1) &= f_{Y_1,Y_2}(y_1,y_2)\,\prod_{t=3}^{T} f_{Y_t \mid Y_{t-1},Y_{t-2}}(y_t \mid y_{t-1},y_{t-2}) \\ Y_3 &= \phi_1 Y_2 + \phi_2 Y_1 + \varepsilon_3 \end{aligned} \] Hence: \[ (Y_3 \mid Y_2 = y_2, Y_1 = y_1) \sim N(\phi_1 y_2 + \phi_2 y_1,\ \sigma_e^2) \] Because \(Y_1\) and \(Y_2\) take determined values, conditionally \(Y_3\) is a constant plus the normal innovation \(\varepsilon_3\), so it is again normally distributed. The same argument repeats for every \(t\) from 3 to \(T\).
In general, we have \((Y_t \mid Y_{t-1} = y_{t-1}, Y_{t-2} = y_{t-2}) \sim N(\phi_1 y_{t-1} + \phi_2 y_{t-2},\ \sigma_e^2)\). Substituting this, together with the bivariate normal density of \((Y_1,Y_2)\) (with process variance \(\gamma_0 = k\sigma_e^2\) and lag-1 autocorrelation \(\rho_1\)), gives the following result: \[ \begin{aligned} & f_{Y_T,Y_{T-1},\dots,Y_1}(y_T,y_{T-1},\dots,y_1) = \\ & \frac{1}{2\pi k\sigma_e^2\sqrt{1-\rho_1^2}} \exp\left( -\frac{y_1^2 + y_2^2 - 2\rho_1 y_1 y_2}{2k\sigma_e^2(1-\rho_1^2)}\right) \prod_{t=3}^T \frac{1}{\sqrt{2\pi\sigma_e^2}}\exp\left( -\frac{(y_t - \phi_1 y_{t-1} - \phi_2 y_{t-2})^2}{2\sigma_e^2} \right) = \\ & \frac{1}{2\pi k\sigma_e^2\sqrt{1-\rho_1^2}} \left(\frac{1}{\sqrt{2\pi\sigma_e^2}}\right)^{T-2} \exp\left( -\frac{S(\phi_1,\phi_2)}{2\sigma_e^2} \right) \end{aligned} \] where: \[ S(\phi_1,\phi_2) = \sum_{t=3}^T (y_t - \phi_1 y_{t-1} - \phi_2 y_{t-2})^2 + \frac{y_1^2 + y_2^2 - 2\rho_1 y_1 y_2}{k(1-\rho_1^2)} \]
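Taking logarithms of the density as written above makes the role of \(S(\phi_1,\phi_2)\) explicit:
\[ \ell(\phi_1,\phi_2,\sigma_e^2) = -\frac{T}{2}\log(2\pi\sigma_e^2) - \log k - \frac{1}{2}\log(1-\rho_1^2) - \frac{S(\phi_1,\phi_2)}{2\sigma_e^2} \]
so that, apart from the dependence of \(k\) and \(\rho_1\) on the parameters, maximizing the likelihood over \((\phi_1,\phi_2)\) amounts to minimizing \(S(\phi_1,\phi_2)\).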
With n = 100, we have:
## [1] "AR2LS model coefficients:"
## (Intercept) Yt_1 Yt_2
## 0.02777017 -0.15497804 0.42960542
## [1] "arima model coefficients:"
## ar1 ar2 intercept
## -0.15654658 0.43957714 0.04439706
## [1] "For AR2LS: Standard error= 1.51698730082058"
## [1] "For arima: Standard error= 0.854337316367242"
Both models recover similar coefficient estimates, although arima attains a noticeably smaller standard error than AR2LS.
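A minimal sketch of how this comparison could be reproduced, assuming the series is again simulated with arima.sim:

set.seed(2)
Y = as.vector(arima.sim(model = list(ar = c(0.1, 0.5)), n = 100))
ls_fit = AR2LS(Y)
ml_fit = arima(Y, order = c(2, 0, 0))  # ML fit of an AR(2) with mean
ls_fit$coefficients
coef(ml_fit)
sqrt(ml_fit$sigma2)  # comparable to ls_fit$est_std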
In this task, 75% of the data was used for training and 25% for validating model performance.
## Year Month Electricity.consumption
## 1 1990 1 15249
## 2 1990 2 13211
## 3 1990 3 14548
## 4 1990 4 12803
## 5 1990 5 11845
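A minimal sketch of the split, assuming the data frame shown above is called elec (the actual object name is not shown):

elec_ts = ts(elec$Electricity.consumption, start = c(1990, 1), frequency = 12)
n_train = floor(0.75 * length(elec_ts))                     # 75% for training
trainTS = window(elec_ts, end = time(elec_ts)[n_train])
validTS = window(elec_ts, start = time(elec_ts)[n_train + 1])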
Next, we fit a linear trend model to the training data:
##
## Call:
## lm(formula = trainTS ~ time(trainTS))
##
## Residuals:
## Min 1Q Median 3Q Max
## -4008.7 -1787.5 160.5 1650.4 3559.6
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -125720.72 48487.82 -2.593 0.0101 *
## time(trainTS) 69.42 24.25 2.863 0.0046 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1969 on 223 degrees of freedom
## Multiple R-squared: 0.03544, Adjusted R-squared: 0.03112
## F-statistic: 8.194 on 1 and 223 DF, p-value: 0.004603
Residuals of linear model:
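These can be extracted and plotted along the following lines (linFit stands for the fitted object behind the summary above):

linFit = lm(trainTS ~ time(trainTS))
plot(resid(linFit), type = "l", ylab = "Residual")  # seasonal pattern remains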
For the model with rather few parameters (in line with the KISS principle), we consult the EACF:
## AR/MA
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13
## 0 x o o o o o o o o o o x x o
## 1 x o o o o o o o o o o x x o
## 2 x o o o o o o o o o o x x o
## 3 x x x o x o o o o o o x x o
## 4 x x x o x o o o o o o x x x
## 5 x x o x o o o o o o o x x o
## 6 x x o o x o o o o o o x o o
## 7 x x o o o o o o o o o x o o
Hence, we have the model:
model1 <- arima(trainTS, order=c(0,1,1),
seasonal=list(order=c(0,1,1), period=12))
For the model with rather many parameters (and hence a risk of overfitting), we simply increase the AR and MA orders well beyond what the EACF pattern suggests:
model2 <- arima(trainTS, order=c(3,1,5),
                seasonal=list(order=c(2,1,4), period=12))
For the KISS model:
##
## Call:
## arima(x = trainTS, order = c(0, 1, 1), seasonal = list(order = c(0, 1, 1), period = 12))
##
## Coefficients:
## ma1 sma1
## -0.4025 -0.8734
## s.e. 0.0858 0.0559
##
## sigma^2 estimated as 312039: log likelihood = -1650.49, aic = 3304.99
##
## Training set error measures:
## Warning in trainingaccuracy(f, test, d, D): test elements must be within
## sample
## ME RMSE MAE MPE MAPE
## Training set NaN NaN NaN NaN NaN
For the overfitting model:
AIC of the two models:
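The comparison could be produced with, for example:

AIC(model1, model2)  # the model with the lower AIC is preferred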
### c) Now construct lead 1 predictions for all observations in the validation data by using Arima() in the forecast library …

First, one-step-ahead predictions for all observations in the validation data:
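A minimal sketch of the standard forecast-package idiom, assuming the validation window is stored in validTS: passing an already fitted model to Arima() applies it to new data without re-estimating the parameters, and fitted() then returns the one-step-ahead predictions.

library(forecast)
refit = Arima(validTS, model = model1)  # reuse training-set parameters
lead1 = fitted(refit)                   # lead-1 predictions on the validation set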
The residuals of the model:
The ACF and QQ-Plot of residuals:
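These diagnostics can be produced along the lines of:

res = residuals(model1)
acf(res)                  # remaining autocorrelation?
qqnorm(res); qqline(res)  # approximate normality?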