The aim of this investigation is to find the best distributed lag model (DLM) for forecasting the ASX price index. The dataset used in this report is ASX_data
which contains information on the monthly asx index as well as gold, oil and copper prices. Firstly, a time series will be created for each of the variables. Then, we will examine these by looking at their stationarity and decompositions. After exploring the relationship between these variables, we will proceed to modelling. Four types of DLMs will be used including finite lag DLM, polynomial DLM, Koyck DLM and autoregressive DLM. The best model will be decided based on significance of the model and terms, adjusted R square, and residual analysis.
The data set ASX_data
used for analysis includes monthly averages of ASX all ordinaries (Ords) price index, Gold price (AUD), Crude Oil (Brent, USD/bbl) and Copper (USD/tonne) starting from 2004 to 2017.
asx.data = read_csv("ASX_data.csv")
## Parsed with column specification:
## cols(
## `ASX price` = col_double(),
## `Gold price` = col_number(),
## `Crude Oil (Brent)_USD/bbl` = col_double(),
## `Copper_USD/tonne` = col_number()
## )
class(asx.data)
## [1] "spec_tbl_df" "tbl_df" "tbl" "data.frame"
head(asx.data)
## # A tibble: 6 x 4
## `ASX price` `Gold price` `Crude Oil (Brent)_USD/bbl` `Copper_USD/tonne`
## <dbl> <dbl> <dbl> <dbl>
## 1 2935. 612. 31.3 1650
## 2 2778. 603. 32.6 1682
## 3 2849. 566. 30.3 1656
## 4 2971. 539. 25.0 1588
## 5 2980. 549. 25.8 1651
## 6 3000. 536. 27.6 1685
Object asx.data
is a data frame and needed to be converted to time series.
asx = ts(asx.data$`ASX price`,start=c(2004,1), frequency=12)
gold = ts(asx.data$`Gold price`,start=c(2004,1), frequency=12)
oil = ts(asx.data$`Crude Oil (Brent)_USD/bbl`,start=c(2004,1), frequency=12)
copper = ts(asx.data$`Copper_USD/tonne`,start=c(2004,1), frequency=12)
asx.ts=ts(asx.data,start=c(2004,1), frequency=12)
Five time series were generated from data frame asx.data
.
plot(asx, xlab='Year', main = "Time series plot of monthly ASX price")
points(y=asx,x=time(asx), pch=as.vector(season(asx)))
Figure 1
From Figure 1
, we can analyse the time series plot of the monthly ASX all ordinaries (Ords) price index. There is an upward trend and an intervention point near 2008. There is no obvious seasonality. We can observe changing variance, moving average and succeeding points indicating autoregressive behaviour.
plot(gold, xlab='Year', main = "Time series plot of monthly gold price")
points(y=gold,x=time(gold), pch=as.vector(season(gold)))
Figure 2
From Figure 2
, we can analyse the time series plot of the monthly gold price. There is an upward trend and no obvious intervention point. There is no obvious seasonality. We can observe changing variance, moving average and succeeding points indicating autoregressive behaviour.
plot(oil, xlab='Year', main = "Time series plot of monthly oil price", ylab="USD/bbl")
points(y=oil,x=time(oil), pch=as.vector(season(oil)))
Figure 3
From Figure 3
, we can analyse the time series plot of the monthly oil price. There are trends and an intervention point near 2008. There is no obvious seasonality. We can observe changing variance, moving average and succeeding points indicating autoregressive behaviour.
plot(copper, xlab='Year', main = "Time series plot of monthly copper price", ylab="USD/tonne")
points(y=copper,x=time(copper), pch=as.vector(season(copper)))
Figure 4
From Figure 4
, we can analyse the time series plot of the monthly copper price. There are trends and an intervention point near 2008. There is no obvious seasonality. We can observe changing variance, moving average and succeeding points indicating autoregressive behaviour.
asx.scaled = scale(asx.ts)
plot(asx.scaled, plot.type="s",col = c("black", "red", "blue", "green"), main = "Monthly ASX data")
legend("topleft",lty=1, text.width =1.5, col=c("black", "red", "blue", "green"), c("ASX", "Gold", "Oil", "Copper"))
Figure 5
Figure 6
shows all four time series plotted together after they are scaled and centered.
acf(asx, lag.max = 48, main="Sample ACF for monthly ASX price")
Figure 6
adf.test(asx)
##
## Augmented Dickey-Fuller Test
##
## data: asx
## Dickey-Fuller = -2.6995, Lag order = 5, p-value = 0.2846
## alternative hypothesis: stationary
Figure 6
shows the ACF plot for monthly ASX price:
asx
.acf(gold, lag.max = 48, main="Sample ACF for monthly gold price")
Figure 7
adf.test(gold)
##
## Augmented Dickey-Fuller Test
##
## data: gold
## Dickey-Fuller = -1.8369, Lag order = 5, p-value = 0.6444
## alternative hypothesis: stationary
Figure 7
shows the ACF plot for monthly gold price:
gold
.acf(oil, lag.max=48, main="Sample ACF for monthly oil price")
Figure 8
adf.test(oil)
##
## Augmented Dickey-Fuller Test
##
## data: oil
## Dickey-Fuller = -1.8523, Lag order = 5, p-value = 0.6379
## alternative hypothesis: stationary
Figure 8
shows the ACF plot for monthly oil price:
oil
.acf(copper, lag.max = 48, main="Sample ACF for monthly copper price")
Figure 9
adf.test(copper)
##
## Augmented Dickey-Fuller Test
##
## data: copper
## Dickey-Fuller = -2.2502, Lag order = 5, p-value = 0.472
## alternative hypothesis: stationary
Figure 9
shows the ACF plot for monthly copper price:
copper
.Due to the existence of an intervention point, STL decomposition is used in the following code to decompose each time series. This is because STL decomposition is more robust to outliers, which can be a possible cause for the intervention point.
asx.decom <- stl(asx, t.window=15, s.window="periodic", robust=TRUE)
plot(asx.decom, main="STL decomposition of ASX price time series")
Figure 10
Figure 10
shows the STL decomposition of ASX price:
gold.decom <- stl(gold, t.window=15, s.window="periodic", robust=TRUE)
plot(gold.decom, main="STL decomposition of gold price time series")
Figure 11
Figure 11
shows the STL decomposition of ASX price:
gold
is caused by factors other than seasonality and trend.oil.decom <- stl(oil, t.window=15, s.window="periodic", robust=TRUE)
plot(oil.decom, main="STL decomposition of oil price time series")
Figure 12
Figure 12
shows the STL decomposition of ASX price:
copper.decom <- stl(copper, t.window=15, s.window="periodic", robust=TRUE)
plot(copper.decom, main="STL decomposition of copper price time series")
Figure 13
Figure 13
shows the STL decomposition of ASX price:
cor(asx.ts)
## ASX price Gold price Crude Oil (Brent)_USD/bbl
## ASX price 1.0000000 0.3431908 0.3290338
## Gold price 0.3431908 1.0000000 0.4366382
## Crude Oil (Brent)_USD/bbl 0.3290338 0.4366382 1.0000000
## Copper_USD/tonne 0.5617864 0.5364213 0.8664296
## Copper_USD/tonne
## ASX price 0.5617864
## Gold price 0.5364213
## Crude Oil (Brent)_USD/bbl 0.8664296
## Copper_USD/tonne 1.0000000
From the correlation matrix above:
Copper_USD/tonne
and Crude Oil (Brent)_USD/bbl
, therefore, they should not be used as independent variables at the same time.Copper_USD/tonne
and ASX price
.Copper_USD/tonne
and Gold price
should be used as independent variables with ASX price
being the dependent variable.colnames(asx.data)<-c("y","x1","x2","x3")
for ( i in 1:12){
model1.1 = dlm(formula=y~x1+x3, data=data.frame(asx.data), q = i )
cat("q = ", i, "AIC = ", AIC(model1.1$model), "BIC = ", BIC(model1.1$model),"\n")
}
## q = 1 AIC = 2577.989 BIC = 2596.44
## q = 2 AIC = 2564.871 BIC = 2589.423
## q = 3 AIC = 2551.583 BIC = 2582.209
## q = 4 AIC = 2538.084 BIC = 2574.759
## q = 5 AIC = 2523.864 BIC = 2566.562
## q = 6 AIC = 2509.866 BIC = 2558.561
## q = 7 AIC = 2495.447 BIC = 2550.112
## q = 8 AIC = 2481.403 BIC = 2542.012
## q = 9 AIC = 2467.268 BIC = 2533.793
## q = 10 AIC = 2454.097 BIC = 2526.512
## q = 11 AIC = 2440.381 BIC = 2518.658
## q = 12 AIC = 2426.737 BIC = 2510.848
The finite distributed lag model with the lowest AIC and BIC is the model with q=12.
model1 = dlm(formula=y~x1+x3, data=data.frame(asx.data), q = 12)
summary(model1)
##
## Call:
## lm(formula = as.formula(model.formula), data = design)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1207.47 -600.48 80.48 521.41 1620.51
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.013e+03 2.698e+02 14.872 <2e-16 ***
## x1.t -8.089e-01 1.290e+00 -0.627 0.532
## x1.1 2.345e-01 1.883e+00 0.125 0.901
## x1.2 1.681e-01 1.916e+00 0.088 0.930
## x1.3 1.664e-01 1.922e+00 0.087 0.931
## x1.4 -2.845e-01 1.910e+00 -0.149 0.882
## x1.5 3.780e-01 1.916e+00 0.197 0.844
## x1.6 -1.767e-01 1.946e+00 -0.091 0.928
## x1.7 5.962e-01 1.954e+00 0.305 0.761
## x1.8 -4.992e-01 1.932e+00 -0.258 0.797
## x1.9 1.807e-01 1.958e+00 0.092 0.927
## x1.10 -4.389e-03 1.971e+00 -0.002 0.998
## x1.11 5.442e-02 1.964e+00 0.028 0.978
## x1.12 2.967e-01 1.292e+00 0.230 0.819
## x3.t 1.306e-01 1.467e-01 0.890 0.375
## x3.1 2.777e-02 2.390e-01 0.116 0.908
## x3.2 4.097e-02 2.395e-01 0.171 0.864
## x3.3 1.752e-02 2.354e-01 0.074 0.941
## x3.4 4.280e-03 2.344e-01 0.018 0.985
## x3.5 -2.112e-02 2.379e-01 -0.089 0.929
## x3.6 3.132e-02 2.392e-01 0.131 0.896
## x3.7 2.104e-04 2.404e-01 0.001 0.999
## x3.8 -1.440e-02 2.428e-01 -0.059 0.953
## x3.9 -5.916e-02 2.445e-01 -0.242 0.809
## x3.10 -3.589e-02 2.426e-01 -0.148 0.883
## x3.11 1.175e-02 2.385e-01 0.049 0.961
## x3.12 -4.120e-02 1.510e-01 -0.273 0.785
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 762.4 on 122 degrees of freedom
## Multiple R-squared: 0.1935, Adjusted R-squared: 0.02168
## F-statistic: 1.126 on 26 and 122 DF, p-value: 0.3234
##
## AIC and BIC values for the model:
## AIC BIC
## 1 2426.737 2510.848
Finite distributed lag model model1
has q=12, there is no significant term at 5% level of significance. The adjusted R-squared for model1
is 0.02168, which means that this model only explains 2.168% of the variability in asx price. The overall model has a p value of 0.3234 (>0.05), therefore it is not significant.
checkresiduals(model1$model)
Figure 14
##
## Breusch-Godfrey test for serial correlation of order up to 30
##
## data: Residuals
## LM test = 138.19, df = 30, p-value = 7.96e-16
shapiro.test(residuals(model1$model))
##
## Shapiro-Wilk normality test
##
## data: residuals(model1$model)
## W = 0.95541, p-value = 0.0001003
Figure 14
shows the residual plots for model1
:
VIF.model1=vif(model1$model)
VIF.model1
## x1.t x1.1 x1.2 x1.3 x1.4 x1.5 x1.6
## 58.34747 126.11918 132.46580 135.37636 135.54139 138.29564 144.97845
## x1.7 x1.8 x1.9 x1.10 x1.11 x1.12 x3.t
## 147.77257 145.92043 150.61502 152.89970 151.69899 65.62082 18.38101
## x3.1 x3.2 x3.3 x3.4 x3.5 x3.6 x3.7
## 50.42609 52.36998 52.36940 53.81478 57.34063 59.91671 62.44331
## x3.8 x3.9 x3.10 x3.11 x3.12
## 65.51811 68.35053 69.10579 68.56418 28.14387
VIF.model1>10
## x1.t x1.1 x1.2 x1.3 x1.4 x1.5 x1.6 x1.7 x1.8 x1.9 x1.10 x1.11
## TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## x1.12 x3.t x3.1 x3.2 x3.3 x3.4 x3.5 x3.6 x3.7 x3.8 x3.9 x3.10
## TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## x3.11 x3.12
## TRUE TRUE
From the VIF values, it is obvious that the estimates from the finite distributed lag model are suffering from multicollinearity. To deal with multicollinearity, we can impose a polynomial shape on the lag distribution using polynomial distributed lag models.
Polynomial distributed lag models make estimations with one predictor. From the correlation matrix produced earlier, ASX price
has the highest correlation with Copper_USD/tonne
, so copper
will be used to fit polynomial distributed lag models.
model2.1 = polyDlm(x = as.vector(copper) , y = as.vector(asx), q = 12 , k = 2 , show.beta = TRUE)
## Estimates and t-tests for beta coefficients:
## Estimate Std. Error t value P(>|t|)
## beta.0 0.095900 0.02530 3.7900 2.26e-04
## beta.1 0.068900 0.01530 4.5000 1.46e-05
## beta.2 0.045400 0.00867 5.2300 6.16e-07
## beta.3 0.025300 0.00765 3.3000 1.23e-03
## beta.4 0.008600 0.01020 0.8400 4.02e-01
## beta.5 -0.004620 0.01250 -0.3700 7.12e-01
## beta.6 -0.014400 0.01330 -1.0800 2.81e-01
## beta.7 -0.020700 0.01250 -1.6600 9.91e-02
## beta.8 -0.023600 0.01010 -2.3300 2.14e-02
## beta.9 -0.023000 0.00718 -3.2100 1.67e-03
## beta.10 -0.019000 0.00761 -2.5000 1.36e-02
## beta.11 -0.011600 0.01420 -0.8170 4.15e-01
## beta.12 -0.000671 0.02410 -0.0278 9.78e-01
summary(model2.1)
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -1213.59 -610.66 26.99 585.36 1512.44
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.136e+03 2.163e+02 19.124 < 2e-16 ***
## z.t0 9.592e-02 2.532e-02 3.788 0.000222 ***
## z.t1 -2.872e-02 1.190e-02 -2.414 0.017022 *
## z.t2 1.723e-03 9.736e-04 1.770 0.078906 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 710.9 on 145 degrees of freedom
## Multiple R-squared: 0.1666, Adjusted R-squared: 0.1494
## F-statistic: 9.664 on 3 and 145 DF, p-value: 7.422e-06
Polynomial distributed lag model model2.1
has q=12 and k=2, z.t2 term is not significant at 5% level of significance. The adjusted R-squared for model2.1
is 0.1494, which means that this model explains 14.94% of the variability in asx price. The overall model has a p value of 7.422e-06 (< 0.05), therefore it is significant.
checkresiduals(model2.1$model)
Figure 15
##
## Breusch-Godfrey test for serial correlation of order up to 10
##
## data: Residuals
## LM test = 135.33, df = 10, p-value < 2.2e-16
shapiro.test(residuals(model2.1$model))
##
## Shapiro-Wilk normality test
##
## data: residuals(model2.1$model)
## W = 0.94909, p-value = 2.929e-05
Figure 15
shows the residual plots for mode2.1
:
Because z.t2 term is not significant at 5% level, we will adjust the value of k to 1:
model2.2 = polyDlm(x = as.vector(copper) , y = as.vector(asx), q = 12 , k = 1 , show.beta = TRUE)
## Estimates and t-tests for beta coefficients:
## Estimate Std. Error t value P(>|t|)
## beta.0 0.056200 0.01180 4.750 5.10e-06
## beta.1 0.048300 0.01000 4.810 3.91e-06
## beta.2 0.040400 0.00826 4.890 2.81e-06
## beta.3 0.032500 0.00652 4.980 1.90e-06
## beta.4 0.024600 0.00487 5.050 1.40e-06
## beta.5 0.016600 0.00341 4.880 2.85e-06
## beta.6 0.008710 0.00252 3.450 7.34e-04
## beta.7 0.000794 0.00282 0.282 7.79e-01
## beta.8 -0.007130 0.00405 -1.760 8.06e-02
## beta.9 -0.015000 0.00563 -2.670 8.39e-03
## beta.10 -0.023000 0.00733 -3.130 2.12e-03
## beta.11 -0.030900 0.00909 -3.400 8.91e-04
## beta.12 -0.038800 0.01090 -3.570 4.99e-04
summary(model2.2)
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -1246.73 -594.20 91.26 546.36 1614.17
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.219e+03 2.127e+02 19.838 < 2e-16 ***
## z.t0 5.624e-02 1.184e-02 4.748 4.85e-06 ***
## z.t1 -7.921e-03 1.848e-03 -4.285 3.29e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 716.1 on 146 degrees of freedom
## Multiple R-squared: 0.1486, Adjusted R-squared: 0.137
## F-statistic: 12.74 on 2 and 146 DF, p-value: 7.915e-06
Polynomial distributed lag model model2.2
has q=12 and k=1, all terms are significant at 5% level of significance. The adjusted R-squared for model2.2
is 0.137, which means that this model explains 13.7% of the variability in asx price. The overall model has a p value of 7.915e-06 (< 0.05), therefore it is significant.
checkresiduals(model2.2$model)
Figure 16
##
## Breusch-Godfrey test for serial correlation of order up to 10
##
## data: Residuals
## LM test = 134.92, df = 10, p-value < 2.2e-16
shapiro.test(residuals(model2.2$model))
##
## Shapiro-Wilk normality test
##
## data: residuals(model2.2$model)
## W = 0.95943, p-value = 0.000229
Figure 16
shows the residual plots for mode2.2
:
Koyck models makes estimation with one predictor. From the correlation matrix produced earlier, ASX price
has the highest correlation with Copper_USD/tonne
, so copper
will be used to fit polynomial distributed lag models.
model3 = koyckDlm(x = as.vector(copper) , y = as.vector(asx))
summary(model3)
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -689.64 -108.62 12.78 140.20 771.79
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 189.368812 87.644648 2.161 0.0322 *
## Y.1 0.971621 0.021895 44.376 <2e-16 ***
## X.t -0.005864 0.009517 -0.616 0.5387
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 201.9 on 157 degrees of freedom
## Multiple R-Squared: 0.9485, Adjusted R-squared: 0.9479
## Wald test: 1448 on 2 and 157 DF, p-value: < 2.2e-16
##
## alpha beta phi
## Geometric coefficients: 6672.885 -0.005863623 0.9716211
summary(model3$model, diagnostics=TRUE)
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -689.64 -108.62 12.78 140.20 771.79
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 189.368812 87.644648 2.161 0.0322 *
## Y.1 0.971621 0.021895 44.376 <2e-16 ***
## X.t -0.005864 0.009517 -0.616 0.5387
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 1 157 1966.87 < 2e-16 ***
## Wu-Hausman 1 156 10.97 0.00115 **
## Sargan 0 NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 201.9 on 157 degrees of freedom
## Multiple R-Squared: 0.9485, Adjusted R-squared: 0.9479
## Wald test: 1448 on 2 and 157 DF, p-value: < 2.2e-16
In Koyck model model3
, X.t term is not significant at 5% level of significance. The adjusted R-squared for model3
is 0.9479, which means that this model explains 94.79% of the variability in asx price. The overall model has a p value less than 2.2e-16 (< 0.05), therefore it is significant.
checkresiduals(model3$model)
Figure 17
shapiro.test(residuals(model3$model))
##
## Shapiro-Wilk normality test
##
## data: residuals(model3$model)
## W = 0.96188, p-value = 0.0002204
Figure 17
shows the residual plots for mode3
:
VIF.model3=vif(model3$model)
VIF.model3
## Y.1 X.t
## 1.493966 1.493966
VIF.model3>10
## Y.1 X.t
## FALSE FALSE
From the VIF values, it is obvious that there is no multicollinearity between the estimates from the Koyck model.
for (i in 1:5){
for(j in 1:5){
model4.1 = ardlDlm(formula=y~x1+x3, data=data.frame(asx.data), p = i , q = j )
cat("p = ", i, "q = ", j, "AIC = ", AIC(model4.1$model), "BIC = ", BIC(model4.1$model),"\n")
}
}
## p = 1 q = 1 AIC = 2135.427 BIC = 2156.954
## p = 1 q = 2 AIC = 2123.2 BIC = 2147.752
## p = 1 q = 3 AIC = 2108.165 BIC = 2135.728
## p = 1 q = 4 AIC = 2097.165 BIC = 2127.728
## p = 1 q = 5 AIC = 2086.767 BIC = 2120.315
## p = 2 q = 1 AIC = 2119.917 BIC = 2147.537
## p = 2 q = 2 AIC = 2121.523 BIC = 2152.212
## p = 2 q = 3 AIC = 2107.82 BIC = 2141.508
## p = 2 q = 4 AIC = 2096.569 BIC = 2133.244
## p = 2 q = 5 AIC = 2086.252 BIC = 2125.9
## p = 3 q = 1 AIC = 2109.717 BIC = 2143.406
## p = 3 q = 2 AIC = 2110.906 BIC = 2147.657
## p = 3 q = 3 AIC = 2109.982 BIC = 2149.795
## p = 3 q = 4 AIC = 2098.603 BIC = 2141.39
## p = 3 q = 5 AIC = 2088.342 BIC = 2134.09
## p = 4 q = 1 AIC = 2100.046 BIC = 2139.777
## p = 4 q = 2 AIC = 2101.141 BIC = 2143.929
## p = 4 q = 3 AIC = 2100.586 BIC = 2146.43
## p = 4 q = 4 AIC = 2102.043 BIC = 2150.943
## p = 4 q = 5 AIC = 2091.733 BIC = 2143.58
## p = 5 q = 1 AIC = 2089.626 BIC = 2135.374
## p = 5 q = 2 AIC = 2090.472 BIC = 2139.27
## p = 5 q = 3 AIC = 2089.878 BIC = 2141.726
## p = 5 q = 4 AIC = 2091.428 BIC = 2146.326
## p = 5 q = 5 AIC = 2092.86 BIC = 2150.807
Model with p=1 and q=5 has the lowest BIC value, model with p=2 and q=5 has the lowest AIC value. So we will fit and analyse both autoregressive distributed lag models.
model4 = ardlDlm(x = as.vector(gold)+as.vector(copper) , y = as.vector(asx), p = 1 , q = 5)
summary(model4)
##
## Time series regression with "ts" data:
## Start = 6, End = 161
##
## Call:
## dynlm(formula = as.formula(model.text), data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -697.99 -118.34 -1.29 128.05 740.42
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 230.11210 93.84169 2.452 0.0154 *
## X.t 0.08736 0.03374 2.589 0.0106 *
## X.1 -0.08527 0.03353 -2.543 0.0120 *
## Y.1 0.99209 0.08131 12.202 <2e-16 ***
## Y.2 0.10367 0.11613 0.893 0.3734
## Y.3 -0.07051 0.11536 -0.611 0.5420
## Y.4 -0.03616 0.11504 -0.314 0.7537
## Y.5 -0.03771 0.08013 -0.471 0.6386
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 197.4 on 148 degrees of freedom
## Multiple R-squared: 0.9471, Adjusted R-squared: 0.9446
## F-statistic: 378.4 on 7 and 148 DF, p-value: < 2.2e-16
Autoregressive distributed lag model model4
has q=5 and p=1, y.2, y.3, y.4 and y.5 terms are not significant at 5% level of significance. The adjusted R-squared for model4
is 0.9446, which means that this model explains 94.46% of the variability in asx price. The overall model has a p value less than 2.2e-16 (< 0.05), therefore it is significant.
checkresiduals(model4$model)
Figure 18
##
## Breusch-Godfrey test for serial correlation of order up to 11
##
## data: Residuals
## LM test = 7.4059, df = 11, p-value = 0.7653
shapiro.test(residuals(model4$model))
##
## Shapiro-Wilk normality test
##
## data: residuals(model4$model)
## W = 0.98292, p-value = 0.05082
Figure 18
shows the residual plots for mode4
:
VIF.model4=vif(model4$model)
VIF.model4
## X.t L(X.t, 1) L(y.t, 1) L(y.t, 2) L(y.t, 3) L(y.t, 4) L(y.t, 5)
## 22.54217 23.02864 18.95442 39.46522 39.88493 40.76036 20.21224
VIF.model4>10
## X.t L(X.t, 1) L(y.t, 1) L(y.t, 2) L(y.t, 3) L(y.t, 4) L(y.t, 5)
## TRUE TRUE TRUE TRUE TRUE TRUE TRUE
From the VIF values, it is obvious that the estimates from autoregressive lag model model4
are suffering from multicollinearity.
model4.2 = ardlDlm(x = as.vector(gold)+as.vector(copper) , y = as.vector(asx), p = 2 , q = 5)
summary(model4.2)
##
## Time series regression with "ts" data:
## Start = 6, End = 161
##
## Call:
## dynlm(formula = as.formula(model.text), data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -681.40 -110.43 -0.93 116.08 743.54
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 228.884879 92.828888 2.466 0.0148 *
## X.t 0.067284 0.034766 1.935 0.0549 .
## X.1 0.004915 0.054888 0.090 0.9288
## X.2 -0.072403 0.035107 -2.062 0.0409 *
## Y.1 0.962414 0.081705 11.779 <2e-16 ***
## Y.2 0.107337 0.114886 0.934 0.3517
## Y.3 -0.032457 0.115591 -0.281 0.7793
## Y.4 -0.047478 0.113932 -0.417 0.6775
## Y.5 -0.034857 0.079276 -0.440 0.6608
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 195.3 on 147 degrees of freedom
## Multiple R-squared: 0.9486, Adjusted R-squared: 0.9458
## F-statistic: 338.9 on 8 and 147 DF, p-value: < 2.2e-16
Autoregressive distributed lag model model4.2
has q=5 and p=2, xt, x1, y.2, y.3, y.4 and y.5 terms are not significant at 5% level of significance. The adjusted R-squared for model4.2
is 0.9458, which means that this model explains 94.58% of the variability in asx price. The overall model has a p value less than 2.2e-16 (< 0.05), therefore it is significant.
checkresiduals(model4.2$model)
Figure 19
##
## Breusch-Godfrey test for serial correlation of order up to 12
##
## data: Residuals
## LM test = 5.167, df = 12, p-value = 0.9522
shapiro.test(residuals(model4.2$model))
##
## Shapiro-Wilk normality test
##
## data: residuals(model4.2$model)
## W = 0.98224, p-value = 0.04221
Figure 19
shows the residual plots for mode4
:
VIF.model4.2=vif(model4.2$model)
VIF.model4.2
## X.t L(X.t, 1) L(X.t, 2) L(y.t, 1) L(y.t, 2) L(y.t, 3) L(y.t, 4)
## 24.45970 63.04721 26.65834 19.56129 39.47467 40.92768 40.85521
## L(y.t, 5)
## 20.21840
VIF.model4.2>10
## X.t L(X.t, 1) L(X.t, 2) L(y.t, 1) L(y.t, 2) L(y.t, 3) L(y.t, 4)
## TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## L(y.t, 5)
## TRUE
From the VIF values, it is obvious that the estimates from autoregressive lag model model4.2
are suffering from multicollinearity.
Due the the existence of multicollinearity in autoregressive distributed lag models with two predictors, we will remove the predictor with lower correlation with the response variable and only use copper
.
for (i in 1:5){
for(j in 1:5){
model5.1 = ardlDlm(formula=y~x3, data=data.frame(asx.data), p = i , q = j )
cat("p = ", i, "q = ", j, "AIC = ", AIC(model5.1$model), "BIC = ", BIC(model5.1$model),"\n")
}
}
## p = 1 q = 1 AIC = 2147.741 BIC = 2163.116
## p = 1 q = 2 AIC = 2135.4 BIC = 2153.813
## p = 1 q = 3 AIC = 2121.12 BIC = 2142.558
## p = 1 q = 4 AIC = 2109.759 BIC = 2134.209
## p = 1 q = 5 AIC = 2099.056 BIC = 2126.505
## p = 2 q = 1 AIC = 2130.043 BIC = 2148.456
## p = 2 q = 2 AIC = 2132.038 BIC = 2153.52
## p = 2 q = 3 AIC = 2119.241 BIC = 2143.741
## p = 2 q = 4 AIC = 2107.649 BIC = 2135.155
## p = 2 q = 5 AIC = 2097.021 BIC = 2127.52
## p = 3 q = 1 AIC = 2117.307 BIC = 2138.745
## p = 3 q = 2 AIC = 2119.247 BIC = 2143.748
## p = 3 q = 3 AIC = 2119.696 BIC = 2147.259
## p = 3 q = 4 AIC = 2108.537 BIC = 2139.1
## p = 3 q = 5 AIC = 2097.832 BIC = 2131.38
## p = 4 q = 1 AIC = 2105.916 BIC = 2130.366
## p = 4 q = 2 AIC = 2107.774 BIC = 2135.28
## p = 4 q = 3 AIC = 2108.608 BIC = 2139.17
## p = 4 q = 4 AIC = 2110.085 BIC = 2143.704
## p = 4 q = 5 AIC = 2099.454 BIC = 2136.052
## p = 5 q = 1 AIC = 2095.118 BIC = 2122.566
## p = 5 q = 2 AIC = 2096.96 BIC = 2127.459
## p = 5 q = 3 AIC = 2097.887 BIC = 2131.436
## p = 5 q = 4 AIC = 2099.497 BIC = 2136.095
## p = 5 q = 5 AIC = 2101.419 BIC = 2141.067
Model with p=1 and q=5 has the lowest BIC value, model with p=2 and q=5 has the lowest AIC value. So we will fit and analyse both autoregressive distributed lag models.
model5 = ardlDlm(x = as.vector(copper) , y = as.vector(asx), p = 1 , q = 5)
summary(model5)
##
## Time series regression with "ts" data:
## Start = 6, End = 161
##
## Call:
## dynlm(formula = as.formula(model.text), data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -699.82 -120.15 -3.11 126.37 735.99
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 225.77135 93.06873 2.426 0.01648 *
## X.t 0.10123 0.03350 3.022 0.00296 **
## X.1 -0.09988 0.03343 -2.987 0.00330 **
## Y.1 0.98307 0.08091 12.151 < 2e-16 ***
## Y.2 0.11625 0.11549 1.007 0.31577
## Y.3 -0.07509 0.11447 -0.656 0.51284
## Y.4 -0.04030 0.11417 -0.353 0.72457
## Y.5 -0.03005 0.07947 -0.378 0.70585
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 195.9 on 148 degrees of freedom
## Multiple R-squared: 0.9479, Adjusted R-squared: 0.9455
## F-statistic: 384.8 on 7 and 148 DF, p-value: < 2.2e-16
Autoregressive distributed lag model model5
has q=5 and p=1, y.2, y.3, y.4 and y.5 terms are not significant at 5% level of significance. The adjusted R-squared for model5
is 0.9455, which means that this model explains 94.55% of the variability in asx price. The overall model has a p value less than 2.2e-16 (< 0.05), therefore it is significant.
checkresiduals(model5$model)
Figure 20
##
## Breusch-Godfrey test for serial correlation of order up to 11
##
## data: Residuals
## LM test = 6.8037, df = 11, p-value = 0.8148
shapiro.test(residuals(model5$model))
##
## Shapiro-Wilk normality test
##
## data: residuals(model5$model)
## W = 0.98354, p-value = 0.06027
Figure 20
shows the residual plots for mode5
:
VIF.model5=vif(model5$model)
VIF.model5
## X.t L(X.t, 1) L(y.t, 1) L(y.t, 2) L(y.t, 3) L(y.t, 4) L(y.t, 5)
## 18.41423 18.91724 19.07060 39.66170 39.90695 40.78752 20.19991
VIF.model5>10
## X.t L(X.t, 1) L(y.t, 1) L(y.t, 2) L(y.t, 3) L(y.t, 4) L(y.t, 5)
## TRUE TRUE TRUE TRUE TRUE TRUE TRUE
From the VIF values, it is obvious that the estimates from autoregressive lag model model5
are suffering from multicollinearity.
model5.2 = ardlDlm(x = as.vector(copper) , y = as.vector(asx), p = 2 , q = 5)
summary(model5.2)
##
## Time series regression with "ts" data:
## Start = 6, End = 161
##
## Call:
## dynlm(formula = as.formula(model.text), data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -686.09 -112.80 -4.78 119.10 740.10
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 224.51572 92.18701 2.435 0.0161 *
## X.t 0.08076 0.03478 2.322 0.0216 *
## X.1 -0.01276 0.05538 -0.230 0.8181
## X.2 -0.06969 0.03551 -1.963 0.0516 .
## Y.1 0.95019 0.08187 11.606 <2e-16 ***
## Y.2 0.12115 0.11442 1.059 0.2914
## Y.3 -0.03346 0.11535 -0.290 0.7722
## Y.4 -0.05190 0.11324 -0.458 0.6474
## Y.5 -0.02806 0.07872 -0.356 0.7220
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 194 on 147 degrees of freedom
## Multiple R-squared: 0.9492, Adjusted R-squared: 0.9465
## F-statistic: 343.7 on 8 and 147 DF, p-value: < 2.2e-16
Autoregressive distributed lag model model5.2
has q=5 and p=2, x1, y.2, y.3, y.4 and y.5 terms are not significant at 5% level of significance. The adjusted R-squared for model5
is 0.9465, which means that this model explains 94.65% of the variability in asx price. The overall model has a p value less than 2.2e-16 (< 0.05), therefore it is significant.
checkresiduals(model5.2$model)
Figure 21
##
## Breusch-Godfrey test for serial correlation of order up to 12
##
## data: Residuals
## LM test = 4.5804, df = 12, p-value = 0.9705
shapiro.test(residuals(model5.2$model))
##
## Shapiro-Wilk normality test
##
## data: residuals(model5.2$model)
## W = 0.98239, p-value = 0.04397
Figure 21
shows the residual plots for mode4
:
VIF.model5.2=vif(model5$model)
VIF.model5.2
## X.t L(X.t, 1) L(y.t, 1) L(y.t, 2) L(y.t, 3) L(y.t, 4) L(y.t, 5)
## 18.41423 18.91724 19.07060 39.66170 39.90695 40.78752 20.19991
VIF.model5.2>10
## X.t L(X.t, 1) L(y.t, 1) L(y.t, 2) L(y.t, 3) L(y.t, 4) L(y.t, 5)
## TRUE TRUE TRUE TRUE TRUE TRUE TRUE
From the VIF values, it is obvious that the estimates from autoregressive lag model model5.2
are suffering from multicollinearity.
In conclusion, the best distributed lag model (DLM) for forecasting the ASX price index is Koyck model model3
with copper
as the predictor. Most of the terms are significant and the overall model is significant. The adjusted R-squared for model3
is moderatly high (0.9479). The model at the first stage of least squares fitting is significant at 5% level of significance and there is significant correlation between the error term and the lagged dependent variable. The residual check shows no violation in most assumptions but normality, which can be forgiven.