The aim of this investigation is to analyse Australian Securities Exchange (ASX) data. To answer the research question, we explore and elaborate on the following:
The existence of nonstationarity in the dataset
The impact of the time series components on the given dataset
The most accurate and suitable distributed lag model for the ASX price index
The asx dataset used in this investigation contains monthly averages of the ASX All Ordinaries (All Ords) Price Index, the gold price (AUD), the crude oil price (Brent, USD/bbl) and the copper price (USD/tonne) from January 2004 to May 2017.
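The analysis relies on several R packages. The list below is inferred from the functions called throughout this report; the exact set loaded in the original session may differ slightly.
library(readr)     # read_csv()
library(dLagM)     # dlm(), finiteDLMauto(), polyDlm(), koyckDlm(), ardlDlm()
library(tseries)   # adf.test()
library(x12)       # x12()
library(forecast)  # checkresiduals()
library(lmtest)    # bgtest()
library(car)       # vif()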
The following code chunk imports the dataset as well as displays its first observations.
asx = read_csv("ASX_data.csv")
## Parsed with column specification:
## cols(
## Month = col_character(),
## `ASX price` = col_double(),
## `Gold price` = col_number(),
## `Crude Oil (Brent)_USD/bbl` = col_double(),
## `Copper_USD/tonne` = col_number()
## )
head(asx)
## # A tibble: 6 x 5
## Month `ASX price` `Gold price` `Crude Oil (Brent)_USD… `Copper_USD/tonn…
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Jan-04 2935. 612. 31.3 1650
## 2 Feb-04 2778. 603. 32.6 1682
## 3 Mar-04 2849. 566. 30.3 1656
## 4 Apr-04 2971. 539. 25.0 1588
## 5 May-04 2980. 549. 25.8 1651
## 6 Jun-04 3000. 536. 27.6 1685
The following code extracts the variables in our asx data frame and converts them to monthly time series objects.
ords_ts = ts(asx$`ASX price`, start = c(2004, 1), frequency = 12)
gold_ts = ts(asx$`Gold price`, start = c(2004, 1), frequency = 12)
oil_ts = ts(asx$`Crude Oil (Brent)_USD/bbl`, start = c(2004, 1), frequency = 12)
copper_ts = ts(asx$`Copper_USD/tonne`, start = c(2004, 1), frequency = 12)
data = ts(asx[,2:5])
We first look at the time series plots and describe each series separately. The following code creates a time series plot of the All Ords price index.
plot(ords_ts, type = "o", main = "Time Series plot of All Ordinaries price index")
Figure 1
From the plot in Figure 1, we can observe that the All Ords price index series has an upward trend over the given period, with no seasonality and no changing variance. There is a potential intervention point around 2006 and a clear intervention around 2008. The way successive observations follow one another and fluctuate around the mean level suggests autoregressive and moving average behaviour.
The following code creates a time series plot of gold price series.
plot(gold_ts, type = "o", main = "Time Series plot of Gold price", ylab = "AUD", xlab="Year")
Figure 2
The plot in Figure 2 shows a general upward trend over the observed period, with no seasonality and no changing variance in the series. There are no obvious intervention points. Successive observations and fluctuations around the mean level imply the existence of autoregressive and moving average behaviour.
The following code creates a time series plot of crude oil price series.
plot(oil_ts, type = "o", main = "Time Series plot of Crude oil price", ylab = "USD", xlab="Year")
Figure 3
From the plot in Figure 3, any trend in the series is not obvious. There is no seasonality and no changing variance present. There is an intervention point around 2008. Successive observation points and fluctuations around the mean level suggest the existence of autoregressive and moving average behaviour in the series.
The following code produces a time series plot of copper price series.
plot(copper_ts, type = "o", main = "Time Series plot of Copper price", ylab = "USD", xlab="Year")
Figure 4
The plot in Figure 4 shows a general trend in the series, with no seasonality and no changing variance present. There is an intervention point around 2008, as in the All Ordinaries and oil price series. Successive points as well as fluctuations around the mean level suggest autoregressive and moving average behaviour.
To further explore the relationship between the All Ords index and the explanatory series, we display them within the same plot. Standardisation (centering and scaling) is applied to all variables so that they can be plotted clearly on the same scale.
data_scale = scale(data)
plot(data_scale, plot.type = "s", col = c("black", "palegreen", "coral2", "skyblue2"), main = "Time Series plot of standardised ASX data")
legend("topleft",lty=1, text.width = 15, col=c("black", "palegreen", "coral2", "skyblue2"), c("Ords", "Gold", "Oil", "Copper"))
Figure 5
From the plot in Figure 5, we can observe that at the beginning of the period the All Ords index, oil and copper prices followed a similar trend, but after the suspected intervention the All Ords index gradually moved in the opposite direction to the oil and copper price series. The oil and copper price series also appear to move closely together.
The following code creates a correlation matrix for our data.
cor(data)
## ASX price Gold price Crude Oil (Brent)_USD/bbl
## ASX price 1.0000000 0.3431908 0.3290338
## Gold price 0.3431908 1.0000000 0.4366382
## Crude Oil (Brent)_USD/bbl 0.3290338 0.4366382 1.0000000
## Copper_USD/tonne 0.5617864 0.5364213 0.8664296
## Copper_USD/tonne
## ASX price 0.5617864
## Gold price 0.5364213
## Crude Oil (Brent)_USD/bbl 0.8664296
## Copper_USD/tonne 1.0000000
There is a moderate positive correlation between the dependent series, the All Ords index, and the copper price series (r = 0.56). There is also a strong positive correlation between the oil and copper price series (r = 0.87), which supports the inference from the plot in Figure 5.
To explore the existence of nonstationarity in the dataset, we create sample ACF plots and conduct Augmented Dickey-Fuller (ADF) unit root tests of the \(H_0\) that the process is difference nonstationary. The ar() function is used to choose the augmentation order k for each ADF test. The following code displays the ACF plots and the ADF test results for all series.
acf(ords_ts)
Figure 6
ar(ords_ts)
##
## Call:
## ar(x = ords_ts)
##
## Coefficients:
## 1 2
## 1.0726 -0.1208
##
## Order selected 2 sigma^2 estimated as 67176
adf.test(ords_ts, k = 2)
##
## Augmented Dickey-Fuller Test
##
## data: ords_ts
## Dickey-Fuller = -2.442, Lag order = 2, p-value = 0.392
## alternative hypothesis: stationary
acf(gold_ts)
Figure 7
ar(gold_ts)
##
## Call:
## ar(x = gold_ts)
##
## Coefficients:
## 1
## 0.9803
##
## Order selected 1 sigma^2 estimated as 6329
adf.test(gold_ts, k = 1)
##
## Augmented Dickey-Fuller Test
##
## data: gold_ts
## Dickey-Fuller = -2.2824, Lag order = 1, p-value = 0.4585
## alternative hypothesis: stationary
acf(oil_ts)
Figure 8
ar(oil_ts)
##
## Call:
## ar(x = oil_ts)
##
## Coefficients:
## 1 2 3
## 1.2243 -0.1493 -0.1186
##
## Order selected 3 sigma^2 estimated as 48.96
adf.test(oil_ts, k = 3)
##
## Augmented Dickey-Fuller Test
##
## data: oil_ts
## Dickey-Fuller = -2.0703, Lag order = 3, p-value = 0.547
## alternative hypothesis: stationary
acf(copper_ts)
Figure 9
ar(copper_ts)
##
## Call:
## ar(x = copper_ts)
##
## Coefficients:
## 1 2
## 1.1756 -0.2228
##
## Order selected 2 sigma^2 estimated as 331275
adf.test(copper_ts, k = 2)
##
## Augmented Dickey-Fuller Test
##
## data: copper_ts
## Dickey-Fuller = -2.3847, Lag order = 2, p-value = 0.4159
## alternative hypothesis: stationary
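For reference, adf.test fits a test regression that includes an intercept and a linear trend,
\[
\Delta y_t = \alpha + \beta t + \gamma\, y_{t-1} + \sum_{j=1}^{k} \delta_j \Delta y_{t-j} + e_t ,
\]
and tests \(H_0: \gamma = 0\) (a unit root, i.e. difference nonstationarity) against the alternative \(H_1: \gamma < 0\) (stationarity).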
Based on the slowly decaying pattern of significant lags in the sample ACF plots in Figures 6-9, we can conclude that all the explored series have a trend. The ADF tests report p-values > 0.05 for all series, so we fail to reject the \(H_0\) of difference nonstationarity at the 5% level.
Overall, from the descriptive analysis of the time series plots, the sample ACF plots and the ADF test results, we conclude that nonstationarity is present in the asx data.
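Although differencing is not pursued further in this report, the short sketch below illustrates how one would typically check that first differencing removes the stochastic trend; the code is illustrative and its output is not shown here.
ords_diff = diff(ords_ts)   # first difference of the All Ords index
plot(ords_diff, type = "o", main = "First difference of All Ords price index")
adf.test(ords_diff)         # re-test for a unit root on the differenced series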
Decomposition of a series into its trend and seasonal components is another important step. The STL and X12 decomposition approaches were applied to extract the time series components.
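In the additive case, which is reasonable here since no changing variance was observed, each series is represented as
\[
Y_t = T_t + S_t + R_t ,
\]
where \(T_t\) is the trend-cycle, \(S_t\) the seasonal component and \(R_t\) the remainder.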
stl1 = stl(ords_ts, t.window = 15, s.window = "periodic", robust = T)
plot(stl1, main = "STL decomposition of All ords price index series")
Figure 10 (STL)
fit1 = x12(ords_ts)
plot(fit1, sa = T, trend = T, main = "X12 decomposition of All ords price index series")
Figure 10 (X12)
From the remainder series of the All Ords decomposition in Figure 10, it is observed that the spikes in the raw data are caused by other external factors; they occur around the intervention point. Since we did not find any evidence of seasonality in the time series and ACF plots, the seasonal pattern from the STL decomposition is not meaningful. From the X12 decomposition we can observe that the seasonally adjusted data is very close to the original series.
stl2 = stl(gold_ts, t.window = 15, s.window = "periodic", robust = T)
plot(stl2, main = "STL decomposition of Gold price series")
Figure 11 (STL)
fit2 = x12(gold_ts)
plot(fit2, sa = T, trend = T, main = "X12 decomposition of Gold price series")
Figure 11 (X12)
There was no seasonal effect found in the gold price series at the data visualisation stage, so the seasonally adjusted data in X12 decomposition is very close to the original series, and the seasonal pattern in STL decomposition is meaningless (Figure 11). The remainder component of this series is not smooth at all, meaning there are other unknown factors that have an impact on the series.
stl3 = stl(oil_ts, t.window = 15, s.window = "periodic", robust = T)
plot(stl3, main = "STL decomposition of Oil price series")
Figure 12 (STL)
fit3 = x12(oil_ts)
plot(fit3, sa = T, trend = T, main = "X12 decomposition of Oil price series")
Figure 12 (X12)
The oil price series shows the same absence of a seasonal effect as the previously explored series. From the STL decomposition in Figure 12, it is observed that the remainder has a major peak around the intervention, but is otherwise rather smooth.
stl4 = stl(copper_ts, t.window = 15, s.window = "periodic", robust = T)
plot(stl4, main = "STL decomposition of Copper price series")
Figure 13 (STL)
fit4 = x12(copper_ts)
plot(fit4, sa = T, trend = T, main = "X12 decomposition of Copper price series")
Figure 13 (X12)
The conclusions from the decomposition of the copper price data are similar to those for the previously analysed series. There are ups and downs around the intervention in the remainder of the series (Figure 13). No seasonality was found prior to decomposition, so there is no seasonal effect here.
Overall, it is observed that there is no seasonal effect in the asx data, and the fluctuations are due to other external factors.
In the modelling process we attempt to find the most appropriate model for the All Ordinaries price index. For the finite and autoregressive distributed lag models we use the gold and copper price series as predictors. These predictors were chosen based on their correlations with each other and with the dependent variable. Because the oil and copper price series were found to be strongly correlated, we do not use both of them for modelling. The copper price series is used for the polynomial DLM and the Koyck model because it has the stronger correlation with the dependent series and is therefore more likely to explain its variation and correlation structure.
To specify the finite lag length for the model, we use a loop that compares the AIC and BIC values of finite DLMs for lag lengths from 1 to 20.
df = asx[,2:5]
colnames(df) <- c("y", "x1", "x2", "x3")   # y = All Ords, x1 = gold, x2 = oil, x3 = copper
# compare AIC/BIC of finite DLMs over lag lengths 1 to 20
for (i in 1:20) {
  model1.1 = dlm(formula = y ~ x1 + x3, data = data.frame(df), q = i)
  cat("q = ", i, "AIC = ", AIC(model1.1$model), "BIC = ", BIC(model1.1$model), "\n")
}
## q = 1 AIC = 2577.989 BIC = 2596.44
## q = 2 AIC = 2564.871 BIC = 2589.423
## q = 3 AIC = 2551.583 BIC = 2582.209
## q = 4 AIC = 2538.084 BIC = 2574.759
## q = 5 AIC = 2523.864 BIC = 2566.562
## q = 6 AIC = 2509.866 BIC = 2558.561
## q = 7 AIC = 2495.447 BIC = 2550.112
## q = 8 AIC = 2481.403 BIC = 2542.012
## q = 9 AIC = 2467.268 BIC = 2533.793
## q = 10 AIC = 2454.097 BIC = 2526.512
## q = 11 AIC = 2440.381 BIC = 2518.658
## q = 12 AIC = 2426.737 BIC = 2510.848
## q = 13 AIC = 2411.621 BIC = 2501.537
## q = 14 AIC = 2395.252 BIC = 2490.946
## q = 15 AIC = 2377.833 BIC = 2479.276
## q = 16 AIC = 2359.944 BIC = 2467.107
## q = 17 AIC = 2343.076 BIC = 2455.929
## q = 18 AIC = 2325.554 BIC = 2444.068
## q = 19 AIC = 2307.432 BIC = 2431.577
## q = 20 AIC = 2287.756 BIC = 2417.502
model1 = dlm(formula = y ~ x1 + x3, data = data.frame(df), q = 10)
summary(model1)
##
## Call:
## lm(formula = as.formula(model.formula), data = design)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1086.02 -639.88 61.36 543.00 1626.62
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.840e+03 2.555e+02 15.033 <2e-16 ***
## x1.t -6.761e-01 1.266e+00 -0.534 0.594
## x1.1 3.617e-01 1.839e+00 0.197 0.844
## x1.2 1.218e-01 1.878e+00 0.065 0.948
## x1.3 7.260e-02 1.894e+00 0.038 0.969
## x1.4 -2.534e-01 1.893e+00 -0.134 0.894
## x1.5 4.528e-01 1.887e+00 0.240 0.811
## x1.6 -2.478e-01 1.879e+00 -0.132 0.895
## x1.7 5.263e-01 1.892e+00 0.278 0.781
## x1.8 -3.294e-01 1.889e+00 -0.174 0.862
## x1.9 9.934e-02 1.863e+00 0.053 0.958
## x1.10 1.881e-01 1.243e+00 0.151 0.880
## x3.t 1.504e-01 1.418e-01 1.061 0.291
## x3.1 2.232e-02 2.301e-01 0.097 0.923
## x3.2 2.992e-02 2.315e-01 0.129 0.897
## x3.3 2.537e-02 2.275e-01 0.112 0.911
## x3.4 1.595e-02 2.269e-01 0.070 0.944
## x3.5 -2.686e-02 2.334e-01 -0.115 0.909
## x3.6 2.257e-02 2.335e-01 0.097 0.923
## x3.7 -7.365e-04 2.347e-01 -0.003 0.998
## x3.8 -1.282e-03 2.384e-01 -0.005 0.996
## x3.9 -4.984e-02 2.345e-01 -0.213 0.832
## x3.10 -7.532e-02 1.470e-01 -0.512 0.609
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 758.2 on 128 degrees of freedom
## Multiple R-squared: 0.2145, Adjusted R-squared: 0.07954
## F-statistic: 1.589 on 22 and 128 DF, p-value: 0.05855
##
## AIC and BIC values for the model:
## AIC BIC
## 1 2454.097 2526.512
vif(model1$model)
## x1.t x1.1 x1.2 x1.3 x1.4 x1.5 x1.6
## 59.59233 127.47872 134.39220 138.52489 140.37920 141.41274 142.08644
## x1.7 x1.8 x1.9 x1.10 x3.t x3.1 x3.2
## 145.58579 146.08078 142.23279 63.30170 18.58221 50.64862 53.07056
## x3.3 x3.4 x3.5 x3.6 x3.7 x3.8 x3.9
## 53.02948 54.55691 59.57562 61.50293 64.07991 67.86182 67.31767
## x3.10
## 27.10904
# Diagnostic helper: residual plots plus formal tests of serial correlation and normality.
residualcheck = function(x){
  checkresiduals(x)          # residual plots and Breusch-Godfrey serial correlation test
  bgtest(x)                  # not printed inside the function; checkresiduals() above already reports it
  shapiro.test(x$residuals)  # Shapiro-Wilk normality test (returned and printed)
}
residualcheck(model1$model)
Figure 14
##
## Breusch-Godfrey test for serial correlation of order up to 26
##
## data: Residuals
## LM test = 140.85, df = 26, p-value < 2.2e-16
##
## Shapiro-Wilk normality test
##
## data: x$residuals
## W = 0.94326, p-value = 8.803e-06
It is observed that the information criteria keep decreasing as the lag length q increases over the whole range, so they do not point to a clear optimum. To keep the model reasonably parsimonious and preserve degrees of freedom, we fit a finite DLM with q = 10 lags.
According to the significance tests of the model coefficients in the summary, none of the lag weights of the predictors is statistically significant at the 5% level. Consistent with this, the adjusted \(R^2\) is only about 8%, which is very low. The F-test of overall significance reports that the model is not statistically significant at the 5% level (p-value > 0.05). We can therefore conclude that the model is not a good fit to the data.
All VIF values exceed 10, so the model suffers from severe multicollinearity.
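For reference, the variance inflation factor of predictor \(j\) is
\[
\mathrm{VIF}_j = \frac{1}{1 - R_j^2} ,
\]
where \(R_j^2\) is the \(R^2\) from regressing that predictor on all the other predictors; values above 10 are commonly taken to indicate serious multicollinearity.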
The residualcheck function was created to apply the same diagnostic checks to each model. It displays residual analysis plots and performs the Breusch-Godfrey test for serial correlation and the Shapiro-Wilk normality test on the residuals.
The diagnostic plots in Figure 14 show that the residuals are not randomly distributed and clearly retain a trend. The ACF plot shows serial correlation in the residuals, and the Breusch-Godfrey test confirms this at the 5% significance level. The histogram and the Shapiro-Wilk test (p-value < 0.05) indicate that the normality assumption does not hold.
Overall, we can conclude that the finite DLM of lag 10 is not appropriate for further analysis.
To deal with the multicollinearity problem in the finite DLM, we restrict the lag weights to lie on a polynomial curve. We specify the optimal lag length using a function that fits finite DLMs for lag lengths from 1 to 10 and orders the fitted models by their AIC values.
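In a polynomial (Almon) DLM of order \(k\), the lag weights are restricted to lie on a polynomial in the lag index; with \(k = 1\), as used below, this means
\[
\beta_s = \gamma_0 + \gamma_1 s , \qquad s = 0, 1, \ldots, q ,
\]
so only two parameters (\(\gamma_0\) and \(\gamma_1\), reported as the coefficients of z.t0 and z.t1 in the model summary) are estimated in place of \(q + 1\) unrestricted lag weights.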
The following code fits the polynomial DLM of the All Ordinaries price index on the copper price series and performs residual analysis.
finiteDLMauto(x = as.vector(df$x3), y = as.vector(df$y), q.min = 1, q.max = 10, k.order = 1,
model.type = "poly", error.type ="AIC", trace = TRUE)
## q - k MASE AIC BIC R.Adj.Sq Ljung-Box
## 10 10 - 1 3.96781 2419.397 2431.466 0.17549 0
## 9 9 - 1 4.00168 2435.953 2448.048 0.19025 0
## 8 8 - 1 4.03309 2453.342 2465.464 0.20244 0
## 7 7 - 1 4.07118 2470.649 2482.797 0.21326 0
## 6 6 - 1 4.09428 2488.104 2500.277 0.22488 0
## 5 5 - 1 4.11367 2505.646 2517.846 0.23776 0
## 4 4 - 1 4.15203 2523.100 2535.325 0.25039 0
## 3 3 - 1 4.19205 2540.208 2552.459 0.26371 0
## 2 2 - 1 4.21449 2557.367 2569.643 0.27856 0
## 1 1 - 1 4.24262 2574.488 2586.789 0.29391 0
model2 = polyDlm(x = as.vector(df$x3), y = as.vector(df$y), q = 10, k = 1, show.beta = T)
## Estimates and t-tests for beta coefficients:
## Estimate Std. Error t value P(>|t|)
## beta.0 0.07360 0.01440 5.120 1.01e-06
## beta.1 0.06140 0.01170 5.240 5.82e-07
## beta.2 0.04920 0.00911 5.410 2.67e-07
## beta.3 0.03710 0.00658 5.640 8.95e-08
## beta.4 0.02490 0.00428 5.820 3.74e-08
## beta.5 0.01280 0.00287 4.450 1.73e-05
## beta.6 0.00062 0.00360 0.172 8.64e-01
## beta.7 -0.01150 0.00570 -2.020 4.48e-02
## beta.8 -0.02370 0.00817 -2.900 4.35e-03
## beta.9 -0.03580 0.01080 -3.330 1.11e-03
## beta.10 -0.04800 0.01340 -3.580 4.74e-04
summary(model2)
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -1214.62 -619.46 25.16 588.44 1479.46
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.034e+03 2.040e+02 19.775 < 2e-16 ***
## z.t0 7.356e-02 1.438e-02 5.115 9.57e-07 ***
## z.t1 -1.216e-02 2.721e-03 -4.467 1.56e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 717.6 on 148 degrees of freedom
## Multiple R-squared: 0.1865, Adjusted R-squared: 0.1755
## F-statistic: 16.96 on 2 and 148 DF, p-value: 2.329e-07
vif(model2$model)
## z.t0 z.t1
## 25.86069 25.86069
residualcheck(model2$model)
Figure 15
##
## Breusch-Godfrey test for serial correlation of order up to 10
##
## data: Residuals
## LM test = 138.14, df = 10, p-value < 2.2e-16
##
## Shapiro-Wilk normality test
##
## data: x$residuals
## W = 0.94724, p-value = 1.828e-05
As in the finite DLM fitting, the lowest AIC and BIC values in the examined range are obtained for q = 10. We set the polynomial order to 1 as it minimises the information criteria.
According to the estimated lag weights (the beta coefficients), all lags are significant at the 5% level except lag 6 (p-value > 0.05). The adjusted \(R^2\) of 17.5% is slightly better than that of the finite DLM but still very low. The overall significance test reports that the model is statistically significant at the 5% level.
VIF values are still > 10, suggesting that multicollinearity continues to affect this model.
The diagnostic checks in Figure 15 show that the residuals are not randomly spread. There are many highly significant lags in the ACF plot, so autocorrelation is present in the residuals; this is supported by the Breusch-Godfrey test at the 5% significance level. Normality of the residuals is also violated, as seen from the histogram and the Shapiro-Wilk test (p-value < 0.05).
We conclude that the polynomial DLM of lag 10 is not appropriate for further analysis.
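The Koyck model assumes geometrically decaying lag weights; after the Koyck transformation it reduces to a regression on the lagged response and the current value of the predictor,
\[
Y_t = \delta_1 + \delta_2\, Y_{t-1} + \delta_3\, X_t + \nu_t , \qquad 0 < \delta_2 < 1 ,
\]
where \(\delta_2\) is the geometric decay rate. Because \(Y_{t-1}\) is correlated with the transformed error term \(\nu_t\), the model is estimated by instrumental variables (two-stage least squares) rather than ordinary least squares, which is why the summary below reports weak-instruments and Wu-Hausman diagnostics.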
The following code fits the Koyck DLM with the All Ordinaries price index as the response and the copper price series as the predictor, and performs diagnostic checking.
model3 = koyckDlm(x = as.vector(df$x3), y = as.vector(df$y))
summary(model3$model, diagnostics = T)
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -689.64 -108.62 12.78 140.20 771.79
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 189.368812 87.644648 2.161 0.0322 *
## Y.1 0.971621 0.021895 44.376 <2e-16 ***
## X.t -0.005864 0.009517 -0.616 0.5387
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 1 157 1966.87 < 2e-16 ***
## Wu-Hausman 1 156 10.97 0.00115 **
## Sargan 0 NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 201.9 on 157 degrees of freedom
## Multiple R-Squared: 0.9485, Adjusted R-squared: 0.9479
## Wald test: 1448 on 2 and 157 DF, p-value: < 2.2e-16
vif(model3$model)
## Y.1 X.t
## 1.493966 1.493966
residualcheck(model3$model)
Figure 16
##
## Shapiro-Wilk normality test
##
## data: x$residuals
## W = 0.96188, p-value = 0.0002204
attr(model3$model,"class") = "lm"
AIC(model3$model)
## [1] 2157.456
From the model summary, we can conclude that the coefficient \(\delta_3\) (on \(X_t\)) is not statistically significant at the 5% level, while \(\delta_2\) (on \(Y_{t-1}\)) is highly significant. The Koyck model is statistically significant overall at the 5% level (p-value < 0.05), and its \(R^2\) of 94.8% is a great improvement in explanatory power.
According to the weak instruments test, the first stage of the two-stage least squares estimation is significant at the 5% level, so the instrument is not weak. The Wu-Hausman test indicates significant correlation between the lagged response and the error term at the 5% level, so instrumental-variables estimation is warranted.
There is no issue with multicollinearity, with both VIFs < 10. The AIC of 2157.5 is lower than those of the finite and polynomial DLMs.
The residual analysis in Figure 16 shows that the errors are spread randomly, as desired. There are no significant lags in the ACF plot, suggesting no serial correlation in the residuals. However, the error terms are not perfectly normal: the histogram of the residuals appears left-skewed, and the Shapiro-Wilk test suggests the residuals are not normally distributed (p-value < 0.05).
We attempt to fit autoregressive DLMs in order to find a more suitable model than the Koyck model.
To specify the parameters of the ARDL(p,q) model, we use a loop that fits autoregressive DLMs over a range of predictor lag lengths p and autoregressive orders q and select the models that minimise the information criteria.
Based on the information criteria, we select the following candidate models (the general ARDL(p,q) form is given after the list):
ARDL(1,5)
ARDL(2,5)
ARDL(3,5)
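For reference, with the gold price as \(x_1\) and the copper price as \(x_3\), the ARDL(\(p, q\)) model as parameterised here is
\[
y_t = \mu + \sum_{i=0}^{p} \beta_{1,i}\, x_{1,t-i} + \sum_{i=0}^{p} \beta_{3,i}\, x_{3,t-i} + \sum_{j=1}^{q} \gamma_j\, y_{t-j} + e_t ,
\]
where \(p\) is the number of lags of each predictor and \(q\) the number of autoregressive lags of the response.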
The following code runs the grid search over (p, q), fits the candidate models and performs diagnostic checking.
# grid search over the number of predictor lags (p) and autoregressive lags (q)
for (i in 1:5) {
  for (j in 1:5) {
    model1.2 = ardlDlm(formula = y ~ x1 + x3, data = data.frame(df), p = i, q = j)
    cat("p = ", i, "q = ", j, "AIC = ", AIC(model1.2$model), "BIC = ", BIC(model1.2$model), "\n")
  }
}
## p = 1 q = 1 AIC = 2135.427 BIC = 2156.954
## p = 1 q = 2 AIC = 2123.2 BIC = 2147.752
## p = 1 q = 3 AIC = 2108.165 BIC = 2135.728
## p = 1 q = 4 AIC = 2097.165 BIC = 2127.728
## p = 1 q = 5 AIC = 2086.767 BIC = 2120.315
## p = 2 q = 1 AIC = 2119.917 BIC = 2147.537
## p = 2 q = 2 AIC = 2121.523 BIC = 2152.212
## p = 2 q = 3 AIC = 2107.82 BIC = 2141.508
## p = 2 q = 4 AIC = 2096.569 BIC = 2133.244
## p = 2 q = 5 AIC = 2086.252 BIC = 2125.9
## p = 3 q = 1 AIC = 2109.717 BIC = 2143.406
## p = 3 q = 2 AIC = 2110.906 BIC = 2147.657
## p = 3 q = 3 AIC = 2109.982 BIC = 2149.795
## p = 3 q = 4 AIC = 2098.603 BIC = 2141.39
## p = 3 q = 5 AIC = 2088.342 BIC = 2134.09
## p = 4 q = 1 AIC = 2100.046 BIC = 2139.777
## p = 4 q = 2 AIC = 2101.141 BIC = 2143.929
## p = 4 q = 3 AIC = 2100.586 BIC = 2146.43
## p = 4 q = 4 AIC = 2102.043 BIC = 2150.943
## p = 4 q = 5 AIC = 2091.733 BIC = 2143.58
## p = 5 q = 1 AIC = 2089.626 BIC = 2135.374
## p = 5 q = 2 AIC = 2090.472 BIC = 2139.27
## p = 5 q = 3 AIC = 2089.878 BIC = 2141.726
## p = 5 q = 4 AIC = 2091.428 BIC = 2146.326
## p = 5 q = 5 AIC = 2092.86 BIC = 2150.807
model4_1 = ardlDlm(formula = y ~ x1 + x3, data = data.frame(df), p = 1, q = 5)$model
summary(model4_1)
##
## Time series regression with "ts" data:
## Start = 6, End = 161
##
## Call:
## dynlm(formula = as.formula(model.text), data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -608.69 -108.99 -1.87 96.72 697.71
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 204.195354 91.833430 2.224 0.027715 *
## x1.t -1.168101 0.298139 -3.918 0.000137 ***
## x1.1 1.175789 0.293809 4.002 9.96e-05 ***
## x3.t 0.094982 0.032049 2.964 0.003551 **
## x3.1 -0.096433 0.032189 -2.996 0.003216 **
## y.1 0.932523 0.078420 11.891 < 2e-16 ***
## y.2 0.184956 0.111691 1.656 0.099876 .
## y.3 -0.094300 0.109515 -0.861 0.390613
## y.4 -0.063908 0.109287 -0.585 0.559605
## y.5 0.002755 0.077303 0.036 0.971618
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 187.2 on 146 degrees of freedom
## Multiple R-squared: 0.9531, Adjusted R-squared: 0.9502
## F-statistic: 329.5 on 9 and 146 DF, p-value: < 2.2e-16
residualcheck(model4_1)
##
## Breusch-Godfrey test for serial correlation of order up to 13
##
## data: Residuals
## LM test = 11.315, df = 13, p-value = 0.5844
##
## Shapiro-Wilk normality test
##
## data: x$residuals
## W = 0.98623, p-value = 0.1257
vif(model4_1)
## x1 L(x1, 1) x3 L(x3, 1) L(y, 1) L(y, 2) L(y, 3) L(y, 4)
## 60.17390 59.02863 18.45786 19.20054 19.61994 40.62230 39.99994 40.92965
## L(y, 5)
## 20.93150
model4_2 = ardlDlm(formula = y ~ x1 + x3, data = data.frame(df), p = 2, q = 5)$model
summary(model4_2)
##
## Time series regression with "ts" data:
## Start = 6, End = 161
##
## Call:
## dynlm(formula = as.formula(model.text), data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -590.34 -108.63 0.29 93.85 705.46
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.953e+02 9.127e+01 2.139 0.034081 *
## x1.t -1.148e+00 2.966e-01 -3.868 0.000165 ***
## x1.1 1.270e+00 4.357e-01 2.915 0.004131 **
## x1.2 -1.007e-01 3.081e-01 -0.327 0.744282
## x3.t 7.565e-02 3.355e-02 2.255 0.025665 *
## x3.1 -1.200e-02 5.335e-02 -0.225 0.822345
## x3.2 -6.907e-02 3.449e-02 -2.003 0.047097 *
## y.1 9.092e-01 8.324e-02 10.923 < 2e-16 ***
## y.2 1.866e-01 1.113e-01 1.677 0.095794 .
## y.3 -5.901e-02 1.129e-01 -0.523 0.601953
## y.4 -7.267e-02 1.087e-01 -0.668 0.505049
## y.5 9.629e-04 7.673e-02 0.013 0.990005
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 185.8 on 144 degrees of freedom
## Multiple R-squared: 0.9544, Adjusted R-squared: 0.9509
## F-statistic: 274.1 on 11 and 144 DF, p-value: < 2.2e-16
residualcheck(model4_2)
##
## Breusch-Godfrey test for serial correlation of order up to 15
##
## data: Residuals
## LM test = 14.041, df = 15, p-value = 0.5224
##
## Shapiro-Wilk normality test
##
## data: x$residuals
## W = 0.98426, p-value = 0.07341
vif(model4_2)
## x1 L(x1, 1) L(x1, 2) x3 L(x3, 1) L(x3, 2) L(y, 1)
## 60.48199 131.76923 66.52833 20.53927 53.54225 23.07222 22.44197
## L(y, 2) L(y, 3) L(y, 4) L(y, 5)
## 40.96126 43.14764 41.14631 20.93494
model4_3 = ardlDlm(formula = y ~ x1 + x3, data = data.frame(df), p = 3, q = 5)$model
summary(model4_3)
##
## Time series regression with "ts" data:
## Start = 6, End = 161
##
## Call:
## dynlm(formula = as.formula(model.text), data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -579.26 -106.14 -3.19 100.67 708.25
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 192.817278 91.367906 2.110 0.036582 *
## x1.t -1.078264 0.301502 -3.576 0.000477 ***
## x1.1 1.170990 0.442628 2.646 0.009075 **
## x1.2 0.245783 0.458093 0.537 0.592429
## x1.3 -0.308971 0.313521 -0.985 0.326060
## x3.t 0.072304 0.033677 2.147 0.033495 *
## x3.1 -0.011834 0.054183 -0.218 0.827427
## x3.2 -0.038084 0.054057 -0.705 0.482270
## x3.3 -0.029416 0.035206 -0.836 0.404817
## y.1 0.896281 0.084295 10.633 < 2e-16 ***
## y.2 0.207550 0.114437 1.814 0.071841 .
## y.3 -0.060117 0.113402 -0.530 0.596856
## y.4 -0.076059 0.112909 -0.674 0.501638
## y.5 -0.002674 0.077384 -0.035 0.972480
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 185.9 on 142 degrees of freedom
## Multiple R-squared: 0.955, Adjusted R-squared: 0.9509
## F-statistic: 231.7 on 13 and 142 DF, p-value: < 2.2e-16
residualcheck(model4_3)
##
## Breusch-Godfrey test for serial correlation of order up to 17
##
## data: Residuals
## LM test = 15.848, df = 17, p-value = 0.5346
##
## Shapiro-Wilk normality test
##
## data: x$residuals
## W = 0.98389, p-value = 0.06624
vif(model4_3)
## x1 L(x1, 1) L(x1, 2) L(x1, 3) x3 L(x3, 1) L(x3, 2)
## 62.36987 135.77906 146.83595 69.45807 20.65614 55.13933 56.58181
## L(x3, 3) L(y, 1) L(y, 2) L(y, 3) L(y, 4) L(y, 5)
## 24.69407 22.97584 43.22008 43.46897 44.27681 21.25863
All the fitted ARDL models are statistically significant at the 5% level. The best model in terms of coefficient significance, AIC and adjusted \(R^2\) is, marginally, ARDL(1,5). According to its summary, the All Ords price index appears to be related to its own level in the previous month and to the current and previous month's gold and copper prices. Its AIC (2086.8) and adjusted \(R^2\) (95%) are the best among all the models fitted so far.
Residual analysis supports that this model is appropriate: the errors are randomly spread with no discernible trend, and there is no serial correlation in the residuals according to the ACF plot and the Breusch-Godfrey test. The normality assumption is not violated at the 5% level according to the histogram and the Shapiro-Wilk test.
However, all the fitted ARDL models suffer from multicollinearity, with VIFs > 10.
Overall, we failed to find an ARDL model free of multicollinearity.
To answer the investigation question, the ASX data was first explored through visualisation. The series in the data were described separately before analysing the relationships between them. A clear intervention around 2008 was found in the dataset, most likely connected with the global financial crisis of that year. The series of interest, the All Ords price index, was found to be moderately correlated with the copper price series.
We continued with the analysis of nonstationarity by plotting sample ACFs and running ADF tests on the series. This analysis showed that nonstationarity is present in the data, as all the series have a trend.
The process was followed by decomposition to explore the impact of the time series components on the given dataset. Since no seasonal effect was found in the dataset prior to decomposition, we concluded that other external factors were influencing the data.
The most accurate and suitable distributed lag model for the ASX price index was identified at the modelling stage. After fitting the candidate models, the Koyck model was found to be the most appropriate for the given data. It has high explanatory power, and diagnostic checking showed no major problems in the residuals. Unlike the otherwise promising ARDL models, the Koyck model does not suffer from multicollinearity.
In conclusion, based on the analysis performed, we suggest the Koyck model as the most suitable for further use and forecasting.
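To illustrate how the chosen model could be used, the short sketch below computes a one-step-ahead forecast directly from the fitted Koyck equation. The future copper price is an assumption (here naively held at its last observed value); in practice one would supply an expected value or use the forecasting utilities of the modelling package.
# One-step-ahead forecast from the fitted Koyck equation (illustrative sketch)
coefs  = coef(model3$model)            # (Intercept), Y.1, X.t
y_last = tail(as.vector(df$y), 1)      # last observed All Ords index value
x_next = tail(as.vector(df$x3), 1)     # assumed copper price for the next month (held at its last value)
y_hat  = coefs[1] + coefs[2] * y_last + coefs[3] * x_next
y_hat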