MATH1307 Assignment 1

Introduction

The aim of the investigation is to perform the analysis of the Australian Securities Exchange (ASX) data. To answer the question, we will explore and elaborate on the following:

The existence of nonstationarity in dataset
The impact of the components of a time series data on the given dataset
The most accurate and suitable distributed lag model for the ASX price index

Data description

The dataset asx used in this investigation contains information about monthly averages of ASX All Ordinaries (Ords) Price Index, Gold price (AUD), Crude Oil (Brent, USD/bbl) and Copper (USD/tonne) from January 2004 to May 2017.

The following code chunk imports the dataset as well as displays its first observations.

asx = read_csv("ASX_data.csv")

## Parsed with column specification:
## cols(
##   Month = col_character(),
##   `ASX price` = col_double(),
##   `Gold price` = col_number(),
##   `Crude Oil (Brent)_USD/bbl` = col_double(),
##   `Copper_USD/tonne` = col_number()
## )

head(asx)

## # A tibble: 6 x 5
##   Month  `ASX price` `Gold price` `Crude Oil (Brent)_USD… `Copper_USD/tonn…
##   <chr>        <dbl>        <dbl>                   <dbl>             <dbl>
## 1 Jan-04       2935.         612.                    31.3              1650
## 2 Feb-04       2778.         603.                    32.6              1682
## 3 Mar-04       2849.         566.                    30.3              1656
## 4 Apr-04       2971.         539.                    25.0              1588
## 5 May-04       2980.         549.                    25.8              1651
## 6 Jun-04       3000.         536.                    27.6              1685

The following code extracts the variables in our asx data frame and converts them to monthly time series objects.

ords_ts = ts(asx$`ASX price`, start = c(2004, 1), frequency = 12)
gold_ts = ts(asx$`Gold price`, start = c(2004, 1), frequency = 12)
oil_ts = ts(asx$`Crude Oil (Brent)_USD/bbl`, start = c(2004, 1), frequency = 12)
copper_ts = ts(asx$`Copper_USD/tonne`, start = c(2004, 1), frequency = 12)
data = ts(asx[,2:5])

Data exploration and visualisation

We will firstly look at the time series plots and describe each of the series separately. The following code creates a time series plot of the all ords price index.

plot(ords_ts, type = "o", main = "Time Series plot of All Ordinaries price index")

Figure 1

From the plot in Figure 1, we can observe that the All Ords price index series has an upward trend over the given time period, there is no seasonality and no changing variance present in the series. There is also a potential intervention point around 2006 and an intervention around 2008. Successive observations and fluctuations around mean level suggest autoregressive and moving average behaviour.

The following code creates a time series plot of gold price series.

plot(gold_ts, type = "o", main = "Time Series plot of Gold price", ylab = "AUD", xlab="Year")

Figure 2

The plot in Figure 2 shows that there is a general upward trend over the observed time, there is no seasonality and no changing variance in the series. There are no obvious intervention points. Successive observations and fluctuations around mean level imply the existence of autoregressive and moving average behaviour.

The following code creates a time series plot of crude oil price series.

plot(oil_ts, type = "o", main = "Time Series plot of Crude oil price", ylab = "USD", xlab="Year")

Figure 3

From the plot in Figure 3, we can observe that the existence of trend in the series is not obvious. There is no seasonality and no changing variance present in the series. There is an intervention point around 2008. Successive observation points and fluctuations around mean level suggest the existence of autoregressive and moving average behaviour in the series.

The following code produces a time series plot of copper price series.

plot(copper_ts, type = "o", main = "Time Series plot of Copper price", ylab = "USD", xlab="Year")

Figure 4

The plot in Figure 4 shows that there is a general trend in the time series. There is no seasonality and no changing variance present. There is an intervention point observed around 2008, just like in the all ordinaries, oil and copper price series. Successive points as well as fluctuations around mean level suggest autoregressive and moving average behaviour.

To further explore the relationship between the all ords index and our independent series, we display them within the same plot. Standartisation is performed over all variables by centering and scaling to clearly plot them on the same scale.

data_scale = scale(data)
plot(data_scale, plot.type = "s", col = c("black", "palegreen", "coral2", "skyblue2"), main = "Time Series plot of standardised ASX data")
legend("topleft",lty=1, text.width = 15, col=c("black", "palegreen", "coral2", "skyblue2"), c("Ords", "Gold", "Oil", "Copper"))

Figure 5

From the plot in Figure 5, we can observe that at the beginning of the given period all ords index, oil and copper prices had a similar trend, but after the suspected intervention the trend in all ords index gradually became the opposite of oil and copper price series. It is also observed that oil and copper price series are moderately correlated.

The following code creates a correlation matrix for our data.

cor(data)

##                           ASX price Gold price Crude Oil (Brent)_USD/bbl
## ASX price                 1.0000000  0.3431908                 0.3290338
## Gold price                0.3431908  1.0000000                 0.4366382
## Crude Oil (Brent)_USD/bbl 0.3290338  0.4366382                 1.0000000
## Copper_USD/tonne          0.5617864  0.5364213                 0.8664296
##                           Copper_USD/tonne
## ASX price                        0.5617864
## Gold price                       0.5364213
## Crude Oil (Brent)_USD/bbl        0.8664296
## Copper_USD/tonne                 1.0000000

It is observed that there is a moderate positive correlation between the dependent series all ords index and copper price series with a correlation coefficient r = 0.56. There is also a strong positive correlation observed between oil and copper price series which supports the inference from the plot in Figure 5 (r = 0.87).

Analysis of nonstationarity

To explore the existence of nonstationarity in the dataset, we create sample ACF plots and conduct an Augmented Dickey-Fuller unit root test to test the \(H_0\) that the process is difference nonstationary. The following code displays ACF plots and the results of ADF tests over all series.

acf(ords_ts)

Figure 6

ar(ords_ts)

## 
## Call:
## ar(x = ords_ts)
## 
## Coefficients:
##       1        2  
##  1.0726  -0.1208  
## 
## Order selected 2  sigma^2 estimated as  67176

adf.test(ords_ts, k = 2)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  ords_ts
## Dickey-Fuller = -2.442, Lag order = 2, p-value = 0.392
## alternative hypothesis: stationary

acf(gold_ts)

Figure 7

ar(gold_ts)

## 
## Call:
## ar(x = gold_ts)
## 
## Coefficients:
##      1  
## 0.9803  
## 
## Order selected 1  sigma^2 estimated as  6329

adf.test(gold_ts, k = 1)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  gold_ts
## Dickey-Fuller = -2.2824, Lag order = 1, p-value = 0.4585
## alternative hypothesis: stationary

acf(oil_ts)

Figure 8

ar(oil_ts)

## 
## Call:
## ar(x = oil_ts)
## 
## Coefficients:
##       1        2        3  
##  1.2243  -0.1493  -0.1186  
## 
## Order selected 3  sigma^2 estimated as  48.96

adf.test(oil_ts, k = 3)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  oil_ts
## Dickey-Fuller = -2.0703, Lag order = 3, p-value = 0.547
## alternative hypothesis: stationary

acf(copper_ts)

Figure 9

ar(copper_ts)

## 
## Call:
## ar(x = copper_ts)
## 
## Coefficients:
##       1        2  
##  1.1756  -0.2228  
## 
## Order selected 2  sigma^2 estimated as  331275

adf.test(copper_ts, k = 2)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  copper_ts
## Dickey-Fuller = -2.3847, Lag order = 2, p-value = 0.4159
## alternative hypothesis: stationary

Based on the slowly decaying pattern of significant lags in the sample ACF plots in Figures 6-9, we can conclude that all explored series have a trend. The ADF test reports p-values > 0.05 for all the series, so we fail to reject the \(H_0\) that the series are nonstationary at 5% level.

Overall, from a descriptive analysis of time series plots, sample ACF plots and the ADF test results, we can observe that there is nonstationarity existent in the asx data.

Decomposition

Decomposition of a given series into its trend and seasonal components is another important operation. STL and X12 decomposition approaches were implemented to extract time series components.

stl1 = stl(ords_ts, t.window = 15, s.window = "periodic", robust = T)
plot(stl1, main = "STL decomposition of All ords price index series")

Figure 10

fit1 = x12(ords_ts)
plot(fit1, sa = T, trend = T, main = "X12 decomposition of All ords price index series")

Figure 10

From the remainder series of the all ords series in Figure 10, it is observed that the spikes in the raw data are caused by other external factors, they happen around the intervention point. Since we did not find any evidence of seasonality from the time series and ACF plots, the seasonal pattern from STL decomposition is not meaningful. From the X12 decomposition we can observe that the seasonally adjusted data is very close to the original series.

stl2 = stl(gold_ts, t.window = 15, s.window = "periodic", robust = T)
plot(stl2, main = "STL decomposition of Gold price series")

Figure 11

fit2 = x12(gold_ts)
plot(fit2, sa = T, trend = T, main = "X12 decomposition of Gold price series")

Figure 11

There was no seasonal effect found in the gold price series at the data visualisation stage, so the seasonally adjusted data in X12 decomposition is very close to the original series, and the seasonal pattern in STL decomposition is meaningless (Figure 11). The remainder component of this series is not smooth at all, meaning there are other unknown factors that have an impact on the series.

stl3 = stl(oil_ts, t.window = 15, s.window = "periodic", robust = T)
plot(stl3, main = "STL decomposition of Oil price series")

Figure 12

fit3 = x12(oil_ts)
plot(fit3, sa = T, trend = T, main = "X12 decomposition of Oil price series")

Figure 12

The oil price series shows the same picture with seasonality as the previously explored series. From the STL decomposition in Figure 12, it is observed that the remainder has a major peak when the intervention happened, but otherwise is rather smooth.

stl4 = stl(copper_ts, t.window = 15, s.window = "periodic", robust = T)
plot(stl4, main = "STL decomposition of Copper price series")

Figure 13

fit4 = x12(copper_ts)
plot(fit4, sa = T, trend = T, main = "X12 decomposition of Copper price series")

Figure 13

The conclusions that we can make from the decomposition of the copper price data are similar to all the other previously analysed series. There are ups and downs around the intervention in the remainder of the series (Figure 13). There was no seasonality found prior to decomposition, so there is no seasonal effect here.

Overall, it is observed that there is no seasonality effect on the asx data, and all the fluctuations are due to other external factors.

Modelling

In the modelling process we attempt to find the best appropriate model for the all ordinaries price index. For the finite and autoregressive distributed lag models we use variables gold and copper series. These predictors were chosen based on their correlation with each other and with the dependent variable. Because oil and copper price series were found to be strongly correlated, we do not use them both for modelling. The copper price series is used for polynomial DLM and Koyck model because it has a stronger correlation with the independent series and is more likely to explain its variation and correlation structure.

Finite DLM

To specify the finite lag length for the model, we use a loop that compares the AIC/BIC values for lags from 1 to 10 and select a model with the lowest such values.

df = asx[,2:5]
colnames(df) <- c("y", "x1", "x2", "x3")
for ( i in 1:20){
  model1.1 = dlm(formula = y ~ x1 + x3, data = data.frame(df), q = i )
  cat("q = ", i, "AIC = ", AIC(model1.1$model), "BIC = ", BIC(model1.1$model),"\n")
}

## q =  1 AIC =  2577.989 BIC =  2596.44 
## q =  2 AIC =  2564.871 BIC =  2589.423 
## q =  3 AIC =  2551.583 BIC =  2582.209 
## q =  4 AIC =  2538.084 BIC =  2574.759 
## q =  5 AIC =  2523.864 BIC =  2566.562 
## q =  6 AIC =  2509.866 BIC =  2558.561 
## q =  7 AIC =  2495.447 BIC =  2550.112 
## q =  8 AIC =  2481.403 BIC =  2542.012 
## q =  9 AIC =  2467.268 BIC =  2533.793 
## q =  10 AIC =  2454.097 BIC =  2526.512 
## q =  11 AIC =  2440.381 BIC =  2518.658 
## q =  12 AIC =  2426.737 BIC =  2510.848 
## q =  13 AIC =  2411.621 BIC =  2501.537 
## q =  14 AIC =  2395.252 BIC =  2490.946 
## q =  15 AIC =  2377.833 BIC =  2479.276 
## q =  16 AIC =  2359.944 BIC =  2467.107 
## q =  17 AIC =  2343.076 BIC =  2455.929 
## q =  18 AIC =  2325.554 BIC =  2444.068 
## q =  19 AIC =  2307.432 BIC =  2431.577 
## q =  20 AIC =  2287.756 BIC =  2417.502

model1 = dlm(formula = y ~ x1 + x3, data = data.frame(df), q = 10)
summary(model1)

## 
## Call:
## lm(formula = as.formula(model.formula), data = design)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1086.02  -639.88    61.36   543.00  1626.62 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.840e+03  2.555e+02  15.033   <2e-16 ***
## x1.t        -6.761e-01  1.266e+00  -0.534    0.594    
## x1.1         3.617e-01  1.839e+00   0.197    0.844    
## x1.2         1.218e-01  1.878e+00   0.065    0.948    
## x1.3         7.260e-02  1.894e+00   0.038    0.969    
## x1.4        -2.534e-01  1.893e+00  -0.134    0.894    
## x1.5         4.528e-01  1.887e+00   0.240    0.811    
## x1.6        -2.478e-01  1.879e+00  -0.132    0.895    
## x1.7         5.263e-01  1.892e+00   0.278    0.781    
## x1.8        -3.294e-01  1.889e+00  -0.174    0.862    
## x1.9         9.934e-02  1.863e+00   0.053    0.958    
## x1.10        1.881e-01  1.243e+00   0.151    0.880    
## x3.t         1.504e-01  1.418e-01   1.061    0.291    
## x3.1         2.232e-02  2.301e-01   0.097    0.923    
## x3.2         2.992e-02  2.315e-01   0.129    0.897    
## x3.3         2.537e-02  2.275e-01   0.112    0.911    
## x3.4         1.595e-02  2.269e-01   0.070    0.944    
## x3.5        -2.686e-02  2.334e-01  -0.115    0.909    
## x3.6         2.257e-02  2.335e-01   0.097    0.923    
## x3.7        -7.365e-04  2.347e-01  -0.003    0.998    
## x3.8        -1.282e-03  2.384e-01  -0.005    0.996    
## x3.9        -4.984e-02  2.345e-01  -0.213    0.832    
## x3.10       -7.532e-02  1.470e-01  -0.512    0.609    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 758.2 on 128 degrees of freedom
## Multiple R-squared:  0.2145, Adjusted R-squared:  0.07954 
## F-statistic: 1.589 on 22 and 128 DF,  p-value: 0.05855
## 
## AIC and BIC values for the model:
##        AIC      BIC
## 1 2454.097 2526.512

vif(model1$model)

##      x1.t      x1.1      x1.2      x1.3      x1.4      x1.5      x1.6 
##  59.59233 127.47872 134.39220 138.52489 140.37920 141.41274 142.08644 
##      x1.7      x1.8      x1.9     x1.10      x3.t      x3.1      x3.2 
## 145.58579 146.08078 142.23279  63.30170  18.58221  50.64862  53.07056 
##      x3.3      x3.4      x3.5      x3.6      x3.7      x3.8      x3.9 
##  53.02948  54.55691  59.57562  61.50293  64.07991  67.86182  67.31767 
##     x3.10 
##  27.10904

residualcheck = function(x){
  checkresiduals(x)
  bgtest(x)
  shapiro.test(x$residuals)
}
residualcheck(model1$model)

Figure 14

## 
##  Breusch-Godfrey test for serial correlation of order up to 26
## 
## data:  Residuals
## LM test = 140.85, df = 26, p-value < 2.2e-16

## 
##  Shapiro-Wilk normality test
## 
## data:  x$residuals
## W = 0.94326, p-value = 8.803e-06

It is observed that the values of information criteria decrease as the lag q increases, so we will attempt to fit a finite DLM with a number of lags = 10.

According to the significance tests of model coefficients obtained from the summary, all lag weights of predictors are not statistically significant at 5% level. Following this inference, the adjusted \(R^2\) is reported to be about 8% which is very low. F-test of the overall significance of the model reports the model is not statistically significant at 5% level (p-value > 0.05). Therefore, we can conclude that the model is not a good fit to the data.

VIF values are reported > 10 so the effect of multicollinearity is high.

The residualcheck function was created to apply a diagnostic check in a dynamic way. It displays residual analysis plots as well as performs the Breusch-Godfrey test of serial correlation and the Shapiro-Wilk normality test of the residuals.

From looking at the diagnostic check plots in Figure 14, we can observe that the residuals are not randomly distributed and clearly have a trend. ACF plot shows that there is serial correlation in the residuals, the Beusch-Godfrey test supports that at 5% level of significance. The histogram and Shapiro-Wilk (p-value < 0.05) test report that the normality of the residuals does not hold.

Overall, we can conclude that the finite DLM of lag 10 is not appropriate for further analysis.

Polynomial DLM

To deal with multicollinearity problem in the finite DLM, we will attempt to use polynomial curves to restrict lag weights. We specify the optimal lag length using a function that fits finite DLMs for a range of lag lengths from 1 to 10 and orders the fitted models according to their AIC values.

The following code applies the polynomial DLM with all ordinaries price index and copper price series as well as performs residual analysis.

finiteDLMauto(x = as.vector(df$x3), y = as.vector(df$y), q.min = 1, q.max = 10, k.order = 1,
              model.type = "poly", error.type ="AIC", trace = TRUE)

##     q - k    MASE      AIC      BIC R.Adj.Sq Ljung-Box
## 10 10 - 1 3.96781 2419.397 2431.466  0.17549         0
## 9   9 - 1 4.00168 2435.953 2448.048  0.19025         0
## 8   8 - 1 4.03309 2453.342 2465.464  0.20244         0
## 7   7 - 1 4.07118 2470.649 2482.797  0.21326         0
## 6   6 - 1 4.09428 2488.104 2500.277  0.22488         0
## 5   5 - 1 4.11367 2505.646 2517.846  0.23776         0
## 4   4 - 1 4.15203 2523.100 2535.325  0.25039         0
## 3   3 - 1 4.19205 2540.208 2552.459  0.26371         0
## 2   2 - 1 4.21449 2557.367 2569.643  0.27856         0
## 1   1 - 1 4.24262 2574.488 2586.789  0.29391         0

model2 = polyDlm(x = as.vector(df$x3), y = as.vector(df$y), q = 10, k = 1, show.beta = T)

## Estimates and t-tests for beta coefficients:
##         Estimate Std. Error t value  P(>|t|)
## beta.0   0.07360    0.01440   5.120 1.01e-06
## beta.1   0.06140    0.01170   5.240 5.82e-07
## beta.2   0.04920    0.00911   5.410 2.67e-07
## beta.3   0.03710    0.00658   5.640 8.95e-08
## beta.4   0.02490    0.00428   5.820 3.74e-08
## beta.5   0.01280    0.00287   4.450 1.73e-05
## beta.6   0.00062    0.00360   0.172 8.64e-01
## beta.7  -0.01150    0.00570  -2.020 4.48e-02
## beta.8  -0.02370    0.00817  -2.900 4.35e-03
## beta.9  -0.03580    0.01080  -3.330 1.11e-03
## beta.10 -0.04800    0.01340  -3.580 4.74e-04

summary(model2)

## 
## Call:
## "Y ~ (Intercept) + X.t"
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1214.62  -619.46    25.16   588.44  1479.46 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.034e+03  2.040e+02  19.775  < 2e-16 ***
## z.t0         7.356e-02  1.438e-02   5.115 9.57e-07 ***
## z.t1        -1.216e-02  2.721e-03  -4.467 1.56e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 717.6 on 148 degrees of freedom
## Multiple R-squared:  0.1865, Adjusted R-squared:  0.1755 
## F-statistic: 16.96 on 2 and 148 DF,  p-value: 2.329e-07

vif(model2$model)

##     z.t0     z.t1 
## 25.86069 25.86069

residualcheck(model2$model)

Figure 15

## 
##  Breusch-Godfrey test for serial correlation of order up to 10
## 
## data:  Residuals
## LM test = 138.14, df = 10, p-value < 2.2e-16

## 
##  Shapiro-Wilk normality test
## 
## data:  x$residuals
## W = 0.94724, p-value = 1.828e-05

Like in the finite DLM fitting, the lowest AIC and BIC measures in the given range are for q = 10. We set the order of polynomial to 1 as it minimises the information criteria.

According to the model summary, all lag weights are significant at 5% level except lag 6 (p-value > 0.05). The adjusted \(R^2\) = 17.5% is slightly better than the finite DLM but still very low. The overall significance test reports the model is statistically significant at 5% level.

VIF values are > 10 and suggest there is still multicollinearity effect on this model.

Diagnostic checking in Figure 15 shows that the residuals are not randomly spread. There are a lot of highly significant lags in the ACF plot, so there is autocorrelation present in the residuals. That is also supported by Beusch-Godfrey test at 5% level of significance. The normality of the residuals is also violated, as observed from the histogram and Shapiro-Wilk normality test report (p-value < 0.05).

We conclude that the polynomial DLM of lag 10 is not appropriate for further analysis.

Koyck model

The following code implements Koyck DLM with the all ordinaries price index and copper price series as well as performs diagnostic checking.

model3 = koyckDlm(x = as.vector(df$x3), y = as.vector(df$y))
summary(model3$model, diagnostics = T)

## 
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -689.64 -108.62   12.78  140.20  771.79 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 189.368812  87.644648   2.161   0.0322 *  
## Y.1           0.971621   0.021895  44.376   <2e-16 ***
## X.t          -0.005864   0.009517  -0.616   0.5387    
## 
## Diagnostic tests:
##                  df1 df2 statistic p-value    
## Weak instruments   1 157   1966.87 < 2e-16 ***
## Wu-Hausman         1 156     10.97 0.00115 ** 
## Sargan             0  NA        NA      NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 201.9 on 157 degrees of freedom
## Multiple R-Squared: 0.9485,  Adjusted R-squared: 0.9479 
## Wald test:  1448 on 2 and 157 DF,  p-value: < 2.2e-16

vif(model3$model)

##      Y.1      X.t 
## 1.493966 1.493966

residualcheck(model3$model)

Figure 16

## 
##  Shapiro-Wilk normality test
## 
## data:  x$residuals
## W = 0.96188, p-value = 0.0002204

attr(model3$model,"class") = "lm"
AIC(model3$model)

## [1] 2157.456

From the model summary, we can conclude that the coefficient \(\delta_3\) is not statistically significant at 5% level, but \(\delta_2\) is reported to be significant. Koyck model is reported to be overall statistically significant at 5% level (p-value < 0.05) and its \(R^2\) = 94.8% which is a great improvement of model explainability.

According to the Weak instruments test, the model at the first stage of least-squares estimation is significant at 5% level. From the Wu-Hausman test, we can conclude that there is significant correlation between the explanatory variable and the error term at 5% level.

There is no issue with multicollinearity with all VIFs < 10. The AIC measure is reported to be 2157.5 which is lower than the finite and polynomial DLMs.

From the residual analysis in Figure 16, it is observed that the errors are spread randomly as desired. There are no significant lags in the ACF plot which suggests there is no serial correlation in the residuals. However, the error terms are not perfectly normal. The histogram of the residuals seems left-skewed, and the Shapiro-Wilk normality test suggests not normal residuals (p-value < 0.05).

Autoregressive DLM

We attempt to fit autoregressive DLMs in order to find a more suitable model than the Koyck model.

For specifying the parameters of ARDL(p,q), we use a loop that fits autoregressive DLMs for a range of lag lengths and orders of the AR process and fit the models that minimise the infrormation criteria.

Based on the information criteria, we select the following models:

ARDL(1,5)
ARDL(2,5)
ARDL(3,5)

The following code apllies candidate models and performs diagnostic checking.

for (i in 1:5){
  for(j in 1:5){
    model1.2 = ardlDlm(formula = y ~ x1 + x3, data = data.frame(df), p = i , q = j )
    cat("p = ", i, "q = ", j, "AIC = ", AIC(model1.2$model), "BIC = ", BIC(model1.2$model),"\n")
  }
}

## p =  1 q =  1 AIC =  2135.427 BIC =  2156.954 
## p =  1 q =  2 AIC =  2123.2 BIC =  2147.752 
## p =  1 q =  3 AIC =  2108.165 BIC =  2135.728 
## p =  1 q =  4 AIC =  2097.165 BIC =  2127.728 
## p =  1 q =  5 AIC =  2086.767 BIC =  2120.315 
## p =  2 q =  1 AIC =  2119.917 BIC =  2147.537 
## p =  2 q =  2 AIC =  2121.523 BIC =  2152.212 
## p =  2 q =  3 AIC =  2107.82 BIC =  2141.508 
## p =  2 q =  4 AIC =  2096.569 BIC =  2133.244 
## p =  2 q =  5 AIC =  2086.252 BIC =  2125.9 
## p =  3 q =  1 AIC =  2109.717 BIC =  2143.406 
## p =  3 q =  2 AIC =  2110.906 BIC =  2147.657 
## p =  3 q =  3 AIC =  2109.982 BIC =  2149.795 
## p =  3 q =  4 AIC =  2098.603 BIC =  2141.39 
## p =  3 q =  5 AIC =  2088.342 BIC =  2134.09 
## p =  4 q =  1 AIC =  2100.046 BIC =  2139.777 
## p =  4 q =  2 AIC =  2101.141 BIC =  2143.929 
## p =  4 q =  3 AIC =  2100.586 BIC =  2146.43 
## p =  4 q =  4 AIC =  2102.043 BIC =  2150.943 
## p =  4 q =  5 AIC =  2091.733 BIC =  2143.58 
## p =  5 q =  1 AIC =  2089.626 BIC =  2135.374 
## p =  5 q =  2 AIC =  2090.472 BIC =  2139.27 
## p =  5 q =  3 AIC =  2089.878 BIC =  2141.726 
## p =  5 q =  4 AIC =  2091.428 BIC =  2146.326 
## p =  5 q =  5 AIC =  2092.86 BIC =  2150.807

model4_1 = ardlDlm(formula = y ~ x1 + x3, data = data.frame(df), p = 1, q = 5)$model
summary(model4_1)

## 
## Time series regression with "ts" data:
## Start = 6, End = 161
## 
## Call:
## dynlm(formula = as.formula(model.text), data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -608.69 -108.99   -1.87   96.72  697.71 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 204.195354  91.833430   2.224 0.027715 *  
## x1.t         -1.168101   0.298139  -3.918 0.000137 ***
## x1.1          1.175789   0.293809   4.002 9.96e-05 ***
## x3.t          0.094982   0.032049   2.964 0.003551 ** 
## x3.1         -0.096433   0.032189  -2.996 0.003216 ** 
## y.1           0.932523   0.078420  11.891  < 2e-16 ***
## y.2           0.184956   0.111691   1.656 0.099876 .  
## y.3          -0.094300   0.109515  -0.861 0.390613    
## y.4          -0.063908   0.109287  -0.585 0.559605    
## y.5           0.002755   0.077303   0.036 0.971618    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 187.2 on 146 degrees of freedom
## Multiple R-squared:  0.9531, Adjusted R-squared:  0.9502 
## F-statistic: 329.5 on 9 and 146 DF,  p-value: < 2.2e-16

residualcheck(model4_1)

## 
##  Breusch-Godfrey test for serial correlation of order up to 13
## 
## data:  Residuals
## LM test = 11.315, df = 13, p-value = 0.5844

## 
##  Shapiro-Wilk normality test
## 
## data:  x$residuals
## W = 0.98623, p-value = 0.1257

vif(model4_1)

##       x1 L(x1, 1)       x3 L(x3, 1)  L(y, 1)  L(y, 2)  L(y, 3)  L(y, 4) 
## 60.17390 59.02863 18.45786 19.20054 19.61994 40.62230 39.99994 40.92965 
##  L(y, 5) 
## 20.93150

model4_2 = ardlDlm(formula = y ~ x1 + x3, data = data.frame(df), p = 2, q = 5)$model
summary(model4_2)

## 
## Time series regression with "ts" data:
## Start = 6, End = 161
## 
## Call:
## dynlm(formula = as.formula(model.text), data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -590.34 -108.63    0.29   93.85  705.46 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.953e+02  9.127e+01   2.139 0.034081 *  
## x1.t        -1.148e+00  2.966e-01  -3.868 0.000165 ***
## x1.1         1.270e+00  4.357e-01   2.915 0.004131 ** 
## x1.2        -1.007e-01  3.081e-01  -0.327 0.744282    
## x3.t         7.565e-02  3.355e-02   2.255 0.025665 *  
## x3.1        -1.200e-02  5.335e-02  -0.225 0.822345    
## x3.2        -6.907e-02  3.449e-02  -2.003 0.047097 *  
## y.1          9.092e-01  8.324e-02  10.923  < 2e-16 ***
## y.2          1.866e-01  1.113e-01   1.677 0.095794 .  
## y.3         -5.901e-02  1.129e-01  -0.523 0.601953    
## y.4         -7.267e-02  1.087e-01  -0.668 0.505049    
## y.5          9.629e-04  7.673e-02   0.013 0.990005    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 185.8 on 144 degrees of freedom
## Multiple R-squared:  0.9544, Adjusted R-squared:  0.9509 
## F-statistic: 274.1 on 11 and 144 DF,  p-value: < 2.2e-16

residualcheck(model4_2)

## 
##  Breusch-Godfrey test for serial correlation of order up to 15
## 
## data:  Residuals
## LM test = 14.041, df = 15, p-value = 0.5224

## 
##  Shapiro-Wilk normality test
## 
## data:  x$residuals
## W = 0.98426, p-value = 0.07341

vif(model4_2)

##        x1  L(x1, 1)  L(x1, 2)        x3  L(x3, 1)  L(x3, 2)   L(y, 1) 
##  60.48199 131.76923  66.52833  20.53927  53.54225  23.07222  22.44197 
##   L(y, 2)   L(y, 3)   L(y, 4)   L(y, 5) 
##  40.96126  43.14764  41.14631  20.93494

model4_3 = ardlDlm(formula = y ~ x1 + x3, data = data.frame(df), p = 3, q = 5)$model
summary(model4_3)

## 
## Time series regression with "ts" data:
## Start = 6, End = 161
## 
## Call:
## dynlm(formula = as.formula(model.text), data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -579.26 -106.14   -3.19  100.67  708.25 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 192.817278  91.367906   2.110 0.036582 *  
## x1.t         -1.078264   0.301502  -3.576 0.000477 ***
## x1.1          1.170990   0.442628   2.646 0.009075 ** 
## x1.2          0.245783   0.458093   0.537 0.592429    
## x1.3         -0.308971   0.313521  -0.985 0.326060    
## x3.t          0.072304   0.033677   2.147 0.033495 *  
## x3.1         -0.011834   0.054183  -0.218 0.827427    
## x3.2         -0.038084   0.054057  -0.705 0.482270    
## x3.3         -0.029416   0.035206  -0.836 0.404817    
## y.1           0.896281   0.084295  10.633  < 2e-16 ***
## y.2           0.207550   0.114437   1.814 0.071841 .  
## y.3          -0.060117   0.113402  -0.530 0.596856    
## y.4          -0.076059   0.112909  -0.674 0.501638    
## y.5          -0.002674   0.077384  -0.035 0.972480    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 185.9 on 142 degrees of freedom
## Multiple R-squared:  0.955,  Adjusted R-squared:  0.9509 
## F-statistic: 231.7 on 13 and 142 DF,  p-value: < 2.2e-16

residualcheck(model4_3)

## 
##  Breusch-Godfrey test for serial correlation of order up to 17
## 
## data:  Residuals
## LM test = 15.848, df = 17, p-value = 0.5346

## 
##  Shapiro-Wilk normality test
## 
## data:  x$residuals
## W = 0.98389, p-value = 0.06624

vif(model4_3)

##        x1  L(x1, 1)  L(x1, 2)  L(x1, 3)        x3  L(x3, 1)  L(x3, 2) 
##  62.36987 135.77906 146.83595  69.45807  20.65614  55.13933  56.58181 
##  L(x3, 3)   L(y, 1)   L(y, 2)   L(y, 3)   L(y, 4)   L(y, 5) 
##  24.69407  22.97584  43.22008  43.46897  44.27681  21.25863

All the fitted ARDL models were reported to be statistically significant at 5% level. The best model in terms of the significance of coefficients, AIC and adjusted \(R^2\) was marginally ARDL(1,5). According to model summary, it is suggested that the all ords price index might be related to its previous year levels, gold prices of the previous year and copper prices of the previous year as well. The AIC (2086.8) and adjusted \(R^2\) (95%) values are the best compared to those of all previously fitted models.

Residual analysis supports that this model is appropriate: the errors are randomly spread and have no discernible trend, there is no serial autocorrelation in the residuals based on ACF and Beusch-Godfrey test. The normality assumption is not violated at 5% level according to the histogram and the results of Shapiro-Wilk test.

However, all ARDL models fitted suffer from multicollinearity with VIFs > 10.

Overall, we failed to find an appropriate ARDL model in terms of multicollinearity.

Conclusion

To answer the investigation question, the ASX data was firstly explored through visualisation. The series given in the data were described separately before analysing the relationships between them. There was a clear intervention around 2008 found in the dataset which is probably connected with the financial crisis of that year. The series of interest all ords price index was found to have a moderate correlation with the copper price series.

We continued with analysis of nonstationarity by plotting sample ACFs and running ADF tests on the series. After the performed analysis, it was found that there is nonstationarity existent in the data as all series have a trend.

The process was followed by decomposition to explore the impact of the components of time series data on the given dataset. Since there was no seasonal effect found in the dataset prior to decomposition, we made a conclusion that there were other external factors influencing the data.

The most accurate and suitable distributed lag model for the ASX price index was found at the modelling stage. After fitting different candidate models, it was found that Koyck model is the most appropriate for the given data. The model chosen had a high explainability, the diagnostic checking showed there were no major problems in the residuals. Koyck model did not have a multicollinearity effect unlike other promissing models like ARDLs.

In conclusion, based on the analysis performed, we can suggest Koyck model as the most suitable for further use and forecasting.