1 Introduction

The dataset includes monthly averages of ASX All Ordinaries (Ords) Price Index, Gold price (AUD), Crude Oil (Brent, USD/bbl) and Copper (USD/tonne) starting from January 2004.

The following task aims to do analysis on the data exploring and elaborating on:

The existence of nonstationarity in dataset,
The impact of the components of a time series data on the given dataset
The most accurate and suitable distributed lag model for the ASX price index.

ASX_data <- read_excel("ASX_data.xlsx")
head(ASX_data)

## # A tibble: 6 x 4
##   `ASX price` `Gold price` `Crude Oil (Brent)_USD/bbl` `Copper_USD/tonne`
##         <dbl>        <dbl>                       <dbl>              <dbl>
## 1       2935.         612.                        31.3               1650
## 2       2778.         603.                        32.6               1682
## 3       2849.         566.                        30.3               1656
## 4       2971.         539.                        25.0               1588
## 5       2980.         549.                        25.8               1651
## 6       3000.         536.                        27.6               1685

# Convert data into a time series object
asx.ts = ts(ASX_data$`ASX price`, start = c(2004, 1), frequency = 12)
gold.ts = ts(ASX_data$`Gold price`, start = c(2004, 1), frequency = 12)
oil.ts = ts(ASX_data$`Crude Oil (Brent)_USD/bbl`, start = c(2004, 1), frequency = 12)
copper.ts = ts(ASX_data$`Copper_USD/tonne`, start = c(2004, 1), frequency = 12)

2 Exploratory Data Analysis

2.1 Descriptive Statistics

summary(ASX_data)

##    ASX price      Gold price     Crude Oil (Brent)_USD/bbl Copper_USD/tonne
##  Min.   :2778   Min.   : 520.9   Min.   : 25.02            Min.   :1588    
##  1st Qu.:4325   1st Qu.: 825.9   1st Qu.: 47.23            1st Qu.:4642    
##  Median :4929   Median :1363.6   Median : 70.80            Median :6650    
##  Mean   :4813   Mean   :1202.0   Mean   : 73.63            Mean   :5989    
##  3rd Qu.:5448   3rd Qu.:1546.8   3rd Qu.:106.98            3rd Qu.:7623    
##  Max.   :6779   Max.   :1776.3   Max.   :133.90            Max.   :9881

2.2 Time Series Plot

par(mfrow=c(2,2))
# ASX Price
plot(asx.ts, ylab='Price', xlab = 'Year', type='o', main = "ASX Price Series")
# Gold
plot(gold.ts, ylab='Price (AUD)', xlab = 'Year', type='o', main = "Gold Price Series")
# Crude Oil
plot(oil.ts, ylab='Price (USD/bbl)', xlab = 'Year', type='o', main = "Crude Oil Price Series")
# Copper
plot(copper.ts, ylab='Price (USD/tonne)', xlab = 'Year', type='o', main = "Copper Price Series")

Based on the inspection of the above Time-Series plot, there are several observations that can be seen as follows:

For ASX Price Series

Trend: All Ordinaries price index series has an upward trend over year 2004 to 2017.
Seasonality: There are no seasonality present in the series.
Variance: There are no changing variance present in the series.
Behaviour: It is clear that there are successive fluctuations around the mean level which indicates an autoregressive and moving average behaviour.
Intervention Points: There are potential intervention points around 2006 and 2008.

For Gold Price Series

Trend: The Gold price series has a general upward trend over year 2004 to 2017.
Seasonality: There are no seasonality present in the series.
Variance: There are no changing variance present in the series.
Behaviour: It is clear that there are successive fluctuations around the mean level which indicates an autoregressive and moving average behaviour.
Intervention Points: There are no obvious intervention points seen.

For Crude Oil Series

Trend: The Crude Oil price series over the year 2004 to 2017 has no obvious trend.
Seasonality: There are no seasonality present in the series.
Variance: There are no changing variance present in the series.
Behaviour: It is clear that there are successive fluctuations around the mean level which indicates an autoregressive and moving average behaviour.
Intervention Points: There is an intervention point around 2008.

For Copper Price Series

Trend: The Copper price series over the year 2004 to 2017 has a general trend.
Seasonality: There are no seasonality present in the series.
Variance: There are no changing variance present in the series.
Behaviour: It is clear that there are successive fluctuations around the mean level which indicates an autoregressive and moving average behaviour.
Intervention Points: There is an intervention point around 2008.

2.3 Scaled Plot

In order to visualize all the data in the same plot, scaling and centering is done in all the series to plot them on the same scale as follow

data.ts = ts(ASX_data[,1:4])
# scale the data
data.scaled = scale(data.ts)
plot(data.scaled, plot.type='s', col=c('blue', 'red', 'green', 'black'), 
     main='Standardized ASX Series', ylab='Scaled Price', xlab='Time')
legend('topleft',lty=1,text.width=17, col=c('blue', 'red', 'green', 'black'),
       c('All Ords', 'Gold', 'Oil', 'Copper'))

Based on the figure above, all the series seemed to follow a similar trend from the range 0 to 50. After the time 50, the trend of All Ordinaries (Ords) Price Index begin to trend towards opposite of Crude Oil and Copper Price Series. Since Copper and Oil Price are following the same trend, it is suspected that the two price series are correlated. Therefore, to check for these correlation, scatterplot matrix and correlation matrix is used.

2.4 Correlation

We now display the scatterplot matrix and the correlation matrix to check for multicolinearity.

df_corr <- ASX_data[,1:4]
# rename the columns
colnames(df_corr) <- c('asxPrice', 'goldPrice', 'crudeOilPrice', 'copperPrice')
df_corr.ts = ts(df_corr[,1:4])

# function fot panel.cor & upper.pannel
# can be found in Appendix
# Create the plot
pairs(df_corr.ts, 
      lower.panel = panel.cor,
      upper.panel = upper.panel)

cor(df_corr.ts)

##                asxPrice goldPrice crudeOilPrice copperPrice
## asxPrice      1.0000000 0.3431908     0.3290338   0.5617864
## goldPrice     0.3431908 1.0000000     0.4366382   0.5364213
## crudeOilPrice 0.3290338 0.4366382     1.0000000   0.8664296
## copperPrice   0.5617864 0.5364213     0.8664296   1.0000000

Based on the plot and the correlation matrix above, it is observed that a moderate positive correlation is found between the ASX Price and Copper Price with r=0.5617864. It is also observed that a strong positive correlation is found between Copper Price and Crude Oil Price. Furthermore, amongst the four variables, ASX Price is considered to be the dependent variable. Hence, the gold, crude oil, and copper price series is used as the predictor variables for the ASX price series.

3 Stationary Test

3.1 Autocorrelation and Partial Autocorrelation Plots

3.1.1 ASX Price

par(mfrow=c(1,2))
acf(asx.ts, main="ACF of ASX Price Index")
pacf(asx.ts, main="PACF of ASX Price Index")

3.1.2 Gold

par(mfrow=c(1,2))
acf(gold.ts, main="ACF of Gold Price ")
pacf(gold.ts, main="PACF of Gold Price ")

3.1.3 Crude Oil

par(mfrow=c(1,2))
acf(gold.ts, main="ACF of Crude Oil Price ")
pacf(gold.ts, main="PACF of Crude Oil Price")

3.1.4 Copper

par(mfrow=c(1,2))
acf(copper.ts, main="ACF of Copper Price ")
pacf(copper.ts, main="PACF of Copper Price ")

As a summary for all the above ACF and PACF plots, there are slowly decaying pattern of significant lags in the ACF plot on all the series. This indicates all the ASX All Ordinaries (Ords) Price Index, Gold price (AUD), Crude Oil (Brent, USD/bbl) and Copper (USD/tonne) have a trend.

3.2 Augmented Dickey-Fuller test

To test whether a time series is stationary is to perform an Augmented Dickey-Fuller (ADF) test with the following hypothesis:

\(H_0\): The series is non-stationary
\(H_A\): The series is stationary

3.2.1 ASX Price

asx.k = ar(asx.ts)$order
adf.test(asx.ts, k=asx.k)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  asx.ts
## Dickey-Fuller = -2.442, Lag order = 2, p-value = 0.392
## alternative hypothesis: stationary

Since the p-value is not less than .05, we fail to reject the null hypothesis. The series are non-stationary at 5% level. This means the time series for the ASX Price Index is non-stationary. In other words, it has some time-dependent structure and does not have constant variance over time.

3.2.2 Gold

gold.k = ar(gold.ts)$order
adf.test(gold.ts, k = gold.k)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  gold.ts
## Dickey-Fuller = -2.2824, Lag order = 1, p-value = 0.4585
## alternative hypothesis: stationary

Since the p-value is not less than .05, we fail to reject the null hypothesis. The series are non-stationary at 5% level. This means the time series for the Gold Price is non-stationary. In other words, it has some time-dependent structure and does not have constant variance over time.

3.2.3 Crude Oil

oil.k = ar(oil.ts)$order
adf.test(oil.ts, k = oil.k)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  oil.ts
## Dickey-Fuller = -2.0703, Lag order = 3, p-value = 0.547
## alternative hypothesis: stationary

Since the p-value is not less than .05, we fail to reject the null hypothesis. The series are non-stationary at 5% level. This means the time series for the Oil Price is non-stationary. In other words, it has some time-dependent structure and does not have constant variance over time.

3.2.4 Copper

copper.k = ar(copper.ts)$order
adf.test(copper.ts, k = copper.k)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  copper.ts
## Dickey-Fuller = -2.3847, Lag order = 2, p-value = 0.4159
## alternative hypothesis: stationary

Since the p-value is not less than .05, we fail to reject the null hypothesis. The series are non-stationary at 5% level. This means the time series for the Copper Price is non-stationary. In other words, it has some time-dependent structure and does not have constant variance over time.

4 Composition of Time Series

By looking at the composition of the time series, it would show its individual and historical effects on the existing components.

First, the lambda values will be displayed as follows in order to decide whether a series require a transformation.

\(\lambda\) = -1. is a reciprocal transform.
\(\lambda\) = -0.5 is a reciprocal square root transform.
\(\lambda\) = 0.0 is a log transform.
\(\lambda\) = 0.5 is a square root transform.
\(\lambda\) = 1.0 is no transform.

This process will be followed by differencing of the time series before decomposition.

#Lambda values
lambda.asx = BoxCox.lambda(asx.ts) 
lambda.gold = BoxCox.lambda(gold.ts) 
lambda.oil = BoxCox.lambda(oil.ts) 
lambda.copper = BoxCox.lambda(copper.ts)
cbind(lambda.asx, lambda.gold, lambda.oil, lambda.copper)

##      lambda.asx lambda.gold lambda.oil lambda.copper
## [1,]   1.999924    0.976695 -0.8304136     0.9336783

Looking at the above results, it is concluded that only ASX price and Crude Oil price series requires transformation. This because the lambda value for ASX price is approximately 2 which would require a transformation to the 2nd order (i.e. y^2). Other than that, lambda value for Crude Oil is -0.8304 which would require a transformation of a negative order. For Gold and Copper price series does not require any transformation because they both have lambda values very close to 1.

4.1 Transformation

4.1.1 ASX Price

asx.ts.BC = ((asx.ts^(lambda.asx))-1)/lambda.asx
plot(asx.ts.BC, ylab='Price', xlab = 'Year', type='o', main = "Box-Cox Transformed ASX Price Series")

4.1.2 Crude Oil

oil.ts.BC = ((oil.ts^(lambda.oil))-1)/lambda.oil
plot(oil.ts.BC, ylab='Price', xlab = 'Year', type='o', main = "Box-Cox Transformed Oil Price Series")

Based on the plot above, it is observed that both the series are still not stationary after Box-Cox transformations has been applied. Hence, a further differencing has to be done

4.2 Differencing

Using the transformed series (i.e. ASX and Crude Oil Price series) and the untransformed series (i.e. Gold and Copper Price series). First order differencing is applied to the all four series as follows.

# First order differencing
asx.ts.BC.D1 <- diff(asx.ts.BC, differences = 1)
oil.ts.BC.D1 <- diff(oil.ts.BC, differences = 1)
gold.ts.D1 <- diff(gold.ts, differences = 1)
copper.ts.D1 <- diff(copper.ts, differences = 1)

par(mfrow=c(2,2))
# ASX Price Box-Cox and Differencing
plot(asx.ts.BC.D1, ylab='Price', xlab = 'Year', type='o', main = "ASX Price 1st Order Diff")
# Gold
plot(gold.ts.D1, ylab='Price (AUD)', xlab = 'Year', type='o', main = "Gold Price 1st Order Diff")
# Crude Oil Box-Cox and Differencing
plot(oil.ts.BC.D1, ylab='Price (USD/bbl)', xlab = 'Year', type='o', main = "Crude Oil Price 1st Order Diff")
# Copper
plot(copper.ts.D1, ylab='Price (USD/tonne)', xlab = 'Year', type='o', main = "Copper Price 1st Order Diff")

Based on the above results, all four series are now stationary. To validate this, a further test is done using the Augmented Dickey-Fuller (ADF) test on all the series.

# ADF test for ASX price
adf.test(asx.ts.BC.D1)

FALSE 
FALSE   Augmented Dickey-Fuller Test
FALSE 
FALSE data:  asx.ts.BC.D1
FALSE Dickey-Fuller = -4.4343, Lag order = 5, p-value = 0.01
FALSE alternative hypothesis: stationary

# ADF test for Gold price
adf.test(gold.ts.D1)

FALSE 
FALSE   Augmented Dickey-Fuller Test
FALSE 
FALSE data:  gold.ts.D1
FALSE Dickey-Fuller = -5.8718, Lag order = 5, p-value = 0.01
FALSE alternative hypothesis: stationary

# ADF test for Oil price
adf.test(oil.ts.BC.D1)

FALSE 
FALSE   Augmented Dickey-Fuller Test
FALSE 
FALSE data:  oil.ts.BC.D1
FALSE Dickey-Fuller = -5.5931, Lag order = 5, p-value = 0.01
FALSE alternative hypothesis: stationary

# ADF test for Copper price
adf.test(copper.ts.D1)

FALSE 
FALSE   Augmented Dickey-Fuller Test
FALSE 
FALSE data:  copper.ts.D1
FALSE Dickey-Fuller = -5.478, Lag order = 5, p-value = 0.01
FALSE alternative hypothesis: stationary

Since the p-value is less than .05, it it statistically significant. There is sufficient evidence to reject the null hypothesis. All four series at the 5% significance level are stationary.

5 Decomposition

To understand the effect of trend and seasonality of the series, a decomposition can be done on each of the price series using x12 and stl decompositions. There are two types of composition of a time series which are:

Additive Model:

\(y(t) = Level + Trend + Seasonality + Remainder\)

Multiplicative Model:

\(y(t) = Level\) x \(Trend\) x \(Seasonality\) x \(Remainder\)

5.1 ASX Price

par(mfrow=c(1,1))
stl_all <- stl(asx.ts, t.window=15, s.window="periodic", robust=TRUE)
plot(stl_all, main = "STL Decomposition of All Ordinaries Price Index")

asx.x12 = x12(asx.ts)
plot(asx.x12 , sa=TRUE , trend=TRUE, main="X12 decomposition of All ords price index series", forecast=T)

Based on the above plots for the ASX Price series, it is observed that the main fluctuations from the raw data are due the noise in the series, they happen around the intervention point. From the time series and ACF plot, there is no evidence of seasonality, hence, it is safe to say that the seasonal component in the STL decomposition does not convey any information. From the X12 decomposition above, the Seasonally Adjusted series is close to the original data series.

5.2 Gold

stl_gold = stl(gold.ts, t.window = 15, s.window = "periodic", robust = TRUE)
plot(stl_gold, main = "STL decomposition of Gold Price")

gold.x12 = x12(gold.ts)
plot(gold.x12 , sa=TRUE , trend=TRUE, main="X12 decomposition of Gold Price", forecast=T)

From the STL decomposition, the remainder in the series is not smooth, impling that there are external factors which is affecting the series which is at the point of intervention. Similar to the asx price series, it is observed from the time series and ACF plot that there are no seasonality found in the gold price series and observed that the main fluctuations from the raw data are due the noise in the series. Hence, it is safe to say that the seasonal component in the STL decomposition does not convey any information. From the X12 decomposition above, the Seasonally Adjusted series is close to the original data series.

5.3 Crude Oil

stl_oil = stl(oil.ts, t.window = 15, s.window = "periodic", robust = TRUE)
plot(stl_oil, main = "STL decomposition of Oil Price")

oil.x12 = x12(oil.ts)
plot(oil.x12 , sa=TRUE , trend=TRUE, main="X12 decomposition of Crude Oil Price", forecast=T)

From the STL decomposition, the remainder in the series is not smooth, impling that there are external factors which is affecting the series which is at the point of intervention. Similar to the other two series above, it is observed from the time series and ACF plot that there are no seasonality found in the gold price series and observed that the main fluctuations from the raw data are due the noise in the series. Hence, it is safe to say that the seasonal component in the STL decomposition does not convey any information. From the X12 decomposition above, the Seasonally Adjusted series is close to the original data series.

5.4 Copper

stl_copper = stl(copper.ts, t.window = 15, s.window = "periodic", robust = TRUE)
plot(stl_copper, main = "STL decomposition of Copper Price")

copper.x12 = x12(copper.ts)
plot(copper.x12 , sa=TRUE , trend=TRUE, main="X12 decomposition of Copper Price", forecast=T)

sFrom the STL decomposition, the remainder in the series is not smooth, impling that there are external factors which is affecting the series which is at the point of intervention corresponds to the raw data in the Crude Oil price series. Similar to all the price series above, it is observed from the time series and ACF plot that there are no seasonality found in the gold price series and observed that the main fluctuations from the raw data are due the noise in the series. Hence, it is safe to say that the seasonal component in the STL decomposition does not convey any information. From the X12 decomposition above, the Seasonally Adjusted series is close to the original data series.

6 Model Building and Diagnostics

From the scatterplat matrix and correlation coefficient, Crude Oil and Copper Series are strongly correlated, therefore only one of them is chosen for modelling when considering all the predictors. Furthermore, based on the correlation matrix above, with the Copper Price series having the strongest correlation with All Ordinaries (Ords) Price series (i.e independent series) amongst the other series, it will be a the main consideration for polynomial DLM and Koyck models.

6.1 Finite DLM

The model building is started using a finite Distribution Lag Model (DLM). In finite distributed lag model, the lag length is chosen and deal with multicollinearity. To specify the finite lag for the model, using the function finiteDLMauto() to select the value for lags with minimum of lag 1 and maximum of lag 10. Using this function, it will select q values based on the lowest AIC and BIC scores as follows

df_model <- ASX_data[,1:4]
# rename the columns
colnames(df_model) <- c('asxPrice', 'goldPrice', 'oilPrice', 'copperPrice')

# for AIC
finiteDLMauto(x=as.vector(df_model$goldPrice), y=as.vector(df_model$asxPrice), q.min=1, q.max=10,
              model.type = 'dlm', error.type = "AIC", trace=TRUE)

##    q - k    MASE      AIC      BIC   GMRAE   MBRAE R.Adj.Sq Ljung-Box
## 10    10 3.91844 2460.345 2499.570 4.32190 0.69116 -0.02199         0
## 9      9 3.97365 2476.983 2513.270 4.43213 0.75893 -0.00887         0
## 8      8 4.02150 2493.885 2527.220 4.45929 0.61156  0.00491         0
## 7      7 4.07437 2510.535 2540.905 4.47998 0.92888  0.01807         0
## 6      6 4.11652 2527.575 2554.966 4.40970 1.22132  0.03067         0
## 5      5 4.16217 2544.887 2569.286 4.45961 0.93030  0.04375         0
## 4      4 4.22171 2562.296 2583.690 4.65274 1.30422  0.05548         0
## 3      3 4.27118 2579.215 2597.590 4.64464 1.61370  0.06908         0
## 2      2 4.31239 2596.292 2611.637 4.68194 2.08243  0.08409         0
## 1      1 4.35631 2613.609 2625.910 4.56013 1.06887  0.09832         0

# For BIC
finiteDLMauto(x=as.vector(df_model$goldPrice), y=as.vector(df_model$asxPrice), q.min=1, q.max=10,
              model.type = 'dlm', error.type = "BIC", trace=TRUE)

##    q - k    MASE      AIC      BIC   GMRAE   MBRAE R.Adj.Sq Ljung-Box
## 10    10 3.91844 2460.345 2499.570 4.32190 0.69116 -0.02199         0
## 9      9 3.97365 2476.983 2513.270 4.43213 0.75893 -0.00887         0
## 8      8 4.02150 2493.885 2527.220 4.45929 0.61156  0.00491         0
## 7      7 4.07437 2510.535 2540.905 4.47998 0.92888  0.01807         0
## 6      6 4.11652 2527.575 2554.966 4.40970 1.22132  0.03067         0
## 5      5 4.16217 2544.887 2569.286 4.45961 0.93030  0.04375         0
## 4      4 4.22171 2562.296 2583.690 4.65274 1.30422  0.05548         0
## 3      3 4.27118 2579.215 2597.590 4.64464 1.61370  0.06908         0
## 2      2 4.31239 2596.292 2611.637 4.68194 2.08243  0.08409         0
## 1      1 4.35631 2613.609 2625.910 4.56013 1.06887  0.09832         0

From the above loops, q = 10 shows the lowest AIC and BIC scores for gold price vs asx price, q = 10 can be used to compare all the AIC and BIC values of all possible models.

# Individual series vs asxPrice series
goldModel = dlm( x = as.vector(df_model$goldPrice) , y = as.vector(df_model$asxPrice), q = 10)$model
oilModel = dlm( x = as.vector(df_model$oilPrice) , y = as.vector(df_model$asxPrice), q = 10)$model
copperModel = dlm( x = as.vector(df_model$copperPrice) , y = as.vector(df_model$asxPrice), q = 10)$model

# summission of gold and copper predictors
goldCopperModel = dlm(formula = asxPrice ~ goldPrice + copperPrice, data = data.frame(df_model), q = 10)$model
# summission of all the predictors
allSeriesModel = dlm(formula = asxPrice ~ goldPrice + oilPrice + copperPrice, data = data.frame(df_model), q = 10)$model

# compare AIC and BIC of all models
all_AIC = AIC(goldModel, oilModel, copperModel, goldCopperModel, allSeriesModel)
all_BIC = BIC(goldModel, oilModel, copperModel, goldCopperModel, allSeriesModel)

sort.score(all_AIC, 'aic')

##                 df      AIC
## copperModel     13 2436.164
## allSeriesModel  35 2440.112
## goldCopperModel 24 2454.097
## oilModel        13 2459.842
## goldModel       13 2460.345

sort.score(all_BIC, 'bic')

##                 df      BIC
## copperModel     13 2475.389
## oilModel        13 2499.066
## goldModel       13 2499.570
## goldCopperModel 24 2526.512
## allSeriesModel  35 2545.716

Based on the above results, it is observed that as the number lag (i.e. q) increases, the values of information criteria decreases. Therefore, the Copper price series is chosen as the independent series for the finite DLM model and compared with the dependent series (ASX price series). The model is implemented as follows

modelcopperDLM = dlm( x = as.vector(copper.ts) , y = as.vector(asx.ts), q = 10 )$model
summary(modelcopperDLM)

## 
## Call:
## lm(formula = model.formula, data = design)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1154.09  -643.75   -11.55   596.33  1429.23 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.981e+03  2.166e+02  18.382   <2e-16 ***
## x.t          1.536e-01  1.354e-01   1.134    0.259    
## x.1          1.857e-02  2.205e-01   0.084    0.933    
## x.2          4.480e-02  2.220e-01   0.202    0.840    
## x.3          2.830e-02  2.180e-01   0.130    0.897    
## x.4          1.889e-02  2.175e-01   0.087    0.931    
## x.5         -4.846e-02  2.191e-01  -0.221    0.825    
## x.6          3.046e-02  2.175e-01   0.140    0.889    
## x.7         -3.494e-03  2.189e-01  -0.016    0.987    
## x.8         -1.349e-03  2.239e-01  -0.006    0.995    
## x.9         -8.232e-02  2.222e-01  -0.371    0.712    
## x.10        -1.012e-02  1.340e-01  -0.076    0.940    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 737.4 on 139 degrees of freedom
## Multiple R-squared:  0.1931, Adjusted R-squared:  0.1292 
## F-statistic: 3.024 on 11 and 139 DF,  p-value: 0.001201

From the summary of the above model, all the lag weights of the predictors are not statistically significant at the 5% level. The Adjusted-\(R^2\) = 0.1292 which suggests that the model explains about 13% of variability in the ASX price series which is very low.

6.1.1 Diagnostic Checking

checkresiduals(modelcopperDLM)

## 
##  Breusch-Godfrey test for serial correlation of order up to 15
## 
## data:  Residuals
## LM test = 138.89, df = 15, p-value < 2.2e-16

copperVIF = vif(modelcopperDLM)
copperVIF

##      x.t      x.1      x.2      x.3      x.4      x.5      x.6      x.7 
## 17.92226 49.15319 51.59119 51.49576 52.97842 55.52579 56.41365 58.90348 
##      x.8      x.9     x.10 
## 63.24668 63.87601 23.81718

copperVIF > 10

##  x.t  x.1  x.2  x.3  x.4  x.5  x.6  x.7  x.8  x.9 x.10 
## TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

6.1.1.1 Normality Test

The hypotheis for normality is:

\(H_0\): The residuals are normally distributed
\(H_A\): The residuals are not normally distributed

shapiro.test(modelcopperDLM$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  modelcopperDLM$residuals
## W = 0.93912, p-value = 4.235e-06

F-test of the overall significance of the model reports the model is not statistically significant at 5% level whereby all p-values in all the lag weights of the predictors are greater than 0.05. Therefore, we can conclude that the model is not a good fit to the data.

From the time series plot and histogram of standardized residuals, there is an obvious non-random pattern and very high residual values which violate general assumptions. According to this test and the ACF plot, it is concluded that the serial correlation left in the residuals is highly significant. This is further agreed with the Breusch-Godfrey test which displayed showing the existence serial correlation of order up to 15 further agrees that there is an existence of serial correlation at 5% level of significance in this model.

From the VIF values, it is obvious that the estimates of the finite DLM coefficients are suffering from the multicollinearity since all the VIF values are much greater than 10. In order to with this issue, the restricted least squares method is used to find parameter estimates. Using this approach, some restrictions are placed on the model parameters to reduce the variances of the estimators which is able translate the pattern of time effects into the restrictions on parameters.

Furthermore, the Shapiro-Wilk normality test shows a p-value of 4.235e-06 which implies that the results is statistically significant at 5% level. Hence, this implies that the null hypothesis is rejected showing the residuals significantly deviate from a normal distribution. Both the histogram and Shapiro-Wilk (p-value < 0.05) test report that the normality of the residuals does not hold.

In summary, it is concluded that the finite DLM of lag 10 is not an suitable model for the series.

6.2 Polynomial DLM

To deal with multicolinearity that was identified from the finite DLM above, polynomial DLM can be used as this model addresses the collinearity by imposing a polynomial shape on the lag distribution. Based on the finite DLM results, the independent series (Copper price series) is chosen and compared with the dependent series (ASX price series) for this model. To minimise the information criteria for this model, the order of polynomial is set to 1.

# for AIC
finiteDLMauto(x=as.vector(df_model$copperPrice), y=as.vector(df_model$asxPrice), q.min=1, q.max=10, k.order = 1,
              model.type = 'poly', error.type = "AIC", trace=TRUE)

##     q - k    MASE      AIC      BIC   GMRAE    MBRAE R.Adj.Sq Ljung-Box
## 10 10 - 1 3.96781 2419.397 2431.466 5.05005  0.84313  0.17549         0
## 9   9 - 1 4.00168 2435.953 2448.048 4.99410 -5.37833  0.19025         0
## 8   8 - 1 4.03309 2453.342 2465.464 5.05381  0.55608  0.20244         0
## 7   7 - 1 4.07118 2470.649 2482.797 5.10793  0.61094  0.21326         0
## 6   6 - 1 4.09428 2488.104 2500.277 5.18334  2.74265  0.22488         0
## 5   5 - 1 4.11367 2505.646 2517.846 5.08586  0.76203  0.23776         0
## 4   4 - 1 4.15203 2523.100 2535.325 5.05822 -0.05088  0.25039         0
## 3   3 - 1 4.19205 2540.208 2552.459 5.15270  0.42877  0.26371         0
## 2   2 - 1 4.21449 2557.367 2569.643 5.30198  0.73653  0.27856         0
## 1   1 - 1 4.24262 2574.488 2586.789 5.38993 -0.49453  0.29391         0

# For BIC
finiteDLMauto(x=as.vector(df_model$copperPrice), y=as.vector(df_model$asxPrice), q.min=1, q.max=10, k.order = 1,
              model.type = 'poly', error.type = "BIC", trace=TRUE)

##     q - k    MASE      AIC      BIC   GMRAE    MBRAE R.Adj.Sq Ljung-Box
## 10 10 - 1 3.96781 2419.397 2431.466 5.05005  0.84313  0.17549         0
## 9   9 - 1 4.00168 2435.953 2448.048 4.99410 -5.37833  0.19025         0
## 8   8 - 1 4.03309 2453.342 2465.464 5.05381  0.55608  0.20244         0
## 7   7 - 1 4.07118 2470.649 2482.797 5.10793  0.61094  0.21326         0
## 6   6 - 1 4.09428 2488.104 2500.277 5.18334  2.74265  0.22488         0
## 5   5 - 1 4.11367 2505.646 2517.846 5.08586  0.76203  0.23776         0
## 4   4 - 1 4.15203 2523.100 2535.325 5.05822 -0.05088  0.25039         0
## 3   3 - 1 4.19205 2540.208 2552.459 5.15270  0.42877  0.26371         0
## 2   2 - 1 4.21449 2557.367 2569.643 5.30198  0.73653  0.27856         0
## 1   1 - 1 4.24262 2574.488 2586.789 5.38993 -0.49453  0.29391         0

Similar to fitting the finite DLM, the number lags is set within the range of 1 to 10. As a results, the lowest AIC and BIC values is found when the number of lags is 10.

copperModelpoly = polyDlm( x = as.vector(df_model$copperPrice) , y = as.vector(df_model$asxPrice), q = 10, k=1)$model

## Estimates and t-tests for beta coefficients:
##         Estimate Std. Error t value  P(>|t|)
## beta.0   0.07360    0.01440   5.120 1.01e-06
## beta.1   0.06140    0.01170   5.240 5.82e-07
## beta.2   0.04920    0.00911   5.410 2.67e-07
## beta.3   0.03710    0.00658   5.640 8.95e-08
## beta.4   0.02490    0.00428   5.820 3.74e-08
## beta.5   0.01280    0.00287   4.450 1.73e-05
## beta.6   0.00062    0.00360   0.172 8.64e-01
## beta.7  -0.01150    0.00570  -2.020 4.48e-02
## beta.8  -0.02370    0.00817  -2.900 4.35e-03
## beta.9  -0.03580    0.01080  -3.330 1.11e-03
## beta.10 -0.04800    0.01340  -3.580 4.74e-04

summary(copperModelpoly)

## 
## Call:
## "Y ~ (Intercept) + X.t"
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1214.62  -619.46    25.16   588.44  1479.46 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.034e+03  2.040e+02  19.775  < 2e-16 ***
## z.t0         7.356e-02  1.438e-02   5.115 9.57e-07 ***
## z.t1        -1.216e-02  2.721e-03  -4.467 1.56e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 717.6 on 148 degrees of freedom
## Multiple R-squared:  0.1865, Adjusted R-squared:  0.1755 
## F-statistic: 16.96 on 2 and 148 DF,  p-value: 2.329e-07

From the summary of the above model, all the lag weights of the predictors are statistically significant at the 5% level. The Adjusted-\(R^2\) = 0.1755 which suggests that the model explains about 18% of variability in the ASX price series which is still considered very low. However, the model reveals are slight improvement from the finite DLM.

6.2.1 Diagnostic Checking

checkresiduals(copperModelpoly)

## 
##  Breusch-Godfrey test for serial correlation of order up to 10
## 
## data:  Residuals
## LM test = 138.14, df = 10, p-value < 2.2e-16

copperVIFpoly = vif(copperModelpoly)
copperVIFpoly

##     z.t0     z.t1 
## 25.86069 25.86069

copperVIFpoly > 10

## z.t0 z.t1 
## TRUE TRUE

6.2.1.1 Normality Test

The hypotheis for normality is:

\(H_0\): The residuals are normally distributed
\(H_A\): The residuals are not normally distributed

shapiro.test(copperModelpoly$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  copperModelpoly$residuals
## W = 0.94724, p-value = 1.828e-05

F-test of the overall significance of the model reports the model is statistically significant at 5% level whereby all p-values in all the lag weights of the predictors are lesser than 0.05.

From the time series plot and histogram of standardized residuals, there is an obvious non-random pattern and very high residual values which violate general assumptions. According to this test and the ACF plot, it is concluded that the serial correlation left in the residuals is highly significant. This is further agreed with the Breusch-Godfrey test which displayed showing the existence serial correlation of order up to 10 further agrees that there is an existence of serial correlation at 5% level of significance in this model.

From the VIF values, the estimates of the polynomial DLM coefficients are still suffering from the multicollinearity since all the VIF values are much greater than 10.

Furthermore, the Shapiro-Wilk normality test shows a p-value of 1.828e-05 which implies that the results is statistically significant at 5% level. Hence, this implies that the null hypothesis is rejected showing the residuals significantly deviate from a normal distribution. Both the histogram and Shapiro-Wilk (p-value < 0.05) test report that the normality of the residuals does not hold.

In summary, similar to the finite DLM, it is concluded that the polynomial DLM of lag 10 is not an suitable model for the series.

6.3 Koyck DLM

Since the correlation between copper and asx price series is the highest amongst the three other predictors, it is chosen for the Koyck Modelling.

With the diagnostics argument set to TRUE, shows three tests:

An F test of the first stage regression for weak instruments.
The Wu-Hausman test for the significance of the correlation between an explanatory variable and the error term (endogeneity).
The Sargan test of overidentifying restrictions (only if there are more instruments than regressors).

The model is implemented as follows

modelkoyckcopper = koyckDlm(x = as.vector(copper.ts) , y = as.vector(asx.ts))$model
summary(modelkoyckcopper, diagnostics = TRUE)

## 
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -689.64 -108.62   12.78  140.20  771.79 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 189.368812  87.644648   2.161   0.0322 *  
## Y.1           0.971621   0.021895  44.376   <2e-16 ***
## X.t          -0.005864   0.009517  -0.616   0.5387    
## 
## Diagnostic tests:
##                  df1 df2 statistic p-value    
## Weak instruments   1 157   1966.87 < 2e-16 ***
## Wu-Hausman         1 156     10.97 0.00115 ** 
## Sargan             0  NA        NA      NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 201.9 on 157 degrees of freedom
## Multiple R-Squared: 0.9485,  Adjusted R-squared: 0.9479 
## Wald test:  1448 on 2 and 157 DF,  p-value: < 2.2e-16

From the summary of the above model, it is observed that \(\delta_2\) is statistically significant whereas \(\delta_3\) is not statistically significant at the 5% level. The Adjusted-\(R^2\) = 0.9479 which suggests that the model explains about 95% of variability in the ASX price series which is very high. Therefore, this model reveals are great improvement from the finite DLM and the polynomial DLM.

6.3.1 Diagnostic Checking

checkresiduals(modelkoyckcopper)

copperkoyckVIF = vif(modelkoyckcopper)
copperkoyckVIF

##      Y.1      X.t 
## 1.493966 1.493966

copperkoyckVIF > 10

##   Y.1   X.t 
## FALSE FALSE

6.3.1.1 Normality Test

The hypotheis for normality is:

\(H_0\): The residuals are normally distributed
\(H_A\): The residuals are not normally distributed

normalitycheck <- function(x){
  qqnorm(x,main="QQ plot of standardised residuals")
  qqline(x, col = 2)
  print(shapiro.test(x))
}
normalitycheck(modelkoyckcopper$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  x
## W = 0.96188, p-value = 0.0002204

F-test of the overall significance of the model reports the model is statistically significant at 5% level where the p-value is less than 0.05.

Based on the Weak instruments test, the model at the first stage of least-squares estimation is significant at 5% level. The Wu-Hausman test shows that there is significant correlation between the explanatory variable and the error term at 5% significance level.

From the VIF values, the estimates of the Koyck DLM coefficients are now much lesser 10 which implies that theres is no multicolinearity.

Furthermore, the Shapiro-Wilk normality test shows a p-value of 0.0002204 which implies that the results is statistically significant at 5% level. Hence, this implies that the null hypothesis is rejected showing the residuals significantly deviate from a normal distribution. It is also observed the histogram of the residuals is slightyly left-skewed. Therefore, both the histogram and Shapiro-Wilk (p-value < 0.05) test report that the normality of the residuals does not hold.

6.4 Autorregressive DLM

In order to improve on the Koyck model, the predictors are now fitted with the Autorregressive Distributed Lag Models against the ASX Price series. A loop is used to find parameters p and g for the ADRL(p,q) that is fitted with the ASX and copper price series to find the lowest AIC and BIC values. Using the parameters p and q found from the loop, AIC and BIC values can be compared for all other possible models such as gold, crude oil and gold + copper.

for (i in 1:5){
  for(j in 1:5){
    model4.1 = ardlDlm(x = as.vector(df_model$copperPrice) , y = as.vector(df_model$asxPrice), p = i , q = j )
    cat("p = ", i, "q = ", j, "AIC = ", AIC(model4.1$model), "BIC = ", BIC(model4.1$model),"\n")
    }
  }

## p =  1 q =  1 AIC =  2147.741 BIC =  2163.116 
## p =  1 q =  2 AIC =  2135.4 BIC =  2153.813 
## p =  1 q =  3 AIC =  2121.12 BIC =  2142.558 
## p =  1 q =  4 AIC =  2109.759 BIC =  2134.209 
## p =  1 q =  5 AIC =  2099.056 BIC =  2126.505 
## p =  2 q =  1 AIC =  2130.043 BIC =  2148.456 
## p =  2 q =  2 AIC =  2132.038 BIC =  2153.52 
## p =  2 q =  3 AIC =  2119.241 BIC =  2143.741 
## p =  2 q =  4 AIC =  2107.649 BIC =  2135.155 
## p =  2 q =  5 AIC =  2097.021 BIC =  2127.52 
## p =  3 q =  1 AIC =  2117.307 BIC =  2138.745 
## p =  3 q =  2 AIC =  2119.247 BIC =  2143.748 
## p =  3 q =  3 AIC =  2119.696 BIC =  2147.259 
## p =  3 q =  4 AIC =  2108.537 BIC =  2139.1 
## p =  3 q =  5 AIC =  2097.832 BIC =  2131.38 
## p =  4 q =  1 AIC =  2105.916 BIC =  2130.366 
## p =  4 q =  2 AIC =  2107.774 BIC =  2135.28 
## p =  4 q =  3 AIC =  2108.608 BIC =  2139.17 
## p =  4 q =  4 AIC =  2110.085 BIC =  2143.704 
## p =  4 q =  5 AIC =  2099.454 BIC =  2136.052 
## p =  5 q =  1 AIC =  2095.118 BIC =  2122.566 
## p =  5 q =  2 AIC =  2096.96 BIC =  2127.459 
## p =  5 q =  3 AIC =  2097.887 BIC =  2131.436 
## p =  5 q =  4 AIC =  2099.497 BIC =  2136.095 
## p =  5 q =  5 AIC =  2101.419 BIC =  2141.067

From the above loop, it is observed that p = 1 and q = 5 has the lowest AIC and BIC values. Now, with these variables, a comparison can be done to check for a lower AIC and BIC values with more than one predictors.

# Individual series vs asxPrice series
goldModelardl = ardlDlm( x = as.vector(df_model$goldPrice) , y = as.vector(df_model$asxPrice),p = 1 , q = 5)$model
oilModelardl = ardlDlm( x = as.vector(df_model$oilPrice) , y = as.vector(df_model$asxPrice), p = 1 , q = 5)$model
copperModelardl = ardlDlm( x = as.vector(df_model$copperPrice) , y = as.vector(df_model$asxPrice), p = 1 , q = 5)$model

# summission of gold and copper predictors
f1 = asxPrice ~ goldPrice + copperPrice
goldCopperModelardl = ardlDlm(formula = f1, data = data.frame(df_model), p = 1 , q = 5)$model

# compare AIC and BIC of all models
ardlDLM_AIC = AIC(goldModelardl, oilModelardl, copperModelardl, goldCopperModelardl)
ardlDLM_BIC = BIC(goldModelardl, oilModelardl, copperModelardl, goldCopperModelardl)

sort.score(ardlDLM_AIC, 'aic')

##                     df      AIC
## goldCopperModelardl 11 2086.767
## goldModelardl        9 2092.194
## oilModelardl         9 2098.335
## copperModelardl      9 2099.056

sort.score(ardlDLM_BIC, 'bic')

##                     df      BIC
## goldModelardl        9 2119.643
## goldCopperModelardl 11 2120.315
## oilModelardl         9 2125.784
## copperModelardl      9 2126.505

As a result, it is obvious that there are lower AIC and BIC values for model with more than one predictors. Removing colinearity between the price series, copper and gold price series is chosen as predictors for asx price series in the ardlDlm modelling. Now, a loop is used to fit autoregressive DLMs for a range of lag lengths and orders of the AR process for the two predictors (Gold and Copper Price series).

f1 = asxPrice ~ goldPrice + copperPrice
for (i in 1:5){
  for(j in 1:5){
    model4.1 = ardlDlm(formula = f1, data = data.frame(df_model), p = i , q = j )
    cat("p = ", i, "q = ", j, "AIC = ", AIC(model4.1$model), "BIC = ", BIC(model4.1$model),"\n")
    }
  }

## p =  1 q =  1 AIC =  2135.427 BIC =  2156.954 
## p =  1 q =  2 AIC =  2123.2 BIC =  2147.752 
## p =  1 q =  3 AIC =  2108.165 BIC =  2135.728 
## p =  1 q =  4 AIC =  2097.165 BIC =  2127.728 
## p =  1 q =  5 AIC =  2086.767 BIC =  2120.315 
## p =  2 q =  1 AIC =  2119.917 BIC =  2147.537 
## p =  2 q =  2 AIC =  2121.523 BIC =  2152.212 
## p =  2 q =  3 AIC =  2107.82 BIC =  2141.508 
## p =  2 q =  4 AIC =  2096.569 BIC =  2133.244 
## p =  2 q =  5 AIC =  2086.252 BIC =  2125.9 
## p =  3 q =  1 AIC =  2109.717 BIC =  2143.406 
## p =  3 q =  2 AIC =  2110.906 BIC =  2147.657 
## p =  3 q =  3 AIC =  2109.982 BIC =  2149.795 
## p =  3 q =  4 AIC =  2098.603 BIC =  2141.39 
## p =  3 q =  5 AIC =  2088.342 BIC =  2134.09 
## p =  4 q =  1 AIC =  2100.046 BIC =  2139.777 
## p =  4 q =  2 AIC =  2101.141 BIC =  2143.929 
## p =  4 q =  3 AIC =  2100.586 BIC =  2146.43 
## p =  4 q =  4 AIC =  2102.043 BIC =  2150.943 
## p =  4 q =  5 AIC =  2091.733 BIC =  2143.58 
## p =  5 q =  1 AIC =  2089.626 BIC =  2135.374 
## p =  5 q =  2 AIC =  2090.472 BIC =  2139.27 
## p =  5 q =  3 AIC =  2089.878 BIC =  2141.726 
## p =  5 q =  4 AIC =  2091.428 BIC =  2146.326 
## p =  5 q =  5 AIC =  2092.86 BIC =  2150.807

Based on the above loop, it shows p = 1 and q = 5 has the lowest AIC and BIC values where two predictors is used to explain the ASX price index.

Now, the modelling for the Autorregressive Distributed Lag Model with gold and copper price series with the above lag lengths and orders of the AR process. The model is implemented as follows

f1 = asxPrice ~ goldPrice + copperPrice
modelARDLM = ardlDlm(formula = f1, data = data.frame(df_model), p = 1 , q = 5 )$model
summary(modelARDLM)

## 
## Time series regression with "ts" data:
## Start = 6, End = 161
## 
## Call:
## dynlm(formula = as.formula(model.text), data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -608.69 -108.99   -1.87   96.72  697.71 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   204.195354  91.833430   2.224 0.027715 *  
## goldPrice.t    -1.168101   0.298139  -3.918 0.000137 ***
## goldPrice.1     1.175789   0.293809   4.002 9.96e-05 ***
## copperPrice.t   0.094982   0.032049   2.964 0.003551 ** 
## copperPrice.1  -0.096433   0.032189  -2.996 0.003216 ** 
## asxPrice.1      0.932523   0.078420  11.891  < 2e-16 ***
## asxPrice.2      0.184956   0.111691   1.656 0.099876 .  
## asxPrice.3     -0.094300   0.109515  -0.861 0.390613    
## asxPrice.4     -0.063908   0.109287  -0.585 0.559605    
## asxPrice.5      0.002755   0.077303   0.036 0.971618    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 187.2 on 146 degrees of freedom
## Multiple R-squared:  0.9531, Adjusted R-squared:  0.9502 
## F-statistic: 329.5 on 9 and 146 DF,  p-value: < 2.2e-16

From the summary of the above model, all the lag weights of the predictors are statistically significant at the 5% level. The Adjusted-\(R^2\) = 0.9502 which suggests that the model explains about 95% of variability in the ASX price series. F-test of the overall significance of the model reports the model is statistically significant at 5% level where the p-value is less than 0.05.

6.4.1 Diagnostic Checking

checkresiduals(modelARDLM)

## 
##  Breusch-Godfrey test for serial correlation of order up to 13
## 
## data:  Residuals
## LM test = 11.315, df = 13, p-value = 0.5844

ardlVIF = vif(copperModelpoly)
ardlVIF

##     z.t0     z.t1 
## 25.86069 25.86069

ardlVIF > 10

## z.t0 z.t1 
## TRUE TRUE

6.4.1.1 Normality Test

The hypotheis for normality is:

\(H_0\): The residuals are normally distributed
\(H_A\): The residuals are not normally distributed

normalitycheck(modelARDLM$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  x
## W = 0.98623, p-value = 0.1257

From the time series plot and histogram of standardized residuals, there is an obvious that the errors are spread randomly. According to this test and the ACF plot, it is concluded that there are no serial correlation in the residuals. The AIC and BIC values where the parameter ARDL(1,5) is report to be AIC = 2086.767 and BIC = 2120.315, respectively.

From the VIF values, the estimates of the Autoregressive DLM coefficients are suffering from the multicollinearity since all the VIF values are much greater than 10.

From the time series plot of standardised residuals, we can infer that standardised residuals are distributed around zero mean level as desired. From the histogram and QQ plots, we can say that standardised residuals are distributed according to standard normal distribution within (−3,3) range; hence, we do not have an influential point. ACF of the standardised residuals do not suggest the existence of autocorrelation within error terms.

Furthermore, the Shapiro-Wilk normality test shows a p-value of 0.1257 which implies that the results is not statistically significant at 5% level. Hence, this implies that there is no sufficient evidence to reject the null hypothesis. Therefore, according to the histogram and the results of Shapiro-Wilk test, the normality assumption is not violated at the 5% level.

Overall, in term of multicolinearity, ARDL model is not a suitable model for further analysis.

7 Discussion

Based on the finite DLM, the copper and asx price series is fitted and the model summary shows all the lag weights of the predictors are not statistically significant at the 5% level. The Adjusted-\(R^2\) = 0.1292 which suggests that the model explains about 13% of variability in the ASX price series which is very low. The time series plot and histogram of standardized residuals shows that the model violate general assumptions. Additionally, the VIF values of all the finite DLM coefficients are suffering from the multicollinearity. Therefore, the finite DLM is not a suitable model for the series.

An attempt to deal with the multicolinearity for the finite DLM issue is implementing the polynomial DLM by imposing a polynomial shape on the lag distribution. The summary shows that the model is able to explain about 18% of variability in the ASX price series which is an improvement to the finite DLM, however, still very low. Overall, the model is considered to be significant at the 5% level. The diagnostic check for the model reveals that it violates the general assumption as it has serial correlation in the residuals, suffering from multicolinearity and the residuals is not normally distributed. Therefore, the polynomial DLM is also not a suitable model for the series.

The next model fitted was the Koyck model which shows that the model is considered to be significant at the 5% level. This model is able to explain about 95% of variability in the ASX price series which a huge improvement to the finite and polynomial DLM. The standardized residuals in time series plot and histogram shows an evenly spreaded errors. Furthermore, no serial correlation in the residuals and the are no multicolinearity found in the Koyck DLM coefficients. Therefore, it can be said that Koyck DLM is a suitable model even though the normality of the residuals does not hold.

Finally, after comparing the AIC and BIC values, gold + copper price series is fitted again the asx price series with the parameters of p=1, q=5. Overall, the model is significant since F-test of the overall significance of the model reports the model is statistically significant at 5% level, the histogram and time series plot shows evenly spreaded errors, normality assumption is not violated and model explains about 95% of variability. However, the model Autoregressive DLM coefficients suffer from the multicollinearity. Hence, this model is also not suitable for further analysis and forecasting.

8 Conclusion

The investigation is started by exploring into the raw data of the ASX data which include ASX All Ordinaries (Ords) Price Index, Gold price (AUD), Crude Oil (Brent, USD/bbl) and Copper (USD/tonne) starting from January 2004. All these series were explored through visualisation individually to check for trends, seasonality, changing variance, intervention points. By exploring the series individually, an obvious intervention around 2008 was found in the ASX All Ordinaries (Ords) Price Index, Crude Oil (Brent, USD/bbl) and Copper (USD/tonne). These interventions points was suspected to be connected to the financial crisis of that year.

A scatterplot matrix and a correlation matrix is plotted and it is found that the ASX All Ordinaries (Ords) Price Index is moderately correlated with the Copper (USD/tonne) price series. The analysis is then continued with the stationarity test which include plotting sample ACFs and PACFs and running Augmented Dickey-Fuller tests on the all the series. After performing the analysis, it was found that all series is not stationary and the slow decaying ACF plots shows that all the series have trends in them.

Next, a Box-Cox transformation is done on the ASX and Crude Oil Price series after checking the lambda values on each series from the data. By visual inspection, it is observed that all the series are still not stationary. Therefore, first order differencing is done on the Box-Cox transformed ASX and Crude Oil Price series and the Gold and Copper Price series. As a result, stationary data on all the series was achieved.

The analysis was followed by decomposition on all the series to explore the impact of each component of time series data on the given dataset. Since there was no seasonal effect found in the dataset prior to decomposition, a conclusion was made such that there were other external factors that have an impact the data such as the remainder (i.e. noise in the series).

In the following stage, a model building and diagnostics strategy was done by looking into different lag models such as finite DLM, polynomial DLM, Koyck DLM and autoregressive DLM. After fitting different candidate models, the Koyck model was found to be the most appropriate model for the given data. The model chosen had a high explainability at with \(R^2\) = 0.9479 , the model diagnostics reveals that there were no major issues in the residuals. Compared to other models, the Koyck model did not suffer from a multicollinearity effect.

As a conclusion, although the normality of the residuals for the model does not hold, the Koyck model as compared to all other models is the most suitable for further analysis and forecasting.

9 Appendix

normalitycheck <- function(x){
  qqnorm(x,main="QQ plot of standardised residuals")
  qqline(x, col = 2)
  print(shapiro.test(x))
}

# Correlation panel
panel.cor <- function(x, y){
    usr <- par("usr"); on.exit(par(usr))
    par(usr = c(0, 1, 0, 1))
    r <- round(cor(x, y), digits=2)
    txt <- paste0("R = ", r)
    cex.cor <- 0.8/strwidth(txt)
    text(0.5, 0.5, txt, cex = cex.cor * r)
}
# Customize upper panel
upper.panel<-function(x, y){
  points(x,y, pch = 19)
}

# helper functions
# AIC and BIC sorting function by Cameron Doyle
sort.score <- function(x, score = c("bic", "aic")){
  if (score == "aic"){
    x[with(x, order(AIC)),]
  } else if (score == "bic") {
    x[with(x, order(BIC)),]
  } else {
    warning('score = "x" only accepts valid arguments ("aic","bic")')
  }
}

MATH 1307 Assignment 1

Australian Stock Exchange Time-Series Analysis

Roy Wong (s3835352)

7/26/2021

1 Introduction

2 Exploratory Data Analysis

2.1 Descriptive Statistics

2.2 Time Series Plot

2.3 Scaled Plot

2.4 Correlation

3 Stationary Test

3.1 Autocorrelation and Partial Autocorrelation Plots

3.1.1 ASX Price

3.1.2 Gold

3.1.3 Crude Oil

3.1.4 Copper

3.2 Augmented Dickey-Fuller test

3.2.1 ASX Price

3.2.2 Gold

3.2.3 Crude Oil

3.2.4 Copper

4 Composition of Time Series

4.1 Transformation

4.1.1 ASX Price

4.1.2 Crude Oil

4.2 Differencing

5 Decomposition

5.1 ASX Price

5.2 Gold

5.3 Crude Oil

5.4 Copper

6 Model Building and Diagnostics

6.1 Finite DLM

6.1.1 Diagnostic Checking

6.1.1.1 Normality Test

6.2 Polynomial DLM

6.2.1 Diagnostic Checking

6.2.1.1 Normality Test

6.3 Koyck DLM

6.3.1 Diagnostic Checking

6.3.1.1 Normality Test

6.4 Autorregressive DLM

6.4.1 Diagnostic Checking

6.4.1.1 Normality Test

7 Discussion

8 Conclusion

9 Appendix