Forecasting: Project 1

Data Description
Objective
Read Data
Identification of the response and the regressor variables
Relationship between Regressor and Response variables
Descriptive Analysis
Transformations
Differencing to improve Stationarity
Decomposition
Modeling
Fit Finite DLM
Fit Polynomial DLM model
Fit Koyck geometric DLM model
Fit Autoregressive Distributed Lag Model
Detailed conclusion on the most appropriate model including context (Model Selection)
Model Selection
Detailed Graphical and statistical tests of assumptions for \(ARDL(12,4)\) model (Residual Analysis)
Final Analysis Conclusion and Best Fitting model:

library(readr)
library(forecast) # checkresiduals()
library(TSA) # season() and other functions
library(urca) # for ur.df()
library(tseries) # for adf.test()
library(stats) # for classical decomposition
library(x12) # X-12-ARIMA decomposition
library(GGally) #for ggpairs()
library(car) #for vif()
library(dLagM) # For DLM modelling
library(lmtest) # for bgtest()

Data Description

The dataset hold 4 columns and 161 observations. They are, monthly averages of ASX All Ordinaries (Ords) Price Index, Gold price (AUD), Crude Oil (Brent, USD/bbl) and Copper (USD/tonne) starting from January 2004.

Objective

Our aim for the ASX dataset is to determine the most accurate and suitable regression model that determines ASX Price Index using at least one of the 3 regressors. A descriptive analysis and decomposition of the response variable will be conducted initially. Model-building strategy will be applied to find the best fitting model from the 4 different Distributed Lag Models.

Read Data

ASX = read.csv("C:/Users/admin/Downloads/ASX_data.csv", header = TRUE)

ASX$Gold.price <- as.numeric(gsub(",", "", ASX$Gold.price))
ASX$Copper_USD.tonne <- as.numeric(gsub(",", "", ASX$Copper_USD.tonne))
# Convert from character to numeric

ASXPrice.ts = ts(ASX$ASX.price, start = c(2004,1), frequency = 12)
head(ASXPrice.ts, 36)

##         Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct
## 2004 2935.4 2778.4 2848.6 2970.9 2979.8 2999.7 3106.7 3202.9 3176.2 3282.4
## 2005 3283.6 3372.5 3416.4 3407.7 3456.9 3530.3 3546.1 3561.9 3674.7 3786.3
## 2006 4880.2 4878.4 5087.2 5207.0 4972.3 5034.0 4957.1 5079.8 5113.0 5352.9
##         Nov    Dec
## 2004 3195.7 3306.0
## 2005 3942.8 4053.1
## 2006 5461.6 5644.3

Identification of the response and the regressor variables

For fitting a regression model, the response is ASX Price Index (ASX.price) and the 3 regressor variables are the Gold price (Gold.price), Crude Oil in USD price per barrel (Crude.Oil.Brent.USD.bbl) and Copper price in USD per tonne (CopperUSD.tonne).

y = ASX.price = ASX Price Index
x1 = Gold.price = Price of Gold in AUD
x2 = Crude.Oil..Brent._USD.bbl = Crude Oil in USD price per barrel
x3 = Copper_USD.tonne = Copper price in USD per tonne

All the 4 variables are continuous variables.

Read Regressor and Response variables

Lets first get the regressor and 3 response as TS objects,

ASXPrice.ts = ts(ASX$ASX.price, start = c(2004,1), frequency = 12) # Y
GoldPrice.ts = ts(ASX$Gold.price, start = c(2004,1), frequency = 12) #X1
CrudeOilPrice.ts = ts(ASX$Crude.Oil..Brent._USD.bbl, start = c(2004,1), frequency = 12) # X2
CopperPrice.ts = ts(ASX$Copper_USD.tonne, start = c(2004,1), frequency = 12) # X3
data.ts = ts(ASX, start = c(2004,1), frequency = 12) # Y, X1, X2, X3 all in single dataframe

Relationship between Regressor and Response variables

Lets scale, center and plot all the 4 variables together

data.scale = scale(data.ts)
plot(data.scale, plot.type="s", col=c("black", "red", "blue", "green"), main = "ASX Price Index (Black - Respone), Gold Price (Red - X1),\n  Oil price (Blue - X2), Copper Price (Green - X2)")

It is hard to read the correlations between the regressors and the response and the among the response themselves. But it is fair to say the 4 variables show some correlations. Lets check for correlation statistically using ggpairs(),

ggpairs(data = ASX, columns = c(1,2,3,4), progress = FALSE) #library(ggally)

As shown in the matrix above,
- Response ASX price index and regressors Gold price, Crude oil price and copper price are correlated by 0.343, 0.329 and 0.562 correlation coefficient. - The regressors themselves are correlated by 0.437 (Gold and Crude oil price), 0.536 (Gold and Copper price) and 0.866 (Crude oil and Copper price)

Hence, some correlations between the 3 regressor and response is present. We can generate regression model based on these correlations. First, lets look at the descriptive statistics

Descriptive Analysis

Since we are generating regression model which estimates the response, \(ASX Price Index\), lets focus on ASX.Price’s statistics.

Summary statistics

summary(ASXPrice.ts)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2778    4325    4929    4813    5448    6779

The mean and median of the ASX Price Index are very close indicating symmetrical distribution.

Time Series plot:

The time series plot for our data is generated using the following code chunk,

plot(ASXPrice.ts,ylab='ASX Price Index',xlab='Time',
     type='o', main="Figure 1: ASX All Ordinaries Price Index Trend from January, 2004")
points(y=ASXPrice.ts,x=time(ASXPrice.ts), pch=as.vector(season(ASXPrice.ts)))

Plot Inference :

From Figure 1, we can comment on the time series’s,

Trend: The overall shape of the trend seems to follow an upward trend except from 2008 to 2009. Thus, indicating non-stationarity.
Seasonality: From the plot, no noticeable seasonal behavior is seen as no clear repeating patterns are visible. This needs to be confirmed using statistical tests.
Change in Variance: A change in variance can be seen as the variation is much higher during the earlier years than the latter.
Behavior: We notice mixed behavior of MA and AR series. In the earlier years (2004-2009), AR behavior is dominant as we obverse more following data points. In the latter years (2010-2017), MA behavior is more evident due to more up and down fluctuations than following data points.
Intervention/Change points: Steep drop around 2008 might be an intervention point. This needs to be confirmed using statistical tests.

ACF and PACF plots:

par(mfrow=c(2,1))
acf(ASXPrice.ts, main="ACF of ASX All Ordinaries Price Index")
pacf(ASXPrice.ts, main ="PACF of ASX All Ordinaries Price Index")

par(mfrow=c(1,1))

ACF plot: We notice multiple autocorrelations are significant. A slowly decaying pattern indicates non stationary series. We do not see any ‘wavish’ form. Thus, no seasonal behavior is observed. We did not observe seasonality in time series plot as well.
PACF plot: We see 1 high vertical spike indicating non stationary series. We have observed non stationarity in the time series plot as well. Also, the second correlation bar is significant as well.

Check normality

Many model estimating procedures assume normality of the residuals. If this assumption doesn’t hold, then the coefficient estimates are not optimum. Lets look at the Quantile-Quantile (QQ) plot to to observe normality visually and the Shapiro-Wilk test to statistically confirm the result.

qqnorm(ASXPrice.ts, main = "Normal Q-Q Plot of ASX Price Index Time Series")
qqline(ASXPrice.ts, col = 2)

We see deviations from normality. Clearly, both the tails are off and most of the data in middle is off the line as well. Lets check statistically using shapiro-wilk test. Lets state the hypothesis of this test,

\(H_0\) : Time series is Normally distributed
\(H_a\) : Time series is not normal

shapiro.test(ASXPrice.ts)

## 
##  Shapiro-Wilk normality test
## 
## data:  ASXPrice.ts
## W = 0.9709, p-value = 0.001786

From the Shapiro-Wilk test, since p < 0.05 significance level, we reject the null hypothesis that states the data is normal. Thus, ASX Price Index is not normally distributed.

Test Stationarity

The ACF, PACF and time series plots of the \(ASX Price Index\) time series at the descriptive analysis stage of time series tells us nonstationarity in our time series. Lets use ADF and PP tests,

Using ADF (Augmented Dickey-Fuller) test :

Lets confirm the non-stationarity using Dickey-Fuller Test or ADF test. Lets state the hypothesis,

\(H_0\) : Time series is Difference non-stationary
\(H_a\) : Time series is Stationary

3 approaches with different method of calculating lag lengths will be considered. We first calculate the lag length and then check the test results.

First approach,

k = ar(diff(ASXPrice.ts))$order
print(k) # Lag length

## [1] 0

adf.test(ASXPrice.ts, k = k)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  ASXPrice.ts
## Dickey-Fuller = -1.9684, Lag order = 0, p-value = 0.5895
## alternative hypothesis: stationary

Second approach,

k = trunc(12*((length(ASXPrice.ts)/100)^(1/4)))
print(k)

## [1] 13

adf.test(ASXPrice.ts, k = k)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  ASXPrice.ts
## Dickey-Fuller = -2.8677, Lag order = 13, p-value = 0.2144
## alternative hypothesis: stationary

Third approach using default value of lag length, which is k=⌈(N−1)1/3⌉.

test  = adf.test(ASXPrice.ts)
test$parameter

## Lag order 
##         5

print(test)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  ASXPrice.ts
## Dickey-Fuller = -2.6995, Lag order = 5, p-value = 0.2846
## alternative hypothesis: stationary

From all versions of the test, since p-value > 0.05, we fail to reject null hypothesis of non stationarity. we can conclude that the series is non stationary at 5% level of significance.

Using PP (Phillips-Perron) test :

The null and alternate hypothesis are same as ADF test. For the PP test, we have two options for the lag length. If lshort parameter is set to TRUE we get k=⌈4∗(N/100)1/4)⌉, otherwise we get k=⌈12∗(N/100)1/4)⌉.

PP.test(ASXPrice.ts, lshort = TRUE)

## 
##  Phillips-Perron Unit Root Test
## 
## data:  ASXPrice.ts
## Dickey-Fuller = -2.2284, Truncation lag parameter = 4, p-value = 0.4811

PP.test(ASXPrice.ts, lshort = FALSE)

## 
##  Phillips-Perron Unit Root Test
## 
## data:  ASXPrice.ts
## Dickey-Fuller = -2.3932, Truncation lag parameter = 13, p-value =
## 0.4123

According to the 2 PP tests, ASX Price Index series is non stationary at 5% level.

Conclusion from descriptive analysis:

From the time series plot, ACF, PACF plots, ADF and PP tests, we found our ASX Price Index response is non stationary. Differencing is required to fix this.
Trend is not normal and carries increasing variance. Thus Box-cox transformation is required, unless it gets fixed by differencing.

Lets perform with Box-Cox transformation,

Transformations

Box-Cox transformation to improve normality or variance

To improve normality and increasing variance in our ASX Price Index time series, lets test Box-Cox transformations on the series.

lambda = BoxCox.lambda(ASXPrice.ts, method = "loglik")
BC.ASXPrice.ts = BoxCox(ASXPrice.ts, lambda = lambda)

Check Normality and improvement in variance of transformed ASX series

Visually comparing the time series plots before and after box-cox transformation,

par(mfrow=c(2,1))
plot(BC.ASXPrice.ts,ylab='ASX Price Index',xlab='Time',
     type='o', main="Box-Cox Transformed ASX Price Index Time Series")
points(y=BC.ASXPrice.ts,x=time(BC.ASXPrice.ts), pch=as.vector(season(BC.ASXPrice.ts)))

plot(ASXPrice.ts,ylab='ASX Price Index',xlab='Time',
     type='o', main="Original ASX Price Index Time Series")
points(y=ASXPrice.ts,x=time(ASXPrice.ts), pch=as.vector(season(ASXPrice.ts)))

par(mfrow=c(1,1))

From the plot, almost no improvement in the variance of the time series is visible after BC transformation. Lets check for normality using shapiro test,

shapiro.test(BC.ASXPrice.ts)

## 
##  Shapiro-Wilk normality test
## 
## data:  BC.ASXPrice.ts
## W = 0.97203, p-value = 0.002376

From the Shapiro-Wilk test, since p < 0.05 significance level, we reject the null hypothesis that states the data is normal. Thus, BC Transformed ASX Price Index is not normal.

Now, lets perform Differencing with the aim to improve stationarity in our ASX Price Index time series.

Differencing to improve Stationarity

The ACF, PACF and time series plots at the descriptive analysis stage of time series tells us nonstationarity in our time series. This was confirmed using ADF and PP tesdt. This non-stationarity is still present in BC transformed series as no actions are taken yet to flatten the trend curve. Lets confirm statistically,

Check Stationarity of BC transformed ASX series

Lets check if transformation has improved stationarity of our non-stationary ASX Price Index series.

Using ADF test

adf.BC.ASXPrice.ts = ur.df(BC.ASXPrice.ts, type = "none", lags = 1, selectlags = "AIC")
summary(adf.BC.ASXPrice.ts)

## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1116.68  -156.30    56.77   211.31  1232.57 
## 
## Coefficients:
##            Estimate Std. Error t value Pr(>|t|)
## z.lag.1    0.002123   0.003472   0.612    0.542
## z.diff.lag 0.083995   0.079916   1.051    0.295
## 
## Residual standard error: 311.1 on 157 degrees of freedom
## Multiple R-squared:  0.01029,    Adjusted R-squared:  -0.002316 
## F-statistic: 0.8163 on 2 and 157 DF,  p-value: 0.4439
## 
## 
## Value of test-statistic is: 0.6115 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62

Using PP test

PP.test(BC.ASXPrice.ts, lshort = TRUE)

## 
##  Phillips-Perron Unit Root Test
## 
## data:  BC.ASXPrice.ts
## Dickey-Fuller = -2.2244, Truncation lag parameter = 4, p-value = 0.4827

PP.test(BC.ASXPrice.ts, lshort = FALSE)

## 
##  Phillips-Perron Unit Root Test
## 
## data:  BC.ASXPrice.ts
## Dickey-Fuller = -2.3922, Truncation lag parameter = 13, p-value =
## 0.4127

In all cases, Since p-value > 0.05, we fail to reject null hypothesis of non stationarity. we can conclude that the BC transformed ASX Price Index series is non stationary at 5% level of significance.

Ordinary or Seasonal Differencing

Since no seasonality was observed at the descriptive analysis stage (from ACF, PACF or time series plot), we expect ordinary differencing should suffice. Lets compare the seasonally differenced and ordinary differenced series w.r.t the original time series.

BC.ASXPrice.ts.Orddiff = diff(BC.ASXPrice.ts, differences = 1) # First Ordinary difference
BC.ASXPrice.ts.Seasdiff = diff(BC.ASXPrice.ts, differences = 1, lag=12) # First seasonal difference

par(mfrow=c(3,1))

plot(BC.ASXPrice.ts.Orddiff,ylab='Landings in metric tons',
     xlab='Year',main = "First Ordinary Differenced Box-Cox Transformed ASX Price Index Time Series")
points(y=BC.ASXPrice.ts.Orddiff,x=time(BC.ASXPrice.ts.Orddiff), pch=as.vector(season(BC.ASXPrice.ts.Orddiff)))

plot(BC.ASXPrice.ts.Seasdiff,ylab='Landings in metric tons',
     xlab='Year',main = "First Seasonal Differenced Box-Cox Transformed ASX Price Index Time Series")
points(y=BC.ASXPrice.ts.Seasdiff,x=time(BC.ASXPrice.ts.Seasdiff), pch=as.vector(season(BC.ASXPrice.ts.Seasdiff)))

plot(ASXPrice.ts,ylab='ASX Price Index',xlab='Time',
     type='o', main="Original ASX Price Index Time Series")
points(y=ASXPrice.ts,x=time(ASXPrice.ts), pch=as.vector(season(ASXPrice.ts)))

par(mfrow=c(1,1))

As expected, seasonal differencing is useless in this series. Ordinary differencing of first order makes the series look much more stationary. Lets confirm statistically,

adf.BC.ASXPrice.ts.Orddiff = ur.df(BC.ASXPrice.ts.Orddiff, type = "none", lags = 1, selectlags = "AIC")
summary(adf.BC.ASXPrice.ts.Orddiff)

## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1059.25  -154.88    61.15   215.28  1214.09 
## 
## Coefficients:
##            Estimate Std. Error t value Pr(>|t|)    
## z.lag.1     -0.7869     0.1072  -7.343 1.08e-11 ***
## z.diff.lag  -0.1342     0.0794  -1.690    0.093 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 309.5 on 156 degrees of freedom
## Multiple R-squared:  0.4631, Adjusted R-squared:  0.4562 
## F-statistic: 67.29 on 2 and 156 DF,  p-value: < 2.2e-16
## 
## 
## Value of test-statistic is: -7.3432 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62

PP.test(BC.ASXPrice.ts.Orddiff, lshort = TRUE)

## 
##  Phillips-Perron Unit Root Test
## 
## data:  BC.ASXPrice.ts.Orddiff
## Dickey-Fuller = -11.683, Truncation lag parameter = 4, p-value = 0.01

PP.test(BC.ASXPrice.ts.Orddiff, lshort = FALSE)

## 
##  Phillips-Perron Unit Root Test
## 
## data:  BC.ASXPrice.ts.Orddiff
## Dickey-Fuller = -11.859, Truncation lag parameter = 13, p-value = 0.01

From the ADF and PP tests, Since p-value < 0.05, we reject the null hypothesis of non stationarity. Thus, First order Ordinary Differenced BC transformed ASX Price Index series is Stationary at 5% level of significance.

Note, 1st Order Differenced BC transformed ASX Price Index is still not normal. We proceed as such.

shapiro.test(BC.ASXPrice.ts.Orddiff)

## 
##  Shapiro-Wilk normality test
## 
## data:  BC.ASXPrice.ts.Orddiff
## W = 0.94862, p-value = 1.373e-05

Conclusion after BC transformation and Differencing

The First order Ordinary Differenced BC transformed ASX Price Index series is Stationary and not normal. BC transformation was not affective.

Decomposition

To observe the individual effects of the existing components and historical effects occurred in the past, lets perform decomposition of the ASX Price Index time series. The time series can be decomposed into are seasonal, trend and remainder components.

Three main decomposition methods for time series are known, classical decomposition, X-12-ARIMA decomposition and Loess (STL) decomposition.

Classical decomposition

To implement the classical decomposition, we use decompose() function from the stats package.

fit.classical.add <- decompose(ASXPrice.ts, type="additive")
plot(fit.classical.add)

fit.classical.mult <- decompose(ASXPrice.ts, type="multiplicative")
plot(fit.classical.mult)

Notice that a seasonal component was extracted from the series which wasn’t identified during the descriptive analysis stage. We got very similar estimates from the additive and multiplicative methods. It seems from the remainder series that the multiplicative model handles the decomposition slightly better (lesser variance).

X-12-ARIMA decomposition

Although using the classical decomposition, a seasonal component was identified, we do not know how significant this seasonal effect is. X-12-ARIMA decomposition can help us detect the significance of the seasonal component. We use the function x12() to implement the X-12-ARIMA decomposition.

fit.x12 = x12(ASXPrice.ts)
plot(fit.x12 , sa=TRUE , trend=TRUE)

From the plot, we can see the Seasonally Adjusted series follows the original series very closely (especially during earlier years). This implies, that the original series is not affected by seasonality much, meaning seasonality is less significant in the ASX Price Index series. On the contrary, the Trend series deviates quite a lot from the original series (especially at the ups and downs), implying the Original series has significant trend component.

At this point, we believe the original series has very less seasonal component. Lets confirm this using SI (Seasonal & Irregular) Chart and also check irregular component’s contribution to the original series. SI is basically the Original series minus the trend component, which leaves behind the Seasonal and Irregular component (hence the name SI).

plotSeasFac(fit.x12)

The green points represent the SIs obtained from the time series, while the solid black line shows the seasonal component.

SI charts are useful in determining whether short-term movements are caused by seasonal or irregular influences. In the graph above, the SIs can be seen to fluctuate erratically, which indicates the time series under analysis is dominated by its irregular component.

SI charts are also used to identify seasonal breaks, moving holiday patterns and extreme values in a time series. Since the seasonal pattern changes randomly over the 12 months, no particular intervention point is identified. We did not identify any intervention points at the descriptive analysis stage as well (from the Time series plot).

In this plot, replaced SI ratios are the SI ratios with the extreme values replaced. It is also observed from SI ratios that we have outliers or influential observations for almost all the 12 months (especially Nov and Aug). Since we identify too many influential observations, it is fair to say that they are likely not outliers.

STL decomposition

Another decomposition method, STL has advantage of flexibility over classical decomposition. stl() function is used with t.window and s.window arguments to control rate of change of trend and seasonal components. Lets set t.window to 15 and look the STL decomposed plots,

stl.ASXPrice.ts <- stl(window(ASXPrice.ts,start=c(2004,1)), t.window=15, s.window="periodic", robust=TRUE)
plot(stl.ASXPrice.ts)

In this output, the first plot is the smoothed version of the original time series. The second one shows the seasonal component. The third one is for the trend component and the last one is the time series bar chart for the remaining series.

We can display seasonal sub-series using the monthplot() function. This helps us to get a sense of the variation in the seasonal component over time.

monthplot(stl.ASXPrice.ts$time.series[,"seasonal"],choice = "seasonal", main="Seasonal sub-series plot of the 
          seasonal component of ASX series", ylab="Seasonal")

We get only mean levels in this plot because seasonal component repeats itself without changing. Lowest seasonal ASX Price Index value occurs in September and highest in April.

We can also display a monthly trend component by setting choice = “trend”.

monthplot(stl.ASXPrice.ts,choice = "trend", main="Seasonal sub-series plot of the 
          seasonal component of ASX series", ylab="Trend")

From this visualization, we can observe the same trend pattern for each month but with slight change in variance over the months.

After having the decomposition, we can adjust the series for seasonality by subtracting the seasonal component from the original series using the following code chunk,

par(mfrow=c(2,1))
plot(seasadj(stl.ASXPrice.ts), ylab='ASX Price Index',xlab='Time', main = "Seasonally adjusted ASX Price Index")
points(y=seasadj(stl.ASXPrice.ts),x=time(seasadj(stl.ASXPrice.ts)), pch=as.vector(season(seasadj(stl.ASXPrice.ts))))

plot(ASXPrice.ts,ylab='ASX Price Index',xlab='Time',
     type='o', main="Original ASX Price Index Time Series")
points(y=ASXPrice.ts,x=time(ASXPrice.ts), pch=as.vector(season(ASXPrice.ts)))

par(mfrow=c(1,1))

No change is visually seen in the seasonally adjusted series compared to the original series. Thus, seasonal component is insignificant.

After the decomposition, we can adjust for trend as well. For this, we need to subtract the trend component from the original series. The following code chunk illustrates this approach.

par(mfrow=c(2,1))
stl.ASXPrice.ts.trend = stl.ASXPrice.ts$time.series[,"trend"] # Extract the trend component from the output
stl.ASXPrice.ts.trend.adjusted = ASXPrice.ts - stl.ASXPrice.ts.trend

plot(stl.ASXPrice.ts.trend.adjusted, ylab='ASX Price Index',xlab='Time', main = "Trend adjusted ASX Price Index")
points(y=stl.ASXPrice.ts.trend.adjusted,x=time(stl.ASXPrice.ts.trend.adjusted), pch=as.vector(season(stl.ASXPrice.ts.trend.adjusted)))

plot(ASXPrice.ts,ylab='ASX Price Index',xlab='Time',
     type='o', main="Original ASX Price Index Time Series")
points(y=ASXPrice.ts,x=time(ASXPrice.ts), pch=as.vector(season(ASXPrice.ts)))

par(mfrow=c(1,1))

The Trend adjusted series is completely different from the original series, implying the trend component of the series is the most influential component of the original series.

Conclusion of Decomposition

Seasonal component is insignificant when comparing seasonally adjusted series with the original series. From the SI ratio, we found irregular component dominating the seasonal component. No significant intervention point or outliers were identified. The trend component of the series is the most influential component of the original series.

Modeling

A regression method, so called, Distributed lag models (DLMs) which uses several lags of a predictor variable in the model as explanatory variables will be applied. Based on whether the lags are known (Finite DLM) or undetermined (Infinite DLM), 4 major modelling methods will be tested, namely,

basic Distributed lag model,
Polynomial DLM,
Koyck transformed geometric DLM,
and Autoregressive DLM.

Fit Finite DLM

The response of a finite DLM model with 1 regressor is represented as shown below,

\(Y_t = \alpha + \sum_{s=0}^{q} \beta_s X_{t-s} + \epsilon_t\)

where,

\(\alpha\) is intercept
\(\beta_s\) is coefficient of s lagged response \(X_t\)
and \(\epsilon_t\) is the error term

In our dataset, we have 3 regressors, hence the model equation has X1, X2 and X3 instead of just one regressor.

Now, lets use AIC and BIC score to find the best lag length for Finite DLM model,

finiteDLMauto(formula = ASX.price ~ Gold.price + Crude.Oil..Brent._USD.bbl + Copper_USD.tonne, data = ASX, q.min = 1, q.max = 12,
              model.type = "dlm", error.type = "AIC", trace = TRUE)

##    q - k    MASE      AIC      BIC   GMRAE    MBRAE R.Adj.Sq Ljung-Box
## 12    12 3.29042 2411.054 2534.215 3.97425  1.22784  0.17221         0
## 11    11 3.37268 2425.664 2540.068 4.08759  1.12332  0.18832         0
## 10    10 3.45015 2440.112 2545.716 4.17908 -0.09314  0.20653         0
## 9      9 3.51306 2452.691 2549.455 4.32629  0.92363  0.22986         0
## 8      8 3.54824 2464.680 2552.563 4.33153  1.13867  0.25664         0
## 7      7 3.58483 2476.180 2555.141 4.18334  1.59915  0.28265         0
## 6      6 3.61856 2488.492 2558.491 4.36746  1.11068  0.30501         0
## 5      5 3.63787 2500.567 2561.564 4.45774  0.93643  0.32883         0
## 4      4 3.67485 2513.098 2565.055 4.26210  1.02182  0.34904         0
## 3      3 3.75026 2525.161 2568.037 4.80762  0.61340  0.36951         0
## 2      2 3.80734 2538.017 2571.775 4.87590  0.68063  0.38758         0
## 1      1 3.87272 2552.754 2577.355 4.84385  0.91834  0.39832         0

q = 12 has the smallest AIC and BIC scores. Fit model with q = 12,

DLM.model = dlm(formula = ASX.price ~ Gold.price + Crude.Oil..Brent._USD.bbl + Copper_USD.tonne, data = ASX, q = 12)

Note - we are using ASX.price and not the BC.ASXPrice.ts.Orddiff (Differenced and transformed Price Index) as normality is violated in both of these.

Diagnostic check for DLM.model (Residual analysis)

We can apply a diagnostic check using checkresiduals() function from the forecast package.

checkresiduals(DLM.model$model$residuals) # forecast package

## 
##  Ljung-Box test
## 
## data:  Residuals
## Q* = 682.93, df = 10, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 10

In this output,

the time series plot and histogram of residuals, there is an obvious non-random pattern and very high residual values that violate general assumptions.
the Ljung-Box test output is displayed. According to this test, the null hypothesis that a series of residuals exhibits no autocorrelation up-to lag 10 is violated. Another test to check serial correlation is Breusch-Godfrey test,

bgtest(DLM.model$model)

## 
##  Breusch-Godfrey test for serial correlation of order up to 1
## 
## data:  DLM.model$model
## LM test = 133.5, df = 1, p-value < 2.2e-16

Which has,
\(H_0\) : there is no serial correlation of any order up to p
\(H_a\) : there is serial correlation of any order up to p

According to this test and ACF plot, we can conclude that the serial correlation left in residuals is highly significant.

Model Summary for Finite DLM model (DLM.model)

summary(DLM.model)

## 
## Call:
## lm(formula = as.formula(model.formula), data = design)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1171.74  -507.71    16.21   535.37  1122.14 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   4.163e+03  2.546e+02  16.350   <2e-16 ***
## Gold.price.t                 -1.894e+00  1.310e+00  -1.445    0.151    
## Gold.price.1                 -2.839e-02  1.860e+00  -0.015    0.988    
## Gold.price.2                 -8.698e-02  1.896e+00  -0.046    0.963    
## Gold.price.3                 -3.677e-03  1.912e+00  -0.002    0.998    
## Gold.price.4                 -4.397e-01  1.893e+00  -0.232    0.817    
## Gold.price.5                  5.175e-01  1.888e+00   0.274    0.785    
## Gold.price.6                 -4.749e-01  1.899e+00  -0.250    0.803    
## Gold.price.7                  3.374e-01  1.911e+00   0.177    0.860    
## Gold.price.8                 -2.941e-01  1.893e+00  -0.155    0.877    
## Gold.price.9                  5.555e-02  1.874e+00   0.030    0.976    
## Gold.price.10                 3.781e-01  1.871e+00   0.202    0.840    
## Gold.price.11                -1.477e-01  1.859e+00  -0.079    0.937    
## Gold.price.12                 1.753e+00  1.251e+00   1.402    0.164    
## Crude.Oil..Brent._USD.bbl.t  -1.546e+01  1.276e+01  -1.211    0.228    
## Crude.Oil..Brent._USD.bbl.1   3.390e+00  1.931e+01   0.176    0.861    
## Crude.Oil..Brent._USD.bbl.2  -5.230e+00  1.967e+01  -0.266    0.791    
## Crude.Oil..Brent._USD.bbl.3  -2.215e+00  2.005e+01  -0.111    0.912    
## Crude.Oil..Brent._USD.bbl.4  -1.226e+01  2.030e+01  -0.604    0.547    
## Crude.Oil..Brent._USD.bbl.5   1.436e+00  2.012e+01   0.071    0.943    
## Crude.Oil..Brent._USD.bbl.6  -8.810e-01  2.014e+01  -0.044    0.965    
## Crude.Oil..Brent._USD.bbl.7  -4.581e+00  2.088e+01  -0.219    0.827    
## Crude.Oil..Brent._USD.bbl.8   2.792e+00  2.106e+01   0.133    0.895    
## Crude.Oil..Brent._USD.bbl.9  -5.310e+00  2.125e+01  -0.250    0.803    
## Crude.Oil..Brent._USD.bbl.10  7.420e+00  2.120e+01   0.350    0.727    
## Crude.Oil..Brent._USD.bbl.11 -3.211e+00  2.101e+01  -0.153    0.879    
## Crude.Oil..Brent._USD.bbl.12  1.180e+00  1.317e+01   0.090    0.929    
## Copper_USD.tonne.t            2.219e-01  1.653e-01   1.343    0.182    
## Copper_USD.tonne.1            4.800e-02  2.464e-01   0.195    0.846    
## Copper_USD.tonne.2            5.181e-02  2.476e-01   0.209    0.835    
## Copper_USD.tonne.3            6.852e-02  2.435e-01   0.281    0.779    
## Copper_USD.tonne.4            1.015e-01  2.449e-01   0.414    0.679    
## Copper_USD.tonne.5            1.918e-02  2.494e-01   0.077    0.939    
## Copper_USD.tonne.6            1.482e-02  2.489e-01   0.060    0.953    
## Copper_USD.tonne.7            7.472e-02  2.517e-01   0.297    0.767    
## Copper_USD.tonne.8           -1.335e-02  2.565e-01  -0.052    0.959    
## Copper_USD.tonne.9            1.278e-02  2.562e-01   0.050    0.960    
## Copper_USD.tonne.10          -1.237e-01  2.525e-01  -0.490    0.625    
## Copper_USD.tonne.11           2.773e-02  2.473e-01   0.112    0.911    
## Copper_USD.tonne.12           1.192e-01  1.710e-01   0.698    0.487    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 701.3 on 109 degrees of freedom
## Multiple R-squared:  0.3903, Adjusted R-squared:  0.1722 
## F-statistic: 1.789 on 39 and 109 DF,  p-value: 0.009903
## 
## AIC and BIC values for the model:
##        AIC      BIC
## 1 2411.054 2534.216

The Finite DLM model is significant at 5% percent significance level with R-squared value value of 39.03% . None of the 12 lagged regressors for the 3 regressors are found to be significant. Looking at the values of coefficients, the highest lag effect is seen for the lag of 1 of Crude oil price. However, we need to consider the effect of collinearity on these results. To inspect this issue, we will display variance inflation factors (VIFs). If the value of VIF is greater than 10, we can conclude that the effect of multicollinearity is high.

VIF.DLM.model = vif(DLM.model$model) # variance inflation factors 
VIF.DLM.model

##                 Gold.price.t                 Gold.price.1 
##                     71.11600                    145.50539 
##                 Gold.price.2                 Gold.price.3 
##                    153.29844                    158.41381 
##                 Gold.price.4                 Gold.price.5 
##                    157.31232                    158.70864 
##                 Gold.price.6                 Gold.price.7 
##                    163.17932                    167.18117 
##                 Gold.price.8                 Gold.price.9 
##                    165.50486                    162.99642 
##                Gold.price.10                Gold.price.11 
##                    162.90794                    160.54574 
##                Gold.price.12  Crude.Oil..Brent._USD.bbl.t 
##                     72.66813                     39.35489 
##  Crude.Oil..Brent._USD.bbl.1  Crude.Oil..Brent._USD.bbl.2 
##                     91.19183                     96.06632 
##  Crude.Oil..Brent._USD.bbl.3  Crude.Oil..Brent._USD.bbl.4 
##                    101.07363                    105.33100 
##  Crude.Oil..Brent._USD.bbl.5  Crude.Oil..Brent._USD.bbl.6 
##                    104.83777                    106.58631 
##  Crude.Oil..Brent._USD.bbl.7  Crude.Oil..Brent._USD.bbl.8 
##                    115.81473                    119.52918 
##  Crude.Oil..Brent._USD.bbl.9 Crude.Oil..Brent._USD.bbl.10 
##                    123.26762                    123.78057 
## Crude.Oil..Brent._USD.bbl.11 Crude.Oil..Brent._USD.bbl.12 
##                    122.39788                     48.51511 
##           Copper_USD.tonne.t           Copper_USD.tonne.1 
##                     27.58041                     63.35766 
##           Copper_USD.tonne.2           Copper_USD.tonne.3 
##                     66.17587                     66.21162 
##           Copper_USD.tonne.4           Copper_USD.tonne.5 
##                     69.38872                     74.44770 
##           Copper_USD.tonne.6           Copper_USD.tonne.7 
##                     76.67356                     80.91336 
##           Copper_USD.tonne.8           Copper_USD.tonne.9 
##                     86.46023                     88.70268 
##          Copper_USD.tonne.10          Copper_USD.tonne.11 
##                     88.48727                     87.08307 
##          Copper_USD.tonne.12 
##                     42.65771

From the VIF values, it is obvious that the estimates of the finite DLM coefficients are suffering from the multicollinearity. To deal with this issue, we can use the restricted least squares method to find parameter estimates. In this approach, some restrictions are placed on the model parameters to reduce the variances of the estimators. In the context of DLMs, we translate the pattern of time effects into the restrictions on parameters. Polynomial DLM model can fix effect of multicollinearity.

Conclusion of Finite DLM model

Model is significant.
Low R-squared value of 39.03% suggests bad fit
violations in the test of assumptions
Has multicollinearity among the regressors

Fit Polynomial DLM model

If the lag weights follow a smooth polynomial pattern, then a polynomial DLM model will reduce/remove the multicollinearity. Lets fit a polynomial DLM of order 2 and check if the polynomial component (order 2) reduces multicollinearity. Lets do this for each of the 3 regressors individually.

For Gold price regressor:

PolyDLM.goldprice = polyDlm(x = as.vector(GoldPrice.ts), y = as.vector(ASXPrice.ts), q = 12, k = 2, show.beta = FALSE)
summary(PolyDLM.goldprice)

## 
## Call:
## "Y ~ (Intercept) + X.t"
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1546.79  -526.50    13.42   491.35  1894.95 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.673e+03  2.205e+02  21.195   <2e-16 ***
## z.t0        -4.054e-01  2.896e-01  -1.400    0.164    
## z.t1         1.517e-01  1.356e-01   1.119    0.265    
## z.t2        -9.683e-03  1.111e-02  -0.871    0.385    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 761.8 on 145 degrees of freedom
## Multiple R-squared:  0.04315,    Adjusted R-squared:  0.02336 
## F-statistic:  2.18 on 3 and 145 DF,  p-value: 0.09295

Polynomial DLM model with gold price as regressor variable is insignificant at 5% significance level.

For Crude oil price regressor:

PolyDLM.crudeoilprice = polyDlm(x = as.vector(CrudeOilPrice.ts), y = as.vector(ASXPrice.ts), q = 12, k = 2, show.beta = FALSE)
summary(PolyDLM.crudeoilprice)

## 
## Call:
## "Y ~ (Intercept) + X.t"
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1620.88  -552.51    44.28   526.00  1678.23 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 4819.26930  194.08408  24.831  < 2e-16 ***
## z.t0           4.93313    1.78495   2.764  0.00646 ** 
## z.t1          -2.11687    0.84765  -2.497  0.01363 *  
## z.t2           0.15794    0.06961   2.269  0.02474 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 758.4 on 145 degrees of freedom
## Multiple R-squared:  0.05166,    Adjusted R-squared:  0.03204 
## F-statistic: 2.633 on 3 and 145 DF,  p-value: 0.05221

Polynomial DLM model with crude oil price as regressor variable is insignificant at 5% significance level.

For Copper price regressor:

PolyDLM.copperprice = polyDlm(x = as.vector(CopperPrice.ts), y = as.vector(ASXPrice.ts), q = 12, k = 2, show.beta = FALSE)
summary(PolyDLM.copperprice)

## 
## Call:
## "Y ~ (Intercept) + X.t"
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1213.59  -610.66    26.99   585.36  1512.44 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.136e+03  2.163e+02  19.124  < 2e-16 ***
## z.t0         9.592e-02  2.532e-02   3.788 0.000222 ***
## z.t1        -2.872e-02  1.190e-02  -2.414 0.017022 *  
## z.t2         1.723e-03  9.736e-04   1.770 0.078906 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 710.9 on 145 degrees of freedom
## Multiple R-squared:  0.1666, Adjusted R-squared:  0.1494 
## F-statistic: 9.664 on 3 and 145 DF,  p-value: 7.422e-06

Polynomial DLM model with copper price as regressor variable is significant at 5% significance level. Notice that the standard errors of estimators are much less (to the powers of -2 for polynomial DLM vs power of -1 for ordinary finite DLM) than their unconstrained (Finite DLM) counterparts. But the R-squared value is much less (16.66% for polynomial DLM vs 39.02% for ordinary DLM). The 0th and 1st order regressors of copper price variable are significant, but the 2nd order regressor (z.t2) is insignificant. This implies, the polynomial component of the regressor variable in the model is redundant and not necessary. Lets check if multicollinearity is fixed,

VIF.PolyDLM.copperprice = vif(PolyDLM.copperprice$model) # variance inflation factors 
VIF.PolyDLM.copperprice

##     z.t0     z.t1     z.t2 
## 106.2733 963.4178 477.2208

Multicollinearity is still significant.

Diagnostic check for Polynomial DLM (Residual analysis)

checkresiduals(PolyDLM.copperprice$model$residuals)

## 
##  Ljung-Box test
## 
## data:  Residuals
## Q* = 789.7, df = 10, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 10

In this output,

the time series plot and histogram of residuals, there is an obvious non-random pattern and very high residual values that violate general assumptions.
the Ljung-Box test output is displayed. According to this test, the null hypothesis that a series of residuals exhibits no autocorrelation up-to lag 10 is violated. Another test to check serial correlation is Breusch-Godfrey test,

bgtest(PolyDLM.copperprice$model)

## 
##  Breusch-Godfrey test for serial correlation of order up to 1
## 
## data:  PolyDLM.copperprice$model
## LM test = 134.95, df = 1, p-value < 2.2e-16

Which has,
\(H_0\) : there is no serial correlation of any order up to p
\(H_a\) : there is serial correlation of any order up to p

According to this test and ACF plot, we can conclude that the serial correlation left in residuals is highly significant.

Conclusion of Polynomial DLM model

Thus, because of insignificant order 2 component (z.t2), lower R-squared statistic and no improvement in multicollinearity, and violations in the test of assumptions (residual analysis), Polynomial DLM model is not a good fitting model.

Fit Koyck geometric DLM model

Here the lag weights are positive and decline geometrically. This model is called infinite geometric DLM, meaning there are infinite lag weights. Koyck transformation is applied to implement this infinite geometric DLM model by subtracting the first lag of geometric DLM multiplied by \(\phi\). The Koyck transformed model is represented as,

\(Y_t = \delta_1 + \delta_2Y_{t-1} + \nu_t\)

where \(\delta_1 = \alpha(1-\phi), \delta_2 = \phi, \delta_3 = \beta\) and the random error after the transformation is \(\nu_t = (\epsilon_t -\phi\epsilon_{t-1})\).

The koyckDlm() function is used to implement a two-staged least squares method to first estimate the \(\hat{Y}_{t-1}\) and the estimate \(Y_{t}\) through simple linear regression. Lets deduce Koyck geometric GLM models for each of the 3 regressors individually.

For Gold price regressor:

Koyck.GoldPrice = koyckDlm(x = as.vector(ASX$Gold.price) , y = as.vector(ASX$ASX.price) )
summary(Koyck.GoldPrice$model, diagnostics = TRUE)

## 
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -682.19 -105.44   15.86  135.04  783.60 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.902e+02  8.958e+01   2.123   0.0353 *  
## Y.1         9.635e-01  1.909e-02  50.469   <2e-16 ***
## X.t         2.595e-03  4.304e-02   0.060   0.9520    
## 
## Diagnostic tests:
##                  df1 df2 statistic  p-value    
## Weak instruments   1 157   8006.64  < 2e-16 ***
## Wu-Hausman         1 156     18.07 3.66e-05 ***
## Sargan             0  NA        NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 201.4 on 157 degrees of freedom
## Multiple R-Squared: 0.9488,  Adjusted R-squared: 0.9481 
## Wald test:  1454 on 2 and 157 DF,  p-value: < 2.2e-16

The Koyck transformed geometric DLM model with predictor, Gold price is significant. From the Weak Instruments line, we conclude that the model at the first stage of the least-squares fitting is significant at 5% level of significance. In this model, both \(\delta_1\) and \(\delta_2\) are significant, but \(\delta_3\) is insignificant at 5% level. Meaning ASX Price index is significantly dependent on Last years ASX price index and is not significantly dependent on the current Gold Price. Using the coefficients, \(\hat{\phi}\)=0.9635, the ASX Price Index will decline quickly at the rate of 0.9635 of last years ASX price Index.

From the Wu-Hausman test result in the model output, we reject the null hypothesis that the correlation between explanatory variable (\(Y_{t-1}\)) and the error term is zero (There is no endogeneity) at 5% level. So, there is a significant correlation between the explanatory variable and the error term at 5% level.

Diagnostic check for Koyck.GoldPrice (Residual analysis)

checkresiduals(Koyck.GoldPrice$model)

## 
##  Ljung-Box test
## 
## data:  Residuals
## Q* = 7.3793, df = 10, p-value = 0.6892
## 
## Model df: 0.   Total lags used: 10

From the Breusch-Godfrey test and ACF plot, serial correlation left in residuals is not significant. Also, from the time series plot and histogram of residuals, normality and linearity is not violated. Thus, the general assumptions hold. Thus no assumptions are violated.

For Crude oil price regressor:

Koyck.CrudeOilPrice = koyckDlm(x = as.vector(ASX$Crude.Oil..Brent._USD.bbl) , y = as.vector(ASX$ASX.price) )
summary(Koyck.CrudeOilPrice$model, diagnostics = TRUE)

## 
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -683.91 -108.66   13.68  139.77  762.55 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 209.89536   87.89368   2.388   0.0181 *  
## Y.1           0.97537    0.01905  51.193   <2e-16 ***
## X.t          -0.99907    0.58045  -1.721   0.0872 .  
## 
## Diagnostic tests:
##                  df1 df2 statistic p-value    
## Weak instruments   1 157    3018.5 < 2e-16 ***
## Wu-Hausman         1 156      11.2 0.00103 ** 
## Sargan             0  NA        NA      NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 201.1 on 157 degrees of freedom
## Multiple R-Squared: 0.949,   Adjusted R-squared: 0.9483 
## Wald test:  1461 on 2 and 157 DF,  p-value: < 2.2e-16

The results of the Koyck geometric DLM with predictor, Crude oil are completely identical to that of Gold price koyck DLM. Here too ASX Price index is significantly dependent on Last years ASX price index and is not significantly dependent on the current Crude Oil Price.

Diagnostic check for Koyck.CrudeOilPrice (Residual analysis)

checkresiduals(Koyck.CrudeOilPrice$model)

## 
##  Ljung-Box test
## 
## data:  Residuals
## Q* = 5.4432, df = 10, p-value = 0.8597
## 
## Model df: 0.   Total lags used: 10

Again, no assumptions are violated. Serial correlations are not significant as per BG test and ACF plot. Also, Normality and Linearity are not violated.

For Copper price regressor:

Koyck.CopperPrice = koyckDlm(x = as.vector(ASX$Copper_USD.tonne) , y = as.vector(ASX$ASX.price) )
summary(Koyck.CopperPrice$model, diagnostics = TRUE)

## 
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -689.64 -108.62   12.78  140.20  771.79 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 189.368812  87.644648   2.161   0.0322 *  
## Y.1           0.971621   0.021895  44.376   <2e-16 ***
## X.t          -0.005864   0.009517  -0.616   0.5387    
## 
## Diagnostic tests:
##                  df1 df2 statistic p-value    
## Weak instruments   1 157   1966.87 < 2e-16 ***
## Wu-Hausman         1 156     10.97 0.00115 ** 
## Sargan             0  NA        NA      NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 201.9 on 157 degrees of freedom
## Multiple R-Squared: 0.9485,  Adjusted R-squared: 0.9479 
## Wald test:  1448 on 2 and 157 DF,  p-value: < 2.2e-16

Again same results. ASX Price index is significantly dependent on Last years ASX price index and is not significantly dependent on the current Copper Price.

Diagnostic check for Koyck.CopperPrice (Residual analysis)

checkresiduals(Koyck.CopperPrice$model)

## 
##  Ljung-Box test
## 
## data:  Residuals
## Q* = 6.2327, df = 10, p-value = 0.7953
## 
## Model df: 0.   Total lags used: 10

Again, no assumptions are violated. Serial correlations are not significant as per BG test and ACF plot. Also, Normality and Linearity are not violated.

Check for multicollinearity in Koyck DLM models

vif(Koyck.GoldPrice$model) # variance inflation factors

##      Y.1      X.t 
## 1.140648 1.140648

vif(Koyck.CrudeOilPrice$model) # variance inflation factors

##     Y.1     X.t 
## 1.14038 1.14038

vif(Koyck.CopperPrice$model) # variance inflation factors

##      Y.1      X.t 
## 1.493966 1.493966

For all 3 Koyck models, multicollinearity is insignificant.

Conclusion of Koyck DLM model

All 3 koyck models with individual regressors are significant. Response, ASX Price index is not significantly dependent on the individual regressor variable, rather, ASX Price index is significantly dependent on Last years ASX price index. Multicollinearity is insignificant. None of the test of assumptions are violated. Each of the 3 koyck are good fitting models with high R-squared statistics (>90%).

Fit Autoregressive Distributed Lag Model

In both Polynomial and Koyck models, the response ASX Price Index cannot be represented using all 3 predictor variables at once (keeping the ). Only one predictor can be chosen at once. Autoregressive Distributed lag model is a flexible and parsimonious infinite DLM which can incorporate all 3 predictors at once. The model is represented as,

\(Y_t = \mu + \beta_0 X_t + \beta_1 X_{t-1} + \gamma_1 Y_{t-1} + e_t\)

Similar to the Koyck DLM, it is possible to write this model as an infinite DLM with infinite lag distribution of any shape rather than a polynomial or geometric shape. The model is denoted as ARDL(p,q). To fit the model we will use ardlDlm() function is used. Lets find the best lag length using AIC and BIC score through an iteration. Lets set max lag length to 12.

## Code gist to find the best ARDL(p,q) model as per AIC and BIC scores.

# First create an empty df. Iterate over 144 ARDL (since max lag for response and predictor of ARDL model is 12, i.e, p = q = 12 at max).
# Save the model's AIC and BIC scores through iteration and display the model with best AIC and BIC scores.

df = data.frame(matrix(
                vector(), 0, 4, dimnames=list(c(), c("p","q","AIC","BIC"))),
                stringsAsFactors=F) # create empty dataframe

for(i in 1:12){
  for(j in 1:12){
    model4.1 = ardlDlm(formula = ASX.price ~ Gold.price + Crude.Oil..Brent._USD.bbl + Copper_USD.tonne, data = ASX, p = i, q = j)
    new <- data.frame(i, j, AIC(model4.1$model), BIC(model4.1$model))
    df[nrow(df) + 1, ] <- new 
  }
} # Iterate and save in df 

head(df[order( df[,3] ),],1) # Best model as per AIC

##      p q      AIC      BIC
## 136 12 4 2006.823 2142.001

head(df[order( df[,4] ),],1) # Best model as per BIC

##    p  q      AIC      BIC
## 12 1 12 2007.304 2067.382

ARDL(12,4) and ARDL(1,12) are the best models as per AIC and BIC scores respectively. Now, lets fit these 2 models,

Model summary for ARDL(12,4)

ARDL.12x4.ASXPrice = ardlDlm(formula = ASX.price ~ Gold.price + Crude.Oil..Brent._USD.bbl + Copper_USD.tonne, data = ASX, p = 12, q = 4)
summary(ARDL.12x4.ASXPrice)

## 
## Time series regression with "ts" data:
## Start = 13, End = 161
## 
## Call:
## dynlm(formula = as.formula(model.text), data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -432.64 -103.68  -16.67  109.37  552.08 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  260.237963 118.616216   2.194  0.03044 *  
## Gold.price.t                  -1.116842   0.338215  -3.302  0.00131 ** 
## Gold.price.1                   0.836800   0.482073   1.736  0.08553 .  
## Gold.price.2                   0.485515   0.498837   0.973  0.33264    
## Gold.price.3                  -0.243140   0.499190  -0.487  0.62722    
## Gold.price.4                  -0.720131   0.493132  -1.460  0.14719    
## Gold.price.5                   1.155515   0.486102   2.377  0.01926 *  
## Gold.price.6                  -1.032707   0.499816  -2.066  0.04127 *  
## Gold.price.7                   1.181472   0.508301   2.324  0.02203 *  
## Gold.price.8                  -1.198228   0.506886  -2.364  0.01992 *  
## Gold.price.9                   0.521266   0.492774   1.058  0.29257    
## Gold.price.10                 -0.016666   0.485648  -0.034  0.97269    
## Gold.price.11                  0.100623   0.479856   0.210  0.83431    
## Gold.price.12                  0.033926   0.323335   0.105  0.91664    
## Crude.Oil..Brent._USD.bbl.t    2.091690   3.408196   0.614  0.54073    
## Crude.Oil..Brent._USD.bbl.1   -3.451248   5.106748  -0.676  0.50064    
## Crude.Oil..Brent._USD.bbl.2   -5.683870   5.204351  -1.092  0.27727    
## Crude.Oil..Brent._USD.bbl.3    4.563400   5.242218   0.871  0.38601    
## Crude.Oil..Brent._USD.bbl.4   -9.525602   5.228455  -1.822  0.07132 .  
## Crude.Oil..Brent._USD.bbl.5   13.481929   5.277609   2.555  0.01207 *  
## Crude.Oil..Brent._USD.bbl.6   -2.387740   5.399458  -0.442  0.65924    
## Crude.Oil..Brent._USD.bbl.7   -4.622657   5.575113  -0.829  0.40890    
## Crude.Oil..Brent._USD.bbl.8    9.051488   5.592799   1.618  0.10857    
## Crude.Oil..Brent._USD.bbl.9   -9.676900   5.545055  -1.745  0.08389 .  
## Crude.Oil..Brent._USD.bbl.10  10.542622   5.589598   1.886  0.06204 .  
## Crude.Oil..Brent._USD.bbl.11  -8.833586   5.565585  -1.587  0.11548    
## Crude.Oil..Brent._USD.bbl.12   3.022640   3.436238   0.880  0.38106    
## Copper_USD.tonne.t             0.082274   0.042636   1.930  0.05634 .  
## Copper_USD.tonne.1            -0.037119   0.063789  -0.582  0.56188    
## Copper_USD.tonne.2             0.067839   0.063736   1.064  0.28960    
## Copper_USD.tonne.3            -0.083801   0.062908  -1.332  0.18571    
## Copper_USD.tonne.4             0.056879   0.063209   0.900  0.37026    
## Copper_USD.tonne.5            -0.081410   0.064432  -1.264  0.20921    
## Copper_USD.tonne.6             0.034045   0.064376   0.529  0.59803    
## Copper_USD.tonne.7             0.041368   0.064737   0.639  0.52421    
## Copper_USD.tonne.8            -0.107062   0.066065  -1.621  0.10811    
## Copper_USD.tonne.9             0.002464   0.066860   0.037  0.97067    
## Copper_USD.tonne.10           -0.032143   0.065428  -0.491  0.62426    
## Copper_USD.tonne.11            0.122397   0.064058   1.911  0.05877 .  
## Copper_USD.tonne.12           -0.046667   0.044246  -1.055  0.29397    
## ASX.price.1                    0.836936   0.100927   8.292 3.98e-13 ***
## ASX.price.2                    0.217088   0.132384   1.640  0.10403    
## ASX.price.3                    0.049227   0.129576   0.380  0.70478    
## ASX.price.4                   -0.148738   0.098447  -1.511  0.13383    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 179.2 on 105 degrees of freedom
## Multiple R-squared:  0.9617, Adjusted R-squared:  0.946 
## F-statistic: 61.26 on 43 and 105 DF,  p-value: < 2.2e-16

The Autoregressive Distributed Lag Model with 12 lags of independent variable (p=12) and 4 lags of dependent variable (q=4) is found to be significant at 5% significance level. It has high R-squared value of 96.17% indicating very good fit although many coefficients are insignificant in this model.

Diagnostic check for ARDL(12,4) model (Residual analysis)

checkresiduals(ARDL.12x4.ASXPrice$model)

## 
##  Breusch-Godfrey test for serial correlation of order up to 47
## 
## data:  Residuals
## LM test = 57.545, df = 47, p-value = 0.1394

Model summary for ARDL(1,12)

ARDL.1x12.ASXPrice = ardlDlm(formula = ASX.price ~ Gold.price + Crude.Oil..Brent._USD.bbl + Copper_USD.tonne, data = ASX, p = 1, q = 12)
summary(ARDL.1x12.ASXPrice)

## 
## Time series regression with "ts" data:
## Start = 13, End = 161
## 
## Call:
## dynlm(formula = as.formula(model.text), data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -528.25 -115.49   -7.69  117.76  657.83 
## 
## Coefficients:
##                               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                 359.563212 116.676097   3.082 0.002513 ** 
## Gold.price.t                 -1.239259   0.312968  -3.960 0.000123 ***
## Gold.price.1                  1.254689   0.309800   4.050 8.75e-05 ***
## Crude.Oil..Brent._USD.bbl.t   2.042785   3.256102   0.627 0.531516    
## Crude.Oil..Brent._USD.bbl.1  -4.662725   3.206261  -1.454 0.148285    
## Copper_USD.tonne.t            0.071622   0.040401   1.773 0.078611 .  
## Copper_USD.tonne.1           -0.040138   0.041172  -0.975 0.331429    
## ASX.price.1                   0.864563   0.085725  10.085  < 2e-16 ***
## ASX.price.2                   0.216545   0.116093   1.865 0.064398 .  
## ASX.price.3                  -0.080201   0.112800  -0.711 0.478359    
## ASX.price.4                  -0.072744   0.112351  -0.647 0.518469    
## ASX.price.5                   0.031715   0.113283   0.280 0.779954    
## ASX.price.6                   0.011837   0.114467   0.103 0.917794    
## ASX.price.7                  -0.001875   0.113392  -0.017 0.986834    
## ASX.price.8                   0.022526   0.114748   0.196 0.844675    
## ASX.price.9                  -0.080082   0.113842  -0.703 0.483035    
## ASX.price.10                  0.010614   0.112841   0.094 0.925206    
## ASX.price.11                  0.036039   0.113442   0.318 0.751231    
## ASX.price.12                 -0.031336   0.079910  -0.392 0.695599    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 190.7 on 130 degrees of freedom
## Multiple R-squared:  0.9462, Adjusted R-squared:  0.9388 
## F-statistic: 127.1 on 18 and 130 DF,  p-value: < 2.2e-16

The Autoregressive Distributed Lag Model with 1 lags of independent variable (p=1) and 12 lags of dependent variable (q=12) is found to be significant at 5% significance level. It has R-squared value of 94.62% which is lesser compared to ARDL(12,4) model.

Diagnostic check for ARDL(1,12) model (Residual analysis)

checkresiduals(ARDL.1x12.ASXPrice$model)

## 
##  Breusch-Godfrey test for serial correlation of order up to 22
## 
## data:  Residuals
## LM test = 20.881, df = 22, p-value = 0.5281

Results of residual analysis for ARDL(1,12) are same as ARDL(12,4). No assumptions are violated.

Compare ARDL(12,4) and ARDL(1,12)

ARDL(12,4) has better fit as per R-squared statistic (96.17% vs 94.62%)

Check for multicollinearity in ARDL models

For ARDL(12,4),

vif(ARDL.12x4.ASXPrice$model)

##                       Gold.price                 L(Gold.price, 1) 
##                         72.61221                        149.74710 
##                 L(Gold.price, 2)                 L(Gold.price, 3) 
##                        162.62798                        165.42749 
##                 L(Gold.price, 4)                 L(Gold.price, 5) 
##                        163.58482                        161.21089 
##                 L(Gold.price, 6)                 L(Gold.price, 7) 
##                        173.12932                        181.15457 
##                 L(Gold.price, 8)                 L(Gold.price, 9) 
##                        181.80426                        172.71411 
##                L(Gold.price, 10)                L(Gold.price, 11) 
##                        168.14553                        163.92087 
##                L(Gold.price, 12)        Crude.Oil..Brent._USD.bbl 
##                         74.42449                         42.98759 
##  L(Crude.Oil..Brent._USD.bbl, 1)  L(Crude.Oil..Brent._USD.bbl, 2) 
##                         97.76707                        103.01963 
##  L(Crude.Oil..Brent._USD.bbl, 3)  L(Crude.Oil..Brent._USD.bbl, 4) 
##                        105.90665                        107.06259 
##  L(Crude.Oil..Brent._USD.bbl, 5)  L(Crude.Oil..Brent._USD.bbl, 6) 
##                        110.57013                        117.37147 
##  L(Crude.Oil..Brent._USD.bbl, 7)  L(Crude.Oil..Brent._USD.bbl, 8) 
##                        126.56606                        129.15452 
##  L(Crude.Oil..Brent._USD.bbl, 9) L(Crude.Oil..Brent._USD.bbl, 10) 
##                        128.59271                        131.82120 
## L(Crude.Oil..Brent._USD.bbl, 11) L(Crude.Oil..Brent._USD.bbl, 12) 
##                        131.57342                         50.60618 
##                 Copper_USD.tonne           L(Copper_USD.tonne, 1) 
##                         28.12775                         65.04369 
##           L(Copper_USD.tonne, 2)           L(Copper_USD.tonne, 3) 
##                         67.16237                         67.73033 
##           L(Copper_USD.tonne, 4)           L(Copper_USD.tonne, 5) 
##                         70.83413                         76.14236 
##           L(Copper_USD.tonne, 6)           L(Copper_USD.tonne, 7) 
##                         78.55734                         81.99171 
##           L(Copper_USD.tonne, 8)           L(Copper_USD.tonne, 9) 
##                         87.85832                         92.56974 
##          L(Copper_USD.tonne, 10)          L(Copper_USD.tonne, 11) 
##                         91.03595                         89.54301 
##          L(Copper_USD.tonne, 12)                  L(ASX.price, 1) 
##                         43.78156                         28.54591 
##                  L(ASX.price, 2)                  L(ASX.price, 3) 
##                         50.18514                         48.94920 
##                  L(ASX.price, 4) 
##                         28.91733

For ARDL(1,12),

vif(ARDL.1x12.ASXPrice$model)

##                      Gold.price                L(Gold.price, 1) 
##                        54.85837                        54.56510 
##       Crude.Oil..Brent._USD.bbl L(Crude.Oil..Brent._USD.bbl, 1) 
##                        34.61848                        34.00318 
##                Copper_USD.tonne          L(Copper_USD.tonne, 1) 
##                        22.28432                        23.90739 
##                 L(ASX.price, 1)                 L(ASX.price, 2) 
##                        18.17035                        34.05160 
##                 L(ASX.price, 3)                 L(ASX.price, 4) 
##                        32.72897                        33.22967 
##                 L(ASX.price, 5)                 L(ASX.price, 6) 
##                        34.54566                        36.11062 
##                 L(ASX.price, 7)                 L(ASX.price, 8) 
##                        36.48050                        38.47006 
##                 L(ASX.price, 9)                L(ASX.price, 10) 
##                        38.88476                        39.33336 
##                L(ASX.price, 11)                L(ASX.price, 12) 
##                        40.89081                        20.81649

For both the ARDL models, multicollinearity is significant.

Conclusion of ARDL models

Best model as per AIC is ARDL(12,4) and as per BIC is ARDL(1,12). Both are significant. ARDL(12,4) has better fit as per R-squared statistic (96.17% vs 94.62%). Multicollinearity is significant for both ARDL(12,4) and ARDL(1,12). Diagnostic check for the model suggests presence of no serial correlation and no violation of general assumptions for both the models.

Detailed conclusion on the most appropriate model including context (Model Selection)

Lets summarize the 4 models,

Fit Finite DLM:

All 3 regressors present. Model is significant
All 12 lagged regressors for the 3 variables are insignificant
R-Squared of 39.03% suggests not a good fit
Multicollinearity is significant
Diagnostic check for the model suggests presence of significant serial correlation and violation of general assumptions

Polynomial DLM:

3 separate models for each regressor. 2 are insignificant, 1 (Copper Price) is significant
Polynomial DLM model with Copper Price as regressor is significant
2nd order regressor (z.t2) is insignificant implying polynomial component of the regressor variable is redundant
R-Squared of 16.66% suggests not a good fit
Multicollinearity is significant
Diagnostic check for the model suggests presence of significant serial correlation and violation of general assumptions

Koyck transformed geometric DLM:

3 separate models for each regressor, Gold, Crude oil and Copper prices. All 3 is significant.
R-squared of 94.88%, 94.9% and 94.85% for regressors, Gold, Crude oil and Copper prices suggest good fit
For all 3 koyck models response, ASX Price index is significantly dependent on Last years ASX price index and is not significantly dependent on their respective regressor.
Multicollinearity is insignificant
Diagnostic check for the model suggests presence of no serial correlation and no violation of general assumptions.

Autoregressive DLM:

All 3 regressors present.
Best model as per AIC is ARDL(12,4) and as per BIC is ARDL(1,12). Both are significant.
Mix of significant and insignificant lag coefficients are present.
ARDL(12,4) has better fit as per R-squared statistic (96.17% vs 94.62%)
Multicollinearity is significant for both ARDL(12,4) and ARDL(1,12)
Diagnostic check for the model suggests presence of no serial correlation and no violation of general assumptions for both the models.

Model Selection

Lets compare all the models generated based on AIC and BIC scores. Also lets comp

Using AIC and BIC scores

Akaike’s (1973) Information Criterion (AIC) and Bayesian Information Criterion (BIC) are useful model specification and selection criterion.

AICBIC = data.frame(
            model = c("Finite DLM model with all 3 predictors", "Polynomial DLM with gold price as predictor", "Polynomial DLM with crude oil price as predictor", "Polynomial DLM with copper price as predictor", "Koyck DLM with gold price as predictor", "Koyck DLM with crude oil price as predictor", "Koyck DLM with copper price as predictor", "ARDL(1,12) model", "ARDL(12,4) model"),
            AIC = c(AIC(DLM.model), AIC(PolyDLM.goldprice), AIC(PolyDLM.crudeoilprice), AIC(PolyDLM.copperprice), AIC(Koyck.GoldPrice), AIC(Koyck.CrudeOilPrice), AIC(Koyck.CopperPrice), AIC(ARDL.1x12.ASXPrice), AIC(ARDL.12x4.ASXPrice)),
            BIC = c(BIC(DLM.model), BIC(PolyDLM.goldprice), BIC(PolyDLM.crudeoilprice), BIC(PolyDLM.copperprice), BIC(Koyck.GoldPrice), BIC(Koyck.CrudeOilPrice), BIC(Koyck.CopperPrice), BIC(ARDL.1x12.ASXPrice), AIC(ARDL.12x4.ASXPrice))
            )

## [1] 2411.054
## [1] 2406.216
## [1] 2404.884
## [1] 2385.629
## [1] 2156.757
## [1] 2156.172
## [1] 2157.456
## [1] 2007.304
## [1] 2006.823
## [1] 2534.216
## [1] 2421.236
## [1] 2419.904
## [1] 2400.649
## [1] 2169.058
## [1] 2168.473
## [1] 2169.757
## [1] 2067.382
## [1] 2006.823

AICBIC[order(AICBIC[,2], AICBIC[,2]),]

##                                              model      AIC      BIC
## 9                                 ARDL(12,4) model 2006.823 2006.823
## 8                                 ARDL(1,12) model 2007.304 2067.382
## 6      Koyck DLM with crude oil price as predictor 2156.172 2168.473
## 5           Koyck DLM with gold price as predictor 2156.757 2169.058
## 7         Koyck DLM with copper price as predictor 2157.456 2169.757
## 4    Polynomial DLM with copper price as predictor 2385.629 2400.649
## 3 Polynomial DLM with crude oil price as predictor 2404.884 2419.904
## 2      Polynomial DLM with gold price as predictor 2406.216 2421.236
## 1           Finite DLM model with all 3 predictors 2411.054 2534.216

Conclusion from AIC and BIC scores:

ADRL(12,4) is the best fitting model with lowest AIC and BIC scores of all models.

Error Estimation

Another way to test the fit of a model is by checking the error estimates. Lower the error estimates, better the models fit. Lets perform error estimation for all the fitted models and compare the results. This is done using the gof(), goodness of fit function which computes,

mean absolute error (MAE),
mean squared error (MSE),
mean percentage error (MPE),
symmetric mean absolute percentage error (sMAPE),
mean absolute percentage error (MAPE),
mean absolute scaled error (MASE),
mean relative absolute error (MRAE),
geometric mean relative absolute error (GMRAE),
mean bounded relative absolute error (MBRAE),
unscaled MBRAE (UMBRAE) for distributed lag models.

GoF(DLM.model, PolyDLM.goldprice, PolyDLM.crudeoilprice, PolyDLM.copperprice, Koyck.GoldPrice, Koyck.CrudeOilPrice, Koyck.CopperPrice, ARDL.1x12.ASXPrice, ARDL.12x4.ASXPrice)

##                         n      MAE          MPE       MAPE      sMAPE      MASE
## DLM.model             149 531.4470 -0.016121296 0.11176072 0.10943918 3.2904182
## PolyDLM.goldprice     149 613.3683 -0.025090357 0.13083066 0.12652744 3.7976286
## PolyDLM.crudeoilprice 149 621.5757 -0.024890659 0.13243151 0.12789303 3.8484442
## PolyDLM.copperprice   149 627.1317 -0.021388636 0.13125574 0.12842174 3.8828433
## Koyck.GoldPrice       160 150.4315 -0.001880814 0.03138333 0.03119260 0.9691180
## Koyck.CrudeOilPrice   160 151.5705 -0.001997884 0.03175681 0.03157271 0.9764557
## Koyck.CopperPrice     160 151.2266 -0.001941328 0.03158104 0.03138867 0.9742402
## ARDL.1x12.ASXPrice    149 139.6163 -0.001397628 0.02821228 0.02820369 0.8644246
## ARDL.12x4.ASXPrice    149 119.3358 -0.001023640 0.02447001 0.02448218 0.7388595
##                             MSE      MRAE     GMRAE     MBRAE      UMBRAE
## DLM.model             359818.80 24.425863 3.9742464 1.2278371  -5.3891007
## PolyDLM.goldprice     564733.42 21.788791 4.1978095 1.5066470  -2.9737607
## PolyDLM.crudeoilprice 559709.43 22.665212 4.3265877 1.0682567 -15.6505852
## PolyDLM.copperprice   491855.47 25.511671 5.0280616 1.5012496  -2.9950139
## Koyck.GoldPrice        39809.28  1.471603 0.9743760 0.4756390   0.9070833
## Koyck.CrudeOilPrice    39663.94  1.658135 0.9730326 0.5464510   1.2048335
## Koyck.CopperPrice      39983.58  1.486736 0.9775711 0.4696419   0.8855186
## ARDL.1x12.ASXPrice     31745.37  2.764232 0.9125113 0.9418756  16.2044693
## ARDL.12x4.ASXPrice     22622.70  3.218118 0.7892615 0.4195087   0.7226787

Conclusion from Error Estimation:

From error estimates table, all the 9 error estimates are lowest for the ADRL(12,4) model.

Detailed Graphical and statistical tests of assumptions for \(ARDL(12,4)\) model (Residual Analysis)

Residual analysis to test model assumptions.

Lets perform a detailed Residual Analysis to check if any model assumptions have been violated.

The estimator error (or residual) is defined by:

\(\hat{\epsilon_i}\) = \(Y_i\) - \(\hat{Y_i}\) (i.e. observed value less - trend value)

The following problems are to be checked,

linearity in distribution of error terms
The mean value of residuals is zero
Serial autocorrelation
Normality of distribution of error terms

Lets first apply diagnostic check using checkresiduals() function,

checkresiduals(ARDL.12x4.ASXPrice)

## Time Series:
## Start = 13 
## End = 161 
## Frequency = 1 
##           13           14           15           16           17           18 
## -210.3182789  -61.9471949 -112.5717250 -169.2609508  -42.7950870  -53.6276645 
##           19           20           21           22           23           24 
##   18.5015408 -146.3937766  114.8750503   17.5999960  131.6652188  204.0152710 
##           25           26           27           28           29           30 
##  552.0808621   70.1891709   -7.1444563   22.4894273 -175.8151702   37.7195206 
##           31           32           33           34           35           36 
##  -60.1750716  140.3881786  -78.8805690  -51.1927740  -62.7706162  109.3735936 
##           37           38           39           40           41           42 
##  161.3517407  180.1245484  -18.0030434   61.9717481  294.6838400   57.1777430 
##           43           44           45           46           47           48 
##  -70.6153451  -18.6198705  329.4644025   98.7740259 -124.3349186  -13.2097315 
##           49           50           51           52           53           54 
## -279.3139638  -39.4517736 -123.3145062  109.8835875  154.0309115 -304.2952014 
##           55           56           57           58           59           60 
##  -87.3626050  220.2080254 -251.7105126 -183.7704695  -97.0812452  132.9816662 
##           61           62           63           64           65           66 
## -107.8591475 -120.1664832 -103.2956065   66.5110330 -108.1829293  127.6420477 
##           67           68           69           70           71           72 
##  -74.6583973   13.1559107   76.8281247  -99.7463808 -124.7711574  155.6336961 
##           73           74           75           76           77           78 
## -432.6416462 -103.6800076  193.0200314 -188.6093659  -51.8036246 -150.9054447 
##           79           80           81           82           83           84 
##   75.3015126   12.5373665  138.0748404  -15.5538633 -203.3735412  -28.5073578 
##           85           86           87           88           89           90 
##  -64.7526774 -120.6244520  -75.8245248  -61.2446281   55.0634077   32.3995289 
##           91           92           93           94           95           96 
##  -65.8056196  163.2278312 -182.1912604  293.2943311   83.5324385    6.6131178 
##           97           98           99          100          101          102 
##  -15.7521939   68.5512165   19.9295812   59.9243097 -121.1739470 -123.1585728 
##          103          104          105          106          107          108 
##   -0.3467265  -77.9715030   63.9441852  163.6910060  -27.6319352  136.6313095 
##          109          110          111          112          113          114 
##  118.4868516  200.1345196 -193.7573161  -36.9144283 -168.0751639  159.0209419 
##          115          116          117          118          119          120 
## -302.6134082  289.4281745  192.6248619  116.2707536  -45.7039086  -72.4857072 
##          121          122          123          124          125          126 
## -197.0383696  271.0947063  -12.1976707   86.5223719   -5.0940747  -23.9782735 
##          127          128          129          130          131          132 
##  265.6977566   14.4814480 -288.3300434  160.7737713 -148.9056635   23.3065746 
##          133          134          135          136          137          138 
##  272.7917007  208.5641276 -191.1740825 -154.1661892 -187.7797825  -78.1116039 
##          139          140          141          142          143          144 
##   94.9744259 -175.6984961 -105.7749890  160.6106992 -106.8145852  -31.2532715 
##          145          146          147          148          149          150 
##  -40.1566425  -57.7674021  -34.4500192   -7.0885487  241.3572934   49.4967086 
##          151          152          153          154          155          156 
##  331.2859201 -126.4277838  -16.6735741 -248.1037869  118.1521024   70.9457614 
##          157          158          159          160          161 
##  -27.3683241  -34.3376048  117.0365707  102.4020674  -78.0667753

## 
##  Ljung-Box test
## 
## data:  Residuals
## Q* = 2.9651, df = 10, p-value = 0.9822
## 
## Model df: 0.   Total lags used: 10

From the Residuals plot, linearity is not violated as the residuals are randomly distributed accross the mean. Thus, linearity in distribution of error terms is not violated
To test mean value of residuals is zero or not, lets calculate mean value of residuals as,

mean(ARDL.12x4.ASXPrice$model$residuals)

## [1] -2.092938e-15

As mean value of residuals is very very close to 0, zero mean residuals is not violated

In the checkresiduals output, the Ljung-Box test output is displayed. According to this test, the null hypothesis that a series of residuals exhibits no autocorrelation up-to lag 10 is violated. Another test to check serial correlation is Breusch-Godfrey test,

bgtest(ARDL.12x4.ASXPrice$model)

## 
##  Breusch-Godfrey test for serial correlation of order up to 1
## 
## data:  ARDL.12x4.ASXPrice$model
## LM test = 1.5879, df = 1, p-value = 0.2076

Which has,
\(H_0\) : there is no serial correlation of any order up to p
\(H_a\) : there is serial correlation of any order up to p

According to this test and ACF plot, we can conclude that the serial correlation left in residuals is insignificant.

From the histogram shown by checkresiduals(), residuals seem to follow normality. Lets test this statistically,

\(H_0\) : Time series is Normally distributed
\(H_a\) : Time series is not normal

shapiro.test(ARDL.12x4.ASXPrice$model$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  ARDL.12x4.ASXPrice$model$residuals
## W = 0.98856, p-value = 0.2628

From the Shapiro-Wilk test, since p > 0.05 significance level, we cannot reject the null hypothesis that states the data is normal. Thus, residuals of ARDL(12,4) are normally distributed.

Summarizing residual analysis on \(full\) model:

Assumption 1: The error terms are randomly distributed and thus show linearity: Not violated
Assumption 2: The mean value of E is zero (zero mean residuals): Not violated
Assumption 4: The error terms are independently distributed, i.e. they are not autocorrelated: Not violated
Assumption 5: The errors are normally distributed. not Violated

Final Analysis Conclusion and Best Fitting model:

The best fitting model we could achieved for monthly averages of ASX All Ordinaries (Ords) Price Index using the regressors, Gold price (AUD), Crude Oil (Brent, USD/bbl) and Copper (USD/tonne) is ADRL(12,4) model. This is backed by AIC, BIC and error estimates of Autoregressive Distributed Lag ADRL(12,4) model. This model has R-squared of 96.17% which is the highest of all models. Although transformation and 1st order ordinary differencing improved stationarity, normality could not be achieved. Hence the raw (without any transformation) response, ASX Price Index is used for modelling. Residual analysis of ARDL(12,4) shows no serial correlation and no assumptions are violated. This model has significant multicollinearity.

Forecasting: Project 1

Evaluating the Best Regression Model for Forecasting ASX Price Index: A Comparative Analysis of Distributed Lag Models

Shaikh Mohammad Rahil

26-08-2023

Table of Contents

Data Description

Objective

Read Data

Identification of the response and the regressor variables

Read Regressor and Response variables

Relationship between Regressor and Response variables

Descriptive Analysis

Summary statistics

Time Series plot:

ACF and PACF plots:

Check normality

Test Stationarity

Conclusion from descriptive analysis:

Transformations

Box-Cox transformation to improve normality or variance

Check Normality and improvement in variance of transformed ASX series

Differencing to improve Stationarity

Check Stationarity of BC transformed ASX series

Ordinary or Seasonal Differencing

Conclusion after BC transformation and Differencing

Decomposition

Classical decomposition

X-12-ARIMA decomposition

STL decomposition

Conclusion of Decomposition

Modeling

Fit Finite DLM

Diagnostic check for DLM.model (Residual analysis)

Model Summary for Finite DLM model (DLM.model)

Conclusion of Finite DLM model

Fit Polynomial DLM model

Diagnostic check for Polynomial DLM (Residual analysis)

Conclusion of Polynomial DLM model

Fit Koyck geometric DLM model

For Gold price regressor:

Diagnostic check for Koyck.GoldPrice (Residual analysis)

For Crude oil price regressor:

Diagnostic check for Koyck.CrudeOilPrice (Residual analysis)

For Copper price regressor:

Diagnostic check for Koyck.CopperPrice (Residual analysis)

Check for multicollinearity in Koyck DLM models

Conclusion of Koyck DLM model

Fit Autoregressive Distributed Lag Model

Model summary for ARDL(12,4)

Diagnostic check for ARDL(12,4) model (Residual analysis)

Model summary for ARDL(1,12)

Diagnostic check for ARDL(1,12) model (Residual analysis)

Compare ARDL(12,4) and ARDL(1,12)

Check for multicollinearity in ARDL models

Conclusion of ARDL models

Detailed conclusion on the most appropriate model including context (Model Selection)

Model Selection

Using AIC and BIC scores

Error Estimation

Detailed Graphical and statistical tests of assumptions for \(ARDL(12,4)\) model (Residual Analysis)

Final Analysis Conclusion and Best Fitting model:

`26-08-2023`