Forecasting

Data Description

The dataset we will analyse includes monthly averages of the ASX Price Index, Gold price, Crude Oil, and Copper price starting from January 2004.

Setup

library(TSA)

## 
## Attaching package: 'TSA'

## The following objects are masked from 'package:stats':
## 
##     acf, arima

## The following object is masked from 'package:utils':
## 
##     tar

library(urca)
library(x12)

## Loading required package: x13binary

## x12 is ready to use.

## Use the package x12GUI for a Graphical User Interface.

## By default the X13-ARIMA-SEATS binaries provided by the R package x13binary

## are used but this can be changed with x12path(validpath)

## ---------------

## Suggestions and bug-reports can be submitted at: https://github.com/statistikat/x12/issues

library(forecast)

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

## Registered S3 methods overwritten by 'forecast':
##   method       from
##   fitted.Arima TSA 
##   plot.Arima   TSA

library(dLagM)

## Loading required package: nardl

## Loading required package: dynlm

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

## 
## Attaching package: 'dLagM'

## The following object is masked from 'package:forecast':
## 
##     forecast

library(tseries)
library(readr)

## 
## Attaching package: 'readr'

## The following object is masked from 'package:TSA':
## 
##     spec

Importing data and data wrangling

asx_df <- read.csv("ASX_data.csv") 
class(asx_df)

## [1] "data.frame"

str(asx_df)

## 'data.frame':    161 obs. of  4 variables:
##  $ ASX.price                : num  2935 2778 2849 2971 2980 ...
##  $ Gold.price               : Factor w/ 160 levels "1,009.10","1,010.10",..: 136 135 128 112 119 110 109 122 130 117 ...
##  $ Crude.Oil..Brent._USD.bbl: num  31.3 32.6 30.3 25 25.8 ...
##  $ Copper_USD.tonne         : Factor w/ 158 levels "1,588","1,650",..: 2 5 4 1 3 6 7 8 9 10 ...

#convert numeric class to time series for ASX,Price column
asx_price = ts(asx_df$ASX.price, start = c(2004, 1),frequency = 12)    
head(asx_price)

##         Jan    Feb    Mar    Apr    May    Jun
## 2004 2935.4 2778.4 2848.6 2970.9 2979.8 2999.7

#Converting factor class to numeric in order to
#convert numeric to time series for Gold.Price column
asx_df$Gold.price = sub(",","", asx_df$Gold.price)
asx_df$Gold.price = as.numeric(as.character(asx_df$Gold.price)) 
gold_price= ts(asx_df$Gold.price, start = c(2004, 1),frequency = 12) 
head(gold_price)

##        Jan   Feb   Mar   Apr   May   Jun
## 2004 611.9 603.3 565.7 538.6 549.4 535.9

#convert numeric class to time series for Crude oil column
asx_df$Crude.Oil..Brent._USD.bbl = as.numeric(asx_df$Crude.Oil..Brent._USD.bbl)
crude_oil_price= ts(asx_df$Crude.Oil..Brent._USD.bbl, start = c(2004, 1),frequency = 12) 
head(crude_oil_price)

##        Jan   Feb   Mar   Apr   May   Jun
## 2004 31.29 32.65 30.34 25.02 25.81 27.55

##Converting factor class to numeric in order to
#convert numeric to time series for Copper_USD.tonne column
asx_df$Copper_USD.tonne = sub(",","", asx_df$Copper_USD.tonne)
asx_df$Copper_USD.tonne = as.numeric(as.character(asx_df$Copper_USD.tonne)) 
copper_price= ts(asx_df$Copper_USD.tonne, start = c(2004, 1),frequency = 12) 
head(copper_price)

##       Jan  Feb  Mar  Apr  May  Jun
## 2004 1650 1682 1656 1588 1651 1685

We converted the two non-numeric columns (Gold price, copper price)first from factor to numeric and then into timeseries. Similarly the other two columns are also converted into timeseries.

Inspect and Understand the time series

The following are the five basic patterns we can deduce from a time series plot:

Existence of a recurring pattern or trend
Seasonality evidence
Evidence of fluctuating variation across time which is also the variance
Moving average and autoregression behaviour patterns
Any indication of a possible intervention

#plots the visualization of series
par(mfrow=c(2,2))
plot(asx_price, ylab="Avg monthly ASX price", xlab = "Year", main = "ASX time series", type='o')
plot(gold_price, ylab="Avg monthly Gold price", xlab = "Year", main = "Gold time series", type='o')
plot(crude_oil_price, ylab="Avg monthly ASX price", xlab = "Year", main = "Crude oil time series", type='o')
plot(copper_price, ylab="Avg monthly Copper price", xlab = "Year", main = "Copper time series", type='o')

ASX prices

There is a deifinite upwards trend observed in the time series, with the exception of a dip around the years 2008-2009. No clear evidence of changing variance can be observed.

Gold prices

With gold prices it is very clear that an upwards trend is observed. Although as with ASX prices, there is not any intervention or dip in the prices. There is a constant increase. No clear evidence of changing variance can be observed.

Crude oil prices

No clear trend is visible. There are regular upshifts and dips in the plot.

Copper prices

AS with crude oil prices, there isn’t a clear trend. The prices had a constant upwards trend until the huge dip in 2008.

Existence of non stationarity in dataset

The goal here is to determine whether or not the time series is stationary. Visualization, as recommended by the ACF and PACF, is the most effective technique to check for it. Unit root tests are used to ensure the authenticity of this. I’m just interested in the augmented Dickey-Fuller test (ADF) and not the Phillips-Perron (PP) test.

#ACF and PACF 
par(mfrow=c(2,4)) 
acf(asx_price, main = "ACF ASX price")
pacf(asx_price, main = "PACF ASX price")

acf(gold_price, main = "ACF Gold price")
pacf(gold_price, main = "PACF Gold price")

acf(crude_oil_price, main = "ACF Crude oil price")
pacf(crude_oil_price, main = "PACF Crude oil price")

acf(copper_price, main = "ACF Copper price")
pacf(copper_price, main = "PACF Copper price")

We deduce that there is a trend that dominates the series correlation characteristics because we observe a progressively declining pattern in ACF. We need to remove the trend and then re-display sample ACF to see more serial correlation properties of these series.

ADF Test

#The next steps are to apply ADF test to asx_price column with the first approach to determine the lag length.
k1 = ar(asx_price)$order 
k1

## [1] 2

asx_test  =adf.test(asx_price, k = k1)
asx_test$parameter

## Lag order 
##         2

print(asx_test)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  asx_price
## Dickey-Fuller = -2.442, Lag order = 2, p-value = 0.392
## alternative hypothesis: stationary

#The next steps are to apply ADF test to gold_price column with the first approach to determine the lag length.
k2 = ar(gold_price)$order 
k2

## [1] 1

gold_test  =adf.test(gold_price, k = k2)
gold_test$parameter

## Lag order 
##         1

print(gold_test)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  gold_price
## Dickey-Fuller = -2.2824, Lag order = 1, p-value = 0.4585
## alternative hypothesis: stationary

#The next steps are to apply ADF test to crude_oil_price column with the first approach to determine the lag length.
k3 = ar(crude_oil_price)$order 
k3

## [1] 3

crude_test  =adf.test(crude_oil_price, k = k3)
crude_test$parameter

## Lag order 
##         3

print(crude_test)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  crude_oil_price
## Dickey-Fuller = -2.0703, Lag order = 3, p-value = 0.547
## alternative hypothesis: stationary

#The next steps are to apply ADF test to copper_price column with the first approach to determine the lag length.
k4 = ar(copper_price)$order 
k4

## [1] 2

copper_test  =adf.test(copper_price, k = k1)
copper_test$parameter

## Lag order 
##         2

print(copper_test)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  copper_price
## Dickey-Fuller = -2.3847, Lag order = 2, p-value = 0.4159
## alternative hypothesis: stationary

As seen clearly by each of the ADF tests, the time series is Non-Stationary. WIth optimal lag selection, the test gives us really optimal results.

The impact of the components of a time series data on the given dataset

Decomposition of time series into multiple components is useful for observing the individual effects of existing components as well as historical influences. The components that may be dissected from a time series i.e. seasonality, trend, and the rest that include impacts that the seasonal and trend components do not capture.

Transformation

The time series may need ome transformations before they are decomposed. To achieve that we need to check their lambda values, to get a better idea, if any of the series need a transformation.

#Lambda values
asx_lambda = BoxCox.lambda(asx_price) 
gold_lambda = BoxCox.lambda(gold_price) 
crude_oil_lambda = BoxCox.lambda(crude_oil_price) 
copper_lambda = BoxCox.lambda(copper_price)
cbind(asx_lambda, gold_lambda, crude_oil_lambda, copper_lambda)

##      asx_lambda gold_lambda crude_oil_lambda copper_lambda
## [1,]   1.999924    0.976695       -0.8304136     0.9336783

As we can see the lambda values for copper and gold are very close to 1, hence they need no transformation before decomposition.

However lambda value for asx and crude oil suggest that they may need transformations, since they are not close to 1. Close to 2 in the case of asx_price and negative in case crude_oil_prices.

In the next steps we transform both the time series using boxcox transformations.

#First we transform asx_price, since lambda is not equal to 0 we use the following formula
asx_boxcox = ((asx_price^(asx_lambda)) - 1) / asx_lambda

#Similarly for crude_oil_price

crude_boxcox = ((crude_oil_price^(crude_oil_lambda)) - 1) / crude_oil_lambda

plot(asx_boxcox,ylab='ASX price Index',xlab='Year',type='o', main="Box-Cox Transformed ASX price Index")

plot(crude_boxcox,ylab='Crude Oil price Index',xlab='Year',type='o', main="Box-Cox Transformed Crude Oil price Index")

Differencing

Now that the transformation has been done, we can infer that all the time series are ready for differencing.

#ASX differencing
asx_diff = diff(asx_boxcox)
gold_diff = diff(gold_price)
crude_diff = diff(crude_boxcox)
copper_diff = diff(copper_price)

par(mfrow=c(2,2))
plot(asx_diff,ylab='ASX prices',xlab='Year', main = "First differences of ASX prices")
plot(gold_diff,ylab='Gold prices',xlab='Year', main = "First differences of Gold prices")
plot(crude_diff,ylab='Crude oil prices',xlab='Year', main = "First differences of Crude Oil prices")
plot(copper_diff,ylab='Copper prices',xlab='Year', main = "First differences of Copper prices")

Lets confirm if the above plotted time series’ are stationary or not by using ADF tests on each.

asx_diff_adf = adf.test(asx_diff)

## Warning in adf.test(asx_diff): p-value smaller than printed p-value

asx_diff_adf

## 
##  Augmented Dickey-Fuller Test
## 
## data:  asx_diff
## Dickey-Fuller = -4.4343, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary

gold_diff_adf = adf.test(gold_diff)

## Warning in adf.test(gold_diff): p-value smaller than printed p-value

gold_diff_adf

## 
##  Augmented Dickey-Fuller Test
## 
## data:  gold_diff
## Dickey-Fuller = -5.8718, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary

crude_diff_adf = adf.test(crude_diff)

## Warning in adf.test(crude_diff): p-value smaller than printed p-value

crude_diff_adf

## 
##  Augmented Dickey-Fuller Test
## 
## data:  crude_diff
## Dickey-Fuller = -5.5931, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary

copper_diff_adf = adf.test(copper_diff)

## Warning in adf.test(copper_diff): p-value smaller than printed p-value

copper_diff_adf

## 
##  Augmented Dickey-Fuller Test
## 
## data:  copper_diff
## Dickey-Fuller = -5.478, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary

With p value being less that 5%, we can safely reject the null hypothesis, which in case of ADF is that the series is not-stationary. Hence we can say that the series are Stationary.

Impact of trend and stationarity components

Now it is time to actually analyze the effects of components on each of the time series in the dataset. This can be tested by decomposing above used series. We will decompose each of them using the X12 and STL decompostions.

Effects on ASX Price

#ASX Price decomposition
asx_decomp = x12(asx_price)
plot(asx_decomp , sa=TRUE , trend=TRUE , forecast = TRUE)

#Seasonal Factors ASX Price
plotSeasFac(asx_decomp)

#STL Test
asx_stl <- stl(asx_price,t.window=15, s.window="periodic", robust=TRUE) 
plot(asx_stl)

The seasonally adjusted series closely resembles the original series, with a minor seasonal variations. The trend strays slightly from the original, indicating that it has an impact on the series. Seasonally adjusted series continues to follow the original series in the forecast.When seasonal factors are adjusted, seasonal patterns are expected to fluctuate around the mean level if there is no change in the seasonal pattern. They do, however, deviate from the mean level over time, with some high fluctuations in February, September and July notably. Trend follows the overall pattern of the original series with an upward trend, falling back, and growing again, according to STL.

Effects on Gold Prices

#Gold Price decomposition
gold_decomp = x12(gold_price)
plot(gold_decomp , sa=TRUE , trend=TRUE , forecast = TRUE)

#Seasonal Factors Gold Price
plotSeasFac(gold_decomp)

#STL test
gold_stl <- stl(gold_price,t.window=15, s.window="periodic", robust=TRUE) 
plot(gold_stl)

The seasonally adjusted series and trend lines closely follow the original series throughout, with minor deviations here and there. The series is expected to remain constant in the coming years, according to the forecast. With respect to Seasonal factors plot, there are deviations around the mean levels in most of the months. This means there are changes in seasonal patterns. The 3rd plot shows an upward trend, seasonal fluctuations, and fairly random residuals.

Effects on Crude oil prices

#Crude oil Price decomposition
crude_oil_decomp = x12(crude_oil_price)
plot(crude_oil_decomp , sa=TRUE , trend=TRUE , forecast = TRUE)

#Seasonal Factors Crude Oil Price
plotSeasFac(crude_oil_decomp)

#STL test
crude_stl <- stl(crude_oil_price,t.window=15, s.window="periodic", robust=TRUE) 
plot(crude_stl)

From 2006 to 2010, the seasonally adjusted series and trend deviate significantly from the original series. It has recently followed closely with the original series and will continue to do so in the near future. With respect to the seasonal factors plot, if there is no change in the seasonal pattern, this plot should show seasonal patterns fluctuating around the mean level. Despite the fact that it is seasonal, it tends to follow the mean. Plot 3 shows variable trend, seasonal fluctuations, and residuals which peak during 2008-2009 and then subsequently dip.

Effects on Copper prices

#Copper Price decomposition
copper_decomp = x12(copper_price)
plot(copper_decomp , sa=TRUE , trend=TRUE , forecast = TRUE)

#Seasonal Factors Copper Price
plotSeasFac(copper_decomp)

#STL test
copper_stl <- stl(copper_price,t.window=15, s.window="periodic", robust=TRUE) 
plot(copper_stl)

Despite an intervention in 2009, the seasonally adjusted graph, trend, and original series all fit together. At any point in the series, there is no deviation. The forecast predicts that it will continue in the same pattern. As depicted by plot 3, there are interventions from 2007 to 2009, as shown by the peaks. The trend and original series go hand in hand.

Appropriate Model Selection

Ou next piece of business is to select the best possible model with respect to ASX Price index. We will see how different models perform with each of the columns namely, Gold prices, crude oil price and copper prices.

ASX Prices vs Gold

# Finite dlm 
gold_model1 =dlm(x =as.vector(gold_price) , y =as.vector(asx_price) , q =9)
summary(gold_model1)

## 
## Call:
## lm(formula = model.formula, data = design)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1484.01  -590.23    13.97   473.71  1991.28 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 4452.20100  223.27538  19.940   <2e-16 ***
## x.t           -0.35288    1.27113  -0.278    0.782    
## x.1           -0.10591    1.88058  -0.056    0.955    
## x.2            0.02149    1.91708   0.011    0.991    
## x.3           -0.07972    1.93268  -0.041    0.967    
## x.4           -0.25836    1.93610  -0.133    0.894    
## x.5            0.36953    1.93817   0.191    0.849    
## x.6            0.16311    1.95298   0.084    0.934    
## x.7            0.70404    1.94616   0.362    0.718    
## x.8           -0.21513    1.91991  -0.112    0.911    
## x.9            0.16715    1.29373   0.129    0.897    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 802.4 on 141 degrees of freedom
## Multiple R-squared:  0.05795,    Adjusted R-squared:  -0.008866 
## F-statistic: 0.8673 on 10 and 141 DF,  p-value: 0.5654
## 
## AIC and BIC values for the model:
##        AIC     BIC
## 1 2476.983 2513.27

checkresiduals(gold_model1$model)

## 
##  Breusch-Godfrey test for serial correlation of order up to 14
## 
## data:  Residuals
## LM test = 140.36, df = 14, p-value < 2.2e-16

#ploymonial dlm
gold_model2 =polyDlm(x =as.vector(gold_price) , y =as.vector(asx_price) , q =2, k = 2, show.beta = TRUE)

## Estimates and t-tests for beta coefficients:
##        Estimate Std. Error t value P(>|t|)
## beta.0    0.396       1.28   0.310   0.757
## beta.1   -0.233       1.90  -0.122   0.903
## beta.2    0.536       1.27   0.421   0.675

summary(gold_model2)

## 
## Call:
## "Y ~ (Intercept) + X.t"
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1632.50  -700.82     4.61   549.72  2213.87 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 3998.3587   212.3161  18.832   <2e-16 ***
## z.t0           0.3958     1.2767   0.310    0.757    
## z.t1          -1.3268     5.7723  -0.230    0.819    
## z.t2           0.6983     2.8546   0.245    0.807    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 834.5 on 155 degrees of freedom
## Multiple R-squared:  0.1015, Adjusted R-squared:  0.08409 
## F-statistic: 5.835 on 3 and 155 DF,  p-value: 0.0008385

checkresiduals(gold_model2$model)

## 
##  Breusch-Godfrey test for serial correlation of order up to 10
## 
## data:  Residuals
## LM test = 147.24, df = 10, p-value < 2.2e-16

#Koyk's transform
gold_model3 = koyckDlm(x =as.vector(gold_price) , y =as.vector(asx_price) )
summary(gold_model3)

## 
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -682.19 -105.44   15.86  135.04  783.60 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.902e+02  8.958e+01   2.123   0.0353 *  
## Y.1         9.635e-01  1.909e-02  50.469   <2e-16 ***
## X.t         2.595e-03  4.304e-02   0.060   0.9520    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 201.4 on 157 degrees of freedom
## Multiple R-Squared: 0.9488,  Adjusted R-squared: 0.9481 
## Wald test:  1454 on 2 and 157 DF,  p-value: < 2.2e-16 
## 
## Diagnostic tests:
## NULL
## 
##                            alpha        beta       phi
## Geometric coefficients:  5205.15 0.002595168 0.9634602

checkresiduals(gold_model3$model)

#ARDLM 
gold_model4.1 =ardlDlm(x =as.vector(gold_price) , y =as.vector(asx_price) ,p =1 ,q =1 )$model
summary(gold_model4.1)

## 
## Time series regression with "ts" data:
## Start = 2, End = 161
## 
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -598.9 -102.9   10.4  119.5  724.6 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 182.80452   85.05812   2.149   0.0332 *  
## X.t          -1.24911    0.29162  -4.283 3.21e-05 ***
## X.1           1.23169    0.28976   4.251 3.66e-05 ***
## Y.1           0.97172    0.01812  53.624  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 191.1 on 156 degrees of freedom
## Multiple R-squared:  0.9542, Adjusted R-squared:  0.9533 
## F-statistic:  1083 on 3 and 156 DF,  p-value: < 2.2e-16

gold_model4.2 =ardlDlm(x =as.vector(gold_price) , y =as.vector(asx_price) ,p =2 ,q =2 )$model
summary(gold_model4.2)

## 
## Time series regression with "ts" data:
## Start = 3, End = 161
## 
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -590.39 -101.05    5.98  120.08  717.11 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 206.07177   86.92387   2.371  0.01900 *  
## X.t          -1.23319    0.30048  -4.104 6.58e-05 ***
## X.1           1.20172    0.43989   2.732  0.00704 ** 
## X.2           0.01244    0.30803   0.040  0.96783    
## Y.1           0.99348    0.08164  12.169  < 2e-16 ***
## Y.2          -0.02596    0.08198  -0.317  0.75195    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 191.7 on 153 degrees of freedom
## Multiple R-squared:  0.9532, Adjusted R-squared:  0.9517 
## F-statistic:   623 on 5 and 153 DF,  p-value: < 2.2e-16

gold_model4.3 =ardlDlm(x =as.vector(gold_price) , y =as.vector(asx_price) ,p =3 ,q =3 )$model
summary(gold_model4.3)

## 
## Time series regression with "ts" data:
## Start = 4, End = 161
## 
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -557.60 -111.52   14.36  115.19  697.04 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 219.25957   88.32755   2.482  0.01415 *  
## X.t          -1.22010    0.30229  -4.036 8.63e-05 ***
## X.1           1.29275    0.44643   2.896  0.00435 ** 
## X.2           0.15014    0.45449   0.330  0.74160    
## X.3          -0.23147    0.31042  -0.746  0.45703    
## Y.1           0.98562    0.08183  12.045  < 2e-16 ***
## Y.2           0.14973    0.11517   1.300  0.19555    
## Y.3          -0.17417    0.08164  -2.133  0.03453 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 190.6 on 150 degrees of freedom
## Multiple R-squared:  0.9531, Adjusted R-squared:  0.9509 
## F-statistic: 435.1 on 7 and 150 DF,  p-value: < 2.2e-16

gold_models.AIC =AIC(gold_model4.1,gold_model4.2,gold_model4.3)
gold_models.BIC =BIC(gold_model4.1,gold_model4.2,gold_model4.3)
sortScore(gold_models.AIC, "aic")

##               df      AIC
## gold_model4.3  9 2117.305
## gold_model4.2  7 2130.523
## gold_model4.1  5 2140.897

sortScore(gold_models.BIC, "bic")

##               df      BIC
## gold_model4.3  9 2144.868
## gold_model4.2  7 2152.005
## gold_model4.1  5 2156.273

#According to both AIC and BIC, ARDL(3,3) model is the most accurate one. We will displaydiagnostic plots of residuals from ARDL(3,3) to examine residuals.
checkresiduals(gold_model4.3$residuals)

ASX Prices vs Crude oil

# Finite dlm 
crude_model1 =dlm(x =as.vector(crude_oil_price) , y =as.vector(asx_price) , q =9)
summary(crude_model1)

## 
## Call:
## lm(formula = model.formula, data = design)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1563.08  -643.18   -11.17   571.05  1707.84 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 4641.7675   196.4286  23.631   <2e-16 ***
## x.t            8.8037    11.3647   0.775    0.440    
## x.1            2.8911    19.0087   0.152    0.879    
## x.2           -2.1607    19.2799  -0.112    0.911    
## x.3            2.9569    19.4045   0.152    0.879    
## x.4           -7.0213    19.3913  -0.362    0.718    
## x.5            0.7784    19.3589   0.040    0.968    
## x.6            1.7254    19.4044   0.089    0.929    
## x.7           -2.5268    19.5635  -0.129    0.897    
## x.8           -2.7015    19.3336  -0.140    0.889    
## x.9            0.8312    11.3876   0.073    0.942    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 802.1 on 141 degrees of freedom
## Multiple R-squared:  0.05864,    Adjusted R-squared:  -0.008123 
## F-statistic: 0.8783 on 10 and 141 DF,  p-value: 0.5551
## 
## AIC and BIC values for the model:
##        AIC      BIC
## 1 2476.871 2513.158

checkresiduals(crude_model1$model)

## 
##  Breusch-Godfrey test for serial correlation of order up to 14
## 
## data:  Residuals
## LM test = 140.06, df = 14, p-value < 2.2e-16

#ploymonial dlm
crude_model2 =polyDlm(x =as.vector(crude_oil_price) , y =as.vector(asx_price) , q =2, k = 2, show.beta = TRUE)

## Estimates and t-tests for beta coefficients:
##        Estimate Std. Error t value P(>|t|)
## beta.0    13.70       11.6   1.180   0.239
## beta.1     3.18       19.1   0.167   0.868
## beta.2    -8.41       11.5  -0.730   0.467

summary(crude_model2)

## 
## Call:
## "Y ~ (Intercept) + X.t"
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1616.4  -703.8   -77.9   657.5  1783.6 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 4209.0519   179.6342  23.431   <2e-16 ***
## z.t0          13.6870    11.5777   1.182    0.239    
## z.t1          -9.9541    57.9232  -0.172    0.864    
## z.t2          -0.5483    28.7836  -0.019    0.985    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 835.6 on 155 degrees of freedom
## Multiple R-squared:  0.09909,    Adjusted R-squared:  0.08165 
## F-statistic: 5.683 on 3 and 155 DF,  p-value: 0.001019

checkresiduals(crude_model2$model)

## 
##  Breusch-Godfrey test for serial correlation of order up to 10
## 
## data:  Residuals
## LM test = 146.42, df = 10, p-value < 2.2e-16

#Koyk's transform
crude_model3 = koyckDlm(x =as.vector(crude_oil_price) , y =as.vector(asx_price) )
summary(crude_model3)

## 
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -683.91 -108.66   13.68  139.77  762.55 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 209.89536   87.89368   2.388   0.0181 *  
## Y.1           0.97537    0.01905  51.193   <2e-16 ***
## X.t          -0.99907    0.58045  -1.721   0.0872 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 201.1 on 157 degrees of freedom
## Multiple R-Squared: 0.949,   Adjusted R-squared: 0.9483 
## Wald test:  1461 on 2 and 157 DF,  p-value: < 2.2e-16 
## 
## Diagnostic tests:
## NULL
## 
##                             alpha       beta       phi
## Geometric coefficients:  8522.034 -0.9990694 0.9753703

checkresiduals(crude_model3$model)

#ARDLM 
crude_model4.1 =ardlDlm(x =as.vector(crude_oil_price) , y =as.vector(asx_price) ,p =1 ,q =1 )$model
summary(crude_model4.1)

## 
## Time series regression with "ts" data:
## Start = 2, End = 161
## 
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -684.06 -111.57   -1.77  138.90  719.35 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 211.45397   85.03705   2.487  0.01395 *  
## X.t           7.45111    2.46199   3.026  0.00290 ** 
## X.1          -8.17917    2.44422  -3.346  0.00103 ** 
## Y.1           0.97067    0.01837  52.827  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 194.5 on 156 degrees of freedom
## Multiple R-squared:  0.9525, Adjusted R-squared:  0.9516 
## F-statistic:  1044 on 3 and 156 DF,  p-value: < 2.2e-16

crude_model4.2 =ardlDlm(x =as.vector(crude_oil_price) , y =as.vector(asx_price) ,p =2 ,q =2 )$model
summary(crude_model4.2)

## 
## Time series regression with "ts" data:
## Start = 3, End = 161
## 
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -676.1 -115.8  -10.3  141.0  731.0 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 244.07731   86.68388   2.816  0.00551 ** 
## X.t           6.24419    2.79912   2.231  0.02715 *  
## X.1          -3.49738    4.46844  -0.783  0.43502    
## X.2          -3.63452    2.70013  -1.346  0.18028    
## Y.1           0.94407    0.08337  11.324  < 2e-16 ***
## Y.2           0.02266    0.08323   0.272  0.78584    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 194 on 153 degrees of freedom
## Multiple R-squared:  0.9521, Adjusted R-squared:  0.9505 
## F-statistic: 607.9 on 5 and 153 DF,  p-value: < 2.2e-16

crude_model4.3 =ardlDlm(x =as.vector(crude_oil_price) , y =as.vector(asx_price) ,p =3 ,q =3 )$model
summary(crude_model4.3)

## 
## Time series regression with "ts" data:
## Start = 4, End = 161
## 
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -649.54 -119.25  -12.53  138.52  709.15 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 249.79681   89.55878   2.789  0.00597 ** 
## X.t           6.52707    2.82900   2.307  0.02241 *  
## X.1          -4.76939    4.71360  -1.012  0.31325    
## X.2          -3.02318    4.58005  -0.660  0.51022    
## X.3           0.47726    2.73635   0.174  0.86178    
## Y.1           0.94185    0.08455  11.139  < 2e-16 ***
## Y.2           0.12269    0.11766   1.043  0.29876    
## Y.3          -0.10083    0.08400  -1.200  0.23187    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 194.9 on 150 degrees of freedom
## Multiple R-squared:  0.9509, Adjusted R-squared:  0.9486 
## F-statistic: 415.2 on 7 and 150 DF,  p-value: < 2.2e-16

crude_models.AIC =AIC(crude_model4.1,crude_model4.2,crude_model4.3)
crude_models.BIC =BIC(crude_model4.1,crude_model4.2,crude_model4.3)
sortScore(crude_models.AIC, "aic")

##                df      AIC
## crude_model4.3  9 2124.324
## crude_model4.2  7 2134.235
## crude_model4.1  5 2146.524

sortScore(crude_models.BIC, "bic")

##                df      BIC
## crude_model4.3  9 2151.887
## crude_model4.2  7 2155.718
## crude_model4.1  5 2161.900

#According to both AIC and BIC, ARDL(3,3) model is the most accurate one. We will displaydiagnostic plots of residuals from ARDL(3,3) to examine residuals.
checkresiduals(crude_model4.3$residuals)

ASX Prices vs Copper

# Finite dlm 
copper_model1 =dlm(x =as.vector(copper_price) , y =as.vector(asx_price) , q =9)
summary(copper_model1)

## 
## Call:
## lm(formula = model.formula, data = design)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1163.95  -653.62    -5.48   601.46  1422.56 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.921e+03  2.114e+02  18.546   <2e-16 ***
## x.t          1.576e-01  1.351e-01   1.166    0.245    
## x.1          1.829e-02  2.177e-01   0.084    0.933    
## x.2          4.688e-02  2.177e-01   0.215    0.830    
## x.3          2.755e-02  2.164e-01   0.127    0.899    
## x.4          2.061e-02  2.157e-01   0.096    0.924    
## x.5         -5.263e-02  2.157e-01  -0.244    0.808    
## x.6          3.688e-02  2.165e-01   0.170    0.865    
## x.7         -5.357e-03  2.186e-01  -0.025    0.980    
## x.8         -2.372e-04  2.195e-01  -0.001    0.999    
## x.9         -9.203e-02  1.337e-01  -0.688    0.493    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 736.7 on 141 degrees of freedom
## Multiple R-squared:  0.2059, Adjusted R-squared:  0.1496 
## F-statistic: 3.656 on 10 and 141 DF,  p-value: 0.000233
## 
## AIC and BIC values for the model:
##        AIC      BIC
## 1 2451.016 2487.302

checkresiduals(copper_model1$model)

## 
##  Breusch-Godfrey test for serial correlation of order up to 14
## 
## data:  Residuals
## LM test = 140.43, df = 14, p-value < 2.2e-16

#ploymonial dlm
copper_model2 =polyDlm(x =as.vector(copper_price) , y =as.vector(asx_price) , q =2, k = 2, show.beta = TRUE)

## Estimates and t-tests for beta coefficients:
##        Estimate Std. Error t value P(>|t|)
## beta.0  0.17800      0.131  1.3600   0.176
## beta.1  0.05290      0.207  0.2550   0.799
## beta.2 -0.00654      0.129 -0.0507   0.960

summary(copper_model2)

## 
## Call:
## "Y ~ (Intercept) + X.t"
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1302.7  -694.4  -135.3   635.5  1512.0 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 3484.26812  181.89988  19.155   <2e-16 ***
## z.t0           0.17781    0.13067   1.361    0.176    
## z.t1          -0.15771    0.62898  -0.251    0.802    
## z.t2           0.03277    0.31191   0.105    0.916    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 743 on 155 degrees of freedom
## Multiple R-squared:  0.2877, Adjusted R-squared:  0.274 
## F-statistic: 20.87 on 3 and 155 DF,  p-value: 2.065e-11

checkresiduals(copper_model2$model)

## 
##  Breusch-Godfrey test for serial correlation of order up to 10
## 
## data:  Residuals
## LM test = 146.69, df = 10, p-value < 2.2e-16

#Koyk's transform
copper_model3 = koyckDlm(x =as.vector(copper_price) , y =as.vector(asx_price) )
summary(copper_model3)

## 
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -689.64 -108.62   12.78  140.20  771.79 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 189.368812  87.644648   2.161   0.0322 *  
## Y.1           0.971621   0.021895  44.376   <2e-16 ***
## X.t          -0.005864   0.009517  -0.616   0.5387    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 201.9 on 157 degrees of freedom
## Multiple R-Squared: 0.9485,  Adjusted R-squared: 0.9479 
## Wald test:  1448 on 2 and 157 DF,  p-value: < 2.2e-16 
## 
## Diagnostic tests:
## NULL
## 
##                             alpha         beta       phi
## Geometric coefficients:  6672.885 -0.005863623 0.9716211

checkresiduals(copper_model3$model)

#ARDLM 
copper_model4.1 =ardlDlm(x =as.vector(copper_price) , y =as.vector(asx_price) ,p =1 ,q =1 )$model
summary(copper_model4.1)

## 
## Time series regression with "ts" data:
## Start = 2, End = 161
## 
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -734.34 -101.82   16.56  123.05  774.55 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 173.61066   84.94572   2.044  0.04266 *  
## X.t           0.10629    0.03258   3.263  0.00136 ** 
## X.1          -0.10695    0.03228  -3.313  0.00115 ** 
## Y.1           0.96784    0.02103  46.027  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 195.2 on 156 degrees of freedom
## Multiple R-squared:  0.9522, Adjusted R-squared:  0.9513 
## F-statistic:  1035 on 3 and 156 DF,  p-value: < 2.2e-16

copper_model4.2 =ardlDlm(x =as.vector(copper_price) , y =as.vector(asx_price) ,p =2 ,q =2 )$model
summary(copper_model4.2)

## 
## Time series regression with "ts" data:
## Start = 3, End = 161
## 
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -693.17 -110.95    7.57  121.75  765.87 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 195.791146  85.366884   2.294   0.0232 *  
## X.t           0.079371   0.034097   2.328   0.0212 *  
## X.1          -0.004626   0.053845  -0.086   0.9316    
## X.2          -0.078848   0.034420  -2.291   0.0233 *  
## Y.1           0.972757   0.079625  12.217   <2e-16 ***
## Y.2          -0.005210   0.079957  -0.065   0.9481    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 192.6 on 153 degrees of freedom
## Multiple R-squared:  0.9527, Adjusted R-squared:  0.9512 
## F-statistic: 616.8 on 5 and 153 DF,  p-value: < 2.2e-16

copper_model4.3 =ardlDlm(x =as.vector(copper_price) , y =as.vector(asx_price) ,p =3 ,q =3 )$model
summary(copper_model4.3)

## 
## Time series regression with "ts" data:
## Start = 4, End = 161
## 
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -633.75 -106.38   -7.83  119.11  746.52 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 211.80399   86.99784   2.435   0.0161 *  
## X.t           0.08242    0.03418   2.412   0.0171 *  
## X.1          -0.02677    0.05491  -0.488   0.6266    
## X.2          -0.01751    0.05445  -0.322   0.7482    
## X.3          -0.04240    0.03493  -1.214   0.2267    
## Y.1           0.94704    0.08119  11.665   <2e-16 ***
## Y.2           0.11444    0.11280   1.015   0.3120    
## Y.3          -0.09745    0.08010  -1.217   0.2257    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 192.1 on 150 degrees of freedom
## Multiple R-squared:  0.9523, Adjusted R-squared:  0.9501 
## F-statistic: 428.2 on 7 and 150 DF,  p-value: < 2.2e-16

copper_models.AIC =AIC(copper_model4.1,copper_model4.2,copper_model4.3)
copper_models.BIC =BIC(copper_model4.1,copper_model4.2,copper_model4.3)
sortScore(copper_models.AIC, "aic")

##                 df      AIC
## copper_model4.3  9 2119.696
## copper_model4.2  7 2132.038
## copper_model4.1  5 2147.741

sortScore(copper_models.BIC, "bic")

##                 df      BIC
## copper_model4.3  9 2147.259
## copper_model4.2  7 2153.520
## copper_model4.1  5 2163.116

#According to both AIC and BIC, ARDL(3,3) model is the most accurate one. We will displaydiagnostic plots of residuals from ARDL(3,3) to examine residuals.
checkresiduals(copper_model4.3$residuals)

The residual standard deviation values for the ARDLM are lower when compared to any of the other. Although the Koyk transform also has a relatively low residual SD values, it still performs a little less optimally when compared to ARDLM. The polynomial DLM and finite lag DLM fall short of our expectation as soon as well have a look at the residual SD, which is very high in comparison to the other two models. There is also the case of the ACF values falling out of 95% confidence interval, which is not a case in ARDLM and Koyk transform.

The best fit model for this particular ARDL model is with p,q = (3,3), which give us the least AIC and BIC values.

Conclusion

When we initially started with our dataset, we had to make sure if the data was stationary or not. For those purpose we tested the data using Augmented Dickey Fuller test. The given data was non-stationary hence we had to transform it using transromation and differencing.
We also had to check the effects of seasonality and trend on the time series. These components were responsible in making the data non-stationary. The process of decomposition was crucial in helping solve the effects of seasonality and trend on the data and hence in turn making it Stationary.
Then it was time to find the best fit model for ASX price. We started with comparing 4 models namely Finite lag DLM, polynomialDLM, Koyk transform and ARDLM. After looking at the residual plots and summary we concluded that it was the ARDLM which was the best fit model when it comes to our dataset.