Introduction

We use Time Series Analysis and Forecasting methods to draw statistical inferences about monthly averages of ASX all ordinaries (Ords) price index, Gold price, Crude Oil (Brent)_USD/bbl and Copper_USD/tonne starting from January 2004. The main aim of the analysis is to find a Distributed Lag Model (DLM) for the independent variable i.e. ASX Price Index. For the analysis, we used the asx_data from the RMIT University repository. The data set contains values of ASX price index, gold price in AUD, Crude oil per litre in USD and copper price per tonne in USD. We fit a time series plot for all the variables followed by analysing the stationarity and decomposition. In the last part, we fit four different models namely Distributed Lag Model (DLM), Polynomial DLM, Koyck DLM and Auto-regressive DLM for each variable. We compare each model using each models adjusted r-squared.

Loading necessary R-packages

## 
## Attaching package: 'TSA'

## The following objects are masked from 'package:stats':
## 
##     acf, arima

## The following object is masked from 'package:utils':
## 
##     tar

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

## Loading required package: carData

## Registered S3 methods overwritten by 'forecast':
##   method       from
##   fitted.Arima TSA 
##   plot.Arima   TSA

## Loading required package: nardl

## Loading required package: dynlm

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

## 
## Attaching package: 'dLagM'

## The following object is masked from 'package:forecast':
## 
##     forecast

## Loading required package: x13binary

## x12 is ready to use.

## Use the package x12GUI for a Graphical User Interface.

## By default the X13-ARIMA-SEATS binaries provided by the R package x13binary

## are used but this can be changed with x12path(validpath)

## ---------------

## Suggestions and bug-reports can be submitted at: https://github.com/statistikat/x12/issues

## 
## Attaching package: 'readr'

## The following object is masked from 'package:TSA':
## 
##     spec

Loading the dataset and converting it into a time series object

asx_data <- read_csv("ASX_data.csv")

## 
## -- Column specification --------------------------------------------------------
## cols(
##   `ASX price` = col_double(),
##   `Gold price` = col_number(),
##   `Crude Oil (Brent)_USD/bbl` = col_double(),
##   `Copper_USD/tonne` = col_number()
## )

head(asx_data)

## # A tibble: 6 x 4
##   `ASX price` `Gold price` `Crude Oil (Brent)_USD/bbl` `Copper_USD/tonne`
##         <dbl>        <dbl>                       <dbl>              <dbl>
## 1       2935.         612.                        31.3               1650
## 2       2778.         603.                        32.6               1682
## 3       2849.         566.                        30.3               1656
## 4       2971.         539.                        25.0               1588
## 5       2980.         549.                        25.8               1651
## 6       3000.         536.                        27.6               1685

class(asx_data)

## [1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame"

Creating individual time series object of each variable from the dataset

head(asx)

##         Jan    Feb    Mar    Apr    May    Jun
## 2004 2935.4 2778.4 2848.6 2970.9 2979.8 2999.7

head(gold)

##        Jan   Feb   Mar   Apr   May   Jun
## 2004 611.9 603.3 565.7 538.6 549.4 535.9

head(oil)

##        Jan   Feb   Mar   Apr   May   Jun
## 2004 31.29 32.65 30.34 25.02 25.81 27.55

head(copper)

##       Jan  Feb  Mar  Apr  May  Jun
## 2004 1650 1682 1656 1588 1651 1685

Checking the first five observations of the dataset after converting it into a time series object.

asx_price_index <- ts(asx_data, start = c(2004,1), frequency = 12)
class(asx_price_index)

## [1] "mts"    "ts"     "matrix"

head(asx_price_index)

##          ASX price Gold price Crude Oil (Brent)_USD/bbl Copper_USD/tonne
## Jan 2004    2935.4      611.9                     31.29             1650
## Feb 2004    2778.4      603.3                     32.65             1682
## Mar 2004    2848.6      565.7                     30.34             1656
## Apr 2004    2970.9      538.6                     25.02             1588
## May 2004    2979.8      549.4                     25.81             1651
## Jun 2004    2999.7      535.9                     27.55             1685

A Time series plot of the dataset

asx.price_index_scaled = scale(asx_price_index)
plot(asx.price_index_scaled, plot.type="s",col = c("blue", "yellow", "red", "black"), main = "Fig 1. Visualizing monthly price of ASX, Gold, Crude oil, Copper")
legend("topleft",lty=1, text.width =1.5, col=c("blue", "yellow", "red", "black"), c("ASX", "Gold", "Oil", "Copper"))

Checking the existence of non-stationarity in the Dataset

The following time series plot of ASX price index starting from January 2004

plot(asx, ylab = "ASX Price Index", xlab = "Years", main = "Fig 2. Monthwise ASX price index starting from January 2004")
points(y = asx, x = time(asx), pch = as.vector(season(asx)))

Analysis & Interpretation

Trend- The ASX price index was increasing from 2004 to 2008. However, there is a sudden fall in the price index in 2008. But, again after 2008 we can see a upward trend.
Seasonality - Seasonality is present in the plot as we can observe that ASX price index increases in the months of February, March and April and decreases in the months of May and June.
Behaviour- Succeding points tell us the presence of an Auto-Regressive pattern in the time series plot.
Changing Variance - A change in variance is present.
Intervention Points- A major Intervention is observed during the year 2008 where the price index fell. It might be due to the great recession from 2007 to 2009.

ACF and PACF plots

Acf(asx, lag.max = 48, main = "Fig 3. ACF plot of ASX Price Index")

In the ACF Plot there are strong correlations between the elements of the time series. The lags seen are correlations of the observations of previous time stamps.We can conclude there is a presence of Seasonality in the time series data due to a wave like pattern in the ACF plot.

Pacf(asx, lag.max = 48 ,main = "Fig 4. PACF plot of ASX Price Index")

PACF plot has a significant spike only at lag 1. Hence, we say that all the higher-order autocorrelations are explained by the lag-1 autocorrelation

Augumented Dickey-Fuller Unit Root Test to check the existence of non-stationarity in the ASX price index.

summary(ur.df(asx, type="none", lags=1, selectlags="AIC"))

## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -722.83 -103.12   36.63  138.79  809.17 
## 
## Coefficients:
##            Estimate Std. Error t value Pr(>|t|)
## z.lag.1    0.002107   0.003309   0.637    0.525
## z.diff.lag 0.085167   0.079885   1.066    0.288
## 
## Residual standard error: 203.3 on 157 degrees of freedom
## Multiple R-squared:  0.01074,    Adjusted R-squared:  -0.001866 
## F-statistic: 0.8519 on 2 and 157 DF,  p-value: 0.4286
## 
## 
## Value of test-statistic is: 0.6366 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62

From Augmented Dickey-Fuller Unit Root Test, the p_value = 0.4286 and value of test_statistic = 0.6366

As the p_value is greater that 5% level of significance, we fail to reject the null hypothesis. Hence, we assume that the data is non-stationary.

Further we test the assumption of existence of non-stationarity in the ASX price index using Phillips-Perron Unit Root Test

PP.test(asx,lshort = TRUE)

## 
##  Phillips-Perron Unit Root Test
## 
## data:  asx
## Dickey-Fuller = -2.2284, Truncation lag parameter = 4, p-value = 0.4811

PP.test(asx,lshort = FALSE)

## 
##  Phillips-Perron Unit Root Test
## 
## data:  asx
## Dickey-Fuller = -2.3932, Truncation lag parameter = 13, p-value =
## 0.4123

From the Phillips-Perron Unit Root Test, the p_value = 0.4811 when truncation lag parameter is 4 and p_value = 0.4123 when truncation lag parameter is 13

As the p_value is greater that 5% level of significance, we fail to reject the null hypothesis. Hence, we conclude that the data is non-stationary.

We apply a Box-Cox transformation to deal with non-stationarity in ASX price index

lambda_asx = BoxCox.lambda(asx, method = "loglik")
lambda_asx

## [1] 1.05

The value of lambda is 1.05

asx_boxcox <- BoxCox(asx, lambda = lambda_asx )
plot(asx_boxcox, type = "o", ylab = "ASX Price Index", 
     xlab = "Years", main = "Fig 5. Time series plot of ASX Price Index after BoxCox transformation")

Here, we can see no difference in the time series plot even after applying a box-cox transformation

This can be checked after applying adf and phillips-perron unit root test.

adf.test(asx_boxcox)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  asx_boxcox
## Dickey-Fuller = -2.6916, Lag order = 5, p-value = 0.2879
## alternative hypothesis: stationary

pp.test(asx_boxcox)

## 
##  Phillips-Perron Unit Root Test
## 
## data:  asx_boxcox
## Dickey-Fuller Z(alpha) = -8.7334, Truncation lag parameter = 4, p-value
## = 0.614
## alternative hypothesis: stationary

We can see the p_value of both Augmented Dickey-Fuller and Phillips-perron Unit root test is greater than 5% level of significance. Hence we fail to reject the null hypothesis and conclude that the non-stationarity in the ASX price index still exists.

Applying first order ordinary differencing

asx_ordiff <- diff(asx)

Time Series Plot after applying first ordinary differencing

plot(asx_ordiff, type = "o", ylab = "ASX Price Index", 
     xlab = "Years", main = "Fig 6. ASX Price Index with first order differencing")

Time series plot of the transformed data shows that the trend is reduced. But we can still intervention points. Also, the Auto-regressive pattern is still visible.

ACF and PACF plots of the transformed data

par(mfrow=c(1,2))
Acf(asx_ordiff, lag.max = 48, main = "Fig 7. ACF plot Transformed ASX price index")
Pacf(asx_ordiff, lag.max = 48, main = "Fig 8. PACF plot Transformed ASX price index")

par(mfrow=c(1,1))

From the ACF and PACF plots we can see the decaying pattern has faded. There are no significant lags in the both the plots.

Augmented Dickey Fuller and Phillips-Perron Unit root test on the transformed data.

adf.test(asx_ordiff)

## Warning in adf.test(asx_ordiff): p-value smaller than printed p-value

## 
##  Augmented Dickey-Fuller Test
## 
## data:  asx_ordiff
## Dickey-Fuller = -4.5543, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary

pp.test(asx_ordiff)

## Warning in pp.test(asx_ordiff): p-value smaller than printed p-value

## 
##  Phillips-Perron Unit Root Test
## 
## data:  asx_ordiff
## Dickey-Fuller Z(alpha) = -162.52, Truncation lag parameter = 4, p-value
## = 0.01
## alternative hypothesis: stationary

From the above results, we can see the p_value for both Augmented Dickey Fuller and Phillips-Perron Unit root are less than 5% level of significance.

Hence, we reject the null hypothesis stating the transformed data is stationary.

We apply a X12 decomposition on the ASX price index time series data.

There are minute deviances from the trend component in the seasonally adjusted series. An increase in the ASX price index is due to effect of seasonality.

During the period of great recession i.e. from 2007 to 2009, we see a fall in the ASX price index which has also decreased the seasonality in the plot.

SI Ratios plot to check whether the seasonality flucuates around the mean

plotSeasFac(asx12, main = "Fig 10. SI Plot of ASX Price Index")

We observe seasonality this plot as the values fluctuate at the mean during the months of February, March and April followed by June, July and August.

We apply STL Decomposition on the transformed ASX price index time series data

asx_stl <- stl(asx, t.window = 15, s.window = "periodic",
               robust = TRUE)
plot(asx_stl, main = "Fig 11. STL Plot of ASX Price Index")

In the above plot, we observe a smoothed plot of the original time series plot of ASX price index

The effect of seasonality is seen in the second plot.

Highest ASX price index values were recorded in the month of March while lowest ASX price index values were recorded in the months of October-November.

In the remainder part, there are some significant fluctuations and changing variance is present

We display the forecasts for ASX Price index from STL Method

asx_seasadj <- seasadj(asx_stl)
plot(naive(asx_seasadj), xlab ="New orders index", 
     main = "Fig 12. Naive forecasts of ASX Price Index")

Naive forecasts represent the mean level for ASX Price Index

The following time series plot of Gold price (AUD) starting from January 2004

plot(gold, ylab = "Gold Price", xlab = "Years", main = "Fig 13. Monthwise Gold price starting from January 2004")
points(y = gold, x = time(gold), pch = as.vector(season(gold)))

Analysis & Interpretation

Trend- There is an increasing trend in Gold price (AUD) since 2004.
Seasonality - We can observe seasonality in the plot the price of Gold is volatile. We observe prices fluctuating every year.
Behaviour- Succeding points tell us the presence of an Auto-Regressive pattern in the time series plot.
Changing Variance - A change in variance is present.
Intervention Points- No major Interventions are observed in the time series plot of Gold Price.

ACF and PACF plots

Acf(gold,lag.max = 48, main = "Fig 15. ACF plot of Gold Prices (AUD)")

In the ACF Plot there are strong correlations between the elements of the time series. The lags seen are correlations of the observations of previous time stamps.We can conclude there is a presence of Seasonality in the time series data due to a wave like pattern in the ACF plot.

Pacf(gold, lag.max = 48, main = "Fig 16. PACF plot of Gold Prices (AUD)")

PACF plot has a significant spike only at lag 1. Hence, we say that all the higher-order autocorrelations are explained by the lag-1 autocorrelation

Augumented Dickey-Fuller Unit Root Test to check the existence of non-stationarity in the Gold price (AUD)

summary(ur.df(gold, type="none", lags=1, selectlags="AIC"))

## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -138.655  -26.586   -2.016   22.726  206.439 
## 
## Coefficients:
##            Estimate Std. Error t value Pr(>|t|)
## z.lag.1    0.003218   0.003322   0.969    0.334
## z.diff.lag 0.122451   0.079603   1.538    0.126
## 
## Residual standard error: 52.57 on 157 degrees of freedom
## Multiple R-squared:  0.02351,    Adjusted R-squared:  0.01107 
## F-statistic:  1.89 on 2 and 157 DF,  p-value: 0.1545
## 
## 
## Value of test-statistic is: 0.9687 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62

From Augmented Dickey-Fuller Unit Root Test, the p_value = 0.1545 and value of test_statistic = 0.9687

As the p_value is greater that 5% level of significance, we fail to reject the null hypothesis. Hence, we assume that the data is non-stationary.

Further we test the assumption of existence of non-stationarity in the Gold price index using Phillips-Perron Unit Root Test

PP.test(gold,lshort = TRUE)

## 
##  Phillips-Perron Unit Root Test
## 
## data:  gold
## Dickey-Fuller = -2.1044, Truncation lag parameter = 4, p-value = 0.5328

PP.test(gold,lshort = FALSE)

## 
##  Phillips-Perron Unit Root Test
## 
## data:  gold
## Dickey-Fuller = -1.8656, Truncation lag parameter = 13, p-value =
## 0.6324

From the Phillips-Perron Unit Root Test, the p_value = 0.5328 when truncation lag parameter is 4 and p_value = 0.6324 when truncation lag parameter is 13

As the p_value is greater that 5% level of significance, we fail to reject the null hypothesis. Hence, we conclude that the data is non-stationary.

We apply a Box-Cox transformation to deal with non-stationarity in Gold price index

lambda_gold <- BoxCox.lambda(gold, method = "loglik")
lambda_gold

## [1] 1.45

The value of lambda is 1.05

gold_boxcox <- BoxCox(gold, lambda = lambda_gold)

plot(gold_boxcox, type = "o", ylab = "Gold Price", xlab = "Years",
     main = "Fig 17. Gold Prices (AUD) after Box-Cox transformation")

Here, we can see no difference in the time series plot even after applying a box-cox transformation

This can be checked after applying adf and phillips-perron unit root test.

adf.test(gold_boxcox)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  gold_boxcox
## Dickey-Fuller = -2.0813, Lag order = 5, p-value = 0.5424
## alternative hypothesis: stationary

pp.test(gold_boxcox)

## 
##  Phillips-Perron Unit Root Test
## 
## data:  gold_boxcox
## Dickey-Fuller Z(alpha) = -11.535, Truncation lag parameter = 4, p-value
## = 0.4534
## alternative hypothesis: stationary

We can see the p_value of both Augmented Dickey-Fuller and Phillips-perron Unit root test is greater than 5% level of significance. Hence we fail to reject the null hypothesis and conclude that the non-stationarity in the Gold price (AUD) still exists.

Applying first order ordinary differencing

gold_ordiff <- diff(gold)

Time Series Plot applying first ordinary differencing

plot(gold_ordiff, ylab = "GOld Price", xlab = "Years", type = "o",
     main = "Fig 18. Gold Prices with first order differencing")

Time series plot of the transformed data shows that the trend is however reduced. But we can still intervention points. Also, the Auto-regressive pattern is still visible.

ACF and PACF plots of the transformed data

par(mfrow=c(1,2))
Acf(gold_ordiff,lag.max = 48, main = "Fig 19. ACF plot transformed Gold price (AUD)")
Pacf(gold_ordiff, lag.max = 48, main = "Fig 20. PACF plot transformed Gold price (AUD)")

par(mfrow=c(1,1))

From the ACF and PACF plots we can see the decaying pattern has faded. There are no significant lags in the both the plots.

Augmented Dickey Fuller and Phillips-Perron Unit root test on the transformed data.

adf.test(gold_ordiff)

## Warning in adf.test(gold_ordiff): p-value smaller than printed p-value

## 
##  Augmented Dickey-Fuller Test
## 
## data:  gold_ordiff
## Dickey-Fuller = -5.8718, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary

pp.test(gold_ordiff)

## Warning in pp.test(gold_ordiff): p-value smaller than printed p-value

## 
##  Phillips-Perron Unit Root Test
## 
## data:  gold_ordiff
## Dickey-Fuller Z(alpha) = -127.96, Truncation lag parameter = 4, p-value
## = 0.01
## alternative hypothesis: stationary

From the above results, we can see the p_value for both Augmented Dickey Fuller and Phillips-Perron Unit root are less than 5% level of significance.

Hence, we reject the null hypothesis stating the transformed data is stationary.

We apply a X12 decomposition on the Gold price (AUD) time series data.

gold_x12 <- x12(gold)
plot(gold_x12, sa =TRUE, trend = TRUE, main = "Fig 21. X12 plot of Gold Price (AUD)")

There are minute deviances from the trend component in the seasonally adjusted series. An increase in the Gold price is due to effect of seasonality.

During the period of great recession i.e. from 2007 to 2009, we see a fall in the Gold price which has also decreased the seasonality in the plot.

SI Ratios plot to check whether the seasonality flucuates around the mean

plotSeasFac(gold_x12, main = "Fig 22. SI Plot of Gold Price (AUD)")

We observe seasonality this plot as the values fluctuate at the mean during the months of July,August and September followed by January, February and March.

We apply STL Decomposition on the transformed Gold price (AUD) time series data

gold_stl <- stl(gold, t.window = 15, s.window = "periodic", 
                  robust = TRUE)
plot(gold_stl, main = "Fig 23. STL plot of Gold Price (AUD)")

In the above plot, we observe a smoothed plot of the original time series plot of Gold price

The effect of seasonality is seen in the second plot.

Highest Gold price values were recorded in the month of September while lowest Gold price values were recorded in the months of March

In the remainder part, there are some significant fluctuations and changing variance is present

We display the forecasts for Gold Price index from STL Method

gold_seasadj <- seasadj(gold_stl)
plot(naive(gold_seasadj), xlab = "New Orders Index",
     main = "Fig 24. Naive forecasts of Gold Price (AUD)")

Naive forecasts represent the mean level for Gold Price Index

The following time series plot of Crude Oil starting from January 2004

plot(oil, ylab = "Crude Oil Price", xlab = "Years",
     main = "Fig 25. Times Series plot of Crude Oil Prices per Litre (USD)")
points(y = oil, x = time(oil), pch = as.vector(season(oil)))

Analysis & Interpretation

Trend- We can a increasing trend in Crude Oil price since 2004. However, the crude oil suddenly falls in the year 2009. These trends are caused due the great recession.
Seasonality - A repeating cycle or pattern is be observed from the plot. The prices tend to increase during the February, March, April and prices fall during June, July. Hence, we can say there is a presence of seasonality.
Behaviour- Succeding points tell us the presence of an Auto-Regressive pattern in the time series plot.
Changing Variance - A change in variance is present.
Intervention Points- Two major Interventions are observed in the time series plot of Crude Oil Price. First is seen during the years 2007 to 2009 i.e. during the great recession period. The second intervention is seen in the year 2015.

ACF and PACF plots

Acf(oil,lag.max = 48, main = "Fig 26. ACF plot of Crude Oil Prices")

In the ACF Plot there are strong correlations between the elements of the time series. The lags seen are correlations of the observations of previous time stamps.We can conclude there is a presence of Seasonality in the time series data due to a wave like pattern in the ACF plot.

Pacf(oil,lag.max = 48, main = "Fig 27. ACF plot of Crude Oil Prices")

PACF plot has a significant spike only at lag 1. Hence, we say that all the higher-order autocorrelations are explained by the lag-1 autocorrelation

Augumented Dickey-Fuller Unit Root Test to check the existence of non-stationarity in the Crude Oil price

summary(ur.df(oil, type="none", lags=1, selectlags="AIC"))

## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -19.905  -3.562   1.204   3.988  15.202 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## z.lag.1    -0.004013   0.005806  -0.691     0.49    
## z.diff.lag  0.400123   0.073236   5.463  1.8e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.839 on 157 degrees of freedom
## Multiple R-squared:  0.1605, Adjusted R-squared:  0.1498 
## F-statistic: 15.01 on 2 and 157 DF,  p-value: 1.084e-06
## 
## 
## Value of test-statistic is: -0.6913 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62

From Augmented Dickey-Fuller Unit Root Test, the p_value = 0.1498 and test_statistic = 0.6913

As the p_value is greater that 5% level of significance, we fail to reject the null hypothesis. Hence, we assume that the data is non-stationary.

Further we test the assumption of existence of non-stationarity in the Crude oil price using Phillips-Perron Unit Root Test

PP.test(oil,lshort = TRUE)

## 
##  Phillips-Perron Unit Root Test
## 
## data:  oil
## Dickey-Fuller = -1.7567, Truncation lag parameter = 4, p-value = 0.6778

PP.test(oil,lshort = FALSE)

## 
##  Phillips-Perron Unit Root Test
## 
## data:  oil
## Dickey-Fuller = -1.5181, Truncation lag parameter = 13, p-value =
## 0.7773

From the Phillips-Perron Unit Root Test, the p_value = 0.6778 when truncation lag parameter is 4 and p_value = 0.7773 when truncation lag parameter is 13

As the p_value is greater that 5% level of significance, we fail to reject the null hypothesis. Hence, we conclude that the data is non-stationary.

We apply a Box-Cox transformation to deal with non-stationarity in Gold price index

lambda_oil <- BoxCox.lambda(oil, method = "loglik")
lambda_oil

## [1] 0.4

oil_boxcox <- BoxCox(oil, lambda = lambda_oil)

plot(oil_boxcox, type = "o", ylab = "Crude Oil Price", xlab = "Years",
     main = "Fig 28. Crude Oil Prices per litre (USD) after Box-Cox transformation")

Here, we can see no difference in the time series plot even after applying a box-cox transformation

This can be checked after applying adf and phillips-perron unit root test.

adf.test(oil_boxcox)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  oil_boxcox
## Dickey-Fuller = -1.8707, Lag order = 5, p-value = 0.6303
## alternative hypothesis: stationary

pp.test(oil_boxcox)

## 
##  Phillips-Perron Unit Root Test
## 
## data:  oil_boxcox
## Dickey-Fuller Z(alpha) = -6.0377, Truncation lag parameter = 4, p-value
## = 0.7685
## alternative hypothesis: stationary

We can see the p_value of both Augmented Dickey-Fuller and Phillips-perron Unit root test is greater than 5% level of significance. Hence we fail to reject the null hypothesis and conclude that the non-stationarity in the Crude oil price still exists.

Applying first order ordinary differencing

oil_ordiff <- diff(oil)

Time Series Plot applying first ordinary differencing

plot(oil_ordiff, ylab = "Crude Oil Price", xlab = "Years", type = "o",
     main = "Fig 29. Crude Oil Prices per litre (USD) with first order differencing")

Time series plot of the transformed data shows that the trend is reduced. However, we can still intervention points. Also, the Auto-regressive pattern is still visible.

ACF and PACF plots of the transformed data

par(mfrow=c(1,2))
Acf(oil_ordiff, lag.max = 48,main = "Fig 30. ACF plot transformed Crude oil price per litre (USD)")
Pacf(oil_ordiff,lag.max = 48, main = "Fig 31. PACF plot transformed Crude oil price per litre (USD)")

par(mfrow=c(1,1))

From the ACF and PACF plots we can see the decaying pattern has faded but some part of it is still visible. We can there are no significant lags at lag-1 in the both the plots.

Augmented Dickey Fuller and Phillips-Perron Unit root test on the transformed data.

adf.test(oil_ordiff)

## Warning in adf.test(oil_ordiff): p-value smaller than printed p-value

## 
##  Augmented Dickey-Fuller Test
## 
## data:  oil_ordiff
## Dickey-Fuller = -5.4261, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary

pp.test(oil_ordiff)

## Warning in pp.test(oil_ordiff): p-value smaller than printed p-value

## 
##  Phillips-Perron Unit Root Test
## 
## data:  oil_ordiff
## Dickey-Fuller Z(alpha) = -96.402, Truncation lag parameter = 4, p-value
## = 0.01
## alternative hypothesis: stationary

From the above results, we can see the p_value for both Augmented Dickey Fuller and Phillips-Perron Unit root are less than 5% level of significance.

Hence, we reject the null hypothesis stating the transformed data is stationary.

We apply a X12 decomposition on the Gold price (AUD) time series data.

oil_x12 <- x12(oil)
plot(oil_x12, sa = TRUE, trend = TRUE, main = "Fig 32. X12 plot of Crude Oil Price per litre (USD)")

There are minute deviances from the trend component in the seasonally adjusted series. An increase in the Crude Oil price is due to effect of seasonality.

During the period of great recession i.e. from 2007 to 2009, we see a fall in the Crude oil price which has also decreased the seasonality in the plot.

SI Ratios plot to check whether the seasonality flucuates around the mean

plotSeasFac(oil_x12, main = "Fig 33. SI Plot of Crude Oil Price per litre (USD)")

We observe seasonality this plot as the values fluctuate at the mean during the months of February and March followed by June and July.

We apply STL Decomposition on the transformed Gold price (AUD) time series data

oil_stl <- stl(oil, t.window = 15, s.window = "periodic",
                   robust = TRUE)
plot(oil_stl, main = "Fig 34. STL Plot of Crude Oil Price per litre (USD)")

In the above plot, we observe a smoothed plot of the original time series plot of Crude oil price

The effect of seasonality is seen in the second plot.

Highest Crude oil price values were recorded in the month of March while lowest Crude oil price values were recorded in the months of July

In the remainder part, there are some significant fluctuations and changing variance is present

We display the forecasts for Crude oil Price from STL Method

oil_seasadj <- seasadj(oil_stl)
plot(naive(oil_seasadj), xlab = "New Orders Index",
     main = "Fig 35. Naive forecasts of Crude Oil Price per litre (USD)")

Naive forecasts represent the mean level for Crude oil Price

The following time series plot of Crude Oil starting from January 2004

plot(copper, ylab = "Copper Price", xlab = "Years",
     main = "Fig 36. Times Series plot of Copper Price per tonne (USD)")
points(y = copper, x = time(copper), pch = as.vector(season(copper)))

Analysis & Interpretation

Trend- We can a increasing trend in Copper price since 2004. However, the copper price suddenly falls in the year 2009. These trends are caused due the great recession.
Seasonality - A repeating cycle or pattern is be observed in the plot. The prices tend to increase during the December, January, February and prices fall during August, September and October. Hence, we can say there is a presence of seasonality.
Behaviour- Succeding points tell us the presence of an Auto-Regressive pattern in the time series plot.
Changing Variance - A change in variance is present.
Intervention Points- A major Intervention is observed in the time series plot of Copper Price. It is seen during the years 2007 to 2009 i.e. during the great recession period.

ACF and PACF plots

Acf(copper,lag.max = 48, main = "Fig 37. ACF plot of Copper Prices")

In the ACF Plot there are strong correlations between the elements of the time series. The lags seen are correlations of the observations of previous time stamps.We can conclude there is a presence of Seasonality in the time series data due to a wave like pattern in the ACF plot.

Pacf(copper,lag.max = 48, main = "Fig 38. ACF plot of Copper Prices")

PACF plot has a significant spike at lag 1. Hence, we say that all the higher-order autocorrelations are explained by the lag-1 autocorrelation

Augumented Dickey-Fuller Unit Root Test to check the existence of non-stationarity in the Copper price

summary(ur.df(copper, type="none", lags=1, selectlags="AIC"))

## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1861.58  -164.87    32.72   248.08  2829.69 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## z.lag.1    -0.002256   0.005772  -0.391    0.696    
## z.diff.lag  0.307564   0.076139   4.040 8.36e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 463.1 on 157 degrees of freedom
## Multiple R-squared:  0.09422,    Adjusted R-squared:  0.08268 
## F-statistic: 8.166 on 2 and 157 DF,  p-value: 0.0004229
## 
## 
## Value of test-statistic is: -0.3909 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62

adf.test(copper)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  copper
## Dickey-Fuller = -2.2502, Lag order = 5, p-value = 0.472
## alternative hypothesis: stationary

From Augmented Dickey-Fuller Unit Root Test, the p_value = 0.472 and test_statistic = 2.2502

The p_value is greater that 5% level of significance, hence, we fail reject the null hypothesis. We assume that the data is non-stationary.

Further we test the assumption of existence of non-stationarity in the Crude oil price using Phillips-Perron Unit Root Test

PP.test(copper,lshort = TRUE)

## 
##  Phillips-Perron Unit Root Test
## 
## data:  copper
## Dickey-Fuller = -2.0718, Truncation lag parameter = 4, p-value = 0.5464

PP.test(copper,lshort = FALSE)

## 
##  Phillips-Perron Unit Root Test
## 
## data:  copper
## Dickey-Fuller = -1.9005, Truncation lag parameter = 13, p-value =
## 0.6178

From the Phillips-Perron Unit Root Test, the p_value = 0.5464 when truncation lag parameter is 4 and p_value = 0.6178 when truncation lag parameter is 13

As the p_value is greater that 5% level of significance, we fail to reject the null hypothesis. Hence, we conclude that the data is non-stationary.

We apply a Box-Cox transformation to deal with non-stationarity in Copper price index

lambda_copper <- BoxCox.lambda(copper, method = "loglik")
lambda_copper

## [1] 1.1

The value of lambda is 1.1

copper_boxcox<- BoxCox(copper, lambda = lambda_copper)

plot(copper_boxcox, type = "o", ylab = "Copper Price", xlab = "Years",
     main = "Fig 40. Copper Prices with Box-Cox transformation")

Here, we can see no difference in the time series plot even after applying a box-cox transformation

This can be checked after applying adf and phillips-perron unit root test.

adf.test(copper_boxcox)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  copper_boxcox
## Dickey-Fuller = -2.2362, Lag order = 5, p-value = 0.4778
## alternative hypothesis: stationary

pp.test(copper_boxcox)

## 
##  Phillips-Perron Unit Root Test
## 
## data:  copper_boxcox
## Dickey-Fuller Z(alpha) = -7.995, Truncation lag parameter = 4, p-value
## = 0.6563
## alternative hypothesis: stationary

We can see the p_values of both Augmented Dickey-Fuller and Phillips-perron Unit root test is greater than 5% level of significance. Hence we fail to reject the null hypothesis and conclude that the non-stationarity in the Copper price still exists.

Applying first order ordinary differencing

copper_ordiff <- diff(copper)

Time Series Plot applying first ordinary differencing

plot(copper_ordiff, ylab = "Copper Price", xlab = "Years", type = "o",
     main = "Fig 41. Copper Prices with first order differencing")

Time series plot of the transformed data shows that the trend is reduced. However, we can still intervention points. Also, the Auto-regressive pattern is still visible.

ACF and PACF plots of the transformed data

par(mfrow=c(1,2))
Acf(copper_ordiff,lag.max = 48, main = "Fig 42. ACF plot transformed Copper price index")
Pacf(copper_ordiff,lag.max = 48, main = "Fig 43. PACF plot transformed Copper price index")

par(mfrow=c(1,1))

From the ACF and PACF plots we can see the decaying pattern has faded but some part of it is still visible. We can there are no significant lags at lag-1 in the both the plots.

Augmented Dickey Fuller and Phillips-Perron Unit root test on the transformed data.

adf.test(copper_ordiff)

## Warning in adf.test(copper_ordiff): p-value smaller than printed p-value

## 
##  Augmented Dickey-Fuller Test
## 
## data:  copper_ordiff
## Dickey-Fuller = -5.478, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary

pp.test(copper_ordiff)

## Warning in pp.test(copper_ordiff): p-value smaller than printed p-value

## 
##  Phillips-Perron Unit Root Test
## 
## data:  copper_ordiff
## Dickey-Fuller Z(alpha) = -106.69, Truncation lag parameter = 4, p-value
## = 0.01
## alternative hypothesis: stationary

From the above results, we can see the p_value for both Augmented Dickey Fuller and Phillips-Perron Unit root are less than 5% level of significance.

Hence, we reject the null hypothesis stating the transformed data is stationary.

We apply a X12 decomposition on the Copper price time series data.

copper_x12 <- x12(copper)
plot(copper_x12, sa = TRUE, trend = TRUE, main = "Fig 44. X12 Plot of Copper price per tonne (USD)")

There are minute deviances from the trend component in the seasonally adjusted series. An increase in the Copper price is due to effect of seasonality.

During the period of great recession i.e. from 2007 to 2009, we see a fall in the Copper price which has also decreased the seasonality in the plot.

SI Ratios plot to check whether the seasonality flucuates around the mean

plotSeasFac(copper_x12, main = "Fig 45. SI Plot of Copper price per tonne (USD)")

We observe seasonality this plot as the values fluctuate at the mean during the months of January and February followed by September and October.

We apply STL Decomposition on the transformed Copper price time series data

copper_stl <- stl(copper, t.window = 15, s.window = "periodic",
                   robust = TRUE)
plot(copper_stl, main = "Fig 46. STL Plot of Copper price per tonne (USD)")

In the above plot, we observe a smoothed plot of the original time series plot of Crude oil price

The effect of seasonality is seen in the second plot.

Highest Copper price values were recorded in the month of February while lowest Copper price values were recorded in the months of October

In the remainder part, there are some significant fluctuations and changing variance is present

We display the forecasts for Copper Price from STL Method

copper_seasadj <- seasadj(copper_stl)
plot(naive(copper_seasadj), xlab = "New Orders Index",
     main = "Fig 47. Naive forecasts of Copper Prices per tonne (USD)")

Naive forecasts represent the mean level for Copper Price

A Time Series plot of the Data Set

plot(asx_price_index, xlab = "Years", ylab = "Copper  Crude_Oil  Gold  ASX",
     main = "Fig 48. Time series plot for ASXPrice Index, Gold (AUD), Crude Oil (USD/litre) and Copper Price (USD/tonne")

A Pair plot (Scatterplot) to check correlation between the dependent and independent variable

pairs(asx_price_index[,1:4], pch = 1, lower.panel = NULL)

From the above plot, we observe that ASX Price Index and Copper are highly correlated. From this, it can be concluded that an increase in the value of independent variable causes an increase in the value of the dependent variable.

To further check this, we use cor() function.

cor(asx_price_index)

##                           ASX price Gold price Crude Oil (Brent)_USD/bbl
## ASX price                 1.0000000  0.3431908                 0.3290338
## Gold price                0.3431908  1.0000000                 0.4366382
## Crude Oil (Brent)_USD/bbl 0.3290338  0.4366382                 1.0000000
## Copper_USD/tonne          0.5617864  0.5364213                 0.8664296
##                           Copper_USD/tonne
## ASX price                        0.5617864
## Gold price                       0.5364213
## Crude Oil (Brent)_USD/bbl        0.8664296
## Copper_USD/tonne                 1.0000000

A strong positive correlation is observed between Copper_USD/tonne and ASX price

From this, we can explain the dependent variable ASX price index more precisely when we use the information from the independent variable Copper_USD/tonne

Firstly, we fit a distributed lag model for ASX price and Copper.

asx_copper = data.frame(asx_data$`ASX price`,asx_data$`Copper_USD/tonne`)
asx_copper.ts = ts(asx_copper, start = 2004, frequency = 12)
head(asx_copper.ts)

##          asx_data..ASX.price. asx_data..Copper_USD.tonne.
## Jan 2004               2935.4                        1650
## Feb 2004               2778.4                        1682
## Mar 2004               2848.6                        1656
## Apr 2004               2970.9                        1588
## May 2004               2979.8                        1651
## Jun 2004               2999.7                        1685

asx_copper_scaled = scale(asx_copper.ts)
plot(asx_copper_scaled, plot.type = "s", col = c("blue","red"), main = "Fig 49. Time Series Plot of Copper price per tonne (USD) vs ASX Price Index")
legend("bottomright", colnames(asx_copper_scaled), col=1:ncol(asx_copper), lty=2, cex=.75)

Distrubuted Lag Model of ASX price index and copper_usd/tonne with lag 1

model1c <- dlm(x = as.vector(copper), y = as.vector(asx), 
             q = 1)
summary(model1c)

## 
## Call:
## lm(formula = model.formula, data = design)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1315.8  -679.5  -117.9   646.4  1506.1 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 3.437e+03  1.781e+02  19.295   <2e-16 ***
## x.t         1.847e-01  1.238e-01   1.492    0.138    
## x.1         4.624e-02  1.222e-01   0.378    0.706    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 743.1 on 157 degrees of freedom
## Multiple R-squared:  0.3028, Adjusted R-squared:  0.2939 
## F-statistic: 34.09 on 2 and 157 DF,  p-value: 5.06e-13
## 
## AIC and BIC values for the model:
##        AIC      BIC
## 1 2574.488 2586.789

From the above model, we observe the following

Residual standard error = 743.1 on 157 degrees of freedom
The model is significant as the p_value is less 5% level of significance
F-statistic of the model = 34.09
The multiple and adjusted R_sqrd are 30.28 and 29.39 respectively. However the adjusted R_sqrd is very small
Also, no variables except the intercept in the model are significant as they are greater than 5% level of significance

Residual check and Serial Correlation Check using Breusch-Godfrey

checkresiduals(model1c$model)

## 
##  Breusch-Godfrey test for serial correlation of order up to 10
## 
## data:  Residuals
## LM test = 147.66, df = 10, p-value < 2.2e-16

From the time series plot and histogram of the standardized residuals, we observe a non-random pattern and high residual values which violate general assumptions

Also, p_value of breusch-godfrey test is less that 5% level of significance. This concludes serial correlation in the residuals is highly significant.

Multi-collinearity Check

vif_model1c <- vif(model1c$model)
vif_model1c

##      x.t      x.1 
## 19.66222 19.66222

vif_model1c > 10

##  x.t  x.1 
## TRUE TRUE

As the Variation inflation factor is greater than 10, we conclude the effect of multi-collinearity is high

Specification of lag length:

model1c_AIC_lag_length <- finiteDLMauto(x = as.vector(copper), y = as.vector(asx), 
                                       q.min = 1, q.max = 12, model.type = "dlm", 
                                       error.type = "AIC", trace = TRUE)
model1c_AIC_lag_length

##    q - k    MASE      AIC      BIC   GMRAE    MBRAE R.Adj.Sq Ljung-Box
## 12    12 3.88750 2405.162 2450.221 4.98966  0.75956  0.08924         0
## 11    11 3.93602 2420.623 2462.771 5.00068  1.10787  0.10850         0
## 10    10 3.97380 2436.164 2475.389 5.07172  0.93641  0.12924         0
## 9      9 4.00524 2451.016 2487.302 5.07902  0.99736  0.14957         0
## 8      8 4.03682 2466.511 2499.846 5.19900  0.04304  0.16793         0
## 7      7 4.07595 2481.988 2512.357 5.19017  0.78489  0.18422         0
## 6      6 4.10610 2497.775 2525.166 5.29119  0.78994  0.20021         0
## 5      5 4.12565 2513.265 2537.664 5.36259  0.90478  0.21921         0
## 4      4 4.16156 2528.895 2550.289 5.38475  0.88789  0.23649         0
## 3      3 4.19532 2544.155 2562.531 5.23016  0.67751  0.25434         0
## 2      2 4.21688 2559.356 2574.700 5.34802  0.67093  0.27395         0
## 1      1 4.24262 2574.488 2586.789 5.38993 -0.49453  0.29391         0

model1c_BIC_lag_length <- 
        finiteDLMauto(x =as.vector(copper), y = as.vector(asx),
                      q.min = 1, q.max = 12, error.type = "BIC",
                      model.type = "dlm", trace = TRUE)
model1c_BIC_lag_length

##    q - k    MASE      AIC      BIC   GMRAE    MBRAE R.Adj.Sq Ljung-Box
## 12    12 3.88750 2405.162 2450.221 4.98966  0.75956  0.08924         0
## 11    11 3.93602 2420.623 2462.771 5.00068  1.10787  0.10850         0
## 10    10 3.97380 2436.164 2475.389 5.07172  0.93641  0.12924         0
## 9      9 4.00524 2451.016 2487.302 5.07902  0.99736  0.14957         0
## 8      8 4.03682 2466.511 2499.846 5.19900  0.04304  0.16793         0
## 7      7 4.07595 2481.988 2512.357 5.19017  0.78489  0.18422         0
## 6      6 4.10610 2497.775 2525.166 5.29119  0.78994  0.20021         0
## 5      5 4.12565 2513.265 2537.664 5.36259  0.90478  0.21921         0
## 4      4 4.16156 2528.895 2550.289 5.38475  0.88789  0.23649         0
## 3      3 4.19532 2544.155 2562.531 5.23016  0.67751  0.25434         0
## 2      2 4.21688 2559.356 2574.700 5.34802  0.67093  0.27395         0
## 1      1 4.24262 2574.488 2586.789 5.38993 -0.49453  0.29391         0

We see from the AIC and BIC criteria that lag 12 has the lowest AIC and BIC values. Hence, we specify the lag length as 12.

model1.1c <- dlm(x = as.vector(copper), y = as.vector(asx), q = 12)
summary(model1.1c)

## 
## Call:
## lm(formula = model.formula, data = design)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1196.59  -618.58    58.35   582.94  1441.94 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.117e+03  2.276e+02  18.087   <2e-16 ***
## x.t          1.458e-01  1.369e-01   1.065    0.289    
## x.1          2.374e-02  2.212e-01   0.107    0.915    
## x.2          3.433e-02  2.239e-01   0.153    0.878    
## x.3          2.489e-02  2.234e-01   0.111    0.911    
## x.4          2.331e-02  2.225e-01   0.105    0.917    
## x.5         -4.938e-02  2.209e-01  -0.224    0.823    
## x.6          3.183e-02  2.207e-01   0.144    0.886    
## x.7         -1.153e-02  2.218e-01  -0.052    0.959    
## x.8         -2.387e-03  2.244e-01  -0.011    0.992    
## x.9         -6.943e-02  2.252e-01  -0.308    0.758    
## x.10        -2.573e-02  2.258e-01  -0.114    0.909    
## x.11        -2.364e-02  2.232e-01  -0.106    0.916    
## x.12         2.764e-02  1.357e-01   0.204    0.839    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 735.6 on 135 degrees of freedom
## Multiple R-squared:  0.1692, Adjusted R-squared:  0.08924 
## F-statistic: 2.115 on 13 and 135 DF,  p-value: 0.01687
## 
## AIC and BIC values for the model:
##        AIC      BIC
## 1 2405.162 2450.221

From the above model, we observe the following

Residual standard error = 735.6 on 135 degrees of freedom
The model is significant as the p_value is less 5% level of significance
F-statistic of the model = 2.115
The multiple and adjusted R_sqrd are 16.92 and 8.924 respectively. However the adjusted R_sqrd is very small
Also, no variables except the intercept in the model are significant as they are greater than 5% level of significance

Residual check and Serial Correlation Check using Breusch-Godfrey

checkresiduals(model1.1c$model)

## 
##  Breusch-Godfrey test for serial correlation of order up to 17
## 
## data:  Residuals
## LM test = 137.27, df = 17, p-value < 2.2e-16

From the time series plot and histogram of the standardized residuals, we observe a non-random pattern and high residual values which violate general assumptions

Also, p_value of breusch-godfrey test is less that 5% level of significance. This concludes serial correlation in the residuals is highly significant.

Multi-collinearity Check

vif_model1.1c <- vif(model1.1c$model)
vif_model1.1c

##      x.t      x.1      x.2      x.3      x.4      x.5      x.6      x.7 
## 17.21011 46.38555 49.15297 50.67591 52.07731 53.10852 54.77615 57.11949 
##      x.8      x.9     x.10     x.11     x.12 
## 60.14983 62.29806 64.29926 64.49842 24.41346

vif_model1.1c > 10

##  x.t  x.1  x.2  x.3  x.4  x.5  x.6  x.7  x.8  x.9 x.10 x.11 x.12 
## TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

As the Variation inflation factor is greater than 10, we conclude the effect of multi-collinearity is high

Polynomial Distributed Lag Model

We can reduce the effect of multi-collinearity using the polynomial distributed lag model

model2c <- polyDlm(x = as.vector(copper), y = as.vector(asx),
                 q = 1, k = 1, show.beta = FALSE)
summary(model2c)

## 
## Call:
## "Y ~ (Intercept) + X.t"
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1315.8  -679.5  -117.9   646.4  1506.1 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 3436.7102   178.1110  19.295   <2e-16 ***
## z.t0           0.1847     0.1238   1.492    0.138    
## z.t1          -0.1385     0.2445  -0.566    0.572    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 743.1 on 157 degrees of freedom
## Multiple R-squared:  0.3028, Adjusted R-squared:  0.2939 
## F-statistic: 34.09 on 2 and 157 DF,  p-value: 5.06e-13

From the above model, we observe the following

Residual standard error = 743.1 on 157 degrees of freedom
The model is significant as the p_value is less 5% level of significance
F-statistic of the model = 34.09
The multiple and adjusted R_sqrd are 30.28 and 29.39 respectively. However the adjusted R_sqrd is very small
Also, no variables except the intercept in the model are significant as they are greater than 5% level of significance

A polynomial pattern (q = 4, k = 4) is imposed on the lag distribution i.e. equivalent to represent beta parameters with Kth order polynomial model time.

model2.2c <- polyDlm(x = as.vector(copper), y = as.vector(asx),
                 q = 4, k = 4, show.beta = TRUE)

## Estimates and t-tests for beta coefficients:
##        Estimate Std. Error t value P(>|t|)
## beta.0   0.1670      0.131   1.270   0.205
## beta.1   0.0332      0.211   0.157   0.875
## beta.2   0.0405      0.213   0.190   0.849
## beta.3   0.0645      0.211   0.306   0.760
## beta.4  -0.0980      0.129  -0.758   0.449

In the above polynomial distributed lag model, we observe that all the variables are not significant to 5% level of significance.

Residual check and Serial Correlation Check using Breusch-Godfrey

checkresiduals(model2.2c$model)

## 
##  Breusch-Godfrey test for serial correlation of order up to 10
## 
## data:  Residuals
## LM test = 144.76, df = 10, p-value < 2.2e-16

From the time series plot and histogram of the standardized residuals, we observe a non-random pattern and high residual values which violate general assumptions

Also, p_value of breusch-godfrey test is less that 5% level of significance. This concludes serial correlation in the residuals is highly significant.

Multi-collinearity Check

vif_model2.2c <- vif(model2.2c$model)
vif_model2.2c

##         z.t0         z.t1         z.t2         z.t3         z.t4 
##     506.9064  319571.6069 4844004.8499 8794411.8246 1726358.6343

vif_model2.2c > 10

## z.t0 z.t1 z.t2 z.t3 z.t4 
## TRUE TRUE TRUE TRUE TRUE

As the Variation inflation factor is greater than 10, we conclude the effect of multi-collinearity is high

Koyck Distributed Lag Model

model3c <- koyckDlm(x = as.vector(copper), y = as.vector(asx))
summary(model3c, diagnostics = TRUE)

## 
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -689.64 -108.62   12.78  140.20  771.79 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 189.368812  87.644648   2.161   0.0322 *  
## Y.1           0.971621   0.021895  44.376   <2e-16 ***
## X.t          -0.005864   0.009517  -0.616   0.5387    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 201.9 on 157 degrees of freedom
## Multiple R-Squared: 0.9485,  Adjusted R-squared: 0.9479 
## Wald test:  1448 on 2 and 157 DF,  p-value: < 2.2e-16 
## 
## Diagnostic tests:
##                  df1 df2  statistic      p-value
## Weak instruments   1 157 1966.86799 1.043205e-90
## Wu-Hausman         1 156   10.97528 1.147725e-03
## 
##                             alpha         beta       phi
## Geometric coefficients:  6672.885 -0.005863623 0.9716211

From the above model, we observe the following

Residual standard error = 201.9 on 157 degrees of freedom
The model is significant as the p_value is less 5% level of significance
The multiple and adjusted R_sqrd are 94.85 and 94.79 respectively. The adjusted R_sqrd are comparitively greater than the Distributed Lag & Polynomial Distributed Lag model
Also, variable Y.1 in the model is significant to 5% level of significance. The beta and phi values are -0.0058 and 0.971 respectively
The weak instruments p_value suggests that the model is significant at first stage of least squares fitting
Wu-Hausman test is used to compare the consistency of instrumental variable estimates to ordinary least square estimates. The presence of endogeniety tells us that instrumental variable estimate is consistent and ordinary least square estimate is inconsistent.
The Wu-Hausman test is significant as the p_value is less than 5% level of significance. Hence, we reject the null hypothesis that correlation between explanatory vairable and error term is zero or in other words endogeniety is not present.
Therefore, we conclude that there is significant correlation between explanatory variable and error term is present at 5% level of significance.

Residual Check

checkresiduals(model3c$model)

From the time series plot and histogram of the standardized residuals, we observe a non-random pattern and high residual values which violate general assumptions

Multi-collinearity Check

vif_model3c <- vif(model3c$model)
vif_model3c

##      Y.1      X.t 
## 1.493966 1.493966

vif_model3c > 10

##   Y.1   X.t 
## FALSE FALSE

The variation inflation factor is less than 5 when value of q = 2, we conclude the effect of multi-collinearity is not present.

Auto Regressive Distributed Lag Model

model4c <- ardlDlm(x = as.vector(copper), y = as.vector(asx),
                 p = 1, q = 1)
summary(model4c)

## 
## Time series regression with "ts" data:
## Start = 2, End = 161
## 
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -734.34 -101.82   16.56  123.05  774.55 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 173.61066   84.94572   2.044  0.04266 *  
## X.t           0.10629    0.03258   3.263  0.00136 ** 
## X.1          -0.10695    0.03228  -3.313  0.00115 ** 
## Y.1           0.96784    0.02103  46.027  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 195.2 on 156 degrees of freedom
## Multiple R-squared:  0.9522, Adjusted R-squared:  0.9513 
## F-statistic:  1035 on 3 and 156 DF,  p-value: < 2.2e-16

From the above model, we observe the following

Residual standard error = 195.2 on 156 degrees of freedom
The model is significant as the p_value is less 5% level of significance
F-statistic of the model = 1035 on 3 and 156 Degrees of freedom
The multiple and adjusted R_sqrd are 95.22 and 95.13 respectively. The adjusted R_sqrd of this model is comparitively higher than DLM, Polynomial DLM and Koyck models.
Also, all the variables in the model are significant as they are smaller than 5% level of significance

Residuals Check

checkresiduals(model4c$model, lag = 2)

## 
##  Breusch-Godfrey test for serial correlation of order up to 2
## 
## data:  Residuals
## LM test = 2.1004, df = 2, p-value = 0.3499

The breusch-godfrey test has a p_value greater than 5% level of significance. Hence, it states we reject the null hypothesis that variance is not changing in the residuals.

From the time series plot and histogram of the standardized residuals, we observe a non-random pattern and high residual values which violate general assumptions

Also, we can infer that the standardized residuals in the histogram are distributed around zero mean levels

The ACF plot of the residuals do not suggest the existence of Auto-correlation within error terms

Multi-collinearity Check

vif_model4c <- vif(model4c$model)
vif_model4c

##       X.t L(X.t, 1) L(y.t, 1) 
## 19.716144 19.873449  1.473109

vif_model4c > 10

##       X.t L(X.t, 1) L(y.t, 1) 
##      TRUE      TRUE     FALSE

The variation inflation factor is less than 5 for coefficient Y.t when value of q = 1.

Determining the values of p and q using AIC and BIC values

for(i in 1:5){
        for(j in 1:5){
                model4.1c <- ardlDlm(x = as.vector(copper),
                                   y = as.vector(asx), p = i, q = j)
                cat("p:",i,"q:",j,"AIC:",AIC(model4.1c$model), "BIC:",
                    BIC(model4.1c$model),"\n")
        }
}

## p: 1 q: 1 AIC: 2147.741 BIC: 2163.116 
## p: 1 q: 2 AIC: 2135.4 BIC: 2153.813 
## p: 1 q: 3 AIC: 2121.12 BIC: 2142.558 
## p: 1 q: 4 AIC: 2109.759 BIC: 2134.209 
## p: 1 q: 5 AIC: 2099.056 BIC: 2126.505 
## p: 2 q: 1 AIC: 2130.043 BIC: 2148.456 
## p: 2 q: 2 AIC: 2132.038 BIC: 2153.52 
## p: 2 q: 3 AIC: 2119.241 BIC: 2143.741 
## p: 2 q: 4 AIC: 2107.649 BIC: 2135.155 
## p: 2 q: 5 AIC: 2097.021 BIC: 2127.52 
## p: 3 q: 1 AIC: 2117.307 BIC: 2138.745 
## p: 3 q: 2 AIC: 2119.247 BIC: 2143.748 
## p: 3 q: 3 AIC: 2119.696 BIC: 2147.259 
## p: 3 q: 4 AIC: 2108.537 BIC: 2139.1 
## p: 3 q: 5 AIC: 2097.832 BIC: 2131.38 
## p: 4 q: 1 AIC: 2105.916 BIC: 2130.366 
## p: 4 q: 2 AIC: 2107.774 BIC: 2135.28 
## p: 4 q: 3 AIC: 2108.608 BIC: 2139.17 
## p: 4 q: 4 AIC: 2110.085 BIC: 2143.704 
## p: 4 q: 5 AIC: 2099.454 BIC: 2136.052 
## p: 5 q: 1 AIC: 2095.118 BIC: 2122.566 
## p: 5 q: 2 AIC: 2096.96 BIC: 2127.459 
## p: 5 q: 3 AIC: 2097.887 BIC: 2131.436 
## p: 5 q: 4 AIC: 2099.497 BIC: 2136.095 
## p: 5 q: 5 AIC: 2101.419 BIC: 2141.067

We observe that p = 5 and q = 1 give the smallest AIC and BIC values.

best_model_copper <- ardlDlm(x = as.vector(copper), y = as.vector(asx),
                 p = 5, q = 1)
summary(best_model_copper)

## 
## Time series regression with "ts" data:
## Start = 6, End = 161
## 
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -663.01 -104.28   -6.52  134.04  748.83 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 223.07851   91.31917   2.443   0.0157 *  
## X.t           0.08180    0.03445   2.374   0.0189 *  
## X.1          -0.02382    0.05497  -0.433   0.6654    
## X.2          -0.02426    0.05554  -0.437   0.6629    
## X.3          -0.01059    0.05554  -0.191   0.8490    
## X.4          -0.01423    0.05498  -0.259   0.7962    
## X.5          -0.01564    0.03396  -0.461   0.6457    
## Y.1           0.96496    0.02137  45.160   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 193.4 on 148 degrees of freedom
## Multiple R-squared:  0.9492, Adjusted R-squared:  0.9468 
## F-statistic: 395.2 on 7 and 148 DF,  p-value: < 2.2e-16

From the above model, we observe the following

Residual standard error = 193.4 on 148 degrees of freedom
The model is significant as the p_value is less 5% level of significance
F-statistic of the model = 395.2 on 7 and 148 Degrees of freedom
The multiple and adjusted R_sqrd are 94.92 and 94.68 respectively. The adjusted R_sqrd of this model is comparitively higher than DLM, Polynomial DLM and Koyck models.
Variables X.t and Y.1 in the model are significant as they are smaller than 5% level of significance

Residuals Check

checkresiduals(best_model_copper$model)

## 
##  Breusch-Godfrey test for serial correlation of order up to 11
## 
## data:  Residuals
## LM test = 5.0114, df = 11, p-value = 0.9306

The breusch-godfrey test has a p_value greater than 5% level of significance. Hence, it states we reject the null hypothesis that variance is not changing in the residuals.

From the time series plot and histogram of the standardized residuals, we observe a non-random pattern and high residual values which violate general assumptions

Also, we can infer that the standardized residuals in the histogram are distributed around zero mean levels

The ACF plot of the residuals do not suggest the existence of Auto-correlation within error terms

————————————————————————————————————————————————————————

Now, we fit a distributed lag model for ASX price and Gold Price (AUD)

asx_gold = data.frame(asx_data$`ASX price`,asx_data$`Gold price`)
asx_gold.ts = ts(asx_gold, start = 2004, frequency = 12)
head(asx_gold.ts)

##          asx_data..ASX.price. asx_data..Gold.price.
## Jan 2004               2935.4                 611.9
## Feb 2004               2778.4                 603.3
## Mar 2004               2848.6                 565.7
## Apr 2004               2970.9                 538.6
## May 2004               2979.8                 549.4
## Jun 2004               2999.7                 535.9

asx_gold_scaled = scale(asx_gold.ts)
plot(asx_gold_scaled, plot.type = "s", col = c("blue","red"), main = "Fig 50. Time series Plot of Gold price (AUD) vs ASX Price Index")
legend("topleft", colnames(asx_gold_scaled), col=1:ncol(asx_gold), lty=1, cex=.70)

Distrubuted Lag Model of ASX price index and copper_usd/tonne with lag 1

model1g <- dlm(x = as.vector(gold), y = as.vector(asx), 
             q = 1)
summary(model1g)

## 
## Call:
## lm(formula = model.formula, data = design)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1654.13  -706.62    -9.17   561.27  2219.74 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 3943.0338   211.5468  18.639   <2e-16 ***
## x.t            0.4117     1.2742   0.323    0.747    
## x.1            0.3216     1.2711   0.253    0.801    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 839.7 on 157 degrees of freedom
## Multiple R-squared:  0.1097, Adjusted R-squared:  0.09832 
## F-statistic: 9.669 on 2 and 157 DF,  p-value: 0.0001097
## 
## AIC and BIC values for the model:
##        AIC     BIC
## 1 2613.609 2625.91

From the above model, we observe the following

Residual standard error = 839.7 on 157 degrees of freedom
The model is significant as the p_value is less 5% level of significance
F-statistic of the model = 9.669
The multiple and adjusted R_sqrd are 10.97 and 9.832 respectively. However the adjusted R_sqrd is very small
Also, no variables except the intercept in the model are significant as they are greater than 5% level of significance

Residual check and Serial Correlation Check using Breusch-Godfrey

checkresiduals(model1g$model)

## 
##  Breusch-Godfrey test for serial correlation of order up to 10
## 
## data:  Residuals
## LM test = 148.3, df = 10, p-value < 2.2e-16

From the time series plot and histogram of the standardized residuals, we observe a non-random pattern and high residual values which violate general assumptions

Also, p_value of breusch-godfrey test is less that 5% level of significance. This concludes serial correlation in the residuals is highly significant.

Multi-collinearity Check

vif_model1g <- vif(model1g$model)
vif_model1g

##      x.t      x.1 
## 58.50301 58.50301

vif_model1g > 10

##  x.t  x.1 
## TRUE TRUE

As the Variation inflation factor is greater than 10, we conclude the effect of multi-collinearity is high

Specification of lag length

model1g_AIC_lag_length <- finiteDLMauto(x = as.vector(gold), y = as.vector(asx), 
                                       q.min = 1, q.max = 12, model.type = "dlm", 
                                       error.type = "AIC", trace = TRUE)
model1g_AIC_lag_length

##    q - k    MASE      AIC      BIC   GMRAE   MBRAE R.Adj.Sq Ljung-Box
## 12    12 3.78532 2425.883 2470.942 4.08977 1.42891 -0.04664         0
## 11    11 3.85987 2442.989 2485.138 4.27351 1.04343 -0.03486         0
## 10    10 3.91844 2460.345 2499.570 4.32190 0.69116 -0.02199         0
## 9      9 3.97365 2476.983 2513.270 4.43213 0.75893 -0.00887         0
## 8      8 4.02150 2493.885 2527.220 4.45929 0.61156  0.00491         0
## 7      7 4.07437 2510.535 2540.905 4.47998 0.92888  0.01807         0
## 6      6 4.11652 2527.575 2554.966 4.40970 1.22132  0.03067         0
## 5      5 4.16217 2544.887 2569.286 4.45961 0.93030  0.04375         0
## 4      4 4.22171 2562.296 2583.690 4.65274 1.30422  0.05548         0
## 3      3 4.27118 2579.215 2597.590 4.64464 1.61370  0.06908         0
## 2      2 4.31239 2596.292 2611.637 4.68194 2.08243  0.08409         0
## 1      1 4.35631 2613.609 2625.910 4.56013 1.06887  0.09832         0

model1g_BIC_lag_length <- 
        finiteDLMauto(x =as.vector(gold), y = as.vector(asx),
                      q.min = 1, q.max = 12, error.type = "BIC",
                      model.type = "dlm", trace = TRUE)
model1g_BIC_lag_length

##    q - k    MASE      AIC      BIC   GMRAE   MBRAE R.Adj.Sq Ljung-Box
## 12    12 3.78532 2425.883 2470.942 4.08977 1.42891 -0.04664         0
## 11    11 3.85987 2442.989 2485.138 4.27351 1.04343 -0.03486         0
## 10    10 3.91844 2460.345 2499.570 4.32190 0.69116 -0.02199         0
## 9      9 3.97365 2476.983 2513.270 4.43213 0.75893 -0.00887         0
## 8      8 4.02150 2493.885 2527.220 4.45929 0.61156  0.00491         0
## 7      7 4.07437 2510.535 2540.905 4.47998 0.92888  0.01807         0
## 6      6 4.11652 2527.575 2554.966 4.40970 1.22132  0.03067         0
## 5      5 4.16217 2544.887 2569.286 4.45961 0.93030  0.04375         0
## 4      4 4.22171 2562.296 2583.690 4.65274 1.30422  0.05548         0
## 3      3 4.27118 2579.215 2597.590 4.64464 1.61370  0.06908         0
## 2      2 4.31239 2596.292 2611.637 4.68194 2.08243  0.08409         0
## 1      1 4.35631 2613.609 2625.910 4.56013 1.06887  0.09832         0

We see from the AIC and BIC criteria that lag 12 has the lowest AIC and BIC values. Hence, we specify the lag length as 12.

model1.1g <- dlm(x = as.vector(gold), y = as.vector(asx), q = 12)
summary(model1.1g)

## 
## Call:
## lm(formula = model.formula, data = design)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1550.43  -505.36    30.42   500.55  1884.16 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 4683.75496  230.73258  20.299   <2e-16 ***
## x.t           -0.73063    1.27183  -0.574    0.567    
## x.1           -0.05830    1.88620  -0.031    0.975    
## x.2            0.06822    1.90429   0.036    0.971    
## x.3            0.02231    1.91389   0.012    0.991    
## x.4           -0.38688    1.90638  -0.203    0.839    
## x.5            0.37549    1.90912   0.197    0.844    
## x.6            0.08537    1.93160   0.044    0.965    
## x.7            0.61146    1.93935   0.315    0.753    
## x.8           -0.28852    1.95099  -0.148    0.883    
## x.9            0.17788    1.96992   0.090    0.928    
## x.10           0.22900    1.95773   0.117    0.907    
## x.11          -0.06205    1.94273  -0.032    0.975    
## x.12           0.22419    1.29372   0.173    0.863    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 788.6 on 135 degrees of freedom
## Multiple R-squared:  0.04529,    Adjusted R-squared:  -0.04664 
## F-statistic: 0.4926 on 13 and 135 DF,  p-value: 0.9257
## 
## AIC and BIC values for the model:
##        AIC      BIC
## 1 2425.883 2470.942

From the above model, we observe the following

Residual standard error = 788.6 on 135 degrees of freedom
The model is not significant as the p_value is greater than 5% level of significance
F-statistic of the model = 0.4926
The multiple and adjusted R_sqrd are 4.529 and -4.664 respectively. The adjusted R_sqrd is very small
Also, no variables except the intercept in the model are significant as they are greater than 5% level of significance

Residual check and Serial Correlation Check using Breusch-Godfrey

checkresiduals(model1.1g$model)

## 
##  Breusch-Godfrey test for serial correlation of order up to 17
## 
## data:  Residuals
## LM test = 136.44, df = 17, p-value < 2.2e-16

From the time series plot and histogram of the standardized residuals, we observe a non-random pattern and high residual values which violate general assumptions

Also, p_value of breusch-godfrey test is less that 5% level of significance. This concludes serial correlation in the residuals is highly significant.

Multi-collinearity Check

vif_model1.1g <- vif(model1.1g$model)
vif_model1.1g

##       x.t       x.1       x.2       x.3       x.4       x.5       x.6       x.7 
##  53.00292 118.33830 122.33823 125.52489 126.19786 128.35765 133.47622 136.12418 
##       x.8       x.9      x.10      x.11      x.12 
## 139.03106 142.47802 141.04707 138.69245  61.50449

vif_model1.1g > 10

##  x.t  x.1  x.2  x.3  x.4  x.5  x.6  x.7  x.8  x.9 x.10 x.11 x.12 
## TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

As the Variation inflation factor is greater than 10, we conclude the effect of multi-collinearity is high

Polynomial Distributed Lag Model

We can reduce the effect of multi-collinearity using the polynomial distributed lag model

model2g <- polyDlm(x = as.vector(gold), y = as.vector(asx),
                 q = 1, k = 1, show.beta = FALSE)
summary(model2g)

## 
## Call:
## "Y ~ (Intercept) + X.t"
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1654.13  -706.62    -9.17   561.27  2219.74 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 3943.03376  211.54676  18.639   <2e-16 ***
## z.t0           0.41169    1.27421   0.323    0.747    
## z.t1          -0.09005    2.53981  -0.035    0.972    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 839.7 on 157 degrees of freedom
## Multiple R-squared:  0.1097, Adjusted R-squared:  0.09832 
## F-statistic: 9.669 on 2 and 157 DF,  p-value: 0.0001097

From the above model, we observe the following

Residual standard error = 839.7 on 157 degrees of freedom
The model is significant as the p_value is less 5% level of significance
F-statistic of the model = 9.669
The multiple and adjusted R_sqrd are 10.97 and 9.832 respectively. However the adjusted R_sqrd is very small
Also, no variables except the intercept in the model are significant as they are greater than 5% level of significance

A polynomial pattern (q = 4, k = 4) is imposed on the lag distribution i.e. equivalent to represent beta parameters with Kth order polynomial model time.

model2.2g <- polyDlm(x = as.vector(gold), y = as.vector(asx),
                 q = 4, k = 4, show.beta = TRUE)

## Estimates and t-tests for beta coefficients:
##        Estimate Std. Error t value P(>|t|)
## beta.0   0.0328       1.28  0.0256   0.980
## beta.1  -0.1190       1.92 -0.0621   0.951
## beta.2   0.3650       1.95  0.1870   0.852
## beta.3  -0.3040       1.93 -0.1570   0.875
## beta.4   0.6500       1.28  0.5090   0.611

summary(model2.2g)

## 
## Call:
## "Y ~ (Intercept) + X.t"
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1574.26  -685.10    -6.82   528.30  2168.22 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 4118.08645  214.92803  19.160   <2e-16 ***
## z.t0           0.03279    1.28010   0.026    0.980    
## z.t1          -2.20885   15.15710  -0.146    0.884    
## z.t2           3.30652   19.49855   0.170    0.866    
## z.t3          -1.44009    7.84269  -0.184    0.855    
## z.t4           0.19029    0.97843   0.194    0.846    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 825.6 on 151 degrees of freedom
## Multiple R-squared:  0.08576,    Adjusted R-squared:  0.05548 
## F-statistic: 2.833 on 5 and 151 DF,  p-value: 0.01784

In the above polynomial distributed lag model, we observe that all the variables are not significant to 5% level of significance

The residual standard error = 825.6 on 151 degree of freedom

The multiple and adjusted R-sqrd are 8.576 and 5.548 respectively

The F-statistic is 2.833 on 5 and 151 degrees of freedom

Residual check and Serial Correlation Check using Breusch-Godfrey

checkresiduals(model2.2g$model)

## 
##  Breusch-Godfrey test for serial correlation of order up to 10
## 
## data:  Residuals
## LM test = 145.02, df = 10, p-value < 2.2e-16

From the time series plot and histogram of the standardized residuals, we observe a non-random pattern and high residual values which violate general assumptions

Also, p_value of breusch-godfrey test is less that 5% level of significance. This concludes serial correlation in the residuals is highly significant.

Multi-collinearity Check

vif_model2.2g <- vif(model2.2g$model)
vif_model2.2g

##         z.t0         z.t1         z.t2         z.t3         z.t4 
##     1454.893   826492.983 12368348.542 22307303.143  4361029.001

vif_model2.2g > 10

## z.t0 z.t1 z.t2 z.t3 z.t4 
## TRUE TRUE TRUE TRUE TRUE

As the Variation inflation factor is greater than 10, we conclude the effect of multi-collinearity is high

Koyck Distributed Lag Model

model3g <- koyckDlm(x = as.vector(gold), y = as.vector(asx))
summary(model3g, diagnostics = TRUE)

## 
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -682.19 -105.44   15.86  135.04  783.60 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.902e+02  8.958e+01   2.123   0.0353 *  
## Y.1         9.635e-01  1.909e-02  50.469   <2e-16 ***
## X.t         2.595e-03  4.304e-02   0.060   0.9520    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 201.4 on 157 degrees of freedom
## Multiple R-Squared: 0.9488,  Adjusted R-squared: 0.9481 
## Wald test:  1454 on 2 and 157 DF,  p-value: < 2.2e-16 
## 
## Diagnostic tests:
##                  df1 df2  statistic       p-value
## Weak instruments   1 157 8006.63657 1.266826e-136
## Wu-Hausman         1 156   18.06854  3.655601e-05
## 
##                            alpha        beta       phi
## Geometric coefficients:  5205.15 0.002595168 0.9634602

From the above model, we observe the following

Residual standard error = 201.4 on 157 degrees of freedom
The model is significant as the p_value is less 5% level of significance
The multiple and adjusted R_sqrd are 94.88 and 94.81 respectively. The adjusted R_sqrd is comparitively greater than the Distributed Lag & Polynomial Distributed Lag model
Also, variable Y.1 in the model is significant to 5% level of significance. The beta and phi values are 0.0026 and 0.963 respectively
The weak instruments p_value suggests that the model is significant at first stage of least squares fitting
Wu-Hausman test is used to compare the consistency of instrumental variable estimates to ordinary least square estimates. The presence of endogeniety tells us that instrumental variable estimate is consistent and ordinary least square estimate is inconsistent.
The Wu-Hausman test is significant as the p_value is less than 5% level of significance. Hence, we reject the null hypothesis that correlation between explanatory vairable and error term is zero or in other words endogeniety is not present.
Therefore, we conclude that there is significant correlation between explanatory variable and error term is present at 5% level of significance.

Residual Check

checkresiduals(model3g$model)

From the time series plot and histogram of the standardized residuals, we observe a non-random pattern and high residual values which violate general assumptions

Multi-collinearity Check

vif_model3g <- vif(model3g$model)
vif_model3g

##      Y.1      X.t 
## 1.140648 1.140648

vif_model3g > 10

##   Y.1   X.t 
## FALSE FALSE

The variation inflation factor is less than 5 when value of q = 2, we conclude the effect of multi-collinearity is not present.

Auto Regressive Distributed Lag Model

model4g <- ardlDlm(x = as.vector(gold), y = as.vector(asx),
                 p = 1, q =1)
summary(model4g)

## 
## Time series regression with "ts" data:
## Start = 2, End = 161
## 
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -598.9 -102.9   10.4  119.5  724.6 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 182.80452   85.05812   2.149   0.0332 *  
## X.t          -1.24911    0.29162  -4.283 3.21e-05 ***
## X.1           1.23169    0.28976   4.251 3.66e-05 ***
## Y.1           0.97172    0.01812  53.624  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 191.1 on 156 degrees of freedom
## Multiple R-squared:  0.9542, Adjusted R-squared:  0.9533 
## F-statistic:  1083 on 3 and 156 DF,  p-value: < 2.2e-16

From the above model, we observe the following

Residual standard error = 191.1 on 156 degrees of freedom
The model is significant as the p_value is less 5% level of significance
F-statistic of the model = 1083 on 3 and 156 Degrees of freedom
The multiple and adjusted R_sqrd are 95.42 and 95.33 respectively. The adjusted R_sqrd of this model is comparitively higher than DLM, Polynomial DLM and Koyck models.
Also, all the variables in the model are significant as they are smaller than 5% level of significance

Residuals Check

checkresiduals(model4g$model)

## 
##  Breusch-Godfrey test for serial correlation of order up to 10
## 
## data:  Residuals
## LM test = 7.9182, df = 10, p-value = 0.6368

The breusch-godfrey test has a p_value greater than 5% level of significance. Hence, it states we reject the null hypothesis that variance is not changing in the residuals.

From the time series plot and histogram of the standardized residuals, we observe a non-random pattern and high residual values which violate general assumptions

Also, we can infer that the standardized residuals in the histogram are distributed around zero mean levels

The ACF plot of the residuals does not suggest the existence of Auto-correlation within error terms

Multi-collinearity Check

vif_model4g <- vif(model4g$model)
vif_model4g

##       X.t L(X.t, 1) L(y.t, 1) 
##  59.17040  58.70438   1.14186

vif_model4g > 10

##       X.t L(X.t, 1) L(y.t, 1) 
##      TRUE      TRUE     FALSE

The variation inflation factor is less than 5 for coefficient Y.t when value of q = 1.

Determining the values of p and q using AIC and BIC values

for(i in 1:5){
        for(j in 1:5){
                model4.1g <- ardlDlm(x = as.vector(gold),
                                   y = as.vector(asx), p = i, q = j)
                cat("p:",i,"q:",j,"AIC:",AIC(model4.1g$model), "BIC:",
                    BIC(model4.1g$model),"\n")
        }
}

## p: 1 q: 1 AIC: 2140.897 BIC: 2156.273 
## p: 1 q: 2 AIC: 2128.524 BIC: 2146.938 
## p: 1 q: 3 AIC: 2113.99 BIC: 2135.428 
## p: 1 q: 4 AIC: 2102.754 BIC: 2127.204 
## p: 1 q: 5 AIC: 2092.194 BIC: 2119.643 
## p: 2 q: 1 AIC: 2128.627 BIC: 2147.04 
## p: 2 q: 2 AIC: 2130.523 BIC: 2152.005 
## p: 2 q: 3 AIC: 2115.89 BIC: 2140.39 
## p: 2 q: 4 AIC: 2104.694 BIC: 2132.2 
## p: 2 q: 5 AIC: 2094.14 BIC: 2124.639 
## p: 3 q: 1 AIC: 2118.109 BIC: 2139.547 
## p: 3 q: 2 AIC: 2120.027 BIC: 2144.528 
## p: 3 q: 3 AIC: 2117.305 BIC: 2144.868 
## p: 3 q: 4 AIC: 2105.731 BIC: 2136.293 
## p: 3 q: 5 AIC: 2095.264 BIC: 2128.812 
## p: 4 q: 1 AIC: 2107.002 BIC: 2131.452 
## p: 4 q: 2 AIC: 2108.914 BIC: 2136.42 
## p: 4 q: 3 AIC: 2106.276 BIC: 2136.839 
## p: 4 q: 4 AIC: 2107.456 BIC: 2141.074 
## p: 4 q: 5 AIC: 2097.01 BIC: 2133.608 
## p: 5 q: 1 AIC: 2094.908 BIC: 2122.357 
## p: 5 q: 2 AIC: 2096.86 BIC: 2127.359 
## p: 5 q: 3 AIC: 2094.144 BIC: 2127.692 
## p: 5 q: 4 AIC: 2095.425 BIC: 2132.023 
## p: 5 q: 5 AIC: 2097.324 BIC: 2136.972

We observe that p = 1 and q = 5 give the smallest AIC and BIC values.

best_model_gold <- ardlDlm(x = as.vector(gold), y = as.vector(asx),
                 p = 1, q = 5)
summary(best_model_gold)

## 
## Time series regression with "ts" data:
## Start = 6, End = 161
## 
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -564.02 -106.74    8.99  126.34  691.74 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 234.72597   92.97346   2.525 0.012635 *  
## X.t          -1.21331    0.30480  -3.981 0.000107 ***
## X.1           1.21097    0.30013   4.035 8.73e-05 ***
## Y.1           0.96620    0.07927  12.189  < 2e-16 ***
## Y.2           0.13687    0.11316   1.210 0.228368    
## Y.3          -0.07572    0.11193  -0.676 0.499816    
## Y.4          -0.04931    0.11174  -0.441 0.659640    
## Y.5          -0.02130    0.07871  -0.271 0.787092    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 191.6 on 148 degrees of freedom
## Multiple R-squared:  0.9502, Adjusted R-squared:  0.9478 
## F-statistic: 403.1 on 7 and 148 DF,  p-value: < 2.2e-16

From the above model, we observe the following

Residual standard error = 191.6 on 148 degrees of freedom
The model is significant as the p_value is less 5% level of significance
F-statistic of the model = 403.1 on 7 and 148 Degrees of freedom
The multiple and adjusted R_sqrd are 95.02 and 94.78 respectively. The adjusted R_sqrd of this model is comparitively higher.
Variables X.t, X.1 and Y.1 in the model are significant as the p_value is smaller than 5% level of significance

Residuals Check

checkresiduals(best_model_gold$model)

## 
##  Breusch-Godfrey test for serial correlation of order up to 11
## 
## data:  Residuals
## LM test = 5.7145, df = 11, p-value = 0.8917

The breusch-godfrey test has a p_value greater than 5% level of significance. Hence, it states we reject the null hypothesis that variance is not changing in the residuals.

From the time series plot and histogram of the standardized residuals, we observe a non-random pattern and high residual values which violate general assumptions

Also, we can infer that the standardized residuals in the histogram are distributed around zero mean levels

The ACF plot of the residuals do not suggest the existence of Auto-correlation within error terms

#————————————————————————————————————————————————————————-

Now, we fit a distributed lag model for ASX price and Crude Oil

asx_oil = data.frame(asx_data$`ASX price`, asx_data$`Crude Oil (Brent)_USD/bbl`)
asx_oil.ts = ts(asx_oil, start = 2004, frequency = 12)
head(asx_oil.ts)

##          asx_data..ASX.price. asx_data..Crude.Oil..Brent._USD.bbl.
## Jan 2004               2935.4                                31.29
## Feb 2004               2778.4                                32.65
## Mar 2004               2848.6                                30.34
## Apr 2004               2970.9                                25.02
## May 2004               2979.8                                25.81
## Jun 2004               2999.7                                27.55

asx_oil_scaled = scale(asx_oil.ts)
plot(asx_oil_scaled, plot.type = "s", col = c("blue","red"),main = "Fig 51. Time series Plot of Crude oil price (USD/litre) vs ASX Price Index")
legend("bottomright", colnames(asx_oil.ts), col=1:ncol(asx_oil), lty=2, cex=.75)

Distrubuted Lag Model of ASX price index and Crude Oil with lag 1

model1crude <- dlm(x = as.vector(oil), y = as.vector(asx), 
             q = 1)
summary(model1crude)

## 
## Call:
## lm(formula = model.formula, data = design)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1677.08  -770.94   -61.63   694.56  1823.95 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 4144.338    178.033  23.278   <2e-16 ***
## x.t           17.140     10.636   1.611    0.109    
## x.1           -7.941     10.589  -0.750    0.454    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 842.6 on 157 degrees of freedom
## Multiple R-squared:  0.1036, Adjusted R-squared:  0.09217 
## F-statistic: 9.071 on 2 and 157 DF,  p-value: 0.000187
## 
## AIC and BIC values for the model:
##        AIC      BIC
## 1 2614.698 2626.998

From the above model, we observe the following

Residual standard error = 842.6 on 157 degrees of freedom
The model is significant as the p_value is less 5% level of significance
F-statistic of the model = 9.071
The multiple and adjusted R_sqrd are 10.36 and 9.217 respectively. However the adjusted R_sqrd is very small
Also, no variables except the intercept in the model are significant as they are greater than 5% level of significance

Residual check and Serial Correlation Check using Breusch-Godfrey

checkresiduals(model1crude$model)

## 
##  Breusch-Godfrey test for serial correlation of order up to 10
## 
## data:  Residuals
## LM test = 147.28, df = 10, p-value < 2.2e-16

From the time series plot and histogram of the standardized residuals, we observe a non-random pattern and high residual values which violate general assumptions

Also, p_value of breusch-godfrey test is less that 5% level of significance. This concludes serial correlation in the residuals is highly significant.

Multi-collinearity Check

vif_model1crude <- vif(model1crude$model)
vif_model1crude

##      x.t      x.1 
## 22.79751 22.79751

vif_model1crude > 10

##  x.t  x.1 
## TRUE TRUE

As the Variation inflation factor is greater than 10, we conclude the effect of multi-collinearity is high

Specification of lag length:

model1crude_AIC_lag_length <- finiteDLMauto(x = as.vector(oil), y = as.vector(asx), 
                                       q.min = 1, q.max = 12, model.type = "dlm", 
                                       error.type = "AIC", trace = TRUE)
model1crude_AIC_lag_length

##    q - k    MASE      AIC      BIC   GMRAE   MBRAE R.Adj.Sq Ljung-Box
## 12    12 3.85940 2424.123 2469.182 4.49414 1.18212 -0.03435         0
## 11    11 3.94102 2442.030 2484.178 4.50351 1.71454 -0.02826         0
## 10    10 4.00878 2459.842 2499.066 4.39830 0.94829 -0.01859         0
## 9      9 4.07657 2476.871 2513.158 4.59976 0.86566 -0.00812         0
## 8      8 4.13953 2493.914 2527.249 4.76106 0.78221  0.00472         0
## 7      7 4.20880 2510.754 2541.124 4.79551 0.02285  0.01667         0
## 6      6 4.26700 2527.701 2555.091 4.84721 0.56283  0.02988         0
## 5      5 4.32339 2544.936 2569.335 4.87419 1.47404  0.04345         0
## 4      4 4.39825 2561.888 2583.281 5.11093 1.10612  0.05794         0
## 3      3 4.47189 2579.101 2597.477 5.24099 1.43485  0.06975         0
## 2      2 4.53543 2596.715 2612.059 5.37682 0.74046  0.08165         0
## 1      1 4.60656 2614.698 2626.998 5.46829 0.63464  0.09217         0

model1crude_BIC_lag_length <- 
        finiteDLMauto(x =as.vector(oil), y = as.vector(asx),
                      q.min = 1, q.max = 12, error.type = "BIC",
                      model.type = "dlm", trace = TRUE)
model1crude_BIC_lag_length

##    q - k    MASE      AIC      BIC   GMRAE   MBRAE R.Adj.Sq Ljung-Box
## 12    12 3.85940 2424.123 2469.182 4.49414 1.18212 -0.03435         0
## 11    11 3.94102 2442.030 2484.178 4.50351 1.71454 -0.02826         0
## 10    10 4.00878 2459.842 2499.066 4.39830 0.94829 -0.01859         0
## 9      9 4.07657 2476.871 2513.158 4.59976 0.86566 -0.00812         0
## 8      8 4.13953 2493.914 2527.249 4.76106 0.78221  0.00472         0
## 7      7 4.20880 2510.754 2541.124 4.79551 0.02285  0.01667         0
## 6      6 4.26700 2527.701 2555.091 4.84721 0.56283  0.02988         0
## 5      5 4.32339 2544.936 2569.335 4.87419 1.47404  0.04345         0
## 4      4 4.39825 2561.888 2583.281 5.11093 1.10612  0.05794         0
## 3      3 4.47189 2579.101 2597.477 5.24099 1.43485  0.06975         0
## 2      2 4.53543 2596.715 2612.059 5.37682 0.74046  0.08165         0
## 1      1 4.60656 2614.698 2626.998 5.46829 0.63464  0.09217         0

We see from the AIC and BIC criteria that lag 12 has the lowest AIC and BIC values. Hence, we specify the lag length as 12.

model1.1crude <- dlm(x = as.vector(oil), y = as.vector(asx), q = 12)
summary(model1.1crude)

## 
## Call:
## lm(formula = model.formula, data = design)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1603.38  -588.61    59.47   510.06  1586.31 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 4797.8928   203.0857  23.625   <2e-16 ***
## x.t            8.7184    11.1651   0.781    0.436    
## x.1            0.9094    18.7149   0.049    0.961    
## x.2           -0.7991    18.9816  -0.042    0.966    
## x.3            0.7995    19.0667   0.042    0.967    
## x.4           -4.3798    19.0414  -0.230    0.818    
## x.5            0.2400    19.1322   0.013    0.990    
## x.6            1.0663    19.2530   0.055    0.956    
## x.7           -5.0585    19.4123  -0.261    0.795    
## x.8            0.9627    19.3634   0.050    0.960    
## x.9           -2.0805    19.3391  -0.108    0.914    
## x.10           1.0344    19.3332   0.054    0.957    
## x.11          -8.1813    19.0500  -0.429    0.668    
## x.12           8.7236    11.1989   0.779    0.437    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 784 on 135 degrees of freedom
## Multiple R-squared:  0.0565, Adjusted R-squared:  -0.03435 
## F-statistic: 0.6219 on 13 and 135 DF,  p-value: 0.8325
## 
## AIC and BIC values for the model:
##        AIC      BIC
## 1 2424.123 2469.182

From the above model, we observe the following

Residual standard error = 784 on 135 degrees of freedom
The model is not significant as the p_value is greater than 5% level of significance
F-statistic of the model = 0.6219
The multiple and adjusted R_sqrd are 5.65 and 3.435 respectively. However the adjusted R_sqrd is very small
Also, no variables except the intercept in the model are significant as they are greater than 5% level of significance

Residual check and Serial Correlation Check using Breusch-Godfrey

checkresiduals(model1.1crude$model)

## 
##  Breusch-Godfrey test for serial correlation of order up to 17
## 
## data:  Residuals
## LM test = 135.49, df = 17, p-value < 2.2e-16

From the time series plot and histogram of the standardized residuals, we observe a non-random pattern and high residual values which violate general assumptions

Also, p_value of breusch-godfrey test is less that 5% level of significance. This concludes serial correlation in the residuals is highly significant.

Multi-collinearity Check

vif_model1.1crude <- vif(model1.1crude$model)
vif_model1.1crude

##      x.t      x.1      x.2      x.3      x.4      x.5      x.6      x.7 
## 24.09715 68.58448 71.58093 73.18015 74.17098 75.89967 77.94797 80.15117 
##      x.8      x.9     x.10     x.11     x.12 
## 80.86523 81.70058 82.37142 80.51631 28.07618

vif_model1.1crude > 10

##  x.t  x.1  x.2  x.3  x.4  x.5  x.6  x.7  x.8  x.9 x.10 x.11 x.12 
## TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

As the Variation inflation factor is greater than 10, we conclude the effect of multi-collinearity is high

Polynomial Distributed Lag Model

We can reduce the effect of multi-collinearity using the polynomial distributed lag model

model2crude <- polyDlm(x = as.vector(oil), y = as.vector(asx),
                 q = 1, k = 1, show.beta = FALSE)
summary(model2crude)

## 
## Call:
## "Y ~ (Intercept) + X.t"
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1677.08  -770.94   -61.63   694.56  1823.95 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4144.34     178.03  23.278   <2e-16 ***
## z.t0           17.14      10.64   1.611    0.109    
## z.t1          -25.08      21.11  -1.188    0.237    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 842.6 on 157 degrees of freedom
## Multiple R-squared:  0.1036, Adjusted R-squared:  0.09217 
## F-statistic: 9.071 on 2 and 157 DF,  p-value: 0.000187

From the above model, we observe the following

Residual standard error = 842.6 on 157 degrees of freedom
The model is significant as the p_value is less 5% level of significance
F-statistic of the model = 9.071
The multiple and adjusted R_sqrd are 10.36 and 9.217 respectively. However the adjusted R_sqrd is very small
Also, no variables except the intercept in the model are significant as they are greater than 5% level of significance

A polynomial pattern (q = 4, k = 4) is imposed on the lag distribution i.e. equivalent to represent beta parameters with Kth order polynomial model time.

model2.2crude <- polyDlm(x = as.vector(oil), y = as.vector(asx),
                 q = 4, k = 4, show.beta = TRUE)

## Estimates and t-tests for beta coefficients:
##        Estimate Std. Error t value P(>|t|)
## beta.0    10.50       11.5  0.9090   0.365
## beta.1     2.71       19.3  0.1400   0.889
## beta.2    -1.30       19.5 -0.0664   0.947
## beta.3     3.76       19.3  0.1940   0.846
## beta.4    -8.78       11.4 -0.7670   0.444

In the above polynomial distributed lag model, we observe that all the variables are not significant to 5% level of significance.

Residual check and Serial Correlation Check using Breusch-Godfrey

checkresiduals(model2.2crude$model)

## 
##  Breusch-Godfrey test for serial correlation of order up to 10
## 
## data:  Residuals
## LM test = 145.13, df = 10, p-value < 2.2e-16

From the time series plot and histogram of the standardized residuals, we observe a non-random pattern and high residual values which violate general assumptions

Also, p_value of breusch-godfrey test is less that 5% level of significance. This concludes serial correlation in the residuals is highly significant.

Multi-collinearity Check

vif_model2.2crude <- vif(model2.2crude$model)
vif_model2.2crude

##         z.t0         z.t1         z.t2         z.t3         z.t4 
## 6.500776e+02 4.348826e+05 6.573599e+06 1.189865e+07 2.330502e+06

vif_model2.2crude > 10

## z.t0 z.t1 z.t2 z.t3 z.t4 
## TRUE TRUE TRUE TRUE TRUE

As the Variation inflation factor is greater than 10, we conclude the effect of multi-collinearity is high

Koyck Distributed Lag Model

model3crude <- koyckDlm(x = as.vector(oil), y = as.vector(asx))
summary(model3crude, diagnostics = TRUE)

## 
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -683.91 -108.66   13.68  139.77  762.55 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 209.89536   87.89368   2.388   0.0181 *  
## Y.1           0.97537    0.01905  51.193   <2e-16 ***
## X.t          -0.99907    0.58045  -1.721   0.0872 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 201.1 on 157 degrees of freedom
## Multiple R-Squared: 0.949,   Adjusted R-squared: 0.9483 
## Wald test:  1461 on 2 and 157 DF,  p-value: < 2.2e-16 
## 
## Diagnostic tests:
##                  df1 df2 statistic       p-value
## Weak instruments   1 157 3018.4519 1.997814e-104
## Wu-Hausman         1 156   11.1979  1.026113e-03
## 
##                             alpha       beta       phi
## Geometric coefficients:  8522.034 -0.9990694 0.9753703

From the above model, we observe the following

Residual standard error = 201.1 on 157 degrees of freedom
The model is significant as the p_value is less 5% level of significance
The multiple and adjusted R_sqrd are 94.9 and 94.83 respectively. The adjusted R_sqrd are comparitively greater than the Distributed Lag & Polynomial Distributed Lag model
Also, variable Y.1 in the model is significant to 5% level of significance. The beta and phi values are -0.9990694 and 0.9753703 respectively
The weak instruments p_value suggests that the model is significant at first stage of least squares fitting
Wu-Hausman test is used to compare the consistency of instrumental variable estimates to ordinary least square estimates. The presence of endogeniety tells us that instrumental variable estimate is consistent and ordinary least square estimate is inconsistent.
The Wu-Hausman test is significant as the p_value is less than 5% level of significance. Hence, we reject the null hypothesis that correlation between explanatory vairable and error term is zero or in other words endogeniety is not present.
Therefore, we conclude that there is significant correlation between explanatory variable and error term is present at 5% level of significance.

Residual Check

checkresiduals(model3crude$model)

From the time series plot and histogram of the standardized residuals, we observe a non-random pattern and high residual values which violate general assumptions

Multi-collinearity Check

vif_model3crude <- vif(model3crude$model)
vif_model3crude

##     Y.1     X.t 
## 1.14038 1.14038

vif_model3crude > 10

##   Y.1   X.t 
## FALSE FALSE

The variation inflation factor is less than 5 when value of q = 2, we conclude the effect of multi-collinearity is not present.

Auto Regressive Distributed Lag Model

model4crude <- ardlDlm(x = as.vector(oil), y = as.vector(asx),
                 p = 1, q =1)
summary(model4crude)

## 
## Time series regression with "ts" data:
## Start = 2, End = 161
## 
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -684.06 -111.57   -1.77  138.90  719.35 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 211.45397   85.03705   2.487  0.01395 *  
## X.t           7.45111    2.46199   3.026  0.00290 ** 
## X.1          -8.17917    2.44422  -3.346  0.00103 ** 
## Y.1           0.97067    0.01837  52.827  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 194.5 on 156 degrees of freedom
## Multiple R-squared:  0.9525, Adjusted R-squared:  0.9516 
## F-statistic:  1044 on 3 and 156 DF,  p-value: < 2.2e-16

From the above model, we observe the following

Residual standard error = 194.5 on 156 degrees of freedom
The model is significant as the p_value is less 5% level of significance
F-statistic of the model = 1044 on 3 and 156 Degrees of freedom
The multiple and adjusted R_sqrd are 95.25 and 95.16 respectively. The adjusted R_sqrd of this model is comparitively higher than DLM, Polynomial DLM and Koyck models.
Also, all the variables in the model are significant as they are smaller than 5% level of significance

Residuals Check

checkresiduals(model4crude$model)

## 
##  Breusch-Godfrey test for serial correlation of order up to 10
## 
## data:  Residuals
## LM test = 6.0252, df = 10, p-value = 0.8131

The breusch-godfrey test has a p_value greater than 5% level of significance. Hence, it states we reject the null hypothesis that variance is not changing in the residuals.

From the time series plot and histogram of the standardized residuals, we observe a non-random pattern and high residual values which violate general assumptions

Also, we can infer that the standardized residuals in the histogram are distributed around zero mean levels

The ACF plot of the residuals do not suggest the existence of Auto-correlation within error terms

Multi-collinearity Check

vif_model4crude <- vif(model4crude$model)
vif_model4crude

##       X.t L(X.t, 1) L(y.t, 1) 
## 22.924733 22.797583  1.133443

vif_model4crude > 10

##       X.t L(X.t, 1) L(y.t, 1) 
##      TRUE      TRUE     FALSE

The variation inflation factor is less than 5 for coefficient Y.t when value of q = 1.

Determining the values of p and q using AIC and BIC values

for(i in 1:5){
        for(j in 1:5){
                model4.1crude <- ardlDlm(x = as.vector(oil),
                                   y = as.vector(asx), p = i, q = j)
                cat("p:",i,"q:",j,"AIC:",AIC(model4.1crude$model), "BIC:",
                    BIC(model4.1crude$model),"\n")
        }
}

## p: 1 q: 1 AIC: 2146.524 BIC: 2161.9 
## p: 1 q: 2 AIC: 2134.107 BIC: 2152.521 
## p: 1 q: 3 AIC: 2121.07 BIC: 2142.508 
## p: 1 q: 4 AIC: 2109.4 BIC: 2133.85 
## p: 1 q: 5 AIC: 2098.335 BIC: 2125.784 
## p: 2 q: 1 AIC: 2132.312 BIC: 2150.726 
## p: 2 q: 2 AIC: 2134.235 BIC: 2155.718 
## p: 2 q: 3 AIC: 2122.356 BIC: 2146.857 
## p: 2 q: 4 AIC: 2110.793 BIC: 2138.299 
## p: 2 q: 5 AIC: 2099.752 BIC: 2130.251 
## p: 3 q: 1 AIC: 2121.919 BIC: 2143.357 
## p: 3 q: 2 AIC: 2123.835 BIC: 2148.335 
## p: 3 q: 3 AIC: 2124.324 BIC: 2151.887 
## p: 3 q: 4 AIC: 2112.401 BIC: 2142.963 
## p: 3 q: 5 AIC: 2101.35 BIC: 2134.899 
## p: 4 q: 1 AIC: 2111.383 BIC: 2135.832 
## p: 4 q: 2 AIC: 2113.294 BIC: 2140.8 
## p: 4 q: 3 AIC: 2113.805 BIC: 2144.367 
## p: 4 q: 4 AIC: 2114.384 BIC: 2148.003 
## p: 4 q: 5 AIC: 2103.342 BIC: 2139.94 
## p: 5 q: 1 AIC: 2097.076 BIC: 2124.525 
## p: 5 q: 2 AIC: 2099.041 BIC: 2129.54 
## p: 5 q: 3 AIC: 2099.518 BIC: 2133.066 
## p: 5 q: 4 AIC: 2099.845 BIC: 2136.443 
## p: 5 q: 5 AIC: 2100.917 BIC: 2140.566

We observe that p = 5 and q = 1 give the smallest AIC and BIC values.

best_model_crude <- ardlDlm(x = as.vector(oil), y = as.vector(asx),
                 p = 5, q = 1)
summary(best_model_crude)

## 
## Time series regression with "ts" data:
## Start = 6, End = 161
## 
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -657.99 -121.05   -4.38  123.84  714.56 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 239.54102   94.36245   2.539   0.0122 *  
## X.t           5.68288    2.73874   2.075   0.0397 *  
## X.1          -3.32354    4.57294  -0.727   0.4685    
## X.2          -3.29907    4.62372  -0.714   0.4767    
## X.3           3.05569    4.63039   0.660   0.5103    
## X.4          -8.00745    4.57207  -1.751   0.0820 .  
## X.5           5.08983    2.72010   1.871   0.0633 .  
## Y.1           0.96648    0.01934  49.983   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 194.6 on 148 degrees of freedom
## Multiple R-squared:  0.9486, Adjusted R-squared:  0.9461 
## F-statistic:   390 on 7 and 148 DF,  p-value: < 2.2e-16

From the above model, we observe the following

Residual standard error = 194.6 on 148 degrees of freedom
The model is significant as the p_value is less 5% level of significance
F-statistic of the model = 390 on 7 and 148 Degrees of freedom
The multiple and adjusted R_sqrd are 94.86 and 94.61 respectively. The adjusted R_sqrd of this model is comparitively higher than DLM, Polynomial DLM and Koyck models.
Variables X.t, X.1 and Y.1 in the model are significant as the p_value is smaller than 5% level of significance

Residuals Check

checkresiduals(best_model_crude$model)

## 
##  Breusch-Godfrey test for serial correlation of order up to 11
## 
## data:  Residuals
## LM test = 5.18, df = 11, p-value = 0.9221

The breusch-godfrey test has a p_value greater than 5% level of significance. Hence, it states we reject the null hypothesis that variance is not changing in the residuals.

From the time series plot and histogram of the standardized residuals, we observe a non-random pattern and high residual values which violate general assumptions

Also, we can infer that the standardized residuals in the histogram are distributed around zero mean levels

The ACF plot of the residuals do not suggest the existence of Auto-correlation within error terms

————————————————————————————————————————————————————

Now, we fit Distributed Lag model of ASX Price Index vs Gold + Copper + Crude_Oil

head(asx_data)

## # A tibble: 6 x 4
##   `ASX price` `Gold price` `Crude Oil (Brent)_USD/bbl` `Copper_USD/tonne`
##         <dbl>        <dbl>                       <dbl>              <dbl>
## 1       2935.         612.                        31.3               1650
## 2       2778.         603.                        32.6               1682
## 3       2849.         566.                        30.3               1656
## 4       2971.         539.                        25.0               1588
## 5       2980.         549.                        25.8               1651
## 6       3000.         536.                        27.6               1685

asx_split1 = asx_price_index[,1]
head(asx_split1)

##         Jan    Feb    Mar    Apr    May    Jun
## 2004 2935.4 2778.4 2848.6 2970.9 2979.8 2999.7

asx_split2 = asx_price_index[,2:4]
head(asx_split2)

##          Gold price Crude Oil (Brent)_USD/bbl Copper_USD/tonne
## Jan 2004      611.9                     31.29             1650
## Feb 2004      603.3                     32.65             1682
## Mar 2004      565.7                     30.34             1656
## Apr 2004      538.6                     25.02             1588
## May 2004      549.4                     25.81             1651
## Jun 2004      535.9                     27.55             1685

asx_all = data.frame(asx_split1,asx_split2)
asx_all.ts = ts(asx_all, start = 2004, frequency = 12)
head(asx_all.ts)

##          asx_split1 Gold.price Crude.Oil..Brent._USD.bbl Copper_USD.tonne
## Jan 2004     2935.4      611.9                     31.29             1650
## Feb 2004     2778.4      603.3                     32.65             1682
## Mar 2004     2848.6      565.7                     30.34             1656
## Apr 2004     2970.9      538.6                     25.02             1588
## May 2004     2979.8      549.4                     25.81             1651
## Jun 2004     2999.7      535.9                     27.55             1685

asx_all_scaled = scale(asx_all.ts)
plot(asx_all_scaled, plot.type = "s", col = c("yellow","black","blue","red"), main = "Fig 52. Time series Plot")
legend("topleft",lty=1.25, text.width =1.25, col=c("blue", "yellow", "red", "black"), c("ASX", "Gold", "Oil", "Copper"))

Distrubuted Lag Model of ASX price index and copper_usd/tonne with lag 1

model1asx <- dlm(x = as.vector(asx_split2), y = as.vector(asx_split1), 
             q = 1)
summary(model1asx)

## 
## Call:
## lm(formula = model.formula, data = design)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1654.13  -706.62    -9.17   561.27  2219.74 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 3943.0338   211.5468  18.639   <2e-16 ***
## x.t            0.4117     1.2742   0.323    0.747    
## x.1            0.3216     1.2711   0.253    0.801    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 839.7 on 157 degrees of freedom
##   (322 observations deleted due to missingness)
## Multiple R-squared:  0.1097, Adjusted R-squared:  0.09832 
## F-statistic: 9.669 on 2 and 157 DF,  p-value: 0.0001097
## 
## AIC and BIC values for the model:
##        AIC     BIC
## 1 2613.609 2625.91

From the above model, we observe the following

Residual standard error = 839.7 on 157 degrees of freedom
The model is significant as the p_value is less 5% level of significance
F-statistic of the model = 9.669
The multiple and adjusted R_sqrd are 10.97 and 9.832 respectively. However the adjusted R_sqrd is very small
Also, no variables except the intercept in the model are significant as they are greater than 5% level of significance

Residual check and Serial Correlation Check using Breusch-Godfrey

checkresiduals(model1asx$model)

## 
##  Breusch-Godfrey test for serial correlation of order up to 10
## 
## data:  Residuals
## LM test = 148.3, df = 10, p-value < 2.2e-16

From the time series plot and histogram of the standardized residuals, we observe a non-random pattern and high residual values which violate general assumptions

Also, p_value of breusch-godfrey test is less that 5% level of significance. This concludes serial correlation in the residuals is highly significant.

Multi-collinearity Check

vif_model1asx <- vif(model1asx$model)
vif_model1asx

##      x.t      x.1 
## 58.50301 58.50301

vif_model1asx > 10

##  x.t  x.1 
## TRUE TRUE

As the Variation inflation factor is greater than 10, we conclude the effect of multi-collinearity is high

Specification of lag length:

model1asx_AIC_lag_length <- finiteDLMauto(x = as.vector(asx_split2), y = as.vector(asx_split1), 
                                       q.min = 1, q.max = 12, model.type = "dlm", 
                                       error.type = "AIC", trace = TRUE)
model1asx_AIC_lag_length

##    q - k    MASE      AIC      BIC   GMRAE   MBRAE R.Adj.Sq Ljung-Box
## 12    12 3.78532 2425.883 2470.942 4.08977 1.42891 -0.04664         0
## 11    11 3.85987 2442.989 2485.138 4.27351 1.04343 -0.03486         0
## 10    10 3.91844 2460.345 2499.570 4.32190 0.69116 -0.02199         0
## 9      9 3.97365 2476.983 2513.270 4.43213 0.75893 -0.00887         0
## 8      8 4.02150 2493.885 2527.220 4.45929 0.61156  0.00491         0
## 7      7 4.07437 2510.535 2540.905 4.47998 0.92888  0.01807         0
## 6      6 4.11652 2527.575 2554.966 4.40970 1.22132  0.03067         0
## 5      5 4.16217 2544.887 2569.286 4.45961 0.93030  0.04375         0
## 4      4 4.22171 2562.296 2583.690 4.65274 1.30422  0.05548         0
## 3      3 4.27118 2579.215 2597.590 4.64464 1.61370  0.06908         0
## 2      2 4.31239 2596.292 2611.637 4.68194 2.08243  0.08409         0
## 1      1 4.35631 2613.609 2625.910 4.56013 1.06887  0.09832         0

model1asx_BIC_lag_length <- 
        finiteDLMauto(x =as.vector(asx_split2), y = as.vector(asx_split1),
                      q.min = 1, q.max = 12, error.type = "BIC",
                      model.type = "dlm", trace = TRUE)
model1asx_BIC_lag_length

##    q - k    MASE      AIC      BIC   GMRAE   MBRAE R.Adj.Sq Ljung-Box
## 12    12 3.78532 2425.883 2470.942 4.08977 1.42891 -0.04664         0
## 11    11 3.85987 2442.989 2485.138 4.27351 1.04343 -0.03486         0
## 10    10 3.91844 2460.345 2499.570 4.32190 0.69116 -0.02199         0
## 9      9 3.97365 2476.983 2513.270 4.43213 0.75893 -0.00887         0
## 8      8 4.02150 2493.885 2527.220 4.45929 0.61156  0.00491         0
## 7      7 4.07437 2510.535 2540.905 4.47998 0.92888  0.01807         0
## 6      6 4.11652 2527.575 2554.966 4.40970 1.22132  0.03067         0
## 5      5 4.16217 2544.887 2569.286 4.45961 0.93030  0.04375         0
## 4      4 4.22171 2562.296 2583.690 4.65274 1.30422  0.05548         0
## 3      3 4.27118 2579.215 2597.590 4.64464 1.61370  0.06908         0
## 2      2 4.31239 2596.292 2611.637 4.68194 2.08243  0.08409         0
## 1      1 4.35631 2613.609 2625.910 4.56013 1.06887  0.09832         0

We see from the AIC and BIC criteria that lag 12 has the lowest AIC and BIC values. Hence, we specify the lag length as 12.

model1.1asx <- dlm(x = as.vector(asx_split2), y = as.vector(asx_split1), q = 12)
summary(model1.1asx)

## 
## Call:
## lm(formula = model.formula, data = design)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1550.43  -505.36    30.42   500.55  1884.16 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 4683.75496  230.73258  20.299   <2e-16 ***
## x.t           -0.73063    1.27183  -0.574    0.567    
## x.1           -0.05830    1.88620  -0.031    0.975    
## x.2            0.06822    1.90429   0.036    0.971    
## x.3            0.02231    1.91389   0.012    0.991    
## x.4           -0.38688    1.90638  -0.203    0.839    
## x.5            0.37549    1.90912   0.197    0.844    
## x.6            0.08537    1.93160   0.044    0.965    
## x.7            0.61146    1.93935   0.315    0.753    
## x.8           -0.28852    1.95099  -0.148    0.883    
## x.9            0.17788    1.96992   0.090    0.928    
## x.10           0.22900    1.95773   0.117    0.907    
## x.11          -0.06205    1.94273  -0.032    0.975    
## x.12           0.22419    1.29372   0.173    0.863    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 788.6 on 135 degrees of freedom
##   (322 observations deleted due to missingness)
## Multiple R-squared:  0.04529,    Adjusted R-squared:  -0.04664 
## F-statistic: 0.4926 on 13 and 135 DF,  p-value: 0.9257
## 
## AIC and BIC values for the model:
##        AIC      BIC
## 1 2425.883 2470.942

From the above model, we observe the following

Residual standard error = 788.6 on 135 degrees of freedom
The model is not significant as the p_value is greater than 5% level of significance
F-statistic of the model = 0.4926
The multiple and adjusted R_sqrd are 4.529 and -4.664 respectively. However the adjusted R_sqrd is very small
Also, no variables except the intercept in the model are significant as they are greater than 5% level of significance

Residual check and Serial Correlation Check using Breusch-Godfrey

checkresiduals(model1.1asx$model)

## 
##  Breusch-Godfrey test for serial correlation of order up to 17
## 
## data:  Residuals
## LM test = 136.44, df = 17, p-value < 2.2e-16

From the time series plot and histogram of the standardized residuals, we observe a non-random pattern and high residual values which violate general assumptions

Also, p_value of breusch-godfrey test is less that 5% level of significance. This concludes serial correlation in the residuals is highly significant.

Multi-collinearity Check

vif_model1.1asx <- vif(model1.1asx$model)
vif_model1.1asx

##       x.t       x.1       x.2       x.3       x.4       x.5       x.6       x.7 
##  53.00292 118.33830 122.33823 125.52489 126.19786 128.35765 133.47622 136.12418 
##       x.8       x.9      x.10      x.11      x.12 
## 139.03106 142.47802 141.04707 138.69245  61.50449

vif_model1.1asx > 10

##  x.t  x.1  x.2  x.3  x.4  x.5  x.6  x.7  x.8  x.9 x.10 x.11 x.12 
## TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

As the Variation inflation factor is greater than 10, we conclude the effect of multi-collinearity is high

Polynomial Distributed Lag Model

We can reduce the effect of multi-collinearity using the polynomial distributed lag model

model2asx <- polyDlm(x = as.vector(asx_split2), y = as.vector(asx_split1),
                 q = 1, k = 1, show.beta = FALSE)
summary(model2asx)

## 
## Call:
## "Y ~ (Intercept) + X.t"
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1654.13  -706.62    -9.17   561.27  2219.74 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 3943.03376  211.54676  18.639   <2e-16 ***
## z.t0           0.41169    1.27421   0.323    0.747    
## z.t1          -0.09005    2.53981  -0.035    0.972    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 839.7 on 157 degrees of freedom
##   (322 observations deleted due to missingness)
## Multiple R-squared:  0.1097, Adjusted R-squared:  0.09832 
## F-statistic: 9.669 on 2 and 157 DF,  p-value: 0.0001097

From the above model, we observe the following

Residual standard error = 839.7 on 157 degrees of freedom
The model is significant as the p_value is less 5% level of significance
F-statistic of the model = 9.669
The multiple and adjusted R_sqrd are 10.97 and 9.832 respectively. However the adjusted R_sqrd is very small
Also, no variables except the intercept in the model are significant as they are greater than 5% level of significance

A polynomial pattern (q = 1, k = 1) is imposed on the lag distribution i.e. equivalent to represent beta parameters with Kth order polynomial model time.

model2.2asx <- polyDlm(x = as.vector(asx_split2), y = as.vector(asx_split1),q = 4, k = 4, show.beta = FALSE)
summary(model2.2asx)

## 
## Call:
## "Y ~ (Intercept) + X.t"
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1574.26  -685.10    -6.82   528.30  2168.22 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 4118.08645  214.92803  19.160   <2e-16 ***
## z.t0           0.03279    1.28010   0.026    0.980    
## z.t1          -2.20885   15.15710  -0.146    0.884    
## z.t2           3.30652   19.49855   0.170    0.866    
## z.t3          -1.44009    7.84269  -0.184    0.855    
## z.t4           0.19029    0.97843   0.194    0.846    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 825.6 on 151 degrees of freedom
##   (322 observations deleted due to missingness)
## Multiple R-squared:  0.08576,    Adjusted R-squared:  0.05548 
## F-statistic: 2.833 on 5 and 151 DF,  p-value: 0.01784

From the above model, we observe the following

Residual standard error = 825.6 on 151 degrees of freedom
The model is significant as the p_value is less 5% level of significance
F-statistic of the model = 2.833
The multiple and adjusted R_sqrd are 8.576 and 5.548 respectively. However the adjusted R_sqrd is very small
Also, no variables except the intercept in the model are significant as they are greater than 5% level of significance

Residual check and Serial Correlation Check using Breusch-Godfrey

checkresiduals(model2.2asx$model)

## 
##  Breusch-Godfrey test for serial correlation of order up to 10
## 
## data:  Residuals
## LM test = 145.02, df = 10, p-value < 2.2e-16

From the time series plot and histogram of the standardized residuals, we observe a non-random pattern and high residual values which violate general assumptions

Also, p_value of breusch-godfrey test is less that 5% level of significance. This concludes serial correlation in the residuals is highly significant.

Multi-collinearity Check

vif_model2.2asx <- vif(model2.2asx$model)
vif_model2.2asx

##         z.t0         z.t1         z.t2         z.t3         z.t4 
##     1454.893   826492.983 12368348.542 22307303.143  4361029.001

vif_model2.2asx > 10

## z.t0 z.t1 z.t2 z.t3 z.t4 
## TRUE TRUE TRUE TRUE TRUE

As the Variation inflation factor is greater than 10, we conclude the effect of multi-collinearity is high

Koyck Distributed Lag Model

model3asx <- koyckDlm(x = as.vector(asx_split1), y = as.vector(gold) + as.vector(oil) + as.vector(copper))
summary(model3asx, diagnostics = TRUE)

## 
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1906.48  -196.75   -31.89   199.81  2961.68 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 141.40687  215.59162   0.656    0.513    
## Y.1           0.95823    0.01893  50.607   <2e-16 ***
## X.t           0.04010    0.05285   0.759    0.449    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 477.7 on 157 degrees of freedom
## Multiple R-Squared: 0.9599,  Adjusted R-squared: 0.9594 
## Wald test:  1880 on 2 and 157 DF,  p-value: < 2.2e-16 
## 
## Diagnostic tests:
##                  df1 df2   statistic      p-value
## Weak instruments   1 157 2019.209407 1.541577e-91
## Wu-Hausman         1 156    6.313068 1.300128e-02
## 
##                             alpha       beta       phi
## Geometric coefficients:  3385.231 0.04010146 0.9582283

From the above model, we observe the following

Residual standard error = 477.7 on 157 degrees of freedom
The model is significant as the p_value is less 5% level of significance
The multiple and adjusted R_sqrd are 95.99 and 95.94 respectively. The adjusted R_sqrd are comparitively greater than the Distributed Lag & Polynomial Distributed Lag model
Also, variable Y.1 in the model is significant to 5% level of significance. The beta and phi values are 0.0401 and 0.9582 respectively
The weak instruments p_value suggests that the model is significant at first stage of least squares fitting
Wu-Hausman test is used to compare the consistency of instrumental variable estimates to ordinary least square estimates. The presence of endogeniety tells us that instrumental variable estimate is consistent and ordinary least square estimate is inconsistent.
The Wu-Hausman test is significant as the p_value is less than 5% level of significance. Hence, we reject the null hypothesis that correlation between explanatory vairable and error term is zero or in other words endogeniety is not present.
Therefore, we conclude that there is significant correlation between explanatory variable and error term is present at 5% level of significance.

Residual Check

checkresiduals(model3asx$model)

From the time series plot and histogram of the standardized residuals, we observe a non-random pattern and high residual values which violate general assumptions

Multi-collinearity Check

vif_model3asx <- vif(model3asx$model)
vif_model3asx

##      Y.1      X.t 
## 1.443737 1.443737

vif_model3asx > 10

##   Y.1   X.t 
## FALSE FALSE

The variation inflation factor is less than 5 when value of q = 2, we conclude the effect of multi-collinearity is not present.

Auto Regressive Distributed Lag Model

model4asx <- ardlDlm(x = as.vector(asx_split2), y = as.vector(asx_split1),
                 p = 1, q =1)
summary(model4asx)

## 
## Time series regression with "ts" data:
## Start = 2, End = 483
## 
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2912.60   -96.84    21.21   148.36   791.49 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 242.43332   67.14254   3.611 0.000338 ***
## X.t           0.08502    0.04104   2.072 0.038841 *  
## X.1          -0.08243    0.04110  -2.006 0.045446 *  
## Y.1           0.94935    0.01388  68.376  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 268.3 on 478 degrees of freedom
## Multiple R-squared:  0.9095, Adjusted R-squared:  0.9089 
## F-statistic:  1601 on 3 and 478 DF,  p-value: < 2.2e-16

From the above model, we observe the following

Residual standard error = 268.3 on 478 degrees of freedom
The model is significant as the p_value is less 5% level of significance
F-statistic of the model = 1601 on 3 and 478 Degrees of freedom
The multiple and adjusted R_sqrd are 90.95 and 90.89 respectively. The adjusted R_sqrd of this model is comparitively higher than DLM, Polynomial DLM and Koyck models.
Also, all the variables in the model are significant as they are smaller than 5% level of significance

Residuals Check

checkresiduals(model4asx$model)

## 
##  Breusch-Godfrey test for serial correlation of order up to 10
## 
## data:  Residuals
## LM test = 18.298, df = 10, p-value = 0.05014

The breusch-godfrey test has a p_value greater than 5% level of significance. Hence, it states we reject the null hypothesis that variance is not changing in the residuals.

From the time series plot and histogram of the standardized residuals, we observe a non-random pattern and high residual values which violate general assumptions

Also, we can infer that the standardized residuals in the histogram are distributed around zero mean levels

The ACF plot of the residuals do not suggest the existence of Auto-correlation within error terms

Multi-collinearity Check

vif_model4asx <- vif(model4asx$model)
vif_model4asx

##       X.t L(X.t, 1) L(y.t, 1) 
## 91.854920 91.933838  1.024841

vif_model4asx > 10

##       X.t L(X.t, 1) L(y.t, 1) 
##      TRUE      TRUE     FALSE

The variation inflation factor is less than 5 for coefficient Y.t when value of q = 1.

Determining the values of p and q using AIC and BIC values

for(i in 1:5){
        for(j in 1:5){
                model4.1asx <- ardlDlm(x = as.vector(asx_split2),
                                   y = as.vector(asx_split1), p = i, q = j)
                cat("p:",i,"q:",j,"AIC:",AIC(model4.1asx$model), "BIC:",
                    BIC(model4.1asx$model),"\n")
        }
}

## p: 1 q: 1 AIC: 6764.712 BIC: 6785.602 
## p: 1 q: 2 AIC: 6745.169 BIC: 6770.224 
## p: 1 q: 3 AIC: 6730.729 BIC: 6759.946 
## p: 1 q: 4 AIC: 6719.373 BIC: 6752.746 
## p: 1 q: 5 AIC: 6706.985 BIC: 6744.511 
## p: 2 q: 1 AIC: 6749.959 BIC: 6775.014 
## p: 2 q: 2 AIC: 6744.981 BIC: 6774.212 
## p: 2 q: 3 AIC: 6730.858 BIC: 6764.249 
## p: 2 q: 4 AIC: 6719.477 BIC: 6757.023 
## p: 2 q: 5 AIC: 6707.166 BIC: 6748.862 
## p: 3 q: 1 AIC: 6737.899 BIC: 6767.115 
## p: 3 q: 2 AIC: 6733.274 BIC: 6766.665 
## p: 3 q: 3 AIC: 6732.348 BIC: 6769.912 
## p: 3 q: 4 AIC: 6721.016 BIC: 6762.733 
## p: 3 q: 5 AIC: 6708.682 BIC: 6754.547 
## p: 4 q: 1 AIC: 6726.459 BIC: 6759.832 
## p: 4 q: 2 AIC: 6721.97 BIC: 6759.516 
## p: 4 q: 3 AIC: 6721.158 BIC: 6762.875 
## p: 4 q: 4 AIC: 6722.852 BIC: 6768.741 
## p: 4 q: 5 AIC: 6710.564 BIC: 6760.6 
## p: 5 q: 1 AIC: 6715.302 BIC: 6752.829 
## p: 5 q: 2 AIC: 6710.779 BIC: 6752.475 
## p: 5 q: 3 AIC: 6709.946 BIC: 6755.811 
## p: 5 q: 4 AIC: 6711.651 BIC: 6761.686 
## p: 5 q: 5 AIC: 6712.563 BIC: 6766.768

We observe that p = 1 and q = 5 give the smallest AIC and BIC values.

best_model_asx_all <- ardlDlm(x = as.vector(asx_split2), y = as.vector(asx_split1),
                 p = 5, q = 1)
summary(best_model_asx_all)

## 
## Time series regression with "ts" data:
## Start = 6, End = 483
## 
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2884.31  -102.27    18.85   149.50   786.82 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 257.202235  69.064158   3.724  0.00022 ***
## X.t           0.067481   0.043097   1.566  0.11807    
## X.1          -0.006041   0.068950  -0.088  0.93022    
## X.2          -0.024334   0.069549  -0.350  0.72658    
## X.3          -0.010001   0.069546  -0.144  0.88572    
## X.4          -0.017381   0.068991  -0.252  0.80121    
## X.5          -0.008330   0.043211  -0.193  0.84722    
## Y.1           0.946784   0.014216  66.602  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 269.1 on 470 degrees of freedom
## Multiple R-squared:  0.9068, Adjusted R-squared:  0.9054 
## F-statistic: 653.3 on 7 and 470 DF,  p-value: < 2.2e-16

From the above model, we observe the following

Residual standard error = 269.1 on 470 degrees of freedom
The model is significant as the p_value is less 5% level of significance
F-statistic of the model = 653.3 on 7 and 470 Degrees of freedom
The multiple and adjusted R_sqrd are 90.68 and 90.54 respectively. The adjusted R_sqrd of this model is comparitively higher.
Variables Y.1 in the model are significant as they are smaller than 5% level of significance

Residuals Check

checkresiduals(best_model_asx_all$model)

## 
##  Breusch-Godfrey test for serial correlation of order up to 11
## 
## data:  Residuals
## LM test = 19.577, df = 11, p-value = 0.05148

The breusch-godfrey test has a p_value greater than 5% level of significance. Hence, it states we reject the null hypothesis that variance is not changing in the residuals.

From the time series plot and histogram of the standardized residuals, we observe a non-random pattern and high residual values which violate general assumptions

Also, we can infer that the standardized residuals in the histogram are distributed around zero mean levels

The ACF plot of the residuals do not suggest the existence of Auto-correlation within error terms

——————————————————————————————————————————-

Summary and Conclusion

Time Series Analysis and Forecasting methods were used to draw statistical inferences for the monthly averages of ASX all ordinaries (Ords) price index, Gold price, Crude Oil (Brent)_USD/bbl and Copper_USD/tonne starting from January 2004. We visualized the time series plot of each variable and perform two unit root tests namely Augmented Dickey-Fuller (ADF) and Phillips Perron (PP) tests to detect nonstationarity. We found the data was not stationary as the p_value obtained from the ADF and PP tests was not significant to 5% level of significance. To make the data stationary, we applied ordinary differencing. We checked the p_value of the ADF and PP test. The data was stationary after applying first order ordinary differencing as the p_value of both the tests were significant. We performed decomposition of the time series data to detrend it and check for any seasonality present in the data. Decomposition of times series gives a useful abstract of the model. We found that, copper was the best variable that would be used for modeling with respect to ASX price index. We observe minute differences from the trend component in the seasonally adjusted series. In the SI ratio plot, mean levels of Gold, Crude Oil and ASX are tremendously fluctuating over time, however, the mean level of copper is comparitively stable. In the last part of the analysis, we fit the data using four different models, DLM - where we need to choose the lag length and deal with multicollinearity, Poly DLM - which addresses the collinearity by changing lag weights smoothly. The infinite distributed lag model removes the need to specify lag length but it is impossible to fit the model in its straightforward form. We used the geometric lag weights to convert the model to a form which can be used to fit the time series data. Later we transformed the geometric distributed lag model into Koyck DLM where finite number of parameters are estimated and lastly the ARDLM. The Distributed Lag and Polynomial Distributed Lag models had the lowest adjusted r_sqrd. The variation inflation factors of these models is tremendously high. The best model was the Koyck DLM with the comparitively higher adjusted r squared 94.79%. All the variables in this model are significant to 5% level of significance. Also, the breusch-godfrey test has a p_value greater than 5% level of significance. Hence, it states we reject the null hypothesis that the variance does not change over time. The residual analysis shows that the histogram of standardized residuals is distributed at zero mean levels.

Conclusion

From the above analysis we conclude that Koyck DLM (model3c) of predictor variable copper is the best model for forecasting the ASX Price Index. The model has the highest adjusted r squared. The breusch-godfrey test of this model is significant and which states the variance is constant over time. The residual analysis also shows the histogram of standardized residuals is distributed at zero mean levels. The variation inflation factor is less than five which states multi-collinearity is low. The residual analysis also states no assumptions were violationed. The weak instruments p_value suggests that the model is significant at first stage of least squares fitting

————————————————————————————————————————————————————————-

MATH1307 Forecasting Assignment 1

Ashwin Menon, StdID: S3808190

8/13/2021

Introduction

Loading necessary R-packages

Loading the dataset and converting it into a time series object

Creating individual time series object of each variable from the dataset

Checking the first five observations of the dataset after converting it into a time series object.

A Time series plot of the dataset

Checking the existence of non-stationarity in the Dataset

The following time series plot of ASX price index starting from January 2004

Analysis & Interpretation

ACF and PACF plots

In the ACF Plot there are strong correlations between the elements of the time series. The lags seen are correlations of the observations of previous time stamps.We can conclude there is a presence of Seasonality in the time series data due to a wave like pattern in the ACF plot.

PACF plot has a significant spike only at lag 1. Hence, we say that all the higher-order autocorrelations are explained by the lag-1 autocorrelation

Augumented Dickey-Fuller Unit Root Test to check the existence of non-stationarity in the ASX price index.

From Augmented Dickey-Fuller Unit Root Test, the p_value = 0.4286 and value of test_statistic = 0.6366

As the p_value is greater that 5% level of significance, we fail to reject the null hypothesis. Hence, we assume that the data is non-stationary.

Further we test the assumption of existence of non-stationarity in the ASX price index using Phillips-Perron Unit Root Test

From the Phillips-Perron Unit Root Test, the p_value = 0.4811 when truncation lag parameter is 4 and p_value = 0.4123 when truncation lag parameter is 13

As the p_value is greater that 5% level of significance, we fail to reject the null hypothesis. Hence, we conclude that the data is non-stationary.

We apply a Box-Cox transformation to deal with non-stationarity in ASX price index

The value of lambda is 1.05

Here, we can see no difference in the time series plot even after applying a box-cox transformation

This can be checked after applying adf and phillips-perron unit root test.

We can see the p_value of both Augmented Dickey-Fuller and Phillips-perron Unit root test is greater than 5% level of significance. Hence we fail to reject the null hypothesis and conclude that the non-stationarity in the ASX price index still exists.

Applying first order ordinary differencing

Time Series Plot after applying first ordinary differencing

Time series plot of the transformed data shows that the trend is reduced. But we can still intervention points. Also, the Auto-regressive pattern is still visible.

ACF and PACF plots of the transformed data

From the ACF and PACF plots we can see the decaying pattern has faded. There are no significant lags in the both the plots.

Augmented Dickey Fuller and Phillips-Perron Unit root test on the transformed data.

From the above results, we can see the p_value for both Augmented Dickey Fuller and Phillips-Perron Unit root are less than 5% level of significance.

Hence, we reject the null hypothesis stating the transformed data is stationary.

We apply a X12 decomposition on the ASX price index time series data.

There are minute deviances from the trend component in the seasonally adjusted series. An increase in the ASX price index is due to effect of seasonality.

During the period of great recession i.e. from 2007 to 2009, we see a fall in the ASX price index which has also decreased the seasonality in the plot.

SI Ratios plot to check whether the seasonality flucuates around the mean

We observe seasonality this plot as the values fluctuate at the mean during the months of February, March and April followed by June, July and August.

We apply STL Decomposition on the transformed ASX price index time series data

In the above plot, we observe a smoothed plot of the original time series plot of ASX price index

The effect of seasonality is seen in the second plot.

Highest ASX price index values were recorded in the month of March while lowest ASX price index values were recorded in the months of October-November.

In the remainder part, there are some significant fluctuations and changing variance is present

We display the forecasts for ASX Price index from STL Method

Naive forecasts represent the mean level for ASX Price Index

The following time series plot of Gold price (AUD) starting from January 2004

Analysis & Interpretation

ACF and PACF plots

In the ACF Plot there are strong correlations between the elements of the time series. The lags seen are correlations of the observations of previous time stamps.We can conclude there is a presence of Seasonality in the time series data due to a wave like pattern in the ACF plot.

PACF plot has a significant spike only at lag 1. Hence, we say that all the higher-order autocorrelations are explained by the lag-1 autocorrelation

Augumented Dickey-Fuller Unit Root Test to check the existence of non-stationarity in the Gold price (AUD)

From Augmented Dickey-Fuller Unit Root Test, the p_value = 0.1545 and value of test_statistic = 0.9687

As the p_value is greater that 5% level of significance, we fail to reject the null hypothesis. Hence, we assume that the data is non-stationary.

Further we test the assumption of existence of non-stationarity in the Gold price index using Phillips-Perron Unit Root Test

From the Phillips-Perron Unit Root Test, the p_value = 0.5328 when truncation lag parameter is 4 and p_value = 0.6324 when truncation lag parameter is 13

As the p_value is greater that 5% level of significance, we fail to reject the null hypothesis. Hence, we conclude that the data is non-stationary.

We apply a Box-Cox transformation to deal with non-stationarity in Gold price index

The value of lambda is 1.05

Here, we can see no difference in the time series plot even after applying a box-cox transformation

This can be checked after applying adf and phillips-perron unit root test.

We can see the p_value of both Augmented Dickey-Fuller and Phillips-perron Unit root test is greater than 5% level of significance. Hence we fail to reject the null hypothesis and conclude that the non-stationarity in the Gold price (AUD) still exists.

Applying first order ordinary differencing

Time Series Plot applying first ordinary differencing

Time series plot of the transformed data shows that the trend is however reduced. But we can still intervention points. Also, the Auto-regressive pattern is still visible.

ACF and PACF plots of the transformed data

From the ACF and PACF plots we can see the decaying pattern has faded. There are no significant lags in the both the plots.

Augmented Dickey Fuller and Phillips-Perron Unit root test on the transformed data.

From the above results, we can see the p_value for both Augmented Dickey Fuller and Phillips-Perron Unit root are less than 5% level of significance.

Hence, we reject the null hypothesis stating the transformed data is stationary.

We apply a X12 decomposition on the Gold price (AUD) time series data.

There are minute deviances from the trend component in the seasonally adjusted series. An increase in the Gold price is due to effect of seasonality.

During the period of great recession i.e. from 2007 to 2009, we see a fall in the Gold price which has also decreased the seasonality in the plot.

SI Ratios plot to check whether the seasonality flucuates around the mean

We observe seasonality this plot as the values fluctuate at the mean during the months of July,August and September followed by January, February and March.

We apply STL Decomposition on the transformed Gold price (AUD) time series data

In the above plot, we observe a smoothed plot of the original time series plot of Gold price

The effect of seasonality is seen in the second plot.

Highest Gold price values were recorded in the month of September while lowest Gold price values were recorded in the months of March

In the remainder part, there are some significant fluctuations and changing variance is present