26-08-2023library(readr)
library(forecast) # checkresiduals()
library(TSA) # season() and other functions
library(urca) # for ur.df()
library(tseries) # for adf.test()
library(stats) # for classical decomposition
library(x12) # X-12-ARIMA decomposition
library(GGally) #for ggpairs()
library(car) #for vif()
library(dLagM) # For DLM modelling
library(lmtest) # for bgtest()
The dataset hold 4 columns and 161 observations. They are, monthly averages of ASX All Ordinaries (Ords) Price Index, Gold price (AUD), Crude Oil (Brent, USD/bbl) and Copper (USD/tonne) starting from January 2004.
Our aim for the ASX dataset is to determine the most accurate and suitable regression model that determines ASX Price Index using at least one of the 3 regressors. A descriptive analysis and decomposition of the response variable will be conducted initially. Model-building strategy will be applied to find the best fitting model from the 4 different Distributed Lag Models.
ASX = read.csv("C:/Users/admin/Downloads/ASX_data.csv", header = TRUE)
ASX$Gold.price <- as.numeric(gsub(",", "", ASX$Gold.price))
ASX$Copper_USD.tonne <- as.numeric(gsub(",", "", ASX$Copper_USD.tonne))
# Convert from character to numeric
ASXPrice.ts = ts(ASX$ASX.price, start = c(2004,1), frequency = 12)
head(ASXPrice.ts, 36)
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct
## 2004 2935.4 2778.4 2848.6 2970.9 2979.8 2999.7 3106.7 3202.9 3176.2 3282.4
## 2005 3283.6 3372.5 3416.4 3407.7 3456.9 3530.3 3546.1 3561.9 3674.7 3786.3
## 2006 4880.2 4878.4 5087.2 5207.0 4972.3 5034.0 4957.1 5079.8 5113.0 5352.9
## Nov Dec
## 2004 3195.7 3306.0
## 2005 3942.8 4053.1
## 2006 5461.6 5644.3
For fitting a regression model, the response is ASX Price Index (ASX.price) and the 3 regressor variables are the Gold price (Gold.price), Crude Oil in USD price per barrel (Crude.Oil.Brent.USD.bbl) and Copper price in USD per tonne (CopperUSD.tonne).
All the 4 variables are continuous variables.
Lets first get the regressor and 3 response as TS objects,
ASXPrice.ts = ts(ASX$ASX.price, start = c(2004,1), frequency = 12) # Y
GoldPrice.ts = ts(ASX$Gold.price, start = c(2004,1), frequency = 12) #X1
CrudeOilPrice.ts = ts(ASX$Crude.Oil..Brent._USD.bbl, start = c(2004,1), frequency = 12) # X2
CopperPrice.ts = ts(ASX$Copper_USD.tonne, start = c(2004,1), frequency = 12) # X3
data.ts = ts(ASX, start = c(2004,1), frequency = 12) # Y, X1, X2, X3 all in single dataframe
Lets scale, center and plot all the 4 variables together
data.scale = scale(data.ts)
plot(data.scale, plot.type="s", col=c("black", "red", "blue", "green"), main = "ASX Price Index (Black - Respone), Gold Price (Red - X1),\n Oil price (Blue - X2), Copper Price (Green - X2)")
It is hard to read the correlations between the regressors and the response and the among the response themselves. But it is fair to say the 4 variables show some correlations. Lets check for correlation statistically using ggpairs(),
ggpairs(data = ASX, columns = c(1,2,3,4), progress = FALSE) #library(ggally)
As shown in the matrix above,
- Response ASX price index and
regressors Gold price, Crude oil price and copper price are correlated
by 0.343, 0.329 and 0.562 correlation coefficient. - The regressors
themselves are correlated by 0.437 (Gold and Crude oil price), 0.536
(Gold and Copper price) and 0.866 (Crude oil and Copper price)
Hence, some correlations between the 3 regressor and response is present. We can generate regression model based on these correlations. First, lets look at the descriptive statistics
Since we are generating regression model which estimates the response, \(ASX Price Index\), lets focus on ASX.Price’s statistics.
summary(ASXPrice.ts)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2778 4325 4929 4813 5448 6779
The mean and median of the ASX Price Index are very close indicating symmetrical distribution.
The time series plot for our data is generated using the following code chunk,
plot(ASXPrice.ts,ylab='ASX Price Index',xlab='Time',
type='o', main="Figure 1: ASX All Ordinaries Price Index Trend from January, 2004")
points(y=ASXPrice.ts,x=time(ASXPrice.ts), pch=as.vector(season(ASXPrice.ts)))
Plot Inference :
From Figure 1, we can comment on the time series’s,
Trend: The overall shape of the trend seems to follow an upward trend except from 2008 to 2009. Thus, indicating non-stationarity.
Seasonality: From the plot, no noticeable seasonal behavior is seen as no clear repeating patterns are visible. This needs to be confirmed using statistical tests.
Change in Variance: A change in variance can be seen as the variation is much higher during the earlier years than the latter.
Behavior: We notice mixed behavior of MA and AR series. In the earlier years (2004-2009), AR behavior is dominant as we obverse more following data points. In the latter years (2010-2017), MA behavior is more evident due to more up and down fluctuations than following data points.
Intervention/Change points: Steep drop around 2008 might be an intervention point. This needs to be confirmed using statistical tests.
par(mfrow=c(2,1))
acf(ASXPrice.ts, main="ACF of ASX All Ordinaries Price Index")
pacf(ASXPrice.ts, main ="PACF of ASX All Ordinaries Price Index")
par(mfrow=c(1,1))
ACF plot: We notice multiple autocorrelations are significant. A slowly decaying pattern indicates non stationary series. We do not see any ‘wavish’ form. Thus, no seasonal behavior is observed. We did not observe seasonality in time series plot as well.
PACF plot: We see 1 high vertical spike indicating non stationary series. We have observed non stationarity in the time series plot as well. Also, the second correlation bar is significant as well.
Many model estimating procedures assume normality of the residuals. If this assumption doesn’t hold, then the coefficient estimates are not optimum. Lets look at the Quantile-Quantile (QQ) plot to to observe normality visually and the Shapiro-Wilk test to statistically confirm the result.
qqnorm(ASXPrice.ts, main = "Normal Q-Q Plot of ASX Price Index Time Series")
qqline(ASXPrice.ts, col = 2)
We see deviations from normality. Clearly, both the tails are off and most of the data in middle is off the line as well. Lets check statistically using shapiro-wilk test. Lets state the hypothesis of this test,
\(H_0\) : Time series is Normally
distributed
\(H_a\) : Time
series is not normal
shapiro.test(ASXPrice.ts)
##
## Shapiro-Wilk normality test
##
## data: ASXPrice.ts
## W = 0.9709, p-value = 0.001786
From the Shapiro-Wilk test, since p < 0.05 significance level, we reject the null hypothesis that states the data is normal. Thus, ASX Price Index is not normally distributed.
The ACF, PACF and time series plots of the \(ASX Price Index\) time series at the descriptive analysis stage of time series tells us nonstationarity in our time series. Lets use ADF and PP tests,
Using ADF (Augmented Dickey-Fuller) test :
Lets confirm the non-stationarity using Dickey-Fuller Test or ADF
test. Lets state the hypothesis,
\(H_0\) : Time series is Difference
non-stationary
\(H_a\) : Time
series is Stationary
3 approaches with different method of calculating lag lengths will be
considered. We first calculate the lag length and then check the test
results.
k = ar(diff(ASXPrice.ts))$order
print(k) # Lag length
## [1] 0
adf.test(ASXPrice.ts, k = k)
##
## Augmented Dickey-Fuller Test
##
## data: ASXPrice.ts
## Dickey-Fuller = -1.9684, Lag order = 0, p-value = 0.5895
## alternative hypothesis: stationary
k = trunc(12*((length(ASXPrice.ts)/100)^(1/4)))
print(k)
## [1] 13
adf.test(ASXPrice.ts, k = k)
##
## Augmented Dickey-Fuller Test
##
## data: ASXPrice.ts
## Dickey-Fuller = -2.8677, Lag order = 13, p-value = 0.2144
## alternative hypothesis: stationary
test = adf.test(ASXPrice.ts)
test$parameter
## Lag order
## 5
print(test)
##
## Augmented Dickey-Fuller Test
##
## data: ASXPrice.ts
## Dickey-Fuller = -2.6995, Lag order = 5, p-value = 0.2846
## alternative hypothesis: stationary
From all versions of the test, since p-value > 0.05, we fail to reject null hypothesis of non stationarity. we can conclude that the series is non stationary at 5% level of significance.
Using PP (Phillips-Perron) test :
The null and alternate hypothesis are same as ADF test. For the PP test, we have two options for the lag length. If lshort parameter is set to TRUE we get k=⌈4∗(N/100)1/4)⌉, otherwise we get k=⌈12∗(N/100)1/4)⌉.
PP.test(ASXPrice.ts, lshort = TRUE)
##
## Phillips-Perron Unit Root Test
##
## data: ASXPrice.ts
## Dickey-Fuller = -2.2284, Truncation lag parameter = 4, p-value = 0.4811
PP.test(ASXPrice.ts, lshort = FALSE)
##
## Phillips-Perron Unit Root Test
##
## data: ASXPrice.ts
## Dickey-Fuller = -2.3932, Truncation lag parameter = 13, p-value =
## 0.4123
According to the 2 PP tests, ASX Price Index series is non stationary at 5% level.
Lets perform with Box-Cox transformation,
To improve normality and increasing variance in our ASX Price Index time series, lets test Box-Cox transformations on the series.
lambda = BoxCox.lambda(ASXPrice.ts, method = "loglik")
BC.ASXPrice.ts = BoxCox(ASXPrice.ts, lambda = lambda)
Visually comparing the time series plots before and after box-cox transformation,
par(mfrow=c(2,1))
plot(BC.ASXPrice.ts,ylab='ASX Price Index',xlab='Time',
type='o', main="Box-Cox Transformed ASX Price Index Time Series")
points(y=BC.ASXPrice.ts,x=time(BC.ASXPrice.ts), pch=as.vector(season(BC.ASXPrice.ts)))
plot(ASXPrice.ts,ylab='ASX Price Index',xlab='Time',
type='o', main="Original ASX Price Index Time Series")
points(y=ASXPrice.ts,x=time(ASXPrice.ts), pch=as.vector(season(ASXPrice.ts)))
par(mfrow=c(1,1))
From the plot, almost no improvement in the variance of the time series is visible after BC transformation. Lets check for normality using shapiro test,
shapiro.test(BC.ASXPrice.ts)
##
## Shapiro-Wilk normality test
##
## data: BC.ASXPrice.ts
## W = 0.97203, p-value = 0.002376
From the Shapiro-Wilk test, since p < 0.05 significance level, we reject the null hypothesis that states the data is normal. Thus, BC Transformed ASX Price Index is not normal.
Now, lets perform Differencing with the aim to improve stationarity in our ASX Price Index time series.
The ACF, PACF and time series plots at the descriptive analysis stage of time series tells us nonstationarity in our time series. This was confirmed using ADF and PP tesdt. This non-stationarity is still present in BC transformed series as no actions are taken yet to flatten the trend curve. Lets confirm statistically,
Lets check if transformation has improved stationarity of our
non-stationary ASX Price Index series.
Using ADF test
adf.BC.ASXPrice.ts = ur.df(BC.ASXPrice.ts, type = "none", lags = 1, selectlags = "AIC")
summary(adf.BC.ASXPrice.ts)
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression none
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1116.68 -156.30 56.77 211.31 1232.57
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## z.lag.1 0.002123 0.003472 0.612 0.542
## z.diff.lag 0.083995 0.079916 1.051 0.295
##
## Residual standard error: 311.1 on 157 degrees of freedom
## Multiple R-squared: 0.01029, Adjusted R-squared: -0.002316
## F-statistic: 0.8163 on 2 and 157 DF, p-value: 0.4439
##
##
## Value of test-statistic is: 0.6115
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau1 -2.58 -1.95 -1.62
Using PP test
PP.test(BC.ASXPrice.ts, lshort = TRUE)
##
## Phillips-Perron Unit Root Test
##
## data: BC.ASXPrice.ts
## Dickey-Fuller = -2.2244, Truncation lag parameter = 4, p-value = 0.4827
PP.test(BC.ASXPrice.ts, lshort = FALSE)
##
## Phillips-Perron Unit Root Test
##
## data: BC.ASXPrice.ts
## Dickey-Fuller = -2.3922, Truncation lag parameter = 13, p-value =
## 0.4127
In all cases, Since p-value > 0.05, we fail to reject null hypothesis of non stationarity. we can conclude that the BC transformed ASX Price Index series is non stationary at 5% level of significance.
Since no seasonality was observed at the descriptive analysis stage (from ACF, PACF or time series plot), we expect ordinary differencing should suffice. Lets compare the seasonally differenced and ordinary differenced series w.r.t the original time series.
BC.ASXPrice.ts.Orddiff = diff(BC.ASXPrice.ts, differences = 1) # First Ordinary difference
BC.ASXPrice.ts.Seasdiff = diff(BC.ASXPrice.ts, differences = 1, lag=12) # First seasonal difference
par(mfrow=c(3,1))
plot(BC.ASXPrice.ts.Orddiff,ylab='Landings in metric tons',
xlab='Year',main = "First Ordinary Differenced Box-Cox Transformed ASX Price Index Time Series")
points(y=BC.ASXPrice.ts.Orddiff,x=time(BC.ASXPrice.ts.Orddiff), pch=as.vector(season(BC.ASXPrice.ts.Orddiff)))
plot(BC.ASXPrice.ts.Seasdiff,ylab='Landings in metric tons',
xlab='Year',main = "First Seasonal Differenced Box-Cox Transformed ASX Price Index Time Series")
points(y=BC.ASXPrice.ts.Seasdiff,x=time(BC.ASXPrice.ts.Seasdiff), pch=as.vector(season(BC.ASXPrice.ts.Seasdiff)))
plot(ASXPrice.ts,ylab='ASX Price Index',xlab='Time',
type='o', main="Original ASX Price Index Time Series")
points(y=ASXPrice.ts,x=time(ASXPrice.ts), pch=as.vector(season(ASXPrice.ts)))
par(mfrow=c(1,1))
As expected, seasonal differencing is useless in this series. Ordinary differencing of first order makes the series look much more stationary. Lets confirm statistically,
adf.BC.ASXPrice.ts.Orddiff = ur.df(BC.ASXPrice.ts.Orddiff, type = "none", lags = 1, selectlags = "AIC")
summary(adf.BC.ASXPrice.ts.Orddiff)
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression none
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1059.25 -154.88 61.15 215.28 1214.09
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## z.lag.1 -0.7869 0.1072 -7.343 1.08e-11 ***
## z.diff.lag -0.1342 0.0794 -1.690 0.093 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 309.5 on 156 degrees of freedom
## Multiple R-squared: 0.4631, Adjusted R-squared: 0.4562
## F-statistic: 67.29 on 2 and 156 DF, p-value: < 2.2e-16
##
##
## Value of test-statistic is: -7.3432
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau1 -2.58 -1.95 -1.62
PP.test(BC.ASXPrice.ts.Orddiff, lshort = TRUE)
##
## Phillips-Perron Unit Root Test
##
## data: BC.ASXPrice.ts.Orddiff
## Dickey-Fuller = -11.683, Truncation lag parameter = 4, p-value = 0.01
PP.test(BC.ASXPrice.ts.Orddiff, lshort = FALSE)
##
## Phillips-Perron Unit Root Test
##
## data: BC.ASXPrice.ts.Orddiff
## Dickey-Fuller = -11.859, Truncation lag parameter = 13, p-value = 0.01
From the ADF and PP tests, Since p-value < 0.05, we reject the null hypothesis of non stationarity. Thus, First order Ordinary Differenced BC transformed ASX Price Index series is Stationary at 5% level of significance.
Note, 1st Order Differenced BC transformed ASX Price Index is still not normal. We proceed as such.
shapiro.test(BC.ASXPrice.ts.Orddiff)
##
## Shapiro-Wilk normality test
##
## data: BC.ASXPrice.ts.Orddiff
## W = 0.94862, p-value = 1.373e-05
The First order Ordinary Differenced BC transformed ASX Price Index series is Stationary and not normal. BC transformation was not affective.
To observe the individual effects of the existing components and historical effects occurred in the past, lets perform decomposition of the ASX Price Index time series. The time series can be decomposed into are seasonal, trend and remainder components.
Three main decomposition methods for time series are known, classical decomposition, X-12-ARIMA decomposition and Loess (STL) decomposition.
To implement the classical decomposition, we use decompose() function from the stats package.
fit.classical.add <- decompose(ASXPrice.ts, type="additive")
plot(fit.classical.add)
fit.classical.mult <- decompose(ASXPrice.ts, type="multiplicative")
plot(fit.classical.mult)
Notice that a seasonal component was extracted from the series which wasn’t identified during the descriptive analysis stage. We got very similar estimates from the additive and multiplicative methods. It seems from the remainder series that the multiplicative model handles the decomposition slightly better (lesser variance).
Although using the classical decomposition, a seasonal component was identified, we do not know how significant this seasonal effect is. X-12-ARIMA decomposition can help us detect the significance of the seasonal component. We use the function x12() to implement the X-12-ARIMA decomposition.
fit.x12 = x12(ASXPrice.ts)
plot(fit.x12 , sa=TRUE , trend=TRUE)
From the plot, we can see the Seasonally Adjusted series follows the
original series very closely (especially during earlier years). This
implies, that the original series is not affected by seasonality much,
meaning seasonality is less significant in the ASX Price Index series.
On the contrary, the Trend series deviates quite a lot from the original
series (especially at the ups and downs), implying the Original series
has significant trend component.
At this point, we believe the original series has very less seasonal component. Lets confirm this using SI (Seasonal & Irregular) Chart and also check irregular component’s contribution to the original series. SI is basically the Original series minus the trend component, which leaves behind the Seasonal and Irregular component (hence the name SI).
plotSeasFac(fit.x12)
The green points represent the SIs obtained from the time series, while
the solid black line shows the seasonal component.
SI charts are useful in determining whether short-term movements are caused by seasonal or irregular influences. In the graph above, the SIs can be seen to fluctuate erratically, which indicates the time series under analysis is dominated by its irregular component.
SI charts are also used to identify seasonal breaks, moving holiday patterns and extreme values in a time series. Since the seasonal pattern changes randomly over the 12 months, no particular intervention point is identified. We did not identify any intervention points at the descriptive analysis stage as well (from the Time series plot).
In this plot, replaced SI ratios are the SI ratios with the extreme values replaced. It is also observed from SI ratios that we have outliers or influential observations for almost all the 12 months (especially Nov and Aug). Since we identify too many influential observations, it is fair to say that they are likely not outliers.
Another decomposition method, STL has advantage of flexibility over classical decomposition. stl() function is used with t.window and s.window arguments to control rate of change of trend and seasonal components. Lets set t.window to 15 and look the STL decomposed plots,
stl.ASXPrice.ts <- stl(window(ASXPrice.ts,start=c(2004,1)), t.window=15, s.window="periodic", robust=TRUE)
plot(stl.ASXPrice.ts)
In this output, the first plot is the smoothed version of the original time series. The second one shows the seasonal component. The third one is for the trend component and the last one is the time series bar chart for the remaining series.
We can display seasonal sub-series using the monthplot() function. This helps us to get a sense of the variation in the seasonal component over time.
monthplot(stl.ASXPrice.ts$time.series[,"seasonal"],choice = "seasonal", main="Seasonal sub-series plot of the
seasonal component of ASX series", ylab="Seasonal")
We get only mean levels in this plot because seasonal component repeats
itself without changing. Lowest seasonal ASX Price Index value occurs in
September and highest in April.
We can also display a monthly trend component by setting choice = “trend”.
monthplot(stl.ASXPrice.ts,choice = "trend", main="Seasonal sub-series plot of the
seasonal component of ASX series", ylab="Trend")
From this visualization, we can observe the same trend pattern for each month but with slight change in variance over the months.
After having the decomposition, we can adjust the series for seasonality by subtracting the seasonal component from the original series using the following code chunk,
par(mfrow=c(2,1))
plot(seasadj(stl.ASXPrice.ts), ylab='ASX Price Index',xlab='Time', main = "Seasonally adjusted ASX Price Index")
points(y=seasadj(stl.ASXPrice.ts),x=time(seasadj(stl.ASXPrice.ts)), pch=as.vector(season(seasadj(stl.ASXPrice.ts))))
plot(ASXPrice.ts,ylab='ASX Price Index',xlab='Time',
type='o', main="Original ASX Price Index Time Series")
points(y=ASXPrice.ts,x=time(ASXPrice.ts), pch=as.vector(season(ASXPrice.ts)))
par(mfrow=c(1,1))
No change is visually seen in the seasonally adjusted series compared to the original series. Thus, seasonal component is insignificant.
After the decomposition, we can adjust for trend as well. For this, we need to subtract the trend component from the original series. The following code chunk illustrates this approach.
par(mfrow=c(2,1))
stl.ASXPrice.ts.trend = stl.ASXPrice.ts$time.series[,"trend"] # Extract the trend component from the output
stl.ASXPrice.ts.trend.adjusted = ASXPrice.ts - stl.ASXPrice.ts.trend
plot(stl.ASXPrice.ts.trend.adjusted, ylab='ASX Price Index',xlab='Time', main = "Trend adjusted ASX Price Index")
points(y=stl.ASXPrice.ts.trend.adjusted,x=time(stl.ASXPrice.ts.trend.adjusted), pch=as.vector(season(stl.ASXPrice.ts.trend.adjusted)))
plot(ASXPrice.ts,ylab='ASX Price Index',xlab='Time',
type='o', main="Original ASX Price Index Time Series")
points(y=ASXPrice.ts,x=time(ASXPrice.ts), pch=as.vector(season(ASXPrice.ts)))
par(mfrow=c(1,1))
The Trend adjusted series is completely different from the original series, implying the trend component of the series is the most influential component of the original series.
Seasonal component is insignificant when comparing seasonally adjusted series with the original series. From the SI ratio, we found irregular component dominating the seasonal component. No significant intervention point or outliers were identified. The trend component of the series is the most influential component of the original series.
A regression method, so called, Distributed lag models (DLMs) which
uses several lags of a predictor variable in the model as explanatory
variables will be applied. Based on whether the lags are known (Finite
DLM) or undetermined (Infinite DLM), 4 major modelling methods will be
tested, namely,
The response of a finite DLM model with 1 regressor is represented as shown below,
\(Y_t = \alpha + \sum_{s=0}^{q} \beta_s
X_{t-s} + \epsilon_t\)
where,
In our dataset, we have 3 regressors, hence the model equation has X1, X2 and X3 instead of just one regressor.
Now, lets use AIC and BIC score to find the best lag length for Finite DLM model,
finiteDLMauto(formula = ASX.price ~ Gold.price + Crude.Oil..Brent._USD.bbl + Copper_USD.tonne, data = ASX, q.min = 1, q.max = 12,
model.type = "dlm", error.type = "AIC", trace = TRUE)
## q - k MASE AIC BIC GMRAE MBRAE R.Adj.Sq Ljung-Box
## 12 12 3.29042 2411.054 2534.215 3.97425 1.22784 0.17221 0
## 11 11 3.37268 2425.664 2540.068 4.08759 1.12332 0.18832 0
## 10 10 3.45015 2440.112 2545.716 4.17908 -0.09314 0.20653 0
## 9 9 3.51306 2452.691 2549.455 4.32629 0.92363 0.22986 0
## 8 8 3.54824 2464.680 2552.563 4.33153 1.13867 0.25664 0
## 7 7 3.58483 2476.180 2555.141 4.18334 1.59915 0.28265 0
## 6 6 3.61856 2488.492 2558.491 4.36746 1.11068 0.30501 0
## 5 5 3.63787 2500.567 2561.564 4.45774 0.93643 0.32883 0
## 4 4 3.67485 2513.098 2565.055 4.26210 1.02182 0.34904 0
## 3 3 3.75026 2525.161 2568.037 4.80762 0.61340 0.36951 0
## 2 2 3.80734 2538.017 2571.775 4.87590 0.68063 0.38758 0
## 1 1 3.87272 2552.754 2577.355 4.84385 0.91834 0.39832 0
q = 12 has the smallest AIC and BIC scores. Fit model with q = 12,
DLM.model = dlm(formula = ASX.price ~ Gold.price + Crude.Oil..Brent._USD.bbl + Copper_USD.tonne, data = ASX, q = 12)
Note - we are using ASX.price and not the BC.ASXPrice.ts.Orddiff (Differenced and transformed Price Index) as normality is violated in both of these.
We can apply a diagnostic check using checkresiduals() function from the forecast package.
checkresiduals(DLM.model$model$residuals) # forecast package
##
## Ljung-Box test
##
## data: Residuals
## Q* = 682.93, df = 10, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 10
In this output,
bgtest(DLM.model$model)
##
## Breusch-Godfrey test for serial correlation of order up to 1
##
## data: DLM.model$model
## LM test = 133.5, df = 1, p-value < 2.2e-16
Which has,
\(H_0\) : there is
no serial correlation of any order up to p
\(H_a\) : there is serial correlation of any
order up to p
According to this test and ACF plot, we can conclude that the serial correlation left in residuals is highly significant.
summary(DLM.model)
##
## Call:
## lm(formula = as.formula(model.formula), data = design)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1171.74 -507.71 16.21 535.37 1122.14
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.163e+03 2.546e+02 16.350 <2e-16 ***
## Gold.price.t -1.894e+00 1.310e+00 -1.445 0.151
## Gold.price.1 -2.839e-02 1.860e+00 -0.015 0.988
## Gold.price.2 -8.698e-02 1.896e+00 -0.046 0.963
## Gold.price.3 -3.677e-03 1.912e+00 -0.002 0.998
## Gold.price.4 -4.397e-01 1.893e+00 -0.232 0.817
## Gold.price.5 5.175e-01 1.888e+00 0.274 0.785
## Gold.price.6 -4.749e-01 1.899e+00 -0.250 0.803
## Gold.price.7 3.374e-01 1.911e+00 0.177 0.860
## Gold.price.8 -2.941e-01 1.893e+00 -0.155 0.877
## Gold.price.9 5.555e-02 1.874e+00 0.030 0.976
## Gold.price.10 3.781e-01 1.871e+00 0.202 0.840
## Gold.price.11 -1.477e-01 1.859e+00 -0.079 0.937
## Gold.price.12 1.753e+00 1.251e+00 1.402 0.164
## Crude.Oil..Brent._USD.bbl.t -1.546e+01 1.276e+01 -1.211 0.228
## Crude.Oil..Brent._USD.bbl.1 3.390e+00 1.931e+01 0.176 0.861
## Crude.Oil..Brent._USD.bbl.2 -5.230e+00 1.967e+01 -0.266 0.791
## Crude.Oil..Brent._USD.bbl.3 -2.215e+00 2.005e+01 -0.111 0.912
## Crude.Oil..Brent._USD.bbl.4 -1.226e+01 2.030e+01 -0.604 0.547
## Crude.Oil..Brent._USD.bbl.5 1.436e+00 2.012e+01 0.071 0.943
## Crude.Oil..Brent._USD.bbl.6 -8.810e-01 2.014e+01 -0.044 0.965
## Crude.Oil..Brent._USD.bbl.7 -4.581e+00 2.088e+01 -0.219 0.827
## Crude.Oil..Brent._USD.bbl.8 2.792e+00 2.106e+01 0.133 0.895
## Crude.Oil..Brent._USD.bbl.9 -5.310e+00 2.125e+01 -0.250 0.803
## Crude.Oil..Brent._USD.bbl.10 7.420e+00 2.120e+01 0.350 0.727
## Crude.Oil..Brent._USD.bbl.11 -3.211e+00 2.101e+01 -0.153 0.879
## Crude.Oil..Brent._USD.bbl.12 1.180e+00 1.317e+01 0.090 0.929
## Copper_USD.tonne.t 2.219e-01 1.653e-01 1.343 0.182
## Copper_USD.tonne.1 4.800e-02 2.464e-01 0.195 0.846
## Copper_USD.tonne.2 5.181e-02 2.476e-01 0.209 0.835
## Copper_USD.tonne.3 6.852e-02 2.435e-01 0.281 0.779
## Copper_USD.tonne.4 1.015e-01 2.449e-01 0.414 0.679
## Copper_USD.tonne.5 1.918e-02 2.494e-01 0.077 0.939
## Copper_USD.tonne.6 1.482e-02 2.489e-01 0.060 0.953
## Copper_USD.tonne.7 7.472e-02 2.517e-01 0.297 0.767
## Copper_USD.tonne.8 -1.335e-02 2.565e-01 -0.052 0.959
## Copper_USD.tonne.9 1.278e-02 2.562e-01 0.050 0.960
## Copper_USD.tonne.10 -1.237e-01 2.525e-01 -0.490 0.625
## Copper_USD.tonne.11 2.773e-02 2.473e-01 0.112 0.911
## Copper_USD.tonne.12 1.192e-01 1.710e-01 0.698 0.487
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 701.3 on 109 degrees of freedom
## Multiple R-squared: 0.3903, Adjusted R-squared: 0.1722
## F-statistic: 1.789 on 39 and 109 DF, p-value: 0.009903
##
## AIC and BIC values for the model:
## AIC BIC
## 1 2411.054 2534.216
The Finite DLM model is significant at 5% percent significance level with R-squared value value of 39.03% . None of the 12 lagged regressors for the 3 regressors are found to be significant. Looking at the values of coefficients, the highest lag effect is seen for the lag of 1 of Crude oil price. However, we need to consider the effect of collinearity on these results. To inspect this issue, we will display variance inflation factors (VIFs). If the value of VIF is greater than 10, we can conclude that the effect of multicollinearity is high.
VIF.DLM.model = vif(DLM.model$model) # variance inflation factors
VIF.DLM.model
## Gold.price.t Gold.price.1
## 71.11600 145.50539
## Gold.price.2 Gold.price.3
## 153.29844 158.41381
## Gold.price.4 Gold.price.5
## 157.31232 158.70864
## Gold.price.6 Gold.price.7
## 163.17932 167.18117
## Gold.price.8 Gold.price.9
## 165.50486 162.99642
## Gold.price.10 Gold.price.11
## 162.90794 160.54574
## Gold.price.12 Crude.Oil..Brent._USD.bbl.t
## 72.66813 39.35489
## Crude.Oil..Brent._USD.bbl.1 Crude.Oil..Brent._USD.bbl.2
## 91.19183 96.06632
## Crude.Oil..Brent._USD.bbl.3 Crude.Oil..Brent._USD.bbl.4
## 101.07363 105.33100
## Crude.Oil..Brent._USD.bbl.5 Crude.Oil..Brent._USD.bbl.6
## 104.83777 106.58631
## Crude.Oil..Brent._USD.bbl.7 Crude.Oil..Brent._USD.bbl.8
## 115.81473 119.52918
## Crude.Oil..Brent._USD.bbl.9 Crude.Oil..Brent._USD.bbl.10
## 123.26762 123.78057
## Crude.Oil..Brent._USD.bbl.11 Crude.Oil..Brent._USD.bbl.12
## 122.39788 48.51511
## Copper_USD.tonne.t Copper_USD.tonne.1
## 27.58041 63.35766
## Copper_USD.tonne.2 Copper_USD.tonne.3
## 66.17587 66.21162
## Copper_USD.tonne.4 Copper_USD.tonne.5
## 69.38872 74.44770
## Copper_USD.tonne.6 Copper_USD.tonne.7
## 76.67356 80.91336
## Copper_USD.tonne.8 Copper_USD.tonne.9
## 86.46023 88.70268
## Copper_USD.tonne.10 Copper_USD.tonne.11
## 88.48727 87.08307
## Copper_USD.tonne.12
## 42.65771
From the VIF values, it is obvious that the estimates of the finite DLM coefficients are suffering from the multicollinearity. To deal with this issue, we can use the restricted least squares method to find parameter estimates. In this approach, some restrictions are placed on the model parameters to reduce the variances of the estimators. In the context of DLMs, we translate the pattern of time effects into the restrictions on parameters. Polynomial DLM model can fix effect of multicollinearity.
If the lag weights follow a smooth polynomial pattern, then a polynomial DLM model will reduce/remove the multicollinearity. Lets fit a polynomial DLM of order 2 and check if the polynomial component (order 2) reduces multicollinearity. Lets do this for each of the 3 regressors individually.
For Gold price regressor:
PolyDLM.goldprice = polyDlm(x = as.vector(GoldPrice.ts), y = as.vector(ASXPrice.ts), q = 12, k = 2, show.beta = FALSE)
summary(PolyDLM.goldprice)
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -1546.79 -526.50 13.42 491.35 1894.95
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.673e+03 2.205e+02 21.195 <2e-16 ***
## z.t0 -4.054e-01 2.896e-01 -1.400 0.164
## z.t1 1.517e-01 1.356e-01 1.119 0.265
## z.t2 -9.683e-03 1.111e-02 -0.871 0.385
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 761.8 on 145 degrees of freedom
## Multiple R-squared: 0.04315, Adjusted R-squared: 0.02336
## F-statistic: 2.18 on 3 and 145 DF, p-value: 0.09295
Polynomial DLM model with gold price as regressor variable is insignificant at 5% significance level.
For Crude oil price regressor:
PolyDLM.crudeoilprice = polyDlm(x = as.vector(CrudeOilPrice.ts), y = as.vector(ASXPrice.ts), q = 12, k = 2, show.beta = FALSE)
summary(PolyDLM.crudeoilprice)
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -1620.88 -552.51 44.28 526.00 1678.23
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4819.26930 194.08408 24.831 < 2e-16 ***
## z.t0 4.93313 1.78495 2.764 0.00646 **
## z.t1 -2.11687 0.84765 -2.497 0.01363 *
## z.t2 0.15794 0.06961 2.269 0.02474 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 758.4 on 145 degrees of freedom
## Multiple R-squared: 0.05166, Adjusted R-squared: 0.03204
## F-statistic: 2.633 on 3 and 145 DF, p-value: 0.05221
Polynomial DLM model with crude oil price as regressor variable is insignificant at 5% significance level.
For Copper price regressor:
PolyDLM.copperprice = polyDlm(x = as.vector(CopperPrice.ts), y = as.vector(ASXPrice.ts), q = 12, k = 2, show.beta = FALSE)
summary(PolyDLM.copperprice)
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -1213.59 -610.66 26.99 585.36 1512.44
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.136e+03 2.163e+02 19.124 < 2e-16 ***
## z.t0 9.592e-02 2.532e-02 3.788 0.000222 ***
## z.t1 -2.872e-02 1.190e-02 -2.414 0.017022 *
## z.t2 1.723e-03 9.736e-04 1.770 0.078906 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 710.9 on 145 degrees of freedom
## Multiple R-squared: 0.1666, Adjusted R-squared: 0.1494
## F-statistic: 9.664 on 3 and 145 DF, p-value: 7.422e-06
Polynomial DLM model with copper price as regressor variable is significant at 5% significance level. Notice that the standard errors of estimators are much less (to the powers of -2 for polynomial DLM vs power of -1 for ordinary finite DLM) than their unconstrained (Finite DLM) counterparts. But the R-squared value is much less (16.66% for polynomial DLM vs 39.02% for ordinary DLM). The 0th and 1st order regressors of copper price variable are significant, but the 2nd order regressor (z.t2) is insignificant. This implies, the polynomial component of the regressor variable in the model is redundant and not necessary. Lets check if multicollinearity is fixed,
VIF.PolyDLM.copperprice = vif(PolyDLM.copperprice$model) # variance inflation factors
VIF.PolyDLM.copperprice
## z.t0 z.t1 z.t2
## 106.2733 963.4178 477.2208
Multicollinearity is still significant.
checkresiduals(PolyDLM.copperprice$model$residuals)
##
## Ljung-Box test
##
## data: Residuals
## Q* = 789.7, df = 10, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 10
In this output,
bgtest(PolyDLM.copperprice$model)
##
## Breusch-Godfrey test for serial correlation of order up to 1
##
## data: PolyDLM.copperprice$model
## LM test = 134.95, df = 1, p-value < 2.2e-16
Which has,
\(H_0\) : there is
no serial correlation of any order up to p
\(H_a\) : there is serial correlation of any
order up to p
According to this test and ACF plot, we can conclude that the serial correlation left in residuals is highly significant.
Thus, because of insignificant order 2 component (z.t2), lower R-squared statistic and no improvement in multicollinearity, and violations in the test of assumptions (residual analysis), Polynomial DLM model is not a good fitting model.
Here the lag weights are positive and decline geometrically. This
model is called infinite geometric DLM, meaning there are infinite lag
weights. Koyck transformation is applied to implement this infinite
geometric DLM model by subtracting the first lag of geometric DLM
multiplied by \(\phi\). The Koyck
transformed model is represented as,
\(Y_t = \delta_1 + \delta_2Y_{t-1} +
\nu_t\)
where \(\delta_1 = \alpha(1-\phi), \delta_2
= \phi, \delta_3 = \beta\) and the random error after the
transformation is \(\nu_t = (\epsilon_t
-\phi\epsilon_{t-1})\).
The koyckDlm() function is used to implement a two-staged least squares method to first estimate the \(\hat{Y}_{t-1}\) and the estimate \(Y_{t}\) through simple linear regression. Lets deduce Koyck geometric GLM models for each of the 3 regressors individually.
Koyck.GoldPrice = koyckDlm(x = as.vector(ASX$Gold.price) , y = as.vector(ASX$ASX.price) )
summary(Koyck.GoldPrice$model, diagnostics = TRUE)
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -682.19 -105.44 15.86 135.04 783.60
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.902e+02 8.958e+01 2.123 0.0353 *
## Y.1 9.635e-01 1.909e-02 50.469 <2e-16 ***
## X.t 2.595e-03 4.304e-02 0.060 0.9520
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 1 157 8006.64 < 2e-16 ***
## Wu-Hausman 1 156 18.07 3.66e-05 ***
## Sargan 0 NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 201.4 on 157 degrees of freedom
## Multiple R-Squared: 0.9488, Adjusted R-squared: 0.9481
## Wald test: 1454 on 2 and 157 DF, p-value: < 2.2e-16
The Koyck transformed geometric DLM model with predictor, Gold price is significant. From the Weak Instruments line, we conclude that the model at the first stage of the least-squares fitting is significant at 5% level of significance. In this model, both \(\delta_1\) and \(\delta_2\) are significant, but \(\delta_3\) is insignificant at 5% level. Meaning ASX Price index is significantly dependent on Last years ASX price index and is not significantly dependent on the current Gold Price. Using the coefficients, \(\hat{\phi}\)=0.9635, the ASX Price Index will decline quickly at the rate of 0.9635 of last years ASX price Index.
From the Wu-Hausman test result in the model output, we reject the null hypothesis that the correlation between explanatory variable (\(Y_{t-1}\)) and the error term is zero (There is no endogeneity) at 5% level. So, there is a significant correlation between the explanatory variable and the error term at 5% level.
checkresiduals(Koyck.GoldPrice$model)
##
## Ljung-Box test
##
## data: Residuals
## Q* = 7.3793, df = 10, p-value = 0.6892
##
## Model df: 0. Total lags used: 10
From the Breusch-Godfrey test and ACF plot, serial correlation left in residuals is not significant. Also, from the time series plot and histogram of residuals, normality and linearity is not violated. Thus, the general assumptions hold. Thus no assumptions are violated.
Koyck.CrudeOilPrice = koyckDlm(x = as.vector(ASX$Crude.Oil..Brent._USD.bbl) , y = as.vector(ASX$ASX.price) )
summary(Koyck.CrudeOilPrice$model, diagnostics = TRUE)
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -683.91 -108.66 13.68 139.77 762.55
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 209.89536 87.89368 2.388 0.0181 *
## Y.1 0.97537 0.01905 51.193 <2e-16 ***
## X.t -0.99907 0.58045 -1.721 0.0872 .
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 1 157 3018.5 < 2e-16 ***
## Wu-Hausman 1 156 11.2 0.00103 **
## Sargan 0 NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 201.1 on 157 degrees of freedom
## Multiple R-Squared: 0.949, Adjusted R-squared: 0.9483
## Wald test: 1461 on 2 and 157 DF, p-value: < 2.2e-16
The results of the Koyck geometric DLM with predictor, Crude oil are completely identical to that of Gold price koyck DLM. Here too ASX Price index is significantly dependent on Last years ASX price index and is not significantly dependent on the current Crude Oil Price.
checkresiduals(Koyck.CrudeOilPrice$model)
##
## Ljung-Box test
##
## data: Residuals
## Q* = 5.4432, df = 10, p-value = 0.8597
##
## Model df: 0. Total lags used: 10
Again, no assumptions are violated. Serial correlations are not significant as per BG test and ACF plot. Also, Normality and Linearity are not violated.
Koyck.CopperPrice = koyckDlm(x = as.vector(ASX$Copper_USD.tonne) , y = as.vector(ASX$ASX.price) )
summary(Koyck.CopperPrice$model, diagnostics = TRUE)
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -689.64 -108.62 12.78 140.20 771.79
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 189.368812 87.644648 2.161 0.0322 *
## Y.1 0.971621 0.021895 44.376 <2e-16 ***
## X.t -0.005864 0.009517 -0.616 0.5387
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 1 157 1966.87 < 2e-16 ***
## Wu-Hausman 1 156 10.97 0.00115 **
## Sargan 0 NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 201.9 on 157 degrees of freedom
## Multiple R-Squared: 0.9485, Adjusted R-squared: 0.9479
## Wald test: 1448 on 2 and 157 DF, p-value: < 2.2e-16
Again same results. ASX Price index is significantly dependent on Last years ASX price index and is not significantly dependent on the current Copper Price.
checkresiduals(Koyck.CopperPrice$model)
##
## Ljung-Box test
##
## data: Residuals
## Q* = 6.2327, df = 10, p-value = 0.7953
##
## Model df: 0. Total lags used: 10
Again, no assumptions are violated. Serial correlations are not significant as per BG test and ACF plot. Also, Normality and Linearity are not violated.
vif(Koyck.GoldPrice$model) # variance inflation factors
## Y.1 X.t
## 1.140648 1.140648
vif(Koyck.CrudeOilPrice$model) # variance inflation factors
## Y.1 X.t
## 1.14038 1.14038
vif(Koyck.CopperPrice$model) # variance inflation factors
## Y.1 X.t
## 1.493966 1.493966
For all 3 Koyck models, multicollinearity is insignificant.
All 3 koyck models with individual regressors are significant. Response, ASX Price index is not significantly dependent on the individual regressor variable, rather, ASX Price index is significantly dependent on Last years ASX price index. Multicollinearity is insignificant. None of the test of assumptions are violated. Each of the 3 koyck are good fitting models with high R-squared statistics (>90%).
In both Polynomial and Koyck models, the response ASX Price Index
cannot be represented using all 3 predictor variables at once (keeping
the ). Only one predictor can be chosen at once. Autoregressive
Distributed lag model is a flexible and parsimonious infinite DLM which
can incorporate all 3 predictors at once. The model is represented as,
\(Y_t = \mu + \beta_0 X_t + \beta_1 X_{t-1}
+ \gamma_1 Y_{t-1} + e_t\)
Similar to the Koyck DLM, it is possible to write this model as an infinite DLM with infinite lag distribution of any shape rather than a polynomial or geometric shape. The model is denoted as ARDL(p,q). To fit the model we will use ardlDlm() function is used. Lets find the best lag length using AIC and BIC score through an iteration. Lets set max lag length to 12.
## Code gist to find the best ARDL(p,q) model as per AIC and BIC scores.
# First create an empty df. Iterate over 144 ARDL (since max lag for response and predictor of ARDL model is 12, i.e, p = q = 12 at max).
# Save the model's AIC and BIC scores through iteration and display the model with best AIC and BIC scores.
df = data.frame(matrix(
vector(), 0, 4, dimnames=list(c(), c("p","q","AIC","BIC"))),
stringsAsFactors=F) # create empty dataframe
for(i in 1:12){
for(j in 1:12){
model4.1 = ardlDlm(formula = ASX.price ~ Gold.price + Crude.Oil..Brent._USD.bbl + Copper_USD.tonne, data = ASX, p = i, q = j)
new <- data.frame(i, j, AIC(model4.1$model), BIC(model4.1$model))
df[nrow(df) + 1, ] <- new
}
} # Iterate and save in df
head(df[order( df[,3] ),],1) # Best model as per AIC
## p q AIC BIC
## 136 12 4 2006.823 2142.001
head(df[order( df[,4] ),],1) # Best model as per BIC
## p q AIC BIC
## 12 1 12 2007.304 2067.382
ARDL(12,4) and ARDL(1,12) are the best models as per AIC and BIC scores respectively. Now, lets fit these 2 models,
ARDL.12x4.ASXPrice = ardlDlm(formula = ASX.price ~ Gold.price + Crude.Oil..Brent._USD.bbl + Copper_USD.tonne, data = ASX, p = 12, q = 4)
summary(ARDL.12x4.ASXPrice)
##
## Time series regression with "ts" data:
## Start = 13, End = 161
##
## Call:
## dynlm(formula = as.formula(model.text), data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -432.64 -103.68 -16.67 109.37 552.08
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 260.237963 118.616216 2.194 0.03044 *
## Gold.price.t -1.116842 0.338215 -3.302 0.00131 **
## Gold.price.1 0.836800 0.482073 1.736 0.08553 .
## Gold.price.2 0.485515 0.498837 0.973 0.33264
## Gold.price.3 -0.243140 0.499190 -0.487 0.62722
## Gold.price.4 -0.720131 0.493132 -1.460 0.14719
## Gold.price.5 1.155515 0.486102 2.377 0.01926 *
## Gold.price.6 -1.032707 0.499816 -2.066 0.04127 *
## Gold.price.7 1.181472 0.508301 2.324 0.02203 *
## Gold.price.8 -1.198228 0.506886 -2.364 0.01992 *
## Gold.price.9 0.521266 0.492774 1.058 0.29257
## Gold.price.10 -0.016666 0.485648 -0.034 0.97269
## Gold.price.11 0.100623 0.479856 0.210 0.83431
## Gold.price.12 0.033926 0.323335 0.105 0.91664
## Crude.Oil..Brent._USD.bbl.t 2.091690 3.408196 0.614 0.54073
## Crude.Oil..Brent._USD.bbl.1 -3.451248 5.106748 -0.676 0.50064
## Crude.Oil..Brent._USD.bbl.2 -5.683870 5.204351 -1.092 0.27727
## Crude.Oil..Brent._USD.bbl.3 4.563400 5.242218 0.871 0.38601
## Crude.Oil..Brent._USD.bbl.4 -9.525602 5.228455 -1.822 0.07132 .
## Crude.Oil..Brent._USD.bbl.5 13.481929 5.277609 2.555 0.01207 *
## Crude.Oil..Brent._USD.bbl.6 -2.387740 5.399458 -0.442 0.65924
## Crude.Oil..Brent._USD.bbl.7 -4.622657 5.575113 -0.829 0.40890
## Crude.Oil..Brent._USD.bbl.8 9.051488 5.592799 1.618 0.10857
## Crude.Oil..Brent._USD.bbl.9 -9.676900 5.545055 -1.745 0.08389 .
## Crude.Oil..Brent._USD.bbl.10 10.542622 5.589598 1.886 0.06204 .
## Crude.Oil..Brent._USD.bbl.11 -8.833586 5.565585 -1.587 0.11548
## Crude.Oil..Brent._USD.bbl.12 3.022640 3.436238 0.880 0.38106
## Copper_USD.tonne.t 0.082274 0.042636 1.930 0.05634 .
## Copper_USD.tonne.1 -0.037119 0.063789 -0.582 0.56188
## Copper_USD.tonne.2 0.067839 0.063736 1.064 0.28960
## Copper_USD.tonne.3 -0.083801 0.062908 -1.332 0.18571
## Copper_USD.tonne.4 0.056879 0.063209 0.900 0.37026
## Copper_USD.tonne.5 -0.081410 0.064432 -1.264 0.20921
## Copper_USD.tonne.6 0.034045 0.064376 0.529 0.59803
## Copper_USD.tonne.7 0.041368 0.064737 0.639 0.52421
## Copper_USD.tonne.8 -0.107062 0.066065 -1.621 0.10811
## Copper_USD.tonne.9 0.002464 0.066860 0.037 0.97067
## Copper_USD.tonne.10 -0.032143 0.065428 -0.491 0.62426
## Copper_USD.tonne.11 0.122397 0.064058 1.911 0.05877 .
## Copper_USD.tonne.12 -0.046667 0.044246 -1.055 0.29397
## ASX.price.1 0.836936 0.100927 8.292 3.98e-13 ***
## ASX.price.2 0.217088 0.132384 1.640 0.10403
## ASX.price.3 0.049227 0.129576 0.380 0.70478
## ASX.price.4 -0.148738 0.098447 -1.511 0.13383
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 179.2 on 105 degrees of freedom
## Multiple R-squared: 0.9617, Adjusted R-squared: 0.946
## F-statistic: 61.26 on 43 and 105 DF, p-value: < 2.2e-16
The Autoregressive Distributed Lag Model with 12 lags of independent variable (p=12) and 4 lags of dependent variable (q=4) is found to be significant at 5% significance level. It has high R-squared value of 96.17% indicating very good fit although many coefficients are insignificant in this model.
checkresiduals(ARDL.12x4.ASXPrice$model)
##
## Breusch-Godfrey test for serial correlation of order up to 47
##
## data: Residuals
## LM test = 57.545, df = 47, p-value = 0.1394
From the Breusch-Godfrey test and ACF plot, serial correlation left in residuals is not significant. Also, from the time series plot and histogram of residuals, normality and linearity is not violated. Thus, the general assumptions hold and no assumptions are violated.
ARDL.1x12.ASXPrice = ardlDlm(formula = ASX.price ~ Gold.price + Crude.Oil..Brent._USD.bbl + Copper_USD.tonne, data = ASX, p = 1, q = 12)
summary(ARDL.1x12.ASXPrice)
##
## Time series regression with "ts" data:
## Start = 13, End = 161
##
## Call:
## dynlm(formula = as.formula(model.text), data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -528.25 -115.49 -7.69 117.76 657.83
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 359.563212 116.676097 3.082 0.002513 **
## Gold.price.t -1.239259 0.312968 -3.960 0.000123 ***
## Gold.price.1 1.254689 0.309800 4.050 8.75e-05 ***
## Crude.Oil..Brent._USD.bbl.t 2.042785 3.256102 0.627 0.531516
## Crude.Oil..Brent._USD.bbl.1 -4.662725 3.206261 -1.454 0.148285
## Copper_USD.tonne.t 0.071622 0.040401 1.773 0.078611 .
## Copper_USD.tonne.1 -0.040138 0.041172 -0.975 0.331429
## ASX.price.1 0.864563 0.085725 10.085 < 2e-16 ***
## ASX.price.2 0.216545 0.116093 1.865 0.064398 .
## ASX.price.3 -0.080201 0.112800 -0.711 0.478359
## ASX.price.4 -0.072744 0.112351 -0.647 0.518469
## ASX.price.5 0.031715 0.113283 0.280 0.779954
## ASX.price.6 0.011837 0.114467 0.103 0.917794
## ASX.price.7 -0.001875 0.113392 -0.017 0.986834
## ASX.price.8 0.022526 0.114748 0.196 0.844675
## ASX.price.9 -0.080082 0.113842 -0.703 0.483035
## ASX.price.10 0.010614 0.112841 0.094 0.925206
## ASX.price.11 0.036039 0.113442 0.318 0.751231
## ASX.price.12 -0.031336 0.079910 -0.392 0.695599
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 190.7 on 130 degrees of freedom
## Multiple R-squared: 0.9462, Adjusted R-squared: 0.9388
## F-statistic: 127.1 on 18 and 130 DF, p-value: < 2.2e-16
The Autoregressive Distributed Lag Model with 1 lags of independent variable (p=1) and 12 lags of dependent variable (q=12) is found to be significant at 5% significance level. It has R-squared value of 94.62% which is lesser compared to ARDL(12,4) model.
checkresiduals(ARDL.1x12.ASXPrice$model)
##
## Breusch-Godfrey test for serial correlation of order up to 22
##
## data: Residuals
## LM test = 20.881, df = 22, p-value = 0.5281
Results of residual analysis for ARDL(1,12) are same as ARDL(12,4). No assumptions are violated.
For ARDL(12,4),
vif(ARDL.12x4.ASXPrice$model)
## Gold.price L(Gold.price, 1)
## 72.61221 149.74710
## L(Gold.price, 2) L(Gold.price, 3)
## 162.62798 165.42749
## L(Gold.price, 4) L(Gold.price, 5)
## 163.58482 161.21089
## L(Gold.price, 6) L(Gold.price, 7)
## 173.12932 181.15457
## L(Gold.price, 8) L(Gold.price, 9)
## 181.80426 172.71411
## L(Gold.price, 10) L(Gold.price, 11)
## 168.14553 163.92087
## L(Gold.price, 12) Crude.Oil..Brent._USD.bbl
## 74.42449 42.98759
## L(Crude.Oil..Brent._USD.bbl, 1) L(Crude.Oil..Brent._USD.bbl, 2)
## 97.76707 103.01963
## L(Crude.Oil..Brent._USD.bbl, 3) L(Crude.Oil..Brent._USD.bbl, 4)
## 105.90665 107.06259
## L(Crude.Oil..Brent._USD.bbl, 5) L(Crude.Oil..Brent._USD.bbl, 6)
## 110.57013 117.37147
## L(Crude.Oil..Brent._USD.bbl, 7) L(Crude.Oil..Brent._USD.bbl, 8)
## 126.56606 129.15452
## L(Crude.Oil..Brent._USD.bbl, 9) L(Crude.Oil..Brent._USD.bbl, 10)
## 128.59271 131.82120
## L(Crude.Oil..Brent._USD.bbl, 11) L(Crude.Oil..Brent._USD.bbl, 12)
## 131.57342 50.60618
## Copper_USD.tonne L(Copper_USD.tonne, 1)
## 28.12775 65.04369
## L(Copper_USD.tonne, 2) L(Copper_USD.tonne, 3)
## 67.16237 67.73033
## L(Copper_USD.tonne, 4) L(Copper_USD.tonne, 5)
## 70.83413 76.14236
## L(Copper_USD.tonne, 6) L(Copper_USD.tonne, 7)
## 78.55734 81.99171
## L(Copper_USD.tonne, 8) L(Copper_USD.tonne, 9)
## 87.85832 92.56974
## L(Copper_USD.tonne, 10) L(Copper_USD.tonne, 11)
## 91.03595 89.54301
## L(Copper_USD.tonne, 12) L(ASX.price, 1)
## 43.78156 28.54591
## L(ASX.price, 2) L(ASX.price, 3)
## 50.18514 48.94920
## L(ASX.price, 4)
## 28.91733
For ARDL(1,12),
vif(ARDL.1x12.ASXPrice$model)
## Gold.price L(Gold.price, 1)
## 54.85837 54.56510
## Crude.Oil..Brent._USD.bbl L(Crude.Oil..Brent._USD.bbl, 1)
## 34.61848 34.00318
## Copper_USD.tonne L(Copper_USD.tonne, 1)
## 22.28432 23.90739
## L(ASX.price, 1) L(ASX.price, 2)
## 18.17035 34.05160
## L(ASX.price, 3) L(ASX.price, 4)
## 32.72897 33.22967
## L(ASX.price, 5) L(ASX.price, 6)
## 34.54566 36.11062
## L(ASX.price, 7) L(ASX.price, 8)
## 36.48050 38.47006
## L(ASX.price, 9) L(ASX.price, 10)
## 38.88476 39.33336
## L(ASX.price, 11) L(ASX.price, 12)
## 40.89081 20.81649
For both the ARDL models, multicollinearity is significant.
Best model as per AIC is ARDL(12,4) and as per BIC is ARDL(1,12). Both are significant. ARDL(12,4) has better fit as per R-squared statistic (96.17% vs 94.62%). Multicollinearity is significant for both ARDL(12,4) and ARDL(1,12). Diagnostic check for the model suggests presence of no serial correlation and no violation of general assumptions for both the models.
Lets summarize the 4 models,
Fit Finite DLM:
Polynomial DLM:
Koyck transformed geometric DLM:
Autoregressive DLM:
Lets compare all the models generated based on AIC and BIC scores. Also lets comp
Akaike’s (1973) Information Criterion (AIC) and Bayesian Information Criterion (BIC) are useful model specification and selection criterion.
AICBIC = data.frame(
model = c("Finite DLM model with all 3 predictors", "Polynomial DLM with gold price as predictor", "Polynomial DLM with crude oil price as predictor", "Polynomial DLM with copper price as predictor", "Koyck DLM with gold price as predictor", "Koyck DLM with crude oil price as predictor", "Koyck DLM with copper price as predictor", "ARDL(1,12) model", "ARDL(12,4) model"),
AIC = c(AIC(DLM.model), AIC(PolyDLM.goldprice), AIC(PolyDLM.crudeoilprice), AIC(PolyDLM.copperprice), AIC(Koyck.GoldPrice), AIC(Koyck.CrudeOilPrice), AIC(Koyck.CopperPrice), AIC(ARDL.1x12.ASXPrice), AIC(ARDL.12x4.ASXPrice)),
BIC = c(BIC(DLM.model), BIC(PolyDLM.goldprice), BIC(PolyDLM.crudeoilprice), BIC(PolyDLM.copperprice), BIC(Koyck.GoldPrice), BIC(Koyck.CrudeOilPrice), BIC(Koyck.CopperPrice), BIC(ARDL.1x12.ASXPrice), AIC(ARDL.12x4.ASXPrice))
)
## [1] 2411.054
## [1] 2406.216
## [1] 2404.884
## [1] 2385.629
## [1] 2156.757
## [1] 2156.172
## [1] 2157.456
## [1] 2007.304
## [1] 2006.823
## [1] 2534.216
## [1] 2421.236
## [1] 2419.904
## [1] 2400.649
## [1] 2169.058
## [1] 2168.473
## [1] 2169.757
## [1] 2067.382
## [1] 2006.823
AICBIC[order(AICBIC[,2], AICBIC[,2]),]
## model AIC BIC
## 9 ARDL(12,4) model 2006.823 2006.823
## 8 ARDL(1,12) model 2007.304 2067.382
## 6 Koyck DLM with crude oil price as predictor 2156.172 2168.473
## 5 Koyck DLM with gold price as predictor 2156.757 2169.058
## 7 Koyck DLM with copper price as predictor 2157.456 2169.757
## 4 Polynomial DLM with copper price as predictor 2385.629 2400.649
## 3 Polynomial DLM with crude oil price as predictor 2404.884 2419.904
## 2 Polynomial DLM with gold price as predictor 2406.216 2421.236
## 1 Finite DLM model with all 3 predictors 2411.054 2534.216
Conclusion from AIC and BIC scores:
ADRL(12,4) is the best fitting model with lowest AIC and BIC scores of all models.
Another way to test the fit of a model is by checking the error
estimates. Lower the error estimates, better the models fit. Lets
perform error estimation for all the fitted models and compare the
results. This is done using the gof(), goodness of fit function which
computes,
GoF(DLM.model, PolyDLM.goldprice, PolyDLM.crudeoilprice, PolyDLM.copperprice, Koyck.GoldPrice, Koyck.CrudeOilPrice, Koyck.CopperPrice, ARDL.1x12.ASXPrice, ARDL.12x4.ASXPrice)
## n MAE MPE MAPE sMAPE MASE
## DLM.model 149 531.4470 -0.016121296 0.11176072 0.10943918 3.2904182
## PolyDLM.goldprice 149 613.3683 -0.025090357 0.13083066 0.12652744 3.7976286
## PolyDLM.crudeoilprice 149 621.5757 -0.024890659 0.13243151 0.12789303 3.8484442
## PolyDLM.copperprice 149 627.1317 -0.021388636 0.13125574 0.12842174 3.8828433
## Koyck.GoldPrice 160 150.4315 -0.001880814 0.03138333 0.03119260 0.9691180
## Koyck.CrudeOilPrice 160 151.5705 -0.001997884 0.03175681 0.03157271 0.9764557
## Koyck.CopperPrice 160 151.2266 -0.001941328 0.03158104 0.03138867 0.9742402
## ARDL.1x12.ASXPrice 149 139.6163 -0.001397628 0.02821228 0.02820369 0.8644246
## ARDL.12x4.ASXPrice 149 119.3358 -0.001023640 0.02447001 0.02448218 0.7388595
## MSE MRAE GMRAE MBRAE UMBRAE
## DLM.model 359818.80 24.425863 3.9742464 1.2278371 -5.3891007
## PolyDLM.goldprice 564733.42 21.788791 4.1978095 1.5066470 -2.9737607
## PolyDLM.crudeoilprice 559709.43 22.665212 4.3265877 1.0682567 -15.6505852
## PolyDLM.copperprice 491855.47 25.511671 5.0280616 1.5012496 -2.9950139
## Koyck.GoldPrice 39809.28 1.471603 0.9743760 0.4756390 0.9070833
## Koyck.CrudeOilPrice 39663.94 1.658135 0.9730326 0.5464510 1.2048335
## Koyck.CopperPrice 39983.58 1.486736 0.9775711 0.4696419 0.8855186
## ARDL.1x12.ASXPrice 31745.37 2.764232 0.9125113 0.9418756 16.2044693
## ARDL.12x4.ASXPrice 22622.70 3.218118 0.7892615 0.4195087 0.7226787
Conclusion from Error Estimation:
From error estimates table, all the 9 error estimates are lowest for the ADRL(12,4) model.
Residual analysis to test model assumptions.
Lets perform a detailed Residual Analysis to check if any model assumptions have been violated.
The estimator error (or residual) is defined by:
\(\hat{\epsilon_i}\) = \(Y_i\) - \(\hat{Y_i}\) (i.e. observed value less - trend value)
The following problems are to be checked,
Lets first apply diagnostic check using checkresiduals() function,
checkresiduals(ARDL.12x4.ASXPrice)
## Time Series:
## Start = 13
## End = 161
## Frequency = 1
## 13 14 15 16 17 18
## -210.3182789 -61.9471949 -112.5717250 -169.2609508 -42.7950870 -53.6276645
## 19 20 21 22 23 24
## 18.5015408 -146.3937766 114.8750503 17.5999960 131.6652188 204.0152710
## 25 26 27 28 29 30
## 552.0808621 70.1891709 -7.1444563 22.4894273 -175.8151702 37.7195206
## 31 32 33 34 35 36
## -60.1750716 140.3881786 -78.8805690 -51.1927740 -62.7706162 109.3735936
## 37 38 39 40 41 42
## 161.3517407 180.1245484 -18.0030434 61.9717481 294.6838400 57.1777430
## 43 44 45 46 47 48
## -70.6153451 -18.6198705 329.4644025 98.7740259 -124.3349186 -13.2097315
## 49 50 51 52 53 54
## -279.3139638 -39.4517736 -123.3145062 109.8835875 154.0309115 -304.2952014
## 55 56 57 58 59 60
## -87.3626050 220.2080254 -251.7105126 -183.7704695 -97.0812452 132.9816662
## 61 62 63 64 65 66
## -107.8591475 -120.1664832 -103.2956065 66.5110330 -108.1829293 127.6420477
## 67 68 69 70 71 72
## -74.6583973 13.1559107 76.8281247 -99.7463808 -124.7711574 155.6336961
## 73 74 75 76 77 78
## -432.6416462 -103.6800076 193.0200314 -188.6093659 -51.8036246 -150.9054447
## 79 80 81 82 83 84
## 75.3015126 12.5373665 138.0748404 -15.5538633 -203.3735412 -28.5073578
## 85 86 87 88 89 90
## -64.7526774 -120.6244520 -75.8245248 -61.2446281 55.0634077 32.3995289
## 91 92 93 94 95 96
## -65.8056196 163.2278312 -182.1912604 293.2943311 83.5324385 6.6131178
## 97 98 99 100 101 102
## -15.7521939 68.5512165 19.9295812 59.9243097 -121.1739470 -123.1585728
## 103 104 105 106 107 108
## -0.3467265 -77.9715030 63.9441852 163.6910060 -27.6319352 136.6313095
## 109 110 111 112 113 114
## 118.4868516 200.1345196 -193.7573161 -36.9144283 -168.0751639 159.0209419
## 115 116 117 118 119 120
## -302.6134082 289.4281745 192.6248619 116.2707536 -45.7039086 -72.4857072
## 121 122 123 124 125 126
## -197.0383696 271.0947063 -12.1976707 86.5223719 -5.0940747 -23.9782735
## 127 128 129 130 131 132
## 265.6977566 14.4814480 -288.3300434 160.7737713 -148.9056635 23.3065746
## 133 134 135 136 137 138
## 272.7917007 208.5641276 -191.1740825 -154.1661892 -187.7797825 -78.1116039
## 139 140 141 142 143 144
## 94.9744259 -175.6984961 -105.7749890 160.6106992 -106.8145852 -31.2532715
## 145 146 147 148 149 150
## -40.1566425 -57.7674021 -34.4500192 -7.0885487 241.3572934 49.4967086
## 151 152 153 154 155 156
## 331.2859201 -126.4277838 -16.6735741 -248.1037869 118.1521024 70.9457614
## 157 158 159 160 161
## -27.3683241 -34.3376048 117.0365707 102.4020674 -78.0667753
##
## Ljung-Box test
##
## data: Residuals
## Q* = 2.9651, df = 10, p-value = 0.9822
##
## Model df: 0. Total lags used: 10
From the Residuals plot, linearity is not violated as the residuals are randomly distributed accross the mean. Thus, linearity in distribution of error terms is not violated
To test mean value of residuals is zero or not, lets calculate mean value of residuals as,
mean(ARDL.12x4.ASXPrice$model$residuals)
## [1] -2.092938e-15
As mean value of residuals is very very close to 0, zero mean residuals is not violated
bgtest(ARDL.12x4.ASXPrice$model)
##
## Breusch-Godfrey test for serial correlation of order up to 1
##
## data: ARDL.12x4.ASXPrice$model
## LM test = 1.5879, df = 1, p-value = 0.2076
Which has,
\(H_0\) : there is
no serial correlation of any order up to p
\(H_a\) : there is serial correlation of any
order up to p
According to this test and ACF plot, we can conclude that the serial correlation left in residuals is insignificant.
\(H_0\) : Time series is Normally
distributed
\(H_a\) : Time
series is not normal
shapiro.test(ARDL.12x4.ASXPrice$model$residuals)
##
## Shapiro-Wilk normality test
##
## data: ARDL.12x4.ASXPrice$model$residuals
## W = 0.98856, p-value = 0.2628
From the Shapiro-Wilk test, since p > 0.05 significance level, we cannot reject the null hypothesis that states the data is normal. Thus, residuals of ARDL(12,4) are normally distributed.
Summarizing residual analysis on \(full\) model:
Assumption 1: The error terms are randomly distributed and thus show
linearity: Not violated
Assumption 2:
The mean value of E is zero (zero mean residuals): Not
violated
Assumption 4: The error terms are
independently distributed, i.e. they are not autocorrelated:
Not violated
Assumption 5: The errors
are normally distributed. not Violated
The best fitting model we could achieved for monthly averages of ASX All Ordinaries (Ords) Price Index using the regressors, Gold price (AUD), Crude Oil (Brent, USD/bbl) and Copper (USD/tonne) is ADRL(12,4) model. This is backed by AIC, BIC and error estimates of Autoregressive Distributed Lag ADRL(12,4) model. This model has R-squared of 96.17% which is the highest of all models. Although transformation and 1st order ordinary differencing improved stationarity, normality could not be achieved. Hence the raw (without any transformation) response, ASX Price Index is used for modelling. Residual analysis of ARDL(12,4) shows no serial correlation and no assumptions are violated. This model has significant multicollinearity.