The aim of the investigation is to perform the analysis of the Australian Securities Exchange (ASX) data and observe the results to reach upon conclusions
The dataset used ASX.csv includes monthly averages of ASX All Ordinaries (Ords) Price Index, Gold price (AUD), Crude Oil (Brent, USD/bbl) and Copper (USD/tonne) starting from January 2004.
#Load the dataset
ASX_data <- read_csv("D:/Rmit/Sem4/Forecasting/ASX_data.csv")
##
## -- Column specification --------------------------------------------------------
## cols(
## `ASX price` = col_double(),
## `Gold price` = col_number(),
## `Crude Oil (Brent)_USD/bbl` = col_double(),
## `Copper_USD/tonne` = col_number()
## )
colnames(ASX_data)
## [1] "ASX price" "Gold price"
## [3] "Crude Oil (Brent)_USD/bbl" "Copper_USD/tonne"
head(ASX_data,5)
## # A tibble: 5 x 4
## `ASX price` `Gold price` `Crude Oil (Brent)_USD/bbl` `Copper_USD/tonne`
## <dbl> <dbl> <dbl> <dbl>
## 1 2935. 612. 31.3 1650
## 2 2778. 603. 32.6 1682
## 3 2849. 566. 30.3 1656
## 4 2971. 539. 25.0 1588
## 5 2980. 549. 25.8 1651
summary(ASX_data)
## ASX price Gold price Crude Oil (Brent)_USD/bbl Copper_USD/tonne
## Min. :2778 Min. : 520.9 Min. : 25.02 Min. :1588
## 1st Qu.:4325 1st Qu.: 825.9 1st Qu.: 47.23 1st Qu.:4642
## Median :4929 Median :1363.6 Median : 70.80 Median :6650
## Mean :4813 Mean :1202.0 Mean : 73.63 Mean :5989
## 3rd Qu.:5448 3rd Qu.:1546.8 3rd Qu.:106.98 3rd Qu.:7623
## Max. :6779 Max. :1776.3 Max. :133.90 Max. :9881
class(ASX_data)
## [1] "spec_tbl_df" "tbl_df" "tbl" "data.frame"
#convert to ts
Total_ts <- ts(ASX_data, start = c(2004,1), frequency = 12)
class(Total_ts)
## [1] "mts" "ts" "matrix"
Extract each column from series and convert into monthly time series data
#Timeseries on each column
Asx_ts <- ts(ASX_data$`ASX price`, start = c(2004,1),frequency = 12)
Goldx_ts <- ts(ASX_data$`Gold price`, start = c(2004,1),frequency = 12)
crude_ts <- ts(ASX_data$`Crude Oil (Brent)_USD/bbl`, start = c(2004,1),frequency = 12)
copper_ts <- ts(ASX_data$`Copper_USD/tonne`, start = c(2004,1), frequency = 12)
Looking into the each plots, we will get clear idea about the nature of series
plot(Asx_ts,type = "o", ylab="Price index", xlab="Year", main = " Fig.1 Time series plot of ASX data prices from 2004 to 2018")
As seen in Above plot (figure 1)There is a upward trend, no seasonality, sudden drop thus change in variance is observed, Nature of series is autoregressive
plot(Goldx_ts,type = "o", main = "Fig.2 Timeseries plotting of Gold prices from 2004 to 2018", ylab="AUD", xlab="Year")
The plot above Figure 2 shows that there is a upward trend over the observed time,with no seasonality and no changing variance in the series. There are no intervention points found in the plot. The fluctuations around mean level succeeding indicate the series do have autoregressive and moving average behaviour.
plot(crude_ts, type="o", main = "Fig.3 Time series plot of crude oil prices", ylab ="USD", xlab="Year")
From the plot in Figure 3, we can observe that there is no trend in the series There is no seasonality and no changing variance present in the series. There is an intervention point around 2008. Successive observation points and fluctuations around mean level suggest the existence of autoregressive and moving average behaviour in the series.
plot(copper_ts, type = "o", main = " Fig.4 Time Series plot of Copper price", ylab = "USD", xlab="Year")
The plot in Figure 4 depicts a normal trend in the time series. The seasonality and changing variance is not observed in plotbut there is an intervention point observed around year 2008, same as in the all ordinaries, oil and copper price series.The fluctuations found on Successive points around mean level suggest autoregressive and moving average behaviour of the series.
Lets further explore the relationship between the all variables indexs and our independent series,within the same plot. an technique used called Standarisation , which helps centering and scaling to clearly plot them on the same scale.
Total_data_scale = scale(Total_ts)
plot(Total_data_scale, plot.type = "s" , col = c("Red","blue", "Green", "black"),main="Time series plot of Scaled ASX total data")
legend("bottomright",lty=1, text.width = 3, col=c("Red","blue", "Green", "black"), c("ASX", "Gold", "Oil", "Copper"))
cor(Total_ts)
## ASX price Gold price Crude Oil (Brent)_USD/bbl
## ASX price 1.0000000 0.3431908 0.3290338
## Gold price 0.3431908 1.0000000 0.4366382
## Crude Oil (Brent)_USD/bbl 0.3290338 0.4366382 1.0000000
## Copper_USD/tonne 0.5617864 0.5364213 0.8664296
## Copper_USD/tonne
## ASX price 0.5617864
## Gold price 0.5364213
## Crude Oil (Brent)_USD/bbl 0.8664296
## Copper_USD/tonne 1.0000000
From the observations madw it is clear that there is a moderate positive correlation between the dependent series all ords index and copper price series with a correlation coefficient r = 0.5617. A strong positive correlation is also observed between oil and copper price series which supports the inference from the plot in Figure 5 (r = 0.8664).
#Analysing the non stationarity of data -to find the suitable lag and find stationarity unit root test is performed ###Augmented dicky fuller test and PP test
par(mfrow=c(1,2))
acf(Asx_ts, main = "Fig.6 ACF for the ASX series",cex.main=0.65)
pacf(Asx_ts, main = "PACF of ASX",cex.main=0.05)
par(mfrow=c(1,1))
ar(Asx_ts)
##
## Call:
## ar(x = Asx_ts)
##
## Coefficients:
## 1 2
## 1.0726 -0.1208
##
## Order selected 2 sigma^2 estimated as 67176
From the above result:order selected=2
adf.test(Asx_ts,k = 2)
##
## Augmented Dickey-Fuller Test
##
## data: Asx_ts
## Dickey-Fuller = -2.442, Lag order = 2, p-value = 0.392
## alternative hypothesis: stationary
Since the p value is greater than 0.05, we fail to reject the null hypothesis that implies stationarity.
PP.test(Asx_ts)
##
## Phillips-Perron Unit Root Test
##
## data: Asx_ts
## Dickey-Fuller = -2.2284, Truncation lag parameter = 4, p-value = 0.4811
According to PP test,p value is greater than 5% thus asx_ts series is non stationarity and alternate hypothesis is stationary.
par(mfrow=c(1,2))
acf(Goldx_ts, main = "Fig.7 ACF for the gold series",cex.main=0.65)
pacf(Goldx_ts, main = "PACF of gold",cex.main=0.05)
par(mfrow=c(1,1))
From the above acf, we can say no seasonality is found in acf, wheras the plot is declining downwards
ar(Goldx_ts)
##
## Call:
## ar(x = Goldx_ts)
##
## Coefficients:
## 1
## 0.9803
##
## Order selected 1 sigma^2 estimated as 6329
From the above result the order selected is 1
adf.test(Goldx_ts,k = 1)
##
## Augmented Dickey-Fuller Test
##
## data: Goldx_ts
## Dickey-Fuller = -2.2824, Lag order = 1, p-value = 0.4585
## alternative hypothesis: stationary
Since the p value is greater than 0.05, we fail to reject the null hypothesis that implies stationarity.
PP.test(Goldx_ts)
##
## Phillips-Perron Unit Root Test
##
## data: Goldx_ts
## Dickey-Fuller = -2.1044, Truncation lag parameter = 4, p-value = 0.5328
According to PP test,p value is greater than 5% thus Gold_ts series is non stationarity and alternate hypothesis is stationary.
par(mfrow=c(1,2))
acf(crude_ts, main = "Fig.8 ACF for the crude oil series",cex.main=0.65)
pacf(crude_ts, main = "PACF of crude oil",cex.main=0.05)
par(mfrow=c(1,1))
ar(crude_ts)
##
## Call:
## ar(x = crude_ts)
##
## Coefficients:
## 1 2 3
## 1.2243 -0.1493 -0.1186
##
## Order selected 3 sigma^2 estimated as 48.96
From the above result the order selected=3
adf.test(crude_ts,k = 3)
##
## Augmented Dickey-Fuller Test
##
## data: crude_ts
## Dickey-Fuller = -2.0703, Lag order = 3, p-value = 0.547
## alternative hypothesis: stationary
Since the p value is greater than 0.05, we fail to reject the null hypothesis that implies stationarity.
PP.test(crude_ts)
##
## Phillips-Perron Unit Root Test
##
## data: crude_ts
## Dickey-Fuller = -1.7567, Truncation lag parameter = 4, p-value = 0.6778
According to PP test,p value is greater than 5% thus crude_ts series is non stationarity and alternate hypothesis is stationary
par(mfrow=c(1,2))
acf(copper_ts, main = "Fig.9 ACF for the copper series",cex.main=0.65)
pacf(copper_ts, main = "PACF of copper",cex.main=0.05)
par(mfrow=c(1,1))
ar(copper_ts)
##
## Call:
## ar(x = copper_ts)
##
## Coefficients:
## 1 2
## 1.1756 -0.2228
##
## Order selected 2 sigma^2 estimated as 331275
order selected=2
adf.test(copper_ts,k = 2)
##
## Augmented Dickey-Fuller Test
##
## data: copper_ts
## Dickey-Fuller = -2.3847, Lag order = 2, p-value = 0.4159
## alternative hypothesis: stationary
Since the p value is greater than 0.05, we fail to reject the null hypothesis that implies stationarity.
PP.test(copper_ts)
##
## Phillips-Perron Unit Root Test
##
## data: copper_ts
## Dickey-Fuller = -2.0718, Truncation lag parameter = 4, p-value = 0.5464
According to PP test,p value is greater than 5% thus copper_ts series is non stationarity and alternate hypothesis is stationary.
*Takeway points: On observing above plots precisely,the slowly decaying pattern of significant lags in the sample ACF plots in above Figures 6-9 is observed, we can reach on a conlcusion that all explored series have a particular trend. The ADF test reports shows p-values > 0.05 for all the series, so we fail to reject the H0 that the series are nonstationary at 5% level. Thus , from a descriptive analysis of time series plots, ACF & PACF plots and the ADF, PP test results, we can observe that the loaded data has non stationarity series.
Basically, there are three main decomposition methods for time series. * The most basic one is the classical decomposition, which provides the basis for other decomposition methods. The X-12-ARIMA decomposition is another decomposition which is more complex than the classical decomposition. It is mostly used for quarterly and monthly data.
One of the very robust and commonly used decomposition methods is the Seasonal and Trend decomposition using Loss (STL) decomposition). When a time series is displayed, it includes trend and seasonal effect in a confounded way.hence, it would be very difficult to infer about the main characteristics of the series under the effect of seasonality. Therefore, we use time series decomposition to extract eachcomponent from the series and adjust the series for various effects like seasonality.
stl_1 = stl(Asx_ts,t.window = 15, s.window = "periodic", robust = T)
plot(stl_1, main = "Fig.10 STL decomposition of all price index series")
fitm1 = x12(Asx_ts)
plot(fitm1,sa=T, trend = T, main=" Fig.10 X12 decomposition of all asx price index series")
On observing the remainder graph of the all ords series in Figure 10, we can say the spikes in raw data are caused other factors, they happen around the intervention point.There were no seasonality found in time series nor in the acf, pacf plots.Thus the seasonal pattern from STL decomposition is not meaningful. From the X12 decomposition we can observe that the seasonally adjusted data is very close to the original series.
stl_2 = stl(Goldx_ts,t.window = 15, s.window = "periodic", robust = T)
plot(stl_2, main = "Fig.11 STL decomposition of gold price index series")
fitm2 = x12(Goldx_ts)
plot(fitm2,sa=T, trend = T, main="Fig.11 X12 decomposition of all asx price index series")
On observing figure 11,we found there was no seasonal effect in the gold price series at the data visualisation stage, thus the seasonally adjusted data in X12 decomposition is almost replicating to the original series, and the seasonal pattern in STL decomposition is meaningless (Figure 11). The remainder component of this series is not smooth at all,which simply means there are certain other unknown factors impacting series.
stl_3 = stl(crude_ts,t.window = 15, s.window = "periodic", robust = T)
plot(stl_3, main = "Fig.12 STL decomposition of oil price index series")
fitm3 = x12(crude_ts)
plot(fitm3,sa=T, trend = T, main="Fig.12 X12 decomposition of all asx price index series")
The seasonality observed in here is not much of different with seasonality as the previously explored series. From the STL decomposition in Figure 12,it is observed that the remainder has a major peak when the intervention happened, but otherwise is rather smooth.
stl_4 = stl(copper_ts,t.window = 15, s.window = "periodic", robust = T)
plot(stl_2, main = "Fig.13 STL decomposition of copper price index series")
fitm4 = x12(copper_ts)
plot(fitm4,sa=T, trend = T, main="Fig.13 X12 decomposition of all copper price index series")
Perhaps ,it can be observed from the original series, the pattern of seasonality changes after 2004. In addition, seasonally adjusted series has a different characteristic after this date. This clearly states that there are some other factors affecting the series apart from the seasonal effect. Since we did not find any evidence of seasonality from the time series and ACF plots, the seasonal pattern from STL decomposition is not meaningful. From the X12 decomposition we can observe that the seasonally adjusted data is very close to the original series. The conclusions that we can make from the decomposition of the copper price data are not much of different the other previously analysed series. * There are shaky patterns around the intervention in the remainder of the series (Figure 13). * There was no seasonality found prior to decomposition, so there is no seasonal effect here.
To find the best appropriate acurate model for the all ordinaries price index is a tedious job , yet most important task. * For the finite and autoregressive distributed lag models we use variables gold and copper series. * These predictors were chosen based on their correlation with each other and with the dependent variable. * Because oil and copper price series were found to be strongly correlated, we do not use them both for modelling. * The copper price series is used for polynomial DLM and Koyck model because it has a stronger correlation with the independent series and is more likely to explain its variation and correlation structure.
dataf = ASX_data
colnames(dataf) <- c("y", "x1", "x2", "x3")
for ( i in 1:10){
model1.1 = dlm(formula = y ~ x1 + x3, data = data.frame(dataf), q = i )
cat("q = ", i, "AIC = ", AIC(model1.1$model), "BIC = ", BIC(model1.1$model),"\n")
}
## q = 1 AIC = 2577.989 BIC = 2596.44
## q = 2 AIC = 2564.871 BIC = 2589.423
## q = 3 AIC = 2551.583 BIC = 2582.209
## q = 4 AIC = 2538.084 BIC = 2574.759
## q = 5 AIC = 2523.864 BIC = 2566.562
## q = 6 AIC = 2509.866 BIC = 2558.561
## q = 7 AIC = 2495.447 BIC = 2550.112
## q = 8 AIC = 2481.403 BIC = 2542.012
## q = 9 AIC = 2467.268 BIC = 2533.793
## q = 10 AIC = 2454.097 BIC = 2526.512
model1 = dlm(formula = y ~ x1+x3, data = data.frame(dataf), q=10)
summary(model1)
##
## Call:
## lm(formula = as.formula(model.formula), data = design)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1086.02 -639.88 61.36 543.00 1626.62
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.840e+03 2.555e+02 15.033 <2e-16 ***
## x1.t -6.761e-01 1.266e+00 -0.534 0.594
## x1.1 3.617e-01 1.839e+00 0.197 0.844
## x1.2 1.218e-01 1.878e+00 0.065 0.948
## x1.3 7.260e-02 1.894e+00 0.038 0.969
## x1.4 -2.534e-01 1.893e+00 -0.134 0.894
## x1.5 4.528e-01 1.887e+00 0.240 0.811
## x1.6 -2.478e-01 1.879e+00 -0.132 0.895
## x1.7 5.263e-01 1.892e+00 0.278 0.781
## x1.8 -3.294e-01 1.889e+00 -0.174 0.862
## x1.9 9.934e-02 1.863e+00 0.053 0.958
## x1.10 1.881e-01 1.243e+00 0.151 0.880
## x3.t 1.504e-01 1.418e-01 1.061 0.291
## x3.1 2.232e-02 2.301e-01 0.097 0.923
## x3.2 2.992e-02 2.315e-01 0.129 0.897
## x3.3 2.537e-02 2.275e-01 0.112 0.911
## x3.4 1.595e-02 2.269e-01 0.070 0.944
## x3.5 -2.686e-02 2.334e-01 -0.115 0.909
## x3.6 2.257e-02 2.335e-01 0.097 0.923
## x3.7 -7.365e-04 2.347e-01 -0.003 0.998
## x3.8 -1.282e-03 2.384e-01 -0.005 0.996
## x3.9 -4.984e-02 2.345e-01 -0.213 0.832
## x3.10 -7.532e-02 1.470e-01 -0.512 0.609
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 758.2 on 128 degrees of freedom
## Multiple R-squared: 0.2145, Adjusted R-squared: 0.07954
## F-statistic: 1.589 on 22 and 128 DF, p-value: 0.05855
##
## AIC and BIC values for the model:
## AIC BIC
## 1 2454.097 2526.512
residualcheck=function(x){
shapiro.test(x$residuals)
}
residualcheck(model1$model)
##
## Shapiro-Wilk normality test
##
## data: x$residuals
## W = 0.94326, p-value = 8.803e-06
checkresiduals(model1$model)
##
## Breusch-Godfrey test for serial correlation of order up to 26
##
## data: Residuals
## LM test = 140.85, df = 26, p-value < 2.2e-16
VIF_m1 = vif(model1$model)
#If the value of VIF is greater than 10, we canconclude that the effect of multicollinearity is high.
VIF_m1 > 10
## x1.t x1.1 x1.2 x1.3 x1.4 x1.5 x1.6 x1.7 x1.8 x1.9 x1.10 x3.t x3.1
## TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## x3.2 x3.3 x3.4 x3.5 x3.6 x3.7 x3.8 x3.9 x3.10
## TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
finiteDLMauto(x= as.vector(dataf$x3), y= as.vector(dataf$y),q.min = 1,q.max =10, k.order =1, model.type ="poly", error.type="AIC", trace= TRUE)
## q - k MASE AIC BIC GMRAE MBRAE R.Adj.Sq Ljung-Box
## 10 10 - 1 3.96781 2419.397 2431.466 5.05005 0.84313 0.17549 0
## 9 9 - 1 4.00168 2435.953 2448.048 4.99410 -5.37833 0.19025 0
## 8 8 - 1 4.03309 2453.342 2465.464 5.05381 0.55608 0.20244 0
## 7 7 - 1 4.07118 2470.649 2482.797 5.10793 0.61094 0.21326 0
## 6 6 - 1 4.09428 2488.104 2500.277 5.18334 2.74265 0.22488 0
## 5 5 - 1 4.11367 2505.646 2517.846 5.08586 0.76203 0.23776 0
## 4 4 - 1 4.15203 2523.100 2535.325 5.05822 -0.05088 0.25039 0
## 3 3 - 1 4.19205 2540.208 2552.459 5.15270 0.42877 0.26371 0
## 2 2 - 1 4.21449 2557.367 2569.643 5.30198 0.73653 0.27856 0
## 1 1 - 1 4.24262 2574.488 2586.789 5.38993 -0.49453 0.29391 0
From the VIF values, it is obvious that the estimates of the finite DLM coefficients are suffering from the multicollinearity. * To deal with this issue, we can use the restricted leastsquares method to find parameter estimates. * In this approach, some restrictions are placed on the model parameters to reduce the variances of the estimators. * In the context of DLMs,we translate the pattern of time effects into the restrictions on parameters. * In the nextsection, we will use polynomial curves to restrict lag weights. * According to the significance tests of model coefficients obtained from the summary, all lag weights of predictors are not statistically significant at 5% level. Following this inference, the adjusted R2 is reported to be about 8% which is very low. F-test of the overall significance of the model reports the model is not statistically significant at 5% level (p-value > 0.05). Therefore, we can conclude that the model is not a good fit to the data.
Result: VIF values are reported > 10 so the effect of multicollinearity is high. The residual check function was created to apply a diagnostic check in a dynamic way. It displays residual analysis plots as well as performs the Breusch-Godfrey test of serial correlation and the Shapiro-Wilk normality test of the residuals.
Overall, we can conclude that the finite DLM of lag 10 is not appropriate for further analysis.
To reduce the harmful effect of multicollinearity,we will impose a polynomial shape on the lag distribution. To deal with multicollinearity problem in the finite DLM, we will attempt to use polynomial curves to restrict lag weights. We specify the optimal lag length using a function that fits finite DLMs for a range of lag lengths from 1 to 10 and orders the fitted models according to their AIC values.
The following code applies the polynomial DLM with all ordinaries price index and copper price series as well as performs residual analysis.
model2 =polyDlm(x=as.vector(dataf$x3),y= as.vector(dataf$y),q=10,k=1, show.beta = T)
## Estimates and t-tests for beta coefficients:
## Estimate Std. Error t value P(>|t|)
## beta.0 0.07360 0.01440 5.120 1.01e-06
## beta.1 0.06140 0.01170 5.240 5.82e-07
## beta.2 0.04920 0.00911 5.410 2.67e-07
## beta.3 0.03710 0.00658 5.640 8.95e-08
## beta.4 0.02490 0.00428 5.820 3.74e-08
## beta.5 0.01280 0.00287 4.450 1.73e-05
## beta.6 0.00062 0.00360 0.172 8.64e-01
## beta.7 -0.01150 0.00570 -2.020 4.48e-02
## beta.8 -0.02370 0.00817 -2.900 4.35e-03
## beta.9 -0.03580 0.01080 -3.330 1.11e-03
## beta.10 -0.04800 0.01340 -3.580 4.74e-04
summary(model2)
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -1214.62 -619.46 25.16 588.44 1479.46
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.034e+03 2.040e+02 19.775 < 2e-16 ***
## z.t0 7.356e-02 1.438e-02 5.115 9.57e-07 ***
## z.t1 -1.216e-02 2.721e-03 -4.467 1.56e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 717.6 on 148 degrees of freedom
## Multiple R-squared: 0.1865, Adjusted R-squared: 0.1755
## F-statistic: 16.96 on 2 and 148 DF, p-value: 2.329e-07
vif(model2$model)
## z.t0 z.t1
## 25.86069 25.86069
residualcheck(model2$model)
##
## Shapiro-Wilk normality test
##
## data: x$residuals
## W = 0.94724, p-value = 1.828e-05
checkresiduals(model2$model)
##
## Breusch-Godfrey test for serial correlation of order up to 10
##
## data: Residuals
## LM test = 138.14, df = 10, p-value < 2.2e-16
Likewise, performed in the finite DLM fitting, the lowest AIC and BIC measures in the given range are for q = 10. We set the order of polynomial to 1 as it minimises the information criteria. According to the model summary, all lag weights are significant at 5% level except lag 6 (p-value > 0.05). The adjusted R2 = 17.5% is slightly better than the finite DLM but still very low. The overall significance test reports the model is statistically significant at 5% level.
VIF values are > 10 and suggest there is still multicollinearity effect on this model.
On have a Diagnostic checking in Figure above we realise that the residuals are not randomly spread. There are a lot of highly significant lags in the ACF plot, so there is autocorrelation present in the residuals. That is also supported by Beusch-Godfrey test at 5% level of significance. The normality of the residuals is also violated, as observed from the histogram and Shapiro-Wilk normality test report (p-value < 0.05).
So We conclude that the polynomial DLM of lag 10 is not appropriate for further analysis.
One way to deal with this infinite DLM is to use Koyck transformation The following code implements Koyck DLM with the all ordinaries price index and copper price series as well as performs diagnostic checking.
model3 = koyckDlm(x=as.vector(dataf$x3), y= as.vector(dataf$y))
summary(model3$model, diagnostics =T)
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -689.64 -108.62 12.78 140.20 771.79
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 189.368812 87.644648 2.161 0.0322 *
## Y.1 0.971621 0.021895 44.376 <2e-16 ***
## X.t -0.005864 0.009517 -0.616 0.5387
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 1 157 1966.87 < 2e-16 ***
## Wu-Hausman 1 156 10.97 0.00115 **
## Sargan 0 NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 201.9 on 157 degrees of freedom
## Multiple R-Squared: 0.9485, Adjusted R-squared: 0.9479
## Wald test: 1448 on 2 and 157 DF, p-value: < 2.2e-16
vif(model3$model)
## Y.1 X.t
## 1.493966 1.493966
residualcheck(model3$model)
##
## Shapiro-Wilk normality test
##
## data: x$residuals
## W = 0.96188, p-value = 0.0002204
checkresiduals(model3$model)
attr(model3$model, "class") ="lm"
AIC(model3$model)
## [1] 2157.456
The AIC measure is reported to be 2157.5 which is lower than the finite and polynomial DLMs. From the residual analysis in Figure 16, it is observed that the errors are spread randomly as desired. There are no significant lags in the ACF plot which suggests there is no serial correlation in the residuals. However, the error terms are not perfectly normal. The histogram of the residuals seems left-skewed, and the Shapiro-Wilk normality test suggests not normal residuals (p-value < 0.05).
Autoregressive DLMs are useful when we cannot find suitable solutions with neither polynomial nor Kyock DLMs.Actually, the autoregressive DLM is a flexible andparsimonious infinite DLM.
Before we fit the model we need to specify parameterrs p,q of ARDL(p,q), thus we use a loop that fits autoregressive DLMs for a range of lag lengths and orders of the AR process and fit the models that minimise the information criteria. * Based on the information criteria, we select the following models: #ARDL(1,5) #ARDL(2,5) #ARDL(3,5)
for(i in 1:5){
for (j in 1:5) {
model_4 = ardlDlm(formula = y ~ x1 +x3, data = data.frame(dataf), p= i, q=j)
cat("p= ", i, "q= ", j,"AIC =", AIC(model_4$model), "BIC =", BIC(model_4$model),"\n")
}
}
## p= 1 q= 1 AIC = 2135.427 BIC = 2156.954
## p= 1 q= 2 AIC = 2123.2 BIC = 2147.752
## p= 1 q= 3 AIC = 2108.165 BIC = 2135.728
## p= 1 q= 4 AIC = 2097.165 BIC = 2127.728
## p= 1 q= 5 AIC = 2086.767 BIC = 2120.315
## p= 2 q= 1 AIC = 2119.917 BIC = 2147.537
## p= 2 q= 2 AIC = 2121.523 BIC = 2152.212
## p= 2 q= 3 AIC = 2107.82 BIC = 2141.508
## p= 2 q= 4 AIC = 2096.569 BIC = 2133.244
## p= 2 q= 5 AIC = 2086.252 BIC = 2125.9
## p= 3 q= 1 AIC = 2109.717 BIC = 2143.406
## p= 3 q= 2 AIC = 2110.906 BIC = 2147.657
## p= 3 q= 3 AIC = 2109.982 BIC = 2149.795
## p= 3 q= 4 AIC = 2098.603 BIC = 2141.39
## p= 3 q= 5 AIC = 2088.342 BIC = 2134.09
## p= 4 q= 1 AIC = 2100.046 BIC = 2139.777
## p= 4 q= 2 AIC = 2101.141 BIC = 2143.929
## p= 4 q= 3 AIC = 2100.586 BIC = 2146.43
## p= 4 q= 4 AIC = 2102.043 BIC = 2150.943
## p= 4 q= 5 AIC = 2091.733 BIC = 2143.58
## p= 5 q= 1 AIC = 2089.626 BIC = 2135.374
## p= 5 q= 2 AIC = 2090.472 BIC = 2139.27
## p= 5 q= 3 AIC = 2089.878 BIC = 2141.726
## p= 5 q= 4 AIC = 2091.428 BIC = 2146.326
## p= 5 q= 5 AIC = 2092.86 BIC = 2150.807
#for p=1, q=5
model4.1 = ardlDlm(formula = y ~ x1 +x3, data = data.frame(dataf),p=1 ,q =5)$model
summary(model4.1)
##
## Time series regression with "ts" data:
## Start = 6, End = 161
##
## Call:
## dynlm(formula = as.formula(model.text), data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -608.69 -108.99 -1.87 96.72 697.71
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 204.195354 91.833430 2.224 0.027715 *
## x1.t -1.168101 0.298139 -3.918 0.000137 ***
## x1.1 1.175789 0.293809 4.002 9.96e-05 ***
## x3.t 0.094982 0.032049 2.964 0.003551 **
## x3.1 -0.096433 0.032189 -2.996 0.003216 **
## y.1 0.932523 0.078420 11.891 < 2e-16 ***
## y.2 0.184956 0.111691 1.656 0.099876 .
## y.3 -0.094300 0.109515 -0.861 0.390613
## y.4 -0.063908 0.109287 -0.585 0.559605
## y.5 0.002755 0.077303 0.036 0.971618
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 187.2 on 146 degrees of freedom
## Multiple R-squared: 0.9531, Adjusted R-squared: 0.9502
## F-statistic: 329.5 on 9 and 146 DF, p-value: < 2.2e-16
residualcheck(model4.1)
##
## Shapiro-Wilk normality test
##
## data: x$residuals
## W = 0.98623, p-value = 0.1257
checkresiduals(model4.1)
##
## Breusch-Godfrey test for serial correlation of order up to 13
##
## data: Residuals
## LM test = 11.315, df = 13, p-value = 0.5844
vif(model4.1)
## x1 L(x1, 1) x3 L(x3, 1) L(y, 1) L(y, 2) L(y, 3) L(y, 4)
## 60.17390 59.02863 18.45786 19.20054 19.61994 40.62230 39.99994 40.92965
## L(y, 5)
## 20.93150
#for p=2, q=5
model4.2 = ardlDlm(formula = y ~ x1+x3, data = data.frame(dataf), p =2, q=5)$model
summary(model4.2)
##
## Time series regression with "ts" data:
## Start = 6, End = 161
##
## Call:
## dynlm(formula = as.formula(model.text), data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -590.34 -108.63 0.29 93.85 705.46
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.953e+02 9.127e+01 2.139 0.034081 *
## x1.t -1.148e+00 2.966e-01 -3.868 0.000165 ***
## x1.1 1.270e+00 4.357e-01 2.915 0.004131 **
## x1.2 -1.007e-01 3.081e-01 -0.327 0.744282
## x3.t 7.565e-02 3.355e-02 2.255 0.025665 *
## x3.1 -1.200e-02 5.335e-02 -0.225 0.822345
## x3.2 -6.907e-02 3.449e-02 -2.003 0.047097 *
## y.1 9.092e-01 8.324e-02 10.923 < 2e-16 ***
## y.2 1.866e-01 1.113e-01 1.677 0.095794 .
## y.3 -5.901e-02 1.129e-01 -0.523 0.601953
## y.4 -7.267e-02 1.087e-01 -0.668 0.505049
## y.5 9.629e-04 7.673e-02 0.013 0.990005
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 185.8 on 144 degrees of freedom
## Multiple R-squared: 0.9544, Adjusted R-squared: 0.9509
## F-statistic: 274.1 on 11 and 144 DF, p-value: < 2.2e-16
residualcheck(model4.2)
##
## Shapiro-Wilk normality test
##
## data: x$residuals
## W = 0.98426, p-value = 0.07341
checkresiduals(model4.2)
##
## Breusch-Godfrey test for serial correlation of order up to 15
##
## data: Residuals
## LM test = 14.041, df = 15, p-value = 0.5224
vif(model4.2)
## x1 L(x1, 1) L(x1, 2) x3 L(x3, 1) L(x3, 2) L(y, 1) L(y, 2)
## 60.48199 131.76923 66.52833 20.53927 53.54225 23.07222 22.44197 40.96126
## L(y, 3) L(y, 4) L(y, 5)
## 43.14764 41.14631 20.93494
#for p=3, q=5
model4.3 = ardlDlm(formula = y ~ x1+x3, data = data.frame(dataf), p =2, q=5)$model
summary(model4.3)
##
## Time series regression with "ts" data:
## Start = 6, End = 161
##
## Call:
## dynlm(formula = as.formula(model.text), data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -590.34 -108.63 0.29 93.85 705.46
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.953e+02 9.127e+01 2.139 0.034081 *
## x1.t -1.148e+00 2.966e-01 -3.868 0.000165 ***
## x1.1 1.270e+00 4.357e-01 2.915 0.004131 **
## x1.2 -1.007e-01 3.081e-01 -0.327 0.744282
## x3.t 7.565e-02 3.355e-02 2.255 0.025665 *
## x3.1 -1.200e-02 5.335e-02 -0.225 0.822345
## x3.2 -6.907e-02 3.449e-02 -2.003 0.047097 *
## y.1 9.092e-01 8.324e-02 10.923 < 2e-16 ***
## y.2 1.866e-01 1.113e-01 1.677 0.095794 .
## y.3 -5.901e-02 1.129e-01 -0.523 0.601953
## y.4 -7.267e-02 1.087e-01 -0.668 0.505049
## y.5 9.629e-04 7.673e-02 0.013 0.990005
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 185.8 on 144 degrees of freedom
## Multiple R-squared: 0.9544, Adjusted R-squared: 0.9509
## F-statistic: 274.1 on 11 and 144 DF, p-value: < 2.2e-16
residualcheck(model4.3)
##
## Shapiro-Wilk normality test
##
## data: x$residuals
## W = 0.98426, p-value = 0.07341
checkresiduals(model4.3)
##
## Breusch-Godfrey test for serial correlation of order up to 15
##
## data: Residuals
## LM test = 14.041, df = 15, p-value = 0.5224
vif(model4.3)
## x1 L(x1, 1) L(x1, 2) x3 L(x3, 1) L(x3, 2) L(y, 1) L(y, 2)
## 60.48199 131.76923 66.52833 20.53927 53.54225 23.07222 22.44197 40.96126
## L(y, 3) L(y, 4) L(y, 5)
## 43.14764 41.14631 20.93494
From the all the fitted ARDL models,we can say they were reported to be statistically significant at 5% level. The best model in terms of the significance of coefficients, AIC and adjusted R2 was marginally ARDL(1,5). The Model summary we aquiredis that the all ords price index might be related to its previous year levels, gold prices of the previous year and copper prices of the previous year as well. The AIC (2086.8) and adjusted R2 (95%) values are the best compared to those of all previously fitted models.
However,it is seen that all ARDL models fitted have multicollinearity effects with VIFs > 10.
While working on investigation of the ASX data, we firstly explored it by visualizing the plots to understand the nature of series variables.. The series given in the data were described separately before analysing the relationships between them. Around 2008,a clear intervention was observed in the dataset which is probably connected with the financial up-downs of that year. The series of interest all ords price index was found to have a moderate correlation with the copper price series.in addition there was (0.87)hih correlation wrt crude oil as well.
Further we went on performing analysis of non stationarity by plotting sample ACFs and running ADF tests on the series. After the performed analysis, it was found that there is non stationarity existent in the data as all series have a trend.
The next step to explore the series was followed by decomposition to explore the impact of the components of time series data on the given dataset. On performing all tests and plots we found that there was no seasonal effect in the dataset prior to decomposition, thus we made a conclusion that there were some other external factors influencing the data.
The most Important step was to select the Model.The data was fiited through four models namely Finit DLM, Polynomial DLM, Koyck model and Autoregressive DLM.Among all,the most accurate and suitable distributed lag model for the ASX price index was found, that is, Koyck model is the most appropriate for the given data. The model chosen had a high explainability, the diagnostic checking showed there were no major problems in the residuals. Koyck model did not have a multicollinearity effect unlike other promising models like ARDLs.
In conclusion, based on the analysis performed, we can suggest Koyck model as the most suitable for further use and forecasting.