Introduction

The aim of the investigation is to perform the analysis of the Australian Securities Exchange (ASX) data and observe the results to reach upon conclusions

using various techniques, we will analyse the nature of series.
investigate about Stationarity/ non stationarity of data set
perform transformation and decomposition
explore the various model fit to find accurate model for ASX price index

About Dataset

The dataset used ASX.csv includes monthly averages of ASX All Ordinaries (Ords) Price Index, Gold price (AUD), Crude Oil (Brent, USD/bbl) and Copper (USD/tonne) starting from January 2004.

#Load the dataset
ASX_data <- read_csv("D:/Rmit/Sem4/Forecasting/ASX_data.csv")

## 
## -- Column specification --------------------------------------------------------
## cols(
##   `ASX price` = col_double(),
##   `Gold price` = col_number(),
##   `Crude Oil (Brent)_USD/bbl` = col_double(),
##   `Copper_USD/tonne` = col_number()
## )

colnames(ASX_data)

## [1] "ASX price"                 "Gold price"               
## [3] "Crude Oil (Brent)_USD/bbl" "Copper_USD/tonne"

head(ASX_data,5)

## # A tibble: 5 x 4
##   `ASX price` `Gold price` `Crude Oil (Brent)_USD/bbl` `Copper_USD/tonne`
##         <dbl>        <dbl>                       <dbl>              <dbl>
## 1       2935.         612.                        31.3               1650
## 2       2778.         603.                        32.6               1682
## 3       2849.         566.                        30.3               1656
## 4       2971.         539.                        25.0               1588
## 5       2980.         549.                        25.8               1651

summary(ASX_data)

##    ASX price      Gold price     Crude Oil (Brent)_USD/bbl Copper_USD/tonne
##  Min.   :2778   Min.   : 520.9   Min.   : 25.02            Min.   :1588    
##  1st Qu.:4325   1st Qu.: 825.9   1st Qu.: 47.23            1st Qu.:4642    
##  Median :4929   Median :1363.6   Median : 70.80            Median :6650    
##  Mean   :4813   Mean   :1202.0   Mean   : 73.63            Mean   :5989    
##  3rd Qu.:5448   3rd Qu.:1546.8   3rd Qu.:106.98            3rd Qu.:7623    
##  Max.   :6779   Max.   :1776.3   Max.   :133.90            Max.   :9881

class(ASX_data)

## [1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame"

#convert to ts
Total_ts <- ts(ASX_data, start = c(2004,1), frequency = 12)
class(Total_ts)

## [1] "mts"    "ts"     "matrix"

Convert to Timeseries

Extract each column from series and convert into monthly time series data

#Timeseries on each column

Asx_ts <- ts(ASX_data$`ASX price`, start = c(2004,1),frequency = 12)
Goldx_ts <- ts(ASX_data$`Gold price`, start = c(2004,1),frequency = 12)
crude_ts <- ts(ASX_data$`Crude Oil (Brent)_USD/bbl`, start = c(2004,1),frequency = 12)
copper_ts <- ts(ASX_data$`Copper_USD/tonne`, start = c(2004,1), frequency = 12)

Data visualisation And exploration of variables

Looking into the each plots, we will get clear idea about the nature of series

plot(Asx_ts,type = "o", ylab="Price index", xlab="Year", main = " Fig.1 Time series plot of ASX data prices from 2004 to 2018")

As seen in Above plot (figure 1)There is a upward trend, no seasonality, sudden drop thus change in variance is observed, Nature of series is autoregressive

plot(Goldx_ts,type = "o", main = "Fig.2 Timeseries plotting of Gold prices from 2004 to 2018", ylab="AUD", xlab="Year")

The plot above Figure 2 shows that there is a upward trend over the observed time,with no seasonality and no changing variance in the series. There are no intervention points found in the plot. The fluctuations around mean level succeeding indicate the series do have autoregressive and moving average behaviour.

plot(crude_ts, type="o", main = "Fig.3 Time series plot of crude oil prices", ylab ="USD", xlab="Year")

From the plot in Figure 3, we can observe that there is no trend in the series There is no seasonality and no changing variance present in the series. There is an intervention point around 2008. Successive observation points and fluctuations around mean level suggest the existence of autoregressive and moving average behaviour in the series.

plot(copper_ts, type = "o", main = " Fig.4 Time Series plot of Copper price", ylab = "USD", xlab="Year")

The plot in Figure 4 depicts a normal trend in the time series. The seasonality and changing variance is not observed in plotbut there is an intervention point observed around year 2008, same as in the all ordinaries, oil and copper price series.The fluctuations found on Successive points around mean level suggest autoregressive and moving average behaviour of the series.

Lets further explore the relationship between the all variables indexs and our independent series,within the same plot. an technique used called Standarisation , which helps centering and scaling to clearly plot them on the same scale.

Scaling down to avoid mismatch

Total_data_scale = scale(Total_ts)
plot(Total_data_scale, plot.type = "s" , col = c("Red","blue", "Green", "black"),main="Time series plot of Scaled ASX total data")
legend("bottomright",lty=1, text.width = 3, col=c("Red","blue", "Green", "black"), c("ASX", "Gold", "Oil", "Copper"))

Correlation factor

Find the correlation between them

cor(Total_ts)

##                           ASX price Gold price Crude Oil (Brent)_USD/bbl
## ASX price                 1.0000000  0.3431908                 0.3290338
## Gold price                0.3431908  1.0000000                 0.4366382
## Crude Oil (Brent)_USD/bbl 0.3290338  0.4366382                 1.0000000
## Copper_USD/tonne          0.5617864  0.5364213                 0.8664296
##                           Copper_USD/tonne
## ASX price                        0.5617864
## Gold price                       0.5364213
## Crude Oil (Brent)_USD/bbl        0.8664296
## Copper_USD/tonne                 1.0000000

From the observations madw it is clear that there is a moderate positive correlation between the dependent series all ords index and copper price series with a correlation coefficient r = 0.5617. A strong positive correlation is also observed between oil and copper price series which supports the inference from the plot in Figure 5 (r = 0.8664).

#Analysing the non stationarity of data -to find the suitable lag and find stationarity unit root test is performed ###Augmented dicky fuller test and PP test

1) ASX all ords data

par(mfrow=c(1,2))
acf(Asx_ts, main = "Fig.6 ACF for the ASX series",cex.main=0.65)
pacf(Asx_ts, main = "PACF of ASX",cex.main=0.05)

par(mfrow=c(1,1))
ar(Asx_ts)

## 
## Call:
## ar(x = Asx_ts)
## 
## Coefficients:
##       1        2  
##  1.0726  -0.1208  
## 
## Order selected 2  sigma^2 estimated as  67176

From the above result:order selected=2

adf.test(Asx_ts,k = 2)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  Asx_ts
## Dickey-Fuller = -2.442, Lag order = 2, p-value = 0.392
## alternative hypothesis: stationary

Since the p value is greater than 0.05, we fail to reject the null hypothesis that implies stationarity.

PP.test(Asx_ts)

## 
##  Phillips-Perron Unit Root Test
## 
## data:  Asx_ts
## Dickey-Fuller = -2.2284, Truncation lag parameter = 4, p-value = 0.4811

According to PP test,p value is greater than 5% thus asx_ts series is non stationarity and alternate hypothesis is stationary.

2) Gold data

par(mfrow=c(1,2))
acf(Goldx_ts, main = "Fig.7 ACF for the gold series",cex.main=0.65)
pacf(Goldx_ts, main = "PACF of gold",cex.main=0.05)

par(mfrow=c(1,1))

From the above acf, we can say no seasonality is found in acf, wheras the plot is declining downwards

ar(Goldx_ts)

## 
## Call:
## ar(x = Goldx_ts)
## 
## Coefficients:
##      1  
## 0.9803  
## 
## Order selected 1  sigma^2 estimated as  6329

From the above result the order selected is 1

adf.test(Goldx_ts,k = 1)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  Goldx_ts
## Dickey-Fuller = -2.2824, Lag order = 1, p-value = 0.4585
## alternative hypothesis: stationary

Since the p value is greater than 0.05, we fail to reject the null hypothesis that implies stationarity.

PP.test(Goldx_ts)

## 
##  Phillips-Perron Unit Root Test
## 
## data:  Goldx_ts
## Dickey-Fuller = -2.1044, Truncation lag parameter = 4, p-value = 0.5328

According to PP test,p value is greater than 5% thus Gold_ts series is non stationarity and alternate hypothesis is stationary.

3) Crude oil data

par(mfrow=c(1,2))
acf(crude_ts, main = "Fig.8 ACF for the crude oil series",cex.main=0.65)
pacf(crude_ts, main = "PACF of crude oil",cex.main=0.05)

par(mfrow=c(1,1))
ar(crude_ts)

## 
## Call:
## ar(x = crude_ts)
## 
## Coefficients:
##       1        2        3  
##  1.2243  -0.1493  -0.1186  
## 
## Order selected 3  sigma^2 estimated as  48.96

From the above result the order selected=3

adf.test(crude_ts,k = 3)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  crude_ts
## Dickey-Fuller = -2.0703, Lag order = 3, p-value = 0.547
## alternative hypothesis: stationary

Since the p value is greater than 0.05, we fail to reject the null hypothesis that implies stationarity.

PP.test(crude_ts)

## 
##  Phillips-Perron Unit Root Test
## 
## data:  crude_ts
## Dickey-Fuller = -1.7567, Truncation lag parameter = 4, p-value = 0.6778

According to PP test,p value is greater than 5% thus crude_ts series is non stationarity and alternate hypothesis is stationary

4) Copper data

par(mfrow=c(1,2))
acf(copper_ts, main = "Fig.9 ACF for the copper series",cex.main=0.65)
pacf(copper_ts, main = "PACF of copper",cex.main=0.05)

par(mfrow=c(1,1))
ar(copper_ts)

## 
## Call:
## ar(x = copper_ts)
## 
## Coefficients:
##       1        2  
##  1.1756  -0.2228  
## 
## Order selected 2  sigma^2 estimated as  331275

order selected=2

adf.test(copper_ts,k = 2)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  copper_ts
## Dickey-Fuller = -2.3847, Lag order = 2, p-value = 0.4159
## alternative hypothesis: stationary

Since the p value is greater than 0.05, we fail to reject the null hypothesis that implies stationarity.

PP.test(copper_ts)

## 
##  Phillips-Perron Unit Root Test
## 
## data:  copper_ts
## Dickey-Fuller = -2.0718, Truncation lag parameter = 4, p-value = 0.5464

According to PP test,p value is greater than 5% thus copper_ts series is non stationarity and alternate hypothesis is stationary.

*Takeway points: On observing above plots precisely,the slowly decaying pattern of significant lags in the sample ACF plots in above Figures 6-9 is observed, we can reach on a conlcusion that all explored series have a particular trend. The ADF test reports shows p-values > 0.05 for all the series, so we fail to reject the H0 that the series are nonstationary at 5% level. Thus , from a descriptive analysis of time series plots, ACF & PACF plots and the ADF, PP test results, we can observe that the loaded data has non stationarity series.

Decomposition of time series

Decomposition of time series is executed into different components is helpful to observe the effects of the existing indivdual components and historical effects occurred in the past. The components that a time series can be decomposed into are
1 seasonal
2 trend and
3 remainder , which includes other effects that are not captured by the seasonal and trend components.

Basically, there are three main decomposition methods for time series. * The most basic one is the classical decomposition, which provides the basis for other decomposition methods. The X-12-ARIMA decomposition is another decomposition which is more complex than the classical decomposition. It is mostly used for quarterly and monthly data.

One of the very robust and commonly used decomposition methods is the Seasonal and Trend decomposition using Loss (STL) decomposition). When a time series is displayed, it includes trend and seasonal effect in a confounded way.hence, it would be very difficult to infer about the main characteristics of the series under the effect of seasonality. Therefore, we use time series decomposition to extract eachcomponent from the series and adjust the series for various effects like seasonality.

STL decomposition and x12 decomposition

STL handles any type of seasonality, whiles others are somewhat limited to onlymonthly and/or quarterly series.
The seasonal component can change by the time and the rate of change can be controlled by the user.
The smoothness of the trend-cycle can also be controlled by the user.
We can make it robust to outliers by sending the effect of occasional unusualobservations to the remainder component.

stl_1 = stl(Asx_ts,t.window = 15, s.window = "periodic", robust = T)
plot(stl_1, main = "Fig.10 STL decomposition of all price index series")

fitm1 = x12(Asx_ts)
plot(fitm1,sa=T, trend = T, main=" Fig.10 X12 decomposition of all asx price index series")

On observing the remainder graph of the all ords series in Figure 10, we can say the spikes in raw data are caused other factors, they happen around the intervention point.There were no seasonality found in time series nor in the acf, pacf plots.Thus the seasonal pattern from STL decomposition is not meaningful. From the X12 decomposition we can observe that the seasonally adjusted data is very close to the original series.

stl_2 = stl(Goldx_ts,t.window = 15, s.window = "periodic", robust = T)
plot(stl_2, main = "Fig.11 STL decomposition of gold price index series")

fitm2 = x12(Goldx_ts)
plot(fitm2,sa=T, trend = T, main="Fig.11 X12 decomposition of all asx price index series")

On observing figure 11,we found there was no seasonal effect in the gold price series at the data visualisation stage, thus the seasonally adjusted data in X12 decomposition is almost replicating to the original series, and the seasonal pattern in STL decomposition is meaningless (Figure 11). The remainder component of this series is not smooth at all,which simply means there are certain other unknown factors impacting series.

stl_3 = stl(crude_ts,t.window = 15, s.window = "periodic", robust = T)
plot(stl_3, main = "Fig.12 STL decomposition of oil price index series")

fitm3 = x12(crude_ts)
plot(fitm3,sa=T, trend = T, main="Fig.12 X12 decomposition of all asx price index series")

The seasonality observed in here is not much of different with seasonality as the previously explored series. From the STL decomposition in Figure 12,it is observed that the remainder has a major peak when the intervention happened, but otherwise is rather smooth.

stl_4 = stl(copper_ts,t.window = 15, s.window = "periodic", robust = T)
plot(stl_2, main = "Fig.13 STL decomposition of copper price index series")

fitm4 = x12(copper_ts)
plot(fitm4,sa=T, trend = T, main="Fig.13 X12 decomposition of all copper price index series")

Perhaps ,it can be observed from the original series, the pattern of seasonality changes after 2004. In addition, seasonally adjusted series has a different characteristic after this date. This clearly states that there are some other factors affecting the series apart from the seasonal effect. Since we did not find any evidence of seasonality from the time series and ACF plots, the seasonal pattern from STL decomposition is not meaningful. From the X12 decomposition we can observe that the seasonally adjusted data is very close to the original series. The conclusions that we can make from the decomposition of the copper price data are not much of different the other previously analysed series. * There are shaky patterns around the intervention in the remainder of the series (Figure 13). * There was no seasonality found prior to decomposition, so there is no seasonal effect here.

Overall we can say, there is no seasonality effect on the ASX data provided, and all the fluctuations are due to other external factors.

Modelling process:

To find the best appropriate acurate model for the all ordinaries price index is a tedious job , yet most important task. * For the finite and autoregressive distributed lag models we use variables gold and copper series. * These predictors were chosen based on their correlation with each other and with the dependent variable. * Because oil and copper price series were found to be strongly correlated, we do not use them both for modelling. * The copper price series is used for polynomial DLM and Koyck model because it has a stronger correlation with the independent series and is more likely to explain its variation and correlation structure.

1) Finite DLM

dataf = ASX_data
colnames(dataf) <- c("y", "x1", "x2", "x3")
for ( i in 1:10){
  model1.1 = dlm(formula = y ~ x1 + x3, data = data.frame(dataf), q = i )
  cat("q = ", i, "AIC = ", AIC(model1.1$model), "BIC = ", BIC(model1.1$model),"\n")
}

## q =  1 AIC =  2577.989 BIC =  2596.44 
## q =  2 AIC =  2564.871 BIC =  2589.423 
## q =  3 AIC =  2551.583 BIC =  2582.209 
## q =  4 AIC =  2538.084 BIC =  2574.759 
## q =  5 AIC =  2523.864 BIC =  2566.562 
## q =  6 AIC =  2509.866 BIC =  2558.561 
## q =  7 AIC =  2495.447 BIC =  2550.112 
## q =  8 AIC =  2481.403 BIC =  2542.012 
## q =  9 AIC =  2467.268 BIC =  2533.793 
## q =  10 AIC =  2454.097 BIC =  2526.512

model1 = dlm(formula = y ~ x1+x3, data = data.frame(dataf), q=10)
summary(model1)

## 
## Call:
## lm(formula = as.formula(model.formula), data = design)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1086.02  -639.88    61.36   543.00  1626.62 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.840e+03  2.555e+02  15.033   <2e-16 ***
## x1.t        -6.761e-01  1.266e+00  -0.534    0.594    
## x1.1         3.617e-01  1.839e+00   0.197    0.844    
## x1.2         1.218e-01  1.878e+00   0.065    0.948    
## x1.3         7.260e-02  1.894e+00   0.038    0.969    
## x1.4        -2.534e-01  1.893e+00  -0.134    0.894    
## x1.5         4.528e-01  1.887e+00   0.240    0.811    
## x1.6        -2.478e-01  1.879e+00  -0.132    0.895    
## x1.7         5.263e-01  1.892e+00   0.278    0.781    
## x1.8        -3.294e-01  1.889e+00  -0.174    0.862    
## x1.9         9.934e-02  1.863e+00   0.053    0.958    
## x1.10        1.881e-01  1.243e+00   0.151    0.880    
## x3.t         1.504e-01  1.418e-01   1.061    0.291    
## x3.1         2.232e-02  2.301e-01   0.097    0.923    
## x3.2         2.992e-02  2.315e-01   0.129    0.897    
## x3.3         2.537e-02  2.275e-01   0.112    0.911    
## x3.4         1.595e-02  2.269e-01   0.070    0.944    
## x3.5        -2.686e-02  2.334e-01  -0.115    0.909    
## x3.6         2.257e-02  2.335e-01   0.097    0.923    
## x3.7        -7.365e-04  2.347e-01  -0.003    0.998    
## x3.8        -1.282e-03  2.384e-01  -0.005    0.996    
## x3.9        -4.984e-02  2.345e-01  -0.213    0.832    
## x3.10       -7.532e-02  1.470e-01  -0.512    0.609    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 758.2 on 128 degrees of freedom
## Multiple R-squared:  0.2145, Adjusted R-squared:  0.07954 
## F-statistic: 1.589 on 22 and 128 DF,  p-value: 0.05855
## 
## AIC and BIC values for the model:
##        AIC      BIC
## 1 2454.097 2526.512

residualcheck=function(x){
  shapiro.test(x$residuals)
}
residualcheck(model1$model)

## 
##  Shapiro-Wilk normality test
## 
## data:  x$residuals
## W = 0.94326, p-value = 8.803e-06

checkresiduals(model1$model)

## 
##  Breusch-Godfrey test for serial correlation of order up to 26
## 
## data:  Residuals
## LM test = 140.85, df = 26, p-value < 2.2e-16

VIF_m1 = vif(model1$model)
#If the value of VIF is greater than 10, we canconclude that the effect of multicollinearity is high.
VIF_m1 > 10

##  x1.t  x1.1  x1.2  x1.3  x1.4  x1.5  x1.6  x1.7  x1.8  x1.9 x1.10  x3.t  x3.1 
##  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE 
##  x3.2  x3.3  x3.4  x3.5  x3.6  x3.7  x3.8  x3.9 x3.10 
##  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE

finiteDLMauto(x= as.vector(dataf$x3), y= as.vector(dataf$y),q.min = 1,q.max =10, k.order =1, model.type ="poly", error.type="AIC", trace= TRUE)

##     q - k    MASE      AIC      BIC   GMRAE    MBRAE R.Adj.Sq Ljung-Box
## 10 10 - 1 3.96781 2419.397 2431.466 5.05005  0.84313  0.17549         0
## 9   9 - 1 4.00168 2435.953 2448.048 4.99410 -5.37833  0.19025         0
## 8   8 - 1 4.03309 2453.342 2465.464 5.05381  0.55608  0.20244         0
## 7   7 - 1 4.07118 2470.649 2482.797 5.10793  0.61094  0.21326         0
## 6   6 - 1 4.09428 2488.104 2500.277 5.18334  2.74265  0.22488         0
## 5   5 - 1 4.11367 2505.646 2517.846 5.08586  0.76203  0.23776         0
## 4   4 - 1 4.15203 2523.100 2535.325 5.05822 -0.05088  0.25039         0
## 3   3 - 1 4.19205 2540.208 2552.459 5.15270  0.42877  0.26371         0
## 2   2 - 1 4.21449 2557.367 2569.643 5.30198  0.73653  0.27856         0
## 1   1 - 1 4.24262 2574.488 2586.789 5.38993 -0.49453  0.29391         0

From the VIF values, it is obvious that the estimates of the finite DLM coefficients are suffering from the multicollinearity. * To deal with this issue, we can use the restricted leastsquares method to find parameter estimates. * In this approach, some restrictions are placed on the model parameters to reduce the variances of the estimators. * In the context of DLMs,we translate the pattern of time effects into the restrictions on parameters. * In the nextsection, we will use polynomial curves to restrict lag weights. * According to the significance tests of model coefficients obtained from the summary, all lag weights of predictors are not statistically significant at 5% level. Following this inference, the adjusted R2 is reported to be about 8% which is very low. F-test of the overall significance of the model reports the model is not statistically significant at 5% level (p-value > 0.05). Therefore, we can conclude that the model is not a good fit to the data.

Result: VIF values are reported > 10 so the effect of multicollinearity is high. The residual check function was created to apply a diagnostic check in a dynamic way. It displays residual analysis plots as well as performs the Breusch-Godfrey test of serial correlation and the Shapiro-Wilk normality test of the residuals.

Overall, we can conclude that the finite DLM of lag 10 is not appropriate for further analysis.

2) Polynomial Distributed Lags model

To reduce the harmful effect of multicollinearity,we will impose a polynomial shape on the lag distribution. To deal with multicollinearity problem in the finite DLM, we will attempt to use polynomial curves to restrict lag weights. We specify the optimal lag length using a function that fits finite DLMs for a range of lag lengths from 1 to 10 and orders the fitted models according to their AIC values.

The following code applies the polynomial DLM with all ordinaries price index and copper price series as well as performs residual analysis.

model2 =polyDlm(x=as.vector(dataf$x3),y= as.vector(dataf$y),q=10,k=1, show.beta = T)

## Estimates and t-tests for beta coefficients:
##         Estimate Std. Error t value  P(>|t|)
## beta.0   0.07360    0.01440   5.120 1.01e-06
## beta.1   0.06140    0.01170   5.240 5.82e-07
## beta.2   0.04920    0.00911   5.410 2.67e-07
## beta.3   0.03710    0.00658   5.640 8.95e-08
## beta.4   0.02490    0.00428   5.820 3.74e-08
## beta.5   0.01280    0.00287   4.450 1.73e-05
## beta.6   0.00062    0.00360   0.172 8.64e-01
## beta.7  -0.01150    0.00570  -2.020 4.48e-02
## beta.8  -0.02370    0.00817  -2.900 4.35e-03
## beta.9  -0.03580    0.01080  -3.330 1.11e-03
## beta.10 -0.04800    0.01340  -3.580 4.74e-04

summary(model2)

## 
## Call:
## "Y ~ (Intercept) + X.t"
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1214.62  -619.46    25.16   588.44  1479.46 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.034e+03  2.040e+02  19.775  < 2e-16 ***
## z.t0         7.356e-02  1.438e-02   5.115 9.57e-07 ***
## z.t1        -1.216e-02  2.721e-03  -4.467 1.56e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 717.6 on 148 degrees of freedom
## Multiple R-squared:  0.1865, Adjusted R-squared:  0.1755 
## F-statistic: 16.96 on 2 and 148 DF,  p-value: 2.329e-07

vif(model2$model)

##     z.t0     z.t1 
## 25.86069 25.86069

residualcheck(model2$model)

## 
##  Shapiro-Wilk normality test
## 
## data:  x$residuals
## W = 0.94724, p-value = 1.828e-05

checkresiduals(model2$model)

## 
##  Breusch-Godfrey test for serial correlation of order up to 10
## 
## data:  Residuals
## LM test = 138.14, df = 10, p-value < 2.2e-16

Likewise, performed in the finite DLM fitting, the lowest AIC and BIC measures in the given range are for q = 10. We set the order of polynomial to 1 as it minimises the information criteria. According to the model summary, all lag weights are significant at 5% level except lag 6 (p-value > 0.05). The adjusted R2 = 17.5% is slightly better than the finite DLM but still very low. The overall significance test reports the model is statistically significant at 5% level.

VIF values are > 10 and suggest there is still multicollinearity effect on this model.

On have a Diagnostic checking in Figure above we realise that the residuals are not randomly spread. There are a lot of highly significant lags in the ACF plot, so there is autocorrelation present in the residuals. That is also supported by Beusch-Godfrey test at 5% level of significance. The normality of the residuals is also violated, as observed from the histogram and Shapiro-Wilk normality test report (p-value < 0.05).

So We conclude that the polynomial DLM of lag 10 is not appropriate for further analysis.

3) KOYCK Transformation DL modeling

One way to deal with this infinite DLM is to use Koyck transformation The following code implements Koyck DLM with the all ordinaries price index and copper price series as well as performs diagnostic checking.

model3 = koyckDlm(x=as.vector(dataf$x3), y= as.vector(dataf$y))
summary(model3$model, diagnostics =T)

## 
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -689.64 -108.62   12.78  140.20  771.79 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 189.368812  87.644648   2.161   0.0322 *  
## Y.1           0.971621   0.021895  44.376   <2e-16 ***
## X.t          -0.005864   0.009517  -0.616   0.5387    
## 
## Diagnostic tests:
##                  df1 df2 statistic p-value    
## Weak instruments   1 157   1966.87 < 2e-16 ***
## Wu-Hausman         1 156     10.97 0.00115 ** 
## Sargan             0  NA        NA      NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 201.9 on 157 degrees of freedom
## Multiple R-Squared: 0.9485,  Adjusted R-squared: 0.9479 
## Wald test:  1448 on 2 and 157 DF,  p-value: < 2.2e-16

vif(model3$model)

##      Y.1      X.t 
## 1.493966 1.493966

residualcheck(model3$model)

## 
##  Shapiro-Wilk normality test
## 
## data:  x$residuals
## W = 0.96188, p-value = 0.0002204

checkresiduals(model3$model)

Changing attribute of model to obtain aic value

attr(model3$model, "class") ="lm"
AIC(model3$model)

## [1] 2157.456

The AIC measure is reported to be 2157.5 which is lower than the finite and polynomial DLMs. From the residual analysis in Figure 16, it is observed that the errors are spread randomly as desired. There are no significant lags in the ACF plot which suggests there is no serial correlation in the residuals. However, the error terms are not perfectly normal. The histogram of the residuals seems left-skewed, and the Shapiro-Wilk normality test suggests not normal residuals (p-value < 0.05).

4) Autoregressive DLM

Autoregressive DLMs are useful when we cannot find suitable solutions with neither polynomial nor Kyock DLMs.Actually, the autoregressive DLM is a flexible andparsimonious infinite DLM.

Fit autoregressive DLMs.

Before we fit the model we need to specify parameterrs p,q of ARDL(p,q), thus we use a loop that fits autoregressive DLMs for a range of lag lengths and orders of the AR process and fit the models that minimise the information criteria. * Based on the information criteria, we select the following models: #ARDL(1,5) #ARDL(2,5) #ARDL(3,5)

for(i in 1:5){
  for (j in 1:5) {
    model_4 = ardlDlm(formula = y ~ x1 +x3, data = data.frame(dataf), p= i, q=j)
    cat("p= ", i, "q= ", j,"AIC =", AIC(model_4$model), "BIC =", BIC(model_4$model),"\n")
    
  }
}

## p=  1 q=  1 AIC = 2135.427 BIC = 2156.954 
## p=  1 q=  2 AIC = 2123.2 BIC = 2147.752 
## p=  1 q=  3 AIC = 2108.165 BIC = 2135.728 
## p=  1 q=  4 AIC = 2097.165 BIC = 2127.728 
## p=  1 q=  5 AIC = 2086.767 BIC = 2120.315 
## p=  2 q=  1 AIC = 2119.917 BIC = 2147.537 
## p=  2 q=  2 AIC = 2121.523 BIC = 2152.212 
## p=  2 q=  3 AIC = 2107.82 BIC = 2141.508 
## p=  2 q=  4 AIC = 2096.569 BIC = 2133.244 
## p=  2 q=  5 AIC = 2086.252 BIC = 2125.9 
## p=  3 q=  1 AIC = 2109.717 BIC = 2143.406 
## p=  3 q=  2 AIC = 2110.906 BIC = 2147.657 
## p=  3 q=  3 AIC = 2109.982 BIC = 2149.795 
## p=  3 q=  4 AIC = 2098.603 BIC = 2141.39 
## p=  3 q=  5 AIC = 2088.342 BIC = 2134.09 
## p=  4 q=  1 AIC = 2100.046 BIC = 2139.777 
## p=  4 q=  2 AIC = 2101.141 BIC = 2143.929 
## p=  4 q=  3 AIC = 2100.586 BIC = 2146.43 
## p=  4 q=  4 AIC = 2102.043 BIC = 2150.943 
## p=  4 q=  5 AIC = 2091.733 BIC = 2143.58 
## p=  5 q=  1 AIC = 2089.626 BIC = 2135.374 
## p=  5 q=  2 AIC = 2090.472 BIC = 2139.27 
## p=  5 q=  3 AIC = 2089.878 BIC = 2141.726 
## p=  5 q=  4 AIC = 2091.428 BIC = 2146.326 
## p=  5 q=  5 AIC = 2092.86 BIC = 2150.807

#for p=1, q=5
model4.1 = ardlDlm(formula = y ~ x1 +x3, data = data.frame(dataf),p=1 ,q =5)$model
summary(model4.1)

## 
## Time series regression with "ts" data:
## Start = 6, End = 161
## 
## Call:
## dynlm(formula = as.formula(model.text), data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -608.69 -108.99   -1.87   96.72  697.71 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 204.195354  91.833430   2.224 0.027715 *  
## x1.t         -1.168101   0.298139  -3.918 0.000137 ***
## x1.1          1.175789   0.293809   4.002 9.96e-05 ***
## x3.t          0.094982   0.032049   2.964 0.003551 ** 
## x3.1         -0.096433   0.032189  -2.996 0.003216 ** 
## y.1           0.932523   0.078420  11.891  < 2e-16 ***
## y.2           0.184956   0.111691   1.656 0.099876 .  
## y.3          -0.094300   0.109515  -0.861 0.390613    
## y.4          -0.063908   0.109287  -0.585 0.559605    
## y.5           0.002755   0.077303   0.036 0.971618    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 187.2 on 146 degrees of freedom
## Multiple R-squared:  0.9531, Adjusted R-squared:  0.9502 
## F-statistic: 329.5 on 9 and 146 DF,  p-value: < 2.2e-16

residualcheck(model4.1)

## 
##  Shapiro-Wilk normality test
## 
## data:  x$residuals
## W = 0.98623, p-value = 0.1257

checkresiduals(model4.1)

## 
##  Breusch-Godfrey test for serial correlation of order up to 13
## 
## data:  Residuals
## LM test = 11.315, df = 13, p-value = 0.5844

vif(model4.1)

##       x1 L(x1, 1)       x3 L(x3, 1)  L(y, 1)  L(y, 2)  L(y, 3)  L(y, 4) 
## 60.17390 59.02863 18.45786 19.20054 19.61994 40.62230 39.99994 40.92965 
##  L(y, 5) 
## 20.93150

#for p=2, q=5
model4.2 = ardlDlm(formula = y ~ x1+x3, data = data.frame(dataf), p =2, q=5)$model
summary(model4.2)

## 
## Time series regression with "ts" data:
## Start = 6, End = 161
## 
## Call:
## dynlm(formula = as.formula(model.text), data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -590.34 -108.63    0.29   93.85  705.46 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.953e+02  9.127e+01   2.139 0.034081 *  
## x1.t        -1.148e+00  2.966e-01  -3.868 0.000165 ***
## x1.1         1.270e+00  4.357e-01   2.915 0.004131 ** 
## x1.2        -1.007e-01  3.081e-01  -0.327 0.744282    
## x3.t         7.565e-02  3.355e-02   2.255 0.025665 *  
## x3.1        -1.200e-02  5.335e-02  -0.225 0.822345    
## x3.2        -6.907e-02  3.449e-02  -2.003 0.047097 *  
## y.1          9.092e-01  8.324e-02  10.923  < 2e-16 ***
## y.2          1.866e-01  1.113e-01   1.677 0.095794 .  
## y.3         -5.901e-02  1.129e-01  -0.523 0.601953    
## y.4         -7.267e-02  1.087e-01  -0.668 0.505049    
## y.5          9.629e-04  7.673e-02   0.013 0.990005    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 185.8 on 144 degrees of freedom
## Multiple R-squared:  0.9544, Adjusted R-squared:  0.9509 
## F-statistic: 274.1 on 11 and 144 DF,  p-value: < 2.2e-16

residualcheck(model4.2)

## 
##  Shapiro-Wilk normality test
## 
## data:  x$residuals
## W = 0.98426, p-value = 0.07341

checkresiduals(model4.2)

## 
##  Breusch-Godfrey test for serial correlation of order up to 15
## 
## data:  Residuals
## LM test = 14.041, df = 15, p-value = 0.5224

vif(model4.2)

##        x1  L(x1, 1)  L(x1, 2)        x3  L(x3, 1)  L(x3, 2)   L(y, 1)   L(y, 2) 
##  60.48199 131.76923  66.52833  20.53927  53.54225  23.07222  22.44197  40.96126 
##   L(y, 3)   L(y, 4)   L(y, 5) 
##  43.14764  41.14631  20.93494

#for p=3, q=5
model4.3 = ardlDlm(formula = y ~ x1+x3, data = data.frame(dataf), p =2, q=5)$model
summary(model4.3)

## 
## Time series regression with "ts" data:
## Start = 6, End = 161
## 
## Call:
## dynlm(formula = as.formula(model.text), data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -590.34 -108.63    0.29   93.85  705.46 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.953e+02  9.127e+01   2.139 0.034081 *  
## x1.t        -1.148e+00  2.966e-01  -3.868 0.000165 ***
## x1.1         1.270e+00  4.357e-01   2.915 0.004131 ** 
## x1.2        -1.007e-01  3.081e-01  -0.327 0.744282    
## x3.t         7.565e-02  3.355e-02   2.255 0.025665 *  
## x3.1        -1.200e-02  5.335e-02  -0.225 0.822345    
## x3.2        -6.907e-02  3.449e-02  -2.003 0.047097 *  
## y.1          9.092e-01  8.324e-02  10.923  < 2e-16 ***
## y.2          1.866e-01  1.113e-01   1.677 0.095794 .  
## y.3         -5.901e-02  1.129e-01  -0.523 0.601953    
## y.4         -7.267e-02  1.087e-01  -0.668 0.505049    
## y.5          9.629e-04  7.673e-02   0.013 0.990005    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 185.8 on 144 degrees of freedom
## Multiple R-squared:  0.9544, Adjusted R-squared:  0.9509 
## F-statistic: 274.1 on 11 and 144 DF,  p-value: < 2.2e-16

residualcheck(model4.3)

## 
##  Shapiro-Wilk normality test
## 
## data:  x$residuals
## W = 0.98426, p-value = 0.07341

checkresiduals(model4.3)

## 
##  Breusch-Godfrey test for serial correlation of order up to 15
## 
## data:  Residuals
## LM test = 14.041, df = 15, p-value = 0.5224

vif(model4.3)

##        x1  L(x1, 1)  L(x1, 2)        x3  L(x3, 1)  L(x3, 2)   L(y, 1)   L(y, 2) 
##  60.48199 131.76923  66.52833  20.53927  53.54225  23.07222  22.44197  40.96126 
##   L(y, 3)   L(y, 4)   L(y, 5) 
##  43.14764  41.14631  20.93494

From the all the fitted ARDL models,we can say they were reported to be statistically significant at 5% level. The best model in terms of the significance of coefficients, AIC and adjusted R2 was marginally ARDL(1,5). The Model summary we aquiredis that the all ords price index might be related to its previous year levels, gold prices of the previous year and copper prices of the previous year as well. The AIC (2086.8) and adjusted R2 (95%) values are the best compared to those of all previously fitted models.

As per the Residual analysis of this model : -According to the histogram the results of Shapiro-Wilk test, The normality assumption is not violated at 5% level. -All the errors produced are randomly spread and have no particular trend, -Also there is no autocorrelation in the residuals based on ACF and Beusch-Godfrey test.

However,it is seen that all ARDL models fitted have multicollinearity effects with VIFs > 10.

Conclusion:

While working on investigation of the ASX data, we firstly explored it by visualizing the plots to understand the nature of series variables.. The series given in the data were described separately before analysing the relationships between them. Around 2008,a clear intervention was observed in the dataset which is probably connected with the financial up-downs of that year. The series of interest all ords price index was found to have a moderate correlation with the copper price series.in addition there was (0.87)hih correlation wrt crude oil as well.

Further we went on performing analysis of non stationarity by plotting sample ACFs and running ADF tests on the series. After the performed analysis, it was found that there is non stationarity existent in the data as all series have a trend.
The next step to explore the series was followed by decomposition to explore the impact of the components of time series data on the given dataset. On performing all tests and plots we found that there was no seasonal effect in the dataset prior to decomposition, thus we made a conclusion that there were some other external factors influencing the data.
The most Important step was to select the Model.The data was fiited through four models namely Finit DLM, Polynomial DLM, Koyck model and Autoregressive DLM.Among all,the most accurate and suitable distributed lag model for the ASX price index was found, that is, Koyck model is the most appropriate for the given data. The model chosen had a high explainability, the diagnostic checking showed there were no major problems in the residuals. Koyck model did not have a multicollinearity effect unlike other promising models like ARDLs.

In conclusion, based on the analysis performed, we can suggest Koyck model as the most suitable for further use and forecasting.

References:

Rmit Canvas study material

Forecasting Australian Securities Exchange time series data

Math1307 Assignment1

Rashmi Madan Walavalkar, s3804366

Introduction

About Dataset

Convert to Timeseries

Data visualisation And exploration of variables

Scaling down to avoid mismatch

Correlation factor

1) ASX all ords data

2) Gold data

3) Crude oil data

4) Copper data

Decomposition of time series

STL decomposition and x12 decomposition

Modelling process:

1) Finite DLM

2) Polynomial Distributed Lags model

3) KOYCK Transformation DL modeling

4) Autoregressive DLM

Fit autoregressive DLMs.

Conclusion:

References: