library(readr)
ASX_data <- read_csv("/Users/shubhamchougule/Downloads/ASX_data.csv")
## Parsed with column specification:
## cols(
## `ASX price` = col_double(),
## `Gold price` = col_number(),
## `Crude Oil (Brent)_USD/bbl` = col_double(),
## `Copper_USD/tonne` = col_number()
## )
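The chunks below also call functions from several other packages. A minimal set of library() calls is attached here, assuming the usual homes of these functions (forecast, tseries, x12, dLagM and car):
library(forecast)  # BoxCox.lambda, BoxCox, checkresiduals
library(tseries)   # adf.test
library(x12)       # x12, plotSeasFac
library(dLagM)     # dlm, polyDlm, koyckDlm, ardlDlm
library(car)       # vif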
#Checking the dataset
head(ASX_data)
## # A tibble: 6 x 4
## `ASX price` `Gold price` `Crude Oil (Brent)_USD/bbl` `Copper_USD/tonne`
## <dbl> <dbl> <dbl> <dbl>
## 1 2935. 612. 31.3 1650
## 2 2778. 603. 32.6 1682
## 3 2849. 566. 30.3 1656
## 4 2971. 539. 25.0 1588
## 5 2980. 549. 25.8 1651
## 6 3000. 536. 27.6 1685
#Checking the class of dataset
class(ASX_data)
## [1] "spec_tbl_df" "tbl_df" "tbl" "data.frame"
#Converting dataset to time series starting from Jan 2004
ASX<-ts(ASX_data$`ASX price`,start = 2004,frequency = 12)
GOLD<-ts(ASX_data$`Gold price`,start = 2004,frequency = 12)
COP<-ts(ASX_data$`Copper_USD/tonne`,start = 2004,frequency = 12)
OIL<-ts(ASX_data$`Crude Oil (Brent)_USD/bbl`,start = 2004,frequency = 12)
data<-ts(ASX_data,start = 2004,frequency = 12)
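As a quick sanity check of the time index assigned by ts() (a minimal sketch that only prints what the calls above imply), the start, end and frequency of the ASX series can be inspected:
#Confirm the monthly time index assigned by ts()
start(ASX)
end(ASX)
frequency(ASX)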
Non-stationarity of a series is assessed with ACF and PACF plots and confirmed with an ADF test, and it can be removed by applying a transformation and differencing to the series. A series is also better understood when it is decomposed into its components: trend, seasonality and remainder.
Here we first plot the series, then use ACF and PACF plots to assess non-stationarity, apply transformation and differencing where required, and finally decompose each series to examine its components.
par(mfrow=c(2,2))
plot(ASX,ylab="Monthly avg ASX Price",xlab="YEAR",main="ASX Price Series")
plot(GOLD,ylab="Monthly avg GOLD Price",xlab="YEAR",main="Gold Price Series")
plot(OIL,ylab="Monthly avg Crude Oil Price",xlab="YEAR",main="Crude Oil Series")
plot(COP,ylab="Monthly avg Copper Price",xlab="YEAR",main="Copper Price Series")
ASX price series: There is no clear overall trend and no seasonal pattern is observed. An intervention can be seen around 2009.
Gold price series: There is an upward trend, no seasonal pattern and no obvious intervention.
Crude Oil series: There is no clear overall trend and no seasonal pattern is observed. Interventions can be seen around 2009 and 2015.
Copper price series: There is no clear overall trend and no seasonal pattern is observed. An intervention can be seen around 2010.
#ACF
par(mfrow=c(2,2))
acf(ASX,main="ACF for ASX Price")
acf(GOLD,main="ACF for Gold Price")
acf(OIL,main="ACF for Crude Oil Price")
acf(COP,main="ACF for Copper Price")
#PACF
par(mfrow=c(2,2))
pacf(ASX,main="PACF for ASX Price")
pacf(GOLD,main="PACF for Gold Price")
pacf(OIL,main="PACF for Crude Oil Price")
pacf(COP,main="PACF for Copper Price")
The ACF plots show a slow, wave-like decay across the lags, and in each PACF plot the first lag is highly significant, which indicates that the series are non-stationary. To deal with the non-stationarity we first consider a Box-Cox transformation.
#Checking the lambda value of each series
ASXL=BoxCox.lambda(ASX)
GOLDL=BoxCox.lambda(GOLD)
OILL=BoxCox.lambda(OIL)
COPL=BoxCox.lambda(COP)
cbind(ASXL,GOLDL,OILL,COPL)
## ASXL GOLDL OILL COPL
## [1,] 1.999924 0.976695 -0.8304136 0.9336783
The ASX price has a lambda value close to 2, so a transformation is applied. The Copper and Gold prices have lambda values close to 1, so no transformation is needed for them. The Crude Oil price has a negative lambda value, so a transformation is applied to it as well.
#Transformation of ASX price
ASX.TRS=BoxCox(ASX,lambda = ASXL)
plot(ASX.TRS,type='o',main="Time series plot after transformation",xlab="Year",ylab="Box-Cox transformed ASX Price")
After the transformation there is no noticeable change in the behaviour of the series.
#Transformation of Crude oil
OIL.TRS=BoxCox(OIL,lambda = OILL)
plot(OIL.TRS,type='o',main="Time series plot after transformation",xlab="Year",ylab="Box-Cox transformed Crude Oil Price")
The transformation does not visibly change the ASX and Crude Oil price series, so first differencing is applied to all four series.
#Apply first differencing
ASX.TRS.DIFF =diff(ASX.TRS,differences=1)
#Plot of the first difference
plot(ASX.TRS.DIFF,ylab='ASX Price',xlab='Year',
main = "Time series plot of the first difference")
#Apply first differencing
OIL.TRS.DIFF =diff(OIL.TRS,differences=1)
#Plot of the first difference
plot(OIL.TRS.DIFF,ylab='Crude Oil Price',xlab='Year',
main = "Time series plot of the first difference")
#Apply first differencing
GOLD.DIFF =diff(GOLD,differences=1)
#Plot of the first difference
plot(GOLD.DIFF,ylab='Gold Price',xlab='Year',
main = "Time series plot of the first difference")
#Apply first differencing
COP.DIFF =diff(COP,differences=1)
#Plot of the first difference
plot(COP.DIFF,ylab='Copper Price',xlab='Year',
main = "Time series plot of the first difference")
Applying first differencing makes all the series look stationary. To confirm this we use the Augmented Dickey-Fuller (ADF) test, with the following hypotheses:
H0: the series is non-stationary
HA: the series is stationary
#ADF test for ASX price
adf.test(ASX.TRS.DIFF)
## Warning in adf.test(ASX.TRS.DIFF): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: ASX.TRS.DIFF
## Dickey-Fuller = -4.4343, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
The Augmented Dickey-Fuller test gives a p-value below 5%, so we reject the null hypothesis and conclude that the differenced ASX series is stationary.
#ADF test for GOLD price
adf.test(GOLD.DIFF)
## Warning in adf.test(GOLD.DIFF): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: GOLD.DIFF
## Dickey-Fuller = -5.8718, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
The Augmented Dickey-Fuller test gives a p-value below 5%, so we reject the null hypothesis and conclude that the differenced Gold series is stationary.
#ADF test for Crude oil price
adf.test(OIL.TRS.DIFF)
## Warning in adf.test(OIL.TRS.DIFF): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: OIL.TRS.DIFF
## Dickey-Fuller = -5.5931, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
The Augmented Dickey-Fuller test gives a p-value below 5%, so we reject the null hypothesis and conclude that the differenced Crude Oil series is stationary.
#ADF test for Copper price
adf.test(COP.DIFF)
## Warning in adf.test(COP.DIFF): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: COP.DIFF
## Dickey-Fuller = -5.478, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
The Augmented Dickey-Fuller test gives a p-value below 5%, so we reject the null hypothesis and conclude that the differenced Copper series is stationary.
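As an additional visual check (a minimal sketch using the differenced objects created above), the ACF plots of the differenced series can be re-examined; for stationary series they should no longer show the slow decay seen earlier:
#ACF of the differenced (and, where applied, transformed) series
par(mfrow=c(2,2))
acf(ASX.TRS.DIFF, main="ACF for differenced ASX Price")
acf(GOLD.DIFF, main="ACF for differenced Gold Price")
acf(OIL.TRS.DIFF, main="ACF for differenced Crude Oil Price")
acf(COP.DIFF, main="ACF for differenced Copper Price")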
The individual contributions of the underlying components can be examined using decomposition. For this we will use the X12-ARIMA decomposition method as well as STL decomposition.
#Decomposition helper: plots the X12-ARIMA decomposed series (seasonally adjusted, trend and
#forecast), the seasonal factor (SI ratio) plot and an STL decomposition of the input series.
#Note that this function masks stats::decompose for the rest of the report.
decompose <- function(x){
  DECOM = x12(x)
  plot(DECOM, sa = TRUE, trend = TRUE, forecast = TRUE)  # original vs seasonally adjusted, trend and forecast
  plotSeasFac(DECOM)                                     # seasonal factors (SI ratios) by month
  stldec = stl(x, t.window = 15, s.window = "periodic", robust = TRUE)
  plot(stldec)                                           # STL panels: data, seasonal, trend and remainder
}
#ASX decomposition
decompose(ASX)
The seasonally adjusted series follows the original series closely, which indicates that seasonality has little effect on the series. The trend deviates slightly from the original series, so the trend does have some effect. The forecast is also shown in the plot.
The seasonal factors are not clustered around their mean for most of the months.
In the STL output the first panel shows the original series; the seasonal pattern peaks in the first, second and third quarters. The trend panel shows a fall around 2009 followed by an upward movement afterwards.
decompose(GOLD)
The seasonally adjusted series follows the original series closely, which indicates that seasonality has little effect on the series. The trend deviates slightly from the original series, so the trend does have some effect. The forecast is also shown in the plot.
The seasonal factors are not clustered around their mean for most of the months.
In the STL output the first panel shows the original series; the seasonal pattern peaks in the first, third and fourth quarters. The trend panel shows an upward trend from about 2006 onwards.
decompose(OIL)
The seasonally adjusted series follows the original series closely, which indicates that seasonality has little effect on the series. The trend deviates slightly from the original series, so the trend does have some effect. The forecast is also shown in the plot.
The seasonal factors are not clustered around their mean for most of the months.
In the STL output the first panel shows the original series; the seasonal pattern peaks in the second and third quarters. The trend panel shows a fall around 2009 followed by an upward movement from 2009 until about 2014.
decompose(COP)
The seasonally adjusted series follows the original series closely, which indicates that seasonality has little effect on the series. The trend deviates slightly from the original series, so the trend does have some effect. The forecast is also shown in the plot.
The seasonal factors are not clustered around their mean for most of the months.
In the STL output the first panel shows the original series; the seasonal pattern peaks in the first and fourth quarters. The trend panel shows a fall around 2009 followed by an upward movement from 2009 until about 2014.
We now compare several models to see which one best fits the ASX All Ordinaries (Ords) Price Index: the finite Distributed Lag Model, the Polynomial Distributed Lag Model, the Koyck Distributed Lag Model and the Autoregressive Distributed Lag Model.
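For reference, a rough sketch of the general model forms, written to match the dLagM parameterisation used below (q lags of the predictor in the finite and polynomial DLMs, with the polynomial DLM constraining the lag weights to a polynomial of degree k in the lag index; p predictor lags and q response lags in the ARDL model):
$$
\begin{aligned}
\text{Finite DLM: } & y_t = \alpha + \sum_{s=0}^{q} \beta_s x_{t-s} + \varepsilon_t \\
\text{Koyck DLM: } & y_t = \alpha + \phi\, y_{t-1} + \beta\, x_t + \nu_t \\
\text{ARDL}(p,q)\text{: } & y_t = \alpha + \sum_{i=0}^{p} \beta_i x_{t-i} + \sum_{j=1}^{q} \gamma_j y_{t-j} + \varepsilon_t
\end{aligned}
$$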
#Checking the correlation between the dependent variable and the independent variables
cor(data)
## ASX price Gold price Crude Oil (Brent)_USD/bbl
## ASX price 1.0000000 0.3431908 0.3290338
## Gold price 0.3431908 1.0000000 0.4366382
## Crude Oil (Brent)_USD/bbl 0.3290338 0.4366382 1.0000000
## Copper_USD/tonne 0.5617864 0.5364213 0.8664296
## Copper_USD/tonne
## ASX price 0.5617864
## Gold price 0.5364213
## Crude Oil (Brent)_USD/bbl 0.8664296
## Copper_USD/tonne 1.0000000
Here the ASX price is the dependent variable and the Gold, Crude Oil and Copper prices are the independent variables. Of the three, Copper has the strongest correlation with the ASX price (about 0.56).
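As a quick visual complement to the correlation matrix, a scatterplot matrix of the four series can be drawn with base R (a minimal sketch using the ASX_data tibble already loaded):
#Scatterplot matrix of the four price series
pairs(as.data.frame(ASX_data),
      main="Pairwise relationships between ASX, Gold, Crude Oil and Copper prices")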
#Selection of the "q" value on the basis of AIC and BIC for fitting the DLM model
for(i in 1:10){
  model1 = dlm( x = as.vector(GOLD) , y = as.vector(ASX), q = i )
  cat("q = ", i, "AIC = ", AIC(model1$model), "BIC = ", BIC(model1$model),"\n")
}
## q = 1 AIC = 2613.609 BIC = 2625.91
## q = 2 AIC = 2596.292 BIC = 2611.637
## q = 3 AIC = 2579.215 BIC = 2597.59
## q = 4 AIC = 2562.296 BIC = 2583.69
## q = 5 AIC = 2544.887 BIC = 2569.286
## q = 6 AIC = 2527.575 BIC = 2554.966
## q = 7 AIC = 2510.535 BIC = 2540.905
## q = 8 AIC = 2493.885 BIC = 2527.22
## q = 9 AIC = 2476.983 BIC = 2513.27
## q = 10 AIC = 2460.345 BIC = 2499.57
The lower the AIC or BIC, the better the model. From the output, q = 10 has the lowest AIC and BIC values, so we select q = 10 for the DLM.
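The same search can also be captured programmatically; a minimal sketch (using dlm() from dLagM exactly as above) that stores the criteria and picks the q with the smallest AIC:
#Collect AIC/BIC over the candidate q values and pick the minimum programmatically
q_grid <- 1:10
ic <- t(sapply(q_grid, function(q){
  m <- dlm(x = as.vector(GOLD), y = as.vector(ASX), q = q)
  c(q = q, AIC = AIC(m$model), BIC = BIC(m$model))
}))
ic[which.min(ic[, "AIC"]), ]  # row with the smallest AIC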
model1<- dlm(x = as.vector(GOLD) , y = as.vector(ASX), q = 10)
summary(model1)
##
## Call:
## lm(formula = model.formula, data = design)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1535.24 -575.79 20.89 480.32 1951.02
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4523.02779 225.83961 20.028 <2e-16 ***
## x.t -0.54891 1.27022 -0.432 0.666
## x.1 0.07699 1.88146 0.041 0.967
## x.2 -0.01009 1.90952 -0.005 0.996
## x.3 -0.12278 1.92437 -0.064 0.949
## x.4 -0.30955 1.92889 -0.160 0.873
## x.5 0.47310 1.93180 0.245 0.807
## x.6 0.02590 1.94990 0.013 0.989
## x.7 0.67162 1.95391 0.344 0.732
## x.8 -0.11584 1.94844 -0.059 0.953
## x.9 0.11415 1.92690 0.059 0.953
## x.10 0.11352 1.28818 0.088 0.930
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 798.9 on 139 degrees of freedom
## Multiple R-squared: 0.05296, Adjusted R-squared: -0.02199
## F-statistic: 0.7066 on 11 and 139 DF, p-value: 0.7306
##
## AIC and BIC values for the model:
## AIC BIC
## 1 2460.345 2499.57
checkresiduals(model1$model)
##
## Breusch-Godfrey test for serial correlation of order up to 15
##
## data: Residuals
## LM test = 138.89, df = 15, p-value < 2.2e-16
The summary shows that none of the lags is significant and the adjusted R-squared is negative. The Breusch-Godfrey test p-value is below 5%, indicating serial correlation, and the residual ACF plot shows many significant lags. Hence the model violates the usual assumptions.
vif(model1$model) #check for multicollinearity
## x.t x.1 x.2 x.3 x.4 x.5 x.6 x.7
## 54.00602 120.12460 125.08221 128.79308 131.27007 133.47382 137.78856 139.77569
## x.8 x.9 x.10
## 139.98862 137.01938 61.18711
The VIF values also indicate strong multicollinearity. Hence we move on to the Polynomial Distributed Lag Model.
model2 = polyDlm(x = as.vector(GOLD) , y = as.vector(ASX) , q = 2 , k = 2 , show.beta = TRUE)
## Estimates and t-tests for beta coefficients:
## Estimate Std. Error t value P(>|t|)
## beta.0 0.396 1.28 0.310 0.757
## beta.1 -0.233 1.90 -0.122 0.903
## beta.2 0.536 1.27 0.421 0.675
summary(model2)
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -1632.50 -700.82 4.61 549.72 2213.87
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3998.3587 212.3161 18.832 <2e-16 ***
## z.t0 0.3958 1.2767 0.310 0.757
## z.t1 -1.3268 5.7723 -0.230 0.819
## z.t2 0.6983 2.8546 0.245 0.807
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 834.5 on 155 degrees of freedom
## Multiple R-squared: 0.1015, Adjusted R-squared: 0.08409
## F-statistic: 5.835 on 3 and 155 DF, p-value: 0.0008385
checkresiduals(model2$model)
##
## Breusch-Godfrey test for serial correlation of order up to 10
##
## data: Residuals
## LM test = 147.24, df = 10, p-value < 2.2e-16
From the summary, none of the lags is significant and the adjusted R-squared is very low. The Breusch-Godfrey test p-value is below 5%, indicating serial correlation, and the residual ACF plot shows many significant lags. Hence the Polynomial Distributed Lag Model also violates the usual assumptions.
model3 = koyckDlm(x = as.vector(GOLD) , y = as.vector(ASX))
summary(model3,diagnostics=TRUE)
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -682.19 -105.44 15.86 135.04 783.60
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.902e+02 8.958e+01 2.123 0.0353 *
## Y.1 9.635e-01 1.909e-02 50.469 <2e-16 ***
## X.t 2.595e-03 4.304e-02 0.060 0.9520
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 201.4 on 157 degrees of freedom
## Multiple R-Squared: 0.9488, Adjusted R-squared: 0.9481
## Wald test: 1454 on 2 and 157 DF, p-value: < 2.2e-16
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 1 157 8006.63657 1.266826e-136
## Wu-Hausman 1 156 18.06854 3.655601e-05
##
## alpha beta phi
## Geometric coefficients: 5205.15 0.002595168 0.9634602
checkresiduals(model3$model)
The Koyck Distributed Lag Model summary gives good results: the lagged response term is significant and the adjusted R-squared is high. The Wu-Hausman diagnostic p-value is below 5%, indicating some correlation between the regressor and the error term. The residual ACF plot has no significant lags and the residuals are fairly evenly distributed.
We will now check for multicollinearity.
vif(model3$model)
## Y.1 X.t
## 1.140648 1.140648
The VIF values are well below 10, confirming that there is no multicollinearity.
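Since the Koyck model looks promising, a quick in-sample check can be made by overlaying its fitted values on the observed ASX series (a minimal sketch; it assumes model3$model exposes the underlying ivreg fit, which loses the first observation to the lagged term):
#Overlay the Koyck model's fitted values (red) on the observed ASX price (black)
asx_vec <- as.vector(ASX)
plot(asx_vec, type = "l", xlab = "Observation", ylab = "ASX Price",
     main = "Observed ASX price vs Koyck fit (Gold as predictor)")
lines(2:length(asx_vec), fitted(model3$model), col = "red")
legend("topleft", legend = c("Observed", "Koyck fitted"), col = c("black", "red"), lty = 1)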
#Checking p and q values on the basis of AIC and BIC
for (i in 1:5){
  for(j in 1:5){
    model4 = ardlDlm(x = as.vector(GOLD) , y = as.vector(ASX), p = i , q = j )
    cat("p = ", i, "q = ", j, "AIC = ", AIC(model4$model), "BIC = ", BIC(model4$model),"\n")
  }
}
## p = 1 q = 1 AIC = 2140.897 BIC = 2156.273
## p = 1 q = 2 AIC = 2128.524 BIC = 2146.938
## p = 1 q = 3 AIC = 2113.99 BIC = 2135.428
## p = 1 q = 4 AIC = 2102.754 BIC = 2127.204
## p = 1 q = 5 AIC = 2092.194 BIC = 2119.643
## p = 2 q = 1 AIC = 2128.627 BIC = 2147.04
## p = 2 q = 2 AIC = 2130.523 BIC = 2152.005
## p = 2 q = 3 AIC = 2115.89 BIC = 2140.39
## p = 2 q = 4 AIC = 2104.694 BIC = 2132.2
## p = 2 q = 5 AIC = 2094.14 BIC = 2124.639
## p = 3 q = 1 AIC = 2118.109 BIC = 2139.547
## p = 3 q = 2 AIC = 2120.027 BIC = 2144.528
## p = 3 q = 3 AIC = 2117.305 BIC = 2144.868
## p = 3 q = 4 AIC = 2105.731 BIC = 2136.293
## p = 3 q = 5 AIC = 2095.264 BIC = 2128.812
## p = 4 q = 1 AIC = 2107.002 BIC = 2131.452
## p = 4 q = 2 AIC = 2108.914 BIC = 2136.42
## p = 4 q = 3 AIC = 2106.276 BIC = 2136.839
## p = 4 q = 4 AIC = 2107.456 BIC = 2141.074
## p = 4 q = 5 AIC = 2097.01 BIC = 2133.608
## p = 5 q = 1 AIC = 2094.908 BIC = 2122.357
## p = 5 q = 2 AIC = 2096.86 BIC = 2127.359
## p = 5 q = 3 AIC = 2094.144 BIC = 2127.692
## p = 5 q = 4 AIC = 2095.425 BIC = 2132.023
## p = 5 q = 5 AIC = 2097.324 BIC = 2136.972
We choose p = 1 and q = 5 because this combination has the lowest AIC and BIC values, and we use it in the model below.
model4 = ardlDlm(x = as.vector(GOLD) , y = as.vector(ASX), p = 1 , q = 5 )
summary(model4)
##
## Time series regression with "ts" data:
## Start = 6, End = 161
##
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -564.02 -106.74 8.99 126.34 691.74
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 234.72597 92.97346 2.525 0.012635 *
## X.t -1.21331 0.30480 -3.981 0.000107 ***
## X.1 1.21097 0.30013 4.035 8.73e-05 ***
## Y.1 0.96620 0.07927 12.189 < 2e-16 ***
## Y.2 0.13687 0.11316 1.210 0.228368
## Y.3 -0.07572 0.11193 -0.676 0.499816
## Y.4 -0.04931 0.11174 -0.441 0.659640
## Y.5 -0.02130 0.07871 -0.271 0.787092
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 191.6 on 148 degrees of freedom
## Multiple R-squared: 0.9502, Adjusted R-squared: 0.9478
## F-statistic: 403.1 on 7 and 148 DF, p-value: < 2.2e-16
checkresiduals(model4$model)
##
## Breusch-Godfrey test for serial correlation of order up to 11
##
## data: Residuals
## LM test = 5.7145, df = 11, p-value = 0.8917
The Autoregressive Distributed Lag Model summary gives good results: several terms are significant and the adjusted R-squared is high. The residual ACF plot has no significant lags and the residuals are fairly evenly distributed.
We will now check for multicollinearity.
vif(model4$model)
## X.t L(X.t, 1) L(y.t, 1) L(y.t, 2) L(y.t, 3) L(y.t, 4) L(y.t, 5)
## 60.01512 58.78005 19.12905 39.78823 39.87314 40.82652 20.70864
Here several VIF values are well above 10, confirming multicollinearity among the regressors.
#Selection of the "q" value on the basis of AIC and BIC for fitting the DLM model
for(i in 1:10){
  model1 = dlm( x = as.vector(OIL) , y = as.vector(ASX), q = i )
  cat("q = ", i, "AIC = ", AIC(model1$model), "BIC = ", BIC(model1$model),"\n")
}
## q = 1 AIC = 2614.698 BIC = 2626.998
## q = 2 AIC = 2596.715 BIC = 2612.059
## q = 3 AIC = 2579.101 BIC = 2597.477
## q = 4 AIC = 2561.888 BIC = 2583.281
## q = 5 AIC = 2544.936 BIC = 2569.335
## q = 6 AIC = 2527.701 BIC = 2555.091
## q = 7 AIC = 2510.754 BIC = 2541.124
## q = 8 AIC = 2493.914 BIC = 2527.249
## q = 9 AIC = 2476.871 BIC = 2513.158
## q = 10 AIC = 2459.842 BIC = 2499.066
The lower the AIC or BIC, the better the model. From the output, q = 10 has the lowest AIC and BIC values, so we select q = 10 for the DLM.
model1<- dlm(x = as.vector(OIL) , y = as.vector(ASX), q = 10)
summary(model1)
##
## Call:
## lm(formula = model.formula, data = design)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1576.90 -628.63 6.78 568.36 1678.62
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4690.8891 198.7822 23.598 <2e-16 ***
## x.t 9.4539 11.3091 0.836 0.405
## x.1 0.3802 18.9644 0.020 0.984
## x.2 -0.2888 19.1969 -0.015 0.988
## x.3 2.3047 19.2972 0.119 0.905
## x.4 -5.6630 19.3383 -0.293 0.770
## x.5 -0.4596 19.3953 -0.024 0.981
## x.6 0.1522 19.3553 0.008 0.994
## x.7 -0.8638 19.5004 -0.044 0.965
## x.8 0.2994 19.5758 0.015 0.988
## x.9 -7.2824 19.2792 -0.378 0.706
## x.10 5.0307 11.3292 0.444 0.658
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 797.5 on 139 degrees of freedom
## Multiple R-squared: 0.05611, Adjusted R-squared: -0.01859
## F-statistic: 0.7512 on 11 and 139 DF, p-value: 0.6877
##
## AIC and BIC values for the model:
## AIC BIC
## 1 2459.842 2499.066
checkresiduals(model1$model)
##
## Breusch-Godfrey test for serial correlation of order up to 15
##
## data: Residuals
## LM test = 138.41, df = 15, p-value < 2.2e-16
The summary shows that none of the lags is significant and the adjusted R-squared is negative. The Breusch-Godfrey test p-value is below 5%, indicating serial correlation, and the residual ACF plot shows many significant lags. Hence the model violates the usual assumptions.
vif(model1$model) #check for multicollinearity
## x.t x.1 x.2 x.3 x.4 x.5 x.6 x.7
## 24.79895 70.60934 73.44324 75.12792 76.53499 78.10752 78.97569 81.18320
## x.8 x.9 x.10
## 82.63829 80.72059 28.09278
The VIF values indicate multicollinearity among the lagged regressors. Hence we move on to the Polynomial Distributed Lag Model.
model2 = polyDlm(x = as.vector(OIL) , y = as.vector(ASX) , q = 2 , k = 2 , show.beta = TRUE)
## Estimates and t-tests for beta coefficients:
## Estimate Std. Error t value P(>|t|)
## beta.0 13.70 11.6 1.180 0.239
## beta.1 3.18 19.1 0.167 0.868
## beta.2 -8.41 11.5 -0.730 0.467
summary(model2)
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -1616.4 -703.8 -77.9 657.5 1783.6
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4209.0519 179.6342 23.431 <2e-16 ***
## z.t0 13.6870 11.5777 1.182 0.239
## z.t1 -9.9541 57.9232 -0.172 0.864
## z.t2 -0.5483 28.7836 -0.019 0.985
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 835.6 on 155 degrees of freedom
## Multiple R-squared: 0.09909, Adjusted R-squared: 0.08165
## F-statistic: 5.683 on 3 and 155 DF, p-value: 0.001019
checkresiduals(model2$model)
##
## Breusch-Godfrey test for serial correlation of order up to 10
##
## data: Residuals
## LM test = 146.42, df = 10, p-value < 2.2e-16
From the summary, none of the lags is significant and the adjusted R-squared is very low. The Breusch-Godfrey test p-value is below 5%, indicating serial correlation, and the residual ACF plot shows many significant lags. Hence the Polynomial Distributed Lag Model also violates the usual assumptions.
model3 = koyckDlm(x = as.vector(OIL) , y = as.vector(ASX))
summary(model3,diagnostics=TRUE)
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -683.91 -108.66 13.68 139.77 762.55
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 209.89536 87.89368 2.388 0.0181 *
## Y.1 0.97537 0.01905 51.193 <2e-16 ***
## X.t -0.99907 0.58045 -1.721 0.0872 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 201.1 on 157 degrees of freedom
## Multiple R-Squared: 0.949, Adjusted R-squared: 0.9483
## Wald test: 1461 on 2 and 157 DF, p-value: < 2.2e-16
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 1 157 3018.4519 1.997814e-104
## Wu-Hausman 1 156 11.1979 1.026113e-03
##
## alpha beta phi
## Geometric coefficients: 8522.034 -0.9990694 0.9753703
checkresiduals(model3$model)
The Koyck Distributed Lag Model summary gives good results: the lagged response term is significant and the adjusted R-squared is high. The Wu-Hausman diagnostic p-value is below 5%, indicating some correlation between the regressor and the error term. The residual ACF plot has no significant lags and the residuals are fairly evenly distributed.
We will now check for multicollinearity.
vif(model3$model)
## Y.1 X.t
## 1.14038 1.14038
The VIF values are well below 10, confirming that there is no multicollinearity.
#Checking p and q values on the basis of AIC and BIC
for (i in 1:5){
  for(j in 1:5){
    model4 = ardlDlm(x = as.vector(OIL) , y = as.vector(ASX), p = i , q = j )
    cat("p = ", i, "q = ", j, "AIC = ", AIC(model4$model), "BIC = ", BIC(model4$model),"\n")
  }
}
## p = 1 q = 1 AIC = 2146.524 BIC = 2161.9
## p = 1 q = 2 AIC = 2134.107 BIC = 2152.521
## p = 1 q = 3 AIC = 2121.07 BIC = 2142.508
## p = 1 q = 4 AIC = 2109.4 BIC = 2133.85
## p = 1 q = 5 AIC = 2098.335 BIC = 2125.784
## p = 2 q = 1 AIC = 2132.312 BIC = 2150.726
## p = 2 q = 2 AIC = 2134.235 BIC = 2155.718
## p = 2 q = 3 AIC = 2122.356 BIC = 2146.857
## p = 2 q = 4 AIC = 2110.793 BIC = 2138.299
## p = 2 q = 5 AIC = 2099.752 BIC = 2130.251
## p = 3 q = 1 AIC = 2121.919 BIC = 2143.357
## p = 3 q = 2 AIC = 2123.835 BIC = 2148.335
## p = 3 q = 3 AIC = 2124.324 BIC = 2151.887
## p = 3 q = 4 AIC = 2112.401 BIC = 2142.963
## p = 3 q = 5 AIC = 2101.35 BIC = 2134.899
## p = 4 q = 1 AIC = 2111.383 BIC = 2135.832
## p = 4 q = 2 AIC = 2113.294 BIC = 2140.8
## p = 4 q = 3 AIC = 2113.805 BIC = 2144.367
## p = 4 q = 4 AIC = 2114.384 BIC = 2148.003
## p = 4 q = 5 AIC = 2103.342 BIC = 2139.94
## p = 5 q = 1 AIC = 2097.076 BIC = 2124.525
## p = 5 q = 2 AIC = 2099.041 BIC = 2129.54
## p = 5 q = 3 AIC = 2099.518 BIC = 2133.066
## p = 5 q = 4 AIC = 2099.845 BIC = 2136.443
## p = 5 q = 5 AIC = 2100.917 BIC = 2140.566
We choose p = 1 and q = 5, which gives one of the lowest AIC and BIC values among the combinations considered, and use it in the model below.
model4 = ardlDlm(x = as.vector(OIL) , y = as.vector(ASX), p = 1 , q = 5 )
summary(model4)
##
## Time series regression with "ts" data:
## Start = 6, End = 161
##
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -678.76 -119.18 -11.69 139.51 683.99
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 264.45303 93.37157 2.832 0.00527 **
## X.t 7.40072 2.65126 2.791 0.00594 **
## X.1 -8.08296 2.65998 -3.039 0.00281 **
## Y.1 0.93245 0.08490 10.982 < 2e-16 ***
## Y.2 0.14232 0.11676 1.219 0.22482
## Y.3 -0.03681 0.11415 -0.322 0.74754
## Y.4 -0.03104 0.11379 -0.273 0.78542
## Y.5 -0.04827 0.07866 -0.614 0.54035
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 195.4 on 148 degrees of freedom
## Multiple R-squared: 0.9482, Adjusted R-squared: 0.9457
## F-statistic: 386.7 on 7 and 148 DF, p-value: < 2.2e-16
checkresiduals(model4$model)
##
## Breusch-Godfrey test for serial correlation of order up to 11
##
## data: Residuals
## LM test = 5.4064, df = 11, p-value = 0.9099
The Autoregressive Distributed Lag Model summary gives good results: several terms are significant and the adjusted R-squared is high. The residual ACF plot has no significant lags and the residuals are fairly evenly distributed.
We will now check for multicollinearity.
vif(model4$model)
## X.t L(X.t, 1) L(y.t, 1) L(y.t, 2) L(y.t, 3) L(y.t, 4) L(y.t, 5)
## 24.76598 25.26947 21.09931 40.72911 39.86621 40.71023 19.88292
Here several VIF values are well above 10, confirming multicollinearity among the regressors.
#Selection of the "q" value on the basis of AIC and BIC for fitting the DLM model
for(i in 1:10){
  model1 = dlm( x = as.vector(COP) , y = as.vector(ASX), q = i )
  cat("q = ", i, "AIC = ", AIC(model1$model), "BIC = ", BIC(model1$model),"\n")
}
## q = 1 AIC = 2574.488 BIC = 2586.789
## q = 2 AIC = 2559.356 BIC = 2574.7
## q = 3 AIC = 2544.155 BIC = 2562.531
## q = 4 AIC = 2528.895 BIC = 2550.289
## q = 5 AIC = 2513.265 BIC = 2537.664
## q = 6 AIC = 2497.775 BIC = 2525.166
## q = 7 AIC = 2481.988 BIC = 2512.357
## q = 8 AIC = 2466.511 BIC = 2499.846
## q = 9 AIC = 2451.016 BIC = 2487.302
## q = 10 AIC = 2436.164 BIC = 2475.389
The lower the AIC or BIC, the better the model. From the output, q = 10 has the lowest AIC and BIC values, so we select q = 10 for the DLM.
model1<- dlm(x = as.vector(COP) , y = as.vector(ASX), q = 10)
summary(model1)
##
## Call:
## lm(formula = model.formula, data = design)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1154.09 -643.75 -11.55 596.33 1429.23
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.981e+03 2.166e+02 18.382 <2e-16 ***
## x.t 1.536e-01 1.354e-01 1.134 0.259
## x.1 1.857e-02 2.205e-01 0.084 0.933
## x.2 4.480e-02 2.220e-01 0.202 0.840
## x.3 2.830e-02 2.180e-01 0.130 0.897
## x.4 1.889e-02 2.175e-01 0.087 0.931
## x.5 -4.846e-02 2.191e-01 -0.221 0.825
## x.6 3.046e-02 2.175e-01 0.140 0.889
## x.7 -3.494e-03 2.189e-01 -0.016 0.987
## x.8 -1.349e-03 2.239e-01 -0.006 0.995
## x.9 -8.232e-02 2.222e-01 -0.371 0.712
## x.10 -1.012e-02 1.340e-01 -0.076 0.940
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 737.4 on 139 degrees of freedom
## Multiple R-squared: 0.1931, Adjusted R-squared: 0.1292
## F-statistic: 3.024 on 11 and 139 DF, p-value: 0.001201
##
## AIC and BIC values for the model:
## AIC BIC
## 1 2436.164 2475.389
checkresiduals(model1$model)
##
## Breusch-Godfrey test for serial correlation of order up to 15
##
## data: Residuals
## LM test = 138.89, df = 15, p-value < 2.2e-16
The summary shows that none of the lags is significant and the adjusted R-squared is very low (about 0.13). The Breusch-Godfrey test p-value is below 5%, indicating serial correlation, and the residual ACF plot shows many significant lags. Hence the model violates the usual assumptions.
vif(model1$model) #check for multicollinearity
## x.t x.1 x.2 x.3 x.4 x.5 x.6 x.7
## 17.92226 49.15319 51.59119 51.49576 52.97842 55.52579 56.41365 58.90348
## x.8 x.9 x.10
## 63.24668 63.87601 23.81718
The VIF values also indicate strong multicollinearity. Hence we move on to the Polynomial Distributed Lag Model.
model2 = polyDlm(x = as.vector(COP) , y = as.vector(ASX) , q = 2 , k = 2 , show.beta = TRUE)
## Estimates and t-tests for beta coefficients:
## Estimate Std. Error t value P(>|t|)
## beta.0 0.17800 0.131 1.3600 0.176
## beta.1 0.05290 0.207 0.2550 0.799
## beta.2 -0.00654 0.129 -0.0507 0.960
summary(model2)
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -1302.7 -694.4 -135.3 635.5 1512.0
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3484.26812 181.89988 19.155 <2e-16 ***
## z.t0 0.17781 0.13067 1.361 0.176
## z.t1 -0.15771 0.62898 -0.251 0.802
## z.t2 0.03277 0.31191 0.105 0.916
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 743 on 155 degrees of freedom
## Multiple R-squared: 0.2877, Adjusted R-squared: 0.274
## F-statistic: 20.87 on 3 and 155 DF, p-value: 2.065e-11
checkresiduals(model2$model)
##
## Breusch-Godfrey test for serial correlation of order up to 10
##
## data: Residuals
## LM test = 146.69, df = 10, p-value < 2.2e-16
From the summary, none of the lags is significant and the adjusted R-squared is very low. The Breusch-Godfrey test p-value is below 5%, indicating serial correlation, and the residual ACF plot shows many significant lags. Hence the Polynomial Distributed Lag Model also violates the usual assumptions.
model3 = koyckDlm(x = as.vector(COP) , y = as.vector(ASX))
summary(model3,diagnostics=TRUE)
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -689.64 -108.62 12.78 140.20 771.79
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 189.368812 87.644648 2.161 0.0322 *
## Y.1 0.971621 0.021895 44.376 <2e-16 ***
## X.t -0.005864 0.009517 -0.616 0.5387
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 201.9 on 157 degrees of freedom
## Multiple R-Squared: 0.9485, Adjusted R-squared: 0.9479
## Wald test: 1448 on 2 and 157 DF, p-value: < 2.2e-16
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 1 157 1966.86799 1.043205e-90
## Wu-Hausman 1 156 10.97528 1.147725e-03
##
## alpha beta phi
## Geometric coefficients: 6672.885 -0.005863623 0.9716211
checkresiduals(model3$model)
The Koyck Distributed Lag Model summary gives good results: the lagged response term is significant and the adjusted R-squared is high. The Wu-Hausman diagnostic p-value is below 5%, indicating some correlation between the regressor and the error term. The residual ACF plot has no significant lags and the residuals are fairly evenly distributed.
We will now check for multicollinearity.
vif(model3$model)
## Y.1 X.t
## 1.493966 1.493966
The VIF values are well below 10, confirming that there is no multicollinearity.
#Checking p and q values on the basis of AIC and BIC
for (i in 1:5){
  for(j in 1:5){
    model4 = ardlDlm(x = as.vector(COP) , y = as.vector(ASX), p = i , q = j )
    cat("p = ", i, "q = ", j, "AIC = ", AIC(model4$model), "BIC = ", BIC(model4$model),"\n")
  }
}
## p = 1 q = 1 AIC = 2147.741 BIC = 2163.116
## p = 1 q = 2 AIC = 2135.4 BIC = 2153.813
## p = 1 q = 3 AIC = 2121.12 BIC = 2142.558
## p = 1 q = 4 AIC = 2109.759 BIC = 2134.209
## p = 1 q = 5 AIC = 2099.056 BIC = 2126.505
## p = 2 q = 1 AIC = 2130.043 BIC = 2148.456
## p = 2 q = 2 AIC = 2132.038 BIC = 2153.52
## p = 2 q = 3 AIC = 2119.241 BIC = 2143.741
## p = 2 q = 4 AIC = 2107.649 BIC = 2135.155
## p = 2 q = 5 AIC = 2097.021 BIC = 2127.52
## p = 3 q = 1 AIC = 2117.307 BIC = 2138.745
## p = 3 q = 2 AIC = 2119.247 BIC = 2143.748
## p = 3 q = 3 AIC = 2119.696 BIC = 2147.259
## p = 3 q = 4 AIC = 2108.537 BIC = 2139.1
## p = 3 q = 5 AIC = 2097.832 BIC = 2131.38
## p = 4 q = 1 AIC = 2105.916 BIC = 2130.366
## p = 4 q = 2 AIC = 2107.774 BIC = 2135.28
## p = 4 q = 3 AIC = 2108.608 BIC = 2139.17
## p = 4 q = 4 AIC = 2110.085 BIC = 2143.704
## p = 4 q = 5 AIC = 2099.454 BIC = 2136.052
## p = 5 q = 1 AIC = 2095.118 BIC = 2122.566
## p = 5 q = 2 AIC = 2096.96 BIC = 2127.459
## p = 5 q = 3 AIC = 2097.887 BIC = 2131.436
## p = 5 q = 4 AIC = 2099.497 BIC = 2136.095
## p = 5 q = 5 AIC = 2101.419 BIC = 2141.067
We choose p = 1 and q = 5, which gives one of the lowest AIC and BIC values among the combinations considered, and use it in the model below.
model4 = ardlDlm(x = as.vector(COP) , y = as.vector(ASX), p = 1 , q = 5 )
summary(model4)
##
## Time series regression with "ts" data:
## Start = 6, End = 161
##
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -699.82 -120.15 -3.11 126.37 735.99
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 225.77135 93.06873 2.426 0.01648 *
## X.t 0.10123 0.03350 3.022 0.00296 **
## X.1 -0.09988 0.03343 -2.987 0.00330 **
## Y.1 0.98307 0.08091 12.151 < 2e-16 ***
## Y.2 0.11625 0.11549 1.007 0.31577
## Y.3 -0.07509 0.11447 -0.656 0.51284
## Y.4 -0.04030 0.11417 -0.353 0.72457
## Y.5 -0.03005 0.07947 -0.378 0.70585
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 195.9 on 148 degrees of freedom
## Multiple R-squared: 0.9479, Adjusted R-squared: 0.9455
## F-statistic: 384.8 on 7 and 148 DF, p-value: < 2.2e-16
checkresiduals(model4$model)
##
## Breusch-Godfrey test for serial correlation of order up to 11
##
## data: Residuals
## LM test = 6.8037, df = 11, p-value = 0.8148
The Autoregressive Distributed Lag Model summary gives good results: several terms are significant and the adjusted R-squared is high. The residual ACF plot has no significant lags and the residuals are fairly evenly distributed.
We will now check for multicollinearity.
vif(model4$model)
## X.t L(X.t, 1) L(y.t, 1) L(y.t, 2) L(y.t, 3) L(y.t, 4) L(y.t, 5)
## 18.41423 18.91724 19.07060 39.66170 39.90695 40.78752 20.19991
Here several VIF values are well above 10, confirming multicollinearity among the regressors.
The series were initially non-stationary, as shown by the ACF and PACF plots. The non-stationarity was removed by applying Box-Cox transformations to the ASX and Crude Oil price series and first differencing to all four series, and stationarity was confirmed with the ADF test.
Decomposing the series with the X12-ARIMA and STL methods identified the components behind the non-stationarity, namely trend and seasonality.
After fitting the finite Distributed Lag, Polynomial Distributed Lag, Koyck Distributed Lag and Autoregressive Distributed Lag models to ASX vs Gold, ASX vs Crude Oil and ASX vs Copper, and comparing them on the basis of the residual ACF, multicollinearity, diagnostic checks, adjusted R-squared and the Breusch-Godfrey test, we conclude that the Koyck Distributed Lag Model is the best fitting model, with the ASX price as the dependent variable and the Crude Oil, Copper and Gold prices as the independent variables.
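As a closing visual check supporting this conclusion (a minimal sketch; model1 to model4 were reused above, so the three preferred Koyck models are refitted here), the fitted values of the Koyck model for each predictor can be overlaid on the observed ASX series:
#Refit the preferred Koyck models (one per predictor) and overlay their fitted values
koyck_gold <- koyckDlm(x = as.vector(GOLD), y = as.vector(ASX))
koyck_oil  <- koyckDlm(x = as.vector(OIL), y = as.vector(ASX))
koyck_cop  <- koyckDlm(x = as.vector(COP), y = as.vector(ASX))
asx_vec <- as.vector(ASX)
plot(asx_vec, type = "l", xlab = "Observation", ylab = "ASX Price",
     main = "Observed ASX price vs Koyck model fits")
lines(2:length(asx_vec), fitted(koyck_gold$model), col = "red")
lines(2:length(asx_vec), fitted(koyck_oil$model), col = "blue")
lines(2:length(asx_vec), fitted(koyck_cop$model), col = "darkgreen")
legend("topleft", legend = c("Observed", "Koyck (Gold)", "Koyck (Oil)", "Koyck (Copper)"),
       col = c("black", "red", "blue", "darkgreen"), lty = 1)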