library(readr)
ASX_data <- read_csv("/Users/shubhamchougule/Downloads/ASX_data.csv")
## Parsed with column specification:
## cols(
## `ASX price` = col_double(),
## `Gold price` = col_number(),
## `Crude Oil (Brent)_USD/bbl` = col_double(),
## `Copper_USD/tonne` = col_number()
## )
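The chunks below also call functions from several other packages. A minimal set of library() calls is attached here, assuming the usual homes of these functions (forecast, tseries, x12, dLagM and car):
library(forecast)  # BoxCox.lambda, BoxCox, checkresiduals
library(tseries)   # adf.test
library(x12)       # x12, plotSeasFac
library(dLagM)     # dlm, polyDlm, koyckDlm, ardlDlm
library(car)       # vif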
#Checking the dataset
head(ASX_data)
## # A tibble: 6 x 4
## `ASX price` `Gold price` `Crude Oil (Brent)_USD/bbl` `Copper_USD/tonne`
## <dbl> <dbl> <dbl> <dbl>
## 1 2935. 612. 31.3 1650
## 2 2778. 603. 32.6 1682
## 3 2849. 566. 30.3 1656
## 4 2971. 539. 25.0 1588
## 5 2980. 549. 25.8 1651
## 6 3000. 536. 27.6 1685
#Checking the class of dataset
class(ASX_data)
## [1] "spec_tbl_df" "tbl_df" "tbl" "data.frame"
#Converting dataset to time series starting from Jan 2004
ASX<-ts(ASX_data$`ASX price`,start = 2004,frequency = 12)
GOLD<-ts(ASX_data$`Gold price`,start = 2004,frequency = 12)
COP<-ts(ASX_data$`Copper_USD/tonne`,start = 2004,frequency = 12)
OIL<-ts(ASX_data$`Crude Oil (Brent)_USD/bbl`,start = 2004,frequency = 12)
data<-ts(ASX_data,start = 2004,frequency = 12)
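As a quick sanity check of the time index assigned by ts() (a minimal sketch that only prints what the calls above imply), the start, end and frequency of the ASX series can be inspected:
#Confirm the monthly time index assigned by ts()
start(ASX)
end(ASX)
frequency(ASX)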
Non-stationarity of a series is assessed with ACF and PACF plots and confirmed with an ADF test, and it can be removed by applying a transformation and differencing to the series. A series is also better understood when it is decomposed into its components: trend, seasonality and remainder.
Here we first plot the series, then use ACF and PACF plots to assess non-stationarity, apply transformation and differencing where required, and finally decompose each series to examine its components.
par(mfrow=c(2,2))
plot(ASX,ylab="Monthly avg ASX Price",xlab="YEAR",main="ASX Price Series")
plot(GOLD,ylab="Monthly avg GOLD Price",xlab="YEAR",main="Gold Price Series")
plot(OIL,ylab="Monthly avg Crude Oil Price",xlab="YEAR",main="Crude Oil Series")
plot(COP,ylab="Monthly avg Copper Price",xlab="YEAR",main="Copper Price Series")
ASX price series: There is no clear overall trend and no seasonal pattern is observed. An intervention can be seen around 2009.
Gold price series: There is an upward trend, no seasonal pattern and no obvious intervention.
Crude Oil series: There is no clear overall trend and no seasonal pattern is observed. Interventions can be seen around 2009 and 2015.
Copper price series: There is no clear overall trend and no seasonal pattern is observed. An intervention can be seen around 2010.
#ACF
par(mfrow=c(2,2))
acf(ASX,main="ACF for ASX Price")
acf(GOLD,main="ACF for Gold Price")
acf(OIL,main="ACF for Crude Oil Price")
acf(COP,main="ACF for Copper Price")
#PACF
par(mfrow=c(2,2))
pacf(ASX,main="PACF for ASX Price")
pacf(GOLD,main="PACF for Gold Price")
pacf(OIL,main="PACF for Crude Oil Price")
pacf(COP,main="PACF for Copper Price")
The ACF plots show a slow, wave-like decay across the lags, and in each PACF plot the first lag is highly significant, which indicates that the series are non-stationary. To deal with the non-stationarity we first consider a Box-Cox transformation.
#Checking the lambda value of each series
ASXL=BoxCox.lambda(ASX)
GOLDL=BoxCox.lambda(GOLD)
OILL=BoxCox.lambda(OIL)
COPL=BoxCox.lambda(COP)
cbind(ASXL,GOLDL,OILL,COPL)
## ASXL GOLDL OILL COPL
## [1,] 1.999924 0.976695 -0.8304136 0.9336783
The ASX price has a lambda value close to 2, so a transformation is applied. The Copper and Gold prices have lambda values close to 1, so no transformation is needed for them. The Crude Oil price has a negative lambda value, so a transformation is applied to it as well.
#Transformation of ASX price
ASX.TRS=BoxCox(ASX,lambda = ASXL)
plot(ASX.TRS,type='o',main="Time series plot after transformation",xlab="Year",ylab="Box-Cox transformed ASX Price")
After the transformation there is no noticeable change in the behaviour of the series.
#Transformation of Crude oil
OIL.TRS=BoxCox(OIL,lambda = OILL)
plot(OIL.TRS,type='o',main="Time series plot after transformation",xlab="Year",ylab="Box-Cox transformed Crude Oil Price")
The transformation does not visibly change the ASX and Crude Oil price series, so first differencing is applied to all four series.
#Apply first differencing
ASX.TRS.DIFF =diff(ASX.TRS,differences=1)
#Plot of the first difference
plot(ASX.TRS.DIFF,ylab='ASX Price',xlab='Year',
main = "Time series plot of the first difference")
#Apply first differencing
OIL.TRS.DIFF =diff(OIL.TRS,differences=1)
#Plot of the first difference
plot(OIL.TRS.DIFF,ylab='Crude Oil Price',xlab='Year',
main = "Time series plot of the first difference")
#Apply first differencing
GOLD.DIFF =diff(GOLD,differences=1)
#Plot of the first difference
plot(GOLD.DIFF,ylab='Gold Price',xlab='Year',
main = "Time series plot of the first difference")
#Apply first differencing
COP.DIFF =diff(COP,differences=1)
#Plot of the first difference
plot(COP.DIFF,ylab='Copper Price',xlab='Year',
main = "Time series plot of the first difference")
Applying first differencing makes all the series look stationary. To confirm this we use the Augmented Dickey-Fuller (ADF) test, with the following hypotheses:
H0: the series is non-stationary
HA: the series is stationary
#ADF test for ASX price
adf.test(ASX.TRS.DIFF)
## Warning in adf.test(ASX.TRS.DIFF): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: ASX.TRS.DIFF
## Dickey-Fuller = -4.4343, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
The Augmented Dickey-Fuller test gives a p-value below 5%, so we reject the null hypothesis and conclude that the differenced ASX series is stationary.
#ADF test for GOLD price
adf.test(GOLD.DIFF)
## Warning in adf.test(GOLD.DIFF): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: GOLD.DIFF
## Dickey-Fuller = -5.8718, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
The Augmented Dickey-Fuller test gives a p-value below 5%, so we reject the null hypothesis and conclude that the differenced Gold series is stationary.
#ADF test for Crude oil price
adf.test(OIL.TRS.DIFF)
## Warning in adf.test(OIL.TRS.DIFF): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: OIL.TRS.DIFF
## Dickey-Fuller = -5.5931, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
The Augmented Dickey-Fuller test gives a p-value below 5%, so we reject the null hypothesis and conclude that the differenced Crude Oil series is stationary.
#ADF test for Copper price
adf.test(COP.DIFF)
## Warning in adf.test(COP.DIFF): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: COP.DIFF
## Dickey-Fuller = -5.478, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
The Augmented Dickey-Fuller test gives a p-value below 5%, so we reject the null hypothesis and conclude that the differenced Copper series is stationary.
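As an additional visual check (a minimal sketch using the differenced objects created above), the ACF plots of the differenced series can be re-examined; for stationary series they should no longer show the slow decay seen earlier:
#ACF of the differenced (and, where applied, transformed) series
par(mfrow=c(2,2))
acf(ASX.TRS.DIFF, main="ACF for differenced ASX Price")
acf(GOLD.DIFF, main="ACF for differenced Gold Price")
acf(OIL.TRS.DIFF, main="ACF for differenced Crude Oil Price")
acf(COP.DIFF, main="ACF for differenced Copper Price")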
The individual contributions of the underlying components can be examined using decomposition. For this we will use the X12-ARIMA decomposition method as well as STL decomposition.
#Decomposition helper: plots the X12-ARIMA decomposed series (seasonally adjusted, trend and
#forecast), the seasonal factor (SI ratio) plot and an STL decomposition of the input series.
#Note that this function masks stats::decompose for the rest of the report.
decompose <- function(x){
  DECOM = x12(x)
  plot(DECOM, sa = TRUE, trend = TRUE, forecast = TRUE)  # original vs seasonally adjusted, trend and forecast
  plotSeasFac(DECOM)                                     # seasonal factors (SI ratios) by month
  stldec = stl(x, t.window = 15, s.window = "periodic", robust = TRUE)
  plot(stldec)                                           # STL panels: data, seasonal, trend and remainder
}
#ASX decomposition
decompose(ASX)
The seasonally adjusted series follows the original series closely, which indicates that seasonality has little effect on the series. The trend deviates slightly from the original series, so the trend does have some effect. The forecast is also shown in the plot.
The seasonal factors are not clustered around their mean for most of the months.
In the STL output the first panel shows the original series; the seasonal pattern peaks in the first, second and third quarters. The trend panel shows a fall around 2009 followed by an upward movement afterwards.
decompose(GOLD)
The seasonally adjusted series follows the original series closely, which indicates that seasonality has little effect on the series. The trend deviates slightly from the original series, so the trend does have some effect. The forecast is also shown in the plot.
The seasonal factors are not clustered around their mean for most of the months.
In the STL output the first panel shows the original series; the seasonal pattern peaks in the first, third and fourth quarters. The trend panel shows an upward trend from about 2006 onwards.
decompose(OIL)
The seasonally adjusted series follows the original series closely, which indicates that seasonality has little effect on the series. The trend deviates slightly from the original series, so the trend does have some effect. The forecast is also shown in the plot.
The seasonal factors are not clustered around their mean for most of the months.
In the STL output the first panel shows the original series; the seasonal pattern peaks in the second and third quarters. The trend panel shows a fall around 2009 followed by an upward movement from 2009 until about 2014.
decompose(COP)
The seasonally adjusted series follows the original series closely, which indicates that seasonality has little effect on the series. The trend deviates slightly from the original series, so the trend does have some effect. The forecast is also shown in the plot.
The seasonal factors are not clustered around their mean for most of the months.
In the STL output the first panel shows the original series; the seasonal pattern peaks in the first and fourth quarters. The trend panel shows a fall around 2009 followed by an upward movement from 2009 until about 2014.
We now compare several models to see which one best fits the ASX All Ordinaries (Ords) Price Index: the finite Distributed Lag Model, the Polynomial Distributed Lag Model, the Koyck Distributed Lag Model and the Autoregressive Distributed Lag Model.
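For reference, a rough sketch of the general model forms, written to match the dLagM parameterisation used below (q lags of the predictor in the finite and polynomial DLMs, with the polynomial DLM constraining the lag weights to a polynomial of degree k in the lag index; p predictor lags and q response lags in the ARDL model):
$$
\begin{aligned}
\text{Finite DLM: } & y_t = \alpha + \sum_{s=0}^{q} \beta_s x_{t-s} + \varepsilon_t \\
\text{Koyck DLM: } & y_t = \alpha + \phi\, y_{t-1} + \beta\, x_t + \nu_t \\
\text{ARDL}(p,q)\text{: } & y_t = \alpha + \sum_{i=0}^{p} \beta_i x_{t-i} + \sum_{j=1}^{q} \gamma_j y_{t-j} + \varepsilon_t
\end{aligned}
$$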
#Checking the correlation between the dependent variable and the independent variables
cor(data)
## ASX price Gold price Crude Oil (Brent)_USD/bbl
## ASX price 1.0000000 0.3431908 0.3290338
## Gold price 0.3431908 1.0000000 0.4366382
## Crude Oil (Brent)_USD/bbl 0.3290338 0.4366382 1.0000000
## Copper_USD/tonne 0.5617864 0.5364213 0.8664296
## Copper_USD/tonne
## ASX price 0.5617864
## Gold price 0.5364213
## Crude Oil (Brent)_USD/bbl 0.8664296
## Copper_USD/tonne 1.0000000
Here the ASX price is the dependent variable and the Gold, Crude Oil and Copper prices are the independent variables. Of the three, Copper has the strongest correlation with the ASX price (about 0.56).
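As a quick visual complement to the correlation matrix, a scatterplot matrix of the four series can be drawn with base R (a minimal sketch using the ASX_data tibble already loaded):
#Scatterplot matrix of the four price series
pairs(as.data.frame(ASX_data),
      main="Pairwise relationships between ASX, Gold, Crude Oil and Copper prices")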
#Selection of the "q" value on the basis of AIC and BIC for fitting the DLM model
for(i in 1:10){
  model1 = dlm( x = as.vector(GOLD) , y = as.vector(ASX), q = i )
  cat("q = ", i, "AIC = ", AIC(model1$model), "BIC = ", BIC(model1$model),"\n")
}
## q = 1 AIC = 2613.609 BIC = 2625.91
## q = 2 AIC = 2596.292 BIC = 2611.637
## q = 3 AIC = 2579.215 BIC = 2597.59
## q = 4 AIC = 2562.296 BIC = 2583.69
## q = 5 AIC = 2544.887 BIC = 2569.286
## q = 6 AIC = 2527.575 BIC = 2554.966
## q = 7 AIC = 2510.535 BIC = 2540.905
## q = 8 AIC = 2493.885 BIC = 2527.22
## q = 9 AIC = 2476.983 BIC = 2513.27
## q = 10 AIC = 2460.345 BIC = 2499.57
The lower the AIC or BIC, the better the model. From the output, q = 10 has the lowest AIC and BIC values, so we select q = 10 for the DLM.
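The same search can also be captured programmatically; a minimal sketch (using dlm() from dLagM exactly as above) that stores the criteria and picks the q with the smallest AIC:
#Collect AIC/BIC over the candidate q values and pick the minimum programmatically
q_grid <- 1:10
ic <- t(sapply(q_grid, function(q){
  m <- dlm(x = as.vector(GOLD), y = as.vector(ASX), q = q)
  c(q = q, AIC = AIC(m$model), BIC = BIC(m$model))
}))
ic[which.min(ic[, "AIC"]), ]  # row with the smallest AIC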
model1<- dlm(x = as.vector(GOLD) , y = as.vector(ASX), q = 10)
summary(model1)
##
## Call:
## lm(formula = model.formula, data = design)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1535.24 -575.79 20.89 480.32 1951.02
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4523.02779 225.83961 20.028 <2e-16 ***
## x.t -0.54891 1.27022 -0.432 0.666
## x.1 0.07699 1.88146 0.041 0.967
## x.2 -0.01009 1.90952 -0.005 0.996
## x.3 -0.12278 1.92437 -0.064 0.949
## x.4 -0.30955 1.92889 -0.160 0.873
## x.5 0.47310 1.93180 0.245 0.807
## x.6 0.02590 1.94990 0.013 0.989
## x.7 0.67162 1.95391 0.344 0.732
## x.8 -0.11584 1.94844 -0.059 0.953
## x.9 0.11415 1.92690 0.059 0.953
## x.10 0.11352 1.28818 0.088 0.930
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 798.9 on 139 degrees of freedom
## Multiple R-squared: 0.05296, Adjusted R-squared: -0.02199
## F-statistic: 0.7066 on 11 and 139 DF, p-value: 0.7306
##
## AIC and BIC values for the model:
## AIC BIC
## 1 2460.345 2499.57
checkresiduals(model1$model)
##
## Breusch-Godfrey test for serial correlation of order up to 15
##
## data: Residuals
## LM test = 138.89, df = 15, p-value < 2.2e-16
The summary shows that none of the lags is significant and the adjusted R-squared is negative. The Breusch-Godfrey test p-value is below 5%, indicating serial correlation, and the residual ACF plot shows many significant lags. Hence the model violates the usual assumptions.
vif(model1$model) #check for multicollinearity
## x.t x.1 x.2 x.3 x.4 x.5 x.6 x.7
## 54.00602 120.12460 125.08221 128.79308 131.27007 133.47382 137.78856 139.77569
## x.8 x.9 x.10
## 139.98862 137.01938 61.18711
The VIF values also indicate strong multicollinearity. Hence we move on to the Polynomial Distributed Lag Model.
model2 = polyDlm(x = as.vector(GOLD) , y = as.vector(ASX) , q = 2 , k = 2 , show.beta = TRUE)
## Estimates and t-tests for beta coefficients:
## Estimate Std. Error t value P(>|t|)
## beta.0 0.396 1.28 0.310 0.757
## beta.1 -0.233 1.90 -0.122 0.903
## beta.2 0.536 1.27 0.421 0.675
summary(model2)
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -1632.50 -700.82 4.61 549.72 2213.87
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3998.3587 212.3161 18.832 <2e-16 ***
## z.t0 0.3958 1.2767 0.310 0.757
## z.t1 -1.3268 5.7723 -0.230 0.819
## z.t2 0.6983 2.8546 0.245 0.807
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 834.5 on 155 degrees of freedom
## Multiple R-squared: 0.1015, Adjusted R-squared: 0.08409
## F-statistic: 5.835 on 3 and 155 DF, p-value: 0.0008385
checkresiduals(model2$model)
##
## Breusch-Godfrey test for serial correlation of order up to 10
##
## data: Residuals
## LM test = 147.24, df = 10, p-value < 2.2e-16
From the summary, none of the lags is significant and the adjusted R-squared is very low. The Breusch-Godfrey test p-value is below 5%, indicating serial correlation, and the residual ACF plot shows many significant lags. Hence the Polynomial Distributed Lag Model also violates the usual assumptions.
model3 = koyckDlm(x = as.vector(GOLD) , y = as.vector(ASX))
summary(model3,diagnostics=TRUE)
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -682.19 -105.44 15.86 135.04 783.60
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.902e+02 8.958e+01 2.123 0.0353 *
## Y.1 9.635e-01 1.909e-02 50.469 <2e-16 ***
## X.t 2.595e-03 4.304e-02 0.060 0.9520
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 201.4 on 157 degrees of freedom
## Multiple R-Squared: 0.9488, Adjusted R-squared: 0.9481
## Wald test: 1454 on 2 and 157 DF, p-value: < 2.2e-16
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 1 157 8006.63657 1.266826e-136
## Wu-Hausman 1 156 18.06854 3.655601e-05
##
## alpha beta phi
## Geometric coefficients: 5205.15 0.002595168 0.9634602
checkresiduals(model3$model)
The Koyck Distributed Lag Model summary gives good results: the lagged response term is significant and the adjusted R-squared is high. The Wu-Hausman diagnostic p-value is below 5%, indicating some correlation between the regressor and the error term. The residual ACF plot has no significant lags and the residuals are fairly evenly distributed.
We will now check for multicollinearity.
vif(model3$model)
## Y.1 X.t
## 1.140648 1.140648
The VIF values are well below 10, confirming that there is no multicollinearity.
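Since the Koyck model looks promising, a quick in-sample check can be made by overlaying its fitted values on the observed ASX series (a minimal sketch; it assumes model3$model exposes the underlying ivreg fit, which loses the first observation to the lagged term):
#Overlay the Koyck model's fitted values (red) on the observed ASX price (black)
asx_vec <- as.vector(ASX)
plot(asx_vec, type = "l", xlab = "Observation", ylab = "ASX Price",
     main = "Observed ASX price vs Koyck fit (Gold as predictor)")
lines(2:length(asx_vec), fitted(model3$model), col = "red")
legend("topleft", legend = c("Observed", "Koyck fitted"), col = c("black", "red"), lty = 1)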
#Checking p and q values on the basis of AIC and BIC
for (i in 1:5){
  for(j in 1:5){
    model4 = ardlDlm(x = as.vector(GOLD) , y = as.vector(ASX), p = i , q = j )
    cat("p = ", i, "q = ", j, "AIC = ", AIC(model4$model), "BIC = ", BIC(model4$model),"\n")
  }
}
## p = 1 q = 1 AIC = 2140.897 BIC = 2156.273
## p = 1 q = 2 AIC = 2128.524 BIC = 2146.938
## p = 1 q = 3 AIC = 2113.99 BIC = 2135.428
## p = 1 q = 4 AIC = 2102.754 BIC = 2127.204
## p = 1 q = 5 AIC = 2092.194 BIC = 2119.643
## p = 2 q = 1 AIC = 2128.627 BIC = 2147.04
## p = 2 q = 2 AIC = 2130.523 BIC = 2152.005
## p = 2 q = 3 AIC = 2115.89 BIC = 2140.39
## p = 2 q = 4 AIC = 2104.694 BIC = 2132.2
## p = 2 q = 5 AIC = 2094.14 BIC = 2124.639
## p = 3 q = 1 AIC = 2118.109 BIC = 2139.547
## p = 3 q = 2 AIC = 2120.027 BIC = 2144.528
## p = 3 q = 3 AIC = 2117.305 BIC = 2144.868
## p = 3 q = 4 AIC = 2105.731 BIC = 2136.293
## p = 3 q = 5 AIC = 2095.264 BIC = 2128.812
## p = 4 q = 1 AIC = 2107.002 BIC = 2131.452
## p = 4 q = 2 AIC = 2108.914 BIC = 2136.42
## p = 4 q = 3 AIC = 2106.276 BIC = 2136.839
## p = 4 q = 4 AIC = 2107.456 BIC = 2141.074
## p = 4 q = 5 AIC = 2097.01 BIC = 2133.608
## p = 5 q = 1 AIC = 2094.908 BIC = 2122.357
## p = 5 q = 2 AIC = 2096.86 BIC = 2127.359
## p = 5 q = 3 AIC = 2094.144 BIC = 2127.692
## p = 5 q = 4 AIC = 2095.425 BIC = 2132.023
## p = 5 q = 5 AIC = 2097.324 BIC = 2136.972
We choose p = 1 and q = 5 because this combination has the lowest AIC and BIC values, and we use it in the model below.
model4 = ardlDlm(x = as.vector(GOLD) , y = as.vector(ASX), p = 1 , q = 5 )
summary(model4)
##
## Time series regression with "ts" data:
## Start = 6, End = 161
##
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -564.02 -106.74 8.99 126.34 691.74
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 234.72597 92.97346 2.525 0.012635 *
## X.t -1.21331 0.30480 -3.981 0.000107 ***
## X.1 1.21097 0.30013 4.035 8.73e-05 ***
## Y.1 0.96620 0.07927 12.189 < 2e-16 ***
## Y.2 0.13687 0.11316 1.210 0.228368
## Y.3 -0.07572 0.11193 -0.676 0.499816
## Y.4 -0.04931 0.11174 -0.441 0.659640
## Y.5 -0.02130 0.07871 -0.271 0.787092
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 191.6 on 148 degrees of freedom
## Multiple R-squared: 0.9502, Adjusted R-squared: 0.9478
## F-statistic: 403.1 on 7 and 148 DF, p-value: < 2.2e-16
checkresiduals(model4$model)
##
## Breusch-Godfrey test for serial correlation of order up to 11
##
## data: Residuals
## LM test = 5.7145, df = 11, p-value = 0.8917
The Autoregressive Distributed Lag Model summary gives good results: several terms are significant and the adjusted R-squared is high. The residual ACF plot has no significant lags and the residuals are fairly evenly distributed.
We will now check for multicollinearity.
vif(model4$model)
## X.t L(X.t, 1) L(y.t, 1) L(y.t, 2) L(y.t, 3) L(y.t, 4) L(y.t, 5)
## 60.01512 58.78005 19.12905 39.78823 39.87314 40.82652 20.70864
Here several VIF values are well above 10, confirming multicollinearity among the regressors.
#Selection of the "q" value on the basis of AIC and BIC for fitting the DLM model
for(i in 1:10){
  model1 = dlm( x = as.vector(OIL) , y = as.vector(ASX), q = i )
  cat("q = ", i, "AIC = ", AIC(model1$model), "BIC = ", BIC(model1$model),"\n")
}
## q = 1 AIC = 2614.698 BIC = 2626.998
## q = 2 AIC = 2596.715 BIC = 2612.059
## q = 3 AIC = 2579.101 BIC = 2597.477
## q = 4 AIC = 2561.888 BIC = 2583.281
## q = 5 AIC = 2544.936 BIC = 2569.335
## q = 6 AIC = 2527.701 BIC = 2555.091
## q = 7 AIC = 2510.754 BIC = 2541.124
## q = 8 AIC = 2493.914 BIC = 2527.249
## q = 9 AIC = 2476.871 BIC = 2513.158
## q = 10 AIC = 2459.842 BIC = 2499.066
The lower the AIC or BIC, the better the model. From the output, q = 10 has the lowest AIC and BIC values, so we select q = 10 for the DLM.
model1<- dlm(x = as.vector(OIL) , y = as.vector(ASX), q = 10)
summary(model1)
##
## Call:
## lm(formula = model.formula, data = design)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1576.90 -628.63 6.78 568.36 1678.62
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4690.8891 198.7822 23.598 <2e-16 ***
## x.t 9.4539 11.3091 0.836 0.405
## x.1 0.3802 18.9644 0.020 0.984
## x.2 -0.2888 19.1969 -0.015 0.988
## x.3 2.3047 19.2972 0.119 0.905
## x.4 -5.6630 19.3383 -0.293 0.770
## x.5 -0.4596 19.3953 -0.024 0.981
## x.6 0.1522 19.3553 0.008 0.994
## x.7 -0.8638 19.5004 -0.044 0.965
## x.8 0.2994 19.5758 0.015 0.988
## x.9 -7.2824 19.2792 -0.378 0.706
## x.10 5.0307 11.3292 0.444 0.658
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 797.5 on 139 degrees of freedom
## Multiple R-squared: 0.05611, Adjusted R-squared: -0.01859
## F-statistic: 0.7512 on 11 and 139 DF, p-value: 0.6877
##
## AIC and BIC values for the model:
## AIC BIC
## 1 2459.842 2499.066
checkresiduals(model1$model)
##
## Breusch-Godfrey test for serial correlation of order up to 15
##
## data: Residuals
## LM test = 138.41, df = 15, p-value < 2.2e-16
The summary shows that none of the lags is significant and the adjusted R-squared is negative. The Breusch-Godfrey test p-value is below 5%, indicating serial correlation, and the residual ACF plot shows many significant lags. Hence the model violates the usual assumptions.
vif(model1$model) #check for multicollinearity
## x.t x.1 x.2 x.3 x.4 x.5 x.6 x.7
## 24.79895 70.60934 73.44324 75.12792 76.53499 78.10752 78.97569 81.18320
## x.8 x.9 x.10
## 82.63829 80.72059 28.09278
The VIF values indicate multicollinearity among the lagged regressors. Hence we move on to the Polynomial Distributed Lag Model.
model2 = polyDlm(x = as.vector(OIL) , y = as.vector(ASX) , q = 2 , k = 2 , show.beta = TRUE)
## Estimates and t-tests for beta coefficients:
## Estimate Std. Error t value P(>|t|)
## beta.0 13.70 11.6 1.180 0.239
## beta.1 3.18 19.1 0.167 0.868
## beta.2 -8.41 11.5 -0.730 0.467
summary(model2)
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -1616.4 -703.8 -77.9 657.5 1783.6
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4209.0519 179.6342 23.431 <2e-16 ***
## z.t0 13.6870 11.5777 1.182 0.239
## z.t1 -9.9541 57.9232 -0.172 0.864
## z.t2 -0.5483 28.7836 -0.019 0.985
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 835.6 on 155 degrees of freedom
## Multiple R-squared: 0.09909, Adjusted R-squared: 0.08165
## F-statistic: 5.683 on 3 and 155 DF, p-value: 0.001019
checkresiduals(model2$model)
##
## Breusch-Godfrey test for serial correlation of order up to 10
##
## data: Residuals
## LM test = 146.42, df = 10, p-value < 2.2e-16
From the summary, none of the lags is significant and the adjusted R-squared is very low. The Breusch-Godfrey test p-value is below 5%, indicating serial correlation, and the residual ACF plot shows many significant lags. Hence the Polynomial Distributed Lag Model also violates the usual assumptions.
model3 = koyckDlm(x = as.vector(OIL) , y = as.vector(ASX))
summary(model3,diagnostics=TRUE)
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -683.91 -108.66 13.68 139.77 762.55
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 209.89536 87.89368 2.388 0.0181 *
## Y.1 0.97537 0.01905 51.193 <2e-16 ***
## X.t -0.99907 0.58045 -1.721 0.0872 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 201.1 on 157 degrees of freedom
## Multiple R-Squared: 0.949, Adjusted R-squared: 0.9483
## Wald test: 1461 on 2 and 157 DF, p-value: < 2.2e-16
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 1 157 3018.4519 1.997814e-104
## Wu-Hausman 1 156 11.1979 1.026113e-03
##
## alpha beta phi
## Geometric coefficients: 8522.034 -0.9990694 0.9753703
checkresiduals(model3$model)
The Koyck Distributed Lag Model summary gives good results: the lagged response term is significant and the adjusted R-squared is high. The Wu-Hausman diagnostic p-value is below 5%, indicating some correlation between the regressor and the error term. The residual ACF plot has no significant lags and the residuals are fairly evenly distributed.
We will now check for multicollinearity.
vif(model3$model)
## Y.1 X.t
## 1.14038 1.14038
The VIF values are well below 10, confirming that there is no multicollinearity.
#Checking p and q values on the basis of AIC and BIC
for (i in 1:5){
  for(j in 1:5){
    model4 = ardlDlm(x = as.vector(OIL) , y = as.vector(ASX), p = i , q = j )
    cat("p = ", i, "q = ", j, "AIC = ", AIC(model4$model), "BIC = ", BIC(model4$model),"\n")
  }
}
## p = 1 q = 1 AIC = 2146.524 BIC = 2161.9
## p = 1 q = 2 AIC = 2134.107 BIC = 2152.521
## p = 1 q = 3 AIC = 2121.07 BIC = 2142.508
## p = 1 q = 4 AIC = 2109.4 BIC = 2133.85
## p = 1 q = 5 AIC = 2098.335 BIC = 2125.784
## p = 2 q = 1 AIC = 2132.312 BIC = 2150.726
## p = 2 q = 2 AIC = 2134.235 BIC = 2155.718
## p = 2 q = 3 AIC = 2122.356 BIC = 2146.857
## p = 2 q = 4 AIC = 2110.793 BIC = 2138.299
## p = 2 q = 5 AIC = 2099.752 BIC = 2130.251
## p = 3 q = 1 AIC = 2121.919 BIC = 2143.357
## p = 3 q = 2 AIC = 2123.835 BIC = 2148.335
## p = 3 q = 3 AIC = 2124.324 BIC = 2151.887
## p = 3 q = 4 AIC = 2112.401 BIC = 2142.963
## p = 3 q = 5 AIC = 2101.35 BIC = 2134.899
## p = 4 q = 1 AIC = 2111.383 BIC = 2135.832
## p = 4 q = 2 AIC = 2113.294 BIC = 2140.8
## p = 4 q = 3 AIC = 2113.805 BIC = 2144.367
## p = 4 q = 4 AIC = 2114.384 BIC = 2148.003
## p = 4 q = 5 AIC = 2103.342 BIC = 2139.94
## p = 5 q = 1 AIC = 2097.076 BIC = 2124.525
## p = 5 q = 2 AIC = 2099.041 BIC = 2129.54
## p = 5 q = 3 AIC = 2099.518 BIC = 2133.066
## p = 5 q = 4 AIC = 2099.845 BIC = 2136.443
## p = 5 q = 5 AIC = 2100.917 BIC = 2140.566
We choose p = 1 and q = 5, which gives one of the lowest AIC and BIC values among the combinations considered, and use it in the model below.
model4 = ardlDlm(x = as.vector(OIL) , y = as.vector(ASX), p = 1 , q = 5 )
summary(model4)
##
## Time series regression with "ts" data:
## Start = 6, End = 161
##
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -678.76 -119.18 -11.69 139.51 683.99
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 264.45303 93.37157 2.832 0.00527 **
## X.t 7.40072 2.65126 2.791 0.00594 **
## X.1 -8.08296 2.65998 -3.039 0.00281 **
## Y.1 0.93245 0.08490 10.982 < 2e-16 ***
## Y.2 0.14232 0.11676 1.219 0.22482
## Y.3 -0.03681 0.11415 -0.322 0.74754
## Y.4 -0.03104 0.11379 -0.273 0.78542
## Y.5 -0.04827 0.07866 -0.614 0.54035
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 195.4 on 148 degrees of freedom
## Multiple R-squared: 0.9482, Adjusted R-squared: 0.9457
## F-statistic: 386.7 on 7 and 148 DF, p-value: < 2.2e-16
checkresiduals(model4$model)
##
## Breusch-Godfrey test for serial correlation of order up to 11
##
## data: Residuals
## LM test = 5.4064, df = 11, p-value = 0.9099
The Autoregressive Distributed Lag Model summary gives good results: several terms are significant and the adjusted R-squared is high. The residual ACF plot has no significant lags and the residuals are fairly evenly distributed.
We will now check for multicollinearity.
vif(model4$model)
## X.t L(X.t, 1) L(y.t, 1) L(y.t, 2) L(y.t, 3) L(y.t, 4) L(y.t, 5)
## 24.76598 25.26947 21.09931 40.72911 39.86621 40.71023 19.88292
Here several VIF values are well above 10, confirming multicollinearity among the regressors.
#Selection of the "q" value on the basis of AIC and BIC for fitting the DLM model
for(i in 1:10){
  model1 = dlm( x = as.vector(COP) , y = as.vector(ASX), q = i )
  cat("q = ", i, "AIC = ", AIC(model1$model), "BIC = ", BIC(model1$model),"\n")
}
## q = 1 AIC = 2574.488 BIC = 2586.789
## q = 2 AIC = 2559.356 BIC = 2574.7
## q = 3 AIC = 2544.155 BIC = 2562.531
## q = 4 AIC = 2528.895 BIC = 2550.289
## q = 5 AIC = 2513.265 BIC = 2537.664
## q = 6 AIC = 2497.775 BIC = 2525.166
## q = 7 AIC = 2481.988 BIC = 2512.357
## q = 8 AIC = 2466.511 BIC = 2499.846
## q = 9 AIC = 2451.016 BIC = 2487.302
## q = 10 AIC = 2436.164 BIC = 2475.389
The lower the AIC or BIC, the better the model. From the output, q = 10 has the lowest AIC and BIC values, so we select q = 10 for the DLM.
model1<- dlm(x = as.vector(COP) , y = as.vector(ASX), q = 10)
summary(model1)
##
## Call:
## lm(formula = model.formula, data = design)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1154.09 -643.75 -11.55 596.33 1429.23
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.981e+03 2.166e+02 18.382 <2e-16 ***
## x.t 1.536e-01 1.354e-01 1.134 0.259
## x.1 1.857e-02 2.205e-01 0.084 0.933
## x.2 4.480e-02 2.220e-01 0.202 0.840
## x.3 2.830e-02 2.180e-01 0.130 0.897
## x.4 1.889e-02 2.175e-01 0.087 0.931
## x.5 -4.846e-02 2.191e-01 -0.221 0.825
## x.6 3.046e-02 2.175e-01 0.140 0.889
## x.7 -3.494e-03 2.189e-01 -0.016 0.987
## x.8 -1.349e-03 2.239e-01 -0.006 0.995
## x.9 -8.232e-02 2.222e-01 -0.371 0.712
## x.10 -1.012e-02 1.340e-01 -0.076 0.940
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 737.4 on 139 degrees of freedom
## Multiple R-squared: 0.1931, Adjusted R-squared: 0.1292
## F-statistic: 3.024 on 11 and 139 DF, p-value: 0.001201
##
## AIC and BIC values for the model:
## AIC BIC
## 1 2436.164 2475.389
checkresiduals(model1$model)
##
## Breusch-Godfrey test for serial correlation of order up to 15
##
## data: Residuals
## LM test = 138.89, df = 15, p-value < 2.2e-16
The summary shows that none of the lags is significant and the adjusted R-squared is very low (about 0.13). The Breusch-Godfrey test p-value is below 5%, indicating serial correlation, and the residual ACF plot shows many significant lags. Hence the model violates the usual assumptions.
vif(model1$model) #check for multicollinearity
## x.t x.1 x.2 x.3 x.4 x.5 x.6 x.7
## 17.92226 49.15319 51.59119 51.49576 52.97842 55.52579 56.41365 58.90348
## x.8 x.9 x.10
## 63.24668 63.87601 23.81718
The VIF values also indicate strong multicollinearity. Hence we move on to the Polynomial Distributed Lag Model.
model2 = polyDlm(x = as.vector(COP) , y = as.vector(ASX) , q = 2 , k = 2 , show.beta = TRUE)
## Estimates and t-tests for beta coefficients:
## Estimate Std. Error t value P(>|t|)
## beta.0 0.17800 0.131 1.3600 0.176
## beta.1 0.05290 0.207 0.2550 0.799
## beta.2 -0.00654 0.129 -0.0507 0.960
summary(model2)
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -1302.7 -694.4 -135.3 635.5 1512.0
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3484.26812 181.89988 19.155 <2e-16 ***
## z.t0 0.17781 0.13067 1.361 0.176
## z.t1 -0.15771 0.62898 -0.251 0.802
## z.t2 0.03277 0.31191 0.105 0.916
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 743 on 155 degrees of freedom
## Multiple R-squared: 0.2877, Adjusted R-squared: 0.274
## F-statistic: 20.87 on 3 and 155 DF, p-value: 2.065e-11
checkresiduals(model2$model)
##
## Breusch-Godfrey test for serial correlation of order up to 10
##
## data: Residuals
## LM test = 146.69, df = 10, p-value < 2.2e-16
From the summary, none of the lags is significant and the adjusted R-squared is very low. The Breusch-Godfrey test p-value is below 5%, indicating serial correlation, and the residual ACF plot shows many significant lags. Hence the Polynomial Distributed Lag Model also violates the usual assumptions.
model3 = koyckDlm(x = as.vector(COP) , y = as.vector(ASX))
summary(model3,diagnostics=TRUE)
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -689.64 -108.62 12.78 140.20 771.79
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 189.368812 87.644648 2.161 0.0322 *
## Y.1 0.971621 0.021895 44.376 <2e-16 ***
## X.t -0.005864 0.009517 -0.616 0.5387
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 201.9 on 157 degrees of freedom
## Multiple R-Squared: 0.9485, Adjusted R-squared: 0.9479
## Wald test: 1448 on 2 and 157 DF, p-value: < 2.2e-16
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 1 157 1966.86799 1.043205e-90
## Wu-Hausman 1 156 10.97528 1.147725e-03
##
## alpha beta phi
## Geometric coefficients: 6672.885 -0.005863623 0.9716211
checkresiduals(model3$model)
The Koyck Distributed Lag Model summary gives good results: the lagged response term is significant and the adjusted R-squared is high. The Wu-Hausman diagnostic p-value is below 5%, indicating some correlation between the regressor and the error term. The residual ACF plot has no significant lags and the residuals are fairly evenly distributed.
We will now check for multicollinearity.
vif(model3$model)
## Y.1 X.t
## 1.493966 1.493966
The VIF values are well below 10, confirming that there is no multicollinearity.
#Checking p and q values on the basis of AIC and BIC
for (i in 1:5){
  for(j in 1:5){
    model4 = ardlDlm(x = as.vector(COP) , y = as.vector(ASX), p = i , q = j )
    cat("p = ", i, "q = ", j, "AIC = ", AIC(model4$model), "BIC = ", BIC(model4$model),"\n")
  }
}
## p = 1 q = 1 AIC = 2147.741 BIC = 2163.116
## p = 1 q = 2 AIC = 2135.4 BIC = 2153.813
## p = 1 q = 3 AIC = 2121.12 BIC = 2142.558
## p = 1 q = 4 AIC = 2109.759 BIC = 2134.209
## p = 1 q = 5 AIC = 2099.056 BIC = 2126.505
## p = 2 q = 1 AIC = 2130.043 BIC = 2148.456
## p = 2 q = 2 AIC = 2132.038 BIC = 2153.52
## p = 2 q = 3 AIC = 2119.241 BIC = 2143.741
## p = 2 q = 4 AIC = 2107.649 BIC = 2135.155
## p = 2 q = 5 AIC = 2097.021 BIC = 2127.52
## p = 3 q = 1 AIC = 2117.307 BIC = 2138.745
## p = 3 q = 2 AIC = 2119.247 BIC = 2143.748
## p = 3 q = 3 AIC = 2119.696 BIC = 2147.259
## p = 3 q = 4 AIC = 2108.537 BIC = 2139.1
## p = 3 q = 5 AIC = 2097.832 BIC = 2131.38
## p = 4 q = 1 AIC = 2105.916 BIC = 2130.366
## p = 4 q = 2 AIC = 2107.774 BIC = 2135.28
## p = 4 q = 3 AIC = 2108.608 BIC = 2139.17
## p = 4 q = 4 AIC = 2110.085 BIC = 2143.704
## p = 4 q = 5 AIC = 2099.454 BIC = 2136.052
## p = 5 q = 1 AIC = 2095.118 BIC = 2122.566
## p = 5 q = 2 AIC = 2096.96 BIC = 2127.459
## p = 5 q = 3 AIC = 2097.887 BIC = 2131.436
## p = 5 q = 4 AIC = 2099.497 BIC = 2136.095
## p = 5 q = 5 AIC = 2101.419 BIC = 2141.067
We choose p = 1 and q = 5, which gives one of the lowest AIC and BIC values among the combinations considered, and use it in the model below.
model4 = ardlDlm(x = as.vector(COP) , y = as.vector(ASX), p = 1 , q = 5 )
summary(model4)
##
## Time series regression with "ts" data:
## Start = 6, End = 161
##
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -699.82 -120.15 -3.11 126.37 735.99
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 225.77135 93.06873 2.426 0.01648 *
## X.t 0.10123 0.03350 3.022 0.00296 **
## X.1 -0.09988 0.03343 -2.987 0.00330 **
## Y.1 0.98307 0.08091 12.151 < 2e-16 ***
## Y.2 0.11625 0.11549 1.007 0.31577
## Y.3 -0.07509 0.11447 -0.656 0.51284
## Y.4 -0.04030 0.11417 -0.353 0.72457
## Y.5 -0.03005 0.07947 -0.378 0.70585
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 195.9 on 148 degrees of freedom
## Multiple R-squared: 0.9479, Adjusted R-squared: 0.9455
## F-statistic: 384.8 on 7 and 148 DF, p-value: < 2.2e-16
checkresiduals(model4$model)
##
## Breusch-Godfrey test for serial correlation of order up to 11
##
## data: Residuals
## LM test = 6.8037, df = 11, p-value = 0.8148
The Autoregressive Distributed Lag Model summary gives good results: several terms are significant and the adjusted R-squared is high. The residual ACF plot has no significant lags and the residuals are fairly evenly distributed.
We will now check for multicollinearity.
vif(model4$model)
## X.t L(X.t, 1) L(y.t, 1) L(y.t, 2) L(y.t, 3) L(y.t, 4) L(y.t, 5)
## 18.41423 18.91724 19.07060 39.66170 39.90695 40.78752 20.19991
Here several VIF values are well above 10, confirming multicollinearity among the regressors.
The series were initially non-stationary, as shown by the ACF and PACF plots. The non-stationarity was removed by applying Box-Cox transformations to the ASX and Crude Oil price series and first differencing to all four series, and stationarity was confirmed with the ADF test.
Decomposing the series with the X12-ARIMA and STL methods identified the components behind the non-stationarity, namely trend and seasonality.
After fitting the finite Distributed Lag, Polynomial Distributed Lag, Koyck Distributed Lag and Autoregressive Distributed Lag models to ASX vs Gold, ASX vs Crude Oil and ASX vs Copper, and comparing them on the basis of the residual ACF, multicollinearity, diagnostic checks, adjusted R-squared and the Breusch-Godfrey test, we conclude that the Koyck Distributed Lag Model is the best fitting model, with the ASX price as the dependent variable and the Crude Oil, Copper and Gold prices as the independent variables.
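As a closing visual check supporting this conclusion (a minimal sketch; model1 to model4 were reused above, so the three preferred Koyck models are refitted here), the fitted values of the Koyck model for each predictor can be overlaid on the observed ASX series:
#Refit the preferred Koyck models (one per predictor) and overlay their fitted values
koyck_gold <- koyckDlm(x = as.vector(GOLD), y = as.vector(ASX))
koyck_oil  <- koyckDlm(x = as.vector(OIL), y = as.vector(ASX))
koyck_cop  <- koyckDlm(x = as.vector(COP), y = as.vector(ASX))
asx_vec <- as.vector(ASX)
plot(asx_vec, type = "l", xlab = "Observation", ylab = "ASX Price",
     main = "Observed ASX price vs Koyck model fits")
lines(2:length(asx_vec), fitted(koyck_gold$model), col = "red")
lines(2:length(asx_vec), fitted(koyck_oil$model), col = "blue")
lines(2:length(asx_vec), fitted(koyck_cop$model), col = "darkgreen")
legend("topleft", legend = c("Observed", "Koyck (Gold)", "Koyck (Oil)", "Koyck (Copper)"),
       col = c("black", "red", "blue", "darkgreen"), lty = 1)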