library(TSA)
library(forecast)
library(x12)
library(tseries)
library(dLagM)
library(readr)
library(stats)
library(car)
library(dynlm)
library(lmtest)
sum.res = function(x){
summary(x)
checkresiduals(x$model)
shapiro.test(x$model$residuals)
}
sum.res2 = function(x){
summary(x)
checkresiduals(x)
shapiro.test(x$residuals)
}
explore=function(x){
plot(x, xlab='Year', main="Time series plot")
points(y=x,x=time(x), pch=as.vector(season(x)))
acf(x, lag.max = 48, main="Sample ACF plot")
adf.test(x)
}
The aim of this investigation is to analyse and forecast the amount of horizontal solar radiation reaching the ground at a particular location over the globe. The dataset used in this report is data1 which contains information on monthly average horizontal solar radiation and the monthly precipitation series measured at the same points between January 1960 and December 2014.
Firstly, a time series will be created for each of the variables in data1. Then we will examine them by looking at their stationarity and decompositions. After exploring the relationship between these variables we will proceed to modelling. Three types of models will be used including time series regression models, exponential smoothing models and state-space models. The best model will be decided based on significance of the model and terms, MASE and residual analysis. Finally, the best model chosen will be used to make 2 years ahead forecasts.
The following code chunk is used to load dataset data1 and convert it to time series:
data1 <- read_csv("data1.csv")
solar = ts(data1$solar,start=c(1960,1), frequency=12)
ppt= ts(data1$ppt,start=c(1960,1), frequency=12)
The following code chunk is used to explore and visualise the time series plot and sample ACF plot of monthly solar radiation:
explore(solar)
Figure 1: Time series plot and sample ACF plot of monthly solar radiation
Figure 1: Time series plot and sample ACF plot of monthly solar radiation
##
## Augmented Dickey-Fuller Test
##
## data: x
## Dickey-Fuller = -4.7661, Lag order = 8, p-value = 0.01
## alternative hypothesis: stationary
The time series plots for monthly solar radiation from Figure 1 shows that:
The sample ACF plot and the Augmented Dickey-Fuller test results for monthly solar radiation from Figure 1 shows that:
Based on the above evidence and observations made on the time series plot, solar is a stationary time series with seasonality.
The following code chunk is used to explore and visualise the time series plot of monthly precipitation:
explore(ppt)
Figure 2: Time series plot and sample ACF plot of monthly precipitation
Figure 2: Time series plot and sample ACF plot of monthly precipitation
##
## Augmented Dickey-Fuller Test
##
## data: x
## Dickey-Fuller = -6.7438, Lag order = 8, p-value = 0.01
## alternative hypothesis: stationary
The time series plots for monthly precipitation from Figure 2 shows that:
The sample ACF plot and the Augmented Dickey-Fuller test results for monthly solar radiation from Figure 2 shows that:
Based on the above evidence and observations made on the time series plot, ppt is a stationary time series with seasonality.
In order to better investigate seasonal, trend and other components in the time series, X12 decomposition is used. The following code chunk is used to perform and visualise X12 decomposition of solar and ppt series:
solar.x12 = x12(solar)
plot(solar.x12, sa = T, trend = T, main = "X12 decomposition of monthly solar radiation")
Figure 3: X12 decomposition of monthly solar radiation
Figure 3 shows the STL decomposition of solar series, we can observe that:
ppt.x12 = x12(ppt)
plot(ppt.x12, sa = T, trend = T, main = "X12 decomposition of monthly precipitation")
Figure 4: X12 decomposition of monthly precipitation
Figure 4 shows the STL decomposition of ppt series, we can observe that:
The following code chunk is used to test the correlation between solar and ppt:
cor(data1)
## solar ppt
## solar 1.0000000 -0.4540277
## ppt -0.4540277 1.0000000
From the output above, we observe moderately negative correlation between solar and ppt series, so we can use ppt as a predictor to make forecasts of solar with distributed lag models.
The following code chunk is used to auto search for a finite DLM model based on MASE:
finiteDLMauto(x = data1$ppt, y = data1$solar,q.min = 1, q.max = 12, model.type = "dlm", error.type = "MASE", trace = TRUE)
## q - k MASE AIC BIC R.Adj.Sq Ljung-Box
## 12 12 1.55160 4578.787 4645.895 0.30769 0
## 11 11 1.56213 4590.961 4653.617 0.30199 0
## 10 10 1.57800 4602.658 4660.858 0.29615 0
## 9 9 1.59312 4615.084 4668.827 0.28882 0
## 8 8 1.60481 4625.986 4675.267 0.28251 0
## 7 7 1.60704 4632.716 4677.532 0.28096 0
## 6 6 1.60753 4637.489 4677.837 0.28214 0
## 5 5 1.61385 4644.622 4680.499 0.28072 0
## 4 4 1.64636 4663.600 4695.003 0.26577 0
## 3 3 1.66270 4688.551 4715.478 0.24326 0
## 2 2 1.67597 4712.649 4735.095 0.22173 0
## 1 1 1.68846 4728.713 4746.676 0.21036 0
It is observed that the values of MASE, AIC and BIC decrease as the lag (q) increase.
To investigate the effect of adding term k=1, the following code chunk is used to auto search for a polynomial DLM based on MASE:
finiteDLMauto(x = data1$ppt , y = data1$solar, q.max = 12, k.order =1, model.type = "poly", error.type = "MASE", trace = TRUE)
## q - k MASE AIC BIC R.Adj.Sq Ljung-Box
## 5 5 - 1 1.63767 4648.873 4666.811 0.27163 0
## 6 6 - 1 1.64621 4657.208 4675.141 0.25454 0
## 4 4 - 1 1.65579 4663.361 4681.305 0.26269 0
## 3 3 - 1 1.66668 4687.030 4704.981 0.24272 0
## 2 2 - 1 1.67675 4710.708 4728.664 0.22285 0
## 1 1 - 1 1.68846 4728.713 4746.676 0.21036 0
## 7 7 - 1 1.70853 4689.073 4706.999 0.20898 0
## 8 8 - 1 1.82815 4740.059 4757.979 0.13620 0
## 12 12 - 1 1.91409 4769.354 4787.250 0.05530 0
## 9 9 - 1 1.94303 4791.522 4809.436 0.05601 0
## 11 11 - 1 1.99229 4811.020 4828.922 0.00523 0
## 10 10 - 1 2.00664 4819.815 4837.723 0.00341 0
It is observed that the values of MASE, AIC and BIC increase when “k=1” is added in the polynomial DLM models. So we will stick to finite DLM models.
We will fit a finite DLM with q= 12 using the following code chunk:
model1 = dlm(formula=solar~ppt, data=data.frame(data1), q = 12)
sum.res(model1)
##
## Call:
## lm(formula = as.formula(model.formula), data = design)
##
## Residuals:
## Min 1Q Median 3Q Max
## -18.563 -5.239 -0.796 4.137 32.430
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 19.5164 1.1151 17.501 < 2e-16 ***
## ppt.t -5.8876 1.9508 -3.018 0.00265 **
## ppt.1 0.9993 2.5647 0.390 0.69694
## ppt.2 0.4343 2.5571 0.170 0.86520
## ppt.3 1.8763 2.5580 0.734 0.46352
## ppt.4 1.7459 2.5587 0.682 0.49529
## ppt.5 3.3279 2.5601 1.300 0.19410
## ppt.6 0.7751 2.5617 0.303 0.76230
## ppt.7 1.7937 2.5615 0.700 0.48402
## ppt.8 0.2827 2.5593 0.110 0.91207
## ppt.9 -1.1022 2.5615 -0.430 0.66712
## ppt.10 -1.9333 2.5508 -0.758 0.44880
## ppt.11 -0.5613 2.5532 -0.220 0.82605
## ppt.12 -5.3492 1.9216 -2.784 0.00553 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.181 on 634 degrees of freedom
## Multiple R-squared: 0.3216, Adjusted R-squared: 0.3077
## F-statistic: 23.12 on 13 and 634 DF, p-value: < 2.2e-16
##
## AIC and BIC values for the model:
## AIC BIC
## 1 4578.787 4645.895
Figure 5: Residual analysis of model 1
##
## Shapiro-Wilk normality test
##
## data: x$model$residuals
## W = 0.94168, p-value = 2.922e-15
MASE(model1)
## MASE
## model1 1.5516
From the summary output above, we can conclude that:
model1model1 is 0.3077, which means that this model only explains 30.77% of the variability in solar radiation.Model1 has AIC value of 4578.787, BIC value of 4645.895 and MASE value of 1.5516.Figure 5 shows the residual plots for model1:
Overall, we can conclude that model1 is not appropriate for further analysis.
Next step:
solar or ppt, so it is not necessary to fit dynamic linear models since they are used for intervention analysis.ppt as a predictor for solar failed to give promising result, we will then use exponential smoothing models to investigate solar time series without any predictors.First of all, we must check the minimum value of solar series:
min(solar)
## [1] 1.581982
Since the minimum value of solar series is greater than 0, no transformation is needed before fitting multiplicative models.
There is obvious seasonality in solar time series, so we can fit Holt-Winter’s methods models because they have seasonal components. The following code chunk is used to fit Holt-Winter’s methods models using a loop:
seasonal = c("additive","multiplicative")
dampeds = c(TRUE,FALSE)
expand = expand.grid(seasonal,dampeds)
fit.AICc = array(NA,4)
fit.MASE=array(NA,4)
levels = array(NA, dim=c(4,2))
for(i in 1:4){fit.AICc[i]= hw(solar,
seasonal = toString(expand[i ,1]),
damped = expand[i ,2])$model$aicc
fit.MASE[i]= accuracy(hw(solar,
seasonal = toString(expand[i ,1]),
damped = expand[i ,2]))[,6]
levels[i,1]= toString(expand[i ,1])
levels[i,2]= expand[i ,2]
}
results = data.frame(levels, fit.MASE, fit.AICc)
colnames(results)= c("seasonal","damped","MASE", "AICc")
results
## seasonal damped MASE AICc
## 1 additive TRUE 0.2461797 5429.489
## 2 multiplicative TRUE 0.2035619 6367.768
## 3 additive FALSE 0.2471600 5435.662
## 4 multiplicative FALSE 0.2233077 6649.699
From the output above, we can see that:
To compare between the lowest AICc and top two lowest MASE models, we will look at their residuals:
hw.md<- hw(solar, seasonal = "multiplicative", damped = TRUE, h=2*frequency(solar))
sum.res(hw.md)
##
## Forecast method: Damped Holt-Winters' multiplicative method
##
## Model Information:
## Damped Holt-Winters' multiplicative method
##
## Call:
## hw(y = solar, h = 2 * frequency(solar), seasonal = "multiplicative",
##
## Call:
## damped = TRUE)
##
## Smoothing parameters:
## alpha = 0.805
## beta = 1e-04
## gamma = 0.0124
## phi = 0.94
##
## Initial states:
## l = 9.9033
## b = 0.4703
## s = 0.4341 0.5649 0.8263 1.1716 1.4514 1.6316
## 1.5503 1.3834 1.1045 0.8837 0.5761 0.4221
##
## sigma: 0.3145
##
## AIC AICc BIC
## 6366.701 6367.768 6447.561
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.0456208 2.045442 1.239102 -2.172476 10.00296 0.2035619
## ACF1
## Training set 0.04177493
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2015 5.611616 3.3502090 7.873023 2.1530925 9.070139
## Feb 2015 7.060438 3.3373603 10.783517 1.3664818 12.754395
## Mar 2015 10.515176 3.8550902 17.175263 0.3294535 20.700899
## Apr 2015 12.993446 3.5145037 22.472389 -1.5033454 27.490238
## May 2015 16.284381 2.9386340 29.630128 -4.1261776 36.694939
## Jun 2015 18.261020 1.7242212 34.797820 -7.0298315 43.551872
## Jul 2015 19.031597 0.2101276 37.853067 -9.7533565 47.816551
## Aug 2015 17.105654 -1.2074754 35.418784 -10.9018606 45.113169
## Sep 2015 13.764743 -2.0801991 29.609686 -10.4680049 37.997491
## Oct 2015 9.774733 -2.2585804 21.808047 -8.6286319 28.178099
## Nov 2015 6.581938 -2.0456267 15.209503 -6.6127835 19.776659
## Dec 2015 5.163049 -2.0168711 12.342970 -5.8176914 16.143790
## Jan 2016 5.616270 -2.6571166 13.889656 -7.0367828 18.269322
## Feb 2016 7.066360 -3.9144360 18.047156 -9.7273184 23.860039
## Mar 2016 10.524088 -6.6891829 27.737360 -15.8013383 36.849515
## Apr 2016 13.004566 -9.3405081 35.349641 -21.1692761 47.178409
## May 2016 16.298445 -13.0720043 45.668893 -28.6197807 61.216670
## Jun 2016 18.276925 -16.2140342 52.767885 -34.4724452 71.026296
## Jul 2016 19.048304 -18.5466197 56.643228 -38.4481704 76.544779
## Aug 2016 17.120781 -18.1782620 52.419825 -36.8644468 71.106009
## Sep 2016 13.777000 -15.8651593 43.419159 -31.5567705 59.110770
## Oct 2016 9.783493 -12.1627698 31.729756 -23.7804195 43.347405
## Nov 2016 6.587872 -8.8063951 21.982139 -16.9556278 30.131371
## Dec 2016 5.167730 -7.4021686 17.737629 -14.0562710 24.391731
Figure 6: Residual analysis of hw.md
##
## Ljung-Box test
##
## data: Residuals from Damped Holt-Winters' multiplicative method
## Q* = 23.57, df = 7, p-value = 0.001355
##
## Model df: 17. Total lags used: 24
##
## Shapiro-Wilk normality test
##
## data: x$model$residuals
## W = 0.3802, p-value < 2.2e-16
From the summary output above, we can conclude that:
hw.md has AIC value of 6366.701, AICc value of 6367.768, BIC value of 6447.561 and MASE value of 0.2035619.hw.md model has larger AIC and BIC value than model1, MASE value is significantly reduced. This means that hw.md gives more accurate forecast values for solar series.Figure 6 shows the residual plots for hw.md:
hw.md model is significantly better than model1.Overall, we can conclude that model hw.md is better than all previous models.
hw.m<- hw(solar, seasonal = "multiplicative", damped = FALSE, h=2*frequency(solar))
sum.res(hw.m)
##
## Forecast method: Holt-Winters' multiplicative method
##
## Model Information:
## Holt-Winters' multiplicative method
##
## Call:
## hw(y = solar, h = 2 * frequency(solar), seasonal = "multiplicative",
##
## Call:
## damped = FALSE)
##
## Smoothing parameters:
## alpha = 0.8251
## beta = 1e-04
## gamma = 0.0259
##
## Initial states:
## l = 10.5659
## b = -0.2371
## s = 0.5024 0.6292 0.9319 1.2229 1.4923 1.6309
## 1.5606 1.3672 1.0278 0.8128 0.5106 0.3114
##
## sigma: 0.3971
##
## AIC AICc BIC
## 6648.746 6649.699 6725.114
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.04281136 2.12539 1.359297 -0.4896895 10.87401 0.2233077
## ACF1
## Training set 0.06551473
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2015 5.610335 2.75534413 8.465326 1.2440033 9.976666
## Feb 2015 6.512806 2.03809262 10.987520 -0.3306776 13.356290
## Mar 2015 8.786120 1.33667747 16.235562 -2.6068190 20.179058
## Apr 2015 10.192785 -0.02672876 20.412298 -5.4366123 25.822181
## May 2015 12.512597 -1.96157245 26.986766 -9.6237347 34.648928
## Jun 2015 14.061609 -4.41048190 32.533701 -14.1890164 42.312235
## Jul 2015 14.502088 -6.89909775 35.903274 -18.2282011 47.232377
## Aug 2015 12.945620 -8.35024040 34.241481 -19.6235881 45.514829
## Sep 2015 10.352210 -8.52365174 29.228072 -18.5159294 39.220350
## Oct 2015 7.418279 -7.51119146 22.347750 -15.4143760 30.250935
## Nov 2015 4.989730 -6.05900058 16.038460 -11.9078451 21.887305
## Dec 2015 3.871776 -5.53876625 13.282319 -10.5204066 18.263960
## Jan 2016 4.199400 -7.01753622 15.416336 -12.9554236 21.354224
## Feb 2016 4.838892 -9.30305408 18.980837 -16.7893479 26.467131
## Mar 2016 6.477174 -14.21845674 27.172805 -25.1740619 38.128410
## Apr 2016 7.452633 -18.56883703 33.474103 -32.3437711 47.249037
## May 2016 9.069749 -25.52978297 43.669280 -43.8456686 61.985166
## Jun 2016 10.099485 -31.99838423 52.197355 -54.2836502 74.482621
## Jul 2016 10.315199 -36.68017471 57.310572 -61.5580226 82.188420
## Aug 2016 9.113765 -36.29212001 54.519651 -60.3285438 78.556075
## Sep 2016 7.208700 -32.09306283 46.510463 -52.8981594 67.315559
## Oct 2016 5.105869 -25.38360449 35.595343 -41.5237568 51.735495
## Nov 2016 3.391945 -18.81647619 25.600367 -30.5729044 37.356795
## Dec 2016 2.597256 -16.07148862 21.266001 -25.9541251 31.148637
Figure 7: Residual analysis of hw.m
##
## Ljung-Box test
##
## data: Residuals from Holt-Winters' multiplicative method
## Q* = 38.585, df = 8, p-value = 5.868e-06
##
## Model df: 16. Total lags used: 24
##
## Shapiro-Wilk normality test
##
## data: x$model$residuals
## W = 0.37415, p-value < 2.2e-16
From the summary output above, we can conclude that:
hw.m has AIC value of 6648.746, AICc value of 6649.699, BIC value of 6725.114 and MASE value of 0.2233077.hw.m has larger AIC, BIC and MASE value than hw,md model.Figure 7 shows the residual plots for hw.m:
hw.m model is slightly better than hw.md.Overall, we can conclude that the model hw.m is better than all previous models.
hw.ad<- hw(solar, seasonal = "additive", damped = TRUE, h=2*frequency(solar))
sum.res(hw.ad)
##
## Forecast method: Damped Holt-Winters' additive method
##
## Model Information:
## Damped Holt-Winters' additive method
##
## Call:
## hw(y = solar, h = 2 * frequency(solar), seasonal = "additive",
##
## Call:
## damped = TRUE)
##
## Smoothing parameters:
## alpha = 0.9999
## beta = 1e-04
## gamma = 1e-04
## phi = 0.9388
##
## Initial states:
## l = 11.154
## b = 0.7632
## s = -10.4919 -8.137 -3.348 2.5794 8.08 11.1219
## 9.9586 6.9916 1.9573 -1.8565 -7.1607 -9.6946
##
## sigma: 2.3446
##
## AIC AICc BIC
## 5428.422 5429.489 5509.282
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.01091357 2.314163 1.498521 -1.468083 12.44796 0.2461797
## ACF1
## Training set 0.1700724
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2015 5.945198 2.940530 8.949866 1.3499548 10.54044
## Feb 2015 8.479778 4.230548 12.729009 1.9811413 14.97842
## Mar 2015 13.784309 8.579902 18.988716 5.8248548 21.74376
## Apr 2015 17.598568 11.588776 23.608360 8.4073847 26.78975
## May 2015 22.632258 15.912803 29.351713 12.3557383 32.90878
## Jun 2015 25.599150 18.238024 32.960277 14.3412786 36.85702
## Jul 2015 26.761775 18.810497 34.713053 14.6013444 38.92221
## Aug 2015 23.722251 15.221610 32.222892 10.7216429 36.72286
## Sep 2015 18.222624 9.205955 27.239293 4.4328191 32.01243
## Oct 2015 12.293259 2.788471 21.798047 -2.2430606 26.82958
## Nov 2015 7.504219 -2.464877 17.473315 -7.7421977 22.75064
## Dec 2015 5.150488 -5.262286 15.563263 -10.7744758 21.07545
## Jan 2016 5.947275 -4.891175 16.785726 -10.6287044 22.52326
## Feb 2016 8.481728 -2.766251 19.729707 -8.7205712 25.68403
## Mar 2016 13.786139 2.142988 25.429291 -4.0205242 31.59280
## Apr 2016 17.600287 5.574906 29.625668 -0.7909464 35.99152
## May 2016 22.633871 10.238009 35.029734 3.6760353 41.59171
## Jun 2016 25.600665 12.845046 38.356283 6.0926298 45.10870
## Jul 2016 26.763197 13.657667 39.868726 6.7200186 46.80637
## Aug 2016 23.723586 10.277223 37.169950 3.1591478 44.28802
## Sep 2016 18.223877 4.445085 32.002669 -2.8489663 39.29672
## Oct 2016 12.294435 -1.808972 26.397843 -9.2748652 33.86374
## Nov 2016 7.505324 -6.915414 21.926061 -14.5492911 29.55994
## Dec 2016 5.151525 -9.579726 19.882776 -17.3779790 27.68103
Figure 8: Residual analysis of hw.ad
##
## Ljung-Box test
##
## data: Residuals from Damped Holt-Winters' additive method
## Q* = 210.76, df = 7, p-value < 2.2e-16
##
## Model df: 17. Total lags used: 24
##
## Shapiro-Wilk normality test
##
## data: x$model$residuals
## W = 0.88756, p-value < 2.2e-16
From the summary output above, we can conclude that:
Hw.ad has AIC value of 55428.422, AICc value of 5429.489, BIC value of 5509.282 and MASE value of 0.2461797.Hw.ad model has larger AIC, AICc BIC and MASE values than hw.m. This means that it has lower forecast accuracy than hw.m.Figure 8 shows the residual plots for hw.ad:
hw.ad are slightly worse than hw.m model.Overall, we can conclude that model hw.m is still the best model so far.
Next step:
From previous observations based on time series plots and decompositions, we concluded that seasonality is leading cause of variations in solar time series. Therefore, state-space models with seasonality components should be used to fit the series.
The following code chunk is used to fit state-space models using a loop:
models = c("AAA","MAM","MMM")
dampeds = c(TRUE,FALSE)
expand = expand.grid(models,dampeds)
fit.AICc = array(NA,8)
fit.MASE=array(NA,8)
levels = array(NA, dim=c(8,2))
for(i in 1:6){fit.AICc[i]= ets(solar,
model = toString(expand[i ,1]),
damped = expand[i ,2])$aicc
fit.MASE[i]= accuracy(ets(solar,
model = toString(expand[i ,1]),
damped = expand[i ,2]))[,6]
levels[i,1]= toString(expand[i ,1])
levels[i,2]= expand[i ,2]
}
# Becasuse "ANN" and "ANA" cannot have damped =TRUE, they will be processed sepeately
fit.AICc[7]= ets(solar, model ="ANN")$aicc
fit.AICc[8]= ets(solar, model ="ANA")$aicc
fit.MASE[7]= accuracy(ets(solar, model ="ANN"))[,6]
fit.MASE[8]= accuracy(ets(solar, model ="ANA"))[,6]
levels[7,1]="ANN"
levels[8,1]="ANA"
levels[7,2]=FALSE
levels[8,2]=FALSE
results = data.frame(levels, fit.MASE, fit.AICc)
colnames(results)= c("Model","Damped","MASE","AICc")
results
## Model Damped MASE AICc
## 1 AAA TRUE 0.2461797 5429.489
## 2 MAM TRUE 0.3222574 5954.569
## 3 MMM TRUE 0.3201193 5996.617
## 4 AAA FALSE 0.2471600 5435.662
## 5 MAM FALSE 0.3721664 6106.912
## 6 MMM FALSE 0.5292151 6671.121
## 7 ANN FALSE 0.6368203 6296.407
## 8 ANA FALSE 0.2547040 5450.719
From the output above, we can see that:
The following code is used to trigger auto-search to search for the best model among all possible models based on AICc:
fit.auto.solar = ets(solar,model="ZZZ",ic="aicc")
fit.auto.solar$method
## [1] "ETS(A,Ad,A)"
The above output suggests model with “AAA” and “Damped = True”, same as what we concluded from outputs from the loop.
To compare between the top two lowest MASE models, we will look at their residuals:
ets.AAdA = ets(solar, model="AAA", damped = T)
sum.res2(ets.AAdA)
## ETS(A,Ad,A)
##
## Call:
## ets(y = solar, model = "AAA", damped = T)
##
## Smoothing parameters:
## alpha = 0.9999
## beta = 1e-04
## gamma = 1e-04
## phi = 0.9388
##
## Initial states:
## l = 11.154
## b = 0.7632
## s = -10.4919 -8.137 -3.348 2.5794 8.08 11.1219
## 9.9586 6.9916 1.9573 -1.8565 -7.1607 -9.6946
##
## sigma: 2.3446
##
## AIC AICc BIC
## 5428.422 5429.489 5509.282
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.01091357 2.314163 1.498521 -1.468083 12.44796 0.2461797
## ACF1
## Training set 0.1700724
Figure 9: Residual analysis of ets.AAdA
##
## Ljung-Box test
##
## data: Residuals from ETS(A,Ad,A)
## Q* = 210.76, df = 7, p-value < 2.2e-16
##
## Model df: 17. Total lags used: 24
##
## Shapiro-Wilk normality test
##
## data: x$residuals
## W = 0.88756, p-value < 2.2e-16
From the summary output above, we can conclude that:
Ets.AAdA has AIC value of 55428.422, AICc value of 5429.489, BIC value of 5509.282 and MASE value of 0.2461797.Ets.AAdA model has smaller AIC, AICc and BIC values than hw.m but it has larger MASE value. This means that it has worse accuracy in forecasting than hw.m.Figure 9 shows the residual plots for Ets.AAdA:
Ets.AAdA are slightly worse than hw.m model.Overall, we can conclude that model hw.m is still the best model so far.
ets.AAA = ets(solar, model="AAA", damped = F)
sum.res2(ets.AAA)
## ETS(A,A,A)
##
## Call:
## ets(y = solar, model = "AAA", damped = F)
##
## Smoothing parameters:
## alpha = 0.9993
## beta = 0.0019
## gamma = 1e-04
##
## Initial states:
## l = 11.3306
## b = 0.209
## s = -10.7542 -8.4968 -3.227 3.0768 7.9264 10.9122
## 10.1422 7.3322 2.2353 -1.9528 -7.3626 -9.8316
##
## sigma: 2.3575
##
## AIC AICc BIC
## 5434.708 5435.662 5511.076
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.1179476 2.328736 1.504489 -2.230359 12.68101 0.24716
## ACF1
## Training set 0.1703355
Figure 10: Residual analysis of ets.AAA
##
## Ljung-Box test
##
## data: Residuals from ETS(A,A,A)
## Q* = 205.55, df = 8, p-value < 2.2e-16
##
## Model df: 16. Total lags used: 24
##
## Shapiro-Wilk normality test
##
## data: x$residuals
## W = 0.88809, p-value < 2.2e-16
From the summary output above, we can conclude that:
Ets.AAA has AIC value of 5434.708, AICc value of 5435.662, BIC value of 5511.076 and MASE value of 0.24716.Ets.AAA model has smaller AIC, AICc and BIC values than hw.m but it has larger MASE value. This means that it has worse accuracy in forecasting than hw.m.Figure 10 shows the residual plots for AAA:
Ets.AAA are slightly worse than hw.m model.Overall, we can conclude that model hw.m is still the best model so far.
To summaries and compare across all exponential smoothing models and state-space models above, a dataframe containing their AICc, AIC, BIC and MASE values are created using the following code chunk:
models <- c("hw.md", "hw.m", "hw.ad","ets.AAdA","ets.AAA")
AICc <- c(hw.md$model$aicc, hw.m$model$aicc, hw.ad$model$aicc, ets.AAdA$aicc, ets.AAA$aicc)
AIC <- c(hw.md$model$aic, hw.m$model$aic, hw.ad$model$aic, ets.AAdA$aic, ets.AAA$aic)
BIC <- c(hw.md$model$bic, hw.m$model$bic, hw.ad$model$bic, ets.AAdA$bic, ets.AAA$bic)
MASE <- c(accuracy(hw.md)[,6], accuracy(hw.m)[,6], accuracy(hw.ad)[,6],
accuracy(ets.AAdA)[,6], accuracy(ets.AAA)[,6])
ALL <- data.frame(models, AICc, AIC, BIC, MASE)
colnames(ALL) <- c("model", "AICc", "AIC", "BIC", "MASE")
ALL <- ALL[order(MASE),]
ALL
## model AICc AIC BIC MASE
## 1 hw.md 6367.768 6366.701 6447.561 0.2035619
## 2 hw.m 6649.699 6648.746 6725.114 0.2233077
## 3 hw.ad 5429.489 5428.422 5509.282 0.2461797
## 4 ets.AAdA 5429.489 5428.422 5509.282 0.2461797
## 5 ets.AAA 5435.662 5434.708 5511.076 0.2471600
From the output above, we can see that hw.md and hw.m have the lowest MASE.
Based on previous observations, we concluded that all five models have violation in normality assumption, and they differ the most in residuals sample ACF plot. So the residuals sample ACF plot for the top two models will be compared side by side using the following code chunk:
par(mfrow=c(1,2))
acf(hw.md$residuals,lag.max = 48, main="Sample ACF for hw.md residuals")
acf(hw.m$residuals,lag.max = 48, main="Sample ACF for hw.m residuals")
Figure 11: Residual analysis of hw.md and hw.m
Figure 11 shows the Sample ACF plots for residuals in models hw.md and hw.m, we observe that:
hw.md shows significant lag at the first seasonal lag (lag12), the second seasonal lag (lag24) and the third seasonal lag (lag 36).hw.m model has no significant lag until the second seasonal lag (lag 24), which can be ignored since there is no significant lag at the first seasonal lag (lag12).Model hw.m has the second best MASE (very close to the top one) and the best residual check, therefore it is the best model for forecasting.
The following code chunk is used to plot the forecast using hw.m model:
upper.95.int = hw.m$upper[,2]
lower.95.int = hw.m$lower[,2]
centre = hw.m$mean
par(mfrow=c(1,1))
plot(solar, xlim=c(1960, 2018), ylim=c(-60, 80), ylab="Solar radiation", xlab="Year",
main = "Original series, forecasts and 95% forecast interval for the solar ration series")
lines(centre, col = "red")
lines(upper.95.int, col = "blue")
lines(lower.95.int, col = "blue")
legend("bottomleft", lty=1, cex=0.75, pch=1, pt.cex = 1.9, text.width = 2.3, col=c("black","red","blue"), c("Data","Forecasts","95% confidence limits"))
Figure 12: Forecast with hw.m model
Figure 12 shows the forecasts and 95% forecast interval for the solar ration series when solar series is fitted with hw.m model. Since common sense solar radiation cannot be negative numbers, we will mainly focus on the upper 95% confidence interval. Forecasts using hw.m model have reasonable mean values and confidence interval.
In conclusion, hw.m is determined to be the best model for forecasting solar time series out of all the models fitted above based on factors including residual checks and MASE values. Two years ahead forecasts made with hw.m model is as shown in Figure 12.
The aim of this investigation is to demonstrate whether the correlation between quarterly Residential Property Price Index (PPI) in Melbourne and quarterly population change is spurious or not. The dataset used in this report is data2 which contains information on the quarterly Residential Property Price Index (PPI) in Melbourne and quarterly population change over the previous quarter in Victoria between September 2003 and December 2016.
Firstly the dataframe data2 will be converted into time series. Then their relationship will be examined by looking at their time series plots and sample CCF plot. If an unusually high correlation is shown in the sample CCF plot, the prewhitening process will be carried out. Prewhitening is a linear operation, therefore any linear relationships between the original series will be preserved after prewhitening. Lastly, the relationship between the post prewhitened series will be examined by looking at their time series plots and sample CCF plot. If there is no significant correlations left the correlation between these series are determined to be spurious.
The following code chunk is used to load dataset data2 and convert it to time series:
data2<- read_csv("data2.csv")
## Parsed with column specification:
## cols(
## Quarter = col_character(),
## price = col_double(),
## change = col_double()
## )
colnames(data2)<- c("Quarter","res PPI", "pop change")
data2.ts<- ts(data2 [,2:3], start=c(2003,9),frequency = 4)
The following code chunk is used to visualise the time series plot of Melbourne residential PPI and population change series:
plot(data2.ts,yax.flip=T, main="Time series plots of Melbourne residential PPI and population change", type="o")
Figure 13
Figure 13 is the time series plots for Melbourne residential PPI and population change, it shows that:
To further explore the correlation structure between residential PPI and population change series the sample CCF plot will be displayed:
The following code chunk is used to visualise the sample CCF plot of Melbourne residential PPI and population change series:
ccf(as.vector(data2.ts[,1]), as.vector(data2.ts[,2]),ylab='CCF', main = "Sample CCF between residential PPI and population change")
Figure 14
Figure 14 is the Sample CCF plots for Melbourne residential PPI and population change, it shows that:
To investigate the existence of a spurious correlation, prewhitening procedure will be applied to these series:
Before applying prewhitening to the series we must ensure that both series are stationary.
The following code chunk is used to explore the stationarity of Melbourne residential PPI and population change series:
par(mfrow=c(1,2))
acf(data2.ts[,1],lag.max = 48, main="Sample ACF for residential PPI")
acf(data2.ts[,2],lag.max = 48, main="Sample ACF for population change")
Figure 15
par(mfrow=c(1,1))
adf.test(data2.ts[,1])
##
## Augmented Dickey-Fuller Test
##
## data: data2.ts[, 1]
## Dickey-Fuller = -1.3264, Lag order = 3, p-value = 0.8458
## alternative hypothesis: stationary
adf.test(data2.ts[,2])
##
## Augmented Dickey-Fuller Test
##
## data: data2.ts[, 2]
## Dickey-Fuller = -1.603, Lag order = 3, p-value = 0.7344
## alternative hypothesis: stationary
Figure 15 is the Sample ACF plots and the Augmented Dickey-Fuller test results for Melbourne residential PPI and population change series, it shows that:
The following chuck is used to convert data2.ts into stationary time series through differencing:
data2.diff <- ts.intersect(diff(diff(data2.ts[,1],4)),
diff(diff(data2.ts[,2],4)))
par(mfrow=c(1,2))
acf(data2.diff[,1],lag.max = 48, main="Sample ACF for residential PPI")
acf(data2.diff[,2],lag.max = 48, main="Sample ACF for population change")
Figure 16
par(mfrow=c(1,1))
adf.test(data2.diff[,1])
##
## Augmented Dickey-Fuller Test
##
## data: data2.diff[, 1]
## Dickey-Fuller = -5.1122, Lag order = 3, p-value = 0.01
## alternative hypothesis: stationary
adf.test(data2.diff[,2])
##
## Augmented Dickey-Fuller Test
##
## data: data2.diff[, 2]
## Dickey-Fuller = -3.8985, Lag order = 3, p-value = 0.02136
## alternative hypothesis: stationary
Figure 16 is the Sample ACF plots and the Augmented Dickey-Fuller test results for Melbourne residential PPI and population change series after seasonal and ordinal differencing, it shows that:
Both time series are determined to be stationary after seasonal and ordinal differencing, they are now ready to be used to the prewhitening procedure.
The following code chunk is used to apply prewhitening process to Melbourne residential PPI and population change series:
prewhiten(as.vector(data2.diff[,1]), as.vector(data2.diff[,2]), ylab="CCF",
main="Sample CCF after prewhitening between residential PPI and population change")
Figure 17
Figure 17 is the sample CCF plots for Melbourne residential PPI and population change after prewhitening, it shows that:
The distinction of correlation structure between residential PPI and population change indicates that the correlation between residential PPI and population change is indeed spurious.
There is no significant correlations left in the sample CCF plot after prewhitening process, therefore the correlation between the quarterly Residential Property Price Index (PPI) in Melbourne and quarterly population change is spurious.
Demirhan H. Time Series Regression Models with Distributed Lag Models. Package ‘dLagM’. 2019 Aug 24. Sourced from: https://cran.r-project.org/web/packages/dLagM/dLagM.pdf