As the Title suggests, this report consists of two tasks. The aim of the first task is analysis and forecasting of the amount of horizontal solar radiation reaching the ground at a particular location over the globe. For this task, we will apply time series regression method to fit distributed lag models using monthly precipitation series as an independent explanatory series. IN addition to that, we will work with exponential smoothing methods with corresponding state-space models to forecast solar radiation series. Further more, We will demonstrate an appropriate comparison between these methods with respect to residual assumptions and goodness of fit measures. Overall expected aim of this analysis is to give 2 years ahead forecasts from the best suitable model in terms of its mean absolute scaled error (MASE) measure.
The objective of the second task is to analyse the correlation structure between quarterly Residential Property Price Index (PPI) in Melbourne and population change over the previous quarter in Victoria from September 2003 to December 2016. Here We will explore , investigate and demonstrate if the correlation between the two series is found spurious or not.
The dataset named Sundata is used for the Task 1 analysis which contains information about monthly average horizontal solar radiation and monthly precipitation measured at the same points between January 1960 and December 2014.
As seen below, the code loads the dataset and creates time series objects of solar radiation and precipitation values.
Sundata <- read.csv("D:/Rmit/Sem4/Forecasting/data1.csv")
head(Sundata)
solar<- ts(Sundata$solar, start =c(1960,1), frequency = 12)
head(solar)
## Jan Feb Mar Apr May Jun
## 1960 5.051729 6.415832 10.847920 16.930264 24.030797 26.298202
precip <- ts(Sundata$ppt, start = c(1960,1), frequency = 12)
head(precip)
## Jan Feb Mar Apr May Jun
## 1960 1.333 0.921 0.947 0.615 0.544 0.703
#Converting into timeseries
Sundata_ts <- ts(Sundata, start=c(1960,1), frequency= 12)
head(Sundata_ts)
## solar ppt
## Jan 1960 5.051729 1.333
## Feb 1960 6.415832 0.921
## Mar 1960 10.847920 0.947
## Apr 1960 16.930264 0.615
## May 1960 24.030797 0.544
## Jun 1960 26.298202 0.703
#Load data.x
precip_f <- read.csv("D:/Rmit/Sem4/Forecasting/data.x.csv")
head(precip_f)
plot(solar, main = "Figure.1 :Time series plot of solar radiation series", ylab = "Amount of solar radiation", xlab = "Time")
points(solar, x=time(solar), pch = as.vector(season(solar)))
From the above displayed plot in Figure 1, we can observe the following characteristics of the series:
There is no apparent trend observed.
Seasonality is present,wherein there are lower values observed in December and January and higher values in June and July. The seasonal pattern is not consistent across the observed time.
Due to the presence of seasonality, the Changing variance is not present and behaviour of the series are not obvious
Two intervention points are found around 1965 and 1987.
To study more on stationarity and seasonality in the series, We will further display sample ACF and conduct an Augmented Dickey-Fuller test .The length of our data allows to display more lags in the ACF plot to better observe any evidence of trend.
acf(solar, lag.max = 48, main="Figure.2: ACF plot of solar radiation series")
adf.test(solar, k=ar(solar)$order)
## Warning in adf.test(solar, k = ar(solar)$order): p-value smaller than printed p-
## value
##
## Augmented Dickey-Fuller Test
##
## data: solar
## Dickey-Fuller = -4.557, Lag order = 25, p-value = 0.01
## alternative hypothesis: stationary
Figure 2 displayed ACF plot above shows strong seasonal patterns and observes no trend. Next, the ADF test results with lag order = 25 reports stationarity in the series at 5% level of significance,since p-value < 0.05. Thus, we conclude that solar radiation series has a strong seasonality pattern.
Now lets display a time series plot of precipitation series which we will use as a predictor series for distributed lag models.
plot(precip, main ="Figure.3: Time series plot of precipitation series", ylab="Precipitation", xlab = "Time")
points (precip, x= time(precip), pch = as.vector(season(precip)))
Observations made from Figure 3 plot, we can make the following comments on the characteristics of the series:
There is a possibility of a slight downward trend, especially in the beginning of the series.
Seasonality is present strongly, though the pattern changes overtime, we can say that lower values are observed in July and August and higher values in December-January.
Since the series is Seasonal,the changing variance and behaviour of the series is not apparent due to seasonality.
Intervention on points is absent.
To study more on trend and seasonality in the series, We will further display sample ACF and conduct an Augmented Dickey-Fuller test
acf(precip,lag.max = 48, main = "Figure.4: ACF plot of precipitation series")
adf.test(precip,k=ar(precip)$order)
##
## Augmented Dickey-Fuller Test
##
## data: precip
## Dickey-Fuller = -3.2594, Lag order = 28, p-value = 0.07769
## alternative hypothesis: stationary
Figure 4 displayed ACF plot above shows strong seasonal patterns and a decaying pattern of seasonal lags also suggests the possible existence of trend. The ADF test reports p-value = 0.078 > 0.05 which suggests the series is nonstationary at 5% level of significance.
To clearly visualise the dependent solar radiation series versus the explanatory precipitation series within the same plot, we will standardise the data. To explore the relationship of the series,the time series plot is plotted below .
shift<- scale(Sundata_ts)
plot(shift, plot.type="s",col=c("dodgerblue2", "firebrick"),main= "Fig 5:Solar Radiation versus Precipitation wrt time(Scaled)")
legend("topleft", lty=1, text.width = 10, col = c("dodgerblue2", "firebrick"), c("Solar Radiation", "Precipitation"))
Figure 5 Above shows that the dependent and the independent series are likely to be oppositely or negatively correlated. Higher values of solar radiation correspond to the lower values of precipitation and vice versa.
Therefore to check the relationship,We calculate the correlation coefficient .
cor(solar,precip)
## [1] -0.4540277
The conclusions made in figure 5 is proved by the result above.The correlation coefficient is reported r=−0.45 which suggests a moderate negative correlation between the series. After we have explored the characteristics of the individual series and found the evidence of relationship between them, we proceed to modelling stage.
Lets try fitting distributed lag models which include an independent explanatory series and its lags to help explain the overall variation and correlation structure in our dependent series.
We create a loop, in order to specify the finite lag length for the model, that computes accuracy measures like AIC/BIC and MASE for the models with different lag lengths and select a model with the lowest values.
for (i in 1:10){
model1 <- dlm(x = Sundata$ppt, y = Sundata$solar, q = i)
cat("q =", i, "AIC =", AIC(model1$model), "BIC =", BIC(model1$model), "MASE =", MASE(model1)$MASE, "\n")
}
## q = 1 AIC = 4728.713 BIC = 4746.676 MASE = 1.688457
## q = 2 AIC = 4712.649 BIC = 4735.095 MASE = 1.675967
## q = 3 AIC = 4688.551 BIC = 4715.478 MASE = 1.662703
## q = 4 AIC = 4663.6 BIC = 4695.003 MASE = 1.646357
## q = 5 AIC = 4644.622 BIC = 4680.499 MASE = 1.613848
## q = 6 AIC = 4637.489 BIC = 4677.837 MASE = 1.607532
## q = 7 AIC = 4632.716 BIC = 4677.532 MASE = 1.607042
## q = 8 AIC = 4625.986 BIC = 4675.267 MASE = 1.604806
## q = 9 AIC = 4615.084 BIC = 4668.827 MASE = 1.593121
## q = 10 AIC = 4602.658 BIC = 4660.858 MASE = 1.577996
#Finite dlm
ff_dlm <- dlm(x=Sundata$ppt, y=Sundata$solar, q=10)
summary(ff_dlm)
##
## Call:
## lm(formula = model.formula, data = design)
##
## Residuals:
## Min 1Q Median 3Q Max
## -18.9353 -5.4124 -0.7911 4.0184 30.8900
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 19.0105 1.0942 17.374 < 2e-16 ***
## x.t -7.3843 1.8995 -3.887 0.000112 ***
## x.1 -0.4763 2.5395 -0.188 0.851288
## x.2 -0.1324 2.5734 -0.051 0.958980
## x.3 1.7902 2.5781 0.694 0.487691
## x.4 1.9686 2.5808 0.763 0.445877
## x.5 3.4928 2.5807 1.353 0.176402
## x.6 0.5243 2.5787 0.203 0.838943
## x.7 1.6762 2.5797 0.650 0.516088
## x.8 0.9282 2.5673 0.362 0.717817
## x.9 0.3754 2.5338 0.148 0.882272
## x.10 -5.3798 1.8760 -2.868 0.004272 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.256 on 638 degrees of freedom
## Multiple R-squared: 0.3081, Adjusted R-squared: 0.2962
## F-statistic: 25.82 on 11 and 638 DF, p-value: < 2.2e-16
##
## AIC and BIC values for the model:
## AIC BIC
## 1 4602.658 4660.858
vif(ff_dlm$model)
## x.t x.1 x.2 x.3 x.4 x.5 x.6 x.7
## 4.244594 7.665259 7.910115 7.952633 7.957841 7.941836 7.911213 7.901898
## x.8 x.9 x.10
## 7.847965 7.653512 4.228221
From the above results ,it is clear that the values of information criteria as well as MASE decrease as the lag q increases, so we will fit a finite DLM with a number of lags = 10.
According to the significance tests of model coefficients obtained from the summary, nearly all lag weights of a predictor series are not statistically significant at 5% level. The adjusted R2 for finite_dlm is 0.296, which means that the model explains only 29.6% of the variability in solar radiation. F-test of the overall significance of the model reports the model is statistically significant at 5% level (p-value < 0.05). However, we conclude that the model is not a good fit to the data due to insignificant terms and low explainability.
There is no issue with multicollinearity in the model, VIF values are reported < 10.
residualcheck <- function(x){
checkresiduals(x)
bgtest(x)
shapiro.test(x$residuals)
}
residualcheck(ff_dlm$model)
##
## Shapiro-Wilk normality test
##
## data: x$residuals
## W = 0.94643, p-value = 1.386e-14
checkresiduals(ff_dlm$model)
##
## Breusch-Godfrey test for serial correlation of order up to 15
##
## data: Residuals
## LM test = 588.43, df = 15, p-value < 2.2e-16
Previous model Finite DLM had several limitations thus we will fit second order polynomial DLM with lag =10
polyd <- polyDlm(x=as.vector(Sundata$ppt), y=as.vector(Sundata$solar), q=10,k=2)
## Estimates and t-tests for beta coefficients:
## Estimate Std. Error t value P(>|t|)
## beta.0 -5.040 0.435 -11.600 2.39e-28
## beta.1 -2.320 0.314 -7.400 4.33e-13
## beta.2 -0.188 0.251 -0.749 4.54e-01
## beta.3 1.370 0.237 5.770 1.23e-08
## beta.4 2.340 0.243 9.600 1.74e-20
## beta.5 2.720 0.248 11.000 7.32e-26
## beta.6 2.530 0.243 10.400 1.40e-23
## beta.7 1.750 0.235 7.430 3.39e-13
## beta.8 0.387 0.249 1.560 1.20e-01
## beta.9 -1.560 0.312 -4.990 7.68e-07
## beta.10 -4.090 0.433 -9.440 7.08e-20
summary(polyd)
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -19.012 -5.343 -1.090 4.097 31.641
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 18.77806 1.08588 17.29 <2e-16 ***
## z.t0 -5.04398 0.43502 -11.60 <2e-16 ***
## z.t1 3.01112 0.18300 16.45 <2e-16 ***
## z.t2 -0.29153 0.01761 -16.56 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.237 on 646 degrees of freedom
## Multiple R-squared: 0.3025, Adjusted R-squared: 0.2992
## F-statistic: 93.38 on 3 and 646 DF, p-value: < 2.2e-16
From the above summary, we can say all lag weights are significant at 5% level except lag 2 and lag 8 (where p value >0.05).The adjusted R2 is 0.299 , the model explains about 30% of the variability in radiation which is similar to the finite DLM. The overall significance test reports the model is statistically significant at 5% level with p-value reported < 0.05.
For Z.t1 and z.t2 terms, the VIF values are greater thus it causes an multicollinearity issue.
residualcheck(polyd$model)
##
## Shapiro-Wilk normality test
##
## data: x$residuals
## W = 0.94582, p-value = 1.12e-14
checkresiduals(polyd$model)
##
## Breusch-Godfrey test for serial correlation of order up to 10
##
## data: Residuals
## LM test = 580.37, df = 10, p-value < 2.2e-16
All the figures above are considered as Figure 7. The analysis of residuals from polynomial model with q=10,k=2 above shows the following:
The errors are not randomly spread.
There are a lot of highly significant lags in the ACF plot as well as a wavy pattern at seasonal lags, so there is autocorrelation and seasonality still present in the residuals.
Beusch-Godfrey test reports a p-value < 0.05, therefore there is serial correlation in the residuals at 5% level of significance.
The normality of the residuals is also violated, as observed from the histogram and Shapiro-Wilk normality test report (p-value < 0.05).
Overall, we can conclude that the second order polynomial of lag 10 is not successful at capturing the autocorrelation and seasonality in the series and has low explainability.
We will implement Koyck transformation model with precipitation predictor series as follows.
K_trans = koyckDlm(x=as.vector(Sundata$ppt), y= as.vector(Sundata$solar))
summary(K_trans$model, diagnostics=T)
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.0926 -3.5961 0.3176 3.6103 14.8399
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.23925 0.76549 -2.925 0.00356 **
## Y.1 0.98546 0.02424 40.650 < 2e-16 ***
## X.t 5.34684 0.84383 6.336 4.37e-10 ***
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 1 656 710.7 <2e-16 ***
## Wu-Hausman 1 655 146.8 <2e-16 ***
## Sargan 0 NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.814 on 656 degrees of freedom
## Multiple R-Squared: 0.7598, Adjusted R-squared: 0.7591
## Wald test: 1104 on 2 and 656 DF, p-value: < 2.2e-16
vif(K_trans$model)
## Y.1 X.t
## 1.605001 1.605001
From the model summary, we can conclude that all terms of Koyck model are significant at 5% level. The model is reported to be overall statistically significant at 5% level (p-value < 0.05) and its adjusted R2=0.759 which means the model explains about 76% of the variability in radiation.
According to the Weak instruments test (p-value < 0.05), the model at the first stage of least-squares estimation is significant at 5% level.
From the Wu-Hausman test (p-value < 0.05), we can conclude that there is significant correlation between the explanatory variable and the error term at 5% level.
residualcheck(K_trans$model)
##
## Shapiro-Wilk normality test
##
## data: x$residuals
## W = 0.98487, p-value = 2.468e-06
checkresiduals(K_trans$model)
All the figures above are considered as Figure 8. From the residual analysis above, we can conclude the following:
The errors are not spread randomly.
All the lags in ACF plot are significant and have a wave-like pattern, which suggests serial correlation and seasonality remaining in the residuals.
The errors are not normal. The histogram and the Shapiro-Wilk normality test with p-value < 0.05 suggest not normal residuals.
Overall, we can conclude that the Koyck model is also not successful at capturing the autocorrelation and seasonality in the series.
Working on the last model type from time series regression method is Autoregressive distributed lag models. For specifying the parameters of ARDL(p,q), we create a loop that fits autoregressive DLMs for a range of lag lengths and orders of the AR process and obtains their accuracy measures, like AIC/BIC and MASE.
Three models with lowest values of MASE were chosen for fitting and analysis. The models were:
ARDL(3,5)
ARDL(4,5)
ARDL(5,5)
For loop is created with 5 iterations to fit these candidate models and do residual analysis in a dynamical way.
for (i in 1:5){
for(j in 1:5){
model2 = ardlDlm(x = as.vector(Sundata$ppt), y = as.vector(Sundata$solar), p = i , q = j)
cat("p =", i, "q =", j, "AIC =", AIC(model2$model), "BIC =", BIC(model2$model), "MASE =", MASE(model2)$MASE, "\n")
}
}
## p = 1 q = 1 AIC = 3712.311 BIC = 3734.765 MASE = 0.8392434
## p = 1 q = 2 AIC = 3239.416 BIC = 3266.352 MASE = 0.4971918
## p = 1 q = 3 AIC = 3143.522 BIC = 3174.936 MASE = 0.4740063
## p = 1 q = 4 AIC = 3138.399 BIC = 3174.288 MASE = 0.4697571
## p = 1 q = 5 AIC = 3100.283 BIC = 3140.644 MASE = 0.450425
## p = 2 q = 1 AIC = 3639.223 BIC = 3666.159 MASE = 0.7834855
## p = 2 q = 2 AIC = 3229.051 BIC = 3260.476 MASE = 0.4951319
## p = 2 q = 3 AIC = 3137.634 BIC = 3173.535 MASE = 0.4738939
## p = 2 q = 4 AIC = 3132.962 BIC = 3173.337 MASE = 0.4702773
## p = 2 q = 5 AIC = 3097.288 BIC = 3142.134 MASE = 0.4503599
## p = 3 q = 1 AIC = 3608.793 BIC = 3640.207 MASE = 0.7572489
## p = 3 q = 2 AIC = 3226.623 BIC = 3262.524 MASE = 0.4955334
## p = 3 q = 3 AIC = 3139.409 BIC = 3179.798 MASE = 0.4737144
## p = 3 q = 4 AIC = 3134.777 BIC = 3179.638 MASE = 0.4701162
## p = 3 q = 5 AIC = 3098.808 BIC = 3148.139 MASE = 0.4502885
## p = 4 q = 1 AIC = 3602.664 BIC = 3638.553 MASE = 0.7580664
## p = 4 q = 2 AIC = 3224.285 BIC = 3264.66 MASE = 0.4959949
## p = 4 q = 3 AIC = 3131.289 BIC = 3176.15 MASE = 0.4695096
## p = 4 q = 4 AIC = 3131.424 BIC = 3180.772 MASE = 0.4665123
## p = 4 q = 5 AIC = 3096.024 BIC = 3149.839 MASE = 0.4479481
## p = 5 q = 1 AIC = 3599.402 BIC = 3639.764 MASE = 0.7572617
## p = 5 q = 2 AIC = 3221.853 BIC = 3266.699 MASE = 0.4954501
## p = 5 q = 3 AIC = 3127.103 BIC = 3176.434 MASE = 0.4675479
## p = 5 q = 4 AIC = 3127.868 BIC = 3181.684 MASE = 0.4651969
## p = 5 q = 5 AIC = 3097.877 BIC = 3156.177 MASE = 0.4479311
for (i in c(3,4,5)){
ardl <- ardlDlm(x = as.vector(Sundata$ppt), y = as.vector(Sundata$solar), p = i, q = 5)
summary(ardl)
bgtest(ardl$model)
residualcheck(ardl$model)
}
##
## Time series regression with "ts" data:
## Start = 6, End = 660
##
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.6649 -1.4447 -0.2663 1.0644 18.7430
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.20532 0.42237 5.221 2.40e-07 ***
## X.t -0.60604 0.54924 -1.103 0.270258
## X.1 0.89253 0.77682 1.149 0.251001
## X.2 1.60774 0.77838 2.065 0.039276 *
## X.3 -0.38294 0.55738 -0.687 0.492308
## Y.1 1.27345 0.03859 32.999 < 2e-16 ***
## Y.2 -0.02859 0.06250 -0.457 0.647495
## Y.3 -0.40351 0.06036 -6.685 5.01e-11 ***
## Y.4 -0.22632 0.06234 -3.630 0.000305 ***
## Y.5 0.22116 0.03779 5.853 7.69e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.553 on 645 degrees of freedom
## Multiple R-squared: 0.9333, Adjusted R-squared: 0.9324
## F-statistic: 1003 on 9 and 645 DF, p-value: < 2.2e-16
##
## Time series regression with "ts" data:
## Start = 6, End = 660
##
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.5901 -1.3826 -0.2867 1.0526 18.5709
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.46103 0.43729 5.628 2.72e-08 ***
## X.t -0.61439 0.54768 -1.122 0.262364
## X.1 0.78469 0.77617 1.011 0.312409
## X.2 1.28470 0.79026 1.626 0.104509
## X.3 0.80457 0.77946 1.032 0.302355
## X.4 -1.20750 0.55570 -2.173 0.030148 *
## Y.1 1.27195 0.03849 33.049 < 2e-16 ***
## Y.2 -0.01868 0.06249 -0.299 0.765112
## Y.3 -0.40480 0.06019 -6.725 3.87e-11 ***
## Y.4 -0.23201 0.06221 -3.729 0.000209 ***
## Y.5 0.21746 0.03772 5.766 1.26e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.546 on 644 degrees of freedom
## Multiple R-squared: 0.9338, Adjusted R-squared: 0.9328
## F-statistic: 908.5 on 10 and 644 DF, p-value: < 2.2e-16
##
## Time series regression with "ts" data:
## Start = 6, End = 660
##
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.5959 -1.3825 -0.2646 1.0410 18.5812
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.50740 0.45434 5.519 4.96e-08 ***
## X.t -0.61416 0.54804 -1.121 0.262863
## X.1 0.78299 0.77670 1.008 0.313788
## X.2 1.26543 0.79241 1.597 0.110772
## X.3 0.75184 0.79227 0.949 0.342998
## X.4 -1.00181 0.77678 -1.290 0.197617
## X.5 -0.21024 0.55439 -0.379 0.704639
## Y.1 1.27063 0.03867 32.861 < 2e-16 ***
## Y.2 -0.01727 0.06264 -0.276 0.782907
## Y.3 -0.40297 0.06043 -6.669 5.56e-11 ***
## Y.4 -0.23273 0.06229 -3.737 0.000203 ***
## Y.5 0.21571 0.03802 5.673 2.12e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.548 on 643 degrees of freedom
## Multiple R-squared: 0.9338, Adjusted R-squared: 0.9327
## F-statistic: 824.9 on 11 and 643 DF, p-value: < 2.2e-16
checkresiduals(ardl$model)
##
## Breusch-Godfrey test for serial correlation of order up to 15
##
## data: Residuals
## LM test = 107.98, df = 15, p-value = 3.937e-16
#Figure 9
All the figures above are considered as Figure 9.From model summaries, we can conclude that all the fitted ARDL models were reported to be statistically significant at 5% level with p-value < 0.05. All models have an adjusted R2=0.933, which means they explain about 93.3% of the variability in solar radiation.
Regarding model coefficient estimates, we can observe for ARDL(3,5) only X.2 lag of predictor series is significant at 5% level (p-value = 0.0393 < 0.05), for ARDL(4,5) only X.4 lag of predictor series is significant at 5% level (p-value = 03014 < 0.05), and all lags of predictor series are not statistically significant at 5% level for ARDL(5,5). All lags of independent series are statistically significant in all models except Y.2 (p-value = 0.7829 > 0.05).
The plots from diagnostic checking in Figure 9 show that there is a very similar overall picture in residuals from all three fitted models:
The residuals are not as randomly spread as desired, they show evidence of changing variance.
There are a some highly significant lags in the ACF plot. The seasonal lags are also highly significant. Therefore, there is autocorrelation and seasonality still present in the residuals.
Beusch-Godfrey test reports a p-value < 0.05, therefore there is serial correlation in the residuals at 5% level of significance.
Long tails on the histogram of residuals suggest the normality of the residuals is violated.
Based on the observation about model estimates made earlier, we can try to decrease the number of lags for predictor series. We will fit ARDL(1,5) and perform diagnostic checking.
ardl_15 <- ardlDlm(x=as.vector(Sundata$ppt), y= as.vector(Sundata$solar), p=1, q=5)
summary(ardl_15)
##
## Time series regression with "ts" data:
## Start = 6, End = 660
##
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -16.3364 -1.4314 -0.2553 1.0518 18.8604
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.35018 0.39678 5.923 5.13e-09 ***
## X.t -1.03172 0.50686 -2.036 0.042207 *
## X.1 2.26211 0.50758 4.457 9.81e-06 ***
## Y.1 1.28228 0.03818 33.584 < 2e-16 ***
## Y.2 -0.03149 0.06265 -0.503 0.615375
## Y.3 -0.41374 0.06025 -6.867 1.54e-11 ***
## Y.4 -0.22906 0.06233 -3.675 0.000258 ***
## Y.5 0.22742 0.03770 6.032 2.72e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.56 on 647 degrees of freedom
## Multiple R-squared: 0.9328, Adjusted R-squared: 0.932
## F-statistic: 1282 on 7 and 647 DF, p-value: < 2.2e-16
residualcheck(ardl_15$model)
##
## Shapiro-Wilk normality test
##
## data: x$residuals
## W = 0.89664, p-value < 2.2e-16
#Figure 10
Figure 10 plotted above is the residual check of ardl_15 model. Above plot and results obtained gives a clear idea about the series.The p-value of the overall significance test is < 0.05, therefore ARDL(1,5) model is statistically significant at 5% level. All the estimated terms are significant at 5% level except Y.2 - second lag of independent series (p-value = 0.6154 > 0.05). Adjusted R2=0.932, which means they explain 93.2% of the variability in radiation.
The diagnostic checking in Figure 10 show the same picture as the diagnostic checkings in Figure 9, so the comments are the same as for previously fitted models.
Thus we can say, none of the models from time series regression method were successful at capturing the autocorrelation and seasonal pattern in radiation series.
We create a data frame accuracy to store the accuracy measures, like AIC/BIC and MASE from the models fitted so far. The accuracy measures for further models will be appended to this data frame.
attr(K_trans$model,"class") = "lm"
ardl_35 <- ardlDlm(x = as.vector(Sundata$ppt), y = as.vector(Sundata$solar), p = 3, q = 5)
ardl_45 <- ardlDlm(x = as.vector(Sundata$ppt), y = as.vector(Sundata$solar), p = 4, q = 5)
ardl_55 <- ardlDlm(x = as.vector(Sundata$ppt), y = as.vector(Sundata$solar), p = 5, q = 5)
models <- c("FF_DLM", "PolyD", "Koyck_trans", "ARDL_15", "ARDL_35", "ARDL_45", "ARDL_55")
aic <- AIC(ff_dlm$model, polyd$model, K_trans$model, ardl_15$model, ardl_35$model, ardl_45$model, ardl_55$model)$AIC
bic <- BIC(ff_dlm$model, polyd$model, K_trans$model, ardl_15$model, ardl_35$model, ardl_45$model, ardl_55$model)$BIC
MASE <- MASE(ff_dlm$model, polyd$model, K_trans$model, ardl_15, ardl_35, ardl_45, ardl_55)$MASE
accuracy <- data.frame(models, MASE, aic, bic)
colnames(accuracy) <- c("Model", "MASE", "AIC", "BIC")
head(accuracy)
The second forecasting method we will try is exponential smoothing. Because we have found a strong seasonal component in solar series that we want to make predictions for, we will only consider models that include either additive or multiplicative seasonality. Since there are 6 possible combinations of exponential smoothing models with seasonality that can be implemented in R, we create a loop to fit them and store their accuracy measures.
ES = c(T,F)
seasonality <- c("additive","multiplicative")
damped <- c(T,F)
expa <- expand.grid(ES, seasonality, damped)
expa <- expa[-c(1,5),]
f_aic <- array(NA, 6)
f_bic <- array(NA, 6)
f_mase <- array(NA, 6)
levels <- array(NA, dim=c(6,3))
for (i in 1:6){
holt_w <- hw(solar, ES = expa[i,1], seasonal = toString(expa[i,2], damped = expa[i,3]))
f_aic[i] <- holt_w$model$aic
f_bic[i] <- holt_w$model$bic
f_mase[i] <- accuracy(holt_w)[6]
levels[i,1] <- expa[i,1]
levels[i,2] <- toString(expa[i,2])
levels[i,3] <- expa[i,3]
checkresiduals(holt_w)
}
##
## Ljung-Box test
##
## data: Residuals from Holt-Winters' additive method
## Q* = 205.55, df = 8, p-value < 2.2e-16
##
## Model df: 16. Total lags used: 24
##
## Ljung-Box test
##
## data: Residuals from Holt-Winters' multiplicative method
## Q* = 38.585, df = 8, p-value = 5.868e-06
##
## Model df: 16. Total lags used: 24
##
## Ljung-Box test
##
## data: Residuals from Holt-Winters' multiplicative method
## Q* = 38.585, df = 8, p-value = 5.868e-06
##
## Model df: 16. Total lags used: 24
##
## Ljung-Box test
##
## data: Residuals from Holt-Winters' additive method
## Q* = 205.55, df = 8, p-value < 2.2e-16
##
## Model df: 16. Total lags used: 24
##
## Ljung-Box test
##
## data: Residuals from Holt-Winters' multiplicative method
## Q* = 38.585, df = 8, p-value = 5.868e-06
##
## Model df: 16. Total lags used: 24
##
## Ljung-Box test
##
## data: Residuals from Holt-Winters' multiplicative method
## Q* = 38.585, df = 8, p-value = 5.868e-06
##
## Model df: 16. Total lags used: 24
All the figures above are considered as Figure 11. Above plotting shows residual analysis plots for each model fitted. We can observe that damping the trend in fact does not make any difference in terms of model fit, the residuals look identical.
Points to be considered are:
Figure 11 plot 1:Residuals from Holt-Winters’ additive method have issues with randomness, serial correlation and normality.
Figure 11 plot 2: Residuals from Holt-Winters’ multiplicative method with multiplicative trend have only one lag significant in ACF which is an improvement from all the previous models. However, Ljung-Box test reports p-value = 0.006521 (< 0.05) which suggests autocorrelation is still present in the residuals. There are issues with normality and there are also some large residuals (standardised residuals > 3).
Figure 11 plot 3: Residuals from Holt-Winters’ multiplicative method with additive trend show a similar picture to the previous model.
Overall, we can observe a slight improvement in residuals from exponential smoothing models in terms of serial correlation and especially seasonality. Holt-Winters’ multiplicative method is the most successful at capturing autocorrelation and seasonality in radiation series.
We append the accuracy measures for exponential smoothing models to the accuracy data frame. The format of model names is: trend (multiplicative or additive), seasonality (multiplicative or additive) and if the trend is damped or not damped.
newvalues <- data.frame(levels, f_mase, f_aic, f_bic)
colnames(newvalues) <- c("Trend", "Seasonality", "damped", "MASE", "AIC", "BIC")
newvalues$Trend <- factor(newvalues$Trend, levels = c(T,F), labels = c("multiplicative","additive"))
newvalues$damped <- factor(newvalues$damped, levels = c(T,F), labels = c("damped","N"))
newvalues <- unite(newvalues, col = "Model", c("Trend","Seasonality","damped"))
accuracy <- rbind(accuracy, newvalues)
accuracy
For each exponential smoothing method, there are two corresponding state-space models (with additive or multiplicative errors). There are 8 state-space variations which include seasonality that we cam implement in R (some combinations are forbidden due to their stability issues). We create a loop to fit these models and capture their accuracy measures for further comparison.
vlist <- c("AAA", "MAA", "MAM", "MMM")
damp <- c(T,F)
ets_models <- expand.grid(vlist, damp)
ets_aic <- array(NA, 8)
ets_mase <- array(NA,8)
ets_bic <- array(NA,8)
mod <- array(NA, dim=c(8,2))
for (i in 1:8){
ets <- ets(solar, model = toString(ets_models[i, 1]), damped = ets_models[i,2])
ets_aic[i] <- ets$aic
ets_bic[i] <- ets$bic
ets_mase[i] <- accuracy(ets)[6]
mod[i,1] <- toString(ets_models[i,1])
mod[i,2] <- ets_models[i,2]
}
Auto ETS fitted to see what the software automatically suggested model is.
auto_ets <- ets(solar)
summary(auto_ets)
## ETS(A,Ad,A)
##
## Call:
## ets(y = solar)
##
## Smoothing parameters:
## alpha = 0.9999
## beta = 1e-04
## gamma = 1e-04
## phi = 0.9388
##
## Initial states:
## l = 11.154
## b = 0.7632
## s = -10.4919 -8.137 -3.348 2.5794 8.08 11.1219
## 9.9586 6.9916 1.9573 -1.8565 -7.1607 -9.6946
##
## sigma: 2.3446
##
## AIC AICc BIC
## 5428.422 5429.489 5509.282
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.01091357 2.314163 1.498521 -1.468083 12.44796 0.2461797
## ACF1
## Training set 0.1700724
The model suggested automatically is ETS(A,Ad,A) which is a model with additive errors, additive damped trend and additive seasonality.
# Figure 12
checkresiduals(auto_ets)
##
## Ljung-Box test
##
## data: Residuals from ETS(A,Ad,A)
## Q* = 210.76, df = 7, p-value < 2.2e-16
##
## Model df: 17. Total lags used: 24
The residual analysis in Figure 12 shows the following:
Residuals are not spread randomly and have changing variance.
There are a lot of significant lags in the ACF plot similar to what we have seen in Holt-Winters’ additive method. Ljung-Box test reports p-value < 0.05, therefore there is evidence of autocorrelation in the residuals. In addition, lag 12 is significant which suggests seasonality effect is still present in the residuals.
According to long tails in the histogram, there are also issues with normality.
Overall, the residual analysis suggest this model is not successful at capturing autocorrelation and seasonality in solar series.
We append the accuracy measures for state-space models to the accuracy data frame.
calculate <- data.frame(mod, ets_mase, ets_aic, ets_bic)
calculate$X2 <- factor(calculate$X2, levels = c(T,F), labels = c("Damped","N"))
calculate <- unite(calculate, "Model", c("X1","X2"))
colnames(calculate) <- c("Model", "MASE", "AIC", "BIC")
accuracy <- rbind(accuracy,calculate)
#The data frame accuracy is sorted by ascending MASE value.
accuracy <- arrange(accuracy, MASE)
kable(accuracy, caption = "Models and their accuracy parameters (sorted by MASE)")
| Model | MASE | AIC | BIC |
|---|---|---|---|
| multiplicative_multiplicative_damped | 0.2233077 | 6648.746 | 6725.114 |
| additive_multiplicative_damped | 0.2233077 | 6648.746 | 6725.114 |
| multiplicative_multiplicative_N | 0.2233077 | 6648.746 | 6725.114 |
| additive_multiplicative_N | 0.2233077 | 6648.746 | 6725.114 |
| AAA_Damped | 0.2461797 | 5428.422 | 5509.282 |
| additive_additive_damped | 0.2471600 | 5434.708 | 5511.076 |
| additive_additive_N | 0.2471600 | 5434.708 | 5511.076 |
| AAA_N | 0.2471600 | 5434.708 | 5511.076 |
| MMM_Damped | 0.3201193 | 5995.550 | 6076.410 |
| MAM_Damped | 0.3222574 | 5953.502 | 6034.363 |
| MAM_N | 0.3721664 | 6105.959 | 6182.327 |
| MAA_Damped | 0.3798095 | 6469.079 | 6549.940 |
| ARDL_55 | 0.4479311 | 3097.877 | 3156.177 |
| ARDL_45 | 0.4479481 | 3096.024 | 3149.839 |
| ARDL_35 | 0.4502885 | 3098.808 | 3148.139 |
| ARDL_15 | 0.4504250 | 3100.283 | 3140.644 |
| MAA_N | 0.4748561 | 7602.755 | 7679.123 |
| MMM_N | 0.5292151 | 6670.168 | 6746.536 |
| Koyck_trans | 1.0324829 | 3946.476 | 3964.439 |
| FF_DLM | 1.5779955 | 4602.658 | 4660.858 |
| PolyD | 1.5925546 | 4591.904 | 4614.289 |
The accuracy table will be used to compare all methods we have tried at the modelling stage in terms of their MASE. The model that minimises MASE is Holt-Winters’ multiplicative method with additive trend (there is no difference in models in terms of damping). The best state-space model in terms of lowest MASE is ETS(A,Ad,A) which is the model with additive errors, additive damped trend and additive seasonality. ETS(A,Ad,A) was also the model suggested automatically. We can see from the table that time series regression methods perform the worst in terms of MASE but this approach has lower AIC/BIC measures than the exponential smoothing approach.
Predicting the forecasts of solar radiation, on the final model to give two years ahead forecasts of solar radiation, we compare forecasts from three models:
Holt-Winters’ multiplicative method which has the lowest MASE and is the most successful at capturing the autocorrelation and seasonality in the series
Holt-Winters’ multiplicative method with multiplicative trend which has the second lowest MASE and is also good at capturing the autocorrelation and seasonality in the series
ETS(A,Ad,A) model was suggested by an automatic algorithm and has the lowest MASE of all state-space models but does not capture autocorrelation in the series
fitm1 <- hw(solar, seasonal = "multiplicative", h = 2*frequency(solar))
fitm2 <- hw(solar, seasonal = "multiplicative", exponential = T, h = 2*frequency(solar))
fitm3 <- ets(solar,model="AAA", damped=T)
#class(fit3)
#methods(forecast())
for_fit3 <- forecast.ets(fitm3)
plot(for_fit3, fcol = "white", main = " Fig.13: Solar radiation series with two years ahead forecasts", ylab = "solar", ylim = c(-10,55))
lines(fitted(fitm1), col = "darkgreen")
lines(fitm1$mean, col = "darkgreen", lwd = 2)
lines(fitted(fitm2), col = "brown2")
lines(fitm2$mean, col = "brown2", lwd = 2)
lines(fitted(fitm3), col = "dodgerblue3")
lines(for_fit3$mean, col = "dodgerblue3", lwd = 2)
legend("bottomleft", lty = 1, col = c("black", "darkgreen", "brown2", "dodgerblue3"), c("Data", "Holt-Winters' Multiplicative", "Holt-Winters' Multiplicative Exponential", "ETS(A,Ad,A)"))
Figure 13 plotted above depicts the Solar radiation series with two years ahead forecasts.From observations We can say that the fitted values versus the original data are generally similar with all the models, the state-space model however is the furthest of all from the data. There are points around years 1966 and 1987 where we can see the Holt-Winters’ multiplicative method approximates the original series better.
The forecasts given by three models are quite different from each other. The state-space model suggests the highest peaks for the forecasts. Holt-Winters’ multiplicative approach suggests the amount of solar radiation will decrease over the next 2 years. Holt-Winters’ multiplicative method with multiplicative trend suggests that the amount of solar radiation will remain somewhat the same over the next year but will slightly decrease over the second year. Therefore, based on the residual analysis and the lowest MASE value, we take Holt-Winters’ multiplicative model to give 2 years ahead forecasts of the amount of solar radiation.
Plotting the final forecasts below.
plot(fitm1, fcol = "white", main = "Fig.14: Solar radiation series with two years ahead forecasts", ylab = "Radiation")
lines(fitted(fitm1), col = "darkgreen")
lines(fitm1$mean, col = "darkgreen", lwd = 2)
legend("topleft", lty = 1, col = c("black", "darkgreen"), c("Data", "Forecasts"))
Figure 14 plotted above depicts Solar radiation series with two years ahead forecasts.The solar radiation 2 years ahead point forecast values with corresponding 95% confidence intervals are as follows:
forc <- fitm1$mean
ub <- fitm1$upper[,2]
lb <- fitm1$lower[,2]
forecasts <- ts.intersect(ts(lb, start = c(2015,1), frequency = 12), ts(forc,start = c(2015,1), frequency = 12), ts(ub,start = c(2015,1), frequency = 12))
colnames(forecasts) <- c("Lower bound", "Point forecast", "Upper bound")
forecasts
## Lower bound Point forecast Upper bound
## Jan 2015 1.2440033 5.610335 9.976666
## Feb 2015 -0.3306776 6.512806 13.356290
## Mar 2015 -2.6068190 8.786120 20.179058
## Apr 2015 -5.4366123 10.192785 25.822181
## May 2015 -9.6237347 12.512597 34.648928
## Jun 2015 -14.1890164 14.061609 42.312235
## Jul 2015 -18.2282011 14.502088 47.232377
## Aug 2015 -19.6235881 12.945620 45.514829
## Sep 2015 -18.5159294 10.352210 39.220350
## Oct 2015 -15.4143760 7.418279 30.250935
## Nov 2015 -11.9078451 4.989730 21.887305
## Dec 2015 -10.5204066 3.871776 18.263960
## Jan 2016 -12.9554236 4.199400 21.354224
## Feb 2016 -16.7893479 4.838892 26.467131
## Mar 2016 -25.1740619 6.477174 38.128410
## Apr 2016 -32.3437711 7.452633 47.249037
## May 2016 -43.8456686 9.069749 61.985166
## Jun 2016 -54.2836502 10.099485 74.482621
## Jul 2016 -61.5580226 10.315199 82.188420
## Aug 2016 -60.3285438 9.113765 78.556075
## Sep 2016 -52.8981594 7.208700 67.315559
## Oct 2016 -41.5237568 5.105869 51.735495
## Nov 2016 -30.5729044 3.391945 37.356795
## Dec 2016 -25.9541251 2.597256 31.148637
Overall, we can observe that the 95% confidence intervals for the forecasts from selected approach have a very wide range and do not provide reliable forecasts.
The objective behind working on this task is to explore the correlation structure between a Residential Property Price Index series and population change in Victoria series and demonstrate if the correlation between them is spurious or not.
The pc dataset used for this task contains the information on three variables: quarter, quarterly Residential Property Price Index (PPI) in Melbourne and population change over the previous quarter in Victoria from September 2003 to December 2016.
Load the data data2 into dataset Pc.
pc <- read_csv("D:/Rmit/Sem4/Forecasting/data2.csv")
##
## -- Column specification --------------------------------------------------------
## cols(
## Quarter = col_character(),
## price = col_double(),
## change = col_double()
## )
Now convert the variables of interest (price and change) to time series with a quarterly frequency, starting from the third quarter of 2003.
pc_data <- ts(pc[,2:3], start = c(2003,3), frequency = 4)
price <- ts(pc$price, start = c(2003,3), frequency = 4)
change <- ts(pc$change, start = c(2003,3), frequency = 4)
head(pc_data)
## price change
## 2003 Q3 60.7 14017
## 2003 Q4 62.1 12350
## 2004 Q1 60.8 17894
## 2004 Q2 60.9 9079
## 2004 Q3 60.9 16210
## 2004 Q4 62.4 13788
Lets Showcase both series in same plot to compare the nature
plot(pc_data, main = "Fig.1: Time series plots of residential PPI and population change data")
Figure 1 displayed above shows Time series plots of residential PPI and population change data.From the time series plot, we can observe that there are simultaneously increasing trends in the series. We also can suggest from this plot that the population change series has a seasonal pattern. Overall, the visualisation implies that there is some correlation between the two series.
To further explore the correlation structure between the series, we display the plot of sample cross-covariance function (CCF). Sample cross-correlations that are higher than 1.96/√n magnitude (dashed lines on the plot) are deemed to be significantly different from zero.
ccf(as.vector(price), as.vector(change), ylab = "CCF", main = "Figure.2: CCF of PPI and population change")
Figure 2 displayed above shows the CCF plot checking correlation.It depicts that a number of lags are significantly different from zero according to the 1.96/√n bounds. The CCF plot suggests the existence of strong cross-correlation between the Residential Property Price Index in Melbourne and population change in Victoria. However, the cause of spurious correlation found between the series is likely to be the existence of nonstationarity in them.
Here we carry out analysis on series that will explore and explain the nonstationarity of the series individually.
The following code displays a time series plot of quarterly Residential Property Price Index.
plot(price, ylab = "Property Price Index", type="o", main = "Fig.3: Time series plot of Residential PPI from 2003 to 2016")
From the plot in Figure 3, we can observe the following about the Residential PPI:
There is a clear upward general trend
There is no seasonality
There is no changing variance present
There is no intervention
Successive points imply autoregressive behaviour
We will also have a look at the sample ACF for this series. The following code creates an ACF plot of PPI series.
acf(price, main = "Figure.4: ACF of Residential PPI")
adf.test(price)
##
## Augmented Dickey-Fuller Test
##
## data: price
## Dickey-Fuller = -1.3264, Lag order = 3, p-value = 0.8458
## alternative hypothesis: stationary
Figure 4 displayed above shows the ACF plot of Residential PPI.The slowly decaying pattern in ACF plot (Figure 4) suggests that the series is nonstationary, it provides the evidence of trend and no seasonality in the series.
We also conducted an Augmented Dickey-Fuller unit root test to test the H0 that the process is difference nonstationary. The ADF test reports p-value = 0.8458 > 0.05, there is not enough evidence to reject the H0 that the series is nonstationary at 5% level. Overall, we can conclude that the Residential PPI series is nonstationary.
The following code creates a time series plot of quarterly population change in Victoria from 2003 to 2016. We also put quarters on the plot to explore seasonal pattern.
plot(change, ylab = "People", main = "Fig.5: Time series plot of population change from 2003 to 2016")
points(change,x=time(change), pch=as.vector(season(change)))
From the plot in Figure 5, we can observe the following about the population change:
There is an upward general tendency
There is seasonality, population growth is higher in the first quarters and lower in the second quarters of the year
Changing variance is not obvious due to the presence of seasonality
There might be an intervention happening around year 2010
The behaviour of the series is not obvious because of the seasonal pattern
Now lets Display the ACF of population change from the series:
acf(change, main = "Figure.6: ACF of population change")
adf.test(change)
##
## Augmented Dickey-Fuller Test
##
## data: change
## Dickey-Fuller = -1.603, Lag order = 3, p-value = 0.7344
## alternative hypothesis: stationary
Figure 6 displayed above shows the ACF plot of population change.It shows that there is a trend as well as seasonality present in the series. So we can confirm the previous observation from the time series plot. ADF test reports p-value = 0.7344 > 0.05 which means there is again not enough evidence to reject H0 that the process is nonstationary at 5% level of significance.
Based on the previous analysis, we can conclude that both of our series are nonstationary. It is difficult to assess the true dependence between two processes when we have strongly autocorrelated nonstationary data.
Based on the previous analysis of the series, we will perform both regular and seasonal difference and then apply prewhitening. Prewhitening approach is applied to separate the relationship between the series from their autocorrelation. The most important step in prewhitening the series is to make sure they are stationary. The seasonal difference was done for both series to ensure they have the same length.
diff <- ts.intersect(diff(diff(price,4)), diff(diff(change,4)))
prewhiten(as.vector(diff[,1]), as.vector(diff[,2]), ylab='CCF', main = "Figure.7: CFF prewhitened")
Figure 7 displayed above shows the sample CCF plot after prewhitening.It suggests that there is in fact no significant correlation found between a Residential Property Price Index and population change in Victoria. We can conclude that the strong cross-correlation pattern found between raw series was spurious.
The Aim of Task 1 was to give predictions of solar radiation amount for the next 2 years,and so we used three methods and compared them based on residual analysis and MASE value. The three methods were:
Time series regression models
Exponential smoothing
State-space models
From the model fitting plots,the results and findings obtained says that though the time series regression, some exponential smoothing models and the automatically suggested state-space model were used as best suitable models , yet were not successful at capturing auto correlation and seasonality in solar radiation data.Other issues captured were with normality assumption in the residuals from all the models. Holt-Winters proved to be the most successful models to capture seasonality and serial correlation.Specifically multiplicative method and Holt-Winters’ multiplicative method with multiplicative trend. These models were also found to be the best in terms of minimising MASE. Based on the performed analysis, we chose Holt-Winters’ multiplicative method for forecasting. The forecasts suggest the amount of solar radiation will decrease over the next 2 years. However, point forecasts have very wide 95% confidence intervals which means they can not be considered reliable.
The aim of the second task is to analyse the correlation structure between quarterly Residential Property Price Index (PPI) in Melbourne and population change over the previous quarter in Victoria from September 2003 to December 2016. We will explore and demonstrate if the correlation between the two series is found spurious or not.
There are simultaneously increasing trends in Residential PPI data and population change series. Overall, the visualisation analysis found that there is some correlation between the two series. There was the existence of strong cross-correlation between the series observed from the CCF plot .After We applied prewhitening technique to assess the true dependence between two processes, the sample CCF plot after prewhitening suggested that there is in fact no significant correlation found between a Residential Property Price Index and population change in Victoria. Thus as a final conclusion, we can say that the strong cross-correlation pattern found between raw series was spurious.