R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

Aim of the Assignment

This assignment consist of two task.The first task is to analyse and forecast the amount of horizontal solar radiation reaching the ground at a particular location over the globe.We have applied different models like time series regression methods, dynamic linear models , and exponential smoothing and corresponding state-space models to forecast solar radiation for next 2 years based on MASE value. Finally we need to predict next 2 year forecast from given predictor data set series using our best fitted model.

The main of second task is to analyse the correlation between quarterly Residential Property Price Index (PPI) in Melbourne and quarterly population change over the previous quarter in Victoria between September 2003 and December 2016. Here we have checked whether correlation between price of residential property and change in population is spurious or not.Here we have applied different methods like Cross correlation function(CCF) and pre-whitening methods.

Task1 :Analyse and forecast the amount of horizontal solar radiation reaching the ground at a particular location over the globe.

library(readr)
library(TSA)
## 
## Attaching package: 'TSA'
## The following object is masked from 'package:readr':
## 
##     spec
## The following objects are masked from 'package:stats':
## 
##     acf, arima
## The following object is masked from 'package:utils':
## 
##     tar
library(urca)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyr)
library(dLagM)
## Loading required package: nardl
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
## Loading required package: dynlm
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
library(tseries)
library(dynlm)
library(Hmisc)
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
## Loading required package: ggplot2
## 
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:dplyr':
## 
##     src, summarize
## The following objects are masked from 'package:base':
## 
##     format.pval, units
library(AER)
## Loading required package: car
## Loading required package: carData
## 
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
## 
##     recode
## Loading required package: lmtest
## Loading required package: sandwich
library(x12)
## Loading required package: x13binary
## x12 is ready to use.
## Use the package x12GUI for a Graphical User Interface.
## By default the X13-ARIMA-SEATS binaries provided by the R package x13binary
## are used but this can be changed with x12path(validpath)
## ---------------
## Suggestions and bug-reports can be submitted at: https://github.com/statistikat/x12/issues
library(ggplot2)
library(expsmooth)
## Loading required package: forecast
## Registered S3 methods overwritten by 'forecast':
##   method       from
##   fitted.Arima TSA 
##   plot.Arima   TSA
## 
## Attaching package: 'forecast'
## The following object is masked from 'package:dLagM':
## 
##     forecast
library(forecast)
data1 <- read_csv("data1.csv")
## Rows: 660 Columns: 2
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## dbl (2): solar, ppt
## 
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(data1)
## # A tibble: 6 x 2
##   solar   ppt
##   <dbl> <dbl>
## 1  5.05 1.33 
## 2  6.42 0.921
## 3 10.8  0.947
## 4 16.9  0.615
## 5 24.0  0.544
## 6 26.3  0.703
class(data1$solar)
## [1] "numeric"
class(data1$ppt)
## [1] "numeric"
# Convert two columns into time series

solar = ts(data1$solar, start = c(1960 ,1), end = c( 2014,12),frequency =12)

ppt <- ts(data1$ppt, start = c(1960,1), end = c( 2014,12), frequency =12)

Data visualization

Observation for solar radiation:

Trend : No proper trend can be observed here.

Seasonality : There are some seasonality present in this plot. We can see that peak or higher values are repeating for the month of June and July.However this seasonality is not consistent throughout the plots.

Intervention: There are two intervention points 1965 and 1987.

Change in variance: No change in variance can be observed here.

Autocorrelation: Some local trends can be observed which indicate series might have Auto regressive and Moving Average like behavior.

plot(ppt, main ="Time series plot of precipitation series", ylab ="Amount of Precipitation", xlab ="Time")

points(ppt,x=time(ppt), pch=as.vector(season(ppt)))

Observation for precipitation:

Trend : We do not find proper trend in this plot.

Seasonality : There are some seasonality here .Like we can see higher values during December and January.However this values are not consistent over time.

Intervention: No intervention points can be observed.

Change in variance: No change of variance is found here.

Autocorrelation : No autocorrelation is found.

Test to check existence of nonstationarity

a) First find out ACF and PACF plots

par(mfrow=c(2,2))

acf(solar, lag.max = 24, main = "ACF of solar radiation")
pacf(solar, lag.max = 24, main = "PACF of solar radiation")

acf(ppt, lag.max = 24, main = "ACF of precipitation")
pacf(ppt, lag.max = 24, main = "PACF of precipitation")

Observations from ACF,PACF plots:

b) Now perform ADF(augmented Dickey-Fuller test) test to find if the series has stationarity.

adf.test(solar ,k=ar(solar)$order)
## Warning in adf.test(solar, k = ar(solar)$order): p-value smaller than printed p-
## value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  solar
## Dickey-Fuller = -4.557, Lag order = 25, p-value = 0.01
## alternative hypothesis: stationary
adf.test(ppt ,k = ar(ppt)$order)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  ppt
## Dickey-Fuller = -3.2594, Lag order = 28, p-value = 0.07769
## alternative hypothesis: stationary

Observations from ADF test:

c) Perform Transformations and Seasonality Adjustments

One of the very robust and commonly used decomposition methods is the Seasonal and Trend decomposition using (STL) decomposition.We will use stl() function and try to analyze the trend ,seasonality and remainder of each variables and also perform transformation on variables.We will also analyze seasonal components using seasonality plots.

solar.decom <- stl(solar, t.window = 15, s.window ="periodic", robust =TRUE) 
plot(solar.decom, main ="Solar radiation Decomposition")

ppt.decom <- stl(ppt, t.window = 15, s.window ="periodic", robust =TRUE)
plot(ppt.decom, main ="Precipitation Decomposition")

#Seasonally Adjusted Plots

solar.season.adjusted <- seasadj(solar.decom)
plot(solar.season.adjusted)

ppt.season.adjusted <- seasadj(ppt.decom)
plot(ppt.season.adjusted)

# Again perform adf test to check stationarity

adf.test(solar.season.adjusted)
## Warning in adf.test(solar.season.adjusted): p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  solar.season.adjusted
## Dickey-Fuller = -6.1528, Lag order = 8, p-value = 0.01
## alternative hypothesis: stationary
adf.test(ppt.season.adjusted)
## Warning in adf.test(ppt.season.adjusted): p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  ppt.season.adjusted
## Dickey-Fuller = -5.7359, Lag order = 8, p-value = 0.01
## alternative hypothesis: stationary

Observation after transformation and seasonality adjustment:

Now lets check the correlation between two variables of solar dataset.

Correlation analysis between solar variables

# Check correlation between two variables

cor(solar, ppt)
## [1] -0.4540277
plot( x= ppt, y= solar , main ="Scatter plot between solar radiations and precipitation")

  • Scatter plots shows negative correlation betweeen solar and precipitation variables.

  • Also correlation coefficient between solar radiations and precipitation shows -0.4540277 indicates negative correlation.

Modelling

Time series regression methods

a) Distributed lag model

# Find out lowest AIC ,BIC MASE value

for(i in 1 : 10 ){model1 <- dlm(x =  as.vector(ppt), y =  as.vector(solar), q = i)
cat( "q =", i,"AIC =", AIC(model1$model),"BIC =", BIC(model1$model),"MASE =", MASE(model1)$MASE,"\n")}
## q = 1 AIC = 4728.713 BIC = 4746.676 MASE = 1.688457 
## q = 2 AIC = 4712.649 BIC = 4735.095 MASE = 1.675967 
## q = 3 AIC = 4688.551 BIC = 4715.478 MASE = 1.662703 
## q = 4 AIC = 4663.6 BIC = 4695.003 MASE = 1.646357 
## q = 5 AIC = 4644.622 BIC = 4680.499 MASE = 1.613848 
## q = 6 AIC = 4637.489 BIC = 4677.837 MASE = 1.607532 
## q = 7 AIC = 4632.716 BIC = 4677.532 MASE = 1.607042 
## q = 8 AIC = 4625.986 BIC = 4675.267 MASE = 1.604806 
## q = 9 AIC = 4615.084 BIC = 4668.827 MASE = 1.593121 
## q = 10 AIC = 4602.658 BIC = 4660.858 MASE = 1.577996

We can see that for value q=10, AIC ,BIC and MASE value is the lowest .So we will proceed with value of q=10.

print("Distributed lag model")
## [1] "Distributed lag model"
dlm1 <- dlm(x =  as.vector(ppt), y =  as.vector(solar), q = 10 )
summary(dlm1)
## 
## Call:
## lm(formula = model.formula, data = design)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -18.9353  -5.4124  -0.7911   4.0184  30.8900 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  19.0105     1.0942  17.374  < 2e-16 ***
## x.t          -7.3843     1.8995  -3.887 0.000112 ***
## x.1          -0.4763     2.5395  -0.188 0.851288    
## x.2          -0.1324     2.5734  -0.051 0.958980    
## x.3           1.7902     2.5781   0.694 0.487691    
## x.4           1.9686     2.5808   0.763 0.445877    
## x.5           3.4928     2.5807   1.353 0.176402    
## x.6           0.5243     2.5787   0.203 0.838943    
## x.7           1.6762     2.5797   0.650 0.516088    
## x.8           0.9282     2.5673   0.362 0.717817    
## x.9           0.3754     2.5338   0.148 0.882272    
## x.10         -5.3798     1.8760  -2.868 0.004272 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.256 on 638 degrees of freedom
## Multiple R-squared:  0.3081, Adjusted R-squared:  0.2962 
## F-statistic: 25.82 on 11 and 638 DF,  p-value: < 2.2e-16
## 
## AIC and BIC values for the model:
##        AIC      BIC
## 1 4602.658 4660.858

We will check residual components by applying residual functions as diagnostic check.

# Check the residual of models

checkresiduals(dlm1$model)

## 
##  Breusch-Godfrey test for serial correlation of order up to 15
## 
## data:  Residuals
## LM test = 588.43, df = 15, p-value < 2.2e-16
# Perform shapiro wilk test to check the normality

shapiro.test(residuals(dlm1$model))
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(dlm1$model)
## W = 0.94643, p-value = 1.386e-14

Observations from DLM :

  • Breusch-Godfrey test for serial correlation has p-value<0.05 which supports serial correlation at 5% level of significance.

  • Shapiro-Wilk normality test has p-value<0.05 and the histogram both fails to justify normality assumptions.So the data is not normally distributed.

  • ACF plot of residuals suggest strong seasonality and follows exponential decaying nature.So there is an autocorrelation and seasonality present in the residuals.

  • Adjusted R2 value is 0.2962 which means model only explains about 30% of the variability of solar radiations.

  • DML model does not capture completely auto correlation and seasonality .Hence we will check next model.

b) Polynomial Distributed lag model

print("Ploynomial DLM")
## [1] "Ploynomial DLM"
poly = polyDlm(x =  as.vector(ppt), y =  as.vector(solar) , q = 10 , k = 2 , show.beta = TRUE)
## Estimates and t-tests for beta coefficients:
##         Estimate Std. Error t value  P(>|t|)
## beta.0    -5.040      0.435 -11.600 2.39e-28
## beta.1    -2.320      0.314  -7.400 4.33e-13
## beta.2    -0.188      0.251  -0.749 4.54e-01
## beta.3     1.370      0.237   5.770 1.23e-08
## beta.4     2.340      0.243   9.600 1.74e-20
## beta.5     2.720      0.248  11.000 7.32e-26
## beta.6     2.530      0.243  10.400 1.40e-23
## beta.7     1.750      0.235   7.430 3.39e-13
## beta.8     0.387      0.249   1.560 1.20e-01
## beta.9    -1.560      0.312  -4.990 7.68e-07
## beta.10   -4.090      0.433  -9.440 7.08e-20
summary(poly)
## 
## Call:
## "Y ~ (Intercept) + X.t"
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -19.012  -5.343  -1.090   4.097  31.641 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 18.77806    1.08588   17.29   <2e-16 ***
## z.t0        -5.04398    0.43502  -11.60   <2e-16 ***
## z.t1         3.01112    0.18300   16.45   <2e-16 ***
## z.t2        -0.29153    0.01761  -16.56   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.237 on 646 degrees of freedom
## Multiple R-squared:  0.3025, Adjusted R-squared:  0.2992 
## F-statistic: 93.38 on 3 and 646 DF,  p-value: < 2.2e-16

Check residuals characteristics of series as a diagnostic check.

# Check the residual of models

checkresiduals(poly$model)

## 
##  Breusch-Godfrey test for serial correlation of order up to 10
## 
## data:  Residuals
## LM test = 580.37, df = 10, p-value < 2.2e-16
# Perform shapiro wilk test to check the normality

shapiro.test(residuals(poly$model))
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(poly$model)
## W = 0.94582, p-value = 1.12e-14

Observations from polynomial DLM :

  • Here to Breusch-Godfrey test for serial correlation has p-value<0.05 which supports serial correlation at 5% level of significance.

  • Shapiro-Wilk normality test has p-value<0.05 ensure data is not normally distributed.Also histogram is not normally distributed.

  • ACF plot of residuals suggest strong seasonality and follows exponential decaying pattern.Therefore there is an autocorrelation and seasonality present.

  • Adjusted R2 value is 0.2992 which means model only explains about 30% of the variability of solar radiations which is not very good.

  • Polynomial model with lag 10 does not successfully capture complete auto correlation and seasonality .Hence we will check next model.

c) Koyck transformation

print("Koyck model")
## [1] "Koyck model"
Koyck = koyckDlm(x =  as.vector(ppt), y =  as.vector(solar))
summary(Koyck,diagnostics =T)
## 
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.0926  -3.5961   0.3176   3.6103  14.8399 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2.23925    0.76549  -2.925  0.00356 ** 
## Y.1          0.98546    0.02424  40.650  < 2e-16 ***
## X.t          5.34684    0.84383   6.336 4.37e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.814 on 656 degrees of freedom
## Multiple R-Squared: 0.7598,  Adjusted R-squared: 0.7591 
## Wald test:  1104 on 2 and 656 DF,  p-value: < 2.2e-16 
## 
## Diagnostic tests:
##                  df1 df2 statistic       p-value
## Weak instruments   1 656  710.7209 1.191744e-106
## Wu-Hausman         1 655  146.8017  1.248856e-30
## 
##                              alpha     beta       phi
## Geometric coefficients:  -154.0203 5.346844 0.9854613
# Check the residual of models

checkresiduals(Koyck$model)

# Perform shapiro wilk test to check the normality

shapiro.test(residuals(Koyck$model))
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(Koyck$model)
## W = 0.98487, p-value = 2.468e-06

Observations from Koyck model:

  • Shapiro-Wilk normality test has p-value<0.05 ensure data is not normally distributed.Also histogram is not normally distributed.

  • ACF plot of residuals suggest strong seasonality.There are significant lags which suggests autocorrelation and seasonality still present in the residuals.

  • Adjusted R2 value is 0.7591 which means model only explains about 76% of the variability of solar radiations which is good.

  • Koyck model does not successfully capture complete auto correlation and seasonality details of the series.Hence we will check our next model.

d) Autoregressive distributed lag models

# Find the best lowest value of MASE 

for (i in 1 : 5 ){
  for (j in 1 : 5 ){
  
model4 = ardlDlm(x =  as.vector(ppt), y =  as.vector(solar), p = i , q = j)

cat( "p =" , i, "q =", j,"AIC =", AIC(model4$model),"BIC =", BIC(model4$model),"MASE =", MASE(model4)$MASE,"\n" )
}
}
## p = 1 q = 1 AIC = 3712.311 BIC = 3734.765 MASE = 0.8392434 
## p = 1 q = 2 AIC = 3239.416 BIC = 3266.352 MASE = 0.4971918 
## p = 1 q = 3 AIC = 3143.522 BIC = 3174.936 MASE = 0.4740063 
## p = 1 q = 4 AIC = 3138.399 BIC = 3174.288 MASE = 0.4697571 
## p = 1 q = 5 AIC = 3100.283 BIC = 3140.644 MASE = 0.450425 
## p = 2 q = 1 AIC = 3639.223 BIC = 3666.159 MASE = 0.7834855 
## p = 2 q = 2 AIC = 3229.051 BIC = 3260.476 MASE = 0.4951319 
## p = 2 q = 3 AIC = 3137.634 BIC = 3173.535 MASE = 0.4738939 
## p = 2 q = 4 AIC = 3132.962 BIC = 3173.337 MASE = 0.4702773 
## p = 2 q = 5 AIC = 3097.288 BIC = 3142.134 MASE = 0.4503599 
## p = 3 q = 1 AIC = 3608.793 BIC = 3640.207 MASE = 0.7572489 
## p = 3 q = 2 AIC = 3226.623 BIC = 3262.524 MASE = 0.4955334 
## p = 3 q = 3 AIC = 3139.409 BIC = 3179.798 MASE = 0.4737144 
## p = 3 q = 4 AIC = 3134.777 BIC = 3179.638 MASE = 0.4701162 
## p = 3 q = 5 AIC = 3098.808 BIC = 3148.139 MASE = 0.4502885 
## p = 4 q = 1 AIC = 3602.664 BIC = 3638.553 MASE = 0.7580664 
## p = 4 q = 2 AIC = 3224.285 BIC = 3264.66 MASE = 0.4959949 
## p = 4 q = 3 AIC = 3131.289 BIC = 3176.15 MASE = 0.4695096 
## p = 4 q = 4 AIC = 3131.424 BIC = 3180.772 MASE = 0.4665123 
## p = 4 q = 5 AIC = 3096.024 BIC = 3149.839 MASE = 0.4479481 
## p = 5 q = 1 AIC = 3599.402 BIC = 3639.764 MASE = 0.7572617 
## p = 5 q = 2 AIC = 3221.853 BIC = 3266.699 MASE = 0.4954501 
## p = 5 q = 3 AIC = 3127.103 BIC = 3176.434 MASE = 0.4675479 
## p = 5 q = 4 AIC = 3127.868 BIC = 3181.684 MASE = 0.4651969 
## p = 5 q = 5 AIC = 3097.877 BIC = 3156.177 MASE = 0.4479311

Three models with lowest values of MASE ARDL(3,5) ,ARDL(4,5) ,ARDL(5,5)

print("AutoRegressive DLM model")
## [1] "AutoRegressive DLM model"
for (i in c( 3,4,5)){
autordlm <- ardlDlm(x =  as.vector(ppt), y =  as.vector(solar), p = i, q = 5 )
summary(autordlm)

# check residuals

checkresiduals(autordlm$model)

# Perform shapiro wilk test to check the normality


}
## 
## Time series regression with "ts" data:
## Start = 6, End = 660
## 
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -15.6649  -1.4447  -0.2663   1.0644  18.7430 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.20532    0.42237   5.221 2.40e-07 ***
## X.t         -0.60604    0.54924  -1.103 0.270258    
## X.1          0.89253    0.77682   1.149 0.251001    
## X.2          1.60774    0.77838   2.065 0.039276 *  
## X.3         -0.38294    0.55738  -0.687 0.492308    
## Y.1          1.27345    0.03859  32.999  < 2e-16 ***
## Y.2         -0.02859    0.06250  -0.457 0.647495    
## Y.3         -0.40351    0.06036  -6.685 5.01e-11 ***
## Y.4         -0.22632    0.06234  -3.630 0.000305 ***
## Y.5          0.22116    0.03779   5.853 7.69e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.553 on 645 degrees of freedom
## Multiple R-squared:  0.9333, Adjusted R-squared:  0.9324 
## F-statistic:  1003 on 9 and 645 DF,  p-value: < 2.2e-16

## 
## Time series regression with "ts" data:
## Start = 6, End = 660
## 
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -15.5901  -1.3826  -0.2867   1.0526  18.5709 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.46103    0.43729   5.628 2.72e-08 ***
## X.t         -0.61439    0.54768  -1.122 0.262364    
## X.1          0.78469    0.77617   1.011 0.312409    
## X.2          1.28470    0.79026   1.626 0.104509    
## X.3          0.80457    0.77946   1.032 0.302355    
## X.4         -1.20750    0.55570  -2.173 0.030148 *  
## Y.1          1.27195    0.03849  33.049  < 2e-16 ***
## Y.2         -0.01868    0.06249  -0.299 0.765112    
## Y.3         -0.40480    0.06019  -6.725 3.87e-11 ***
## Y.4         -0.23201    0.06221  -3.729 0.000209 ***
## Y.5          0.21746    0.03772   5.766 1.26e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.546 on 644 degrees of freedom
## Multiple R-squared:  0.9338, Adjusted R-squared:  0.9328 
## F-statistic: 908.5 on 10 and 644 DF,  p-value: < 2.2e-16

## 
## Time series regression with "ts" data:
## Start = 6, End = 660
## 
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -15.5959  -1.3825  -0.2646   1.0410  18.5812 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.50740    0.45434   5.519 4.96e-08 ***
## X.t         -0.61416    0.54804  -1.121 0.262863    
## X.1          0.78299    0.77670   1.008 0.313788    
## X.2          1.26543    0.79241   1.597 0.110772    
## X.3          0.75184    0.79227   0.949 0.342998    
## X.4         -1.00181    0.77678  -1.290 0.197617    
## X.5         -0.21024    0.55439  -0.379 0.704639    
## Y.1          1.27063    0.03867  32.861  < 2e-16 ***
## Y.2         -0.01727    0.06264  -0.276 0.782907    
## Y.3         -0.40297    0.06043  -6.669 5.56e-11 ***
## Y.4         -0.23273    0.06229  -3.737 0.000203 ***
## Y.5          0.21571    0.03802   5.673 2.12e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.548 on 643 degrees of freedom
## Multiple R-squared:  0.9338, Adjusted R-squared:  0.9327 
## F-statistic: 824.9 on 11 and 643 DF,  p-value: < 2.2e-16

shapiro.test(residuals(autordlm$model))
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(autordlm$model)
## W = 0.90353, p-value < 2.2e-16

Observations from ARDLM:

  • Model p-value<0.05 ensure that model is statistically significant at 5% level

  • Shapiro-Wilk normality test has p-value<0.05 shows data is not normally distributed.

  • ACF plots of residuals shows seasonality seasonality but not so strong as previous models.There is autocorrelation and seasonality still present in the residuals.

  • Adjusted R2 value is 0.9327 which means model only explains about 93% of the variability of solar radiations which is very good.

  • Auto regressive model overall does not successfully capture complete auto correlation and seasonality details of the series.

We will create data frame and will capture overall AIC,BIC and MASE values for all the models we have used.

# build Three models separately which are already having lowest values of MASE ARDL(3,5) ,ARDL(4,5) ,ARDL(5,5)

model_35 <- ardlDlm(x =  as.vector(ppt), y =  as.vector(solar), p = 3 , q = 5 )
model_45 <- ardlDlm(x =  as.vector(ppt), y =  as.vector(solar), p =4 , q = 5)
model_55 <- ardlDlm(x =  as.vector(ppt), y =  as.vector(solar), p = 5 , q = 5 )

model_name <- c( "dlm1" , "poly" , "Koyck" ,"ardlm","model_35","model_45","model_55")

attr(Koyck$model,"class") ="lm"
aic<- AIC( dlm1$model , poly$model , Koyck$model ,autordlm$model ,model_35$model ,model_45$model ,model_55$model )$AIC

bic <- BIC(dlm1$model , poly$model , Koyck$model ,autordlm$model ,model_35$model ,model_45$model ,model_55$model)$BIC

mase <- MASE(dlm1$model , poly$model , Koyck$model ,autordlm ,model_35 ,model_45 ,model_55)$MASE

dataframe <- data.frame(model_name ,aic ,bic ,mase)

dataframe
##   model_name      aic      bic      mase
## 1       dlm1 4602.658 4660.858 1.5779955
## 2       poly 4591.904 4614.289 1.5925546
## 3      Koyck 3946.476 3964.439 1.0324829
## 4      ardlm 3097.877 3156.177 0.4479311
## 5   model_35 3098.808 3148.139 0.4502885
## 6   model_45 3096.024 3149.839 0.4479481
## 7   model_55 3097.877 3156.177 0.4479311

We can see that Auto regressive DLM models: model_55 that is ARDL(5,5) followed by model_45 have the lowest mase value compared to other models.But we can have even further lowest value of mase by using other models . Hence we will check for exponential smoothing methods as our next set of models.

Exponential smoothing methods

Exponential smoothing methods are used to model time series data by fitting different trend and seasonality patterns.For Exponential smoothing methods, we will build models that is having either additive or multiplicative seasonality.We will also try with Exponential Trend Method method to capture trends as we need to find out best model that fit best in terms of lowest mean absolute scaled error MASE value.Although our time series suggests existence of seasonality along with trend but still we will experiment with all methods like Holt exponential trend and Holt winters trend and seasonality method both to get every possible details of series and finally judge best fit model.

Hence we will apply simple exponential smoothing ,Holt’s Exponential Trend Method method , Holt-Winter’s additive and Holt-Winter’s multiplicative methods.

a) Simple Exponential smoothing

#Simple Exponential smoothing

simpleEM <- ses(solar, alpha=0.1, initial="simple", h=5)
summary(simpleEM)
## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
##  ses(y = solar, h = 5, initial = "simple", alpha = 0.1) 
## 
##   Smoothing parameters:
##     alpha = 0.1 
## 
##   Initial states:
##     l = 5.0517 
## 
##   sigma:  9.4649
## Error measures:
##                     ME     RMSE      MAE       MPE     MAPE     MASE      ACF1
## Training set 0.1082408 9.464872 7.716889 -35.66493 63.73288 1.267744 0.8756093
## 
## Forecasts:
##          Point Forecast        Lo 80    Hi 80     Lo 95    Hi 95
## Jan 2015       12.19562  0.065897818 24.32534 -6.355189 30.74643
## Feb 2015       12.19562  0.005400079 24.38584 -6.447712 30.83895
## Mar 2015       12.19562 -0.054798900 24.44604 -6.539779 30.93102
## Apr 2015       12.19562 -0.114703502 24.50594 -6.631395 31.02263
## May 2015       12.19562 -0.174318005 24.56556 -6.722567 31.11381
checkresiduals(simpleEM)

## 
##  Ljung-Box test
## 
## data:  Residuals from Simple exponential smoothing
## Q* = 3788.1, df = 22, p-value < 2.2e-16
## 
## Model df: 2.   Total lags used: 24

Observations from SES:

  • From SES , we can see that it has high MASE value 1.267744.

  • Ljung-Box test shows p-value(< 0.05) which suggests autocorrelation is still present in the residuals.

  • ACF plots for residuals suggest existence of strong seasonality.Significant lags also indicate presence of auto correlation.

  • Histogram shows no sign of normal distribution.

Still we do not find any proper evidence from this model to prove if it is best fit model.Hence we shall proceed with next model methods.

b) Holt’s Exponential Trend Method method

# Holt's Exponential Trend Method method


holt_fit1 <- holt(solar, alpha=0.8, beta=0.1, initial="simple", h=2) # Set alpha and beta
summary(holt_fit1)
## 
## Forecast method: Holt's method
## 
## Model Information:
## Holt's method 
## 
## Call:
##  holt(y = solar, h = 2, initial = "simple", alpha = 0.8, beta = 0.1) 
## 
##   Smoothing parameters:
##     alpha = 0.8 
##     beta  = 0.1 
## 
##   Initial states:
##     l = 5.0517 
##     b = 1.3641 
## 
##   sigma:  5.5781
## Error measures:
##                       ME     RMSE      MAE       MPE     MAPE      MASE
## Training set -0.04029953 5.578101 4.769722 -4.741421 33.28953 0.7835783
##                   ACF1
## Training set 0.7545459
## 
## Forecasts:
##          Point Forecast     Lo 80    Hi 80      Lo 95    Hi 95
## Jan 2015       4.717014 -2.431610 11.86564  -6.215863 15.64989
## Feb 2015       3.953301 -5.664189 13.57079 -10.755381 18.66198
checkresiduals(holt_fit1)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt's method
## Q* = 4939.2, df = 20, p-value < 2.2e-16
## 
## Model df: 4.   Total lags used: 24
holt_fit2 <- holt(solar, initial="simple", h=2) # Let the software estimate both alpha and beta
summary(holt_fit2)
## 
## Forecast method: Holt's method
## 
## Model Information:
## Holt's method 
## 
## Call:
##  holt(y = solar, h = 2, initial = "simple") 
## 
##   Smoothing parameters:
##     alpha = 0.9165 
##     beta  = 1 
## 
##   Initial states:
##     l = 5.0517 
##     b = 1.3641 
## 
##   sigma:  3.6493
## Error measures:
##                        ME     RMSE      MAE      MPE     MAPE      MASE
## Training set -0.004941411 3.649286 2.806374 6.556498 21.13578 0.4610361
##                    ACF1
## Training set 0.06518525
## 
## Forecasts:
##          Point Forecast     Lo 80     Hi 80     Lo 95    Hi 95
## Jan 2015       3.375538 -1.301210  8.052286  -3.77693 10.52801
## Feb 2015       1.750718 -8.358924 11.860360 -13.71065 17.21208
checkresiduals(holt_fit2)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt's method
## Q* = 1096.3, df = 20, p-value < 2.2e-16
## 
## Model df: 4.   Total lags used: 24
holt_fit3 <- holt(solar, alpha=0.2, beta=0.2, initial="simple", h=2) # Brown’s double exponential smoothing
summary(holt_fit3)
## 
## Forecast method: Holt's method
## 
## Model Information:
## Holt's method 
## 
## Call:
##  holt(y = solar, h = 2, initial = "simple", alpha = 0.2, beta = 0.2) 
## 
##   Smoothing parameters:
##     alpha = 0.2 
##     beta  = 0.2 
## 
##   Initial states:
##     l = 5.0517 
##     b = 1.3641 
## 
##   sigma:  10.603
## Error measures:
##                       ME     RMSE      MAE       MPE     MAPE     MASE
## Training set -0.05694535 10.60297 8.782824 -32.33218 70.05492 1.442858
##                   ACF1
## Training set 0.8707233
## 
## Forecasts:
##          Point Forecast     Lo 80    Hi 80      Lo 95    Hi 95
## Jan 2015       11.76619 -1.822063 25.35444  -9.015248 32.54762
## Feb 2015       11.62693 -3.008061 26.26193 -10.755358 34.00922
checkresiduals(holt_fit3)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt's method
## Q* = 4263.5, df = 20, p-value < 2.2e-16
## 
## Model df: 4.   Total lags used: 24
holt_fit4 <- holt(solar,  beta=0, initial="simple", h=2) # Simple exponential smoothing with drift
summary(holt_fit4)
## 
## Forecast method: Holt's method
## 
## Model Information:
## Holt's method 
## 
## Call:
##  holt(y = solar, h = 2, initial = "simple", beta = 0) 
## 
##   Smoothing parameters:
##     alpha = 1 
##     beta  = 0 
## 
##   Initial states:
##     l = 5.0517 
##     b = 1.3641 
## 
##   sigma:  4.768
## Error measures:
##                     ME     RMSE      MAE       MPE     MAPE      MASE      ACF1
## Training set -1.363956 4.768029 3.921249 -16.14931 29.79597 0.6441896 0.6677846
## 
## Forecasts:
##          Point Forecast      Lo 80    Hi 80     Lo 95    Hi 95
## Jan 2015       6.512383  0.4019083 12.62286 -2.832781 15.85755
## Feb 2015       7.876486 -0.7650305 16.51800 -5.339573 21.09254
checkresiduals(holt_fit4)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt's method
## Q* = 4227.3, df = 20, p-value < 2.2e-16
## 
## Model df: 4.   Total lags used: 24
holt_fit5 <- holt(solar, alpha=0.1, beta=0.8, initial="simple", exponential=TRUE, h=2) # Fit with exponential trend
summary(holt_fit5)
## 
## Forecast method: Holt's method with exponential trend
## 
## Model Information:
## Holt's method with exponential trend 
## 
## Call:
##  holt(y = solar, h = 2, initial = "simple", exponential = TRUE,  
## 
##  Call:
##      alpha = 0.1, beta = 0.8) 
## 
##   Smoothing parameters:
##     alpha = 0.1 
##     beta  = 0.8 
## 
##   Initial states:
##     l = 5.0517 
##     b = 1.27 
## 
##   sigma:  0.7396
## Error measures:
##                     ME    RMSE      MAE       MPE     MAPE     MASE      ACF1
## Training set -8.188372 22.9669 15.15821 -107.7281 138.2448 2.490218 0.9264016
## 
## Forecasts:
##          Point Forecast    Lo 80    Hi 80 Lo 95     Hi 95
## Jan 2015       36.78143 1.594111 72.09449     0  90.82705
## Feb 2015       39.29194 1.611659 98.56586     0 151.30345
checkresiduals(holt_fit5)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt's method with exponential trend
## Q* = 2388.6, df = 20, p-value < 2.2e-16
## 
## Model df: 4.   Total lags used: 24
holt_fit6 <-holt(solar,alpha=0.8,beta=0.2,damped=TRUE,initial="simple",h=2) # Fit with additive damped trend
summary(holt_fit6)
## 
## Forecast method: Damped Holt's method
## 
## Model Information:
## Damped Holt's method 
## 
## Call:
##  holt(y = solar, h = 2, damped = TRUE, initial = "simple", alpha = 0.8,  
## 
##  Call:
##      beta = 0.2) 
## 
##   Smoothing parameters:
##     alpha = 0.8 
##     beta  = 0.2 
##     phi   = 0.8 
## 
##   Initial states:
##     l = 2.8122 
##     b = 1.3497 
## 
##   sigma:  4.9808
## 
##      AIC     AICc      BIC 
## 6407.229 6407.291 6425.198 
## 
## Error measures:
##                        ME     RMSE      MAE       MPE     MAPE      MASE
## Training set -0.008393097 4.961862 4.173557 -1.375292 28.67393 0.6856393
##                  ACF1
## Training set 0.707703
## 
## Forecasts:
##          Point Forecast     Lo 80    Hi 80      Lo 95    Hi 95
## Jan 2015       4.114067 -2.269040 10.49717  -5.648052 13.87619
## Feb 2015       3.160864 -5.687513 12.00924 -10.371561 16.69329
checkresiduals(holt_fit6)

## 
##  Ljung-Box test
## 
## data:  Residuals from Damped Holt's method
## Q* = 4748.3, df = 19, p-value < 2.2e-16
## 
## Model df: 5.   Total lags used: 24

Observations from Holt’s trend method:

  • Holt’s Exponential Trend Method method has lowest MASE value of 0.4610361 which is is holt_fit2 that is with h = 5, initial = “simple”.

  • Ljung-Box test shows p-value(< 0.05) which suggests autocorrelation is still present in the residuals.

  • None of the histogram satisfy any normal distribution.

  • ACF plots for residuals indicates seasonality as well as significant lags at different point indicates autocorrelation.

  • We can see all six ACF plots does not capture trends, autocorrelation and seasonality perfectly.This could be captured with Holt Winters Trend and Seasonality Method.Hence we will move with this method in our next section.

c) Holt Winters Trend and Seasonality Method

#Holt Winters Trend and Seasonality Method



holtWinter.model1 <- hw(solar,seasonal="additive",h = 2*frequency(solar))
summary(holtWinter.model1)
## 
## Forecast method: Holt-Winters' additive method
## 
## Model Information:
## Holt-Winters' additive method 
## 
## Call:
##  hw(y = solar, h = 2 * frequency(solar), seasonal = "additive") 
## 
##   Smoothing parameters:
##     alpha = 0.9993 
##     beta  = 0.0019 
##     gamma = 1e-04 
## 
##   Initial states:
##     l = 11.3306 
##     b = 0.209 
##     s = -10.7542 -8.4968 -3.227 3.0768 7.9264 10.9122
##            10.1422 7.3322 2.2353 -1.9528 -7.3626 -9.8316
## 
##   sigma:  2.3575
## 
##      AIC     AICc      BIC 
## 5434.708 5435.662 5511.076 
## 
## Error measures:
##                      ME     RMSE      MAE       MPE     MAPE    MASE      ACF1
## Training set -0.1179476 2.328736 1.504489 -2.230359 12.68101 0.24716 0.1703355
## 
## Forecasts:
##          Point Forecast      Lo 80     Hi 80      Lo 95    Hi 95
## Jan 2015       6.127399  3.1061581  9.148641   1.506810 10.74799
## Feb 2015       8.655602  4.3802222 12.930983   2.116973 15.19423
## Mar 2015      14.122221  8.8814747 19.362968   6.107191 22.13725
## Apr 2015      18.366625 12.3095941 24.423656   9.103196 27.63005
## May 2015      23.522264 16.7439512 30.300578  13.155729 33.88880
## Jun 2015      26.390991 18.9586817 33.823300  15.024255 37.75773
## Jul 2015      27.219228 19.1837595 35.254697  14.930039 39.50842
## Aug 2015      24.290519 15.6920179 32.889020  11.140246 37.44079
## Sep 2015      19.495874 10.3670365 28.624711   5.534522 33.45723
## Oct 2015      13.253765  3.6218809 22.885648  -1.476930 27.98446
## Nov 2015       8.042144 -2.0695730 18.153861  -7.422393 23.50668
## Dec 2015       5.840110 -4.7313921 16.411612 -10.327607 22.00783
## Jan 2016       6.819616 -4.1942247 17.833457 -10.024600 23.66383
## Feb 2016       9.347819 -2.0927719 20.788410  -8.149055 26.84469
## Mar 2016      14.814438  2.9609172 26.667959  -3.313958 32.94283
## Apr 2016      19.058842  6.8048110 31.312872   0.317919 37.79976
## May 2016      24.214481 11.5711780 36.857784   4.878218 43.55074
## Jun 2016      27.083208 14.0608586 40.105557   7.167243 46.99917
## Jul 2016      27.911445 14.5194061 41.303485   7.430089 48.39280
## Aug 2016      24.982736 11.2296053 38.735867   3.949138 46.01633
## Sep 2016      20.188091  6.0818048 34.294377  -1.385612 41.76179
## Oct 2016      13.945981 -0.5061083 28.398071  -8.156582 36.04855
## Nov 2016       8.734361 -6.0566988 23.525420 -13.886613 31.35533
## Dec 2016       6.532327 -8.5913309 21.655984 -16.597311 29.66196
checkresiduals(holtWinter.model1)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt-Winters' additive method
## Q* = 205.55, df = 8, p-value < 2.2e-16
## 
## Model df: 16.   Total lags used: 24
holtWinter.model2 <- hw(solar,seasonal="additive" ,damped =TRUE,h = 2*frequency(solar))
summary(holtWinter.model2)
## 
## Forecast method: Damped Holt-Winters' additive method
## 
## Model Information:
## Damped Holt-Winters' additive method 
## 
## Call:
##  hw(y = solar, h = 2 * frequency(solar), seasonal = "additive",  
## 
##  Call:
##      damped = TRUE) 
## 
##   Smoothing parameters:
##     alpha = 0.9999 
##     beta  = 1e-04 
##     gamma = 1e-04 
##     phi   = 0.9388 
## 
##   Initial states:
##     l = 11.154 
##     b = 0.7632 
##     s = -10.4919 -8.137 -3.348 2.5794 8.08 11.1219
##            9.9586 6.9916 1.9573 -1.8565 -7.1607 -9.6946
## 
##   sigma:  2.3446
## 
##      AIC     AICc      BIC 
## 5428.422 5429.489 5509.282 
## 
## Error measures:
##                       ME     RMSE      MAE       MPE     MAPE      MASE
## Training set -0.01091357 2.314163 1.498521 -1.468083 12.44796 0.2461797
##                   ACF1
## Training set 0.1700724
## 
## Forecasts:
##          Point Forecast     Lo 80     Hi 80       Lo 95    Hi 95
## Jan 2015       5.945198  2.940530  8.949866   1.3499548 10.54044
## Feb 2015       8.479778  4.230548 12.729009   1.9811413 14.97842
## Mar 2015      13.784309  8.579902 18.988716   5.8248548 21.74376
## Apr 2015      17.598568 11.588776 23.608360   8.4073847 26.78975
## May 2015      22.632258 15.912803 29.351713  12.3557383 32.90878
## Jun 2015      25.599150 18.238024 32.960277  14.3412786 36.85702
## Jul 2015      26.761775 18.810497 34.713053  14.6013444 38.92221
## Aug 2015      23.722251 15.221610 32.222892  10.7216429 36.72286
## Sep 2015      18.222624  9.205955 27.239293   4.4328191 32.01243
## Oct 2015      12.293259  2.788471 21.798047  -2.2430606 26.82958
## Nov 2015       7.504219 -2.464877 17.473315  -7.7421977 22.75064
## Dec 2015       5.150488 -5.262286 15.563263 -10.7744758 21.07545
## Jan 2016       5.947275 -4.891175 16.785726 -10.6287044 22.52326
## Feb 2016       8.481728 -2.766251 19.729707  -8.7205712 25.68403
## Mar 2016      13.786139  2.142988 25.429291  -4.0205242 31.59280
## Apr 2016      17.600287  5.574906 29.625668  -0.7909464 35.99152
## May 2016      22.633871 10.238009 35.029734   3.6760353 41.59171
## Jun 2016      25.600665 12.845046 38.356283   6.0926298 45.10870
## Jul 2016      26.763197 13.657667 39.868726   6.7200186 46.80637
## Aug 2016      23.723586 10.277223 37.169950   3.1591478 44.28802
## Sep 2016      18.223877  4.445085 32.002669  -2.8489663 39.29672
## Oct 2016      12.294435 -1.808972 26.397843  -9.2748652 33.86374
## Nov 2016       7.505324 -6.915414 21.926061 -14.5492911 29.55994
## Dec 2016       5.151525 -9.579726 19.882776 -17.3779790 27.68103
checkresiduals(holtWinter.model2)

## 
##  Ljung-Box test
## 
## data:  Residuals from Damped Holt-Winters' additive method
## Q* = 210.76, df = 7, p-value < 2.2e-16
## 
## Model df: 17.   Total lags used: 24
holtWinter.model3 <- hw(solar,seasonal= "multiplicative",h = 2*frequency(solar))
summary(holtWinter.model3)
## 
## Forecast method: Holt-Winters' multiplicative method
## 
## Model Information:
## Holt-Winters' multiplicative method 
## 
## Call:
##  hw(y = solar, h = 2 * frequency(solar), seasonal = "multiplicative") 
## 
##   Smoothing parameters:
##     alpha = 0.8251 
##     beta  = 1e-04 
##     gamma = 0.0259 
## 
##   Initial states:
##     l = 10.5659 
##     b = -0.2371 
##     s = 0.5024 0.6292 0.9319 1.2229 1.4923 1.6309
##            1.5606 1.3672 1.0278 0.8128 0.5106 0.3114
## 
##   sigma:  0.3971
## 
##      AIC     AICc      BIC 
## 6648.746 6649.699 6725.114 
## 
## Error measures:
##                      ME    RMSE      MAE        MPE     MAPE      MASE
## Training set 0.04281136 2.12539 1.359297 -0.4896895 10.87401 0.2233077
##                    ACF1
## Training set 0.06551473
## 
## Forecasts:
##          Point Forecast        Lo 80     Hi 80       Lo 95     Hi 95
## Jan 2015       5.610335   2.75534413  8.465326   1.2440033  9.976666
## Feb 2015       6.512806   2.03809262 10.987520  -0.3306776 13.356290
## Mar 2015       8.786120   1.33667747 16.235562  -2.6068190 20.179058
## Apr 2015      10.192785  -0.02672876 20.412298  -5.4366123 25.822181
## May 2015      12.512597  -1.96157245 26.986766  -9.6237347 34.648928
## Jun 2015      14.061609  -4.41048190 32.533701 -14.1890164 42.312235
## Jul 2015      14.502088  -6.89909775 35.903274 -18.2282011 47.232377
## Aug 2015      12.945620  -8.35024040 34.241481 -19.6235881 45.514829
## Sep 2015      10.352210  -8.52365174 29.228072 -18.5159294 39.220350
## Oct 2015       7.418279  -7.51119146 22.347750 -15.4143760 30.250935
## Nov 2015       4.989730  -6.05900058 16.038460 -11.9078451 21.887305
## Dec 2015       3.871776  -5.53876625 13.282319 -10.5204066 18.263960
## Jan 2016       4.199400  -7.01753622 15.416336 -12.9554236 21.354224
## Feb 2016       4.838892  -9.30305408 18.980837 -16.7893479 26.467131
## Mar 2016       6.477174 -14.21845674 27.172805 -25.1740619 38.128410
## Apr 2016       7.452633 -18.56883703 33.474103 -32.3437711 47.249037
## May 2016       9.069749 -25.52978297 43.669280 -43.8456686 61.985166
## Jun 2016      10.099485 -31.99838423 52.197355 -54.2836502 74.482621
## Jul 2016      10.315199 -36.68017471 57.310572 -61.5580226 82.188420
## Aug 2016       9.113765 -36.29212001 54.519651 -60.3285438 78.556075
## Sep 2016       7.208700 -32.09306283 46.510463 -52.8981594 67.315559
## Oct 2016       5.105869 -25.38360449 35.595343 -41.5237568 51.735495
## Nov 2016       3.391945 -18.81647619 25.600367 -30.5729044 37.356795
## Dec 2016       2.597256 -16.07148862 21.266001 -25.9541251 31.148637
checkresiduals(holtWinter.model3)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt-Winters' multiplicative method
## Q* = 38.585, df = 8, p-value = 5.868e-06
## 
## Model df: 16.   Total lags used: 24
holtWinter.model4 <- hw(solar,seasonal= "multiplicative",exponential =TRUE,h = 2*frequency(solar))
summary(holtWinter.model4)
## 
## Forecast method: Holt-Winters' multiplicative method with exponential trend
## 
## Model Information:
## Holt-Winters' multiplicative method with exponential trend 
## 
## Call:
##  hw(y = solar, h = 2 * frequency(solar), seasonal = "multiplicative",  
## 
##  Call:
##      exponential = TRUE) 
## 
##   Smoothing parameters:
##     alpha = 0.6499 
##     beta  = 1e-04 
##     gamma = 0.0597 
## 
##   Initial states:
##     l = 9.3759 
##     b = 0.9889 
##     s = 0.3552 0.4789 0.952 1.0553 1.4608 1.7926
##            1.6746 1.4529 1.1145 0.8672 0.5333 0.2627
## 
##   sigma:  0.3755
## 
##      AIC     AICc      BIC 
## 6584.208 6585.161 6660.576 
## 
## Error measures:
##                      ME     RMSE      MAE       MPE     MAPE      MASE     ACF1
## Training set 0.06388152 2.208727 1.412454 -1.303792 11.28449 0.2320404 0.153871
## 
## Forecasts:
##          Point Forecast     Lo 80     Hi 80     Lo 95    Hi 95
## Jan 2015       5.797870 3.0733184  8.686782 1.5878899 10.10585
## Feb 2015       7.513359 3.3140722 12.106447 1.8309330 15.08375
## Mar 2015      10.758212 4.2505477 18.223080 2.1563079 24.02990
## Apr 2015      12.694087 4.5288892 23.001713 2.2615148 31.60325
## May 2015      15.340074 4.9828722 28.558224 2.4478434 40.23394
## Jun 2015      17.239431 5.0811223 33.849317 2.4716506 49.21378
## Jul 2015      18.004700 4.8175360 35.583258 2.2593016 54.92168
## Aug 2015      16.206596 3.9376405 33.258180 1.7576300 53.10690
## Sep 2015      13.011256 2.8521979 27.356893 1.2298518 43.72759
## Oct 2015       9.215869 1.8593033 19.574796 0.7985775 33.79683
## Nov 2015       6.214762 1.1485557 13.585864 0.5107668 23.66196
## Dec 2015       4.565667 0.7861898 10.121305 0.3290951 18.32546
## Jan 2016       5.220661 0.8241418 11.947404 0.3314160 21.56768
## Feb 2016       6.765364 0.9909891 15.437474 0.3908224 30.54529
## Mar 2016       9.687175 1.3733224 22.353943 0.5306779 43.78711
## Apr 2016      11.430324 1.4894329 27.437160 0.5204822 55.75722
## May 2016      13.812889 1.6899950 32.583598 0.5844171 67.84726
## Jun 2016      15.523155 1.7570876 36.822051 0.6091007 74.73526
## Jul 2016      16.212237 1.7216636 39.105233 0.5648795 82.78712
## Aug 2016      14.593143 1.4750655 35.486270 0.5587706 75.11535
## Sep 2016      11.715916 1.0787876 28.018263 0.3936601 63.36130
## Oct 2016       8.298381 0.7335342 20.392315 0.2353507 47.04951
## Nov 2016       5.596049 0.4670756 13.817120 0.1513538 31.70069
## Dec 2016       4.111130 0.3044734  9.872080 0.1014132 23.46774
checkresiduals(holtWinter.model4)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt-Winters' multiplicative method with exponential trend
## Q* = 21.246, df = 8, p-value = 0.006521
## 
## Model df: 16.   Total lags used: 24

Observations from Holt Winters Trend and Seasonality Method :

  • In terms of MASE value, model with seasonal = “multiplicative” gives the lowest value of 0.2233077 that is holtWinter.model3. Next lowest model is holtWinter.model4 with seasonal = “multiplicative”, exponential = TRUE and MASE value of 0.2320404.

  • Ljung-Box test for both multiplicative and multiplicative exponential models has p-value (< 0.05) which suggests autocorrelation is still present in the residuals.

  • Histograms with Holt Winters multiplicative method and multiplicative exponential method trend almost shows normal distribution.

  • ACF plots for residuals likely follows exponential decaying pattern and likely presence of auto correlation.

  • Overall this model captures better results in terms of low MASE value compared to other models,also gives better result in terms of serial correlation and seasonality in residuals.Hence it is successful at capturing autocorrelation and seasonality in solar series.

State space Modelling

State space modelling consists of three components error, trend and seasonality.For each exponential smoothing method, there are two corresponding state-space models according to their error terms one is additive and other is multiplicative.Here we will apply eight different types state space models like AAN which means additive errors, additive trend and no seasonality ,AAA ,MMN etc.

model1.ets <- ets(solar, model ="AAN")
summary(model1.ets)
## ETS(A,Ad,N) 
## 
## Call:
##  ets(y = solar, model = "AAN") 
## 
##   Smoothing parameters:
##     alpha = 0.9279 
##     beta  = 0.9279 
##     phi   = 0.8 
## 
##   Initial states:
##     l = 13.7196 
##     b = 2.5786 
## 
##   sigma:  3.4657
## 
##      AIC     AICc      BIC 
## 5932.524 5932.653 5959.478 
## 
## Training set error measures:
##                        ME     RMSE      MAE      MPE     MAPE      MASE
## Training set -0.008312543 3.452593 2.638571 3.790894 19.25631 0.4334691
##                    ACF1
## Training set 0.07272876
checkresiduals(model1.ets)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(A,Ad,N)
## Q* = 1134, df = 19, p-value < 2.2e-16
## 
## Model df: 5.   Total lags used: 24
model2.ets <- ets(solar, model ="ANA")
summary(model2.ets)
## ETS(A,N,A) 
## 
## Call:
##  ets(y = solar, model = "ANA") 
## 
##   Smoothing parameters:
##     alpha = 0.9997 
##     gamma = 3e-04 
## 
##   Initial states:
##     l = 22.6142 
##     s = -10.2887 -8.5269 -4.0534 1.7184 7.355 10.8208
##            10.4774 7.7101 2.4749 -1.7156 -6.7936 -9.1783
## 
##   sigma:  2.3884
## 
##      AIC     AICc      BIC 
## 5449.974 5450.719 5517.358 
## 
## Training set error measures:
##                      ME     RMSE     MAE       MPE     MAPE     MASE      ACF1
## Training set -0.0108645 2.362974 1.55041 -2.640142 12.95336 0.254704 0.1825724
checkresiduals(model2.ets)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(A,N,A)
## Q* = 207.44, df = 10, p-value < 2.2e-16
## 
## Model df: 14.   Total lags used: 24
model3.ets <- ets(solar, model ="MAN")
summary(model3.ets)
## ETS(M,A,N) 
## 
## Call:
##  ets(y = solar, model = "MAN") 
## 
##   Smoothing parameters:
##     alpha = 0.9999 
##     beta  = 0.0018 
## 
##   Initial states:
##     l = 7.8559 
##     b = 2.4488 
## 
##   sigma:  0.3004
## 
##      AIC     AICc      BIC 
## 6433.364 6433.456 6455.825 
## 
## Training set error measures:
##                     ME     RMSE      MAE       MPE     MAPE      MASE      ACF1
## Training set -1.466339 4.822041 3.964894 -17.35094 30.69019 0.6513597 0.6697971
checkresiduals(model3.ets)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(M,A,N)
## Q* = 2007.1, df = 20, p-value < 2.2e-16
## 
## Model df: 4.   Total lags used: 24
model4.ets <- ets(solar, model ="AAA")
summary(model4.ets)
## ETS(A,Ad,A) 
## 
## Call:
##  ets(y = solar, model = "AAA") 
## 
##   Smoothing parameters:
##     alpha = 0.9999 
##     beta  = 1e-04 
##     gamma = 1e-04 
##     phi   = 0.9388 
## 
##   Initial states:
##     l = 11.154 
##     b = 0.7632 
##     s = -10.4919 -8.137 -3.348 2.5794 8.08 11.1219
##            9.9586 6.9916 1.9573 -1.8565 -7.1607 -9.6946
## 
##   sigma:  2.3446
## 
##      AIC     AICc      BIC 
## 5428.422 5429.489 5509.282 
## 
## Training set error measures:
##                       ME     RMSE      MAE       MPE     MAPE      MASE
## Training set -0.01091357 2.314163 1.498521 -1.468083 12.44796 0.2461797
##                   ACF1
## Training set 0.1700724
checkresiduals(model4.ets)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(A,Ad,A)
## Q* = 210.76, df = 7, p-value < 2.2e-16
## 
## Model df: 17.   Total lags used: 24
model5.ets <- ets(solar, model ="MMN")
summary(model5.ets)
## ETS(M,M,N) 
## 
## Call:
##  ets(y = solar, model = "MMN") 
## 
##   Smoothing parameters:
##     alpha = 0.9999 
##     beta  = 0.0063 
## 
##   Initial states:
##     l = 9.1784 
##     b = 1.1871 
## 
##   sigma:  0.3511
## 
##      AIC     AICc      BIC 
## 6609.163 6609.255 6631.624 
## 
## Training set error measures:
##                     ME     RMSE      MAE      MPE     MAPE     MASE      ACF1
## Training set -1.644301 5.221518 4.235504 -14.9362 30.65972 0.695816 0.6833568
checkresiduals(model5.ets)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(M,M,N)
## Q* = 1371.3, df = 20, p-value < 2.2e-16
## 
## Model df: 4.   Total lags used: 24
model6.ets <- ets(solar, model ="MAM")
summary(model6.ets)
## ETS(M,Ad,M) 
## 
## Call:
##  ets(y = solar, model = "MAM") 
## 
##   Smoothing parameters:
##     alpha = 0.7922 
##     beta  = 6e-04 
##     gamma = 0.0707 
##     phi   = 0.98 
## 
##   Initial states:
##     l = 9.3693 
##     b = 1.2404 
##     s = 0.6953 0.3066 0.5802 0.9879 1.3594 1.5469
##            1.4723 1.4055 1.227 0.9573 0.6959 0.7656
## 
##   sigma:  0.2274
## 
##      AIC     AICc      BIC 
## 5953.502 5954.569 6034.363 
## 
## Training set error measures:
##                     ME    RMSE      MAE       MPE     MAPE      MASE      ACF1
## Training set 0.1750605 3.02791 1.961614 -5.497209 17.21119 0.3222574 0.2565219
checkresiduals(model6.ets)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(M,Ad,M)
## Q* = 102.28, df = 7, p-value < 2.2e-16
## 
## Model df: 17.   Total lags used: 24
model7.ets <- ets(solar, model ="ANN")
summary(model7.ets)
## ETS(A,N,N) 
## 
## Call:
##  ets(y = solar, model = "ANN") 
## 
##   Smoothing parameters:
##     alpha = 0.9999 
## 
##   Initial states:
##     l = 5.0575 
## 
##   sigma:  4.576
## 
##      AIC     AICc      BIC 
## 6296.371 6296.407 6309.847 
## 
## Training set error measures:
##                        ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 0.0001378357 4.569082 3.876391 -5.213129 27.30052 0.6368203
##                   ACF1
## Training set 0.6678374
checkresiduals(model7.ets)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(A,N,N)
## Q* = 4227.6, df = 22, p-value < 2.2e-16
## 
## Model df: 2.   Total lags used: 24
model8.ets <- ets(solar, model ="MNN")
summary(model8.ets)
## ETS(M,N,N) 
## 
## Call:
##  ets(y = solar, model = "MNN") 
## 
##   Smoothing parameters:
##     alpha = 0.9999 
## 
##   Initial states:
##     l = 4.4856 
## 
##   sigma:  0.3868
## 
##      AIC     AICc      BIC 
## 6619.776 6619.812 6633.253 
## 
## Training set error measures:
##                       ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 0.001004352 4.569136 3.877241 -5.195978 27.31733 0.6369599
##                   ACF1
## Training set 0.6678785
checkresiduals(model8.ets)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(M,N,N)
## Q* = 1334.8, df = 22, p-value < 2.2e-16
## 
## Model df: 2.   Total lags used: 24

Observation from state space model:

  • Smallest MASE value is with model4.ets with model = “AAA” ,MASE value of 0.2461797.

  • Ljung-Box test shows p-value < 0.05, therefore there is evidence of autocorrelation in the residuals.

  • Histogram for ETS(A,Ad,A) shows partly normal distribution.Other ets models does not exhibit normality assumptions.

  • ACF plots for residuals follows seasonality and had wavy pattern at seasonal lags, so there is autocorrelation present in the residuals.

  • Overall this model does not capture autocorrelation and seasonality successfully in solar series.

Forecasting for next 2 years

Now we have three models with lowest MASE value.

  • Holt-Winters’ multiplicative method which has the lowest MASE VALUE(0.2233077) and is the most successful at capturing autocorrelation and seasonality in the series

  • Holt-Winters’ multiplicative method with exponential trend which has the second lowest MASE VALUE(0.2320404) and is also good at capturing the autocorrelation and seasonality in the series

  • ETS(A,Ad,A) model has the third lowest MASE VALUE(0.2461797) but not fully successful in capturing autocorrelation and seasonality in the series

Now we will fit all three models to predict forecast for next 2 years of solar radiation.

#Taking the three models having lowest MASE value from above and storing it into new variables.

lowestmasemodel1 <- hw(solar,seasonal= "multiplicative",h = 2*frequency(solar)) #multiplicative

lowestmasemodel2 <- hw(solar,seasonal= "multiplicative",exponential =TRUE ,h = 2*frequency(solar))  # multiplicative exponential

lowestmasemodel3 <- ets(solar, model ="AAA" ,damped =T)  # ets


forcast <- forecast(lowestmasemodel1)
plot(forcast,fcol ="red", main ="Solar radiation series with 2 years ahead forecasts", ylab ="Solar Radiation")

lines(fitted(lowestmasemodel2), col ="darkgreen")
lines(lowestmasemodel2$mean, col ="darkgreen", lwd =2)

lines(fitted(lowestmasemodel3), col ="brown")
lines(lowestmasemodel3$mean, col ="brown", lwd =2)

lines(fitted(lowestmasemodel1), col ="dodgerblue3")
lines(forcast$mean, col ="dodgerblue3", lwd =2)

legend( "bottomleft" , lty =1, col = c("black","darkgreen","brown","dodgerblue3"), c("Data","Holt-Winters'Multiplicative Exponential","ETS(A,Ad,A)" ,"Holt-Winters' Multiplicative"))

Observations:

  • From the above forecast plot ,we can see that ETS model has the highest peak than other two model.However overall forecast shows a downward trend means amount of solar radiation will decrease over the next 2 years.

Final model for forecast for 2 years will be model ‘lowestmasemodel1’ that is Holt-Winters’ multiplicative method which has the lowest MASE value .Now we will take a look at only this model.

plot(forecast(lowestmasemodel1), fcol = "red" , main ="Solar radiation series with 2 years ahead forecasts", ylab ="Solar Radiation")

As we can see that by using Holt-Winters’ multiplicative method ,forecast suggest a downward trend that is amount of solar radiation is likely to decrease over the next two years.

# Capture forecated values

frc <- lowestmasemodel1$mean
ub <- lowestmasemodel1$upper[,2]
lb <- lowestmasemodel1$lower[,2]

forecasts <- ts.intersect(ts(lb, start = c(2015,1), frequency =12), ts(frc,start = c(2015,1), frequency =12), ts(ub,start = c(2015,1), frequency =12))
colnames(forecasts) <- c("Lower bound","Point forecast","Upper bound")
forecasts
##          Lower bound Point forecast Upper bound
## Jan 2015   1.2440033       5.610335    9.976666
## Feb 2015  -0.3306776       6.512806   13.356290
## Mar 2015  -2.6068190       8.786120   20.179058
## Apr 2015  -5.4366123      10.192785   25.822181
## May 2015  -9.6237347      12.512597   34.648928
## Jun 2015 -14.1890164      14.061609   42.312235
## Jul 2015 -18.2282011      14.502088   47.232377
## Aug 2015 -19.6235881      12.945620   45.514829
## Sep 2015 -18.5159294      10.352210   39.220350
## Oct 2015 -15.4143760       7.418279   30.250935
## Nov 2015 -11.9078451       4.989730   21.887305
## Dec 2015 -10.5204066       3.871776   18.263960
## Jan 2016 -12.9554236       4.199400   21.354224
## Feb 2016 -16.7893479       4.838892   26.467131
## Mar 2016 -25.1740619       6.477174   38.128410
## Apr 2016 -32.3437711       7.452633   47.249037
## May 2016 -43.8456686       9.069749   61.985166
## Jun 2016 -54.2836502      10.099485   74.482621
## Jul 2016 -61.5580226      10.315199   82.188420
## Aug 2016 -60.3285438       9.113765   78.556075
## Sep 2016 -52.8981594       7.208700   67.315559
## Oct 2016 -41.5237568       5.105869   51.735495
## Nov 2016 -30.5729044       3.391945   37.356795
## Dec 2016 -25.9541251       2.597256   31.148637

From the forecast values does not shows perfect trend overall, although from August 2016 on wards we can find downward trend(decreasing values of forecast) .

Solar radiation forecast using predictor dataset consists of precipitation measurements

We have dataset consist of predictor series that contains precipitation measurement for the months from January 2015 to December 2016 at the exact same locations. We will use this to forecast solar radiation for next two years. Here we will use model that is Holt-Winters’ multiplicative method to forecast which gave best result.

precipitate <- read_csv("data.x.csv")
## Rows: 24 Columns: 1
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## dbl (1): x
## 
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(precipitate)
## # A tibble: 6 x 1
##       x
##   <dbl>
## 1 0.189
## 2 0.697
## 3 0.595
## 4 0.487
## 5 0.262
## 6 0.809
precipitate.ts = ts(precipitate$x, start = 2015,frequency =12)

Now that we already know our best model in terms of MASE for forecast is that of Holt Winters multiplicative model,so we will use that model lowestmasemodel1 to forecast.

# 'lowestmasemodel1' is our Holt-Winters’ multiplicative method which we have calculated before.We are using that model inside forecast function along with predictor dataset as x value that is predictor.

fore_preci <-forecast(lowestmasemodel1$model, x = as.vector(precipitate.ts),h=24)
fore_preci
##          Point Forecast        Lo 80     Hi 80       Lo 95     Hi 95
## Jan 2015       5.610335   2.75534413  8.465326   1.2440033  9.976666
## Feb 2015       6.512806   2.03809262 10.987520  -0.3306776 13.356290
## Mar 2015       8.786120   1.33667747 16.235562  -2.6068190 20.179058
## Apr 2015      10.192785  -0.02672876 20.412298  -5.4366123 25.822181
## May 2015      12.512597  -1.96157245 26.986766  -9.6237347 34.648928
## Jun 2015      14.061609  -4.41048190 32.533701 -14.1890164 42.312235
## Jul 2015      14.502088  -6.89909775 35.903274 -18.2282011 47.232377
## Aug 2015      12.945620  -8.35024040 34.241481 -19.6235881 45.514829
## Sep 2015      10.352210  -8.52365174 29.228072 -18.5159294 39.220350
## Oct 2015       7.418279  -7.51119146 22.347750 -15.4143760 30.250935
## Nov 2015       4.989730  -6.05900058 16.038460 -11.9078451 21.887305
## Dec 2015       3.871776  -5.53876625 13.282319 -10.5204066 18.263960
## Jan 2016       4.199400  -7.01753622 15.416336 -12.9554236 21.354224
## Feb 2016       4.838892  -9.30305408 18.980837 -16.7893479 26.467131
## Mar 2016       6.477174 -14.21845674 27.172805 -25.1740619 38.128410
## Apr 2016       7.452633 -18.56883703 33.474103 -32.3437711 47.249037
## May 2016       9.069749 -25.52978297 43.669280 -43.8456686 61.985166
## Jun 2016      10.099485 -31.99838423 52.197355 -54.2836502 74.482621
## Jul 2016      10.315199 -36.68017471 57.310572 -61.5580226 82.188420
## Aug 2016       9.113765 -36.29212001 54.519651 -60.3285438 78.556075
## Sep 2016       7.208700 -32.09306283 46.510463 -52.8981594 67.315559
## Oct 2016       5.105869 -25.38360449 35.595343 -41.5237568 51.735495
## Nov 2016       3.391945 -18.81647619 25.600367 -30.5729044 37.356795
## Dec 2016       2.597256 -16.07148862 21.266001 -25.9541251 31.148637
plot(ts(c(as.vector(solar), as.vector(fore_preci$forecasts)),start =1960,frequency =12),col ="red",xlim = c(1960,2017.5),type = 'l',ylab ="solar prediction",main ="Solar radiation 2 years ahead forecasts using precipitation measurements")

lines(solar,type = 'l')

Observation:

Forecast from predictor series shows downward trend for next two years which suggest solar radiation will decrease for given values of precipitation.

Task2 : Analyse the correlation between quarterly Residential Property Price Index (PPI) in Melbourne and quarterly population change over the previous quarter in Victoria between September 2003 and December 2016. Aim is to demonstrate whether the correlation between these two series is spurious or not.

data2 <- read_csv("data2.csv")
## Rows: 54 Columns: 3
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (1): Quarter
## dbl (2): price, change
## 
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(data2)
## # A tibble: 6 x 3
##   Quarter  price change
##   <chr>    <dbl>  <dbl>
## 1 Sep-2003  60.7  14017
## 2 Dec-2003  62.1  12350
## 3 Mar-2004  60.8  17894
## 4 Jun-2004  60.9   9079
## 5 Sep-2004  60.9  16210
## 6 Dec-2004  62.4  13788

Convert into time series

PPI <- ts(data2[,2:3], start = c(2003,3), frequency =4)

price <- ts(data2$price, start = c(2003,3), frequency =4)

change <- ts(data2$change, start = c(2003,3), frequency =4)
plot(PPI, main ="Time series plots of residential PPI and population change data")

plot(price,ylab = "Property Price ", type="o" ,main ="Time series plots of residential PPI price data")

plot(change,  type="o", main ="Time series plots of residential PPI population change data")

Observations:

Trend : Both price and change plot shows upward trend.

Seasonality : For change data , we can find repeating or seasonal patterns. For price data no seasonality found.

Intervention: For change, there might be intervention point around 2010.For price data,no intervention was found.

Change in variance: No change of variance found.

Analysis of non stationarity

acf(price, main = "Sample ACF of Residential PPI price")

pacf(price, main = "Sample PACF of Residential PPI price")

acf(change, main = "Sample ACF of population change")

pacf(change, main = "Sample PACF of population change")

Observations:

  • ACF plot for price suggest a decaying pattern.

  • PACF plot for price shows spike at first lag only .After that its starts decaying.

  • ACF plot for population change shows decaying pattern.

  • PACF plots for population change shows spikes mainly at first and second lag which are significant and indicates it is likely to have auto regressive of order2 (AR2).

# perform ADF test

adf.test(price)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  price
## Dickey-Fuller = -1.3264, Lag order = 3, p-value = 0.8458
## alternative hypothesis: stationary
adf.test(change)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  change
## Dickey-Fuller = -1.603, Lag order = 3, p-value = 0.7344
## alternative hypothesis: stationary

Observations:

  • ADF test for price data shows p-value 0.8458>0.05 .Hence we fail to reject null hypothesis of series being non -stationary.

  • ADF test for change data shows p-value 0.7344>0.05 .Hence we fail to reject null hypothesis of series being non -stationary.

Overall Residential PPI series is non-stationary.

Correlation Analysis

Cross correlation function(CCF) check

correlation function (CCF) helps to identify lags of the x-variable that might be useful predictors of y.Function ccf(x,y) estimates the correlation between x[t+k] and y[t].

ccf(as.vector(price), as.vector(change), ylab ="CCF", main ="Sample CCF of PPI price and population change")

Observations:

CCF plots shows cross correlations between Residential Property Price Index in Melbourne and population change in Victoria.The number of lags are significantly differs from zero.Hence there might be spurious correlation found between the series. So we will apply pre-whitening to separate the relationship between the series from their auto-correlation.we will perform both regular and seasonal difference and then apply pre-whitening.

Prewhitening

Prewhitening is applied to separate the relationship between the series from their autocorrelation. In this process ,we will apply both regular and seasonal difference and then apply pre-whitening as we need to ensure that the series must be stationary and earlier we found that both variables price and change are non stationary.

ppi_difference <- ts.intersect(diff(diff(price,4)), diff(diff(change,4)))

prewhiten(as.vector(ppi_difference[,1]), as.vector(ppi_difference[,2]), ylab='CCF', main ="Prewhitened Sample CFF ")

Observation:

After performing pre-whitening ,we found that no significant correlation exists between a Residential Property Price Index and population change in Victoria.Hence strong cross-correlation pattern that was found between Residential Property Price Index and population change was spurious.

Conclusion

  • From Task1 ,we have used different modelling techniques like regression methods ,exponential smoothing methods,state space models and tried to forecast solar radiation amount in globe for next 2 years. The forecast shows the amount of horizontal solar radiation will decrease over next 2 years. The best model found for predicting solar radiation is Holt Winters multiplicative model which is an exponential smoothing method and also have lowest MASE value .Although other models like Holt Winters multiplicative exponential model also captured good seasonality and auto regression.

  • For Task 2 ,initially we found the series is non-stationary. We also found that cross-correlation pattern between Residential Property Price Index and population change was spurious after applying cross-correlation and pre-whitening methods.Initially CFF plot suggested the existence of cross-correlation between series.But after applying pre whitening methods,no significant correlation found between a Residential Property Price Index and population change in Victoria.Hence cross-correlation pattern was found to be spurious between series.

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.