
library(GGally) # for ggpairs()
library(TSA) # season(), prewhiten() and other functions
library(tseries) # adf.test()
library(forecast) # BoxCox.lambda()
library(dLagM) # For DLM modelling
library(car) # for vif()
library(tis) # for Lag()
library(dynlm) # for Dynamic linear modeling
library(stats) # for classical decomposition
library(x12) # X-12-ARIMA decomposition
library(lmtest) # for bgtest()

General note:

A significance level \(\alpha=5\%\) is used.

TASK 1: Forecasting Solar Radiation

Data Description

The dataset holds 2 columns and 660 observations: monthly averages of horizontal solar radiation and precipitation measured at the same points between January 1960 and December 2014.

Objective

Our aim is to produce the best 2-years-ahead forecasts of monthly solar radiation by finding the most accurate and suitable regression model, judged by MASE. A descriptive analysis will be conducted first. A model-building strategy will then be applied to find the best-fitting model among time series regression methods (dLagM package), dynamic linear models (dynlm package), and exponential smoothing and the corresponding state-space models.

Model Selection Criteria (MASE)

Among the various error measures for assessing forecast accuracy, the mean absolute scaled error (MASE) is a generally applicable one: it scales the forecast errors by the in-sample MAE of the naive forecast method. Because it is scale-free, it is applicable in virtually all circumstances and can be used to compare forecast accuracy across series.
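For a non-seasonal naive benchmark, the measure can be written as (a standard formulation; \(e_t\) denotes the forecast error at time \(t\) and \(n\) the in-sample length),

\(MASE = \dfrac{\frac{1}{n}\sum_{t=1}^{n}|e_t|}{\frac{1}{n-1}\sum_{t=2}^{n}|Y_t - Y_{t-1}|}\)

A MASE below 1 means the model forecasts better, on average, than the in-sample one-step naive forecast.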

Read Data

solar_precipitation <- read.csv("C:/Users/admin/Downloads/data1.csv")
head(solar_precipitation)
##       solar   ppt
## 1  5.051729 1.333
## 2  6.415832 0.921
## 3 10.847920 0.947
## 4 16.930264 0.615
## 5 24.030797 0.544
## 6 26.298202 0.703

Identification of the response and the regressor variables

For fitting a regression model, the response is the monthly solar radiation (\(solar\)) and the regressor variable is the monthly precipitation (\(precipitation\)).

  • y = monthly solar radiation = \(solar\)
  • x = monthly precipitation = \(precipitation\)

Both variables are continuous variables.

Read Regressor and Response variables

Let's first convert the regressor and response into ts objects,

solar = ts(solar_precipitation[,1], start=c(1960,1),frequency = 12)
precipitation = ts(solar_precipitation[,2], start=c(1960,1),frequency = 12)
data.ts = ts(solar_precipitation, start = c(1960,1), frequency = 12) # Y and x in single dataframe

Relationship between Regressor and Response variables

Let's center, scale, and plot the two variables together,

data.scale = scale(data.ts)
plot(data.scale, plot.type="s", col=c("blue", "green"), main = "Monthly solar radiation and precipitation (Blue - Response and Green - regressor)")

It is hard to read the correlation between the regressor and the response from this plot alone, but the two variables do appear to move together. Let's quantify the correlation using ggpairs(),

ggpairs(data = solar_precipitation, columns = c(1,2), progress = FALSE) # library(GGally)
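As a quick numeric cross-check of the correlation panel above, the same coefficient can be computed directly (a one-line sketch using base R's cor()),

cor(solar_precipitation$solar, solar_precipitation$ppt)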

As shown in the matrix above, the response (solar radiation) and the regressor (monthly precipitation) have a correlation coefficient of -0.454. We can build a regression model based on this relationship. First, let's look at the descriptive statistics.

Descriptive Analysis

Since we are building a regression model that estimates the response, solar radiation, let's focus on its statistics.

Summary statistics

summary(solar)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.582   9.739  16.460  17.788  23.711  54.056

The mean and median of the solar radiation series are fairly close, suggesting a roughly symmetric distribution.

Time Series plot:

The time series plot for our data is generated using the following code chunk,

plot(solar,ylab='Monthly solar radiation reaching ground',xlab='Year',
     main = "Time series plot of the monthly solar radiation reaching ground")
points(y=solar,x=time(solar), pch=as.vector(season(solar)), col=rainbow(12))

Plot Inference:

From the plot, we can comment on the time series’s,

  • Trend: The overall level of the series appears flat, suggesting stationarity in the mean.

  • Seasonality: From the plot, seasonal behavior is quite evident every year. This needs to be confirmed using statistical tests.

  • Change in Variance: A change in variance can be seen as the variation is much higher during the earlier years than the latter.

  • Behavior: We notice mixed AR and MA behavior. AR behavior is dominant, as successive data points tend to follow one another; MA behavior is evident from the up-and-down fluctuations between points.

  • Intervention/Change points: No particular intervention point is noticed; this needs to be confirmed using statistical tests. Note that the peak values seen in the plot (July 1998, for example) are not intervention points, as the mean solar radiation does not change before and after them (even over a long period). January 2013 might be an intervention point, as the variance of the series decreases drastically from then on. We will check this by fitting a dynamic linear model and testing whether a pulse at January 2013 is significant.

ACF and PACF plots:

par(mfrow=c(2,1))
acf(solar, main="ACF of solar radiation")
pacf(solar, main ="PACF of solar radiation")

par(mfrow=c(1,1))
  • ACF plot: Many autocorrelations are significant and follow a wave pattern, indicating seasonal behavior, consistent with the seasonality observed in the time series plot.

  • PACF plot: There is one dominant spike at the first lag, which can indicate nonstationarity; the second partial autocorrelation is significant as well.

Check normality

Many model estimation procedures assume normality of the residuals. If this assumption doesn't hold, the coefficient estimates are not optimal. Let's look at the Quantile-Quantile (QQ) plot to observe normality visually, and use the Shapiro-Wilk test to confirm the result statistically.

qqnorm(solar, main = "Normal Q-Q Plot of Solar Radiation Time Series")
qqline(solar, col = 2)

We see deviations from normality: both tails are off the line, and much of the data in the middle deviates as well. Let's check statistically using the Shapiro-Wilk test. The hypotheses of this test are,

\(H_0\) : Time series is Normally distributed
\(H_a\) : Time series is not normal

shapiro.test(solar)
## 
##  Shapiro-Wilk normality test
## 
## data:  solar
## W = 0.93637, p-value = 3.641e-16

From the Shapiro-Wilk test, since the p-value is below the 0.05 significance level, we reject the null hypothesis of normality. Thus, the solar radiation series is not normally distributed.

Test Stationarity

The PACF plot of the solar radiation series from the descriptive analysis stage hints at nonstationarity in our time series. Let's use the ADF and PP tests,

Using ADF (Augmented Dickey-Fuller) test:

Let's test the stationarity of the series using the Augmented Dickey-Fuller (ADF) test. The hypotheses are,

\(H_0\) : Time series is Difference non-stationary
\(H_a\) : Time series is Stationary

adf.test(solar) #library(tseries)
## Warning in adf.test(solar): p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  solar
## Dickey-Fuller = -4.7661, Lag order = 8, p-value = 0.01
## alternative hypothesis: stationary

Since the p-value < 0.05, we reject the null hypothesis of nonstationarity and conclude that the series is stationary at the 5% level of significance.

Using PP (Phillips-Perron) test:

The null and alternative hypotheses are the same as for the ADF test.

PP.test(solar)
## 
##  Phillips-Perron Unit Root Test
## 
## data:  solar
## Dickey-Fuller = -8.4247, Truncation lag parameter = 6, p-value = 0.01

According to the PP test, the solar radiation series is stationary at the 5% level.

Conclusion from descriptive analysis:

  • From the time series plot, the ACF plot, and the ADF and PP tests, we found that our solar radiation response is stationary. Differencing is not required.
  • The series is not normally distributed. Thus, a Box-Cox transformation is worth trying.

Let's proceed with the Box-Cox transformation.

Transformations

Box-Cox transformation to improve normality

To improve normality in our solar radiation time series, let's apply a Box-Cox transformation to the series,

lambda = BoxCox.lambda(solar, method = "loglik") # library(forecast)
BC.solar = BoxCox(solar, lambda = lambda)

Check Normality of BC transformed Solar Radiation series

Visually comparing the time series plots before and after the Box-Cox transformation,

par(mfrow=c(2,1))
plot(BC.solar,ylab='Monthly Solar Radiation',xlab='Time',
     type='o', main="Box-Cox Transformed Solar Radiation Time Series")
points(y=BC.solar,x=time(BC.solar))
plot(solar,ylab='Monthly Solar Radiation',xlab='Time',
     type='o', main="Original Solar Radiation Time Series")
points(y=solar,x=time(solar))

par(mfrow=c(1,1))

From the plot, almost no improvement in the variance of the time series is visible after the Box-Cox transformation. Let's check for normality using the Shapiro-Wilk test,

shapiro.test(BC.solar)
## 
##  Shapiro-Wilk normality test
## 
## data:  BC.solar
## W = 0.98834, p-value = 4.126e-05

From the Shapiro-Wilk test, since the p-value is below the 0.05 significance level, we reject the null hypothesis of normality. Thus, the Box-Cox transformed solar radiation series is still not normally distributed.

Conclusion after BC transformation

The Box-Cox transformed solar radiation series is stationary but still non-normal; the transformation was not effective.

Decomposition

To observe the individual effects of the underlying components and of effects that occurred in the past, let's decompose the solar radiation time series. The series can be decomposed into seasonal, trend, and remainder components.

Three main decomposition methods for time series will be considered: classical decomposition, X-12-ARIMA decomposition, and Seasonal-Trend decomposition using Loess (STL).

Classical decomposition

To implement the classical decomposition, we use decompose() function from the stats package.

fit.classical.add <- decompose(solar, type="additive")
plot(fit.classical.add)

fit.classical.mult <- decompose(solar, type="multiplicative")
plot(fit.classical.mult)

Notice that a trend component was extracted from the series that wasn't identified during the descriptive analysis stage. The additive and multiplicative methods give very similar estimates. The remainder series suggests the additive model handles the decomposition slightly better (its mean is much closer to zero). This hints that, for our solar radiation series, the final model should have additive rather than multiplicative components.

X-12-ARIMA decomposition

Although using the classical decomposition, a trend component was identified, we do not know how significant this trend effect is. X-12-ARIMA decomposition can help us detect the significance of the trend component. We use the function x12() to implement the X-12-ARIMA decomposition.

fit.x12 = x12(solar)
plot(fit.x12 , sa=TRUE , trend=TRUE)

From the plot, the seasonally adjusted series does not follow the original series closely. This implies the original series is strongly affected by seasonality, i.e., seasonality is highly significant in the solar radiation series. In contrast, the trend series stays relatively flat and does not track the ups and downs of the original series, implying the trend component is not very significant.

STL decomposition

Another decomposition method, STL, has the advantage of flexibility over classical decomposition. The stl() function is used with the t.window and s.window arguments to control the rate of change of the trend and seasonal components. Let's set t.window to 15 and look at the STL-decomposed plots,

We can adjust the series for seasonality by subtracting the seasonal component from the original series using the following code chunk,

# Code gist - Apply STL decomposition to get the seasonally adjusted and trend adjusted series and visually compare them with the original time series

stl.solar <- stl(window(solar,start=c(1960,1)), t.window=15, s.window="periodic", robust=TRUE)

par(mfrow=c(3,1))

plot(seasadj(stl.solar), ylab='Solar Radiation',xlab='Time', main = "Seasonally adjusted Solar Radiation")
points(y=seasadj(stl.solar),x=time(seasadj(stl.solar)), pch=as.vector(season(seasadj(stl.solar))))

plot(solar,ylab='Solar Radiation',xlab='Time',
     type='o', main="Original Solar Radiation Time Series")
points(y=solar,x=time(solar), pch=as.vector(season(solar)))

stl.solar.trend = stl.solar$time.series[,"trend"] # Extract the trend component from the output
stl.solar.trend.adjusted = solar - stl.solar.trend

plot(stl.solar.trend.adjusted, ylab='Solar Radiation',xlab='Time', main = "Trend adjusted Solar Radiation")
points(y=stl.solar.trend.adjusted,x=time(stl.solar.trend.adjusted), pch=as.vector(season(stl.solar.trend.adjusted)))

par(mfrow=c(1,1))
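As a transparency check on the seasonal adjustment (a small sketch; for an additive STL fit, seasadj() should equal the original series minus the extracted seasonal column),

all.equal(as.numeric(seasadj(stl.solar)),
          as.numeric(solar - stl.solar$time.series[,"seasonal"]))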

The seasonally adjusted series is completely different from the original series, implying the seasonal component is the most influential component of the original series. Virtually no change is seen in the trend adjusted series compared to the original series. Thus, the trend component is insignificant.

Conclusion of Decomposition

Trend component is insignificant when comparing trend adjusted series with the original series. The seasonal component of the series is the most influential component of the original series.

Modeling

Time series regression methods namely,

  • A. Distributed lag models (dLagM package),
  • B. Dynamic linear models (dynlm package)
  • C. Exponential smoothing,
  • D. and corresponding state-space models will be considered.

A. Distributed lag models

Based on whether the lags are known (Finite DLM) or undetermined (Infinite DLM), 4 major modelling methods will be tested, namely,

  • Basic Finite Distributed lag model,
  • Polynomial DLM,
  • Koyck transformed geometric DLM,
  • and Autoregressive DLM.

Fit Finite DLM

The response of a finite DLM model with 1 regressor is represented as shown below,

\(Y_t = \alpha + \sum_{s=0}^{q} \beta_s X_{t-s} + \epsilon_t\)

where,

  • \(\alpha\) is the intercept
  • \(\beta_s\) is the coefficient of the regressor lagged by \(s\) periods, \(X_{t-s}\)
  • and \(\epsilon_t\) is the error term

Now, let's use the AIC and BIC scores to find the best lag length for the finite DLM,

finiteDLMauto(formula = solar ~ ppt, data = solar_precipitation, q.min = 1, q.max = 12,
              model.type = "dlm", error.type = "AIC", trace = TRUE)
##    q - k    MASE      AIC      BIC   GMRAE     MBRAE R.Adj.Sq Ljung-Box
## 12    12 1.55160 4578.787 4645.895 1.32250   0.35491  0.30769         0
## 11    11 1.56213 4590.961 4653.617 1.32784   0.42614  0.30199         0
## 10    10 1.57800 4602.658 4660.858 1.36315   0.09680  0.29615         0
## 9      9 1.59312 4615.084 4668.827 1.36372   2.68197  0.28882         0
## 8      8 1.60481 4625.986 4675.267 1.39224   1.79485  0.28251         0
## 7      7 1.60704 4632.716 4677.532 1.39082 -60.50677  0.28096         0
## 6      6 1.60753 4637.489 4677.837 1.39182   0.52400  0.28214         0
## 5      5 1.61385 4644.622 4680.499 1.39327   0.55267  0.28072         0
## 4      4 1.64636 4663.600 4695.003 1.46450  -1.04420  0.26577         0
## 3      3 1.66270 4688.551 4715.478 1.44133  -1.39824  0.24326         0
## 2      2 1.67597 4712.649 4735.095 1.42249   0.14593  0.22173         0
## 1      1 1.68846 4728.713 4746.676 1.43103  -1.07642  0.21036         0

q = 12 has the smallest AIC and BIC scores, so we fit the model with q = 12,

DLM.model = dlm(formula = solar ~ ppt, data = solar_precipitation, q = 12)

Note - we use solar and not BC.solar (the Box-Cox transformed series), since normality is violated in both.

Diagnostic check for DLM.model (Residual analysis):

We can apply a diagnostic check using checkresiduals() function from the forecast package.

checkresiduals(DLM.model$model$residuals) # forecast package

## 
##  Ljung-Box test
## 
## data:  Residuals
## Q* = 1045.6, df = 10, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 10

In this output,

  • from the time series plot and histogram of the residuals, there is an obvious non-random pattern, and some very large residuals violate the general assumptions.
  • the Ljung-Box test output is displayed. According to this test, the null hypothesis that the residual series exhibits no autocorrelation up to lag 10 is rejected. From this test and the ACF plot, we conclude that the serial correlation left in the residuals is highly significant.

Model Summary for Finite DLM model (DLM.model) :

summary(DLM.model)
## 
## Call:
## lm(formula = as.formula(model.formula), data = design)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -18.563  -5.239  -0.796   4.137  32.430 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  19.5164     1.1151  17.501  < 2e-16 ***
## ppt.t        -5.8876     1.9508  -3.018  0.00265 ** 
## ppt.1         0.9993     2.5647   0.390  0.69694    
## ppt.2         0.4343     2.5571   0.170  0.86520    
## ppt.3         1.8763     2.5580   0.734  0.46352    
## ppt.4         1.7459     2.5587   0.682  0.49529    
## ppt.5         3.3279     2.5601   1.300  0.19410    
## ppt.6         0.7751     2.5617   0.303  0.76230    
## ppt.7         1.7937     2.5615   0.700  0.48402    
## ppt.8         0.2827     2.5593   0.110  0.91207    
## ppt.9        -1.1022     2.5615  -0.430  0.66712    
## ppt.10       -1.9333     2.5508  -0.758  0.44880    
## ppt.11       -0.5613     2.5532  -0.220  0.82605    
## ppt.12       -5.3492     1.9216  -2.784  0.00553 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.181 on 634 degrees of freedom
## Multiple R-squared:  0.3216, Adjusted R-squared:  0.3077 
## F-statistic: 23.12 on 13 and 634 DF,  p-value: < 2.2e-16
## 
## AIC and BIC values for the model:
##        AIC      BIC
## 1 4578.787 4645.895
  • Finite DLM model is significant
  • R-squared is 32.16%, Adjusted R-squared is 30.77%

Let's consider the effect of collinearity on these results. To inspect this issue, we display the variance inflation factors (VIFs); if a VIF is greater than 10, we conclude that the effect of multicollinearity is high.

vif(DLM.model$model) # variance inflation factors #library(car)
##    ppt.t    ppt.1    ppt.2    ppt.3    ppt.4    ppt.5    ppt.6    ppt.7 
## 4.432762 7.774629 7.820758 7.914873 7.941510 7.944820 7.943359 7.929999 
##    ppt.8    ppt.9   ppt.10   ppt.11   ppt.12 
## 7.916836 7.921508 7.867385 7.889225 4.508273
  • Multicollinearity is insignificant.
MASE(DLM.model)
##             MASE
## DLM.model 1.5516
Conclusion of Finite DLM model
  • Model is significant.
  • MASE is 1.5516
  • Low R-squared value of 32.16% suggests bad fit. Adjusted R-squared is 30.77%.
  • violations in the test of assumptions
  • Serial autocorrelation is significant
  • Multicollinearity is insignificant.

ATTENTION - For simplicity, from here on we summarise each model rather than going into full detail.

Fit Polynomial DLM model

The polynomial DLM helps remove the effect of multicollinearity; although multicollinearity is insignificant in our data, let's fit a polynomial DLM of order 2 anyway. The restriction it places on the lag weights is sketched below.
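In the order-2 polynomial DLM, the lag weights are restricted to lie on a quadratic in the lag index \(s\) (a standard Almon-type restriction; the \(\gamma\) symbols below are the three free parameters, which appear as z.t0, z.t1 and z.t2 in the summary output),

\(\beta_s = \gamma_0 + \gamma_1 s + \gamma_2 s^2, \quad s = 0, 1, \dots, q\)

so the 13 lag coefficients of the q = 12 model collapse to 3 estimated parameters.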

PolyDLM = polyDlm(x = as.vector(precipitation), y = as.vector(solar), q = 12, k = 2, show.beta = FALSE)
summary(PolyDLM)
## 
## Call:
## "Y ~ (Intercept) + X.t"
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -18.689  -5.452  -0.686   4.129  32.797 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 19.22679    1.10784  17.355  < 2e-16 ***
## z.t0        -1.81945    0.39287  -4.631  4.4e-06 ***
## z.t1         1.57478    0.13106  12.015  < 2e-16 ***
## z.t2        -0.15708    0.01019 -15.409  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.175 on 644 degrees of freedom
## Multiple R-squared:  0.3119, Adjusted R-squared:  0.3087 
## F-statistic: 97.31 on 3 and 644 DF,  p-value: < 2.2e-16
checkresiduals(PolyDLM$model$residuals)

## 
##  Ljung-Box test
## 
## data:  Residuals
## Q* = 1054.6, df = 10, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 10
MASE(PolyDLM)
##             MASE
## PolyDLM 1.563035
Conclusion of Polynomial DLM model
  • model is significant
  • MASE is higher at 1.563035 (vs 1.5516)
  • Adjusted R-squared improved to 30.87% (vs 30.77)
  • violations in the test of assumptions
  • Serial autocorrelation is significant
  • Multicollinearity is insignificant.

Fit Koyck geometric DLM model

Here the lag weights are positive and decline geometrically. This model is called the infinite geometric DLM, meaning there are infinitely many lag weights. The Koyck transformation makes the model estimable by subtracting \(\phi\) times the first lag of the geometric DLM from the model itself. The Koyck transformed model is represented as,

\(Y_t = \delta_1 + \delta_2Y_{t-1} + \delta_3X_t + \nu_t\)

where \(\delta_1 = \alpha(1-\phi), \delta_2 = \phi, \delta_3 = \beta\) and the random error after the transformation is \(\nu_t = (\epsilon_t -\phi\epsilon_{t-1})\).
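For reference, the infinite geometric DLM before the transformation can be written as (a standard form, with \(|\phi| < 1\) so that the lag weights \(\beta\phi^s\) decay geometrically),

\(Y_t = \alpha + \beta\sum_{s=0}^{\infty}\phi^s X_{t-s} + \epsilon_t\)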

The koyckDlm() function implements a two-stage least squares method: it first estimates \(\hat{Y}_{t-1}\) and then estimates \(Y_t\) through simple linear regression. Let's fit the Koyck geometric DLM using the precipitation regressor,

Koyck = koyckDlm(x = as.vector(precipitation) , y = as.vector(solar) )
summary(Koyck$model, diagnostics = TRUE)
## 
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.0926  -3.5961   0.3176   3.6103  14.8399 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2.23925    0.76549  -2.925  0.00356 ** 
## Y.1          0.98546    0.02424  40.650  < 2e-16 ***
## X.t          5.34684    0.84383   6.336 4.37e-10 ***
## 
## Diagnostic tests:
##                  df1 df2 statistic p-value    
## Weak instruments   1 656     710.7  <2e-16 ***
## Wu-Hausman         1 655     146.8  <2e-16 ***
## Sargan             0  NA        NA      NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.814 on 656 degrees of freedom
## Multiple R-Squared: 0.7598,  Adjusted R-squared: 0.7591 
## Wald test:  1104 on 2 and 656 DF,  p-value: < 2.2e-16
checkresiduals(Koyck$model$residuals)

## 
##  Ljung-Box test
## 
## data:  Residuals
## Q* = 1413.2, df = 10, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 10
MASE(Koyck)
##           MASE
## Koyck 1.032483
Conclusion of Koyck DLM model
  • model is significant
  • MASE is reduced to 1.032483 (vs 1.5516)
  • Adjusted R-squared improved to 75.91% (vs 30.87)
  • violations in the test of assumptions
  • Serial autocorrelation is significant
  • Multicollinearity is insignificant.

Fit Autoregressive Distributed Lag Model

The autoregressive distributed lag model is a flexible and parsimonious infinite DLM. The ARDL(1,1) model, for example, is represented as,

\(Y_t = \mu + \beta_0 X_t + \beta_1 X_{t-1} + \gamma_1 Y_{t-1} + e_t\)

Similar to the Koyck DLM, it is possible to write this model as an infinite DLM with a lag distribution of any shape, rather than only a polynomial or geometric one. The model is denoted ARDL(p,q); the general form is sketched below. To fit the model, the ardlDlm() function is used. Let's find the best lag lengths using the AIC and BIC scores through an iteration, setting the maximum lag length to 12.
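The general ARDL(p,q) form consistent with the ardlDlm() arguments used below (p lags of the regressor, q lags of the response) is,

\(Y_t = \mu + \sum_{i=0}^{p}\beta_i X_{t-i} + \sum_{j=1}^{q}\gamma_j Y_{t-j} + e_t\)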

## Code gist to find the best ARDL(p,q) model as per AIC and BIC scores.
# First create an empty df. Iterate over 144 ARDL fits (since the max lag for the response and predictor is 12, i.e., p = q = 12 at most).
# Save the model's AIC and BIC scores through iteration and display the model with best AIC and BIC scores.

df = data.frame(matrix(
                vector(), 0, 4, dimnames=list(c(), c("p","q","AIC","BIC"))),
                stringsAsFactors=F) # create empty dataframe
for(i in 1:12){
  for(j in 1:12){
    model4.1 = ardlDlm(formula = solar ~ ppt, data = solar_precipitation, p = i, q = j)
    new <- data.frame(i, j, AIC(model4.1$model), BIC(model4.1$model))
    df[nrow(df) + 1, ] <- new
  }
} # Iterate and save in df
head(df[order( df[,3] ),],1) # Best model as per AIC
##     p  q      AIC     BIC
## 108 9 12 2913.567 3020.94
head(df[order( df[,4] ),],1) # Best model as per BIC
##    p  q      AIC      BIC
## 12 1 12 2942.604 3014.186

ARDL(9,12) and ARDL(1,12) are the best models as per the AIC and BIC scores, respectively. Now, let's fit these two models,

1. ARDL(9,12) model (BEST DLM MODEL)
ARDL.9x12 = ardlDlm(formula = solar ~ ppt, data = solar_precipitation, p = 9, q = 12)
summary(ARDL.9x12)
## 
## Time series regression with "ts" data:
## Start = 13, End = 660
## 
## Call:
## dynlm(formula = as.formula(model.text), data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.9956  -1.0875  -0.1383   1.0119  17.8921 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.76694    0.50158   5.516 5.07e-08 ***
## ppt.t       -0.43557    0.52227  -0.834 0.404599    
## ppt.1        0.56576    0.70480   0.803 0.422437    
## ppt.2        0.49265    0.71620   0.688 0.491791    
## ppt.3        1.21039    0.71453   1.694 0.090768 .  
## ppt.4       -1.20551    0.71517  -1.686 0.092364 .  
## ppt.5        0.29587    0.71483   0.414 0.679089    
## ppt.6       -2.58048    0.70972  -3.636 0.000300 ***
## ppt.7        2.23825    0.71113   3.147 0.001725 ** 
## ppt.8        0.85913    0.71033   1.209 0.226934    
## ppt.9       -1.65311    0.52370  -3.157 0.001673 ** 
## solar.1      1.09400    0.03878  28.207  < 2e-16 ***
## solar.2      0.13127    0.05822   2.255 0.024491 *  
## solar.3     -0.19613    0.05777  -3.395 0.000729 ***
## solar.4     -0.16119    0.05747  -2.805 0.005192 ** 
## solar.5     -0.17548    0.05770  -3.041 0.002454 ** 
## solar.6      0.10762    0.05827   1.847 0.065240 .  
## solar.7      0.07751    0.05834   1.329 0.184462    
## solar.8     -0.09652    0.05791  -1.667 0.096056 .  
## solar.9      0.12831    0.05790   2.216 0.027041 *  
## solar.10     0.05449    0.05767   0.945 0.345124    
## solar.11     0.10086    0.05761   1.751 0.080505 .  
## solar.12    -0.21405    0.03806  -5.624 2.81e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.248 on 625 degrees of freedom
## Multiple R-squared:  0.9495, Adjusted R-squared:  0.9477 
## F-statistic:   534 on 22 and 625 DF,  p-value: < 2.2e-16
checkresiduals(ARDL.9x12$model)

## 
##  Breusch-Godfrey test for serial correlation of order up to 26
## 
## data:  Residuals
## LM test = 59.82, df = 26, p-value = 0.0001773
MASE(ARDL.9x12)
##               MASE
## ARDL.9x12 0.387012
Conclusion of ARDL(9x12) DLM model
  • model is significant
  • MASE is reduced to 0.387012 (vs 1.032483)
  • Adjusted R-squared improved to 94.77% (vs 75.91)
  • violations in the test of assumptions
  • Serial autocorrelations are significant
  • Multicollinearity is insignificant.
2. ARDL(1,12) model
ARDL.1x12 = ardlDlm(formula = solar ~ ppt, data = solar_precipitation, p = 1, q = 12)
summary(ARDL.1x12)
## 
## Time series regression with "ts" data:
## Start = 13, End = 660
## 
## Call:
## dynlm(formula = as.formula(model.text), data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -12.6092  -1.1513  -0.0874   0.9094  18.5739 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.34387    0.43301   5.413 8.80e-08 ***
## ppt.t       -1.03237    0.47174  -2.188 0.029004 *  
## ppt.1        1.72091    0.46747   3.681 0.000252 ***
## solar.1      1.10860    0.03862  28.706  < 2e-16 ***
## solar.2      0.09279    0.05845   1.588 0.112883    
## solar.3     -0.20631    0.05835  -3.536 0.000436 ***
## solar.4     -0.14147    0.05863  -2.413 0.016097 *  
## solar.5     -0.14011    0.05875  -2.385 0.017383 *  
## solar.6      0.10081    0.05904   1.707 0.088233 .  
## solar.7      0.06802    0.05919   1.149 0.250930    
## solar.8     -0.12527    0.05866  -2.135 0.033108 *  
## solar.9      0.13133    0.05861   2.241 0.025402 *  
## solar.10     0.07952    0.05834   1.363 0.173358    
## solar.11     0.10995    0.05820   1.889 0.059320 .  
## solar.12    -0.22704    0.03829  -5.929 5.01e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.313 on 633 degrees of freedom
## Multiple R-squared:  0.9459, Adjusted R-squared:  0.9447 
## F-statistic: 789.8 on 14 and 633 DF,  p-value: < 2.2e-16
checkresiduals(ARDL.1x12$model)

## 
##  Breusch-Godfrey test for serial correlation of order up to 18
## 
## data:  Residuals
## LM test = 76.21, df = 18, p-value = 3.911e-09
MASE(ARDL.1x12)
##                MASE
## ARDL.1x12 0.3927514
Conclusion of ARDL(1x12) DLM model
  • model is significant
  • MASE worsens to 0.3927514 (vs 0.387012)
  • Adjusted R-squared worsens to 94.47% (vs 94.77)
  • violations in the test of assumptions
  • Serial autocorrelations are significant
  • Multicollinearity is insignificant.
Compare ARDL(9,12) and ARDL(1,12)
  • ARDL(9,12) has better fit as per R-squared statistic (94.77% vs 94.47%)
  • Better MASE 0.387012 (vs 0.3927514)
Conclusion of ARDL models

ARDL(9,12) is the best of all the ARDL models, with the better MASE and adjusted R-squared statistics.

Most appropriate DLM model based on MASE (Model Selection)

The 4 DLM models are,

  • Finite DLM model: DLM.model
  • Polynomial DLM model: PolyDLM
  • Koyck transformed geometric DLM model: Koyck
  • Autoregressive DLM model: ARDL.9x12

The mean absolute scaled errors (MASE) of these models are,

MASE(DLM.model, PolyDLM, Koyck, ARDL.9x12)
##             n     MASE
## DLM.model 648 1.551600
## PolyDLM   648 1.563035
## Koyck     659 1.032483
## ARDL.9x12 648 0.387012

Conclusion of Distributed Lag models (DLM) modelling

The best DLM for the solar radiation response based on the precipitation regressor, i.e., the model giving the most accurate forecasts by the MASE measure, is the autoregressive DLM model ARDL.9x12, with a MASE of 0.387012.
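Since the stated objective is 2-years-ahead forecasts, the chosen model can be used to produce them. The following is a minimal sketch, not part of the original analysis: dLagM's forecast() requires future values of the regressor, and we assume here (purely for illustration) that future precipitation equals its historical calendar-month mean.

# Assumption: future monthly precipitation = historical mean for that calendar month
ppt.monthly.mean <- tapply(solar_precipitation$ppt,
                           rep(1:12, length.out = nrow(solar_precipitation)),
                           mean)
ppt.future <- rep(ppt.monthly.mean, 2) # series ends in December, so this path starts in January
ARDL.frc <- dLagM::forecast(model = ARDL.9x12, x = ppt.future, h = 24)
ARDL.frc$forecasts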

B. Dynamic linear models (dynlm package)

Dynamic linear models are a general class of time series regression models that can account for trends, seasonality, serial correlation between the response and regressor variables, and, most importantly, the effect of intervention points.

The response of a dynamic linear model with an intervention pulse can be written as,

\(Y_t = \omega_2Y_{t-1} + (\omega_0 + \omega_1)P_t - \omega_2\omega_0P_{t-1} + N_t\)

where,

  • \(Y_t\) is the response
  • \(\omega_2\) is the coefficient of the response lagged by one time unit
  • \(P_t\) is the pulse at the intervention point, with the coefficient \((\omega_0 + \omega_1)\) representing the instantaneous effect of the intervention
  • \(P_{t-1}\) is the lagged pulse, with coefficient \(\omega_2\omega_0\)
  • \(N_t\) is the component where there is no intervention, referred to as the natural or unperturbed process.

Let's revisit the time series plot of the response, solar radiation, to visualize possible intervention points,

plot(solar)

As mentioned at the descriptive analysis stage, there is no clear intervention point that we can identify visually. Note that the peak values seen in the plot (July 1998, for example) are not intervention points, as the mean solar radiation does not change before and after them (even over a long period). However, January 2013 might be an intervention point, as the variance of the series decreases drastically from that point. Assuming this intervention point, let's fit a dynamic linear model and see whether the pulse is significant or not.

Recall from the decomposition that our solar radiation series has a significant seasonal component and a not-so-significant trend component. This can be revisited using the ACF and PACF plots of the response, the solar radiation series, shown below,

acf(solar,main="ACF of Solar Radiation series")

pacf(solar,main="PACF of Solar Radiation series")

The repeating pattern in the ACF suggests seasonality, and the large significant first lag in the PACF suggests trend. We will consider both in the \(N_t\) component of our dynamic model.

Now, let's fit the dynamic linear model using dynlm() with the season() and trend() terms, encoding the candidate intervention point as a pulse at time index T, as shown below,

Y.t = solar
T = 463 # The time point when the intervention occurred 
P.t = 1*(seq(solar) == T)
P.t.1 = Lag(P.t,+1) #library(tis) 

Dyn.model = dynlm(Y.t ~ L(Y.t , k = 1 ) + P.t + trend(Y.t) + season(Y.t)) # library(dynlm)
summary(Dyn.model)
## 
## Time series regression with "ts" data:
## Start = 1960(2), End = 2014(12)
## 
## Call:
## dynlm(formula = Y.t ~ L(Y.t, k = 1) + P.t + trend(Y.t) + season(Y.t))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.7897 -0.9825 -0.1185  0.8224 16.8331 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     1.2414380  0.3640905   3.410 0.000691 ***
## L(Y.t, k = 1)   0.9291221  0.0143451  64.769  < 2e-16 ***
## P.t             3.8381630  2.3366191   1.643 0.100951    
## trend(Y.t)     -0.0009279  0.0056352  -0.165 0.869268    
## season(Y.t)Feb  1.9282450  0.4392191   4.390 1.32e-05 ***
## season(Y.t)Mar  4.8968263  0.4415365  11.090  < 2e-16 ***
## season(Y.t)Apr  3.8173305  0.4560321   8.371 3.57e-16 ***
## season(Y.t)May  5.1909476  0.4742097  10.947  < 2e-16 ***
## season(Y.t)Jun  3.4254915  0.5056646   6.774 2.82e-11 ***
## season(Y.t)Jul  1.6262658  0.5258237   3.093 0.002069 ** 
## season(Y.t)Aug -2.0905594  0.5350045  -3.908 0.000103 ***
## season(Y.t)Sep -4.6402404  0.5123224  -9.057  < 2e-16 ***
## season(Y.t)Oct -5.7692334  0.4779000 -12.072  < 2e-16 ***
## season(Y.t)Nov -5.0734497  0.4509007 -11.252  < 2e-16 ***
## season(Y.t)Dec -2.8250704  0.4403092  -6.416 2.71e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.292 on 644 degrees of freedom
## Multiple R-squared:  0.9465, Adjusted R-squared:  0.9454 
## F-statistic: 814.5 on 14 and 644 DF,  p-value: < 2.2e-16
checkresiduals(Dyn.model)

## 
##  Breusch-Godfrey test for serial correlation of order up to 24
## 
## data:  Residuals
## LM test = 131.99, df = 24, p-value < 2.2e-16
Conclusion of Dynamic Linear model

Most importantly, the pulse (P.t) is insignificant (p = 0.10 > 0.05). Although the model is significant overall, since there is no significant intervention point that would change the mean level of the series, a dynamic linear model is not suitable/necessary for our solar radiation time series.

C. Exponential Smoothing Method

In exponential smoothing, we represent the trend component as a combination of a level term \((ℓ)\) and a growth term \((b)\). The level and growth can be combined in a number of ways, giving different trend types and hence different exponential smoothing methods, namely,

  • Simple Exponential Smoothing
  • Holt’s Linear Method
  • Holt-Winters’ Trend and Seasonality Method

NOTE - All variations of the exponential smoothing methods will be tested for completeness, although we expect a model with a seasonal component (namely the Holt-Winters' variant) to be the best fit.

Simple exponential smoothing

Here, the trend and seasonal components are not considered. We have just one smoothing constant, \(\alpha\), which will be estimated by the software; the forecast recursion is sketched below. Let's fit the simple exponential smoothing model and get its forecasts, look at its summary, and check for violations of the residual assumptions.
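The one-step-ahead forecast recursion behind this method is (a standard form, with \(0 < \alpha \le 1\)),

\(\hat{Y}_{t+1|t} = \alpha Y_t + (1-\alpha)\hat{Y}_{t|t-1}\)

With \(\alpha\) close to 1, the forecast leans heavily on the most recent observation; the fit below estimates \(\alpha = 1\), which collapses to the naive forecast.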

First, let's fit the simple exponential smoothing model,

1. by setting initial = simple

Here, argument initial is set to simple so the initial values are obtained using simple calculations on the first few observations.

ses.simple <- ses(solar, initial="simple", h=24) # We let the software estimate alpha. # h=24 for 2 years forecast
summary(ses.simple)
## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
##  ses(y = solar, h = 24, initial = "simple") 
## 
##   Smoothing parameters:
##     alpha = 1 
## 
##   Initial states:
##     l = 5.0517 
## 
##   sigma:  4.5688
## Error measures:
##                        ME     RMSE      MAE       MPE     MAPE     MASE
## Training set 0.0001462894 4.568777 3.876091 -5.211851 27.29823 0.636771
##                   ACF1
## Training set 0.6677846
## 
## Forecasts:
##          Point Forecast       Lo 80    Hi 80      Lo 95    Hi 95
## Jan 2015        5.14828  -0.7068426 11.00340  -3.806357 14.10292
## Feb 2015        5.14828  -3.1321139 13.42867  -7.515490 17.81205
## Mar 2015        5.14828  -4.9930900 15.28965 -10.361607 20.65817
## Apr 2015        5.14828  -6.5619655 16.85853 -12.760995 23.05756
## May 2015        5.14828  -7.9441725 18.24073 -14.874898 25.17146
## Jun 2015        5.14828  -9.1937832 19.49034 -16.786013 27.08257
## Jul 2015        5.14828 -10.3429188 20.63948 -18.543464 28.84002
## Aug 2015        5.14828 -11.4125081 21.70907 -20.179260 30.47582
## Sep 2015        5.14828 -12.4170884 22.71365 -21.715633 32.01219
## Oct 2015        5.14828 -13.3672440 23.66380 -23.168771 33.46533
## Nov 2015        5.14828 -14.2709655 24.56753 -24.550893 34.84745
## Dec 2015        5.14828 -15.1344604 25.43102 -25.871495 36.16806
## Jan 2016        5.14828 -15.9626655 26.25923 -27.138125 37.43469
## Feb 2016        5.14828 -16.7595836 27.05614 -28.356906 38.65347
## Mar 2016        5.14828 -17.5285132 27.82507 -29.532883 39.82944
## Apr 2016        5.14828 -18.2722113 28.56877 -30.670271 40.96683
## May 2016        5.14828 -18.9930099 29.28957 -31.772637 42.06920
## Jun 2016        5.14828 -19.6929024 29.98946 -32.843030 43.13959
## Jul 2016        5.14828 -20.3736087 30.67017 -33.884081 44.18064
## Aug 2016        5.14828 -21.0366254 31.33319 -34.898077 45.19464
## Sep 2016        5.14828 -21.6832636 31.97982 -35.887025 46.18359
## Oct 2016        5.14828 -22.3146804 32.61124 -36.852694 47.14925
## Nov 2016        5.14828 -22.9319027 33.22846 -37.796654 48.09321
## Dec 2016        5.14828 -23.5358467 33.83241 -38.720306 49.01687
checkresiduals(ses.simple)

## 
##  Ljung-Box test
## 
## data:  Residuals from Simple exponential smoothing
## Q* = 4227.3, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

Model summary:

  • MASE score is 0.636771
  • Serial autocorrelations are highly significant. (As per the Ljung-Box test output, the null hypothesis that the residual series exhibits no autocorrelation is rejected, since the p-value < 0.05.)

Now, let's fit the simple exponential smoothing model with optimal initialization,

2. by setting initial = optimal

Here, argument initial is set to optimal so the initial values are optimized along with the smoothing parameters.

ses.optimal <- ses(solar, initial="optimal", h=24)
summary(ses.optimal)
## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
##  ses(y = solar, h = 24, initial = "optimal") 
## 
##   Smoothing parameters:
##     alpha = 0.9999 
## 
##   Initial states:
##     l = 5.0575 
## 
##   sigma:  4.576
## 
##      AIC     AICc      BIC 
## 6296.371 6296.407 6309.847 
## 
## Error measures:
##                        ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 0.0001378357 4.569082 3.876391 -5.213129 27.30052 0.6368203
##                   ACF1
## Training set 0.6678374
## 
## Forecasts:
##          Point Forecast       Lo 80    Hi 80      Lo 95    Hi 95
## Jan 2015       5.148434  -0.7159726 11.01284  -3.820402 14.11727
## Feb 2015       5.148434  -3.1446744 13.44154  -7.534781 17.83165
## Mar 2015       5.148434  -5.0083386 15.30521 -10.385009 20.68188
## Apr 2015       5.148434  -6.5794989 16.87637 -12.787891 23.08476
## May 2015       5.148434  -7.9637280 18.26060 -14.904887 25.20175
## Jun 2015       5.148434  -9.2151719 19.51204 -16.818805 27.11567
## Jul 2015       5.148434 -10.3659966 20.66286 -18.578840 28.87571
## Aug 2015       5.148434 -11.4371603 21.73403 -20.217043 30.51391
## Sep 2015       5.148434 -12.4432208 22.74009 -21.755680 32.05255
## Oct 2015       5.148434 -13.3947777 23.69165 -23.210961 33.50783
## Nov 2015       5.148434 -14.2998328 24.59670 -24.595123 34.89199
## Dec 2015       5.148434 -15.1646028 25.46147 -25.917675 36.21454
## Jan 2016       5.148434 -15.9940315 26.29090 -27.186176 37.48304
## Feb 2016       5.148434 -16.7921272 27.08899 -28.406759 38.70363
## Mar 2016       5.148434 -17.5621937 27.85906 -29.584474 39.88134
## Apr 2016       5.148434 -18.3069916 28.60386 -30.723544 41.02041
## May 2016       5.148434 -19.0288564 29.32572 -31.827541 42.12441
## Jun 2016       5.148434 -19.7297844 30.02665 -32.899518 43.19638
## Jul 2016       5.148434 -20.4114982 30.70837 -33.942109 44.23898
## Aug 2016       5.148434 -21.0754962 31.37236 -34.957606 45.25447
## Sep 2016       5.148434 -21.7230918 32.01996 -35.948018 46.24489
## Oct 2016       5.148434 -22.3554435 32.65231 -36.915117 47.21198
## Nov 2016       5.148434 -22.9735798 33.27045 -37.860474 48.15734
## Dec 2016       5.148434 -23.5784183 33.87529 -38.785495 49.08236
checkresiduals(ses.optimal)

## 
##  Ljung-Box test
## 
## data:  Residuals from Simple exponential smoothing
## Q* = 4227.6, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

Model summary:

  • MASE score worsens to 0.6368203 (vs 0.636771)
  • Serial autocorrelations are highly significant.
Conclusion of Simple exponential smoothing

The simply initialized model, ses.simple, is marginally better as per the MASE measure (0.636771) than the optimally initialized model, ses.optimal.

Moving on to the next method, let's add a trend component to the model using Holt's linear methods,

Holt’s Linear smoothing methods

Holt's linear method extends simple exponential smoothing to linear exponential smoothing, to be able to include a trend in the forecasting model. Here, we have two smoothing constants, \(\alpha\) and \(\beta^*\); the smoothing equations are sketched below. Let's fit the models and get forecasts,
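The component form behind Holt's method is (standard notation: level \(\ell_t\), growth \(b_t\), forecast horizon \(h\)),

\(\hat{Y}_{t+h|t} = \ell_t + hb_t\)
\(\ell_t = \alpha Y_t + (1-\alpha)(\ell_{t-1} + b_{t-1})\)
\(b_t = \beta^*(\ell_t - \ell_{t-1}) + (1-\beta^*)b_{t-1}\)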

First, let's fit Holt's simple linear trend model,

1. Holts Linear trend
holt.linear <- holt(solar, initial="simple", h=24) # Let the software estimate both alpha and beta
summary(holt.linear)
## 
## Forecast method: Holt's method
## 
## Model Information:
## Holt's method 
## 
## Call:
##  holt(y = solar, h = 24, initial = "simple") 
## 
##   Smoothing parameters:
##     alpha = 0.9165 
##     beta  = 1 
## 
##   Initial states:
##     l = 5.0517 
##     b = 1.3641 
## 
##   sigma:  3.6493
## Error measures:
##                        ME     RMSE      MAE      MPE     MAPE      MASE
## Training set -0.004941411 3.649286 2.806374 6.556498 21.13578 0.4610361
##                    ACF1
## Training set 0.06518525
## 
## Forecasts:
##          Point Forecast       Lo 80      Hi 80      Lo 95     Hi 95
## Jan 2015      3.3755382   -1.301210   8.052286   -3.77693  10.52801
## Feb 2015      1.7507183   -8.358924  11.860360  -13.71065  17.21208
## Mar 2015      0.1258983  -16.851847  17.103643  -25.83932  26.09112
## Apr 2015     -1.4989216  -26.473564  23.475721  -39.69434  36.69650
## May 2015     -3.1237415  -37.070990  30.823507  -55.04158  48.79409
## Jun 2015     -4.7485614  -48.543955  39.046832  -71.72784  62.23071
## Jul 2015     -6.3733813  -60.819126  48.072363  -89.64096  76.89420
## Aug 2015     -7.9982012  -73.839431  57.843028 -108.69367  92.69727
## Sep 2015     -9.6230212  -87.558671  68.312629 -128.81531 109.56927
## Oct 2015    -11.2478411 -101.938399  79.442717 -149.94708 127.45140
## Nov 2015    -12.8726610 -116.945936  91.200614 -172.03900 146.29368
## Dec 2015    -14.4974809 -132.553051 103.558089 -195.04789 166.05293
## Jan 2016    -16.1223008 -148.735022 116.490421 -218.93596 186.69135
## Feb 2016    -17.7471207 -165.469968 129.975727 -243.66972 208.17548
## Mar 2016    -19.3719407 -182.738335 143.994454 -269.21928 230.47540
## Apr 2016    -20.9967606 -200.522511 158.528990 -295.55770 253.56418
## May 2016    -22.6215805 -218.806526 173.563365 -322.66056 277.41740
## Jun 2016    -24.2464004 -237.575807 189.083006 -350.50557 302.01277
## Jul 2016    -25.8712203 -256.816988 205.074547 -379.07229 327.32985
## Aug 2016    -27.4960402 -276.517752 221.525671 -408.34188 353.34980
## Sep 2016    -29.1208602 -296.666697 238.424977 -438.29691 380.05519
## Oct 2016    -30.7456801 -317.253233 255.761873 -468.92117 407.42981
## Nov 2016    -32.3705000 -338.267485 273.526485 -500.19957 435.45857
## Dec 2016    -33.9953199 -359.700221 291.709581 -532.11798 464.12734
checkresiduals(holt.linear)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt's method
## Q* = 1096.3, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

Model summary:

  • MASE score improves to 0.4610361 (vs 0.636771)
  • Serial autocorrelations are highly significant.

Next, let's fit Holt's exponential trend model,

2. Holts Exponential trend
holt.exponential <- holt(solar, initial="optimal", exponential=TRUE, h=24) # Fit with exponential trend
summary(holt.exponential)
## 
## Forecast method: Holt's method with exponential trend
## 
## Model Information:
## Holt's method with exponential trend 
## 
## Call:
##  holt(y = solar, h = 24, initial = "optimal", exponential = TRUE) 
## 
##   Smoothing parameters:
##     alpha = 0.9999 
##     beta  = 1e-04 
## 
##   Initial states:
##     l = 10.1941 
##     b = 0.9715 
## 
##   sigma:  0.403
## 
##      AIC     AICc      BIC 
## 6643.390 6643.482 6665.851 
## 
## Error measures:
##                     ME    RMSE      MAE       MPE    MAPE      MASE      ACF1
## Training set 0.4447509 4.54425 3.884459 -2.682554 26.9491 0.6381458 0.6667328
## 
## Forecasts:
##          Point Forecast        Lo 80     Hi 80        Lo 95     Hi 95
## Jan 2015       5.029012 2.4296318751  7.579698 1.073018e+00  9.017032
## Feb 2015       4.912373 1.5155083897  8.751829 2.764830e-01 11.679209
## Mar 2015       4.798439 1.0315906559  9.522002 1.141066e-01 13.721646
## Apr 2015       4.687148 0.6605002956 10.040335 1.230762e-02 15.775806
## May 2015       4.578438 0.4739127358 10.569171 5.675393e-04 17.209258
## Jun 2015       4.472249 0.3238022750 10.744569 4.077062e-04 18.912175
## Jul 2015       4.368523 0.2469683712 10.700889 3.120147e-04 20.453544
## Aug 2015       4.267203 0.1539702299 10.524898 1.851822e-04 20.740674
## Sep 2015       4.168233 0.0963782855 10.489413 1.391957e-04 22.245367
## Oct 2015       4.071559 0.0629673440 10.315928 9.699652e-05 23.950374
## Nov 2015       3.977126 0.0455082667  9.714293 7.983113e-05 22.863355
## Dec 2015       3.884884 0.0304294802  9.871843 5.269775e-05 23.586423
## Jan 2016       3.794781 0.0180033787  9.290743 4.078185e-05 23.847726
## Feb 2016       3.706768 0.0093849072  9.168415 3.420632e-05 26.674567
## Mar 2016       3.620796 0.0044470304  9.184515 1.990900e-05 28.041288
## Apr 2016       3.536818 0.0018513373  8.528606 1.338326e-05 28.548223
## May 2016       3.454788 0.0013950518  8.336215 1.349745e-05 28.131927
## Jun 2016       3.374660 0.0008646059  7.874033 8.670389e-06 27.839331
## Jul 2016       3.296391 0.0006458170  7.754488 7.255265e-06 28.371312
## Aug 2016       3.219937 0.0004228233  7.412620 4.247694e-06 26.505341
## Sep 2016       3.145257 0.0002671188  7.371147 3.001340e-06 27.231159
## Oct 2016       3.072308 0.0001982331  6.650139 2.331978e-06 23.799449
## Nov 2016       3.001051 0.0001468675  6.427532 1.364540e-06 23.090690
## Dec 2016       2.931447 0.0001239960  5.822347 1.302301e-06 24.118000
checkresiduals(holt.exponential)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt's method with exponential trend
## Q* = 1316.7, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

Model summary:

  • MASE score worsens to 0.6381458 (vs 0.4610361)
  • Serial autocorrelations are highly significant.

Note: we use initial = "optimal" here since we get an "optimization failure" error with the simple method.

Next, let's fit Holt's damped trend model,

3. Holts Damped trend
holt.damped <- holt(solar, damped=TRUE, initial="simple", h=24) # Fit with additive damped trend
summary(holt.damped)
## 
## Forecast method: Damped Holt's method
## 
## Model Information:
## Damped Holt's method 
## 
## Call:
##  holt(y = solar, h = 24, damped = TRUE, initial = "simple") 
## 
##   Smoothing parameters:
##     alpha = 0.9278 
##     beta  = 0.9278 
##     phi   = 0.8 
## 
##   Initial states:
##     l = 13.7194 
##     b = 2.5785 
## 
##   sigma:  3.4657
## 
##      AIC     AICc      BIC 
## 5932.524 5932.652 5959.477 
## 
## Error measures:
##                       ME     RMSE      MAE      MPE     MAPE     MASE
## Training set -0.00831309 3.452592 2.638613 3.791232 19.25658 0.433476
##                    ACF1
## Training set 0.07287524
## 
## Forecasts:
##          Point Forecast       Lo 80    Hi 80       Lo 95     Hi 95
## Jan 2015     3.77468888  -0.6668422  8.21622   -3.018047  10.56742
## Feb 2015     2.73586859  -5.9098357 11.38157  -10.486595  15.95833
## Mar 2015     1.90481212 -11.3560708 15.16570  -18.375958  22.18558
## Apr 2015     1.23996674 -16.7554651 19.23540  -26.281671  28.76160
## May 2015     0.70809029 -22.0017301 23.41791  -34.023583  35.43976
## Jun 2015     0.28258901 -27.0463637 27.61154  -41.513437  42.07862
## Jul 2015    -0.05781212 -31.8700954 31.75447  -48.710501  48.59488
## Aug 2015    -0.33013310 -36.4696231 35.80936  -55.600714  54.94045
## Sep 2015    -0.54798994 -40.8505077 39.75453  -62.185372  61.08939
## Oct 2015    -0.72227547 -45.0230940 43.57854  -68.474531  67.02998
## Nov 2015    -0.86170394 -49.0000892 47.27668  -74.483011  72.75960
## Dec 2015    -0.97324674 -52.7951082 50.84861  -80.227945  78.28145
## Jan 2016    -1.06248101 -56.4218036 54.29684  -85.727259  83.60230
## Feb 2016    -1.13386844 -59.8933599 57.62562  -90.998756  88.73102
## Mar 2016    -1.19097841 -63.2222137 60.84026  -96.059566  93.67761
## Apr 2016    -1.23666639 -66.4199175 63.94658 -100.925846  98.45251
## May 2016    -1.27321679 -69.4970922 66.95066 -105.612630 103.06620
## Jun 2016    -1.30245712 -72.4634340 69.85852 -110.133780 107.52887
## Jul 2016    -1.32584939 -75.3277524 72.67605 -114.501994 111.85029
## Aug 2016    -1.34456321 -78.0980276 75.40890 -118.728857 116.03973
## Sep 2016    -1.35953427 -80.7814739 78.06241 -122.824909 120.10584
## Oct 2016    -1.37151112 -83.3846076 80.64159 -126.799718 124.05670
## Nov 2016    -1.38109261 -85.9133126 83.15113 -130.661967 127.89978
## Dec 2016    -1.38875780 -88.3729032 85.59539 -134.419528 131.64201
checkresiduals(holt.damped)

## 
##  Ljung-Box test
## 
## data:  Residuals from Damped Holt's method
## Q* = 1134.4, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

Model summary:

  • MASE score improves to 0.433476 (vs 0.4610361)
  • Serial autocorrelations are highly significant.
Conclusion of Holt’s Linear smoothing methods

Holt's damped linear smoothing model, holt.damped, with a MASE measure of 0.433476, is the best of the Holt's linear smoothing models.

Moving on to the next method, let's add seasonal components to the model using the Holt-Winters' trend and seasonality method,

Holt-Winters’ Trend and Seasonality Method

Since our solar radiation response series is seasonal, the Holt-Winters' method provides a solution to the model-fitting problem. Holt-Winters' method has smoothing equations for the level, the trend, and the seasonality (sketched below for the additive case). There are two different Holt-Winters' methods, depending on whether seasonality is modelled in an additive or multiplicative way. Here, we have three smoothing constants: \(\alpha\), \(\beta^*\), and \(\gamma\). Let's fit the models and get forecasts,
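For the additive case, the smoothing equations are (standard notation, with seasonal period \(m = 12\) for monthly data),

\(\ell_t = \alpha(Y_t - s_{t-m}) + (1-\alpha)(\ell_{t-1} + b_{t-1})\)
\(b_t = \beta^*(\ell_t - \ell_{t-1}) + (1-\beta^*)b_{t-1}\)
\(s_t = \gamma(Y_t - \ell_{t-1} - b_{t-1}) + (1-\gamma)s_{t-m}\)

The multiplicative case replaces these differences involving the seasonal term with ratios.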

We will look at 5 cases,

  • Additive seasonality case
    • without damping
    • with damping

(Note, additive seasonality with exponential trend is a forbidden model combination, hence not considered)

  • Multiplicative seasonality case
    • without damping
    • with damping
    • with exponential trend

First, let's look at the additive seasonality case,

1. Additive seasonality case

a. Additive seasonality case without damping -

Add.hw <- hw(solar,seasonal="additive", h=2*frequency(solar))
summary(Add.hw)
## 
## Forecast method: Holt-Winters' additive method
## 
## Model Information:
## Holt-Winters' additive method 
## 
## Call:
##  hw(y = solar, h = 2 * frequency(solar), seasonal = "additive") 
## 
##   Smoothing parameters:
##     alpha = 0.9993 
##     beta  = 0.0019 
##     gamma = 1e-04 
## 
##   Initial states:
##     l = 11.3306 
##     b = 0.209 
##     s = -10.7542 -8.4968 -3.227 3.0768 7.9264 10.9122
##            10.1422 7.3322 2.2353 -1.9528 -7.3626 -9.8316
## 
##   sigma:  2.3575
## 
##      AIC     AICc      BIC 
## 5434.708 5435.662 5511.076 
## 
## Error measures:
##                      ME     RMSE      MAE       MPE     MAPE    MASE      ACF1
## Training set -0.1179476 2.328736 1.504489 -2.230359 12.68101 0.24716 0.1703355
## 
## Forecasts:
##          Point Forecast      Lo 80     Hi 80      Lo 95    Hi 95
## Jan 2015       6.127399  3.1061581  9.148641   1.506810 10.74799
## Feb 2015       8.655602  4.3802222 12.930983   2.116973 15.19423
## Mar 2015      14.122221  8.8814747 19.362968   6.107191 22.13725
## Apr 2015      18.366625 12.3095941 24.423656   9.103196 27.63005
## May 2015      23.522264 16.7439512 30.300578  13.155729 33.88880
## Jun 2015      26.390991 18.9586817 33.823300  15.024255 37.75773
## Jul 2015      27.219228 19.1837595 35.254697  14.930039 39.50842
## Aug 2015      24.290519 15.6920179 32.889020  11.140246 37.44079
## Sep 2015      19.495874 10.3670365 28.624711   5.534522 33.45723
## Oct 2015      13.253765  3.6218809 22.885648  -1.476930 27.98446
## Nov 2015       8.042144 -2.0695730 18.153861  -7.422393 23.50668
## Dec 2015       5.840110 -4.7313921 16.411612 -10.327607 22.00783
## Jan 2016       6.819616 -4.1942247 17.833457 -10.024600 23.66383
## Feb 2016       9.347819 -2.0927719 20.788410  -8.149055 26.84469
## Mar 2016      14.814438  2.9609172 26.667959  -3.313958 32.94283
## Apr 2016      19.058842  6.8048110 31.312872   0.317919 37.79976
## May 2016      24.214481 11.5711780 36.857784   4.878218 43.55074
## Jun 2016      27.083208 14.0608586 40.105557   7.167243 46.99917
## Jul 2016      27.911445 14.5194061 41.303485   7.430089 48.39280
## Aug 2016      24.982736 11.2296053 38.735867   3.949138 46.01633
## Sep 2016      20.188091  6.0818048 34.294377  -1.385612 41.76179
## Oct 2016      13.945981 -0.5061083 28.398071  -8.156582 36.04855
## Nov 2016       8.734361 -6.0566988 23.525420 -13.886613 31.35533
## Dec 2016       6.532327 -8.5913309 21.655984 -16.597311 29.66196
checkresiduals(Add.hw)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt-Winters' additive method
## Q* = 205.55, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

Model summary:

  • MASE measure improves to 0.24716 (vs 0.433476 Holt’s damped linear trend model)
  • Serial autocorrelations are highly significant.

Let's consider damped = TRUE,

b. Additive seasonality case with damping -

Add.hw.damped <- hw(solar,seasonal="additive",damped = TRUE, h=2*frequency(solar))
summary(Add.hw.damped)
## 
## Forecast method: Damped Holt-Winters' additive method
## 
## Model Information:
## Damped Holt-Winters' additive method 
## 
## Call:
##  hw(y = solar, h = 2 * frequency(solar), seasonal = "additive",  
## 
##  Call:
##      damped = TRUE) 
## 
##   Smoothing parameters:
##     alpha = 0.9999 
##     beta  = 1e-04 
##     gamma = 1e-04 
##     phi   = 0.9388 
## 
##   Initial states:
##     l = 11.154 
##     b = 0.7632 
##     s = -10.4919 -8.137 -3.348 2.5794 8.08 11.1219
##            9.9586 6.9916 1.9573 -1.8565 -7.1607 -9.6946
## 
##   sigma:  2.3446
## 
##      AIC     AICc      BIC 
## 5428.422 5429.489 5509.282 
## 
## Error measures:
##                       ME     RMSE      MAE       MPE     MAPE      MASE
## Training set -0.01091357 2.314163 1.498521 -1.468083 12.44796 0.2461797
##                   ACF1
## Training set 0.1700724
## 
## Forecasts:
##          Point Forecast     Lo 80     Hi 80       Lo 95    Hi 95
## Jan 2015       5.945198  2.940530  8.949866   1.3499548 10.54044
## Feb 2015       8.479778  4.230548 12.729009   1.9811413 14.97842
## Mar 2015      13.784309  8.579902 18.988716   5.8248548 21.74376
## Apr 2015      17.598568 11.588776 23.608360   8.4073847 26.78975
## May 2015      22.632258 15.912803 29.351713  12.3557383 32.90878
## Jun 2015      25.599150 18.238024 32.960277  14.3412786 36.85702
## Jul 2015      26.761775 18.810497 34.713053  14.6013444 38.92221
## Aug 2015      23.722251 15.221610 32.222892  10.7216429 36.72286
## Sep 2015      18.222624  9.205955 27.239293   4.4328191 32.01243
## Oct 2015      12.293259  2.788471 21.798047  -2.2430606 26.82958
## Nov 2015       7.504219 -2.464877 17.473315  -7.7421977 22.75064
## Dec 2015       5.150488 -5.262286 15.563263 -10.7744758 21.07545
## Jan 2016       5.947275 -4.891175 16.785726 -10.6287044 22.52326
## Feb 2016       8.481728 -2.766251 19.729707  -8.7205712 25.68403
## Mar 2016      13.786139  2.142988 25.429291  -4.0205242 31.59280
## Apr 2016      17.600287  5.574906 29.625668  -0.7909464 35.99152
## May 2016      22.633871 10.238009 35.029734   3.6760353 41.59171
## Jun 2016      25.600665 12.845046 38.356283   6.0926298 45.10870
## Jul 2016      26.763197 13.657667 39.868726   6.7200186 46.80637
## Aug 2016      23.723586 10.277223 37.169950   3.1591478 44.28802
## Sep 2016      18.223877  4.445085 32.002669  -2.8489663 39.29672
## Oct 2016      12.294435 -1.808972 26.397843  -9.2748652 33.86374
## Nov 2016       7.505324 -6.915414 21.926061 -14.5492911 29.55994
## Dec 2016       5.151525 -9.579726 19.882776 -17.3779790 27.68103
checkresiduals(Add.hw.damped)

## 
##  Ljung-Box test
## 
## data:  Residuals from Damped Holt-Winters' additive method
## Q* = 210.76, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

Model summary:

  • MASE measure improves to 0.2461797 (vs 0.24716 without damping)
  • Serial autocorrelations are highly significant.

Next, let’s fit the multiplicative seasonality case,

2. Multiplicative seasonality case

a. without damping -

Mul.hw <- hw(solar,seasonal="multiplicative", h=2*frequency(solar))
summary(Mul.hw)
## 
## Forecast method: Holt-Winters' multiplicative method
## 
## Model Information:
## Holt-Winters' multiplicative method 
## 
## Call:
##  hw(y = solar, h = 2 * frequency(solar), seasonal = "multiplicative") 
## 
##   Smoothing parameters:
##     alpha = 0.8251 
##     beta  = 1e-04 
##     gamma = 0.0259 
## 
##   Initial states:
##     l = 10.5659 
##     b = -0.2371 
##     s = 0.5024 0.6292 0.9319 1.2229 1.4923 1.6309
##            1.5606 1.3672 1.0278 0.8128 0.5106 0.3114
## 
##   sigma:  0.3971
## 
##      AIC     AICc      BIC 
## 6648.746 6649.699 6725.114 
## 
## Error measures:
##                      ME    RMSE      MAE        MPE     MAPE      MASE
## Training set 0.04281136 2.12539 1.359297 -0.4896895 10.87401 0.2233077
##                    ACF1
## Training set 0.06551473
## 
## Forecasts:
##          Point Forecast        Lo 80     Hi 80       Lo 95     Hi 95
## Jan 2015       5.610335   2.75534413  8.465326   1.2440033  9.976666
## Feb 2015       6.512806   2.03809262 10.987520  -0.3306776 13.356290
## Mar 2015       8.786120   1.33667747 16.235562  -2.6068190 20.179058
## Apr 2015      10.192785  -0.02672876 20.412298  -5.4366123 25.822181
## May 2015      12.512597  -1.96157245 26.986766  -9.6237347 34.648928
## Jun 2015      14.061609  -4.41048190 32.533701 -14.1890164 42.312235
## Jul 2015      14.502088  -6.89909775 35.903274 -18.2282011 47.232377
## Aug 2015      12.945620  -8.35024040 34.241481 -19.6235881 45.514829
## Sep 2015      10.352210  -8.52365174 29.228072 -18.5159294 39.220350
## Oct 2015       7.418279  -7.51119146 22.347750 -15.4143760 30.250935
## Nov 2015       4.989730  -6.05900058 16.038460 -11.9078451 21.887305
## Dec 2015       3.871776  -5.53876625 13.282319 -10.5204066 18.263960
## Jan 2016       4.199400  -7.01753622 15.416336 -12.9554236 21.354224
## Feb 2016       4.838892  -9.30305408 18.980837 -16.7893479 26.467131
## Mar 2016       6.477174 -14.21845674 27.172805 -25.1740619 38.128410
## Apr 2016       7.452633 -18.56883703 33.474103 -32.3437711 47.249037
## May 2016       9.069749 -25.52978297 43.669280 -43.8456686 61.985166
## Jun 2016      10.099485 -31.99838423 52.197355 -54.2836502 74.482621
## Jul 2016      10.315199 -36.68017471 57.310572 -61.5580226 82.188420
## Aug 2016       9.113765 -36.29212001 54.519651 -60.3285438 78.556075
## Sep 2016       7.208700 -32.09306283 46.510463 -52.8981594 67.315559
## Oct 2016       5.105869 -25.38360449 35.595343 -41.5237568 51.735495
## Nov 2016       3.391945 -18.81647619 25.600367 -30.5729044 37.356795
## Dec 2016       2.597256 -16.07148862 21.266001 -25.9541251 31.148637
checkresiduals(Mul.hw)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt-Winters' multiplicative method
## Q* = 38.585, df = 24, p-value = 0.03017
## 
## Model df: 0.   Total lags used: 24

Model summary:

  • MASE measure improves to 0.2233077 (vs 0.2461797 of damped additive seasonality case)
  • Serial autocorrelations are slightly significant (p-value = 0.03017 < 0.05)

Let’s consider exponential = TRUE,

b. with exponential trend -

Mul.hw.exponential <- hw(solar,seasonal="multiplicative",exponential = TRUE, h=2*frequency(solar))
summary(Mul.hw.exponential)
## 
## Forecast method: Holt-Winters' multiplicative method with exponential trend
## 
## Model Information:
## Holt-Winters' multiplicative method with exponential trend 
## 
## Call:
##  hw(y = solar, h = 2 * frequency(solar), seasonal = "multiplicative",  
## 
##  Call:
##      exponential = TRUE) 
## 
##   Smoothing parameters:
##     alpha = 0.6499 
##     beta  = 1e-04 
##     gamma = 0.0597 
## 
##   Initial states:
##     l = 9.3759 
##     b = 0.9889 
##     s = 0.3552 0.4789 0.952 1.0553 1.4608 1.7926
##            1.6746 1.4529 1.1145 0.8672 0.5333 0.2627
## 
##   sigma:  0.3755
## 
##      AIC     AICc      BIC 
## 6584.208 6585.161 6660.576 
## 
## Error measures:
##                      ME     RMSE      MAE       MPE     MAPE      MASE     ACF1
## Training set 0.06388152 2.208727 1.412454 -1.303792 11.28449 0.2320404 0.153871
## 
## Forecasts:
##          Point Forecast     Lo 80     Hi 80      Lo 95    Hi 95
## Jan 2015       5.797870 2.9534942  8.484304 1.44647488 10.02259
## Feb 2015       7.513359 3.3803742 12.076927 1.65775823 15.24874
## Mar 2015      10.758212 4.1017889 18.208520 1.97959544 23.95836
## Apr 2015      12.694087 4.3952552 22.592902 2.19691791 30.98982
## May 2015      15.340074 5.0869609 28.906789 2.58759574 40.74954
## Jun 2015      17.239431 5.0573707 33.362843 2.51898761 49.74076
## Jul 2015      18.004700 4.6924684 36.001955 2.22923936 52.99648
## Aug 2015      16.206596 3.7922981 32.355376 1.72293337 53.19834
## Sep 2015      13.011256 2.9163137 27.523502 1.30136347 43.61009
## Oct 2015       9.215869 1.8351550 19.585112 0.82306531 32.62240
## Nov 2015       6.214762 1.1054329 13.450353 0.45885176 23.17077
## Dec 2015       4.565667 0.7607070 10.141501 0.30085588 17.19157
## Jan 2016       5.220661 0.8130865 11.716518 0.32902433 21.50683
## Feb 2016       6.765364 0.9436812 14.998160 0.33437252 28.95817
## Mar 2016       9.687175 1.3193266 21.764921 0.48531177 41.81485
## Apr 2016      11.430324 1.4282730 25.833379 0.49890227 50.03020
## May 2016      13.812889 1.6000069 31.361185 0.58571723 61.10441
## Jun 2016      15.523155 1.6546349 35.899805 0.60471897 72.72351
## Jul 2016      16.212237 1.6436832 37.934511 0.54886622 76.71991
## Aug 2016      14.593143 1.4286329 33.833265 0.45933464 68.71260
## Sep 2016      11.715916 1.0324484 27.844093 0.34894033 55.54142
## Oct 2016       8.298381 0.7119457 20.028036 0.22505643 40.62045
## Nov 2016       5.596049 0.4482400 13.204112 0.14002433 28.50546
## Dec 2016       4.111130 0.3108836  9.975805 0.09380568 21.38243
checkresiduals(Mul.hw.exponential)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt-Winters' multiplicative method with exponential trend
## Q* = 21.246, df = 24, p-value = 0.6242
## 
## Model df: 0.   Total lags used: 24

Model summary:

  • MASE measure degrades to 0.2320404 (vs 0.2233077 of multiplicative seasonality without damping).
  • Serial autocorrelations are insignificant (p-value = 0.6242 > 0.05)

Let’s consider damped = TRUE,

c. with damping (BEST EXPONENTIAL SMOOTHING/OVERALL REGRESSION MODEL) -

Mul.hw.damped <- hw(solar,seasonal="multiplicative", damped = TRUE, h=2*frequency(solar))
summary(Mul.hw.damped)
## 
## Forecast method: Damped Holt-Winters' multiplicative method
## 
## Model Information:
## Damped Holt-Winters' multiplicative method 
## 
## Call:
##  hw(y = solar, h = 2 * frequency(solar), seasonal = "multiplicative",  
## 
##  Call:
##      damped = TRUE) 
## 
##   Smoothing parameters:
##     alpha = 0.805 
##     beta  = 1e-04 
##     gamma = 0.0124 
##     phi   = 0.94 
## 
##   Initial states:
##     l = 9.9033 
##     b = 0.4703 
##     s = 0.4341 0.5649 0.8263 1.1716 1.4514 1.6316
##            1.5503 1.3834 1.1045 0.8837 0.5761 0.4221
## 
##   sigma:  0.3145
## 
##      AIC     AICc      BIC 
## 6366.701 6367.768 6447.561 
## 
## Error measures:
##                      ME     RMSE      MAE       MPE     MAPE      MASE
## Training set -0.0456208 2.045442 1.239102 -2.172476 10.00296 0.2035619
##                    ACF1
## Training set 0.04177493
## 
## Forecasts:
##          Point Forecast       Lo 80     Hi 80       Lo 95     Hi 95
## Jan 2015       5.611616   3.3502090  7.873023   2.1530925  9.070139
## Feb 2015       7.060438   3.3373603 10.783517   1.3664818 12.754395
## Mar 2015      10.515176   3.8550902 17.175263   0.3294535 20.700899
## Apr 2015      12.993446   3.5145037 22.472389  -1.5033454 27.490238
## May 2015      16.284381   2.9386340 29.630128  -4.1261776 36.694939
## Jun 2015      18.261020   1.7242212 34.797820  -7.0298315 43.551872
## Jul 2015      19.031597   0.2101276 37.853067  -9.7533565 47.816551
## Aug 2015      17.105654  -1.2074754 35.418784 -10.9018606 45.113169
## Sep 2015      13.764743  -2.0801991 29.609686 -10.4680049 37.997491
## Oct 2015       9.774733  -2.2585804 21.808047  -8.6286319 28.178099
## Nov 2015       6.581938  -2.0456267 15.209503  -6.6127835 19.776659
## Dec 2015       5.163049  -2.0168711 12.342970  -5.8176914 16.143790
## Jan 2016       5.616270  -2.6571166 13.889656  -7.0367828 18.269322
## Feb 2016       7.066360  -3.9144360 18.047156  -9.7273184 23.860039
## Mar 2016      10.524088  -6.6891829 27.737360 -15.8013383 36.849515
## Apr 2016      13.004566  -9.3405081 35.349641 -21.1692761 47.178409
## May 2016      16.298445 -13.0720043 45.668893 -28.6197807 61.216670
## Jun 2016      18.276925 -16.2140342 52.767885 -34.4724452 71.026296
## Jul 2016      19.048304 -18.5466197 56.643228 -38.4481704 76.544779
## Aug 2016      17.120781 -18.1782620 52.419825 -36.8644468 71.106009
## Sep 2016      13.777000 -15.8651593 43.419159 -31.5567705 59.110770
## Oct 2016       9.783493 -12.1627698 31.729756 -23.7804195 43.347405
## Nov 2016       6.587872  -8.8063951 21.982139 -16.9556278 30.131371
## Dec 2016       5.167730  -7.4021686 17.737629 -14.0562710 24.391731
checkresiduals(Mul.hw.damped)

## 
##  Ljung-Box test
## 
## data:  Residuals from Damped Holt-Winters' multiplicative method
## Q* = 23.57, df = 24, p-value = 0.4864
## 
## Model df: 0.   Total lags used: 24

Model summary:

  • MASE measure improves to 0.2035619 (vs 0.2233077 of multiplicative seasonality without damping)
  • Serial autocorrelations are insignificant (p-value = 0.4864 > 0.05)

Conclusion of Holt-Winters’ methods

The Holt-Winters’ Multiplicative Seasonal model with Damped trend, \(Mul.hw.damped\), with a MASE measure of 0.2035619, is the best Holt-Winters’ model.

Most appropriate Exponential Smoothing model based on MASE (Model Selection)

Let’s summarise all the Exponential Smoothing methods and models considered with their corresponding forecast accuracy measure, MASE (a programmatic tabulation follows the list),

  • Simple Exponential smoothing
    • initial = simple | 0.636771 (MASE measure)
    • initial = optimal | 0.6368203
  • Holt’s Linear method
    • Holt’s Linear trend | 0.4610361
    • Holt’s Exponential trend | 0.6381458
    • Holt’s Damped trend | 0.433476
  • Holt-Winters Trend and seasonality method
      1. Additive seasonality
      • without damping | 0.24716
      • with damping | 0.2461797
      2. Multiplicative seasonality
      • without damping | 0.2233077
      • with damping | 0.2035619
      • with exponential trend | 0.2320404
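
These MASE figures can also be collected programmatically with accuracy() from the forecast package (a minimal sketch, assuming the Holt-Winters fits above are still in the workspace; the same pattern extends to the simple and Holt’s fits),

# Training-set MASE of the Holt-Winters fits, smallest (best) first
hw.fits <- list(Add.hw = Add.hw, Add.hw.damped = Add.hw.damped, Mul.hw = Mul.hw,
                Mul.hw.exponential = Mul.hw.exponential, Mul.hw.damped = Mul.hw.damped)
sort(sapply(hw.fits, function(fit) accuracy(fit)["Training set", "MASE"]))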

The best model for each of the 3 types of Exponential Smoothing considered is,

  • Best Simple exponential smoothing model (no trend or seasonality) is the initial = simple fit with MASE measure = 0.636771
  • Best Holt’s Linear smoothing model (trend, no seasonality) is holt.damped with MASE measure = 0.433476
  • Best Holt-Winters’ smoothing model (trend and multiplicative seasonality) is Mul.hw.damped with MASE measure = 0.2035619

From the above results, our Solar Radiation time series clearly has a damped trend. The seasonal component is multiplicative rather than additive, unlike what was inferred from the classical decomposition analysis. The best fitting model so far is the Holt-Winters’ multiplicative damped smoothing model, with the lowest MASE measure of 0.2035619.

Conclusion of Exponential Smoothing method

The best Exponential smoothing model which gives the most accurate forecasting based on the MASE measure is the Holt-Winters’ multiplicative model with damped trend (\(Mul.hw.damped\)) with MASE measure of 0.2035619.

D. State-Space models

State-Space models are essentially Exponential smoothing models with the error component modelled explicitly. They use the innovations formulation of the model, where \(Y_t\) denotes the observation at time t and \(X_t\) is the state vector that describes the level, trend and seasonality of the series. A linear (additive, homoscedastic errors) innovations state-space model is written as,

\(Y_t = \omega'X_{t-1} + \epsilon_t\),

where \(\epsilon_t\) is a white noise series.

Nonlinear (multiplicative, heteroscedastic error terms) state-space models are also possible and have the form,

\(Y_t = \omega(X_{t-1}) + r(X_{t-1})\epsilon_t\),

Let’s first look at the linear (additive errors) State-Space models,

State-Space models are generally denoted ETS(E,T,S), where the Error, Trend and Seasonal components can each be None (N), Additive (A) or Multiplicative (M), and the trend may additionally be damped. For example, ETS(A,N,N) is a state-space model with additive error terms and no trend or seasonal components.
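
Written out in innovations form (a standard formulation, included here for illustration), ETS(A,N,N) is,

\(Y_t = \ell_{t-1} + \epsilon_t\)
\(\ell_t = \ell_{t-1} + \alpha\epsilon_t\)

where \(\ell_t\) is the local level, so each new observation updates the level by a fraction \(\alpha\) of the one-step forecast error.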

Since we do not know if the errors are additive or multiplicative, we will test both of these cases.

We will look at the following cases and compare them with their respective Exponential Smoothing counterparts without the error component,

  • Additive error models: ETS(A,-,-)
    • ETS(A,N,N) - also known as Holt’s Local Level Model
    • ETS(A,A,N) - Holt’s Local Trend Model
      • ETS(A,A,N) with drift
      • ETS(A,A,N) with damping
    • ETS(A,N,A)
    • ETS(A,A,A) - Holt’s Local Additive Seasonal Model
  • Multiplicative error models: ETS(M,-,-)
    • ETS(M,N,N) - Holt-Winters’ Local Level Model
    • ETS(M,A,N) - Holt-Winters’ Local Trend Model
    • ETS(M,A,A) - Holt-Winters’ Additive Seasonal Model
    • ETS(M,A,M) - Multiplicative seasonal and error model
    • ETS(M,M,N) - Multiplicative trend and error model

Many other variations are possible; rather than enumerating them all, we will also let ets() auto-fit the best State-Space model at the end of this section.

1. Additive error models: ETS(A,-,-)

Since our response, Solar Radiation, has a seasonal component, we expect the models that include a seasonal component to fit best.

a. ETS(A,N,N)

This Additive error model has no trend or seasonal component. Note - This is also Holt’s Local Level Model: ETS(A,N,N). In this model, at any point along the data path, the values in the neighbourhood of the point are approximated by a short flat line representing what is referred to as a local level. This model corresponds to the simple exponential smoothing model with additive error terms. We use the ets() function to fit the model.

Add.SS.ANN = ets(solar, model="ANN")
summary(Add.SS.ANN)
## ETS(A,N,N) 
## 
## Call:
##  ets(y = solar, model = "ANN") 
## 
##   Smoothing parameters:
##     alpha = 0.9999 
## 
##   Initial states:
##     l = 5.0575 
## 
##   sigma:  4.576
## 
##      AIC     AICc      BIC 
## 6296.371 6296.407 6309.847 
## 
## Training set error measures:
##                        ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 0.0001378357 4.569082 3.876391 -5.213129 27.30052 0.6368203
##                   ACF1
## Training set 0.6678374
checkresiduals(Add.SS.ANN)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(A,N,N)
## Q* = 4227.6, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

Model summary:

  • MASE measure degrades to 0.6368203 (vs 0.2035619 of the best Holt-Winters model without the error component).
  • Serial autocorrelations are highly significant

b. ETS(A,A,N)

This Additive error model has Additive trend and no seasonal component. Note - This is also the Holt’s Local Trend Model: ETS(A,A,N). In this model, at each point, the data path is approximated by a tangential line in the deterministic case. We can fit 3 different models by considering damping and drift aspects,

x. General ETS(A,A,N) -

Add.SS.AAN = ets(solar, model="AAN") # damped is NULL by default, so ets() decides; here it selects a damped trend (Ad)
summary(Add.SS.AAN)
## ETS(A,Ad,N) 
## 
## Call:
##  ets(y = solar, model = "AAN") 
## 
##   Smoothing parameters:
##     alpha = 0.9279 
##     beta  = 0.9279 
##     phi   = 0.8 
## 
##   Initial states:
##     l = 13.7196 
##     b = 2.5786 
## 
##   sigma:  3.4657
## 
##      AIC     AICc      BIC 
## 5932.524 5932.653 5959.478 
## 
## Training set error measures:
##                        ME     RMSE      MAE      MPE     MAPE      MASE
## Training set -0.008312543 3.452593 2.638571 3.790894 19.25631 0.4334691
##                    ACF1
## Training set 0.07272876
checkresiduals(Add.SS.AAN)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(A,Ad,N)
## Q* = 1134, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

Model summary:

  • MASE measure improves to 0.4334691 (vs 0.6368203 of ETS(A,N,N)).
  • Serial autocorrelations are highly significant

y. ETS(A,A,N) with drift -

A local trend model allows the growth rate to change stochastically over time. If \(\beta = 0\), the growth rate is constant and equal to a value denoted by b. This model is hence the local level model with drift, also known as simple exponential smoothing with drift. To fit it, we call ets() with the argument beta set to a value very close to zero (1e-4).

Add.SS.AAN.drift = ets(solar, model="AAN", beta = 1E-4)
summary(Add.SS.AAN.drift)
## ETS(A,A,N) 
## 
## Call:
##  ets(y = solar, model = "AAN", beta = 1e-04) 
## 
##   Smoothing parameters:
##     alpha = 0.9999 
##     beta  = 1e-04 
## 
##   Initial states:
##     l = 11.356 
##     b = -0.0206 
## 
##   sigma:  4.5898
## 
##      AIC     AICc      BIC 
## 6300.338 6300.399 6318.307 
## 
## Training set error measures:
##                       ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 0.009889528 4.575899 3.887095 -5.243348 27.46947 0.6385788
##                   ACF1
## Training set 0.6652999
checkresiduals(Add.SS.AAN.drift)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(A,A,N)
## Q* = 4200.2, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

Model summary:

  • MASE measure degrades to 0.6385788 (vs 0.4334691 of ETS(A,A,N)). (Drift worsens the model)
  • Serial autocorrelations are highly significant

z. ETS(A,A,N) with damping -

Add.SS.AAN.damped = ets(solar, model="AAN", damped = TRUE)
summary(Add.SS.AAN.damped)
## ETS(A,Ad,N) 
## 
## Call:
##  ets(y = solar, model = "AAN", damped = TRUE) 
## 
##   Smoothing parameters:
##     alpha = 0.9279 
##     beta  = 0.9279 
##     phi   = 0.8 
## 
##   Initial states:
##     l = 13.7196 
##     b = 2.5786 
## 
##   sigma:  3.4657
## 
##      AIC     AICc      BIC 
## 5932.524 5932.653 5959.478 
## 
## Training set error measures:
##                        ME     RMSE      MAE      MPE     MAPE      MASE
## Training set -0.008312543 3.452593 2.638571 3.790894 19.25631 0.4334691
##                    ACF1
## Training set 0.07272876
checkresiduals(Add.SS.AAN.damped)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(A,Ad,N)
## Q* = 1134, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

Model summary:

  • MASE measure is unchanged at 0.4334691 (same as the general ETS(A,A,N)); damping makes no difference because the general fit had already auto-selected a damped trend.
  • Serial autocorrelations are highly significant

c. ETS(A,N,A)

This Additive error model has additive seasonal and no trend component.

Add.SS.ANA = ets(solar, model="ANA")
summary(Add.SS.ANA)
## ETS(A,N,A) 
## 
## Call:
##  ets(y = solar, model = "ANA") 
## 
##   Smoothing parameters:
##     alpha = 0.9997 
##     gamma = 3e-04 
## 
##   Initial states:
##     l = 22.6142 
##     s = -10.2887 -8.5269 -4.0534 1.7184 7.355 10.8208
##            10.4774 7.7101 2.4749 -1.7156 -6.7936 -9.1783
## 
##   sigma:  2.3884
## 
##      AIC     AICc      BIC 
## 5449.974 5450.719 5517.358 
## 
## Training set error measures:
##                      ME     RMSE     MAE       MPE     MAPE     MASE      ACF1
## Training set -0.0108645 2.362974 1.55041 -2.640142 12.95336 0.254704 0.1825724
checkresiduals(Add.SS.ANA)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(A,N,A)
## Q* = 207.44, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

Model summary:

  • MASE measure improves to 0.254704 (vs 0.4334691 of the general ETS(A,A,N)).
  • Serial autocorrelations are highly significant

d. ETS(A,A,A)

This Additive error model has Additive trend and Additive seasonal component. Note - This is also the Local Additive Seasonal Model: ETS(A,A,A).

Add.SS.AAA = ets(solar, model="AAA")
summary(Add.SS.AAA)
## ETS(A,Ad,A) 
## 
## Call:
##  ets(y = solar, model = "AAA") 
## 
##   Smoothing parameters:
##     alpha = 0.9999 
##     beta  = 1e-04 
##     gamma = 1e-04 
##     phi   = 0.9388 
## 
##   Initial states:
##     l = 11.154 
##     b = 0.7632 
##     s = -10.4919 -8.137 -3.348 2.5794 8.08 11.1219
##            9.9586 6.9916 1.9573 -1.8565 -7.1607 -9.6946
## 
##   sigma:  2.3446
## 
##      AIC     AICc      BIC 
## 5428.422 5429.489 5509.282 
## 
## Training set error measures:
##                       ME     RMSE      MAE       MPE     MAPE      MASE
## Training set -0.01091357 2.314163 1.498521 -1.468083 12.44796 0.2461797
##                   ACF1
## Training set 0.1700724
checkresiduals(Add.SS.AAA)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(A,Ad,A)
## Q* = 210.76, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

Model summary:

  • MASE measure improves to 0.2461797 (vs 0.4334691 of the general ETS(A,A,N)).
  • Serial autocorrelations are highly significant

2. Multiplicative error models: ETS(M,-,-)

Let’s not consider drift and damping for each model, to keep the reporting simple (the best model will be chosen by auto-fitting at the end anyway).

a. ETS(M,N,N)

This Multiplicative error model has no trend or seasonal component. Note - This is also the Holt-Winters’ Local Level Model: ETS(M,N,N)

Mul.SS.MNN = ets(solar, model="MNN")
summary(Mul.SS.MNN)
## ETS(M,N,N) 
## 
## Call:
##  ets(y = solar, model = "MNN") 
## 
##   Smoothing parameters:
##     alpha = 0.9999 
## 
##   Initial states:
##     l = 4.4856 
## 
##   sigma:  0.3868
## 
##      AIC     AICc      BIC 
## 6619.776 6619.812 6633.253 
## 
## Training set error measures:
##                       ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 0.001004352 4.569136 3.877241 -5.195978 27.31733 0.6369599
##                   ACF1
## Training set 0.6678785
checkresiduals(Mul.SS.MNN) 

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(M,N,N)
## Q* = 1334.8, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

Model summary:

  • MASE measure degrades to 0.6369599 (vs 0.2035619 of the best Holt-Winters model without the error component).
  • Serial autocorrelations are highly significant

Comment -

Note - the MNN model is slightly worse than the ANN model (MASE 0.6369599 vs 0.6368203). This is because our time series is homoscedastic (equal variance of the error terms), as seen from its time series plot, and for a homoscedastic series a model with additive error terms fits better than one with multiplicative error terms.

b. ETS(M,A,N)
Mul.SS.MAN = ets(solar, model="MAN")
summary(Mul.SS.MAN)
## ETS(M,A,N) 
## 
## Call:
##  ets(y = solar, model = "MAN") 
## 
##   Smoothing parameters:
##     alpha = 0.9999 
##     beta  = 0.0018 
## 
##   Initial states:
##     l = 7.8559 
##     b = 2.4488 
## 
##   sigma:  0.3004
## 
##      AIC     AICc      BIC 
## 6433.364 6433.456 6455.825 
## 
## Training set error measures:
##                     ME     RMSE      MAE       MPE     MAPE      MASE      ACF1
## Training set -1.466339 4.822041 3.964894 -17.35094 30.69019 0.6513597 0.6697971
checkresiduals(Mul.SS.MAN)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(M,A,N)
## Q* = 2007.1, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

Model summary:

  • MASE measure degrades to 0.6513597 (vs 0.6369599 of ETS(M,N,N))
  • Serial autocorrelations are highly significant

c. ETS(M,A,A)
Mul.SS.MAA = ets(solar, model="MAA")
summary(Mul.SS.MAA)
## ETS(M,Ad,A) 
## 
## Call:
##  ets(y = solar, model = "MAA") 
## 
##   Smoothing parameters:
##     alpha = 0.5273 
##     beta  = 0.0035 
##     gamma = 0.0104 
##     phi   = 0.8 
## 
##   Initial states:
##     l = 11.5982 
##     b = 2.6304 
##     s = -10.4131 -6.5688 -2.8947 0.153 7.6029 8.4081
##            8.1855 8.4954 2.4863 -2.2582 -6.2468 -6.9497
## 
##   sigma:  0.3329
## 
##      AIC     AICc      BIC 
## 6469.079 6470.146 6549.940 
## 
## Training set error measures:
##                       ME     RMSE      MAE       MPE     MAPE      MASE
## Training set -0.02086702 3.282879 2.311939 -5.186754 18.98277 0.3798095
##                  ACF1
## Training set 0.551665
checkresiduals(Mul.SS.MAA) 

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(M,Ad,A)
## Q* = 346.85, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

Model summary:

  • MASE measure improves to 0.3798095 (vs 0.6513597 of ETS(M,A,N))
  • Serial autocorrelations are highly significant

d. ETS(M,A,M)

Note - This is also the Non-linear seasonal model, namely, the Multiplicative Seasonal and Error Model: ETS(M,A,M). When the pattern of seasonality changes through time, a linear seasonal structure cannot be used; in that case, we use nonlinear seasonal models. It is hard to tell whether our solar radiation series’ seasonality pattern changes through time, so let’s consider the nonlinear multiplicative seasonal component (MAM) model,

Mul.SS.MAM = ets(solar, model="MAM")
summary(Mul.SS.MAM)
## ETS(M,Ad,M) 
## 
## Call:
##  ets(y = solar, model = "MAM") 
## 
##   Smoothing parameters:
##     alpha = 0.7922 
##     beta  = 6e-04 
##     gamma = 0.0707 
##     phi   = 0.98 
## 
##   Initial states:
##     l = 9.3693 
##     b = 1.2404 
##     s = 0.6953 0.3066 0.5802 0.9879 1.3594 1.5469
##            1.4723 1.4055 1.227 0.9573 0.6959 0.7656
## 
##   sigma:  0.2274
## 
##      AIC     AICc      BIC 
## 5953.502 5954.569 6034.363 
## 
## Training set error measures:
##                     ME    RMSE      MAE       MPE     MAPE      MASE      ACF1
## Training set 0.1750605 3.02791 1.961614 -5.497209 17.21119 0.3222574 0.2565219
checkresiduals(Mul.SS.MAM)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(M,Ad,M)
## Q* = 102.28, df = 24, p-value = 1.222e-11
## 
## Model df: 0.   Total lags used: 24

Model summary:

  • MASE measure improves to 0.3222574 (vs 0.3798095 of ETS(M,A,A))
  • Serial autocorrelations are highly significant

e. ETS(M,M,N)

We know our series is seasonal, so a model without a seasonal component (MMN) should fit poorly. Let’s check anyway,

Mul.SS.MMN = ets(solar, model="MMN")
summary(Mul.SS.MMN)
## ETS(M,M,N) 
## 
## Call:
##  ets(y = solar, model = "MMN") 
## 
##   Smoothing parameters:
##     alpha = 0.9999 
##     beta  = 0.0063 
## 
##   Initial states:
##     l = 9.1784 
##     b = 1.1871 
## 
##   sigma:  0.3511
## 
##      AIC     AICc      BIC 
## 6609.163 6609.255 6631.624 
## 
## Training set error measures:
##                     ME     RMSE      MAE      MPE     MAPE     MASE      ACF1
## Training set -1.644301 5.221518 4.235504 -14.9362 30.65972 0.695816 0.6833568
checkresiduals(Mul.SS.MMN)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(M,M,N)
## Q* = 1371.3, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

Model summary:

  • MASE measure degrades to 0.695816 (vs 0.3222574 of ETS(M,A,M))
  • Serial autocorrelations are highly significant

Most appropriate State-Space model based on MASE (Model Selection)

Let’s summarise all the State-Space models considered with their corresponding forecast accuracy measure, MASE,

  • Additive error models: ETS(A,-,-)
    1. ETS(A,N,N) - Holt’s Local Level Model | 0.6368203
    2. ETS(A,A,N) - Holt’s Local Trend Model
      1. ETS(A,A,N) | 0.4334691
      2. ETS(A,A,N) with drift | 0.6385788
      3. ETS(A,A,N) with damping | 0.4334691
    3. ETS(A,N,A) | 0.254704
    4. ETS(A,A,A) - Holt’s Local Additive Seasonal Model | 0.2461797

Best State-Space model with additive error term is ETS(A,A,A) with MASE measure of 0.2461797.

  • Multiplicative error models: ETS(M,-,-)
    1. ETS(M,N,N) - Holt-Winters’ Local Level Model | 0.6369599
    2. ETS(M,A,N) - Holt-Winters’ Local Trend Model | 0.6513597
    3. ETS(M,A,A) - Holt-Winters’ Local Additive Seasonal Model | 0.3798095
    4. ETS(M,A,M) | 0.3222574
    5. ETS(M,M,N) | 0.695816

Best State-Space model with Multiplicative error term is ETS(M,A,M) with MASE measure of 0.3222574.

It seems that, in general, models with additive error components have better forecast accuracy than their multiplicative-error counterparts. Note that there are many other possible models we have not considered; to cover them, we can auto-fit the best State-Space model using the code below,

autofit.SS = ets(solar)
summary(autofit.SS)
## ETS(A,Ad,A) 
## 
## Call:
##  ets(y = solar) 
## 
##   Smoothing parameters:
##     alpha = 0.9999 
##     beta  = 1e-04 
##     gamma = 1e-04 
##     phi   = 0.9388 
## 
##   Initial states:
##     l = 11.154 
##     b = 0.7632 
##     s = -10.4919 -8.137 -3.348 2.5794 8.08 11.1219
##            9.9586 6.9916 1.9573 -1.8565 -7.1607 -9.6946
## 
##   sigma:  2.3446
## 
##      AIC     AICc      BIC 
## 5428.422 5429.489 5509.282 
## 
## Training set error measures:
##                       ME     RMSE      MAE       MPE     MAPE      MASE
## Training set -0.01091357 2.314163 1.498521 -1.468083 12.44796 0.2461797
##                   ACF1
## Training set 0.1700724
checkresiduals(autofit.SS) # Autofit finds the ETS(A,Ad,A).

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(A,Ad,A)
## Q* = 210.76, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

The auto-fitted model is ETS(A,Ad,A), which has the smallest MASE value (0.2461797) among all admissible State-Space models.

Conclusion of State-Space method

The best State-Space model, which gives the most accurate forecasts based on the MASE measure, is ETS(A,Ad,A), with the lowest MASE measure of 0.2461797.

Overall Most Appropriate Regression model (Model Selection)

Based on the 4 Time series regression methods considered, the best model as per MASE measure for each method is summarized below,

  • A. Best Distributed lag model is - Autoregressive Distributed Lag model ARDL(9,12) with MASE measure of 0.3927514
  • B. Best Dynamic linear model is - None (no intervention points were present)
  • C. Best Exponential smoothing model is - Holt-Winters’ multiplicative Damped model with MASE measure of 0.2035619
  • D. Best State-Space model is - ETS(A,Ad,A) with MASE measure of 0.2461797

Best Time Series regression model for Forecasting

Best Time Series regression model is - Holt-Winters’ multiplicative Damped model (\(Mul.hw.damped\)) with MASE measure of 0.2035619.

Detailed Graphical and statistical tests of assumptions for \(Mul.hw.damped\) model (Residual Analysis)

Residual analysis to test model assumptions.

Let’s perform a detailed residual analysis to check whether any model assumptions have been violated.

The estimator error (or residual) is defined by:

\(\hat{\epsilon}_t = Y_t - \hat{Y}_t\) (i.e. observed value minus fitted value)

The following problems are to be checked,

  1. linearity in distribution of error terms
  2. The mean value of residuals is zero
  3. Serial autocorrelation
  4. Normality of distribution of error terms

Let’s first apply a diagnostic check using the checkresiduals() function,

checkresiduals(Mul.hw.damped)

## 
##  Ljung-Box test
## 
## data:  Residuals from Damped Holt-Winters' multiplicative method
## Q* = 23.57, df = 24, p-value = 0.4864
## 
## Model df: 0.   Total lags used: 24

  1. From the residuals plot, the residuals are randomly distributed around the mean, so linearity in the distribution of the error terms is not violated.

  2. To test whether the mean value of the residuals is zero, let’s calculate it as,

mean(Mul.hw.damped$model$residuals)
## [1] 0.0190173

As the mean value of the residuals is close to 0, the zero-mean residuals assumption is not violated.
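
A one-sample t-test of \(H_0: \mu_\epsilon = 0\) can make this check formal (a quick sketch, not part of the original output),

t.test(as.numeric(Mul.hw.damped$model$residuals), mu = 0) # two-sided test of zero mean residuals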

  3. In the checkresiduals() output, the Ljung-Box test result is displayed. The hypotheses of this test are,

\(H_0\) : the series of residuals exhibits no serial autocorrelation of any order up to lag p
\(H_a\) : the series of residuals exhibits serial autocorrelation of some order up to lag p

From the Ljung-Box test output, since p (0.4864) > 0.05, we do not reject the null hypothesis of no serial autocorrelation.

Thus, according to this test and ACF plot, we can conclude that the serial correlation left in residuals is insignificant.
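
As a cross-check, the same statistic can be reproduced directly with Box.test() from base R (a quick sketch),

Box.test(as.numeric(Mul.hw.damped$model$residuals), lag = 24, type = "Ljung-Box") # same Q* computation as checkresiduals() above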

  4. From the histogram shown by checkresiduals(), the residuals seem to follow normality. Let’s test this statistically,

\(H_0\) : the residuals are Normally distributed
\(H_a\) : the residuals are not Normally distributed

shapiro.test(Mul.hw.damped$model$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  Mul.hw.damped$model$residuals
## W = 0.3802, p-value < 2.2e-16

From the Shapiro-Wilk test, since p < 0.05, we reject the null hypothesis of normality. Thus, the residuals of Mul.hw.damped are not normally distributed.
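
A normal Q-Q plot shows this departure graphically (a quick sketch, not in the original output),

qqnorm(as.numeric(Mul.hw.damped$model$residuals), main = "Q-Q plot of Mul.hw.damped residuals") # points should follow the line under normality
qqline(as.numeric(Mul.hw.damped$model$residuals), col = "red")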

Summarizing the residual analysis of the \(Mul.hw.damped\) model:

Assumption 1: The error terms are randomly distributed and thus show linearity: Not violated
Assumption 2: The mean value of the errors is zero (zero-mean residuals): Not violated
Assumption 3: The error terms are independently distributed, i.e. they are not autocorrelated: Not violated
Assumption 4: The errors are normally distributed: Violated

Although the normality-of-residuals assumption is violated, ‘There is no normality assumption in fitting an exponential smoothing model’ (Rob Hyndman 2013). With no critical residual assumption violated, the Holt-Winters’ multiplicative damped model is suitable for accurate forecasting. Let’s forecast the next 2 years,

Forecasting

Using the model with the best MASE measure, the Holt-Winters’ multiplicative damped model (Mul.hw.damped), let’s estimate and plot 2 years (Jan 2015 - Dec 2016) of forecasts. The following code chunk displays the fitted model, the observed series and the forecasts 2 years ahead.

Note - since we are forecasting with a Holt-Winters’ model, which does not incorporate regressor variables, we do not need the regressor (Precipitation) values (unlike with Distributed Lag models).

Mul.hw.damped <- hw(solar,seasonal="multiplicative", damped = TRUE, h=2*frequency(solar)) # Revisit the fitted model

plot(Mul.hw.damped, ylab="Monthly solar radiation reaching ground", main = "Forecasts from Holt-Winters' damped multiplicative method", plot.conf=FALSE, type="l", fcol="white", xlab="Year") 
lines(fitted(Mul.hw.damped), col="blue", lty=1) 
lines(Mul.hw.damped$mean, type="l", col="blue")
legend("topleft",lty=1, pch=1, col=c(1,4), 
       c("Raw Data", "Holt Winters' fitted data"))

In the plot above, the blue line after 2014 shows the forecasted 2 years of Solar Radiation data. The gray regions around it show the 80% and 95% prediction intervals of the forecasts.

The point forecast values are,

Mul.hw.damped$mean
##            Jan       Feb       Mar       Apr       May       Jun       Jul
## 2015  5.611616  7.060438 10.515176 12.993446 16.284381 18.261020 19.031597
## 2016  5.616270  7.066360 10.524088 13.004566 16.298445 18.276925 19.048304
##            Aug       Sep       Oct       Nov       Dec
## 2015 17.105654 13.764743  9.774733  6.581938  5.163049
## 2016 17.120781 13.777000  9.783493  6.587872  5.167730

Task 2: Investigating the Relationship Between Property Price Index and Population Change in Victoria: An Analysis of Spurious Correlation

Data Description

The dataset holds 2 columns and 54 observations: the quarterly Residential Property Price Index (PPI) in Melbourne and the quarterly population change over the previous quarter in Victoria, between September 2003 and December 2016.

Objective

Our aim for the given dataset is to analyse whether the correlation between the quarterly Property Price Index (PPI) in Melbourne and quarterly population change over the previous quarter in Victoria is spurious or not.

Read Data

data <- read.csv("C:/Users/admin/Downloads/data2.csv")
data.ts = ts(data, start=c(2003,3),frequency = 4)

head(data.ts)
##         Quarter price change
## 2003 Q3      41  60.7  14017
## 2003 Q4       1  62.1  12350
## 2004 Q1      28  60.8  17894
## 2004 Q2      15  60.9   9079
## 2004 Q3      42  60.9  16210
## 2004 Q4       2  62.4  13788

Analysing Spurious Correlation

Identifying Response and Covariate time series

The response, \(Y\) = {\(Y_t\)}, is the Property Price Index (PPI) and the covariate time series, \(X\) = {\(X_t\)}, is the Population Change that we hope will explain \(Y\).

  • Response time series = Y = Quarterly Property Price Index (PPI) in Melbourne
  • Co-variate time series = X = Quarterly Population Change over previous quarter in Victoria

Read Response and Covariate TS -

data_PPI = data.ts[,2]
head(data_PPI) #Response TS
##      Qtr1 Qtr2 Qtr3 Qtr4
## 2003           60.7 62.1
## 2004 60.8 60.9 60.9 62.4
data_pchange = data.ts[,3]
head(data_pchange) #Covariate TS
##       Qtr1  Qtr2  Qtr3  Qtr4
## 2003             14017 12350
## 2004 17894  9079 16210 13788
data.joint=ts.intersect(data_PPI,data_pchange) # Response and Covariate TS together
head(data.joint)
##         data_PPI data_pchange
## 2003 Q3     60.7        14017
## 2003 Q4     62.1        12350
## 2004 Q1     60.8        17894
## 2004 Q2     60.9         9079
## 2004 Q3     60.9        16210
## 2004 Q4     62.4        13788

The sample cross-correlation function (sample CCF)

The sample cross-correlation function (sample CCF) is defined by,

\(r_k(X,Y) = \frac{\sum_t (X_{t+k} - \overline{X})(Y_t - \overline{Y})}{\sqrt{\sum_t (X_t - \overline{X})^2}\sqrt{\sum_t (Y_t - \overline{Y})^2}}\)

Sample cross-correlations that are larger than \(1.96/\sqrt{n}\) in magnitude are deemed significantly different from zero.
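
For this dataset (\(n = 54\) quarterly observations) the bound is roughly \(\pm 0.27\); a quick check,

1.96 / sqrt(nrow(data.joint)) # approximate 95% significance bound for the sample CCF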

Spurious cross-correlations may arise from the non-stationarity of the interacting time series. Let’s check for non-stationarity in the response and covariate series by analysing the time series plots and testing statistically as well,

Analysing Time Series plots for non-stationarity -

plot(data.joint,yax.flip=T)

The time series plots of both PPI and Population Change suggest non-stationarity, as there is a clear upward trend in both series. Let’s confirm the non-stationarity of both the response and covariate TS statistically,

Test Stationarity of Response and Covariate time series -

Let’s use the ADF (Augmented Dickey-Fuller) test. The test hypotheses are,

\(H_0\) : Time series is Difference non-stationary
\(H_a\) : Time series is Stationary

adf.test(data.joint[,1]) # For Property Price Index (PPI)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  data.joint[, 1]
## Dickey-Fuller = -1.3264, Lag order = 3, p-value = 0.8458
## alternative hypothesis: stationary
adf.test(data.joint[,2]) # For Population Change
## 
##  Augmented Dickey-Fuller Test
## 
## data:  data.joint[, 2]
## Dickey-Fuller = -1.603, Lag order = 3, p-value = 0.7344
## alternative hypothesis: stationary

For both the response and covariate TS, since the p-value > 0.05, we fail to reject the null hypothesis of non-stationarity. We conclude that both series are non-stationary at the 5% level of significance.

Thus, if we see significant correlation between the PPI and Population change series, it is likely spurious, arising from the non-stationarity of the interacting variables. We can check this by looking at the cross-correlations between the PPI response and the Population change covariate time series using ccf(),

Sample CCF plot -

ccf(as.vector(data.joint[,1]), as.vector(data.joint[,2]),ylab='CCF', main = "Sample CCF between PPI and population change")

Nearly all of the cross-correlations are significantly different from zero according to the \(1.96/\sqrt{n}\) rule. The non-stationarity of the PPI and Population change series is the most likely cause of the spurious correlations found between the two series.

Prewhitening

Let’s remove the spurious correlation between the PPI and Population change series by prewhitening, an approach that disentangles the linear association between the two series from their autocorrelation.

An approximate prewhitening can be achieved by differencing (hence eliminating the non-stationarity in these 2 series). Further prewhitening can be done using prewhiten(), which fits an AR model to the first series and filters both series with it, so that the first series becomes (approximately) white noise.

Since we have a seasonal component in the response and covariate time series (assumed rather than tested, as the data is quarterly), let’s perform ordinary and seasonal differencing on both the PPI and Population change series and see if the cross-correlations are removed,

# Code gist - Inner diff(TS,freq) does seasonal differencing. Outer diff(TS) does ordinary differencing
data.dif=ts.intersect(diff(diff(data_PPI,4)),diff(diff(data_pchange,4))) # Note quarterly frequency of 4 is used in seasonal differencing
ccf(as.vector(data.dif[,1]), as.vector(data.dif[,2]),ylab='CCF', main = "Sample CCF between PPI and population change after differencing")

From the sample CCF plot, we notice that some cross-correlations are still significant after differencing.

Next, let’s further prewhiten the differenced series using prewhiten() and check the sample CCF plot,

prewhiten(as.vector(data.dif[,1]),as.vector(data.dif[,2]),ylab='CCF', main="Sample CCF after prewhitening") # library(TSA)

From the plot above, all the cross-correlations are insignificant according to the \(1.96/\sqrt{n}\) rule. Hence, the spurious correlation has been eliminated from the PPI and Population change series.
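
For intuition, prewhiten()’s behaviour can be approximated manually (a minimal sketch, not part of the original analysis): fit an AR model to the first series, filter both series with the same AR coefficients, and cross-correlate the filtered, approximately white-noise series,

ar.fit <- ar(data.dif[,1]) # AR order chosen by AIC
x.f <- na.omit(stats::filter(data.dif[,1], filter = c(1, -ar.fit$ar), sides = 1)) # AR residual filter
y.f <- na.omit(stats::filter(data.dif[,2], filter = c(1, -ar.fit$ar), sides = 1)) # same filter applied to the second series
ccf(as.vector(x.f), as.vector(y.f), ylab = 'CCF', main = "Manual prewhitening (sketch)")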

Thus, it seems that the Property Price Index (PPI) and Population change are in fact largely uncorrelated, and the strong cross-correlations found between the raw series are indeed spurious.

Reference List

Rob Hyndman (2013) Does the Holt-Winters algorithm for exponential smoothing in time series modelling require the normality assumption in residuals?, Stack Exchange website, accessed 26 September 2023. https://stats.stackexchange.com/questions/64911/does-the-holt-winters-algorithm-for-exponential-smoothing-in-time-series-modelli