26-09-2023library(GGally) # for ggpairs()
library(TSA) # season(), prewhiten() and other functions
library(tseries) # adf.test()
library(forecast) # BoxCox.lambda()
library(dLagM) # For DLM modelling
library(car) # for vif()
library(tis) # for Lag()
library(dynlm) # for Dynamic linear modeling
library(stats) # for classical decomposition
library(x12) # X-12-ARIMA decomposition
library(lmtest) # for bgtest()
A significance level \(\alpha=5\%\) is used.
The dataset hold 2 columns and 660 observations. They are, monthly averages of horizontal solar radiation and precipitation measured at the same points between January 1960 and December 2014.
Our aim for the dataset is to give best 2 years ahead forecasts by determining the most accurate and suitable regression model that determines the monthly Solar radiation in terms of MASE. A descriptive analysis will be conducted initially. Model-building strategy will be applied to find the best fitting model from the time series regression methods (dLagM package), dynamic linear models (dynlm package), and exponential smoothing and corresponding state-space models.
Out of various different error measures to assess the forecast accuracy, Mean absolute scaled error (MASE) is a generally applicable measure of forecast accuracy and is obtained by scaling the errors based on the in-sample MAE from the naive forecast method. It is the only available method which can be used in all circumstances and can be used to compare forecast accuracy between series as it is scale-free.
solar_precipitation <- read.csv("C:/Users/admin/Downloads/data1.csv")
head(solar_precipitation)
## solar ppt
## 1 5.051729 1.333
## 2 6.415832 0.921
## 3 10.847920 0.947
## 4 16.930264 0.615
## 5 24.030797 0.544
## 6 26.298202 0.703
For fitting a regression model, the response is the monthly solar radiation (\(solar\)) and the regressor variable is the monthly precipitation (\(precipitation\)).
Both variables are continuous variables.
Lets first get the regressor and response as TS objects,
solar = ts(solar_precipitation[,1], start=c(1960,1),frequency = 12)
precipitation = ts(solar_precipitation[,2], start=c(1960,1),frequency = 12)
data.ts = ts(solar_precipitation, start = c(1960,1), frequency = 12) # Y and x in single dataframe
Lets scale, center and plot the 2 variables together,
data.scale = scale(data.ts)
plot(data.scale, plot.type="s", col=c("blue", "green"), main = "Monthly solar radiation and precipitation (Blue - Response and Green - regressor)")
It is hard to read the correlations between the regressor and the response. But it is fair to say the 2 variables show some correlations. Lets check for correlation statistically using ggpairs(),
ggpairs(data = solar_precipitation, columns = c(1,2), progress = FALSE) #library(ggally)
As shown in the matrix above, the response \(Solar radiation\) and regressor \(monthly precipitation\) are correlated by -0.454 correlation coefficient. We can generate regression model based on this correlation. First, lets look at the descriptive statistics
Since we are generating regression model which estimates the response, \(SolarRadiation\), lets focus on Solar radiation’s statistics
summary(solar)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.582 9.739 16.460 17.788 23.711 54.056
The mean and median of the Solar radiation are very close indicating symmetrical distribution.
The time series plot for our data is generated using the following code chunk,
plot(solar,ylab='Monthly solar radiation reaching ground',xlab='Year',
main = "Time series plot of the monthly solar radiation reaching ground")
points(y=solar,x=time(solar), pch=as.vector(season(solar)), col=rainbow(12))
Plot Inference :
From the plot, we can comment on the time series’s,
Trend: The overall shape of the trend seems to be flat. Thus, indicating stationarity.
Seasonality: From the plot, seasonal behavior is quite evident every year. This needs to be confirmed using statistical tests.
Change in Variance: A change in variance can be seen as the variation is much higher during the earlier years than the latter.
Behavior: We notice mixed behavior of MA and AR series. AR behavior is dominant as we obverse more following data points. MA behavior is evident due to up and down fluctuations in the data points.
Intervention/Change points: No particular intervention point is noticed. This needs to be confirmed using statistical tests. Note, peak values seen in the plot (1998 July for example) are not intervention points as the mean solar radiation before and after (even after a long period) does not change. Maybe, 2013 January might be an intervention point as the variance of the series decreases drastically from this point. We need to check this by fitting a Dynamic Linear model and check if the pulse at 2013 January is significant or not.
par(mfrow=c(2,1))
acf(solar, main="ACF of solar radiation")
pacf(solar, main ="PACF of solar radiation")
par(mfrow=c(1,1))
ACF plot: We notice multiple autocorrelations are significant. We see a wave pattern. Thus, seasonal behavior is observed. We observed seasonality in time series plot as well.
PACF plot: We see 1 high vertical spike indicating non stationary series. Also, the second correlation bar is significant as well.
Many model estimating procedures assume normality of the residuals. If this assumption doesn’t hold, then the coefficient estimates are not optimum. Lets look at the Quantile-Quantile (QQ) plot to to observe normality visually and the Shapiro-Wilk test to statistically confirm the result.
qqnorm(solar, main = "Normal Q-Q Plot of Solar Radiation Time Series")
qqline(solar, col = 2)
We see deviations from normality. Clearly, both the tails are off and most of the data in middle is off the line as well. Lets check statistically using shapiro-wilk test. Lets state the hypothesis of this test,
\(H_0\) : Time series is Normally
distributed
\(H_a\) : Time
series is not normal
shapiro.test(solar)
##
## Shapiro-Wilk normality test
##
## data: solar
## W = 0.93637, p-value = 3.641e-16
From the Shapiro-Wilk test, since p < 0.05 significance level, we reject the null hypothesis that states the data is normal. Thus, solar radiation series is not normally distributed.
The PACF plot of Solar radiation the time series at the descriptive analysis stage of time series tells us nonstationarity in our time series. Lets use ADF and PP tests,
Using ADF (Augmented Dickey-Fuller) test :
Lets confirm the non-stationarity using Dickey-Fuller Test or ADF
test. Lets state the hypothesis,
\(H_0\) : Time series is Difference
non-stationary
\(H_a\) : Time
series is Stationary
adf.test(solar) #library(tseries)
## Warning in adf.test(solar): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: solar
## Dickey-Fuller = -4.7661, Lag order = 8, p-value = 0.01
## alternative hypothesis: stationary
since p-value < 0.05, we reject null hypothesis of non stationarity. we can conclude that the series is stationary at 5% level of significance.
Using PP (Phillips-Perron) test :
The null and alternate hypothesis are same as ADF test.
PP.test(solar)
##
## Phillips-Perron Unit Root Test
##
## data: solar
## Dickey-Fuller = -8.4247, Truncation lag parameter = 6, p-value = 0.01
According to the PP tests, solar radiation series is stationary at 5% level
Lets perform with Box-Cox transformation,
To improve normality in our Solar Radiation time series, lets test Box-Cox transformations on the series
lambda = BoxCox.lambda(solar, method = "loglik") # library(forecast)
BC.solar = BoxCox(solar, lambda = lambda)
Visually comparing the time series plots before and after box-cox transformation,
par(mfrow=c(2,1))
plot(BC.solar,ylab='Monthly Solar Radiation',xlab='Time',
type='o', main="Box-Cox Transformed Solar Radiation Time Series")
points(y=BC.solar,x=time(BC.solar))
plot(solar,ylab='ASX Price Index',xlab='Time',
type='o', main="Original Solar Radiation Time Series")
points(y=solar,x=time(solar))
par(mfrow=c(1,1))
From the plot, almost no improvement in the variance of the time series is visible after BC transformation. Lets check for normality using shapiro test,
shapiro.test(BC.solar)
##
## Shapiro-Wilk normality test
##
## data: BC.solar
## W = 0.98834, p-value = 4.126e-05
From the Shapiro-Wilk test, since p < 0.05 significance level, we reject the null hypothesis that states the data is normal. Thus, BC Transformed Solar radiation is not normal.
The BC transformed Solar radiation series is Stationary and not normal. BC transformation was not effective.
To observe the individual effects of the existing components and historical effects occurred in the past, lets perform decomposition of the Solar Radiation time series. The time series can be decomposed into are seasonal, trend and remainder components.
Three main decomposition methods for time series are known, classical decomposition, X-12-ARIMA decomposition and Loess (STL) decomposition.
To implement the classical decomposition, we use decompose() function from the stats package.
fit.classical.add <- decompose(solar, type="additive")
plot(fit.classical.add)
fit.classical.mult <- decompose(solar, type="multiplicative")
plot(fit.classical.mult)
Notice that a trend component was extracted from the series which wasn’t identified during the descriptive analysis stage. We got very similar estimates from the additive and multiplicative methods. It seems from the remainder series that the additive model handles the decomposition slightly better (mean much closer to zero). This hints that our final model would have additive components rather than multiplicative as per classical decomposition for our Solar Radiation series.
Although using the classical decomposition, a trend component was identified, we do not know how significant this trend effect is. X-12-ARIMA decomposition can help us detect the significance of the trend component. We use the function x12() to implement the X-12-ARIMA decomposition.
fit.x12 = x12(solar)
plot(fit.x12 , sa=TRUE , trend=TRUE)
From the plot, we can see the Seasonally Adjusted series does not follow
the original series closely. This implies, that the original series is
affected by seasonality a lot, meaning seasonality is highly significant
in the Solar Radiation series. On the contrary, the Trend series does
not deviate a lot from the original series (especially at the ups and
downs), implying the Original series has not so significant trend
component.
Another decomposition method, STL has advantage of flexibility over classical decomposition. stl() function is used with t.window and s.window arguments to control rate of change of trend and seasonal components. Lets set t.window to 15 and look the STL decomposed plots,
We can adjust the series for seasonality by subtracting the seasonal component from the original series using the following code chunk,
# Code gist - Apply STL decomposition to get seasonally adjusted and trend adjusted and visually compare w.r.t to original time series
stl.solar <- stl(window(solar,start=c(1960,1)), t.window=15, s.window="periodic", robust=TRUE)
par(mfrow=c(3,1))
plot(seasadj(stl.solar), ylab='Solar Radiation',xlab='Time', main = "Seasonally adjusted Solar Radiation")
points(y=seasadj(stl.solar),x=time(seasadj(stl.solar)), pch=as.vector(season(seasadj(stl.solar))))
plot(solar,ylab='Solar Radiation',xlab='Time',
type='o', main="Original Solar Radiation Time Series")
points(y=solar,x=time(solar), pch=as.vector(season(solar)))
stl.solar.trend = stl.solar$time.series[,"trend"] # Extract the trend component from the output
stl.solar.trend.adjusted = solar - stl.solar.trend
plot(stl.solar.trend.adjusted, ylab='Solar Radiation',xlab='Time', main = "Trend adjusted Solar Radiation")
points(y=stl.solar.trend.adjusted,x=time(stl.solar.trend.adjusted), pch=as.vector(season(stl.solar.trend.adjusted)))
par(mfrow=c(1,1))
The Seasonally adjusted series is completely different from the original series, implying the Seasonal component of the series is the most influential component of the original series. No change is visually seen in the trend adjusted series compared to the original series. Thus, trend component is insignificant.
Trend component is insignificant when comparing trend adjusted series with the original series. The seasonal component of the series is the most influential component of the original series.
Time series regression methods namely,
Based on whether the lags are known (Finite DLM) or undetermined (Infinite DLM), 4 major modelling methods will be tested, namely,
The response of a finite DLM model with 1 regressor is represented as
shown below,
\(Y_t = \alpha + \sum_{s=0}^{q} \beta_s
X_{t-s} + \epsilon_t\)
where,
Now, lets use AIC and BIC score to find the best lag length for Finite DLM model,
finiteDLMauto(formula = solar ~ ppt, data = solar_precipitation, q.min = 1, q.max = 12,
model.type = "dlm", error.type = "AIC", trace = TRUE)
## q - k MASE AIC BIC GMRAE MBRAE R.Adj.Sq Ljung-Box
## 12 12 1.55160 4578.787 4645.895 1.32250 0.35491 0.30769 0
## 11 11 1.56213 4590.961 4653.617 1.32784 0.42614 0.30199 0
## 10 10 1.57800 4602.658 4660.858 1.36315 0.09680 0.29615 0
## 9 9 1.59312 4615.084 4668.827 1.36372 2.68197 0.28882 0
## 8 8 1.60481 4625.986 4675.267 1.39224 1.79485 0.28251 0
## 7 7 1.60704 4632.716 4677.532 1.39082 -60.50677 0.28096 0
## 6 6 1.60753 4637.489 4677.837 1.39182 0.52400 0.28214 0
## 5 5 1.61385 4644.622 4680.499 1.39327 0.55267 0.28072 0
## 4 4 1.64636 4663.600 4695.003 1.46450 -1.04420 0.26577 0
## 3 3 1.66270 4688.551 4715.478 1.44133 -1.39824 0.24326 0
## 2 2 1.67597 4712.649 4735.095 1.42249 0.14593 0.22173 0
## 1 1 1.68846 4728.713 4746.676 1.43103 -1.07642 0.21036 0
q = 12 has the smallest AIC and BIC scores. Fit model with q = 12,
DLM.model = dlm(formula = solar ~ ppt, data = solar_precipitation, q = 12)
Note - We are using solar and not the BC.solar (BC transformed solar radiation series) as normality is violated in both of these.
Diagnostic check for DLM.model (Residual analysis):
We can apply a diagnostic check using checkresiduals() function from the forecast package.
checkresiduals(DLM.model$model$residuals) # forecast package
##
## Ljung-Box test
##
## data: Residuals
## Q* = 1045.6, df = 10, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 10
In this output,
Model Summary for Finite DLM model (DLM.model) :
summary(DLM.model)
##
## Call:
## lm(formula = as.formula(model.formula), data = design)
##
## Residuals:
## Min 1Q Median 3Q Max
## -18.563 -5.239 -0.796 4.137 32.430
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 19.5164 1.1151 17.501 < 2e-16 ***
## ppt.t -5.8876 1.9508 -3.018 0.00265 **
## ppt.1 0.9993 2.5647 0.390 0.69694
## ppt.2 0.4343 2.5571 0.170 0.86520
## ppt.3 1.8763 2.5580 0.734 0.46352
## ppt.4 1.7459 2.5587 0.682 0.49529
## ppt.5 3.3279 2.5601 1.300 0.19410
## ppt.6 0.7751 2.5617 0.303 0.76230
## ppt.7 1.7937 2.5615 0.700 0.48402
## ppt.8 0.2827 2.5593 0.110 0.91207
## ppt.9 -1.1022 2.5615 -0.430 0.66712
## ppt.10 -1.9333 2.5508 -0.758 0.44880
## ppt.11 -0.5613 2.5532 -0.220 0.82605
## ppt.12 -5.3492 1.9216 -2.784 0.00553 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.181 on 634 degrees of freedom
## Multiple R-squared: 0.3216, Adjusted R-squared: 0.3077
## F-statistic: 23.12 on 13 and 634 DF, p-value: < 2.2e-16
##
## AIC and BIC values for the model:
## AIC BIC
## 1 4578.787 4645.895
Lets consider the effect of collinearity on these results. To inspect this issue, we will display variance inflation factors (VIFs). If the value of VIF is greater than 10, we can conclude that the effect of multicollinearity is high.
vif(DLM.model$model) # variance inflation factors #library(car)
## ppt.t ppt.1 ppt.2 ppt.3 ppt.4 ppt.5 ppt.6 ppt.7
## 4.432762 7.774629 7.820758 7.914873 7.941510 7.944820 7.943359 7.929999
## ppt.8 ppt.9 ppt.10 ppt.11 ppt.12
## 7.916836 7.921508 7.867385 7.889225 4.508273
MASE(DLM.model)
## MASE
## DLM.model 1.5516
ATTENTION - Lets summarise the models from here on
and not go into each models details for simplicity
Polynomial DLM model helps remove the effect of multicollinearity, but our data has insignificant multicollinearity. lets fit polynomial DLM model of order 2 anyways,
PolyDLM = polyDlm(x = as.vector(precipitation), y = as.vector(solar), q = 12, k = 2, show.beta = FALSE)
summary(PolyDLM)
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -18.689 -5.452 -0.686 4.129 32.797
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 19.22679 1.10784 17.355 < 2e-16 ***
## z.t0 -1.81945 0.39287 -4.631 4.4e-06 ***
## z.t1 1.57478 0.13106 12.015 < 2e-16 ***
## z.t2 -0.15708 0.01019 -15.409 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.175 on 644 degrees of freedom
## Multiple R-squared: 0.3119, Adjusted R-squared: 0.3087
## F-statistic: 97.31 on 3 and 644 DF, p-value: < 2.2e-16
checkresiduals(PolyDLM$model$residuals)
##
## Ljung-Box test
##
## data: Residuals
## Q* = 1054.6, df = 10, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 10
MASE(PolyDLM)
## MASE
## PolyDLM 1.563035
Here the lag weights are positive and decline geometrically. This
model is called infinite geometric DLM, meaning there are infinite lag
weights. Koyck transformation is applied to implement this infinite
geometric DLM model by subtracting the first lag of geometric DLM
multiplied by \(\phi\). The Koyck
transformed model is represented as,
\(Y_t = \delta_1 + \delta_2Y_{t-1} +
\nu_t\)
where \(\delta_1 = \alpha(1-\phi), \delta_2
= \phi, \delta_3 = \beta\) and the random error after the
transformation is \(\nu_t = (\epsilon_t
-\phi\epsilon_{t-1})\).
The koyckDlm() function is used to implement a two-staged least squares method to first estimate the \(\hat{Y}_{t-1}\) and the estimate \(Y_{t}\) through simple linear regression. Lets deduce Koyck geometric GLM models using precipitation regressor,
Koyck = koyckDlm(x = as.vector(precipitation) , y = as.vector(solar) )
summary(Koyck$model, diagnostics = TRUE)
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.0926 -3.5961 0.3176 3.6103 14.8399
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.23925 0.76549 -2.925 0.00356 **
## Y.1 0.98546 0.02424 40.650 < 2e-16 ***
## X.t 5.34684 0.84383 6.336 4.37e-10 ***
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 1 656 710.7 <2e-16 ***
## Wu-Hausman 1 655 146.8 <2e-16 ***
## Sargan 0 NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.814 on 656 degrees of freedom
## Multiple R-Squared: 0.7598, Adjusted R-squared: 0.7591
## Wald test: 1104 on 2 and 656 DF, p-value: < 2.2e-16
checkresiduals(Koyck$model$residuals)
##
## Ljung-Box test
##
## data: Residuals
## Q* = 1413.2, df = 10, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 10
MASE(Koyck)
## MASE
## Koyck 1.032483
Autoregressive Distributed lag model is a flexible and parsimonious
infinite DLM. The model is represented as,
\(Y_t = \mu + \beta_0 X_t + \beta_1 X_{t-1}
+ \gamma_1 Y_{t-1} + e_t\)
Similar to the Koyck DLM, it is possible to write this model as an infinite DLM with infinite lag distribution of any shape rather than a polynomial or geometric shape. The model is denoted as ARDL(p,q). To fit the model we will use ardlDlm() function is used. Lets find the best lag length using AIC and BIC score through an iteration. Lets set max lag length to 12.
## Code gist to find the best ARDL(p,q) model as per AIC and BIC scores.
# First create an empty df. Iterate over 144 ARDL (since max lag for response and predictor of ARDL model is 12, i.e, p = q = 12 at max).
# Save the model's AIC and BIC scores through iteration and display the model with best AIC and BIC scores.
df = data.frame(matrix(
vector(), 0, 4, dimnames=list(c(), c("p","q","AIC","BIC"))),
stringsAsFactors=F) # create empty dataframe
for(i in 1:12){
for(j in 1:12){
model4.1 = ardlDlm(formula = solar ~ ppt, data = solar_precipitation, p = i, q = j)
new <- data.frame(i, j, AIC(model4.1$model), BIC(model4.1$model))
df[nrow(df) + 1, ] <- new
}
} # Iterate and save in df
head(df[order( df[,3] ),],1) # Best model as per AIC
## p q AIC BIC
## 108 9 12 2913.567 3020.94
head(df[order( df[,4] ),],1) # Best model as per BIC
## p q AIC BIC
## 12 1 12 2942.604 3014.186
ARDL(9,12) and ARDL(1,12) are the best models as per AIC and BIC
scores respectively. Now, lets fit these 2 models,
ARDL.9x12 = ardlDlm(formula = solar ~ ppt, data = solar_precipitation, p = 9, q = 12)
summary(ARDL.9x12)
##
## Time series regression with "ts" data:
## Start = 13, End = 660
##
## Call:
## dynlm(formula = as.formula(model.text), data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.9956 -1.0875 -0.1383 1.0119 17.8921
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.76694 0.50158 5.516 5.07e-08 ***
## ppt.t -0.43557 0.52227 -0.834 0.404599
## ppt.1 0.56576 0.70480 0.803 0.422437
## ppt.2 0.49265 0.71620 0.688 0.491791
## ppt.3 1.21039 0.71453 1.694 0.090768 .
## ppt.4 -1.20551 0.71517 -1.686 0.092364 .
## ppt.5 0.29587 0.71483 0.414 0.679089
## ppt.6 -2.58048 0.70972 -3.636 0.000300 ***
## ppt.7 2.23825 0.71113 3.147 0.001725 **
## ppt.8 0.85913 0.71033 1.209 0.226934
## ppt.9 -1.65311 0.52370 -3.157 0.001673 **
## solar.1 1.09400 0.03878 28.207 < 2e-16 ***
## solar.2 0.13127 0.05822 2.255 0.024491 *
## solar.3 -0.19613 0.05777 -3.395 0.000729 ***
## solar.4 -0.16119 0.05747 -2.805 0.005192 **
## solar.5 -0.17548 0.05770 -3.041 0.002454 **
## solar.6 0.10762 0.05827 1.847 0.065240 .
## solar.7 0.07751 0.05834 1.329 0.184462
## solar.8 -0.09652 0.05791 -1.667 0.096056 .
## solar.9 0.12831 0.05790 2.216 0.027041 *
## solar.10 0.05449 0.05767 0.945 0.345124
## solar.11 0.10086 0.05761 1.751 0.080505 .
## solar.12 -0.21405 0.03806 -5.624 2.81e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.248 on 625 degrees of freedom
## Multiple R-squared: 0.9495, Adjusted R-squared: 0.9477
## F-statistic: 534 on 22 and 625 DF, p-value: < 2.2e-16
checkresiduals(ARDL.9x12$model)
##
## Breusch-Godfrey test for serial correlation of order up to 26
##
## data: Residuals
## LM test = 59.82, df = 26, p-value = 0.0001773
MASE(ARDL.9x12)
## MASE
## ARDL.9x12 0.387012
ARDL.1x12 = ardlDlm(formula = solar ~ ppt, data = solar_precipitation, p = 1, q = 12)
summary(ARDL.1x12)
##
## Time series regression with "ts" data:
## Start = 13, End = 660
##
## Call:
## dynlm(formula = as.formula(model.text), data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.6092 -1.1513 -0.0874 0.9094 18.5739
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.34387 0.43301 5.413 8.80e-08 ***
## ppt.t -1.03237 0.47174 -2.188 0.029004 *
## ppt.1 1.72091 0.46747 3.681 0.000252 ***
## solar.1 1.10860 0.03862 28.706 < 2e-16 ***
## solar.2 0.09279 0.05845 1.588 0.112883
## solar.3 -0.20631 0.05835 -3.536 0.000436 ***
## solar.4 -0.14147 0.05863 -2.413 0.016097 *
## solar.5 -0.14011 0.05875 -2.385 0.017383 *
## solar.6 0.10081 0.05904 1.707 0.088233 .
## solar.7 0.06802 0.05919 1.149 0.250930
## solar.8 -0.12527 0.05866 -2.135 0.033108 *
## solar.9 0.13133 0.05861 2.241 0.025402 *
## solar.10 0.07952 0.05834 1.363 0.173358
## solar.11 0.10995 0.05820 1.889 0.059320 .
## solar.12 -0.22704 0.03829 -5.929 5.01e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.313 on 633 degrees of freedom
## Multiple R-squared: 0.9459, Adjusted R-squared: 0.9447
## F-statistic: 789.8 on 14 and 633 DF, p-value: < 2.2e-16
checkresiduals(ARDL.1x12$model)
##
## Breusch-Godfrey test for serial correlation of order up to 18
##
## data: Residuals
## LM test = 76.21, df = 18, p-value = 3.911e-09
MASE(ARDL.1x12)
## MASE
## ARDL.1x12 0.3927514
ARDL(9,12) is the best of all ARDL models with better MASE and adjusted R-squared statistic
The 4 DLM models are,
mean absolute scaled errors or MASE
of these models are,
MASE(DLM.model, PolyDLM, Koyck, ARDL.9x12)
## n MASE
## DLM.model 648 1.551600
## PolyDLM 648 1.563035
## Koyck 659 1.032483
## ARDL.9x12 648 0.387012
The Best DLM model for the Solar Radiation response based on the precipitation regressor which gives the most accurate forecasting based on the MASE measure is the Autoregressive DLM model, ARDL.9x12 with MASE measure of 0.3927514.
Dynamic linear models are general class of time series regression models which can account for trends, seasonality, serial correlation between response and regressor variable, and most importantly the affect of intervention points.
The response of a general Dynamic linear model is,
\(Y_t = \omega_2Y_{t-1} + (\omega_0 +
\omega_1)P_t - \omega_2\omega_0P_{t-1} + N_t\)
where,
Lets revisit the time series plot for the response, Solar Radiation,
to visualize possible intervention points
plot(solar)
As mentioned at the descriptive analysis stage, there is no clear intervention that we identify visually. Note, peak values seen in the plot (1998 July for example) are not intervention points as the mean solar radiation before and after (even after a long period) does not change. But maybe 2013 January might be an intervention point as the variance of the series decreases drastically from this point. Assuming this intervention point lets fit a Dynamic Linear model and see if the pulse function at 2013 January is significant or not.
Recall from the decomposition, our Solar Radiation series has a significant seasonal component and not so significant trend component. This can be revised using the ACF and PACF plots of the response, Solar Radiation series shown below,
acf(solar,main="ACF of Solar Radiation series")
pacf(solar,main="PACF of Solar Radiation series")
Repeating patterns in ACF suggest seasonality and largely significant first lag in PACF suggest trend. We will consider both in \(N_t\) component of our dynamic model.
Now, lets fit Dynamic Linear model using dynlm() with season() and trend() arguments as shown below, (Note, the intervention point of January 2013 was identified at T=463)
Y.t = solar
T = 463 # The time point when the intervention occurred
P.t = 1*(seq(solar) == T)
P.t.1 = Lag(P.t,+1) #library(tis)
Dyn.model = dynlm(Y.t ~ L(Y.t , k = 1 ) + P.t + trend(Y.t) + season(Y.t)) # library(dynlm)
summary(Dyn.model)
##
## Time series regression with "ts" data:
## Start = 1960(2), End = 2014(12)
##
## Call:
## dynlm(formula = Y.t ~ L(Y.t, k = 1) + P.t + trend(Y.t) + season(Y.t))
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.7897 -0.9825 -0.1185 0.8224 16.8331
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.2414380 0.3640905 3.410 0.000691 ***
## L(Y.t, k = 1) 0.9291221 0.0143451 64.769 < 2e-16 ***
## P.t 3.8381630 2.3366191 1.643 0.100951
## trend(Y.t) -0.0009279 0.0056352 -0.165 0.869268
## season(Y.t)Feb 1.9282450 0.4392191 4.390 1.32e-05 ***
## season(Y.t)Mar 4.8968263 0.4415365 11.090 < 2e-16 ***
## season(Y.t)Apr 3.8173305 0.4560321 8.371 3.57e-16 ***
## season(Y.t)May 5.1909476 0.4742097 10.947 < 2e-16 ***
## season(Y.t)Jun 3.4254915 0.5056646 6.774 2.82e-11 ***
## season(Y.t)Jul 1.6262658 0.5258237 3.093 0.002069 **
## season(Y.t)Aug -2.0905594 0.5350045 -3.908 0.000103 ***
## season(Y.t)Sep -4.6402404 0.5123224 -9.057 < 2e-16 ***
## season(Y.t)Oct -5.7692334 0.4779000 -12.072 < 2e-16 ***
## season(Y.t)Nov -5.0734497 0.4509007 -11.252 < 2e-16 ***
## season(Y.t)Dec -2.8250704 0.4403092 -6.416 2.71e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.292 on 644 degrees of freedom
## Multiple R-squared: 0.9465, Adjusted R-squared: 0.9454
## F-statistic: 814.5 on 14 and 644 DF, p-value: < 2.2e-16
checkresiduals(Dyn.model)
##
## Breusch-Godfrey test for serial correlation of order up to 24
##
## data: Residuals
## LM test = 131.99, df = 24, p-value < 2.2e-16
Most importantly, pulse (P.t) is insignificant at time 2013 (p=0.15 > 0.05). Although model is significant, since there is no significant intervention point that would change the mean level of the series, Dynamic Linear model is not suitable/necessary for our Solar Radiation time series.
In exponential smoothing, we represent the trend component as a combination of a level term \((ℓ)\) and a growth term \((b)\). The level and growth can be combined in a number of ways, giving different trend types and hence, different exponential smoothing methods. They are namely,
NOTE - All variations of Exponential Smoothing Methods will be tested for exercise purpose although we know a model with seasonal component should be the best fit (namely Holt-Winters’ Exponential smoothing variant).
Here, trend and seasonal component are not considered. We have just one smoothing constant, \(\alpha\) which will be decide by the software. Lets fit the Simple exponential smoothing model and get its forecasts, look at its summary, and check violation of residuals assumptions.
First. lets fit simple exponential model,
Here, argument initial is set to simple so the initial values are obtained using simple calculations on the first few observations.
ses.simple <- ses(solar, initial="simple", h=24) # We let the software estimate alpha. # h=24 for 2 years forecast
summary(ses.simple)
##
## Forecast method: Simple exponential smoothing
##
## Model Information:
## Simple exponential smoothing
##
## Call:
## ses(y = solar, h = 24, initial = "simple")
##
## Smoothing parameters:
## alpha = 1
##
## Initial states:
## l = 5.0517
##
## sigma: 4.5688
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.0001462894 4.568777 3.876091 -5.211851 27.29823 0.636771
## ACF1
## Training set 0.6677846
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2015 5.14828 -0.7068426 11.00340 -3.806357 14.10292
## Feb 2015 5.14828 -3.1321139 13.42867 -7.515490 17.81205
## Mar 2015 5.14828 -4.9930900 15.28965 -10.361607 20.65817
## Apr 2015 5.14828 -6.5619655 16.85853 -12.760995 23.05756
## May 2015 5.14828 -7.9441725 18.24073 -14.874898 25.17146
## Jun 2015 5.14828 -9.1937832 19.49034 -16.786013 27.08257
## Jul 2015 5.14828 -10.3429188 20.63948 -18.543464 28.84002
## Aug 2015 5.14828 -11.4125081 21.70907 -20.179260 30.47582
## Sep 2015 5.14828 -12.4170884 22.71365 -21.715633 32.01219
## Oct 2015 5.14828 -13.3672440 23.66380 -23.168771 33.46533
## Nov 2015 5.14828 -14.2709655 24.56753 -24.550893 34.84745
## Dec 2015 5.14828 -15.1344604 25.43102 -25.871495 36.16806
## Jan 2016 5.14828 -15.9626655 26.25923 -27.138125 37.43469
## Feb 2016 5.14828 -16.7595836 27.05614 -28.356906 38.65347
## Mar 2016 5.14828 -17.5285132 27.82507 -29.532883 39.82944
## Apr 2016 5.14828 -18.2722113 28.56877 -30.670271 40.96683
## May 2016 5.14828 -18.9930099 29.28957 -31.772637 42.06920
## Jun 2016 5.14828 -19.6929024 29.98946 -32.843030 43.13959
## Jul 2016 5.14828 -20.3736087 30.67017 -33.884081 44.18064
## Aug 2016 5.14828 -21.0366254 31.33319 -34.898077 45.19464
## Sep 2016 5.14828 -21.6832636 31.97982 -35.887025 46.18359
## Oct 2016 5.14828 -22.3146804 32.61124 -36.852694 47.14925
## Nov 2016 5.14828 -22.9319027 33.22846 -37.796654 48.09321
## Dec 2016 5.14828 -23.5358467 33.83241 -38.720306 49.01687
checkresiduals(ses.simple)
##
## Ljung-Box test
##
## data: Residuals from Simple exponential smoothing
## Q* = 4227.3, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
Model summary:
Now, lets fit optimal exponential model,
Here, argument initial is set to optimal so the initial values are optimized along with the smoothing parameters.
ses.optimal <- ses(solar, initial="optimal", h=24)
summary(ses.optimal)
##
## Forecast method: Simple exponential smoothing
##
## Model Information:
## Simple exponential smoothing
##
## Call:
## ses(y = solar, h = 24, initial = "optimal")
##
## Smoothing parameters:
## alpha = 0.9999
##
## Initial states:
## l = 5.0575
##
## sigma: 4.576
##
## AIC AICc BIC
## 6296.371 6296.407 6309.847
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.0001378357 4.569082 3.876391 -5.213129 27.30052 0.6368203
## ACF1
## Training set 0.6678374
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2015 5.148434 -0.7159726 11.01284 -3.820402 14.11727
## Feb 2015 5.148434 -3.1446744 13.44154 -7.534781 17.83165
## Mar 2015 5.148434 -5.0083386 15.30521 -10.385009 20.68188
## Apr 2015 5.148434 -6.5794989 16.87637 -12.787891 23.08476
## May 2015 5.148434 -7.9637280 18.26060 -14.904887 25.20175
## Jun 2015 5.148434 -9.2151719 19.51204 -16.818805 27.11567
## Jul 2015 5.148434 -10.3659966 20.66286 -18.578840 28.87571
## Aug 2015 5.148434 -11.4371603 21.73403 -20.217043 30.51391
## Sep 2015 5.148434 -12.4432208 22.74009 -21.755680 32.05255
## Oct 2015 5.148434 -13.3947777 23.69165 -23.210961 33.50783
## Nov 2015 5.148434 -14.2998328 24.59670 -24.595123 34.89199
## Dec 2015 5.148434 -15.1646028 25.46147 -25.917675 36.21454
## Jan 2016 5.148434 -15.9940315 26.29090 -27.186176 37.48304
## Feb 2016 5.148434 -16.7921272 27.08899 -28.406759 38.70363
## Mar 2016 5.148434 -17.5621937 27.85906 -29.584474 39.88134
## Apr 2016 5.148434 -18.3069916 28.60386 -30.723544 41.02041
## May 2016 5.148434 -19.0288564 29.32572 -31.827541 42.12441
## Jun 2016 5.148434 -19.7297844 30.02665 -32.899518 43.19638
## Jul 2016 5.148434 -20.4114982 30.70837 -33.942109 44.23898
## Aug 2016 5.148434 -21.0754962 31.37236 -34.957606 45.25447
## Sep 2016 5.148434 -21.7230918 32.01996 -35.948018 46.24489
## Oct 2016 5.148434 -22.3554435 32.65231 -36.915117 47.21198
## Nov 2016 5.148434 -22.9735798 33.27045 -37.860474 48.15734
## Dec 2016 5.148434 -23.5784183 33.87529 -38.785495 49.08236
checkresiduals(ses.optimal)
##
## Ljung-Box test
##
## data: Residuals from Simple exponential smoothing
## Q* = 4227.6, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
Model summary:
Simple exponential smoothing model, ses.simple is
better as per MASE measure (0.636771) compared to Optimal exponential
smoothing model, ses.optimal.
Moving onto the next method, lets add Trend components to the model
using Holt’s Linear models,
Holt’s linear method extends simple exponential smoothing to linear exponential smoothing to be able include trend to the forecasting model. Here, we have two smoothing constants \(\alpha\) and \(\beta*\). Lets fit the models and get forecasts,
First, lets fit Holt’s simple Linear trend model,
holt.linear <- holt(solar, initial="simple", h=24) # Let the software estimate both alpha and beta
summary(holt.linear)
##
## Forecast method: Holt's method
##
## Model Information:
## Holt's method
##
## Call:
## holt(y = solar, h = 24, initial = "simple")
##
## Smoothing parameters:
## alpha = 0.9165
## beta = 1
##
## Initial states:
## l = 5.0517
## b = 1.3641
##
## sigma: 3.6493
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.004941411 3.649286 2.806374 6.556498 21.13578 0.4610361
## ACF1
## Training set 0.06518525
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2015 3.3755382 -1.301210 8.052286 -3.77693 10.52801
## Feb 2015 1.7507183 -8.358924 11.860360 -13.71065 17.21208
## Mar 2015 0.1258983 -16.851847 17.103643 -25.83932 26.09112
## Apr 2015 -1.4989216 -26.473564 23.475721 -39.69434 36.69650
## May 2015 -3.1237415 -37.070990 30.823507 -55.04158 48.79409
## Jun 2015 -4.7485614 -48.543955 39.046832 -71.72784 62.23071
## Jul 2015 -6.3733813 -60.819126 48.072363 -89.64096 76.89420
## Aug 2015 -7.9982012 -73.839431 57.843028 -108.69367 92.69727
## Sep 2015 -9.6230212 -87.558671 68.312629 -128.81531 109.56927
## Oct 2015 -11.2478411 -101.938399 79.442717 -149.94708 127.45140
## Nov 2015 -12.8726610 -116.945936 91.200614 -172.03900 146.29368
## Dec 2015 -14.4974809 -132.553051 103.558089 -195.04789 166.05293
## Jan 2016 -16.1223008 -148.735022 116.490421 -218.93596 186.69135
## Feb 2016 -17.7471207 -165.469968 129.975727 -243.66972 208.17548
## Mar 2016 -19.3719407 -182.738335 143.994454 -269.21928 230.47540
## Apr 2016 -20.9967606 -200.522511 158.528990 -295.55770 253.56418
## May 2016 -22.6215805 -218.806526 173.563365 -322.66056 277.41740
## Jun 2016 -24.2464004 -237.575807 189.083006 -350.50557 302.01277
## Jul 2016 -25.8712203 -256.816988 205.074547 -379.07229 327.32985
## Aug 2016 -27.4960402 -276.517752 221.525671 -408.34188 353.34980
## Sep 2016 -29.1208602 -296.666697 238.424977 -438.29691 380.05519
## Oct 2016 -30.7456801 -317.253233 255.761873 -468.92117 407.42981
## Nov 2016 -32.3705000 -338.267485 273.526485 -500.19957 435.45857
## Dec 2016 -33.9953199 -359.700221 291.709581 -532.11798 464.12734
checkresiduals(holt.linear)
##
## Ljung-Box test
##
## data: Residuals from Holt's method
## Q* = 1096.3, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
Model summary:
Next, lets fit Holt’s exponential trend model,
holt.exponential <- holt(solar, initial="optimal", exponential=TRUE, h=24) # Fit with exponential trend
summary(holt.exponential)
##
## Forecast method: Holt's method with exponential trend
##
## Model Information:
## Holt's method with exponential trend
##
## Call:
## holt(y = solar, h = 24, initial = "optimal", exponential = TRUE)
##
## Smoothing parameters:
## alpha = 0.9999
## beta = 1e-04
##
## Initial states:
## l = 10.1941
## b = 0.9715
##
## sigma: 0.403
##
## AIC AICc BIC
## 6643.390 6643.482 6665.851
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 0.4447509 4.54425 3.884459 -2.682554 26.9491 0.6381458 0.6667328
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2015 5.029012 2.5068866555 7.616436 1.166637e+00 8.973039
## Feb 2015 4.912373 1.5638104729 8.747940 4.247095e-01 11.664917
## Mar 2015 4.798439 1.0138290790 9.657913 1.153330e-01 13.877949
## Apr 2015 4.687148 0.7071989937 10.044869 1.542664e-02 15.960335
## May 2015 4.578438 0.4635091608 10.724293 7.998659e-04 17.384233
## Jun 2015 4.472249 0.2876557241 10.920593 3.599922e-04 19.304545
## Jul 2015 4.368523 0.1883685698 11.067905 2.826226e-04 20.597591
## Aug 2015 4.267203 0.1357149098 11.183575 2.584416e-04 21.285425
## Sep 2015 4.168233 0.0940697518 11.032302 1.808325e-04 22.270314
## Oct 2015 4.071559 0.0664038505 10.921987 1.397896e-04 23.846441
## Nov 2015 3.977126 0.0404928579 10.908680 9.750135e-05 24.069262
## Dec 2015 3.884884 0.0262778497 10.652709 6.035473e-05 27.153107
## Jan 2016 3.794781 0.0132672953 10.440600 3.540944e-05 25.934933
## Feb 2016 3.706768 0.0076045199 9.783185 3.092623e-05 27.127076
## Mar 2016 3.620796 0.0037146916 9.238606 2.351276e-05 26.483187
## Apr 2016 3.536818 0.0018158312 8.794213 1.731913e-05 27.434843
## May 2016 3.454788 0.0010694488 8.479348 1.226505e-05 26.681559
## Jun 2016 3.374660 0.0006744193 8.081265 1.139870e-05 25.257479
## Jul 2016 3.296391 0.0004422453 7.712582 8.640441e-06 25.221105
## Aug 2016 3.219937 0.0003454530 7.288778 6.697549e-06 23.472759
## Sep 2016 3.145257 0.0002394670 6.750946 5.071337e-06 24.847122
## Oct 2016 3.072308 0.0001750666 6.313403 3.479275e-06 22.841283
## Nov 2016 3.001051 0.0001384299 6.131601 2.718743e-06 23.661790
## Dec 2016 2.931447 0.0001029620 5.990724 1.779186e-06 24.923954
checkresiduals(holt.exponential)
##
## Ljung-Box test
##
## data: Residuals from Holt's method with exponential trend
## Q* = 1316.7, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
Model summary:
Note, we use initial=“optimal” since we get “optimization failure” when using simple method.
Next, lets fit Holt’s Damped trend model,
holt.damped <- holt(solar, damped=TRUE, initial="simple", h=24) # Fit with additive damped trend
summary(holt.damped)
##
## Forecast method: Damped Holt's method
##
## Model Information:
## Damped Holt's method
##
## Call:
## holt(y = solar, h = 24, damped = TRUE, initial = "simple")
##
## Smoothing parameters:
## alpha = 0.9278
## beta = 0.9278
## phi = 0.8
##
## Initial states:
## l = 13.7194
## b = 2.5785
##
## sigma: 3.4657
##
## AIC AICc BIC
## 5932.524 5932.652 5959.477
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.00831309 3.452592 2.638613 3.791232 19.25658 0.433476
## ACF1
## Training set 0.07287524
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2015 3.77468888 -0.6668422 8.21622 -3.018047 10.56742
## Feb 2015 2.73586859 -5.9098357 11.38157 -10.486595 15.95833
## Mar 2015 1.90481212 -11.3560708 15.16570 -18.375958 22.18558
## Apr 2015 1.23996674 -16.7554651 19.23540 -26.281671 28.76160
## May 2015 0.70809029 -22.0017301 23.41791 -34.023583 35.43976
## Jun 2015 0.28258901 -27.0463637 27.61154 -41.513437 42.07862
## Jul 2015 -0.05781212 -31.8700954 31.75447 -48.710501 48.59488
## Aug 2015 -0.33013310 -36.4696231 35.80936 -55.600714 54.94045
## Sep 2015 -0.54798994 -40.8505077 39.75453 -62.185372 61.08939
## Oct 2015 -0.72227547 -45.0230940 43.57854 -68.474531 67.02998
## Nov 2015 -0.86170394 -49.0000892 47.27668 -74.483011 72.75960
## Dec 2015 -0.97324674 -52.7951082 50.84861 -80.227945 78.28145
## Jan 2016 -1.06248101 -56.4218036 54.29684 -85.727259 83.60230
## Feb 2016 -1.13386844 -59.8933599 57.62562 -90.998756 88.73102
## Mar 2016 -1.19097841 -63.2222137 60.84026 -96.059566 93.67761
## Apr 2016 -1.23666639 -66.4199175 63.94658 -100.925846 98.45251
## May 2016 -1.27321679 -69.4970922 66.95066 -105.612630 103.06620
## Jun 2016 -1.30245712 -72.4634340 69.85852 -110.133780 107.52887
## Jul 2016 -1.32584939 -75.3277524 72.67605 -114.501994 111.85029
## Aug 2016 -1.34456321 -78.0980276 75.40890 -118.728857 116.03973
## Sep 2016 -1.35953427 -80.7814739 78.06241 -122.824909 120.10584
## Oct 2016 -1.37151112 -83.3846076 80.64159 -126.799718 124.05670
## Nov 2016 -1.38109261 -85.9133126 83.15113 -130.661967 127.89978
## Dec 2016 -1.38875780 -88.3729032 85.59539 -134.419528 131.64201
checkresiduals(holt.damped)
##
## Ljung-Box test
##
## data: Residuals from Damped Holt's method
## Q* = 1134.4, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
Model summary:
Holt’s Damped Linear smoothing model, \(holt.damped\) with MASE measure of 0.433476
is best Holt’s Linearly smoothed model.
Moving onto the next method, lets add Seasonal components to the
model using Holt-Winters’ Trend and Seasonality Method,
Since our Solar Radiation response time series has a seasonality, Holt’s-Winter method provides a solution to the model fitting problem. Holt-Winters’ method has the smoothing equations for the level, for trend, and seasonality. There are two different Holt-Winters’ methods, depending on whether seasonality is modelled in an additive or multiplicative way. Here, we have three smoothing constants \(\alpha\) , \(\beta*\), and \(\gamma\). Lets fit the models and get forecasts,
We will look at 5 cases,
(Note, additive seasonality with exponential trend is a forbidden model combination, hence not considered)
First, lets look at the additive seasonality case,
a. Additive seasonality case without damping -
Add.hw <- hw(solar,seasonal="additive", h=2*frequency(solar))
summary(Add.hw)
##
## Forecast method: Holt-Winters' additive method
##
## Model Information:
## Holt-Winters' additive method
##
## Call:
## hw(y = solar, h = 2 * frequency(solar), seasonal = "additive")
##
## Smoothing parameters:
## alpha = 0.9993
## beta = 0.0019
## gamma = 1e-04
##
## Initial states:
## l = 11.3306
## b = 0.209
## s = -10.7542 -8.4968 -3.227 3.0768 7.9264 10.9122
## 10.1422 7.3322 2.2353 -1.9528 -7.3626 -9.8316
##
## sigma: 2.3575
##
## AIC AICc BIC
## 5434.708 5435.662 5511.076
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.1179476 2.328736 1.504489 -2.230359 12.68101 0.24716 0.1703355
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2015 6.127399 3.1061581 9.148641 1.506810 10.74799
## Feb 2015 8.655602 4.3802222 12.930983 2.116973 15.19423
## Mar 2015 14.122221 8.8814747 19.362968 6.107191 22.13725
## Apr 2015 18.366625 12.3095941 24.423656 9.103196 27.63005
## May 2015 23.522264 16.7439512 30.300578 13.155729 33.88880
## Jun 2015 26.390991 18.9586817 33.823300 15.024255 37.75773
## Jul 2015 27.219228 19.1837595 35.254697 14.930039 39.50842
## Aug 2015 24.290519 15.6920179 32.889020 11.140246 37.44079
## Sep 2015 19.495874 10.3670365 28.624711 5.534522 33.45723
## Oct 2015 13.253765 3.6218809 22.885648 -1.476930 27.98446
## Nov 2015 8.042144 -2.0695730 18.153861 -7.422393 23.50668
## Dec 2015 5.840110 -4.7313921 16.411612 -10.327607 22.00783
## Jan 2016 6.819616 -4.1942247 17.833457 -10.024600 23.66383
## Feb 2016 9.347819 -2.0927719 20.788410 -8.149055 26.84469
## Mar 2016 14.814438 2.9609172 26.667959 -3.313958 32.94283
## Apr 2016 19.058842 6.8048110 31.312872 0.317919 37.79976
## May 2016 24.214481 11.5711780 36.857784 4.878218 43.55074
## Jun 2016 27.083208 14.0608586 40.105557 7.167243 46.99917
## Jul 2016 27.911445 14.5194061 41.303485 7.430089 48.39280
## Aug 2016 24.982736 11.2296053 38.735867 3.949138 46.01633
## Sep 2016 20.188091 6.0818048 34.294377 -1.385612 41.76179
## Oct 2016 13.945981 -0.5061083 28.398071 -8.156582 36.04855
## Nov 2016 8.734361 -6.0566988 23.525420 -13.886613 31.35533
## Dec 2016 6.532327 -8.5913309 21.655984 -16.597311 29.66196
checkresiduals(Add.hw)
##
## Ljung-Box test
##
## data: Residuals from Holt-Winters' additive method
## Q* = 205.55, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
Model summary:
Lets consider damped = TRUE,
b. Additive seasonality case with damping -
Add.hw.damped <- hw(solar,seasonal="additive",damped = TRUE, h=2*frequency(solar))
summary(Add.hw.damped)
##
## Forecast method: Damped Holt-Winters' additive method
##
## Model Information:
## Damped Holt-Winters' additive method
##
## Call:
## hw(y = solar, h = 2 * frequency(solar), seasonal = "additive",
##
## Call:
## damped = TRUE)
##
## Smoothing parameters:
## alpha = 0.9999
## beta = 1e-04
## gamma = 1e-04
## phi = 0.9388
##
## Initial states:
## l = 11.154
## b = 0.7632
## s = -10.4919 -8.137 -3.348 2.5794 8.08 11.1219
## 9.9586 6.9916 1.9573 -1.8565 -7.1607 -9.6946
##
## sigma: 2.3446
##
## AIC AICc BIC
## 5428.422 5429.489 5509.282
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.01091357 2.314163 1.498521 -1.468083 12.44796 0.2461797
## ACF1
## Training set 0.1700724
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2015 5.945198 2.940530 8.949866 1.3499548 10.54044
## Feb 2015 8.479778 4.230548 12.729009 1.9811413 14.97842
## Mar 2015 13.784309 8.579902 18.988716 5.8248548 21.74376
## Apr 2015 17.598568 11.588776 23.608360 8.4073847 26.78975
## May 2015 22.632258 15.912803 29.351713 12.3557383 32.90878
## Jun 2015 25.599150 18.238024 32.960277 14.3412786 36.85702
## Jul 2015 26.761775 18.810497 34.713053 14.6013444 38.92221
## Aug 2015 23.722251 15.221610 32.222892 10.7216429 36.72286
## Sep 2015 18.222624 9.205955 27.239293 4.4328191 32.01243
## Oct 2015 12.293259 2.788471 21.798047 -2.2430606 26.82958
## Nov 2015 7.504219 -2.464877 17.473315 -7.7421977 22.75064
## Dec 2015 5.150488 -5.262286 15.563263 -10.7744758 21.07545
## Jan 2016 5.947275 -4.891175 16.785726 -10.6287044 22.52326
## Feb 2016 8.481728 -2.766251 19.729707 -8.7205712 25.68403
## Mar 2016 13.786139 2.142988 25.429291 -4.0205242 31.59280
## Apr 2016 17.600287 5.574906 29.625668 -0.7909464 35.99152
## May 2016 22.633871 10.238009 35.029734 3.6760353 41.59171
## Jun 2016 25.600665 12.845046 38.356283 6.0926298 45.10870
## Jul 2016 26.763197 13.657667 39.868726 6.7200186 46.80637
## Aug 2016 23.723586 10.277223 37.169950 3.1591478 44.28802
## Sep 2016 18.223877 4.445085 32.002669 -2.8489663 39.29672
## Oct 2016 12.294435 -1.808972 26.397843 -9.2748652 33.86374
## Nov 2016 7.505324 -6.915414 21.926061 -14.5492911 29.55994
## Dec 2016 5.151525 -9.579726 19.882776 -17.3779790 27.68103
checkresiduals(Add.hw.damped)
##
## Ljung-Box test
##
## data: Residuals from Damped Holt-Winters' additive method
## Q* = 210.76, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
Model summary:
Next, lets fit Multiplicative seasonality case,
a. without damping -
Mul.hw <- hw(solar,seasonal="multiplicative", h=2*frequency(solar))
summary(Mul.hw)
##
## Forecast method: Holt-Winters' multiplicative method
##
## Model Information:
## Holt-Winters' multiplicative method
##
## Call:
## hw(y = solar, h = 2 * frequency(solar), seasonal = "multiplicative")
##
## Smoothing parameters:
## alpha = 0.8251
## beta = 1e-04
## gamma = 0.0259
##
## Initial states:
## l = 10.5659
## b = -0.2371
## s = 0.5024 0.6292 0.9319 1.2229 1.4923 1.6309
## 1.5606 1.3672 1.0278 0.8128 0.5106 0.3114
##
## sigma: 0.3971
##
## AIC AICc BIC
## 6648.746 6649.699 6725.114
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.04281136 2.12539 1.359297 -0.4896895 10.87401 0.2233077
## ACF1
## Training set 0.06551473
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2015 5.610335 2.75534413 8.465326 1.2440033 9.976666
## Feb 2015 6.512806 2.03809262 10.987520 -0.3306776 13.356290
## Mar 2015 8.786120 1.33667747 16.235562 -2.6068190 20.179058
## Apr 2015 10.192785 -0.02672876 20.412298 -5.4366123 25.822181
## May 2015 12.512597 -1.96157245 26.986766 -9.6237347 34.648928
## Jun 2015 14.061609 -4.41048190 32.533701 -14.1890164 42.312235
## Jul 2015 14.502088 -6.89909775 35.903274 -18.2282011 47.232377
## Aug 2015 12.945620 -8.35024040 34.241481 -19.6235881 45.514829
## Sep 2015 10.352210 -8.52365174 29.228072 -18.5159294 39.220350
## Oct 2015 7.418279 -7.51119146 22.347750 -15.4143760 30.250935
## Nov 2015 4.989730 -6.05900058 16.038460 -11.9078451 21.887305
## Dec 2015 3.871776 -5.53876625 13.282319 -10.5204066 18.263960
## Jan 2016 4.199400 -7.01753622 15.416336 -12.9554236 21.354224
## Feb 2016 4.838892 -9.30305408 18.980837 -16.7893479 26.467131
## Mar 2016 6.477174 -14.21845674 27.172805 -25.1740619 38.128410
## Apr 2016 7.452633 -18.56883703 33.474103 -32.3437711 47.249037
## May 2016 9.069749 -25.52978297 43.669280 -43.8456686 61.985166
## Jun 2016 10.099485 -31.99838423 52.197355 -54.2836502 74.482621
## Jul 2016 10.315199 -36.68017471 57.310572 -61.5580226 82.188420
## Aug 2016 9.113765 -36.29212001 54.519651 -60.3285438 78.556075
## Sep 2016 7.208700 -32.09306283 46.510463 -52.8981594 67.315559
## Oct 2016 5.105869 -25.38360449 35.595343 -41.5237568 51.735495
## Nov 2016 3.391945 -18.81647619 25.600367 -30.5729044 37.356795
## Dec 2016 2.597256 -16.07148862 21.266001 -25.9541251 31.148637
checkresiduals(Mul.hw)
##
## Ljung-Box test
##
## data: Residuals from Holt-Winters' multiplicative method
## Q* = 38.585, df = 24, p-value = 0.03017
##
## Model df: 0. Total lags used: 24
Model summary:
Lets consider exponential = TRUE,
b. with exponential trend -
Mul.hw.exponential <- hw(solar,seasonal="multiplicative",exponential = TRUE, h=2*frequency(solar))
summary(Mul.hw.exponential)
##
## Forecast method: Holt-Winters' multiplicative method with exponential trend
##
## Model Information:
## Holt-Winters' multiplicative method with exponential trend
##
## Call:
## hw(y = solar, h = 2 * frequency(solar), seasonal = "multiplicative",
##
## Call:
## exponential = TRUE)
##
## Smoothing parameters:
## alpha = 0.6499
## beta = 1e-04
## gamma = 0.0597
##
## Initial states:
## l = 9.3759
## b = 0.9889
## s = 0.3552 0.4789 0.952 1.0553 1.4608 1.7926
## 1.6746 1.4529 1.1145 0.8672 0.5333 0.2627
##
## sigma: 0.3755
##
## AIC AICc BIC
## 6584.208 6585.161 6660.576
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 0.06388152 2.208727 1.412454 -1.303792 11.28449 0.2320404 0.153871
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2015 5.797870 3.0225785 8.655128 1.5953091 10.27729
## Feb 2015 7.513359 3.4446038 12.167944 1.7660425 15.04686
## Mar 2015 10.758212 4.5474018 18.437599 2.4083105 24.39620
## Apr 2015 12.694087 4.5829956 22.789803 2.3481929 31.00817
## May 2015 15.340074 5.0799785 28.343220 2.4687611 39.90732
## Jun 2015 17.239431 5.1940126 33.376917 2.5711281 48.85535
## Jul 2015 18.004700 4.9174320 35.567771 2.4922460 52.66011
## Aug 2015 16.206596 3.9968415 32.559625 1.8097764 52.43149
## Sep 2015 13.011256 2.9744149 26.094173 1.3521960 43.20503
## Oct 2015 9.215869 1.9597009 18.940531 0.8110330 32.17512
## Nov 2015 6.214762 1.2069105 13.414478 0.5349731 23.25752
## Dec 2015 4.565667 0.8170558 9.898061 0.3460422 17.44680
## Jan 2016 5.220661 0.8327072 11.462953 0.3524989 21.25954
## Feb 2016 6.765364 1.0227657 15.427153 0.4391631 28.16354
## Mar 2016 9.687175 1.3688948 22.011580 0.5718432 42.64750
## Apr 2016 11.430324 1.5555689 26.126315 0.6247032 51.62845
## May 2016 13.812889 1.7321490 32.114327 0.6906114 63.65349
## Jun 2016 15.523155 1.8404312 36.084656 0.6717126 71.52788
## Jul 2016 16.212237 1.8029743 37.928010 0.7106782 78.66008
## Aug 2016 14.593143 1.4567612 34.445917 0.4848201 68.99487
## Sep 2016 11.715916 1.1210135 27.274795 0.4041534 57.28896
## Oct 2016 8.298381 0.7333143 19.651162 0.2583716 43.07387
## Nov 2016 5.596049 0.4717835 13.110578 0.1518328 28.24525
## Dec 2016 4.111130 0.3132286 9.682702 0.1067219 20.79399
checkresiduals(Mul.hw.exponential)
##
## Ljung-Box test
##
## data: Residuals from Holt-Winters' multiplicative method with exponential trend
## Q* = 21.246, df = 24, p-value = 0.6242
##
## Model df: 0. Total lags used: 24
Model summary:
Lets consider damped = TRUE,
c. with damping (BEST EXPONENTIAL SMOOTHING/OVERALL
REGRESSION MODEL) -
Mul.hw.damped <- hw(solar,seasonal="multiplicative", damped = TRUE, h=2*frequency(solar))
summary(Mul.hw.damped)
##
## Forecast method: Damped Holt-Winters' multiplicative method
##
## Model Information:
## Damped Holt-Winters' multiplicative method
##
## Call:
## hw(y = solar, h = 2 * frequency(solar), seasonal = "multiplicative",
##
## Call:
## damped = TRUE)
##
## Smoothing parameters:
## alpha = 0.805
## beta = 1e-04
## gamma = 0.0124
## phi = 0.94
##
## Initial states:
## l = 9.9033
## b = 0.4703
## s = 0.4341 0.5649 0.8263 1.1716 1.4514 1.6316
## 1.5503 1.3834 1.1045 0.8837 0.5761 0.4221
##
## sigma: 0.3145
##
## AIC AICc BIC
## 6366.701 6367.768 6447.561
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.0456208 2.045442 1.239102 -2.172476 10.00296 0.2035619
## ACF1
## Training set 0.04177493
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2015 5.611616 3.3502090 7.873023 2.1530925 9.070139
## Feb 2015 7.060438 3.3373603 10.783517 1.3664818 12.754395
## Mar 2015 10.515176 3.8550902 17.175263 0.3294535 20.700899
## Apr 2015 12.993446 3.5145037 22.472389 -1.5033454 27.490238
## May 2015 16.284381 2.9386340 29.630128 -4.1261776 36.694939
## Jun 2015 18.261020 1.7242212 34.797820 -7.0298315 43.551872
## Jul 2015 19.031597 0.2101276 37.853067 -9.7533565 47.816551
## Aug 2015 17.105654 -1.2074754 35.418784 -10.9018606 45.113169
## Sep 2015 13.764743 -2.0801991 29.609686 -10.4680049 37.997491
## Oct 2015 9.774733 -2.2585804 21.808047 -8.6286319 28.178099
## Nov 2015 6.581938 -2.0456267 15.209503 -6.6127835 19.776659
## Dec 2015 5.163049 -2.0168711 12.342970 -5.8176914 16.143790
## Jan 2016 5.616270 -2.6571166 13.889656 -7.0367828 18.269322
## Feb 2016 7.066360 -3.9144360 18.047156 -9.7273184 23.860039
## Mar 2016 10.524088 -6.6891829 27.737360 -15.8013383 36.849515
## Apr 2016 13.004566 -9.3405081 35.349641 -21.1692761 47.178409
## May 2016 16.298445 -13.0720043 45.668893 -28.6197807 61.216670
## Jun 2016 18.276925 -16.2140342 52.767885 -34.4724452 71.026296
## Jul 2016 19.048304 -18.5466197 56.643228 -38.4481704 76.544779
## Aug 2016 17.120781 -18.1782620 52.419825 -36.8644468 71.106009
## Sep 2016 13.777000 -15.8651593 43.419159 -31.5567705 59.110770
## Oct 2016 9.783493 -12.1627698 31.729756 -23.7804195 43.347405
## Nov 2016 6.587872 -8.8063951 21.982139 -16.9556278 30.131371
## Dec 2016 5.167730 -7.4021686 17.737629 -14.0562710 24.391731
checkresiduals(Mul.hw.damped)
##
## Ljung-Box test
##
## data: Residuals from Damped Holt-Winters' multiplicative method
## Q* = 23.57, df = 24, p-value = 0.4864
##
## Model df: 0. Total lags used: 24
Model summary:
Holt-Winters Multiplicative Seasonal model with Damped
trend, \(Mul.hw.damped\) with
MASE measure of 0.2035619 is best Holt-Winters’ model.
Lets summarise all the Exponential Smoothing methods and their models considered with their corresponding forecast accuracy measure, MASE,
The best model among the 3 types of Exponential Smoothing models
considered were,
From the above results, clearly our Solar Radiation time series has a damped trend. The seasonal component is multiplicative than additive unlike what was inferred from the classical decomposition analysis. *The best fitting model so far is the Holt-Winters’ multiplicative damped smoothing model with lowest MASE measure of 0.2035619.
The best Exponential smoothing model which gives the most accurate forecasting based on the MASE measure is the Holt-Winters’ multiplicative model with damped trend (\(Mul.hw.damped\)) with MASE measure of 0.2035619.
State-Space models are basically Exponential smoothing models but
with Error components taken into account. State-Space models uses the
innovations formulation of the model where \(Y_t\) denotes the observation at time t,
and let \(X_t\) is the state vector
that describes the level, trend and seasonality of the series. A
linear (Additive homoscedastic errors ) innovations
state-space model is written as,
\(Y_t = \omega'X_t +
\epsilon_t\),
where \(\epsilon_t\) is a white
noise series
Nonlinear (Multiplicative heteroscedastic errors
terms) state-space models are also possible and have the form,
\(Y_t = \omega'(X_{t-1}) +
r(X_{t-1})\epsilon_t\),
Lets first look at the linear (Additive errors) State Space models,
The State-Space models are generally depicted as ETS(Z,Z,Z), where Z can be None, Additive or Multiplicative (and damped). For example, ETS(A,N,N) is a state-space model with Additive error terms and no trend or seasonal components,
Since we do not know if the errors are additive or multiplicative, we will test both of these cases.
We will look at the following cases and compare them with their respective Exponential Smoothed cases withour error component,
Since many other variations are
Since our response, Solar Radiation has seasonal component, we
This Additive error model has no trend or seasonal component.
Note - This is also the Holt’s Local Level Model:
ETS(A,N,N). In this model, at any point along the data path,
the values in the neighbourhood of the point are approximated by a short
flat line representing what is referred to as a local level. This model
corresponds to the simple exponential smoothing model with additive
error terms. We use the ets() function to fit the model.
Add.SS.ANN = ets(solar, model="ANN")
summary(Add.SS.ANN)
## ETS(A,N,N)
##
## Call:
## ets(y = solar, model = "ANN")
##
## Smoothing parameters:
## alpha = 0.9999
##
## Initial states:
## l = 5.0575
##
## sigma: 4.576
##
## AIC AICc BIC
## 6296.371 6296.407 6309.847
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.0001378357 4.569082 3.876391 -5.213129 27.30052 0.6368203
## ACF1
## Training set 0.6678374
checkresiduals(Add.SS.ANN)
##
## Ljung-Box test
##
## data: Residuals from ETS(A,N,N)
## Q* = 4227.6, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
Model summary:
This Additive error model has Additive trend and no seasonal
component. Note - This is also the Holt’s Local Trend Model:
ETS(A,A,N). In this model, at each point, the data path is
approximated by a tangential line in the deterministic case. We can fit
3 different models by considering damping and drift aspects,
x. General ETS(A,A,N) -
Add.SS.AAN = ets(solar, model="AAN")
summary(Add.SS.AAN)
## ETS(A,Ad,N)
##
## Call:
## ets(y = solar, model = "AAN")
##
## Smoothing parameters:
## alpha = 0.9279
## beta = 0.9279
## phi = 0.8
##
## Initial states:
## l = 13.7196
## b = 2.5786
##
## sigma: 3.4657
##
## AIC AICc BIC
## 5932.524 5932.653 5959.478
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.008312543 3.452593 2.638571 3.790894 19.25631 0.4334691
## ACF1
## Training set 0.07272876
checkresiduals(Add.SS.AAN)
##
## Ljung-Box test
##
## data: Residuals from ETS(A,Ad,N)
## Q* = 1134, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
Model summary:
y. ETS(A,A,N) with drift -
A local trend model allows the growth rate to change stochastically over time. If \(\beta\)=0, the growth rate is constant and equal to a value that will be denoted by b. This model is hence the local level model with drift and is also known as simple exponential smoothing with drift. We use ets() function with beta=0 to fit this model. We set the argument beta to a value very close to zero.
Add.SS.AAN.drift = ets(solar, model="AAN", beta = 1E-4)
summary(Add.SS.AAN.drift)
## ETS(A,A,N)
##
## Call:
## ets(y = solar, model = "AAN", beta = 1e-04)
##
## Smoothing parameters:
## alpha = 0.9999
## beta = 1e-04
##
## Initial states:
## l = 11.356
## b = -0.0206
##
## sigma: 4.5898
##
## AIC AICc BIC
## 6300.338 6300.399 6318.307
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.009889528 4.575899 3.887095 -5.243348 27.46947 0.6385788
## ACF1
## Training set 0.6652999
checkresiduals(Add.SS.AAN.drift)
##
## Ljung-Box test
##
## data: Residuals from ETS(A,A,N)
## Q* = 4200.2, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
Model summary:
z. ETS(A,A,N) with damping -
Add.SS.AAN.damped = ets(solar, model="AAN")
summary(Add.SS.AAN.damped)
## ETS(A,Ad,N)
##
## Call:
## ets(y = solar, model = "AAN")
##
## Smoothing parameters:
## alpha = 0.9279
## beta = 0.9279
## phi = 0.8
##
## Initial states:
## l = 13.7196
## b = 2.5786
##
## sigma: 3.4657
##
## AIC AICc BIC
## 5932.524 5932.653 5959.478
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.008312543 3.452593 2.638571 3.790894 19.25631 0.4334691
## ACF1
## Training set 0.07272876
checkresiduals(Add.SS.AAN.damped)
##
## Ljung-Box test
##
## data: Residuals from ETS(A,Ad,N)
## Q* = 1134, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
Model summary:
This Additive error model has additive seasonal and no trend component.
Add.SS.ANA = ets(solar, model="ANA")
summary(Add.SS.ANA)
## ETS(A,N,A)
##
## Call:
## ets(y = solar, model = "ANA")
##
## Smoothing parameters:
## alpha = 0.9997
## gamma = 3e-04
##
## Initial states:
## l = 22.6142
## s = -10.2887 -8.5269 -4.0534 1.7184 7.355 10.8208
## 10.4774 7.7101 2.4749 -1.7156 -6.7936 -9.1783
##
## sigma: 2.3884
##
## AIC AICc BIC
## 5449.974 5450.719 5517.358
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.0108645 2.362974 1.55041 -2.640142 12.95336 0.254704 0.1825724
checkresiduals(Add.SS.ANA)
##
## Ljung-Box test
##
## data: Residuals from ETS(A,N,A)
## Q* = 207.44, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
Model summary:
This Additive error model has Additive trend and Additive seasonal
component. Note - This is also the Local Additive Seasonal
Model: ETS(A,A,A).
Add.SS.AAA = ets(solar, model="AAA")
summary(Add.SS.AAA)
## ETS(A,Ad,A)
##
## Call:
## ets(y = solar, model = "AAA")
##
## Smoothing parameters:
## alpha = 0.9999
## beta = 1e-04
## gamma = 1e-04
## phi = 0.9388
##
## Initial states:
## l = 11.154
## b = 0.7632
## s = -10.4919 -8.137 -3.348 2.5794 8.08 11.1219
## 9.9586 6.9916 1.9573 -1.8565 -7.1607 -9.6946
##
## sigma: 2.3446
##
## AIC AICc BIC
## 5428.422 5429.489 5509.282
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.01091357 2.314163 1.498521 -1.468083 12.44796 0.2461797
## ACF1
## Training set 0.1700724
checkresiduals(Add.SS.AAA)
##
## Ljung-Box test
##
## data: Residuals from ETS(A,Ad,A)
## Q* = 210.76, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
Model summary:
Lets not consider drift and damping for each model to maintain simplicity of reporting (The best model will be chosen as per auto-fit anyways).
This Multiplicative error model has no trend or seasonal component. Note - This is also the Holt-Winters’ Local Level Model: ETS(M,N,N)
Mul.SS.MNN = ets(solar, model="MNN")
summary(Mul.SS.MNN)
## ETS(M,N,N)
##
## Call:
## ets(y = solar, model = "MNN")
##
## Smoothing parameters:
## alpha = 0.9999
##
## Initial states:
## l = 4.4856
##
## sigma: 0.3868
##
## AIC AICc BIC
## 6619.776 6619.812 6633.253
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.001004352 4.569136 3.877241 -5.195978 27.31733 0.6369599
## ACF1
## Training set 0.6678785
checkresiduals(Mul.SS.MNN)
##
## Ljung-Box test
##
## data: Residuals from ETS(M,N,N)
## Q* = 1334.8, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
Model summary:
Comment -
Note - MNN model is bad compared to ANN model (MASE - 0.6368203 vs MASE - 0.636959). This is because, our time series is homoscedastic (equal variance of error terms) as seen from its time series plot. And we know for a homoscedastic series, a model with Additive error terms fit the series better than multiplicative error terms.
Mul.SS.MAN = ets(solar, model="MAN")
summary(Mul.SS.MAN)
## ETS(M,A,N)
##
## Call:
## ets(y = solar, model = "MAN")
##
## Smoothing parameters:
## alpha = 0.9999
## beta = 0.0018
##
## Initial states:
## l = 7.8559
## b = 2.4488
##
## sigma: 0.3004
##
## AIC AICc BIC
## 6433.364 6433.456 6455.825
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -1.466339 4.822041 3.964894 -17.35094 30.69019 0.6513597 0.6697971
checkresiduals(Mul.SS.MAN)
##
## Ljung-Box test
##
## data: Residuals from ETS(M,A,N)
## Q* = 2007.1, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
Model summary:
Mul.SS.MAA = ets(solar, model="MAA")
summary(Mul.SS.MAA)
## ETS(M,Ad,A)
##
## Call:
## ets(y = solar, model = "MAA")
##
## Smoothing parameters:
## alpha = 0.5273
## beta = 0.0035
## gamma = 0.0104
## phi = 0.8
##
## Initial states:
## l = 11.5982
## b = 2.6304
## s = -10.4131 -6.5688 -2.8947 0.153 7.6029 8.4081
## 8.1855 8.4954 2.4863 -2.2582 -6.2468 -6.9497
##
## sigma: 0.3329
##
## AIC AICc BIC
## 6469.079 6470.146 6549.940
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.02086702 3.282879 2.311939 -5.186754 18.98277 0.3798095
## ACF1
## Training set 0.551665
checkresiduals(Mul.SS.MAA)
##
## Ljung-Box test
##
## data: Residuals from ETS(M,Ad,A)
## Q* = 346.85, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
Model summary:
Note - This is also the Non-linear seasonal model, namely, Multiplicative Seasonal and Error Model: ETS(M,A,M). When the pattern of seasonality changes through time, linear seasonal structure cant be used. In that case, we use nonlinear seasonal models. It is hard to tell if our solar radiation series’s seasonality pattern changes through time or not. So, lets consider the nonlinear multiplicative seasonal component (MAM) model,
Mul.SS.MAM = ets(solar, model="MAM")
summary(Mul.SS.MAM)
## ETS(M,Ad,M)
##
## Call:
## ets(y = solar, model = "MAM")
##
## Smoothing parameters:
## alpha = 0.7922
## beta = 6e-04
## gamma = 0.0707
## phi = 0.98
##
## Initial states:
## l = 9.3693
## b = 1.2404
## s = 0.6953 0.3066 0.5802 0.9879 1.3594 1.5469
## 1.4723 1.4055 1.227 0.9573 0.6959 0.7656
##
## sigma: 0.2274
##
## AIC AICc BIC
## 5953.502 5954.569 6034.363
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 0.1750605 3.02791 1.961614 -5.497209 17.21119 0.3222574 0.2565219
checkresiduals(Mul.SS.MAM)
##
## Ljung-Box test
##
## data: Residuals from ETS(M,Ad,M)
## Q* = 102.28, df = 24, p-value = 1.222e-11
##
## Model df: 0. Total lags used: 24
Model summary:
We know our series is seasonal, so not having seasonal component as none (MMN) should be a bad fit. Lets check anyways,
Mul.SS.MMN = ets(solar, model="MMN")
summary(Mul.SS.MMN)
## ETS(M,M,N)
##
## Call:
## ets(y = solar, model = "MMN")
##
## Smoothing parameters:
## alpha = 0.9999
## beta = 0.0063
##
## Initial states:
## l = 9.1784
## b = 1.1871
##
## sigma: 0.3511
##
## AIC AICc BIC
## 6609.163 6609.255 6631.624
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -1.644301 5.221518 4.235504 -14.9362 30.65972 0.695816 0.6833568
checkresiduals(Mul.SS.MMN)
##
## Ljung-Box test
##
## data: Residuals from ETS(M,M,N)
## Q* = 1371.3, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
Model summary:
Lets summarise all the State-Space models considered with their corresponding forecast accuracy measure, MASE,
Best State-Space model with additive error term is ETS(A,A,A) with
MASE measure of 0.2461797.
Best State-Space model with Multiplicative error term is ETS(M,A,M)
with MASE measure of 0.3222574.
It seems in general models with Additive error components have better forecasting accuracy measure than their multiplicative error counterpart. It is important to note that, there are many other possible models which we have missed out. To avoid the hassle, we can autofit the best State-Space model using the below code,
autofit.SS = ets(solar)
summary(autofit.SS)
## ETS(A,Ad,A)
##
## Call:
## ets(y = solar)
##
## Smoothing parameters:
## alpha = 0.9999
## beta = 1e-04
## gamma = 1e-04
## phi = 0.9388
##
## Initial states:
## l = 11.154
## b = 0.7632
## s = -10.4919 -8.137 -3.348 2.5794 8.08 11.1219
## 9.9586 6.9916 1.9573 -1.8565 -7.1607 -9.6946
##
## sigma: 2.3446
##
## AIC AICc BIC
## 5428.422 5429.489 5509.282
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.01091357 2.314163 1.498521 -1.468083 12.44796 0.2461797
## ACF1
## Training set 0.1700724
checkresiduals(autofit.SS) # Autofit finds the ETS(A,Ad,A).
##
## Ljung-Box test
##
## data: Residuals from ETS(A,Ad,A)
## Q* = 210.76, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
The auto fitted model is the ETS(A,Ad,A) model having the least MASE value of 0.2461797 of all possible State space models.
The best State-space model which gives the most accurate forecasting based on the MASE measure is ETS(A,Ad,A) having lowest MASE measure of 0.2461797 of all possible State space models.
Based on the 4 Time series regression methods considered, the best
model as per MASE measure for each method is summarized below,
Best Time Series regression model is - Holt-Winters’ multiplicative Damped model (\(Mul.hw.damped\)) with MASE measure of 0.2035619.
Residual analysis to test model assumptions.
Lets perform a detailed Residual Analysis to check if any model assumptions have been violated.
The estimator error (or residual) is defined by:
\(\hat{\epsilon_i}\) = \(Y_i\) - \(\hat{Y_i}\) (i.e. observed value less - trend value)
The following problems are to be checked,
Lets first apply diagnostic check using checkresiduals() function,
checkresiduals(Mul.hw.damped)
##
## Ljung-Box test
##
## data: Residuals from Damped Holt-Winters' multiplicative method
## Q* = 23.57, df = 24, p-value = 0.4864
##
## Model df: 0. Total lags used: 24
From the Residuals plot, linearity is not violated as the residuals are randomly distributed across the mean. Thus, linearity in distribution of error terms is not violated
To test mean value of residuals is zero or not, lets calculate mean value of residuals as,
mean(Mul.hw.damped$model$residuals)
## [1] 0.0190173
As mean value of residuals is close to 0, zero mean residuals is not violated.
Which has,
\(H_0\) : series
of residuals exhibit no serial autocorrelation of any order up to p
\(H_a\) : series of residuals
exhibit serial autocorrelation of any order up to p
From the Ljung-Box test output, since p (0.4864) > 0.05, we do not reject the null hypothesis of no serial autocorrelation.
Thus, according to this test and ACF plot, we can conclude that the serial correlation left in residuals is insignificant.
\(H_0\) : Time series is Normally
distributed
\(H_a\) : Time
series is not normal
shapiro.test(Mul.hw.damped$model$residuals)
##
## Shapiro-Wilk normality test
##
## data: Mul.hw.damped$model$residuals
## W = 0.3802, p-value < 2.2e-16
From the Shapiro-Wilk test, since p<0.05 significance level, we reject the null hypothesis that states the data is normal. Thus, residuals of Mul.hw.damped are Not normally distributed.
Summarizing residual analysis on \(full\) model:
Assumption 1: The error terms are randomly distributed and thus show
linearity: Not violated
Assumption 2:
The mean value of E is zero (zero mean residuals): Not
violated
Assumption 4: The error terms are
independently distributed, i.e. they are not autocorrelated:
Not violated
Assumption 5: The errors
are normally distributed. Violated
Although normality of residuals assumption is violated, ‘There is no normality assumption in fitting an exponential smoothing model’ (Rob Hyndman 2013). Having no residual assumptions’ violations, the Holt-Winters’ multiplicative Damped model is good for accurate forecasting. Lets forecast for the next 2 years,
Using the model with best MASE measure, Holt-Winters’ multiplicative Damped model (Mul.hw.damped), lets estimate and plot 2 years (2015 Jan - 2016 Dec) ahead forecasts. The following code chunk displays the fitted model, observed series and forecasts for 2 year time points ahead.
Note, Since we are forecasting using Holt-Winters’ model which does not incorporate regressor variables, we do not need the regressor (Precipitation) variable’s measures (Unlike Distributed Lag models)
Mul.hw.damped <- hw(solar,seasonal="multiplicative", damped = TRUE, h=2*frequency(solar)) # Revisit the fitted model
plot(Add.hw.damped, ylab="Monthly solar radiation reaching ground", main = "Forecasts from Holt-Winter's damped Multiplicative method", plot.conf=FALSE, type="l", fcol="white", xlab="Year")
lines(fitted(Mul.hw.damped), col="blue", lty=1)
lines(Mul.hw.damped$mean, type="l", col="blue")
legend("topleft",lty=1, pch=1, col=c(1,4),
c("Raw Data", "Holt Winters' fitted data"))
In the plot above, Blue lines after 2014 shows the forecasted 2 year
Solar Radiation data. The gray region around it shows the 95% and 80%
prediction intervals of the forecasts.
The point forecast values are,
Mul.hw.damped$mean
## Jan Feb Mar Apr May Jun Jul
## 2015 5.611616 7.060438 10.515176 12.993446 16.284381 18.261020 19.031597
## 2016 5.616270 7.066360 10.524088 13.004566 16.298445 18.276925 19.048304
## Aug Sep Oct Nov Dec
## 2015 17.105654 13.764743 9.774733 6.581938 5.163049
## 2016 17.120781 13.777000 9.783493 6.587872 5.167730
The dataset hold 2 columns and 54 observations. They are, quarterly Residential Property Price Index (PPI) in Melbourne and quarterly population change over the previous quarter in Victoria between September 2003 and December 2016.
Our aim for the given dataset is to analyse whether the correlation between the quarterly Property Price Index (PPI) in Melbourne and quarterly population change over the previous quarter in Victoria is spurious or not.
data <- read.csv("C:/Users/admin/Downloads/data2.csv")
data.ts = ts(data, start=c(2003,3),frequency = 4)
head(data.ts)
## Quarter price change
## 2003 Q3 41 60.7 14017
## 2003 Q4 1 62.1 12350
## 2004 Q1 28 60.8 17894
## 2004 Q2 15 60.9 9079
## 2004 Q3 42 60.9 16210
## 2004 Q4 2 62.4 13788
The response, \(Y\) = {\(Y_t\)} is the Price Index (PPI) and the co variate time series, \(X\) = {\(X_t\)} is Population Change that we hope will explain \(Y\).
Read Response and Covariate TS -
data_PPI = data.ts[,2]
head(data_PPI) #Response TS
## Qtr1 Qtr2 Qtr3 Qtr4
## 2003 60.7 62.1
## 2004 60.8 60.9 60.9 62.4
data_pchange = data.ts[,3]
head(data_pchange) #Covariate TS
## Qtr1 Qtr2 Qtr3 Qtr4
## 2003 14017 12350
## 2004 17894 9079 16210 13788
data.joint=ts.intersect(data_PPI,data_pchange) # Response and Covariate TS together
head(data.joint)
## data_PPI data_pchange
## 2003 Q3 60.7 14017
## 2003 Q4 62.1 12350
## 2004 Q1 60.8 17894
## 2004 Q2 60.9 9079
## 2004 Q3 60.9 16210
## 2004 Q4 62.4 13788
The sample cross-correlation function (sample CCF) defined by,
\(r_k(X,Y) = \frac{\sum (X_t - \overline{X})(Y_t - \overline{Y})}{\sqrt{\sum (X_t - \overline{X})^2}\sqrt{\sum (Y_t - \overline{Y})^2}}\)
Sample cross-correlations that are larger than 1.96/n in magnitude are then deemed significantly different from zero.
Cross-correlations or Spurious correlations may arise due to the non-stationarity of the interaction Time series. Lets check nonstationarity in the response and covariate series by analyzing the time series plot and testing statistically as well,
Analysing Time Series plots for non-stationarity -
plot(data.joint,yax.flip=T)
Time series plots of both PPI and Population Change suggest non-stationarity as there is a clear upward trend in both of these time series. Lets confirm non-stationarity in both the response and covariate TS statistically,
Test Stationarity of Response and Covariate time series
-
Lets use ADF (Augmented Dickey-Fuller) test. The test hypothesis are,
\(H_0\) : Time series is Difference
non-stationary
\(H_a\) : Time
series is Stationary
adf.test(data.joint[,1]) # For Property Price Index (PPI)
##
## Augmented Dickey-Fuller Test
##
## data: data.joint[, 1]
## Dickey-Fuller = -1.3264, Lag order = 3, p-value = 0.8458
## alternative hypothesis: stationary
adf.test(data.joint[,2]) # For Population Change
##
## Augmented Dickey-Fuller Test
##
## data: data.joint[, 2]
## Dickey-Fuller = -1.603, Lag order = 3, p-value = 0.7344
## alternative hypothesis: stationary
For both response and covariate TS, since p-value > 0.05, we fail to reject null hypothesis of non stationarity. We can conclude that the series is non stationary at 5% level of significance.
Thus if we see significant correlation between PPI and Population change series, it is likely due the spurious correlation due the non-stationarity of the interacting variables. We can check this by looking at the cross-correlations between PPI response and Population change covariate time series using CCF(),
Sample CCF plot -
ccf(as.vector(data.joint[,1]), as.vector(data.joint[,2]),ylab='CCF', main = "Sample CCF between PPI and population change")
Nearly all of the cross-correlations are significantly different from zero according to the 1.96/n rule. The nonstationarity in the PPI series and the Population change series is more likely the cause of the spurious correlations found between the two series.
Lets remove the spurious correlation between PPI and population change series by the process of prewhitening which is an approach to disentangle the linear association between PPI and Population change series from their autocorrelation.
An approximate Prewhitening can be achieved by differencing (hence, eliminating non-stationarity in these 2 series). Further prewhitening can be done using the prewhiten() which transforms one of the two series such that it is uncorrelated (white noise).
Since we have seasonal component in our response and covariate time series (not testing, rather just assuming as the data is quarterly), lets perform ordinary and seasonal differencing on both PPI and Population change series and see if the cross-correlations are removed,
# Code gist - Inner diff(TS,freq) does seasonal differencing. Outer diff(TS) does ordinary differencing
data.dif=ts.intersect(diff(diff(data_PPI,4)),diff(diff(data_pchange,4))) # Note quarterly frequency of 4 is used in seasonal differencing
ccf(as.vector(data.dif[,1]), as.vector(data.dif[,2]),ylab='CCF', main = "Sample CCF between PPI and population change after differencing")
From the sample CCF plot, we notice still some
cross-correlations are significant after differencing.
Next, lets further prewhiten the differenced series using prewhiten() and check the sample CCF plot,
prewhiten(as.vector(data.dif[,1]),as.vector(data.dif[,2]),ylab='CCF', main="Sample CFF after prewhitening") # library(TSA)
From the plot above, all the cross-correlations are insignificant according to the 1.96/n rule. Hence, spurious correlation has been eliminated from PPI and Population change series.
Thus, it seems that Property Price Index (PPI) and Population change are in fact largely uncorrelated, and the strong cross-correlations found between the raw data series is indeed spurious.
Rob Hyndman (2013) Does the Holt-Winters algorithm for exponential smoothing in time series modelling require the normality assumption in residuals?, Stack Exchange Website, accessed 26 September 2023. https://stats.stackexchange.com/questions/64911/does-the-holt-winters-algorithm-for-exponential-smoothing-in-time-series-modelli#:~:text=There%20is%20no%20normality%20assumption,under%20almost%20all%20residual%20distributions.