Analysis and Forecast of Historical Solar Radiation Data

Task 1
Introduction
Descriptive Statistics
Time Series Plots
ACF and PACF plot
Decomposition
Time Series Regression Modelling
Exponential Smoothing Models
State Space Models
Forecasting
Conclusion
References

Task 1

Introduction

In this study, We aim to analyse and forecast horizontal solar radiation reaching the Earth’s surface at specific global locations using monthly average solar radiation and precipitation data spanning from January 1960 to December 2014. The goal is to provide accurate two-year ahead forecasts based on the most suited model. We’ll explore various time series regression methods, including dLagM, exponential smoothing and the corresponding state-space models and measure performance based on MASE, AIC and residual check.

data <- read_csv("data1.csv")

## Rows: 660 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (2): solar, ppt
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

head(data)

## # A tibble: 6 × 2
##   solar   ppt
##   <dbl> <dbl>
## 1  5.05 1.33 
## 2  6.42 0.921
## 3 10.8  0.947
## 4 16.9  0.615
## 5 24.0  0.544
## 6 26.3  0.703

Descriptive Statistics

summary(data)

##      solar             ppt        
##  Min.   : 1.582   Min.   :0.0010  
##  1st Qu.: 9.739   1st Qu.:0.2192  
##  Median :16.460   Median :0.4153  
##  Mean   :17.788   Mean   :0.4686  
##  3rd Qu.:23.711   3rd Qu.:0.6251  
##  Max.   :54.056   Max.   :2.7213

solar.ts = ts(data$solar,start=c(1960,1), frequency=12)
head(solar.ts)

##            Jan       Feb       Mar       Apr       May       Jun
## 1960  5.051729  6.415832 10.847920 16.930264 24.030797 26.298202

ppt.ts= ts(data$ppt,start=c(1960,1), frequency=12)
head(ppt.ts)

##        Jan   Feb   Mar   Apr   May   Jun
## 1960 1.333 0.921 0.947 0.615 0.544 0.703

Time Series Plots

plot(solar.ts, main = " Solar Radiation Series", ylab = "Amount of solar radiation", xlab = "Time")
points(solar.ts,x=time(solar.ts), pch=as.vector(season(solar.ts)),col=rainbow(12))

The Solar radiation series plot shows no apparent trend for the given time period. There is obvious seasonality in the series, changing variance is not apparent and there is no intervention point between 1960 and 2014.

plot(ppt.ts, main = " Precipitation Series", ylab = "Amount of Precipitation", xlab = "Time")
points(ppt.ts,x=time(ppt.ts), pch=as.vector(season(ppt.ts)),col=rainbow(12))

The Precipitation series plot shows no apparent trend for the given time period. There is obvious seasonality in the series, changing variance is not apparent and there is no intervention point between 1960 and 2014.

data.ts = ts(data, start = 1960, frequency = 12) #converting data to time series object
data.scale=scale(data.ts) #scaling data

plot(data.scale, plot.type="s", col=c("gold","black"),
     main = " All Series", xlab="Year", ylab="Scaled Data")
legend ("topleft", lty =1, text.width=1.5, col=c("gold","black"),
        c('Solar Radiation', 'Precipitation'))

ACF and PACF plot

par(mfrow=c(1,2)) 
par(mar=c(4,4,4,4))
acf(solar.ts, main = " ACF for Solar Radiation series", )
pacf(solar.ts, main = "PACF for Solar Radiation series")

par(mfrow=c(1,2)) 
par(mar=c(4,4,4,4))
acf(ppt.ts, main = " ACF for Precipitation series", )
pacf(ppt.ts, main = " PACF for Precipitation series")

With respect to both ACF plots. We observe positive values which gradually decrease as the lags increase. This pattern suggests the presence of a trend in the series. Additionally, the wave like pattern strongly suggests the presence of seasonality.

k = ar(solar.ts)$order # length of lag for ADF
adf.test(solar.ts,k=k)

## Warning in adf.test(solar.ts, k = k): p-value smaller than printed p-value

## 
##  Augmented Dickey-Fuller Test
## 
## data:  solar.ts
## Dickey-Fuller = -4.557, Lag order = 25, p-value = 0.01
## alternative hypothesis: stationary

k = ar(ppt.ts)$order # length of lag for ADF
adf.test(ppt.ts,k=k)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  ppt.ts
## Dickey-Fuller = -3.2594, Lag order = 28, p-value = 0.07769
## alternative hypothesis: stationary

The ADF test for both series have a P-value < 0.05 therefore we reject the H0 (null hypothesis) and conclude all series are stationary at 5% level.

Decomposition

solar.x12 <- x12(solar.ts)
plot(solar.x12, sa= TRUE, trend= TRUE, main= "X12 Decomposition of Solar Radiation series")

ppt.x12 <- x12(ppt.ts)
plot(ppt.x12, sa= TRUE, trend= TRUE, main= "X12 Decomposition of Precipitation series")

The X12 decomposition for both series shows the seasonally adjusted and trend lines have different heights from the original data for most of the plot. This suggests both components have influence over the original series.

Time Series Regression Modelling

for (i in 1:10){
  model1 = dlm(x = as.vector(ppt.ts), y = as.vector(solar.ts), q = i)
  cat("q = ", i, "AIC = ", AIC(model1$model), "BIC = ", BIC(model1$model),
      "MASE =", MASE(model1)$MASE , "\n")
}

## q =  1 AIC =  4728.713 BIC =  4746.676 MASE = 1.688457 
## q =  2 AIC =  4712.649 BIC =  4735.095 MASE = 1.675967 
## q =  3 AIC =  4688.551 BIC =  4715.478 MASE = 1.662703 
## q =  4 AIC =  4663.6 BIC =  4695.003 MASE = 1.646357 
## q =  5 AIC =  4644.622 BIC =  4680.499 MASE = 1.613848 
## q =  6 AIC =  4637.489 BIC =  4677.837 MASE = 1.607532 
## q =  7 AIC =  4632.716 BIC =  4677.532 MASE = 1.607042 
## q =  8 AIC =  4625.986 BIC =  4675.267 MASE = 1.604806 
## q =  9 AIC =  4615.084 BIC =  4668.827 MASE = 1.593121 
## q =  10 AIC =  4602.658 BIC =  4660.858 MASE = 1.577996

The lag length q=10 denotes the lowest AIC ,BIC values and MASE values.

model1 = dlm(x = as.vector(ppt.ts), y = as.vector(solar.ts), q =10)
AIC_value <- AIC(model1)

## [1] 4602.658

BIC_value <- BIC(model1)

## [1] 4660.858

MASE_value <- MASE(model1)

model_accuracy <- data.frame(MASE = MASE_value,AIC = AIC_value, BIC = BIC_value)
model_accuracy

##            MASE      AIC      BIC
## model1 1.577996 4602.658 4660.858

model2 = polyDlm(x = as.vector(ppt.ts), y = as.vector(solar.ts), q = 10, k = 2, show.beta = TRUE)

## Estimates and t-tests for beta coefficients:
##         Estimate Std. Error t value  P(>|t|)
## beta.0    -5.040      0.435 -11.600 2.39e-28
## beta.1    -2.320      0.314  -7.400 4.33e-13
## beta.2    -0.188      0.251  -0.749 4.54e-01
## beta.3     1.370      0.237   5.770 1.23e-08
## beta.4     2.340      0.243   9.600 1.74e-20
## beta.5     2.720      0.248  11.000 7.32e-26
## beta.6     2.530      0.243  10.400 1.40e-23
## beta.7     1.750      0.235   7.430 3.39e-13
## beta.8     0.387      0.249   1.560 1.20e-01
## beta.9    -1.560      0.312  -4.990 7.68e-07
## beta.10   -4.090      0.433  -9.440 7.08e-20

AIC_value <- AIC(model2)

## [1] 4591.904

BIC_value <- BIC(model2)

## [1] 4614.289

MASE_value <- MASE(model2)

model_accuracy <- data.frame(MASE = MASE_value,AIC = AIC_value, BIC = BIC_value)
model_accuracy

##            MASE      AIC      BIC
## model2 1.592555 4591.904 4614.289

model3 = koyckDlm(x = as.vector(ppt.ts), y = as.vector(solar.ts))
AIC_value <- AIC(model3)

## [1] 3946.476

BIC_value <- BIC(model3)

## [1] 3964.439

MASE_value <- MASE(model3)

model_accuracy <- data.frame(MASE = MASE_value,AIC = AIC_value, BIC = BIC_value)
model_accuracy

##            MASE      AIC      BIC
## model3 1.032483 3946.476 3964.439

for(i in 1:5){
  for(j in 1:5){
    model4 = ardlDlm(x = as.vector(ppt.ts), y = as.vector(solar.ts), p = i, q = j)
    cat("p = ", i, "q = ", j, "AIC = ", AIC(model4$model), "BIC = ", BIC(model4$model),"MASE =", MASE(model4)$MASE, "\n")
  }
}

## p =  1 q =  1 AIC =  3712.311 BIC =  3734.765 MASE = 0.8392434 
## p =  1 q =  2 AIC =  3239.416 BIC =  3266.352 MASE = 0.4971918 
## p =  1 q =  3 AIC =  3143.522 BIC =  3174.936 MASE = 0.4740063 
## p =  1 q =  4 AIC =  3138.399 BIC =  3174.288 MASE = 0.4697571 
## p =  1 q =  5 AIC =  3100.283 BIC =  3140.644 MASE = 0.450425 
## p =  2 q =  1 AIC =  3639.223 BIC =  3666.159 MASE = 0.7834855 
## p =  2 q =  2 AIC =  3229.051 BIC =  3260.476 MASE = 0.4951319 
## p =  2 q =  3 AIC =  3137.634 BIC =  3173.535 MASE = 0.4738939 
## p =  2 q =  4 AIC =  3132.962 BIC =  3173.337 MASE = 0.4702773 
## p =  2 q =  5 AIC =  3097.288 BIC =  3142.134 MASE = 0.4503599 
## p =  3 q =  1 AIC =  3608.793 BIC =  3640.207 MASE = 0.7572489 
## p =  3 q =  2 AIC =  3226.623 BIC =  3262.524 MASE = 0.4955334 
## p =  3 q =  3 AIC =  3139.409 BIC =  3179.798 MASE = 0.4737144 
## p =  3 q =  4 AIC =  3134.777 BIC =  3179.638 MASE = 0.4701162 
## p =  3 q =  5 AIC =  3098.808 BIC =  3148.139 MASE = 0.4502885 
## p =  4 q =  1 AIC =  3602.664 BIC =  3638.553 MASE = 0.7580664 
## p =  4 q =  2 AIC =  3224.285 BIC =  3264.66 MASE = 0.4959949 
## p =  4 q =  3 AIC =  3131.289 BIC =  3176.15 MASE = 0.4695096 
## p =  4 q =  4 AIC =  3131.424 BIC =  3180.772 MASE = 0.4665123 
## p =  4 q =  5 AIC =  3096.024 BIC =  3149.839 MASE = 0.4479481 
## p =  5 q =  1 AIC =  3599.402 BIC =  3639.764 MASE = 0.7572617 
## p =  5 q =  2 AIC =  3221.853 BIC =  3266.699 MASE = 0.4954501 
## p =  5 q =  3 AIC =  3127.103 BIC =  3176.434 MASE = 0.4675479 
## p =  5 q =  4 AIC =  3127.868 BIC =  3181.684 MASE = 0.4651969 
## p =  5 q =  5 AIC =  3097.877 BIC =  3156.177 MASE = 0.4479311

MASE and AIC are the lowest at p=4 and q=5.

model4 = ardlDlm(x = as.vector(ppt.ts), y = as.vector(solar.ts), p = 4, q = 5)
AIC_value <- AIC(model4)

## [1] 3096.024

BIC_value <- BIC(model4)

## [1] 3149.839

MASE_value <- MASE(model4)

model_accuracy <- data.frame(MASE = MASE_value,AIC = AIC_value, BIC = BIC_value)
model_accuracy

##             MASE      AIC      BIC
## model4 0.4479481 3096.024 3149.839

models <- c("Finite DLM", "Poly DLM", "Koyck", "ARDL")

mase_values <- MASE(model1, model2, model3, model4)

#  empty vectors to store AIC, BIC
aic_values <- numeric(length(models))
bic_values <- numeric(length(models))

# list of  model 
model_list <- list(model1, model2, model3, model4)

# Loop to calculate AIC & BIC for each model
for (i in 1:length(models)) {
  aic_values[i] <- AIC(model_list[[i]])
  bic_values[i] <- BIC(model_list[[i]])
}

## [1] 4602.658
## [1] 4660.858
## [1] 4591.904
## [1] 4614.289
## [1] 3946.476
## [1] 3964.439
## [1] 3096.024
## [1] 3149.839

results <- data.frame(ModelName = models, MASE = mase_values$MASE, AIC = aic_values, BIC = bic_values)
head(results)

##    ModelName      MASE      AIC      BIC
## 1 Finite DLM 1.5779955 4602.658 4660.858
## 2   Poly DLM 1.5925546 4591.904 4614.289
## 3      Koyck 1.0324829 3946.476 3964.439
## 4       ARDL 0.4479481 3096.024 3149.839

The ARDLM has the lowest MASE,AIC and BIC values therefore we further check the residuals and summary of the the model.

summary(model4)

## 
## Time series regression with "ts" data:
## Start = 6, End = 660
## 
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -15.5901  -1.3826  -0.2867   1.0526  18.5709 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.46103    0.43729   5.628 2.72e-08 ***
## X.t         -0.61439    0.54768  -1.122 0.262364    
## X.1          0.78469    0.77617   1.011 0.312409    
## X.2          1.28470    0.79026   1.626 0.104509    
## X.3          0.80457    0.77946   1.032 0.302355    
## X.4         -1.20750    0.55570  -2.173 0.030148 *  
## Y.1          1.27195    0.03849  33.049  < 2e-16 ***
## Y.2         -0.01868    0.06249  -0.299 0.765112    
## Y.3         -0.40480    0.06019  -6.725 3.87e-11 ***
## Y.4         -0.23201    0.06221  -3.729 0.000209 ***
## Y.5          0.21746    0.03772   5.766 1.26e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.546 on 644 degrees of freedom
## Multiple R-squared:  0.9338, Adjusted R-squared:  0.9328 
## F-statistic: 908.5 on 10 and 644 DF,  p-value: < 2.2e-16

checkresiduals(model4$model)

## 
##  Breusch-Godfrey test for serial correlation of order up to 14
## 
## data:  Residuals
## LM test = 103.8, df = 14, p-value = 8.811e-16

From the ARDLM Model summary and residual diagnostics we observe:

Adjust R-squared is high at 0.9328 which means it only explains 93.28% of the variability.
The p-value: 2.2e-16 is < 0.05 therefore the model is significant.
The mean of the residuals is close to zero and there is no significant correlation in the residuals series.
ACF plot indicates presence of seasonality.
Histogram is normally distributed.
Breusch-Godfrey test p-value < 0.05 therefore there is significant autocorrelation remaining in the residuals.

Exponential Smoothing Models

Since there is presence of seasonality in the Solar Radiation series we will use Holt Winters modelling.

#  Defined options for seasonal and damped
seasonal_options <- c("multiplicative", "additive")
damped_options <- c(FALSE, TRUE)


results <- list()
mase_values <- numeric()
model_names <- character()
aic_values <- numeric()
bic_values <- numeric()
# Loop  all combinations of seasonal and damped
for (seasonal in seasonal_options) {
  for (damped in damped_options) {
    args <- list(seasonal = seasonal, damped = damped)
    result <- hw(solar.ts, seasonal = args$seasonal, damped = args$damped, h=2*12)
    sm <- summary(result)
    mase <- accuracy(sm)
    mase <- mase[,6]
    aic <- sm$model$aic
    bic <- sm$model$bic
    model_name <- sm$method
      # Store the values in the vectors
    mase_values <- c(mase_values, mase)
    model_names <- c(model_names, model_name)
    aic_values <- c(aic_values, aic)
    bic_values <- c(bic_values, bic)
  }
}

# Data frame to display the results
results_table2 <- data.frame(
  ModelName = model_names,
  MASE = mase_values,
  AIC = aic_values,
  BIC = bic_values
)

# Display the all exponential smoothing model metrics in a table
print(results_table2)

##                                    ModelName      MASE      AIC      BIC
## 1        Holt-Winters' multiplicative method 0.2233077 6648.746 6725.114
## 2 Damped Holt-Winters' multiplicative method 0.2035619 6366.701 6447.561
## 3              Holt-Winters' additive method 0.2471600 5434.708 5511.076
## 4       Damped Holt-Winters' additive method 0.2461797 5428.422 5509.282

The Damped Holt-Winters’ multiplicative method has the lowest MASE therefore we further check the residuals and summary of the the model.

hw.mult.damped.model = hw(solar.ts, seasonal = "multiplicative", damped = TRUE, h=2*12)
summary(hw.mult.damped.model)

## 
## Forecast method: Damped Holt-Winters' multiplicative method
## 
## Model Information:
## Damped Holt-Winters' multiplicative method 
## 
## Call:
##  hw(y = solar.ts, h = 2 * 12, seasonal = "multiplicative", damped = TRUE) 
## 
##   Smoothing parameters:
##     alpha = 0.805 
##     beta  = 1e-04 
##     gamma = 0.0124 
##     phi   = 0.94 
## 
##   Initial states:
##     l = 9.9033 
##     b = 0.4703 
##     s = 0.4341 0.5649 0.8263 1.1716 1.4514 1.6316
##            1.5503 1.3834 1.1045 0.8837 0.5761 0.4221
## 
##   sigma:  0.3145
## 
##      AIC     AICc      BIC 
## 6366.701 6367.768 6447.561 
## 
## Error measures:
##                      ME     RMSE      MAE       MPE     MAPE      MASE
## Training set -0.0456208 2.045442 1.239102 -2.172476 10.00296 0.2035619
##                    ACF1
## Training set 0.04177493
## 
## Forecasts:
##          Point Forecast       Lo 80     Hi 80       Lo 95     Hi 95
## Jan 2015       5.611616   3.3502090  7.873023   2.1530925  9.070139
## Feb 2015       7.060438   3.3373603 10.783517   1.3664818 12.754395
## Mar 2015      10.515176   3.8550902 17.175263   0.3294535 20.700899
## Apr 2015      12.993446   3.5145037 22.472389  -1.5033454 27.490238
## May 2015      16.284381   2.9386340 29.630128  -4.1261776 36.694939
## Jun 2015      18.261020   1.7242212 34.797820  -7.0298315 43.551872
## Jul 2015      19.031597   0.2101276 37.853067  -9.7533565 47.816551
## Aug 2015      17.105654  -1.2074754 35.418784 -10.9018606 45.113169
## Sep 2015      13.764743  -2.0801991 29.609686 -10.4680049 37.997491
## Oct 2015       9.774733  -2.2585804 21.808047  -8.6286319 28.178099
## Nov 2015       6.581938  -2.0456267 15.209503  -6.6127835 19.776659
## Dec 2015       5.163049  -2.0168711 12.342970  -5.8176914 16.143790
## Jan 2016       5.616270  -2.6571166 13.889656  -7.0367828 18.269322
## Feb 2016       7.066360  -3.9144360 18.047156  -9.7273184 23.860039
## Mar 2016      10.524088  -6.6891829 27.737360 -15.8013383 36.849515
## Apr 2016      13.004566  -9.3405081 35.349641 -21.1692761 47.178409
## May 2016      16.298445 -13.0720043 45.668893 -28.6197807 61.216670
## Jun 2016      18.276925 -16.2140342 52.767885 -34.4724452 71.026296
## Jul 2016      19.048304 -18.5466197 56.643228 -38.4481704 76.544779
## Aug 2016      17.120781 -18.1782620 52.419825 -36.8644468 71.106009
## Sep 2016      13.777000 -15.8651593 43.419159 -31.5567705 59.110770
## Oct 2016       9.783493 -12.1627698 31.729756 -23.7804195 43.347405
## Nov 2016       6.587872  -8.8063951 21.982139 -16.9556278 30.131371
## Dec 2016       5.167730  -7.4021686 17.737629 -14.0562710 24.391731

checkresiduals(hw.mult.damped.model)

## 
##  Ljung-Box test
## 
## data:  Residuals from Damped Holt-Winters' multiplicative method
## Q* = 23.57, df = 24, p-value = 0.4864
## 
## Model df: 0.   Total lags used: 24

From the Damped Holt-Winters’ multiplicative summary and residual diagnostics we observe:

The mean of the residuals is close to zero and there is no significant correlation in the residuals series.
ACF plot indicates presence of significant lags at lags 12 and 24 .
Histogram is not normally distributed.
Damped Holt-Winters’ multiplicative method has a mase value of 0.2035619 which is much smaller compared to ARDLM model therefore it can provide better forecast for the Solar Radiation series.

State Space Models

# Defined options for ETS
etsmodel_options <- c("AAA", "MAA", "MAM", "MMM")
damped_options <- c(FALSE, TRUE)

# Empty list to store the results
results <- list()
mase_values <- numeric()
model_names <- character()
aic_values <- numeric()
bic_values <- numeric()
# Loop through all combinations of ETS
for (etsmodel in etsmodel_options) {
  for (damped in damped_options) {
    args <- list(etsmodel = etsmodel, damped = damped)
    result <- ets(solar.ts, model = args$etsmodel, damped = args$damped)
    sm <- summary(result)
    mase <- accuracy(sm)
    mase <- mase[,6]
    aic <- sm$aic
    bic <- sm$bic
    model_name <- sm$method
      # Store values in the vectors
    mase_values <- c(mase_values, mase)
    model_names <- c(model_names, model_name)
    aic_values <- c(aic_values, aic)
    bic_values <- c(bic_values, bic)

  }
}

# Data frame to display the results
results_table3 <- data.frame(
  ModelName = model_names,
  MASE = mase_values,
  AIC = aic_values,
  BIC = bic_values
)
# ETS auto model
Ets.auto.model <- ets(solar.ts)
sm<-summary(Ets.auto.model)
#ETS auto function detects the following model:
sm$method

## [1] "ETS(A,Ad,A)"

# Display all ETS model metrics 
print(results_table3)

##     ModelName      MASE      AIC      BIC
## 1  ETS(A,A,A) 0.2471600 5434.708 5511.076
## 2 ETS(A,Ad,A) 0.2461797 5428.422 5509.282
## 3  ETS(M,A,A) 0.4748561 7602.755 7679.123
## 4 ETS(M,Ad,A) 0.3798095 6469.079 6549.940
## 5  ETS(M,A,M) 0.3721664 6105.959 6182.327
## 6 ETS(M,Ad,M) 0.3222574 5953.502 6034.363
## 7  ETS(M,M,M) 0.5292151 6670.168 6746.536
## 8 ETS(M,Md,M) 0.3201193 5995.550 6076.410

The ETS(A,Ad,A) and ETS(A,A,A) have the lowest MASE, AIC and BIC values respectively, therefore we further check the residuals and summary of the model.

  ets.AAA.model <- ets(solar.ts, model = "AAA")
  summary(ets.AAA.model)

## ETS(A,Ad,A) 
## 
## Call:
##  ets(y = solar.ts, model = "AAA") 
## 
##   Smoothing parameters:
##     alpha = 0.9999 
##     beta  = 1e-04 
##     gamma = 1e-04 
##     phi   = 0.9388 
## 
##   Initial states:
##     l = 11.154 
##     b = 0.7632 
##     s = -10.4919 -8.137 -3.348 2.5794 8.08 11.1219
##            9.9586 6.9916 1.9573 -1.8565 -7.1607 -9.6946
## 
##   sigma:  2.3446
## 
##      AIC     AICc      BIC 
## 5428.422 5429.489 5509.282 
## 
## Training set error measures:
##                       ME     RMSE      MAE       MPE     MAPE      MASE
## Training set -0.01091357 2.314163 1.498521 -1.468083 12.44796 0.2461797
##                   ACF1
## Training set 0.1700724

  checkresiduals( ets.AAA.model)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(A,Ad,A)
## Q* = 210.76, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

From the state space model AAA summary and residual diagnostics we observe:

The mean of the residuals is close to zero and there is no significant correlation in the residuals series.
ACF plot indicates presence of significant lags and the presence of seasonality .
Histogram is normally distributed.
The mase value is higher than the Damped Holt-Winters’ multiplicative Model.

  ets.AAdA.model <- ets(solar.ts, model = "AAA", damped = TRUE)
  summary(ets.AAdA.model)

## ETS(A,Ad,A) 
## 
## Call:
##  ets(y = solar.ts, model = "AAA", damped = TRUE) 
## 
##   Smoothing parameters:
##     alpha = 0.9999 
##     beta  = 1e-04 
##     gamma = 1e-04 
##     phi   = 0.9388 
## 
##   Initial states:
##     l = 11.154 
##     b = 0.7632 
##     s = -10.4919 -8.137 -3.348 2.5794 8.08 11.1219
##            9.9586 6.9916 1.9573 -1.8565 -7.1607 -9.6946
## 
##   sigma:  2.3446
## 
##      AIC     AICc      BIC 
## 5428.422 5429.489 5509.282 
## 
## Training set error measures:
##                       ME     RMSE      MAE       MPE     MAPE      MASE
## Training set -0.01091357 2.314163 1.498521 -1.468083 12.44796 0.2461797
##                   ACF1
## Training set 0.1700724

  checkresiduals(ets.AAdA.model)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(A,Ad,A)
## Q* = 210.76, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

From the state space model AAA Damped summary and residual diagnostics we observe:

The mean of the residuals is close to zero and there is no significant correlation in the residuals series.
ACF plot indicates presence of significant lags and the presence of seasonality .
Histogram is normally distributed.
The mase value is higher than the Damped Holt-Winters’ multiplicative Model.

All_models<- bind_rows(results, results_table2, results_table3)
All_models <- arrange(All_models,MASE)
All_models

##                                     ModelName      MASE      AIC      BIC
## 1  Damped Holt-Winters' multiplicative method 0.2035619 6366.701 6447.561
## 2         Holt-Winters' multiplicative method 0.2233077 6648.746 6725.114
## 3        Damped Holt-Winters' additive method 0.2461797 5428.422 5509.282
## 4                                 ETS(A,Ad,A) 0.2461797 5428.422 5509.282
## 5               Holt-Winters' additive method 0.2471600 5434.708 5511.076
## 6                                  ETS(A,A,A) 0.2471600 5434.708 5511.076
## 7                                 ETS(M,Md,M) 0.3201193 5995.550 6076.410
## 8                                 ETS(M,Ad,M) 0.3222574 5953.502 6034.363
## 9                                  ETS(M,A,M) 0.3721664 6105.959 6182.327
## 10                                ETS(M,Ad,A) 0.3798095 6469.079 6549.940
## 11                                 ETS(M,A,A) 0.4748561 7602.755 7679.123
## 12                                 ETS(M,M,M) 0.5292151 6670.168 6746.536

We observe the Damped Holt-Winters’ multiplicative method has the best residual check in comparison to other models and has the lowest MASE value which is the main performance metrics to determine the best model for forecasting.

Forecasting

We use the Damped Holt-Winters’ multiplicative to model the forecast.

upper.95.int = hw.mult.damped.model$upper[,2] #CI upper bound
lower.95.int = hw.mult.damped.model$lower[,2] #CI lower bound 
autoplot(solar.ts) +
  autolayer(hw.mult.damped.model, series="Damped Holt-Winters' multiplicative method", 
    PI=FALSE) +
  autolayer(upper.95.int, series="95% confidence interval",  
    PI=FALSE) +
   autolayer(lower.95.int,series="95% confidence interval",
    PI=FALSE) +
  scale_color_manual(values = c("Damped Holt-Winters' multiplicative method" = "red",
                                "95% confidence interval" = "blue")) +
  xlab("Year") +
  ylab("Solar Radiation") +
  ggtitle("Original series, forecasts and 95% forecast interval for the Solar Radiation Series") +
  guides(colour=guide_legend(title="Forecast"))

The red line in the plot represents the forecast for the next 2 years, while the blue lines indicate the 95% confidence intervals. The Damped Holt-Winters’ multiplicative method uses an alpha value of 0.805, which places more weight on the most recent data points from the original solar radiation series when generating the forecast. As a result, the forecast closely aligns with the most recent years in the data.

Conclusion

Conclusively, the Damped Holt-Winters’ multiplicative method is the most suited model for the 2-year forecast, supported by the lowest MASE value and observed residual checks.

References

[1]Forecasting: Principles and practice (2nd) 7.3 Holt-Winters’ seasonal method. Available at: https://otexts.com/fpp2/holt-winters.html (Accessed: 16 September 2023).

[2]Hudson, I 2023, lecture and lab notes, Forecasting, RMIT University, Melbourne