The objective of this forecast is to develop a prediction model of energy demand (MWh/day) in Victoria, Australia. The model is developed based on daily energy demand from January 1st 2015 until October 6th 2020.
The data is obtained from this link https://www.kaggle.com/aramacus/electricity-demand-in-victoria-australia
Add several libraries
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2 v purrr 0.3.4
## v tibble 3.0.4 v dplyr 1.0.2
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
## Loading required package: fma
## Loading required package: expsmooth
## Loading required package: lmtest
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: tseries
## 'data.frame': 2106 obs. of 14 variables:
## $ date : chr "2015-01-01" "2015-01-02" "2015-01-03" "2015-01-04" ...
## $ demand : num 99635 129606 142301 104331 118132 ...
## $ RRP : num 25.6 33.1 34.6 25 26.7 ...
## $ demand_pos_RRP : num 97319 121082 142301 104331 118132 ...
## $ RRP_positive : num 26.4 38.8 34.6 25 26.7 ...
## $ demand_neg_RRP : num 2316 8524 0 0 0 ...
## $ RRP_negative : num -7.24 -47.81 0 0 0 ...
## $ frac_at_neg_RRP: num 0.0208 0.0625 0 0 0 ...
## $ min_temperature: num 13.3 15.4 20 16.3 15 17.7 18.9 23.1 16.5 13.6 ...
## $ max_temperature: num 26.9 38.8 38.2 21.4 22 26 37.4 28.2 18 21.7 ...
## $ solar_exposure : num 23.6 26.8 26.5 25.2 30.7 31.6 20.7 13.5 3.1 5.6 ...
## $ rainfall : num 0 0 0 4.2 0 0 0 19.4 1.2 5.2 ...
## $ school_day : chr "N" "N" "N" "N" ...
## $ holiday : chr "Y" "N" "N" "N" ...
## date demand
## 1 2015-01-01 99635.03
## 2 2015-01-02 129606.01
## 3 2015-01-03 142300.54
## 4 2015-01-04 104330.71
## 5 2015-01-05 118132.20
## 6 2015-01-06 130672.48
## date demand
## 2101 2020-10-01 106641.79
## 2102 2020-10-02 99585.83
## 2103 2020-10-03 92277.02
## 2104 2020-10-04 94081.56
## 2105 2020-10-05 113610.03
## 2106 2020-10-06 122607.56
Weekly frequency
According to the trend plot, the pattern is still up and down, meaning there is multiseasonal time series object. Therefore, the time series object needs to be reanalyzed through multiseasonal time series
energy$demand %>%
msts(seasonal.periods = c(7,30)) %>% # multiseasonal ts (weekly, monthly)
mstl() %>%
autoplot() energy$demand %>%
msts(seasonal.periods = c(30,90)) %>% # multiseasonal ts (montly, quarterly)
mstl() %>%
autoplot() energy$demand %>%
msts(seasonal.periods = c(90,180)) %>% # multiseasonal ts (quarterly, biannually)
mstl() %>%
autoplot()energy$demand %>%
msts(seasonal.periods = c(180,365)) %>% # multiseasonal ts (biannually, annualy)
mstl() %>%
autoplot() The last time series object showed the best decomposition among the other trials and therefore will be used for the time series model building.
Insight
Stationary check
## Warning in adf.test(energy_msts): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: energy_msts
## Dickey-Fuller = -5.289, Lag order = 12, p-value = 0.01
## alternative hypothesis: stationary
Yes the insight point one is proven by statistical test that the data is stationary
The test set is prepared for the last 365 days data
Two models are compared in order to get the lowest error, the models are Exponential Smoothing and ARIMA (Autoregresive Integrated Moving Average)
model_ets <- stlm(y = energy_train, method = "ets", lambda = 0)
model_arima <- stlm(y = energy_train, method = "arima", lambda = 0)## ME RMSE MAE MPE MAPE ACF1 Theil's U
## Test set -2425.216 13900.34 10672.47 -3.132033 9.496329 0.5103473 1.308613
## ME RMSE MAE MPE MAPE ACF1 Theil's U
## Test set -2456.324 13897.22 10662.31 -3.158258 9.486504 0.5094716 1.309062
Insight
ARIMA model is best in forecasting the energy demand, with the MAPE value of 9.48% or below 10%, meaning the model is Excellent
##
## Shapiro-Wilk normality test
##
## data: arima_forecast$residuals
## W = 0.9977, p-value = 0.01324
p value < 0.05, meaning the errors are not normally distributed
The model will be improved by transforming the data using Box Cox method
model_ets_2 <- stlm(y = energy_train, method = "ets", lambda = "auto")
model_arima_2 <- stlm(y = energy_train, method = "arima", lambda = "auto")ets_forecast_2 <- forecast(model_ets_2, h = 365)
arima_forecast_2 <- forecast(model_arima_2, h = 365)## ME RMSE MAE MPE MAPE ACF1 Theil's U
## Test set -7647.752 15389.14 11990.95 -7.741418 10.96011 0.5212872 1.496441
## ME RMSE MAE MPE MAPE ACF1 Theil's U
## Test set -2577.214 13696.59 10615.34 -3.283028 9.463238 0.5206156 1.300609
Insight
After Box Cox Transformation, the model has improved its MAPE slightly by 0.02%
##
## Shapiro-Wilk normality test
##
## data: arima_forecast_2$residuals
## W = 0.99191, p-value = 3.264e-08
##
## Box-Ljung test
##
## data: arima_forecast_2$residuals
## X-squared = 0.069716, df = 1, p-value = 0.7918
Shapiro test : p value < 0.05, meaning the errors are not normally distributed
Box Test : p value > 0.05, meaning the error has no-autocorrelation with other
Improved ARIMA
test_forecast(actual = energy_msts, forecast.obj = arima_forecast_2, train = energy_train, test = energy_test)## Warning: `arrange_()` is deprecated as of dplyr 0.7.0.
## Please use `arrange()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
Visualisation on the last 100 data train
energy_train %>%
tail(100)%>%
autoplot()+
autolayer(tail(ets_forecast_2$fitted,100), series = "ets") +
autolayer(tail(arima_forecast_2$fitted,100),series = "arima")