Objective

The objective of this forecast is to develop a prediction model of energy demand (MWh/day) in Victoria, Australia. The model is developed based on daily energy demand from January 1st 2015 until October 6th 2020.

Data Source

The data is obtained from this link https://www.kaggle.com/aramacus/electricity-demand-in-victoria-australia

Library

Add several libraries

library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2     v purrr   0.3.4
## v tibble  3.0.4     v dplyr   1.0.2
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(forecast)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
library(TTR)
library(fpp)
## Loading required package: fma
## Loading required package: expsmooth
## Loading required package: lmtest
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: tseries
library(tseries)
library(TSstudio)
library(padr)

Data Preprocessing

Data Input

energy <- read.csv("complete_dataset.csv")
str(energy)
## 'data.frame':    2106 obs. of  14 variables:
##  $ date           : chr  "2015-01-01" "2015-01-02" "2015-01-03" "2015-01-04" ...
##  $ demand         : num  99635 129606 142301 104331 118132 ...
##  $ RRP            : num  25.6 33.1 34.6 25 26.7 ...
##  $ demand_pos_RRP : num  97319 121082 142301 104331 118132 ...
##  $ RRP_positive   : num  26.4 38.8 34.6 25 26.7 ...
##  $ demand_neg_RRP : num  2316 8524 0 0 0 ...
##  $ RRP_negative   : num  -7.24 -47.81 0 0 0 ...
##  $ frac_at_neg_RRP: num  0.0208 0.0625 0 0 0 ...
##  $ min_temperature: num  13.3 15.4 20 16.3 15 17.7 18.9 23.1 16.5 13.6 ...
##  $ max_temperature: num  26.9 38.8 38.2 21.4 22 26 37.4 28.2 18 21.7 ...
##  $ solar_exposure : num  23.6 26.8 26.5 25.2 30.7 31.6 20.7 13.5 3.1 5.6 ...
##  $ rainfall       : num  0 0 0 4.2 0 0 0 19.4 1.2 5.2 ...
##  $ school_day     : chr  "N" "N" "N" "N" ...
##  $ holiday        : chr  "Y" "N" "N" "N" ...

Data Check

Changing data type

 energy <- energy %>%
  select(date, demand)%>%
  mutate(date = as.Date(date))

head(energy)
##         date    demand
## 1 2015-01-01  99635.03
## 2 2015-01-02 129606.01
## 3 2015-01-03 142300.54
## 4 2015-01-04 104330.71
## 5 2015-01-05 118132.20
## 6 2015-01-06 130672.48
tail(energy)
##            date    demand
## 2101 2020-10-01 106641.79
## 2102 2020-10-02  99585.83
## 2103 2020-10-03  92277.02
## 2104 2020-10-04  94081.56
## 2105 2020-10-05 113610.03
## 2106 2020-10-06 122607.56

Check missing value

energy %>%
  pad(interval = "day")%>%
  is.na()%>%
  colSums()
##   date demand 
##      0      0

Create Time Series Object

Weekly frequency

energy_ts <- ts(data = energy$demand, start = ymd("2015-01-01"), frequency = 7)
energy_ts %>%
  decompose()%>%
  autoplot()

According to the trend plot, the pattern is still up and down, meaning there is multiseasonal time series object. Therefore, the time series object needs to be reanalyzed through multiseasonal time series

Multiseasonal weekly & monthly

energy$demand %>%
  msts(seasonal.periods = c(7,30)) %>% # multiseasonal ts (weekly, monthly)
  mstl() %>% 
  autoplot() 

Multiseasonal monthly & quarterly

energy$demand %>%
  msts(seasonal.periods = c(30,90)) %>% # multiseasonal ts (montly, quarterly)
  mstl() %>% 
  autoplot() 

Multiseasonal quarterly & biannually

energy$demand %>%
  msts(seasonal.periods = c(90,180)) %>% # multiseasonal ts (quarterly, biannually)
  mstl() %>% 
  autoplot()

Multiseasonal biannually & annually

energy$demand %>%
  msts(seasonal.periods = c(180,365)) %>% # multiseasonal ts (biannually, annualy)
  mstl() %>% 
  autoplot() 

The last time series object showed the best decomposition among the other trials and therefore will be used for the time series model building.

energy_msts <- energy$demand %>%
  msts(seasonal.periods = c(180,365))

EDA and Visualisation

ggplot(energy, aes(x = date, y = demand)) +
  geom_line()

ggplot(tail(energy, 365), aes(x = date, y = demand)) +
  geom_line()

Insight

  1. The total data seems to have stationary pattern with additive seasonal
  2. During the start of COVID-19 Pandemic surprisingly the energy demand is not declining, but still move ups and downs

Stationary check

adf.test(energy_msts)
## Warning in adf.test(energy_msts): p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  energy_msts
## Dickey-Fuller = -5.289, Lag order = 12, p-value = 0.01
## alternative hypothesis: stationary

Yes the insight point one is proven by statistical test that the data is stationary

Time Series Model

Cross Validation

The test set is prepared for the last 365 days data

energy_test <- tail(energy_msts, 365)
energy_train <- head(energy_msts, length(energy_msts)-length(energy_test))

Modeling and Comparing the Error

Two models are compared in order to get the lowest error, the models are Exponential Smoothing and ARIMA (Autoregresive Integrated Moving Average)

model_ets <- stlm(y = energy_train, method = "ets", lambda = 0)
model_arima <- stlm(y = energy_train, method = "arima", lambda = 0)
ets_forecast <- forecast(model_ets, h = 365)
arima_forecast <- forecast(model_arima, h = 365)
accuracy(ets_forecast$mean, energy_test)
##                 ME     RMSE      MAE       MPE     MAPE      ACF1 Theil's U
## Test set -2425.216 13900.34 10672.47 -3.132033 9.496329 0.5103473  1.308613
accuracy(arima_forecast$mean, energy_test)
##                 ME     RMSE      MAE       MPE     MAPE      ACF1 Theil's U
## Test set -2456.324 13897.22 10662.31 -3.158258 9.486504 0.5094716  1.309062

Insight

ARIMA model is best in forecasting the energy demand, with the MAPE value of 9.48% or below 10%, meaning the model is Excellent

Assumption Check

Normality Test

shapiro.test(arima_forecast$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  arima_forecast$residuals
## W = 0.9977, p-value = 0.01324

p value < 0.05, meaning the errors are not normally distributed

No Auto-Correlation

Box.test(model_arima$residuals, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  model_arima$residuals
## X-squared = 4.6365, df = 1, p-value = 0.0313

p value < 0.05, meaning the error still has autocorrelation with other

Model Improvement

Box Cox Transformation

The model will be improved by transforming the data using Box Cox method

model_ets_2 <- stlm(y = energy_train, method = "ets", lambda = "auto")
model_arima_2 <- stlm(y = energy_train, method = "arima", lambda = "auto")
ets_forecast_2 <- forecast(model_ets_2, h = 365)
arima_forecast_2 <- forecast(model_arima_2, h = 365)
accuracy(ets_forecast_2$mean, energy_test)
##                 ME     RMSE      MAE       MPE     MAPE      ACF1 Theil's U
## Test set -7647.752 15389.14 11990.95 -7.741418 10.96011 0.5212872  1.496441
accuracy(arima_forecast_2$mean, energy_test)
##                 ME     RMSE      MAE       MPE     MAPE      ACF1 Theil's U
## Test set -2577.214 13696.59 10615.34 -3.283028 9.463238 0.5206156  1.300609

Insight

After Box Cox Transformation, the model has improved its MAPE slightly by 0.02%

shapiro.test(arima_forecast_2$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  arima_forecast_2$residuals
## W = 0.99191, p-value = 3.264e-08
Box.test(arima_forecast_2$residuals, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  arima_forecast_2$residuals
## X-squared = 0.069716, df = 1, p-value = 0.7918

Shapiro test : p value < 0.05, meaning the errors are not normally distributed

Box Test : p value > 0.05, meaning the error has no-autocorrelation with other

Final Visualisation

Improved ARIMA

test_forecast(actual = energy_msts, forecast.obj = arima_forecast_2, train = energy_train, test = energy_test)
## Warning: `arrange_()` is deprecated as of dplyr 0.7.0.
## Please use `arrange()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.

Visualisation on the last 100 data train

energy_train %>% 
  tail(100)%>%
  autoplot()+
  autolayer(tail(ets_forecast_2$fitted,100), series = "ets") +
  autolayer(tail(arima_forecast_2$fitted,100),series = "arima")