Two models will be fit and compared for forecasting: TBATS, an automatic procedure, and then a GAM. The data is hourly electric load in MW. The first three weeks in July ’18 will be used to train the model and then the last week will be used to assess model accuracy.
First, loading, preparing and splitting the data.
library(tidyverse); library(forecast); library(mgcv)
elec <- read_csv("DOM_hourly.csv")
elec <- elec %>% mutate(Datetime=as.POSIXct(Datetime, format="%Y-%m-%d %H:%M:%S")) %>% arrange(Datetime)
# Get day of the week, stored in "week" column for weekly seasonality.
# This is for the GAM.
elec$week <- weekdays(elec$Datetime)
elec$week_num <- as.integer(car::recode(elec$week,
"'Monday'='1';'Tuesday'='2';'Wednesday'='3';'Thursday'='4';
'Friday'='5';'Saturday'='6';'Sunday'='7'"))
# Build models with first 3 weeks of July, and the test set will be the last week.
train <- elec %>% filter(Datetime >= as.Date("2018-07-01") & Datetime < as.Date("2018-07-22"))
test <- elec %>% filter(Datetime >= as.Date("2018-07-22") & Datetime < as.Date("2018-07-29"))
ggplot(train, aes(Datetime, DOM_MW)) +
geom_line() +
theme(panel.border = element_blank(),
panel.background = element_blank(),
panel.grid.minor = element_line(colour = "grey90"),
panel.grid.major = element_line(colour = "grey90"),
panel.grid.major.x = element_line(colour = "grey90"),
axis.text = element_text(size = 10),
axis.title = element_text(size = 12, face = "bold")) +
labs(x = "Date", y = "Load (MW)")
Fitting TBATS. The graphic below is the fit of the model and the forecasts. Comparison to the actual electric load is presented at the bottom of this document.
train_msts <- msts(train$DOM_MW, c(24,168))
fit_tbats <- tbats(train_msts)
autoplot(forecast(fit_tbats, h=168)) +
theme_bw()
Next is some data prep for fitting the GAM. The GAM will be explicitly told that there is daily seasonality and weekly seasonality.
n_date <- unique(as.Date(train$Datetime))
n_weekdays <- unique(train$week)
period <- 24
N <- nrow(train) # number of observations in the train set
window <- N / period # number of days in the train set
matrix_gam <- data.frame(Load = train$DOM_MW,
Daily = rep(1:period, window),
Weekly = train$week_num)
# Daily and Weekly are seasonality components.
# Daily indicates the 24 hours in a day, so obs with Daily==i will be correlated;
# Specifically Daily==1 is midnight.
# Weekly indicates the 7 parts of a week. Weekly==1 is Monday;
# Thus all Mondays are correlated.
GAM is fit to the data. How it models the two seasonalities is shown in the graphics below. Then the model is compared to the actual electric load.
gam_1 <- gam(Load ~ s(Daily, bs = "cr", k = period) +
s(Weekly, bs = "cr", k = 7),
data = matrix_gam,
family = gaussian)
# Peak demand at 3PM Tuesday.
layout(matrix(1:2, nrow = 1))
plot(gam_1, shade = TRUE)
newdata <- data.frame(Daily=rep(1:24,7), Weekly=test$week_num)
gam_preds <- predict.gam(gam_1, newdata=newdata)
gam_preds <- data.frame(DOM_MW=gam_preds, Datetime=test$Datetime)
real <- rbind(train[, c("DOM_MW", "Datetime")],test[, c("DOM_MW", "Datetime")])
gam_fitted <- data.frame(DOM_MW=fitted(gam_1), Datetime=train$Datetime)
ggplot() +
geom_line(data = real, aes(x=Datetime,y=DOM_MW),
size = 0.8, alpha=.5) +
geom_line(data = gam_fitted, aes(x=Datetime,y=DOM_MW),
size = 0.8, alpha=.7, color="dark green") +
geom_line(data = gam_preds, aes(x=Datetime,y=DOM_MW),
size = 0.8, alpha=.7, color="blue") +
theme_bw() +
labs(x = "Time", y = "Load (MW)",
title = "Fit (green) and Forecasts (blue) from GAM")
Comparison of TBATS to actual data:
tbats_fitted <- data.frame(DOM_MW=fitted(fit_tbats), Datetime=train$Datetime)
tbats_preds <- data.frame(DOM_MW=forecast(fit_tbats,h=168)$mean, Datetime=test$Datetime)
ggplot() +
geom_line(data = real, aes(x=Datetime,y=DOM_MW),
size = 0.8, alpha=.5) +
geom_line(data = tbats_fitted, aes(x=Datetime,y=DOM_MW),
size = 0.8, alpha=.7, color="dark green") +
geom_line(data = tbats_preds, aes(x=Datetime,y=DOM_MW),
size = 0.8, alpha=.8, color="purple") +
theme_bw() +
labs(x = "Time", y = "Load (MW)",
title = "Fit (green) and Forecasts (blue) from TBATS")
Measuring forecast errors with MAPE (mean absolute percentage error). Forecasts from GAM are on average 8.4% off. TBATS forecasts are on average 7.3% off.
# GAM
mean(abs(test$DOM_MW - gam_preds$DOM_MW)/gam_preds$DOM_MW) * 100
## [1] 8.407873
# TBATS
mean(abs(test$DOM_MW - tbats_preds$DOM_MW)/test$DOM_MW) * 100
## [1] 7.299677