DATA 624 Project 1 Part B - Power
Introduction
I am to forecast how much power will be consumed by residential customers in kilowatt hours for the next 12 months. Data was provided for this project spaning January 1998 until December 2013.
Data Exploration
I will begin by reading in the data and ploting the series.
if (!file.exists("ResidentialCustomerForecastLoad-624.xlsx")) {
download.file("https://github.com/mikeasilva/CUNY-SPS/raw/master/DATA624/ResidentialCustomerForecastLoad-624.xlsx", "ResidentialCustomerForecastLoad-624.xlsx")
}
raw_data <- read_excel("ResidentialCustomerForecastLoad-624.xlsx")
kwh <- raw_data %>%
select(KWH) %>%
ts(start = decimal_date(date("1998-01-01")), frequency = 12)
autoplot(kwh, main = "Residential Power Usage (KWH)") +
scale_y_continuous(labels = comma) +
theme(axis.title = element_blank())
KWH | |
---|---|
Min. : 770523 | |
1st Qu.: 5429912 | |
Median : 6283324 | |
Mean : 6502475 | |
3rd Qu.: 7620524 | |
Max. :10655730 | |
NA’s :1 |
We have outliers and missing data. We will clean them up with tsclean
from the forcasts package, and replot the data with the ACF and PACF:
kwh <- tsclean(kwh)
ggtsplot <- function(ts, title) {
# A ggplot2 version of tsdisplay(df)
# Args:
# ts (Time-Series): The time series we want to plot
# title (str): The title of the graph
grid.arrange(
autoplot(ts) +
scale_y_continuous(labels = comma) +
ggtitle(title) +
theme(axis.title = element_blank()),
grid.arrange(
ggAcf(ts) + ggtitle(element_blank()),
ggPacf(ts) + ggtitle(element_blank()), ncol = 2)
, nrow = 2)
}
ggtsplot(kwh, "Cleaned Residential Power Usage (KWH)")
This looks much better. There is a seasonal component that looks to be additive. I will need to use a model that captures seasonality.
Candidate Models
Since the data is highly seasonal, I we will need to use an ARIMA or Holt-Winters model. Let’s fit an models using auto.arima()
and hw()
and visualize their projections. I will use a couple of different parameters with the Holt-Winters model.
h <- 12
lambda <- BoxCox.lambda(kwh)
autoplot(kwh) +
autolayer(hw(kwh, h = h), series = "Holt-Winters") +
autolayer(hw(kwh, h = h, lambda = lambda), series = "Holt-Winters (Box-Cox Adjusted)") +
autolayer(hw(kwh, h = h, lambda = lambda, damped = TRUE), series = "Holt-Winters (Damped & Box-Cox Adjusted)") +
autolayer(forecast(auto.arima(kwh), h = h), series = "ARIMA") +
facet_wrap(. ~ series, ncol = 2) +
scale_color_brewer(palette = "Set1") +
scale_y_continuous(labels = comma) +
theme(legend.position = "none", axis.title = element_blank())
Residual Analysis
All models do a good job picking up the seasonality. Let’s look at the residuals to see if any of the models should not be used.
Holt-Winters Model
Ljung-Box test
data: Residuals from Holt-Winters' additive method
Q* = 61.741, df = 8, p-value = 0.0000000002121
Model df: 16. Total lags used: 24
There are some spikes in the ACF that indicate this is not the best model.
Adjusted Holt-Winters Model
Ljung-Box test
data: Residuals from Holt-Winters' additive method
Q* = 48.973, df = 8, p-value = 0.00000006433
Model df: 16. Total lags used: 24
This is a bit better than the previous model but there are still ACF spikes outside of the bands.
Damped & Adjusted Holt-Winters Model
Ljung-Box test
data: Residuals from Damped Holt-Winters' additive method
Q* = 47.391, df = 7, p-value = 0.00000004682
Model df: 17. Total lags used: 24
The preformance of this model is about the same as above.
ARIMA Model
Ljung-Box test
data: Residuals from ARIMA(3,0,1)(2,1,0)[12] with drift
Q* = 11.549, df = 17, p-value = 0.8266
Model df: 7. Total lags used: 24
This model seems to be the best. There is only one ACF spike outside of the threshold and it only by a small amount. My working theory is this is the model I should use, but I will check this using cross validation.
Cross Validation
In order to understand how well a model is likely to preform at predicting out of sample data, I will use the tsCV
function and evaluate the models. My goal is to minimize the RMSE. First I will get the errors from the cross validation process, then I will compute the RMSE.
get_rmse <- function(e) {
sqrt(mean(e^2, na.rm = TRUE))
}
arima_forecast <- function(x, h) {
forecast(Arima(x, order = c(3, 0, 1), seasonal = c(2, 1, 0), include.drift = TRUE), h = h)
}
hw_error <- tsCV(kwh, hw, h = h)
adj_hw_error <- tsCV(kwh, hw, h = h, lambda = lambda)
damped_adj_hw_error <- tsCV(kwh, hw, h = h, lambda = lambda, damped = TRUE)
arima_errors <- tsCV(kwh, arima_forecast, h = h)
data.frame(Model = c("ARIMA", "Holt-Winters", "Adjusted Holt-Winters", "Damped & Adjusted Holt-Winters"),
RMSE = c(get_rmse(arima_errors[, h]), get_rmse(hw_error[, h]), get_rmse(adj_hw_error[, h]), get_rmse(damped_adj_hw_error[, h]))) %>%
arrange(RMSE) %>%
kable() %>%
kable_styling()
Model | RMSE |
---|---|
ARIMA | 659529.8 |
Holt-Winters | 1010721.9 |
Adjusted Holt-Winters | 1373617.3 |
Damped & Adjusted Holt-Winters | 1547751.3 |
The ARIMA model had the best cross validation results.
Model Selection
Given that the ARIMA model minimized the RMSE in the cross validation, I will use it to forecast residential power consumption.
Summary
I set out to forecast residential power consumption. Four modeling techniques were tested for accuracy using cross-validation. In the end an ARIMA model was selected bercause it minimized the RMSE. Here’s my forcast for the next 12 months of power usage:
my_forecast <- forecast(arima_fit, h = h)
autoplot(my_forecast) +
ggtitle("Forecasted Residential Power Consumption (KWH)") +
theme(axis.title = element_blank())
Point Forecast | Lo 80 | Hi 80 | Lo 95 | Hi 95 | |
---|---|---|---|---|---|
Jan 2014 | 9433336 | 8688058 | 10178615 | 8293531 | 10573141 |
Feb 2014 | 8570219 | 7785677 | 9354761 | 7370366 | 9770072 |
Mar 2014 | 6615312 | 5830157 | 7400467 | 5414521 | 7816102 |
Apr 2014 | 6005538 | 5196071 | 6815004 | 4767565 | 7243510 |
May 2014 | 5917569 | 5108101 | 6727036 | 4679595 | 7155543 |
Jun 2014 | 8187387 | 7377114 | 8997660 | 6948182 | 9426592 |
Jul 2014 | 9528779 | 8717809 | 10339750 | 8288507 | 10769052 |
Aug 2014 | 9991953 | 9180963 | 10802943 | 8751650 | 11232255 |
Sep 2014 | 8546843 | 7735715 | 9357971 | 7306330 | 9787356 |
Oct 2014 | 5856327 | 5045196 | 6667458 | 4615809 | 7096845 |
Nov 2014 | 6153245 | 5342114 | 6964376 | 4912727 | 7393763 |
Dec 2014 | 7732110 | 6920971 | 8543250 | 6491580 | 8972641 |