Power Forecast

Data

Data & Data Review


We load the data and then use the summary and skim functions to assess its quality. Here are some observations:

  • 192 rows of data
  • One NA value
  • Mean value of approximately 6.502 million

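We use the tsclean function to clean the data and fill in the NA value. The same summary and skim output is shown below, first for the raw series and then for the cleaned series. Note that cleaning also raises the minimum from 770,523 to 4,313,019, so tsclean replaced a low outlier in addition to the missing value.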

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
##   770523  5429912  6283324  6502475  7620524 10655730        1
Data summary
Name power
Number of rows 192
Number of columns 1
_______________________
Column type frequency:
ts 1
________________________
Group variables None

Variable type: ts

skim_variable  n_missing  complete_rate  start  end   frequency  deltat  mean     sd       min      max       median
x              1          0.99           1998   2013  12         0.08    6502475  1447571  770523   10655730  6283324
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##  4313019  5443502  6351262  6529701  7608792 10655730
Data summary
Name power
Number of rows 192
Number of columns 1
_______________________
Column type frequency:
ts 1
________________________
Group variables None

Variable type: ts

skim_variable  n_missing  complete_rate  start  end   frequency  deltat  mean     sd       min      max       median
x              0          1              1998   2013  12         0.08    6529701  1369032  4313019  10655730  6351262
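
As a rough illustration of the cleaning step, the sketch below builds a synthetic monthly series (not the report's data), injects one missing value and one very low reading, and runs tsclean from the forecast package. The object names power and power_clean, the seed, and the simulated values are all assumptions made for the example.

```r
library(forecast)  # provides tsclean()

# Synthetic stand-in for the power series: 192 monthly values from Jan 1998.
# These numbers are simulated and are NOT the data used in the report.
set.seed(42)
t <- 1:192
values <- 6.5e6 + 1.0e6 * sin(2 * pi * t / 12) + rnorm(192, sd = 2e5)
power <- ts(values, start = c(1998, 1), frequency = 12)

# Inject the two defects seen in the raw summary: one NA and one very low value.
power[50] <- NA
power[100] <- 770523

summary(power)        # reports one NA and the low minimum

# tsclean() interpolates missing values and replaces detected outliers.
power_clean <- tsclean(power)

summary(power_clean)  # no NA's, and the minimum is no longer the injected outlier
```

Comparing the two summaries side by side is a quick way to confirm that only the defective observations changed.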

EDA

Exploratory Data Analysis

Next we perform some basic EDA to better understand the series and to inform our data modeling. Here are some observations:

  • The ggtsdisplay plots show seasonality
  • Most of the ACF spikes fall outside the significance limits, so the series is not white noise
  • The PACF shows a few significant lags, but most spikes fall inside the blue significance bounds
  • The series is non-stationary
  • The seasonplot supports the premise that the data is seasonal
  • The BoxCox function indicates the optimal lambda for the series is -0.1442. Plots of the series before and after the Box-Cox transform are set forth below
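
The ACF observation in the list above can be checked numerically as well as visually. The sketch below uses only base R on a synthetic seasonal series (an assumption, not the report's data): it counts how many autocorrelations fall outside the approximate 95% white-noise bounds and runs a Ljung-Box test. The choice of 24 lags is also an assumption.

```r
# Synthetic monthly series with a strong 12-month cycle (stand-in for the data).
set.seed(1)
t <- 1:192
y <- ts(6.5e6 + 1.0e6 * sin(2 * pi * t / 12) + rnorm(192, sd = 2e5),
        start = c(1998, 1), frequency = 12)

# Sample autocorrelations for lags 1..24 (the first element is lag 0, dropped).
acf_vals <- as.numeric(acf(y, lag.max = 24, plot = FALSE)$acf)[-1]

# Approximate 95% bounds for white noise: +/- 1.96 / sqrt(n).
bound <- 1.96 / sqrt(length(y))
n_outside <- sum(abs(acf_vals) > bound)
cat("Lags outside the bounds:", n_outside, "of", length(acf_vals), "\n")

# Ljung-Box test: a small p-value rejects the white-noise hypothesis.
lb <- Box.test(y, lag = 24, type = "Ljung-Box")
print(lb$p.value)
```

For a white-noise series, only about 5% of the lags would be expected to fall outside the bounds.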

The transformed power series is set forth below with the optimal lambda of -0.1442665.
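
To make the transform concrete, here is a minimal base-R sketch of the Box-Cox formula and its inverse for positive data, applied to the minimum, median and maximum of the cleaned series. The helper names box_cox and inv_box_cox are hypothetical and introduced only for illustration; the analysis itself relies on the forecast package.

```r
# Box-Cox transform: log(y) when lambda is 0, otherwise (y^lambda - 1) / lambda.
# box_cox and inv_box_cox are illustrative helpers, not functions from the report.
box_cox <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Inverse transform, used to bring values back to the original scale.
inv_box_cox <- function(w, lambda) {
  if (abs(lambda) < 1e-8) exp(w) else (lambda * w + 1)^(1 / lambda)
}

lambda <- -0.1442665                 # optimal lambda reported above
y <- c(4313019, 6351262, 10655730)   # min, median, max of the cleaned series

w <- box_cox(y, lambda)
print(w)                             # transformed values
print(inv_box_cox(w, lambda))        # recovers the original values
```

With this lambda the transformed values land between roughly 6.1 and 6.3, which is why the fitted models report levels and variances on such a small scale.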

Modeling

Power Series Modeling

Similar to the ATM analysis, we use several techniques (STL, ETS and ARIMA) to model the power series.

  • STL Decomp - the STL decomposition was consistent with our EDA findings
  • The selected exponential smoothing model was ETS(A,Ad,A) with a Box-Cox lambda of -0.144
  • Auto ARIMA was used and yielded an ARIMA(0,0,1)(2,1,0)[12] model with drift and a Box-Cox lambda of -0.144

Modeling output is set forth below:

Exponential Smoothing

## ETS(A,Ad,A) 
## 
## Call:
##  ets(y = ., lambda = lambda, biasadj = TRUE) 
## 
##   Box-Cox transformation: lambda= -0.1443 
## 
##   Smoothing parameters:
##     alpha = 0.118 
##     beta  = 0.0001 
##     gamma = 0.0001 
##     phi   = 0.979 
## 
##   Initial states:
##     l = 6.1998 
##     b = 0.0001 
##     s = -0.006 -0.0285 -0.0132 0.019 0.0263 0.0212
##            0.0014 -0.0255 -0.0192 -0.0077 0.0081 0.024
## 
##   sigma:  0.0094
## 
##       AIC      AICc       BIC 
## -765.9795 -762.0258 -707.3446

ARIMA

## Series: . 
## ARIMA(0,0,1)(2,1,0)[12] with drift 
## Box Cox transformation: lambda= -0.1442665 
## 
## Coefficients:
##          ma1     sar1     sar2   drift
##       0.2563  -0.7036  -0.3817  0.0001
## s.e.  0.0809   0.0734   0.0748  0.0001
## 
## sigma^2 estimated as 0.00008869:  log likelihood=585.32
## AIC=-1160.65   AICc=-1160.3   BIC=-1144.68
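
For readers who want to reproduce this kind of output, the three fits might be set up roughly as follows with the forecast package. This is a sketch on a synthetic series, not the report's code: only the ets call mirrors the one printed in the output above, while the series y, the 12-month horizon h, and the stlf and auto.arima arguments are assumptions.

```r
library(forecast)

# Synthetic monthly series standing in for the cleaned power data (simulated).
set.seed(123)
t <- 1:192
y <- ts(6.5e6 + 5e3 * t + 1.0e6 * sin(2 * pi * t / 12) + rnorm(192, sd = 2e5),
        start = c(1998, 1), frequency = 12)

lambda <- -0.1442665  # value reported in the EDA section
h <- 12               # forecast horizon in months (assumed)

# STL decomposition with a forecast of the seasonally adjusted series.
fc_stl <- stlf(y, h = h, lambda = lambda)

# Exponential smoothing, matching the call shown in the ETS output.
fit_ets <- ets(y, lambda = lambda, biasadj = TRUE)
fc_ets <- forecast(fit_ets, h = h)

# Automatic seasonal ARIMA selection on the Box-Cox transformed scale.
fit_arima <- auto.arima(y, lambda = lambda)
fc_arima <- forecast(fit_arima, h = h)

summary(fit_ets)
summary(fit_arima)
```

Because the series here is simulated, the selected model orders and coefficients will differ from those printed above.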

Model Selection

Model Comparison and Selection


We have plotted the forecasts from our three modeling techniques below. Additionally, we calculated the RMSE for each modeling approach to determine the best model:

  • STL RMSE: 914,243
  • ETS RMSE: 967,678
  • ARIMA RMSE: 837,117

Once again the ARIMA model produced the best fit, with the lowest RMSE of the three. The table output is shown below. The ARIMA point forecast was also written to a csv file.

x
STL 914242.9
ETS 967678.0
ARIMA 837116.8
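
The comparison above boils down to computing a root mean squared error per model and picking the smallest. A minimal base-R sketch follows. The rmse helper and the small worked example use made-up numbers, and the report does not state which observations the errors were computed on, so only the three table values are taken from the source.

```r
# Root mean squared error between observed values and predictions.
# rmse is an illustrative helper, not a function from the report.
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2, na.rm = TRUE))
}

# Tiny worked example with made-up numbers.
actual    <- c(6.1e6, 6.4e6, 7.0e6)
predicted <- c(6.0e6, 6.6e6, 6.9e6)
print(rmse(actual, predicted))

# RMSE values reported in the table above; the smallest one wins.
rmse_by_model <- c(STL = 914242.9, ETS = 967678.0, ARIMA = 837116.8)
best_model <- names(which.min(rmse_by_model))
print(best_model)   # "ARIMA"
```

RMSE is expressed in the same units as the series, so the gap between ARIMA and ETS is about 130,000 units of power.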

Power forecast Output