Data 624 Midterm Project 1b

Power Forecast

Data

DATA & Data Review

We load the data and then utilize the summary and skim functions to assess the quality of the data. Here are some observations:

192 Rows of data
One NA value
Mean value of approximately 6.502 Million

We will use the tsclean function to clean the data and address the NA value. we show the same summary and skim function output for the cleaned series

power <- readxl::read_excel('ResidentialCustomerForecastLoad-624.xlsx')
power <- power$KWH %>% ts(start=c(1998, 1), frequency = 12)

summary(power)

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
##   770523  5429912  6283324  6502475  7620524 10655730        1

skim_without_charts(power)

Data summary

Name	power
Number of rows	192
Number of columns	1
_______________________
Column type frequency:
ts	1
________________________
Group variables	None

Variable type: ts

skim_variable	n_missing	complete_rate	start	end	frequency	deltat	mean	sd	min	max	median
x	1	0.99	1998	2013	12	0.08	6502475	1447571	770523	10655730	6283324

power = tsclean(power)

summary(power)

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##  4313019  5443502  6351262  6529701  7608792 10655730

skim_without_charts(power)

Data summary

Name	power
Number of rows	192
Number of columns	1
_______________________
Column type frequency:
ts	1
________________________
Group variables	None

Variable type: ts

skim_variable	n_missing	complete_rate	start	end	frequency	deltat	mean	sd	min	max	median
x	0	1	1998	2013	12	0.08	6529701	1369032	4313019	10655730	6351262

EDA

Exploratory Data Analysis/h3>

Next we perform some basic EDA to better understand the series and to inform our data modeling. Here are some observations:

The ggtsdisplay plots show seansonality

Most of the spikes of the ACF fall outside the limits - this is not white

The PACF indicates some lags and most of the data is inside the blue lines.

The series is non stationary

The seasonplot support the premise that the data is seasonal

The BoxCox function indicates the optimal lambda for the series is -0.1442. Plot of the series before and after the Box Cox Transform are set forth below

autoplot(power) +  labs(title = "Power Usage", subtitle = 'In KWH') + theme_fivethirtyeight()

options(scipen=999)
ggtsdisplay(power) + labs(title = "Power Usage", subtitle = 'In KWH') + theme_fivethirtyeight()

options(scipen=999)
ggsubseriesplot(power) + theme_fivethirtyeight() + 
  labs(title = "Power Useage", subtitle = "Seasonal Plot", y = "Useage (kWh)", x = "Mth")

lambda = BoxCox.lambda(power)

p1 <- autoplot(power) + 
   theme_fivethirtyeight() + labs(title = "Power Time Series")



p2 <- autoplot(BoxCox(power, lambda)) +
    theme_fivethirtyeight() + labs(title = "Box Cox- Power Time Series")
p1

The transformed power series is set forth below iwth optimal lambda of -0.1442665.

p2

Modeling

Power Series Modeling/h3>

Similar to the ATM analysis, we will utilize several different techniques (STL, ETS and Arima) to model the the power series.

STL Decomp - the STL decomposition was consistent with our EDA findings,

The selected exponential smoothing model was ETS(A, Ad, A) with a box cox transform value of -.144

Auto Arima was utilized and yield a ARIMA(0,0,1)(2,1,0)[12] with drift and Box Cox lamda of -0.144

Modeling output is set forth below:

STL

power %>% 
  stl(s.window = 'periodic', robust = TRUE) %>%
  autoplot()  +
  labs(title = "STL Decomposition", x = "Yr")

Exponential Smoothing

power %>% 
  ets(lambda = lambda, biasadj = TRUE)

## ETS(A,Ad,A) 
## 
## Call:
##  ets(y = ., lambda = lambda, biasadj = TRUE) 
## 
##   Box-Cox transformation: lambda= -0.1443 
## 
##   Smoothing parameters:
##     alpha = 0.118 
##     beta  = 0.0001 
##     gamma = 0.0001 
##     phi   = 0.979 
## 
##   Initial states:
##     l = 6.1998 
##     b = 0.0001 
##     s = -0.006 -0.0285 -0.0132 0.019 0.0263 0.0212
##            0.0014 -0.0255 -0.0192 -0.0077 0.0081 0.024
## 
##   sigma:  0.0094
## 
##       AIC      AICc       BIC 
## -765.9795 -762.0258 -707.3446

ARIMA

power %>%
  auto.arima(lambda = lambda, biasadj = TRUE)

## Series: . 
## ARIMA(0,0,1)(2,1,0)[12] with drift 
## Box Cox transformation: lambda= -0.1442665 
## 
## Coefficients:
##          ma1     sar1     sar2   drift
##       0.2563  -0.7036  -0.3817  0.0001
## s.e.  0.0809   0.0734   0.0748  0.0001
## 
## sigma^2 estimated as 0.00008869:  log likelihood=585.32
## AIC=-1160.65   AICc=-1160.3   BIC=-1144.68

Model Selection

Model Comparison and Selection

We have plotted the forecast from our three modeling techniques below. Additionally we have calculated RMSEs for each modeling approach to determine the best model:

STL RMSE: 914,243
ETS RMSE: 967,678
ARIMA RMSE: 837,117

Once again the Arima model has produced the best fit. You can see the table output below. The Arima point forecast was also written to a csv file.

	x
STL	914242.9
ETS	967678.0
ARIMA	837116.8

Power forecast Output

DATE <-  seq(as.Date('2014-01-01'), as.Date('2014-12-01'), by = 'month') %>% format('%Y-%b')
df <- data.frame(DATE, arima_fcst) %>% 
  select(DATE, Point.Forecast)
write_excel_csv(df, path="C:/Users/mutue/OneDrive/Documents/Data 624/data624_PWRFCST_JMUNDY.csv")