Part B consists of a simple data set of residential power usage from January 1998 through December 2013. Your assignment is to model these data and produce a monthly forecast for 2014. The data are given in a single file. The variable ‘KWH’ is power consumption in kilowatt hours; the rest is straightforward. Add this to your existing files above.
Load data
# fread() and fwrite() come from the data.table package
df.power = fread("ResidentialCustomerForecastLoad-624.txt", header = TRUE, sep = '\t', stringsAsFactors = FALSE)
View dataframe preview
head(df.power)
## CaseSequence YYYY-MMM KWH
## 1: 733 1998-Jan 6862583
## 2: 734 1998-Feb 5838198
## 3: 735 1998-Mar 5420658
## 4: 736 1998-Apr 5010364
## 5: 737 1998-May 4665377
## 6: 738 1998-Jun 6467147
summary(df.power)
## CaseSequence YYYY-MMM KWH
## Min. :733.0 Length:192 Min. : 770523
## 1st Qu.:780.8 Class :character 1st Qu.: 5429912
## Median :828.5 Mode :character Median : 6283324
## Mean :828.5 Mean : 6502475
## 3rd Qu.:876.2 3rd Qu.: 7620524
## Max. :924.0 Max. :10655730
## NA's :1
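Note that the summary reports a single missing KWH value. A quick way to locate it (a sketch using data.table syntax; one option would be to impute it later with forecast::na.interp, though we proceed without imputing):
df.power[is.na(KWH)]   # returns the row with the missing KWH observation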
Let’s keep only the KWH column (dropping the year-month string) and convert it into a time series with monthly frequency
power.ts = ts(df.power$KWH, start = c(1998,1), end=c(2013,12), frequency = 12)
Let’s plot the time series
autoplot(power.ts) +
xlab('Month') +
ylab('KWH') +
ggtitle('Monthly power consumption in KWH')

From the above time series we can see an outlier in 2010. The series exhibits a seasonal pattern, with high power consumption in winter and summer and low consumption in the remaining months. There is no upward or downward trend, indicating that the demand for power is relatively stable over the period. The peaks grow taller after 2010, indicating higher power usage during the winter and summer seasons from 2010 onwards
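To pin down the 2010 outlier programmatically, forecast’s tsoutliers() can be used (a sketch; it flags unusual observations and suggests replacement values):
out = tsoutliers(power.ts)
time(power.ts)[out$index]   # dates of the flagged observations
power.ts[out$index]         # the flagged values themselves
out$replacements            # suggested replacement values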
Let’s use STL decomposition to confirm our findings
res = stl(power.ts, s.window = 12)
plot(res)

The STL plot above shows a strong seasonal pattern. There is no underlying trend, although there is a surge in power consumption after 2010. The unusually low power consumption in 2010 is apparent in the remainder panel
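A seasonal subseries plot gives another view of the monthly pattern (a sketch; it plots each month’s values side by side with their monthly means):
ggsubseriesplot(power.ts) +
ylab('KWH') +
ggtitle('Seasonal subseries plot of monthly power consumption')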
The series above is also non-stationary. Let’s see what order of differencing is required to make it stationary
print(paste('Order of seasonal difference required = ', nsdiffs(power.ts)))
## [1] "Order of seasonal difference required = 1"
print(paste('Order of difference required = ', ndiffs(power.ts)))
## [1] "Order of difference required = 1"
We can see that a first-order seasonal difference followed by a first-order regular difference is needed to make this time series stationary
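As a quick sanity check (a sketch), we can apply both differences and confirm that no further differencing is suggested:
power.diff = diff(diff(power.ts, lag = 12))   # seasonal difference, then regular difference
autoplot(power.diff) + ggtitle('Seasonally and first-differenced series')
ndiffs(power.diff)   # expected to return 0 for the differenced series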
Let’s try the following forecasting models on this time series and assess which one is better in terms of RMSE. We will use the time series cross-validation function tsCV() to calculate the RMSE for each model:
- Seasonal Naive Model
- Random walk model
- ETS model
- ARIMA model
# Compute cross-validated one-step-ahead forecast errors with tsCV() and report the RMSE
calcRMSE = function(modelName, modelFunc) {
  res = tsCV(power.ts, modelFunc)
  print(paste('RMSE for model', modelName, '=', sqrt(mean(res^2, na.rm = TRUE))))
}
Fit seasonal naive model and calc RMSE
calcRMSE('Seasonal Naive', snaive)
## [1] "RMSE for model Seasonal Naive = 1210609.88804598"
Fit Random Walk model and calc RMSE
calcRMSE('Random Walk', rwf)
## [1] "RMSE for model Random Walk = 1539633.85951456"
Fit Seasonal ARIMA model and calc RMSE
func <- function(x, h){forecast(auto.arima(x, seasonal = TRUE), h = h)}
calcRMSE('ARIMA', func)
## [1] "RMSE for model ARIMA = 1115003.87958045"
Fit ETS model and calc RMSE
func <- function(x, h){forecast(ets(x, model="ZZZ"), h=h)}
calcRMSE('ETS', func)
## [1] "RMSE for model ETS = 1037684.72933356"
We can see that the ETS model gives us the best (lowest) RMSE. Let’s analyze the model details
fit = ets(power.ts, model = 'ZZZ')
summary(fit)
## ETS(M,N,M)
##
## Call:
## ets(y = power.ts, model = "ZZZ")
##
## Smoothing parameters:
## alpha = 0.1167
## gamma = 1e-04
##
## Initial states:
## l = 6133758.2586
## s = 0.9568 0.7524 0.8702 1.1849 1.2617 1.1857
## 1.0001 0.7743 0.8154 0.9116 1.0651 1.222
##
## sigma: 0.1201
##
## AIC AICc BIC
## 6224.787 6227.514 6273.649
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 61969.44 833321.3 505622.3 -4.356712 12.01361 0.722433
## ACF1
## Training set 0.1757877
We can see that the automatic selection in ets() has chosen an ETS(M,N,M) model, indicating multiplicative errors, no trend, and multiplicative seasonality. This makes sense given the time series plot and the STL decomposition results
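For reference, the ETS(M,N,M) recursions in their standard form (as in Hyndman and Athanasopoulos, with level l, seasonal component s, and period m = 12) are:
y[t] = l[t-1] * s[t-m] * (1 + e[t])
l[t] = l[t-1] * (1 + alpha * e[t])
s[t] = s[t-m] * (1 + gamma * e[t])
The fitted alpha = 0.1167 and gamma = 1e-04 above therefore imply a slowly adapting level and an essentially fixed seasonal pattern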
Let’s analyze residuals
checkresiduals(fit)

##
## Ljung-Box test
##
## data: Residuals from ETS(M,N,M)
## Q* = 30.705, df = 10, p-value = 0.0006562
##
## Model df: 14. Total lags used: 24
The residual plots are not satisfactory. The small p-value of the Ljung-Box test shows that the residuals are correlated, and the ACF plot confirms the same. The histogram suggests that the residuals are not normally distributed, showing a heavy left skew. The residual plot also indicates a huge outlier in 2010, in line with our earlier finding in the time series
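Let’s apply a Box-Cox transformation and refit the ETS model, again comparing the cross-validated RMSE. A minimal sketch of the refit, reusing the calcRMSE helper above (the model label string is only illustrative):
lambda = BoxCox.lambda(power.ts)
# Cross-validated RMSE for ETS fitted with the Box-Cox transformation
func <- function(x, h){forecast(ets(x, model = "ZZZ", lambda = lambda), h = h)}
calcRMSE('ETS with BoxCox', func)
# Refit on the full series with the transformation
fit = ets(power.ts, model = 'ZZZ', lambda = lambda)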
We can see that the cross-validated RMSE is greatly improved by the Box-Cox transformation. We can also see that the transformation changes the selected model to ETS(A,N,A), where the error and seasonality are additive and there is no trend
Let’s visualize residuals
checkresiduals(fit)

##
## Ljung-Box test
##
## data: Residuals from ETS(A,N,A)
## Q* = 13.696, df = 10, p-value = 0.1873
##
## Model df: 14. Total lags used: 24
The residual plots are satisfactory. The large p-value of the Ljung-Box test shows that the residuals are not correlated, and the ACF plot confirms the same. The histogram suggests the residuals are not perfectly normally distributed, but the departure appears acceptable
Let’s use the ETS(A,N,A) model to generate the forecast for 2014
fcst = forecast(fit, h=12, lambda = BoxCox.lambda(power.ts))
## Warning in forecast.ets(fit, h = 12, lambda = BoxCox.lambda(power.ts)):
## biasadj information not found, defaulting to FALSE.
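The warning concerns how point forecasts are back-transformed. If mean (rather than median) point forecasts are preferred, biasadj can be set explicitly; a sketch (fcst.adj is an illustrative name, and its point forecasts would differ slightly from those printed below):
fcst.adj = forecast(fit, h = 12, lambda = BoxCox.lambda(power.ts), biasadj = TRUE)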
print(fcst)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2014 9072736 7183518 11395443 6333493 12829171
## Feb 2014 7731631 6095283 9751051 5361280 11000973
## Mar 2014 6662622 5230951 8435807 4590628 9536172
## Apr 2014 6005531 4700833 7625719 4118547 8633056
## May 2014 5671328 4431064 7213978 3878258 8174218
## Jun 2014 7358020 5785457 9303198 5081382 10509173
## Jul 2014 7751644 6100889 9791808 5361297 11055897
## Aug 2014 9309851 7358433 11712684 6481503 13197502
## Sep 2014 8558822 6748361 10792837 5936177 12175460
## Oct 2014 6416036 5020103 8150139 4397280 9228578
## Nov 2014 5589925 4356484 7127459 3807699 8086030
## Dec 2014 7040548 5518105 8929009 4838018 10102189
fwrite(data.frame(fcst), "PowerForecast.csv", sep=",", col.names = TRUE, row.names = FALSE)
Let’s plot the forecast
autoplot(fcst, PI=TRUE) +
ggtitle("Forecast - Power consumption in KWH")

The autoplot above shows the point forecast along with the 80% and 95% prediction intervals for 2014 energy consumption in KWH, using the ETS(A,N,A) model with a Box-Cox transformation
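To view the forecast in the context of the full history, the forecast can be layered over the original series (a sketch):
autoplot(power.ts) +
autolayer(fcst, series = '2014 forecast') +
xlab('Month') +
ylab('KWH') +
ggtitle('Power consumption with 2014 forecast')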