DATA 661 - Independent Study: Electric Demand Analysis

Introduction

In 2007, New York City launched PlaNYC, a plan to create a “greener, greater New York” through the implementation of initiatives in key areas of interest, including energy and climate change. PlaNYC established a goal of reducing citywide greenhouse gas (GHG) emissions by 30 percent below 2005 levels by 2030 (30x30). This called for fundamental change in the way existing buildings are managed and operated. In 2014, NYC Mayor Bill de Blasio released an updated environmental plan for NYC: “One City: Built to Last,” which included an increase in the GHG emissions target to an 80% reduction from 2005 GHG levels by 2050.

NYC GHG Emissions

In 2014, NYC buildings were responsible for 68% of citywide GHG emissions through the use of electricity, natural gas, heating oil, steam and biofuel.

NYC GHG Emissions by Sector (2014)

Commercial and institutional buildings comprised about 31% of total emissions in 2014, while residential buildings comprised about 28%.

NYC GHG Emissions from Large Buildings by End Use (2014)

In commercial buildings, electricity represents a large portion of energy consumed, for end uses such as space cooling, ventilation, lighting and plug loads.

Electricity Demand

What is electricity demand?

Electricity demand refers to the maximum amount of electric energy that is being consumed at one time. Essentially, demand (measured in kilowatts, or kW) measures the speed at which a customer uses energy (like the speedometer in a car), while consumption (measured in kilowatt hours, or kWh) measures how much electricity is used (like the odometer in a car).

Why forecast electricity demand?

Demand charges account for a significant portion of total electricity costs in commercial buildings. In fact, demand charges are increasing in the U.S., even though energy prices are decreasing. One reason for this is the aging power grid infrastructure, which increasingly requires maintenance and updates – costs that are passed on to customers.

The ability to accurately forecast electricity demand can help customers and utilities make sustainable electricity planning decisions to help ensure sufficent electricity supply. Utilities enroll customers in demand response programs to incentivize them to curtail their consumption so that demand on the electric grid can be lowered when demand is higher than supply.

Electricity consumption has been growing at an increasing rate due to the effects of both environmental and human behavior. As a result, electricity demand patterns have become more complex and harder to recognize. Researchers have tried numerous forecasting methods, yet no single one has been found to be generalized enough to account for all cases.

Current Work

This study has pursued a number of time series analysis techniques with the goal of finding one that works well for demand forecasting in typical NYC office buildings. The most promising method was implemented via Facebook’s Prophet package (R version), which performs what they refer to as “forecasting at scale.”

According to Facebook, Prophet is based on an additive model where non-linear trends are fit with yearly and weekly seasonality, plus holidays. The model has four main components: 1) a piecewise linear or logistic growth curve trend. (Prophet automatically detects changes in trends by selecting change-points from the data); 2) A yearly seasonal component modeled using Fourier series; 3) a weekly seasonal component using dummy variables; and 4) a user-provided list of important holidays.

Following is a synopsis the analysis undertaken for this study.

Methodology

Data

The analysis was performed on two sets of electricity demand data from “typical” NYC office buildings. The data was generated at a daily interval, and covered the six-year period from 2011-2016. The data was anonymized to allay privacy concerns.

Time Series Analysis I: Prophet

The first procedure involved time series analysis and forecasting with Facebook’s Prophet package (R version). First, the data was read in and converted to the proper formats. Five years of data (2011-2015) was used as the training set and the actual 2016 data was used to validate the data that was forecasted for 2016.

## # A tibble: 6 x 2
##         ds      y
##      <chr>  <dbl>
## 1 1/1/2011 496.35
## 2 1/2/2011 484.13
## 3 1/3/2011 575.84
## 4 1/4/2011 581.45
## 5 1/5/2011 580.01
## 6 1/6/2011 581.54

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   59.43  444.10  499.80  547.70  603.20 1141.00      33

The 2011-2015 data was visualized using Highcharter.

The Prophet package was used to create a model for 2011-2015, and then to forecast out one year (2016). The Top chart shows the five actual years and one forecasted year; the second chart shows the electric demand trend without seasonality, with the forecasted year and confidence band at the end. The third chart shows the weekly seasonality, where you can see no demand on Sunday, a ramp-up on Monday with pretty consistent demand mid-week, and then a ramp-down to minimal demand on Saturday. The fourth chart shows yearly seasonality, with peak demand over the summer cooling months.

## Initial log joint probability = -22.4069
## Optimization terminated normally: 
##   Convergence detected: relative gradient magnitude is below tolerance

##              ds     yhat yhat_lower yhat_upper
## 2188 2016-12-27 467.2487   346.9658   583.2708
## 2189 2016-12-28 466.4273   348.7210   584.6858
## 2190 2016-12-29 464.0187   343.8716   578.1373
## 2191 2016-12-30 452.0771   343.1059   564.8202
## 2192 2016-12-31 379.9421   263.2719   498.6533
## 2193 2017-01-01 358.0418   235.2251   477.0080

Holiday dates were compiled for each of the five years Add holidays and then the forecast was rerun. Note the third chart down, which is now for the holidays trend, which shows the negative (from baseline) demand for the defined days. There are some small shifts in the trend and yearly charts, though nothing significant. The Facebook team uses the holidays option for forced change-points in the data that would seemingly have a significant effect on the trends – like the superbowl, if you were to analyze a time series of a quarterback’s passes throughout a season.

## Initial log joint probability = -22.4069
## Optimization terminated normally: 
##   Convergence detected: relative gradient magnitude is below tolerance

##              ds
## 2188 2016-12-27
## 2189 2016-12-28
## 2190 2016-12-29
## 2191 2016-12-30
## 2192 2016-12-31
## 2193 2017-01-01

## Initial log joint probability = -22.4069
## Optimization terminated normally: 
##   Convergence detected: relative gradient magnitude is below tolerance

##            ds     july4   tgiving      xmas  newyears
## 11 2013-11-28   0.00000 -79.26345   0.00000   0.00000
## 12 2013-12-25   0.00000   0.00000 -45.99918   0.00000
## 13 2014-01-01   0.00000   0.00000   0.00000 -32.97763
## 14 2014-07-04 -86.58054   0.00000   0.00000   0.00000
## 15 2014-11-27   0.00000 -79.26345   0.00000   0.00000
## 16 2014-12-25   0.00000   0.00000 -45.99918   0.00000
## 17 2015-01-01   0.00000   0.00000   0.00000 -32.97763
## 18 2015-07-04 -86.58054   0.00000   0.00000   0.00000
## 19 2015-11-26   0.00000 -79.26345   0.00000   0.00000
## 20 2015-12-25   0.00000   0.00000 -45.99918   0.00000

The forecasted data for 2016 was extracted and stored for comparative analysis.

##              ds     yhat
## 1827 2016-01-01 460.0157
## 1828 2016-01-02 387.7859
## 1829 2016-01-03 365.6486
## 1830 2016-01-04 449.4169
## 1831 2016-01-05 471.9148
## 1832 2016-01-06 471.4844

Time Series Analysis II: stR and Forecast

More detailed time series analysis and forecasting was performed on the same dataset, using the stR and Forecast packages from statistican Rob Hyndman, who is well known for his work in forecasting.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   59.43  444.10  499.80  547.70  603.20 1141.00      33

The time series was analyzed for missing data and data imputation was performed.

## [1] "Length of time series:"
## [1] 1826
## [1] "-------------------------"
## [1] "Number of Missing Values:"
## [1] 33
## [1] "-------------------------"
## [1] "Percentage of Missing Values:"
## [1] "1.81%"
## [1] "-------------------------"
## [1] "Stats for Bins"
## [1] "  Bin 1 (457 values from 1 to 457) :      0 NAs (0%)"
## [1] "  Bin 2 (457 values from 458 to 914) :      1 NAs (0.219%)"
## [1] "  Bin 3 (457 values from 915 to 1371) :      32 NAs (7%)"
## [1] "  Bin 4 (455 values from 1372 to 1826) :      0 NAs (0%)"
## [1] "-------------------------"
## [1] "Longest NA gap (series of consecutive NAs)"
## [1] "32 in a row"
## [1] "-------------------------"
## [1] "Most frequent gap size (series of consecutive NA series)"
## [1] "32 NA in a row (occuring 1 times)"
## [1] "-------------------------"
## [1] "Gap size accounting for most NAs"
## [1] "32 NA in a row (occuring 1 times, making up for overall 32 NAs)"
## [1] "-------------------------"
## [1] "Overview NA series"
## [1] "  1 NA in a row: 1 times"
## [1] "  32 NA in a row: 1 times"

The data was convert to multi-seasonal time series format, with weekly and yearly seasonal periods defined.

Automatic Robust Seasonal-Trend decomposition (STR) was used to model the time series (Robust STR handle outliers better); data was plotted with 95% confidence level bands.

Model components were extracted and the beta coefficients of the forecast were plotted. If beta coefficients are too “wiggly,” it can mean that the forecast is suboptimal. The basic trend in the first chart is clearly defined and shows a decrease; the bands in the weekly and yearly seasonality charts are basically smooth.

## Time Series:
## Start = 2011.01923076923 
## End = 2011.03292939937 
## Frequency = 365 
##            Data    Trend Seasonality 7 Seasonality 365.4   Random
## 2011.019 496.35 590.4620    -44.614062         -94.20901 44.71106
## 2011.022 484.13 590.4080    -64.774984         -89.39227 47.88930
## 2011.025 575.84 590.3539      7.451112         -85.12858 63.16357
## 2011.027 581.45 590.2998     26.987860         -81.88221 46.04451
## 2011.030 580.01 590.2458     27.323691         -78.63584 41.07637
## 2011.033 581.54 590.1917     27.938018         -78.28285 41.69311

Forecasting

Multiple forecasting methods were implemented in an effort to find the one most appropriate for the dataset under analysis; results can be seen below for forecasting using: 1) Mean; 2) Naive; 3) Random Walk; and 4) Random Walk with Drift. As you can see, the forecasts from these methods are nearly flat, so it can be seen that they do not properly capture the seasonality and trend of the data.

Further attempts at forecasting proved to be more successful. First, a Seasonal Naive method was used, which employs a seasonal ARIMA(0,0,0)(0,1,0)m model (where m is the seasonal period).

Next was the Theta method, which was introduced by Assimakopoulos and Nikolopoulos (2000).

The last method tested was based on STL (Seasonal and Trend decomposition using Loess) decomposition, which takes a time series object, applies an STL decomposition, models the seasonally adjusted data, then re-seasonalizes using the last year of the seasonal component. The forecast is then based on the seasonally-adjusted object.

Results

Forecasted data for 2016 was extracted from each of the successful models, and were combined with actual 2016 data.

##   Actual  Prophet Seasonal Naive    Theta      STL
## 1 403.56 460.0157         436.81 405.8030 407.3958
## 2 384.08 387.7859         476.85 449.2164 461.9320
## 3 368.93 365.6486         435.23 430.6490 448.4301
## 4 447.52 449.4169         381.43 426.4648 444.6142
## 5 475.36 471.9148         481.01 425.6598 445.1264
## 6 466.36 471.4844         427.95 418.5450 434.9454

A correlation matrix was generated to examine correlations between actual and forecasted datasets.

Actual and forecasted datasets were plotted.

Statistics were generated to compare the accuracy of the forecasted datasets: 1) ME: Mean Error; 2) RMSE: Root Mean Squared Error; 3) MAE: Mean Absolute Error; 4) MPE: Mean Percentage Error; and 5) MAPE: Mean Absolute Percentage Error.

##                ME      RMSE      MAE        MPE     MAPE
## Prophet -27.17934  98.97287 71.57806  -8.996124 14.15014
## SNaive  -29.04411 125.87001 89.84803  -9.993189 18.37573
## Theta   -14.80093 115.75756 79.49581  -7.876948 16.19813
## STL     -29.85497 111.98505 79.17248 -10.840212 16.69730

Although it does not rank best for every metric generated, the data forecasted using the prophet package does rank best for most – including the RMSE and MAE. The MAE is often used for comparison, as it is easy to comprehend and to calculate.

Replication

Prophet was used to analyze and forecast for a second dataset, using the exact procedure described earlier.

Read in electric demand data (2011-15).

## # A tibble: 6 x 2
##         ds      y
##      <chr>  <dbl>
## 1 1/1/2011 146.48
## 2 1/2/2011 147.06
## 3 1/3/2011 246.70
## 4 1/4/2011 236.64
## 5 1/5/2011 235.14
## 6 1/6/2011 246.36

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   94.92  159.60  222.40  222.70  266.00  438.50

Chart time series data.

Use Prophet package to create forecast (with holidays).

## Initial log joint probability = -34.7391
## Optimization terminated normally: 
##   Convergence detected: relative gradient magnitude is below tolerance

##              ds
## 2188 2016-12-27
## 2189 2016-12-28
## 2190 2016-12-29
## 2191 2016-12-30
## 2192 2016-12-31
## 2193 2017-01-01

## Initial log joint probability = -34.7391
## Optimization terminated normally: 
##   Convergence detected: relative gradient magnitude is below tolerance

##            ds     july4   tgiving     xmas  newyears
## 11 2013-11-28   0.00000 -48.42203   0.0000   0.00000
## 12 2013-12-25   0.00000   0.00000 -31.6313   0.00000
## 13 2014-01-01   0.00000   0.00000   0.0000 -17.00493
## 14 2014-07-04 -35.93053   0.00000   0.0000   0.00000
## 15 2014-11-27   0.00000 -48.42203   0.0000   0.00000
## 16 2014-12-25   0.00000   0.00000 -31.6313   0.00000
## 17 2015-01-01   0.00000   0.00000   0.0000 -17.00493
## 18 2015-07-04 -35.93053   0.00000   0.0000   0.00000
## 19 2015-11-26   0.00000 -48.42203   0.0000   0.00000
## 20 2015-12-25   0.00000   0.00000 -31.6313   0.00000

Extract 2016 forecasted data from forecast for comparative purposes.

##              ds     yhat
## 1827 2016-01-01 226.1496
## 1828 2016-01-02 118.7880
## 1829 2016-01-03 119.5173
## 1830 2016-01-04 239.7535
## 1831 2016-01-05 239.0367
## 1832 2016-01-06 238.0068

##   actual2 forecast2
## 1  188.84  226.1496
## 2  175.30  118.7880
## 3  174.71  119.5173
## 4  258.38  239.7535
## 5  256.17  239.0367
## 6  228.59  238.0068

Combine forecasted and actual 2016 data.

##   actual162 prophet162
## 1    188.84   226.1496
## 2    175.30   118.7880
## 3    174.71   119.5173
## 4    258.38   239.7535
## 5    256.17   239.0367
## 6    228.59   238.0068

Check correlations between actual and forecasted datasets.

Plot actual and forecasted data.

Generate statistics to measure accuracy of forecasted dataset.

##                 ME     RMSE     MAE       MPE     MAPE
## Test set -13.10004 40.28194 31.6217 -9.471424 17.99446

References

City of New York, “One City Built to Last”. 2014. Retrieved from http://www.nyc.gov/html/builttolast/assets/downloads/pdf/OneCity.pdf.

Dokumentov, A and Hyndman, RJ (2017). stR: STR Decomposition. R package version 0.3. https://CRAN.R-project.org/package=stR.

“Evaluating Forecast Accuracy.” OTexts. N.p., n.d. Web. 16 May 2017.

Hyndman, RJ (2017). forecast: Forecasting functions for time series and linear models. R package version 8.0, http://github.com/robjhyndman/forecast.

Hyndman, RJ and Khandakar Y (2008). “Automatic time series forecasting: the forecast package for R.” Journal of Statistical Software, 26(3), pp. 1-22. http://www.jstatsoft.org/article/view/v027i03.

Kunst, J (2017). highcharter: A Wrapper for the ‘Highcharts’ Library. R package version 0.5.0. https://CRAN.R-project.org/package=highcharter.

Taylor, S and Letham, B (2017). prophet: Automatic Forecasting Procedure. R package version 0.1.1. https://CRAN.R-project.org/package=prophet.