This exploratory data analysis, carried out in the R statistical programming language, looks at the climate, population, and economic factors driving electricity demand (load) in the Houston area.
The study also examines whether the longest economic downturn since World War II, the Great Recession of 2007 to 2009, had any statistically significant effect on electricity usage in the Houston area.
Houston is the Energy Capital of the World and is home to nearly a third of the nation's jobs in oil and gas extraction. It is the headquarters for virtually every segment of the energy industry, including exploration and production.
The Texas power grid is designed to meet the state's electrical demand, including the large Houston-area load.
The Electric Reliability Council of Texas (ERCOT) manages the flow of electricity to more than 26 million Texans – representing about 90 percent of the state’s electric load. As the independent system operator for the region, ERCOT schedules power on an electric grid that connects more than 46,500 miles of transmission lines and over 680 power generation units.
In general, heating and cooling the structures we live in accounts for 48 percent of the energy that American households use every year [1]. Houston residents are no strangers to summer heat and humidity, and many rely on home air conditioning to stay cool. Homes in the Houston area have the highest electricity usage in Texas [2]. Based on ERCOT data, the average Houston-area home uses around 15,600 kWh a year, or about 1,300 kWh a month.
A key responsibility of ERCOT is scheduling and managing how electricity will flow through the network. If the capacity of any system component is exceeded, a fault can occur, potentially leading to a massive blackout.
The results of this analysis can provide preliminary insight and guidance to energy demand forecasters allocating power generation capacity.
It is generally understood that climate is the biggest driver of electricity usage in any region, especially the daily maximum and minimum temperatures that drive residents' cooling and heating requirements. This study is therefore expected to confirm a strong climate influence on electricity consumption. It is also intuitive that a rising population will correspondingly increase consumption. What is less clear is how much, if at all, the economic health of a region affects its electricity consumption.
The ERCOT region includes the major Texas urban load centers of Houston, Dallas, Fort Worth, San Antonio, and Austin.
ERCOT divides electricity distribution in Texas into sub-regions. The map below [3] depicts the ERCOT regions.
ERCOT Regions
As the map shows, the Houston area lies within the Coast region, which consists of Harris, Fort Bend, Brazoria, Chambers, and Galveston Counties. For the remainder of this report, the Coast region is referred to as the Houston area.
The study covers the period from 2002 to 2018, for which Houston-area data on electric load, climate, population, and economic indicators were collected.
Hourly electric load for the Coast region (Houston area) was obtained from the ERCOT load archive website:
http://www.ercot.com/gridinfo/load/load_hist
Houston Intercontinental Airport hourly weather data was extracted from the National Oceanic and Atmospheric Administration (NOAA) data portal:
https://www.ncdc.noaa.gov/cdo-web/search
Annual population data and the economic variables, namely new private housing structures (Housing), new private industries (quarterly), real gross domestic product (annual), and the monthly unemployment rate, were extracted from the Federal Reserve Bank of St. Louis's Federal Reserve Economic Data (FRED) website:
https://fred.stlouisfed.org/
Data pre-processing and cleaning operations were performed in a separate attached R program file: ‘DataPreprocessing.R’.
Hourly electrical loads for 2002-2018 were aggregated to daily average loads, and hourly temperature data were aggregated to daily maximum and minimum temperatures.
Temperatures were then converted to cooling/heating degree days. Degree days [4] measure how cold or warm a location is. A degree day compares the mean outdoor temperature for a location (the average of the daily maximum and minimum) to a baseline temperature, usually 65° Fahrenheit (F) in the United States. The further the outside temperature is above or below the baseline, the higher the number of degree days, and a high degree-day value generally means more energy used for heating or cooling.
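A minimal base-R sketch of that aggregation step follows. The hourly data frames `load_hourly` and `wx_hourly`, their column names, and the two days of values are hypothetical stand-ins, and UTC is used only to keep the example simple; the real names and time-zone handling are in DataPreprocessing.R and may differ.

```r
# Hypothetical hourly inputs: two days (48 hours) of load and temperature readings
hours <- seq(as.POSIXct("2002-01-01 00:00", tz = "UTC"), by = "hour", length.out = 48)
load_hourly <- data.frame(datetime = hours, load = rep(c(8000, 9000), each = 24))
wx_hourly   <- data.frame(datetime = hours, temp = c(30:53, 40:63))

# Calendar date of each hourly reading
load_hourly$date <- as.Date(load_hourly$datetime, tz = "UTC")
wx_hourly$date   <- as.Date(wx_hourly$datetime, tz = "UTC")

# Daily average load
daily_load <- aggregate(load ~ date, data = load_hourly, FUN = mean)
names(daily_load)[2] <- "avg_DailyLoad"

# Daily maximum and minimum temperature, joined on date
daily_tmax <- aggregate(temp ~ date, data = wx_hourly, FUN = max)
daily_tmin <- aggregate(temp ~ date, data = wx_hourly, FUN = min)
names(daily_tmax)[2] <- "tmax"
names(daily_tmin)[2] <- "tmin"
daily_wx <- merge(daily_tmax, daily_tmin, by = "date")

print(daily_load)
print(daily_wx)
```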
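The signed degree-day variable can be sketched as the daily mean temperature minus the 65 °F base, which matches the description of degDay (positive for cooling, negative for heating). The function name `degree_day` and the sample temperatures are hypothetical; the conventional one-sided cooling and heating degree days are shown alongside for comparison.

```r
# Signed degree day: mean of daily max/min temperature minus a base (65 F assumed)
degree_day <- function(tmax, tmin, base = 65) {
  (tmax + tmin) / 2 - base
}

# Hypothetical daily max/min temperatures (F): a cool day, a hot day, a mild day
tmax <- c(60, 95, 70)
tmin <- c(40, 75, 60)

dd  <- degree_day(tmax, tmin)  # signed: > 0 cooling, < 0 heating
cdd <- pmax(dd, 0)             # conventional cooling degree days
hdd <- pmax(-dd, 0)            # conventional heating degree days
print(data.frame(tmax, tmin, dd, cdd, hdd))
```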
An inner join was used to combine the electrical and climate data into a single data frame.
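In base R, `merge()` with its default `all = FALSE` performs an inner join, keeping only dates present in both inputs (dplyr's `inner_join()` is an equivalent choice). The frame names below are hypothetical; the values come from the first rows of df_all, and the weather frame deliberately omits one date to show which rows an inner join drops.

```r
# Hypothetical daily load and weather frames; weather is missing 2002-01-03
daily_load <- data.frame(
  date = as.Date(c("2002-01-01", "2002-01-02", "2002-01-03")),
  avg_DailyLoad = c(8525.602, 9424.616, 9970.283)
)
daily_wx <- data.frame(
  date = as.Date(c("2002-01-01", "2002-01-02")),
  degDay = c(-24.0, -27.5)
)

# Inner join on date: only dates found in both frames are kept
df_lc <- merge(daily_load, daily_wx, by = "date")
print(df_lc)
```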
Because the population and economic data are lower-frequency time series, they were disaggregated to daily series to match the load and climate data. The tempdisagg R package was used to convert the low-frequency (annual, quarterly, and monthly) data to high-frequency (daily) data.
The disaggregated population and economic data frames were then joined to the electrical and climate data frame to form the data frame df_all used in this analysis.
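The sketch below does not reproduce tempdisagg; it swaps in plain linear interpolation with base R's `approx()` to show the shape of the annual-to-daily step. Unlike the temporal disaggregation methods in tempdisagg, simple interpolation does not constrain the daily values to match the original annual totals or averages. The annual population figures are hypothetical.

```r
# Hypothetical annual population (thousands), each value anchored at 1 January
annual <- data.frame(date = as.Date(c("2002-01-01", "2003-01-01")),
                     pop  = c(5500, 5865))

# Every day from the first anchor to the second (366 dates)
days <- seq(as.Date("2002-01-01"), as.Date("2003-01-01"), by = "day")

# Linear interpolation between the annual anchor points
pop_daily <- approx(x = as.numeric(annual$date), y = annual$pop,
                    xout = as.numeric(days))$y

daily_pop <- data.frame(date = days, popHOUDaily = pop_daily)
head(daily_pop, 3)
```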
The key variables in this daily time series data frame are:
* avg_DailyLoad: Houston area Daily Average Load (MWh)
* degDay: Cooling/Heating Degree Days (°F); positive values indicate cooling degree days, negative values heating degree days
* popHOUDaily: Daily population in thousands; values from 2010 to 2018 are estimates
* gdpHOUDaily: Real Gross Domestic Product (Thousands of 2012 Chained Dollars) - All Industries
* GDPperPOP: All Industries Real GDP per Capita (2012 Chained Dollars)
* EstHOUDaily: Number of Private Establishments for All Industries
* housHOUDaily: New Private Housing Structures Authorized by Building Permits
* ueHOUDaily: Houston Metro Unemployment Rate
Here is a look at the first six observations in the df_all data frame.
| date | avg_DailyLoad | degDay | year | month | dayN | dayL | dayW | week | weekend | season | popHOUDaily | gdpHOUDaily | GDPperPOP | EstHOUDaily | housHOUDaily | ueHOUDaily |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2002-01-01 | 8525.602 | -24.0 | 2002 | Jan | 01 | Tue | 2 | 00 | FALSE | Winter | 5520.884 | 291328843 | 52768.51 | 107132.0 | 46592.00 | 5.900000 |
| 2002-01-02 | 9424.616 | -27.5 | 2002 | Jan | 02 | Wed | 3 | 00 | FALSE | Winter | 5521.204 | 291330305 | 52765.72 | 107135.9 | 46617.91 | 5.890323 |
| 2002-01-03 | 9970.283 | -30.0 | 2002 | Jan | 03 | Thu | 4 | 00 | FALSE | Winter | 5521.523 | 291331766 | 52762.93 | 107139.8 | 46643.82 | 5.880645 |
| 2002-01-04 | 9524.447 | -27.0 | 2002 | Jan | 04 | Fri | 5 | 00 | FALSE | Winter | 5521.843 | 291333228 | 52760.14 | 107143.7 | 46669.73 | 5.870968 |
| 2002-01-05 | 7810.253 | -14.5 | 2002 | Jan | 05 | Sat | 6 | 00 | TRUE | Winter | 5522.163 | 291334689 | 52757.35 | 107147.6 | 46695.64 | 5.861290 |
| 2002-01-06 | 7580.678 | -15.0 | 2002 | Jan | 06 | Sun | 7 | 00 | TRUE | Winter | 5522.483 | 291336151 | 52754.56 | 107151.5 | 46721.55 | 5.851613 |
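The rows above appear consistent with GDPperPOP being gdpHOUDaily divided by popHOUDaily; since both are in thousands, the ratio is in chained 2012 dollars per person. This is an inference from the displayed values rather than the preprocessing code itself, checked here with the first two rows copied from the table.

```r
# Values copied from the first two rows of df_all shown above
gdpHOUDaily <- c(291328843, 291330305)  # thousands of chained 2012 dollars
popHOUDaily <- c(5520.884, 5521.204)    # thousands of people

# The thousands cancel, leaving chained 2012 dollars per person
GDPperPOP <- gdpHOUDaily / popHOUDaily
print(round(GDPperPOP, 2))
```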
Exploratory data analysis was carried out with multiple plots to detect how the key variables relate to electricity consumption, and any emerging patterns were examined for insights.
A descriptive time series model was also built to examine the effect of the 2007 to 2009 recession on Houston-area electricity loads.
All key study variables in df_all were aggregated to monthly averages to ease the computational load on auto.arima, the automatic ARIMA modeling function in the R forecast package [6].
auto.arima did not converge on the large daily data set with its high seasonal frequency (365 days), so the aggregated monthly data (frequency = 12) were used in the time series model.
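A base-R sketch of the daily-to-monthly aggregation and the frequency-12 `ts` object follows. The frame `df_daily` and its placeholder `load` values (the day of the year, chosen so the monthly means are easy to check) are hypothetical stand-ins for df_all.

```r
# Hypothetical daily frame covering two years
days <- seq(as.Date("2002-01-01"), as.Date("2003-12-31"), by = "day")
df_daily <- data.frame(date = days,
                       load = as.numeric(format(days, "%j")))  # placeholder: day of year

# Monthly mean, grouped by a "YYYY-MM" key (sorts chronologically)
df_daily$ym <- format(df_daily$date, "%Y-%m")
monthly <- aggregate(load ~ ym, data = df_daily, FUN = mean)

# Monthly time series with frequency 12, starting January 2002
load_ts <- ts(monthly$load, start = c(2002, 1), frequency = 12)
print(window(load_ts, end = c(2002, 3)))
```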
An additional Boolean variable was added to the monthly series flagging the recession months, December 2007 through June 2009.
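The recession flag can be built directly from the month dates. This sketch assumes the monthly series runs from January 2002 to December 2018, matching the study period; the variable names are hypothetical.

```r
# First day of each month, January 2002 to December 2018 (204 months)
months <- seq(as.Date("2002-01-01"), as.Date("2018-12-01"), by = "month")

# 1 for December 2007 through June 2009, 0 otherwise
recession <- as.integer(months >= as.Date("2007-12-01") &
                        months <= as.Date("2009-06-01"))
recession_ts <- ts(recession, start = c(2002, 1), frequency = 12)

sum(recession_ts)  # 19 recession months
```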
The daily average electrical load plot shows both the seasonal changes and an upward trend over time in the Houston area electricity consumption.
It can be clearly seen below that the region's temperature (degree day) drives the seasonal variations.
The plot below suggests a quadratic relationship between load and degree day. The seasonal variation is also clearly visible, with summer and spring producing the cooling loads and winter and fall the heating loads.
The plot below of electricity consumption across the 52 weeks of the year tracks the Houston area's rise and fall in temperature over the course of the year.
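One way to check the apparent quadratic shape is a linear model with a squared degree-day term, the same idea as the degreeDay and degreeDaySq regressors used later in the ARIMA model. The data below are synthetic (the coefficients 9000, 60, and 5 are arbitrary), so the fit only shows that `lm()` recovers a known U-shaped relationship.

```r
set.seed(42)

# Synthetic daily data: load rises on both hot (positive) and cold (negative) days
degDay <- runif(500, min = -30, max = 30)
avg_DailyLoad <- 9000 + 60 * degDay + 5 * degDay^2 + rnorm(500, sd = 300)

# Quadratic fit with linear and squared degree-day terms
fit_q <- lm(avg_DailyLoad ~ degDay + I(degDay^2))
round(coef(fit_q), 2)
```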
Another factor that affects electricity consumption is whether the day falls on a weekend or a workday. The first plot, showing the weekly cycle, suggests a slight drop in weekend electricity usage in this region.
The second plot below shows electricity usage over the seven days of the week, where days six and seven are Saturday and Sunday. Again, weekend consumption appears slightly lower, although the difference is not conclusive.
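The dayW and weekend columns in df_all can be derived from the date. This sketch copies the first six rows of the table above and uses the ISO weekday format `%u` (Monday = 1, Sunday = 7), which reproduces the dayW values shown there. Six January days that include the New Year's Day holiday say nothing about the full-sample weekend effect; the block only shows the mechanics.

```r
# First six rows of df_all (date and load), copied from the table above
df <- data.frame(
  date = as.Date("2002-01-01") + 0:5,
  avg_DailyLoad = c(8525.602, 9424.616, 9970.283, 9524.447, 7810.253, 7580.678)
)

# ISO weekday number: Monday = 1 ... Sunday = 7
df$dayW <- as.integer(format(df$date, "%u"))
df$weekend <- df$dayW >= 6

# Mean load for workdays (FALSE) and weekends (TRUE)
m <- tapply(df$avg_DailyLoad, df$weekend, mean)
print(m)
```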
The Houston-area population grew steadily from 2002 to 2018. The animated time plot below shows the population rise alongside the corresponding change in average daily electricity load for the Houston area.
Houston has become one of the world's great global cities. It is the nation's fourth-largest city and has a variety of growing industries, from health care and digital technology to manufacturing and trade.
It seems reasonable that as the number of established industries in the Houston area grows, so does the demand for electricity from factories, office spaces, and manufacturing and production plants. The plot below shows electricity loads rising alongside the new industries established over time.
A correlation plot was generated to examine how the key features relate to the region's electrical load.
Climate is clearly the main driver of electrical consumption. Population, GDP, and new industries all show a moderate correlation with load.
Surprisingly, new housing structures in the region have very little effect on electricity usage; perhaps newer homes are more energy efficient than the dwellings their occupants are moving from. The area's unemployment rate showed no association with load variability at all.
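The correlation plot is built on a Pearson correlation matrix such as the one `cor()` returns. This sketch uses only the six rows from the table of first observations, so its values are not the study's results: over these winter days load rises as degDay falls (more heating), giving a strong negative correlation, whereas over the full year the load-degree day relationship is U-shaped and a single linear correlation understates it.

```r
# First six rows of selected df_all columns, copied from the table above
df <- data.frame(
  avg_DailyLoad = c(8525.602, 9424.616, 9970.283, 9524.447, 7810.253, 7580.678),
  degDay        = c(-24.0, -27.5, -30.0, -27.0, -14.5, -15.0),
  popHOUDaily   = c(5520.884, 5521.204, 5521.523, 5521.843, 5522.163, 5522.483),
  ueHOUDaily    = c(5.900000, 5.890323, 5.880645, 5.870968, 5.861290, 5.851613)
)

# Pearson correlation matrix of the selected columns
cm <- cor(df)
round(cm, 2)
```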
The Great Recession [5], which began in December 2007 and ended in June 2009, was the longest recession since World War II and the deepest economic decline since the Great Depression. Its financial effects were devastating: US real gross domestic product (GDP) fell a whopping 4.3 percent during this period, and the unemployment rate went from 5% to 9.5%, peaking at about 10% in October 2009. The Houston area mirrored some of this economic trouble, but to a lesser degree than the nation as a whole.
In terms of electricity usage, average daily load dipped slightly during the recession window (10,054.79 MWh) compared with the preceding two years (10,121.61 MWh) and the following two years (10,497.23 MWh), using the roughly two-year windows defined in the code below. Interestingly, the mean degree day was slightly higher during the recession window (5.43 versus 5.11 and 5.23), so weather does not appear to explain the dip.
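library(dplyr)
# Roughly two-year windows before, during, and after the recession
df20052007 <- df_all %>% filter(date > as.Date('2005-10-30') & date <= as.Date('2007-11-30'))
df20072009 <- df_all %>% filter(date > as.Date('2007-10-30') & date <= as.Date('2009-11-30'))
df20092011 <- df_all %>% filter(date > as.Date('2009-10-30') & date <= as.Date('2011-11-30'))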
mean(df20052007$degDay)
## [1] 5.112352
mean(df20072009$degDay)
## [1] 5.429134
mean(df20092011$degDay)
## [1] 5.225361
mean(df20052007$avg_DailyLoad)
## [1] 10121.61
mean(df20072009$avg_DailyLoad)
## [1] 10054.79
mean(df20092011$avg_DailyLoad)
## [1] 10497.23
The interactive plots below show a slightly heavier left tail for the recession-period distribution, although this may be explained by the roughly 10-day loss of electric power to many customers in the region after Hurricane Ike in 2008.
The interactive plot below depicts changes over time in the load, climate, population, and GDP features. The seasonal components of the load and climate variables are clearly in sync, and the rising population partly explains the upward trend in the load data.
The electricity load time series is decomposed below in the interactive plot to display seasonality, trend, and noise in the load data.
The climate variable, degree day, is also decomposed into its seasonal, trend, and noise components. Again, the seasonality of the region's climate clearly drives the seasonality of the load.
In descriptive modeling, a time series is modeled to determine its components in terms of seasonal patterns, trends, and relationships to external variables such as population and economic factors. The results of a descriptive model can inform decision making and a region's policy.
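A classical additive decomposition with base R's `decompose()` splits a monthly series into trend, seasonal, and random parts. The series below is synthetic (a linear trend plus a fixed, zero-mean seasonal pattern over 17 years), so the decomposition should recover both exactly in the interior of the series; the interactive plots in this report may have used a different decomposition method.

```r
# Synthetic monthly series: linear trend + fixed seasonal pattern, 2002-2018
n <- 17 * 12
seas  <- 1000 * sin(2 * pi * (0:11) / 12)  # zero-mean seasonal pattern, Jan..Dec
trend <- 9000 + 5 * (1:n)
load_ts <- ts(trend + rep(seas, times = 17), start = c(2002, 1), frequency = 12)

# Classical additive decomposition (centred 2x12 moving-average trend)
dc <- decompose(load_ts, type = "additive")
round(dc$figure, 1)  # estimated monthly seasonal effects
```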
Since the use for the model is for descriptive analysis and not predictive, the entire data is used to build the ARIMA time series model.
The external variables regressed alongside the ARIMA components are degree day and its square (capturing the quadratic load-temperature relationship), population, GDP per capita, and a Boolean recession variable (Yes = 1, No = 0) marking the months of recession.
As explained before in the methods section, the daily data was converted to monthly data for the R library auto.arima function to converge.
# Packages: forecast for auto.arima(); parallel and doParallel for the worker cluster
library(forecast)
library(parallel)
library(doParallel)

# Matrix of regressors: degree day, its square (quadratic load response),
# population, GDP per capita, and the recession indicator
xreg <- cbind(degreeDay    = df_all_monthly_ts[, "degreeDay"],
              degreeDaySq  = df_all_monthly_ts[, "degreeDay"]^2,
              population   = df_all_monthly_ts[, "popHOUDaily"],
              gdpPerCapita = df_all_monthly_ts[, "gdpPerCapita"],
              Recession    = df_all_monthly_ts[, "Recession"])

# Leave one core free for the operating system
no_cores <- detectCores() - 1

# Start and register a cluster (parallel and doParallel are cross-platform).
# auto.arima() launches its own num.cores workers when parallel = TRUE,
# so this registered backend is optional.
cluster <- makeCluster(no_cores)
registerDoParallel(cluster)

# Regression with seasonal ARIMA errors, exhaustive (non-stepwise) search
fit <- auto.arima(df_all_monthly_ts[, "load"], xreg = xreg,
                  seasonal = TRUE, D = 1,
                  stepwise = FALSE,
                  parallel = TRUE, num.cores = no_cores,
                  approximation = FALSE,
                  allowdrift = TRUE,
                  # lambda = 'auto',
                  allowmean = TRUE)

# Shut down the cluster and return to sequential processing
stopCluster(cluster)
registerDoSEQ()

summary(fit)
## Series: df_all_monthly_ts[, "load"]
## Regression with ARIMA(1,0,1)(1,1,2)[12] errors
##
## Coefficients:
## ar1 ma1 sar1 sma1 sma2 degreeDay degreeDaySq
## 0.9729 -0.8316 -0.6315 -0.0245 -0.6706 70.0535 4.9866
## s.e. 0.0290 0.0626 0.1785 0.1724 0.1207 8.7853 0.3295
## population gdpPerCapita Recession
## 1.1679 0.0314 -150.0015
## s.e. 0.2508 0.0365 126.6125
##
## sigma^2 estimated as 110569: log likelihood=-1309.1
## AIC=2640.2 AICc=2641.76 BIC=2675.38
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 2.84984 312.9939 194.9067 -0.1359274 1.94 0.4400712 -0.01025121
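The auto.arima() function from the forecast package selected a regression with ARIMA(1,0,1)(1,1,2)[12] errors: one autoregressive and one moving average term with no non-seasonal differencing, plus, at the 12-month seasonal period, one seasonal autoregressive term, one seasonal difference, and two seasonal moving average terms.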
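For reference, the selected model can be written in backshift notation. This is a sketch assuming R's sign convention for MA terms ($1 + \theta B$); the symbols $L_t$ (monthly load), $D_t$ (degree day), $P_t$ (population), $G_t$ (GDP per capita), and $R_t$ (recession indicator) are introduced here for illustration:

$$
L_t = \beta_1 D_t + \beta_2 D_t^2 + \beta_3 P_t + \beta_4 G_t + \beta_5 R_t + \eta_t
$$

$$
(1 - \phi_1 B)(1 - \Phi_1 B^{12})(1 - B^{12})\,\eta_t = (1 + \theta_1 B)(1 + \Theta_1 B^{12} + \Theta_2 B^{24})\,\varepsilon_t,
\qquad \varepsilon_t \sim \mathrm{WN}(0, \sigma^2)
$$

where $B$ is the backshift operator ($B^k y_t = y_{t-k}$). The reported estimates are $\hat\phi_1 = 0.9729$, $\hat\theta_1 = -0.8316$, $\hat\Phi_1 = -0.6315$, $\hat\Theta_1 = -0.0245$, $\hat\Theta_2 = -0.6706$, and $\hat\sigma^2 = 110569$, and $\hat\beta_1, \dots, \hat\beta_5$ are the degreeDay, degreeDaySq, population, gdpPerCapita, and Recession coefficients in the summary. Because the seasonal difference removes any constant, the model has no intercept.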
The model fits the training data closely, with a training-set MAPE of 1.94%. The red line in the plot below shows the fitted values against the actual data.
Diagnostic plots from the astsa package, generated below, show residuals that resemble white noise and are approximately normally distributed, which indicates the model has separated the signal from the noise. The residual ACF plot shows almost no autocorrelation at any lag, suggesting the model has captured the systematic structure available from these regressors and ARIMA terms.
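The ACF plot can be complemented by a Ljung-Box test on the residuals. The sketch below uses a standard base-R airline-model example on the built-in `USAccDeaths` series as a stand-in, since the Houston data are not bundled here. For the ARIMA(1,0,1)(1,1,2)[12] error model in this study, `fitdf` would be 1 + 1 + 1 + 2 = 5; a large p-value is consistent with white-noise residuals. `forecast::checkresiduals()` reports a similar test for models fitted with auto.arima.

```r
# Airline model ARIMA(0,1,1)(0,1,1)[12] on a built-in monthly series (stand-in data)
fit_air <- arima(USAccDeaths, order = c(0, 1, 1),
                 seasonal = list(order = c(0, 1, 1)))

# Ljung-Box test on residuals; fitdf = number of estimated ARMA coefficients (q + Q = 2)
lb <- Box.test(residuals(fit_air), lag = 24, type = "Ljung-Box", fitdf = 2)
print(lb)
```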
# Refit the same model with astsa::sarima() to get diagnostic plots
library(astsa)
fit_ast <- sarima(df_all_monthly_ts[, 'load'], xreg = xreg,
                  p = 1, d = 0, q = 1,
                  P = 1, D = 1, Q = 2, S = 12)
## initial value 6.023288
## iter 2 value 5.929587
## iter 3 value 5.901023
## iter 4 value 5.890770
## iter 5 value 5.878300
## iter 6 value 5.871674
## iter 7 value 5.870296
## iter 8 value 5.870088
## iter 9 value 5.864154
## iter 10 value 5.861090
## iter 11 value 5.856618
## iter 12 value 5.855638
## iter 13 value 5.847399
## iter 14 value 5.838814
## iter 15 value 5.831982
## iter 16 value 5.829444
## iter 17 value 5.823443
## iter 18 value 5.819156
## iter 19 value 5.814303
## iter 20 value 5.813019
## iter 21 value 5.810645
## iter 22 value 5.809829
## iter 23 value 5.809639
## iter 24 value 5.809306
## iter 25 value 5.808794
## iter 26 value 5.808682
## iter 27 value 5.808254
## iter 28 value 5.807988
## iter 29 value 5.807876
## iter 30 value 5.807797
## iter 31 value 5.807721
## iter 32 value 5.807701
## iter 33 value 5.807698
## iter 34 value 5.807697
## iter 35 value 5.807697
## iter 35 value 5.807697
## iter 35 value 5.807697
## final value 5.807697
## converged
## initial value 5.822047
## iter 2 value 5.820463
## iter 3 value 5.817397
## iter 4 value 5.816884
## iter 5 value 5.816142
## iter 6 value 5.815845
## iter 7 value 5.815260
## iter 8 value 5.814556
## iter 9 value 5.814180
## iter 10 value 5.813980
## iter 11 value 5.813841
## iter 12 value 5.813807
## iter 13 value 5.813790
## iter 14 value 5.813769
## iter 15 value 5.813732
## iter 16 value 5.813688
## iter 17 value 5.813658
## iter 18 value 5.813651
## iter 19 value 5.813650
## iter 19 value 5.813650
## iter 19 value 5.813650
## final value 5.813650
## converged
## Estimate SE t.value p.value
## ar1 0.9729 0.0290 33.6021 0.0000
## ma1 -0.8316 0.0626 -13.2844 0.0000
## sar1 -0.6315 0.1785 -3.5374 0.0005
## sma1 -0.0245 0.1724 -0.1419 0.8873
## sma2 -0.6706 0.1207 -5.5577 0.0000
## degreeDay 70.0535 8.7853 7.9740 0.0000
## degreeDaySq 4.9866 0.3295 15.1345 0.0000
## population 1.1679 0.2508 4.6566 0.0000
## gdpPerCapita 0.0314 0.0365 0.8601 0.3910
## Recession -150.0015 126.6125 -1.1847 0.2378
The descriptive model indicates that the climate terms (degreeDay and degreeDaySq) and population are significantly related to electricity load (p < 0.001), with climate being the main driver.
The economic variables, GDP per capita (p = 0.391) and the recession indicator (p = 0.238), are not statistically significant in explaining Houston's electricity demand.
Climate has an extremely powerful impact on the amount of electricity used in the Houston area, especially because the region experiences long, hot, and humid summers.
Future studies could analyze the effects of global warming on Houston's climate and the resulting implications for the region's electricity demand. Electricity markets need reserve generation capacity to meet potentially higher demand as the region warms due to climate change.
In this study, economic factors had no statistically significant effect on the region's electricity usage during the Great Recession of 2007 to 2009. One possible explanation is that the Houston economy was diversified enough at the time to be insulated from the larger effects of the recession felt across most of the nation.
It would also be worthwhile to study the relationship between the economic effects of the coronavirus (COVID-19) pandemic and electricity usage in the Houston area. The findings could be valuable to electricity demand forecasters if future pandemics cause similar disruptions.
https://www.epa.gov/sites/production/files/2016-08/documents/print_heating-cooling-2016.pdf
https://electricityplans.com/average-electricity-bill-in-texas/
https://www.eia.gov/energyexplained/units-and-calculators/degree-days.php
https://www.federalreservehistory.org/essays/great-recession-of-200709
Forecasting: Principles and Practice, Rob J Hyndman and George Athanasopoulos, https://otexts.com/fpp2/
https://quickelectricity.com/what-is-ercot/
https://www.houston.org/why-houston/industries/energy
https://www.houston.org/why-houston/industries/all-industries