Other Projects
Electricity demand forecasting with R: https://rpubs.com/alrickcampbell/energy_demand_forecasting_R
Electricity demand forecasting with Python: https://rpubs.com/alrickcampbell/energy_forecasting_Python
The objective of this exercise is to analyse the temporal aspect of household electricity demand in Ontario, with a particular focus on the data from 2016. To gain a better understanding of the patterns, I will narrow down my analysis to this specific timeframe. I will be examining the seasonality of electricity demand across various time dimensions, including hourly, daily, weekly, monthly, and seasonal.
The raw data used for this exercise can be found at https://ssc.ca/en/case-study/predicting-hourly-electricity-demand-ontario. The data set provided consists of hourly electricity demand (MW) from January 1, 2003 December 31, 2016.
To begin, I first install Reticulate in R which will then allow me to run Python in the R environment through a virtual environment that was created.
# ```{r setup}
library(reticulate)
py_list_packages(
envname = NULL,
type = c("auto", "virtualenv", "conda"),
python = NULL
)
# py_install("cython")
# py_install("seaborn")
# py_install("sktime")
# py_install("pandas")
# py_install("xlwt")
# py_install("statsmodels")
# py_install("matplotlib")
# py_install("pillow")
# py_install("scikit-learn")
# py_install("numpy")
# py_install("openpyxl")
# py_install("skforecast")
# py_install("pytz")Click on code to see the list of Python modules used for this exercise.
import os
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import statsmodels.api as sm
from datetime import datetime, timedelta
from pytz import timezone
import openpyxl
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from skforecast.ForecasterAutoregMultiSeries import ForecasterAutoregMultiSeries
from skforecast.model_selection_multiseries import backtesting_forecaster_multiseries
from skforecast.model_selection_multiseries import grid_search_forecaster_multiseries
from sklearn import model_selection
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_validate
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import ElasticNet
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import r2_score
from sklearn.metrics import mean_squared_error
from sklearn.datasets import make_regression
import statsmodels.api as sm
from sklearn import metrics
from scipy.stats import pearsonr
from scipy import stats
from sktime.forecasting.naive import NaiveForecasterInitially, I read in the energy demand data and inspect its dimensions and variable types, before displaying the first five observations. The ‘Date’ variable is observed to be in the incorrect date format, so I combine the ‘Date’ and ‘Hour’ variables and shift the hour back by one to fix the formatting. The resulting ‘datetime’ variable is then verified to be in the correct format.
Using the ‘datetime’ variable, I generate new variables to represent time of day, day of the week, weekday/weekend status, month, and season. Following this, I verify the absence of any missing data or outliers and present an overview of the revised dataset.
df = pd.read_excel(r'SSC2020_hourly_demand.xlsx', sheet_name='Hourly Demand')
# Rename the columns date column for for merge
df = df.rename(columns = {'Total Energy Use from Electricity (MW)': 'MW'}, inplace = False)
len(df.index)
## 122736
print(df)
## Date Hour MW Year Month
## 0 2003-01-01 1 14745 2003 1
## 1 2003-01-01 2 14280 2003 1
## 2 2003-01-01 3 13821 2003 1
## 3 2003-01-01 4 13239 2003 1
## 4 2003-01-01 5 13236 2003 1
## ... ... ... ... ... ...
## 122731 2016-12-31 20 16260 2016 12
## 122732 2016-12-31 21 15658 2016 12
## 122733 2016-12-31 22 15195 2016 12
## 122734 2016-12-31 23 14758 2016 12
## 122735 2016-12-31 24 14153 2016 12
##
## [122736 rows x 5 columns]
n = df.shape[0]
df.dtypes
## Date datetime64[ns]
## Hour int64
## MW int64
## Year int64
## Month int64
## dtype: object
df['ymd'] = df.Date
#df['ymd'] = df['Date'].dt.strftime('%Y-%m-%d')
df['Hour'] = df['Hour'] - 1
df['Hour'] = pd.to_datetime(df.Hour, unit='h').dt.strftime('%H:%M:%S')
df.dtypes
## Date datetime64[ns]
## Hour object
## MW int64
## Year int64
## Month int64
## ymd datetime64[ns]
## dtype: object
df['datetime'] = df.apply(lambda row: pd.to_datetime(row['ymd']) + pd.to_timedelta(row['Hour']), axis = 1)
df.dtypes
# return the day of week
## Date datetime64[ns]
## Hour object
## MW int64
## Year int64
## Month int64
## ymd datetime64[ns]
## datetime datetime64[ns]
## dtype: object
df['day_num'] = df['ymd'].dt.weekday
df['day_of_week'] = df['ymd'].dt.strftime('%A')
print (df['day_of_week'].unique())
## ['Wednesday' 'Thursday' 'Friday' 'Saturday' 'Sunday' 'Monday' 'Tuesday']
df.dtypes
## Date datetime64[ns]
## Hour object
## MW int64
## Year int64
## Month int64
## ymd datetime64[ns]
## datetime datetime64[ns]
## day_num int64
## day_of_week object
## dtype: object
df['day'] = pd.cut(df['day_num'],
bins=[-np.inf, 4, np.inf],
labels=['Weekday', 'Weekend'],include_lowest =True)
print (df['day'].unique())
## ['Weekday', 'Weekend']
## Categories (2, object): ['Weekday' < 'Weekend']
df['day_in_month'] = df['ymd'].dt.day
df['month'] = df['ymd'].dt.month
def seasons(ymd):
m = ymd.month
d = ymd.day
season=None
if (m==3 and d>=21) or m==4 or m==5 or (m==6 and d<=20):
season = 'spring'
elif (m==6 and d>=21 ) or m==7 or m==8 or (m==9 and d<=20):
season = 'summer'
elif (m==9 and d>=21 ) or m==10 or m==11 or (m==12 and d<=20):
season = 'autumn'
elif (m==12 and d>=21 ) or m==1 or m==2 or (m==3 and d<=20):
season = 'winter'
return season
df['season'] = df.apply(lambda x: seasons(x['ymd']), axis=1)
print (df['season'].unique())
## ['winter' 'spring' 'summer' 'autumn']
print(df['season'].value_counts()) # to print count of every category
# summary statistics
## spring 30912
## summer 30912
## autumn 30576
## winter 30336
## Name: season, dtype: int64
print (df.describe())
## MW Year ... day_in_month month
## count 122736.000000 122736.000000 ... 122736.000000 122736.000000
## mean 16561.766271 2009.500391 ... 15.730935 6.522487
## std 2610.524068 4.031533 ... 8.800676 3.448839
## min 2270.000000 2003.000000 ... 1.000000 1.000000
## 25% 14610.000000 2006.000000 ... 8.000000 4.000000
## 50% 16488.000000 2009.500000 ... 16.000000 7.000000
## 75% 18358.000000 2013.000000 ... 23.000000 10.000000
## max 27005.000000 2016.000000 ... 31.000000 12.000000
##
## [8 rows x 6 columns]
print(df)
## Date Hour MW Year ... day day_in_month month season
## 0 2003-01-01 00:00:00 14745 2003 ... Weekday 1 1 winter
## 1 2003-01-01 01:00:00 14280 2003 ... Weekday 1 1 winter
## 2 2003-01-01 02:00:00 13821 2003 ... Weekday 1 1 winter
## 3 2003-01-01 03:00:00 13239 2003 ... Weekday 1 1 winter
## 4 2003-01-01 04:00:00 13236 2003 ... Weekday 1 1 winter
## ... ... ... ... ... ... ... ... ... ...
## 122731 2016-12-31 19:00:00 16260 2016 ... Weekend 31 12 winter
## 122732 2016-12-31 20:00:00 15658 2016 ... Weekend 31 12 winter
## 122733 2016-12-31 21:00:00 15195 2016 ... Weekend 31 12 winter
## 122734 2016-12-31 22:00:00 14758 2016 ... Weekend 31 12 winter
## 122735 2016-12-31 23:00:00 14153 2016 ... Weekend 31 12 winter
##
## [122736 rows x 13 columns]
df.dtypes
## Date datetime64[ns]
## Hour object
## MW int64
## Year int64
## Month int64
## ymd datetime64[ns]
## datetime datetime64[ns]
## day_num int64
## day_of_week object
## day category
## day_in_month int64
## month int64
## season object
## dtype: object
data = df
start_date = '2004-01-01 00:00:00'
split_date = '2015-12-31 23:00:00'
# Time series for 2016 only
df_2016 = data.loc[data['datetime'] > split_date]Figure 3.1 displays the hourly variation in household electricity demand in Ontario for each day of 2016 to highlight the seasonal pattern in electricity demand. The graph shows that electricity demand is generally higher in the winter months (December to February) and the summer months (June to August) and lower in the spring (March to May) and fall (September to November). The high electricity demand during winter and summer may be due to the increased use of heating and cooling systems to cope with extreme temperatures. The peak demand occurs in the summer months (June to August), likely due to air conditioning systems’ usage, while the lowest demand occurs in the spring (March to May) when temperatures are mild and the use of heating and cooling systems is minimal.
# Plot time series for 2016
fig, ax = plt.subplots()
plt.plot(df_2016['datetime'], df_2016['MW'], 'tab:red')
plt.xlabel("date"), plt.ylabel("MW")
## (Text(0.5, 0, 'date'), Text(0, 0.5, 'MW'))
plt.tight_layout()
plt.show()Figure 3.1: Household electricity demand (MW) in 2016.
Upon analysing the data, it can be seen in Figure 3.2 that the demand for electricity is highest during the daytime, with a peak between 3 pm to 7 pm. Additionally, there is a distinct decrease in demand during the early morning hours.
# Create a boxplot by hour of day
fig = plt.figure()
ax = df_2016.boxplot(by='Hour', column='MW', figsize=(10, 6))
plt.suptitle('')
ax.set_title('')
plt.xlabel("Hour of day")
plt.ylabel("MW")
ax.tick_params(axis='x', rotation=90)
# Show the plot
plt.show()Figure 3.2: Household electricity demand (MW) by hour of day.
According to Figure 3.3, the highest electricity demand usually occurs on Thursdays, and it is generally higher on weekdays than on weekends.
# Create a boxplot by day of week
fig = plt.figure()
# Sort the data by day of the week
order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
df_2016['day_of_week'] = pd.Categorical(df_2016['day_of_week'], categories=order, ordered=True)
## <string>:1: SettingWithCopyWarning:
## A value is trying to be set on a copy of a slice from a DataFrame.
## Try using .loc[row_indexer,col_indexer] = value instead
##
## See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df_2016 = df_2016.sort_values(by='day_of_week')
ax1 = df_2016.boxplot(by='day_of_week', column='MW', figsize=(10, 6))
plt.suptitle('')
ax1.set_title('')
plt.xlabel("Day of week")
plt.ylabel("MW")
# Show the plot
plt.show()Figure 3.3: Household electricity demand (MW) by day of the week.
The side-by-side boxplot in Figure 3.4 shows a comparison of the distribution of hourly electricity demand in megawatts (MW) between weekdays and weekends in Ontario during the year 2016. The left boxplot represents the distribution of hourly electricity demand on weekdays, while the right boxplot represents the distribution on weekends.
The boxplots show that the median demand on weekdays is higher than on weekends. The interquartile range (IQR) for both weekdays and weekends is similar, with the demand being more spread out on weekdays than on weekends. The upper whisker on weekdays is longer than on weekends, indicating that the demand on weekdays can be much higher than on weekends.
The weekdays boxplot also has a larger number of outliers, with the demand on some hours being significantly higher than on other hours. This is especially evident during the morning and evening peak hours, where the demand is much higher on weekdays than on weekends. On the other hand, the weekends boxplot has fewer outliers, indicating that the demand is relatively more consistent throughout the day.
# Create a boxplot by weekday vs weekend
fig = plt.figure()
ax2 = df_2016.boxplot(by='day', column='MW', figsize=(10, 6))
plt.suptitle('')
ax2.set_title('')
plt.xlabel("Part of week")
plt.ylabel("MW")
# Show the plot
plt.show()Figure 3.4: Weekday vs weekend electricity demand (MW) in 2016.
Figure 3.5 compares the electricity demand on weekdays (Monday to Friday) and weekends (Saturday and Sunday) throughout the year in 2016. The x-axis shows the days of the year, while the y-axis shows the electricity demand in MW.
The graph shows a clear difference between the electricity demand on weekdays and weekends. The demand is higher on weekdays than on weekends for most of the year.
# Create a time plot of weekday vs weekend
fig, ax = plt.subplots(figsize=(8,6))
df_2016.set_index('datetime', inplace=True)
df_2016.groupby('day')['MW'].plot(ax=ax)
## day
## Weekday AxesSubplot(0.125,0.2;0.775x0.68)
## Weekend AxesSubplot(0.125,0.2;0.775x0.68)
## Name: MW, dtype: object
ax.legend(frameon=False)
plt.xlabel("Date")
plt.ylabel("MW")
plt.show()Figure 3.5: Boxplot comparison of weekday and weekend electricity demand (MW) in 2016.
After analysing the data, it is observed in Figure 3.6 that electricity demand varies by month, with higher demand during the summer and winter months compared to the spring and fall months. The analysis reveals that the peak demand periods occur during the summer months, especially in the months of July and August, and during the winter months, especially in the months of December and January. This is consistent with the pattern observed in the previous examples, indicating that peak demand periods are influenced by both time of day and month of the year.
# Create a boxplot by month
fig = plt.figure()
ax1 = df_2016.boxplot(by='Month', column='MW', figsize=(10, 6))
plt.suptitle('')
ax1.set_title('')
plt.xlabel("Month")
plt.ylabel("MW")
# Show the plot
plt.show()Figure 3.6: Household electricity demand (MW) by month.
The boxplot in Figure 3.7 shows how electricity demand varies by time of day and season. The median demand for electricity is generally higher in the morning and evening hours, with a dip in demand during the afternoon hours. The demand for electricity is generally higher during the winter and summer seasons compared to spring and fall. There is also more variability in the demand for electricity during the winter and summer seasons.
It is essential to pay attention to the time of day and season when using electricity to prevent higher electricity costs, particularly with electricity providers that have implemented time-of-use pricing schemes. Hydro Ottawa’s pricing structure, for example, varies electricity rates based on the time of day and season. During the winter, off-peak hours are from 7 p.m. to 7 a.m., and during summer, off-peak hours are from 7 p.m. to 7 a.m. and all day on weekends and holidays.
# Create a boxplot for each season
fig, axes = plt.subplots(2, 2, figsize=(8, 6), sharey=True, sharex=True)
df_2016.groupby('season').boxplot(by='Hour', column='MW', ax=axes)
## autumn AxesSubplot(0.1,0.559091;0.363636x0.340909)
## spring AxesSubplot(0.536364,0.559091;0.363636x0.340909)
## summer AxesSubplot(0.1,0.15;0.363636x0.340909)
## winter AxesSubplot(0.536364,0.15;0.363636x0.340909)
## dtype: object
##
## <string>:1: UserWarning: When passing multiple axes, sharex and sharey are ignored. These settings must be specified when creating axes.
axes[1][0].tick_params(axis='x', rotation=90)
axes[1][1].tick_params(axis='x', rotation=90)
# Iterate through each subplot and set the xlabel to an empty string
for row in axes:
for ax in row:
ax.set_xlabel("")
fig.suptitle("")
fig.supxlabel('Hour')
fig.supylabel('MW')
plt.tight_layout()
plt.show()Figure 3.7: Seasonal load demand profiles of electricity by time-of-day, 2016
In conclusion, the analysis suggests that the grid operator needs to effectively plan and manage the supply of electricity during peak demand periods, particularly in the summer and winter months. The grid operator may consider increasing the grid capacity, implementing time-of-use pricing or demand-response programs to encourage customers to reduce their energy usage during peak demand hours, and factoring in temperature forecasts when planning electricity supply during the summer months. Additionally, implementing public education campaigns may help to raise awareness of the environmental impact of energy usage and encourage energy conservation.
A project by Alrick Campbell
alrickcampbell83@gmail.com