BC Hydro electricity data

Several years ago BC Hydro (the public utility provider in the province) introduced automatic metered electricity readings,¹ allowing the collection of hourly electricity consumption data since 2013. Thanks to kind participation of my mother, I have been able to use her account to see what sort of information consumers can glean from their Hydro data. Here is a snapshot of the data:

head(hourly.raw)

##   Account.Holder Account.Number Interval.Start.Date.Time
## 1       Person A    0XXXXXXXXXX         2013-06-18 00:00
## 2       Person A    0XXXXXXXXXX         2013-06-18 01:00
## 3       Person A    0XXXXXXXXXX         2013-06-18 02:00
## 4       Person A    0XXXXXXXXXX         2013-06-18 03:00
## 5       Person A    0XXXXXXXXXX         2013-06-18 04:00
## 6       Person A    0XXXXXXXXXX         2013-06-18 05:00
##   Net.Consumption..kWh. Demand..kW. Power.Factor.... Estimated.Usage
## 1                  0.42         N/A              N/A                
## 2                  0.33         N/A              N/A                
## 3                  0.26         N/A              N/A                
## 4                  0.31         N/A              N/A                
## 5                  0.33         N/A              N/A                
## 6                  0.31         N/A              N/A

Next let’s clean it up:

hourly.df <- hourly.raw %>% 
            transmute(Date=as.POSIXct(Interval.Start.Date.Time),Consumption=as.numeric(Net.Consumption..kWh.)) %>%
            filter(Date<'2016-06-18')
# Add the date formats we are interest in
hourly.df <- hourly.df %>%
              mutate(Year=Date %>% format('%Y'),
                     Month=Date %>% format('%b'),
                     Day=Date %>% format('%d'),
                     DayWeek=Date %>% format('%a'),
                     Hour=Date %>% format('%I%p'),
                     DayMonth=Date %>% format('%b%d%y'),
                     MonthYear=Date %>% format('%b %Y'))
head(hourly.df)

##                  Date Consumption Year Month Day DayWeek Hour DayMonth
## 1 2013-06-18 00:00:00          35 2013   Jun  18     Tue 12am  Jun1813
## 2 2013-06-18 01:00:00          26 2013   Jun  18     Tue 01am  Jun1813
## 3 2013-06-18 02:00:00          19 2013   Jun  18     Tue 02am  Jun1813
## 4 2013-06-18 03:00:00          24 2013   Jun  18     Tue 03am  Jun1813
## 5 2013-06-18 04:00:00          26 2013   Jun  18     Tue 04am  Jun1813
## 6 2013-06-18 05:00:00          24 2013   Jun  18     Tue 05am  Jun1813
##   MonthYear
## 1  Jun 2013
## 2  Jun 2013
## 3  Jun 2013
## 4  Jun 2013
## 5  Jun 2013
## 6  Jun 2013

As we can see, we have the integer values for consumption (would have been nice if the data wasn’t rounded!) as well as the hourly, day, and month for a given data. An initial visualization of the data will show that there are clear seasonal trends (yearly share appears to peak in January) as well as a downward trend. This can largely be accounted for by the increased energy effeciency in household appliances alongside (perhaps) our increasing awareness of the need to reduce electricity consumption in respect to global warming.

# Get the mean of the monthly data (and scrub outliers)
monthly.df <- hourly.df %>% select(Consumption,DayMonth) %>% 
  group_by(DayMonth) %>% filter(Consumption<quantile(Consumption,0.99)) %>% 
  summarise(Consumption=sum(Consumption)) %>% 
              mutate(Date=as.Date(DayMonth,format='%b%d%y')) %>% 
              arrange(Date)
# Plot
ggplot(monthly.df,aes(x=Date,y=Consumption)) + geom_line() + xlab('') + ylab('kWH Consumption') +
  scale_x_date(date_labels = format('%b-%y'),date_breaks='4 month') + ggtitle('Chart 1') +
  theme(axis.text.x=element_text(size=10))

There are two empirical questions we could ask from this data: (i) what are usual monthly/daily consumption patterns (i.e. how seasonal is this data), and (ii) what is pace of the downward trend in this data? We’ll begin with a simply OLS regression by checking calculating the share of daily consumption (for a given year) and weather simple calendar date regressors can preditct this.

# Subset to only include the full year data
ty.df <- hourly.df %>% group_by(Year) %>% 
          filter(Date>='2014-01-01' & Date<'2016-01-01') %>%
          mutate(Share=Consumption/sum(Consumption),Month=factor(Month,unique(Month)),Time=1:length(Consumption)) %>% as.data.frame
mean.df <-ty.df %>% group_by(Month) %>% summarise(Share=mean(Share))
# The data is noisy, but there appear to variations in the mean
ggplot(ty.df,aes(x=Month,y=Share,color=Year)) + 
  geom_jitter() + theme(legend.position='right') + scale_color_discrete(name='') +
  geom_point(data=mean.df,aes(x=Month,y=Share),size=3,color='black') + xlab('') + ylab('Yearly share') + 
  ggtitle('Chart 2') + theme(axis.text.x=element_text(size=10))

As Chart 2 shows, the dispersion of data is quite noisy. For an ANOVA perspective, most of the variation is idiosyncratic, rather than around the mean.

all.df <- hourly.df %>% mutate(Time=1:length(Consumption))
# Make some dummies
reg.df <- make.regs(all.df,'2014-01-01','2016-01-01')
# Run the OLS regression
con.ols <- lm(I(Consumption/10)~.-1,data=reg.df)

We can see from Chart 3 belows that most of the coefficients on our dummies have statistically significant results (95% confidence internal in bars). Throughout the day, electricity consumption bottoms out from 1am-5am, and then rises from 6am to 5pm, where it peaks. The distribution of consumption within the weeks (relative to Sunday) shows that the weekdays consume less electricity, but that Saturday consumers the largest amount. For the calendar months, electricity consumption declines from May-Septermber, and increaes from October-April, consistent with the colder weather.

Finally, we want to see how the seasonally adjusted weather data looks compared to what we actually observed. To do this, we can use the predict function to generate fitted values throughout the calendar year (I fix a constant time trend to ensure that the seasonal factors are not skewed by the downward trend). Once we have this, we can generate a scaling factor as the ratio of the predicted value calendar share to the mean share. We then divide the actual data by this seasonal factor to adjust the data. Chrat 4 shows that the the cold months tend to have the highest distribution of scaling factors (naturally). However even in these months there are certain hours which consume less than the mean yearly share and hence have a seasonal factor less than one.

# Create the Date sequence
full.hours <- seq(as.POSIXct("2013-01-01 00:00:00"),as.POSIXct("2016-12-31 23:00:00",by="hour"),by='hour')
full.df <- data.frame(Date=full.hours) %>%
            mutate(Month=Date %>% format('%b'),
                     DayWeek=Date %>% format('%a'),
                     Hour=Date %>% format('%I%p'),
                    Time=rep(1,length(date)),
                    Consumption=rep(NA,length(Date)))
# Re-create the regression matrix
full.reg.df <- make.regs(full.df,'2013-01-01','2017-01-01') %>% select(-Consumption)
# Generate the fitted values
pred.val <- predict(con.ols,full.reg.df) %>% setNames(NULL)
# Combine with the date data
scale.df <- full.df %>% mutate(Year=format(Date,'%Y'),Predict=pred.val) %>% 
  group_by(Year) %>% mutate(Scaling=Predict/sum(Predict)*(length(Predict))) %>%
  select(Date,Scaling)
# Merge the actual data with our scaling data
merge.df <- merge(filter(hourly.df,Date<'2016-06-01'),scale.df,by='Date') %>% 
  mutate(ConsumptionSA=Consumption/Scaling,Month=factor(Month,levels=mons))
# Plot scaling factor data
ggplot(merge.df,aes(x=Month,y=Scaling,fill=Month)) +
  geom_violin() + geom_jitter(size=0.1,alpha=0.5) +
  theme(legend.position='right') + scale_fill_discrete(name='') + 
  xlab('') + ylab('Distribution of hourly scaling factors') + 
  ggtitle('Chart 4') + theme(axis.text.x=element_text(size=10))

As Chart 5 below shows, adjusting the data for seasonality removes the spikes we see around the cold months, and continues to show the general downward trend in the data.

Before this, BC Hydro would send an engineer to literally read the meter.↩