It was a very warm week for the UK last week, which got me thinking about heating and cooling demand for electricity. There must be a temperature at which the two offset each other. Using a regression and a simulation, I aim to pinpoint where that crossover happens.

There is already a vast amount of discussion and literature on the notion of Heating and Cooling Degree Days, and on the “base” temperature used. This is discussed below, along with some shortfalls in openly available data.

About the Data Used

I could not find any open-source data for UK temperatures at a small enough level to analyse the highs and lows during the day, so I turned to Elexon (or the Balancing Mechanism Reporting Service).

Temperatures: https://www.bmreports.com/bmrs/?q=generation/tempraturedata

Demand: https://www.bmreports.com/bmrs/?q=demand/rollingsystemdemand

We have two datasets to play with here, both courtesy of the Balancing Mechanism Reporting Service. The demand dataset is given in 5-minute intervals, so it has to be reduced to one figure per day (here, the 12PM observation; other daily aggregations are tried towards the end). Temperatures are already daily, measured at 12:00PM from a handful of weather stations across the UK. The power grid operator apparently uses this dataset for forecasting.

As the data only relates to 12PM, the major caveat of this analysis is that while we can eliminate seasonal effects that span any multiple of days, we cannot account for intra-day effects.

There are also data available via the Met Office, though they cover only part of England rather than the UK as a whole. These are explored in the same fashion as the Elexon dataset.

For the demand dataset, we will want to eliminate all observations that are not 12PM:

library(knitr); library(tidyverse); library(magrittr)
library(logspline); library(scales); library(lubridate)

demand <- read_csv("instantDemand.csv",
  col_types = cols(timestamp = col_datetime(format = "%Y-%m-%d %H:%M:%S")))

# Keep only the 12PM observation for each day
instantDemand <- demand %>%
  filter(hour(timestamp) == 12 & minute(timestamp) == 0) %>%
  set_colnames(c("timestamp", "daily_demand")) %>%
  mutate(dateval = as.Date(timestamp))


instantDemand %>% head() %>% kable()
timestamp daily_demand dateval
2011-05-28 12:00:00 34370 2011-05-28
2011-05-29 12:00:00 33764 2011-05-29
2011-05-30 12:00:00 35954 2011-05-30
2011-05-31 12:00:00 40055 2011-05-31
2011-06-01 12:00:00 39713 2011-06-01
2011-06-02 12:00:00 40309 2011-06-02
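Plotting the daily series makes its periodicity easy to see. A minimal sketch of such a plot (my own; the axis labels and MW units are assumptions):

instantDemand %>%
  ggplot(aes(x = dateval, y = daily_demand)) +
  geom_line(alpha = 0.6) +
  theme_bw() +
  labs(x = "Date", y = "Demand at 12PM (MW, assumed)")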

The periodicity in the data can be explained by two main factors: the first is lighting, and the second is temperature. I say “main” for a good reason: other factors may include how well the economy is performing, seasonal work and production of goods, holidays, and so on.

Secondly, temperatures. These figures are an average at 12PM every day from a handful of stations across the UK. The graph below shows that periodicity exists in the temperature data as well. We will add some variables: a time trend and some cyclical terms built using the constant pi.

We will turn to functional programming here. The reason will become apparent soon.

tau <- 2*pi

# Cyclical (Fourier) terms with a given period length
c_sin <- function(v = temperatures$time_trend, len){ sin((tau*v)/len) }
c_cos <- function(v = temperatures$time_trend, len){ cos((tau*v)/len) }

# Add annual (365-day) and weekly (7-day) sine/cosine terms to a data frame
add_c <- function(df){
  l <- 1:nrow(df)
  df %>%
    mutate(s1 = c_sin(l, 365), s2 = c_sin(l, 7),
           c1 = c_cos(l, 365), c2 = c_cos(l, 7))
}

temperatures <- read_csv("temperatures.csv",
    col_types = cols(X1 = col_skip(), dateval = col_date(format = "%Y-%m-%d"))) %>%
  mutate(time_trend = 1:nrow(.))

temperatures %<>%
  add_c() %>%
  left_join(instantDemand)
## Joining, by = "dateval"
temperatures %>%
  head() %>% 
  kable()
dateval outturn normal high low time_trend s1 s2 c1 c2 timestamp daily_demand
2012-11-10 8.7 9.4 7.0 11.6 1 0.0172134 0.7818315 0.9998518 0.6234898 2012-11-10 12:00:00 38728
2012-11-11 7.7 9.4 7.0 11.6 2 0.0344216 0.9749279 0.9994074 -0.2225209 2012-11-11 12:00:00 37786
2012-11-12 8.0 7.7 5.1 11.1 3 0.0516197 0.4338837 0.9986668 -0.9009689 2012-11-12 12:00:00 45554
2012-11-13 10.5 7.7 5.1 11.1 4 0.0688024 -0.4338837 0.9976303 -0.9009689 2012-11-13 12:00:00 42742
2012-11-14 10.7 7.7 5.1 11.1 5 0.0859648 -0.9749279 0.9962982 -0.2225209 2012-11-14 12:00:00 43159
2012-11-15 8.8 7.7 5.1 11.1 6 0.1031017 -0.7818315 0.9946708 0.6234898 NA NA
# Demand regressed on a quadratic in outturn temperature, a time trend, and the cyclical terms
td_lm <- function(df){
  df %>%
    lm(formula = daily_demand ~ poly(outturn, 2, raw = T) + time_trend + s1 + s2 + c1 + c2)
}

An Initial, Non-stationary Glimpse

Before delving right into it, here’s what the relationship between temperatures and demand looks like.

The trendline is a polynomial of order 2.
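A minimal sketch of such a scatterplot, using the joined data frame from above (illustrative; the axis labels are my own):

temperatures %>%
  ggplot(aes(x = outturn, y = daily_demand)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "lm", formula = y ~ poly(x, 2, raw = T)) +
  theme_bw() +
  labs(x = "Outturn temperature (Celsius)", y = "Demand at 12PM")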

Of course, this view rests on the poor assumption that both variables are time-independent. The model below attempts to account for the data’s time-dependence.

Modelling the Relationship

With the above in mind, we can use a regression model to show the relationship between electricity demand and temperatures.

We will also include a squared term for outturn temperature.

Let $D_t$ represent our demand at time $t$, and $T_t$ the outturn temperature. Using $\tau = 2\pi$ as defined above, the model is:

\[ D_t = \beta_0 + \beta_1 T_t + \beta_2 T_t^2 + \beta_3 t + \beta_4 \sin(\tau t/365) + \beta_5 \sin(\tau t/7) + \beta_6 \cos(\tau t/365) + \beta_7 \cos(\tau t/7) + \epsilon_t \]

td_model  <- temperatures %>% td_lm() 

summary(td_model)
## 
## Call:
## lm(formula = daily_demand ~ poly(outturn, 2, raw = T) + time_trend + 
##     s1 + s2 + c1 + c2, data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13573.5  -1730.4     77.7   1809.9   8360.6 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                 4.960e+04  3.796e+02  130.67   <2e-16 ***
## poly(outturn, 2, raw = T)1 -1.423e+03  5.433e+01  -26.19   <2e-16 ***
## poly(outturn, 2, raw = T)2  4.287e+01  2.154e+00   19.91   <2e-16 ***
## time_trend                 -3.398e+00  8.309e-02  -40.90   <2e-16 ***
## s1                          1.968e+03  1.852e+02   10.63   <2e-16 ***
## s2                         -1.538e+03  8.155e+01  -18.86   <2e-16 ***
## c1                          2.044e+03  9.218e+01   22.18   <2e-16 ***
## c2                         -2.297e+03  8.161e+01  -28.14   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2804 on 2356 degrees of freedom
##   (87 observations deleted due to missingness)
## Multiple R-squared:  0.7811, Adjusted R-squared:  0.7804 
## F-statistic:  1201 on 7 and 2356 DF,  p-value: < 2.2e-16
errors <- td_model$residuals

# Fit a log-spline density to the residuals, to draw realistic noise from later
error_dist <- logspline(td_model$residuals)
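As a quick, optional check on that fit (my own addition), the estimated density can be overlaid on a histogram of the residuals using dlogspline():

# Compare the fitted log-spline density with the empirical residual distribution
xs <- seq(min(errors), max(errors), length.out = 200)
hist(errors, breaks = 50, freq = FALSE,
     main = "Residuals and fitted logspline density", xlab = "Residual")
lines(xs, dlogspline(xs, error_dist), col = "red", lwd = 2)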

Simulating the Demand-Temperature Relationship

First, let’s keep it within the range of the data we already have. Here’s what will go into the simulation:

  1. Get the minimum and maximum from the actual outturn temperatures
  2. Create a continuous range between these values
  3. Fix our time-specific variables
  4. From the distribution of model residuals, simulate some statistical noise
# Simulate demand over a temperature range: model prediction plus noise drawn
# from the fitted residual density
td_sim <- function(td_range, td_dist, td_lm, td_x){
  y <- predict(td_lm, td_x) + rlogspline(length(td_range), td_dist)
  data.frame(td_range, y) %>%
    set_colnames(c("temperatures", "demand_simulations"))
}

# Build the prediction frame: the temperature range plus fixed time-specific terms
td_x <- function(t_range, x){
  data.frame(outturn = t_range,
             time_trend = x,
             s1 = c_sin(x, len = 365),
             s2 = c_sin(x, len = 7),
             c1 = c_cos(x, len = 365),
             c2 = c_cos(x, len = 7))
}

max_elexon_temp <- max(temperatures$outturn, na.rm = T) %T>%
  cat("Max Temperature in the dataset: ", ., "\n")
## Max Temperature in the dataset:  24.9
min_elexon_temp <- min(temperatures$outturn, na.rm = T) %T>%
  cat("Min Temperature in the dataset: ", ., "\n")
## Min Temperature in the dataset:  -2
td_range <- seq(min_elexon_temp, max_elexon_temp, 0.01)

# Fix the time trend at one more than the number of fitted observations
ft <- length(errors) + 1

td_simulation_1 <- td_x(td_range, ft) %>%
  td_sim(td_range, error_dist, td_model, .)

With our model, we can also get a sense of what might occur at higher temperatures that aren’t included in the original data.

All we need to do is define a bigger range of temperatures, run the model and simulate the likely error from it.

td_range_2 <- seq(-5, 34, 0.01)

td_simulation_2 <- td_x(td_range_2, ft) %>%
  td_sim(td_range_2, error_dist, td_model, .)

# Axis labels (assumed values; adjust as needed)
x_title <- "Outturn Temperature (Celsius)"
y_title <- "Demand (MW)"

td_plot <- td_simulation_2 %>%
  ggplot(mapping = aes(x = temperatures, y = demand_simulations)) +
  geom_jitter(alpha = 0.25) +
  geom_smooth(method = "lm", formula = y ~ poly(x, 2, raw = T), size = 1.5) +
  ggtitle("Expected Demand versus Temperatures (BMRS)") +
  labs(x = x_title, y = y_title) + theme_bw() +
  theme(plot.title = element_text(hjust = 0.5)); td_plot

We can also mark on the plot the point at which we would expect cooling demand to start pushing overall demand back up.

We take the partial derivative with respect to temperature and fix the time-specific variables at a single point in time, t = ft as defined above:

\[ \frac{\partial \hat{D}_t} {\partial T_t } = 0 \]

td_optimise <- function(td_lm, td_range, ft, guess){
  b <- td_lm$coefficients %>% as.numeric
  # Time-specific terms held fixed at t = ft; they shift the curve but not its turning point
  fixed <- b[4]*ft + b[5]*c_sin(v = ft, 365) + b[6]*c_sin(v = ft, 7) +
           b[7]*c_cos(v = ft, 365) + b[8]*c_cos(v = ft, 7)
  # Numerically minimise the fitted quadratic in temperature
  nlm(f = function(x) b[1] + b[2]*x + b[3]*x^2 + fixed, p = guess)
}
td_optima <- td_optimise(td_model, td_range_2, ft, guess = 20)

td_minima_y <- td_optima$minimum

td_minima_x <- td_optima$estimate 
td_plot + 
  geom_hline(yintercept = td_minima_y, colour = "red", alpha = 0.75) + 
  geom_vline(xintercept = td_minima_x, colour = "red", alpha = 0.75) +
  geom_point(mapping = aes(x = td_minima_x, y = td_minima_y), colour = "red", size = 2) 

\[ y = 27750, \ \ x = 16.6 \]

Therefore, it is likely that power demand stops declining once temperatures reach roughly 16.6 degrees Celsius, according to these particular data at least.
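As a sanity check on the optimiser (my own addition), the turning point of a quadratic can also be recovered in closed form from the coefficients, since the fixed time terms shift the curve but not the location of its minimum:

b <- coef(td_model)
# Vertex of a quadratic: -b1 / (2 * b2); should agree with td_minima_x (~16.6 here)
-b["poly(outturn, 2, raw = T)1"] / (2 * b["poly(outturn, 2, raw = T)2"])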

As an aside, an argument could be made that the residuals exhibit autocorrelation. Here’s an ACF plot of the residuals from td_model:
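One short way to produce such a plot from the stored residuals is base R’s acf() (the original may have been drawn differently):

# Autocorrelation of the regression residuals; spikes outside the bands point
# to serial correlation the model leaves behind
acf(errors, main = "ACF of td_model residuals")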

Met Office Data, Degree Days and More

As mentioned in the beginning, there is another option when it comes to open temperature data.

Heating and Cooling Degree Days (HDD and CDD respectively) are used to determine how much energy will be required, given a day’s mid-point temperature and a “base” temperature. Negative values are floored at zero; a small worked example follows the formulas below.

Per day, then:

\[ \overline{T} = \frac{min(T) + max(T)}{2} \]

\[ CDD = \frac{\vert\overline{T} - Base \vert + (\overline{T} - Base)}{2} \]

\[ HDD = \frac{\vert Base - \overline{T} \vert + (Base - \overline{T})}{2} \]
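As a worked example with made-up numbers: against a base of 15.5, a day with a minimum of 12 and a maximum of 20 has a mid-point of 16, giving 0.5 cooling degree days and zero heating degree days.

# Illustrative only: the base and temperatures below are made up
base_temp <- 15.5
t_bar <- (12 + 20) / 2                                      # mid-point: 16
cdd   <- (abs(t_bar - base_temp) + (t_bar - base_temp)) / 2 # 0.5
hdd   <- (abs(base_temp - t_bar) + (base_temp - t_bar)) / 2 # 0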

Although the Met Office data only pertain to a sub-region of England, we can still use them. They require some serious cleaning up as well, so readers may wish to explore the code used.

Once tidied, it’s a case of rinse-and-repeat.

library(data.table); library(reshape2)

# Daily minimum temperatures (HadCET, urban-adjusted): columns 1-12 are months,
# values are in tenths of a degree C, and -999 flags missing values
daymin <- read_table2("https://www.metoffice.gov.uk/hadobs/hadcet/cetmindly1878on_urbadj4.dat",
    col_names = c("Year", "Day", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "x")) %>%
  melt(c("Year", "Day")) %>%
  filter(variable != "x", value != -999, Year >= 2011) %>%
  mutate(dateval = ymd(paste0(Year, "-", variable, "-", Day)), min_outturn = value / 10) %>%
  arrange(dateval) %>%
  select(dateval, min_outturn)

# Daily maximum temperatures, same treatment
daymax <- read_table2("https://www.metoffice.gov.uk/hadobs/hadcet/cetmaxdly1878on_urbadj4.dat",
    col_names = c("Year", "Day", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "x")) %>%
  melt(c("Year", "Day")) %>%
  filter(variable != "x", value != -999, Year >= 2011) %>%
  mutate(dateval = ymd(paste0(Year, "-", variable, "-", Day)), max_outturn = value / 10) %>%
  arrange(dateval) %>%
  select(dateval, max_outturn)

# Mid-point of daily demand, to mirror the mid-point temperature
demand_minmax <- demand %>%
  mutate(dateval = as.Date(timestamp)) %>%
  group_by(dateval) %>%
  summarise(daily_demand = (min(demand) + max(demand))/2)

# Join, then compute the mid-point temperature and degree days against a base of 15.5
temperatures <- left_join(demand_minmax, daymin) %>%
  left_join(daymax) %>%
  mutate(outturn = (min_outturn + max_outturn)/2,
         CDD = (abs(outturn - 15.5) + (outturn - 15.5))/2,
         HDD = (abs(15.5 - outturn) + (15.5 - outturn))/2) %>%
  na.omit()

For visualisation’s sake, I’ve made HDDs negative here.
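A minimal sketch of that sort of plot, using the data frame built above (entirely illustrative):

# CDDs plotted as positive and HDDs as negative values over time
temperatures %>%
  mutate(HDD = -HDD) %>%
  gather(key = "type", value = "degree_days", CDD, HDD) %>%
  ggplot(aes(x = dateval, y = degree_days, colour = type)) +
  geom_line(alpha = 0.6) +
  theme_bw() +
  labs(x = "Date", y = "Degree days (base 15.5)")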

The demand data look similar to before, as does the scatterplot.

temperatures %<>%
  na.omit() %>%
  mutate(time_trend = seq_len(nrow(.)))

temperatures %<>% add_c()

td_model <- temperatures %>% td_lm()
summary(td_model)
## 
## Call:
## lm(formula = daily_demand ~ poly(outturn, 2, raw = T) + time_trend + 
##     s1 + s2 + c1 + c2, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9633.4  -879.6    44.9  1039.2  6032.4 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                 4.190e+04  1.466e+02  285.72   <2e-16 ***
## poly(outturn, 2, raw = T)1 -8.759e+02  2.315e+01  -37.84   <2e-16 ***
## poly(outturn, 2, raw = T)2  2.799e+01  1.027e+00   27.24   <2e-16 ***
## time_trend                 -2.158e+00  3.463e-02  -62.31   <2e-16 ***
## s1                         -2.667e+03  7.473e+01  -35.69   <2e-16 ***
## s2                         -1.436e+03  4.143e+01  -34.67   <2e-16 ***
## c1                         -2.833e+03  5.144e+01  -55.08   <2e-16 ***
## c2                          1.229e+03  4.145e+01   29.66   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1593 on 2948 degrees of freedom
## Multiple R-squared:  0.8961, Adjusted R-squared:  0.8959 
## F-statistic:  3632 on 7 and 2948 DF,  p-value: < 2.2e-16
errors <- td_model$residuals 

error_dist <- logspline(td_model$residuals) 

max_temp <- max(temperatures$outturn, na.rm = T) %T>%
  cat("Max Temperature in the dataset: ", ., "\n")
## Max Temperature in the dataset:  24.55
min_temp <- min(temperatures$outturn, na.rm = T) %T>%
  cat("Min Temperature in the dataset: ", ., "\n")
## Min Temperature in the dataset:  -3.8
ft <- length(errors) + 1

td_range <- seq(min_temp, max_temp, 0.01)

td_simulation_1 <- td_x(td_range, ft) %>%
  td_sim(td_range, error_dist, td_model, .)

td_range_2 <- seq(-5, 34, 0.01)

td_simulation_2 <- td_x(td_range_2, ft) %>%
  td_sim(td_range_2, error_dist, td_model, .)

td_plot <- td_simulation_2 %>%
  ggplot(mapping = aes(x = temperatures, y = demand_simulations)) +
  geom_jitter(alpha = 0.25) +
  geom_smooth(method = "lm", formula = y ~ poly(x, 2, raw = T), size = 1.5) +
  ggtitle("Expected Demand versus Temperatures (Met Office Data)") +
  labs(x = x_title, y = y_title) + theme_bw() +
  theme(plot.title = element_text(hjust = 0.5)); td_plot

td_optima <- td_optimise(td_model, td_range_2, ft, guess = 20)

td_minima_y <- td_optima$minimum %T>% cat("Demand minima: ", ., "\n")
## Demand minima:  23072.8
td_minima_x <- td_optima$estimate %T>% cat("Temperature minima: ", ., "\n")
## Temperature minima:  15.64787
td_plot + 
  geom_hline(yintercept = td_minima_y, colour = "red", alpha = 0.75) + 
  geom_vline(xintercept = td_minima_x, colour = "red", alpha = 0.75) +
  geom_point(mapping = aes(x = td_minima_x, y = td_minima_y), colour = "red", size = 2)

\[ y = 23073, x= 15.65 \]

Using this dataset for temperatures instead, the model’s minimum is pretty close to the base used in degree-day calculations!

Other Variants of the Same Answer

# p1: group the raw 5-minute demand by day (the summarise step is supplied by the caller)
p1 <- function(){
  demand %>%
    mutate(dateval = as.Date(timestamp)) %>%
    group_by(dateval)
}

# p2: join on the Met Office min/max temperatures and add the time trend and cyclical terms
p2 <- function(df){
  df %>%
    left_join(daymax, .) %>%
    left_join(daymin) %>%
    na.omit() %>%
    mutate(time_trend = seq_len(nrow(.)), outturn = (min_outturn + max_outturn)/2) %>%
    add_c()
}



# p3: fit (or accept) a model, simulate over the wide temperature range,
# locate the demand minimum, and plot the result
p3 <- function(df, td_model = NULL){

  if(is.null(td_model)){
    td_model <- df %>% td_lm()
  }
  errors <- td_model$residuals

  error_dist <- logspline(td_model$residuals)

  ft <- length(errors) + 1

  td_range_2 <- seq(-5, 34, 0.01)

  td_simulation_2 <- td_x(td_range_2, ft) %>%
    td_sim(td_range_2, error_dist, td_model, .)

  td_optima <- td_optimise(td_model, td_range_2, ft, guess = 20)

  td_minima_y <- td_optima$minimum %T>% cat("Demand minima: ", ., "\n") %>% print()

  td_minima_x <- td_optima$estimate %T>% cat("Temperature minima: ", ., "\n") %>% print()

  td_simulation_2 %>%
    ggplot(mapping = aes(x = temperatures, y = demand_simulations)) +
    geom_jitter(alpha = 0.25) +
    geom_smooth(method = "lm", formula = y ~ poly(x, 2, raw = T), size = 1.5) +
    ggtitle("Expected Demand versus Temperatures (Met Office Data)") +
    labs(x = x_title, y = "Demand") + theme_bw() +
    theme(plot.title = element_text(hjust = 0.5)) +
    geom_hline(yintercept = td_minima_y, colour = "red", alpha = 0.75) +
    geom_vline(xintercept = td_minima_x, colour = "red", alpha = 0.75) +
    geom_point(mapping = aes(x = td_minima_x, y = td_minima_y), colour = "red", size = 2)

}

Using daily total megawatt-hours (the 5-minute MW readings summed and divided by 12):

p1() %>% 
summarise(daily_demand = sum(demand)/12) %>% 
p2()  %>% 
p3()
## Joining, by = "dateval"
## Joining, by = "dateval"
## Demand minima:  560819.6 
## [1] 560819.6
## Temperature minima:  15.84501 
## [1] 15.84501

Median demand:

p1() %>% 
summarise(daily_demand = median(demand)) %>% 
p2() %>% 
p3()
## Joining, by = "dateval"
## Joining, by = "dateval"
## Demand minima:  23663.79 
## [1] 23663.79
## Temperature minima:  15.84723 
## [1] 15.84723

Mean demand:

p1() %>%
  summarise(daily_demand = mean(demand)) %>% 
p2() %>%
p3()
## Joining, by = "dateval"
## Joining, by = "dateval"
## Demand minima:  23283.37 
## [1] 23283.37
## Temperature minima:  15.54955 
## [1] 15.54955

Least absolute deviations (LAD) estimation, via a median regression from the quantreg package:

library(quantreg)
## Loading required package: SparseM
## 
## Attaching package: 'SparseM'
## The following object is masked from 'package:base':
## 
##     backsolve
p1() %>%
  summarise(daily_demand = mean(demand)) %>% 
p2() %>%
p3(
td_model = temperatures %>%  rq(formula = daily_demand ~ poly(outturn, 2, raw = T) + time_trend + s1 + s2 + c1 + c2, tau = 0.5)
)
## Joining, by = "dateval"
## Joining, by = "dateval"
## Demand minima:  22840.2 
## [1] 22840.2
## Temperature minima:  15.43276 
## [1] 15.43276