It was a very warm week for the UK last week, which got me thinking about the concept of heating and cooling demand for electricity. There must be a temperature at which they offset each other. Using a regression and simulation, I aim to pinpoint this offset.
There is already a vast literature on the notion of Heating and Cooling Degree Days, and the "base" temperature used. This is discussed below, including some shortfalls in openly available data.
I could not find any open-source data for UK temperatures at a small enough level to analyse the highs and lows during the day, so I turned to Elexon (or the Balancing Mechanism Reporting Service).
Temperatures: https://www.bmreports.com/bmrs/?q=generation/tempraturedata
Demand: https://www.bmreports.com/bmrs/?q=demand/rollingsystemdemand
We have two datasets to play with here, both courtesy of the Balancing Mechanism Reporting Service. The demand dataset is given in 5-minute intervals, and so has to be reduced to one observation per day. Temperatures are already daily, measured from a handful of weather stations across the UK at 12:00PM; the power grid operator apparently uses this dataset for forecasting.
As the data only relates to 12PM, the major caveat of this analysis is that while we can eliminate seasonal effects that span any multiple of days, we cannot account for intra-day effects.
There are data available via the Met Office for England, but not the UK as a whole. These are explored in the same fashion as the Elexon dataset.
For the demand dataset, we will want to eliminate all observations that are not 12PM:
library(knitr); library(tidyverse); library(magrittr)
library(logspline); library(scales); library(lubridate)

demand <- read_csv("instantDemand.csv",
                   col_types = cols(timestamp = col_datetime(format = "%Y-%m-%d %H:%M:%S")))

# Keep only the 12PM observation each day, matching the temperature readings
instantDemand <- demand %>%
  filter(hour(timestamp) == 12 & minute(timestamp) == 0) %>%
  set_colnames(c("timestamp", "daily_demand")) %>%
  mutate(dateval = as.Date(timestamp))
instantDemand %>% head() %>% kable()
timestamp | daily_demand | dateval |
---|---|---|
2011-05-28 12:00:00 | 34370 | 2011-05-28 |
2011-05-29 12:00:00 | 33764 | 2011-05-29 |
2011-05-30 12:00:00 | 35954 | 2011-05-30 |
2011-05-31 12:00:00 | 40055 | 2011-05-31 |
2011-06-01 12:00:00 | 39713 | 2011-06-01 |
2011-06-02 12:00:00 | 40309 | 2011-06-02 |
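Plotted over time, the demand series shows clear weekly and annual cycles. A minimal sketch of that plot (the axis labels are my own assumption):

instantDemand %>%
  ggplot(aes(x = dateval, y = daily_demand)) +
  geom_line(alpha = 0.6) +
  labs(x = "Date", y = "Demand at 12PM (MW)") +
  theme_bw()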
The periodicity in the data can be explained with two main factors. The first would be lighting, and the second would be temperatures. I say "main" for a good reason: other factors may include how well the economy is performing, seasonal work and the production of goods, holidays, and so on.
Secondly, temperatures. These figures are an average at 12PM every day from a handful of stations across the UK. The graph below shows that periodicity exists in the temperature data as well. We will add some variables: a time trend and some cyclical terms, built with the aid of R's `pi` constant.
We will turn to functional programming here. The reason will become apparent soon.
tau <- 2*pi

# Cyclical terms: sine and cosine waves with period `len` (in days)
c_sin <- function(v = temperatures$time_trend, len){sin((tau*v)/len)}
c_cos <- function(v = temperatures$time_trend, len){cos((tau*v)/len)}

# Add annual (365-day) and weekly (7-day) cycles to a data frame
add_c <- function(df){
  l <- 1:nrow(df)
  df %>%
    mutate(s1 = c_sin(l, 365), s2 = c_sin(l, 7),
           c1 = c_cos(l, 365), c2 = c_cos(l, 7))
}
temperatures <- read_csv("temperatures.csv",
                         col_types = cols(X1 = col_skip(),
                                          dateval = col_date(format = "%Y-%m-%d"))) %>%
  mutate(time_trend = 1:nrow(.))
temperatures %<>%
add_c() %>%
left_join(instantDemand)
## Joining, by = "dateval"
temperatures %>%
head() %>%
kable()
dateval | outturn | normal | low | high | time_trend | s1 | s2 | c1 | c2 | timestamp | daily_demand |
---|---|---|---|---|---|---|---|---|---|---|---|
2012-11-10 | 8.7 | 9.4 | 7.0 | 11.6 | 1 | 0.0172134 | 0.7818315 | 0.9998518 | 0.6234898 | 2012-11-10 12:00:00 | 38728 |
2012-11-11 | 7.7 | 9.4 | 7.0 | 11.6 | 2 | 0.0344216 | 0.9749279 | 0.9994074 | -0.2225209 | 2012-11-11 12:00:00 | 37786 |
2012-11-12 | 8.0 | 7.7 | 5.1 | 11.1 | 3 | 0.0516197 | 0.4338837 | 0.9986668 | -0.9009689 | 2012-11-12 12:00:00 | 45554 |
2012-11-13 | 10.5 | 7.7 | 5.1 | 11.1 | 4 | 0.0688024 | -0.4338837 | 0.9976303 | -0.9009689 | 2012-11-13 12:00:00 | 42742 |
2012-11-14 | 10.7 | 7.7 | 5.1 | 11.1 | 5 | 0.0859648 | -0.9749279 | 0.9962982 | -0.2225209 | 2012-11-14 12:00:00 | 43159 |
2012-11-15 | 8.8 | 7.7 | 5.1 | 11.1 | 6 | 0.1031017 | -0.7818315 | 0.9946708 | 0.6234898 | NA | NA |
# Demand regressed on a quadratic in temperature, a time trend, and the
# annual and weekly cycles
td_lm <- function(df){df %>%
  lm(formula = daily_demand ~ poly(outturn, 2, raw = T) + time_trend + s1 + s2 + c1 + c2)}
Before delving right in, here's what the relationship between temperatures and demand looks like. The trendline is a polynomial of order 2.
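A minimal sketch of that scatterplot (the axis labels are my own assumption):

temperatures %>%
  ggplot(aes(x = outturn, y = daily_demand)) +
  geom_point(alpha = 0.25) +
  geom_smooth(method = "lm", formula = y ~ poly(x, 2, raw = T)) +
  labs(x = "Outturn temperature (°C)", y = "Demand at 12PM (MW)") +
  theme_bw()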
Of course, a plot like this makes a bad assumption: that both variables are time-independent. With that in mind, the regression model below attempts to account for the data's time-dependence, and we will stick in a squared term for the outturn temperature as well.
Let $D_t$ represent our demand at time $t$, and $T_t$ the outturn temperature. Using $\tau = 2\pi$ as defined above, the model is as follows:

\[D_t = \beta_0 + \beta_1 T_t + \beta_2 T_t^2 + \beta_3 t + \beta_4 \sin(\tau t/365) + \beta_5 \sin(\tau t/7) + \beta_6 \cos(\tau t/365) + \beta_7 \cos(\tau t/7) + \epsilon_t\]
td_model <- temperatures %>% td_lm()
summary(td_model)
##
## Call:
## lm(formula = daily_demand ~ poly(outturn, 2, raw = T) + time_trend +
## s1 + s2 + c1 + c2, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13573.5 -1730.4 77.7 1809.9 8360.6
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.960e+04 3.796e+02 130.67 <2e-16 ***
## poly(outturn, 2, raw = T)1 -1.423e+03 5.433e+01 -26.19 <2e-16 ***
## poly(outturn, 2, raw = T)2 4.287e+01 2.154e+00 19.91 <2e-16 ***
## time_trend -3.398e+00 8.309e-02 -40.90 <2e-16 ***
## s1 1.968e+03 1.852e+02 10.63 <2e-16 ***
## s2 -1.538e+03 8.155e+01 -18.86 <2e-16 ***
## c1 2.044e+03 9.218e+01 22.18 <2e-16 ***
## c2 -2.297e+03 8.161e+01 -28.14 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2804 on 2356 degrees of freedom
## (87 observations deleted due to missingness)
## Multiple R-squared: 0.7811, Adjusted R-squared: 0.7804
## F-statistic: 1201 on 7 and 2356 DF, p-value: < 2.2e-16
# Fit a logspline density to the residuals, so the simulation can draw
# realistic errors from it later
errors <- td_model$residuals
error_dist <- logspline(td_model$residuals)
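A quick sanity check on that fit: overlay the estimated density on a histogram of the residuals, using `dlogspline` from the logspline package. A minimal sketch (breaks and colours are my own):

xs <- seq(min(errors), max(errors), length.out = 200)
hist(errors, breaks = 50, freq = FALSE, main = "Residuals vs fitted logspline")
lines(xs, dlogspline(xs, error_dist), col = "red", lwd = 2)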
First, let's keep things within the range of the data we already have. Here are the pieces that will go into the simulation:
# Simulate demand: model predictions plus random errors drawn from the
# fitted logspline distribution
td_sim <- function(td_range, td_dist, td_lm, td_x){
  y <- predict(td_lm, td_x) + rlogspline(length(td_range), td_dist)
  data.frame(td_range, y) %>%
    set_colnames(c("temperatures", "demand_simulations"))
}

# Build the regressors for a range of temperatures, with the time
# variables fixed at point x
td_x <- function(t_range, x){
  data.frame(outturn = t_range,
             time_trend = x,
             s1 = c_sin(x, len = 365),
             s2 = c_sin(x, len = 7),
             c1 = c_cos(x, len = 365),
             c2 = c_cos(x, len = 7))
}
max_elexon_temp <- max(temperatures$outturn, na.rm=T) %T>%
cat("Max Temperature in the dataset: ", ., "\n")
## Max Temperature in the dataset: 24.9
min_elexon_temp <- min(temperatures$outturn, na.rm=T) %T>%
cat("Min Temperature in the dataset: ", ., "\n")
## Min Temperature in the dataset: -2
td_range <- seq(min_elexon_temp, max_elexon_temp, 0.01)
ft <- length(errors) + 1  # fix the time variables one step beyond the sample
td_simulation_1 <- td_x(td_range, ft) %>%
  td_sim(td_range, error_dist, td_model, .)
With our model, we can also get a sense of what might occur at higher temperatures that aren't included in the original data.
All we need to do is define a bigger range of temperatures, run the model, and simulate the likely error around it.
td_range_2 <- seq(-5,34, 0.01)
td_simulation_2 <- td_x(td_range_2, ft) %>%
td_sim(td_range_2, error_dist, td_model, .)
# Axis titles were not defined in the original listing; assumed here
x_title <- "Temperature (°C)"
y_title <- "Demand (MW)"

td_plot <- td_simulation_2 %>%
  ggplot(mapping = aes(x = temperatures, y = demand_simulations)) +
  geom_jitter(alpha = 0.25) +
  geom_smooth(method = "lm", formula = y ~ poly(x, 2, raw = T), size = 1.5) +
  ggtitle("Expected Demand versus Temperatures (BMRS)") +
  labs(x = x_title, y = y_title) + theme_bw() +
  theme(plot.title = element_text(hjust = 0.5)); td_plot
We can also pinpoint where cooling demand starts to reverse the overall decline in demand. We take the partial derivative with respect to temperature and set it to zero, fixing the time variables at our chosen point `ft`:

\[ \frac{\partial \hat{D}_t}{\partial T_t} = 0, \quad t = \text{ft} \]
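Because the time terms are constants once $t$ is fixed, the minimum also has a closed form, which makes a handy cross-check on the numerical optimiser below. Plugging in the coefficients from the summary above:

\[ \beta_1 + 2\beta_2 T_t = 0 \quad \Rightarrow \quad T_t^{*} = -\frac{\beta_1}{2\beta_2} \approx \frac{1423}{2 \times 42.87} \approx 16.6 \]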
# Minimise the fitted quadratic in temperature; with t fixed, the time
# terms form a constant that merely shifts the curve vertically
td_optimise <- function(td_lm, td_range, ft, guess){
  b <- td_lm$coefficients %>% as.numeric
  fixed <- b[4]*ft + b[5]*c_sin(v = ft, 365) + b[6]*c_sin(v = ft, 7) +
    b[7]*c_cos(v = ft, 365) + b[8]*c_cos(v = ft, 7)
  nlm(f = function(x) b[1] + b[2]*x + b[3]*x^2 + fixed, p = guess)
}
td_optima <- td_optimise(td_model, td_range_2, ft, guess = 20)
td_minima_y <- td_optima$minimum
td_minima_x <- td_optima$estimate
td_plot +
geom_hline(yintercept = td_minima_y, colour = "red", alpha = 0.75) +
geom_vline(xintercept = td_minima_x, colour = "red", alpha = 0.75) +
geom_point(mapping = aes(x = td_minima_x, y = td_minima_y), colour = "red", size = 2)
\[ y = 27750, \quad x = 16.6 \]
Therefore, it is likely that power demand stops declining once temperatures reach roughly 16.6 degrees. According to these particular data, anyway.
As an aside, an argument could be made that the residuals exhibit autocorrelation. Here's an ACF plot of the residuals from `td_model`:
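The plot itself can be reproduced with base R's `acf` (the title is my own):

# ACF of the residuals; significant spikes at short lags would point to
# autocorrelation the model has not captured
acf(td_model$residuals, main = "ACF of td_model residuals")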
As mentioned in the beginning, there is another option when it comes to open temperature data.
Heating and Cooling Degree Days (HDD and CDD respectively) are used to estimate how much energy will be required for heating or cooling, given a mid-point temperature and a "base"; negative values are clipped to zero.
Per day, then:
\[ \overline{T} = \frac{min(T) + max(T)}{2} \]
\[ CDD = \frac{\vert\overline{T} - Base \vert + (\overline{T} - Base)}{2} \]
\[ HDD = \frac{\vert Base - \overline{T} \vert + (Base - \overline{T})}{2} \]
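Note that $(\vert x \vert + x)/2$ is simply $\max(x, 0)$, so both formulas clip negative values to zero. In R the same calculation can be written with `pmax`; a small illustration with made-up midpoint temperatures:

base <- 15.5
outturn <- c(3.2, 15.5, 22.1)   # example midpoint temperatures
cdd <- pmax(outturn - base, 0)  # 0.0  0.0  6.6
hdd <- pmax(base - outturn, 0)  # 12.3 0.0  0.0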
Although the Met Office data pertain only to a sub-region of England, we can still use them. They require some serious cleaning up as well, so readers may wish to explore the code used.
Once tidied, it's a case of rinse-and-repeat.
library(data.table); library(reshape2)

# Daily minimum Central England Temperature series; the month columns
# hold tenths of a degree, and -999 flags missing values
daymin <- read_table2("https://www.metoffice.gov.uk/hadobs/hadcet/cetmindly1878on_urbadj4.dat",
                      col_names = c("Year", "Day", "1", "2","3","4","5","6","7","8","9","10","11","12","x")) %>%
  melt(c("Year","Day")) %>%
  filter(variable != "x", value != -999, Year >= 2011) %>%
  mutate(dateval = ymd(paste0(Year, "-", variable, "-", Day)), min_outturn = value / 10) %>%
  arrange(dateval) %>%
  select(dateval, min_outturn)

# And the same again for the daily maximum series
daymax <- read_table2("https://www.metoffice.gov.uk/hadobs/hadcet/cetmaxdly1878on_urbadj4.dat",
                      col_names = c("Year", "Day", "1", "2","3","4","5","6","7","8","9","10","11","12","x")) %>%
  melt(c("Year","Day")) %>%
  filter(variable != "x", value != -999, Year >= 2011) %>%
  mutate(dateval = ymd(paste0(Year, "-", variable, "-", Day)), max_outturn = value / 10) %>%
  arrange(dateval) %>%
  select(dateval, max_outturn)
# Daily demand as the midpoint of each day's minimum and maximum,
# mirroring the midpoint temperature used for degree days
demand_minmax <- demand %>% mutate(dateval = as.Date(timestamp)) %>%
  group_by(dateval) %>%
  summarise(daily_demand = (min(demand) + max(demand))/2)

# Degree days against the standard UK base of 15.5°C
temperatures <- left_join(demand_minmax, daymin) %>%
  left_join(daymax) %>%
  mutate(outturn = (min_outturn + max_outturn)/2,
         CDD = (abs(outturn - 15.5) + (outturn - 15.5))/2,
         HDD = (abs(15.5 - outturn) + (15.5 - outturn))/2) %>% na.omit()
For visualisation’s sake, I’ve made HDDs negative here.
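A minimal sketch of that plot (the colours and labels are my own):

temperatures %>%
  ggplot(aes(x = dateval)) +
  geom_line(aes(y = CDD), colour = "red") +
  geom_line(aes(y = -HDD), colour = "blue") +
  labs(x = "Date", y = "Degree days (HDD shown negative)") +
  theme_bw()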
The demand data looks similar to what it was previously, as does the scatterplot:
temperatures %<>% na.omit() %>%
mutate(time_trend = seq_len(nrow(.)))
temperatures %<>% add_c()
td_model <- temperatures %>% td_lm()
summary(td_model)
##
## Call:
## lm(formula = daily_demand ~ poly(outturn, 2, raw = T) + time_trend +
## s1 + s2 + c1 + c2, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9633.4 -879.6 44.9 1039.2 6032.4
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.190e+04 1.466e+02 285.72 <2e-16 ***
## poly(outturn, 2, raw = T)1 -8.759e+02 2.315e+01 -37.84 <2e-16 ***
## poly(outturn, 2, raw = T)2 2.799e+01 1.027e+00 27.24 <2e-16 ***
## time_trend -2.158e+00 3.463e-02 -62.31 <2e-16 ***
## s1 -2.667e+03 7.473e+01 -35.69 <2e-16 ***
## s2 -1.436e+03 4.143e+01 -34.67 <2e-16 ***
## c1 -2.833e+03 5.144e+01 -55.08 <2e-16 ***
## c2 1.229e+03 4.145e+01 29.66 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1593 on 2948 degrees of freedom
## Multiple R-squared: 0.8961, Adjusted R-squared: 0.8959
## F-statistic: 3632 on 7 and 2948 DF, p-value: < 2.2e-16
errors <- td_model$residuals
error_dist <- logspline(td_model$residuals)
max_temp <- max(temperatures$outturn, na.rm=T) %T>%
cat("Max Temperature in the dataset: ", ., "\n")
## Max Temperature in the dataset: 24.55
min_temp <- min(temperatures$outturn, na.rm=T) %T>%
cat("Min Temperature in the dataset: ", ., "\n")
## Min Temperature in the dataset: -3.8
ft <- length(errors) + 1  # again, one step beyond the sample
td_range <- seq(min_temp, max_temp, 0.01)
td_simulation_1 <- td_x(td_range, ft) %>%
  td_sim(td_range, error_dist, td_model, .)
td_range_2 <- seq(-5,34, 0.01)
td_simulation_2 <- td_x(td_range_2, ft) %>%
td_sim(td_range_2, error_dist, td_model, .)
td_plot <- td_simulation_2 %>%
  ggplot(mapping = aes(x = temperatures, y = demand_simulations)) +
  geom_jitter(alpha = 0.25) +
  geom_smooth(method = "lm", formula = y ~ poly(x, 2, raw = T), size = 1.5) +
  ggtitle("Expected Demand versus Temperatures (Met Office Data)") +
  labs(x = x_title, y = y_title) + theme_bw() +
  theme(plot.title = element_text(hjust = 0.5)); td_plot
td_optima <- td_optimise(td_model, td_range_2, ft, guess = 20)
td_minima_y <- td_optima$minimum %T>% cat("Demand minima: ", ., "\n")
## Demand minima: 23072.8
td_minima_x <- td_optima$estimate %T>% cat("Temperature minima: ", ., "\n")
## Temperature minima: 15.64787
td_plot +
geom_hline(yintercept = td_minima_y, colour = "red", alpha = 0.75) +
geom_vline(xintercept = td_minima_x, colour = "red", alpha = 0.75) +
geom_point(mapping = aes(x = td_minima_x, y = td_minima_y), colour = "red", size = 2)
\[ y = 23073, \quad x = 15.65 \]
Using this dataset for temperatures instead, our model's minimum is pretty close to the base used in degree-day calculations!
# p1: group demand by day, ready for a choice of daily summary
p1 <- function(){
  demand %>%
    mutate(dateval = as.Date(timestamp)) %>%
    group_by(dateval)
}

# p2: join on the CET min/max series and rebuild the regressors
p2 <- function(df){
  df %>% left_join(daymax, .) %>%
    left_join(daymin) %>%
    na.omit() %>%
    mutate(time_trend = seq_len(nrow(.)),
           outturn = (min_outturn + max_outturn)/2) %>%
    add_c()
}

# p3: fit (or accept) a model, simulate, optimise, and plot
p3 <- function(df, td_model = NULL){
  if(is.null(td_model)){
    td_model <- df %>% td_lm()
  }
  errors <- td_model$residuals
  error_dist <- logspline(td_model$residuals)
  ft <- length(errors) + 1
  td_range_2 <- seq(-5, 34, 0.01)
  td_simulation_2 <- td_x(td_range_2, ft) %>%
    td_sim(td_range_2, error_dist, td_model, .)
  td_optima <- td_optimise(td_model, td_range_2, ft, guess = 20)
  td_minima_y <- td_optima$minimum %T>% cat("Demand minima: ", ., "\n") %>% print()
  td_minima_x <- td_optima$estimate %T>% cat("Temperature minima: ", ., "\n") %>% print()
  td_simulation_2 %>%
    ggplot(mapping = aes(x = temperatures, y = demand_simulations)) +
    geom_jitter(alpha = 0.25) +
    geom_smooth(method = "lm", formula = y ~ poly(x, 2, raw = T), size = 1.5) +
    ggtitle("Expected Demand versus Temperatures (Met Office Data)") +
    labs(x = x_title, y = "Demand") + theme_bw() +
    theme(plot.title = element_text(hjust = 0.5)) +
    geom_hline(yintercept = td_minima_y, colour = "red", alpha = 0.75) +
    geom_vline(xintercept = td_minima_x, colour = "red", alpha = 0.75) +
    geom_point(mapping = aes(x = td_minima_x, y = td_minima_y), colour = "red", size = 2)
}
Using daily total energy in megawatt-hours (there are twelve 5-minute readings per hour, so summing the MW readings and dividing by 12 gives MWh):
p1() %>%
summarise(daily_demand = sum(demand)/12) %>%
p2() %>%
p3()
## Joining, by = "dateval"
## Joining, by = "dateval"
## Demand minima: 560819.6
## [1] 560819.6
## Temperature minima: 15.84501
## [1] 15.84501
Median demand:
p1() %>%
summarise(daily_demand = median(demand)) %>%
p2() %>%
p3()
## Joining, by = "dateval"
## Joining, by = "dateval"
## Demand minima: 23663.79
## [1] 23663.79
## Temperature minima: 15.84723
## [1] 15.84723
Mean demand:
p1() %>%
summarise(daily_demand = mean(demand)) %>%
p2() %>%
p3()
## Joining, by = "dateval"
## Joining, by = "dateval"
## Demand minima: 23283.37
## [1] 23283.37
## Temperature minima: 15.54955
## [1] 15.54955
Lastly, a least absolute deviations (LAD) estimate, fitted as a median regression with quantreg:
library(quantreg)
## Loading required package: SparseM
##
## Attaching package: 'SparseM'
## The following object is masked from 'package:base':
##
## backsolve
p1() %>%
summarise(daily_demand = mean(demand)) %>%
p2() %>%
p3(
td_model = temperatures %>% rq(formula = daily_demand ~ poly(outturn, 2, raw = T) + time_trend + s1 + s2 + c1 + c2, tau = 0.5)
)
## Joining, by = "dateval"
## Joining, by = "dateval"
## Demand minima: 22840.2
## [1] 22840.2
## Temperature minima: 15.43276
## [1] 15.43276