Impact of Pricing Signal on Electricity Demand

  1. Introduction
  2. Data
  3. Data Exploration
  4. Visualizing Seasonality
    • Annual Seasonality
    • Heatmap Weekly Vs. Hourly
    • Heatmap Monthly Vs. Hourly
  5. Electricity Pricing Information
  6. Comparing Standard and Time of Use Pricing
    • Graphical Visualization
    • Linear Regression
    • Diagnostic Plots
  7. Conclusions
  8. Useful Links

1. Introduction

Managing electricity demand is important in an evolving generation/demand landscape where more electricity is generated by non-traditional plants such as solar and wind farms. Shifting demand to the times when electricity is generated from renewable sources will become increasingly necessary.

A study took place in the UK between 2011 and 2014 to examine how residential consumers would react to knowing the price of electricity in advance. More than 1100 homes were recruited to receive day-ahead pricing notifications; they were registered in a half-hourly pricing scheme and given monthly feedback on their consumption patterns. These consumers were joined by 4500 households that did not receive any pricing or feedback information and had fixed daily electricity pricing.

Pricing Signal    Price per kWh
Low               3.99p
Normal            11.76p
High              67.2p

This study will analyze how effective the day-ahead Pricing Signal notification was in modifying the electricity consumption of the users who received notifications compared with those who did not.

The households in the study were classified into 13 demographic segments, each fitting a unique combination of socio-economic indicators. For this study we chose only one of these 13 groups.

ACORN-L “Post Industrial Families”

Twenty years ago, these would have been traditional blue-collar areas. Now with the decline of heavy industry, people are quite likely to work in office or clerical jobs and in shops. Most households are traditional families with school-age children. They generally live in three-bedroom terraced houses, which tend to be at the cheaper end of the housing market. Most families are owner-occupiers, but a number rent their houses from the council.

Incomes are more likely to be about the national average. Spending on credit cards is low and people are careful with their money. Mortgages are often covered by a mortgage protection policy and levels of remortgaging are high. Many will also switch utility provider in order to get the best deal. A higher proportion of these home owners took out new mortgages just prior to the recession, and with higher loan-to-value ratios, the result of falls in house prices is that some are likely to be in negative equity. A few may have been in arrears on their mortgage in the past and some may have additional debt from loans or credit cards. Ten per cent of these families were finding their financial situation difficult, even before the recession. They have cut back on the amount they spend, for example by buying fewer new clothes. Unemployment is both above average and increasing faster than average, particularly among skilled trades and industrial workers.

Most families can afford to run a car and to take a holiday every year, often a package holiday to the Mediterranean. Cable and satellite TV is popular, as are computer games and sports such as football and rugby. There are more smokers than average, and they are less likely to give up smoking. These are cautious consumers who are successfully adapting to the changing nature of employment in the UK.

2. Data

The data gathered half-hourly from the 5500 households between November 2011 and February 2014 created a very large dataset. When initially downloaded, the zipped file had a size of 765.12 MB, which increased to 11.3 GB once unzipped. The dataset offers plenty of opportunities to compare different variables, but as a whole it is too unwieldy to handle. This is why I decided to concentrate on one of the 13 household groups.

I created a subset of the initial data by first filtering by year (only data from 2013). I then selected by type of household (ACORN-L or “Post Industrial Families”). Finally, I split that subset into two new dataframes depending on whether the families used the “Time of Use” or “Standard” pricing. The final two datasets used in this study, “London.Energy_2013_L_Std” and “London.Energy_2013_L_ToU”, have sizes of 374 MB and 106 MB respectively, a large reduction in memory needs from the initial 11.3 GB dataset.

The following is the workflow used to filter the initial dataset to the specific subset of interest:

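Loading libraries and subsetting by year:

# Assumes the full dataset is already loaded as the data frame London.Energy.
library(dplyr)    # filter()
library(stringr)  # str_detect()
# Keep only the readings whose timestamp contains "2013".
London.Energy.2013 <- filter(London.Energy, str_detect(DateTime, "2013"))
write.csv(London.Energy.2013, "/Users/josemawyin/607_Final_Project/London.Energy.2013.csv")
rm(London.Energy)  # free the memory held by the full dataset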

Subsetting by type of household:

# Keep only the ACORN-L households.
London.Energy.2013.L <- filter(London.Energy.2013, str_detect(Acorn, "L"))
write.csv(London.Energy.2013.L, "/Users/josemawyin/607_Final_Project/London.Energy_2013_L.csv")
rm(London.Energy.2013)

Subsetting by type of meter:

# Split the ACORN-L subset by pricing scheme: Standard (Std) and Time of Use (ToU).
London.Energy.2013.L.Std <- filter(London.Energy.2013.L, str_detect(stdorToU, "Std"))
write.csv(London.Energy.2013.L.Std, "/Users/josemawyin/607_Final_Project/London.Energy_2013_L_Std.csv")
London.Energy.2013.L.ToU <- filter(London.Energy.2013.L, str_detect(stdorToU, "ToU"))
write.csv(London.Energy.2013.L.ToU, "/Users/josemawyin/607_Final_Project/London.Energy_2013_L_ToU.csv")
rm(London.Energy.2013.L)

# The same split for the ACORN-Q group. These lines need a London.Energy.2013.Q
# subset, which is not created in the steps shown here.
London.Energy.2013.Q.Std <- filter(London.Energy.2013.Q, str_detect(stdorToU, "Std"))
write.csv(London.Energy.2013.Q.Std, "/Users/josemawyin/607_Final_Project/London.Energy_2013_Q_Std.csv")
London.Energy.2013.Q.ToU <- filter(London.Energy.2013.Q, str_detect(stdorToU, "ToU"))
write.csv(London.Energy.2013.Q.ToU, "/Users/josemawyin/607_Final_Project/London.Energy_2013_Q_ToU.csv")
rm(London.Energy.2013.Q)

3. Data Exploration

Let’s take a look at the data in our “London.Energy_2013_L_ToU” dataframe:

## Parsed with column specification:
## cols(
##   X1 = col_double(),
##   X1_1 = col_double(),
##   LCLid = col_character(),
##   stdorToU = col_character(),
##   DateTime = col_datetime(format = ""),
##   KWH.hh..per.half.hour. = col_double(),
##   Acorn = col_character(),
##   Acorn_grouped = col_character()
## )
## Observations: 1,280,561
## Variables: 8
## $ X1                     <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13…
## $ X1_1                   <dbl> 4485629, 4485630, 4485631, 4485632, 44856…
## $ LCLid                  <chr> "MAC000014", "MAC000014", "MAC000014", "M…
## $ stdorToU               <chr> "ToU", "ToU", "ToU", "ToU", "ToU", "ToU",…
## $ DateTime               <dttm> 2013-01-01 00:00:00, 2013-01-01 00:30:00…
## $ KWH.hh..per.half.hour. <dbl> 0.019, 0.034, 0.033, 0.016, 0.034, 0.026,…
## $ Acorn                  <chr> "ACORN-L", "ACORN-L", "ACORN-L", "ACORN-L…
## $ Acorn_grouped          <chr> "Adversity", "Adversity", "Adversity", "A…

As seen from the structure information above, we have 1,280,561 observations of 8 variables. This data frame is a mix of character, date/time and numerical data. Later on, we will have to change the type of some of these variables to facilitate our analysis.

## Observations: 420,768
## Variables: 3
## $ DateTime               <fct> 2013-01-01 00:00:00, 2013-01-01 00:30:00,…
## $ LCLid                  <chr> "MAC000106", "MAC000106", "MAC000106", "M…
## $ KWH.hh..per.half.hour. <int> 359, 351, 116, 64, 75, 102, 108, 68, 82, …

These 1,280,561 observations correspond to the half-hourly measurements over 365 days for the 24 households in this group (ACORN-L or “Post Industrial Families”) that were part of the ToU (Time of Use) scheme. Initially the data is in “Long-Format”; it was then transformed into “Wide-Format”, with the household IDs as column names, one row per date/time of use and the electricity consumption as the values.

Finally, once the data is in “Wide-Format”, we calculate the row-wise average of the values to get the mean electricity consumption at a given time across all 24 households of this group.
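The reshaping and averaging steps described above can be sketched in base R as follows. This is only an illustration under assumptions: the household IDs and readings are made up, the column name `kwh` is hypothetical, and the original analysis may well have used a different reshaping function.

```r
# Toy long-format data: one row per household and half-hourly reading.
# Household IDs and readings are invented for illustration only.
long <- data.frame(
  DateTime = rep(c("2013-01-01 00:00:00", "2013-01-01 00:30:00"), times = 3),
  LCLid    = rep(c("MAC_A", "MAC_B", "MAC_C"), each = 2),
  kwh      = c(0.10, 0.20, 0.30, 0.40, 0.50, 0.60)
)

# Wide format: one row per timestamp, one column per household.
wide <- tapply(long$kwh, list(long$DateTime, long$LCLid), mean)

# Row-wise mean: average consumption across households at each timestamp.
avg <- rowMeans(wide, na.rm = TRUE)
print(avg)
```

Using `na.rm = TRUE` means a household with a missing reading at a given timestamp is simply left out of that timestamp's average.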

4. Visualizing Seasonality

This section will visualize how the half-hourly electricity consumption changes depending on the hour of the day, day of the week and month of the year. We will need to separate the hour, day and month information from the date/time stamp attached to the consumption reading. Below we can see the structure of the dataframe after we separated these components.
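As a minimal sketch of how the hour, weekday and month can be separated from a timestamp, the base R snippet below uses `as.POSIXlt`. The two timestamps are arbitrary examples, and the original analysis may have used a date-handling package instead; the weekday is shifted so that 1 = Sunday, matching the ordering seen in the structure outputs later in this report.

```r
# Two example timestamps (arbitrary values chosen for illustration).
stamps <- as.POSIXct(c("2013-01-01 00:30:00", "2013-07-06 17:00:00"), tz = "UTC")

# POSIXlt exposes the individual calendar components of each timestamp.
parts <- as.POSIXlt(stamps, tz = "UTC")

components <- data.frame(
  DateTime = stamps,
  hour     = parts$hour,      # 0 to 23
  wday     = parts$wday + 1,  # shifted so that 1 = Sunday, 7 = Saturday
  month    = parts$mon + 1    # shifted so that 1 = January
)
print(components)
```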

4.1 Annual Seasonality in Electricity Consumption

In this plot we can observe how the longer time-scale trend of annual seasonality affects electricity demand. Demand is higher during the colder winter months and decreases during the summer. In the UK, heating uses electricity, while air conditioning is far less prevalent in households than in other countries such as the United States.

4.2 Heatmap of Weekday Vs Hour

In this heatmap we can see how the hour of the day and the day of the week affect electricity demand. Demand increases in the late afternoon (4pm to 7pm) as people get back from work and decreases in the early hours of the morning (midnight to 5am), when people are sleeping and electricity consumption is at a minimum.

Of interest is how consumption differs between Saturday and Sunday in the heatmap. It appears that consumption is more spread out around late afternoon on Sunday than on Saturday.
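The grid behind a weekday-versus-hour heatmap can be sketched as a table of mean demand per (weekday, hour) cell. The snippet below is an assumption-laden illustration: it uses one synthetic week of hourly data with an artificial 4pm to 7pm peak, and base R `image()` rather than whatever plotting library produced the heatmap discussed here.

```r
# One synthetic week of hourly timestamps, starting on Sunday 2013-01-06.
stamps <- seq(as.POSIXct("2013-01-06 00:00:00", tz = "UTC"),
              by = "hour", length.out = 7 * 24)
lt <- as.POSIXlt(stamps, tz = "UTC")

# Artificial demand: 2 units between 4pm and 7pm, 1 unit otherwise.
toy <- data.frame(wday = lt$wday, hour = lt$hour)
toy$demand <- ifelse(toy$hour >= 16 & toy$hour <= 19, 2, 1)

# Mean demand for each (weekday, hour) cell: a 7 x 24 matrix.
grid <- tapply(toy$demand, list(toy$wday, toy$hour), mean)

# image() expects z with one row per x value, hence the transpose.
image(x = 0:23, y = 0:6, z = t(grid),
      xlab = "Hour of day", ylab = "Day of week (0 = Sunday)")
```

The same aggregation with month in place of weekday gives the grid for a month-versus-hour heatmap.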

4.3 Heatmap of Month Vs Hour of the Day

In this heatmap we can observe the seasonality in electricity demand across months of the year as well as hours of the day. Electricity demand is again highest in the late afternoon, but it also increases around the winter months and decreases around the summer months. This can be explained by the high usage of electric heating in London, where this study takes place, which increases demand during the winter months.

5. Electricity Pricing Information

This section will study the pricing signal provided to the subset of households on the ToU (Time of Use) electricity pricing scheme. As we can see in the histogram above, most of the pricing signals indicated normal pricing. The incentive for consumers to control their electricity demand is the significant variation in the cost of electricity depending on the pricing signal, as we can see in the table below.

Pricing Signal    Price per kWh
Low               3.99p
Normal            11.76p
High              67.2p

## 'data.frame':    17520 obs. of  9 variables:
##  $ TariffDateTime: POSIXct, format: "2013-01-01 00:00:00" "2013-01-01 00:30:00" ...
##  $ Tariff        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ DateTime      : POSIXct, format: "2013-01-01 00:00:00" "2013-01-01 00:30:00" ...
##  $ ymd           : POSIXct, format: "2013-01-01 00:00:00" "2013-01-01 00:30:00" ...
##  $ month         : Ord.factor w/ 12 levels "Jan"<"Feb"<"Mar"<..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ year          : num  2013 2013 2013 2013 2013 ...
##  $ wday          : Ord.factor w/ 7 levels "Sun"<"Mon"<"Tue"<..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ hour          : int  0 0 1 1 2 2 3 3 4 4 ...
##  $ Tariff_factor : Factor w/ 3 levels "High","Low","Normal": 3 3 3 3 3 3 3 3 3 3 ...

In the pricing signal heatmap above, darker cells signify a lower price of electricity while lighter cells signify a higher price. We can see that the goal of the pricing signal is to incentivize consumers to use less electricity at times of typically high usage (late afternoon), as the higher electricity price is sent for these times of the day.

6. Comparing Standard and Time of Use Pricing

## 'data.frame':    17520 obs. of  11 variables:
##  $ DateTime     : POSIXct, format: "2013-01-01 00:00:00" "2013-01-01 00:30:00" ...
##  $ Tariff_Factor: Factor w/ 3 levels "High","Normal",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ Tariff_Number: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ L.ToU.Average: num  224 214 165 130 156 ...
##  $ L.Std.Average: num  212 207 199 182 177 ...
##  $ Difference   : num  12.1 7.39 -33.26 -51.67 -21.14 ...
##  $ ymd          : POSIXct, format: "2013-01-01 00:00:00" "2013-01-01 00:30:00" ...
##  $ month        : Ord.factor w/ 12 levels "Jan"<"Feb"<"Mar"<..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ year         : num  2013 2013 2013 2013 2013 ...
##  $ wday         : Ord.factor w/ 7 levels "Sun"<"Mon"<"Tue"<..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ hour         : int  0 0 1 1 2 2 3 3 4 4 ...

In this section we will study the effect of the pricing signal on the electricity demand of households with the ToU (Time of Use) signal as compared with households on the Std (Standard) electricity pricing scheme. The comparison uses the difference between the average electricity demand of ToU users and that of Std users (ToU minus Std). A negative difference means that the ToU users are consuming less electricity than those without pricing signals (Std users), while a positive difference means that they are using more.
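As a small worked example of this difference, the snippet below uses the first five values of `L.ToU.Average` and `L.Std.Average` as they appear, rounded, in the structure output above. Because the inputs are rounded, the results only approximate the `Difference` values shown there (12.1, 7.39, -33.26, ...).

```r
# First five half-hourly averages, rounded as printed in the structure output.
L.ToU.Average <- c(224, 214, 165, 130, 156)
L.Std.Average <- c(212, 207, 199, 182, 177)

# Positive: ToU households used more than Std households in that half-hour.
# Negative: ToU households used less.
Difference <- L.ToU.Average - L.Std.Average
print(Difference)
```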

6.1 Graphical Visualization

The box and scatter plot above shows that ToU consumers are indeed affected by pricing signals. First, the mean difference in demand changes depending on the price signal (High, Normal and Low pricing). Second, the mean difference is positive with a Low pricing signal, indicating higher consumption, and negative with a High pricing signal, indicating lower consumption. This makes sense, as you would prefer to use electricity when it is cheaper rather than when it is more expensive.

6.2 Linear Regression

In this section, we will use linear regression to study the link between a response variable and a series of predictor variables. The response variable will be the difference between the electricity demand of households using the ToU and Std electricity pricing schemes. The predictor variables will be:

  1. Pricing Signal - Factor with 3 levels (High, Normal, Low)

  2. Hour of the Day - Factor with 24 levels (24 hours of the day)

  3. Day of the Week - Factor with 7 levels (7 days of the week)

  4. Month of the Year - Factor with 12 levels (12 months of the year)
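To illustrate the kind of model specified here, the following sketch fits a linear model with factor predictors and no intercept on synthetic data. Everything in it is an assumption for illustration: the data are simulated, the effect sizes are invented, and only two of the predictors are included. With `- 1` the intercept is dropped, so the first factor in the formula gets one coefficient per level, while the remaining factor is expressed relative to its first level.

```r
set.seed(1)
n <- 2000

# Synthetic predictors (simulated; not the study data).
toy <- data.frame(
  Tariff_Factor = factor(sample(c("High", "Normal", "Low"), n, replace = TRUE),
                         levels = c("High", "Normal", "Low")),
  hour = factor(sample(0:23, n, replace = TRUE))
)

# Invented tariff effects plus a little noise; hour has no true effect here.
true_effect <- c(High = -15, Normal = 5, Low = 30)
toy$Difference <- unname(true_effect[as.character(toy$Tariff_Factor)]) +
  rnorm(n, sd = 1)

# No-intercept model: one coefficient for each tariff level.
fit <- lm(Difference ~ Tariff_Factor + hour - 1, data = toy)
print(round(coef(fit)[1:3], 1))
```

In this toy setup the three tariff coefficients should land close to the invented effects, which is a quick check that the coding behaves as expected.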

## 
## Call:
## lm(formula = Difference ~ as.factor(Tariff_Factor) + as.factor(hour) + 
##     as.factor(wday) + as.factor(month) - 1, data = L.ToU.and.ToU)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -216.80  -27.90   -2.14   24.78  422.34 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## as.factor(Tariff_Factor)High   -16.4331     2.6176  -6.278 3.51e-10 ***
## as.factor(Tariff_Factor)Normal   3.8613     1.8802   2.054 0.040024 *  
## as.factor(Tariff_Factor)Low     28.7077     2.2226  12.916  < 2e-16 ***
## as.factor(hour)1               -11.8327     2.6500  -4.465 8.05e-06 ***
## as.factor(hour)2                -9.9800     2.6500  -3.766 0.000166 ***
## as.factor(hour)3               -12.1155     2.6500  -4.572 4.87e-06 ***
## as.factor(hour)4               -20.4140     2.6500  -7.703 1.40e-14 ***
## as.factor(hour)5               -27.0097     2.6500 -10.192  < 2e-16 ***
## as.factor(hour)6               -34.3258     2.6500 -12.953  < 2e-16 ***
## as.factor(hour)7               -37.1353     2.6500 -14.013  < 2e-16 ***
## as.factor(hour)8               -48.1521     2.6500 -18.170  < 2e-16 ***
## as.factor(hour)9               -40.7219     2.6500 -15.367  < 2e-16 ***
## as.factor(hour)10              -27.5683     2.6500 -10.403  < 2e-16 ***
## as.factor(hour)11              -23.2104     2.6501  -8.758  < 2e-16 ***
## as.factor(hour)12              -26.7986     2.6501 -10.112  < 2e-16 ***
## as.factor(hour)13              -30.3664     2.6501 -11.459  < 2e-16 ***
## as.factor(hour)14              -33.4864     2.6501 -12.636  < 2e-16 ***
## as.factor(hour)15              -24.6660     2.6501  -9.308  < 2e-16 ***
## as.factor(hour)16              -15.4140     2.6501  -5.816 6.12e-09 ***
## as.factor(hour)17              -27.2055     2.6538 -10.252  < 2e-16 ***
## as.factor(hour)18              -10.7554     2.6538  -4.053 5.08e-05 ***
## as.factor(hour)19              -11.9430     2.6538  -4.500 6.83e-06 ***
## as.factor(hour)20              -13.5506     2.6538  -5.106 3.32e-07 ***
## as.factor(hour)21               -9.4431     2.6538  -3.558 0.000374 ***
## as.factor(hour)22               -3.5569     2.6538  -1.340 0.180165    
## as.factor(hour)23                3.9294     2.6500   1.483 0.138153    
## as.factor(wday).L               -1.1290     1.0152  -1.112 0.266122    
## as.factor(wday).Q               -4.3162     1.0153  -4.251 2.14e-05 ***
## as.factor(wday).C               -4.0288     1.0154  -3.968 7.29e-05 ***
## as.factor(wday)^4               -1.7662     1.0148  -1.741 0.081786 .  
## as.factor(wday)^5               -1.7929     1.0108  -1.774 0.076127 .  
## as.factor(wday)^6               -1.1011     1.0121  -1.088 0.276642    
## as.factor(month).L             -24.9014     1.3276 -18.756  < 2e-16 ***
## as.factor(month).Q             -39.4148     1.3329 -29.572  < 2e-16 ***
## as.factor(month).C             -14.7679     1.3268 -11.131  < 2e-16 ***
## as.factor(month)^4               8.3026     1.3242   6.270 3.70e-10 ***
## as.factor(month)^5              14.4000     1.3348  10.788  < 2e-16 ***
## as.factor(month)^6              -0.5436     1.3419  -0.405 0.685386    
## as.factor(month)^7               4.7068     1.3298   3.540 0.000402 ***
## as.factor(month)^8              -1.0924     1.3246  -0.825 0.409540    
## as.factor(month)^9               0.9471     1.3291   0.713 0.476123    
## as.factor(month)^10             -4.4858     1.3240  -3.388 0.000705 ***
## as.factor(month)^11             -2.2332     1.3221  -1.689 0.091221 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 50.63 on 17477 degrees of freedom
## Multiple R-squared:  0.2111, Adjusted R-squared:  0.2092 
## F-statistic: 108.8 on 43 and 17477 DF,  p-value: < 2.2e-16

The summary of the linear regression model above shows a series of diagnostic values that matter for the fit of our model. We notice the following:

  1. R-squared: Our model only explains around 21% of the variability of our data. Why is this value low? We only have 24 different measurements per time period and our predictor variables are limited to four factors. These are not enough to explain the randomness of when somebody turns on a microwave or another appliance with high demand but short usage.

  2. P-values: The statistical significance of our predictor variables is high overall but not uniform across the levels of a factor. For example, in the Pricing Signal factor, the Normal level has lower statistical significance in explaining the Difference in consumption, because users would not react to a Normal pricing signal by adjusting their consumption. We also see variability in the time domain, as shown by the levels of the Hour of the Day and Month of the Year factors, which indicates different consumption rates within these two time domains.

6.3 Diagnostic Plots

From the plot above, we can see that our response variable Difference, which couples together ToU and Std users, is approximately normally distributed.

The residual plot above tells us that the residuals of our model are homoscedastic. Homoscedasticity means that the variance of the residuals (the differences between the observed and predicted values) is constant across all values of the predictor variables.

Finally, the Q-Q plot above shows that the data meet the assumption of normality.
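Diagnostic plots of this kind can be sketched in base R as below. The model here is fitted to simulated data with invented group means, purely to show the calls; it is not the model from section 6.2.

```r
set.seed(42)
n <- 500

# Simulated response with three groups (invented means, not study results).
group <- factor(sample(c("High", "Normal", "Low"), n, replace = TRUE))
group_mean <- c(High = -15, Normal = 5, Low = 30)
y <- unname(group_mean[as.character(group)]) + rnorm(n, sd = 10)

fit <- lm(y ~ group)
res <- resid(fit)

# Residuals vs fitted values: a roughly even vertical spread across the
# fitted values is what homoscedasticity looks like.
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normal Q-Q plot: points close to the line suggest normal residuals.
qqnorm(res)
qqline(res)
```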

## Response variable: numerical, Explanatory variable: categorical
## Warning: Ignoring null value since it's undefined for ANOVA.
## Summary statistics:
## n_High = 788, mean_High = -37.0389, sd_High = 57.4804
## n_Normal = 15072, mean_Normal = -16.1524, sd_Normal = 52.7084
## n_Low = 1660, mean_Low = 1.7017, sd_Low = 66.3165
## H_0: All means are equal.
## H_A: At least one mean is different.
## Analysis of Variance Table
## 
## Response: y
##              Df   Sum Sq Mean Sq F value    Pr(>F)
## x             2   863004  431502  146.01 < 2.2e-16
## Residuals 17517 51766195    2955                  
## 
## Pairwise tests: t tests with pooled SD 
##        High Normal
## Normal    0     NA
## Low       0      0

The null hypothesis is that the pricing signal has no effect on the difference in demand between ToU and Std users. As we can see in the plot above and in the ANOVA table (F = 146.01, p-value < 2.2e-16), this is not the case: there is a clear difference in demand depending on the pricing signal. Therefore, we reject the null hypothesis in favor of the alternative hypothesis that the Pricing Signal does have an effect on electricity consumption.
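A comparable test can be sketched in base R with `anova()` and `pairwise.t.test()`. The data below are simulated with invented group means and sizes, and these base functions are an assumption: they are not necessarily the ones that produced the output above.

```r
set.seed(7)

# Simulated differences for three pricing signals (invented means and sizes).
signal <- factor(rep(c("High", "Normal", "Low"), times = c(50, 300, 80)),
                 levels = c("High", "Normal", "Low"))
group_mean <- c(High = -37, Normal = -16, Low = 2)
difference <- unname(group_mean[as.character(signal)]) +
  rnorm(length(signal), sd = 20)

# One-way ANOVA. H_0: all group means are equal.
tab <- anova(lm(difference ~ signal))
print(tab)

# Pairwise t tests with a pooled standard deviation.
pw <- pairwise.t.test(difference, signal, pool.sd = TRUE)
print(pw)
```

A small p-value in the ANOVA table says that at least one group mean differs; the pairwise tests then indicate which pairs of signals differ.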

7. Conclusions

This study has shown that the day-ahead Pricing Signal had an effect on the electricity demand of users on the Time of Use (ToU) pricing scheme compared with those on the Standard scheme (Std). ToU consumers shifted their electricity demand from typical high-usage times to times when electricity was cheaper.

Our linear regression model has shown the statistical significance not only of the Pricing Signal but also of segments of the daily and weekly time domains that matched our visualizations of seasonality.

Personally, this study has been a useful opportunity to learn how to manage large datasets, to try different techniques for visualizing relationships between variables, to deal with time series and to work with models containing variables of numeric and factor type. A significant percentage of the time was spent wrangling the smaller subset of data from the whole, as well as analyzing the regression model that best fitted this case study.

This study has focused on only part of the collected data, as it covered 1 year of the three-year study and 1 of the 13 consumer groups. Further work could analyze how the observations from this study are affected not only by different consumer groups but also by parameters such as weather that may have an effect on electricity demand.