Keywords: functional data analysis, derivatives, statistical learning, clustering
A common analysis in utilities is the use of meter read data to define peak hours of consumption. This paper proposes a novel approach using functional data analysis (FDA) to define the peak hours of consumption for individual customers and groups of customers. Through functional data analysis, smooth, differentiable curves are fitted to an individual customer’s meter read data. Then, the individual on-peak times are aggregated to define on-peak hours for a group of customers, such as for all residential customers.
An individual customer's on-peak window is proposed to start at the time of day where the velocity, or rate of change of consumption, is greatest and to end where it is least. The group on-peak hours are determined by two different methods: the first assesses the distributions of the aggregated individual on-peak start and end times, and the second uses an unsupervised statistical learning procedure.
Methods
This paper relies heavily on functional data analysis to fit smooth, differentiable curves to each customer’s meter read data. Additionally, a kernel density estimation (KDE) method is used to assess the individual start and end on-peak distributions to construct the group on-peak hours. Statistical learning methods such as multidimensional scaling and k-means clustering are also explored as an alternative way to define on-peak hours for several groups of customers.
Functional data analysis is a relatively young branch of statistics; the term “Functional Data Analysis” was coined by James O. Ramsay in 1982 (Wang, Chiou, and Müller 2016). The literature on functional data analysis is rich, and the field is actively expanding.
Functional data analysis is about the analysis of information on curves or functions. In the FDA framework, the data are random draws from an underlying process. We may consider the underlying process as a curve that lives in an infinite-dimensional vector space. This curve is the object of the analyses. The goal of the analysis is to project the curve onto the subspace where the data live.
From “Functional Data Analysis” (Ramsay and Silverman 2005), the goals of functional data analysis include:

- to represent the data in ways that aid further analysis
- to display the data so as to highlight various characteristics
- to study important sources of pattern and variation among the data
- to explain variation in an outcome or dependent variable by using input or independent variable information
- to compare two or more sets of data with respect to certain types of variation
The goals listed above are similar to those of other statistical analyses. Additionally, functional data analysis is broadly applicable. For example, in “Functional Data Analysis” (Ramsay and Silverman 2005), the authors highlight case studies with several types of data: the Berkeley Growth Study (Nancy Bayley 1941), economic data such as the non-durable goods manufacturing index, Canadian weather data, and motion data from the Motion Analysis Laboratory at Children’s Hospital, San Diego (Olshen 1989).
These examples highlight a commonality in the types of data to which functional data analysis applies. Such data are called functional data. Functional data may consist of discrete observations; however, those observations are random samples of a continuous function. For example, meter reads are recorded periodically, but the process of energy consumption is continuous. Thus, periodic meter read data is an excellent use case for functional data analysis techniques.
A unique feature of functional data analysis is that the fitted curves can be differentiated. This feature opens new avenues of research and analysis, and the method described in this paper makes use of it: specifically, the first derivative of the fitted curves, \(\frac{dx}{dt}\).
A search on Google Scholar for “"functional data analysis" +utilities +energy” resulted in 174 entries. Part of the motivation of this paper is to advance the use of functional data analysis in energy and across utilities. The primary motivation for this research is to propose a novel way to define the peak hours of consumption for individual customers. Now more than ever, utilities need to be data-driven and customer-centric and to have a deep understanding of their customers’ behavior. This paper investigates an application of functional data analysis that could help utilities construct hyper-personalized time-of-use plans, facilitate marketing and sales initiatives, or enhance existing analyses.
This analysis is based on a simple random sample of 54,797 Consumers Energy residential customers. The data consist of hourly meter reads averaged over weekdays. Noise was added to the summarized data to adhere to Consumers Energy compliance regulations. Only customers with at least six months of consumption history were considered. For the sampled customers, up to the last 24 months of meter reads were obtained; a customer need not have the full 24 months of history and might, for example, have only 13 months.
At Consumers Energy, the on-peak hours for business customers are defined as weekdays from 11 am to 7 pm, while for residential customers on-peak is defined as weekdays from 2 pm to 7 pm. These on-peak windows were defined from the point of view of the grid and describe the hours of the day during which electricity demand is highest.
These on-peak windows, however, say nothing about an individual customer’s energy consumption behavior. A better understanding of the peak hours of consumption per customer would help identify customers to target for the time-of-use plan. The individual on-peak hours could also be used as features in a predictive analysis as additional metrics to describe customer behavior. An analysis of the on-peak hours constructed from the aggregated individual on-peak hours might reveal there are clusters of customers that behave similarly.
Consumers Energy is a public utility founded in 1886 and located in Michigan. Consumers Energy provides electricity and natural gas to 6.8M customers. It has about 5.9 gigawatts of generation capacity.
This section outlines how the peak hours of consumption per customer are defined.
The choice was made to summarize the data before applying functional data analysis. It is possible to apply FDA to ungrouped data and then summarize the fitted curves. However, given the size of the data, it is simpler to work with summarized data; there are millions of records for the 54,797 sampled customers. Additionally, the fitted curves are the same regardless of whether FDA is applied to the summarized or unsummarized data. For this analysis, the meter read data was summarized via the mean. The following examples illustrate the data summarization step.
Example: SQL:

SELECT
    customer_id
  , hour
  , AVG(consumption) AS mean_consumption
FROM internal_db.internal_table
GROUP BY
    customer_id
  , hour
Example: R:

library(tidyverse)

raw_data %>%
  dplyr::group_by(customer_id, hour) %>%
  dplyr::summarize(mean_consumption = mean(consumption), .groups = "drop")
After the data is summarized, it must be reorganized from a long format to a wide format, where the observations/records are the summarized hourly meter reads and the variables/features are the customers. An example of the structure is provided below, followed by a sketch of the reshaping step.
Example data structure (one row per hour; one column per customer):

| customer_01 | customer_02 | customer_03 | customer_04 |
|---|---|---|---|
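As a sketch of this reshaping step, assuming the summarized data live in a data frame named summarized_data with columns customer_id, hour, and mean_consumption (names illustrative, not from the original analysis):

library(tidyverse)

# Reshape from long (one row per customer-hour) to wide
# (one row per hour, one column per customer).
wide_data <- summarized_data %>%
  tidyr::pivot_wider(
    id_cols     = hour,
    names_from  = customer_id,
    values_from = mean_consumption
  ) %>%
  dplyr::arrange(hour)

# The fda functions expect a numeric matrix:
# rows = hours, columns = customers.
consumption_matrix <- as.matrix(wide_data[, -1])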
Also, for illustrative purposes, below is a plot of two customers’ summarized consumption behavior. The summarized hourly meter reads are plotted as points connected by lines.
FDA can be thought of as a flexible, nonparametric regression technique of the form \(f(t) = \sum_{k=1}^{K} \beta_k \phi_k(t)\); this formulation is known as a basis function expansion (J. O. Ramsay and Silverman 2009). The \(K\) nonlinear functions, \(\phi_k(t)\), are called basis functions (J. O. Ramsay and Silverman 2009).
For this analysis, the basis functions are defined by the Fourier series, \(1, \sin(\omega t), \cos(\omega t), \sin(2\omega t), \cos(2\omega t), \ldots\) (J. O. Ramsay and Silverman 2009). The Fourier basis was chosen because it is well suited to periodic data and is easily differentiated. Twenty-four of these functions were used, that is, \(K = 24\). This results in a fully parameterized model, with a coefficient for each of the 24 hours.
The parameterization is not problematic because a penalized method is used to fit a smooth curve to the data. The penalty has the form \(\lambda \int [Lf(t)]^2 \, dt\) (Ramsay and Silverman 2005), where \(L\) is a differential operator. A specific formulation of \(L\), called the harmonic acceleration operator, is used in conjunction with the Fourier basis: \(Lf = D^3f + \omega^2 Df\) (J. O. Ramsay and Silverman 2009), where \(D^m\) represents the \(m^{th}\) derivative of the function \(f\) (J. O. Ramsay and Silverman 2009), \(\omega = \frac{2\pi}{T}\), and \(T\) is the range of \(t\) values that span the data (J. O. Ramsay and Silverman 2009).
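A minimal sketch of this setup with the fda package (object names are illustrative; consumption_matrix is the hours-by-customers matrix from the reshaping sketch above):

library(fda)

hours <- 1:24
rng   <- c(1, 24)

# 24 Fourier basis functions on the range of the hourly data
# (note: the fda package pairs sines and cosines, so an even
# nbasis may be adjusted to the nearest odd number)
basis <- create.fourier.basis(rangeval = rng, nbasis = 24)

# Harmonic acceleration operator Lf = D^3 f + w^2 Df, with w = 2*pi/T
omega    <- 2 * pi / diff(rng)
harm_lfd <- vec2Lfd(c(0, omega^2, 0), rangeval = rng)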
The optimal value of the tuning parameter \(\lambda\) is selected via a grid search using a generalized cross-validation (GCV) procedure of the form \(GCV(\lambda) = \frac{n}{n-df(\lambda)} \frac{SSE}{n-df(\lambda)}\) (J. O. Ramsay and Silverman 2009), where SSE is the sum of squared errors. The search is performed over \(\log_{10}(\lambda)\), and the argmin of the GCV criterion is selected.
Below is an image of the 24 Fourier basis functions.
The grid to be searched over consists of 2,001 \(\lambda\) values, from 0.00001 to 1,001.
As can be seen above, the argmin is \(\log_{10}(\lambda) = -1.0808696\), that is, \(\lambda = 0.08301\). As \(\lambda \rightarrow \infty\) the curvature is increasingly penalized, while as \(\lambda \rightarrow 0\) the curve is free to fit the data as closely as the 24 basis functions allow (J. O. Ramsay and Silverman 2009).
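A sketch of the grid search; the construction of the grid here is an assumption, chosen only to match the stated range and number of values:

# 2,001 candidate lambda values spanning roughly 0.00001 to 1,000
lambdas <- 10^seq(-5, 3, length.out = 2001)

# Total GCV across all customers' curves for each candidate lambda
gcv_by_lambda <- sapply(lambdas, function(lam) {
  fdpar <- fdPar(basis, harm_lfd, lambda = lam)
  sum(smooth.basis(hours, consumption_matrix, fdpar)$gcv)
})

best_lambda <- lambdas[which.min(gcv_by_lambda)]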
The optimal value of \(\lambda\) is applied to produce a smooth, differentiable curve for each customer. The fitted smooth curves can be used to estimate consumption at any time in the range [1, 24], such as 01:05 or 13:52.
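Continuing the sketch, the final fit and point estimates might look like:

# Fit all customers' curves with the GCV-selected lambda
fit <- smooth.basis(hours, consumption_matrix,
                    fdPar(basis, harm_lfd, lambda = best_lambda))

# Estimated mean consumption at 01:05 and 13:52 for every customer
eval.fd(c(1 + 5/60, 13 + 52/60), fit$fd)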
Below are the smoothed curves fitted to the same previously shown customers.
A visual assessment of the smoothed curves is presented below to highlight how the curves fit the data. In addition to the smoothed curves, a plot of the residuals is also provided. The residuals should appear as white noise, \(N(0, \sigma^2)\).
The graphs below represent the first 5 customers. These customers will be used as examples in the next few illustrations.
Since the proposed method of this analysis relies on derivatives, below are plots of the first and second derivatives (velocity and acceleration) for the same previously shown 5 customers. The plots below illustrate the underlying mechanism by which the maximum and minimum velocity will be obtained. The acceleration curves are illustrated just to showcase that further analysis using derivatives is possible.
The first derivative is used to obtain the maximum and minimum velocity. Like the undifferentiated curves, the differentiated curves can be evaluated at any time in the range [1, 24] to estimate the rate of change of mean consumption.
Evaluating at the 24 hourly points alone is too sparse to locate the on-peak times accurately; a finer grid of times is necessary. For this reason, the grid used to obtain the maximum and minimum velocity consists of the 1,381 minutes between hour 1 and hour 24.
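A sketch of this step, continuing with fit$fd from above:

# Minute-level grid: the 1,381 points between hour 1 and hour 24
minute_grid <- seq(1, 24, by = 1/60)

# First derivative (velocity) of every customer's fitted curve
velocity <- eval.fd(minute_grid, fit$fd, Lfdobj = 1)

# On-peak start = time of maximum velocity; end = time of minimum velocity
on_peak <- data.frame(
  start = minute_grid[apply(velocity, 2, which.max)],
  end   = minute_grid[apply(velocity, 2, which.min)]
)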
Individual On-Peak Times (first 5 customers; fractional hours):

| Customer | Start | End |
|---|---|---|
| 1 | 17.050000 | 24.00000 |
| 2 | 7.733333 | 22.75000 |
| 3 | 14.300000 | 17.08333 |
| 4 | 19.833333 | 23.53333 |
| 5 | 18.533333 | 10.26667 |
In the table above, the whole number represents the hour and the decimal portion represents the minutes in fractional-hour terms. To convert the fractional hour to minutes, multiply the decimal portion by 60. For example, for Customer 1 the on-peak window starts at the 17th hour and 3rd minute, i.e., 17:03.
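A small helper, shown only for illustration, makes the conversion concrete:

# Convert fractional hours (e.g., 17.05) to HH:MM (e.g., 17:03)
to_hhmm <- function(x) sprintf("%d:%02d", floor(x), round((x %% 1) * 60))
to_hhmm(17.05)  # "17:03"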
Individual On-Peak Times (first 5 customers; HH:MM):

| Customer | Start | End |
|---|---|---|
| 1 | 17:03 | 24:00 |
| 2 | 7:44 | 22:45 |
| 3 | 14:18 | 17:05 |
| 4 | 19:50 | 23:32 |
| 5 | 18:32 | 10:16 |
In conclusion, individual on-peak hours of consumption were defined for each of the 54,797 sampled residential customers, based on each customer’s own averaged data.
This information can be used as-is for marketing or sales efforts, in product design, or even as part of another data science project. The individual on-peak hours can for instance be used as features in a predictive model. Using derivatives it is possible to create other potentially useful features such as the number of extrema per customer.
In this section, two methods for aggregating the individual on-peak windows are described. The outcome of this section is a set of group on-peak windows.
The simplest method to define a group on-peak window is to use kernel density estimation (KDE) to fit density curves to the individual on-peak start and end times.
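A sketch of this step, using the hypothetical on_peak data frame from the derivative sketch above:

# Kernel density estimates of the individual start and end times
d_start <- density(on_peak$start, from = 1, to = 24)
d_end   <- density(on_peak$end,   from = 1, to = 24)

plot(d_start, main = "Individual on-peak start times")
plot(d_end,   main = "Individual on-peak end times")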
From the density plots above, it appears that the on-peak hours could be defined as 18 to 23. However, it can also be seen from the density plots that the distributions are multimodal, which suggests that a single on-peak window of 18 to 23 might be insufficient to adequately describe the average on-peak consumption behavior.
Since the density plots show that the data are multimodal, an unsupervised learning method is used to define the group on-peak windows. Unlike the previous method, where the start and end on-peak times were assessed separately, this approach makes use of all the available data.
The multimodality of the individual start and end distributions suggests that there may be more than one group on-peak window. In this section, unsupervised statistical learning methods are used to explore the individual on-peak data and determine how many on-peak windows there are. Once the clusters are defined, the sampled customers are assigned to them, and a start and end window can be defined for each cluster.
Initially, a dimension-reduction technique called multidimensional scaling is used to visually inspect the data. The documentation for the R function cmdscale describes multidimensional scaling as follows:
Multidimensional scaling takes a set of dissimilarities and returns a set of points such that the distances between the points are approximately equal to the dissimilarities.
Euclidean distance was used to create the dissimilarity matrix.
To enhance the visual, the size of each point was increased and the opacity was set to 7%. The darker areas highlight regions of high density.
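A sketch of the MDS step with base R functions (for a sample of this size the cited parallelDist and bigmds packages may be preferable; this is illustrative only):

# Euclidean dissimilarities between customers' (start, end) pairs
d <- dist(as.matrix(on_peak), method = "euclidean")

# Classical multidimensional scaling down to two dimensions
mds <- cmdscale(d, k = 2)

# Larger points at 7% opacity so dense regions appear darker
plot(mds[, 1], mds[, 2], pch = 19, cex = 1.5,
     col = rgb(0, 0, 0, alpha = 0.07))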
An elbow plot and a silhouette plot were used to assess how many clusters \(K\) there may be; k-means clustering is then used as the clustering procedure. It is common and recommended to scale the features before clustering. However, since the features here are already on the same scale, the data were not scaled.
Based on the above plots, the number of clusters K will be set to 7.
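A sketch of the cluster assessment and fit; fviz_nbclust is from the factoextra package cited at the end of this paper, and the seed and nstart values are illustrative:

library(factoextra)

# Elbow (total within-cluster sum of squares) and silhouette plots
fviz_nbclust(as.matrix(on_peak), kmeans, method = "wss")
fviz_nbclust(as.matrix(on_peak), kmeans, method = "silhouette")

# K-means with the chosen K = 7
set.seed(2022)
clusters <- kmeans(as.matrix(on_peak), centers = 7, nstart = 25)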
Finally, the on-peak hours for each cluster are defined via the same KDE method as before, except this time the start and end distributions are mostly unimodal. The modality of the distributions signals that each cluster has a unique set of start and end on-peak hours.
A natural next step would be to build a profile for each of the clusters. One such profile could be energy usage per cluster. In the chart below, for each cluster, the fitted curves of the individual customers in that cluster were summarized via the mean to create the average cluster consumption profile.
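A sketch of that summarization, assuming the cluster assignments from the k-means sketch above:

# Evaluate every fitted curve on the minute grid (rows = minutes,
# columns = customers), then average the columns within each cluster
curve_values <- eval.fd(minute_grid, fit$fd)

cluster_profiles <- sapply(sort(unique(clusters$cluster)), function(k) {
  rowMeans(curve_values[, clusters$cluster == k, drop = FALSE])
})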
Software used: R (R Core Team 2022), fda (Ramsay, Graves, and Hooker 2021), tidyverse (Wickham 2021), data.table (Dowle and Srinivasan 2021), future (Bengtsson 2022), future.apply (Bengtsson 2021a), gt (Iannone, Cheng, and Schloerke 2022), dtplyr (Wickham et al. 2022), parallelly (Bengtsson 2021b), parallelDist (Eckert 2022), bigmds (Pachón García and Delicado 2021), factoextra (Kassambara and Mundt 2020), ggplot2 (Wickham et al. 2021), arrow (Richardson et al. 2022), renv (Ushey 2022), knitr (Xie 2021)