Orcas Power and Light (Opalco) is a small rural electric co-op in the San Juan archipelago in Washington State. Opalco manages a transmission and distribution grid for roughly 16,000 members in the islands. Many of those members have chosen to add photovoltaic solar panels to their properties to generate renewable electricity. These resources are often interconnected with the grid so that power can feed back into the system when the member cannot use all of the electricity generated. This generation resource is a potential benefit to Opalco, but because it is variable in time and space based on weather, season, time of day, and location, it needs to be managed differently than traditional generation resources.
The number of interconnected members as well as the total generation has increased over time and is likely to continue increasing. Right now, a small proportion of members have generation capacity, but as more members add solar panels, the total generation will become a consequential proportion of the system. Likewise, Opalco operates a larger solar installation built as a community solar project and has another much larger community solar project planned for later this year.
This analysis will show that solar resources are growing at an increasing rate. As these resources continue to grow, it will be important to quantify where, when, and how much generation capacity is available.
In the customer service database that is used for billing co-op members, there is a table of historical usage that shows aggregated monthly usage for each power meter. There are several different rate classes that determine how much each unit of power costs. This data was filtered by rate classes used for solar interconnections using an SQL query and saved as a csv file that contains usage data for each meter for each month.
After loading the relevant packages from R, the first step in the analysis is to read the csv data into R and then create a tidy data format to make the analysis easier.
##### Lesson 3 lab: plotting historical solar usage to show growth over time #####
# load libraries
library(tidyverse)
library(here)
library(RColorBrewer)
library(scales)
# read in historical data
hist_usage <- read_csv(here('data', 'hist_usage.csv'))
Before analysis, the raw data needs to be cleaned up. The first step is to convert the billing date field to a date time data type and format it in a consistent way. This is done using lubridate’s parse_date_time function.
# first, use lubridate to parse the date field for each data frame
hist_usage$BI_BILL_DT_TM = parse_date_time(hist_usage$BI_BILL_DT_TM, 'dmy')
Once the dates are formatted properly, the data can run through a series of tidying functions piped together. The basic steps of the data tidying process include:
# tidying pipeline for members historical usage
clean_hist <- hist_usage %>%
select(BI_SRV_LOC_NBR, BI_BILL_DT_TM, BI_USAGE,
BI_REGISTER_SET_NBR,
BI_RATE_SCHED) %>%
filter(BI_REGISTER_SET_NBR == 2) %>%
rename(location_number = BI_SRV_LOC_NBR,
usage_kwh = BI_USAGE,
reg_set = BI_REGISTER_SET_NBR,
rate_schedule = BI_RATE_SCHED,
bill_date = BI_BILL_DT_TM) %>%
select(location_number, bill_date, usage_kwh, rate_schedule) %>%
mutate(location_number = as.character(location_number),
year = year(bill_date),
month = month(bill_date))
# Output the tidied data as a legible table
knitr::kable(head(clean_hist),
caption = "Table 1: First few rows of the table ready for analysis.")
location_number | bill_date | usage_kwh | rate_schedule | year | month |
---|---|---|---|---|---|
14206 | 2009-11-25 | 1863 | 0952 | 2009 | 11 |
14967 | 2009-11-25 | 136 | 0952 | 2009 | 11 |
7696 | 2009-11-25 | 19 | 0952 | 2009 | 11 |
7696 | 2009-11-25 | 0 | 0952 | 2009 | 11 |
15705 | 2009-11-25 | 131 | 0952 | 2009 | 11 |
14494 | 2009-11-25 | 0 | 0952 | 2009 | 11 |
One of the first steps in this analysis is to get a sense of the range of values in this data set. I know there is an extreme outlier somewhere in 2022 that is creating havoc in this data set. Figure 1 shows a histogram of the usage data from member installations. This histogram shows that there is one extreme outlier skewing the whole data set and making the rest of the data cluster around zero kwh.
# Using a histogram to demonstrate an extreme outlier
clean_hist %>%
ggplot(aes(x = usage_kwh)) +
geom_histogram() +
theme_classic() +
labs(x = "KWh",
y = "Frequency",
title = "Histogram Demonstrating Extreme Outlier")
Figure 1: There is an extreme outlier skewing this data that must indicate a false reading or a data entry error.
This outlier is an impossibly high number. It would indicate a solar installation that put out somewhere near 6000 Megawatt-hours of power in one month. As we will see in later analysis, this is more than the entire solar capacity on the system to date (see Figure 3).
Now that we can see how badly skewed this data is, we can filter out values that must be a false reading or a data entry error. This is done using the filter function in the dplyr package to take only values below a certain level. I’ve chosen 5,000,000 as the upper limit because it does get rid of that extreme outlier but is also high enought to continue to include all reasonable readings.
Starting with the first interconnections in 2009, more and more members have installed solar panels each year. Figure 2 shows the cumulative number of interconnected members each year since 2009. From this we can see that the number of solar interconnects has grown quite a bit and that the rate of growth is higher now that it was in previous years.
# Pipe tidied member data through some dplyr functions before piping into a plot
clean_hist %>%
filter(year < 2024) %>%
group_by(year) %>%
distinct(location_number) %>%
count(year) %>%
ggplot(aes(x = year, y = n)) +
geom_line() +
theme_classic() +
labs(x = "Year",
y = "Number of Interconnected Members",
title = "Growth of Interconnected Members Over Time")
Figure 2: Total number of interconnected members by year.
Another way to show how much solar capacity has grown is to examine the total amount of power produced by member generation. Figure 3 shows the number of Megawatt-hours produced by members each year. From this plot we can see that the amount of power generated has grown year over year as more people add solar panels to their property.
# Pipe tidied member data through a filter function to remove the current year and
# to remove outlier values skewing data before piping into a plot
clean_hist %>%
filter(year < 2024,
usage_kwh <5000000) %>%
group_by(year) %>%
mutate(total_usage = sum(usage_kwh/1000)) %>%
ggplot(aes(x = year, y = total_usage, color = total_usage)) +
geom_col() +
labs(x = "Year",
y = 'Solar Generation in MWh',
title = "Opalco Member's Solar Generation") +
theme_classic() +
scale_color_distiller(palette = "YlOrRd",
direction = 1,
name = "MWh")
Figure 3: Total member generation by year. This is the amount of power that member owned solar installations supply to the grid in total per year going back to 2009.
# save plot as a png with ggsave
ggsave("member_generation.png")
Future analysis could examine these figures compared to the total demand on the system to determine how much of the demand can be met by solar generation. Likewise, finer grained analysis of when the generation is occurring would be helpful to compare to when demand is higher or lower. Finally, spatial analysis of where the generation is happening and when can determine if parts of the network can meet a greater or lesser proportion of demand and what parts of the grid are potentially vulnerable to overloading from solar generation.
In addition to the member generation shown in Figure 3, the existing community solar project puts out around 500 Megawatt-hours per year, and the next community solar project will be nearly six times larger. Solar generation will make up a significant amount of demand in the future. Quantifying this generation and understanding when and where it is available will allow for better management of the system.