1. Introduction

Orcas Power and Light (Opalco) is a small rural electric co-op in the San Juan archipelago in Washington State. Opalco manages a transmission and distribution grid for roughly 16,000 members in the islands. Many of those members have chosen to add photovoltaic solar panels to their properties to generate renewable electricity. These resources are often interconnected with the grid so that power can feed back into the system when the member cannot use all of the electricity generated. This generation resource is a potential benefit to Opalco, but because it is variable in time and space based on weather, season, time of day, and location, it needs to be managed differently than traditional generation resources.

The number of interconnected members as well as the total generation has increased over time and is likely to continue increasing. Right now, a small proportion of members have generation capacity, but as more members add solar panels, the total generation will become a consequential proportion of the system. Likewise, Opalco operates a larger solar installation built as a community solar project and has another much larger community solar project planned for later this year.

This analysis will show that solar resources are growing at an increasing rate. As these resources continue to grow, it will be important to quantify where, when, and how much generation capacity is available.

2. Data

2.1 Data acquisition

In the customer service database that is used for billing co-op members, there is a table of historical usage that shows aggregated monthly usage for each power meter. There are several different rate classes that determine how much each unit of power costs. This data was filtered by rate classes used for solar interconnections using an SQL query and saved as a csv file that contains usage data for each meter for each month.

After loading the relevant packages from R, the first step in the analysis is to read the csv data into R and then create a tidy data format to make the analysis easier.

##### Lesson 3 lab: plotting historical solar usage to show growth over time #####

# load libraries
library(tidyverse)
library(here)
library(RColorBrewer)
library(scales)

# read in historical data 
hist_usage <- read_csv(here('data', 'hist_usage.csv'))

2.2 Data tidying

Before analysis, the raw data needs to be cleaned up. The first step is to convert the billing date field to a date time data type and format it in a consistent way. This is done using lubridate’s parse_date_time function.

# first, use lubridate to parse the date field for each data frame
hist_usage$BI_BILL_DT_TM = parse_date_time(hist_usage$BI_BILL_DT_TM, 'dmy')

Once the dates are formatted properly, the data can run through a series of tidying functions piped together. The basic steps of the data tidying process include:

  1. Taking a subset of the columns based on what is relevant for the analysis.
  2. Filtering results to include only the electricity flowing back to the grid
  3. Re-naming fields to be shorter or easier to understand
  4. Converting an identification number to a character data type for ease of analysis
  5. Creating new fields for month and year variables
# tidying pipeline for members historical usage
clean_hist <- hist_usage %>%
  select(BI_SRV_LOC_NBR, BI_BILL_DT_TM, BI_USAGE,
         BI_REGISTER_SET_NBR,
         BI_RATE_SCHED) %>%
  filter(BI_REGISTER_SET_NBR == 2) %>%
  rename(location_number = BI_SRV_LOC_NBR,
         usage_kwh = BI_USAGE,
         reg_set = BI_REGISTER_SET_NBR,
         rate_schedule = BI_RATE_SCHED,
         bill_date = BI_BILL_DT_TM) %>% 
  select(location_number, bill_date, usage_kwh, rate_schedule) %>% 
  mutate(location_number = as.character(location_number),
         year = year(bill_date),
         month = month(bill_date))


# Output the tidied data as a legible table
knitr::kable(head(clean_hist),
            caption = "Table 1: First few rows of the table ready for analysis.")
Table 1: First few rows of the table ready for analysis.
location_number bill_date usage_kwh rate_schedule year month
14206 2009-11-25 1863 0952 2009 11
14967 2009-11-25 136 0952 2009 11
7696 2009-11-25 19 0952 2009 11
7696 2009-11-25 0 0952 2009 11
15705 2009-11-25 131 0952 2009 11
14494 2009-11-25 0 0952 2009 11

3. Analysis

3.1 Data exploration

One of the first steps in this analysis is to get a sense of the range of values in this data set. I know there is an extreme outlier somewhere in 2022 that is creating havoc in this data set. Figure 1 shows a histogram of the usage data from member installations. This histogram shows that there is one extreme outlier skewing the whole data set and making the rest of the data cluster around zero kwh.

# Using a histogram to demonstrate an extreme outlier

clean_hist %>% 
  ggplot(aes(x = usage_kwh)) +
  geom_histogram() +
  theme_classic() +
  labs(x = "KWh", 
       y = "Frequency",
       title = "Histogram Demonstrating Extreme Outlier")
Figure 1: There is an extreme outlier skewing this data that must indicate a false reading or a data entry error.

Figure 1: There is an extreme outlier skewing this data that must indicate a false reading or a data entry error.

This outlier is an impossibly high number. It would indicate a solar installation that put out somewhere near 6000 Megawatt-hours of power in one month. As we will see in later analysis, this is more than the entire solar capacity on the system to date (see Figure 3).

Now that we can see how badly skewed this data is, we can filter out values that must be a false reading or a data entry error. This is done using the filter function in the dplyr package to take only values below a certain level. I’ve chosen 5,000,000 as the upper limit because it does get rid of that extreme outlier but is also high enought to continue to include all reasonable readings.

3.2 Total Member Generation

Starting with the first interconnections in 2009, more and more members have installed solar panels each year. Figure 2 shows the cumulative number of interconnected members each year since 2009. From this we can see that the number of solar interconnects has grown quite a bit and that the rate of growth is higher now that it was in previous years.

# Pipe tidied member data through some dplyr functions before piping into a plot

clean_hist %>%
  filter(year < 2024) %>% 
  group_by(year) %>% 
  distinct(location_number) %>% 
  count(year) %>% 
  ggplot(aes(x = year, y = n)) +
  geom_line() +
  theme_classic() +
  labs(x = "Year",
       y = "Number of Interconnected Members",
       title = "Growth of Interconnected Members Over Time")
Figure 2: Total number of interconnected members by year.

Figure 2: Total number of interconnected members by year.

Another way to show how much solar capacity has grown is to examine the total amount of power produced by member generation. Figure 3 shows the number of Megawatt-hours produced by members each year. From this plot we can see that the amount of power generated has grown year over year as more people add solar panels to their property.

# Pipe tidied member data through a filter function to remove the current year and 
#    to remove outlier values skewing data before piping into a plot 

clean_hist %>% 
  filter(year < 2024,
         usage_kwh <5000000) %>% 
  group_by(year) %>%
  mutate(total_usage = sum(usage_kwh/1000)) %>% 
  ggplot(aes(x = year, y = total_usage, color = total_usage)) +
  geom_col() +
  labs(x = "Year", 
       y = 'Solar Generation in MWh',
       title = "Opalco Member's Solar Generation") +
  theme_classic() +
  scale_color_distiller(palette = "YlOrRd", 
                        direction = 1,
                        name = "MWh")
Figure 3: Total member generation by year. This is the amount of power that member owned solar installations supply to the grid in total per year going back to 2009.

Figure 3: Total member generation by year. This is the amount of power that member owned solar installations supply to the grid in total per year going back to 2009.

# save plot as a png with ggsave
ggsave("member_generation.png")

4. Conclusion and Further Considerations

Future analysis could examine these figures compared to the total demand on the system to determine how much of the demand can be met by solar generation. Likewise, finer grained analysis of when the generation is occurring would be helpful to compare to when demand is higher or lower. Finally, spatial analysis of where the generation is happening and when can determine if parts of the network can meet a greater or lesser proportion of demand and what parts of the grid are potentially vulnerable to overloading from solar generation.

In addition to the member generation shown in Figure 3, the existing community solar project puts out around 500 Megawatt-hours per year, and the next community solar project will be nearly six times larger. Solar generation will make up a significant amount of demand in the future. Quantifying this generation and understanding when and where it is available will allow for better management of the system.