Motive

While working at Johnson Controls as a senior UX designer, I designed enterprise energy management systems. This gave me the opportunity to observe many types of data and visualizations, and to see how data dashboards must serve different personas with diverse goals and levels of data literacy. My UX design portfolio is on my website: https://www.miyoung.design

As a designer, I naturally approach data visualization with an eye for aesthetics. Beyond that, I am deeply interested in understanding the data itself and finding the visualization that fits it best, so that it enhances people’s data literacy.

Here are my attempts to display the same data in different ways, each offering a different insight. I also pose questions, based on simple, flat data visualizations, that help identify next steps.

I hope that my trials will spark curiosity about the data and foster appreciation for the insights provided by data visualization.

Book and Learning Materials

Data Source

Libraries

library(fpp2)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
## ── Attaching packages ────────────────────────────────────────────── fpp2 2.5 ──
## ✔ ggplot2   3.5.1      ✔ fma       2.5   
## ✔ forecast  8.22.0     ✔ expsmooth 2.3
## 
library(ggplot2)
library(readr)
library(RColorBrewer)
library(knitr)

Data

Import data

After importing data, I usually check its format and column classes with head(), summary(), or class().

AllEnergy_month <- read_csv("mydata/Net_generation_for_electric_utility_monthly_v2.csv", col_names=TRUE)
## Rows: 277 Columns: 12
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): Month
## dbl (11): all fuels (utility-scale) thousand megawatthours, coal thousand me...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(AllEnergy_month)
## # A tibble: 6 × 12
##   Month  all fuels (utility-scal…¹ coal thousand megawa…² United States : petr…³
##   <chr>                      <dbl>                  <dbl>                  <dbl>
## 1 Jan-01                   236467.                143856.                 11120.
## 2 Feb-01                   199802.                121453.                  5767.
## 3 Mar-01                   211942.                127005.                  6521.
## 4 Apr-01                   197499.                115801.                  6661.
## 5 May-01                   215508.                125839.                  6729.
## 6 Jun-01                   233622.                134020.                  7415.
## # ℹ abbreviated names: ¹​`all fuels (utility-scale) thousand megawatthours`,
## #   ²​`coal thousand megawatthours`,
## #   ³​`United States : petroleum liquids thousand megawatthours`
## # ℹ 8 more variables:
## #   `United States : petroleum coke thousand megawatthours` <dbl>,
## #   `natural gas thousand megawatthours` <dbl>,
## #   `United States : other gases thousand megawatthours` <dbl>, …
summary(AllEnergy_month)
##     Month           all fuels (utility-scale) thousand megawatthours
##  Length:277         Min.   :147107                                  
##  Class :character   1st Qu.:181263                                  
##  Mode  :character   Median :196209                                  
##                     Mean   :198531                                  
##                     3rd Qu.:214858                                  
##                     Max.   :258901                                  
##                     NA's   :1                                       
##  coal thousand megawatthours
##  Min.   : 29039             
##  1st Qu.: 68076             
##  Median :100478             
##  Mean   : 95326             
##  3rd Qu.:120752             
##  Max.   :149494             
##  NA's   :1                  
##  United States : petroleum liquids thousand megawatthours
##  Min.   :  436.6                                         
##  1st Qu.:  698.8                                         
##  Median :  930.1                                         
##  Mean   : 2011.3                                         
##  3rd Qu.: 2779.9                                         
##  Max.   :11119.7                                         
##  NA's   :1                                               
##  United States : petroleum coke thousand megawatthours
##  Min.   :  99.81                                      
##  1st Qu.: 465.61                                      
##  Median : 599.25                                      
##  Mean   : 611.19                                      
##  3rd Qu.: 766.37                                      
##  Max.   :1150.66                                      
##  NA's   :1                                            
##  natural gas thousand megawatthours
##  Min.   : 11711                    
##  1st Qu.: 23470                    
##  Median : 37974                    
##  Mean   : 41397                    
##  3rd Qu.: 56370                    
##  Max.   :101984                    
##  NA's   :1                         
##  United States : other gases thousand megawatthours
##  Min.   :-0.6553                                   
##  1st Qu.: 0.1293                                   
##  Median : 5.4644                                   
##  Mean   :10.8683                                   
##  3rd Qu.:16.2696                                   
##  Max.   :81.4182                                   
##  NA's   :1                                         
##  nuclear thousand megawatthours
##  Min.   :27884                 
##  1st Qu.:33787                 
##  Median :35966                 
##  Mean   :36296                 
##  3rd Qu.:38729                 
##  Max.   :48876                 
##  NA's   :1                     
##  conventional hydroelectric thousand megawatthours
##  Min.   :12963                                    
##  1st Qu.:17580                                    
##  Median :19960                                    
##  Mean   :20413                                    
##  3rd Qu.:22904                                    
##  Max.   :29918                                    
##  NA's   :1                                        
##  United States : other renewables thousand megawatthours
##  Min.   :  115.3                                        
##  1st Qu.:  616.4                                        
##  Median : 2206.6                                        
##  Mean   : 2852.9                                        
##  3rd Qu.: 4105.7                                        
##  Max.   :10460.1                                        
##  NA's   :1                                              
##  United States : hydro-electric pumped storage thousand megawatthours
##  Min.   :-888.37                                                     
##  1st Qu.:-530.20                                                     
##  Median :-419.64                                                     
##  Mean   :-432.99                                                     
##  3rd Qu.:-321.91                                                     
##  Max.   : -25.69                                                     
##  NA's   :1                                                           
##  United States : other thousand megawatthours
##  Min.   :15.41                               
##  1st Qu.:39.02                               
##  Median :44.76                               
##  Mean   :44.92                               
##  3rd Qu.:51.58                               
##  Max.   :69.65                               
##  NA's   :1
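The summary shows exactly one NA in every numeric column, so it is worth finding which row holds those NAs before converting to a time series. Below is a minimal base-R sketch. The helper `na_rows()` and the small `toy` data frame are hypothetical stand-ins and are not part of the original analysis.

```r
# Hypothetical helper: return the row numbers that contain at least one NA
na_rows <- function(df) {
  which(!complete.cases(df))
}

# Toy data frame standing in for AllEnergy_month (values are made up)
toy <- data.frame(
  Month = c("Jan-01", "Feb-01", "Mar-01", NA),
  total = c(236, 200, 212, NA),
  coal  = c(144, 121, 127, NA)
)

na_rows(toy)          # row(s) with missing values
sapply(toy, class)    # class of each column, similar to the column spec read_csv() prints
```

On the real data, `na_rows(AllEnergy_month)` would show whether the missing values sit in the last row. That placement is an assumption worth checking rather than a given.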

Transform to time series data

I converted the data into a time series with ts() so that I could plot it with autoplot() and experiment with forecasting models. The call below drops the first column (Month), starts the series in January 2001, ends it in December 2023, and sets frequency=12 for monthly data.

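# Drop the Month column; Jan 2001 to Dec 2023 is 276 months, so ts() keeps only
# the first 276 of the 277 rows read from the CSV
ts_AllEnergy_month <- ts(AllEnergy_month[,-1], start=c(2001,1), end=c(2023,12), frequency=12)
autoplot(ts_AllEnergy_month)+
  theme(legend.text=element_text(size=6))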
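The date range passed to ts() determines how many observations the series has. Here is a small sketch for checking that count against the number of rows, assuming monthly data given as c(year, month) pairs. `n_months()` is a hypothetical helper.

```r
# Hypothetical helper: number of monthly observations from start to end (inclusive),
# where start and end are c(year, month) pairs as passed to ts()
n_months <- function(start, end) {
  (end[1] - start[1]) * 12 + (end[2] - start[2]) + 1
}

n_months(c(2001, 1), c(2023, 12))  # months in the all-energy series
n_months(c(2001, 1), c(2024, 2))   # months in the renewables series used later

# ts() truncates extra rows (and recycles if there are too few) without an error,
# so comparing the counts first avoids surprises
x <- ts(1:5, start = c(2001, 1), end = c(2001, 3), frequency = 12)
as.numeric(x)  # only the first 3 values are kept
```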

Check the column names

To break the data down by energy source, I first list the columns with colnames().

colnames(ts_AllEnergy_month)
##  [1] "all fuels (utility-scale) thousand megawatthours"                    
##  [2] "coal thousand megawatthours"                                         
##  [3] "United States : petroleum liquids thousand megawatthours"            
##  [4] "United States : petroleum coke thousand megawatthours"               
##  [5] "natural gas thousand megawatthours"                                  
##  [6] "United States : other gases thousand megawatthours"                  
##  [7] "nuclear thousand megawatthours"                                      
##  [8] "conventional hydroelectric thousand megawatthours"                   
##  [9] "United States : other renewables thousand megawatthours"             
## [10] "United States : hydro-electric pumped storage thousand megawatthours"
## [11] "United States : other thousand megawatthours"
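The column names share a long prefix and unit suffix that crowd the plot legends, which is why the legend text was shrunk above. One possible way to shorten them uses base R's sub(). `short_names()` is a hypothetical helper, and its patterns assume the exact names listed above.

```r
# Hypothetical helper: strip the shared prefix and unit suffix from the column names
short_names <- function(x) {
  x <- sub("^United States : ", "", x)         # drop the "United States : " prefix
  x <- sub(" thousand megawatthours$", "", x)  # drop the unit suffix
  x
}

short_names(c(
  "coal thousand megawatthours",
  "United States : other renewables thousand megawatthours"
))

# To apply it (then keep the unit in ylab()):
# colnames(ts_AllEnergy_month) <- short_names(colnames(ts_AllEnergy_month))
```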

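Then I selected three sources to focus on: all fuels (the total), natural gas, and other renewables, which are columns 1, 5, and 9. The facets argument of autoplot() works much like facet_grid() in ggplot2. With facets=FALSE all series share one panel, and with facets=TRUE each series gets its own panel.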

sub_3types<-ts_AllEnergy_month[,c(1,5,9)]
autoplot(sub_3types, facets=FALSE)

autoplot(sub_3types, facets=TRUE)
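Selecting by position with c(1,5,9) works, but it breaks silently if the column order ever changes. The sketch below selects by name instead, using grep() with fixed = TRUE. `pick_cols()` is hypothetical, and the toy series stands in for ts_AllEnergy_month. With the real names, the fragments "all fuels", "natural gas", and "other renewables" should each match exactly one column.

```r
# Hypothetical helper: select columns of a multivariate ts by unique name fragments
pick_cols <- function(x, patterns) {
  idx <- vapply(patterns, function(p) {
    hit <- grep(p, colnames(x), fixed = TRUE)
    if (length(hit) != 1) {
      stop("pattern '", p, "' matched ", length(hit), " columns")
    }
    hit
  }, integer(1))
  x[, idx]  # keeps the ts attributes because no rows are subset
}

# Toy monthly series standing in for ts_AllEnergy_month (made-up values)
toy <- ts(matrix(1:36, ncol = 3,
                 dimnames = list(NULL, c("all fuels", "natural gas", "other renewables"))),
          start = c(2001, 1), frequency = 12)
picked <- pick_cols(toy, c("all fuels", "natural gas", "other renewables"))
colnames(picked)
```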

Subset of data based on the type of energy source

1. All fuels

I plotted the all-fuels total with autoplot(), ggseasonplot(), and ggsubseriesplot().

sub_total<-ts_AllEnergy_month[,1]
autoplot(sub_total)

Seasonal subseries plots

Seasonplot of all fuels

The ggseasonplot() function makes it easy to visualize year-by-year data as either a line or a polar graph. Although both the line graph and the polar graph use the same dataset, they offer different perspectives.

ggseasonplot(sub_total)+
  ylab("thousand megawatthours") +
  ggtitle("Seasonal plot: all fuels")
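ggseasonplot() draws one line per year across the twelve months. The year-by-month table behind such a plot can be sketched in base R as shown here. `season_matrix()` is a hypothetical helper, not a forecast function, and it assumes a monthly series (frequency 12). The polar version wraps the same month axis into a circle.

```r
# Hypothetical helper: reshape a monthly ts into a year-by-month table.
# Each row corresponds to one line in a seasonal plot; NA marks months outside the data.
season_matrix <- function(x) {
  stopifnot(frequency(x) == 12)
  yr <- floor(as.numeric(time(x)) + 1e-6)  # calendar year of each observation
  mo <- as.integer(cycle(x))                # month index 1..12
  tapply(as.numeric(x), list(year = yr, month = mo), sum)
}

toy <- ts(1:24, start = c(2001, 1), frequency = 12)  # made-up values
m <- season_matrix(toy)
m["2002", "1"]   # January 2002
```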

In particular, the polar graph makes the seasonal pattern of each year easier to read. It also suggests that generation during the US summer months (July and August) has increased over the past 10 years. Book link: https://otexts.com/fpp2/seasonal-plots.html

ggseasonplot(sub_total, polar=TRUE)+
  ylab("thousand megawatthours") +
  ggtitle("Polar Plot: all fuels in US")
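The impression that July and August generation has increased can be checked numerically as well as visually. This is a minimal sketch that assumes a monthly series starting in January. `summer_mean_by_year()` is hypothetical, and the toy data is made up.

```r
# Hypothetical helper: mean generation over the given months (default July-August), per year
summer_mean_by_year <- function(x, months = 7:8) {
  yr   <- floor(as.numeric(time(x)) + 1e-6)
  keep <- as.integer(cycle(x)) %in% months
  tapply(as.numeric(x)[keep], yr[keep], mean)
}

# Toy series: same seasonal shape each year, shifted up by 10 per year (made-up values)
toy <- ts(rep(1:12, 3) + rep(c(0, 10, 20), each = 12), start = c(2001, 1), frequency = 12)
s <- summer_mean_by_year(toy)
s

# On the real data, one way to compare the last 10 years with the earlier ones:
# s <- summer_mean_by_year(sub_total)
# mean(tail(s, 10)) - mean(head(s, -10))
```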

Subseriesplot of all fuels

The ggsubseriesplot() function is an alternative way to plot seasonal time series data. “The horizontal lines indicate the means for each month. This form of plot enables the underlying seasonal pattern to be seen clearly, and also shows the changes in seasonality over time. It is especially useful in identifying changes within particular seasons.”

In this 23-year dataset (2001 to 2023), the seasonal pattern is clear. The plot also shows the pattern within each month; for example, July and August exhibit similar patterns. Outliers are visible as well.

Book link: https://otexts.com/fpp2/seasonal-subseries-plots.html

ggsubseriesplot(sub_total)+
  ylab("thousand megawatthours") +
  ggtitle("Subseries plot: all fuels in US")
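The horizontal lines in a subseries plot are simply the mean of each month across years. The outliers mentioned above can also be flagged month by month. The sketch below does both in base R. `month_means()` and `month_outliers()` are hypothetical helpers. The 1.5 times IQR rule is one common choice (the rule boxplot.stats() uses) and is not necessarily what the plot itself highlights.

```r
# Monthly means: the horizontal lines in a subseries plot
month_means <- function(x) tapply(as.numeric(x), as.integer(cycle(x)), mean)

# Hypothetical helper: flag values outside 1.5 * IQR within their own month,
# using the same rule as boxplot whiskers (boxplot.stats)
month_outliers <- function(x) {
  v  <- as.numeric(x)
  mo <- as.integer(cycle(x))
  yr <- floor(as.numeric(time(x)) + 1e-6)
  flag <- logical(length(v))
  for (m in unique(mo)) {
    idx <- which(mo == m)
    flag[idx] <- v[idx] %in% boxplot.stats(v[idx])$out
  }
  data.frame(year = yr[flag], month = mo[flag], value = v[flag])
}

toy <- ts(rep(1:12, 6), start = c(2001, 1), frequency = 12)  # made-up values
toy[13] <- 100                                               # an unusual January 2002
month_means(toy)[1:3]
month_outliers(toy)
# Base R's monthplot(toy) draws a similar subseries plot.
```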

2. Natural gas

I selected natural gas as a subset because its generation is increasing over time.

sub_naturalgas<-ts_AllEnergy_month[,5]
autoplot(sub_naturalgas)+
  ylab("thousand megawatthours") +
  ggtitle("Natural Gas in US")
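To put a number on "increasing over time", one option is to sum each year and fit a straight line to the annual totals. This sketch assumes a monthly series that starts in January. `annual_totals()` and `trend_per_year()` are hypothetical helpers built on base R's aggregate() and lm().

```r
# Annual totals: aggregate() sums each block of 12 months (calendar years when
# the series starts in January); an incomplete final year is dropped
annual_totals <- function(x) aggregate(x, nfrequency = 1, FUN = sum)

# Hypothetical helper: least-squares slope of the annual totals (units per year)
trend_per_year <- function(x) {
  a <- annual_totals(x)
  unname(coef(lm(as.numeric(a) ~ as.numeric(time(a))))[2])
}

toy <- ts(rep(1:12, 3) + rep(0:2, each = 12), start = c(2001, 1), frequency = 12)  # made-up values
annual_totals(toy)
trend_per_year(toy)
# On the real data: trend_per_year(sub_naturalgas)
```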

High natural gas generation during summertime is surprising

Interestingly, natural gas generation is also consistently high in the summer. It makes me wonder what kinds of generators use gas to produce electricity.

ggseasonplot(sub_naturalgas)

ggseasonplot(sub_naturalgas, polar=TRUE)

ggsubseriesplot(sub_naturalgas)
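The summer peak in natural gas can also be estimated after removing the trend. This sketch uses classical additive decomposition (stats::decompose). `peak_month()` is a hypothetical helper, and the toy series is made up.

```r
# Hypothetical helper: month with the largest additive seasonal effect,
# estimated with classical decomposition (stats::decompose)
peak_month <- function(x) {
  s <- decompose(x)$seasonal  # seasonal component, same length as x
  effect <- tapply(as.numeric(s), as.integer(cycle(x)), mean)
  as.integer(names(which.max(effect)))
}

# Toy series: upward trend plus a July peak (made-up values)
pattern <- c(1, 1, 1, 1, 1, 2, 6, 5, 2, 1, 1, 1)
toy <- ts(rep(pattern, 4) + 0.1 * (1:48), start = c(2001, 1), frequency = 12)
peak_month(toy)
# On the real data: peak_month(sub_naturalgas)
```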

Fossil fuels are the largest sources of energy for electricity generation

Natural gas was the largest source—about 40%—of U.S. electricity generation in 2022. Natural gas is used in steam turbines and gas turbines to generate electricity. Cite: https://www.eia.gov/energyexplained/electricity/electricity-in-the-us.php

include_graphics("images/stackbar-naturalgas.jpg")
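The 40% figure can be compared with this dataset by dividing natural gas by all fuels for 2022. `share_in_year()` in the sketch below is hypothetical. The file name suggests net generation for electric utilities only, so the share computed from it may differ from EIA's figure for all U.S. generation. That is an assumption to verify.

```r
# Hypothetical helper: share of one series in a total over one calendar year
share_in_year <- function(part, total, year) {
  p   <- window(part,  start = c(year, 1), end = c(year, 12))
  tot <- window(total, start = c(year, 1), end = c(year, 12))
  sum(p) / sum(tot)
}

# Made-up values: part is always 40% of total
total <- ts(rep(10, 24), start = c(2021, 1), frequency = 12)
part  <- ts(rep(4, 24),  start = c(2021, 1), frequency = 12)
share_in_year(part, total, 2022)

# On the real data: share_in_year(sub_naturalgas, sub_total, 2022)
```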

3. Other renewables

Renewable energy resources include biomass, hydro, geothermal, solar, wind, ocean thermal, wave action, and tidal action. Note, however, that the “other renewables” series does not include conventional hydropower, which is reported separately in the conventional hydroelectric column.

sub_renewables<-ts_AllEnergy_month[,9]
autoplot(sub_renewables)


It’s interesting to note that renewable generation in the summer is not as high as in other months, even though generation from most other sources peaks in the summer. I wonder why the summer months don’t have more generation. With longer daylight hours, one might expect solar energy to contribute more during these months than in others.
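The level of renewable generation changes a lot over the period: the summary shows a range from about 115 to about 10,460 thousand MWh per month. As a result, raw monthly values mix growth with seasonality. One way to look at the seasonal shape alone is to express each month as a share of its own year's total and then average those shares by month. `monthly_share()` in the sketch below is hypothetical. It should be applied to complete calendar years only, because a partial year inflates the shares of the months it contains.

```r
# Hypothetical helper: each month's share of its own year's total, which removes
# the year-to-year growth so that years of different size compare fairly
monthly_share <- function(x) {
  v  <- as.numeric(x)
  yr <- floor(as.numeric(time(x)) + 1e-6)
  v / ave(v, yr, FUN = sum)
}

# Made-up values: lower May-August, and the second year is three times larger
shape <- c(2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2, 2)
toy <- ts(rep(shape, 2) * rep(c(1, 3), each = 12), start = c(2001, 1), frequency = 12)
avg_share <- tapply(monthly_share(toy), as.integer(cycle(toy)), mean)
round(avg_share, 3)
```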

ggseasonplot(sub_renewables)

Questions

  • Why is renewable generation lower in the summer?
ggseasonplot(sub_renewables, polar=TRUE)

ggsubseriesplot(sub_renewables)

Net generation from renewable sources excluding hydroelectric, by state

The map makes it much clearer which states generate more electricity from renewable sources. Texas appears to have the highest generation. On the eia.gov site, you can drill down into the map to see the types of plants.

include_graphics("images/NetGeneration_Renewable.png")

The state map is impressive for showing the types of plants. Texas, in particular, has numerous solar and wind plants. However, I’m still unsure why there isn’t as much production during summer months compared to winter months.

include_graphics("images/renewable_plant_TX.png")

This prompted me to explore the seasonal pattern further, using renewable generation data broken down by producer type.

Renewable_month <- read_csv("mydata/Net_generation_other_renewables_United_States_monthly-CSV.csv", col_names=TRUE, show_col_types = FALSE)
ts_Renewable <- ts(Renewable_month[,-1], start=c(2001,1), end=c(2024,2), frequency=12)
autoplot(ts_Renewable)
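This renewables series ends in February 2024, so 2024 is a partial year. Year-level comparisons are cleaner on complete calendar years. The sketch below trims a monthly series with window(). `full_years()` is a hypothetical helper.

```r
# Hypothetical helper: trim a monthly ts to complete calendar years (January-December)
full_years <- function(x) {
  stopifnot(frequency(x) == 12)
  s <- start(x)
  e <- end(x)
  first <- if (s[2] == 1) s[1] else s[1] + 1
  last  <- if (e[2] == 12) e[1] else e[1] - 1
  window(x, start = c(first, 1), end = c(last, 12))
}

toy <- ts(1:38, start = c(2001, 1), frequency = 12)  # Jan 2001 to Feb 2004, made-up values
trimmed <- full_years(toy)
start(trimmed)
end(trimmed)
length(trimmed)
```

For example, full_years(ts_Renewable) would keep January 2001 to December 2023.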

renew_subset_indi<-ts_Renewable[ ,3]

ggseasonplot(renew_subset_indi)+
  ylab("thousand megawatthours") +
  ggtitle("Renewable source: independent power producers")

ggseasonplot(renew_subset_indi, polar=TRUE)+
  ylab("thousand megawatthours") +
  ggtitle("Renewable source: independent power producers")

renew_subset_elec<-ts_Renewable[ ,2]

ggseasonplot(renew_subset_elec)+
  ylab("thousand megawatthours") +
  ggtitle("Renewable source: electric utility")

ggseasonplot(renew_subset_elec, polar=TRUE)+
  ylab("thousand megawatthours") +
  ggtitle("Renewable source: electric utility")

US carbon emission reduction

The chart from ‘Our World in Data’ prompts a question, although it would be a significant leap to draw a conclusion from it: does the presence of more renewable sources in the US contribute to reducing CO2 emissions, or is the US losing industries that create greenhouse gases (GHGs)? Cite: https://ourworldindata.org/worlds-energy-problem

include_graphics("images/consumption-co2-per-capita-vs-gdppc(1).png")