While working at Johnson Controls as a senior UX designer, I designed energy management systems as enterprise solutions. I had the opportunity to observe various types of data and visualizations. However, data dashboards cater to different personas with diverse goals and literacy levels. My UX design portfolio is on my website: https://www.miyoung.design
As a designer, I focus on the aesthetics of data visualization. However, I am also deeply interested in understanding the data and finding the best fit for it to enhance people’s data literacy.
Here are my attempts to display the same data in different ways to provide varied insights. Additionally, I aim to pose questions for identifying next steps based on simple, flat data visualizations.
I hope that my trials will spark curiosity about the data and foster appreciation for the insights provided by data visualization.
I have been exploring energy-related data while teaching myself data visualization with R. In this note, I used the “fpp2” R package, following the book ‘Forecasting: Principles and Practice’ (2nd edition) by Rob J Hyndman and George Athanasopoulos. Link for book: https://otexts.com/fpp2/
DataCamp also has a course on forecasting in R. Link for DataCamp: https://app.datacamp.com/learn/courses/forecasting-in-r
The Our World in Data site is inspiring, for example its article ‘The world’s energy problem’: https://ourworldindata.org/worlds-energy-problem
Net generation by energy sources: electric utility
Monthly data from Jan 2001 - Jan 2024
Data source: U.S. Energy Information Administration
Downloaded on April 25, 2024
I selected a specific dataset: net generation by energy source for electric utilities. It is the total amount of electricity produced (or generated) by various energy sources within the electric utility sector. This includes energy generated from sources such as coal, natural gas, nuclear, hydroelectric, wind, solar, and others.
library(fpp2)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
## ── Attaching packages ────────────────────────────────────────────── fpp2 2.5 ──
## ✔ ggplot2 3.5.1 ✔ fma 2.5
## ✔ forecast 8.22.0 ✔ expsmooth 2.3
##
library(ggplot2)
library(readr)
library(RColorBrewer)
library(knitr)
After importing the data, I usually check its format and class with head(), summary(), or class().
AllEnergy_month <- read_csv("mydata/Net_generation_for_electric_utility_monthly_v2.csv", col_names=TRUE)
## Rows: 277 Columns: 12
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Month
## dbl (11): all fuels (utility-scale) thousand megawatthours, coal thousand me...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(AllEnergy_month)
## # A tibble: 6 × 12
## Month all fuels (utility-scal…¹ coal thousand megawa…² United States : petr…³
## <chr> <dbl> <dbl> <dbl>
## 1 Jan-01 236467. 143856. 11120.
## 2 Feb-01 199802. 121453. 5767.
## 3 Mar-01 211942. 127005. 6521.
## 4 Apr-01 197499. 115801. 6661.
## 5 May-01 215508. 125839. 6729.
## 6 Jun-01 233622. 134020. 7415.
## # ℹ abbreviated names: ¹`all fuels (utility-scale) thousand megawatthours`,
## # ²`coal thousand megawatthours`,
## # ³`United States : petroleum liquids thousand megawatthours`
## # ℹ 8 more variables:
## # `United States : petroleum coke thousand megawatthours` <dbl>,
## # `natural gas thousand megawatthours` <dbl>,
## # `United States : other gases thousand megawatthours` <dbl>, …
summary(AllEnergy_month)
## Month all fuels (utility-scale) thousand megawatthours
## Length:277 Min. :147107
## Class :character 1st Qu.:181263
## Mode :character Median :196209
## Mean :198531
## 3rd Qu.:214858
## Max. :258901
## NA's :1
## coal thousand megawatthours
## Min. : 29039
## 1st Qu.: 68076
## Median :100478
## Mean : 95326
## 3rd Qu.:120752
## Max. :149494
## NA's :1
## United States : petroleum liquids thousand megawatthours
## Min. : 436.6
## 1st Qu.: 698.8
## Median : 930.1
## Mean : 2011.3
## 3rd Qu.: 2779.9
## Max. :11119.7
## NA's :1
## United States : petroleum coke thousand megawatthours
## Min. : 99.81
## 1st Qu.: 465.61
## Median : 599.25
## Mean : 611.19
## 3rd Qu.: 766.37
## Max. :1150.66
## NA's :1
## natural gas thousand megawatthours
## Min. : 11711
## 1st Qu.: 23470
## Median : 37974
## Mean : 41397
## 3rd Qu.: 56370
## Max. :101984
## NA's :1
## United States : other gases thousand megawatthours
## Min. :-0.6553
## 1st Qu.: 0.1293
## Median : 5.4644
## Mean :10.8683
## 3rd Qu.:16.2696
## Max. :81.4182
## NA's :1
## nuclear thousand megawatthours
## Min. :27884
## 1st Qu.:33787
## Median :35966
## Mean :36296
## 3rd Qu.:38729
## Max. :48876
## NA's :1
## conventional hydroelectric thousand megawatthours
## Min. :12963
## 1st Qu.:17580
## Median :19960
## Mean :20413
## 3rd Qu.:22904
## Max. :29918
## NA's :1
## United States : other renewables thousand megawatthours
## Min. : 115.3
## 1st Qu.: 616.4
## Median : 2206.6
## Mean : 2852.9
## 3rd Qu.: 4105.7
## Max. :10460.1
## NA's :1
## United States : hydro-electric pumped storage thousand megawatthours
## Min. :-888.37
## 1st Qu.:-530.20
## Median :-419.64
## Mean :-432.99
## 3rd Qu.:-321.91
## Max. : -25.69
## NA's :1
## United States : other thousand megawatthours
## Min. :15.41
## 1st Qu.:39.02
## Median :44.76
## Mean :44.92
## 3rd Qu.:51.58
## Max. :69.65
## NA's :1
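The `NA's :1` entries in the summary and the character `Month` column suggest two quick checks before building a time series. Here is a minimal sketch on a small made-up data frame (the values are invented, and the helper names `short_names()` and `parse_month()` are my own, not from any package). It shortens the long column names, parses labels such as `Jan-01` into dates without depending on the session locale, and lists the rows that contain missing values.

```r
# Toy stand-in for the imported file; values are invented for illustration.
toy <- data.frame(
  Month = c("Jan-01", "Feb-01", "Mar-01", NA),
  `all fuels (utility-scale) thousand megawatthours` = c(10, 20, 30, NA),
  `United States : other renewables thousand megawatthours` = c(1, 2, 3, NA),
  check.names = FALSE
)

# Hypothetical helper: drop the "United States : " prefix and the unit suffix.
short_names <- function(x) {
  x <- sub("^United States : ", "", x)
  sub(" thousand megawatthours$", "", x)
}
names(toy) <- short_names(names(toy))

# Hypothetical helper: parse "Jan-01" style labels via month.abb, so the
# result does not depend on the locale. Two-digit years are assumed to be 20xx.
parse_month <- function(x) {
  m <- match(substr(x, 1, 3), month.abb)
  y <- 2000L + as.integer(substr(x, 5, 6))
  as.Date(ISOdate(y, m, 1))
}
toy$Date <- parse_month(toy$Month)

# Rows with any missing value (for example an empty trailing line in a CSV).
na_rows <- which(!complete.cases(toy))
na_rows
```

Shorter names also keep plot legends readable, which is part of why I shrank the legend text later on.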
I used the ts() function to convert the data into a time series format, then plotted it with autoplot() as a first step toward experimenting with forecasting models.
ts_AllEnergy_month <- ts(AllEnergy_month[,-1], start=c(2001,1), end=c(2023,12), frequency=12)
autoplot(ts_AllEnergy_month)+
theme(legend.text=element_text(size=6))
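To see what `start`, `end`, and `frequency` do in ts(), here is a small sketch with a made-up vector (not the EIA values). It assumes 277 monthly observations beginning in January 2001, like the imported file. Setting `end = c(2023, 12)` keeps 276 of them and drops the last one.

```r
# 277 invented monthly values standing in for one column of the data.
x <- seq_len(277)

# Without `end`, the series runs as far as the data: Jan 2001 to Jan 2024.
ts_full <- ts(x, start = c(2001, 1), frequency = 12)

# With `end`, the series is cut at Dec 2023 (23 full years = 276 months).
ts_cut <- ts(x, start = c(2001, 1), end = c(2023, 12), frequency = 12)

end(ts_full)     # year and month of the last observation
length(ts_cut)   # number of observations kept

# window() extracts a sub-period, for example the year 2023 only.
y2023 <- window(ts_cut, start = c(2023, 1), end = c(2023, 12))
```

Cutting at a December keeps only whole years, which makes the seasonal plots easier to compare.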
I want to break down the data by energy source. Therefore, I check the column names with colnames() to browse the available sources.
colnames(ts_AllEnergy_month)
## [1] "all fuels (utility-scale) thousand megawatthours"
## [2] "coal thousand megawatthours"
## [3] "United States : petroleum liquids thousand megawatthours"
## [4] "United States : petroleum coke thousand megawatthours"
## [5] "natural gas thousand megawatthours"
## [6] "United States : other gases thousand megawatthours"
## [7] "nuclear thousand megawatthours"
## [8] "conventional hydroelectric thousand megawatthours"
## [9] "United States : other renewables thousand megawatthours"
## [10] "United States : hydro-electric pumped storage thousand megawatthours"
## [11] "United States : other thousand megawatthours"
Then I selected only three sources to focus on: total fuels, natural gas, and other renewables. autoplot()’s facet option works much like facet_grid() in ggplot2: FALSE overlays the series in one panel, and TRUE gives each series its own panel.
sub_3types<-ts_AllEnergy_month[,c(1,5,9)]
autoplot(sub_3types, facet=FALSE)
autoplot(sub_3types, facet=TRUE)
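Selecting columns by position works, but the positions shift if the file layout changes. As a possible alternative, columns of a multivariate time series can also be picked by name. The sketch below uses a made-up three-column series with short illustrative names (`total`, `gas`, `renewables`), not the real column names.

```r
# Invented monthly data with three named columns.
toy <- ts(
  cbind(total = 1:24, gas = 101:124, renewables = 201:224),
  start = c(2001, 1), frequency = 12
)

# Pick columns by name; the result is still a time series.
sub_2types <- toy[, c("total", "renewables")]

colnames(sub_2types)
frequency(sub_2types)
```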
I plotted the total fuels data with autoplot(), ggseasonplot(), and ggsubseriesplot().
sub_total<-ts_AllEnergy_month[,1]
autoplot(sub_total)
The ggseasonplot() function makes it easy to visualize year-by-year data as either a line or a polar graph. Although both the line graph and the polar graph use the same dataset, they offer different perspectives.
ggseasonplot(sub_total)+
ylab("thousand megawatthours") +
ggtitle("Seasonal plot: all fuels")
Specifically, the polar graph provides a clearer view of the seasonal pattern for each year. It also suggests that electricity generation during the summer months in the US (July and August) has increased over the past 10 years. Book link: https://otexts.com/fpp2/seasonal-plots.html
ggseasonplot(sub_total, polar=TRUE)+
ylab("thousand megawatthours") +
ggtitle("Polar Plot: all fuels in US")
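A visual impression like "summer generation has increased" can be cross-checked with a few lines of base R. The following is only a sketch on an invented three-year series, not the EIA data. It averages July and August for each year so the values can be compared directly.

```r
# Invented series: the same monthly shape, shifted up by 1 each year.
base_year <- c(10, 9, 9, 8, 9, 11, 13, 13, 10, 9, 9, 10)
x <- ts(c(base_year, base_year + 1, base_year + 2),
        start = c(2001, 1), frequency = 12)

# cycle() gives the month number (1 to 12) of each observation.
is_summer <- cycle(x) %in% c(7, 8)

# Year of each observation (assumes the series starts in January).
year <- start(x)[1] + (seq_along(x) - 1) %/% 12

# Mean of July and August for each year.
summer_mean <- tapply(as.numeric(x)[is_summer], year[is_summer], mean)
summer_mean
```

With the real series, the same lines would give one summer average per year from 2001 to 2023, which could then be plotted or compared across decades.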
The ggsubseriesplot() function is an alternative way to draw time series data. The book describes it this way: “The horizontal lines indicate the means for each month. This form of plot enables the underlying seasonal pattern to be seen clearly, and also shows the changes in seasonality over time. It is especially useful in identifying changes within particular seasons.”
In this example of a 23-year dataset (2001 to 2023), I can see a seasonal pattern. The graph also helps me see within-group patterns. For example, July and August exhibit similar patterns. I could also spot outliers.
Book link: https://otexts.com/fpp2/seasonal-subseries-plots.html
ggsubseriesplot(sub_total)+
ylab("thousand megawatthours") +
ggtitle("Subseries plot: all fuels in US")
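The horizontal lines in a subseries plot are the means for each calendar month. As a rough sketch of that calculation, using an invented two-year series instead of the EIA data, the same numbers can be computed in base R by grouping on cycle():

```r
# Invented two-year monthly series.
x <- ts(c(1:12, 3:14), start = c(2001, 1), frequency = 12)

# Group observations by calendar month and average across years.
monthly_mean <- tapply(as.numeric(x), as.integer(cycle(x)), mean)
names(monthly_mean) <- month.abb
monthly_mean
```

Looking at the numbers next to the plot makes it easier to say how far an individual year sits above or below its month's mean.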
I selected natural gas data as subset because it’s increasing over time.
sub_naturalgas<-ts_AllEnergy_month[,5]
autoplot(sub_naturalgas)+
ylab("thousand megawatthours") +
ggtitle("Natural Gas in US")
Interestingly, natural gas generation is also consistently high in the summer. It makes me wonder what kind of generator uses gas to produce electricity.
ggseasonplot(sub_naturalgas)
ggseasonplot(sub_naturalgas, polar=TRUE)
ggsubseriesplot(sub_naturalgas)
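Because natural gas shows both a rising trend and a summer peak, it can help to separate the two. One option, shown here only as a sketch on an invented series, is the classical decomposition in base R's decompose(). It is one of several decomposition methods and is not necessarily the best choice for the real data.

```r
# Invented series: a steady upward trend plus a repeating summer peak.
seasonal_shape <- c(0, 0, 0, 0, 1, 3, 6, 5, 2, 0, 0, 0)
x <- ts(0.5 * (1:48) + rep(seasonal_shape, 4),
        start = c(2001, 1), frequency = 12)

# Classical additive decomposition into trend, seasonal, and remainder.
parts <- decompose(x, type = "additive")

# Month with the largest seasonal effect.
peak_month <- month.abb[which.max(parts$figure)]
peak_month

# plot(parts) would draw the observed, trend, seasonal, and random panels.
```

`parts$trend` holds the smoothed long-term movement and `parts$figure` the twelve seasonal effects, so growth and seasonality can be discussed separately.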
Natural gas was the largest source—about 40%—of U.S. electricity generation in 2022. Natural gas is used in steam turbines and gas turbines to generate electricity. Cite: https://www.eia.gov/energyexplained/electricity/electricity-in-the-us.php
include_graphics("images/stackbar-naturalgas.jpg")
Renewable energy resources include biomass, hydro, geothermal, solar, wind, ocean thermal, wave action, and tidal action. It’s worth noting that this list does not include conventional hydropower, which is provided in a separate dataset.
sub_renewables<-ts_AllEnergy_month[,9]
autoplot(sub_renewables)
It’s interesting to note that renewable energy generation in the summer is not as high as in other months, even though most generation from other sources peaks during the summer. I wonder why the summer months don’t have more generation. With longer daylight hours, one might expect solar energy to contribute more during these months compared to others.
ggseasonplot(sub_renewables)
ggseasonplot(sub_renewables, polar=TRUE)
ggsubseriesplot(sub_renewables)
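When a series grows quickly, the most recent years dominate a seasonal plot and the seasonal shape of earlier years is hard to read. One way to compare shapes, sketched below on an invented series, is to divide each month by the mean of its own year. This is my own assumption about a useful transformation, not a step from the book.

```r
# Invented series that doubles each year while keeping the same seasonal shape.
shape <- c(1.2, 1.2, 1.3, 1.3, 1.1, 0.9, 0.7, 0.7, 0.8, 0.9, 0.9, 1.0)
x <- ts(c(100 * shape, 200 * shape, 400 * shape),
        start = c(2001, 1), frequency = 12)

# Year index for each observation (assumes the series starts in January
# and covers whole years).
year <- rep(seq_len(length(x) / 12), each = 12)

# Divide each value by the mean of its own year, so 1.0 means "an average month".
annual_mean <- ave(as.numeric(x), year)
relative <- ts(as.numeric(x) / annual_mean,
               start = start(x), frequency = frequency(x))
relative
```

The `relative` series could then be passed to ggseasonplot() in the same way as the raw series, so that every year is drawn on a comparable scale.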
When I look at the map, it becomes much clearer which state has more electric generation from renewable sources. Texas clearly has the highest generation. On the eia.gov site, you can drill down into the map to see the types of plants.
include_graphics("images/NetGeneration_Renewable.png")
The state map is impressive for showing the types of plants. Texas, in particular, has numerous solar and wind plants. However, I’m still unsure why there isn’t as much production during summer months compared to winter months.
include_graphics("images/renewable_plant_TX.png")
So, it prompted me to explore the seasonal pattern further.
Renewable_month <- read_csv("mydata/Net_generation_other_renewables_United_States_monthly-CSV.csv", col_names=TRUE, show_col_types = FALSE)
ts_Renewable <- ts(Renewable_month[,-1], start=c(2001,1), end=c(2024,2), frequency=12)
autoplot(ts_Renewable)
renew_subset_indi<-ts_Renewable[ ,3]
ggseasonplot(renew_subset_indi)+
ylab("thousand megawatthours") +
ggtitle("Renewable source: independent power producers")
ggseasonplot(renew_subset_indi, polar=TRUE)+
ylab("thousand megawatthours") +
ggtitle("Renewable source: independent power producers")
renew_subset_elec<-ts_Renewable[ ,2]
ggseasonplot(renew_subset_elec)+
ylab("thousand megawatthours") +
ggtitle("Renewable source: electric utility")
ggseasonplot(renew_subset_elec, polar=TRUE)+
ylab("thousand megawatthours") +
ggtitle("Renewable source: electric utility")
It would be a significant leap to assume this. However, the chart from ‘Our World in Data’ prompts me to think: does the presence of more renewable sources in the US contribute to reducing CO2 emissions, or is the US losing industries that create greenhouse gases (GHGs)? Cite: https://ourworldindata.org/worlds-energy-problem
include_graphics("images/consumption-co2-per-capita-vs-gdppc(1).png")