I. Introduction

Since the discovery of modern energy sources, humans’ lives have changed dramatically. The world entered an industrial revolution that impacted our lives positively and negatively. A couple of decades ago, we started understanding the negative impacts more clearly, specifically climate change and its effects. This led scientists to begin investigating other types of energy sources. Currently, the U.S is using five different sources of energy:

1- Coal

2- Natural Gas

3- Petroleum

4- Renewal Energy: Solar

5- Nuclear Energy

Overview

For this project, I want to investigate energy demand and use throughout the United States. I will look at the past 20 years and try to figure out the following questions:

1- Which State consumes the highest level of energy?

2- Which State has the highest energy expenditure?

3- Is there a correlation between the State’s level of consumption and expenditure?

4- Analyzing the State with the highest consumption, what energy sources are being used?

5- What about Maryland? How are we doing?

6- Finally, I want to compare the best energy source (renewable) to the most commonly used (petroleum), and see if there is a particular usage trend overall.

Dataset and Resources

My dataset is a combination of several excel worksheets that were compiled into one comma-separated values file.

Load the needed libraries and the dataset

# load the libraries 
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.1.3
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.6     v purrr   0.3.4
## v tibble  3.1.7     v dplyr   1.0.9
## v tidyr   1.2.0     v stringr 1.4.0
## v readr   2.1.2     v forcats 0.5.1
## Warning: package 'ggplot2' was built under R version 4.1.3
## Warning: package 'tibble' was built under R version 4.1.3
## Warning: package 'tidyr' was built under R version 4.1.3
## Warning: package 'readr' was built under R version 4.1.3
## Warning: package 'purrr' was built under R version 4.1.3
## Warning: package 'dplyr' was built under R version 4.1.3
## Warning: package 'stringr' was built under R version 4.1.3
## Warning: package 'forcats' was built under R version 4.1.3
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(highcharter)
## Warning: package 'highcharter' was built under R version 4.1.3
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
## Highcharts (www.highcharts.com) is a Highsoft software product which is
## not free for commercial and Governmental use
library(dplyr)
library(RColorBrewer)
## Warning: package 'RColorBrewer' was built under R version 4.1.3
library(knitr)
## Warning: package 'knitr' was built under R version 4.1.3
library(ggpubr)
## Warning: package 'ggpubr' was built under R version 4.1.3
#Set Working station and load the datase.
setwd("C:/Users/mayss/Desktop/Data Science/Datasets")
energy <- read.csv('state_energy.csv')

Let us look at the dataset structure

str(energy)
## 'data.frame':    1071 obs. of  12 variables:
##  $ state                                  : chr  "AK" "AK" "AK" "AK" ...
##  $ year                                   : int  2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 ...
##  $ total_energy_consumption               : int  742061 735745 732909 733471 775019 797723 746457 723227 650500 639557 ...
##  $ total_energy_.expenditures             : num  2842 2785 2525 2817 4148 ...
##  $ energy_expenditures_percent_of_gdp     : num  10.6 9.77 8.49 8.79 11.75 ...
##  $ total_energy_expenditures_.pc          : num  4525 4395 3931 4344 6292 ...
##  $ gdp                                    : int  39407 40959 42979 42355 44055 45657 49190 51721 51252 56215 ...
##  $ coal_consumption                       : int  16455 15911 16429 12552 14075 13994 14981 13662 14705 14505 ...
##  $ natural_gas_consumption                : int  437972 413049 420808 415891 407944 434704 375719 372186 343938 343971 ...
##  $ petroleum_consumption                  : int  275255 289334 277400 285417 334121 333091 342272 323089 278725 265166 ...
##  $ nuclear_energy_consumed_for_electricity: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ renewable_energy_consumption           : int  12375 17446 18269 19607 18876 15930 13481 14286 13126 15912 ...

What are the variables?

Variable <-c("State", "Year", "Total Consumption", "Total Energy Expenditure", "Expenditures as Percent of Current-Dollar GDP ", "Total Expenditure per Capita", "GDP", "Coal Consumption", "Natural Gas Consumption", "Petroleum Consumption", "Renewable Energy Consumption", "Nuclear Consumption" )
Description <-c("US State", "Years:2000-2021", "Total energy consumption (Billion Btu)", "Total energy expenditures (million dollars)", "Energy expenditures as percent of current-dollar Gross Domestic Product (GDP)", "Total energy expenditures per capita in dollars", "Real gross domestic product (GDP) in million Chained (2012) Dollars", "Coal total consumption in billion Btu", "Natural gas total consumption (excluding supplemental gaseous fuels) in billion Btu", "All petroleum products total consumption, excluding biofuels in billion Btu", "Renewable energy sources total consumption in billion Btu", "Nuclear energy consumed for electricity generation in billion Btu")

Variable_Table <-data.frame(Variable, Description)

knitr::kable(Variable_Table, "pipe", align = "l", "c")
Variable Description
State US State
Year Years:2000-2021
Total Consumption Total energy consumption (Billion Btu)
Total Energy Expenditure Total energy expenditures (million dollars)
Expenditures as Percent of Current-Dollar GDP Energy expenditures as percent of current-dollar Gross Domestic Product (GDP)
Total Expenditure per Capita Total energy expenditures per capita in dollars
GDP Real gross domestic product (GDP) in million Chained (2012) Dollars
Coal Consumption Coal total consumption in billion Btu
Natural Gas Consumption Natural gas total consumption (excluding supplemental gaseous fuels) in billion Btu
Petroleum Consumption All petroleum products total consumption, excluding biofuels in billion Btu
Renewable Energy Consumption Renewable energy sources total consumption in billion Btu
Nuclear Consumption Nuclear energy consumed for electricity generation in billion Btu

II. Investigate Some Answers

1- Which State had the highest energy consumption and in which year?

I will start with a visualization first to see if a particular state pops out. Then I will take a closer look to gather more information. For the visualization, I am using the time series from the highchart because it makes it easier to see several components at once

highchart() %>%
hc_add_series(data = energy,
          type = "line", hcaes(x = year,
          y = total_energy_consumption,
          group = state)) %>%
  hc_xAxis(title = list(text="Year")) %>%
  hc_yAxis(title = list(text="Total Energy Consumptuin in millions")) %>%
 hc_plotOptions(series = list(marker = list(symbol = "circle"))) %>%
  hc_legend(align = "right", 
            verticalAlign = "bottom")

As you can see, Texas takes the lead; California and Florida come next, trailing by five billion BTUs or more. Most other states cluster in a similar range consuming between 1.5 and 2.5 billion BTUs.

If we take a closer look, We can pinpoint the year of the highest consumption.

# finding the maximum value of total energy consumption
energy[which.max(energy$total_energy_consumption),] 
##     state year total_energy_consumption total_energy_.expenditures
## 923    TX 2019                 14234096                   137807.1
##     energy_expenditures_percent_of_gdp total_energy_expenditures_.pc     gdp
## 923                               7.39                        4754.1 1785318
##     coal_consumption natural_gas_consumption petroleum_consumption
## 923           992721                 4795213               6846892
##     nuclear_energy_consumed_for_electricity renewable_energy_consumption
## 923                                  431234                      1071707

Texas’s highest energy consumption was in 2019, with a total consumption of 11,829,075 billion BTUs!

2- Which state had the highest expenditure in millions of dollars?

#Find the row with the maximum expenditure
energy[which.max(energy$total_energy_.expenditures),] 
##     state year total_energy_consumption total_energy_.expenditures
## 915    TX 2011                 11829075                   162237.9
##     energy_expenditures_percent_of_gdp total_energy_expenditures_.pc     gdp
## 915                               12.1                        6326.2 1353600
##     coal_consumption natural_gas_consumption petroleum_consumption
## 915          1695239                 3800578               5394068
##     nuclear_energy_consumed_for_electricity renewable_energy_consumption
## 915                                  414881                       524941

Once again, Texas takes the lead. However, it is surprising to see that the highest total expenditure was in 2011, while Texas’s highest consumption was in 2019! So here is a question, Is there a relationship between the State’s consumption and expenditure?

I will construct a linear regression and calculate the correlation to answer this question.

But first, let us narrow it down to the State of Texas.

# creating a subset using the filter function
texas <- filter(energy, state == 'TX')

3- Is there a relationship between the Texas’s energy consumption and its energy expenditures?

# Create a scattor plot with a non-default theme
p1 <- ggplot(texas, aes(x = total_energy_consumption , y = total_energy_.expenditures)) +
 theme_light(base_size = 12) +
labs(title = "Consumption Vs. Expenditure For the State of Texas",
caption = "Source: U.S. Energy Information Administraion EIA\n") +
    xlab("Total Energy Consumption in billion Btu") +
  ylab("Total Energy Expenditures in million dollors") 

# convert the numbers on the x-axis to a standard form
options(scipen = 999)
p2<- p1 + geom_point() + geom_smooth(method=lm,se=FALSE,fullrange=TRUE,color="purple")+
         theme_classic()+
          stat_regline_equation(label.y = 180000, aes(label = ..eq.label..)) +
          stat_regline_equation(label.y = 175000, aes(label = ..rr.label..))
p2
## `geom_smooth()` using formula 'y ~ x'

As the visualization and the linear regression equation show, there is almost no association between the State’s energy consumption and expenditures. In fact, if we calculate the r value, we can see how weak the correlation is

Calculating the correlation

cor(texas$total_energy_consumption, texas$total_energy_.expenditures)
## [1] 0.07184423

r is about 0.07. This is a very week correlation!

4- What type of energy sources does Texas State use?

Since I am focusing on the energy sources, I will eliminate the other variables.

texas_source <- texas[,!names(texas) %in% c("total_energy_.expenditures", "total_energy_expenditures_.pc", "gdp", "energy_expenditures_percent_of_gdp", "total_energy_consumption", "state")] 

convert the wide dataset to a long one

texas_source_long <- texas_source %>%
  pivot_longer(   #converting columns -25, into one column called 'year'.
    cols = 2:6, 
    names_to = c("energy_type"),
    values_to = "billion_btu" #converting the amount consumed for each type of energy source to a single column, called 'billion_btu'.
  )
texas_source_long
## # A tibble: 105 x 3
##     year energy_type                             billion_btu
##    <int> <chr>                                         <int>
##  1  2000 coal_consumption                            1548184
##  2  2000 natural_gas_consumption                     4550106
##  3  2000 petroleum_consumption                       5767121
##  4  2000 nuclear_energy_consumed_for_electricity      391670
##  5  2000 renewable_energy_consumption                 101491
##  6  2001 coal_consumption                            1492977
##  7  2001 natural_gas_consumption                     4382935
##  8  2001 petroleum_consumption                       5773604
##  9  2001 nuclear_energy_consumed_for_electricity      398535
## 10  2001 renewable_energy_consumption                 102166
## # ... with 95 more rows

Let us plot

library(ggthemes)
## Warning: package 'ggthemes' was built under R version 4.1.3
library(ggrepel)
## Warning: package 'ggrepel' was built under R version 4.1.3
texas_source_long %>%
    ggplot(aes(x = year, y = billion_btu)) +
geom_point(aes(color=energy_type), size = 2) +
scale_x_log10("Year") +
scale_y_log10("Total Consumption in Billion Btu") +
ggtitle("Texas Energy Sources 2000-2020\n") +
scale_color_discrete(name="Energy Source")+
  theme_clean()+
  scale_color_manual(labels = c("Coal", "Natural Gas", "Nuclear", "Petroleum", "Renewable"),
                     values = c("red", "blue", "orange", "purple", "green"))+
  labs(col="Energy Sources")
## Scale for 'colour' is already present. Adding another scale for 'colour',
## which will replace the existing scale.

Even though petroleum usage is the highest, it is refreshing to see that renewable energy is making its way up the chart.

5- What about Maryland?

I will create a subset of the original dataset to see how Maryland is doing and only include the energy source columns. This will take several steps.

# create a subset for MD
maryland <- filter(energy, state == 'MD')

# eliminate unneeded columns
maryland_source <- maryland[,!names(maryland) %in% c("total_energy_.expenditures", "total_energy_expenditures_.pc", "gdp", "energy_expenditures_percent_of_gdp", "total_energy_consumption", "state")] 

# convert to long dataset
maryland_source_long <- maryland_source %>%
  pivot_longer(   #converting columns 2 to 5 into one column called 'energy_type'.
    cols = 2:6, 
    names_to = c("energy_type"),
    values_to = "billion_btu" #converting the amount consumed for each type of energy source to a single column, called 'billion_btu'.
  )

# check it out!
maryland_source_long
## # A tibble: 105 x 3
##     year energy_type                             billion_btu
##    <int> <chr>                                         <int>
##  1  2000 coal_consumption                             312162
##  2  2000 natural_gas_consumption                      219031
##  3  2000 petroleum_consumption                        546998
##  4  2000 nuclear_energy_consumed_for_electricity      144204
##  5  2000 renewable_energy_consumption                  54124
##  6  2001 coal_consumption                             318851
##  7  2001 natural_gas_consumption                      184783
##  8  2001 petroleum_consumption                        566433
##  9  2001 nuclear_energy_consumed_for_electricity      142612
## 10  2001 renewable_energy_consumption                  33277
## # ... with 95 more rows

I will use the time series in highchart again!

highchart() %>%
hc_add_series(data = maryland_source_long,
          type = "line", hcaes(x = year,
          y = billion_btu,
          group = energy_type)) %>%
  hc_xAxis(title = list(text="Year")) %>%
  hc_yAxis(title = list(text="Total Energy Consumptuin in millions")) %>%
  hc_title(text = "Maryland Energy Consumption 2000-2020") %>%
 hc_plotOptions(series = list(marker = list(symbol = "circle"))) %>%
  hc_legend(align = "right", 
            verticalAlign = "bottom")

6- States’ Overview

III. Reflection

With the rise of gas prices and the rapid climate change, we need to seriously start using healthier and more economical energy sources. This study showed that while more states are using renewable and nuclear energy, we still need a heavier push in that direction. While my comparison was based on the State’s GDP, it would have been more accurate if I also had the population count for each State throughout the 20 years.

This was an eye-opener study, as I thought Maryland was doing better than what the chart showed. Hopefully, with more awareness, we become one of the pioneer States in using more efficient energy sources.

Resources

Dataset Resource The U.S Energy Information Adminstration https://www.eia.gov/state/seds/seds-data-complete.php?sid=MI#StatisticsIndicators

Background Resource Ohur World in Data: The world’s energy problem https://ourworldindata.org/worlds-energy-problem

Thank you!