DATA 608 Story 7

Investigating the Relationship between Energy Production and Dependency in the United States

For this assignment, we were given the task to investigate the distribution of energy production and dependency in the United States. Using 2022 data from the U.S. Energy Information Administration (EIA). The analysis focuses on understanding which states are net producers of energy versus those that rely heavily on external sources to meet their consumption needs. We extracted state-level data on energy production in BTU, consumption by source and calculated a dependency ratio using population data to reflect each state’s level of reliance on imported energy. Using this data, we developed an interactive choropleth map to visually highlight regional strengths, vulnerabilities and energy source composition. This analysis aims to uncover geographic patterns in energy self-sufficiency and inform potential areas for policy intervention and investment in energy infrastructure.

# STEP 1 & STEP 2
# Load libraries
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(readxl)
library(leaflet)
library(sf)

## Warning: package 'sf' was built under R version 4.3.3

## Linking to GEOS 3.13.0, GDAL 3.8.5, PROJ 9.5.1; sf_use_s2() is TRUE

library(geojsonio)

## Registered S3 method overwritten by 'geojsonsf':
##   method        from   
##   print.geojson geojson
## 
## Attaching package: 'geojsonio'
## 
## The following object is masked from 'package:base':
## 
##     pretty

# Load the datasets
consumption_file <- "/Users/zigcah/Downloads/Primary energy consumption by source.xlsx"
production_file <- "/Users/zigcah/Downloads/Primary energy production in Btu – renewable energy and total energy.xlsx"
population_file <- "/Users/zigcah/Downloads/Population, GDP, and degree days.xlsx"

# Load energy consumption sheets
coal_df <- read_excel(consumption_file, sheet = "Coal", skip = 2)
gas_df <- read_excel(consumption_file, sheet = "Natural gas", skip = 2)
petro_df <- read_excel(consumption_file, sheet = "Petroleum", skip = 2)
nuclear_df <- read_excel(consumption_file, sheet = "Nuclear", skip = 2)
renew_df <- read_excel(consumption_file, sheet = "Total renewable energy", skip = 2)

# Extract 2022 data
extract_2022 <- function(df, colname) {
  df %>%
    select(State, `2022`) %>%
    rename(!!colname := `2022`) %>%
    filter(!is.na(State))
}

coal_2022 <- extract_2022(coal_df, "Coal")
gas_2022 <- extract_2022(gas_df, "Natural_Gas")
petro_2022 <- extract_2022(petro_df, "Petroleum")
nuclear_2022 <- extract_2022(nuclear_df, "Nuclear")
renew_2022 <- extract_2022(renew_df, "Renewable")

# Merge consumption data
consumption_df <- reduce(list(coal_2022, gas_2022, petro_2022, nuclear_2022, renew_2022),
                         left_join, by = "State") %>%
  mutate(Total_Consumption = rowSums(across(Coal:Renewable), na.rm = TRUE))

# Map state abbreviations to full names
state_abbrev_to_name <- c(
  AK = "Alaska", AL = "Alabama", AR = "Arkansas", AZ = "Arizona", CA = "California",
  CO = "Colorado", CT = "Connecticut", DC = "District of Columbia", DE = "Delaware",
  FL = "Florida", GA = "Georgia", HI = "Hawaii", IA = "Iowa", ID = "Idaho",
  IL = "Illinois", IN = "Indiana", KS = "Kansas", KY = "Kentucky", LA = "Louisiana",
  MA = "Massachusetts", MD = "Maryland", ME = "Maine", MI = "Michigan", MN = "Minnesota",
  MO = "Missouri", MS = "Mississippi", MT = "Montana", NC = "North Carolina", ND = "North Dakota",
  NE = "Nebraska", NH = "New Hampshire", NJ = "New Jersey", NM = "New Mexico", NV = "Nevada",
  NY = "New York", OH = "Ohio", OK = "Oklahoma", OR = "Oregon", PA = "Pennsylvania",
  RI = "Rhode Island", SC = "South Carolina", SD = "South Dakota", TN = "Tennessee", TX = "Texas",
  UT = "Utah", VA = "Virginia", VT = "Vermont", WA = "Washington", WI = "Wisconsin", WV = "West Virginia",
  WY = "Wyoming"
)

consumption_df <- consumption_df %>%
  mutate(State = state_abbrev_to_name[State])

# Load production data
prod_df <- read_excel(production_file, sheet = "P5B", col_names = FALSE)

## New names:
## • `` -> `...1`
## • `` -> `...2`
## • `` -> `...3`
## • `` -> `...4`
## • `` -> `...5`
## • `` -> `...6`
## • `` -> `...7`
## • `` -> `...8`
## • `` -> `...9`
## • `` -> `...10`
## • `` -> `...11`
## • `` -> `...12`
## • `` -> `...13`
## • `` -> `...14`
## • `` -> `...15`

prod_clean <- prod_df %>%
  select(State = ...14, Total_Production = ...15) %>%
  filter(!is.na(State), !is.na(Total_Production)) %>%
  mutate(State = str_replace_all(State, "\\n", " ")) %>%
  filter(State != "United States e")

# Load population and GDP
pop_df <- read_excel(population_file, sheet = "Total population", skip = 2) %>%
  select(State, Population = `2022`) %>%
  filter(!is.na(State))

gdp_df <- read_excel(population_file, sheet = "Current-dollar GDP", skip = 2) %>%
  select(State, GDP = `2022`) %>%
  filter(!is.na(State))

# Merge datasets
energy_df <- consumption_df %>%
  left_join(prod_clean, by = "State") %>%
  left_join(pop_df, by = "State") %>%
  left_join(gdp_df, by = "State") %>%
  mutate(
    Dependency_Ratio = pmax((Total_Consumption - as.numeric(Total_Production)) / Total_Consumption, 0),
    across(c(Coal, Natural_Gas, Petroleum, Nuclear, Renewable), ~ . / Total_Consumption * 100, .names = "{.col}_Pct")
  )

# Load GeoJSON
states_geo <- geojson_read("https://eric.clst.org/assets/wiki/uploads/Stuff/gz_2010_us_040_00_500k.json", what = "sp")
states_geo@data <- states_geo@data %>%
  rename(State = NAME) %>%
  left_join(energy_df, by = "State") %>%
  mutate(across(c(Coal_Pct, Natural_Gas_Pct, Petroleum_Pct, Nuclear_Pct, Renewable_Pct,
                  Total_Production, Total_Consumption, Dependency_Ratio), ~ as.numeric(.)))

# Create popups 
popup_info <- paste0(
  "<strong>", states_geo@data$State, "</strong><br/>",
  "Total Production: ", ifelse(is.na(states_geo@data$Total_Production), "N/A", scales::comma(states_geo@data$Total_Production)), " Btu<br/>",
  "Total Consumption: ", ifelse(is.na(states_geo@data$Total_Consumption), "N/A", scales::comma(states_geo@data$Total_Consumption)), " Btu<br/>",
  "Dependency Ratio: ", ifelse(is.na(states_geo@data$Dependency_Ratio), "N/A", scales::percent(states_geo@data$Dependency_Ratio, accuracy = 0.1)), "<br/>",
  "Energy Mix (%):<br/>",
  " - Coal: ", ifelse(is.na(states_geo@data$Coal_Pct), "N/A", round(states_geo@data$Coal_Pct, 1)), "%<br/>",
  " - Natural Gas: ", ifelse(is.na(states_geo@data$Natural_Gas_Pct), "N/A", round(states_geo@data$Natural_Gas_Pct, 1)), "%<br/>",
  " - Petroleum: ", ifelse(is.na(states_geo@data$Petroleum_Pct), "N/A", round(states_geo@data$Petroleum_Pct, 1)), "%<br/>",
  " - Nuclear: ", ifelse(is.na(states_geo@data$Nuclear_Pct), "N/A", round(states_geo@data$Nuclear_Pct, 1)), "%<br/>",
  " - Renewable: ", ifelse(is.na(states_geo@data$Renewable_Pct), "N/A", round(states_geo@data$Renewable_Pct, 1)), "%"
)

# Color palette
pal <- colorNumeric(palette = "YlOrRd", domain = states_geo@data$Dependency_Ratio)

Interpretation

My analysis of the interactive choropleth map shows a clear regional pattern in energy production and dependency across the United States. States rich in natural resources such as Texas, Wyoming, and North Dakota, produce more energy than they consume, resulting in low or even negative dependency ratios. The map also displayed most states with low dependency rates are located in the midwest region, precisely those north of Texas, and I believe this is due to the low population density in these states in comparison with other regions of the United States. On the other hand, highly populated states like California, New York and several states in the Northeast region of the U.S. display high dependency ratios, which signals that they rely significantly on imported energy. This pattern aligns with the fact that many of these dependent states have limited fossil fuel production and dense urban infrastructure, which increases consumption. While the map supports the idea that geography and natural resource availability are primary drivers of a state’s energy independence, there are also some exceptions. An example of this is the fact that some high energy producing states still show moderate dependency levels, possibly due to the scale of their industrial or residential energy demand.

Overall, I believe this visualization highlights potential vulnerabilities in national energy security, especially in states and regions that consume far more energy than they produce. These states are more exposed to disruptions in their supply chain, price fluctuations and policy changes related to imported energy. It also underscores the opportunity for renewable energy expansion in these regions. Upon reviewing the data, I also observed that some of the most dependent states have relatively low renewable energy percentages, suggesting a gap that could be addressed with policy intervention and investment.

DATA 608 Story 7

Biyag Dukuray

2025-05-18

Investigating the Relationship between Energy Production and Dependency in the United States

Visualization

Interactive Choropleth Map

Interpretation