Introduction

This report analyzes state-level energy production and dependency patterns in the United States. The goal is to understand regional strengths and vulnerabilities and discuss their implications for U.S. energy security. The analysis uses data from the U.S. Energy Information Administration (EIA) State Energy Data System (SEDS) for the year 2022.

Data Loading and Preparation

The SEDS data, processed into a CSV format, contains information on total energy production, production by source (coal, natural gas, crude oil, renewables, nuclear), and total energy consumption for each U.S. state.

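# Load the packages used throughout this report
library(tidyverse)    # readr, dplyr, tidyr, stringr
library(sf)           # reading the GeoJSON state boundaries
library(leaflet)      # interactive map
library(RColorBrewer) # color palettes

# Load the processed SEDS data for 2022
energy_data_raw <- read_csv("https://raw.githubusercontent.com/samato0624/DATA608/refs/heads/main/seds_processed_data_2022.csv")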

# Define the mapping from state abbreviations to full state names
STATE_ABBR_TO_NAME <- c(
  AL = "Alabama", AK = "Alaska", AZ = "Arizona", AR = "Arkansas", CA = "California",
  CO = "Colorado", CT = "Connecticut", DE = "Delaware", FL = "Florida", GA = "Georgia",
  HI = "Hawaii", ID = "Idaho", IL = "Illinois", IN = "Indiana", IA = "Iowa",
  KS = "Kansas", KY = "Kentucky", LA = "Louisiana", ME = "Maine", MD = "Maryland",
  MA = "Massachusetts", MI = "Michigan", MN = "Minnesota", MS = "Mississippi", MO = "Missouri",
  MT = "Montana", NE = "Nebraska", NV = "Nevada", NH = "New Hampshire", NJ = "New Jersey",
  NM = "New Mexico", NY = "New York", NC = "North Carolina", ND = "North Dakota", OH = "Ohio",
  OK = "Oklahoma", OR = "Oregon", PA = "Pennsylvania", RI = "Rhode Island", SC = "South Carolina",
  SD = "South Dakota", TN = "Tennessee", TX = "Texas", UT = "Utah", VT = "Vermont",
  VA = "Virginia", WA = "Washington", WV = "West Virginia", WI = "Wisconsin", WY = "Wyoming",
  DC = "District of Columbia"
)

# Normalize State_Abbr first (uppercase, no stray whitespace) so the lookup below matches
energy_data_raw$State_Abbr <- toupper(trimws(energy_data_raw$State_Abbr))

# Create or overwrite the 'State_Name' column using the mapping.
# Codes not in the mapping keep their abbreviation as the name.
energy_data_raw <- energy_data_raw %>%
  dplyr::mutate(
    State_Name = dplyr::recode(State_Abbr, !!!STATE_ABBR_TO_NAME, .default = State_Abbr, .missing = State_Abbr)
  )

# Filter for relevant MSNs and ensure Value is numeric
energy_data <- energy_data_raw %>%
  filter(MSN %in% c("TETCB", "NUETB", "CLPRB", "NGMPB", "PAPRB", "REPRB", "TEPRB")) %>%
  mutate(Value = as.numeric(Value)) %>%
  filter(!is.na(Value)) # Remove rows where Value became NA after conversion

# Reshape data for easier use: one row per state, with columns for each energy metric
energy_data_wide <- energy_data %>%
  select(State_Name, State_Abbr, MSN, Value) %>% 
  tidyr::pivot_wider(names_from = MSN, values_from = Value, values_fill = 0) # Fill missing with 0, assuming no production/consumption

# Rename MSN codes to descriptive column names (e.g., TEPRB to TotalProduction)
msn_labels <- c(
  TotalProduction      = "TEPRB",
  CoalProduction       = "CLPRB",
  NaturalGasProduction = "NGMPB",
  CrudeOilProduction   = "PAPRB",
  RenewableProduction  = "REPRB",
  NuclearProduction    = "NUETB",
  TotalConsumption     = "TETCB"
)
energy_data_wide <- energy_data_wide %>%
  dplyr::rename(dplyr::any_of(msn_labels))

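head(energy_data_wide)

# Calculate Net Energy (Production - Consumption) as a proxy for dependency
# Positive means surplus, negative means deficit (dependent)
# DependencyRatio is production / consumption: below 1 means the state consumes more than it produces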
energy_data_wide <- energy_data_wide %>%
  mutate(NetEnergy = TotalProduction - TotalConsumption,
         DependencyRatio = ifelse(TotalConsumption > 0, (TotalProduction / TotalConsumption), NA)) # Avoid division by zero

# Calculate percentages for production sources
energy_data_wide <- energy_data_wide %>%
  mutate(
    PercentCoal = ifelse(TotalProduction > 0, (CoalProduction / TotalProduction) * 100, 0),
    PercentNaturalGas = ifelse(TotalProduction > 0, (NaturalGasProduction / TotalProduction) * 100, 0),
    PercentCrudeOil = ifelse(TotalProduction > 0, (CrudeOilProduction / TotalProduction) * 100, 0),
    PercentRenewable = ifelse(TotalProduction > 0, (RenewableProduction / TotalProduction) * 100, 0),
    PercentNuclear = ifelse(TotalProduction > 0, (NuclearProduction / TotalProduction) * 100, 0)
  )
head(energy_data_wide)
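
To make the two dependency measures concrete, here is a small worked sketch using made-up numbers for two hypothetical states. The state names and values are illustrative assumptions, not SEDS figures; only the formulas match the ones used above.

```r
# Hypothetical states and values (Billion Btu); not taken from SEDS
toy <- data.frame(
  State_Name       = c("Exporter", "Importer"),
  TotalProduction  = c(1500, 200),
  TotalConsumption = c(1000, 800)
)

# Same definitions as in the data preparation step
toy$NetEnergy <- toy$TotalProduction - toy$TotalConsumption
toy$DependencyRatio <- ifelse(
  toy$TotalConsumption > 0,
  toy$TotalProduction / toy$TotalConsumption,
  NA_real_  # avoid division by zero
)

print(toy)
```

In this toy case "Exporter" has a net energy of 500 and a ratio of 1.5, while "Importer" has a net energy of -600 and a ratio of 0.25, meaning it produces a quarter of what it consumes.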

Interactive Choropleth Map

An interactive choropleth map is created using Leaflet to visualize state-level energy data. States are shaded by net energy (production minus consumption), and popups provide total production and consumption, production by source, and dependency status.

# Load US States GeoJSON data for mapping directly from URL
states_geojson_url <- "https://raw.githubusercontent.com/PublicaMundi/MappingAPI/master/data/geojson/us-states.json"

# Attempt to read the GeoJSON directly from the URL
tryCatch({
  us_states_sf <- sf::st_read(states_geojson_url, quiet = TRUE)
  message("US States GeoJSON loaded directly from URL.")
}, error = function(e) {
  stop(paste("Error loading GeoJSON from URL:", e$message, "Please ensure the URL is accessible and the GeoJSON is valid. You might need to install the necessary drivers for sf to read from https if not already configured (e.g., libcurl support for GDAL)."))
})

# Ensure the sf object was loaded successfully
if (!exists("us_states_sf") || is.null(us_states_sf) || nrow(us_states_sf) == 0) {
  stop("Failed to load valid GeoJSON data from the URL.")
}

# Merge energy data with spatial data
if (!("State_Name" %in% names(us_states_sf))) {
    if ("name" %in% names(us_states_sf)){
        us_states_sf <- us_states_sf %>% dplyr::rename(State_Name = name)
    } else if ("NAME" %in% names(us_states_sf)) {
        us_states_sf <- us_states_sf %>% dplyr::rename(State_Name = NAME)
    } else {
        stop("GeoJSON does not have a recognizable state name column (e.g., 'name' or 'NAME').")
    }
}

# Ensure consistent state naming for merging (e.g. trim whitespace)
us_states_sf$State_Name <- stringr::str_trim(us_states_sf$State_Name)

# Perform the merge
map_data <- us_states_sf %>% 
  dplyr::left_join(energy_data_wide, by = "State_Name") %>%
  dplyr::filter(!is.na(TotalProduction)) # Keep only states with data
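
Because the join matches on full state names, any spelling mismatch drops a state from the map without warning. A minimal sketch of a check for this, shown on hypothetical name vectors rather than the real objects:

```r
# Hypothetical name vectors standing in for us_states_sf$State_Name
# and energy_data_wide$State_Name
geo_names    <- c("Alabama", "Alaska", "Puerto Rico")
energy_names <- c("Alabama", "Alaska", "US")

# Names present on one side of the join but not the other
missing_energy <- setdiff(geo_names, energy_names)  # polygons with no energy data
missing_geo    <- setdiff(energy_names, geo_names)  # energy rows with no polygon

print(missing_energy)
print(missing_geo)
```

Running the same two `setdiff()` calls on the real name columns would list any states that fail to match before they are filtered out.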

# RColorBrewer is called through its namespace below, so it only needs to be installed

if (nrow(map_data) > 0 && "NetEnergy" %in% names(map_data)) {
  # Create a reversed "YlOrRd" color palette for NetEnergy
  # Red will represent lower/negative values, Yellow will represent higher/positive values.
  n_palette_colors <- 7 
  rev_ylorrd_palette <- rev(RColorBrewer::brewer.pal(n_palette_colors, "YlOrRd"))

  pal_net_energy <- colorNumeric(
    palette = rev_ylorrd_palette,
    domain = map_data$NetEnergy,
    na.color = "#bdbdbd"
  )

  # Create popups 
  popup_content <- paste0(
    "<strong>State: </strong>", map_data$State_Name, "<br>",
    "<strong>Net Energy (Prod-Cons): </strong>", prettyNum(round(map_data$NetEnergy,0), big.mark=","), " Billion Btu<br>",
    "<strong>Total Production: </strong>", prettyNum(round(map_data$TotalProduction,0), big.mark=","), " Billion Btu<br>",
    "<strong>Total Consumption: </strong>", prettyNum(round(map_data$TotalConsumption,0), big.mark=","), " Billion Btu<br>",
    "<em>Production Sources (% of Total):</em><br>",
    "&nbsp;&nbsp;Coal: ", round(map_data$PercentCoal, 1), "%<br>",
    "&nbsp;&nbsp;Natural Gas: ", round(map_data$PercentNaturalGas, 1), "%<br>",
    "&nbsp;&nbsp;Crude Oil: ", round(map_data$PercentCrudeOil, 1), "%<br>",
    "&nbsp;&nbsp;Renewable: ", round(map_data$PercentRenewable, 1), "%<br>",
    "&nbsp;&nbsp;Nuclear: ", round(map_data$PercentNuclear, 1), "%<br>",
    "<em>Dependency Insights:</em><br>",
    ifelse(map_data$NetEnergy >= 0, 
           paste0("&nbsp;&nbsp;Status: Energy Surplus (Produces ", round(map_data$DependencyRatio * 100,0), "% of consumption)"),
           paste0("&nbsp;&nbsp;Status: Energy Deficit (Produces ", round(map_data$DependencyRatio * 100,0), "% of consumption, reliant on imports)")
    )
  )

  leaflet_map <- leaflet(map_data) %>%
    addProviderTiles(providers$CartoDB.Positron) %>%
    setView(lng = -98.5795, lat = 39.8283, zoom = 4) %>%
    addPolygons(
      fillColor = ~pal_net_energy(NetEnergy),
      fillOpacity = 0.8,
      color = "#000000",
      weight = 1,
      smoothFactor = 0.5,
      highlightOptions = highlightOptions(
        weight = 3,
        color = "#666",
        fillOpacity = 0.9,
        bringToFront = TRUE
      ),
      label = ~State_Name,
      popup = ~popup_content,
      labelOptions = labelOptions(
        style = list("font-weight" = "normal", padding = "3px 8px"),
        textsize = "15px",
        direction = "auto"
      )
    ) %>%
    addLegend(
      pal = pal_net_energy,
      values = ~NetEnergy,
      opacity = 0.7,
      title = "Net Energy (Billion Btu)<br>(Red: Deficit, Yellow: Surplus)",
      position = "bottomright"
    )

} else {
  message("Map data is empty or NetEnergy column is missing. Cannot generate map. Check data loading and preparation steps.")
}

# Display the map (only if it was created above)
if (exists("leaflet_map")) leaflet_map
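
The legend labels red as deficit and yellow as surplus, but a sequential palette fitted to the raw range of NetEnergy does not place any particular color at zero. One possible alternative, sketched here with hypothetical values, is a diverging palette over a domain that is symmetric around zero, so the midpoint color marks the break-even point. The "RdYlBu" palette and the sample values are assumptions for illustration, not choices made in this report.

```r
library(leaflet)

# Hypothetical NetEnergy values (Billion Btu); not SEDS figures
net_energy <- c(-4000, -1500, 0, 2500, 12000)

# Symmetric domain so that zero maps to the middle of the palette
limit <- max(abs(net_energy))
pal_diverging <- colorNumeric(
  palette  = "RdYlBu",        # red end = deficit, blue end = surplus
  domain   = c(-limit, limit),
  na.color = "#bdbdbd"
)

# One hex color per input value
colors_out <- pal_diverging(net_energy)
print(colors_out)
```

When surpluses are much larger than deficits, as in this toy vector, the deficit side uses only part of the color ramp. That is the trade-off for keeping zero at the midpoint.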

Narrative Overview

Key Regional Patterns and Dependencies

The interactive map reveals distinct regional patterns in U.S. energy production and consumption. Texas, Wyoming, and Pennsylvania exhibit high total energy production, driven largely by fossil fuels: crude oil and natural gas in Texas, coal in Wyoming, and natural gas in Pennsylvania. Conversely, states in New England and on the Pacific Coast (e.g., California, despite significant in-state production) often show higher consumption relative to their production, indicating reliance on energy imports from other states or countries. The Midwest displays a mixed profile: some states are significant producers of coal and renewables (such as wind in Iowa), while others are net consumers. Vulnerabilities are most apparent in states with low overall production and high consumption, and in states that rely heavily on a single energy source exposed to market volatility or policy changes.
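
One way to make "heavy reliance on a single energy source" concrete is to look at the largest source share in each state. The sketch below does this on hypothetical percentages, using the column names from the data preparation step. The 60% cutoff is an arbitrary illustration, not a threshold used in this report.

```r
# Hypothetical production shares (% of total); not SEDS figures
shares <- data.frame(
  State_Name        = c("StateA", "StateB"),
  PercentCoal       = c(85, 10),
  PercentNaturalGas = c(5, 30),
  PercentCrudeOil   = c(5, 20),
  PercentRenewable  = c(5, 25),
  PercentNuclear    = c(0, 15)
)

share_cols <- setdiff(names(shares), "State_Name")
share_mat  <- as.matrix(shares[, share_cols])

# Largest share per state and the source it belongs to
shares$DominantSource <- share_cols[max.col(share_mat, ties.method = "first")]
shares$DominantShare  <- unname(apply(share_mat, 1, max))

# Flag states above an illustrative (assumed) concentration cutoff
shares$SingleSourceReliant <- shares$DominantShare > 60

print(shares[, c("State_Name", "DominantSource", "DominantShare", "SingleSourceReliant")])
```

Here "StateA" is flagged because coal makes up 85% of its production, while "StateB" has a more balanced mix and is not.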

Implications for U.S. Energy Security and Policy

These regional disparities have significant implications for U.S. energy security. States with surplus energy, particularly from diverse sources, contribute to national resilience. However, regions dependent on external sources are vulnerable to supply disruptions and price shocks. Policy considerations should focus on promoting diverse energy portfolios within states, investing in inter-state energy infrastructure to improve distribution, and fostering renewable energy development nationwide to reduce overall import dependency and enhance environmental sustainability. Furthermore, understanding specific state-level vulnerabilities (e.g., reliance on a single aging power plant type or a dominant fuel source with price instability) can help tailor targeted policies to bolster local and national energy security, ensuring a more robust and adaptable energy system for the future.