U.S. Energy Plants Analysis

Introduction

Climate change is a point of contention in the United States. Anthropogenic emissions, the physical cause of climate change, remain high across the country despite the significant risks they pose to domestic and international security and well-being. These emissions are released by the combustion of fossil fuels such as coal, oil, and natural gas. Thousands of fossil fuel power plants currently operate in the U.S., producing tens of millions of megawatt hours of energy and releasing millions of tons of emissions per year. In 2023, for example, American fossil fuel power plants produced 73.62 million MWh (EIA 2024) and released 4.770 million metric tons of carbon dioxide (EIA 2024). Renewable energy power plants offer an alternative form of energy production that emits far less carbon dioxide. These plants harvest the energy of natural phenomena such as wind, sunlight, and flowing water, and unlike fossil fuels, these sources will never be depleted. However, states are divided over the prospect of a transition, both in the types of energy they produce and in the types their residents prefer, because of differences in party affiliation, domestic industries, and culture (Baz 2023). Given the time-sensitivity of the climate crisis and the need to channel public support for a transition, it is worth investigating whether Americans’ opinions about the merits of renewables over fossil fuels align with the actual energy portfolios of the states where they live and are politically engaged. The degree of alignment could help reveal how this problem should be approached from a civic perspective and what American climate organizers, policymakers, and concerned citizens should consider as they work to facilitate the transition. This project seeks to quantify and visualize this alignment.

Data

Two sets of data were used in this project: a CSV file listing every power plant in the United States and a set of responses to questions about climate change and energy production. The former comes from the Energy Information Administration, a division of the United States Department of Energy. In addition to the name of each power plant, the file contains its city, state, coordinates, capacity in MW, and type (petroleum, coal, hydroelectric, etc.). I downloaded this data set from the power plants section of the EIA’s Energy Atlas directly as a CSV file that did not need to be edited before I imported it into R. The second data set was published by the Yale Program on Climate Change Communication and consists of the responses to a national opinion survey on climate change and energy that Yale conducted. These data were also in CSV format, with two columns dedicated to each question: one giving the percentage of people who answered yes and the other the percentage who answered no. A separate column indicated the state where the respondents lived. Before importing these data into R, I marked the questions most relevant to my research question, namely those that touched on funding and expanding renewable energy and moving away from fossil fuels. This choice allowed me to estimate, as accurately as I could, the percentage of residents in each state who prefer renewables over fossil fuels.

Professor Davies also provided me with a shapefile of the United States that was well-suited for the type of visualization I was hoping to achieve, as well as the steps the visualization involves. The raw shapefile was drawn from ArcGIS.

Methods

The first step I took was to load the necessary packages and the data.

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(sf)

## Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE

library(scales)

## 
## Attaching package: 'scales'
## 
## The following object is masked from 'package:purrr':
## 
##     discard
## 
## The following object is masked from 'package:readr':
## 
##     col_factor

library(tigris)

## To enable caching of data, set `options(tigris_use_cache = TRUE)`
## in your R script or .Rprofile.

#Loading the opinion data
opinion_data<-read_csv("Data/YCOM7_publicdata.csv")

## Rows: 5473 Columns: 66
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): geotype, geoname
## dbl (64): geoid, count, discuss, discussOppose, reducetax, reducetaxOppose, ...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

#Loading the power plant data
power_plants<-read_csv("Data/Power_Plants.csv")

## Rows: 12798 Columns: 34
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (11): Plant_Name, Utility_Name, sector_name, Street_Address, City, Count...
## dbl (23): X, Y, OBJECTID, Plant_Code, Utility_ID, Zip, Install_MW, Total_MW,...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

#Loading the states shapefile
#Using st_read to import the shape file
states<-st_read("Data/US_State_Boundaries/US_State_Boundaries.shp")

## Reading layer `US_State_Boundaries' from data source 
##   `/Users/sahmschiller/Library/CloudStorage/Box-Box/R/Final Project/Data/US_State_Boundaries/US_State_Boundaries.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 53 features and 16 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -179.1473 ymin: 17.6744 xmax: 179.7784 ymax: 71.38921
## Geodetic CRS:  NAD83

#Using the filter function to drop the features with OBJECTID 26 and 27, leaving the fifty states and the District of Columbia
unitedStates<-states %>% 
  filter(OBJECTID %in% c(1:25,28:60))

#Using shift_geometry to move Alaska and Hawaii beneath contiguous US for better visuals
unitedStatesFinal<-shift_geometry(unitedStates,position=c("below"))

Data Wrangling

I then selected for the questions in the opinions data that I identified prior to download, as I mentioned above. I assigned these questions to a new data frame. I also created a column that indicated the average of these responses, which I took to be the percentage of people in each state who preferred renewables.

#Selecting opinion data for relevant questions and states
selectedOpinion<-opinion_data%>%
  select(geoname,drillanwrOppose,
         drilloffshoreOppose,fundrenewables,generaterenewable,
         prioritycleanenergy,regulate)
energyOpinion<-selectedOpinion[2:52,]

#Take the row-wise mean of the six question columns for each state and store it in a new column, na.rm to skip N/As
energyOpinion$average<-rowMeans(energyOpinion[, 2:7],na.rm=TRUE)
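The row selection above depends on the state-level rows sitting in positions 2 through 52 of the file. As a minimal sketch, the same step can select rows by label instead. The table below is made up, and the assumption that `geotype` marks state rows with the value "state" is not confirmed by the source, so it should be checked against the real file first.

```r
library(dplyr)

# Toy stand-in for the opinion data; the values and the "state" label are assumptions
toy_opinion <- tibble(
  geotype = c("national", "state", "state", "county"),
  geoname = c("US", "Alabama", "Alaska", "Autauga County, Alabama"),
  fundrenewables = c(77, 74, 72, 70),
  regulate = c(73, 68, NA, 65)
)

# Keep only state-level rows, selected by label rather than by row position
toy_states <- toy_opinion %>%
  filter(geotype == "state") %>%
  select(geoname, fundrenewables, regulate)

# Row-wise mean across the question columns; na.rm = TRUE skips missing answers
toy_states$average <- rowMeans(toy_states[, c("fundrenewables", "regulate")], na.rm = TRUE)
toy_states
```

Selecting by label keeps the step correct even if the row order of the file changes in a later release.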

Next, I joined the opinion and power plant data with the states shapefile and ensured that these data could be mapped.

#Joining data with states via left_join by the columns indicating state
energyOpinionJoin<-left_join(energyOpinion,unitedStatesFinal,by=c("geoname"="NAME"))
plantsJoined<-left_join(unitedStatesFinal,power_plants,by=c("NAME"="State"))
#Ensuring that these data sets are able to be plotted as a map via st_as_sf
plantsFinal<-st_as_sf(plantsJoined)
energyOpinionMappable<-st_as_sf(energyOpinionJoin)
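A join on state names silently leaves rows unmatched when the spelling differs between files. The sketch below, on made-up key columns, shows one way to check for this with `anti_join()`; the mismatch in capitalization is invented for illustration and is not a claim about the actual data.

```r
library(dplyr)

# Toy key columns; the real ones would be energyOpinion$geoname and unitedStatesFinal$NAME
toy_opinion <- tibble(geoname = c("Alabama", "Alaska", "District of Columbia"))
toy_shapes <- tibble(NAME = c("Alabama", "Alaska", "District Of Columbia"))

# Rows of the left table that have no partner in the right table
unmatched <- anti_join(toy_opinion, toy_shapes, by = c("geoname" = "NAME"))
unmatched$geoname
```

An empty result means every state in the left table found a match, so the joined data will have a geometry for each row.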

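To prepare the power plants data, I counted the plants of each type in each state and pivoted the result so that each type has its own column. I then added columns for the number of renewable plants, the number of fossil fuel plants, and their sum per state. This total covers only the plant types listed in the two vectors below, so plants with any other primary source are left out of the proportion.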

#Power plant data wrangling
#Selecting for state and primary source of energy
powerPlantsTrimmed<-power_plants%>%
  select(State,PrimSource)
#Pivot_wider to create column that indicates count of each type of plant per state
powerPlantsPivoted<-powerPlantsTrimmed%>%
  count(State,PrimSource)%>%
  pivot_wider(names_from=PrimSource,values_from=n,values_fill=0)

#Create vectors that represent count of renewables and fossil fuels, respectively
renewable_count<-c("hydroelectric", "solar", "geothermal", "wind", "pumped storage", "biomass")
fossil_count<-c("coal","natural gas","petroleum")
#Create new columns that indicate sums of renewables and fossil fuels using above vectors 
powerPlantsPivoted$renewablesCount<-rowSums(powerPlantsPivoted[renewable_count],na.rm =TRUE)
powerPlantsPivoted$fossilCount<-rowSums(powerPlantsPivoted[fossil_count],na.rm=TRUE)

#Create column that combines counts of fossil fuel and renewable plants to establish total number per state
powerPlantsPivoted$totalCount<-powerPlantsPivoted$fossilCount+powerPlantsPivoted$renewablesCount
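To make the counting step concrete, here is a small sketch of the same count-and-pivot pattern on made-up plants. The `intersect()` guard is an addition for illustration, not part of the original code. It only matters when one of the listed sources never appears in the data and therefore has no column after pivoting.

```r
library(dplyr)
library(tidyr)

# Made-up plants, including one source (nuclear) that belongs to neither group
toy_plants <- tibble(
  State = c("Iowa", "Iowa", "Iowa", "Ohio", "Ohio", "Ohio"),
  PrimSource = c("wind", "wind", "coal", "natural gas", "solar", "nuclear")
)

# One row per state, one column per source, filled with plant counts
toy_wide <- toy_plants %>%
  count(State, PrimSource) %>%
  pivot_wider(names_from = PrimSource, values_from = n, values_fill = 0)

renewable_sources <- c("hydroelectric", "solar", "geothermal", "wind", "pumped storage", "biomass")
fossil_sources <- c("coal", "natural gas", "petroleum")

# intersect() keeps only the sources that actually occur as columns
toy_wide$renewablesCount <- rowSums(toy_wide[intersect(renewable_sources, names(toy_wide))])
toy_wide$fossilCount <- rowSums(toy_wide[intersect(fossil_sources, names(toy_wide))])

# Percentage of renewable plants among renewable plus fossil plants
toy_wide$proportion <- 100 * toy_wide$renewablesCount /
  (toy_wide$renewablesCount + toy_wide$fossilCount)
toy_wide
```

In this toy table the nuclear plant in Ohio is counted in neither group, so it does not enter the proportion.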

I then used these columns to create another for the proportion of renewable power plants per state.

#Create the plant proportion column by dividing renewables by total count of plants
powerPlantsPivoted$proportion<-powerPlantsPivoted$renewablesCount/powerPlantsPivoted$totalCount
#Multiplying by 100 for consistency with opinion data (converting to a percentage)
powerPlantsPivoted$proportion<-100*(powerPlantsPivoted$proportion)

To numerically capture the alignment between these percentages, I merged the two data sets.

#Combining powerPlantsPivoted with energyOpinionMappable
#Rename state column in energyOpinionMappable to allow for merge
names(energyOpinionMappable)[names(energyOpinionMappable) == "geoname"]<-"State"
#Merge data sets
mergedData<-merge(powerPlantsPivoted,energyOpinionMappable,by = "State")
#Select for state, average (opinion), and proportion (energy portfolio)
countData<-mergedData %>%
  select(State,average,proportion,geometry)
#Ensuring data is mappable
countMappableData<-st_as_sf(countData)

I then created a column that indicates the difference between the percentage of plants in each state that are renewable and the percentage of residents that prefer renewables.

#Subtracting the opinion percentage from the plant percentage to find difference
countMappableData$difference<-countMappableData$proportion-countMappableData$average
#Reordering columns for legibility
countMappableData <- countMappableData[, c(1,3,2,5,4)]

Count is an important metric for understanding the energy portfolio of each state, but so is capacity. I therefore repeated the above process for capacity so that I could integrate the percentage of each state’s energy capacity sourced from renewables into my analysis.

#Reperforming steps for capacity
#Creating a new data set that calculates renewable capacity
renewable_capacity<-power_plants %>%
  filter(PrimSource%in%c("hydroelectric","solar","geothermal","wind","pumped storage","biomass"))%>%
  group_by(State)%>%
  summarize(renewable_capacity=sum(Total_MW,na.rm=TRUE))
#Doing the same for fossil capacity
fossil_capacity<-power_plants%>%
  filter(PrimSource%in%c("coal","natural gas","petroleum"))%>%
  group_by(State)%>%
  summarize(fossil_capacity=sum(Total_MW,na.rm =TRUE))

#Merging these data sets
capacityRF<-merge(renewable_capacity,fossil_capacity,by="State")
#Making it mappable
capacityMerged<-merge(capacityRF,energyOpinionMappable,by = "State")
capacityMappable<-st_as_sf(capacityMerged)

#Selecting for relevant columns
capacityFinal<-capacityMappable%>%
  select(State,renewable_capacity,fossil_capacity,geometry,average)
#Creating column that indicates total capacity
capacityFinal$totalMW<-capacityFinal$renewable_capacity+capacityFinal$fossil_capacity

#Creating column that is proportion of renewable capacity compared to overall capacity
capacityFinal$proportion<-capacityFinal$renewable_capacity/capacityFinal$totalMW
#Multiplying proportion by 100 for consistency with opinion data
capacityFinal$proportion<-100*(capacityFinal$proportion)

#Cleaning up dataset to only include relevant columns
capacityFinal$renewable_capacity<-NULL
capacityFinal$fossil_capacity<-NULL
capacityFinal$totalMW<-NULL
#Creating column that calculates difference between proportion and opinion
capacityFinal$difference<-capacityFinal$proportion-capacityFinal$average
#Reordering columns
capacityFinal <- capacityFinal[, c(1,4,2,5,3)]
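The capacity shares can also be computed in a single grouped summary. This is a sketch on made-up numbers, not the pipeline used above. Its one practical difference is that a state with only renewable or only fossil capacity stays in the result, whereas merging two separately filtered tables would drop it.

```r
library(dplyr)

renewable_sources <- c("hydroelectric", "solar", "geothermal", "wind", "pumped storage", "biomass")
fossil_sources <- c("coal", "natural gas", "petroleum")

# Made-up plants; Vermont has renewable capacity only
toy_plants <- tibble(
  State = c("Iowa", "Iowa", "Ohio", "Ohio", "Vermont"),
  PrimSource = c("wind", "coal", "natural gas", "solar", "hydroelectric"),
  Total_MW = c(300, 100, 900, 100, 50)
)

# Sum capacity per state within each group of sources, then take the renewable share
toy_capacity <- toy_plants %>%
  group_by(State) %>%
  summarize(
    renewable_capacity = sum(Total_MW[PrimSource %in% renewable_sources], na.rm = TRUE),
    fossil_capacity = sum(Total_MW[PrimSource %in% fossil_sources], na.rm = TRUE)
  ) %>%
  mutate(proportion = 100 * renewable_capacity / (renewable_capacity + fossil_capacity))
toy_capacity
```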

Because capacity was the more important energy metric for my analysis, I then rescaled the difference between the capacity and opinion percentages to the range -1 to 1 and attached it as a new column. Values greater than 0 indicate that a state’s percentage of renewable capacity exceeds the percentage of its population that prefers renewables, and values less than 0 indicate the opposite.

#Rescaling difference around 0 with rescale_mid() so that the sign of the difference is preserved
#(plain rescale() would map the midpoint of the observed range, not a difference of 0, to 0)
capacityFinal$rescaledDifference<-rescale_mid(capacityFinal$difference, to = c(-1, 1), mid = 0)
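The choice of rescaling function matters for a signed difference. `scales::rescale()` maps the observed minimum and maximum onto the ends of the target range, so an original difference of 0 lands on 0 only if the range happens to be symmetric. `scales::rescale_mid()` fixes a chosen midpoint instead. A small illustration with made-up values:

```r
library(scales)

# Made-up differences in percentage points: mostly negative, one positive
toy_difference <- c(-60, -20, 0, 20)

# rescale() sends the minimum to -1 and the maximum to 1,
# so here a difference of -20 (the midpoint of the range) becomes 0
plain <- rescale(toy_difference, to = c(-1, 1))

# rescale_mid() keeps a difference of 0 at 0 and scales both sides
# by the larger of the two extremes
centred <- rescale_mid(toy_difference, to = c(-1, 1), mid = 0)

plain
centred
```

With the centred version, the sign of the rescaled value always matches the sign of the original difference, which is what a diverging color scale with `midpoint = 0` assumes.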

Visualizations

I created three visualizations for this project. The first was a map illustrating the difference in capacity and preference.

AlignmentPlot<-ggplot(capacityFinal)+
  geom_sf(aes(fill=rescaledDifference))+
  #Scale_fill_gradient2 for divergent color scale
  scale_fill_gradient2(low = "red", mid = "white", high = "blue", midpoint = 0,
                       name="Alignment")+
  labs(title="Alignment between Energy Portfolio and Energy Opinions by State")+
  theme_void()
AlignmentPlot

#Saving and adjusting size for poster
ggsave("AlignmentMap.jpg",AlignmentPlot,width=23,height=15,units="in")

The next was a scatterplot of the data used for this map. Each point represents a state. I also tested the correlation between these variables.

AlignmentScatter<-ggplot(capacityFinal,aes(average,proportion))+
  geom_point()+
  geom_smooth(method="lm",se=FALSE)+
  labs(title="Renewable Capacity Proportion vs. Preference",x="% of people who prefer renewables over fossil fuels",
       y="% of energy capacity drawn from renewables")
ggsave("AlignmentScatter.jpg",AlignmentScatter,width=5,height=5,units="in")

## `geom_smooth()` using formula = 'y ~ x'

AlignmentScatter

## `geom_smooth()` using formula = 'y ~ x'

#Correlation between capacity and preference
cor.test(capacityFinal$proportion,capacityFinal$average)

## 
##  Pearson's product-moment correlation
## 
## data:  capacityFinal$proportion and capacityFinal$average
## t = 1.4943, df = 49, p-value = 0.1415
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.07089779  0.45799880
## sample estimates:
##       cor 
## 0.2087648
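The figures in this output can be reproduced from the correlation coefficient and the sample size alone, which helps show how they relate. The helper below is a hypothetical function written for this illustration. It uses the standard t statistic for a Pearson correlation and the Fisher z-transformation for the confidence interval, with n = 51 inferred from the reported df = 49.

```r
# Hypothetical helper: test statistic, p-value and confidence interval from r and n
cor_summary <- function(r, n, level = 0.95) {
  df <- n - 2
  # t statistic for H0: true correlation equals 0
  t_stat <- r * sqrt(df) / sqrt(1 - r^2)
  # Two-sided p-value from the t distribution
  p_value <- 2 * pt(-abs(t_stat), df)
  # Confidence interval via the Fisher z-transformation
  z <- atanh(r)
  half_width <- qnorm(1 - (1 - level) / 2) / sqrt(n - 3)
  list(t = t_stat, p = p_value, ci = tanh(c(z - half_width, z + half_width)))
}

capacity_check <- cor_summary(r = 0.2087648, n = 51)
capacity_check
```

Because the confidence interval spans 0, the sign of the underlying relationship is itself uncertain at this sample size.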

The final visualization I made was also a scatterplot but included the percentage of renewable plants per state instead of their capacity. I tested the correlation here too.

CountScatter<-ggplot(countMappableData,aes(average,proportion))+
  geom_point()+
  geom_smooth(method="lm",se=FALSE)+
  labs(title="Renewable Plant Count vs. Preference",x="% of people who prefer renewables over fossil fuels",
       y="% of total plants that are renewable")
ggsave("CountScatter.jpg",CountScatter,width=5,height=5,units="in")

## `geom_smooth()` using formula = 'y ~ x'

CountScatter

## `geom_smooth()` using formula = 'y ~ x'

#Correlation between count and preference
cor.test(countMappableData$average,countMappableData$proportion)

## 
##  Pearson's product-moment correlation
## 
## data:  countMappableData$average and countMappableData$proportion
## t = 3.4654, df = 49, p-value = 0.00111
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1914935 0.6408884
## sample estimates:
##       cor 
## 0.4436639

Results and Conclusions

The findings demonstrate that the level of alignment between preference and actual energy portfolios varies widely across the United States. In many states, the share of generating capacity that is renewable falls below the share of residents who prefer renewables. These states are concentrated in the eastern half of the country. The western half shows the opposite pattern, with the percentage of renewable capacity generally exceeding the percentage of residents who prefer renewables. A few states run counter to this trend, such as New Hampshire, Arizona, and Utah, while Texas, Colorado, Nebraska, and Minnesota have roughly equal proportions of renewable capacity and preference. However, this geographic trend could be the result of the west being more conducive to renewable energy production, given its greater abundance of sunshine, wind, and land.

The correlation between capacity and preference was weak and not statistically significant (r = 0.21, p = 0.14), so these data provide no clear evidence that renewable capacity tracks public opinion across the states. The correlation between count and preference was statistically significant but only moderate (r = 0.44, p = 0.001). This correlation might be due to the high number of renewable plants across the country. It is worth noting, however, that many of these plants are small-scale solar and wind farms with much lower generating capacities than fossil fuel plants. Thus, while the count of renewable plants appears to align more closely with Americans’ preferences across the states, most of their energy might still come from fossil fuels as a result of this difference in capacity.

Overall, these results suggest that state energy portfolios do not consistently align with the preferences of Americans. I was pleasantly surprised to see that many of the states I had understood to be opposed to renewables in fact supported them more than their renewable capacity and plant counts reflected. It was both worrying and relieving to see that so many states fell into this category: it indicates the challenge that lies before the transition but also the popularity that renewable energy advocates could mobilize. I found the states in blue surprising as well, as it had not occurred to me that a state could have a more robust renewable energy portfolio than its residents prefer. Upon reflection, this disparity could be attributable to the strength of the industry in the face of state-level opposition. These results thus display both the popularity and the industrial power of renewable energy, and they show which states lack either. Resources could be invested in uncovering the source of the discrepancy in each state, through both research and advocacy. States with an alignment score below 0 reflect public opinion that can be harnessed, while states with scores above 0 highlight the influence of other factors that renewable energy advocates can leverage to tilt the scales in their favor. The American energy transition is possible, and its vanguards ought to factor this variation in alignment into their strategy going forward.

References

U.S. Energy Information Administration. “What Is U.S. Electricity Generation by Energy Source?” U.S. Energy Information Administration, February 29, 2024. https://www.eia.gov/tools/faqs/faq.php?id=427&t=3.
U.S. Energy Information Administration. “Short-Term Energy Outlook - U.S. Energy Information Administration (EIA),” 2024. https://www.eia.gov/outlooks/steo/report/total.php.
Baz, Lama El. “Republicans and Democrats Continue to Clash over Climate Change.” Globalaffairs.org, 2023. https://globalaffairs.org/research/public-opinion-survey/republicans-and-democrats-continue-clash-over-climate-change.
U.S. Energy Information Administration. “Power Plants.” U.S. Energy Atlas, n.d. https://atlas.eia.gov/datasets/eia::power-plants/explore.
Yale Program on Climate Change Communication. “Yale Climate Opinion Maps 2023 - Yale Program on Climate Change Communication,” November 22, 2024. https://climatecommunication.yale.edu/visualizations-data/ycom-us/.
“US State Boundaries,” n.d. https://hub.arcgis.com/datasets/TrainingServices::us-state-boundaries/explore?location=15.848048%2C-65.721188%2C3.01.