2025-06-14

Project Introduction

  • I chose a data set from Kaggle that represents the CO2 emissions from each of the 50 states since 1970. The data set also includes information about D.C. and the United States as a whole. The emissions data is further broken down into subcategories including the sector of origin and type of fuel. Sector of origin includes residential, commercial, transportation, electric power, industrial, and total carbon emissions. Fuel type includes coal, petroleum, natural gas, and all fuels. The value of carbon dioxide emissions is reported in millions of metric tons.

  • This is the link to the data set I chose to work with: https://www.kaggle.com/datasets/alistairking/u-s-co2-emissions/data

Problems

Through graphical analysis of the emissions data, I want to identify the state, sector, and fuel type that have historically contributed the most CO2. I also want to examine the overall trend in CO2 emissions since 1970, as well as the trends by state, sector, and fuel type.

To begin my analysis, I wanted to look at the general trend of carbon dioxide emissions since 1970. To do so, I grouped the emissions data by year in a new data frame:

yearly_emissions = co2_df %>% group_by(year) %>% summarise(total_emissions = sum(value, na.rm = TRUE), .groups = 'drop')
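
The slides show the resulting chart but not the plotting code. A minimal sketch of how the yearly totals could be drawn with ggplot2 follows. The toy rows standing in for co2_df and the name trend_plot are assumptions for illustration, not the original code or data.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for co2_df (made-up numbers); the real data comes from the Kaggle CSV.
co2_df <- data.frame(
  year  = c(1970, 1970, 1971, 1971),
  value = c(10, 5, 12, NA)
)

# Same grouping as in the slide: one total per year, ignoring missing values.
yearly_emissions <- co2_df %>%
  group_by(year) %>%
  summarise(total_emissions = sum(value, na.rm = TRUE), .groups = "drop")

# Line chart of total emissions per year; call print(trend_plot) to render it.
trend_plot <- ggplot(yearly_emissions, aes(x = year, y = total_emissions)) +
  geom_line() +
  labs(
    title = "General Carbon Dioxide Trends",
    x = "Year",
    y = "CO2 emissions (millions of metric tons)"
  )
```

Because the data set also carries aggregate rows (the United States as a whole, total emissions across sectors, and all fuels), summing every row counts those aggregates too. The absolute totals are therefore inflated, although the shape of the trend should be unaffected if the aggregate rows are exact sums.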

General Emissions Trends

General Carbon Dioxide Trends

Emissions by Sector

Trends in Sectors of Carbon Dioxide Production
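
The sector-level grouping is not shown in the slides. By analogy with the other data frames in this presentation, it might look like the sketch below. The column name sector.name and the names sector_emissions and total_sector_emissions are assumptions, as are the toy rows.

```r
library(dplyr)

# Toy stand-in for co2_df (made-up numbers); sector.name is an assumed column name.
co2_df <- data.frame(
  sector.name = c("Residential", "Residential", "Residential", "Transportation"),
  year        = c(1970, 1970, 1971, 1970),
  value       = c(3, 2, 4, 7)
)

# One total per sector and year, mirroring the state and fuel groupings.
sector_emissions <- co2_df %>%
  group_by(sector.name, year) %>%
  summarise(total_sector_emissions = sum(value, na.rm = TRUE), .groups = "drop")
```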

Emissions By State

Emissions By State

The first graph I rendered is difficult to read because the data bunches towards the bottom of the graph. To fix this, I filtered my data further and removed “United States” from my list of states. The “United States” entry represents the total emissions for each year, so it was skewing my data set.

I did this by adding a filter in my data frame definition.

state_emissions1 = co2_df %>% filter(state.name != "United States") %>% group_by(state.name, year) %>% summarise(total_state_emissions1 = sum(value, na.rm = TRUE), .groups = 'drop')
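
With the national total filtered out, the same data frame can also be used to rank states by their cumulative emissions, which answers the "largest historical contributor" question numerically as well as graphically. This is a sketch under assumptions: the toy rows are made up, and state_ranking and cumulative_emissions are hypothetical names.

```r
library(dplyr)

# Toy stand-in for co2_df (made-up numbers), including the national total row.
co2_df <- data.frame(
  state.name = c("Texas", "Texas", "Vermont", "Vermont",
                 "United States", "United States"),
  year       = c(1970, 1971, 1970, 1971, 1970, 1971),
  value      = c(50, 60, 1, 2, 51, 62)
)

# Same filter and grouping as in the slide.
state_emissions1 <- co2_df %>%
  filter(state.name != "United States") %>%
  group_by(state.name, year) %>%
  summarise(total_state_emissions1 = sum(value, na.rm = TRUE), .groups = "drop")

# Cumulative emissions per state over all years, largest first.
state_ranking <- state_emissions1 %>%
  group_by(state.name) %>%
  summarise(cumulative_emissions = sum(total_state_emissions1), .groups = "drop") %>%
  arrange(desc(cumulative_emissions))
```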

Revised Emissions By State

Emissions By State Continued

The graph on the previous slide is a large improvement over the first but still leaves something to be desired. There are too many colors, and they are hard to tell apart. To fix this, I made the graph interactive using plotly.
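
The slides do not say how plotly was applied. One common route is to build the ggplot2 chart as before and pass it to ggplotly(), which is sketched below. The toy data and the names state_plot and interactive_plot are assumptions; the original may have used plot_ly() directly instead.

```r
library(ggplot2)
library(plotly)

# Toy stand-in for state_emissions1 (made-up numbers).
state_emissions1 <- data.frame(
  state.name             = c("Texas", "Texas", "California", "California"),
  year                   = c(1970, 1971, 1970, 1971),
  total_state_emissions1 = c(50, 60, 40, 45)
)

# Static line chart: one colored line per state.
state_plot <- ggplot(
  state_emissions1,
  aes(x = year, y = total_state_emissions1, color = state.name)
) +
  geom_line() +
  labs(
    title = "Interactive Emissions by State",
    x = "Year",
    y = "CO2 emissions (millions of metric tons)",
    color = "State"
  )

# ggplotly() converts the ggplot object into an interactive plotly widget,
# so hovering over a line shows the state, year, and value.
interactive_plot <- ggplotly(state_plot)
```

In the interactive version, individual states can be hidden or isolated by clicking their legend entries, which sidesteps the problem of telling many similar colors apart.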

Interactive Emissions by State

Emissions By State (Final)

The interactive graph on the previous slide is a great representation of the data we are analyzing. It’s easy to identify Texas and California as the largest contributors to overall carbon dioxide emissions. However, their general trends do not necessarily match the overall trend we saw in emissions at the beginning of the presentation.

Lastly, I wanted to look at the trends in emissions based on the usage of certain fuel types. I did this by grouping emissions by fuel type and year into a new data frame.

fuel_emissions = co2_df %>% group_by(fuel.name, year) %>% summarise(total_fuel_emissions = sum(value, na.rm = TRUE), .groups = 'drop')

Emissions By Fuel Type

Final Thoughts

As the graph on the previous slide shows, the sharp decrease in overall emissions beginning in 2007 coincides with the decline of coal as a fuel source. Of all the trends we have seen so far, this seems to be the best explanation of why overall carbon dioxide emissions have fallen since 2007.
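
The coal observation could also be checked numerically by comparing each fuel's emissions in 2007 with a later year. The sketch below shows the idea only: the numbers are invented toy values rather than results from the data set, and the end year 2020 and the names fuel_change and change_since_2007 are assumptions.

```r
library(dplyr)

# Toy stand-in for fuel_emissions; these values are made up, not real results.
fuel_emissions <- data.frame(
  fuel.name            = c("Coal", "Coal", "Natural Gas", "Natural Gas"),
  year                 = c(2007, 2020, 2007, 2020),
  total_fuel_emissions = c(100, 40, 60, 80)
)

# Change in emissions per fuel between 2007 and the assumed end year (2020).
# Negative values mean the fuel's emissions fell; the largest drop sorts first.
fuel_change <- fuel_emissions %>%
  filter(year %in% c(2007, 2020)) %>%
  group_by(fuel.name) %>%
  summarise(
    change_since_2007 = total_fuel_emissions[year == 2020] -
      total_fuel_emissions[year == 2007],
    .groups = "drop"
  ) %>%
  arrange(change_since_2007)
```

If coal shows by far the largest negative change on the real data, that would support the reading of the graph given above.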

This data by itself provides a broad understanding of carbon dioxide emissions in the United States since 1970. It lets us draw broad generalizations and sets up the framework for further analysis. The information in this data set can be combined with other data sets to draw deeper connections and perhaps find ways to reduce emissions. Data sets worth exploring in combination with this one include population density, demographics, or even the top-selling vehicle each year.

Thank you!