Project 1

2025-06-14

Project Introduction

I chose a Kaggle data set of annual CO₂ emissions for each of the 50 states since 1970. It also includes rows for D.C. and for the United States as a whole. Emissions are broken down by sector of origin and by fuel type. The sectors are residential, commercial, transportation, electric power, and industrial, plus a total across all sectors. The fuel types are coal, petroleum, and natural gas, plus a total for all fuels. Emission values are reported in millions of metric tons.
This is the link to the data set I chose to work with: https://www.kaggle.com/datasets/alistairking/u-s-co2-emissions/data
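A minimal sketch of loading the data, assuming the Kaggle CSV has been downloaded locally and that its headers use hyphens such as `state-name` (both are assumptions; the file name `emissions.csv` is hypothetical). By default `read.csv()` passes column names through `make.names()`, which turns `state-name` into `state.name`, the form used in the code throughout this project. The inline `text =` string stands in for the real file so the example runs on its own.

```r
# Stand-in for the real file: a tiny CSV with hyphenated headers (assumed layout, made-up values)
csv_text <- "year,state-name,sector-name,fuel-name,value
1970,Texas,Industrial carbon dioxide emissions,Coal,1.5
1970,Ohio,Residential carbon dioxide emissions,Natural Gas,0.8"

# read.csv() sanitizes names by default (check.names = TRUE), so "state-name" becomes "state.name"
co2_df <- read.csv(text = csv_text)

# With the real download this would be something like (hypothetical file name):
# co2_df <- read.csv("emissions.csv")

names(co2_df)
```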

Problems

Through graphical analysis of the emissions data, I want to identify the state, sector, and fuel type that have historically contributed the most CO₂, and to examine the overall trend in CO₂ emissions since 1970. I will also look at emission trends by state, sector, and fuel type.

To begin my analysis, I wanted to look at the general trend of carbon dioxide emissions since 1970. To do so, I grouped the emissions data by year, creating a new data frame:

library(dplyr)

yearly_emissions = co2_df %>%
  group_by(year) %>%
  summarise(total_emissions = sum(value, na.rm = TRUE), .groups = 'drop')
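According to the introduction, the data set stores aggregate rows (the United States as a whole, a total across all sectors, and an all-fuels category) alongside the detailed rows. Summing every row for a year therefore counts the same emissions more than once. The toy example below uses made-up numbers and only the state dimension to show the effect and one way to avoid it: drop the aggregate rows before summing. The shape of the yearly curve may look similar either way, but the absolute totals are inflated.

```r
library(dplyr)

# Made-up toy data: two states plus a "United States" row equal to their sum
toy <- data.frame(
  year       = c(1970, 1970, 1970, 1971, 1971, 1971),
  state.name = c("Texas", "Ohio", "United States", "Texas", "Ohio", "United States"),
  value      = c(10, 5, 15, 12, 4, 16)
)

# Summing every row counts each state's emissions twice
naive <- toy %>%
  group_by(year) %>%
  summarise(total_emissions = sum(value, na.rm = TRUE), .groups = "drop")

# Dropping the aggregate row first gives the true yearly total
deduped <- toy %>%
  filter(state.name != "United States") %>%
  group_by(year) %>%
  summarise(total_emissions = sum(value, na.rm = TRUE), .groups = "drop")

naive$total_emissions    # 30 32
deduped$total_emissions  # 15 16
```

With the real data, the same idea would apply to the sector and fuel totals; their exact labels are not given here, so checking `unique(co2_df$sector.name)` and `unique(co2_df$fuel.name)` before filtering is a sensible first step.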

General Emissions Trends

General Carbon Dioxide Trends

As the interactive graph on the previous slide shows, carbon dioxide emissions have generally trended downward since 2007, reversing an upward trend that began in 1983.
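One way to build an interactive line chart like this is with plotly's `plot_ly()`. The sketch below uses a small made-up stand-in for `yearly_emissions` with the same column names; it is an illustration, not necessarily how the slide's graph was produced.

```r
library(plotly)

# Made-up stand-in for yearly_emissions (same column names as in the project)
yearly_emissions <- data.frame(
  year = 2005:2010,
  total_emissions = c(10, 12, 14, 13, 11, 9)
)

# Line chart with markers; hovering shows the year and total for each point
fig <- plot_ly(yearly_emissions, x = ~year, y = ~total_emissions,
               type = "scatter", mode = "lines+markers") %>%
  layout(xaxis = list(title = "Year"),
         yaxis = list(title = "CO2 emissions (million metric tons)"))

# Print fig (or leave it as the last expression in a chunk) to render the widget
```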

Next, I wanted to create a graph showing how each sector contributes to the overall total. To do this, I created a new data frame that groups emissions by sector name and year.

sector_emissions = co2_df %>%
  group_by(sector.name, year) %>%
  summarise(total_sector_emissions = sum(value, na.rm = TRUE), .groups = 'drop')
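Since the goal is to show how each sector contributes to the total, a stacked area chart is one natural choice. This sketch uses made-up numbers and assumes the aggregate sector row can be recognized because its name contains "Total" (an assumption about the label). That row is dropped so the stacked areas add up to the total instead of doubling it.

```r
library(dplyr)
library(ggplot2)

# Made-up stand-in for sector_emissions, with an aggregate "Total" row per year
sector_emissions <- data.frame(
  sector.name = rep(c("Electric Power", "Transportation", "Total"), times = 2),
  year = rep(c(2000, 2001), each = 3),
  total_sector_emissions = c(6, 4, 10, 5, 5, 10)
)

# Drop the aggregate row (assumed label) so the stack sums to the real total
sector_parts <- sector_emissions %>%
  filter(!grepl("Total", sector.name))

# geom_area() stacks the sectors by default
p <- ggplot(sector_parts, aes(x = year, y = total_sector_emissions, fill = sector.name)) +
  geom_area() +
  labs(x = "Year", y = "CO2 emissions (million metric tons)", fill = "Sector")
```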

Emissions by Sector

Trends in Sectors of Carbon Dioxide Production

The graph on the previous slide shows that electric power was the largest carbon dioxide contributor until 2015. Since then, commercial carbon dioxide emissions have been the largest contributor to overall emissions in the United States.
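A claim like "largest contributor in each year" can also be checked numerically rather than read off the chart. A minimal sketch with made-up numbers and hypothetical sector labels: group by year and keep the top row with `slice_max()`. With the real data, the aggregate total-of-all-sectors row would need to be excluded first, since it is always the largest.

```r
library(dplyr)

# Made-up sector totals for two hypothetical sectors over three years
sector_emissions <- data.frame(
  sector.name = rep(c("Sector A", "Sector B"), times = 3),
  year = rep(c(2014, 2015, 2016), each = 2),
  total_sector_emissions = c(9, 8, 8, 8.5, 7, 8)
)

# For each year, keep only the sector with the largest emissions
top_sector <- sector_emissions %>%
  group_by(year) %>%
  slice_max(total_sector_emissions, n = 1, with_ties = FALSE) %>%
  ungroup()

top_sector$sector.name  # "Sector A" "Sector B" "Sector B"
```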

Next, I wanted to compare carbon dioxide emissions across states. I did this by grouping emissions by state and year.

state_emissions = co2_df %>%
  group_by(state.name, year) %>%
  summarise(total_state_emissions = sum(value, na.rm = TRUE), .groups = 'drop')

Emissions By State

The first graph I rendered is difficult to read because the data bunch toward the bottom of the graph. To fix this, I filtered my data to remove “United States” from the list of states. The “United States” entry represents total national emissions for each year, so it sits far above every individual state and compresses the state lines at the bottom of the y-axis.

I did this by adding a filter to my data frame definition.

state_emissions1 = co2_df %>%
  filter(state.name != "United States") %>%
  group_by(state.name, year) %>%
  summarise(total_state_emissions1 = sum(value, na.rm = TRUE), .groups = 'drop')

Revised Emissions By State

Emissions By State Continued

The graph on the previous slide is a large improvement over the first, but it still leaves something to be desired: there are too many colors, and they are hard to tell apart. To fix this, I made the graph interactive using plotly.
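One common route to an interactive version is to build the chart in ggplot2 and convert it with plotly's `ggplotly()`. In the resulting widget, hovering over a line identifies the state, and clicking a legend entry hides or shows that state, which helps when there are too many colors to tell apart. The data below is a made-up stand-in for `state_emissions1` with hypothetical state labels.

```r
library(ggplot2)
library(plotly)

# Made-up stand-in for state_emissions1 (three hypothetical states, two years)
state_emissions1 <- data.frame(
  state.name = rep(c("State A", "State B", "State C"), times = 2),
  year = rep(c(2000, 2001), each = 3),
  total_state_emissions1 = c(30, 20, 5, 28, 21, 6)
)

# Static line chart: one colored line per state
p <- ggplot(state_emissions1,
            aes(x = year, y = total_state_emissions1, color = state.name)) +
  geom_line() +
  labs(x = "Year", y = "CO2 emissions (million metric tons)", color = "State")

# Convert the ggplot into an interactive plotly widget (hover tooltips, clickable legend)
fig <- ggplotly(p)
```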

Interactive Emissions by State

Emissions By State (Final)

The interactive graph on the previous slide represents the data well. Texas and California clearly stand out as the largest contributors to overall carbon dioxide emissions. However, their trends do not necessarily match the overall trend we saw at the beginning of the presentation.
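To back up "largest contributors" with a number, one option is to rank states by their cumulative emissions over all years. A minimal sketch with made-up values and hypothetical state labels, reusing the column names of `state_emissions1`:

```r
library(dplyr)

# Made-up yearly totals for three hypothetical states
state_emissions1 <- data.frame(
  state.name = rep(c("State A", "State B", "State C"), times = 2),
  year = rep(c(2000, 2001), each = 3),
  total_state_emissions1 = c(30, 20, 5, 28, 21, 6)
)

# Sum each state's emissions over all years, then sort from largest to smallest
state_ranking <- state_emissions1 %>%
  group_by(state.name) %>%
  summarise(cumulative_emissions = sum(total_state_emissions1), .groups = "drop") %>%
  arrange(desc(cumulative_emissions))

state_ranking
```

Adding `slice_head(n = 2)` to the end of the pipeline would keep only the top two states.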

Lastly, I wanted to look at emission trends by fuel type. I did this by grouping emissions by fuel type and year into a new data frame.

fuel_emissions = co2_df %>%
  group_by(fuel.name, year) %>%
  summarise(total_fuel_emissions = sum(value, na.rm = TRUE), .groups = 'drop')

Emissions By Fuel Type

Final Thoughts

As the graph on the previous slide shows, the sharp decrease in overall emissions beginning in 2007 coincides with the decline of coal as a fuel source. Of the trends we have seen so far, this seems to be the best explanation for why overall carbon dioxide emissions have fallen since 2007.
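The size of each fuel's decline can be quantified by comparing a reference year with a later year. A minimal sketch with made-up numbers and hypothetical fuel labels; the end year 2020 is a placeholder, not necessarily the last year in the data set. With the real data, the aggregate all-fuels category would be filtered out first (its exact label is an assumption to check).

```r
library(dplyr)

# Illustrative reference years (end_year is a placeholder)
base_year <- 2007
end_year  <- 2020

# Made-up yearly totals for three hypothetical fuels
fuel_emissions <- data.frame(
  fuel.name = rep(c("Fuel A", "Fuel B", "Fuel C"), times = 2),
  year = rep(c(base_year, end_year), each = 3),
  total_fuel_emissions = c(20, 12, 25, 10, 15, 22)
)

# Change in each fuel's emissions between the two years, largest decline first
fuel_change <- fuel_emissions %>%
  filter(year %in% c(base_year, end_year)) %>%
  group_by(fuel.name) %>%
  summarise(
    change = total_fuel_emissions[year == end_year] -
      total_fuel_emissions[year == base_year],
    .groups = "drop"
  ) %>%
  arrange(change)
```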

On its own, this data provides a broad understanding of carbon dioxide emissions in the United States since 1970. It supports broad generalizations and sets up a framework for further analysis. Combined with other data sets, it could reveal deeper connections and perhaps point to ways of reducing emissions. Candidate data sets to explore alongside this one include population density, demographics, or even the best-selling vehicle each year.
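Combining data sets would typically mean joining on shared keys such as state and year. A minimal sketch, assuming a hypothetical `population` table keyed by `state.name` and `year`; all numbers are made up. Dividing million metric tons by millions of people gives metric tons of CO₂ per person.

```r
library(dplyr)

# Made-up state totals for one year (million metric tons)
state_emissions1 <- data.frame(
  state.name = c("State A", "State B"),
  year = c(2020, 2020),
  total_state_emissions1 = c(30, 10)
)

# Hypothetical second data set: population in millions by state and year
population <- data.frame(
  state.name = c("State A", "State B"),
  year = c(2020, 2020),
  population_millions = c(15, 2)
)

# Join on the shared keys, then compute metric tons of CO2 per person
per_capita <- state_emissions1 %>%
  left_join(population, by = c("state.name", "year")) %>%
  mutate(tons_per_person = total_state_emissions1 / population_millions)
```

A `left_join()` keeps every emissions row even when a state or year is missing from the population table; those rows get `NA` for the per-person value.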


Thank you!