Introduction

  • Carbon dioxide (CO2) is the major human-made greenhouse gas and contributes to more than 80% of global warming. Since the Industrial Revolution, the amount of CO2 in the atmosphere has risen as human emissions have increased. Addressing the challenge posed by CO2 emissions requires an understanding of the factors that influence them.
  • To date, studies of how population and economic activity in the 50 US states affect CO2 emissions are lacking.
  • This project investigates the relationship between population, economic activity, and CO2 emissions across the US states.

Research Question

  • Do population size and economic activity affect CO2 emissions across the 50 U.S. states?

Hypothesis

  • More populous US states produce more CO2 than less populous states.

To test this hypothesis, three major variables will be used:

  • CO2 emissions (dependent variable)
  • Population size (independent variable)
  • GDP size (independent variable)

Data

Dataset

  • I used two main datasets:

  • Annual CO2 Emissions Dataset (1970–2021) from the U.S. Energy Information Administration (EIA). This dataset contains total CO2 emissions for the US states from 1970 to 2021, measured in million metric tons (MMT). It has 51 rows and 51 columns, where the rows contain the states and the columns contain the years.

  • State Population Dataset (2020) from the U.S. Census Bureau. This is the 2020 population of each US state, expressed in millions. It has 51 rows containing the states and 2 columns containing the population data.

Data structure

  • The analysis uses state-level CO2 emissions data from 1970 to 2021, along with state population and GDP data. I reorganized the emissions data so that each row represents a state–year observation, and population and GDP data were merged by state to create combined datasets for correlation and regression analyses.
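The reshaping and merging step described above can be sketched as follows. This is a minimal illustration in plain Python, not the code used for the analysis (the test output in this deck looks like R output). The state labels, column names, and numbers are hypothetical placeholders, not EIA or Census values.

```python
# Hypothetical wide-format table: one row per state, one column per year.
# All numbers are made-up placeholders, not EIA values.
wide_emissions = [
    {"State": "A", "1970": 10.0, "1971": 12.0},
    {"State": "B", "1970": 4.0, "1971": 5.0},
]
population_millions = {"A": 2.0, "B": 0.5}  # placeholder values


def wide_to_long(rows, id_column="State"):
    """Turn one row per state into one row per state-year observation."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # every other column is assumed to be a year
            long_rows.append(
                {"State": row[id_column], "Year": int(column), "CO2_MMT": value}
            )
    return long_rows


def merge_by_state(long_rows, lookup, new_column):
    """Attach a state-level value to each row; states missing from lookup are dropped."""
    return [
        dict(row, **{new_column: lookup[row["State"]]})
        for row in long_rows
        if row["State"] in lookup
    ]


long_emissions = wide_to_long(wide_emissions)
merged = merge_by_state(long_emissions, population_millions, "Population_Millions")
print(merged[0])
```

Because the merge keeps only states present in both tables, any state missing from the population or GDP table would silently drop out, so it is worth checking the row count after merging.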

Methods

  • To answer my research question, I will use linear regression and correlation methods because population size, GDP, and CO2 emissions are continuous variables.

  • The aim is to assess the strength and direction of their relationships. Correlation provides a summary measure of association, while linear regression allows population size and economic activity to be evaluated as predictors of CO2 emissions. These methods are appropriate for examining broad state-level patterns rather than making causal claims.
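As a rough illustration of the two methods, the sketch below computes a Pearson correlation coefficient and a one-predictor least-squares line from first principles. The function names and the toy data are assumptions made for this example; they are not taken from the project's own code.

```python
import math


def _centered_sums(x, y):
    """Return (sxy, sxx, syy, mean_x, mean_y) for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy, sxx, syy, mean_x, mean_y


def pearson_r(x, y):
    """Strength and direction of the linear association, between -1 and 1."""
    sxy, sxx, syy, _, _ = _centered_sums(x, y)
    return sxy / math.sqrt(sxx * syy)


def simple_ols(x, y):
    """Least-squares line y = intercept + slope * x; returns (intercept, slope)."""
    sxy, sxx, _, mean_x, mean_y = _centered_sums(x, y)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Toy data (placeholders): emissions are exactly 10 + 2 * population.
population = [1.0, 2.0, 3.0, 4.0]
emissions = [12.0, 14.0, 16.0, 18.0]

r = pearson_r(population, emissions)
intercept, slope = simple_ols(population, emissions)
print(r, intercept, slope)
```

The correlation summarizes how tightly the points follow a line, while the slope gives the expected change in emissions per unit change in the predictor. Neither one establishes causation.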

Unit of analysis

  • The unit of analysis is the U.S. state. CO2 emissions are averaged over time at the state level, allowing for comparisons across states.

  • For the time-series analysis, the unit of analysis is the state-year.

Mapping CO2 emissions in the 50 states from 1970 to 2021

  • Before continuing the analysis, I summarize the average emissions in each state from 1970 to 2021.

  • The chart ranks U.S. states by their average annual CO2 emissions from 1970 to 2021.
  • From the data, the top 10 emitters of CO2 are Texas, California, Pennsylvania, Ohio, Illinois, New York, Indiana, Florida, Louisiana and Michigan.
  • The lowest-emitting states include Washington DC, Vermont, Rhode Island, South Dakota, Idaho, New Hampshire, Delaware, Maine, Hawaii, Montana and Alaska.
  • Texas and California emerge as the largest contributors, each emitting more than 300 MMT annually on average, while smaller states such as Vermont and Rhode Island show far lower levels.
  • The error bars represent the variation in emissions across the period. Notably, states like Texas, Louisiana, New York and Pennsylvania exhibit not only high mean CO2 emissions but also considerable variability, hinting at policy or industry shifts over time.
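The ranking summarized above can be reproduced along these lines. This is a minimal sketch that assumes the error bars show the standard deviation of annual emissions; the deck does not say which measure of variation was plotted. The state labels and values are placeholders.

```python
from collections import defaultdict
from statistics import mean, stdev

# Placeholder state-year observations: (state, year, CO2 in MMT).
observations = [
    ("A", 1970, 10.0),
    ("A", 1971, 14.0),
    ("B", 1970, 4.0),
    ("B", 1971, 6.0),
]


def summarize_by_state(rows):
    """Mean and sample standard deviation of emissions per state, highest mean first."""
    by_state = defaultdict(list)
    for state, _year, co2 in rows:
        by_state[state].append(co2)
    summary = {
        state: {"mean": mean(values), "sd": stdev(values)}
        for state, values in by_state.items()
    }
    # Sort so the largest average emitters come first, as in a ranked bar chart.
    return dict(sorted(summary.items(), key=lambda item: item[1]["mean"], reverse=True))


ranking = summarize_by_state(observations)
for state, stats in ranking.items():
    print(state, stats["mean"], round(stats["sd"], 2))
```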

Result 1

Does population influence CO2 emissions?

Having established how average CO2 emissions vary across U.S. states, the next step is to examine whether the high-emission states are also the more populous states. I will use a Pearson correlation test to examine the relationship between average CO2 emissions and population size.


    Pearson's product-moment correlation

data:  avg_emissions$Population_Millions and avg_emissions$Avg_CO2
t = 11.359, df = 48, p-value = 3.319e-15
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.7547422 0.9147152
sample estimates:
      cor 
0.8537363 

The Pearson correlation coefficient is r = 0.854 (95% CI: 0.755 to 0.915), indicating a strong positive association between state population size and average CO2 emissions: states with larger populations tend to produce more CO2. The association is statistically significant (p < 0.001).
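The reported test statistic and confidence interval can be checked by hand from r and the sample size. The sketch below uses the standard formulas t = r·sqrt(n − 2)/sqrt(1 − r²) and the Fisher z-transformation. The function names are illustrative, and n = 50 is inferred from the reported df = 48.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for testing that the true correlation is zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


def fisher_confidence_interval(r, n, z_crit=1.959964):
    """Approximate 95% confidence interval for r via the Fisher z-transformation."""
    z = math.atanh(r)
    standard_error = 1 / math.sqrt(n - 3)
    return (
        math.tanh(z - z_crit * standard_error),
        math.tanh(z + z_crit * standard_error),
    )


# r is copied from the test output; n = df + 2 = 50 observations.
r, n = 0.8537363, 50
t = correlation_t_statistic(r, n)
low, high = fisher_confidence_interval(r, n)
print(round(t, 3), round(low, 4), round(high, 4))
```

The values agree with the test output shown earlier (t = 11.359, interval 0.7547 to 0.9147), which confirms that the test was run on 50 observations.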

Result 2

Per-Capita CO2 Emissions by State (2020)

  • Although population size is linked to total CO2 emissions, as shown by the correlation results, per-capita CO2 emissions remove the effect of population size and show how much each person contributes on average. This helps reveal differences in energy use, industry, and transportation across states.
  • Per-capita CO2 also shows that emissions depend not only on how many people live in a state, but also on how energy is produced and used.

The bar chart displays the top and bottom 10 U.S. states ranked by per-capita CO2 emissions in 2020, measured as the total emissions divided by the population in each state. This per-person metric allows for a more meaningful comparison across states of varying sizes, helping to uncover how much carbon the average resident is responsible for.
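The per-capita metric is a simple ratio. Because emissions are in million metric tons and population is in millions, the result is in metric tons per person. The sketch below illustrates the calculation and ranking with placeholder numbers; the state labels and values are hypothetical, not the 2020 data.

```python
def per_capita_tons(emissions_mmt, population_millions):
    """Million metric tons / millions of people = metric tons per person."""
    return emissions_mmt / population_millions


# Placeholder inputs: (total emissions in MMT, population in millions).
states = {
    "Small producer": (60.0, 0.6),
    "Large state": (300.0, 40.0),
}

per_capita = {name: per_capita_tons(*values) for name, values in states.items()}

# Rank from the highest to the lowest per-person emissions.
ranked = sorted(per_capita, key=per_capita.get, reverse=True)
print(ranked, per_capita)
```

In this toy example the large state emits five times more in total, yet the small producer emits far more per resident, which is the pattern the per-capita chart is meant to expose.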

The results reveal stark contrasts. Wyoming, North Dakota, and Alaska have the highest per-capita CO2 emissions. These states have relatively small populations but economies that depend heavily on fossil fuel extraction, refining, and energy production, industries that generate substantial emissions per resident.

In contrast, states like California, New York, and Massachusetts rank among the lowest in per-capita emissions. Despite their large total populations and economic output, these states likely benefit from more efficient public transportation, stricter environmental regulations, denser urban development, and a higher reliance on renewable energy sources.

Result 3

Effect of GDP on CO2 emissions

Next, I examine how gross domestic product (GDP) relates to CO2 emissions. This helps measure how economic activity contributes to emissions, since population may not be the only contributing factor.


    Pearson's product-moment correlation

data:  co2_gdp_2020$GDP_2020_Billion and co2_gdp_2020$CO2_Emissions
t = 8.1267, df = 48, p-value = 1.418e-10
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.6123523 0.8576678
sample estimates:
      cor 
0.7609914 

[Figure: scatterplot of state GDP against CO2 emissions in 2020, with a fitted linear regression line]

The scatterplot above shows the relationship between state GDP and CO2 emissions for 2020, with a linear regression line. The correlation analysis yielded a Pearson’s r of 0.761 with a p-value of < 0.001, indicating a statistically significant and strong positive relationship between GDP and CO2 emissions. In other words, states with higher economic output tend to emit more CO2.

Conclusion & Limitations

From the analysis, the following conclusions can be drawn:

  • The data support my hypothesis that US states with larger populations have higher total CO2 emissions.

  • Per-capita emissions reveal that smaller, energy-producing states emit far more CO2 per person than larger states.

  • GDP is strongly associated with CO2 emissions, suggesting that economic activity, and not population size alone, is related to emissions.

Limitations

  • Because multi-year population data were difficult to access, only the 2020 population data were used in this study. This limited the ability to assess whether the effect of population has been consistent over the years.

  • The study considered each predictor separately. Since there is more than one independent variable (population and GDP), a multiple regression analysis would have revealed how their combined effect relates to CO2 emissions.
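For reference, a multiple regression with both predictors could look roughly like the sketch below. It solves the least-squares normal equations in plain Python and uses made-up data. It was not run on the study's data, and all names and numbers are assumptions for illustration only.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def multiple_ols(predictors, y):
    """Least-squares coefficients [intercept, b1, b2, ...] from the normal equations."""
    design = [[1.0] + list(row) for row in predictors]  # prepend intercept column
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * target for r, target in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Placeholder rows of (population, GDP); CO2 is built as 5 + 2*population + 0.5*GDP.
predictors = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0)]
co2 = [5.0 + 2.0 * p + 0.5 * g for p, g in predictors]

intercept, b_population, b_gdp = multiple_ols(predictors, co2)
print(round(intercept, 6), round(b_population, 6), round(b_gdp, 6))
```

In real state data, population and GDP are likely to be highly correlated with each other, so the individual coefficients from such a model would need to be interpreted with care.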

References