To test this hypothesis, three major variables will be used: population size, GDP, and CO2 emissions.
Dataset
I used two main datasets:
Annual CO2 Emissions Dataset (1970–2021) from the U.S. Energy Information Administration (EIA). This dataset comprises total CO2 emissions for all 50 U.S. states from 1970 to 2021, measured in million metric tons (MMT). It has 51 rows and 51 columns, where the rows contain the states and the columns contain the years.
State Population Dataset (2020) from the U.S. Census Bureau. This is the 2020 population for each of the 50 U.S. states, expressed in millions. It has 51 rows containing the states and 2 columns containing the population data.
Data structure
Methods
To answer my research question, I will use linear regression and correlation methods because population size, GDP, and CO2 emissions are continuous variables.
The aim is to assess the strength and direction of their relationships. Correlation provides a summary measure of association, while linear regression allows population size and economic activity to be evaluated as predictors of CO2 emissions. These methods are appropriate for examining broad state-level patterns rather than making causal claims.
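To make the two methods concrete, here is a minimal Python sketch of a Pearson correlation and a simple least-squares fit. It is an illustration only, not the code behind the results reported in this study. The function names (`pearson_r`, `simple_ols`) and the five data points are made up for the example and are not taken from the EIA or Census data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simple_ols(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up illustrative values, NOT taken from the EIA or Census data.
population_millions = [1.0, 2.0, 4.0, 8.0, 10.0]
co2_mmt = [12.0, 20.0, 45.0, 70.0, 95.0]

r = pearson_r(population_millions, co2_mmt)
slope, intercept = simple_ols(population_millions, co2_mmt)
print(f"r = {r:.3f}")
print(f"CO2 = {intercept:.2f} + {slope:.2f} * population (MMT, population in millions)")
```

The correlation summarizes the strength and direction of the association in a single number, while the slope expresses it in the units of the data (MMT of CO2 per additional million residents).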
Unit of analysis
The unit of analysis is the U.S. state. CO2 emissions are averaged over time at the state level, allowing for comparisons across states.
For the time-series analysis, the unit of analysis is the state-year.
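The averaging step can be sketched as follows. This is a hypothetical illustration: the dictionary layout, the state labels, and the values are placeholders, not the EIA figures, and the function name `average_emissions` is an assumption introduced for the example.

```python
from statistics import mean

# Hypothetical state-year values in MMT; placeholders, not EIA figures.
emissions_by_state = {
    "State A": {1970: 100.0, 1971: 110.0, 1972: 120.0},
    "State B": {1970: 10.0, 1971: 12.0, 1972: 14.0},
}

def average_emissions(wide):
    """Collapse state-year values to a single average per state."""
    return {state: mean(by_year.values()) for state, by_year in wide.items()}

avg_co2 = average_emissions(emissions_by_state)
for state, value in avg_co2.items():
    print(f"{state}: {value:.1f} MMT on average")
```

Averaging moves the analysis from the state-year level to the state level, so each state contributes exactly one observation to the cross-state comparison.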
Having established how CO2 emissions vary on average across U.S. states, it becomes important to examine whether the high-emission states are also the more populous states. I will use correlation and linear regression to examine the relationship between average CO2 emissions and population size.
Pearson's product-moment correlation
data: avg_emissions$Population_Millions and avg_emissions$Avg_CO2
t = 11.359, df = 48, p-value = 3.319e-15
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.7547422 0.9147152
sample estimates:
cor
0.8537363
The Pearson correlation coefficient is r = 0.854 (95% CI: 0.755 to 0.915), indicating a strong positive association between state population size and average CO2 emissions: states with larger populations tend to produce more CO2. The association is statistically significant (p < 0.001).
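As a rough consistency check, the t statistic and confidence interval in the output above can be recomputed from r and the degrees of freedom alone. The sketch below is in Python and assumes n = 50 (inferred from df = 48) and a confidence interval based on Fisher's z transformation. The function names are introduced here for illustration.

```python
import math
from statistics import NormalDist

def t_from_r(r, n):
    """t statistic for testing that the true correlation is zero."""
    df = n - 2
    return r * math.sqrt(df) / math.sqrt(1 - r ** 2)

def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for r via Fisher's z transformation."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

r = 0.8537363  # sample estimate from the output above
n = 50         # assumption: n = df + 2 = 48 + 2

t_stat = t_from_r(r, n)
ci_low, ci_high = fisher_ci(r, n)
print(f"t = {t_stat:.3f}")
print(f"95% CI = ({ci_low:.4f}, {ci_high:.4f})")
```

Both values agree with the reported t = 11.359 and the interval 0.7547 to 0.9147 to the precision shown.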
The bar chart displays the top and bottom 10 U.S. states ranked by per-capita CO2 emissions in 2020, measured as the total emissions divided by the population in each state. This per-person metric allows for a more meaningful comparison across states of varying sizes, helping to uncover how much carbon the average resident is responsible for.
The results reveal stark contrasts. Wyoming, North Dakota, and Alaska lead with the highest per-capita CO2 emissions. These states have relatively small populations but economies that are heavily dependent on fossil fuel extraction, refining, and energy production, industries that generate substantial emissions per resident.
In contrast, states such as California, New York, and Massachusetts rank among the lowest in per-capita emissions. Despite their large total populations and economic output, these states likely benefit from more efficient public transportation, stricter environmental regulations, denser urban development, and a higher reliance on renewable energy sources.
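The per-capita metric and the ranking behind the bar chart can be sketched as below. This is a hypothetical Python illustration: the state labels and numbers are placeholders rather than the 2020 data, and the function name `per_capita_tons` is an assumption made for the example.

```python
def per_capita_tons(total_mmt, population_millions):
    """Metric tons of CO2 per person.

    Million metric tons divided by millions of people: the factors of
    one million cancel, leaving metric tons per person.
    """
    return total_mmt / population_millions

# Placeholder (total CO2 in MMT, population in millions); NOT real 2020 data.
states = {
    "State A": (60.0, 0.6),
    "State B": (300.0, 40.0),
    "State C": (90.0, 9.0),
}

ranked = sorted(
    ((name, per_capita_tons(co2, pop)) for name, (co2, pop) in states.items()),
    key=lambda item: item[1],
    reverse=True,
)

k = 1  # the chart uses the top and bottom 10; 1 is enough for three placeholders
top_k, bottom_k = ranked[:k], ranked[-k:]
print("Highest per capita:", top_k)
print("Lowest per capita:", bottom_k)
```

In this made-up example, the state with the largest total emissions (State B) has the lowest per-capita value, which is the kind of reversal the per-capita ranking is meant to expose.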
Effect of GDP on CO2 emissions
Next, I examine how gross domestic product (GDP) relates to CO2 emissions. This helps measure how economic activity contributes to emissions, since population alone may not be the only contributing factor.
Pearson's product-moment correlation
data: co2_gdp_2020$GDP_2020_Billion and co2_gdp_2020$CO2_Emissions
t = 8.1267, df = 48, p-value = 1.418e-10
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.6123523 0.8576678
sample estimates:
cor
0.7609914
The scatterplot above shows the relationship between state GDP and CO2 emissions for 2020, with a fitted linear regression line. The correlation analysis yielded a Pearson’s r of 0.761 (95% CI: 0.612 to 0.858, p < 0.001), indicating a strong, statistically significant positive relationship between GDP and CO2 emissions. In other words, states with higher economic output tend to emit more CO2.
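One way to put the two reported correlations on a common scale: in a simple linear regression with a single predictor, the share of variance explained (R-squared) equals the square of Pearson's r. The short Python sketch below applies this to the two sample estimates reported in this study. The labels are mine, and the two analyses use different CO2 measures (averaged emissions versus 2020 emissions), so the comparison is only indicative.

```python
# Sample estimates copied from the two correlation outputs in this report.
reported_r = {
    "population vs. average CO2": 0.8537363,
    "GDP vs. 2020 CO2": 0.7609914,
}

# With one predictor, R-squared of the fitted line equals r squared.
r_squared = {label: r ** 2 for label, r in reported_r.items()}

for label, value in r_squared.items():
    print(f"{label}: R^2 = {value:.3f}")
# prints:
# population vs. average CO2: R^2 = 0.729
# GDP vs. 2020 CO2: R^2 = 0.579
```

Read this way, population size accounts for roughly 73% of the cross-state variation in average emissions, and GDP for roughly 58% of the variation in 2020 emissions, each taken on its own.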
My hypothesis that U.S. states with larger populations have higher total CO2 emissions is supported.
Per-capita emissions reveal that smaller, energy-producing states emit far more CO2 per person than larger states.
GDP is strongly associated with CO2 emissions, suggesting that economic activity, and not population size alone, is related to emissions.
Limitations
Because multi-year population data were difficult to access, only the 2020 population data were used in this study. This limited the ability to assess whether the effect of population has been consistent over the years.
The study considered only simple linear regression. Because there is more than one independent variable (population and GDP), a multiple regression analysis would have revealed how their combined effect relates to CO2 emissions.
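For illustration, a multiple regression with population and GDP as joint predictors could look like the Python sketch below. It was not run as part of this study. The data points are made up, the function names are assumptions, and the normal-equations approach is used only to keep the example self-contained; an actual analysis would more likely rely on a statistics package.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - known) / m[row][row]
    return x

def fit_multiple_regression(predictors, y):
    """OLS coefficients [intercept, b1, b2, ...] via the normal equations."""
    design = [[1.0] + list(row) for row in predictors]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)

# Made-up (population in millions, GDP in billions) pairs and CO2 in MMT.
predictors = [(1.0, 50.0), (2.0, 120.0), (4.0, 150.0), (8.0, 500.0), (10.0, 400.0)]
co2 = [12.0, 20.0, 45.0, 70.0, 95.0]

intercept, b_pop, b_gdp = fit_multiple_regression(predictors, co2)
print(f"CO2 = {intercept:.2f} + {b_pop:.2f} * population + {b_gdp:.3f} * GDP")
```

Each coefficient here describes the association of one predictor with emissions while holding the other fixed, which is what the separate correlations reported in this study cannot show.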