| Emission Variables | Minimum | Mean | Maximum |
|---|---|---|---|
| Total CO2 (thousand metric tons of carbon) | -1473.00 | 22687.119661 | 2806634.00 |
| Solid Fuel Consumption | -103.00 | 11202.723867 | 2045156.00 |
| Liquid Fuel Consumption | -4663.00 | 7680.005109 | 680284.00 |
| Gas Fuel Consumption | -40.00 | 3227.981971 | 390719.00 |
| Cement Production | 0.00 | 638.453865 | 338912.00 |
| Gas Flaring | 0.00 | 276.163457 | 20520.00 |
| Per Capita CO2 (metric tons of carbon) | -0.68 | 1.268883 | 45.96 |
| Bunker Fuels | 0.00 | 560.330606 | 45630.00 |
Global CO2 Emissions
Begin Instructions:
The text below has all the prompts for the exam with a link to the raw data itself. For each component, there is a point value indicated that I will use as a rubric. In addition to those points, this will also be evaluated on the following:
- The data for this will be downloaded as a ZIP archive and will expand in a folder named CSV-FILES. Put that folder in the same folder as this document and reference the raw data files in there for your analyses [5 pts].
- I want you to turn in the qmd file as a raw file. It MUST render to html output WITHOUT errors [5 pts].
- I will examine the markdown code if there are programming issues that I need to follow up on for understanding your results. The rendered output MUST NOT have any R code, R chunks, or any other indication that the analyses are done in the creation of the document [5 pts].
- Your writing should be appropriate for graduate-level discussion of data as if this were going into a scientific report [5 pts].
- You should remove all the text prompts from the output (e.g., the stuff in curly brackets I have inserted) as well as this block of instructions [5 pts].
End Instructions
Question 1. { A single paragraph here about why CO2 emissions are of interest to us. [5 pts]}
Carbon dioxide (CO2) emissions represent a paramount area of interest for scholars, policymakers, and the global community at large due to their pivotal role in climate change dynamics. As a greenhouse gas, CO2 contributes significantly to the enhancement of Earth’s natural greenhouse effect, trapping heat and leading to a rise in global temperatures. This phenomenon, commonly referred to as anthropogenic global warming, is principally driven by human activities such as the combustion of fossil fuels, deforestation, and industrial processes. The consequences of heightened CO2 concentrations are multifaceted, encompassing rising sea levels, more frequent and severe weather events, disruptions in ecosystems, and threats to human health and socio-economic stability. Consequently, understanding, monitoring, and mitigating CO2 emissions have become imperative endeavors to address the complexities of climate change and to formulate effective strategies for sustainable environmental management and global resilience.
The Data
Question 2. { A narrative (e.g., not just a list of figures or R output) about the data, where it comes from, and what it contains. This is a complete paragraph just like you find in a normal research paper [5 pts].
The data examined here come from the Global, Regional, and National Fossil-Fuel CO2 Emissions dataset (Boden, Marland, & Andres, 2013). The dataset is distributed as a ZIP archive that expands into a folder named CSV-FILES, and it covers 259 nations over multiple years. For each nation and year it reports total CO2 emissions, per capita CO2 emissions, and emissions by source: solid fuels, liquid fuels, gas fuels, cement production, gas flaring, and bunker fuels. As the column names indicate, totals are expressed in thousand metric tons of carbon, per capita values in metric tons of carbon, and bunker fuels are not included in the totals. This breadth allows global and regional trends in CO2 emissions to be examined alongside the sources that contribute to them, and it gives researchers and policymakers a detailed basis for developing strategies to reduce carbon emissions.
Question 3. As part of this description, make a table describing the \(CO_2\) data including columns for the minimum, mean, and maximum value for each of the emission variables. [10 pts]}
Components of Emissions
Question 4. { The total emissions is composed of several types of data. For the data from the year 2010, analyze and present to the reader the relative contributions of CO2 by source for each of the G7 Countries. [5 pts]}
# A tibble: 6 × 3
Country Source Percentage
<chr> <chr> <dbl>
1 Canada Gas Fuels 33.7
2 Canada Liquid Fuels 47.8
3 Canada Solid Fuels 16.8
4 Canada Gas Flaring 0.551
5 Canada Cement Production 1.16
6 Canada Bunker Fuels 1.04
Question 5. { Divide the data into the G7 (e.g., Canada, France, Germany, Italy, Japan, the United Kingdom, and the United States) and non-G7 countries. Compare emission output amongst all G7 countries to those from the rest of the world combined. All plots must be labelled properly and have an appropriate caption.[10 pts] }
The graph shows that G7 countries typically emit far more CO2 from fossil fuels than non-G7 countries. One possible interpretation is that G7 countries are more developed and have higher energy demands. Another is that G7 countries have played a larger role in developing industries that rely heavily on fossil fuels, such as transportation and manufacturing. This highlights the importance of G7 countries taking action to reduce their emissions and help mitigate climate change.
The dots above Non-G7 in the graph are outliers, that is, data points that differ substantially from the rest of the group. In this case, they are non-G7 countries whose emissions are much higher than is typical for non-G7 countries.
There are a few possible explanations for the outliers:
The outliers may be countries that are rapidly industrializing and have a high demand for energy, or countries that have a large amount of natural resources, such as coal or oil, and produce these resources for export. They could also be countries that have not implemented policies to reduce their emissions. The outliers here are still part of the non-G7 group of countries which means that the overall trend for non-G7 countries is still lower emissions than for G7 countries. However, the outliers show that there is a great deal of variation within the non-G7 group of countries.
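Box plots conventionally draw a point as an outlier when it lies more than 1.5 times the interquartile range beyond the quartiles. The source does not state which rule its plot used, so the following is only a minimal Python sketch of that common convention. The function name `tukey_outliers` and the sample values are hypothetical and are not taken from the dataset.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # "inclusive" treats the data as the full population of interest.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical emission totals for ten countries (illustration only).
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
flagged = tukey_outliers(sample)
print(flagged)
```

A single very large emitter is flagged while the bulk of the group is not, which is the pattern described for the non-G7 group.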
Question 6. { Perform a statistical analysis of G7 vs. non-G7 countries output by source. Provide textual description of your analyses and present output as tables or in-text. [10 pts] }
Source p.value adjusted_p.value
1 Emissions.from.Gas.Fuels 1.752126e-14 1.051276e-13
2 Emissions.from.Liquid.Fuels 3.493838e-02 2.096303e-01
3 Emissions.from.Solid.Fuels 1.144837e-01 6.869025e-01
4 Emissions.from.Gas.Flaring 7.323487e-01 1.000000e+00
5 Emissions.from.Cement.Production 4.477030e-01 1.000000e+00
6 Emissions.from.Bunker.Fuels 5.535988e-02 3.321593e-01
Emissions from Gas Fuels: p-value: 1.75e-14 Adjusted p-value: 1.05e-13 Interpretation: The p-value is extremely low, indicating a significant difference in emissions from Gas Fuels between G7 and non-G7 countries, and the adjusted p-value remains significant after correction for multiple comparisons. Each adjusted value in the table equals the raw p-value multiplied by the six tests performed (capped at 1), which is consistent with a Bonferroni correction. The observed difference in Gas Fuels emissions between G7 and non-G7 countries is therefore unlikely to be due to random chance.
Emissions from Liquid Fuels: p-value: 0.035 Adjusted p-value: 0.21 Interpretation: The p-value is below the conventional threshold of 0.05, suggesting a significant difference. However, after adjusting for multiple comparisons, the result is no longer significant. The observed difference in Liquid Fuels emissions between G7 and non-G7 countries could therefore be due to random variability.
Emissions from Solid Fuels: p-value: 0.114 Adjusted p-value: 0.687 Interpretation: The p-value is not below 0.05, indicating no significant difference in emissions from Solid Fuels between the two groups. This result holds after adjusting for multiple comparisons. Any observed difference in Solid Fuels emissions between G7 and non-G7 countries could be due to random variability.
Emissions from Gas Flaring: p-value: 0.732 Adjusted p-value: 1.0 Interpretation: The p-value is high, suggesting no significant difference in emissions from Gas Flaring. This result holds even after adjustment.
Emissions from Cement Production: p-value: 0.448 Adjusted p-value: 1.0 Interpretation: The p-value is not below 0.05, indicating no significant difference in emissions from Cement Production between G7 and non-G7 countries. This result holds after adjusting for multiple comparisons.
Emissions from Bunker Fuels: p-value: 0.055 Adjusted p-value: 0.332 Interpretation: The p-value is slightly above 0.05, so the difference is not significant at the conventional threshold, and it is clearly not significant after adjusting for multiple comparisons. Any observed difference in Bunker Fuels emissions between G7 and non-G7 countries could be due to random variability.
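The adjusted p-values in the table can be reproduced from the raw ones. The sketch below assumes a Bonferroni correction over the six sources (the source does not name the method, but the reported values match this rule). The function name `bonferroni` is hypothetical, and Python is used purely for illustration.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Raw p-values copied from the output table above.
raw = {
    "Gas Fuels": 1.752126e-14,
    "Liquid Fuels": 3.493838e-02,
    "Solid Fuels": 1.144837e-01,
    "Gas Flaring": 7.323487e-01,
    "Cement Production": 4.477030e-01,
    "Bunker Fuels": 5.535988e-02,
}

adjusted = dict(zip(raw, bonferroni(list(raw.values()))))
significant = [source for source, p in adjusted.items() if p < 0.05]

for source, p_adj in adjusted.items():
    print(f"{source:18s} {p_adj:.6e}")
print("Significant after adjustment:", significant)
```

Under this assumption only Gas Fuels stays below 0.05 after adjustment, in line with the interpretations above.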
Temporal US Trends
Question 7. { For the G7 countries, determine whether total emissions between 1960-2010 are correlated. Which two countries have the highest and which two have the lowest correlation? Describe these in words and input appropriate graphics and in-text statistical parameters (e.g., Pearson correlation; df = X, t = Y, p = Z, r = W). [15 pts] }
Highest correlation: TAIWAN and REPUBLIC OF KOREA, with correlation = 0.9936041
Lowest correlation: UNITED KINGDOM and ECUADOR, with correlation = -0.8602864
Pearson Correlation Coefficient (r):
Highest Correlation:
Countries: Taiwan and Republic of Korea Correlation Value: 0.9936041 This high positive correlation (close to 1) suggests that the total emissions between Taiwan and the Republic of Korea are strongly positively related. When one country experiences an increase or decrease in emissions, the other country tends to follow a similar pattern.
Lowest Correlation:
Countries: United Kingdom and Ecuador Correlation Value: -0.8602864 This strong negative correlation (close to -1) indicates that over the period, years in which one country's total emissions were high tended to be years in which the other's were low. Of the four countries named in these results, only the United Kingdom is a G7 member, so the pairings appear to come from a comparison across all nations rather than one restricted to the G7.
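The in-text parameters requested for a Pearson correlation (r, t, df) follow directly from two paired series. The sketch below is a minimal Python illustration under stated assumptions: the function name `pearson_summary` and the five-point sample series are hypothetical, and the p-value, which comes from a t distribution with `df` degrees of freedom, is left to a statistics library.

```python
import math


def pearson_summary(x, y):
    """Return (r, t, df) for two equally long paired series."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two paired series with at least 3 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # Clamp to guard against rounding slightly past +/-1.
    r = max(-1.0, min(1.0, sxy / math.sqrt(sxx * syy)))
    df = n - 2
    denom = 1 - r * r
    # t = r * sqrt(df / (1 - r^2)); infinite for a perfect correlation.
    t = math.copysign(math.inf, r) if denom == 0 else r * math.sqrt(df / denom)
    return r, t, df


# Hypothetical yearly totals for two countries (illustration only).
country_a = [1, 2, 3, 4, 5]
country_b = [2, 1, 4, 3, 5]
r, t, df = pearson_summary(country_a, country_b)
print(f"r = {r:.3f}, t = {t:.3f}, df = {df}")
```

For annual totals over 1960-2010 with no missing years, each series would have 51 points and therefore df = 49.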
Question 8. { Take the total emission output from 1970-2010 for the US. Attempt to develop a linear regression model to fit these emission responses. Create appropriate plots of the raw data and model diagnostics. Is this a good model? Are the residuals of this model reasonably behaved? Explain why or why not this may be an appropriate approach textually and with graphical—and if appropriate statistical—evidence. [15 pts]}
Nation: 0
Year: 0
Total.CO2.emissions.from.fossil.fuels.and.cement.production..thousand.metric.tons.of.C.: 0
Emissions.from.solid.fuel.consumption: 0
Emissions.from.liquid.fuel.consumption: 0
Emissions.from.gas.fuel.consumption: 0
Emissions.from.cement.production: 0
Emissions.from.gas.flaring: 0
Per.capita.CO2.emissions..metric.tons.of.carbon.: 0
Emissions.from.bunker.fuels..not.included.in.the.totals.: 0
Call:
lm(formula = Total.CO2.emissions.from.fossil.fuels.and.cement.production..thousand.metric.tons.of.C. ~
Year, data = us_subset)
Residuals:
Min 1Q Median 3Q Max
-114007 -46184 8091 53133 99439
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.787e+07 1.652e+06 -10.81 2.67e-13 ***
Year 9.664e+03 8.302e+02 11.64 2.93e-14 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 62900 on 39 degrees of freedom
Multiple R-squared: 0.7765, Adjusted R-squared: 0.7708
F-statistic: 135.5 on 1 and 39 DF, p-value: 2.927e-14
The R-squared value is 0.7765, meaning the model explains about 77.65% of the variance in total CO2 emissions and leaves roughly 22% unexplained. The adjusted R-squared, which penalizes R-squared for the number of predictors in the model, is 0.7708.
The residuals in the Residuals Plot show a systematic pattern rather than a random scatter around zero, which suggests that a straight line over-predicts emissions in some parts of the range and under-predicts them in others. Overall, this is not a good model. The residuals are not reasonably behaved, and the model does not adequately explain the variation in the data.
The linear regression plot shows total CO2 emissions for the United States increasing between 1970 and 2010. In the coefficients table, the intercept is significantly different from zero (p = 2.67e-13), and the coefficient for “Year” is also significant (estimate = 9,664 thousand metric tons of carbon per year; t = 11.64, df = 39, p = 2.93e-14). This indicates a statistically significant linear trend, although a significant slope does not by itself show that a straight line adequately describes the series.
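The contrast between a high R-squared and poorly behaved residuals can be seen in a small worked example. The sketch below fits an ordinary least-squares line in plain Python. The function name `fit_line` and the curved series are hypothetical and chosen only to illustrate the diagnostic, not to reproduce the US data.

```python
def fit_line(x, y):
    """Least-squares line; returns (slope, intercept, r_squared, residuals)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual = observed minus fitted; positive means the line under-predicts.
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    ss_res = sum(e * e for e in residuals)
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot, residuals


# Hypothetical curved series (illustration only, not the US data).
years = list(range(1970, 1981))
emissions = [(yr - 1970) ** 2 for yr in years]

slope, intercept, r_squared, residuals = fit_line(years, emissions)
signs = "".join("+" if e > 0 else "-" for e in residuals)
print(f"slope = {slope:.1f}, R^2 = {r_squared:.3f}")
print("residual signs by year:", signs)
```

Here R-squared exceeds 0.9, yet the residuals are positive at both ends and negative in the middle, the kind of structure a residuals-versus-fitted plot exposes. The reported summary is also internally consistent: the slope's t value is its estimate divided by its standard error, 9.664e+03 / 8.302e+02, or about 11.64.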
Global Trends
Question 9. { Group all the countries by continent and take two time periods, 1984 and 2014. Make a graphic of total emissions by continent for each time period. [5 pts]}
Question 10. { Statistically test the hypothesis that total emissions by countries grouped within each continent (e.g. continent is the treatment but each country within it are the data) are not significantly different. Describe the models you develop textually (with in-text statistical parameters as above) and use appropriate graphical and tabular output to make your point. [15 pts]}
Number of missing values: 0
Welch's ANOVA result:
One-way analysis of means (not assuming equal variances)
data: Total.Fossil.Fuel.Emissions and Continent
F = 6.1397, num df = 5.00, denom df = 131.53, p-value = 3.81e-05
Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 122.3254, df = 5, p-value = 0
Comparison of x by group
(Bonferroni)
Col Mean-|
Row Mean | Africa Asia Australi Europe North Am
---------+-------------------------------------------------------
Asia | -6.726874
| 0.0000*
|
Australi | 3.065045 7.755739
| 0.0163* 0.0000*
|
Europe | -6.428999 -0.027477 -7.579942
| 0.0000* 1.0000 0.0000*
|
North Am | 0.895861 6.700475 -2.163018 6.476003
| 1.0000 0.0000* 0.2290 0.0000*
|
South Am | -2.839400 1.958979 -4.772491 1.931020 -3.285182
| 0.0339 0.3759 0.0000* 0.4011 0.0076*
alpha = 0.05
Reject Ho if p <= alpha/2
Dunn's test:
$chi2
[1] 122.3254
$Z
[1] -6.72687424 3.06504579 7.75573917 -6.42899974 -0.02747719 -7.57994237
[7] 0.89586192 6.70047583 -2.16301848 6.47600373 -2.83940052 1.95897961
[13] -4.77249141 1.93102007 -3.28518206
$P
[1] 8.667322e-12 1.088183e-03 4.391519e-15 6.422317e-11 4.890396e-01
[6] 1.728542e-14 1.851633e-01 1.038710e-11 1.526988e-02 4.709186e-11
[11] 2.259919e-03 2.505759e-02 9.098044e-07 2.674029e-02 5.095825e-04
$P.adjusted
[1] 1.300098e-10 1.632275e-02 6.587278e-14 9.633476e-10 1.000000e+00
[6] 2.592814e-13 1.000000e+00 1.558065e-10 2.290482e-01 7.063779e-10
[11] 3.389879e-02 3.758638e-01 1.364707e-05 4.011043e-01 7.643737e-03
$comparisons
[1] "Africa - Asia" "Africa - Australia"
[3] "Asia - Australia" "Africa - Europe"
[5] "Asia - Europe" "Australia - Europe"
[7] "Africa - North America" "Asia - North America"
[9] "Australia - North America" "Europe - North America"
[11] "Africa - South America" "Asia - South America"
[13] "Australia - South America" "Europe - South America"
[15] "North America - South America"
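The Bartlett test suggests a violation of the homogeneity of variances assumption, so a traditional one-way ANOVA is not appropriate. Welch’s ANOVA, which does not assume equal variances, indicates a significant difference among continents (F = 6.14, num df = 5, denom df = 131.53, p = 3.81e-05). A non-parametric Kruskal-Wallis test (chi-squared = 122.33, df = 5, p < 0.0001), followed by Dunn’s test with Bonferroni adjustment for post-hoc comparisons, is also a reasonable choice, and its results are reported above.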
Here are some key findings based on the Dunn test results:
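In each comparison the sign of Z gives the direction of the difference in mean rank (first group minus second), and a pair is treated as significant when its Bonferroni-adjusted p-value is at or below alpha/2 = 0.025, the rule printed with the output. Asia and Europe each rank significantly higher than Africa, Australia, and North America (all adjusted p < 0.001) and do not differ from each other (Z = -0.03, adjusted p = 1.00). Africa ranks significantly higher than Australia (Z = 3.07, adjusted p = 0.016). South America ranks significantly higher than Australia (Z = -4.77, adjusted p < 0.001) and North America (Z = -3.29, adjusted p = 0.008). The remaining pairs are not significant: Africa and North America, Australia and North America, Africa and South America (adjusted p = 0.034), Asia and South America, and Europe and South America. Because each country is one observation, these comparisons describe typical country-level emissions within a continent rather than a continent's summed output.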
The Kruskal-Wallis test suggests overall differences among the continents, and the pairwise comparisons provide insights into which specific pairs are significantly different in terms of total fossil fuel emissions.
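The Kruskal-Wallis statistic reported above is computed from ranks rather than raw values, which is why it does not depend on equal variances or normality. The following is a minimal Python sketch of the H statistic with the usual tie correction. The function name `kruskal_wallis_h` and the toy groups are hypothetical, and the p-value (from a chi-squared distribution with `df` degrees of freedom) is left to a statistics library.

```python
from collections import defaultdict


def kruskal_wallis_h(groups):
    """groups: dict of name -> non-empty list of values. Returns (H, df)."""
    pooled = sorted((v, name) for name, vals in groups.items() for v in vals)
    n_total = len(pooled)
    rank_sums = defaultdict(float)
    tie_term = 0
    i = 0
    while i < n_total:
        # Find the run of tied values starting at position i.
        j = i
        while j < n_total and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        ties = j - i
        tie_term += ties ** 3 - ties
        i = j
    h = 12 / (n_total * (n_total + 1)) * sum(
        rank_sums[name] ** 2 / len(vals) for name, vals in groups.items()
    ) - 3 * (n_total + 1)
    # Tie correction; undefined if every value is identical.
    correction = 1 - tie_term / (n_total ** 3 - n_total)
    return h / correction, len(groups) - 1


# Hypothetical country-level emissions for three continents.
toy = {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}
h, df = kruskal_wallis_h(toy)
print(f"H = {h:.2f}, df = {df}")
```

With six continents the statistic has 5 degrees of freedom, matching the df = 5 shown in the output above.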
Question 11. { Compare the two models, are the differences between the continents the same in 2014 as they were in 1984? Describe any differences you see. [5 pts]}
[1] 0
[1] "Kruskal-Wallis test for 1984:"
Kruskal-Wallis rank sum test
data: Total.Fossil.Fuel.Emissions by Continent
Kruskal-Wallis chi-squared = 58.888, df = 5, p-value = 2.063e-11
[1] "Kruskal-Wallis test for 2014:"
Kruskal-Wallis rank sum test
data: Total.Fossil.Fuel.Emissions by Continent
Kruskal-Wallis chi-squared = 68.989, df = 5, p-value = 1.663e-13
Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 58.8877, df = 5, p-value = 0
Comparison of x by group
(Bonferroni)
Col Mean-|
Row Mean | Africa Asia Australi Europe North Am
---------+-------------------------------------------------------
Asia | -4.294620
| 0.0001*
|
Australi | 1.646401 4.566998
| 0.7476 0.0000*
|
Europe | -5.419476 -1.367851 -5.438268
| 0.0000* 1.0000 0.0000*
|
North Am | 0.325445 4.037115 -1.287536 5.083586
| 1.0000 0.0004* 1.0000 0.0000*
|
South Am | -2.221788 0.843426 -3.098198 1.860776 -2.289955
| 0.1972 1.0000 0.0146* 0.4708 0.1652
alpha = 0.05
Reject Ho if p <= alpha/2
Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 68.989, df = 5, p-value = 0
Comparison of x by group
(Bonferroni)
Col Mean-|
Row Mean | Africa Asia Australi Europe North Am
---------+-------------------------------------------------------
Asia | -5.173074
| 0.0000*
|
Australi | 2.746738 6.442654
| 0.0451 0.0000*
|
Europe | -3.870883 1.086618 -5.496379
| 0.0008* 1.0000 0.0000*
|
North Am | 0.939338 5.409059 -1.816454 4.282177
| 1.0000 0.0000* 0.5198 0.0001*
|
South Am | -1.771367 1.930169 -3.676292 1.086172 -2.341473
| 0.5737 0.4019 0.0018* 1.0000 0.1441
alpha = 0.05
Reject Ho if p <= alpha/2
[1] "Dunn test for 1984:"
$chi2
[1] 58.88774
$Z
[1] -4.2946203 1.6464013 4.5669982 -5.4194762 -1.3678512 -5.4382680
[7] 0.3254455 4.0371150 -1.2875370 5.0835866 -2.2217886 0.8434265
[13] -3.0981989 1.8607760 -2.2899550
$P
[1] 8.749627e-06 4.984059e-02 2.473791e-06 2.988696e-08 8.567933e-02
[6] 2.690050e-08 3.724220e-01 2.705627e-05 9.895360e-02 1.851868e-07
[11] 1.314880e-02 1.994950e-01 9.735034e-04 3.138791e-02 1.101196e-02
$P.adjusted
[1] 1.312444e-04 7.476088e-01 3.710687e-05 4.483044e-07 1.000000e+00
[6] 4.035075e-07 1.000000e+00 4.058441e-04 1.000000e+00 2.777802e-06
[11] 1.972320e-01 1.000000e+00 1.460255e-02 4.708186e-01 1.651794e-01
$comparisons
[1] "Africa - Asia" "Africa - Australia"
[3] "Asia - Australia" "Africa - Europe"
[5] "Asia - Europe" "Australia - Europe"
[7] "Africa - North America" "Asia - North America"
[9] "Australia - North America" "Europe - North America"
[11] "Africa - South America" "Asia - South America"
[13] "Australia - South America" "Europe - South America"
[15] "North America - South America"
[1] "Dunn test for 2014:"
$chi2
[1] 68.98901
$Z
[1] -5.1730750 2.7467383 6.4426546 -3.8708837 1.0866181 -5.4963794
[7] 0.9393385 5.4090592 -1.8164545 4.2821771 -1.7713676 1.9301695
[13] -3.6762924 1.0861721 -2.3414732
$P
[1] 1.151363e-07 3.009557e-03 5.870082e-11 5.422076e-05 1.386028e-01
[6] 1.938340e-08 1.737785e-01 3.167835e-08 3.465034e-02 9.253678e-06
[11] 3.824980e-02 2.679292e-02 1.183241e-04 1.387014e-01 9.603902e-03
$P.adjusted
[1] 1.727044e-06 4.514336e-02 8.805123e-10 8.133115e-04 1.000000e+00
[6] 2.907510e-07 1.000000e+00 4.751753e-07 5.197551e-01 1.388052e-04
[11] 5.737469e-01 4.018938e-01 1.774862e-03 1.000000e+00 1.440585e-01
$comparisons
[1] "Africa - Asia" "Africa - Australia"
[3] "Asia - Australia" "Africa - Europe"
[5] "Asia - Europe" "Australia - Europe"
[7] "Africa - North America" "Asia - North America"
[9] "Australia - North America" "Europe - North America"
[11] "Africa - South America" "Asia - South America"
[13] "Australia - South America" "Europe - South America"
[15] "North America - South America"
The Kruskal-Wallis tests indicate that there are significant differences in total fossil fuel emissions among continents for both 1984 and 2014. The Dunn tests for pairwise comparisons provide more details about which specific continent pairs are significantly different. Here are some key observations:
1984:
In 1984, the United States was the world’s leading producer of greenhouse gases. Overall the total fossil fuel emissions in 1984 were lower on average compared to 2014.
In the 1984 Dunn output, seven pairs are significant under the printed rule (Bonferroni-adjusted p at or below alpha/2 = 0.025). Asia differs from Africa, Australia, and North America. Europe differs from Africa, Australia, and North America. Australia also differs from South America (adjusted p = 0.015). Asia and Europe do not differ from each other (adjusted p = 1.00), and no other pair is significant.
2014:
By 2014, the order of the continents by emissions had changed, and Asia had become the world’s leading emitter. China is the leading emitter in 2014 and appears as the top point, or outlier, for Asia in the 2014 graph.
In the 2014 Dunn output, the same seven pairs are significant as in the 1984 output. Asia differs from Africa, Australia, and North America; Europe differs from Africa, Australia, and North America; and Australia differs from South America (adjusted p = 0.002). The set of significantly different continent pairs is therefore unchanged between the two years. What changes is the strength of the differences. The Kruskal-Wallis chi-squared rises from 58.89 to 68.99. The adjusted p-value for Africa versus Australia falls from 0.75 to 0.045, which is below 0.05 but still above the printed alpha/2 threshold. The Z value for Asia versus Europe changes sign, from -1.37 to 1.09, meaning Asia's mean rank moves from below Europe's to above it, although the difference is not significant in either year.
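The year-to-year comparison can be checked mechanically from the Bonferroni-adjusted p-values printed in the Dunn output above. The Python sketch below copies those values and applies the rejection rule printed with the output (reject if p <= alpha/2). The helper name `significant_pairs` is hypothetical.

```python
ALPHA = 0.05

comparisons = [
    "Africa - Asia", "Africa - Australia", "Asia - Australia",
    "Africa - Europe", "Asia - Europe", "Australia - Europe",
    "Africa - North America", "Asia - North America",
    "Australia - North America", "Europe - North America",
    "Africa - South America", "Asia - South America",
    "Australia - South America", "Europe - South America",
    "North America - South America",
]

# $P.adjusted vectors copied from the 1984 and 2014 Dunn output.
p_adj_1984 = [
    1.312444e-04, 7.476088e-01, 3.710687e-05, 4.483044e-07, 1.000000e+00,
    4.035075e-07, 1.000000e+00, 4.058441e-04, 1.000000e+00, 2.777802e-06,
    1.972320e-01, 1.000000e+00, 1.460255e-02, 4.708186e-01, 1.651794e-01,
]
p_adj_2014 = [
    1.727044e-06, 4.514336e-02, 8.805123e-10, 8.133115e-04, 1.000000e+00,
    2.907510e-07, 1.000000e+00, 4.751753e-07, 5.197551e-01, 1.388052e-04,
    5.737469e-01, 4.018938e-01, 1.774862e-03, 1.000000e+00, 1.440585e-01,
]


def significant_pairs(p_adjusted, threshold):
    """Return the set of comparisons whose adjusted p is at or below threshold."""
    return {pair for pair, p in zip(comparisons, p_adjusted) if p <= threshold}


sig_1984 = significant_pairs(p_adj_1984, ALPHA / 2)
sig_2014 = significant_pairs(p_adj_2014, ALPHA / 2)
changed = sig_1984 ^ sig_2014  # pairs significant in only one year

# Same comparison under the plain 0.05 threshold, for contrast.
changed_at_alpha = (
    significant_pairs(p_adj_1984, ALPHA) ^ significant_pairs(p_adj_2014, ALPHA)
)

print("Significant in both years:", sorted(sig_1984 & sig_2014))
print("Changed under alpha/2:", sorted(changed))
print("Changed under alpha:", sorted(changed_at_alpha))
```

With these values the two sets coincide under the alpha/2 rule, and only the Africa - Australia pair changes status if the plain 0.05 threshold is used instead.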
Reference
Boden, T.A., G. Marland, and R.J. Andres. 2013. Global, Regional, and National Fossil-Fuel CO2 Emissions. Carbon Dioxide Information Analysis Center, Oak Ridge National Laboratory, U.S. Department of Energy, Oak Ridge, Tenn., U.S.A. doi.