Greenhouse gas emissions and Income 2018

Author

Ellis Oppong

Introduction

This project examines the relationship between greenhouse gas (GHG) emissions per capita and gross national income (GNI) per capita across countries in 2018. The goal is to find out how a country's income level relates to the average emissions of its residents. The variables used are listed below.

Country code - This is a short identification code for different countries around the world.

Country - This shows the names of the countries used in the dataset.

Income group - This shows the income classification of each country: low income, lower middle income, upper middle income, or high income.

Gross national income per capita - This displays the measure of a country’s total income divided by its population.

Greenhouse gas per capita - This shows the average amount of greenhouse gases emitted by each person in a specific country.

Change - This shows the percentage change in GHG emissions.

Source of the dataset: World Bank, https://databank.worldbank.org/metadataglossary/world-development-indicators/series/EN.ATM.GHGT.KT.CE

Load libraries

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Load the dataset

data <- read_csv("greenhousegas_gni2018.csv")
Rows: 179 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): Country Code, Country Name, IncomeGroup
dbl (4): GNI Per Capita (USD), GHG Per Capita, change, population

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(data)
# A tibble: 6 × 7
  `Country Code` `Country Name` IncomeGroup         `GNI Per Capita (USD)`
  <chr>          <chr>          <chr>                                <dbl>
1 AFG            Afghanistan    Low income                             550
2 ALB            Albania        Upper middle income                   4860
3 DZA            Algeria        Upper middle income                   4060
4 AGO            Angola         Lower middle income                   3370
5 ARG            Argentina      Upper middle income                  12370
6 ARM            Armenia        Upper middle income                   4230
# ℹ 3 more variables: `GHG Per Capita` <dbl>, change <dbl>, population <dbl>

Data cleaning

## Convert IncomeGroup to a factor with levels ordered from low to high income
data$IncomeGroup <- factor(data$IncomeGroup,
                           levels = c("Low income", "Lower middle income",
                                      "Upper middle income", "High income"))
## Log transformation (natural log) for GNI Per Capita
data$log_GNI <- log(data$`GNI Per Capita (USD)`)
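
One thing worth checking after this step is that `factor()` silently turns any label not listed in `levels` into `NA`. The sketch below shows how such a check might look. It uses a small made-up data frame (`toy`, with a hypothetical `gni` column), not the World Bank file, so the values are purely illustrative.

```r
library(dplyr)

# Hypothetical toy data, NOT the World Bank file; values are illustrative only.
# The last label is deliberately misspelled in lower case.
toy <- data.frame(
  IncomeGroup = c("Low income", "High income", "Upper middle income", "high income"),
  gni = c(500, 40000, 5000, 30000)
)

income_levels <- c("Low income", "Lower middle income",
                   "Upper middle income", "High income")

toy <- toy %>%
  mutate(
    IncomeGroup = factor(IncomeGroup, levels = income_levels),
    log_gni = log(gni)  # natural log, as in the cleaning step
  )

# Labels that did not match any level become NA after conversion.
n_unmatched <- sum(is.na(toy$IncomeGroup))
n_unmatched
```

If the count is greater than zero on the real data, the affected rows would be dropped or shown as an `NA` bar in the plot, so it may be worth inspecting them before plotting.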

Data Visualization

ggplot(data, aes(x = IncomeGroup, y = `GHG Per Capita`, fill = IncomeGroup)) +
  geom_bar(stat = "summary", fun = "mean") +
  labs(title = "Average Greenhouse Gas Emissions Per Capita by Income Group",
       x = "Income Group",
       y = "Average GHG Emissions Per Capita (Metric Tons)",
       caption = "Source: World Bank Dataset") +
  scale_fill_manual(values = c("#1f77b4", "#ff7f0e", "#2ca02c", "#d62728")) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5),
        legend.position = "bottom")

Essay

In this project, the income group variable was converted to a factor with explicitly ordered levels so that the groups appear from low to high income. A log transformation was applied to GNI Per Capita to handle its wide range of values and stored as log_GNI, although the bar chart above groups countries by income group and does not plot log_GNI directly. The visualization compares average greenhouse gas emissions per capita across income groups. It produced some obvious and some less obvious insights.

The visualization shows that high-income countries have the highest average GHG emissions per capita, followed by upper-middle-income countries. Low-income countries have the lowest emissions, which was expected given their lower economic activity. What was surprising was that some upper-middle-income countries (e.g., China) have emissions comparable to high-income nations, suggesting that industrialization plays a significant role beyond income level alone. The visualization also has a limitation: each bar is an unweighted mean across countries, so a small country counts as much as a large one, which could skew perceptions. A future improvement could use a population-weighted average.
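
As a rough illustration of the population-weighted average mentioned as a possible improvement, the sketch below compares a simple mean with `weighted.mean()` per income group. The data frame and its column names (`income_group`, `ghg_per_capita`, `population`) are hypothetical placeholders with made-up values, not the project's dataset.

```r
library(dplyr)

# Hypothetical toy data; column names and values are placeholders.
toy <- data.frame(
  income_group = c("High income", "High income", "Low income", "Low income"),
  ghg_per_capita = c(10, 20, 1, 3),
  population = c(1e6, 3e6, 2e6, 2e6)
)

group_means <- toy %>%
  group_by(income_group) %>%
  summarise(
    simple_mean = mean(ghg_per_capita),                              # every country counts equally
    weighted_mean = weighted.mean(ghg_per_capita, w = population),   # larger countries count more
    .groups = "drop"
  )

group_means
```

In this toy example the two means differ for the group whose countries have unequal populations and coincide for the group whose countries are the same size, which is the effect the weighting is meant to capture.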

I ran into some trouble over the course of this project. Initially, the wide range of GNI Per Capita values made it difficult to visualize trends, so a log transformation was applied. Including population size in the visualization would provide deeper insights, but it was omitted due to complexity.
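
The bar chart in this report groups countries by income group, so the log-transformed column is not plotted there. One place it could be used is a scatter plot of emissions against log income. The following is only a sketch under that assumption, built on a made-up data frame (`toy`, with hypothetical columns `gni` and `ghg`) rather than the actual data.

```r
library(ggplot2)

# Hypothetical toy data; values are illustrative only.
toy <- data.frame(
  gni = c(500, 2000, 8000, 40000),
  ghg = c(0.5, 2, 6, 12)
)

# Natural log compresses the wide range of income values.
toy$log_gni <- log(toy$gni)

p <- ggplot(toy, aes(x = log_gni, y = ghg)) +
  geom_point() +
  labs(x = "log(GNI per capita, USD)",
       y = "GHG emissions per capita") +
  theme_minimal()

p
```

On the raw scale, most points would crowd near the low end of the income axis. After the log transformation they are spread more evenly, which is the readability benefit described above.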

In conclusion, I think this project was partially successful. It explored the relationship between income groups and GHG emissions per capita. The findings highlight the disparity in emissions across income groups, with high-income countries emitting significantly more per capita. Future work could explore the role of industrialization in greenhouse gas emissions around the world. The dataset was effectively cleaned and visualized, though including additional variables, such as population, could have enhanced the overall output.

Reference

Coding was assisted by ChatGPT.