load tidyverse

library(tidyverse)
## -- Attaching packages -------------------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.2.1     v purrr   0.3.2
## v tibble  2.1.3     v dplyr   0.8.3
## v tidyr   1.0.0     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.4.0
## -- Conflicts ----------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

set working directory and read data

setwd("C:/Users/Don A/Documents/Don's files/MC")
pollution <- read_csv("warming.csv")
## Parsed with column specification:
## cols(
##   gdppc = col_number(),
##   coopc = col_double()
## )
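
The column specification reported above can also be declared explicitly, which documents the expected types and silences the message. This is a minimal sketch, assuming the file has exactly the two columns shown in the message and that gdppc was stored with currency formatting (which would explain why readr guessed col_number()). The temporary file and its two rows are stand-ins built from the sample table later in this report, not the real warming.csv.

```r
library(readr)

# Stand-in for warming.csv: two rows taken from the sample table in this report.
# The "$9,771" style of formatting is an assumption about how gdppc was stored.
tmp <- tempfile(fileext = ".csv")
writeLines(c("gdppc,coopc", "\"$9,771\",7.54", "\"$2,016\",1.73"), tmp)

# Declaring col_types up front suppresses the column specification message.
pollution_demo <- read_csv(
  tmp,
  col_types = cols(
    gdppc = col_number(),  # drops "$" and "," before converting to numeric
    coopc = col_double()
  )
)
```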

load scales to allow dollar signs on x axis

require(scales)
## Loading required package: scales
## 
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
## 
##     discard
## The following object is masked from 'package:readr':
## 
##     col_factor

set up basic scatterplot

ggplot(data = pollution) +
  geom_point(mapping = aes(x = gdppc, y = coopc)) +
  ggtitle("GDP and CO2 Emissions per capita for Countries with Population >= 5 million") +
  labs(caption = "from data.worldbank.org/indicator") +
  labs(x = "GDP per capita in USD (2018 data)", y = "CO2 emissions in metric tons per capita (2014 data)") +
  scale_x_continuous(labels = dollar)

Don Allen

Data Science 110

Scatterplot of GDP and Carbon Dioxide Emissions Per Capita for Countries with Population >= 5 Million

I chose this topic to test for a linear correlation between GDP per capita and carbon dioxide emissions per capita. As world population grows and countries (and individuals) strive to reach the middle class or higher, does that necessarily doom the planet to higher overall emissions? A simple graph like this cannot answer the question, since many other factors are in play, but it does provide a snapshot of the current situation, and one could begin to form opinions about future results if other factors were to remain roughly constant.

I found the data on the World Bank’s website (data.worldbank.org/indicator). I was able to download the following csv files, all containing continuous data, organized by country as well as by region and other characteristics:

• GDP per capita in USD (2018)
• Carbon dioxide emissions in metric tons per capita (2014)
• Population (2018)

The World Bank also had data for total greenhouse gas emissions (CO2 plus methane and others), which I would have preferred to use, but it only ran through 2012. I chose carbon dioxide because its data was more current. Why the emissions estimates are five to seven years old, not to mention how they are calculated, wasn’t readily apparent.

Initially I plotted every country (well over 200), but the result was a sea of dots with some extreme outliers, which on closer inspection turned out to be relatively tiny countries. I therefore cut the data set by almost 50 percent by eliminating countries with populations under 5 million (sorting and deleting rows in Excel, directly in the csv file). It must be noted that some of the smaller countries I eliminated from the chart are the same small islands and archipelagos that face the most immediate and dire impacts of global warming.
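
The population cutoff could also be applied in R instead of by hand in Excel, which would keep the step reproducible. This is a minimal sketch, assuming a combined table with the hypothetical column names country and pop (the warming.csv read above has only gdppc and coopc). The rows are copied from the sample table at the end of this report, with populations as rounded there.

```r
library(dplyr)

# Rows copied from the sample table in this report.
# The column names country and pop are hypothetical.
countries <- data.frame(
  country = c("China", "India", "USA", "Indonesia",
              "Switzerland", "Norway", "UAE", "Saudi Arabia"),
  gdppc   = c(9771, 2016, 62641, 3894, 82839, 81807, 43005, 23219),
  coopc   = c(7.54, 1.73, 16.50, 1.82, 4.31, 9.27, 22.94, 19.44),
  pop     = c(1.4e9, 1.4e9, 327e6, 288e6, 9e6, 5e6, 10e6, 34e6)
)

# Keep only countries at or above a population cutoff.
keep_large <- function(df, min_pop = 5e6) {
  filter(df, pop >= min_pop)
}

large      <- keep_large(countries)        # the 5 million cutoff used for the plot
very_large <- keep_large(countries, 1e8)   # a stricter cutoff, for comparison
```

Because every country in the sample table already meets the 5 million cutoff, `large` keeps all eight rows; the stricter cutoff is included only to show the filter removing rows.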

In my opinion, the resulting plot did not show a linear correlation, though per capita emissions generally increased with wealth. It did, however, raise issues that could be examined in a future, more detailed chart.
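
One way to put a number on that visual impression is a correlation coefficient. Pearson's coefficient measures the strength of a linear relationship, while Spearman's rank correlation only asks whether emissions tend to rise with GDP. The sketch below uses just the eight countries from the sample table at the end of this report, so its results say nothing about the full data set.

```r
# Values copied from the sample table in this report (eight countries only).
gdppc <- c(9771, 2016, 62641, 3894, 82839, 81807, 43005, 23219)
coopc <- c(7.54, 1.73, 16.50, 1.82, 4.31, 9.27, 22.94, 19.44)

# Pearson: strength of a straight-line relationship.
pearson <- cor(gdppc, coopc, method = "pearson")

# Spearman: strength of any monotonic (rank-order) relationship.
spearman <- cor(gdppc, coopc, method = "spearman")

# With the full data set loaded as `pollution`, the equivalent call would be:
# cor(pollution$gdppc, pollution$coopc, use = "complete.obs")
```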

Per the World Bank’s population data, 13 countries hold 62 percent of the world’s population. A bubble chart, with bubble size corresponding to population, would easily show which countries have the most impact on the planet. Also, a more detailed chart could display different colors to show if a country’s total emissions have increased or decreased over a given time period. The World Bank emissions data has a column for each year, so a change over a period (e.g., 2010 to 2014) would be easy to calculate.
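
The change over a period could be calculated along these lines. This is only a sketch: the country names and numbers are made-up placeholders, not World Bank figures, and the year columns are assumed to be named by the year.

```r
# Placeholder values only, NOT World Bank figures.
# Assumes one column per year, named by the year.
emissions <- data.frame(
  country = c("Country A", "Country B"),
  `2010`  = c(10, 4),
  `2014`  = c(8, 5),
  check.names = FALSE  # keep "2010" and "2014" as column names
)

# Add the change between two year columns and a label for its direction.
emissions_change <- function(df, from, to) {
  df$change <- df[[to]] - df[[from]]
  df$direction <- ifelse(df$change > 0, "increased",
                         ifelse(df$change < 0, "decreased", "unchanged"))
  df
}

result <- emissions_change(emissions, "2010", "2014")
```

The direction column could then be mapped to color in a plot.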

In addition, one obviously cannot identify individual countries in the existing plot, so an improved graph could include some or all country names (though it might become too busy), or it could allow a viewer with interactive access to roll over each bubble to obtain more information.
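
A bubble chart with country labels might look something like the sketch below. It uses only the eight countries from the sample table at the end of this report, and the column names country and pop are hypothetical, since the warming.csv read above has only gdppc and coopc.

```r
library(ggplot2)
library(scales)

# Rows copied from the sample table in this report; country and pop are
# hypothetical column names.
sample_countries <- data.frame(
  country = c("China", "India", "USA", "Indonesia",
              "Switzerland", "Norway", "UAE", "Saudi Arabia"),
  gdppc   = c(9771, 2016, 62641, 3894, 82839, 81807, 43005, 23219),
  coopc   = c(7.54, 1.73, 16.50, 1.82, 4.31, 9.27, 22.94, 19.44),
  pop     = c(1.4e9, 1.4e9, 327e6, 288e6, 9e6, 5e6, 10e6, 34e6)
)

bubble <- ggplot(sample_countries, aes(x = gdppc, y = coopc)) +
  # Bubble area is proportional to population.
  geom_point(aes(size = pop), alpha = 0.5) +
  # check_overlap drops labels that would collide, to limit clutter.
  geom_text(aes(label = country), vjust = -1, check_overlap = TRUE) +
  scale_size_area(max_size = 20, labels = comma, name = "Population") +
  scale_x_continuous(labels = dollar) +
  labs(x = "GDP per capita in USD (2018 data)",
       y = "CO2 emissions in metric tons per capita (2014 data)")

bubble
```

With the full data set, labeling every country would likely be too busy, so one option is to pass geom_text() a subset containing only the largest countries.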

Because this plot lacks those improvements, here is a sample of the data to help the reader/viewer identify selected countries:

Country: GDP per capita; CO2 per capita; Population

China: $9,771; 7.54 metric tons; 1.4 billion

India: $2,016; 1.73 metric tons; 1.4 billion

USA: $62,641; 16.50 metric tons; 327 million

Indonesia: $3,894; 1.82 metric tons; 288 million

Switzerland: $82,839; 4.31 metric tons; 9 million

Norway: $81,807; 9.27 metric tons; 5 million

UAE: $43,005; 22.94 metric tons; 10 million

Saudi Arabia: $23,219; 19.44 metric tons; 34 million