Project Three DCA

library(readr)
library(tidyverse)

## -- Attaching packages ------------------------------------------------------------------------------- tidyverse 1.3.0 --

## v ggplot2 3.2.1     v dplyr   0.8.3
## v tibble  2.1.3     v stringr 1.4.0
## v tidyr   1.0.0     v forcats 0.4.0
## v purrr   0.3.3

## -- Conflicts ---------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

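library(ggplot2)  # already attached by library(tidyverse) above; redundant but harmless
library(dplyr)    # likewise already attached by tidyverse
library(plotly)   # provides ggplotly() for interactive charts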

## 
## Attaching package: 'plotly'

## The following object is masked from 'package:ggplot2':
## 
##     last_plot

## The following object is masked from 'package:stats':
## 
##     filter

## The following object is masked from 'package:graphics':
## 
##     layout

I couldn’t get setwd() to work permanently: it would change the directory, but then automatically switch back to the original one. So I ended up moving my csv file instead. This used to work; I’m not sure what I did to break it.

getwd()

## [1] "C:/Users/Don A/Documents"
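A likely explanation, assuming the script was being knitted from an R Markdown file: knitr restores the working directory after each chunk, so a setwd() call only lasts until the end of its own chunk. A minimal sketch of the usual alternative is to set knitr's root.dir option in a setup chunk at the top of the .Rmd. The folder below is simply the directory printed by getwd() above and stands in for wherever projecttwo.csv actually lives.

```r
# Setup chunk (sketch): set the directory knitr uses for all later chunks.
# Assumption: this runs in the first chunk of the .Rmd, before any read_csv() call.
data_dir <- "C:/Users/Don A/Documents"  # folder holding projecttwo.csv (replace as needed)
knitr::opts_knit$set(root.dir = data_dir)
```

In the RStudio console, by contrast, setwd() persists for the rest of the session, which may be why it seemed to work before.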

Hopefully my data from project two is not corrupted now.

bubbledata <- read_csv("projecttwo.csv")

## Parsed with column specification:
## cols(
##   country = col_character(),
##   gdppc = col_number(),
##   ghgpc = col_double(),
##   population = col_number(),
##   change = col_double(),
##   ghg = col_double(),
##   tenyear = col_character()
## )
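The column specification readr printed can be passed back to read_csv() through col_types, which fixes the column types and stops the message from appearing on every run. The sketch below copies that specification; the one-row CSV it reads is made up and written to a temporary file so the example runs on its own. col_number() is what lets values such as "$12,345" parse as numbers, which is presumably why readr guessed it for gdppc and population.

```r
library(readr)

# The specification readr printed above, written out explicitly
bubble_spec <- cols(
  country    = col_character(),
  gdppc      = col_number(),     # strips "$" and "," from values such as "$12,345"
  ghgpc      = col_double(),
  population = col_number(),
  change     = col_double(),
  ghg        = col_double(),
  tenyear    = col_character()
)

# Demonstration on a tiny hypothetical file; in the script this would be
# read_csv("projecttwo.csv", col_types = bubble_spec)
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "country,gdppc,ghgpc,population,change,ghg,tenyear",
  'A,"$12,345",4.2,"5,500,000",-3.5,23.1,example'
), tmp)
demo <- read_csv(tmp, col_types = bubble_spec)
demo
```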

I installed the scales package (loaded with require() below) so that the x axis could show dollar signs. I also used remove.packages() and install.packages(), since removed from this script, on five packages that couldn’t be updated automatically (such as purrr). One of them was so out of date that it caused errors in charts I had been able to run without any error message a week earlier.

require(scales)

## Loading required package: scales

## 
## Attaching package: 'scales'

## The following object is masked from 'package:purrr':
## 
##     discard

## The following object is masked from 'package:readr':
## 
##     col_factor
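Because labels = dollar only needs the function itself, an alternative to attaching scales is to refer to it with the package prefix, which avoids the two masking messages printed above. (In scripts, library() is also generally preferred over require(), since library() stops with an error when a package is missing while require() only returns FALSE.) A small sketch; the object name x_dollar_scale is just for illustration:

```r
# dollar() turns numbers into currency strings; labels = dollar applies it to each x-axis tick
scales::dollar(c(1000, 25000))  # returns "$1,000" "$25,000"

# Referring to it as scales::dollar means scales never has to be attached,
# so it cannot mask purrr::discard() or readr::col_factor()
x_dollar_scale <- ggplot2::scale_x_continuous(labels = scales::dollar)
```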

Set up the initial bubble chart in ggplot.

# Base plot: GDP per capita (x) against greenhouse gas emissions per capita (y)
p2 <- ggplot(bubbledata, aes(x = gdppc, y = ghgpc)) +
  xlab("GDP per capita in USD (2018)") +
  ylab("Greenhouse Gas Emissions-Metric Tons per capita (2015)") +
  ggtitle("GDP and Greenhouse Gas Emissions per capita for Countries with Population >= 5 million") +
  labs(caption = "from data.worldbank.org/indicators and European Union EDGAR") +
  scale_x_continuous(labels = dollar)  # dollar() from scales gives "$10,000"-style tick labels
# Draw the points
p2 + geom_point()

Added a linear regression line without the confidence interval band (se = FALSE), which showed more or less what I expected.

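# Points plus an ordinary least-squares line; se = FALSE hides the confidence band.
# size sets the line width here (ggplot2 3.4.0 and later call it linewidth).
p3 <- p2 + geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, linetype = "dotdash", size = 0.3)
p3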
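The line geom_smooth() draws can also be fitted directly with lm(), which puts numbers on "more or less what I expected": the slope, the R², and, through the residuals, which countries fall below the line. A minimal sketch; the data frame holds made-up values, not the project's data, so that it runs on its own.

```r
# Made-up values (not the project's data) so the sketch runs on its own;
# with the real data this would be lm(ghgpc ~ gdppc, data = bubbledata)
toy <- data.frame(
  country = c("A", "B", "C", "D", "E"),
  gdppc   = c(1000, 4000, 10000, 30000, 60000),
  ghgpc   = c(1.5, 2.0, 9.0, 12.0, 20.0)
)

# Same straight-line least-squares fit that geom_smooth(method = "lm") draws
fit <- lm(ghgpc ~ gdppc, data = toy)
coef(fit)               # intercept, and slope in tons per capita per extra dollar of GDP per capita
summary(fit)$r.squared  # share of the variation in ghgpc explained by gdppc

# A negative residual means the country sits below the regression line
toy$below_line <- residuals(fit) < 0
toy[, c("country", "gdppc", "below_line")]
```

Sorting the real data by gdppc and looking at below_line would be one way to check the essay's observation about the poorest countries.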

Successfully added each country’s population as the point size.

# Overlay red points sized by population, and split the long title into title + subtitle
p3 +
  geom_point(mapping = aes(gdppc, ghgpc, size = population), color = "red") +
  ggtitle("GDP and Greenhouse Gas Emissions per capita", subtitle = "For Countries w/ Population >= 5 million")

Following our week 7 tutorial, I converted the chart to plotly at the end of the following chunk.

# text is not a ggplot2 aesthetic; ggplotly() uses it for the hover tooltip
p4 <- ggplot(bubbledata, aes(x = gdppc, y = ghgpc, size = population, text = paste("country:", country))) +
  theme_minimal(base_size = 12) +
  geom_point(alpha = 0.5, color = "red") +
  scale_x_continuous(labels = dollar) +
  ggtitle("GDP and Greenhouse Gas Emissions per capita", subtitle = "Countries >= 5 million population") +
  labs(caption = "from data.worldbank.org/indicators and European Union EDGAR")
# Convert to an interactive plotly widget
p4 <- ggplotly(p4)
p4

At this point, I tried unsuccessfully to change the x and y axis titles, restore the second title and subtitle, restore the linear regression line, color the dots by the change in total greenhouse gas emissions from 2005 to 2015, and add the percent change to the text boxes. I found other tutorials online that added plotly earlier in the code, but I couldn’t get that approach to work either (beyond what you see here).
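Several of these problems likely share a cause: aesthetics mapped in the top-level aes() (size, text) are inherited by every layer, and because text is a character variable, a regression line added to p4 would be split into one group per country. A sketch of one way to get all of those features maps them on geom_point() only, under the assumption that change holds a percent change. The data frame is a made-up stand-in for bubbledata, the colour scheme is an arbitrary choice, and the title and subtitle are set with plotly's own layout() because, as far as I know, ggplotly() does not carry ggplot2 subtitles or captions into the interactive chart.

```r
library(ggplot2)
library(plotly)

# Hypothetical stand-in for bubbledata (made-up values) so the sketch runs alone
toy_countries <- data.frame(
  country    = c("A", "B", "C", "D"),
  gdppc      = c(1500, 6000, 25000, 55000),
  ghgpc      = c(1.2, 4.5, 8.0, 14.0),
  population = c(8e6, 30e6, 60e6, 12e6),
  change     = c(35.0, 12.5, -4.0, -10.0)  # assumed: percent change 2005-2015
)

p5 <- ggplot(toy_countries, aes(x = gdppc, y = ghgpc)) +
  # The regression line sees only x and y, so it is fitted across all countries
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE,
              linetype = "dotdash", size = 0.3) +  # size = line width in ggplot2 3.2.1
  # Size, colour and hover text are mapped on the points only
  geom_point(aes(size = population, color = change,
                 text = paste0("country: ", country,
                               "<br>change 2005-2015: ", change, "%")),
             alpha = 0.6) +
  scale_color_gradient2(low = "blue", mid = "grey80", high = "red", midpoint = 0) +
  scale_x_continuous(labels = scales::dollar) +
  labs(x = "GDP per capita in USD (2018)",
       y = "Greenhouse Gas Emissions-Metric Tons per capita (2015)",
       color = "Change (%)", size = "Population")

# tooltip = "text" shows only the custom hover text; the subtitle is added
# through plotly's layout(), since ggplotly() may not carry it over
p5_interactive <- plotly::layout(
  ggplotly(p5, tooltip = "text"),
  title = list(text = paste0("GDP and Greenhouse Gas Emissions per capita",
                             "<br><sup>Countries with population >= 5 million</sup>"))
)
p5_interactive
```

With the real data, toy_countries would be replaced by bubbledata; if change is stored as a fraction rather than a percent, multiplying by 100 in the tooltip would be needed.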

Essay: Project Two

I chose the same topic as in my first project, with similar data: the relationship between GDP and greenhouse gas emissions per capita. In project one, I used World Bank data and focused on carbon dioxide emissions rather than total greenhouse gas emissions, because the carbon dioxide data were slightly more current (2014 vs. 2012). However, while setting up project two, I looked into where the World Bank got its data, which turned out to be the European Union. The EU’s total greenhouse gas emissions data were complete through 2015, so I pulled the data for the y axis from there. Total greenhouse gas emissions include not only carbon dioxide, but also methane, nitrous oxide, and fluorinated gases – the most common being hydrofluorocarbons.

I’ve been interested in the topic of global warming for many years. Over the summer, I read a recently published book by David Wallace-Wells called The Uninhabitable Earth: Life After Warming. He gives his summary in his first sentence: “It is worse, much worse, than you think.” One can get an idea of how deep the author considers the problem to be by looking at the chapter headings in part two – Elements of Chaos: Heat Death, Hunger, Drowning, Wildfire, Disasters No Longer Natural, Freshwater Drain, Dying Oceans, Unbreathable Air, Plagues of Warming, Economic Collapse, Climate Conflict, and Systems.

One can find plenty of reviews of his book from climate scientists (and others) on the internet. My general characterization of those reviews is that the author presents a worst-case scenario, and that he misread a few studies (the author was a history major, not a scientist). Still, I found plenty of support for parts, if not all, of what he has written in his heavily footnoted book (I counted 735).

Anyway, for project two, besides switching from carbon dioxide to total greenhouse gas emissions, I added emissions data from 2005 and 2015 in Excel (not with dplyr, I believe), hoping to show by color which countries have seen increases or decreases in emissions over that ten-year span. I added a linear regression line to the simple data plot from Project One, which showed that greenhouse gas emissions increase with GDP. I wasn’t really surprised by this, but it also showed that the poorest countries, as a rule, fell below the regression line.
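The 2005-to-2015 comparison done in Excel can also be done in R with dplyr, which keeps the calculation in the script. A minimal sketch: the column names ghg2005 and ghg2015 and the direction label are hypothetical, and the sketch assumes "change" means the percent change in total emissions, which may not match how the spreadsheet defined it.

```r
library(dplyr)

# Hypothetical layout: total emissions for each year in its own column
emissions <- data.frame(
  country = c("A", "B"),
  ghg2005 = c(100, 200),
  ghg2015 = c(120, 150)
)

emissions <- emissions %>%
  mutate(
    change    = (ghg2015 - ghg2005) / ghg2005 * 100,         # percent change, 2005 to 2015
    direction = if_else(change > 0, "increase", "decrease")  # no change (0) counts as "decrease" here
  )
emissions
```

Either change (continuous) or direction (two categories) could then be mapped to color in the chart.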

I then converted the plot to plotly, with less success than I had hoped. I was able to vary the size of the dots by each country’s population, and I got the rollover text box feature to work. However, I was not able to color the dots by each country’s ten-year trend in overall emissions. It’s probably not difficult, but I couldn’t find a similar tutorial to copy with any success. I also had trouble modifying the titles. (Maybe if I were to attempt it in Tableau, I would be more successful.)

Also, if I could have found the data (and then known how to code it), I would have liked to include two horizontal lines: one for the greenhouse gas emissions per capita that, at current world population levels, would keep warming below two degrees Celsius by 2100, and a second showing the level emissions would need to fall to at the estimated world population in 2100.
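No targets are assumed here: the sketch only shows the arithmetic (a per-capita level is a total annual emissions budget divided by population) and how two such values could be drawn on an existing ggplot with geom_hline(). The function names are hypothetical, and any real budget and population figures would have to come from a published source.

```r
library(ggplot2)

# Per-capita level implied by a total annual budget shared evenly:
# per-capita target = total budget (metric tons per year) / population
per_capita_target <- function(total_budget_t, population) {
  total_budget_t / population
}

# Adds two dashed reference lines to an existing ggplot.
# Both target values are inputs; no real targets are built in.
add_target_lines <- function(p, target_now, target_2100) {
  p +
    geom_hline(yintercept = target_now,  linetype = "dashed", color = "darkgreen") +
    geom_hline(yintercept = target_2100, linetype = "dashed", color = "darkred")
}

# Usage (budget, population_now and population_2100 are placeholders):
# add_target_lines(p3,
#                  target_now  = per_capita_target(budget, population_now),
#                  target_2100 = per_capita_target(budget, population_2100))
```

Because the same budget is divided by a larger population in 2100, the second line would sit lower than the first.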

Essay: Project Three

Link to Tableau, hopefully:

https://public.tableau.com/profile/don.a.#!/vizhome/DCAProjectThree/IncomeGroups

I attempted another global warming project, this time in Tableau Public, with slightly different data from the World Bank and European Union. In my Excel file, I included the following variables:

- Country
- Gross National Income per capita (as opposed to GDP, so I could use the next variable)
- Income Group (per World Bank rankings – High, Upper Middle, Lower Middle, and Low Income)
- Greenhouse Gas Emissions per capita
- Population
- Ten-year change in overall emissions (2005 to 2015)

In cleaning the data, I eliminated countries with population under 100,000, and also excluded Botswana, whose emissions data in the European Union database appeared to be corrupted (I could find no evidence elsewhere that Botswana pollutes as much as the Persian Gulf oil states). I thought Curacao might also be a mistaken outlier, but Wikipedia “research” showed that Shell built what was once the world’s largest oil refinery there.
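The same cleaning rules, written as a filter in R, would be easy to re-run or adjust. A minimal dplyr sketch, assuming a data frame with country and population columns like the Project Two file; the function name clean_countries is hypothetical, and setting min_population = 5e6 would reproduce the Project Two cutoff.

```r
library(dplyr)

# Hypothetical helper: apply the population cutoff and the list of exclusions
clean_countries <- function(df, min_population = 1e5, exclude = "Botswana") {
  df %>%
    filter(population >= min_population) %>%  # drop countries under the population cutoff
    filter(!(country %in% exclude))           # drop countries with suspect emissions data
}
```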

I was not successful in including a regression line as I had done with a similar project in R.

Overall, I wasn’t too pleased with my final charts and ended up with more questions than answers, which can generally be divided into Tableau-oriented and data-oriented.

Tableau questions (or things to work on in the future):

1. How does one include statistical lines?
2. Could I have colored the countries by income group as I did by GNI per capita?
3. Could I have created a dashboard that didn’t shrink the maps into something unreadable?
4. Could I have created four separate scatterplots, one for each income group?
5. I didn’t really like the color options. Outliers seemed to make the color changes useless.

Data questions:

1. Has anyone researched how much of China’s emissions are generated to produce consumer products for the West?
2. How was the emissions data calculated?
3. How do large producers of data proofread such huge volumes of it?
4. Why couldn’t I find more current emissions data?
5. I would love to be able to create two “baselines” showing how far emission levels would have to drop to avoid catastrophe – at current population levels and at estimated population levels in 2100.


Don Allen

12/18/2019