This project asks two questions: how did the onset of industrialization affect the rate of CO2 emissions, and how does a nation’s income level affect its CO2 emissions?

#Loading required libraries
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyverse)
## ── Attaching packages ──────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.1     ✔ readr   1.3.1
## ✔ tibble  2.1.3     ✔ purrr   0.3.2
## ✔ tidyr   0.8.3     ✔ stringr 1.4.0
## ✔ ggplot2 3.2.1     ✔ forcats 0.4.0
## ── Conflicts ─────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(ggplot2)
library(gganimate)
#Read Data 
#co2_emissions <- read.csv("co2_global_emissions.csv")
#co2_emissions_regions <- read.csv("co2_global_emissions_regions.csv") 

# The aforementioned datasets were merged using OpenRefine
co2_emissions_merged <- read.csv("co2_global_emissions_merged_clean.csv")
#view(co2_emissions_merged)

#Read a separate dataset to show cumulative carbon dioxide emissions in relation to the evolution of industrialization.
co2_cumulative_emissions <- read.csv("co2_cumulative_emissions.csv")
# Clean and explore the data variables
head(co2_emissions_merged)
##   Country Country_Code Income_Group                    Region Year
## 1   Aruba          ABW  High income Latin America & Caribbean 1986
## 2   Aruba          ABW  High income Latin America & Caribbean 1987
## 3   Aruba          ABW  High income Latin America & Caribbean 1988
## 4   Aruba          ABW  High income Latin America & Caribbean 1989
## 5   Aruba          ABW  High income Latin America & Caribbean 1990
## 6   Aruba          ABW  High income Latin America & Caribbean 1991
##   Emission_per_Capita
## 1            2.868319
## 2            7.235198
## 3           10.026179
## 4           10.634733
## 5           26.374503
## 6           26.046130
str(co2_emissions_merged)
## 'data.frame':    12261 obs. of  6 variables:
##  $ Country            : Factor w/ 264 levels "Afghanistan",..: 11 11 11 11 11 11 11 11 11 11 ...
##  $ Country_Code       : Factor w/ 264 levels "ABW","AFG","AGO",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Income_Group       : Factor w/ 5 levels "","High income",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ Region             : Factor w/ 8 levels "","East Asia & Pacific",..: 4 4 4 4 4 4 4 4 4 4 ...
##  $ Year               : int  1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 ...
##  $ Emission_per_Capita: num  2.87 7.24 10.03 10.63 26.37 ...
summary(co2_emissions_merged)
##                 Country       Country_Code                Income_Group 
##  Afghanistan        :   55   AFG    :   55                      :2400  
##  Albania            :   55   AGO    :   55   High income        :3355  
##  Algeria            :   55   ALB    :   55   Low income         :1540  
##  Angola             :   55   ARB    :   55   Lower middle income:2280  
##  Antigua and Barbuda:   55   ARE    :   55   Upper middle income:2686  
##  Arab World         :   55   ARG    :   55                             
##  (Other)            :11931   (Other):11931                             
##                         Region          Year      Emission_per_Capita
##  Sub-Saharan Africa        :2460   Min.   :1960   Min.   : -0.0201   
##                            :2400   1st Qu.:1975   1st Qu.:  0.4419   
##  Europe & Central Asia     :2049   Median :1989   Median :  1.6688   
##  Latin America & Caribbean :1986   Mean   :1988   Mean   :  4.2151   
##  East Asia & Pacific       :1684   3rd Qu.:2002   3rd Qu.:  5.7911   
##  Middle East & North Africa:1110   Max.   :2014   Max.   :100.6977   
##  (Other)                   : 572   NA's   :12     NA's   :12
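#Remove missing data
# Note: the cleaned copy is stored under a new name, so co2_emissions_merged
# itself keeps its 12 incomplete rows (dim() below still reports 12261 rows,
# and lm() later drops them on its own).
co2_co2_emissions_merged <- na.omit(co2_emissions_merged)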
dim(co2_emissions_merged)
## [1] 12261     6
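The row count above is unchanged at 12261 because `na.omit()` returns a new data frame rather than modifying its argument, and here the result was stored under a different name. A minimal sketch of that behavior on a small made-up data frame (the `toy` object and its values are hypothetical, not taken from the dataset):

```r
# Hypothetical toy data with one missing value in each column
toy <- data.frame(
  Year = c(1960, 1961, NA, 1963),
  Emission_per_Capita = c(0.5, NA, 0.7, 0.9)
)

# na.omit() returns a cleaned copy; the input is left untouched
toy_clean <- na.omit(toy)

dim(toy)        # still 4 rows: the original keeps its NA rows
dim(toy_clean)  # 2 rows: only complete cases remain

# Assigning back to the same name is what actually replaces the data
toy <- na.omit(toy)
```

Applied to this analysis, assigning the result back to `co2_emissions_merged` would presumably leave 12249 rows, which matches the number of observations `lm()` uses later.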
# Clean and explore the data variable for co2_cumulative_emissions
head(co2_cumulative_emissions)
##       Country Country_Code Year Cumulative_CO2_emissions_tonnes
## 1 Afghanistan          AFG 1751                               0
## 2 Afghanistan          AFG 1752                               0
## 3 Afghanistan          AFG 1753                               0
## 4 Afghanistan          AFG 1754                               0
## 5 Afghanistan          AFG 1755                               0
## 6 Afghanistan          AFG 1756                               0
str(co2_cumulative_emissions)
## 'data.frame':    61677 obs. of  4 variables:
##  $ Country                        : Factor w/ 231 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Country_Code                   : Factor w/ 223 levels "","ABW","AFG",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ Year                           : int  1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 ...
##  $ Cumulative_CO2_emissions_tonnes: num  0 0 0 0 0 0 0 0 0 0 ...
summary(co2_cumulative_emissions)
##              Country       Country_Code        Year     
##  Afghanistan     :  267          : 2403   Min.   :1751  
##  Africa          :  267   ABW    :  267   1st Qu.:1817  
##  Albania         :  267   AFG    :  267   Median :1884  
##  Algeria         :  267   AGO    :  267   Mean   :1884  
##  Americas (other):  267   AIA    :  267   3rd Qu.:1951  
##  Andorra         :  267   ALB    :  267   Max.   :2017  
##  (Other)         :60075   (Other):57939                 
##  Cumulative_CO2_emissions_tonnes
##  Min.   :0.000e+00              
##  1st Qu.:0.000e+00              
##  Median :0.000e+00              
##  Mean   :2.386e+09              
##  3rd Qu.:4.305e+06              
##  Max.   :1.575e+12              
## 
#Remove missing data
co2_cumulative_emissions <- na.omit(co2_cumulative_emissions)
dim(co2_cumulative_emissions)
## [1] 61677     4
#Bar Plot for Cumulative CO2 Emission

library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
co2_emission_bar <- ggplot(co2_cumulative_emissions, aes(x = Year, y = Cumulative_CO2_emissions_tonnes)) +
  xlab("Year") +
  ylab("Cumulative CO2 Emissions (tonnes)") +
  theme_minimal(base_size = 14) +
  geom_bar(position = "dodge", stat = "identity", color = "red") +
  xlim(1850, 2017) +
  ggtitle("CO2 Emission (1850 - 2016)")

ggplotly(co2_emission_bar)
#Regression Diagnostics
fit <- lm(Emission_per_Capita ~ Year, data = co2_emissions_merged)
summary(fit)
## 
## Call:
## lm(formula = Emission_per_Capita ~ Year, data = co2_emissions_merged)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -5.059 -3.524 -2.394  1.406 97.154 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -64.722252   7.798821  -8.299   <2e-16 ***
## Year          0.034670   0.003922   8.840   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.892 on 12247 degrees of freedom
##   (12 observations deleted due to missingness)
## Multiple R-squared:  0.00634,    Adjusted R-squared:  0.006259 
## F-statistic: 78.14 on 1 and 12247 DF,  p-value: < 2.2e-16
fit
## 
## Call:
## lm(formula = Emission_per_Capita ~ Year, data = co2_emissions_merged)
## 
## Coefficients:
## (Intercept)         Year  
##   -64.72225      0.03467
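To read the fitted line concretely, the reported coefficients can be plugged into the regression equation by hand. The sketch below uses the rounded estimates printed above, so the results are approximate, and the helper name `predict_emission` is hypothetical. It illustrates the arithmetic only and is not a replacement for `predict(fit)`.

```r
# Rounded coefficients copied from the summary(fit) output above
intercept <- -64.722252
slope     <- 0.034670

# Hypothetical helper: fitted per-capita emission for a given year
predict_emission <- function(year) intercept + slope * year

pred_1960 <- predict_emission(1960)  # about 3.23
pred_2014 <- predict_emission(2014)  # about 5.10

# Implied change over the 1960-2014 span covered by the data
pred_2014 - pred_1960                # about 1.87 (= slope * 54)
```

With a multiple R-squared of 0.00634, Year alone explains under 1% of the variance in per-capita emissions, so the trend line describes the average direction and says little about any individual country.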
#Draw linear regression for Emission over the years.
co2_regression <- ggplot(co2_emissions_merged, aes(x=Year, y= Emission_per_Capita)) +
  geom_point(aes( color = Income_Group), na.rm = TRUE) +
  geom_smooth(method = "lm", se = TRUE, color = "red")

co2_regression
## Warning: Removed 12 rows containing non-finite values (stat_smooth).

The first Industrial Revolution, which began in the mid-1700s, made the United Kingdom the pioneering carbon dioxide emitter. Colonization and slavery produced raw materials such as cotton, which gave rise to the cotton industry. Within Britain, resources such as coal and iron propelled the emergence of factories. Britain, North America, Europe, and Japan were the major players in the second Industrial Revolution, which began in the mid-1800s and stayed in full gear into the 20th century, driven by World Wars I and II.

co2_cumulative_emissions <- na.omit(co2_cumulative_emissions)

co2_cumulative_emissions %>%
  # Keep only the five countries compared in the animation
  filter(Country %in% c("China", "Germany", "Japan",
                        "United Kingdom", "United States")) %>%
  ggplot(aes(x = Year, y = Cumulative_CO2_emissions_tonnes,
             color = Country)) +
  geom_line(size = .75, na.rm = TRUE) +
  transition_reveal(Year, keep_last = TRUE) +
  xlab("Year") +
  ylab("Cumulative CO2 (in tonnes)") +
  xlim(1800, 2017) +
  ggtitle("Cumulative CO2 Emissions")
## geom_path: Each group consists of only one observation. Do you need to
## adjust the group aesthetic?

In lieu of a ggplot, I used Tableau to display the change in CO2 emissions per capita from 1960 to 2014.

Per capita CO2 emissions

The key drawback of measuring the total national emissions is that it takes no account of the nation’s population size. China is currently the world’s largest emitter, but since it also has the largest population, all being equal we would expect this to be the case. To make a fair comparison of contributions, we have to therefore compare emissions in terms of CO2 emitted per person. Source: Our World in Data
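The per-person comparison described above is a simple division of total emissions by population. A minimal sketch with two made-up countries (all names and numbers here are hypothetical, chosen only to show how the ranking can flip):

```r
# Hypothetical totals and populations, for illustration only
example <- data.frame(
  Country     = c("Country A", "Country B"),
  Total_CO2_t = c(10e9, 5e9),      # total emissions in tonnes
  Population  = c(1.4e9, 0.3e9)
)

# Per-capita emissions = total emissions / population
example$Emission_per_Capita <- example$Total_CO2_t / example$Population

# Country A emits more in total, but Country B emits more per person
example[order(-example$Emission_per_Capita), ]
```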

A packed bubble visualization was added to show the correlation between income group and the rate of CO2 emission. The visualization symbolizes a black hole: the low-income group is at high risk from the effects of CO2 emissions contributed by the high-income group.
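The relationship between income group and per-capita emissions can also be summarised numerically. The sketch below runs on a small made-up data frame that mirrors the column names of `co2_emissions_merged` (the values and the `toy_emissions` name are hypothetical). It drops rows with a blank `Income_Group`, a level that the `summary()` output shows for 2400 rows, before averaging.

```r
library(dplyr)

# Hypothetical rows using the same column names as co2_emissions_merged
toy_emissions <- data.frame(
  Income_Group = c("High income", "High income", "Low income",
                   "Low income", "", "Upper middle income"),
  Emission_per_Capita = c(10.0, 12.0, 0.2, 0.4, 4.0, 3.0),
  stringsAsFactors = FALSE
)

income_summary <- toy_emissions %>%
  filter(Income_Group != "") %>%   # drop rows with no income group
  group_by(Income_Group) %>%
  summarise(mean_emission = mean(Emission_per_Capita, na.rm = TRUE)) %>%
  arrange(desc(mean_emission))     # highest average emitter first

income_summary
```

Running the same pipeline on `co2_emissions_merged` would give one average per income group, which is one way to check the pattern the bubble chart suggests.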


#Bibliography
- Hannah Ritchie and Max Roser, "CO₂ and Greenhouse Gas Emissions", Our World in Data. https://ourworldindata.org/co2-and-other-greenhouse-gas-emissions