Required packages

library(readr)
library(dplyr)

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union
library(tidyr)
library(stringr)
library(ggplot2)
library(knitr)

Executive Summary

To study the correlation between global population growth and CO2 emissions, I took three datasets from the World Bank website. First, I examined all the variables and found a common key for joining the datasets. Then I looked at the data structure and attributes in detail and converted some variables to the correct type (e.g. character to numeric). Next, I transformed the data from wide format to long format. This step took two attempts: I discovered that two of the datasets hold a different number of records per country, so on the second attempt I had to split the data, transform each part and rejoin them. Because I am interested in total emissions, I summed the different types of emissions into a new variable called “Total Emissions”. I also scanned the data for missing values, special values and outliers, setting missing values to zero and deleting outliers after closer inspection. Lastly, because Pop Growth and Total Emissions differ massively in scale, I normalised Total Emissions by rescaling it so that it is easier to interpret.

Data

1. Description

Weather reports on record-breaking temperatures and devastating natural disasters constantly remind us that global warming is real and climate change is unequivocal. I therefore decided to look at population growth and CO2 emissions by country over roughly the last half century, using data from the World Bank. The population growth dataset also came with a sub-dataset that categorises countries into income groups. Links: pop growth and emissions

Pop Growth Variables

Country Name: country name (qualitative variable)

Country Code: 3-letter code (qualitative variable)

Indicator Name: population growth (annual %) (constant qualitative variable)

Indicator Code: code for population growth (constant qualitative variable)

1960 to 2018 (59 columns): annual growth rate by country for respective year (quantitative variable)

Metadata Country Variables

Country Code: 3-letter code (qualitative variable)

Region: geographic region (qualitative variable)

IncomeGroup: High income; Upper middle income; Lower middle income; Low income (qualitative variable)

SpecialNotes: sources of population estimates for each country (qualitative variable)

TableName: country name (qualitative variable)

Emissions Variables

Country Name: country name (qualitative variable)

Country Code: 3-letter code (qualitative variable)

Series Name: different types of emissions - all in kt of CO2 equivalent (qualitative variable)

Series Code: code for different types of emissions (qualitative variable)

1960 [YR1960] to 2018 [YR2018] (59 columns): annual emissions in thousand metric tonnes (quantitative variable)

2. Read/Import Data

pop <- read_csv("pop growth.csv",skip = 4)
Missing column names filled in: 'X64' [64]
Parsed with column specification:
cols(
  .default = col_double(),
  `Country Name` = col_character(),
  `Country Code` = col_character(),
  `Indicator Name` = col_character(),
  `Indicator Code` = col_character(),
  `2018` = col_character(),
  X64 = col_character()
)
See spec(...) for full column specifications.
head(pop)
country <- read_csv("Metadata_Country.csv")
Missing column names filled in: 'X6' [6]
Parsed with column specification:
cols(
  `Country Code` = col_character(),
  Region = col_character(),
  IncomeGroup = col_character(),
  SpecialNotes = col_character(),
  TableName = col_character(),
  X6 = col_character()
)
head(country)
emissions <- read_csv("emissions.csv")
Parsed with column specification:
cols(
  .default = col_character()
)
See spec(...) for full column specifications.
head(emissions)
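Every column of the emissions file was parsed as character, which is consistent with the ".." placeholder that appears in the year columns later on. As a minimal sketch, assuming ".." is the only non-numeric marker in those columns, the `na` argument of `read_csv()` can declare it as missing at import time. The miniature file below is a hypothetical stand-in for emissions.csv, reusing two Aruba values shown further down.

```r
library(readr)

# Hypothetical miniature of the emissions file (the real file has one column per year).
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Country Code,Series Code,1970 [YR1970],1971 [YR1971]",
  "ABW,EN.ATM.METH.KT.CE,10.2469,10.4531",
  "ABW,EN.ATM.HFCG.KT.CE,..,.."
), tmp)

# Treat ".." as a missing value so the year columns can be guessed as numeric.
em_small <- read_csv(tmp, na = c("", "NA", ".."))

# Inspect the guessed column types.
str(em_small)
```

If the real file contains other markers, the year columns would still come back as character, so checking the types after import remains worthwhile.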

3. Merge Data

report <- left_join(pop, country, by = "Country Code")
head(report)
report_new <- left_join(report, emissions, by = "Country Code")
head(report_new)
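Because the emissions data holds several series per country, joining it on "Country Code" alone repeats each population row once per series. A small sketch of how this could be checked before and after a join is given here; the two toy tables and the AFG growth figure are made up for illustration, and only the column names follow the datasets.

```r
library(dplyr)

# Toy stand-ins for the population and emissions tables.
pop_small <- tibble(`Country Code` = c("ABW", "AFG"), `1960` = c(3.15, 1.8))
em_small <- tibble(
  `Country Code` = c("ABW", "ABW", "AFG", "AFG"),
  `Series Code`  = c("EN.ATM.METH.KT.CE", "EN.ATM.NOXE.KT.CE",
                     "EN.ATM.METH.KT.CE", "EN.ATM.NOXE.KT.CE")
)

# Rows per key on the right-hand side: any n above 1 means left rows get duplicated.
key_counts <- count(em_small, `Country Code`)

joined <- left_join(pop_small, em_small, by = "Country Code")

# Left rows that found no partner on the right (empty here).
unmatched <- anti_join(pop_small, em_small, by = "Country Code")

nrow(pop_small)  # rows before the join
nrow(joined)     # rows after the join
```

Comparing row counts before and after the join makes the duplication visible early, before it affects any later reshaping.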

Understand

dim(report_new)
[1] 1848  131
names(report_new)
  [1] "Country Name.x" "Country Code"   "Indicator Name" "Indicator Code" "1960"          
  [6] "1961"           "1962"           "1963"           "1964"           "1965"          
 [11] "1966"           "1967"           "1968"           "1969"           "1970"          
 [16] "1971"           "1972"           "1973"           "1974"           "1975"          
 [21] "1976"           "1977"           "1978"           "1979"           "1980"          
 [26] "1981"           "1982"           "1983"           "1984"           "1985"          
 [31] "1986"           "1987"           "1988"           "1989"           "1990"          
 [36] "1991"           "1992"           "1993"           "1994"           "1995"          
 [41] "1996"           "1997"           "1998"           "1999"           "2000"          
 [46] "2001"           "2002"           "2003"           "2004"           "2005"          
 [51] "2006"           "2007"           "2008"           "2009"           "2010"          
 [56] "2011"           "2012"           "2013"           "2014"           "2015"          
 [61] "2016"           "2017"           "2018"           "X64"            "Region"        
 [66] "IncomeGroup"    "SpecialNotes"   "TableName"      "X6"             "Country Name.y"
 [71] "Series Name"    "Series Code"    "1960 [YR1960]"  "1961 [YR1961]"  "1962 [YR1962]" 
 [76] "1963 [YR1963]"  "1964 [YR1964]"  "1965 [YR1965]"  "1966 [YR1966]"  "1967 [YR1967]" 
 [81] "1968 [YR1968]"  "1969 [YR1969]"  "1970 [YR1970]"  "1971 [YR1971]"  "1972 [YR1972]" 
 [86] "1973 [YR1973]"  "1974 [YR1974]"  "1975 [YR1975]"  "1976 [YR1976]"  "1977 [YR1977]" 
 [91] "1978 [YR1978]"  "1979 [YR1979]"  "1980 [YR1980]"  "1981 [YR1981]"  "1982 [YR1982]" 
 [96] "1983 [YR1983]"  "1984 [YR1984]"  "1985 [YR1985]"  "1986 [YR1986]"  "1987 [YR1987]" 
[101] "1988 [YR1988]"  "1989 [YR1989]"  "1990 [YR1990]"  "1991 [YR1991]"  "1992 [YR1992]" 
[106] "1993 [YR1993]"  "1994 [YR1994]"  "1995 [YR1995]"  "1996 [YR1996]"  "1997 [YR1997]" 
[111] "1998 [YR1998]"  "1999 [YR1999]"  "2000 [YR2000]"  "2001 [YR2001]"  "2002 [YR2002]" 
[116] "2003 [YR2003]"  "2004 [YR2004]"  "2005 [YR2005]"  "2006 [YR2006]"  "2007 [YR2007]" 
[121] "2008 [YR2008]"  "2009 [YR2009]"  "2010 [YR2010]"  "2011 [YR2011]"  "2012 [YR2012]" 
[126] "2013 [YR2013]"  "2014 [YR2014]"  "2015 [YR2015]"  "2016 [YR2016]"  "2017 [YR2017]" 
[131] "2018 [YR2018]" 
str(report_new)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   1848 obs. of  131 variables:
 $ Country Name.x: chr  "Aruba" "Aruba" "Aruba" "Aruba" ...
 $ Country Code  : chr  "ABW" "ABW" "ABW" "ABW" ...
 $ Indicator Name: chr  "Population growth (annual %)" "Population growth (annual %)" "Population growth (annual %)" "Population growth (annual %)" ...
 $ Indicator Code: chr  "SP.POP.GROW" "SP.POP.GROW" "SP.POP.GROW" "SP.POP.GROW" ...
 $ 1960          : num  3.15 3.15 3.15 3.15 3.15 ...
 $ 1961          : num  2.24 2.24 2.24 2.24 2.24 ...
 $ 1962          : num  1.41 1.41 1.41 1.41 1.41 ...
 $ 1963          : num  0.832 0.832 0.832 0.832 0.832 ...
 $ 1964          : num  0.593 0.593 0.593 0.593 0.593 ...
 $ 1965          : num  0.573 0.573 0.573 0.573 0.573 ...
 $ 1966          : num  0.617 0.617 0.617 0.617 0.617 ...
 $ 1967          : num  0.587 0.587 0.587 0.587 0.587 ...
 $ 1968          : num  0.569 0.569 0.569 0.569 0.569 ...
 $ 1969          : num  0.581 0.581 0.581 0.581 0.581 ...
 $ 1970          : num  0.572 0.572 0.572 0.572 0.572 ...
 $ 1971          : num  0.636 0.636 0.636 0.636 0.636 ...
 $ 1972          : num  0.671 0.671 0.671 0.671 0.671 ...
 $ 1973          : num  0.671 0.671 0.671 0.671 0.671 ...
 $ 1974          : num  0.472 0.472 0.472 0.472 0.472 ...
 $ 1975          : num  0.213 0.213 0.213 0.213 0.213 ...
 $ 1976          : num  -0.117 -0.117 -0.117 -0.117 -0.117 ...
 $ 1977          : num  -0.364 -0.364 -0.364 -0.364 -0.364 ...
 $ 1978          : num  -0.437 -0.437 -0.437 -0.437 -0.437 ...
 $ 1979          : num  -0.205 -0.205 -0.205 -0.205 -0.205 ...
 $ 1980          : num  0.193 0.193 0.193 0.193 0.193 ...
 $ 1981          : num  0.781 0.781 0.781 0.781 0.781 ...
 $ 1982          : num  1.28 1.28 1.28 1.28 1.28 ...
 $ 1983          : num  1.39 1.39 1.39 1.39 1.39 ...
 $ 1984          : num  1.02 1.02 1.02 1.02 1.02 ...
 $ 1985          : num  0.302 0.302 0.302 0.302 0.302 ...
 $ 1986          : num  -0.608 -0.608 -0.608 -0.608 -0.608 ...
 $ 1987          : num  -1.3 -1.3 -1.3 -1.3 -1.3 ...
 $ 1988          : num  -1.23 -1.23 -1.23 -1.23 -1.23 ...
 $ 1989          : num  -0.077 -0.077 -0.077 -0.077 -0.077 ...
 $ 1990          : num  1.81 1.81 1.81 1.81 1.81 ...
 $ 1991          : num  3.9 3.9 3.9 3.9 3.9 ...
 $ 1992          : num  5.44 5.44 5.44 5.44 5.44 ...
 $ 1993          : num  6.07 6.07 6.07 6.07 6.07 ...
 $ 1994          : num  5.63 5.63 5.63 5.63 5.63 ...
 $ 1995          : num  4.62 4.62 4.62 4.62 4.62 ...
 $ 1996          : num  3.52 3.52 3.52 3.52 3.52 ...
 $ 1997          : num  2.67 2.67 2.67 2.67 2.67 ...
 $ 1998          : num  2.11 2.11 2.11 2.11 2.11 ...
 $ 1999          : num  1.96 1.96 1.96 1.96 1.96 ...
 $ 2000          : num  2.06 2.06 2.06 2.06 2.06 ...
 $ 2001          : num  2.23 2.23 2.23 2.23 2.23 ...
 $ 2002          : num  2.23 2.23 2.23 2.23 2.23 ...
 $ 2003          : num  2.11 2.11 2.11 2.11 2.11 ...
 $ 2004          : num  1.76 1.76 1.76 1.76 1.76 ...
 $ 2005          : num  1.3 1.3 1.3 1.3 1.3 ...
 $ 2006          : num  0.798 0.798 0.798 0.798 0.798 ...
 $ 2007          : num  0.384 0.384 0.384 0.384 0.384 ...
 $ 2008          : num  0.131 0.131 0.131 0.131 0.131 ...
 $ 2009          : num  0.0986 0.0986 0.0986 0.0986 0.0986 ...
 $ 2010          : num  0.213 0.213 0.213 0.213 0.213 ...
 $ 2011          : num  0.377 0.377 0.377 0.377 0.377 ...
 $ 2012          : num  0.512 0.512 0.512 0.512 0.512 ...
 $ 2013          : num  0.593 0.593 0.593 0.593 0.593 ...
 $ 2014          : num  0.587 0.587 0.587 0.587 0.587 ...
 $ 2015          : num  0.525 0.525 0.525 0.525 0.525 ...
 $ 2016          : num  0.46 0.46 0.46 0.46 0.46 ...
 $ 2017          : num  0.421 0.421 0.421 0.421 0.421 ...
 $ 2018          : chr  NA NA NA NA ...
 $ X64           : chr  NA NA NA NA ...
 $ Region        : chr  "Latin America & Caribbean" "Latin America & Caribbean" "Latin America & Caribbean" "Latin America & Caribbean" ...
 $ IncomeGroup   : chr  "High income" "High income" "High income" "High income" ...
 $ SpecialNotes  : chr  "Central Bureau of Statistics and Central Bank of Aruba ; Source of population estimates: UN Population Division"| __truncated__ "Central Bureau of Statistics and Central Bank of Aruba ; Source of population estimates: UN Population Division"| __truncated__ "Central Bureau of Statistics and Central Bank of Aruba ; Source of population estimates: UN Population Division"| __truncated__ "Central Bureau of Statistics and Central Bank of Aruba ; Source of population estimates: UN Population Division"| __truncated__ ...
 $ TableName     : chr  "Aruba" "Aruba" "Aruba" "Aruba" ...
 $ X6            : chr  NA NA NA NA ...
 $ Country Name.y: chr  "Aruba" "Aruba" "Aruba" "Aruba" ...
 $ Series Name   : chr  "HFC gas emissions (thousand metric tons of CO2 equivalent)" "Methane emissions (kt of CO2 equivalent)" "Nitrous oxide emissions (thousand metric tons of CO2 equivalent)" "Other greenhouse gas emissions, HFC, PFC and SF6 (thousand metric tons of CO2 equivalent)" ...
 $ Series Code   : chr  "EN.ATM.HFCG.KT.CE" "EN.ATM.METH.KT.CE" "EN.ATM.NOXE.KT.CE" "EN.ATM.GHGO.KT.CE" ...
 $ 1960 [YR1960] : chr  ".." ".." ".." ".." ...
 $ 1961 [YR1961] : chr  ".." ".." ".." ".." ...
 $ 1962 [YR1962] : chr  ".." ".." ".." ".." ...
 $ 1963 [YR1963] : chr  ".." ".." ".." ".." ...
 $ 1964 [YR1964] : chr  ".." ".." ".." ".." ...
 $ 1965 [YR1965] : chr  ".." ".." ".." ".." ...
 $ 1966 [YR1966] : chr  ".." ".." ".." ".." ...
 $ 1967 [YR1967] : chr  ".." ".." ".." ".." ...
 $ 1968 [YR1968] : chr  ".." ".." ".." ".." ...
 $ 1969 [YR1969] : chr  ".." ".." ".." ".." ...
 $ 1970 [YR1970] : chr  ".." "10.2469" "1.8261976" "4.44089209850063E-16" ...
 $ 1971 [YR1971] : chr  ".." "10.4531" "1.8269478" "-6.66133814775094E-16" ...
 $ 1972 [YR1972] : chr  ".." "10.657" "1.8448131" "-3.99680288865056E-15" ...
 $ 1973 [YR1973] : chr  ".." "10.8551" "1.8220901" "4.44089209850063E-16" ...
 $ 1974 [YR1974] : chr  ".." "11.0415" "1.821157" "3.10862446895044E-15" ...
 $ 1975 [YR1975] : chr  ".." "11.2194" "1.8296076" "-6.66133814775094E-16" ...
 $ 1976 [YR1976] : chr  ".." "11.5069" "2.6258457" "-1.33226762955019E-14" ...
 $ 1977 [YR1977] : chr  ".." "11.6718" "2.7775194" "7.99360577730113E-15" ...
 $ 1978 [YR1978] : chr  ".." "12.2338" "5.303201" "3.19744231092045E-14" ...
 $ 1979 [YR1979] : chr  ".." "12.4857" "6.067785" "2.39808173319034E-14" ...
 $ 1980 [YR1980] : chr  ".." "12.6755" "6.170271" "1.4210854715202E-14" ...
 $ 1981 [YR1981] : chr  ".." "12.937" "7.047013" "-5.15143483426073E-14" ...
 $ 1982 [YR1982] : chr  ".." "13.182" "7.624574" "6.83897383169096E-14" ...
 $ 1983 [YR1983] : chr  ".." "13.3622" "7.290487" "-4.17443857259059E-14" ...
 $ 1984 [YR1984] : chr  ".." "13.6488" "7.828492" "-7.90478793533111E-14" ...
 $ 1985 [YR1985] : chr  ".." "13.9676" "8.650364" "3.01980662698043E-14" ...
 $ 1986 [YR1986] : chr  ".." "13.8472" "6.47466" "1.4210854715202E-14" ...
  [list output truncated]
report_new$`2018` <- as.numeric(report_new$`2018`)
report_new$`1960 [YR1960]` <- as.numeric(report_new$`1960 [YR1960]`)
NAs introduced by coercion
report_new$`1961 [YR1961]` <- as.numeric(report_new$`1961 [YR1961]`)
NAs introduced by coercion
report_new$`1962 [YR1962]` <- as.numeric(report_new$`1962 [YR1962]`)
NAs introduced by coercion
report_new$`1963 [YR1963]` <- as.numeric(report_new$`1963 [YR1963]`)
NAs introduced by coercion
report_new$`1964 [YR1964]` <- as.numeric(report_new$`1964 [YR1964]`)
NAs introduced by coercion
report_new$`1965 [YR1965]` <- as.numeric(report_new$`1965 [YR1965]`)
NAs introduced by coercion
report_new$`1966 [YR1966]` <- as.numeric(report_new$`1966 [YR1966]`)
NAs introduced by coercion
report_new$`1967 [YR1967]` <- as.numeric(report_new$`1967 [YR1967]`)
NAs introduced by coercion
report_new$`1968 [YR1968]` <- as.numeric(report_new$`1968 [YR1968]`)
NAs introduced by coercion
report_new$`1969 [YR1969]` <- as.numeric(report_new$`1969 [YR1969]`)
NAs introduced by coercion
report_new$`1970 [YR1970]` <- as.numeric(report_new$`1970 [YR1970]`)
NAs introduced by coercion
report_new$`1971 [YR1971]` <- as.numeric(report_new$`1971 [YR1971]`)
NAs introduced by coercion
report_new$`1972 [YR1972]` <- as.numeric(report_new$`1972 [YR1972]`)
NAs introduced by coercion
report_new$`1973 [YR1973]` <- as.numeric(report_new$`1973 [YR1973]`)
NAs introduced by coercion
report_new$`1974 [YR1974]` <- as.numeric(report_new$`1974 [YR1974]`)
NAs introduced by coercion
report_new$`1975 [YR1975]` <- as.numeric(report_new$`1975 [YR1975]`)
NAs introduced by coercion
report_new$`1976 [YR1976]` <- as.numeric(report_new$`1976 [YR1976]`)
NAs introduced by coercion
report_new$`1977 [YR1977]` <- as.numeric(report_new$`1977 [YR1977]`)
NAs introduced by coercion
report_new$`1978 [YR1978]` <- as.numeric(report_new$`1978 [YR1978]`)
NAs introduced by coercion
report_new$`1979 [YR1979]` <- as.numeric(report_new$`1979 [YR1979]`)
NAs introduced by coercion
report_new$`1980 [YR1980]` <- as.numeric(report_new$`1980 [YR1980]`)
NAs introduced by coercion
report_new$`1981 [YR1981]` <- as.numeric(report_new$`1981 [YR1981]`)
NAs introduced by coercion
report_new$`1982 [YR1982]` <- as.numeric(report_new$`1982 [YR1982]`)
NAs introduced by coercion
report_new$`1983 [YR1983]` <- as.numeric(report_new$`1983 [YR1983]`)
NAs introduced by coercion
report_new$`1984 [YR1984]` <- as.numeric(report_new$`1984 [YR1984]`)
NAs introduced by coercion
report_new$`1985 [YR1985]` <- as.numeric(report_new$`1985 [YR1985]`)
NAs introduced by coercion
report_new$`1986 [YR1986]` <- as.numeric(report_new$`1986 [YR1986]`)
NAs introduced by coercion
report_new$`1987 [YR1987]` <- as.numeric(report_new$`1987 [YR1987]`)
NAs introduced by coercion
report_new$`1988 [YR1988]` <- as.numeric(report_new$`1988 [YR1988]`)
NAs introduced by coercion
report_new$`1989 [YR1989]` <- as.numeric(report_new$`1989 [YR1989]`)
NAs introduced by coercion
report_new$`1990 [YR1990]` <- as.numeric(report_new$`1990 [YR1990]`)
NAs introduced by coercion
report_new$`1991 [YR1991]` <- as.numeric(report_new$`1991 [YR1991]`)
NAs introduced by coercion
report_new$`1992 [YR1992]` <- as.numeric(report_new$`1992 [YR1992]`)
NAs introduced by coercion
report_new$`1993 [YR1993]` <- as.numeric(report_new$`1993 [YR1993]`)
NAs introduced by coercion
report_new$`1994 [YR1994]` <- as.numeric(report_new$`1994 [YR1994]`)
NAs introduced by coercion
report_new$`1995 [YR1995]` <- as.numeric(report_new$`1995 [YR1995]`)
NAs introduced by coercion
report_new$`1996 [YR1996]` <- as.numeric(report_new$`1996 [YR1996]`)
NAs introduced by coercion
report_new$`1997 [YR1997]` <- as.numeric(report_new$`1997 [YR1997]`)
NAs introduced by coercion
report_new$`1998 [YR1998]` <- as.numeric(report_new$`1998 [YR1998]`)
NAs introduced by coercion
report_new$`1999 [YR1999]` <- as.numeric(report_new$`1999 [YR1999]`)
NAs introduced by coercion
report_new$`2000 [YR2000]` <- as.numeric(report_new$`2000 [YR2000]`)
NAs introduced by coercion
report_new$`2001 [YR2001]` <- as.numeric(report_new$`2001 [YR2001]`)
NAs introduced by coercion
report_new$`2002 [YR2002]` <- as.numeric(report_new$`2002 [YR2002]`)
NAs introduced by coercion
report_new$`2003 [YR2003]` <- as.numeric(report_new$`2003 [YR2003]`)
NAs introduced by coercion
report_new$`2004 [YR2004]` <- as.numeric(report_new$`2004 [YR2004]`)
NAs introduced by coercion
report_new$`2005 [YR2005]` <- as.numeric(report_new$`2005 [YR2005]`)
NAs introduced by coercion
report_new$`2006 [YR2006]` <- as.numeric(report_new$`2006 [YR2006]`)
NAs introduced by coercion
report_new$`2007 [YR2007]` <- as.numeric(report_new$`2007 [YR2007]`)
NAs introduced by coercion
report_new$`2008 [YR2008]` <- as.numeric(report_new$`2008 [YR2008]`)
NAs introduced by coercion
report_new$`2009 [YR2009]` <- as.numeric(report_new$`2009 [YR2009]`)
NAs introduced by coercion
report_new$`2010 [YR2010]` <- as.numeric(report_new$`2010 [YR2010]`)
NAs introduced by coercion
report_new$`2011 [YR2011]` <- as.numeric(report_new$`2011 [YR2011]`)
NAs introduced by coercion
report_new$`2012 [YR2012]` <- as.numeric(report_new$`2012 [YR2012]`)
NAs introduced by coercion
report_new$`2013 [YR2013]` <- as.numeric(report_new$`2013 [YR2013]`)
NAs introduced by coercion
report_new$`2014 [YR2014]` <- as.numeric(report_new$`2014 [YR2014]`)
NAs introduced by coercion
report_new$`2015 [YR2015]` <- as.numeric(report_new$`2015 [YR2015]`)
NAs introduced by coercion
report_new$`2016 [YR2016]` <- as.numeric(report_new$`2016 [YR2016]`)
NAs introduced by coercion
report_new$`2017 [YR2017]` <- as.numeric(report_new$`2017 [YR2017]`)
NAs introduced by coercion
report_new$`2018 [YR2018]` <- as.numeric(report_new$`2018 [YR2018]`)
NAs introduced by coercion
report_new$Region <- factor(report_new$Region,
                            levels = c("East Asia & Pacific", "Europe & Central Asia", "Latin America & Caribbean", "Middle East & North Africa", "North America", "South Asia", "Sub-Saharan Africa"))
levels(report_new$Region)
[1] "East Asia & Pacific"        "Europe & Central Asia"      "Latin America & Caribbean" 
[4] "Middle East & North Africa" "North America"              "South Asia"                
[7] "Sub-Saharan Africa"        
report_new$IncomeGroup <- factor(report_new$IncomeGroup,
                                 levels = c("High income", "Upper middle income", "Lower middle income", "Low income"),
                                 ordered = TRUE)
levels(report_new$IncomeGroup)
[1] "High income"         "Upper middle income" "Lower middle income" "Low income"         
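One consequence of the level order chosen for IncomeGroup is that "High income" becomes the smallest level and "Low income" the largest. The following sketch, using a made-up three-element vector, illustrates how comparisons and sorting behave under that ordering.

```r
# Same levels and order as in the report; the input vector is illustrative only.
income <- factor(
  c("Low income", "High income", "Upper middle income"),
  levels = c("High income", "Upper middle income", "Lower middle income", "Low income"),
  ordered = TRUE
)

# Comparisons follow the position in `levels`, not the economic meaning.
below_low <- income < "Low income"

# The minimum is the first level.
smallest <- min(income)

# Sorting puts "High income" first.
sort(income)
```

If ascending income is preferred for plots or summaries, reversing the `levels` vector would be one option.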
str(report_new)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   1848 obs. of  131 variables:
 $ Country Name.x: chr  "Aruba" "Aruba" "Aruba" "Aruba" ...
 $ Country Code  : chr  "ABW" "ABW" "ABW" "ABW" ...
 $ Indicator Name: chr  "Population growth (annual %)" "Population growth (annual %)" "Population growth (annual %)" "Population growth (annual %)" ...
 $ Indicator Code: chr  "SP.POP.GROW" "SP.POP.GROW" "SP.POP.GROW" "SP.POP.GROW" ...
 $ 1960          : num  3.15 3.15 3.15 3.15 3.15 ...
 $ 1961          : num  2.24 2.24 2.24 2.24 2.24 ...
 $ 1962          : num  1.41 1.41 1.41 1.41 1.41 ...
 $ 1963          : num  0.832 0.832 0.832 0.832 0.832 ...
 $ 1964          : num  0.593 0.593 0.593 0.593 0.593 ...
 $ 1965          : num  0.573 0.573 0.573 0.573 0.573 ...
 $ 1966          : num  0.617 0.617 0.617 0.617 0.617 ...
 $ 1967          : num  0.587 0.587 0.587 0.587 0.587 ...
 $ 1968          : num  0.569 0.569 0.569 0.569 0.569 ...
 $ 1969          : num  0.581 0.581 0.581 0.581 0.581 ...
 $ 1970          : num  0.572 0.572 0.572 0.572 0.572 ...
 $ 1971          : num  0.636 0.636 0.636 0.636 0.636 ...
 $ 1972          : num  0.671 0.671 0.671 0.671 0.671 ...
 $ 1973          : num  0.671 0.671 0.671 0.671 0.671 ...
 $ 1974          : num  0.472 0.472 0.472 0.472 0.472 ...
 $ 1975          : num  0.213 0.213 0.213 0.213 0.213 ...
 $ 1976          : num  -0.117 -0.117 -0.117 -0.117 -0.117 ...
 $ 1977          : num  -0.364 -0.364 -0.364 -0.364 -0.364 ...
 $ 1978          : num  -0.437 -0.437 -0.437 -0.437 -0.437 ...
 $ 1979          : num  -0.205 -0.205 -0.205 -0.205 -0.205 ...
 $ 1980          : num  0.193 0.193 0.193 0.193 0.193 ...
 $ 1981          : num  0.781 0.781 0.781 0.781 0.781 ...
 $ 1982          : num  1.28 1.28 1.28 1.28 1.28 ...
 $ 1983          : num  1.39 1.39 1.39 1.39 1.39 ...
 $ 1984          : num  1.02 1.02 1.02 1.02 1.02 ...
 $ 1985          : num  0.302 0.302 0.302 0.302 0.302 ...
 $ 1986          : num  -0.608 -0.608 -0.608 -0.608 -0.608 ...
 $ 1987          : num  -1.3 -1.3 -1.3 -1.3 -1.3 ...
 $ 1988          : num  -1.23 -1.23 -1.23 -1.23 -1.23 ...
 $ 1989          : num  -0.077 -0.077 -0.077 -0.077 -0.077 ...
 $ 1990          : num  1.81 1.81 1.81 1.81 1.81 ...
 $ 1991          : num  3.9 3.9 3.9 3.9 3.9 ...
 $ 1992          : num  5.44 5.44 5.44 5.44 5.44 ...
 $ 1993          : num  6.07 6.07 6.07 6.07 6.07 ...
 $ 1994          : num  5.63 5.63 5.63 5.63 5.63 ...
 $ 1995          : num  4.62 4.62 4.62 4.62 4.62 ...
 $ 1996          : num  3.52 3.52 3.52 3.52 3.52 ...
 $ 1997          : num  2.67 2.67 2.67 2.67 2.67 ...
 $ 1998          : num  2.11 2.11 2.11 2.11 2.11 ...
 $ 1999          : num  1.96 1.96 1.96 1.96 1.96 ...
 $ 2000          : num  2.06 2.06 2.06 2.06 2.06 ...
 $ 2001          : num  2.23 2.23 2.23 2.23 2.23 ...
 $ 2002          : num  2.23 2.23 2.23 2.23 2.23 ...
 $ 2003          : num  2.11 2.11 2.11 2.11 2.11 ...
 $ 2004          : num  1.76 1.76 1.76 1.76 1.76 ...
 $ 2005          : num  1.3 1.3 1.3 1.3 1.3 ...
 $ 2006          : num  0.798 0.798 0.798 0.798 0.798 ...
 $ 2007          : num  0.384 0.384 0.384 0.384 0.384 ...
 $ 2008          : num  0.131 0.131 0.131 0.131 0.131 ...
 $ 2009          : num  0.0986 0.0986 0.0986 0.0986 0.0986 ...
 $ 2010          : num  0.213 0.213 0.213 0.213 0.213 ...
 $ 2011          : num  0.377 0.377 0.377 0.377 0.377 ...
 $ 2012          : num  0.512 0.512 0.512 0.512 0.512 ...
 $ 2013          : num  0.593 0.593 0.593 0.593 0.593 ...
 $ 2014          : num  0.587 0.587 0.587 0.587 0.587 ...
 $ 2015          : num  0.525 0.525 0.525 0.525 0.525 ...
 $ 2016          : num  0.46 0.46 0.46 0.46 0.46 ...
 $ 2017          : num  0.421 0.421 0.421 0.421 0.421 ...
 $ 2018          : num  NA NA NA NA NA NA NA NA NA NA ...
 $ X64           : chr  NA NA NA NA ...
 $ Region        : Factor w/ 7 levels "East Asia & Pacific",..: 3 3 3 3 3 3 3 6 6 6 ...
 $ IncomeGroup   : Ord.factor w/ 4 levels "High income"<..: 1 1 1 1 1 1 1 4 4 4 ...
 $ SpecialNotes  : chr  "Central Bureau of Statistics and Central Bank of Aruba ; Source of population estimates: UN Population Division"| __truncated__ "Central Bureau of Statistics and Central Bank of Aruba ; Source of population estimates: UN Population Division"| __truncated__ "Central Bureau of Statistics and Central Bank of Aruba ; Source of population estimates: UN Population Division"| __truncated__ "Central Bureau of Statistics and Central Bank of Aruba ; Source of population estimates: UN Population Division"| __truncated__ ...
 $ TableName     : chr  "Aruba" "Aruba" "Aruba" "Aruba" ...
 $ X6            : chr  NA NA NA NA ...
 $ Country Name.y: chr  "Aruba" "Aruba" "Aruba" "Aruba" ...
 $ Series Name   : chr  "HFC gas emissions (thousand metric tons of CO2 equivalent)" "Methane emissions (kt of CO2 equivalent)" "Nitrous oxide emissions (thousand metric tons of CO2 equivalent)" "Other greenhouse gas emissions, HFC, PFC and SF6 (thousand metric tons of CO2 equivalent)" ...
 $ Series Code   : chr  "EN.ATM.HFCG.KT.CE" "EN.ATM.METH.KT.CE" "EN.ATM.NOXE.KT.CE" "EN.ATM.GHGO.KT.CE" ...
 $ 1960 [YR1960] : num  NA NA NA NA NA NA NA NA NA NA ...
 $ 1961 [YR1961] : num  NA NA NA NA NA NA NA NA NA NA ...
 $ 1962 [YR1962] : num  NA NA NA NA NA NA NA NA NA NA ...
 $ 1963 [YR1963] : num  NA NA NA NA NA NA NA NA NA NA ...
 $ 1964 [YR1964] : num  NA NA NA NA NA NA NA NA NA NA ...
 $ 1965 [YR1965] : num  NA NA NA NA NA NA NA NA NA NA ...
 $ 1966 [YR1966] : num  NA NA NA NA NA NA NA NA NA NA ...
 $ 1967 [YR1967] : num  NA NA NA NA NA NA NA NA NA NA ...
 $ 1968 [YR1968] : num  NA NA NA NA NA NA NA NA NA NA ...
 $ 1969 [YR1969] : num  NA NA NA NA NA NA NA NA NA NA ...
 $ 1970 [YR1970] : num  NA 1.02e+01 1.83 4.44e-16 NA ...
 $ 1971 [YR1971] : num  NA 1.05e+01 1.83 -6.66e-16 NA ...
 $ 1972 [YR1972] : num  NA 1.07e+01 1.84 -4.00e-15 NA ...
 $ 1973 [YR1973] : num  NA 1.09e+01 1.82 4.44e-16 NA ...
 $ 1974 [YR1974] : num  NA 1.10e+01 1.82 3.11e-15 NA ...
 $ 1975 [YR1975] : num  NA 1.12e+01 1.83 -6.66e-16 NA ...
 $ 1976 [YR1976] : num  NA 1.15e+01 2.63 -1.33e-14 NA ...
 $ 1977 [YR1977] : num  NA 1.17e+01 2.78 7.99e-15 NA ...
 $ 1978 [YR1978] : num  NA 1.22e+01 5.30 3.20e-14 NA ...
 $ 1979 [YR1979] : num  NA 1.25e+01 6.07 2.40e-14 NA ...
 $ 1980 [YR1980] : num  NA 1.27e+01 6.17 1.42e-14 NA ...
 $ 1981 [YR1981] : num  NA 1.29e+01 7.05 -5.15e-14 NA ...
 $ 1982 [YR1982] : num  NA 1.32e+01 7.62 6.84e-14 NA ...
 $ 1983 [YR1983] : num  NA 1.34e+01 7.29 -4.17e-14 NA ...
 $ 1984 [YR1984] : num  NA 1.36e+01 7.83 -7.90e-14 NA ...
 $ 1985 [YR1985] : num  NA 1.40e+01 8.65 3.02e-14 NA ...
 $ 1986 [YR1986] : num  NA 1.38e+01 6.47 1.42e-14 NA ...
  [list output truncated]
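The per-column conversions above can also be written as one step over all year columns. This is only a sketch under two assumptions: the emissions columns can be recognised by the "[YRyyyy]" suffix in their names, and ".." is the only non-numeric marker. The small data frame is a hypothetical stand-in for report_new.

```r
# Hypothetical stand-in with two of the emissions year columns.
df <- data.frame(
  `Series Code`   = c("EN.ATM.HFCG.KT.CE", "EN.ATM.METH.KT.CE"),
  `1970 [YR1970]` = c("..", "10.2469"),
  `1971 [YR1971]` = c("..", "10.4531"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Select every column whose name ends in "[YR" + four digits + "]".
year_cols <- grep("\\[YR[0-9]{4}\\]$", names(df), value = TRUE)

df[year_cols] <- lapply(df[year_cols], function(x) {
  x[x %in% ".."] <- NA   # make the missing marker explicit, so as.numeric() does not warn
  as.numeric(x)
})

str(df)
```

Replacing ".." explicitly means any remaining coercion warning would point to an unexpected value rather than to the known placeholder.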

Tidy & Manipulate Data I

Attempt 1

report_tidy <- report_new %>%
  select(-(`Indicator Name`:`Indicator Code`), -X64, -(SpecialNotes:`Country Name.y`), -`Series Code`)
head(report_tidy)
report_tidy1 <- report_tidy %>%
  gather(`1960`:`2018`, key = Year, value = "Pop Growth") %>%
  gather(`1960 [YR1960]`:`2018 [YR2018]`, key = EYear, value = Emissions) %>%
  spread(key = `Series Name`, value = Emissions) %>%
  separate(EYear, into = c("EmYear", "YR"), sep = " ")
head(report_tidy1)
identical(report_tidy1$Year,report_tidy1$EmYear)
[1] FALSE
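The FALSE result is what two consecutive gather() calls would be expected to produce: the second gather pairs every population year with every emissions year, so most rows carry mismatched years. A minimal sketch with one country and two years illustrates this; the two growth rates are Aruba's 1960 and 1961 values from the output above, while the emission values are toy numbers.

```r
library(dplyr)
library(tidyr)

wide <- tibble(
  `Country Code` = "ABW",
  `1960` = 3.15, `1961` = 2.24,
  `1960 [YR1960]` = 1, `1961 [YR1961]` = 2   # toy emission values
)

long <- wide %>%
  gather(`1960`:`1961`, key = Year, value = "Pop Growth") %>%
  gather(`1960 [YR1960]`:`1961 [YR1961]`, key = EYear, value = Emissions) %>%
  separate(EYear, into = c("EmYear", "YR"), sep = " ")

# 2 population years x 2 emission years = 4 rows, only 2 of which have matching years.
nrow(long)
aligned <- filter(long, Year == EmYear)
nrow(aligned)
```

Filtering on Year == EmYear would repair the toy case, but on the full data the intermediate table grows with the square of the number of years, which is one reason to reshape the two parts separately.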

Attempt 2

report_new_pop <- report %>%
  select(`Country Name`, `Country Code`, `1960`:`2018`, Region, IncomeGroup) %>%
  gather(`1960`:`2018`, key = Year, value = "Pop Growth") %>%
  unite(Key, `Country Name`, Year, sep = "-")
head(report_new_pop)
report_new_em <- report_new %>%
  select(`Country Name.y`, `Series Name`, `1960 [YR1960]`:`2018 [YR2018]`) %>%
  gather(`1960 [YR1960]`:`2018 [YR2018]`, key = EYear, value = Emissions) %>%
  spread(key = `Series Name`, value = Emissions) %>%
  separate(EYear, into = c("Year", "YR"), sep = " ") %>%
  unite(Key, `Country Name.y`, Year, sep = "-")
head(report_new_em)
report_tidy2 <- report_new_pop %>% left_join(report_new_em, by = "Key")
head(report_tidy2)
report_tidy2$Region <- factor(report_tidy2$Region,
                            levels = c("East Asia & Pacific", "Europe & Central Asia", "Latin America & Caribbean", "Middle East & North Africa", "North America", "South Asia", "Sub-Saharan Africa"))
levels(report_tidy2$Region)
[1] "East Asia & Pacific"        "Europe & Central Asia"      "Latin America & Caribbean" 
[4] "Middle East & North Africa" "North America"              "South Asia"                
[7] "Sub-Saharan Africa"        
report_tidy2$IncomeGroup <- factor(report_tidy2$IncomeGroup,
                                 levels = c("High income", "Upper middle income", "Lower middle income", "Low income"),
                                 ordered = TRUE)
levels(report_tidy2$IncomeGroup)
[1] "High income"         "Upper middle income" "Lower middle income" "Low income"         
report_tidy2$`Pop Growth` <- as.numeric(report_tidy2$`Pop Growth`)
report_tidy3 <- report_tidy2 %>%
  separate(Key, into = c("Country Name", "Year"), sep = "-") %>%
  select(-YR, -`Country Code`)
Expected 2 pieces. Additional pieces discarded in 531 rows [61, 86, 141, 190, 197, 214, 216, 236, 240, 325, 350, 405, 454, 461, 478, 480, 500, 504, 589, 614, ...].
head(report_tidy3, 30)
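The "Additional pieces discarded in 531 rows" message suggests that some country names themselves contain a hyphen, so splitting the united key on "-" cuts those names short and puts part of the name into Year. The figure 531 equals 9 × 59, which would be consistent with nine such names across the 59 years, although the output does not confirm this. One way to avoid the problem, sketched here with toy values, is to skip the united key and join on the two columns directly.

```r
library(dplyr)

# Toy long-format tables; "Guinea-Bissau" stands in for any hyphenated country name.
pop_long <- tibble(
  `Country Name` = c("Guinea-Bissau", "Guinea-Bissau"),
  Year           = c("1960", "1961"),
  `Pop Growth`   = c(1.1, 1.2)        # toy values
)
em_long <- tibble(
  `Country Name`       = c("Guinea-Bissau", "Guinea-Bissau"),
  Year                 = c("1960", "1961"),
  `CO2 emissions (kt)` = c(10, 11)    # toy values
)

# Joining on both columns needs no unite()/separate() round trip.
tidy <- left_join(pop_long, em_long, by = c("Country Name", "Year"))
tidy
```

With this approach the country name is never split, so hyphenated names and the year values stay intact.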

Tidy & Manipulate Data II

report_tidy4 <- report_tidy3 %>%
  mutate(`Total Emissions` =
           `CO2 emissions (kt)` +
           `HFC gas emissions (thousand metric tons of CO2 equivalent)` +
           `Methane emissions (kt of CO2 equivalent)` +
           `Nitrous oxide emissions (thousand metric tons of CO2 equivalent)` +
           `Other greenhouse gas emissions, HFC, PFC and SF6 (thousand metric tons of CO2 equivalent)` +
           `PFC gas emissions (thousand metric tons of CO2 equivalent)` +
           `SF6 gas emissions (thousand metric tons of CO2 equivalent)`)
head(report_tidy4)
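Adding the series with `+` yields NA for any row in which at least one series is missing. A possible alternative, shown as a sketch on three of the series with toy values, is rowSums() with na.rm = TRUE, which sums whatever is available. Note also that, judging by its name, the "Other greenhouse gas emissions, HFC, PFC and SF6" series may already include the separate HFC, PFC and SF6 series, in which case summing all seven would count those gases twice; this is an assumption worth checking against the World Bank definitions.

```r
library(dplyr)

# Toy values for three of the seven series.
em <- tibble(
  `CO2 emissions (kt)`                                         = c(100, NA),
  `Methane emissions (kt of CO2 equivalent)`                   = c(10, 5),
  `SF6 gas emissions (thousand metric tons of CO2 equivalent)` = c(NA_real_, NA_real_)
)
series <- names(em)

# `+` propagates NA: both rows end up missing.
total_plus <- em[[1]] + em[[2]] + em[[3]]

# rowSums() with na.rm = TRUE ignores missing series within a row.
total_rowsums <- rowSums(em[series], na.rm = TRUE)

total_plus
total_rowsums
```

A row in which every series is missing would come out as 0 under rowSums(), so such rows may need to be set back to NA if "no data" should stay distinguishable from "no emissions".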

Scan I

colSums(is.na(report_tidy4))
                                                                             Country Name 
                                                                                        0 
                                                                                     Year 
                                                                                        0 
                                                                                   Region 
                                                                                     2773 
                                                                              IncomeGroup 
                                                                                     2773 
                                                                               Pop Growth 
                                                                                      480 
                                                                       CO2 emissions (kt) 
                                                                                     3321 
                               HFC gas emissions (thousand metric tons of CO2 equivalent) 
                                                                                    14873 
                                                 Methane emissions (kt of CO2 equivalent) 
                                                                                     4862 
                         Nitrous oxide emissions (thousand metric tons of CO2 equivalent) 
                                                                                     4819 
Other greenhouse gas emissions, HFC, PFC and SF6 (thousand metric tons of CO2 equivalent) 
                                                                                     5627 
                               PFC gas emissions (thousand metric tons of CO2 equivalent) 
                                                                                    14872 
                               SF6 gas emissions (thousand metric tons of CO2 equivalent) 
                                                                                    14867 
                                                                          Total Emissions 
                                                                                    14926 
# Replace missing values with 0 in Pop Growth and the seven individual emission columns
report_tidy4 <- report_tidy4 %>%
  mutate(across(`Pop Growth`:`SF6 gas emissions (thousand metric tons of CO2 equivalent)`, ~ replace_na(.x, 0)))
report_tidy4 <- report_tidy4 %>% filter(!is.na(Region))
# Sum the seven emission series into a single Total Emissions column
report_tidy5 <- report_tidy4 %>%
  mutate(`Total Emissions` =
           `CO2 emissions (kt)` +
           `HFC gas emissions (thousand metric tons of CO2 equivalent)` +
           `Methane emissions (kt of CO2 equivalent)` +
           `Nitrous oxide emissions (thousand metric tons of CO2 equivalent)` +
           `Other greenhouse gas emissions, HFC, PFC and SF6 (thousand metric tons of CO2 equivalent)` +
           `PFC gas emissions (thousand metric tons of CO2 equivalent)` +
           `SF6 gas emissions (thousand metric tons of CO2 equivalent)`)
colSums(is.na(report_tidy5))
                                                                             Country Name 
                                                                                        0 
                                                                                     Year 
                                                                                        0 
                                                                                   Region 
                                                                                        0 
                                                                              IncomeGroup 
                                                                                        0 
                                                                               Pop Growth 
                                                                                        0 
                                                                       CO2 emissions (kt) 
                                                                                        0 
                               HFC gas emissions (thousand metric tons of CO2 equivalent) 
                                                                                        0 
                                                 Methane emissions (kt of CO2 equivalent) 
                                                                                        0 
                         Nitrous oxide emissions (thousand metric tons of CO2 equivalent) 
                                                                                        0 
Other greenhouse gas emissions, HFC, PFC and SF6 (thousand metric tons of CO2 equivalent) 
                                                                                        0 
                               PFC gas emissions (thousand metric tons of CO2 equivalent) 
                                                                                        0 
                               SF6 gas emissions (thousand metric tons of CO2 equivalent) 
                                                                                        0 
                                                                          Total Emissions 
                                                                                        0 
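The zero-fill step above is needed because `+` returns NA as soon as one of its operands is NA. As a minimal sketch on a made-up data frame (the column names `co2` and `methane` and all values are hypothetical, not the World Bank data), `rowSums()` with `na.rm = TRUE` gives the same totals while leaving the original NAs in place:

```r
# Toy data: hypothetical names and values, for illustration only
toy <- data.frame(
  country = c("A", "B", "C"),
  co2     = c(100, NA, 50),
  methane = c(10, 20, NA)
)

# `+` propagates NA, so any row with a missing component has an NA total
toy$total_plus <- toy$co2 + toy$methane

# rowSums(na.rm = TRUE) skips the NAs, which matches summing after a zero-fill
toy$total_rowsum <- rowSums(toy[, c("co2", "methane")], na.rm = TRUE)

toy$total_plus    # 110  NA  NA
toy$total_rowsum  # 110  20  50
```

With this approach a missing series can still be told apart from a reported value of 0 later on, which may matter when deciding which rows to exclude.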
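# Flag infinite or NaN entries; non-numeric columns cannot contain either, so return FALSE for them
is.special <- function(x) {
  if (is.numeric(x)) is.infinite(x) | is.nan(x) else rep(FALSE, length(x))
}
# Check column by column: is.numeric() is FALSE for a whole data frame, so is.special(report_tidy5) would return NULL
sum(sapply(report_tidy5, function(col) sum(is.special(col))))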
[1] 0

Scan II

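# Drop the seven individual emission columns (positions 6 to 12), keeping Pop Growth and Total Emissions,
# then exclude rows where either value is 0, including the zeros introduced when missing values were filled
report_num <- report_tidy5 %>%
  select(-(6:12)) %>%
  filter(`Total Emissions` != 0 & `Pop Growth` != 0)
# Scatter plot to look for outliers
report_num %>% plot(`Pop Growth`~`Total Emissions`, data = ., ylab = "Pop Growth", xlab = "Total Emissions")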

# Inspect the 20 largest Total Emissions values
report_sub_em <- report_num %>% arrange(desc(`Total Emissions`))
head(report_sub_em, 20)
# Inspect the 5 lowest Pop Growth values
report_sub_pop <- report_num %>% arrange(`Pop Growth`)
head(report_sub_pop, 5)
# Remove outliers: keep rows with Total Emissions below 8,000,000 and Pop Growth above -7
report_tidy6 <- report_tidy5 %>% filter(`Total Emissions` < 8000000) %>% filter(`Pop Growth` > -7)
report_tidy6 %>% plot(`Pop Growth`~`Total Emissions`, data = ., ylab = "Pop Growth", xlab = "Total Emissions", main = "After Removing Outliers")
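The cut-offs above (8000000 and -7) were chosen by inspecting the sorted tables and the scatter plot. As a hedged alternative, not the method used in this report, the sketch below flags outliers with Tukey's fences (1.5 times the interquartile range beyond the quartiles). The vector `x` is made-up toy data and the names `lower`, `upper` and `is_outlier` are illustrative only.

```r
# Toy values: made up for illustration, not taken from the report data
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 100)

# First and third quartiles (default quantile type 7)
q <- unname(quantile(x, c(0.25, 0.75)))
iqr <- q[2] - q[1]

# Tukey's fences: 1.5 * IQR below Q1 and above Q3
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr

# Flag values that fall outside the fences
is_outlier <- x < lower | x > upper
x[is_outlier]  # 100
```

On strongly right-skewed data such as emission totals, this rule may flag far more points than a hand-picked threshold, so the flagged rows would still need the same closer inspection before deletion.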

Transform

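# Standardise Total Emissions to z-scores: subtract the column mean, then divide by the standard deviation
report_scale <- report_tidy6 %>% select(`Total Emissions`)
report_scaled <- scale(report_scale, center = TRUE, scale = TRUE)
# The result is a one-column matrix; show the first six rows
head(report_scaled)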
     Total Emissions
[1,]      -0.2466611
[2,]      -0.2458115
[3,]      -0.2455334
[4,]      -0.2425110
[5,]      -0.2466611
[6,]      -0.2466386
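For reference, `scale()` with `center = TRUE` and `scale = TRUE` computes a z-score: each value minus the column mean, divided by the sample standard deviation. A minimal sketch on a made-up vector (the values are illustrative, not from the report data) confirms this by hand:

```r
# Toy values: made up for illustration only
x <- c(2, 4, 6, 8)

# scale() returns a one-column matrix with attributes; as.numeric() drops them
z_scale <- as.numeric(scale(x, center = TRUE, scale = TRUE))

# The same calculation written out: (x - mean) / sd
z_manual <- (x - mean(x)) / sd(x)

all.equal(z_scale, z_manual)  # TRUE

# Standardised values have mean 0 and standard deviation 1
mean(z_manual)
sd(z_manual)
```

The negative values in the output above therefore mean those rows sit below the average Total Emissions, not that emissions are negative. If the scaled values are needed alongside the other variables, they could be stored as a new column with `mutate()`, for example under a hypothetical name such as `Total Emissions (scaled)`.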



