library(readr)
library(dplyr)
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
library(tidyr)
library(stringr)
library(ggplot2)
library(knitr)
To study the correlation between global population growth and CO2 emissions, I took three datasets from the World Bank website. First, I examined all the variables and found a common key to join the datasets. Then I looked at the data structure and attributes in detail and converted some variables to the correct type (e.g. character to numeric). Next, I transformed the data from wide format to long format. This step took two attempts: the first did not line up the years of the two datasets because, as I discovered, two of the datasets have a different number of records per country, so in the second attempt I split the data, transformed each part and rejoined them. Because I am interested in total emissions, I added up the different types of emissions to create a new variable called “Total Emissions”. I also scanned the data for missing values, special values and outliers, and dealt with them by setting missing values to zero and, after closer inspection, deleting outliers. Lastly, because of the large difference in scale between Pop Growth and Total Emissions, I standardised Total Emissions (centring on the mean and scaling by the standard deviation) to make the two easier to compare.
1. Description
Weather reports on record-breaking temperatures and devastating natural disasters constantly remind us that global warming is real and climate change is unequivocal. Hence I decided to look at global population growth and CO2 emissions by country in the last half century or so with data from the World Bank. The population growth dataset also came with a sub-dataset, which categorises the countries in different income groups. Links: pop growth and emissions
Pop Growth Variables
Country Name: country name (qualitative variable)
Country Code: 3-letter code (qualitative variable)
Indicator Name: population growth (annual %) (constant qualitative variable)
Indicator Code: code for population growth (constant qualitative variable)
1960 to 2018 (59 columns): annual growth rate by country for respective year (quantitative variable)
Metadata Country Variables
Country Code: 3-letter code (qualitative variable)
Region: geographic region (qualitative variable)
IncomeGroup: High income; Upper middle income; Lower middle income; Low income (qualitative variable)
SpecialNotes: sources of population estimates for each country (qualitative variable)
TableName: country name (qualitative variable)
Emissions Variables
Country Name: country name (qualitative variable)
Country Code: 3-letter code (qualitative variable)
Series Name: different types of emissions - all in kt of CO2 equivalent (qualitative variable)
Series Code: code for different types of emissions (qualitative variable)
1960 [YR1960] to 2018 [YR2018] (59 columns): annual emissions in thousand metric tonnes (quantitative variable)
2. Read/Import Data
pop <- read_csv("pop growth.csv",skip = 4)
Missing column names filled in: 'X64' [64]
Parsed with column specification:
cols(
.default = col_double(),
`Country Name` = col_character(),
`Country Code` = col_character(),
`Indicator Name` = col_character(),
`Indicator Code` = col_character(),
`2018` = col_character(),
X64 = col_character()
)
See spec(...) for full column specifications.
head(pop)
country <- read_csv("Metadata_Country.csv")
Missing column names filled in: 'X6' [6]
Parsed with column specification:
cols(
`Country Code` = col_character(),
Region = col_character(),
IncomeGroup = col_character(),
SpecialNotes = col_character(),
TableName = col_character(),
X6 = col_character()
)
head(country)
emissions <- read_csv("emissions.csv")
Parsed with column specification:
cols(
.default = col_character()
)
See spec(...) for full column specifications.
head(emissions)
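The column specification above shows every emissions column parsed as character, because the World Bank export marks missing values with "..". A minimal sketch, assuming a small made-up file in the same layout (the file contents and the object name `demo` are illustrative, not the actual data), of how the `na` argument of `read_csv()` can declare that marker at import so the year columns parse as numbers:

```r
library(readr)

# Write a tiny made-up CSV that mimics the layout of emissions.csv.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Country Name,Country Code,Series Name,1960 [YR1960],1961 [YR1961]",
  "Aruba,ABW,Methane emissions (kt of CO2 equivalent),..,10.25",
  "Aruba,ABW,CO2 emissions (kt),11.1,.."
), tmp)

# Treat ".." (plus the usual empty string and "NA") as missing on import.
demo <- read_csv(tmp, na = c("", "NA", ".."))

# The two year columns should now be numeric rather than character.
sapply(demo, class)
```

With the marker declared up front, the later `as.numeric()` conversions and their coercion warnings would not be needed for these columns.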
3. Merge Data
report <- left_join(pop, country, by = "Country Code")
head(report)
report_new <- left_join(report, emissions, by = "Country Code")
head(report_new)
dim(report_new)
[1] 1848 131
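The 1848 rows come from a one-to-many join: each country row on the left is repeated once for every emissions series that shares its `Country Code`. The count is consistent with 264 country rows each matching seven series (264 × 7 = 1848). A sketch of the same effect on made-up data (`pop_demo`, `em_demo` and their columns are illustrative):

```r
library(dplyr)

# One row per country on the left, several series per country on the right.
pop_demo <- tibble(code = c("AAA", "BBB"), growth_1960 = c(1.2, 0.8))
em_demo <- tibble(
  code = rep(c("AAA", "BBB"), each = 3),
  series = rep(c("CO2", "Methane", "Nitrous oxide"), times = 2),
  em_1960 = c(10, 2, 1, 20, 4, 2)
)

joined <- left_join(pop_demo, em_demo, by = "code")

# 2 countries x 3 series: the population value is repeated on every series row.
nrow(joined)
joined
```

This repetition is why the population columns in the `str()` output show the same value several times in a row.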
names(report_new)
[1] "Country Name.x" "Country Code" "Indicator Name" "Indicator Code" "1960"
[6] "1961" "1962" "1963" "1964" "1965"
[11] "1966" "1967" "1968" "1969" "1970"
[16] "1971" "1972" "1973" "1974" "1975"
[21] "1976" "1977" "1978" "1979" "1980"
[26] "1981" "1982" "1983" "1984" "1985"
[31] "1986" "1987" "1988" "1989" "1990"
[36] "1991" "1992" "1993" "1994" "1995"
[41] "1996" "1997" "1998" "1999" "2000"
[46] "2001" "2002" "2003" "2004" "2005"
[51] "2006" "2007" "2008" "2009" "2010"
[56] "2011" "2012" "2013" "2014" "2015"
[61] "2016" "2017" "2018" "X64" "Region"
[66] "IncomeGroup" "SpecialNotes" "TableName" "X6" "Country Name.y"
[71] "Series Name" "Series Code" "1960 [YR1960]" "1961 [YR1961]" "1962 [YR1962]"
[76] "1963 [YR1963]" "1964 [YR1964]" "1965 [YR1965]" "1966 [YR1966]" "1967 [YR1967]"
[81] "1968 [YR1968]" "1969 [YR1969]" "1970 [YR1970]" "1971 [YR1971]" "1972 [YR1972]"
[86] "1973 [YR1973]" "1974 [YR1974]" "1975 [YR1975]" "1976 [YR1976]" "1977 [YR1977]"
[91] "1978 [YR1978]" "1979 [YR1979]" "1980 [YR1980]" "1981 [YR1981]" "1982 [YR1982]"
[96] "1983 [YR1983]" "1984 [YR1984]" "1985 [YR1985]" "1986 [YR1986]" "1987 [YR1987]"
[101] "1988 [YR1988]" "1989 [YR1989]" "1990 [YR1990]" "1991 [YR1991]" "1992 [YR1992]"
[106] "1993 [YR1993]" "1994 [YR1994]" "1995 [YR1995]" "1996 [YR1996]" "1997 [YR1997]"
[111] "1998 [YR1998]" "1999 [YR1999]" "2000 [YR2000]" "2001 [YR2001]" "2002 [YR2002]"
[116] "2003 [YR2003]" "2004 [YR2004]" "2005 [YR2005]" "2006 [YR2006]" "2007 [YR2007]"
[121] "2008 [YR2008]" "2009 [YR2009]" "2010 [YR2010]" "2011 [YR2011]" "2012 [YR2012]"
[126] "2013 [YR2013]" "2014 [YR2014]" "2015 [YR2015]" "2016 [YR2016]" "2017 [YR2017]"
[131] "2018 [YR2018]"
str(report_new)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 1848 obs. of 131 variables:
$ Country Name.x: chr "Aruba" "Aruba" "Aruba" "Aruba" ...
$ Country Code : chr "ABW" "ABW" "ABW" "ABW" ...
$ Indicator Name: chr "Population growth (annual %)" "Population growth (annual %)" "Population growth (annual %)" "Population growth (annual %)" ...
$ Indicator Code: chr "SP.POP.GROW" "SP.POP.GROW" "SP.POP.GROW" "SP.POP.GROW" ...
$ 1960 : num 3.15 3.15 3.15 3.15 3.15 ...
$ 1961 : num 2.24 2.24 2.24 2.24 2.24 ...
$ 1962 : num 1.41 1.41 1.41 1.41 1.41 ...
$ 1963 : num 0.832 0.832 0.832 0.832 0.832 ...
$ 1964 : num 0.593 0.593 0.593 0.593 0.593 ...
$ 1965 : num 0.573 0.573 0.573 0.573 0.573 ...
$ 1966 : num 0.617 0.617 0.617 0.617 0.617 ...
$ 1967 : num 0.587 0.587 0.587 0.587 0.587 ...
$ 1968 : num 0.569 0.569 0.569 0.569 0.569 ...
$ 1969 : num 0.581 0.581 0.581 0.581 0.581 ...
$ 1970 : num 0.572 0.572 0.572 0.572 0.572 ...
$ 1971 : num 0.636 0.636 0.636 0.636 0.636 ...
$ 1972 : num 0.671 0.671 0.671 0.671 0.671 ...
$ 1973 : num 0.671 0.671 0.671 0.671 0.671 ...
$ 1974 : num 0.472 0.472 0.472 0.472 0.472 ...
$ 1975 : num 0.213 0.213 0.213 0.213 0.213 ...
$ 1976 : num -0.117 -0.117 -0.117 -0.117 -0.117 ...
$ 1977 : num -0.364 -0.364 -0.364 -0.364 -0.364 ...
$ 1978 : num -0.437 -0.437 -0.437 -0.437 -0.437 ...
$ 1979 : num -0.205 -0.205 -0.205 -0.205 -0.205 ...
$ 1980 : num 0.193 0.193 0.193 0.193 0.193 ...
$ 1981 : num 0.781 0.781 0.781 0.781 0.781 ...
$ 1982 : num 1.28 1.28 1.28 1.28 1.28 ...
$ 1983 : num 1.39 1.39 1.39 1.39 1.39 ...
$ 1984 : num 1.02 1.02 1.02 1.02 1.02 ...
$ 1985 : num 0.302 0.302 0.302 0.302 0.302 ...
$ 1986 : num -0.608 -0.608 -0.608 -0.608 -0.608 ...
$ 1987 : num -1.3 -1.3 -1.3 -1.3 -1.3 ...
$ 1988 : num -1.23 -1.23 -1.23 -1.23 -1.23 ...
$ 1989 : num -0.077 -0.077 -0.077 -0.077 -0.077 ...
$ 1990 : num 1.81 1.81 1.81 1.81 1.81 ...
$ 1991 : num 3.9 3.9 3.9 3.9 3.9 ...
$ 1992 : num 5.44 5.44 5.44 5.44 5.44 ...
$ 1993 : num 6.07 6.07 6.07 6.07 6.07 ...
$ 1994 : num 5.63 5.63 5.63 5.63 5.63 ...
$ 1995 : num 4.62 4.62 4.62 4.62 4.62 ...
$ 1996 : num 3.52 3.52 3.52 3.52 3.52 ...
$ 1997 : num 2.67 2.67 2.67 2.67 2.67 ...
$ 1998 : num 2.11 2.11 2.11 2.11 2.11 ...
$ 1999 : num 1.96 1.96 1.96 1.96 1.96 ...
$ 2000 : num 2.06 2.06 2.06 2.06 2.06 ...
$ 2001 : num 2.23 2.23 2.23 2.23 2.23 ...
$ 2002 : num 2.23 2.23 2.23 2.23 2.23 ...
$ 2003 : num 2.11 2.11 2.11 2.11 2.11 ...
$ 2004 : num 1.76 1.76 1.76 1.76 1.76 ...
$ 2005 : num 1.3 1.3 1.3 1.3 1.3 ...
$ 2006 : num 0.798 0.798 0.798 0.798 0.798 ...
$ 2007 : num 0.384 0.384 0.384 0.384 0.384 ...
$ 2008 : num 0.131 0.131 0.131 0.131 0.131 ...
$ 2009 : num 0.0986 0.0986 0.0986 0.0986 0.0986 ...
$ 2010 : num 0.213 0.213 0.213 0.213 0.213 ...
$ 2011 : num 0.377 0.377 0.377 0.377 0.377 ...
$ 2012 : num 0.512 0.512 0.512 0.512 0.512 ...
$ 2013 : num 0.593 0.593 0.593 0.593 0.593 ...
$ 2014 : num 0.587 0.587 0.587 0.587 0.587 ...
$ 2015 : num 0.525 0.525 0.525 0.525 0.525 ...
$ 2016 : num 0.46 0.46 0.46 0.46 0.46 ...
$ 2017 : num 0.421 0.421 0.421 0.421 0.421 ...
$ 2018 : chr NA NA NA NA ...
$ X64 : chr NA NA NA NA ...
$ Region : chr "Latin America & Caribbean" "Latin America & Caribbean" "Latin America & Caribbean" "Latin America & Caribbean" ...
$ IncomeGroup : chr "High income" "High income" "High income" "High income" ...
$ SpecialNotes : chr "Central Bureau of Statistics and Central Bank of Aruba ; Source of population estimates: UN Population Division"| __truncated__ "Central Bureau of Statistics and Central Bank of Aruba ; Source of population estimates: UN Population Division"| __truncated__ "Central Bureau of Statistics and Central Bank of Aruba ; Source of population estimates: UN Population Division"| __truncated__ "Central Bureau of Statistics and Central Bank of Aruba ; Source of population estimates: UN Population Division"| __truncated__ ...
$ TableName : chr "Aruba" "Aruba" "Aruba" "Aruba" ...
$ X6 : chr NA NA NA NA ...
$ Country Name.y: chr "Aruba" "Aruba" "Aruba" "Aruba" ...
$ Series Name : chr "HFC gas emissions (thousand metric tons of CO2 equivalent)" "Methane emissions (kt of CO2 equivalent)" "Nitrous oxide emissions (thousand metric tons of CO2 equivalent)" "Other greenhouse gas emissions, HFC, PFC and SF6 (thousand metric tons of CO2 equivalent)" ...
$ Series Code : chr "EN.ATM.HFCG.KT.CE" "EN.ATM.METH.KT.CE" "EN.ATM.NOXE.KT.CE" "EN.ATM.GHGO.KT.CE" ...
$ 1960 [YR1960] : chr ".." ".." ".." ".." ...
$ 1961 [YR1961] : chr ".." ".." ".." ".." ...
$ 1962 [YR1962] : chr ".." ".." ".." ".." ...
$ 1963 [YR1963] : chr ".." ".." ".." ".." ...
$ 1964 [YR1964] : chr ".." ".." ".." ".." ...
$ 1965 [YR1965] : chr ".." ".." ".." ".." ...
$ 1966 [YR1966] : chr ".." ".." ".." ".." ...
$ 1967 [YR1967] : chr ".." ".." ".." ".." ...
$ 1968 [YR1968] : chr ".." ".." ".." ".." ...
$ 1969 [YR1969] : chr ".." ".." ".." ".." ...
$ 1970 [YR1970] : chr ".." "10.2469" "1.8261976" "4.44089209850063E-16" ...
$ 1971 [YR1971] : chr ".." "10.4531" "1.8269478" "-6.66133814775094E-16" ...
$ 1972 [YR1972] : chr ".." "10.657" "1.8448131" "-3.99680288865056E-15" ...
$ 1973 [YR1973] : chr ".." "10.8551" "1.8220901" "4.44089209850063E-16" ...
$ 1974 [YR1974] : chr ".." "11.0415" "1.821157" "3.10862446895044E-15" ...
$ 1975 [YR1975] : chr ".." "11.2194" "1.8296076" "-6.66133814775094E-16" ...
$ 1976 [YR1976] : chr ".." "11.5069" "2.6258457" "-1.33226762955019E-14" ...
$ 1977 [YR1977] : chr ".." "11.6718" "2.7775194" "7.99360577730113E-15" ...
$ 1978 [YR1978] : chr ".." "12.2338" "5.303201" "3.19744231092045E-14" ...
$ 1979 [YR1979] : chr ".." "12.4857" "6.067785" "2.39808173319034E-14" ...
$ 1980 [YR1980] : chr ".." "12.6755" "6.170271" "1.4210854715202E-14" ...
$ 1981 [YR1981] : chr ".." "12.937" "7.047013" "-5.15143483426073E-14" ...
$ 1982 [YR1982] : chr ".." "13.182" "7.624574" "6.83897383169096E-14" ...
$ 1983 [YR1983] : chr ".." "13.3622" "7.290487" "-4.17443857259059E-14" ...
$ 1984 [YR1984] : chr ".." "13.6488" "7.828492" "-7.90478793533111E-14" ...
$ 1985 [YR1985] : chr ".." "13.9676" "8.650364" "3.01980662698043E-14" ...
$ 1986 [YR1986] : chr ".." "13.8472" "6.47466" "1.4210854715202E-14" ...
[list output truncated]
report_new$`2018` <- as.numeric(report_new$`2018`)
report_new$`1960 [YR1960]` <- as.numeric(report_new$`1960 [YR1960]`)
NAs introduced by coercion
report_new$`1961 [YR1961]` <- as.numeric(report_new$`1961 [YR1961]`)
NAs introduced by coercion
report_new$`1962 [YR1962]` <- as.numeric(report_new$`1962 [YR1962]`)
NAs introduced by coercion
report_new$`1963 [YR1963]` <- as.numeric(report_new$`1963 [YR1963]`)
NAs introduced by coercion
report_new$`1964 [YR1964]` <- as.numeric(report_new$`1964 [YR1964]`)
NAs introduced by coercion
report_new$`1965 [YR1965]` <- as.numeric(report_new$`1965 [YR1965]`)
NAs introduced by coercion
report_new$`1966 [YR1966]` <- as.numeric(report_new$`1966 [YR1966]`)
NAs introduced by coercion
report_new$`1967 [YR1967]` <- as.numeric(report_new$`1967 [YR1967]`)
NAs introduced by coercion
report_new$`1968 [YR1968]` <- as.numeric(report_new$`1968 [YR1968]`)
NAs introduced by coercion
report_new$`1969 [YR1969]` <- as.numeric(report_new$`1969 [YR1969]`)
NAs introduced by coercion
report_new$`1970 [YR1970]` <- as.numeric(report_new$`1970 [YR1970]`)
NAs introduced by coercion
report_new$`1971 [YR1971]` <- as.numeric(report_new$`1971 [YR1971]`)
NAs introduced by coercion
report_new$`1972 [YR1972]` <- as.numeric(report_new$`1972 [YR1972]`)
NAs introduced by coercion
report_new$`1973 [YR1973]` <- as.numeric(report_new$`1973 [YR1973]`)
NAs introduced by coercion
report_new$`1974 [YR1974]` <- as.numeric(report_new$`1974 [YR1974]`)
NAs introduced by coercion
report_new$`1975 [YR1975]` <- as.numeric(report_new$`1975 [YR1975]`)
NAs introduced by coercion
report_new$`1976 [YR1976]` <- as.numeric(report_new$`1976 [YR1976]`)
NAs introduced by coercion
report_new$`1977 [YR1977]` <- as.numeric(report_new$`1977 [YR1977]`)
NAs introduced by coercion
report_new$`1978 [YR1978]` <- as.numeric(report_new$`1978 [YR1978]`)
NAs introduced by coercion
report_new$`1979 [YR1979]` <- as.numeric(report_new$`1979 [YR1979]`)
NAs introduced by coercion
report_new$`1980 [YR1980]` <- as.numeric(report_new$`1980 [YR1980]`)
NAs introduced by coercion
report_new$`1981 [YR1981]` <- as.numeric(report_new$`1981 [YR1981]`)
NAs introduced by coercion
report_new$`1982 [YR1982]` <- as.numeric(report_new$`1982 [YR1982]`)
NAs introduced by coercion
report_new$`1983 [YR1983]` <- as.numeric(report_new$`1983 [YR1983]`)
NAs introduced by coercion
report_new$`1984 [YR1984]` <- as.numeric(report_new$`1984 [YR1984]`)
NAs introduced by coercion
report_new$`1985 [YR1985]` <- as.numeric(report_new$`1985 [YR1985]`)
NAs introduced by coercion
report_new$`1986 [YR1986]` <- as.numeric(report_new$`1986 [YR1986]`)
NAs introduced by coercion
report_new$`1987 [YR1987]` <- as.numeric(report_new$`1987 [YR1987]`)
NAs introduced by coercion
report_new$`1988 [YR1988]` <- as.numeric(report_new$`1988 [YR1988]`)
NAs introduced by coercion
report_new$`1989 [YR1989]` <- as.numeric(report_new$`1989 [YR1989]`)
NAs introduced by coercion
report_new$`1990 [YR1990]` <- as.numeric(report_new$`1990 [YR1990]`)
NAs introduced by coercion
report_new$`1991 [YR1991]` <- as.numeric(report_new$`1991 [YR1991]`)
NAs introduced by coercion
report_new$`1992 [YR1992]` <- as.numeric(report_new$`1992 [YR1992]`)
NAs introduced by coercion
report_new$`1993 [YR1993]` <- as.numeric(report_new$`1993 [YR1993]`)
NAs introduced by coercion
report_new$`1994 [YR1994]` <- as.numeric(report_new$`1994 [YR1994]`)
NAs introduced by coercion
report_new$`1995 [YR1995]` <- as.numeric(report_new$`1995 [YR1995]`)
NAs introduced by coercion
report_new$`1996 [YR1996]` <- as.numeric(report_new$`1996 [YR1996]`)
NAs introduced by coercion
report_new$`1997 [YR1997]` <- as.numeric(report_new$`1997 [YR1997]`)
NAs introduced by coercion
report_new$`1998 [YR1998]` <- as.numeric(report_new$`1998 [YR1998]`)
NAs introduced by coercion
report_new$`1999 [YR1999]` <- as.numeric(report_new$`1999 [YR1999]`)
NAs introduced by coercion
report_new$`2000 [YR2000]` <- as.numeric(report_new$`2000 [YR2000]`)
NAs introduced by coercion
report_new$`2001 [YR2001]` <- as.numeric(report_new$`2001 [YR2001]`)
NAs introduced by coercion
report_new$`2002 [YR2002]` <- as.numeric(report_new$`2002 [YR2002]`)
NAs introduced by coercion
report_new$`2003 [YR2003]` <- as.numeric(report_new$`2003 [YR2003]`)
NAs introduced by coercion
report_new$`2004 [YR2004]` <- as.numeric(report_new$`2004 [YR2004]`)
NAs introduced by coercion
report_new$`2005 [YR2005]` <- as.numeric(report_new$`2005 [YR2005]`)
NAs introduced by coercion
report_new$`2006 [YR2006]` <- as.numeric(report_new$`2006 [YR2006]`)
NAs introduced by coercion
report_new$`2007 [YR2007]` <- as.numeric(report_new$`2007 [YR2007]`)
NAs introduced by coercion
report_new$`2008 [YR2008]` <- as.numeric(report_new$`2008 [YR2008]`)
NAs introduced by coercion
report_new$`2009 [YR2009]` <- as.numeric(report_new$`2009 [YR2009]`)
NAs introduced by coercion
report_new$`2010 [YR2010]` <- as.numeric(report_new$`2010 [YR2010]`)
NAs introduced by coercion
report_new$`2011 [YR2011]` <- as.numeric(report_new$`2011 [YR2011]`)
NAs introduced by coercion
report_new$`2012 [YR2012]` <- as.numeric(report_new$`2012 [YR2012]`)
NAs introduced by coercion
report_new$`2013 [YR2013]` <- as.numeric(report_new$`2013 [YR2013]`)
NAs introduced by coercion
report_new$`2014 [YR2014]` <- as.numeric(report_new$`2014 [YR2014]`)
NAs introduced by coercion
report_new$`2015 [YR2015]` <- as.numeric(report_new$`2015 [YR2015]`)
NAs introduced by coercion
report_new$`2016 [YR2016]` <- as.numeric(report_new$`2016 [YR2016]`)
NAs introduced by coercion
report_new$`2017 [YR2017]` <- as.numeric(report_new$`2017 [YR2017]`)
NAs introduced by coercion
report_new$`2018 [YR2018]` <- as.numeric(report_new$`2018 [YR2018]`)
NAs introduced by coercion
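The column-by-column conversions above can be expressed in one step. A sketch, assuming dplyr 1.0 or later for `across()` and using a made-up frame (`demo` is illustrative); `na_if()` turns the ".." marker into `NA` first, so `as.numeric()` has nothing to warn about:

```r
library(dplyr)

demo <- tibble(
  `Series Name` = c("CO2 emissions (kt)", "Methane emissions (kt of CO2 equivalent)"),
  `1960 [YR1960]` = c("..", "10.2469"),
  `1961 [YR1961]` = c("11.1", "..")
)

# Select every column whose name ends in "[YRnnnn]" and convert it to numeric.
demo_num <- demo %>%
  mutate(across(matches("\\[YR[0-9]{4}\\]$"), ~ as.numeric(na_if(.x, ".."))))

str(demo_num)
```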
report_new$Region <- factor(report_new$Region,
levels = c("East Asia & Pacific", "Europe & Central Asia", "Latin America & Caribbean", "Middle East & North Africa", "North America", "South Asia", "Sub-Saharan Africa"))
levels(report_new$Region)
[1] "East Asia & Pacific" "Europe & Central Asia" "Latin America & Caribbean"
[4] "Middle East & North Africa" "North America" "South Asia"
[7] "Sub-Saharan Africa"
report_new$IncomeGroup <- factor(report_new$IncomeGroup,
levels = c("High income", "Upper middle income", "Lower middle income", "Low income"),
ordered = TRUE)
levels(report_new$IncomeGroup)
[1] "High income" "Upper middle income" "Lower middle income" "Low income"
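With `ordered = TRUE`, the order given in `levels` defines how values compare, so here "High income" is the lowest level and "Low income" the highest. A small sketch on made-up values showing what that implies for comparisons and sorting:

```r
income <- factor(
  c("Low income", "High income", "Upper middle income"),
  levels = c("High income", "Upper middle income", "Lower middle income", "Low income"),
  ordered = TRUE
)

# TRUE for values whose level is listed before "Lower middle income".
income < "Lower middle income"

# Sorting follows level order, so "High income" comes first.
sort(income)

# The underlying integer codes follow the same order.
as.integer(income)
```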
str(report_new)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 1848 obs. of 131 variables:
$ Country Name.x: chr "Aruba" "Aruba" "Aruba" "Aruba" ...
$ Country Code : chr "ABW" "ABW" "ABW" "ABW" ...
$ Indicator Name: chr "Population growth (annual %)" "Population growth (annual %)" "Population growth (annual %)" "Population growth (annual %)" ...
$ Indicator Code: chr "SP.POP.GROW" "SP.POP.GROW" "SP.POP.GROW" "SP.POP.GROW" ...
$ 1960 : num 3.15 3.15 3.15 3.15 3.15 ...
$ 1961 : num 2.24 2.24 2.24 2.24 2.24 ...
$ 1962 : num 1.41 1.41 1.41 1.41 1.41 ...
$ 1963 : num 0.832 0.832 0.832 0.832 0.832 ...
$ 1964 : num 0.593 0.593 0.593 0.593 0.593 ...
$ 1965 : num 0.573 0.573 0.573 0.573 0.573 ...
$ 1966 : num 0.617 0.617 0.617 0.617 0.617 ...
$ 1967 : num 0.587 0.587 0.587 0.587 0.587 ...
$ 1968 : num 0.569 0.569 0.569 0.569 0.569 ...
$ 1969 : num 0.581 0.581 0.581 0.581 0.581 ...
$ 1970 : num 0.572 0.572 0.572 0.572 0.572 ...
$ 1971 : num 0.636 0.636 0.636 0.636 0.636 ...
$ 1972 : num 0.671 0.671 0.671 0.671 0.671 ...
$ 1973 : num 0.671 0.671 0.671 0.671 0.671 ...
$ 1974 : num 0.472 0.472 0.472 0.472 0.472 ...
$ 1975 : num 0.213 0.213 0.213 0.213 0.213 ...
$ 1976 : num -0.117 -0.117 -0.117 -0.117 -0.117 ...
$ 1977 : num -0.364 -0.364 -0.364 -0.364 -0.364 ...
$ 1978 : num -0.437 -0.437 -0.437 -0.437 -0.437 ...
$ 1979 : num -0.205 -0.205 -0.205 -0.205 -0.205 ...
$ 1980 : num 0.193 0.193 0.193 0.193 0.193 ...
$ 1981 : num 0.781 0.781 0.781 0.781 0.781 ...
$ 1982 : num 1.28 1.28 1.28 1.28 1.28 ...
$ 1983 : num 1.39 1.39 1.39 1.39 1.39 ...
$ 1984 : num 1.02 1.02 1.02 1.02 1.02 ...
$ 1985 : num 0.302 0.302 0.302 0.302 0.302 ...
$ 1986 : num -0.608 -0.608 -0.608 -0.608 -0.608 ...
$ 1987 : num -1.3 -1.3 -1.3 -1.3 -1.3 ...
$ 1988 : num -1.23 -1.23 -1.23 -1.23 -1.23 ...
$ 1989 : num -0.077 -0.077 -0.077 -0.077 -0.077 ...
$ 1990 : num 1.81 1.81 1.81 1.81 1.81 ...
$ 1991 : num 3.9 3.9 3.9 3.9 3.9 ...
$ 1992 : num 5.44 5.44 5.44 5.44 5.44 ...
$ 1993 : num 6.07 6.07 6.07 6.07 6.07 ...
$ 1994 : num 5.63 5.63 5.63 5.63 5.63 ...
$ 1995 : num 4.62 4.62 4.62 4.62 4.62 ...
$ 1996 : num 3.52 3.52 3.52 3.52 3.52 ...
$ 1997 : num 2.67 2.67 2.67 2.67 2.67 ...
$ 1998 : num 2.11 2.11 2.11 2.11 2.11 ...
$ 1999 : num 1.96 1.96 1.96 1.96 1.96 ...
$ 2000 : num 2.06 2.06 2.06 2.06 2.06 ...
$ 2001 : num 2.23 2.23 2.23 2.23 2.23 ...
$ 2002 : num 2.23 2.23 2.23 2.23 2.23 ...
$ 2003 : num 2.11 2.11 2.11 2.11 2.11 ...
$ 2004 : num 1.76 1.76 1.76 1.76 1.76 ...
$ 2005 : num 1.3 1.3 1.3 1.3 1.3 ...
$ 2006 : num 0.798 0.798 0.798 0.798 0.798 ...
$ 2007 : num 0.384 0.384 0.384 0.384 0.384 ...
$ 2008 : num 0.131 0.131 0.131 0.131 0.131 ...
$ 2009 : num 0.0986 0.0986 0.0986 0.0986 0.0986 ...
$ 2010 : num 0.213 0.213 0.213 0.213 0.213 ...
$ 2011 : num 0.377 0.377 0.377 0.377 0.377 ...
$ 2012 : num 0.512 0.512 0.512 0.512 0.512 ...
$ 2013 : num 0.593 0.593 0.593 0.593 0.593 ...
$ 2014 : num 0.587 0.587 0.587 0.587 0.587 ...
$ 2015 : num 0.525 0.525 0.525 0.525 0.525 ...
$ 2016 : num 0.46 0.46 0.46 0.46 0.46 ...
$ 2017 : num 0.421 0.421 0.421 0.421 0.421 ...
$ 2018 : num NA NA NA NA NA NA NA NA NA NA ...
$ X64 : chr NA NA NA NA ...
$ Region : Factor w/ 7 levels "East Asia & Pacific",..: 3 3 3 3 3 3 3 6 6 6 ...
$ IncomeGroup : Ord.factor w/ 4 levels "High income"<..: 1 1 1 1 1 1 1 4 4 4 ...
$ SpecialNotes : chr "Central Bureau of Statistics and Central Bank of Aruba ; Source of population estimates: UN Population Division"| __truncated__ "Central Bureau of Statistics and Central Bank of Aruba ; Source of population estimates: UN Population Division"| __truncated__ "Central Bureau of Statistics and Central Bank of Aruba ; Source of population estimates: UN Population Division"| __truncated__ "Central Bureau of Statistics and Central Bank of Aruba ; Source of population estimates: UN Population Division"| __truncated__ ...
$ TableName : chr "Aruba" "Aruba" "Aruba" "Aruba" ...
$ X6 : chr NA NA NA NA ...
$ Country Name.y: chr "Aruba" "Aruba" "Aruba" "Aruba" ...
$ Series Name : chr "HFC gas emissions (thousand metric tons of CO2 equivalent)" "Methane emissions (kt of CO2 equivalent)" "Nitrous oxide emissions (thousand metric tons of CO2 equivalent)" "Other greenhouse gas emissions, HFC, PFC and SF6 (thousand metric tons of CO2 equivalent)" ...
$ Series Code : chr "EN.ATM.HFCG.KT.CE" "EN.ATM.METH.KT.CE" "EN.ATM.NOXE.KT.CE" "EN.ATM.GHGO.KT.CE" ...
$ 1960 [YR1960] : num NA NA NA NA NA NA NA NA NA NA ...
$ 1961 [YR1961] : num NA NA NA NA NA NA NA NA NA NA ...
$ 1962 [YR1962] : num NA NA NA NA NA NA NA NA NA NA ...
$ 1963 [YR1963] : num NA NA NA NA NA NA NA NA NA NA ...
$ 1964 [YR1964] : num NA NA NA NA NA NA NA NA NA NA ...
$ 1965 [YR1965] : num NA NA NA NA NA NA NA NA NA NA ...
$ 1966 [YR1966] : num NA NA NA NA NA NA NA NA NA NA ...
$ 1967 [YR1967] : num NA NA NA NA NA NA NA NA NA NA ...
$ 1968 [YR1968] : num NA NA NA NA NA NA NA NA NA NA ...
$ 1969 [YR1969] : num NA NA NA NA NA NA NA NA NA NA ...
$ 1970 [YR1970] : num NA 1.02e+01 1.83 4.44e-16 NA ...
$ 1971 [YR1971] : num NA 1.05e+01 1.83 -6.66e-16 NA ...
$ 1972 [YR1972] : num NA 1.07e+01 1.84 -4.00e-15 NA ...
$ 1973 [YR1973] : num NA 1.09e+01 1.82 4.44e-16 NA ...
$ 1974 [YR1974] : num NA 1.10e+01 1.82 3.11e-15 NA ...
$ 1975 [YR1975] : num NA 1.12e+01 1.83 -6.66e-16 NA ...
$ 1976 [YR1976] : num NA 1.15e+01 2.63 -1.33e-14 NA ...
$ 1977 [YR1977] : num NA 1.17e+01 2.78 7.99e-15 NA ...
$ 1978 [YR1978] : num NA 1.22e+01 5.30 3.20e-14 NA ...
$ 1979 [YR1979] : num NA 1.25e+01 6.07 2.40e-14 NA ...
$ 1980 [YR1980] : num NA 1.27e+01 6.17 1.42e-14 NA ...
$ 1981 [YR1981] : num NA 1.29e+01 7.05 -5.15e-14 NA ...
$ 1982 [YR1982] : num NA 1.32e+01 7.62 6.84e-14 NA ...
$ 1983 [YR1983] : num NA 1.34e+01 7.29 -4.17e-14 NA ...
$ 1984 [YR1984] : num NA 1.36e+01 7.83 -7.90e-14 NA ...
$ 1985 [YR1985] : num NA 1.40e+01 8.65 3.02e-14 NA ...
$ 1986 [YR1986] : num NA 1.38e+01 6.47 1.42e-14 NA ...
[list output truncated]
Attempt 1
report_tidy <- report_new %>% select(-(`Indicator Name`:`Indicator Code`), -X64, -(SpecialNotes:`Country Name.y`), -`Series Code`)
head(report_tidy)
report_tidy1 <- report_tidy %>% gather(`1960`:`2018`,key = Year, value = "Pop Growth") %>% gather(`1960 [YR1960]`:`2018 [YR2018]`, key = EYear, value = Emissions) %>% spread(key = `Series Name`, value = Emissions) %>% separate(EYear, into = c("EmYear","YR"), sep = " ")
head(report_tidy1)
identical(report_tidy1$Year,report_tidy1$EmYear)
[1] FALSE
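`identical()` returns FALSE here because gathering two separate blocks of year columns in the same data frame pairs every population year with every emissions year, instead of matching each year with itself. A minimal sketch on a made-up one-country frame (values and the `country` column are illustrative):

```r
library(tidyr)
library(dplyr)

wide <- tibble(
  country = "Aruba",
  `1960` = 3.15, `1961` = 2.24,                    # population growth columns
  `1960 [YR1960]` = 10.2, `1961 [YR1961]` = 10.5   # emissions columns
)

long <- wide %>%
  gather(`1960`:`1961`, key = "Year", value = "Pop Growth") %>%
  gather(`1960 [YR1960]`:`1961 [YR1961]`, key = "EYear", value = "Emissions")

# 4 rows: each of the 2 population years paired with each of the 2 emissions years.
long
```

Only the rows where `Year` and the year part of `EYear` agree are meaningful, which is why reshaping the two tables separately is the safer route.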
Attempt 2
report_new_pop <- report %>% select(`Country Name`, `Country Code`, `1960`:`2018`, Region, IncomeGroup) %>% gather(`1960`:`2018`,key = Year, value = "Pop Growth") %>% unite(Key, `Country Name`, Year, sep = "-")
head(report_new_pop)
report_new_em <- report_new %>% select(`Country Name.y`, `Series Name`, `1960 [YR1960]`:`2018 [YR2018]`) %>% gather(`1960 [YR1960]`:`2018 [YR2018]`, key = EYear, value = Emissions) %>% spread(key = `Series Name`, value = Emissions) %>% separate(EYear, into = c("Year","YR"), sep = " ") %>% unite(Key, `Country Name.y`, Year, sep = "-")
head(report_new_em)
report_tidy2 <- report_new_pop %>% left_join(report_new_em, by = "Key")
head(report_tidy2)
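An alternative to building a text key is to reshape each table separately and join on two columns, which removes any dependence on which characters country names contain. A sketch, assuming tidyr 1.0 or later for `pivot_longer()` and `pivot_wider()` and using made-up frames (`pop_demo`, `em_demo` and the shortened column names are illustrative):

```r
library(dplyr)
library(tidyr)

pop_demo <- tibble(code = c("ABW", "GNB"), `1960` = c(3.15, 1.5), `1961` = c(2.24, 1.6))
em_demo <- tibble(
  code = rep(c("ABW", "GNB"), each = 2),
  series = rep(c("CO2", "Methane"), times = 2),
  `1960 [YR1960]` = c(11, 10.2, 5, 3),
  `1961 [YR1961]` = c(12, 10.5, 6, 4)
)

# One row per country and year.
pop_long <- pop_demo %>%
  pivot_longer(-code, names_to = "Year", values_to = "Pop Growth")

# One row per country and year, one column per emissions series.
em_long <- em_demo %>%
  pivot_longer(-c(code, series), names_to = "Year", values_to = "Emissions") %>%
  mutate(Year = substr(Year, 1, 4)) %>%   # "1960 [YR1960]" becomes "1960"
  pivot_wider(names_from = series, values_from = Emissions)

tidy_demo <- left_join(pop_long, em_long, by = c("code", "Year"))
tidy_demo
```

Joining on the country code and the year together keeps both columns intact, so no later step is needed to split a combined key apart.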
report_tidy2$Region <- factor(report_tidy2$Region,
levels = c("East Asia & Pacific", "Europe & Central Asia", "Latin America & Caribbean", "Middle East & North Africa", "North America", "South Asia", "Sub-Saharan Africa"))
levels(report_tidy2$Region)
[1] "East Asia & Pacific" "Europe & Central Asia" "Latin America & Caribbean"
[4] "Middle East & North Africa" "North America" "South Asia"
[7] "Sub-Saharan Africa"
report_tidy2$IncomeGroup <- factor(report_tidy2$IncomeGroup,
levels = c("High income", "Upper middle income", "Lower middle income", "Low income"),
ordered = TRUE)
levels(report_tidy2$IncomeGroup)
[1] "High income" "Upper middle income" "Lower middle income" "Low income"
report_tidy2$`Pop Growth` <- as.numeric(report_tidy2$`Pop Growth`)
# Split only at the hyphen before the trailing 4-digit year, so country names that contain a hyphen stay whole
report_tidy3 <- report_tidy2 %>% separate(Key, into = c("Country Name", "Year"), sep = "-(?=[0-9]{4}$)") %>% select(-YR,-`Country Code`)
head(report_tidy3, 30)
report_tidy4 <- report_tidy3 %>% mutate(`Total Emissions` = `CO2 emissions (kt)` + `HFC gas emissions (thousand metric tons of CO2 equivalent)` + `Methane emissions (kt of CO2 equivalent)` + `Nitrous oxide emissions (thousand metric tons of CO2 equivalent)` + `Other greenhouse gas emissions, HFC, PFC and SF6 (thousand metric tons of CO2 equivalent)` + `PFC gas emissions (thousand metric tons of CO2 equivalent)` + `SF6 gas emissions (thousand metric tons of CO2 equivalent)`)
head(report_tidy4)
colSums(is.na(report_tidy4))
Country Name
0
Year
0
Region
2773
IncomeGroup
2773
Pop Growth
480
CO2 emissions (kt)
3321
HFC gas emissions (thousand metric tons of CO2 equivalent)
14873
Methane emissions (kt of CO2 equivalent)
4862
Nitrous oxide emissions (thousand metric tons of CO2 equivalent)
4819
Other greenhouse gas emissions, HFC, PFC and SF6 (thousand metric tons of CO2 equivalent)
5627
PFC gas emissions (thousand metric tons of CO2 equivalent)
14872
SF6 gas emissions (thousand metric tons of CO2 equivalent)
14867
Total Emissions
14926
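Total Emissions has the most missing values because `+` returns `NA` whenever any one component is `NA`. A sketch on made-up numbers contrasting that with `rowSums(..., na.rm = TRUE)`, which sums whatever components are present (column names are shortened, and this is an alternative rather than the approach used in this report):

```r
library(dplyr)

demo <- tibble(
  co2 = c(100, 200, NA),
  methane = c(10, NA, NA),
  n2o = c(1, 2, NA)
)

demo <- demo %>%
  mutate(
    # NA as soon as any component is NA.
    total_plus = co2 + methane + n2o,
    # Sums the available components; a row with no data at all sums to 0.
    total_rowsum = rowSums(across(c(co2, methane, n2o)), na.rm = TRUE)
  )

demo
```

Separately, the series named "Other greenhouse gas emissions, HFC, PFC and SF6" may, judging by its name, overlap with the individual HFC, PFC and SF6 series, so it is worth checking the World Bank definitions before summing all seven.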
report_tidy4$`Pop Growth`[is.na(report_tidy4$`Pop Growth`)] <- 0
report_tidy4$`CO2 emissions (kt)`[is.na(report_tidy4$`CO2 emissions (kt)`)] <- 0
report_tidy4$`HFC gas emissions (thousand metric tons of CO2 equivalent)`[is.na(report_tidy4$`HFC gas emissions (thousand metric tons of CO2 equivalent)`)] <- 0
report_tidy4$`Methane emissions (kt of CO2 equivalent)`[is.na(report_tidy4$`Methane emissions (kt of CO2 equivalent)`)] <- 0
report_tidy4$`Nitrous oxide emissions (thousand metric tons of CO2 equivalent)`[is.na(report_tidy4$`Nitrous oxide emissions (thousand metric tons of CO2 equivalent)`)] <- 0
report_tidy4$`Other greenhouse gas emissions, HFC, PFC and SF6 (thousand metric tons of CO2 equivalent)`[is.na(report_tidy4$`Other greenhouse gas emissions, HFC, PFC and SF6 (thousand metric tons of CO2 equivalent)`)] <- 0
report_tidy4$`PFC gas emissions (thousand metric tons of CO2 equivalent)`[is.na(report_tidy4$`PFC gas emissions (thousand metric tons of CO2 equivalent)`)] <- 0
report_tidy4$`SF6 gas emissions (thousand metric tons of CO2 equivalent)`[is.na(report_tidy4$`SF6 gas emissions (thousand metric tons of CO2 equivalent)`)] <- 0
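The eight assignments above apply the same rule to every numeric column. A compact sketch of the same replacement, assuming dplyr 1.0 or later and a made-up frame (`demo` is illustrative):

```r
library(dplyr)
library(tidyr)

demo <- tibble(
  country = c("Aruba", "Aruba"),
  `Pop Growth` = c(3.15, NA),
  `CO2 emissions (kt)` = c(NA, 11.1)
)

# Replace NA with 0 in every numeric column; character columns are left alone.
demo_filled <- demo %>%
  mutate(across(where(is.numeric), ~ replace_na(.x, 0)))

colSums(is.na(demo_filled))
```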
report_tidy4 <- report_tidy4 %>% filter(!is.na(Region))
report_tidy5 <- report_tidy4 %>% mutate(`Total Emissions` = `CO2 emissions (kt)` + `HFC gas emissions (thousand metric tons of CO2 equivalent)` + `Methane emissions (kt of CO2 equivalent)` + `Nitrous oxide emissions (thousand metric tons of CO2 equivalent)` + `Other greenhouse gas emissions, HFC, PFC and SF6 (thousand metric tons of CO2 equivalent)` + `PFC gas emissions (thousand metric tons of CO2 equivalent)` + `SF6 gas emissions (thousand metric tons of CO2 equivalent)`)
colSums(is.na(report_tidy5))
Country Name
0
Year
0
Region
0
IncomeGroup
0
Pop Growth
0
CO2 emissions (kt)
0
HFC gas emissions (thousand metric tons of CO2 equivalent)
0
Methane emissions (kt of CO2 equivalent)
0
Nitrous oxide emissions (thousand metric tons of CO2 equivalent)
0
Other greenhouse gas emissions, HFC, PFC and SF6 (thousand metric tons of CO2 equivalent)
0
PFC gas emissions (thousand metric tons of CO2 equivalent)
0
SF6 gas emissions (thousand metric tons of CO2 equivalent)
0
Total Emissions
0
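# is.numeric() is FALSE for a whole data frame, so apply the check column by column
is.special <- function(x) {if (is.numeric(x)) (is.infinite(x) | is.nan(x)) else rep(FALSE, length(x))}
sum(sapply(report_tidy5, function(x) sum(is.special(x))))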
[1] 0
report_num <- report_tidy5 %>% select(-(6:12)) %>% filter(`Total Emissions` != 0 & `Pop Growth` != 0)
report_num %>% plot(`Pop Growth`~`Total Emissions`, data = ., ylab = "Pop Growth", xlab = "Total Emissions")
report_sub_em <- report_num %>% arrange(desc(`Total Emissions`))
head(report_sub_em, 20)
report_sub_pop <- report_num %>% arrange(`Pop Growth`)
head(report_sub_pop, 5)
report_tidy6 <- report_tidy5 %>% filter(`Total Emissions` < 8000000) %>% filter(`Pop Growth` > -7)
report_tidy6 %>% plot(`Pop Growth`~`Total Emissions`, data = ., ylab = "Pop Growth", xlab = "Total Emissions", main = "After Removing Outliers")
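The cut-offs above (Total Emissions below 8,000,000 and Pop Growth above -7) were chosen by inspecting the sorted values. For comparison, here is a sketch of a rule-based alternative, Tukey's fences at 1.5 × IQR, on made-up numbers. This is a different technique from the manual thresholds used here, and on heavily skewed data such as emissions it would likely flag many more points:

```r
x <- c(1, 2, 3, 4, 100)   # made-up values with one extreme point

# Quartiles and the fences 1.5 * IQR beyond them.
q <- quantile(x, c(0.25, 0.75))
fence_low <- q[[1]] - 1.5 * IQR(x)
fence_high <- q[[2]] + 1.5 * IQR(x)

# Values outside the fences are flagged as outliers.
is_outlier <- x < fence_low | x > fence_high
x[is_outlier]
```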
report_scale <- report_tidy6 %>% select(`Total Emissions`)
report_scaled <- scale(report_scale, center = TRUE, scale = TRUE)
head(report_scaled)
Total Emissions
[1,] -0.2466611
[2,] -0.2458115
[3,] -0.2455334
[4,] -0.2425110
[5,] -0.2466611
[6,] -0.2466386
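`scale()` with `center = TRUE` and `scale = TRUE` subtracts the column mean and divides by the column standard deviation, i.e. it produces z-scores. A quick check on made-up numbers that the two calculations agree:

```r
x <- c(10, 20, 30, 40, 1000)

z_scale <- as.numeric(scale(x, center = TRUE, scale = TRUE))
z_manual <- (x - mean(x)) / sd(x)

# The built-in and the manual calculation should match.
all.equal(z_scale, z_manual)

# After standardising, the mean is 0 and the standard deviation is 1.
round(c(mean = mean(z_scale), sd = sd(z_scale)), 10)
```

Negative values such as those in the output above therefore mean "below the average Total Emissions", not negative emissions.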