Required Packages

library(readr)
library(lubridate)
library(dplyr)
library(tidyr)
library(knitr)
library(forecast) 
library(magrittr) 
library(car) 
library(stringr) 
library(moments)

Executive Summary

The goal of this assignment was to preprocess data on greenhouse gas (GHG) emissions from agriculture in Asian and European countries from 1961 to 2017, with projections for the years 2030 and 2050. Both datasets were retrieved from FAOSTAT, the statistics platform of the Food and Agriculture Organization of the United Nations. After importing the datasets, the next step was to change the data types and variable names. Initially, both datasets were untidy and did not conform to the tidy data principles, so the year variables were gathered into a single variable in each dataset. After tidying, the datasets were merged and a new variable, “Emission in MegaGrams”, was created. Emissions are expressed on two scales, gigagrams and megagrams; the data were originally in gigagrams, so the megagram values were derived from the gigagram variable. The numerical variables of the combined dataset were then checked for missing and special values. Next, the data were checked for outliers: the distribution of the numeric variables was inspected first with histograms and then with boxplots to identify potential outliers. The histograms showed that the variables were right-skewed. In the last step, a base-10 logarithmic transformation was applied using the log10 function, which reduced the right skewness and gave a nearly normal distribution.
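
The unit conversion and the log transformation summarised above can be illustrated with a small sketch. The values and the names emission_gg, emission_mg and log_emission_mg are illustrative assumptions, not the variables used in the report; the only facts relied on are that 1 Gg = 10^9 g and 1 Mg = 10^6 g.

```r
# Illustrative emission values in gigagrams (1 Gg = 10^9 g)
emission_gg <- c(240.7, 5054.3, 11.6, 367.8)

# 1 Mg = 10^6 g, so 1 Gg = 1000 Mg
emission_mg <- emission_gg * 1000

# Base-10 log transformation to reduce right skewness.
# Note: a value of 0 would become -Inf, so zeros need handling beforehand.
log_emission_mg <- log10(emission_mg)
```

Because the conversion is a multiplication by 1000, the log-transformed megagram values are simply the log-transformed gigagram values shifted by 3, so the shape of the distribution is the same on either scale.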

Data

Since agriculture is one of the major contributors to global greenhouse gas emissions, both datasets describe the contribution of agriculture to total GHG emissions. Both were retrieved from the open data platform “FAOSTAT” of the Food and Agriculture Organization of the United Nations, in CSV format. The data cover GHG emissions, i.e. the non-CO2 gases methane (CH4) and nitrous oxide (N2O), generated in the agricultural emissions sub-domains (enteric fermentation, manure management, rice cultivation, synthetic fertilizers, manure applied to soils, manure left on pastures, crop residues, cultivation of organic soils, burning of crop residues, burning of savanna and energy use). The emissions are computed and estimated in Gg (10^9 g) by FAO for 1961-2017, with forecasts for the years 2030 and 2050, following the IPCC Guidelines for National GHG Inventories. FAOSTAT provides data by country and region, which helps countries assess and report their emissions.

The first dataset is about the contribution to the total amount of GHG emissions from agriculture of the Asian countries from 1961-2017 with projections of years 2030 and 2050.

The second dataset is about the contribution to the total amount of GHG emissions from agriculture of the European countries from 1961-2017 with projections of years 2030 and 2050.

Both datasets were imported into RStudio using the readr package. The datasets need to be preprocessed and made tidy before they can be merged. After importing them, I saved them in the variables Emissions_Asia and Emissions_Europe, respectively. The head function displays the first rows of a data frame; I set “n” to 5 so that it returns the first five rows, and a different number of rows can be obtained by assigning a different value.

Emissions_Asia <- read_csv("C:/Users/mafza/OneDrive/Desktop/data/Emissions_Asia.csv", 
    col_types = cols(`Area Code` = col_integer(), 
        `Item Code` = col_integer(), `Element Code` = col_integer()))
head(Emissions_Asia,n=5)
Emissions_Europe <- read_csv("C:/Users/mafza/OneDrive/Desktop/data/Emissions_Europe.csv", 
    col_types = cols(`Area Code` = col_integer(), 
        `Item Code` = col_integer(), `Element Code` = col_integer()))
head(Emissions_Europe,n=5)

Understand

Once the datasets were imported, it was important to inspect the structure of both to make sure all variables had appropriate data types before performing any analysis. The dim function was used to check the dimensions of both datasets. The first dataset has 1995 observations and 66 variables; the second has 1932 observations and 66 variables. The str function was used to check the structure of both datasets.

Both datasets have the same variable names (“Area Code”, “Area”, “Item Code”, “Item”, “Element Code”, “Element”, “Unit” and “Y1961” to “Y2050”) but different observations. The Emissions_Asia dataset contains the total contribution of emissions from agriculture for Asian countries, while the Emissions_Europe dataset contains the total contribution of emissions from agriculture for European countries.
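
The claim that two data frames share the same variable names can be checked programmatically before merging. The following is a minimal sketch on two toy data frames standing in for the real datasets; the objects asia, europe, same_names, only_in_asia and only_in_europe are hypothetical names introduced for illustration.

```r
# Toy stand-ins for the two datasets (hypothetical, three columns only)
asia   <- data.frame(Area = "Afghanistan", Item = "Enteric Fermentation", Y1961 = 240.7)
europe <- data.frame(Area = "Albania",     Item = "Enteric Fermentation", Y1961 = 51.04)

# TRUE only if both data frames have the same names in the same order
same_names <- identical(names(asia), names(europe))

# Names present in one data frame but not the other (empty when they match)
only_in_asia   <- setdiff(names(asia), names(europe))
only_in_europe <- setdiff(names(europe), names(asia))
```

If same_names is FALSE, the two setdiff results show which columns need to be renamed or dropped before the datasets are combined.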

The structure and data types of the variables were checked using the str function. The numerical variables (“Area Code”, “Item Code”, “Element Code”) had already been changed from double to integer while reading the datasets. The variables Item and Element were read as character, so they needed to be converted to factor because they contain categorical data. The “Item” variable was converted to a factor with 12 levels, while the “Element” variable was converted to a factor with five levels, “Emissions (CH4)”, “Emissions (CO2eq)”, “Emissions (CO2eq) from CH4”, “Emissions (CO2eq) from N2O” and “Emissions (N2O)”, labelled 1, 2, 3, 4 and 5 respectively.
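
One caveat with specifying levels explicitly: factor() silently turns any value that does not match a listed level into NA. A quick check after the conversion can catch this. The sketch below uses a toy vector with a deliberate typo; element, element_levels, element_f and n_unmatched are hypothetical names, not objects from the report.

```r
# Toy character vector; the last value has a typo (digit 0 instead of letter O)
element <- c("Emissions (CH4)", "Emissions (CO2eq)", "Emissions (N20)")

element_levels <- c("Emissions (CH4)", "Emissions (CO2eq)", "Emissions (N2O)")

# Values not found in `levels` become NA without any warning
element_f <- factor(element, levels = element_levels)

# Count values that were non-missing before conversion but are NA afterwards
n_unmatched <- sum(is.na(element_f) & !is.na(element))

# Frequency table including the NA category, if any
table(element_f, useNA = "ifany")
```

A non-zero n_unmatched indicates that the level strings do not cover every value in the column and should be reviewed.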

I also noticed that some variables were not named appropriately. So, before changing the data types of these variables, the first step was to rename them in both datasets. The year variables were named “Y1961” to “Y2050”, which could confuse readers. Since each name is simply a year number, I changed them to “1961” and so on, so that the data type could be converted in a later step.
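
One way to strip the leading “Y” from the year columns is a regular-expression substitution on the column names. This is a minimal sketch on a toy data frame, not necessarily the approach used in the report; the object toy and its values are hypothetical.

```r
# Toy data frame with year columns named like the FAOSTAT export
toy <- data.frame(
  Area  = c("Afghanistan", "Albania"),
  Y1961 = c(240.7, 51.04),
  Y2050 = c(603.6, 85.9)
)

# Remove the leading "Y" only from names of the form Y + four digits;
# other names such as "Area" are left unchanged
names(toy) <- sub("^Y([0-9]{4})$", "\\1", names(toy))

names(toy)
```

Names that start with a digit are non-syntactic in R, so such columns have to be referenced with backticks, for example toy$`1961`.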

dim(Emissions_Asia)
## [1] 1995   66
str(Emissions_Asia)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 1995 obs. of  66 variables:
##  $ Area Code   : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ Area        : chr  "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
##  $ Item Code   : int  5058 5058 5058 5059 5059 5059 5059 5059 5060 5060 ...
##  $ Item        : chr  "Enteric Fermentation" "Enteric Fermentation" "Enteric Fermentation" "Manure Management" ...
##  $ Element Code: int  7225 7231 7244 7225 7231 7244 7243 7230 7225 7231 ...
##  $ Element     : chr  "Emissions (CH4)" "Emissions (CO2eq)" "Emissions (CO2eq) from CH4" "Emissions (CH4)" ...
##  $ Unit        : chr  "gigagrams" "gigagrams" "gigagrams" "gigagrams" ...
##  $ Y1961       : num  240.7 5054.3 5054.3 11.6 367.8 ...
##  $ Y1962       : num  245 5152 5152 12 376 ...
##  $ Y1963       : num  255.8 5372.4 5372.4 12.6 392.6 ...
##  $ Y1964       : num  259.1 5440.4 5440.4 12.8 399.9 ...
##  $ Y1965       : num  265.6 5577.6 5577.6 13.3 413.4 ...
##  $ Y1966       : num  277 5817 5817 14 433 ...
##  $ Y1967       : num  280.1 5882 5882 14.3 440.1 ...
##  $ Y1968       : num  288.8 6065.2 6065.2 14.7 453.2 ...
##  $ Y1969       : num  286.4 6014 6014 14.5 449.9 ...
##  $ Y1970       : num  290.3 6095.5 6095.5 14.7 455 ...
##  $ Y1971       : num  287.8 6043.6 6043.6 14.6 450.9 ...
##  $ Y1972       : num  232 4862 4862 13 372 ...
##  $ Y1973       : num  245 5144.6 5144.6 13.1 388.3 ...
##  $ Y1974       : num  262.8 5519.6 5519.6 13.8 415.2 ...
##  $ Y1975       : num  282.1 5923.6 5923.6 14.5 444 ...
##  $ Y1976       : num  288.2 6052.7 6052.7 14.8 453.7 ...
##  $ Y1977       : num  280.9 5898.4 5898.4 14.4 441.7 ...
##  $ Y1978       : num  280.3 5886 5886 14.7 443.3 ...
##  $ Y1979       : num  274.2 5758.9 5758.9 14.6 436.1 ...
##  $ Y1980       : num  275.4 5782.8 5782.8 14.6 437.9 ...
##  $ Y1981       : num  278.2 5842.4 5842.4 14.8 442.6 ...
##  $ Y1982       : num  277.9 5836.7 5836.7 14.7 442.1 ...
##  $ Y1983       : num  262.6 5515.2 5515.2 14.2 419.9 ...
##  $ Y1984       : num  230.2 4833.1 4833.1 12.4 366.5 ...
##  $ Y1985       : num  202.7 4257.3 4257.3 10.7 319.8 ...
##  $ Y1986       : num  160.26 3365.54 3365.54 8.28 249.68 ...
##  $ Y1987       : num  172.94 3631.74 3631.74 8.67 266.26 ...
##  $ Y1988       : num  181.44 3810.16 3810.16 8.75 278.33 ...
##  $ Y1989       : num  179.56 3770.8 3770.8 8.62 275.35 ...
##  $ Y1990       : num  178.47 3747.83 3747.83 8.52 273.27 ...
##  $ Y1991       : num  187.55 3938.55 3938.55 9.34 290.59 ...
##  $ Y1992       : num  189.76 3984.96 3984.96 9.67 294.78 ...
##  $ Y1993       : num  190.83 4007.43 4007.43 9.83 296.51 ...
##  $ Y1994       : num  197.9 4156.3 4156.3 10.4 308.4 ...
##  $ Y1995       : num  211.2 4434.3 4434.3 11.4 331 ...
##  $ Y1996       : num  239.7 5034.1 5034.1 13.5 385.2 ...
##  $ Y1997       : num  264.6 5556.8 5556.8 14.9 423.2 ...
##  $ Y1998       : num  283.5 5952.5 5952.5 15.7 448.8 ...
##  $ Y1999       : num  318.3 6685.1 6685.1 17.9 504.8 ...
##  $ Y2000       : num  272.1 5714.9 5714.9 15.1 428.3 ...
##  $ Y2001       : num  225.4 4733.5 4733.5 12.2 355.2 ...
##  $ Y2002       : num  287.9 6045.8 6045.8 18.4 477.2 ...
##  $ Y2003       : num  293.6 6166.1 6166.1 18.7 486.1 ...
##  $ Y2004       : num  285.6 5997.6 5997.6 17.6 467.3 ...
##  $ Y2005       : num  295.4 6203.4 6203.4 18.5 490.1 ...
##  $ Y2006       : num  300.8 6316.9 6316.9 19.5 503.1 ...
##  $ Y2007       : num  304.2 6388.7 6388.7 20.5 516.4 ...
##  $ Y2008       : num  339.6 7130.7 7130.7 22.4 574 ...
##  $ Y2009       : num  345.7 7258.8 7258.8 22.6 583.5 ...
##  $ Y2010       : num  401.1 8422.4 8422.4 26.6 681.3 ...
##  $ Y2011       : num  402.5 8452.8 8452.8 26.2 678.8 ...
##  $ Y2012       : num  396.9 8335.3 8335.3 26.1 672.3 ...
##  $ Y2013       : num  393.1 8255 8255 26.1 667.7 ...
##  $ Y2014       : num  398.3 8364 8364 26.4 675.6 ...
##  $ Y2015       : num  384.1 8066.9 8066.9 24.9 645.2 ...
##  $ Y2016       : num  381.7 8015.3 8015.3 24.8 642 ...
##  $ Y2017       : num  371.9 7810.4 7810.4 23.8 623.4 ...
##  $ Y2030       : num  453.7 9528.7 9528.7 27.2 750.3 ...
##  $ Y2050       : num  603.6 12676 12676 35.3 1003.2 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   `Area Code` = col_integer(),
##   ..   Area = col_character(),
##   ..   `Item Code` = col_integer(),
##   ..   Item = col_character(),
##   ..   `Element Code` = col_integer(),
##   ..   Element = col_character(),
##   ..   Unit = col_character(),
##   ..   Y1961 = col_double(),
##   ..   Y1962 = col_double(),
##   ..   Y1963 = col_double(),
##   ..   Y1964 = col_double(),
##   ..   Y1965 = col_double(),
##   ..   Y1966 = col_double(),
##   ..   Y1967 = col_double(),
##   ..   Y1968 = col_double(),
##   ..   Y1969 = col_double(),
##   ..   Y1970 = col_double(),
##   ..   Y1971 = col_double(),
##   ..   Y1972 = col_double(),
##   ..   Y1973 = col_double(),
##   ..   Y1974 = col_double(),
##   ..   Y1975 = col_double(),
##   ..   Y1976 = col_double(),
##   ..   Y1977 = col_double(),
##   ..   Y1978 = col_double(),
##   ..   Y1979 = col_double(),
##   ..   Y1980 = col_double(),
##   ..   Y1981 = col_double(),
##   ..   Y1982 = col_double(),
##   ..   Y1983 = col_double(),
##   ..   Y1984 = col_double(),
##   ..   Y1985 = col_double(),
##   ..   Y1986 = col_double(),
##   ..   Y1987 = col_double(),
##   ..   Y1988 = col_double(),
##   ..   Y1989 = col_double(),
##   ..   Y1990 = col_double(),
##   ..   Y1991 = col_double(),
##   ..   Y1992 = col_double(),
##   ..   Y1993 = col_double(),
##   ..   Y1994 = col_double(),
##   ..   Y1995 = col_double(),
##   ..   Y1996 = col_double(),
##   ..   Y1997 = col_double(),
##   ..   Y1998 = col_double(),
##   ..   Y1999 = col_double(),
##   ..   Y2000 = col_double(),
##   ..   Y2001 = col_double(),
##   ..   Y2002 = col_double(),
##   ..   Y2003 = col_double(),
##   ..   Y2004 = col_double(),
##   ..   Y2005 = col_double(),
##   ..   Y2006 = col_double(),
##   ..   Y2007 = col_double(),
##   ..   Y2008 = col_double(),
##   ..   Y2009 = col_double(),
##   ..   Y2010 = col_double(),
##   ..   Y2011 = col_double(),
##   ..   Y2012 = col_double(),
##   ..   Y2013 = col_double(),
##   ..   Y2014 = col_double(),
##   ..   Y2015 = col_double(),
##   ..   Y2016 = col_double(),
##   ..   Y2017 = col_double(),
##   ..   Y2030 = col_double(),
##   ..   Y2050 = col_double()
##   .. )
names(Emissions_Asia)
##  [1] "Area Code"    "Area"         "Item Code"    "Item"         "Element Code"
##  [6] "Element"      "Unit"         "Y1961"        "Y1962"        "Y1963"       
## [11] "Y1964"        "Y1965"        "Y1966"        "Y1967"        "Y1968"       
## [16] "Y1969"        "Y1970"        "Y1971"        "Y1972"        "Y1973"       
## [21] "Y1974"        "Y1975"        "Y1976"        "Y1977"        "Y1978"       
## [26] "Y1979"        "Y1980"        "Y1981"        "Y1982"        "Y1983"       
## [31] "Y1984"        "Y1985"        "Y1986"        "Y1987"        "Y1988"       
## [36] "Y1989"        "Y1990"        "Y1991"        "Y1992"        "Y1993"       
## [41] "Y1994"        "Y1995"        "Y1996"        "Y1997"        "Y1998"       
## [46] "Y1999"        "Y2000"        "Y2001"        "Y2002"        "Y2003"       
## [51] "Y2004"        "Y2005"        "Y2006"        "Y2007"        "Y2008"       
## [56] "Y2009"        "Y2010"        "Y2011"        "Y2012"        "Y2013"       
## [61] "Y2014"        "Y2015"        "Y2016"        "Y2017"        "Y2030"       
## [66] "Y2050"
class(Emissions_Asia$Element)
## [1] "character"
typeof(Emissions_Asia$Element)
## [1] "character"
class(Emissions_Asia$Item)
## [1] "character"
typeof(Emissions_Asia$Item)
## [1] "character"
Emissions_Asia$Item <- factor(Emissions_Asia$Item,
                          levels = c("Agricultural Soils","Agriculture total","Burning - Crop residues","Burning - Savanna","Crop Residues","Cultivation of Organic Soils","Enteric Fermentation","Manure applied to Soils","Manure left on Pasture","Manure Management","Rice Cultivation","Synthetic Fertilizers"))

is.factor(Emissions_Asia$Item)
## [1] TRUE
Emissions_Asia$Element <- factor(Emissions_Asia$Element,
                          levels = c("Emissions (CH4)","Emissions (CO2eq)","Emissions (CO2eq) from CH4","Emissions (CO2eq) from N2O","Emissions (N2O)"),
                          labels = c(1,2,3,4,5))

is.factor(Emissions_Asia$Element)
## [1] TRUE
str(Emissions_Asia)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 1995 obs. of  66 variables:
##  $ Area Code   : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ Area        : chr  "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
##  $ Item Code   : int  5058 5058 5058 5059 5059 5059 5059 5059 5060 5060 ...
##  $ Item        : Factor w/ 12 levels "Agricultural Soils",..: 7 7 7 10 10 10 10 10 11 11 ...
##  $ Element Code: int  7225 7231 7244 7225 7231 7244 7243 7230 7225 7231 ...
##  $ Element     : Factor w/ 5 levels "1","2","3","4",..: 1 2 3 1 2 3 4 5 1 2 ...
##  $ Unit        : chr  "gigagrams" "gigagrams" "gigagrams" "gigagrams" ...
##  $ Y1961       : num  240.7 5054.3 5054.3 11.6 367.8 ...
##  $ Y1962       : num  245 5152 5152 12 376 ...
##  $ Y1963       : num  255.8 5372.4 5372.4 12.6 392.6 ...
##  $ Y1964       : num  259.1 5440.4 5440.4 12.8 399.9 ...
##  $ Y1965       : num  265.6 5577.6 5577.6 13.3 413.4 ...
##  $ Y1966       : num  277 5817 5817 14 433 ...
##  $ Y1967       : num  280.1 5882 5882 14.3 440.1 ...
##  $ Y1968       : num  288.8 6065.2 6065.2 14.7 453.2 ...
##  $ Y1969       : num  286.4 6014 6014 14.5 449.9 ...
##  $ Y1970       : num  290.3 6095.5 6095.5 14.7 455 ...
##  $ Y1971       : num  287.8 6043.6 6043.6 14.6 450.9 ...
##  $ Y1972       : num  232 4862 4862 13 372 ...
##  $ Y1973       : num  245 5144.6 5144.6 13.1 388.3 ...
##  $ Y1974       : num  262.8 5519.6 5519.6 13.8 415.2 ...
##  $ Y1975       : num  282.1 5923.6 5923.6 14.5 444 ...
##  $ Y1976       : num  288.2 6052.7 6052.7 14.8 453.7 ...
##  $ Y1977       : num  280.9 5898.4 5898.4 14.4 441.7 ...
##  $ Y1978       : num  280.3 5886 5886 14.7 443.3 ...
##  $ Y1979       : num  274.2 5758.9 5758.9 14.6 436.1 ...
##  $ Y1980       : num  275.4 5782.8 5782.8 14.6 437.9 ...
##  $ Y1981       : num  278.2 5842.4 5842.4 14.8 442.6 ...
##  $ Y1982       : num  277.9 5836.7 5836.7 14.7 442.1 ...
##  $ Y1983       : num  262.6 5515.2 5515.2 14.2 419.9 ...
##  $ Y1984       : num  230.2 4833.1 4833.1 12.4 366.5 ...
##  $ Y1985       : num  202.7 4257.3 4257.3 10.7 319.8 ...
##  $ Y1986       : num  160.26 3365.54 3365.54 8.28 249.68 ...
##  $ Y1987       : num  172.94 3631.74 3631.74 8.67 266.26 ...
##  $ Y1988       : num  181.44 3810.16 3810.16 8.75 278.33 ...
##  $ Y1989       : num  179.56 3770.8 3770.8 8.62 275.35 ...
##  $ Y1990       : num  178.47 3747.83 3747.83 8.52 273.27 ...
##  $ Y1991       : num  187.55 3938.55 3938.55 9.34 290.59 ...
##  $ Y1992       : num  189.76 3984.96 3984.96 9.67 294.78 ...
##  $ Y1993       : num  190.83 4007.43 4007.43 9.83 296.51 ...
##  $ Y1994       : num  197.9 4156.3 4156.3 10.4 308.4 ...
##  $ Y1995       : num  211.2 4434.3 4434.3 11.4 331 ...
##  $ Y1996       : num  239.7 5034.1 5034.1 13.5 385.2 ...
##  $ Y1997       : num  264.6 5556.8 5556.8 14.9 423.2 ...
##  $ Y1998       : num  283.5 5952.5 5952.5 15.7 448.8 ...
##  $ Y1999       : num  318.3 6685.1 6685.1 17.9 504.8 ...
##  $ Y2000       : num  272.1 5714.9 5714.9 15.1 428.3 ...
##  $ Y2001       : num  225.4 4733.5 4733.5 12.2 355.2 ...
##  $ Y2002       : num  287.9 6045.8 6045.8 18.4 477.2 ...
##  $ Y2003       : num  293.6 6166.1 6166.1 18.7 486.1 ...
##  $ Y2004       : num  285.6 5997.6 5997.6 17.6 467.3 ...
##  $ Y2005       : num  295.4 6203.4 6203.4 18.5 490.1 ...
##  $ Y2006       : num  300.8 6316.9 6316.9 19.5 503.1 ...
##  $ Y2007       : num  304.2 6388.7 6388.7 20.5 516.4 ...
##  $ Y2008       : num  339.6 7130.7 7130.7 22.4 574 ...
##  $ Y2009       : num  345.7 7258.8 7258.8 22.6 583.5 ...
##  $ Y2010       : num  401.1 8422.4 8422.4 26.6 681.3 ...
##  $ Y2011       : num  402.5 8452.8 8452.8 26.2 678.8 ...
##  $ Y2012       : num  396.9 8335.3 8335.3 26.1 672.3 ...
##  $ Y2013       : num  393.1 8255 8255 26.1 667.7 ...
##  $ Y2014       : num  398.3 8364 8364 26.4 675.6 ...
##  $ Y2015       : num  384.1 8066.9 8066.9 24.9 645.2 ...
##  $ Y2016       : num  381.7 8015.3 8015.3 24.8 642 ...
##  $ Y2017       : num  371.9 7810.4 7810.4 23.8 623.4 ...
##  $ Y2030       : num  453.7 9528.7 9528.7 27.2 750.3 ...
##  $ Y2050       : num  603.6 12676 12676 35.3 1003.2 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   `Area Code` = col_integer(),
##   ..   Area = col_character(),
##   ..   `Item Code` = col_integer(),
##   ..   Item = col_character(),
##   ..   `Element Code` = col_integer(),
##   ..   Element = col_character(),
##   ..   Unit = col_character(),
##   ..   Y1961 = col_double(),
##   ..   Y1962 = col_double(),
##   ..   Y1963 = col_double(),
##   ..   Y1964 = col_double(),
##   ..   Y1965 = col_double(),
##   ..   Y1966 = col_double(),
##   ..   Y1967 = col_double(),
##   ..   Y1968 = col_double(),
##   ..   Y1969 = col_double(),
##   ..   Y1970 = col_double(),
##   ..   Y1971 = col_double(),
##   ..   Y1972 = col_double(),
##   ..   Y1973 = col_double(),
##   ..   Y1974 = col_double(),
##   ..   Y1975 = col_double(),
##   ..   Y1976 = col_double(),
##   ..   Y1977 = col_double(),
##   ..   Y1978 = col_double(),
##   ..   Y1979 = col_double(),
##   ..   Y1980 = col_double(),
##   ..   Y1981 = col_double(),
##   ..   Y1982 = col_double(),
##   ..   Y1983 = col_double(),
##   ..   Y1984 = col_double(),
##   ..   Y1985 = col_double(),
##   ..   Y1986 = col_double(),
##   ..   Y1987 = col_double(),
##   ..   Y1988 = col_double(),
##   ..   Y1989 = col_double(),
##   ..   Y1990 = col_double(),
##   ..   Y1991 = col_double(),
##   ..   Y1992 = col_double(),
##   ..   Y1993 = col_double(),
##   ..   Y1994 = col_double(),
##   ..   Y1995 = col_double(),
##   ..   Y1996 = col_double(),
##   ..   Y1997 = col_double(),
##   ..   Y1998 = col_double(),
##   ..   Y1999 = col_double(),
##   ..   Y2000 = col_double(),
##   ..   Y2001 = col_double(),
##   ..   Y2002 = col_double(),
##   ..   Y2003 = col_double(),
##   ..   Y2004 = col_double(),
##   ..   Y2005 = col_double(),
##   ..   Y2006 = col_double(),
##   ..   Y2007 = col_double(),
##   ..   Y2008 = col_double(),
##   ..   Y2009 = col_double(),
##   ..   Y2010 = col_double(),
##   ..   Y2011 = col_double(),
##   ..   Y2012 = col_double(),
##   ..   Y2013 = col_double(),
##   ..   Y2014 = col_double(),
##   ..   Y2015 = col_double(),
##   ..   Y2016 = col_double(),
##   ..   Y2017 = col_double(),
##   ..   Y2030 = col_double(),
##   ..   Y2050 = col_double()
##   .. )
head(Emissions_Asia)
dim(Emissions_Europe)
## [1] 1932   66
str(Emissions_Europe)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 1932 obs. of  66 variables:
##  $ Area Code   : int  3 3 3 3 3 3 3 3 3 3 ...
##  $ Area        : chr  "Albania" "Albania" "Albania" "Albania" ...
##  $ Item Code   : int  5058 5058 5058 5059 5059 5059 5059 5059 5060 5060 ...
##  $ Item        : chr  "Enteric Fermentation" "Enteric Fermentation" "Enteric Fermentation" "Manure Management" ...
##  $ Element Code: int  7225 7231 7244 7225 7231 7244 7243 7230 7225 7231 ...
##  $ Element     : chr  "Emissions (CH4)" "Emissions (CO2eq)" "Emissions (CO2eq) from CH4" "Emissions (CH4)" ...
##  $ Unit        : chr  "gigagrams" "gigagrams" "gigagrams" "gigagrams" ...
##  $ Y1961       : num  51.04 1071.77 1071.77 6.57 232.48 ...
##  $ Y1962       : num  51.49 1081.3 1081.3 6.61 233.71 ...
##  $ Y1963       : num  51.2 1075.29 1075.29 6.53 231.54 ...
##  $ Y1964       : num  51.46 1080.61 1080.61 6.68 235.83 ...
##  $ Y1965       : num  54.97 1154.32 1154.32 7.36 257.6 ...
##  $ Y1966       : num  54.73 1149.4 1149.4 7.39 258.7 ...
##  $ Y1967       : num  55.25 1160.15 1160.15 7.56 264.03 ...
##  $ Y1968       : num  51.16 1074.29 1074.29 7.02 246.35 ...
##  $ Y1969       : num  50.18 1053.86 1053.86 7.12 248.87 ...
##  $ Y1970       : num  49.7 1043.67 1043.67 7.09 247.85 ...
##  $ Y1971       : num  48.03 1008.55 1008.55 6.94 242.06 ...
##  $ Y1972       : num  46.76 981.99 981.99 6.86 238.54 ...
##  $ Y1973       : num  47.78 1003.4 1003.4 7.06 245.43 ...
##  $ Y1974       : num  49.7 1043.67 1043.67 7.43 258.44 ...
##  $ Y1975       : num  50.34 1057.08 1057.08 7.59 263.62 ...
##  $ Y1976       : num  53.46 1122.58 1122.58 8.07 281.38 ...
##  $ Y1977       : num  55.74 1170.5 1170.5 8.62 299.44 ...
##  $ Y1978       : num  59.54 1250.25 1250.25 9.32 323.77 ...
##  $ Y1979       : num  62.1 1304.9 1304.9 9.8 340.2 ...
##  $ Y1980       : num  64.1 1345.3 1345.3 10.1 349.3 ...
##  $ Y1981       : num  65.4 1373.9 1373.9 10.4 360.5 ...
##  $ Y1982       : num  65.9 1383.2 1383.2 10.6 365.9 ...
##  $ Y1983       : num  67.1 1408.8 1408.8 10.7 369.6 ...
##  $ Y1984       : num  67.1 1409.5 1409.5 10.8 370.5 ...
##  $ Y1985       : num  64.5 1354.4 1354.4 10.4 355.6 ...
##  $ Y1986       : num  67.3 1413.5 1413.5 10.9 372.5 ...
##  $ Y1987       : num  72.4 1521.4 1521.4 11.6 400.6 ...
##  $ Y1988       : num  76.4 1605.1 1605.1 12.1 418.2 ...
##  $ Y1989       : num  78.1 1639.7 1639.7 12.3 424.6 ...
##  $ Y1990       : num  74.7 1568.3 1568.3 12.1 413.3 ...
##  $ Y1991       : num  76.8 1612.2 1612.2 11.8 409.6 ...
##  $ Y1992       : num  76.8 1613.6 1613.6 11.4 396.5 ...
##  $ Y1993       : num  81.9 1720.1 1720.1 12.1 421.2 ...
##  $ Y1994       : num  104.7 2199.1 2199.1 15.5 543.3 ...
##  $ Y1995       : num  107 2247 2247 16 560 ...
##  $ Y1996       : num  100 2104 2104 16 555 ...
##  $ Y1997       : num  93.2 1958.1 1958.1 14.7 513 ...
##  $ Y1998       : num  88.3 1854.8 1854.8 14 488.7 ...
##  $ Y1999       : num  90.8 1906.2 1906.2 14.3 499.3 ...
##  $ Y2000       : num  92.1 1934.4 1934.4 14.9 515.4 ...
##  $ Y2001       : num  89.9 1887.1 1887.1 14.7 507.5 ...
##  $ Y2002       : num  87.3 1834 1834 14.5 500.5 ...
##  $ Y2003       : num  88.3 1853.8 1853.8 14.8 508.7 ...
##  $ Y2004       : num  84.7 1778.6 1778.6 14.5 498.8 ...
##  $ Y2005       : num  84.1 1765.3 1765.3 14.4 495.8 ...
##  $ Y2006       : num  82.6 1735.5 1735.5 14.1 484.6 ...
##  $ Y2007       : num  77.7 1631 1631 13.3 457.8 ...
##  $ Y2008       : num  72.7 1525.8 1525.8 12.6 432.7 ...
##  $ Y2009       : num  68.9 1447.2 1447.2 12.1 416.3 ...
##  $ Y2010       : num  69.2 1454 1454 12.2 420.3 ...
##  $ Y2011       : num  68.5 1439.2 1439.2 12.2 420.1 ...
##  $ Y2012       : num  69.9 1466.9 1466.9 12.4 426.6 ...
##  $ Y2013       : num  70.3 1476.8 1476.8 12.3 423.2 ...
##  $ Y2014       : num  71.1 1492.6 1492.6 12.5 430 ...
##  $ Y2015       : num  71.6 1503 1503 12.5 428.2 ...
##  $ Y2016       : num  71.2 1496 1496 12.4 423.7 ...
##  $ Y2017       : num  69.5 1459.7 1459.7 12.2 415.5 ...
##  $ Y2030       : num  86.7 1821.2 1821.2 14.3 495.2 ...
##  $ Y2050       : num  85.9 1804.9 1804.9 14.3 500.6 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   `Area Code` = col_integer(),
##   ..   Area = col_character(),
##   ..   `Item Code` = col_integer(),
##   ..   Item = col_character(),
##   ..   `Element Code` = col_integer(),
##   ..   Element = col_character(),
##   ..   Unit = col_character(),
##   ..   Y1961 = col_double(),
##   ..   Y1962 = col_double(),
##   ..   Y1963 = col_double(),
##   ..   Y1964 = col_double(),
##   ..   Y1965 = col_double(),
##   ..   Y1966 = col_double(),
##   ..   Y1967 = col_double(),
##   ..   Y1968 = col_double(),
##   ..   Y1969 = col_double(),
##   ..   Y1970 = col_double(),
##   ..   Y1971 = col_double(),
##   ..   Y1972 = col_double(),
##   ..   Y1973 = col_double(),
##   ..   Y1974 = col_double(),
##   ..   Y1975 = col_double(),
##   ..   Y1976 = col_double(),
##   ..   Y1977 = col_double(),
##   ..   Y1978 = col_double(),
##   ..   Y1979 = col_double(),
##   ..   Y1980 = col_double(),
##   ..   Y1981 = col_double(),
##   ..   Y1982 = col_double(),
##   ..   Y1983 = col_double(),
##   ..   Y1984 = col_double(),
##   ..   Y1985 = col_double(),
##   ..   Y1986 = col_double(),
##   ..   Y1987 = col_double(),
##   ..   Y1988 = col_double(),
##   ..   Y1989 = col_double(),
##   ..   Y1990 = col_double(),
##   ..   Y1991 = col_double(),
##   ..   Y1992 = col_double(),
##   ..   Y1993 = col_double(),
##   ..   Y1994 = col_double(),
##   ..   Y1995 = col_double(),
##   ..   Y1996 = col_double(),
##   ..   Y1997 = col_double(),
##   ..   Y1998 = col_double(),
##   ..   Y1999 = col_double(),
##   ..   Y2000 = col_double(),
##   ..   Y2001 = col_double(),
##   ..   Y2002 = col_double(),
##   ..   Y2003 = col_double(),
##   ..   Y2004 = col_double(),
##   ..   Y2005 = col_double(),
##   ..   Y2006 = col_double(),
##   ..   Y2007 = col_double(),
##   ..   Y2008 = col_double(),
##   ..   Y2009 = col_double(),
##   ..   Y2010 = col_double(),
##   ..   Y2011 = col_double(),
##   ..   Y2012 = col_double(),
##   ..   Y2013 = col_double(),
##   ..   Y2014 = col_double(),
##   ..   Y2015 = col_double(),
##   ..   Y2016 = col_double(),
##   ..   Y2017 = col_double(),
##   ..   Y2030 = col_double(),
##   ..   Y2050 = col_double()
##   .. )
class(Emissions_Europe$Element)
## [1] "character"
typeof(Emissions_Europe$Element)
## [1] "character"
class(Emissions_Europe$Item)
## [1] "character"
typeof(Emissions_Europe$Item)
## [1] "character"
Emissions_Europe$Item <- factor(Emissions_Europe$Item,
                          levels = c("Agricultural Soils","Agriculture total","Burning - Crop residues","Burning - Savanna","Crop Residues","Cultivation of Organic Soils","Enteric Fermentation","Manure applied to Soils","Manure left on Pasture","Manure Management","Rice Cultivation","Synthetic Fertilizers"))

is.factor(Emissions_Europe$Item)
## [1] TRUE
Emissions_Europe$Element <- factor(Emissions_Europe$Element,
                          levels = c("Emissions (CH4)","Emissions (CO2eq)","Emissions (CO2eq) from CH4","Emissions (CO2eq) from N2O","Emissions (N2O)"),
                          labels = c(1,2,3,4,5))

is.factor(Emissions_Europe$Element)
## [1] TRUE
str(Emissions_Europe)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 1932 obs. of  66 variables:
##  $ Area Code   : int  3 3 3 3 3 3 3 3 3 3 ...
##  $ Area        : chr  "Albania" "Albania" "Albania" "Albania" ...
##  $ Item Code   : int  5058 5058 5058 5059 5059 5059 5059 5059 5060 5060 ...
##  $ Item        : Factor w/ 12 levels "Agricultural Soils",..: 7 7 7 10 10 10 10 10 11 11 ...
##  $ Element Code: int  7225 7231 7244 7225 7231 7244 7243 7230 7225 7231 ...
##  $ Element     : Factor w/ 5 levels "1","2","3","4",..: 1 2 3 1 2 3 4 5 1 2 ...
##  $ Unit        : chr  "gigagrams" "gigagrams" "gigagrams" "gigagrams" ...
##  $ Y1961       : num  51.04 1071.77 1071.77 6.57 232.48 ...
##  $ Y1962       : num  51.49 1081.3 1081.3 6.61 233.71 ...
##  $ Y1963       : num  51.2 1075.29 1075.29 6.53 231.54 ...
##  $ Y1964       : num  51.46 1080.61 1080.61 6.68 235.83 ...
##  $ Y1965       : num  54.97 1154.32 1154.32 7.36 257.6 ...
##  $ Y1966       : num  54.73 1149.4 1149.4 7.39 258.7 ...
##  $ Y1967       : num  55.25 1160.15 1160.15 7.56 264.03 ...
##  $ Y1968       : num  51.16 1074.29 1074.29 7.02 246.35 ...
##  $ Y1969       : num  50.18 1053.86 1053.86 7.12 248.87 ...
##  $ Y1970       : num  49.7 1043.67 1043.67 7.09 247.85 ...
##  $ Y1971       : num  48.03 1008.55 1008.55 6.94 242.06 ...
##  $ Y1972       : num  46.76 981.99 981.99 6.86 238.54 ...
##  $ Y1973       : num  47.78 1003.4 1003.4 7.06 245.43 ...
##  $ Y1974       : num  49.7 1043.67 1043.67 7.43 258.44 ...
##  $ Y1975       : num  50.34 1057.08 1057.08 7.59 263.62 ...
##  $ Y1976       : num  53.46 1122.58 1122.58 8.07 281.38 ...
##  $ Y1977       : num  55.74 1170.5 1170.5 8.62 299.44 ...
##  $ Y1978       : num  59.54 1250.25 1250.25 9.32 323.77 ...
##  $ Y1979       : num  62.1 1304.9 1304.9 9.8 340.2 ...
##  $ Y1980       : num  64.1 1345.3 1345.3 10.1 349.3 ...
##  $ Y1981       : num  65.4 1373.9 1373.9 10.4 360.5 ...
##  $ Y1982       : num  65.9 1383.2 1383.2 10.6 365.9 ...
##  $ Y1983       : num  67.1 1408.8 1408.8 10.7 369.6 ...
##  $ Y1984       : num  67.1 1409.5 1409.5 10.8 370.5 ...
##  $ Y1985       : num  64.5 1354.4 1354.4 10.4 355.6 ...
##  $ Y1986       : num  67.3 1413.5 1413.5 10.9 372.5 ...
##  $ Y1987       : num  72.4 1521.4 1521.4 11.6 400.6 ...
##  $ Y1988       : num  76.4 1605.1 1605.1 12.1 418.2 ...
##  $ Y1989       : num  78.1 1639.7 1639.7 12.3 424.6 ...
##  $ Y1990       : num  74.7 1568.3 1568.3 12.1 413.3 ...
##  $ Y1991       : num  76.8 1612.2 1612.2 11.8 409.6 ...
##  $ Y1992       : num  76.8 1613.6 1613.6 11.4 396.5 ...
##  $ Y1993       : num  81.9 1720.1 1720.1 12.1 421.2 ...
##  $ Y1994       : num  104.7 2199.1 2199.1 15.5 543.3 ...
##  $ Y1995       : num  107 2247 2247 16 560 ...
##  $ Y1996       : num  100 2104 2104 16 555 ...
##  $ Y1997       : num  93.2 1958.1 1958.1 14.7 513 ...
##  $ Y1998       : num  88.3 1854.8 1854.8 14 488.7 ...
##  $ Y1999       : num  90.8 1906.2 1906.2 14.3 499.3 ...
##  $ Y2000       : num  92.1 1934.4 1934.4 14.9 515.4 ...
##  $ Y2001       : num  89.9 1887.1 1887.1 14.7 507.5 ...
##  $ Y2002       : num  87.3 1834 1834 14.5 500.5 ...
##  $ Y2003       : num  88.3 1853.8 1853.8 14.8 508.7 ...
##  $ Y2004       : num  84.7 1778.6 1778.6 14.5 498.8 ...
##  $ Y2005       : num  84.1 1765.3 1765.3 14.4 495.8 ...
##  $ Y2006       : num  82.6 1735.5 1735.5 14.1 484.6 ...
##  $ Y2007       : num  77.7 1631 1631 13.3 457.8 ...
##  $ Y2008       : num  72.7 1525.8 1525.8 12.6 432.7 ...
##  $ Y2009       : num  68.9 1447.2 1447.2 12.1 416.3 ...
##  $ Y2010       : num  69.2 1454 1454 12.2 420.3 ...
##  $ Y2011       : num  68.5 1439.2 1439.2 12.2 420.1 ...
##  $ Y2012       : num  69.9 1466.9 1466.9 12.4 426.6 ...
##  $ Y2013       : num  70.3 1476.8 1476.8 12.3 423.2 ...
##  $ Y2014       : num  71.1 1492.6 1492.6 12.5 430 ...
##  $ Y2015       : num  71.6 1503 1503 12.5 428.2 ...
##  $ Y2016       : num  71.2 1496 1496 12.4 423.7 ...
##  $ Y2017       : num  69.5 1459.7 1459.7 12.2 415.5 ...
##  $ Y2030       : num  86.7 1821.2 1821.2 14.3 495.2 ...
##  $ Y2050       : num  85.9 1804.9 1804.9 14.3 500.6 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   `Area Code` = col_integer(),
##   ..   Area = col_character(),
##   ..   `Item Code` = col_integer(),
##   ..   Item = col_character(),
##   ..   `Element Code` = col_integer(),
##   ..   Element = col_character(),
##   ..   Unit = col_character(),
##   ..   Y1961 = col_double(),
##   ..   Y1962 = col_double(),
##   ..   Y1963 = col_double(),
##   ..   Y1964 = col_double(),
##   ..   Y1965 = col_double(),
##   ..   Y1966 = col_double(),
##   ..   Y1967 = col_double(),
##   ..   Y1968 = col_double(),
##   ..   Y1969 = col_double(),
##   ..   Y1970 = col_double(),
##   ..   Y1971 = col_double(),
##   ..   Y1972 = col_double(),
##   ..   Y1973 = col_double(),
##   ..   Y1974 = col_double(),
##   ..   Y1975 = col_double(),
##   ..   Y1976 = col_double(),
##   ..   Y1977 = col_double(),
##   ..   Y1978 = col_double(),
##   ..   Y1979 = col_double(),
##   ..   Y1980 = col_double(),
##   ..   Y1981 = col_double(),
##   ..   Y1982 = col_double(),
##   ..   Y1983 = col_double(),
##   ..   Y1984 = col_double(),
##   ..   Y1985 = col_double(),
##   ..   Y1986 = col_double(),
##   ..   Y1987 = col_double(),
##   ..   Y1988 = col_double(),
##   ..   Y1989 = col_double(),
##   ..   Y1990 = col_double(),
##   ..   Y1991 = col_double(),
##   ..   Y1992 = col_double(),
##   ..   Y1993 = col_double(),
##   ..   Y1994 = col_double(),
##   ..   Y1995 = col_double(),
##   ..   Y1996 = col_double(),
##   ..   Y1997 = col_double(),
##   ..   Y1998 = col_double(),
##   ..   Y1999 = col_double(),
##   ..   Y2000 = col_double(),
##   ..   Y2001 = col_double(),
##   ..   Y2002 = col_double(),
##   ..   Y2003 = col_double(),
##   ..   Y2004 = col_double(),
##   ..   Y2005 = col_double(),
##   ..   Y2006 = col_double(),
##   ..   Y2007 = col_double(),
##   ..   Y2008 = col_double(),
##   ..   Y2009 = col_double(),
##   ..   Y2010 = col_double(),
##   ..   Y2011 = col_double(),
##   ..   Y2012 = col_double(),
##   ..   Y2013 = col_double(),
##   ..   Y2014 = col_double(),
##   ..   Y2015 = col_double(),
##   ..   Y2016 = col_double(),
##   ..   Y2017 = col_double(),
##   ..   Y2030 = col_double(),
##   ..   Y2050 = col_double()
##   .. )
head(Emissions_Europe)
# Drop the "Y" prefix: columns 8 to 66 hold the years 1961-2017, 2030 and 2050
names(Emissions_Asia)[8:66] <- as.character(c(1961:2017, 2030, 2050))
dim(Emissions_Asia)
## [1] 1995   66
names(Emissions_Asia)
##  [1] "Area Code"    "Area"         "Item Code"    "Item"         "Element Code"
##  [6] "Element"      "Unit"         "1961"         "1962"         "1963"        
## [11] "1964"         "1965"         "1966"         "1967"         "1968"        
## [16] "1969"         "1970"         "1971"         "1972"         "1973"        
## [21] "1974"         "1975"         "1976"         "1977"         "1978"        
## [26] "1979"         "1980"         "1981"         "1982"         "1983"        
## [31] "1984"         "1985"         "1986"         "1987"         "1988"        
## [36] "1989"         "1990"         "1991"         "1992"         "1993"        
## [41] "1994"         "1995"         "1996"         "1997"         "1998"        
## [46] "1999"         "2000"         "2001"         "2002"         "2003"        
## [51] "2004"         "2005"         "2006"         "2007"         "2008"        
## [56] "2009"         "2010"         "2011"         "2012"         "2013"        
## [61] "2014"         "2015"         "2016"         "2017"         "2030"        
## [66] "2050"
# Same renaming for the Europe dataset: columns 8 to 66 become plain year labels
names(Emissions_Europe)[8:66] <- as.character(c(1961:2017, 2030, 2050))

Tidy & Manipulate Data I

In this step we tidy both datasets, because neither conforms to the tidy data principles. One of those principles is that each variable must have its own column. In both datasets the column headers from 1961 to 2050 are not variable names but values of a single variable, the year, so each row holds 59 observations instead of one. I used the gather function to collect all the year columns into one variable, “Year”, with the matching values in “Emission_GigaGrams”, in both datasets. Each dataset now has 9 variables instead of 66.

After tidying the datasets, we can merge them. I merged them by binding the rows of both datasets and stored the result in a new data frame named “df”, then kept only the Area, Item, Element, Year and Emission_GigaGrams columns in “df2”. The head function was used to view the newly created data frames.

A year number alone cannot make up a valid date, because the month and day are not specified. When I tried to convert Year to a date datatype, a month and day were added to the year by default, which would distort the structure of our dataset. So I simply changed Year from character to integer.

# Columns 8 to 66 are the year columns; gather them into Year / Emission_GigaGrams
asia <- Emissions_Asia %>%
  gather(key = "Year", value = "Emission_GigaGrams", 8:66)
head(asia)
Europe <- Emissions_Europe %>%
  gather(key = "Year", value = "Emission_GigaGrams", 8:66)
head(Europe)
df <- bind_rows(asia,Europe)
names(df)
## [1] "Area Code"          "Area"               "Item Code"         
## [4] "Item"               "Element Code"       "Element"           
## [7] "Unit"               "Year"               "Emission_GigaGrams"
df2<- df%>% dplyr::select("Area","Item","Element","Year","Emission_GigaGrams")
str(df2)
## Classes 'tbl_df', 'tbl' and 'data.frame':    231693 obs. of  5 variables:
##  $ Area              : chr  "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
##  $ Item              : Factor w/ 12 levels "Agricultural Soils",..: 7 7 7 10 10 10 10 10 11 11 ...
##  $ Element           : Factor w/ 5 levels "1","2","3","4",..: 1 2 3 1 2 3 4 5 1 2 ...
##  $ Year              : chr  "1961" "1961" "1961" "1961" ...
##  $ Emission_GigaGrams: num  240.7 5054.3 5054.3 11.6 367.8 ...
head(df2)
df2$Year <- as.integer(df2$Year)
class(df2$Year)
## [1] "integer"
head(df2)
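
As a minimal sketch of the wide-to-long step described in this section, the toy table below is reshaped with tidyr's pivot_longer function, the newer counterpart of gather. The values and the names wide and long are hypothetical and are not taken from the FAOSTAT data.

```r
library(tidyr)

# Toy wide table with hypothetical values: one column per year
wide <- data.frame(
  Area = c("A", "B"),
  `1961` = c(1.5, 2.0),
  `1962` = c(1.7, NA),
  check.names = FALSE
)

# Collect every year column into a Year / Emission_GigaGrams pair
long <- pivot_longer(
  wide,
  cols = -Area,
  names_to = "Year",
  values_to = "Emission_GigaGrams"
)

# Year arrives as character; store it as integer, as in the report
long$Year <- as.integer(long$Year)
long
```

Each row of the result holds a single observation (one area in one year), which is what the tidy principles ask for. Missing values are kept as NA rows unless they are dropped explicitly.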

Tidy & Manipulate Data II

Emissions can be expressed on two scales, gigagrams (Gg) and megagrams (Mg).

Since emissions in both datasets were recorded in gigagrams, we create a new variable named Emission_MegaGrams that gives the emission in megagrams. Because 1 Gg = 1000 Mg, the new variable is obtained by multiplying Emission_GigaGrams by 1000.

df3 <- mutate(df2, Emission_MegaGrams= Emission_GigaGrams*1000)
str(df3)
## Classes 'tbl_df', 'tbl' and 'data.frame':    231693 obs. of  6 variables:
##  $ Area              : chr  "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
##  $ Item              : Factor w/ 12 levels "Agricultural Soils",..: 7 7 7 10 10 10 10 10 11 11 ...
##  $ Element           : Factor w/ 5 levels "1","2","3","4",..: 1 2 3 1 2 3 4 5 1 2 ...
##  $ Year              : int  1961 1961 1961 1961 1961 1961 1961 1961 1961 1961 ...
##  $ Emission_GigaGrams: num  240.7 5054.3 5054.3 11.6 367.8 ...
##  $ Emission_MegaGrams: num  240683 5054346 5054346 11623 367831 ...
head(df3)

Scan I

In this step we check the numeric variables for missing values (NA) and special values (Inf, -Inf, NaN). We define a function, Is.specialorNA, that returns the number of missing and special values in a numeric column, and apply it to every column of the data frame with sapply. Non-numeric columns are not checked and are reported as 0. Both emission variables contain 57128 such values.

# Count NA, NaN and +/-Inf in a numeric column; non-numeric columns are skipped and return 0
Is.specialorNA <- function(x) if (is.numeric(x)) sum(is.infinite(x) | is.nan(x) | is.na(x)) else 0L
sapply(df3, Is.specialorNA)
##               Area               Item            Element               Year 
##                  0                  0                  0                  0 
## Emission_GigaGrams Emission_MegaGrams 
##              57128              57128
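
The count above pools missing and special values together. As a hedged sketch, the same base R checks can report each kind separately; the vector x and the name counts are hypothetical and only illustrate the idea.

```r
# Toy vector containing each kind of special value (hypothetical data)
x <- c(1, NA, NaN, Inf, -Inf, 0)

# is.na() is TRUE for both NA and NaN, so NaN is excluded to isolate true NA
counts <- c(
  NA_only  = sum(is.na(x) & !is.nan(x)),
  NaN_only = sum(is.nan(x)),
  Infinite = sum(is.infinite(x))
)
counts
```

Splitting the count this way shows whether the flagged entries are ordinary missing values or the result of an invalid calculation.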

Scan II

In this step we look for outliers in the numeric variables of our dataset. First, we inspect the distribution of each numeric variable with a histogram. Both histograms are positively (right) skewed, so the z-score method, which assumes an approximately normal distribution, is not appropriate for finding outliers. Instead we use box plots, which show outliers in both numeric variables. We then used the summarise function to check the summary statistics of the numeric variables. These suggest that the flagged emission values are genuine rather than errors, so we do not exclude them: emissions of different types in agriculture started at 0 and have increased significantly over the years. The skewness of both numeric variables was 15.00858.

hist(df3$Emission_GigaGrams)

df3$Emission_GigaGrams %>% boxplot(main= "BoxPlot OF Emission in GigaGrams")

skewness(df3$Emission_GigaGrams, na.rm = TRUE)
## [1] 15.00858
df3 %>% summarise(Min = min(Emission_GigaGrams, na.rm = TRUE),
                  Q1 = quantile(Emission_GigaGrams, probs = .25, na.rm = TRUE),
                  Median = median(Emission_GigaGrams, na.rm = TRUE),
                  Q3 = quantile(Emission_GigaGrams, probs = .75, na.rm = TRUE),
                  Max = max(Emission_GigaGrams, na.rm = TRUE),
                  Mean = mean(Emission_GigaGrams, na.rm = TRUE),
                  SD = sd(Emission_GigaGrams, na.rm = TRUE),
                  IQR = IQR(Emission_GigaGrams, na.rm = TRUE),
                  n = n(),
                  Missing = sum(is.na(Emission_GigaGrams)))
hist(df3$Emission_MegaGrams)

df3$Emission_MegaGrams %>% boxplot(main= "BoxPlot OF Emission in MegaGrams")

skewness(df3$Emission_MegaGrams, na.rm = TRUE)
## [1] 15.00858
df3 %>% summarise(Min = min(Emission_MegaGrams, na.rm = TRUE),
                  Q1 = quantile(Emission_MegaGrams, probs = .25, na.rm = TRUE),
                  Median = median(Emission_MegaGrams, na.rm = TRUE),
                  Q3 = quantile(Emission_MegaGrams, probs = .75, na.rm = TRUE),
                  Max = max(Emission_MegaGrams, na.rm = TRUE),
                  Mean = mean(Emission_MegaGrams, na.rm = TRUE),
                  SD = sd(Emission_MegaGrams, na.rm = TRUE),
                  IQR = IQR(Emission_MegaGrams, na.rm = TRUE),
                  n = n(),
                  Missing = sum(is.na(Emission_MegaGrams)))
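
The points a box plot marks as outliers lie more than 1.5 times the IQR beyond the quartiles. The sketch below computes these fences by hand on a hypothetical right-skewed sample; it is an illustration, not the FAOSTAT data, and boxplot itself uses hinges that can differ slightly from the quantile function.

```r
# Hypothetical right-skewed sample, not FAOSTAT data
x <- c(0, 1, 2, 2, 3, 4, 5, 8, 40, 120, NA)

q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE)
iqr <- q[[2]] - q[[1]]

# Tukey fences: values beyond 1.5 * IQR from the quartiles are flagged
lower <- q[[1]] - 1.5 * iqr
upper <- q[[2]] + 1.5 * iqr
flagged <- x[!is.na(x) & (x < lower | x > upper)]
flagged
```

In a strongly right-skewed variable the upper fence sits close to the bulk of the data, so many large but legitimate values are flagged, which is why the flagged emissions are kept here.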

Transform

In this step we transform the numeric variables. The histograms and box plots in the previous step showed that both numeric variables are right skewed, so I applied a base-10 logarithmic transformation with the log10 function, which reduced the right skewness and gave a nearly normal distribution.

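# Base-10 log, matching the text; hist() drops any -Inf produced by zero emissions
log10_emission <- log10(df3$Emission_GigaGrams)
par(mfrow = c(1, 2))  # show the two histograms side by side
hist(df3$Emission_GigaGrams, main = "Histogram of Emission in GigaGrams")
hist(log10_emission, main = "Emission (Gg) After Transformation")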

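# Base-10 log, matching the text; hist() drops any -Inf produced by zero emissions
log10_emission <- log10(df3$Emission_MegaGrams)
par(mfrow = c(1, 2))  # show the two histograms side by side
hist(df3$Emission_MegaGrams, main = "Histogram of Emission in MegaGrams")
hist(log10_emission, main = "Emission (Mg) After Transformation")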
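
To put a number on the effect of the transformation, skewness can be compared before and after taking the base-10 log. The sketch below uses simulated lognormal data as a stand-in for a right-skewed variable; the data, the seed and the helper skew are assumptions made for illustration, and skew is the moment-based estimate written in base R.

```r
set.seed(1)

# Simulated right-skewed data (lognormal); an assumption, not the FAOSTAT values
x <- rlnorm(1000, meanlog = 2, sdlog = 1)

# Moment-based sample skewness, written out so no extra package is needed
skew <- function(v) {
  v <- v[is.finite(v)]  # drop NA, NaN and +/-Inf (log10(0) is -Inf)
  m <- mean(v)
  mean((v - m)^3) / mean((v - m)^2)^1.5
}

c(raw = skew(x), log10 = skew(log10(x)))
```

A skewness near 0 after the transformation indicates a roughly symmetric distribution. With real emission data, zero values become -Inf under log10 and have to be excluded, as the helper does here.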

References

The data were retrieved from FAOSTAT, Food and Agriculture Organization of the United Nations: http://www.fao.org/faostat/en/#data/GT