library(readr)
library(lubridate)
library(dplyr)
library(tidyr)
library(knitr)
library(forecast)
library(magrittr)
library(car)
library(stringr)
library(moments)
The goal of this assignment was to preprocess data on the contribution of agriculture to greenhouse gas (GHG) emissions in Asian and European countries from 1961 to 2017, with projections for 2030 and 2050. Both datasets were retrieved from FAOSTAT (Food and Agriculture Organization of the United Nations). After importing the datasets, the next step was to change the data types and variable names. Initially, both datasets were untidy and did not conform to tidy data principles, so the year variables in each dataset were gathered into a single variable. After tidying, the datasets were merged and a new variable, “Emission in MegaGrams”, was created. Emissions are expressed here on two scales, gigagrams and megagrams; the source data were in gigagrams, so the megagram values were computed from the gigagram variable. Next, I checked the numerical variables of the combined dataset for missing and special values. I then checked for outliers, first inspecting the distributions of the numeric variables with histograms and then using boxplots to flag potential outliers. The histograms showed that the variables were right-skewed. In the last step, a base-10 logarithmic transformation (the log10 function) was applied, which reduced the right skewness and gave a nearly normal distribution.
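The gathering step summarised above can be sketched on a toy table. This is a minimal illustration, not the exact code used later in the report: the data frame `wide`, its values, and the column name `Emission_Gg` are assumptions made for the example, and `pivot_longer` from tidyr is used here as one possible way to gather year columns.

```r
library(tidyr)

# Toy wide table with one column per year (illustrative values, not FAOSTAT data)
wide <- data.frame(
  Area   = c("A", "B"),
  `1961` = c(10.5, 3.2),
  `1962` = c(11.0, 3.4),
  check.names = FALSE
)

# Gather every year column into a single Year variable;
# the emission values move into one column (hypothetical name Emission_Gg)
long <- pivot_longer(wide, cols = -Area,
                     names_to = "Year", values_to = "Emission_Gg")

# Year arrives as character, so convert it to integer
long$Year <- as.integer(long$Year)

long
```

Each row of `long` now holds one area, one year and one emission value, which is the layout tidy data principles ask for.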
Since agriculture is one of the major contributors to global greenhouse gas emissions, both datasets describe the contribution of agriculture to total GHG emissions. Both were retrieved in CSV format from the open platform FAOSTAT (Food and Agriculture Organization of the United Nations). The data cover the non-CO2 gases methane (CH4) and nitrous oxide (N2O) generated in the agricultural emissions sub-domains (enteric fermentation, manure management, rice cultivation, synthetic fertilizers, manure applied to soils, manure left on pastures, crop residues, cultivation of organic soils, burning of crop residues, burning of savanna and energy use). The values are computed and estimated by FAO in Gg (10^9 g) for 1961-2017, with forecasts for 2030 and 2050, following the IPCC Guidelines for National GHG Inventories. FAOSTAT provides the data by country and by region, which helps countries assess and report their emissions.
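Because the data are reported in gigagrams, it may help to see the unit arithmetic spelled out. Since 1 Gg = 10^9 g and 1 Mg = 10^6 g, one gigagram equals 1000 megagrams. The helper name `gg_to_mg` and the input values below are assumptions made for this sketch.

```r
# 1 Gg = 1e9 g and 1 Mg = 1e6 g, so 1 Gg = 1000 Mg
gg_to_mg <- function(gg) {
  gg * 1e9 / 1e6
}

emission_gg <- c(0.5, 2, 240.7)        # illustrative values in gigagrams
emission_mg <- gg_to_mg(emission_gg)   # the same quantities in megagrams

emission_mg
```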
The first dataset covers the contribution of agriculture to total GHG emissions in Asian countries from 1961 to 2017, with projections for 2030 and 2050.
The second dataset covers the contribution of agriculture to total GHG emissions in European countries from 1961 to 2017, with projections for 2030 and 2050.
Both datasets were imported into RStudio using the readr package. Before the datasets can be merged, they need to be preprocessed and made tidy. After importing them, I saved them as Emissions_Asia and Emissions_Europe, respectively. The head function displays the first rows of a data frame; setting n to 5 returns the first five rows, and a different number of rows can be obtained by assigning a different value.
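The import pattern described above can be tried without the original files. The sketch below writes a tiny CSV to a temporary file and reads it back with explicit column types. The file, its columns and its values are assumptions for illustration; columns not listed in `cols()` are left for readr to guess.

```r
library(readr)

# Write a small CSV that loosely mimics the FAOSTAT layout (toy file)
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Area Code,Area,Item Code,Y1961,Y1962",
  "2,Afghanistan,5058,240.7,245.0",
  "3,Albania,5058,51.04,51.49"
), csv_path)

# Force the code columns to integer; the other columns are guessed
toy <- read_csv(
  csv_path,
  col_types = cols(
    `Area Code` = col_integer(),
    `Item Code` = col_integer()
  )
)

# Show only the first row; change n to see more
head(toy, n = 1)
```

Declaring the types at import time avoids a separate conversion step afterwards, which is why the code columns already appear as `int` in the structure output.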
Emissions_Asia <- read_csv("C:/Users/mafza/OneDrive/Desktop/data/Emissions_Asia.csv",
                           col_types = cols(`Area Code` = col_integer(),
                                            `Item Code` = col_integer(),
                                            `Element Code` = col_integer()))
head(Emissions_Asia, n = 5)
Emissions_Europe <- read_csv("C:/Users/mafza/OneDrive/Desktop/data/Emissions_Europe.csv",
                             col_types = cols(`Area Code` = col_integer(),
                                              `Item Code` = col_integer(),
                                              `Element Code` = col_integer()))
head(Emissions_Europe, n = 5)
Once the datasets were imported, it was important to inspect the structure of both to make sure all variables had appropriate data types before performing any analysis. The dim function was used to check the dimensions: the first dataset has 1995 observations and 66 variables, and the second has 1932 observations and 66 variables. The str function was used to check the structure of both datasets.
Both datasets have the same variable names (“Area Code”, “Area”, “Item Code”, “Item”, “Element Code”, “Element”, “Unit” and “Y1961” to “Y2050”) but different observations. The Emissions_Asia dataset contains the total contribution of agriculture to emissions in Asian countries, while the Emissions_Europe dataset contains the total contribution of agriculture to emissions in European countries.
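The claim that two data frames share the same variables but hold different observations can be checked programmatically. The sketch below uses two one-row toy data frames (`asia` and `europe` are stand-ins made up for this example, not the imported datasets).

```r
# Toy stand-ins for the two datasets
asia   <- data.frame(Area = "Afghanistan", Item = "Enteric Fermentation", Y1961 = 240.7)
europe <- data.frame(Area = "Albania",     Item = "Enteric Fermentation", Y1961 = 51.04)

# TRUE only if the names match exactly, in the same order
same_columns <- identical(names(asia), names(europe))

# Names present in one data frame but not the other (empty when they agree)
only_in_asia <- setdiff(names(asia), names(europe))

# Areas that appear in both data frames (empty when the observations differ)
common_areas <- intersect(asia$Area, europe$Area)

same_columns
```

Matching column names in the same order is what makes a later row-wise merge of the two datasets straightforward.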
After checking the structure and data types with the str function, I confirmed that the numerical variables (“Area Code”, “Item Code”, “Element Code”) had been read as integer rather than double, because their types were specified while reading the datasets. The variables Item and Element were read as character, so they needed to be converted to factors because they contain categorical data. “Item” was converted to a factor with 12 levels, while “Element” was converted to a factor with five levels, “Emissions (CH4)”, “Emissions (CO2eq)”, “Emissions (CO2eq) from CH4”, “Emissions (CO2eq) from N2O” and “Emissions (N2O)”, labelled 1, 2, 3, 4 and 5 respectively.
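How `levels` and `labels` interact in `factor()` is easy to misread, so here is a small sketch on a toy character vector (the vector `element` and the reduced two-level set are assumptions for the example). Each level is matched against the data, and the label in the same position replaces it.

```r
element <- c("Emissions (N2O)", "Emissions (CH4)", "Emissions (CH4)")

# levels says which values to expect and in what order;
# labels renames those levels position by position
element_f <- factor(element,
                    levels = c("Emissions (CH4)", "Emissions (N2O)"),
                    labels = c(1, 2))

levels(element_f)       # labels are stored as character strings
as.integer(element_f)   # underlying integer codes

# A value that is not listed in levels silently becomes NA
unlisted <- factor(c("Emissions (CH4)", "Other"),
                   levels = c("Emissions (CH4)", "Emissions (N2O)"))
is.na(unlisted)
```

The last lines show why the level strings must match the data exactly: a misspelled level would turn every matching observation into a missing value.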
I noticed that some variables were not named appropriately, so the variable names in both datasets also had to be changed. The year variables in both datasets were named “Y1961” to “Y2050”, which could confuse readers. Since these are simply year numbers, I renamed them to “1961” and so on, so that their data type can be converted in a later step.
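As an alternative to renaming each year column one at a time, the leading “Y” can be stripped from all year columns in a single step. This is a sketch of that idea on a toy data frame, not the code used in this report; the pattern assumes year columns are named “Y” followed by exactly four digits.

```r
toy <- data.frame(Area = "Afghanistan", Y1961 = 240.7, Y1962 = 245, Y2050 = 603.6)

# Find the columns whose names look like Y followed by four digits
year_cols <- grepl("^Y[0-9]{4}$", names(toy))

# Drop the leading Y from those names only
names(toy)[year_cols] <- sub("^Y", "", names(toy)[year_cols])

names(toy)
```

Matching on the name pattern rather than on column positions keeps the rename correct even if columns are added or reordered.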
dim(Emissions_Asia)
## [1] 1995 66
str(Emissions_Asia)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 1995 obs. of 66 variables:
## $ Area Code : int 2 2 2 2 2 2 2 2 2 2 ...
## $ Area : chr "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
## $ Item Code : int 5058 5058 5058 5059 5059 5059 5059 5059 5060 5060 ...
## $ Item : chr "Enteric Fermentation" "Enteric Fermentation" "Enteric Fermentation" "Manure Management" ...
## $ Element Code: int 7225 7231 7244 7225 7231 7244 7243 7230 7225 7231 ...
## $ Element : chr "Emissions (CH4)" "Emissions (CO2eq)" "Emissions (CO2eq) from CH4" "Emissions (CH4)" ...
## $ Unit : chr "gigagrams" "gigagrams" "gigagrams" "gigagrams" ...
## $ Y1961 : num 240.7 5054.3 5054.3 11.6 367.8 ...
## $ Y1962 : num 245 5152 5152 12 376 ...
## $ Y1963 : num 255.8 5372.4 5372.4 12.6 392.6 ...
## $ Y1964 : num 259.1 5440.4 5440.4 12.8 399.9 ...
## $ Y1965 : num 265.6 5577.6 5577.6 13.3 413.4 ...
## $ Y1966 : num 277 5817 5817 14 433 ...
## $ Y1967 : num 280.1 5882 5882 14.3 440.1 ...
## $ Y1968 : num 288.8 6065.2 6065.2 14.7 453.2 ...
## $ Y1969 : num 286.4 6014 6014 14.5 449.9 ...
## $ Y1970 : num 290.3 6095.5 6095.5 14.7 455 ...
## $ Y1971 : num 287.8 6043.6 6043.6 14.6 450.9 ...
## $ Y1972 : num 232 4862 4862 13 372 ...
## $ Y1973 : num 245 5144.6 5144.6 13.1 388.3 ...
## $ Y1974 : num 262.8 5519.6 5519.6 13.8 415.2 ...
## $ Y1975 : num 282.1 5923.6 5923.6 14.5 444 ...
## $ Y1976 : num 288.2 6052.7 6052.7 14.8 453.7 ...
## $ Y1977 : num 280.9 5898.4 5898.4 14.4 441.7 ...
## $ Y1978 : num 280.3 5886 5886 14.7 443.3 ...
## $ Y1979 : num 274.2 5758.9 5758.9 14.6 436.1 ...
## $ Y1980 : num 275.4 5782.8 5782.8 14.6 437.9 ...
## $ Y1981 : num 278.2 5842.4 5842.4 14.8 442.6 ...
## $ Y1982 : num 277.9 5836.7 5836.7 14.7 442.1 ...
## $ Y1983 : num 262.6 5515.2 5515.2 14.2 419.9 ...
## $ Y1984 : num 230.2 4833.1 4833.1 12.4 366.5 ...
## $ Y1985 : num 202.7 4257.3 4257.3 10.7 319.8 ...
## $ Y1986 : num 160.26 3365.54 3365.54 8.28 249.68 ...
## $ Y1987 : num 172.94 3631.74 3631.74 8.67 266.26 ...
## $ Y1988 : num 181.44 3810.16 3810.16 8.75 278.33 ...
## $ Y1989 : num 179.56 3770.8 3770.8 8.62 275.35 ...
## $ Y1990 : num 178.47 3747.83 3747.83 8.52 273.27 ...
## $ Y1991 : num 187.55 3938.55 3938.55 9.34 290.59 ...
## $ Y1992 : num 189.76 3984.96 3984.96 9.67 294.78 ...
## $ Y1993 : num 190.83 4007.43 4007.43 9.83 296.51 ...
## $ Y1994 : num 197.9 4156.3 4156.3 10.4 308.4 ...
## $ Y1995 : num 211.2 4434.3 4434.3 11.4 331 ...
## $ Y1996 : num 239.7 5034.1 5034.1 13.5 385.2 ...
## $ Y1997 : num 264.6 5556.8 5556.8 14.9 423.2 ...
## $ Y1998 : num 283.5 5952.5 5952.5 15.7 448.8 ...
## $ Y1999 : num 318.3 6685.1 6685.1 17.9 504.8 ...
## $ Y2000 : num 272.1 5714.9 5714.9 15.1 428.3 ...
## $ Y2001 : num 225.4 4733.5 4733.5 12.2 355.2 ...
## $ Y2002 : num 287.9 6045.8 6045.8 18.4 477.2 ...
## $ Y2003 : num 293.6 6166.1 6166.1 18.7 486.1 ...
## $ Y2004 : num 285.6 5997.6 5997.6 17.6 467.3 ...
## $ Y2005 : num 295.4 6203.4 6203.4 18.5 490.1 ...
## $ Y2006 : num 300.8 6316.9 6316.9 19.5 503.1 ...
## $ Y2007 : num 304.2 6388.7 6388.7 20.5 516.4 ...
## $ Y2008 : num 339.6 7130.7 7130.7 22.4 574 ...
## $ Y2009 : num 345.7 7258.8 7258.8 22.6 583.5 ...
## $ Y2010 : num 401.1 8422.4 8422.4 26.6 681.3 ...
## $ Y2011 : num 402.5 8452.8 8452.8 26.2 678.8 ...
## $ Y2012 : num 396.9 8335.3 8335.3 26.1 672.3 ...
## $ Y2013 : num 393.1 8255 8255 26.1 667.7 ...
## $ Y2014 : num 398.3 8364 8364 26.4 675.6 ...
## $ Y2015 : num 384.1 8066.9 8066.9 24.9 645.2 ...
## $ Y2016 : num 381.7 8015.3 8015.3 24.8 642 ...
## $ Y2017 : num 371.9 7810.4 7810.4 23.8 623.4 ...
## $ Y2030 : num 453.7 9528.7 9528.7 27.2 750.3 ...
## $ Y2050 : num 603.6 12676 12676 35.3 1003.2 ...
## - attr(*, "spec")=
## .. cols(
## .. `Area Code` = col_integer(),
## .. Area = col_character(),
## .. `Item Code` = col_integer(),
## .. Item = col_character(),
## .. `Element Code` = col_integer(),
## .. Element = col_character(),
## .. Unit = col_character(),
## .. Y1961 = col_double(),
## .. Y1962 = col_double(),
## .. Y1963 = col_double(),
## .. Y1964 = col_double(),
## .. Y1965 = col_double(),
## .. Y1966 = col_double(),
## .. Y1967 = col_double(),
## .. Y1968 = col_double(),
## .. Y1969 = col_double(),
## .. Y1970 = col_double(),
## .. Y1971 = col_double(),
## .. Y1972 = col_double(),
## .. Y1973 = col_double(),
## .. Y1974 = col_double(),
## .. Y1975 = col_double(),
## .. Y1976 = col_double(),
## .. Y1977 = col_double(),
## .. Y1978 = col_double(),
## .. Y1979 = col_double(),
## .. Y1980 = col_double(),
## .. Y1981 = col_double(),
## .. Y1982 = col_double(),
## .. Y1983 = col_double(),
## .. Y1984 = col_double(),
## .. Y1985 = col_double(),
## .. Y1986 = col_double(),
## .. Y1987 = col_double(),
## .. Y1988 = col_double(),
## .. Y1989 = col_double(),
## .. Y1990 = col_double(),
## .. Y1991 = col_double(),
## .. Y1992 = col_double(),
## .. Y1993 = col_double(),
## .. Y1994 = col_double(),
## .. Y1995 = col_double(),
## .. Y1996 = col_double(),
## .. Y1997 = col_double(),
## .. Y1998 = col_double(),
## .. Y1999 = col_double(),
## .. Y2000 = col_double(),
## .. Y2001 = col_double(),
## .. Y2002 = col_double(),
## .. Y2003 = col_double(),
## .. Y2004 = col_double(),
## .. Y2005 = col_double(),
## .. Y2006 = col_double(),
## .. Y2007 = col_double(),
## .. Y2008 = col_double(),
## .. Y2009 = col_double(),
## .. Y2010 = col_double(),
## .. Y2011 = col_double(),
## .. Y2012 = col_double(),
## .. Y2013 = col_double(),
## .. Y2014 = col_double(),
## .. Y2015 = col_double(),
## .. Y2016 = col_double(),
## .. Y2017 = col_double(),
## .. Y2030 = col_double(),
## .. Y2050 = col_double()
## .. )
names(Emissions_Asia)
## [1] "Area Code" "Area" "Item Code" "Item" "Element Code"
## [6] "Element" "Unit" "Y1961" "Y1962" "Y1963"
## [11] "Y1964" "Y1965" "Y1966" "Y1967" "Y1968"
## [16] "Y1969" "Y1970" "Y1971" "Y1972" "Y1973"
## [21] "Y1974" "Y1975" "Y1976" "Y1977" "Y1978"
## [26] "Y1979" "Y1980" "Y1981" "Y1982" "Y1983"
## [31] "Y1984" "Y1985" "Y1986" "Y1987" "Y1988"
## [36] "Y1989" "Y1990" "Y1991" "Y1992" "Y1993"
## [41] "Y1994" "Y1995" "Y1996" "Y1997" "Y1998"
## [46] "Y1999" "Y2000" "Y2001" "Y2002" "Y2003"
## [51] "Y2004" "Y2005" "Y2006" "Y2007" "Y2008"
## [56] "Y2009" "Y2010" "Y2011" "Y2012" "Y2013"
## [61] "Y2014" "Y2015" "Y2016" "Y2017" "Y2030"
## [66] "Y2050"
class(Emissions_Asia$Element)
## [1] "character"
typeof(Emissions_Asia$Element)
## [1] "character"
class(Emissions_Asia$Item)
## [1] "character"
typeof(Emissions_Asia$Item)
## [1] "character"
Emissions_Asia$Item <- factor(Emissions_Asia$Item,
                              levels = c("Agricultural Soils","Agriculture total","Burning - Crop residues","Burning - Savanna","Crop Residues","Cultivation of Organic Soils","Enteric Fermentation","Manure applied to Soils","Manure left on Pasture","Manure Management","Rice Cultivation","Synthetic Fertilizers"))
is.factor(Emissions_Asia$Item)
## [1] TRUE
Emissions_Asia$Element <- factor(Emissions_Asia$Element,
                                 levels = c("Emissions (CH4)","Emissions (CO2eq)","Emissions (CO2eq) from CH4","Emissions (CO2eq) from N2O","Emissions (N2O)"),
                                 labels = c(1,2,3,4,5))
is.factor(Emissions_Asia$Element)
## [1] TRUE
str(Emissions_Asia)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 1995 obs. of 66 variables:
## $ Area Code : int 2 2 2 2 2 2 2 2 2 2 ...
## $ Area : chr "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
## $ Item Code : int 5058 5058 5058 5059 5059 5059 5059 5059 5060 5060 ...
## $ Item : Factor w/ 12 levels "Agricultural Soils",..: 7 7 7 10 10 10 10 10 11 11 ...
## $ Element Code: int 7225 7231 7244 7225 7231 7244 7243 7230 7225 7231 ...
## $ Element : Factor w/ 5 levels "1","2","3","4",..: 1 2 3 1 2 3 4 5 1 2 ...
## $ Unit : chr "gigagrams" "gigagrams" "gigagrams" "gigagrams" ...
## $ Y1961 : num 240.7 5054.3 5054.3 11.6 367.8 ...
## $ Y1962 : num 245 5152 5152 12 376 ...
## $ Y1963 : num 255.8 5372.4 5372.4 12.6 392.6 ...
## $ Y1964 : num 259.1 5440.4 5440.4 12.8 399.9 ...
## $ Y1965 : num 265.6 5577.6 5577.6 13.3 413.4 ...
## $ Y1966 : num 277 5817 5817 14 433 ...
## $ Y1967 : num 280.1 5882 5882 14.3 440.1 ...
## $ Y1968 : num 288.8 6065.2 6065.2 14.7 453.2 ...
## $ Y1969 : num 286.4 6014 6014 14.5 449.9 ...
## $ Y1970 : num 290.3 6095.5 6095.5 14.7 455 ...
## $ Y1971 : num 287.8 6043.6 6043.6 14.6 450.9 ...
## $ Y1972 : num 232 4862 4862 13 372 ...
## $ Y1973 : num 245 5144.6 5144.6 13.1 388.3 ...
## $ Y1974 : num 262.8 5519.6 5519.6 13.8 415.2 ...
## $ Y1975 : num 282.1 5923.6 5923.6 14.5 444 ...
## $ Y1976 : num 288.2 6052.7 6052.7 14.8 453.7 ...
## $ Y1977 : num 280.9 5898.4 5898.4 14.4 441.7 ...
## $ Y1978 : num 280.3 5886 5886 14.7 443.3 ...
## $ Y1979 : num 274.2 5758.9 5758.9 14.6 436.1 ...
## $ Y1980 : num 275.4 5782.8 5782.8 14.6 437.9 ...
## $ Y1981 : num 278.2 5842.4 5842.4 14.8 442.6 ...
## $ Y1982 : num 277.9 5836.7 5836.7 14.7 442.1 ...
## $ Y1983 : num 262.6 5515.2 5515.2 14.2 419.9 ...
## $ Y1984 : num 230.2 4833.1 4833.1 12.4 366.5 ...
## $ Y1985 : num 202.7 4257.3 4257.3 10.7 319.8 ...
## $ Y1986 : num 160.26 3365.54 3365.54 8.28 249.68 ...
## $ Y1987 : num 172.94 3631.74 3631.74 8.67 266.26 ...
## $ Y1988 : num 181.44 3810.16 3810.16 8.75 278.33 ...
## $ Y1989 : num 179.56 3770.8 3770.8 8.62 275.35 ...
## $ Y1990 : num 178.47 3747.83 3747.83 8.52 273.27 ...
## $ Y1991 : num 187.55 3938.55 3938.55 9.34 290.59 ...
## $ Y1992 : num 189.76 3984.96 3984.96 9.67 294.78 ...
## $ Y1993 : num 190.83 4007.43 4007.43 9.83 296.51 ...
## $ Y1994 : num 197.9 4156.3 4156.3 10.4 308.4 ...
## $ Y1995 : num 211.2 4434.3 4434.3 11.4 331 ...
## $ Y1996 : num 239.7 5034.1 5034.1 13.5 385.2 ...
## $ Y1997 : num 264.6 5556.8 5556.8 14.9 423.2 ...
## $ Y1998 : num 283.5 5952.5 5952.5 15.7 448.8 ...
## $ Y1999 : num 318.3 6685.1 6685.1 17.9 504.8 ...
## $ Y2000 : num 272.1 5714.9 5714.9 15.1 428.3 ...
## $ Y2001 : num 225.4 4733.5 4733.5 12.2 355.2 ...
## $ Y2002 : num 287.9 6045.8 6045.8 18.4 477.2 ...
## $ Y2003 : num 293.6 6166.1 6166.1 18.7 486.1 ...
## $ Y2004 : num 285.6 5997.6 5997.6 17.6 467.3 ...
## $ Y2005 : num 295.4 6203.4 6203.4 18.5 490.1 ...
## $ Y2006 : num 300.8 6316.9 6316.9 19.5 503.1 ...
## $ Y2007 : num 304.2 6388.7 6388.7 20.5 516.4 ...
## $ Y2008 : num 339.6 7130.7 7130.7 22.4 574 ...
## $ Y2009 : num 345.7 7258.8 7258.8 22.6 583.5 ...
## $ Y2010 : num 401.1 8422.4 8422.4 26.6 681.3 ...
## $ Y2011 : num 402.5 8452.8 8452.8 26.2 678.8 ...
## $ Y2012 : num 396.9 8335.3 8335.3 26.1 672.3 ...
## $ Y2013 : num 393.1 8255 8255 26.1 667.7 ...
## $ Y2014 : num 398.3 8364 8364 26.4 675.6 ...
## $ Y2015 : num 384.1 8066.9 8066.9 24.9 645.2 ...
## $ Y2016 : num 381.7 8015.3 8015.3 24.8 642 ...
## $ Y2017 : num 371.9 7810.4 7810.4 23.8 623.4 ...
## $ Y2030 : num 453.7 9528.7 9528.7 27.2 750.3 ...
## $ Y2050 : num 603.6 12676 12676 35.3 1003.2 ...
## - attr(*, "spec")=
## .. cols(
## .. `Area Code` = col_integer(),
## .. Area = col_character(),
## .. `Item Code` = col_integer(),
## .. Item = col_character(),
## .. `Element Code` = col_integer(),
## .. Element = col_character(),
## .. Unit = col_character(),
## .. Y1961 = col_double(),
## .. Y1962 = col_double(),
## .. Y1963 = col_double(),
## .. Y1964 = col_double(),
## .. Y1965 = col_double(),
## .. Y1966 = col_double(),
## .. Y1967 = col_double(),
## .. Y1968 = col_double(),
## .. Y1969 = col_double(),
## .. Y1970 = col_double(),
## .. Y1971 = col_double(),
## .. Y1972 = col_double(),
## .. Y1973 = col_double(),
## .. Y1974 = col_double(),
## .. Y1975 = col_double(),
## .. Y1976 = col_double(),
## .. Y1977 = col_double(),
## .. Y1978 = col_double(),
## .. Y1979 = col_double(),
## .. Y1980 = col_double(),
## .. Y1981 = col_double(),
## .. Y1982 = col_double(),
## .. Y1983 = col_double(),
## .. Y1984 = col_double(),
## .. Y1985 = col_double(),
## .. Y1986 = col_double(),
## .. Y1987 = col_double(),
## .. Y1988 = col_double(),
## .. Y1989 = col_double(),
## .. Y1990 = col_double(),
## .. Y1991 = col_double(),
## .. Y1992 = col_double(),
## .. Y1993 = col_double(),
## .. Y1994 = col_double(),
## .. Y1995 = col_double(),
## .. Y1996 = col_double(),
## .. Y1997 = col_double(),
## .. Y1998 = col_double(),
## .. Y1999 = col_double(),
## .. Y2000 = col_double(),
## .. Y2001 = col_double(),
## .. Y2002 = col_double(),
## .. Y2003 = col_double(),
## .. Y2004 = col_double(),
## .. Y2005 = col_double(),
## .. Y2006 = col_double(),
## .. Y2007 = col_double(),
## .. Y2008 = col_double(),
## .. Y2009 = col_double(),
## .. Y2010 = col_double(),
## .. Y2011 = col_double(),
## .. Y2012 = col_double(),
## .. Y2013 = col_double(),
## .. Y2014 = col_double(),
## .. Y2015 = col_double(),
## .. Y2016 = col_double(),
## .. Y2017 = col_double(),
## .. Y2030 = col_double(),
## .. Y2050 = col_double()
## .. )
head(Emissions_Asia)
dim(Emissions_Europe)
## [1] 1932 66
str(Emissions_Europe)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 1932 obs. of 66 variables:
## $ Area Code : int 3 3 3 3 3 3 3 3 3 3 ...
## $ Area : chr "Albania" "Albania" "Albania" "Albania" ...
## $ Item Code : int 5058 5058 5058 5059 5059 5059 5059 5059 5060 5060 ...
## $ Item : chr "Enteric Fermentation" "Enteric Fermentation" "Enteric Fermentation" "Manure Management" ...
## $ Element Code: int 7225 7231 7244 7225 7231 7244 7243 7230 7225 7231 ...
## $ Element : chr "Emissions (CH4)" "Emissions (CO2eq)" "Emissions (CO2eq) from CH4" "Emissions (CH4)" ...
## $ Unit : chr "gigagrams" "gigagrams" "gigagrams" "gigagrams" ...
## $ Y1961 : num 51.04 1071.77 1071.77 6.57 232.48 ...
## $ Y1962 : num 51.49 1081.3 1081.3 6.61 233.71 ...
## $ Y1963 : num 51.2 1075.29 1075.29 6.53 231.54 ...
## $ Y1964 : num 51.46 1080.61 1080.61 6.68 235.83 ...
## $ Y1965 : num 54.97 1154.32 1154.32 7.36 257.6 ...
## $ Y1966 : num 54.73 1149.4 1149.4 7.39 258.7 ...
## $ Y1967 : num 55.25 1160.15 1160.15 7.56 264.03 ...
## $ Y1968 : num 51.16 1074.29 1074.29 7.02 246.35 ...
## $ Y1969 : num 50.18 1053.86 1053.86 7.12 248.87 ...
## $ Y1970 : num 49.7 1043.67 1043.67 7.09 247.85 ...
## $ Y1971 : num 48.03 1008.55 1008.55 6.94 242.06 ...
## $ Y1972 : num 46.76 981.99 981.99 6.86 238.54 ...
## $ Y1973 : num 47.78 1003.4 1003.4 7.06 245.43 ...
## $ Y1974 : num 49.7 1043.67 1043.67 7.43 258.44 ...
## $ Y1975 : num 50.34 1057.08 1057.08 7.59 263.62 ...
## $ Y1976 : num 53.46 1122.58 1122.58 8.07 281.38 ...
## $ Y1977 : num 55.74 1170.5 1170.5 8.62 299.44 ...
## $ Y1978 : num 59.54 1250.25 1250.25 9.32 323.77 ...
## $ Y1979 : num 62.1 1304.9 1304.9 9.8 340.2 ...
## $ Y1980 : num 64.1 1345.3 1345.3 10.1 349.3 ...
## $ Y1981 : num 65.4 1373.9 1373.9 10.4 360.5 ...
## $ Y1982 : num 65.9 1383.2 1383.2 10.6 365.9 ...
## $ Y1983 : num 67.1 1408.8 1408.8 10.7 369.6 ...
## $ Y1984 : num 67.1 1409.5 1409.5 10.8 370.5 ...
## $ Y1985 : num 64.5 1354.4 1354.4 10.4 355.6 ...
## $ Y1986 : num 67.3 1413.5 1413.5 10.9 372.5 ...
## $ Y1987 : num 72.4 1521.4 1521.4 11.6 400.6 ...
## $ Y1988 : num 76.4 1605.1 1605.1 12.1 418.2 ...
## $ Y1989 : num 78.1 1639.7 1639.7 12.3 424.6 ...
## $ Y1990 : num 74.7 1568.3 1568.3 12.1 413.3 ...
## $ Y1991 : num 76.8 1612.2 1612.2 11.8 409.6 ...
## $ Y1992 : num 76.8 1613.6 1613.6 11.4 396.5 ...
## $ Y1993 : num 81.9 1720.1 1720.1 12.1 421.2 ...
## $ Y1994 : num 104.7 2199.1 2199.1 15.5 543.3 ...
## $ Y1995 : num 107 2247 2247 16 560 ...
## $ Y1996 : num 100 2104 2104 16 555 ...
## $ Y1997 : num 93.2 1958.1 1958.1 14.7 513 ...
## $ Y1998 : num 88.3 1854.8 1854.8 14 488.7 ...
## $ Y1999 : num 90.8 1906.2 1906.2 14.3 499.3 ...
## $ Y2000 : num 92.1 1934.4 1934.4 14.9 515.4 ...
## $ Y2001 : num 89.9 1887.1 1887.1 14.7 507.5 ...
## $ Y2002 : num 87.3 1834 1834 14.5 500.5 ...
## $ Y2003 : num 88.3 1853.8 1853.8 14.8 508.7 ...
## $ Y2004 : num 84.7 1778.6 1778.6 14.5 498.8 ...
## $ Y2005 : num 84.1 1765.3 1765.3 14.4 495.8 ...
## $ Y2006 : num 82.6 1735.5 1735.5 14.1 484.6 ...
## $ Y2007 : num 77.7 1631 1631 13.3 457.8 ...
## $ Y2008 : num 72.7 1525.8 1525.8 12.6 432.7 ...
## $ Y2009 : num 68.9 1447.2 1447.2 12.1 416.3 ...
## $ Y2010 : num 69.2 1454 1454 12.2 420.3 ...
## $ Y2011 : num 68.5 1439.2 1439.2 12.2 420.1 ...
## $ Y2012 : num 69.9 1466.9 1466.9 12.4 426.6 ...
## $ Y2013 : num 70.3 1476.8 1476.8 12.3 423.2 ...
## $ Y2014 : num 71.1 1492.6 1492.6 12.5 430 ...
## $ Y2015 : num 71.6 1503 1503 12.5 428.2 ...
## $ Y2016 : num 71.2 1496 1496 12.4 423.7 ...
## $ Y2017 : num 69.5 1459.7 1459.7 12.2 415.5 ...
## $ Y2030 : num 86.7 1821.2 1821.2 14.3 495.2 ...
## $ Y2050 : num 85.9 1804.9 1804.9 14.3 500.6 ...
## - attr(*, "spec")=
## .. cols(
## .. `Area Code` = col_integer(),
## .. Area = col_character(),
## .. `Item Code` = col_integer(),
## .. Item = col_character(),
## .. `Element Code` = col_integer(),
## .. Element = col_character(),
## .. Unit = col_character(),
## .. Y1961 = col_double(),
## .. Y1962 = col_double(),
## .. Y1963 = col_double(),
## .. Y1964 = col_double(),
## .. Y1965 = col_double(),
## .. Y1966 = col_double(),
## .. Y1967 = col_double(),
## .. Y1968 = col_double(),
## .. Y1969 = col_double(),
## .. Y1970 = col_double(),
## .. Y1971 = col_double(),
## .. Y1972 = col_double(),
## .. Y1973 = col_double(),
## .. Y1974 = col_double(),
## .. Y1975 = col_double(),
## .. Y1976 = col_double(),
## .. Y1977 = col_double(),
## .. Y1978 = col_double(),
## .. Y1979 = col_double(),
## .. Y1980 = col_double(),
## .. Y1981 = col_double(),
## .. Y1982 = col_double(),
## .. Y1983 = col_double(),
## .. Y1984 = col_double(),
## .. Y1985 = col_double(),
## .. Y1986 = col_double(),
## .. Y1987 = col_double(),
## .. Y1988 = col_double(),
## .. Y1989 = col_double(),
## .. Y1990 = col_double(),
## .. Y1991 = col_double(),
## .. Y1992 = col_double(),
## .. Y1993 = col_double(),
## .. Y1994 = col_double(),
## .. Y1995 = col_double(),
## .. Y1996 = col_double(),
## .. Y1997 = col_double(),
## .. Y1998 = col_double(),
## .. Y1999 = col_double(),
## .. Y2000 = col_double(),
## .. Y2001 = col_double(),
## .. Y2002 = col_double(),
## .. Y2003 = col_double(),
## .. Y2004 = col_double(),
## .. Y2005 = col_double(),
## .. Y2006 = col_double(),
## .. Y2007 = col_double(),
## .. Y2008 = col_double(),
## .. Y2009 = col_double(),
## .. Y2010 = col_double(),
## .. Y2011 = col_double(),
## .. Y2012 = col_double(),
## .. Y2013 = col_double(),
## .. Y2014 = col_double(),
## .. Y2015 = col_double(),
## .. Y2016 = col_double(),
## .. Y2017 = col_double(),
## .. Y2030 = col_double(),
## .. Y2050 = col_double()
## .. )
class(Emissions_Europe$Element)
## [1] "character"
typeof(Emissions_Europe$Element)
## [1] "character"
class(Emissions_Europe$Item)
## [1] "character"
typeof(Emissions_Europe$Item)
## [1] "character"
Emissions_Europe$Item <- factor(Emissions_Europe$Item,
                                levels = c("Agricultural Soils","Agriculture total","Burning - Crop residues","Burning - Savanna","Crop Residues","Cultivation of Organic Soils","Enteric Fermentation","Manure applied to Soils","Manure left on Pasture","Manure Management","Rice Cultivation","Synthetic Fertilizers"))
is.factor(Emissions_Europe$Item)
## [1] TRUE
Emissions_Europe$Element <- factor(Emissions_Europe$Element,
                                   levels = c("Emissions (CH4)","Emissions (CO2eq)","Emissions (CO2eq) from CH4","Emissions (CO2eq) from N2O","Emissions (N2O)"),
                                   labels = c(1,2,3,4,5))
is.factor(Emissions_Europe$Element)
## [1] TRUE
str(Emissions_Europe)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 1932 obs. of 66 variables:
## $ Area Code : int 3 3 3 3 3 3 3 3 3 3 ...
## $ Area : chr "Albania" "Albania" "Albania" "Albania" ...
## $ Item Code : int 5058 5058 5058 5059 5059 5059 5059 5059 5060 5060 ...
## $ Item : Factor w/ 12 levels "Agricultural Soils",..: 7 7 7 10 10 10 10 10 11 11 ...
## $ Element Code: int 7225 7231 7244 7225 7231 7244 7243 7230 7225 7231 ...
## $ Element : Factor w/ 5 levels "1","2","3","4",..: 1 2 3 1 2 3 4 5 1 2 ...
## $ Unit : chr "gigagrams" "gigagrams" "gigagrams" "gigagrams" ...
## $ Y1961 : num 51.04 1071.77 1071.77 6.57 232.48 ...
## $ Y1962 : num 51.49 1081.3 1081.3 6.61 233.71 ...
## $ Y1963 : num 51.2 1075.29 1075.29 6.53 231.54 ...
## $ Y1964 : num 51.46 1080.61 1080.61 6.68 235.83 ...
## $ Y1965 : num 54.97 1154.32 1154.32 7.36 257.6 ...
## $ Y1966 : num 54.73 1149.4 1149.4 7.39 258.7 ...
## $ Y1967 : num 55.25 1160.15 1160.15 7.56 264.03 ...
## $ Y1968 : num 51.16 1074.29 1074.29 7.02 246.35 ...
## $ Y1969 : num 50.18 1053.86 1053.86 7.12 248.87 ...
## $ Y1970 : num 49.7 1043.67 1043.67 7.09 247.85 ...
## $ Y1971 : num 48.03 1008.55 1008.55 6.94 242.06 ...
## $ Y1972 : num 46.76 981.99 981.99 6.86 238.54 ...
## $ Y1973 : num 47.78 1003.4 1003.4 7.06 245.43 ...
## $ Y1974 : num 49.7 1043.67 1043.67 7.43 258.44 ...
## $ Y1975 : num 50.34 1057.08 1057.08 7.59 263.62 ...
## $ Y1976 : num 53.46 1122.58 1122.58 8.07 281.38 ...
## $ Y1977 : num 55.74 1170.5 1170.5 8.62 299.44 ...
## $ Y1978 : num 59.54 1250.25 1250.25 9.32 323.77 ...
## $ Y1979 : num 62.1 1304.9 1304.9 9.8 340.2 ...
## $ Y1980 : num 64.1 1345.3 1345.3 10.1 349.3 ...
## $ Y1981 : num 65.4 1373.9 1373.9 10.4 360.5 ...
## $ Y1982 : num 65.9 1383.2 1383.2 10.6 365.9 ...
## $ Y1983 : num 67.1 1408.8 1408.8 10.7 369.6 ...
## $ Y1984 : num 67.1 1409.5 1409.5 10.8 370.5 ...
## $ Y1985 : num 64.5 1354.4 1354.4 10.4 355.6 ...
## $ Y1986 : num 67.3 1413.5 1413.5 10.9 372.5 ...
## $ Y1987 : num 72.4 1521.4 1521.4 11.6 400.6 ...
## $ Y1988 : num 76.4 1605.1 1605.1 12.1 418.2 ...
## $ Y1989 : num 78.1 1639.7 1639.7 12.3 424.6 ...
## $ Y1990 : num 74.7 1568.3 1568.3 12.1 413.3 ...
## $ Y1991 : num 76.8 1612.2 1612.2 11.8 409.6 ...
## $ Y1992 : num 76.8 1613.6 1613.6 11.4 396.5 ...
## $ Y1993 : num 81.9 1720.1 1720.1 12.1 421.2 ...
## $ Y1994 : num 104.7 2199.1 2199.1 15.5 543.3 ...
## $ Y1995 : num 107 2247 2247 16 560 ...
## $ Y1996 : num 100 2104 2104 16 555 ...
## $ Y1997 : num 93.2 1958.1 1958.1 14.7 513 ...
## $ Y1998 : num 88.3 1854.8 1854.8 14 488.7 ...
## $ Y1999 : num 90.8 1906.2 1906.2 14.3 499.3 ...
## $ Y2000 : num 92.1 1934.4 1934.4 14.9 515.4 ...
## $ Y2001 : num 89.9 1887.1 1887.1 14.7 507.5 ...
## $ Y2002 : num 87.3 1834 1834 14.5 500.5 ...
## $ Y2003 : num 88.3 1853.8 1853.8 14.8 508.7 ...
## $ Y2004 : num 84.7 1778.6 1778.6 14.5 498.8 ...
## $ Y2005 : num 84.1 1765.3 1765.3 14.4 495.8 ...
## $ Y2006 : num 82.6 1735.5 1735.5 14.1 484.6 ...
## $ Y2007 : num 77.7 1631 1631 13.3 457.8 ...
## $ Y2008 : num 72.7 1525.8 1525.8 12.6 432.7 ...
## $ Y2009 : num 68.9 1447.2 1447.2 12.1 416.3 ...
## $ Y2010 : num 69.2 1454 1454 12.2 420.3 ...
## $ Y2011 : num 68.5 1439.2 1439.2 12.2 420.1 ...
## $ Y2012 : num 69.9 1466.9 1466.9 12.4 426.6 ...
## $ Y2013 : num 70.3 1476.8 1476.8 12.3 423.2 ...
## $ Y2014 : num 71.1 1492.6 1492.6 12.5 430 ...
## $ Y2015 : num 71.6 1503 1503 12.5 428.2 ...
## $ Y2016 : num 71.2 1496 1496 12.4 423.7 ...
## $ Y2017 : num 69.5 1459.7 1459.7 12.2 415.5 ...
## $ Y2030 : num 86.7 1821.2 1821.2 14.3 495.2 ...
## $ Y2050 : num 85.9 1804.9 1804.9 14.3 500.6 ...
## - attr(*, "spec")=
## .. cols(
## .. `Area Code` = col_integer(),
## .. Area = col_character(),
## .. `Item Code` = col_integer(),
## .. Item = col_character(),
## .. `Element Code` = col_integer(),
## .. Element = col_character(),
## .. Unit = col_character(),
## .. Y1961 = col_double(),
## .. Y1962 = col_double(),
## .. Y1963 = col_double(),
## .. Y1964 = col_double(),
## .. Y1965 = col_double(),
## .. Y1966 = col_double(),
## .. Y1967 = col_double(),
## .. Y1968 = col_double(),
## .. Y1969 = col_double(),
## .. Y1970 = col_double(),
## .. Y1971 = col_double(),
## .. Y1972 = col_double(),
## .. Y1973 = col_double(),
## .. Y1974 = col_double(),
## .. Y1975 = col_double(),
## .. Y1976 = col_double(),
## .. Y1977 = col_double(),
## .. Y1978 = col_double(),
## .. Y1979 = col_double(),
## .. Y1980 = col_double(),
## .. Y1981 = col_double(),
## .. Y1982 = col_double(),
## .. Y1983 = col_double(),
## .. Y1984 = col_double(),
## .. Y1985 = col_double(),
## .. Y1986 = col_double(),
## .. Y1987 = col_double(),
## .. Y1988 = col_double(),
## .. Y1989 = col_double(),
## .. Y1990 = col_double(),
## .. Y1991 = col_double(),
## .. Y1992 = col_double(),
## .. Y1993 = col_double(),
## .. Y1994 = col_double(),
## .. Y1995 = col_double(),
## .. Y1996 = col_double(),
## .. Y1997 = col_double(),
## .. Y1998 = col_double(),
## .. Y1999 = col_double(),
## .. Y2000 = col_double(),
## .. Y2001 = col_double(),
## .. Y2002 = col_double(),
## .. Y2003 = col_double(),
## .. Y2004 = col_double(),
## .. Y2005 = col_double(),
## .. Y2006 = col_double(),
## .. Y2007 = col_double(),
## .. Y2008 = col_double(),
## .. Y2009 = col_double(),
## .. Y2010 = col_double(),
## .. Y2011 = col_double(),
## .. Y2012 = col_double(),
## .. Y2013 = col_double(),
## .. Y2014 = col_double(),
## .. Y2015 = col_double(),
## .. Y2016 = col_double(),
## .. Y2017 = col_double(),
## .. Y2030 = col_double(),
## .. Y2050 = col_double()
## .. )
head(Emissions_Europe)
# Rename the 59 year columns (positions 8 to 66) to bare year labels
names(Emissions_Asia)[8:66] <- as.character(c(1961:2017, 2030, 2050))
dim(Emissions_Asia)
## [1] 1995 66
names(Emissions_Asia)
## [1] "Area Code" "Area" "Item Code" "Item" "Element Code"
## [6] "Element" "Unit" "1961" "1962" "1963"
## [11] "1964" "1965" "1966" "1967" "1968"
## [16] "1969" "1970" "1971" "1972" "1973"
## [21] "1974" "1975" "1976" "1977" "1978"
## [26] "1979" "1980" "1981" "1982" "1983"
## [31] "1984" "1985" "1986" "1987" "1988"
## [36] "1989" "1990" "1991" "1992" "1993"
## [41] "1994" "1995" "1996" "1997" "1998"
## [46] "1999" "2000" "2001" "2002" "2003"
## [51] "2004" "2005" "2006" "2007" "2008"
## [56] "2009" "2010" "2011" "2012" "2013"
## [61] "2014" "2015" "2016" "2017" "2030"
## [66] "2050"
# Rename the 59 year columns (positions 8 to 66) to bare year labels
names(Emissions_Europe)[8:66] <- as.character(c(1961:2017, 2030, 2050))
In this step we tidy both datasets, because neither conforms to the tidy data principles. One of these principles is that each variable must have its own column. In both datasets the column headers from 1961 to 2050 are not variable names but values of a single variable, the year, so each row holds 59 observations instead of one. I used the gather function to collect all the year columns into one variable, “Year”, with the corresponding values stored in “Emission_GigaGrams”. Each dataset now has 9 variables instead of 66.
After tidying the datasets, we can merge them. I merged the datasets by binding their rows and stored the result in a new data frame named “df”. The columns Area, Item, Element, Year and Emission_GigaGrams were then selected into “df2”, and the head function was used to view the newly created data frame.
A year number alone cannot make up a valid date, because the month and day are not specified. When I tried to convert Year to the date datatype, a month and day were added to the year by default, which would disrupt the structure of our dataset. So I simply converted Year from character to integer.
# Gather the 59 year columns (positions 8 to 66) into Year / Emission_GigaGrams pairs
asia <- Emissions_Asia %>%
  gather(key = "Year", value = "Emission_GigaGrams", 8:66)
head(asia)
Europe <- Emissions_Europe %>%
  gather(key = "Year", value = "Emission_GigaGrams", 8:66)
head(Europe)
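As a hedged aside, newer versions of tidyr provide pivot_longer as the successor to gather. The sketch below is not the code used in this report: it runs on a small made-up table (toy_emissions, with invented values) and assumes tidyr 1.0.0 or later is installed.

```r
library(tidyr)

# Hypothetical wide table: one row per area and item, one column per year
toy_emissions <- data.frame(
  Area = c("A", "B"),
  Item = c("Rice Cultivation", "Rice Cultivation"),
  `1961` = c(1.5, 2.0),
  `1962` = c(1.7, 2.1),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Collect every column except Area and Item into Year / Emission_GigaGrams pairs
toy_long <- pivot_longer(
  toy_emissions,
  cols = -c(Area, Item),
  names_to = "Year",
  values_to = "Emission_GigaGrams"
)

dim(toy_long)  # 4 rows, 4 columns
```

Selecting the identifier columns to exclude, rather than listing every year, keeps the call short when the table has dozens of year columns.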
df <- bind_rows(asia,Europe)
names(df)
## [1] "Area Code" "Area" "Item Code"
## [4] "Item" "Element Code" "Element"
## [7] "Unit" "Year" "Emission_GigaGrams"
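The combined data frame has no column recording which dataset a row came from. If that information were needed, one possible approach (an assumption, not something done in this report) is the .id argument of bind_rows. The sketch uses two hypothetical one-row data frames, toy_asia and toy_europe, and a made-up column name, Region.

```r
library(dplyr)

# Hypothetical stand-ins for the tidied asia and Europe data frames
toy_asia <- data.frame(Area = "A", Year = "1961",
                       Emission_GigaGrams = 1.5, stringsAsFactors = FALSE)
toy_europe <- data.frame(Area = "B", Year = "1961",
                         Emission_GigaGrams = 2.0, stringsAsFactors = FALSE)

# .id adds a column holding the name of the list element each row came from
toy_df <- bind_rows(list(Asia = toy_asia, Europe = toy_europe), .id = "Region")
toy_df
```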
df2<- df%>% dplyr::select("Area","Item","Element","Year","Emission_GigaGrams")
str(df2)
## Classes 'tbl_df', 'tbl' and 'data.frame': 231693 obs. of 5 variables:
## $ Area : chr "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
## $ Item : Factor w/ 12 levels "Agricultural Soils",..: 7 7 7 10 10 10 10 10 11 11 ...
## $ Element : Factor w/ 5 levels "1","2","3","4",..: 1 2 3 1 2 3 4 5 1 2 ...
## $ Year : chr "1961" "1961" "1961" "1961" ...
## $ Emission_GigaGrams: num 240.7 5054.3 5054.3 11.6 367.8 ...
head(df2)
df2$Year <- as.integer(df2$Year)
class(df2$Year)
## [1] "integer"
head(df2)
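The date behaviour described above can be illustrated with base R. This is a minimal sketch on a made-up vector (year_chr); it assumes the conversion was attempted with something like as.Date, which the report does not show.

```r
year_chr <- c("1961", "2030")

# Parsing with only %Y: R fills the unspecified month and day from the current date
as_date <- as.Date(year_chr, format = "%Y")
as_date

# Integer conversion keeps just the year and nothing else
year_int <- as.integer(year_chr)
year_int
```

Because the filled-in month and day depend on when the code is run, the integer form is the more stable representation for a year-only variable.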
Emissions can be expressed on two different scales, Gigagrams (Gg) and Megagrams (Mg).
Since emissions in both datasets were reported in Gigagrams, we create a new variable named Emission_MegaGrams that gives the emission in Megagrams. One Gigagram equals 1000 Megagrams, so the new variable is created by multiplying Emission_GigaGrams by 1000.
df3 <- mutate(df2, Emission_MegaGrams= Emission_GigaGrams*1000)
str(df3)
## Classes 'tbl_df', 'tbl' and 'data.frame': 231693 obs. of 6 variables:
## $ Area : chr "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
## $ Item : Factor w/ 12 levels "Agricultural Soils",..: 7 7 7 10 10 10 10 10 11 11 ...
## $ Element : Factor w/ 5 levels "1","2","3","4",..: 1 2 3 1 2 3 4 5 1 2 ...
## $ Year : int 1961 1961 1961 1961 1961 1961 1961 1961 1961 1961 ...
## $ Emission_GigaGrams: num 240.7 5054.3 5054.3 11.6 367.8 ...
## $ Emission_MegaGrams: num 240683 5054346 5054346 11623 367831 ...
head(df3)
In this step we check the numeric variables for missing values (NA) and special values (Inf, -Inf, NaN). For this we define the function Is.specialorNA, which returns the number of missing and special values in a numeric column, and apply it to every column of the data frame with sapply. Non-numeric columns return 0. Both emission variables contain 57128 missing or special values.
Is.specialorNA <- function(x) {sum(if(is.numeric(x)) (is.infinite(x) | is.nan(x) | is.na(x)))}
sapply (df3,Is.specialorNA)
## Area Item Element Year
## 0 0 0 0
## Emission_GigaGrams Emission_MegaGrams
## 57128 57128
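To make the behaviour of such a check concrete, here is a minimal sketch on a made-up data frame (toy). The helper count_special is a hypothetical variant of the function above, not a replacement for it; it relies on is.finite being FALSE for NA, NaN, Inf and -Inf.

```r
# Hypothetical toy data frame with one NA, one NaN and one Inf
toy <- data.frame(
  Area = c("A", "B", "C", "D"),
  Emission = c(1.2, NA, NaN, Inf),
  stringsAsFactors = FALSE
)

# Count values that are not finite in a numeric column; other column types give 0
count_special <- function(x) {
  if (is.numeric(x)) sum(!is.finite(x)) else 0L
}

counts <- sapply(toy, count_special)
counts
```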
In this step we look for outliers in the numeric variables of our dataset. First, we examine the distribution of each numeric variable with a histogram. Both histograms are positively (right) skewed, so the z-score method, which assumes an approximately normal distribution, is not suitable for detecting outliers. Instead we use box plots, which showed potential outliers in both numeric variables. We then used the summarise function to compute summary statistics, which suggested that these emission values are genuine rather than erroneous: many types of agricultural emission started at 0 and have increased significantly over the years, so we do not exclude them. The skewness of both numeric variables was 15.00858.
hist(df3$Emission_GigaGrams)
df3$Emission_GigaGrams %>% boxplot(main= "BoxPlot OF Emission in GigaGrams")
skewness(df3$Emission_GigaGrams, na.rm = TRUE)
## [1] 15.00858
df3 %>% summarise(Min = min(Emission_GigaGrams, na.rm = TRUE),
                  Q1 = quantile(Emission_GigaGrams, probs = .25, na.rm = TRUE),
                  Median = median(Emission_GigaGrams, na.rm = TRUE),
                  Q3 = quantile(Emission_GigaGrams, probs = .75, na.rm = TRUE),
                  Max = max(Emission_GigaGrams, na.rm = TRUE),
                  Mean = mean(Emission_GigaGrams, na.rm = TRUE),
                  SD = sd(Emission_GigaGrams, na.rm = TRUE),
                  IQR = IQR(Emission_GigaGrams, na.rm = TRUE),
                  n = n(),
                  Missing = sum(is.na(Emission_GigaGrams)))
hist(df3$Emission_MegaGrams)
df3$Emission_MegaGrams %>% boxplot(main= "BoxPlot OF Emission in MegaGrams")
skewness(df3$Emission_MegaGrams, na.rm = TRUE)
## [1] 15.00858
df3 %>% summarise(Min = min(Emission_MegaGrams, na.rm = TRUE),
                  Q1 = quantile(Emission_MegaGrams, probs = .25, na.rm = TRUE),
                  Median = median(Emission_MegaGrams, na.rm = TRUE),
                  Q3 = quantile(Emission_MegaGrams, probs = .75, na.rm = TRUE),
                  Max = max(Emission_MegaGrams, na.rm = TRUE),
                  Mean = mean(Emission_MegaGrams, na.rm = TRUE),
                  SD = sd(Emission_MegaGrams, na.rm = TRUE),
                  IQR = IQR(Emission_MegaGrams, na.rm = TRUE),
                  n = n(),
                  Missing = sum(is.na(Emission_MegaGrams)))
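A box plot marks points beyond 1.5 times the interquartile range from the quartiles. The sketch below works through that rule on a small made-up vector (x), not on the emission data. It uses quantile for the quartiles, whereas boxplot uses hinges, so the fences can differ slightly in practice.

```r
# Hypothetical right-skewed sample: many small values and a few very large ones
x <- c(0, 0, 1, 2, 2, 3, 4, 5, 8, 120, 950)

# Quartiles and interquartile range
q <- quantile(x, probs = c(0.25, 0.75))
iqr <- q[[2]] - q[[1]]

# Fences at 1.5 * IQR below Q1 and above Q3
lower_fence <- q[[1]] - 1.5 * iqr
upper_fence <- q[[2]] + 1.5 * iqr

# Values outside the fences are the ones a box plot would mark
flagged <- x[x < lower_fence | x > upper_fence]
flagged
```

In heavily right-skewed data this rule tends to mark many large values, which is why flagged points need to be judged against what the data represent before excluding anything.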
In this step we apply a transformation to the numeric variables. The histograms and box plots in the previous step showed that both numeric variables have right-skewed distributions, so I applied a logarithmic transformation (base 10) using the log10 function, which reduced the right skewness and gave a nearly normal distribution.
# Base-10 log transformation; any zero emissions become -Inf, which hist() ignores
log10_emission_gg <- log10(df3$Emission_GigaGrams)
par(mfrow=c(2,2))
hist(df3$Emission_GigaGrams, main= "Histogram Of Emission in GigaGrams")
hist(log10_emission_gg, main="Emission(Gg) After Transformation")
log10_emission_mg <- log10(df3$Emission_MegaGrams)
par(mfrow=c(2,2))
hist(df3$Emission_MegaGrams, main= "Histogram Of Emission in MegaGrams")
hist(log10_emission_mg, main="Emission(Mg) After Transformation")
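The effect of a base-10 log transformation on skewness can be sketched on simulated data. The sample below is made up (log-normal, strictly positive) and is not the FAOSTAT data; the helper skew is a hypothetical moment-based skewness written out so that no extra package is needed.

```r
set.seed(1)

# Hypothetical right-skewed, strictly positive sample
x <- rlnorm(10000, meanlog = 2, sdlog = 1.5)

# Moment-based sample skewness: third central moment over variance^1.5
skew <- function(v) {
  m <- mean(v)
  mean((v - m)^3) / mean((v - m)^2)^1.5
}

skew_before <- skew(x)
skew_after <- skew(log10(x))
c(before = skew_before, after = skew_after)

# log10(0) is -Inf, so zero values need separate handling before transforming
log10(0)
```

Comparing skewness before and after, in addition to inspecting the histograms, gives a numeric check that the transformation helped.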
The data were retrieved from the open-source platform FAOSTAT: http://www.fao.org/faostat/en/#data/GT