1 Global Warming and Climate Change

Figure 1.1: Photo by Anna Shvets from Pexels

Climate change is a long-term change in the average weather patterns that have come to define Earth’s local, regional and global climates. These changes have a broad range of observed effects that are synonymous with the term.

Changes observed in Earth’s climate since the early 20th century are primarily driven by human activities, particularly fossil fuel burning, which increases heat-trapping greenhouse gas levels in Earth’s atmosphere, raising Earth’s average surface temperature. These human-produced temperature increases are commonly referred to as global warming. Natural processes can also contribute to climate change, including internal variability (e.g., cyclical ocean patterns like El Niño, La Niña and the Pacific Decadal Oscillation) and external forcings (e.g., volcanic activity, changes in the Sun’s energy output, variations in Earth’s orbit).

Scientists use observations from the ground, air and space, along with theoretical models, to monitor and study past, present and future climate change. Climate data records provide evidence of key climate change indicators, such as global land and ocean temperature increases; rising sea levels; ice loss at Earth’s poles and in mountain glaciers; changes in the frequency and severity of extreme weather such as hurricanes, heatwaves, wildfires, droughts, floods and precipitation; and changes in cloud and vegetation cover, to name but a few. The records used in this report fall into two categories: ecological footprint (ecological consumption) and biocapacity (ecological production). This report mainly analyzes these two categories to determine the ecological impact recorded for each country in 2017.

2 About the Dataset

Figure 2.1: Photo by Deva Darshan from Pexels

2.1 Summary

The ecological footprint measures the ecological assets that a given population requires to produce the natural resources it consumes (including plant-based food and fiber products, livestock and fish products, timber and other forest products, space for urban infrastructure) and to absorb its waste, especially carbon emissions.

A nation’s biocapacity represents the productivity of its ecological assets, including cropland, grazing land, forest land, fishing grounds, and built-up land. These areas, especially if left unharvested, can also absorb much of the waste we generate, especially our carbon emissions.

Both the ecological footprint and biocapacity are expressed in global hectares — globally comparable, standardized hectares with world average productivity.

If a population’s ecological footprint exceeds the region’s biocapacity, that region runs an ecological deficit. Its demand for the goods and services that its land and seas can provide — fruits and vegetables, meat, fish, wood, cotton for clothing, and carbon dioxide absorption — exceeds what the region’s ecosystems can renew. A region in ecological deficit meets demand by importing, liquidating its own ecological assets (such as overfishing), and/or emitting carbon dioxide into the atmosphere. If a region’s biocapacity exceeds its ecological footprint, it has an ecological reserve.
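The deficit/reserve rule described above can be sketched in a few lines of R. The regions and numbers below are invented purely for illustration and are not taken from the dataset.

```r
# Hypothetical per-capita values in global hectares (gha); not from the dataset.
regions <- data.frame(
  region      = c("A", "B", "C"),
  footprint   = c(5.0, 1.2, 2.0),
  biocapacity = c(2.0, 3.5, 2.0)
)

# Positive balance = ecological reserve, negative balance = ecological deficit.
regions$balance <- regions$biocapacity - regions$footprint
regions$status  <- ifelse(regions$balance > 0, "reserve",
                   ifelse(regions$balance < 0, "deficit", "balanced"))

print(regions)
```

Region A would run a deficit, region B would hold a reserve, and region C would break even.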

2.2 Glossary

biological capacity available per person (or per capita)

There were ~ 12.2 billion hectares of biologically productive land and water on Earth in 2019. Dividing by the number of people alive in that year (7.7 billion) gives 1.6 global hectares per person. This area also needs to accommodate the wild species that compete for the same biological material and spaces as humans.

biological capacity or biocapacity

The capacity of ecosystems to regenerate what people demand from those surfaces. Life, including human life, competes for space. The biocapacity of a particular surface represents its ability to regenerate what people demand. Biocapacity is therefore the ecosystems’ capacity to produce biological materials used by people and to absorb waste material generated by humans, under current management schemes and extraction technologies. Biocapacity can change from year to year due to climate, management, and also what portions are considered useful inputs to the human economy. In the National Footprint and Biocapacity Accounts, the biocapacity of an area is calculated by multiplying the actual physical area by the yield factor and the appropriate equivalence factor. Biocapacity is usually expressed in global hectares.
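As a worked illustration of the multiplication described above (physical area × yield factor × equivalence factor), here is a minimal sketch. All three input values are hypothetical; real yield and equivalence factors come from the National Footprint and Biocapacity Accounts and are not part of this dataset.

```r
# Hypothetical inputs: none of these numbers come from the dataset.
area_ha            <- 1000  # physical area in hectares
yield_factor       <- 1.2   # assumed local productivity relative to the world average for this land type
equivalence_factor <- 2.5   # assumed productivity of this land type relative to all land types

# Biocapacity in global hectares (gha)
biocapacity_gha <- area_ha * yield_factor * equivalence_factor
biocapacity_gha
```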

ecological deficit / reserve OR biocapacity deficit / reserve

The difference between the biocapacity and Ecological Footprint of a region or country. An ecological deficit occurs when the Footprint of a population exceeds the biocapacity of the area available to that population. Conversely, an ecological reserve exists when the biocapacity of a region exceeds its population’s Footprint. If there is a regional or national ecological deficit, it means that the region is importing biocapacity through trade or liquidating regional ecological assets, or emitting wastes into a global commons such as the atmosphere. In contrast to the national scale, the global ecological deficit cannot be compensated for through trade, and is therefore equal to overshoot by definition.

Ecological Footprint

A measure of how much area of biologically productive land and water an individual, population or activity requires to produce all the resources it consumes and to absorb the waste it generates, using prevailing technology and resource management practices. The Ecological Footprint is usually measured in global hectares. Because trade is global, an individual or country’s Footprint includes land or sea from all over the world. Without further specification, Ecological Footprint generally refers to the Ecological Footprint of consumption. Ecological Footprint is often referred to in short form as Footprint. “Ecological Footprint” and “Footprint” are proper nouns and thus should always be capitalized.

global hectare

Global hectares are the accounting unit for the Ecological Footprint and Biocapacity accounts. These productivity weighted biologically productive hectares allow researchers to report both the biocapacity of the earth or a region and the demand on biocapacity (the Ecological Footprint). A global hectare is a biologically productive hectare with world average biological productivity for a given year. Global hectares are needed because different land types have different productivities. A global hectare of, for example, cropland, would occupy a smaller physical area than the much less biologically productive pasture land, as more pasture would be needed to provide the same biocapacity as one hectare of cropland. Because world productivity varies slightly from year to year, the value of a global hectare may change slightly from year to year.

land or area type

The Earth’s approximately 12.2 billion hectares of biologically productive land and water areas are categorized into five types. The five area types for biocapacity that support the 6 Footprint demand types are:

  • Cropland: Cropland is the most bioproductive of all the land-use types and consists of areas used to produce food and fiber for human consumption, feed for livestock, oil crops, and rubber. Due to lack of globally consistent data sets, current cropland Footprint calculations do not yet take into account the extent to which farming techniques or unsustainable agricultural practices may cause long-term degradation of soil. The cropland Footprint includes crop products allocated to livestock and aquaculture feed mixes, and those used for fibers and materials.
  • Forest land: Forest land provides two services. The first is the forest product Footprint, which is calculated based on the amount of lumber, pulp, timber products, and fuel wood consumed by a country on a yearly basis. The second is the carbon Footprint, which represents the carbon dioxide emissions from burning fossil fuels and also includes embodied carbon in imported goods. The carbon Footprint component of the Ecological Footprint is calculated as the amount of forest land needed to sequester these carbon dioxide emissions. Currently, the carbon Footprint is the largest portion of humanity’s Footprint.
  • Grazing land: Grazing land is used to raise livestock for meat, dairy, hide, and wool products. The grazing land Footprint is calculated by comparing the amount of livestock feed available in a country with the amount of feed required for all livestock in that year, with the remainder of feed demand assumed to come from grazing land.
  • Fishing grounds: The fishing grounds Footprint is calculated based on estimates of the maximum sustainable catch for a variety of fish species. These sustainable catch estimates are converted into an equivalent mass of primary production based on the various species’ trophic levels. This estimate of maximum harvestable primary production is then divided amongst the continental shelf areas of the world. Fish caught and used in aquaculture feed mixes are included.
  • Built-up land: The built-up land Footprint is calculated based on the area of land covered by human infrastructure — transportation, housing, industrial structures, and reservoirs for hydropower. Built-up land may occupy what would previously have been cropland.

3 Data Wrangling

3.1 Library

First, let’s load the libraries we are going to work with: dplyr, tidyr, and glue to make pre-processing easier, plus ggplot2 and plotly for visualization.

library(dplyr)
library(glue)
library(tidyr)

library(ggplot2)
library(plotly)

3.2 Reading the Dataset

Now, let’s read the dataset. It is available as a CSV file, so I will use the read.csv function and store the result in an object named footprint. As a habit, I look at the first 6 rows using the head function to get a feel for the data.

footprint <- read.csv("countries.csv")
head(footprint)
##               Country                   Region Population..millions.  HDI
## 1         Afghanistan Middle East/Central Asia                 29.82 0.46
## 2             Albania  Northern/Eastern Europe                  3.16 0.73
## 3             Algeria                   Africa                 38.48 0.73
## 4              Angola                   Africa                 20.82 0.52
## 5 Antigua and Barbuda            Latin America                  0.09 0.78
## 6           Argentina            Latin America                 41.09 0.83
##   GDP.per.Capita Cropland.Footprint Grazing.Footprint Forest.Footprint
## 1        $614.66               0.30              0.20             0.08
## 2      $4,534.37               0.78              0.22             0.25
## 3      $5,430.57               0.60              0.16             0.17
## 4      $4,665.91               0.33              0.15             0.12
## 5     $13,205.10                 NA                NA               NA
## 6     $13,540.00               0.78              0.79             0.29
##   Carbon.Footprint Fish.Footprint Total.Ecological.Footprint Cropland
## 1             0.18           0.00                       0.79     0.24
## 2             0.87           0.02                       2.21     0.55
## 3             1.14           0.01                       2.12     0.24
## 4             0.20           0.09                       0.93     0.20
## 5               NA             NA                       5.38       NA
## 6             1.08           0.10                       3.14     2.64
##   Grazing.Land Forest.Land Fishing.Water Urban.Land Total.Biocapacity
## 1         0.20        0.02          0.00       0.04              0.50
## 2         0.21        0.29          0.07       0.06              1.18
## 3         0.27        0.03          0.01       0.03              0.59
## 4         1.42        0.64          0.26       0.04              2.55
## 5           NA          NA            NA         NA              0.94
## 6         1.86        0.66          1.67       0.10              6.92
##   Biocapacity.Deficit.or.Reserve Earths.Required Countries.Required
## 1                          -0.30            0.46               1.60
## 2                          -1.03            1.27               1.87
## 3                          -1.53            1.22               3.61
## 4                           1.61            0.54               0.37
## 5                          -4.44            3.11               5.70
## 6                           3.78            1.82               0.45
##   Data.Quality
## 1            6
## 2            6
## 3            5
## 4            6
## 5            2
## 6            6

Looking at row 5 (Antigua and Barbuda), we can see missing values in the footprint and biocapacity components, though the totals are still usable. Keep this in mind for possible exclusion when we compare those component metrics.
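The dotted column names and the NA values above are both produced by read.csv itself. The sketch below reproduces this on a tiny inline CSV. The header spelling (for example "Population (millions)") is an assumption about the raw file, inferred from the mangled names; the two data rows reuse values printed above.

```r
# Hypothetical two-row CSV; the header spelling is an assumption about the raw file.
csv_text <- 'Country,Population (millions),GDP per Capita,Cropland Footprint
Afghanistan,29.82,$614.66,0.30
Antigua and Barbuda,0.09,"$13,205.10",'

toy <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# check.names = TRUE (the default) replaces spaces and parentheses with dots.
names(toy)

# An empty field in a numeric column is read as NA.
is.na(toy$Cropland.Footprint)
```

Passing check.names = FALSE would keep the original headers, at the cost of needing backticks to refer to them.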

3.3 Dataset Structure and Data Types

Let’s analyze the data structure of the dataset and make sure that the data types are properly used. I prefer dplyr’s glimpse function compared to base-R’s str but either one will do the trick:

glimpse(footprint)
## Rows: 188
## Columns: 21
## $ Country                        <chr> "Afghanistan", "Albania", "Algeria", "A~
## $ Region                         <chr> "Middle East/Central Asia", "Northern/E~
## $ Population..millions.          <dbl> 29.82, 3.16, 38.48, 20.82, 0.09, 41.09,~
## $ HDI                            <dbl> 0.46, 0.73, 0.73, 0.52, 0.78, 0.83, 0.7~
## $ GDP.per.Capita                 <chr> "$614.66", "$4,534.37", "$5,430.57", "$~
## $ Cropland.Footprint             <dbl> 0.30, 0.78, 0.60, 0.33, NA, 0.78, 0.74,~
## $ Grazing.Footprint              <dbl> 0.20, 0.22, 0.16, 0.15, NA, 0.79, 0.18,~
## $ Forest.Footprint               <dbl> 0.08, 0.25, 0.17, 0.12, NA, 0.29, 0.34,~
## $ Carbon.Footprint               <dbl> 0.18, 0.87, 1.14, 0.20, NA, 1.08, 0.89,~
## $ Fish.Footprint                 <dbl> 0.00, 0.02, 0.01, 0.09, NA, 0.10, 0.01,~
## $ Total.Ecological.Footprint     <dbl> 0.79, 2.21, 2.12, 0.93, 5.38, 3.14, 2.2~
## $ Cropland                       <dbl> 0.24, 0.55, 0.24, 0.20, NA, 2.64, 0.44,~
## $ Grazing.Land                   <dbl> 0.20, 0.21, 0.27, 1.42, NA, 1.86, 0.26,~
## $ Forest.Land                    <dbl> 0.02, 0.29, 0.03, 0.64, NA, 0.66, 0.10,~
## $ Fishing.Water                  <dbl> 0.00, 0.07, 0.01, 0.26, NA, 1.67, 0.02,~
## $ Urban.Land                     <dbl> 0.04, 0.06, 0.03, 0.04, NA, 0.10, 0.07,~
## $ Total.Biocapacity              <dbl> 0.50, 1.18, 0.59, 2.55, 0.94, 6.92, 0.8~
## $ Biocapacity.Deficit.or.Reserve <dbl> -0.30, -1.03, -1.53, 1.61, -4.44, 3.78,~
## $ Earths.Required                <dbl> 0.46, 1.27, 1.22, 0.54, 3.11, 1.82, 1.2~
## $ Countries.Required             <dbl> 1.60, 1.87, 3.61, 0.37, 5.70, 0.45, 2.5~
## $ Data.Quality                   <chr> "6", "6", "5", "6", "2", "6", "3B", "2"~
summary(footprint)
##    Country             Region          Population..millions.      HDI        
##  Length:188         Length:188         Min.   :   0.000      Min.   :0.3400  
##  Class :character   Class :character   1st Qu.:   2.038      1st Qu.:0.5575  
##  Mode  :character   Mode  :character   Median :   7.970      Median :0.7200  
##                                        Mean   :  37.342      Mean   :0.6864  
##                                        3rd Qu.:  24.870      3rd Qu.:0.8025  
##                                        Max.   :1408.040      Max.   :0.9400  
##                                                              NA's   :16      
##  GDP.per.Capita     Cropland.Footprint Grazing.Footprint Forest.Footprint
##  Length:188         Min.   :0.0700     Min.   :0.0000    Min.   :0.0100  
##  Class :character   1st Qu.:0.3500     1st Qu.:0.0800    1st Qu.:0.1700  
##  Mode  :character   Median :0.5200     Median :0.1800    Median :0.2600  
##                     Mean   :0.5782     Mean   :0.2632    Mean   :0.3738  
##                     3rd Qu.:0.7000     3rd Qu.:0.3200    3rd Qu.:0.4600  
##                     Max.   :2.6800     Max.   :3.4700    Max.   :3.0300  
##                     NA's   :15         NA's   :15        NA's   :15      
##  Carbon.Footprint Fish.Footprint   Total.Ecological.Footprint    Cropland     
##  Min.   : 0.000   Min.   :0.0000   Min.   : 0.420             Min.   :0.0000  
##  1st Qu.: 0.420   1st Qu.:0.0200   1st Qu.: 1.482             1st Qu.:0.1800  
##  Median : 1.140   Median :0.0700   Median : 2.740             Median :0.3500  
##  Mean   : 1.805   Mean   :0.1225   Mean   : 3.318             Mean   :0.5319  
##  3rd Qu.: 2.600   3rd Qu.:0.1500   3rd Qu.: 4.640             3rd Qu.:0.5900  
##  Max.   :12.650   Max.   :0.8200   Max.   :15.820             Max.   :5.4200  
##  NA's   :15       NA's   :15                                  NA's   :15      
##   Grazing.Land     Forest.Land     Fishing.Water       Urban.Land     
##  Min.   :0.0000   Min.   : 0.000   Min.   : 0.0000   Min.   :0.00000  
##  1st Qu.:0.0300   1st Qu.: 0.060   1st Qu.: 0.0300   1st Qu.:0.03000  
##  Median :0.1200   Median : 0.340   Median : 0.1100   Median :0.05000  
##  Mean   :0.4566   Mean   : 2.459   Mean   : 0.5951   Mean   :0.06711  
##  3rd Qu.:0.3400   3rd Qu.: 1.170   3rd Qu.: 0.3700   3rd Qu.:0.09000  
##  Max.   :8.2300   Max.   :95.160   Max.   :16.0700   Max.   :0.27000  
##  NA's   :15       NA's   :15       NA's   :15        NA's   :15       
##  Total.Biocapacity Biocapacity.Deficit.or.Reserve Earths.Required
##  Min.   :  0.050   Min.   :-14.1400               Min.   :0.240  
##  1st Qu.:  0.675   1st Qu.: -1.9350               1st Qu.:0.855  
##  Median :  1.310   Median : -0.7300               Median :1.580  
##  Mean   :  4.020   Mean   :  0.7021               Mean   :1.916  
##  3rd Qu.:  2.815   3rd Qu.:  0.2125               3rd Qu.:2.678  
##  Max.   :111.350   Max.   :109.0100               Max.   :9.140  
##                                                                  
##  Countries.Required Data.Quality      
##  Min.   :  0.0200   Length:188        
##  1st Qu.:  0.9425   Class :character  
##  Median :  1.7050   Mode  :character  
##  Mean   :  4.0374                     
##  3rd Qu.:  2.8475                     
##  Max.   :159.4700                     
## 
A few points I would like to highlight or improve in this data:
  1. The Region and Data.Quality columns could be converted into factors, since they appear to be categorical with many repeated values.
  2. The GDP.per.Capita column would be more useful as a number. I need to remove the dollar sign and the thousands separator before converting it.
  3. The Data.Quality column may classify rows by completeness. I assume so based on, again, row 5 (Antigua and Barbuda), which has a Data.Quality of 2. Perhaps a low Data.Quality value means less complete data? If so, I could use it to filter out rows with missing values, but this needs verification.
  4. The Total.Ecological.Footprint column seems to be a simple sum of the 5 columns ending in Footprint just before it. I will need to verify this to avoid redundancy.
  5. Similarly, the Total.Biocapacity column seems to be a simple sum of the 5 columns just before it, whose names parallel the assumed Footprint components from the previous point.
  6. The Biocapacity.Deficit.or.Reserve column seems to be Total.Biocapacity minus Total.Ecological.Footprint. This also needs to be checked.
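Point 6 can be spot-checked right away on the six rows printed by head(footprint). This is only a quick sketch on those six rows, not a check of the whole dataset; Test.Deficit and Abs.Diff are helper names introduced here.

```r
# Values copied from the head(footprint) output shown earlier.
check <- data.frame(
  Country = c("Afghanistan", "Albania", "Algeria", "Angola",
              "Antigua and Barbuda", "Argentina"),
  Total.Ecological.Footprint     = c(0.79, 2.21, 2.12, 0.93, 5.38, 3.14),
  Total.Biocapacity              = c(0.50, 1.18, 0.59, 2.55, 0.94, 6.92),
  Biocapacity.Deficit.or.Reserve = c(-0.30, -1.03, -1.53, 1.61, -4.44, 3.78)
)

# Assumed formula: biocapacity minus footprint.
check$Test.Deficit <- check$Total.Biocapacity - check$Total.Ecological.Footprint
check$Abs.Diff     <- abs(check$Biocapacity.Deficit.or.Reserve - check$Test.Deficit)

# Largest absolute gap between the reported and the recomputed value.
max(check$Abs.Diff)
```

On these rows the gap never exceeds about 0.01 gha, which is consistent with rounding in the published values.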

3.3.1 Converting Some Columns into Factors

First, let’s check whether the Region and Data.Quality columns have few enough unique values to be worth converting into factors.

unique(footprint$Region)
## [1] "Middle East/Central Asia" "Northern/Eastern Europe" 
## [3] "Africa"                   "Latin America"           
## [5] "Asia-Pacific"             "European Union"          
## [7] "North America"
unique(footprint$Data.Quality)
## [1] "6"  "5"  "2"  "3B" "3L" "3T" "4"

Both columns have only 7 unique values across the 188 rows of the dataset, so converting them into factors is reasonable.

footprint <- footprint %>% 
  mutate(
    Region = as.factor(Region),
    Data.Quality = as.factor(Data.Quality)
  )

glimpse(footprint$Region)
##  Factor w/ 7 levels "Africa","Asia-Pacific",..: 5 7 1 1 4 4 5 4 2 3 ...
glimpse(footprint$Data.Quality)
##  Factor w/ 7 levels "2","3B","3L",..: 7 7 6 7 1 7 2 1 6 6 ...

The glimpse output above confirms that both columns are now factors with 7 levels each.
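When several columns need the same conversion, dplyr's across() can apply it in one call. The sketch below uses a small made-up data frame named toy rather than the real footprint object.

```r
library(dplyr)

# Toy stand-in for the footprint data frame; values are illustrative only.
toy <- data.frame(
  Country      = c("A", "B", "C", "D"),
  Region       = c("Africa", "Asia-Pacific", "Africa", "European Union"),
  Data.Quality = c("6", "3T", "6", "2"),
  stringsAsFactors = FALSE
)

# Convert both categorical columns at once; Country stays a character column.
toy <- toy %>%
  mutate(across(c(Region, Data.Quality), as.factor))

levels(toy$Region)
table(toy$Data.Quality)
```

table() gives a quick view of how many rows fall into each level, which is often more informative than unique() alone.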

3.3.2 Converting Dollar Values into Numeric

To do this, we can use the gsub function to strip every dollar sign and thousands-separator comma from the GDP.per.Capita column, then convert the result into a numeric column with as.numeric.

We then verify the result by checking the first 6 values and the data type using the head and glimpse functions, respectively.

footprint <- footprint %>% 
  mutate(
    GDP.per.Capita = as.numeric(gsub("[\\$,]","",GDP.per.Capita))
  )

head(footprint$GDP.per.Capita)
## [1]   614.66  4534.37  5430.57  4665.91 13205.10 13540.00
glimpse(footprint$GDP.per.Capita)
##  num [1:188] 615 4534 5431 4666 13205 ...
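To see what the pattern does in isolation, here is a small sketch on a standalone vector. The first three strings are taken from the output above; the empty string is added as an assumed example of a missing GDP value.

```r
# Sample currency strings; the empty string stands in for a missing value.
gdp_raw <- c("$614.66", "$4,534.37", "$13,205.10", "")

# Inside a character class, "$" is literal, so "[$,]" behaves like "[\\$,]".
gdp_clean <- gsub("[$,]", "", gdp_raw)

# Empty strings become NA without a warning.
gdp_num <- as.numeric(gdp_clean)
gdp_num
```

Counting is.na(gdp_num) before and after the conversion is a simple way to confirm that no values turned into NA unexpectedly.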

3.3.3 Data.Quality = Data Completeness?

Does the Data.Quality column reflect the completeness of the whole row? Since we know that some components are missing (shown as NA), we need to count the missing values within each row and compare the count with the Data.Quality value. Let’s use a combination of the rowSums and is.na functions to count the missing values in each row, then sort in descending order by Total.NA.Count and Data.Quality.

footprint %>% 
  mutate(Total.NA.Count = rowSums(is.na(footprint) | footprint == "")) %>% 
  arrange(desc(Total.NA.Count), desc(Data.Quality)) %>% 
  head(10)
##                      Country                  Region Population..millions.  HDI
## 1     British Virgin Islands           Latin America                  0.03   NA
## 2  Wallis and Futuna Islands            Asia-Pacific                  0.01   NA
## 3                      Aruba           Latin America                  0.10   NA
## 4                 Montserrat           Latin America                  0.00   NA
## 5                      Nauru            Asia-Pacific                  0.01   NA
## 6                    Bermuda           North America                  0.06   NA
## 7                     Norway Northern/Eastern Europe                  4.99 0.94
## 8                 Cabo Verde                  Africa                  0.49 0.64
## 9                   Cambodia            Asia-Pacific                 14.86 0.55
## 10                   Estonia          European Union                  1.29 0.85
##    GDP.per.Capita Cropland.Footprint Grazing.Footprint Forest.Footprint
## 1              NA                 NA                NA               NA
## 2              NA                 NA                NA               NA
## 3              NA                 NA                NA               NA
## 4              NA                 NA                NA               NA
## 5              NA                 NA                NA               NA
## 6        70626.30                 NA                NA               NA
## 7       100172.00                 NA                NA               NA
## 8         3801.45                 NA                NA               NA
## 9          877.64                 NA                NA               NA
## 10       17304.40                 NA                NA               NA
##    Carbon.Footprint Fish.Footprint Total.Ecological.Footprint Cropland
## 1                NA             NA                       2.86       NA
## 2                NA             NA                       2.07       NA
## 3                NA             NA                      11.88       NA
## 4                NA             NA                       7.78       NA
## 5                NA             NA                       2.94       NA
## 6                NA             NA                       5.77       NA
## 7                NA             NA                       4.98       NA
## 8                NA             NA                       2.52       NA
## 9                NA             NA                       1.21       NA
## 10               NA             NA                       6.86       NA
##    Grazing.Land Forest.Land Fishing.Water Urban.Land Total.Biocapacity
## 1            NA          NA            NA         NA              2.05
## 2            NA          NA            NA         NA              1.51
## 3            NA          NA            NA         NA              0.57
## 4            NA          NA            NA         NA              1.36
## 5            NA          NA            NA         NA              0.19
## 6            NA          NA            NA         NA              0.13
## 7            NA          NA            NA         NA              8.18
## 8            NA          NA            NA         NA              0.62
## 9            NA          NA            NA         NA              1.09
## 10           NA          NA            NA         NA             10.53
##    Biocapacity.Deficit.or.Reserve Earths.Required Countries.Required
## 1                           -0.81            1.65               1.40
## 2                           -0.56            1.19               1.37
## 3                          -11.31            6.86              20.69
## 4                           -6.42            4.49               5.71
## 5                           -2.76            1.70              15.83
## 6                           -5.64            3.33              44.05
## 7                            3.19            2.88               0.61
## 8                           -1.90            1.46               4.06
## 9                           -0.11            0.70               1.11
## 10                           3.67            3.96               0.65
##    Data.Quality Total.NA.Count
## 1            3T             12
## 2            3T             12
## 3             2             12
## 4             2             12
## 5             2             12
## 6            3T             11
## 7             4             10
## 8            3T             10
## 9            3T             10
## 10           3T             10

It seems we can’t rely on Data.Quality as a filter for data completeness: some rows with a Data.Quality of ‘2’, ‘3T’ or ‘4’ have 10 or more missing values, while other rows with similar values are quite complete. It is also noticeable that 10 of the missing values come from the 5 ‘suspected’ component columns before each of Total.Ecological.Footprint and Total.Biocapacity, and the other 2 come from HDI and GDP.per.Capita. That makes a total of up to 12 missing values per row.
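A grouped summary makes the relationship between Data.Quality and missingness easier to judge than the top-10 listing alone. The sketch below runs on a small invented data frame; with the real data, the same pipeline would be applied to footprint.

```r
library(dplyr)

# Toy stand-in; values are illustrative, not taken from the dataset.
toy <- data.frame(
  Data.Quality       = c("6", "6", "3T", "3T", "2", "4"),
  HDI                = c(0.46, 0.73, NA, 0.64, NA, 0.94),
  Cropland.Footprint = c(0.30, 0.78, NA, NA, NA, NA),
  stringsAsFactors   = FALSE
)

# Count NAs per row over every column except Data.Quality.
toy$NA.Count <- rowSums(is.na(toy[, setdiff(names(toy), "Data.Quality")]))

# Range of missing-value counts within each quality level.
na_by_quality <- toy %>%
  group_by(Data.Quality) %>%
  summarise(rows = n(), min_na = min(NA.Count), max_na = max(NA.Count))

na_by_quality
```

If a quality level showed the same minimum and maximum on the real data, it would track completeness; a wide range within one level would confirm that it does not.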

3.3.4 Verifying Information Redundancy

Rather than relying on the unclear Data.Quality column, I will focus directly on data completeness by creating 3 more columns that count the missing values overall, in the component columns, and in the remaining columns.

footprint <- footprint %>% 
  mutate(Total.NA.Count = rowSums(is.na(footprint) | footprint == ""),
         Components.NA.Count = rowSums(is.na(footprint[,c(6:10,12:16)]) | footprint[,c(6:10,12:16)] == "")) %>% 
  mutate(Other.NA.Count = Total.NA.Count - Components.NA.Count)

head(footprint)
##               Country                   Region Population..millions.  HDI
## 1         Afghanistan Middle East/Central Asia                 29.82 0.46
## 2             Albania  Northern/Eastern Europe                  3.16 0.73
## 3             Algeria                   Africa                 38.48 0.73
## 4              Angola                   Africa                 20.82 0.52
## 5 Antigua and Barbuda            Latin America                  0.09 0.78
## 6           Argentina            Latin America                 41.09 0.83
##   GDP.per.Capita Cropland.Footprint Grazing.Footprint Forest.Footprint
## 1         614.66               0.30              0.20             0.08
## 2        4534.37               0.78              0.22             0.25
## 3        5430.57               0.60              0.16             0.17
## 4        4665.91               0.33              0.15             0.12
## 5       13205.10                 NA                NA               NA
## 6       13540.00               0.78              0.79             0.29
##   Carbon.Footprint Fish.Footprint Total.Ecological.Footprint Cropland
## 1             0.18           0.00                       0.79     0.24
## 2             0.87           0.02                       2.21     0.55
## 3             1.14           0.01                       2.12     0.24
## 4             0.20           0.09                       0.93     0.20
## 5               NA             NA                       5.38       NA
## 6             1.08           0.10                       3.14     2.64
##   Grazing.Land Forest.Land Fishing.Water Urban.Land Total.Biocapacity
## 1         0.20        0.02          0.00       0.04              0.50
## 2         0.21        0.29          0.07       0.06              1.18
## 3         0.27        0.03          0.01       0.03              0.59
## 4         1.42        0.64          0.26       0.04              2.55
## 5           NA          NA            NA         NA              0.94
## 6         1.86        0.66          1.67       0.10              6.92
##   Biocapacity.Deficit.or.Reserve Earths.Required Countries.Required
## 1                          -0.30            0.46               1.60
## 2                          -1.03            1.27               1.87
## 3                          -1.53            1.22               3.61
## 4                           1.61            0.54               0.37
## 5                          -4.44            3.11               5.70
## 6                           3.78            1.82               0.45
##   Data.Quality Total.NA.Count Components.NA.Count Other.NA.Count
## 1            6              0                   0              0
## 2            6              0                   0              0
## 3            5              0                   0              0
## 4            6              0                   0              0
## 5            2             10                  10              0
## 6            6              0                   0              0
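The positional indices c(6:10, 12:16) used above depend on the column order staying fixed. As an alternative sketch, the component columns can be selected by name. The toy data frame below carries only a subset of the real column names, and component_cols is a helper introduced here; on the real data it would list all ten component columns.

```r
# Toy stand-in with a subset of the real column names; values are illustrative.
toy <- data.frame(
  Country                    = c("A", "B"),
  Cropland.Footprint         = c(0.30, NA),
  Carbon.Footprint           = c(0.18, NA),
  Total.Ecological.Footprint = c(0.79, 5.38),
  Cropland                   = c(0.24, NA),
  Urban.Land                 = c(0.04, NA),
  Total.Biocapacity          = c(0.50, 0.94),
  HDI                        = c(0.46, NA)
)

# Name the component columns explicitly instead of relying on their positions.
component_cols <- c("Cropland.Footprint", "Carbon.Footprint", "Cropland", "Urban.Land")

# Count NAs before adding the count columns, so they do not count themselves.
total_na     <- rowSums(is.na(toy))
component_na <- rowSums(is.na(toy[, component_cols]))

toy$Total.NA.Count      <- total_na
toy$Components.NA.Count <- component_na
toy$Other.NA.Count      <- total_na - component_na

toy[, c("Country", "Total.NA.Count", "Components.NA.Count", "Other.NA.Count")]
```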
With that done, we can verify whether the columns follow the assumptions below, since the dataset provider does not give us enough metadata about them:
  1. Total.Ecological.Footprint is the sum of the 5 columns before it.
  2. Total.Biocapacity is the sum of the 5 columns before it.
  3. Earths.Required is the ratio of the country’s footprint to the population-weighted average biocapacity of all countries (or some other coefficient).
  4. Countries.Required is the ratio of the country’s footprint to its own biocapacity (or some other coefficient).

To test these, we first need to filter out the rows (countries) with missing component values.

footprint %>% 
  filter(Components.NA.Count == 0) %>%
  mutate(Test.Total.Footprint = Cropland.Footprint + Grazing.Footprint + Forest.Footprint + Carbon.Footprint + Fish.Footprint,
         Test.Total.Biocapacity = Cropland + Grazing.Land + Forest.Land + Fishing.Water + Urban.Land,
         # Test.Earths.Required = round(Total.Ecological.Footprint / median(footprint$Total.Biocapacity),2),
         Test.Earths.Required = round(Total.Ecological.Footprint / (sum(Total.Biocapacity * Population..millions.)/sum(Population..millions.)),2),
         Test.Countries.Required = round(Total.Ecological.Footprint / Total.Biocapacity,2)) %>% 
  mutate(Diff.Total.Footprint = abs(Total.Ecological.Footprint-Test.Total.Footprint)/Total.Ecological.Footprint,
         Diff.Total.Biocapacity = abs(Total.Biocapacity-Test.Total.Biocapacity)/Total.Biocapacity,
         Diff.Earths.Required = abs(Earths.Required-Test.Earths.Required)/Earths.Required,
         Diff.Countries.Required = abs(Countries.Required-Test.Countries.Required)/Countries.Required) %>% 
  summarise(mean_fp = mean(Diff.Total.Footprint),
            mean_bc = mean(Diff.Total.Biocapacity),
            mean_er = mean(Diff.Earths.Required),
            mean_cr = mean(Diff.Countries.Required)
            )
##     mean_fp     mean_bc    mean_er    mean_cr
## 1 0.0287381 0.006181684 0.01716617 0.00335664

Since the mean relative differences between the actual columns and the assumed formulas are all close to 0 (under 3%), I think we can safely conclude that the assumed formulas are correct.
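A mean can hide a few rows with a large mismatch, so it may also be worth looking at the worst-case relative difference. The sketch below is only an illustration: the `toy` data frame and its values are made up (only the column names mirror the dataset), and it assumes dplyr is available as in the rest of this report.

```r
library(dplyr)

# Made-up rows for illustration only; the column names mirror the dataset.
toy <- tibble(
  Total.Ecological.Footprint = c(2.00, 5.00, 1.00),
  Total.Biocapacity          = c(1.00, 2.50, 4.00),
  Countries.Required         = c(2.00, 2.00, 0.30)
)

toy_check <- toy %>%
  mutate(
    # Same assumed formula as above: footprint divided by own biocapacity
    Test.Countries.Required = round(Total.Ecological.Footprint / Total.Biocapacity, 2),
    Diff.Countries.Required = abs(Countries.Required - Test.Countries.Required) / Countries.Required
  ) %>%
  summarise(
    mean_cr = mean(Diff.Countries.Required),  # average relative difference
    max_cr  = max(Diff.Countries.Required)    # worst single row
  )

toy_check
```

In this toy case two rows match exactly and one is off, so the maximum is three times the mean. On the real data, a maximum that stays small would make the conclusion more convincing.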

4 Data Exploration and Insights

4.1 Footprint / Consumption

hist(footprint$Total.Ecological.Footprint, xlab = "Footprint in Global Hectare (gha)", main = "Spread of Ecological Footprint")

summary(footprint$Total.Ecological.Footprint)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.420   1.482   2.740   3.318   4.640  15.820

The Ecological Footprint does not appear to be normally distributed. From the histogram, we can see that it is strongly right-skewed: most countries fall within 0-5 gha, with a long tail towards higher values. This is better for the world than having most countries on the heavier side.

From the summary statistics, we can see that although the center of the data - represented by the median and mean - sits around 2-3 gha, the maximum value stands out at 15.82 gha.
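The direction of the skew can also be checked numerically. The snippet below is a minimal sketch in base R on made-up values (not the real column); the `skewness()` helper is defined here for illustration and is not a function from any package used in this report.

```r
# Made-up, right-skewed values for illustration (not the real column).
x <- c(0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 4.5, 15.8)

# A mean above the median hints at a long right tail.
mean(x) > median(x)

# Moment-based sample skewness: positive means right-skewed,
# negative means left-skewed, and 0 means symmetric.
skewness <- function(v) {
  v <- v[!is.na(v)]
  m <- mean(v)
  mean((v - m)^3) / (mean((v - m)^2))^(3 / 2)
}

skewness(x)
```

The same two checks could be run on `footprint$Total.Ecological.Footprint`, where the summary above already shows the mean (3.318) sitting above the median (2.740).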

Let’s see which countries are consuming the most of the world’s ecological resources.

footprint_plot1 <- footprint %>%
  arrange(desc(Total.Ecological.Footprint)) %>%
  head(10) %>% 
  mutate(label = glue("{Country}
                      Region = {Region}
                      Footprint = {Total.Ecological.Footprint} gha
                      Population = {Population..millions.} mil")) %>% 
  ggplot(mapping = aes(x=Total.Ecological.Footprint, y=reorder(Country, Total.Ecological.Footprint), text=label, fill = Total.Ecological.Footprint)) + 
  scale_fill_gradient(low = "black", high = "red")+
  geom_col() +
  labs(x="Footprint in Global Hectare (gha)",
       y="Country Name",
       title = "Top 10 Footprint Contributor",
       fill = "Footprint Value (gha)")

ggplotly(footprint_plot1, tooltip="label")

I assumed the top ones would be the countries best known for their technological growth and/or land area, like the USA, China, India, and Japan. Though the USA did make it into the Top 10, other unexpected countries actually leave a larger ecological footprint, namely Luxembourg, Aruba, Qatar, and Australia. Singapore, our neighboring country, also made quite a “contribution”, putting it in the Top 10. Region-wise, the list is also quite spread out. I would have expected Europe or North America to have more countries in the Top 10, but apparently not.

4.2 Biocapacity

hist(footprint$Total.Biocapacity, xlab = "Biocapacity in Global Hectare per Person (gha/person)", main = "Spread of Ecological Biocapacity")

summary(footprint$Total.Biocapacity)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.050   0.675   1.310   4.020   2.815 111.350

The biocapacity of most countries is heavily on the lower side, within 0-10 gha. We can see a small number of outliers on the histogram, and the summary statistics show that the highest value is far away at 100+ gha, while the median is only 1.31 gha. If we want to keep living on this Earth, we should try to make this number grow rather than the ecological footprint, to ensure that we are not using more resources than we are producing.
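When a few values are a hundred times larger than the median, a histogram on the raw scale squeezes almost everything into the first bar. One possible alternative is to bin on a log10 scale. The sketch below uses made-up values that only imitate the range seen in the summary; it is not the real column.

```r
# Made-up values spanning several orders of magnitude (illustration only).
x <- c(0.05, 0.4, 0.7, 1.3, 2.8, 9.5, 111.35)

# Bin on a log10 scale; plot = FALSE returns the counts without drawing.
h <- hist(log10(x), breaks = seq(-2, 3, by = 1), plot = FALSE)

# One row per decade: lower edge, upper edge (in gha) and number of values.
data.frame(from = 10^head(h$breaks, -1),
           to   = 10^tail(h$breaks, -1),
           n    = h$counts)
```

Dropping `plot = FALSE` draws the histogram instead, with each bar covering one power of ten.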

footprint_plot2 <- footprint %>%
  arrange(desc(Total.Biocapacity)) %>%
  head(10) %>% 
  mutate(label = glue("{Country}
                      Region = {Region}
                      Biocapacity = {Total.Biocapacity} gha
                      Population = {Population..millions.} mil")) %>% 
  ggplot(mapping = aes(x=Total.Biocapacity, y=reorder(Country, Total.Biocapacity), text = label, fill = Total.Biocapacity)) + 
  geom_col() +
  labs(x="Biocapacity in Global Hectare (gha)",
       y="Country Name",
       title = "Top 10 Biocapacity Producer",
       fill = "Biocapacity Value (gha)")

ggplotly(footprint_plot2, tooltip="label")

Most countries on this list are not very well known, and come from Latin America or Africa. This may be due to lower population, so less development takes place and fewer natural resources are sacrificed. We can also note that Australia and Canada, which were also in the Top 10 ecological footprint contributors, are on this list too, so we can expect those countries to be quite balanced.

Footprint vs Biocapacity

The dataset also records whether each country is creating an ecological deficit or reserve, obtained by simply subtracting each country’s footprint from its biocapacity. Let’s see the top 10 and the worst 10 countries in this regard.

footprint_plot3 <- footprint %>%
  arrange(desc(Biocapacity.Deficit.or.Reserve)) %>%
  head(10) %>% 
  mutate(label = glue("{Country}
                      Region = {Region}
                      Biocapacity-Footprint = {Biocapacity.Deficit.or.Reserve} gha
                      Population = {Population..millions.} mil")) %>% 
  ggplot(mapping = aes(x=Biocapacity.Deficit.or.Reserve, y=reorder(Country, Biocapacity.Deficit.or.Reserve), text = label, fill = Biocapacity.Deficit.or.Reserve)) + 
  geom_col() +
  labs(x="Deficit/Reserve in Global Hectare per Person(gha)",
       y="Country Name",
       title = "Top 10 Global Ecological Preserver",
       fill = "Ecological Reserve (gha)")

ggplotly(footprint_plot3, tooltip="label")

Most countries on this list also appear in the top 10 biocapacity producers. We hope that this is due to green initiatives, but once again it could also be due to a small population, which means little development is happening in the country. It could be helpful to learn which actions of the top 5 countries lead to their higher biocapacity and lower footprint.

Two notable countries here are Canada and Bolivia: both have a relatively large population for this list, yet still help to preserve the Earth.

footprint_plot4 <- footprint %>%
  arrange(Biocapacity.Deficit.or.Reserve) %>%
  head(10) %>% 
  mutate(label = glue("{Country}
                      Region = {Region}
                      Biocapacity-Footprint = {Biocapacity.Deficit.or.Reserve} gha/person
                      Population = {Population..millions.} mil")) %>% 
  ggplot(mapping = aes(x=Biocapacity.Deficit.or.Reserve, y=reorder(Country, Biocapacity.Deficit.or.Reserve), text = label, fill = Biocapacity.Deficit.or.Reserve)) + 
  geom_col() +
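  # All values in this plot are negative, so a sequential scale is used:
  # the largest deficit maps to red, the smallest to black.
  scale_fill_gradient(low = "red", high = "black")+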
  labs(x="Deficit/Reserve in Global Hectare per Person(gha/person)",
       y="Country Name",
       title = "Worst 10 Global Ecological Preserver",
       fill = "Ecological Deficit (gha)")

ggplotly(footprint_plot4, tooltip="label")

Continuing the insight from before, this list is very similar to the Top 10 Footprint Contributor list. The regions are quite spread out. These countries need more green initiatives and should hold back development that sacrifices the environment.

4.3 Region Groups

footprint_plot5 <- footprint %>% 
  ggplot(mapping = aes(y=Region, x=Total.Ecological.Footprint)) +
  geom_boxplot(fill = "red") +
  labs(title = "Ecological Footprint in Regions",
       x = "Footprint (gha)",
       y = "Region Name")

footprint_plot5
  • The highest median Footprint is seen in the North America region (around 7-8 gha), while the lowest is in Africa (around 1-2 gha).
  • The highest individual country appears as an outlier in the European Union region (around 15-16 gha).
  • There are few outliers, so most of the data lies within the lower and upper fences (1.5 times the interquartile range beyond the box).
  • Judging from the interquartile range, the Middle East / Central Asia data has a longer range and is therefore quite varied, while the country data in Africa and North America is less varied.
  • The data in North America seems to be skewed to the right based on the median position inside the interquartile range, showing that the values within the region spread towards the higher side. Meanwhile, the values lean towards the lower side for Africa, Latin America, Middle East, and Northern/Eastern Europe.
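
The fences mentioned above follow the 1.5 x IQR rule, which is the default (`coef = 1.5`) in `geom_boxplot()`. As a rough sketch of how a point ends up drawn as an outlier, here is the calculation in base R on made-up values for one hypothetical region:

```r
# Made-up footprint values for one hypothetical region (illustration only).
x <- c(1.2, 1.5, 1.8, 2.0, 2.3, 2.6, 3.0, 9.5)

q     <- quantile(x, probs = c(0.25, 0.75))  # edges of the box
iqr   <- q[[2]] - q[[1]]                     # interquartile range
lower <- q[[1]] - 1.5 * iqr                  # lower fence
upper <- q[[2]] + 1.5 * iqr                  # upper fence

# Values outside the fences are the ones drawn as individual points.
x[x < lower | x > upper]
```

Only 9.5 falls outside the fences here, so it would be the single outlier dot on the boxplot.
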
footprint_plot6 <- footprint %>% 
  ggplot(mapping = aes(y=Region, x=Total.Biocapacity)) +
  geom_boxplot(fill = "cyan")

footprint_plot6

  • The highest median Biocapacity is seen in the North America region, while the lowest is not clearly distinguishable among the other regions.
  • The highest individual country values appear as at least 3 outliers in the Latin America region (more than 60 gha).
  • Judging from the interquartile range, the Biocapacity data in North America is quite varied, while the other regions seem to be much less varied.
footprint_region <- footprint %>% 
  filter(Other.NA.Count==0) %>% 
  group_by(Region) %>% 
  summarise(Mean.HDI = mean(HDI),
            Sum.Pop.Mil = sum(Population..millions.),
            Sum.Footprint = -1*sum(Total.Ecological.Footprint),
            Sum.Biocap = sum(Total.Biocapacity),
            Sum.Diff = sum(Biocapacity.Deficit.or.Reserve)
            ) %>% 
  mutate(label = glue("{Region}
                      Ecology Deficit/Reserve = {Sum.Diff} gha
                      Total Footprint = {Sum.Footprint} gha
                      Total Biocapacity = {Sum.Biocap} gha
                      Average HDI = {round(Mean.HDI,2)}
                      Total Population = {Sum.Pop.Mil} Mil
                      "))

footprint_region %>% select(-label)
## # A tibble: 7 x 6
##   Region                  Mean.HDI Sum.Pop.Mil Sum.Footprint Sum.Biocap Sum.Diff
##   <fct>                      <dbl>       <dbl>         <dbl>      <dbl>    <dbl>
## 1 Africa                     0.516       1004.         -80.7      114.     33.7 
## 2 Asia-Pacific               0.688       3855.         -83.5       82.0    -1.56
## 3 European Union             0.865        504.        -142.        94.9   -47.5 
## 4 Latin America              0.720        604.        -100.       256.    156.  
## 5 Middle East/Central As~    0.732        384.         -91.7       21.8   -70.0 
## 6 North America              0.91         352.         -16.4       19.8     3.37
## 7 Northern/Eastern Europe    0.788        238.         -45.2       34.6   -10.6
footprint_region2 <- footprint_region %>% 
  pivot_longer(cols = c("Sum.Footprint", "Sum.Biocap"))

footprint_region2[footprint_region2$name =="Sum.Biocap", ]$name <- "Biocapacity" 
footprint_region2[footprint_region2$name =="Sum.Footprint", ]$name <- "Footprint" 

footprint_region_plot <- footprint_region2 %>% 
  ggplot(mapping = aes(x=value, y=reorder(Region,Sum.Diff), fill=name, text=label))+
  geom_bar(position = "stack", stat = "identity")+
  labs(title = "Regions Sorted by Ecological Contribution",
       y="Region Name",
       x="Global Hectare (gha)",
       fill = "Type of Contribution")

ggplotly(footprint_region_plot, tooltip="label")

As we have seen from the Top 10 and Worst 10 in the previous explorations, Latin America and Africa are the top 2 regions with positive contributions towards the Earth’s ecological reserve. North America is only slightly on the positive side, while the other 4 are on the negative side, causing a deficit. European Union and Middle East/Central Asia are the worst 2 regions comparatively.
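One caveat on the regional sums: if the footprint and biocapacity columns are per-person values (as the gha/person axis labels in this report suggest), then adding them up across countries gives a small country the same weight as a large one. A possible alternative is to multiply by population first. The sketch below shows the idea on made-up countries; the `toy` data, the region names and the `*.Mgha` column names are all hypothetical.

```r
library(dplyr)

# Made-up countries for illustration; column names mirror the dataset.
toy <- tibble(
  Region                     = c("A", "A", "B"),
  Population..millions.      = c(100, 1, 10),
  Total.Ecological.Footprint = c(1, 10, 4),   # assumed gha per person
  Total.Biocapacity          = c(2, 3, 5)     # assumed gha per person
)

toy_region <- toy %>%
  group_by(Region) %>%
  summarise(
    # per-person value x population (millions) = total in million gha
    Footprint.Mgha = sum(Total.Ecological.Footprint * Population..millions.),
    Biocap.Mgha    = sum(Total.Biocapacity * Population..millions.),
    Balance.Mgha   = Biocap.Mgha - Footprint.Mgha
  )

toy_region
```

In region A, a plain sum of the per-person values would give a deficit (5 against 11), while the population-weighted totals show a reserve (203 against 110 million gha), because the large country dominates. Whether this changes the regional ranking on the real data would need to be checked.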

4.4 Correlations

4.4.1 Between 2 Main Metrics: Footprint and Biocapacity

plot(footprint$Total.Ecological.Footprint, footprint$Total.Biocapacity, main = "Footprint vs Biocapacity in Each Country", xlab = "Footprint (gha)", ylab = "Biocapacity (gha)")
abline(lm(footprint$Total.Biocapacity~footprint$Total.Ecological.Footprint), col="red")

cor(footprint$Total.Ecological.Footprint, footprint$Total.Biocapacity)
## [1] 0.06658034

The trend line is almost flat, which matches the correlation value of 0.07: positive, but very weak. Therefore we can say that these two metrics show almost no linear correlation with each other. Another insight is that the points are concentrated at lower values of biocapacity compared to the footprint, as seen from the points spread close to the X-axis in the plot.
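Pearson's correlation (the default in `cor()`) is sensitive to a few extreme values, and Biocapacity has some (a maximum of 111.35 against a median of 1.31). A rank-based Spearman correlation can serve as a cross-check. The sketch below uses made-up vectors with one extreme point, just to show how the two measures can disagree:

```r
# Made-up data: y rises with x, except for one extreme point at the end.
x <- 1:8
y <- c(1, 2, 3, 4, 5, 6, 7, -40)

r_pearson  <- cor(x, y)                       # default: method = "pearson"
r_spearman <- cor(x, y, method = "spearman")  # based on ranks only

c(pearson = r_pearson, spearman = r_spearman)
```

Here the single extreme point drags Pearson's value below zero, while Spearman's stays positive. On the real columns, the equivalent call would be `cor(footprint$Total.Ecological.Footprint, footprint$Total.Biocapacity, method = "spearman")`.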

4.4.2 Including Population

footprint_pophdi_notna <- footprint %>% 
  filter(Other.NA.Count == 0)

plot(footprint_pophdi_notna$Total.Ecological.Footprint, footprint_pophdi_notna$Population..millions., main = "Footprint vs Population in Each Country", xlab = "Footprint (gha)", ylab = "Population (million)")
abline(lm(footprint_pophdi_notna$Population..millions. ~ footprint_pophdi_notna$Total.Ecological.Footprint), col="red")

cor(footprint_pophdi_notna$Total.Ecological.Footprint, footprint_pophdi_notna$Population..millions.)
## [1] -0.05402079

Though it is negative, Population also seems to have a very weak correlation with the Footprint of a country, shown by an almost flat line and a negative correlation value that is close to 0. Looking at the spread, most countries have a rather low Population, shown by the data points sitting heavily towards the X-axis.

plot(footprint_pophdi_notna$Total.Biocapacity, footprint_pophdi_notna$Population..millions., main = "Biocapacity vs Population in Each Country", xlab = "Biocapacity (gha)", ylab = "Population (mil)")
abline(lm(footprint_pophdi_notna$Population..millions. ~ footprint_pophdi_notna$Total.Biocapacity), col="blue")

cor(footprint_pophdi_notna$Total.Biocapacity, footprint_pophdi_notna$Population..millions.)
## [1] -0.05743381

As with the Footprint, we can see a similarly negative and very weak correlation between Biocapacity and Population.

4.4.3 Including Human Development Index (HDI)

plot(footprint_pophdi_notna$Total.Ecological.Footprint, footprint_pophdi_notna$HDI, main = "Footprint vs Human Development Index (HDI) in Each Country", xlab = "Footprint (gha)", ylab = "HDI Index (0.0-1.0)")
abline(lm(footprint_pophdi_notna$HDI ~ footprint_pophdi_notna$Total.Ecological.Footprint), col="red")

cor(footprint_pophdi_notna$Total.Ecological.Footprint, footprint_pophdi_notna$HDI)
## [1] 0.7388287

I expected this one to have a better chance of a strong positive correlation, since a higher HDI could be interpreted as better and larger development for the people, and therefore a larger Ecological Footprint, and apparently it is so. We can see a positive and quite strong correlation between HDI and the Ecological Footprint of a country, with a correlation value of 0.7388. A linear trend line might not be the most suitable for this relationship, since a curved line could represent the data better, but it is still quite a good fit. Therefore we can say that countries with a higher HDI tend to have a higher Footprint, and vice versa, although a correlation by itself does not show that one causes the other.

As for the data spread, most countries seem to fall within the lower range of Footprint, around 0-5 gha.
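If the relationship really is curved, one option is to transform one of the variables before fitting a straight line. The sketch below uses made-up data in which the footprint grows exponentially with HDI; that shape is an assumption chosen for illustration, not a finding about the real dataset.

```r
# Made-up data: footprint grows exponentially with HDI (illustration only).
hdi           <- seq(0.3, 0.9, by = 0.1)
footprint_toy <- exp(4 * hdi - 1)

r_raw <- cor(hdi, footprint_toy)       # straight line on the raw scale
r_log <- cor(hdi, log(footprint_toy))  # straight line after a log transform

c(raw = r_raw, log = r_log)
```

On this toy data the log transform makes the relationship exactly linear, so `r_log` is higher than `r_raw`. Trying `log(Total.Ecological.Footprint)` on the real data would show whether the same applies there.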

plot(footprint_pophdi_notna$Total.Biocapacity, footprint_pophdi_notna$HDI, main = "Biocapacity vs Human Development Index (HDI) in Each Country", xlab = "Biocapacity (gha)", ylab = "HDI Index (0.0-1.0)")
abline(lm(footprint_pophdi_notna$HDI ~ footprint_pophdi_notna$Total.Biocapacity), col="blue")

cor(footprint_pophdi_notna$Total.Biocapacity, footprint_pophdi_notna$HDI)
## [1] 0.07693505

From the scatter plot, we can see a positive yet very weak correlation between HDI and Biocapacity. Most of the data is once again concentrated at low Biocapacity values, since the data points sit close to the Y-axis.

5 Conclusions

  1. Looking globally, the Ecological Footprint and Biocapacity values of the countries are concentrated at the lower end, with a long tail of high values.
  2. Australia and Canada both consume and produce significant ecological resources, since both appear in the Top 10 lists of Footprint Contributors and Biocapacity Producers.
  3. The Latin America and Africa regions produce a large amount of Ecological Biocapacity compared to the rest.
  4. There is a strong, positive correlation between the Human Development Index (HDI) and the Ecological Footprint of each country.
  5. Most countries were recorded to have a lower Biocapacity than Footprint, excluding some extreme cases. This needs to be improved so that we can avoid the global warming and extreme climate change that seem to be happening.

6 References and About Me

6.2 About Me

Hi! My name is Calvin, and I am from Jakarta, Indonesia. I am looking forward to becoming a full-time data analyst and/or data scientist. I have a background in Mathematics and Computer Science from my Bachelor’s Degrees, and I love playing with numbers and data. I am doing this to enhance my Data Science portfolio (constructive criticism is very much welcome!), and also as part of the Learn-By-Building assignment at Algoritma Data Science School.

You can reach me at my LinkedIn for more discussion. Thank you!