Introduction

The Ecological Footprint measures the ecological assets that a given population requires to produce the natural resources it consumes (including plant-based food and fiber products, livestock and fish products, timber and other forest products, and space for urban infrastructure) and to absorb its waste, especially carbon emissions.

This is an exploratory data analysis of the dataset.

I would like to answer questions such as: Which countries rank highest in Total Ecological Footprint and Total Biocapacity? I would also like to look for trends in Population, Total Ecological Footprint and Total Biocapacity within each Region, in relation to its HDI.

This is a Work In Progress. I hope you like it.

Read the Data
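
The loading code is not shown here, so this is only a minimal sketch of how the table might be read, assuming a CSV source. The inline text stands in for the file, and the country names and values are invented for illustration. `check.names = FALSE` keeps column names such as `Total Ecological Footprint` intact, so the backtick references used in the rest of the analysis still work.

```r
# Hypothetical stand-in for the CSV file; the rows are made-up sample values.
csv_text <- 'Country,Region,Population (millions),HDI,Total Ecological Footprint,Total Biocapacity
Aland,Region A,10.5,0.80,4.2,3.1
Borduria,Region B,2.3,0.55,1.1,2.4'

# check.names = FALSE preserves spaces and parentheses in the column names.
country <- read.csv(text = csv_text, check.names = FALSE, stringsAsFactors = FALSE)

names(country)
```

With a real file, `text = csv_text` would be replaced by the file path.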

Check our Data Variables
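
The inspection code is not shown either. A minimal sketch of a first look at the variables, using a toy data frame with invented values in place of `country`: `str()` lists the column types and `colSums(is.na(...))` counts the missing values per column, which matters later because `median()` and `mean()` return `NA` unless `na.rm = TRUE` is set.

```r
# Toy data frame standing in for `country`; values are invented.
country <- data.frame(
  Country = c("Aland", "Borduria", "Carpania"),
  Region = c("Region A", "Region B", "Region A"),
  HDI = c(0.80, NA, 0.62),
  `Total Ecological Footprint` = c(4.2, 1.1, NA),
  check.names = FALSE
)

str(country)                           # column names and types
na_counts <- colSums(is.na(country))   # number of missing values per column
na_counts
```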

Country and Total Ecological Footprint

Country and Ecological footprint

We examine the 20 countries that rank highest in total ecological footprint.

# Top 20 countries by median Total Ecological Footprint
country %>%
  group_by(Country) %>%
  summarise(EcoFootprintMedian = median(`Total Ecological Footprint`, na.rm = TRUE)) %>%
  ungroup() %>%
  mutate(Country = reorder(Country, EcoFootprintMedian)) %>%
  arrange(desc(EcoFootprintMedian)) %>%
  head(20) %>%
  ggplot(aes(x = Country, y = EcoFootprintMedian, fill = Country)) +
  geom_col(color = "white") +
  # value label drawn inside each bar
  geom_text(aes(x = Country, y = 1, label = paste0("(", EcoFootprintMedian, ")")),
            hjust = 0, vjust = .5, size = 4, colour = 'blue',
            fontface = 'bold') +
  labs(x = 'Country',
       y = 'Median Eco Footprint',
       title = 'Countries With Highest EcoFootprint') +
  coord_flip() +
  theme(axis.text.x = element_text(angle = 90),
        plot.title = element_text(size = 8),
        panel.background = element_rect(fill = "black"),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        legend.position = "none")  # hide the redundant fill legend
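
If the table holds one row per country (an assumption, not something the text confirms), the per-country median equals the raw value, and the ranking step can be written more compactly. A sketch with invented values, using `slice_max()` from dplyr 1.0.0 or later:

```r
library(dplyr)

# Toy data: one row per country (assumed); names and values are invented.
country <- tibble(
  Country = c("Aland", "Borduria", "Carpania", "Dalmora"),
  `Total Ecological Footprint` = c(4.2, 1.1, 7.9, NA)
)

top_footprint <- country %>%
  # keep the n largest values; missing values sort last
  slice_max(`Total Ecological Footprint`, n = 2) %>%
  # order the factor levels by value so the bars plot in ranked order
  mutate(Country = reorder(Country, `Total Ecological Footprint`))

top_footprint
```

The resulting table can be piped into the same `ggplot()` call, with `Total Ecological Footprint` in place of the median column.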

Country and Total Biocapacity

We examine the 20 countries that rank highest in total biocapacity.

# Top 20 countries by median Total Biocapacity
country %>%
  group_by(Country) %>%
  summarise(BiocapacityMedian = median(`Total Biocapacity`, na.rm = TRUE)) %>%
  ungroup() %>%
  mutate(Country = reorder(Country, BiocapacityMedian)) %>%
  arrange(desc(BiocapacityMedian)) %>%
  head(20) %>%
  ggplot(aes(x = Country, y = BiocapacityMedian, fill = Country)) +
  geom_col(color = "white") +
  # value label drawn inside each bar
  geom_text(aes(x = Country, y = 1, label = paste0("(", BiocapacityMedian, ")")),
            hjust = 0, vjust = .5, size = 4, colour = 'blue',
            fontface = 'bold') +
  labs(x = 'Country',
       y = 'Median Biocapacity',
       title = 'Countries With Highest Biocapacity') +
  coord_flip() +
  theme(axis.text.x = element_text(angle = 90),
        plot.title = element_text(size = 8),
        panel.background = element_rect(fill = "black"),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        legend.position = "none")  # hide the redundant fill legend

HDI, Region, Ecofootprint and Population

HDI, Region, Total Ecological Footprint and Population

We will explore the relationship between Population and Total Ecological Footprint for each combination of Region and HDI.

by_region <- country %>%
  group_by(HDI, Region) %>%
  # na.rm = TRUE so a single missing value does not turn a group's median into NA
  summarize(EcofootprintMedian = median(`Total Ecological Footprint`, na.rm = TRUE),
            PopMedian = median(`Population (millions)`, na.rm = TRUE))

# Plot EcofootprintMedian against HDI, coloured by Region and sized by median population
ggplot(by_region, aes(x = HDI, y = EcofootprintMedian, color = Region, size = PopMedian)) +
  geom_point() +
  theme(axis.text.x = element_text(angle = 90),
        plot.title = element_text(size = 8),
        panel.background = element_rect(fill = "black"),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank())
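
Since HDI is a continuous index, grouping by its exact value tends to give groups of one or two countries, so the medians are close to the raw values. One alternative, which is not what the analysis above does, is to bin HDI into bands before grouping. A minimal sketch with invented data; the cut points and band labels are illustrative assumptions:

```r
library(dplyr)

# Toy data standing in for `country`; values are invented.
country <- tibble(
  Region = c("Region A", "Region A", "Region B", "Region B", "Region B"),
  HDI = c(0.45, 0.52, 0.71, 0.78, 0.91),
  `Total Ecological Footprint` = c(1.0, 1.4, 3.0, 5.0, 8.0),
  `Population (millions)` = c(20, 5, 60, 8, 30)
)

by_band <- country %>%
  # left-closed bands: [0, 0.55), [0.55, 0.70), [0.70, 0.80), [0.80, 1]
  mutate(HDIBand = cut(HDI,
                       breaks = c(0, 0.55, 0.70, 0.80, 1),
                       labels = c("low", "medium", "high", "very high"),
                       right = FALSE, include.lowest = TRUE)) %>%
  group_by(HDIBand, Region) %>%
  summarise(EcofootprintMedian = median(`Total Ecological Footprint`, na.rm = TRUE),
            PopMedian = median(`Population (millions)`, na.rm = TRUE),
            .groups = "drop")

by_band
```

Each row of `by_band` now summarises several countries, so the medians carry more information than they do with one group per HDI value.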

HDI, Region, Biocapacity and Population

We will explore the relationship between Population and Total Biocapacity for each combination of Region and HDI.

by_region <- country %>%
  group_by(HDI, Region) %>%
  # na.rm = TRUE so a single missing value does not turn a group's median into NA
  summarize(BiocapacityMedian = median(`Total Biocapacity`, na.rm = TRUE),
            MedianPop = median(`Population (millions)`, na.rm = TRUE))

# Plot BiocapacityMedian against HDI, coloured by Region and sized by median population
ggplot(by_region, aes(x = HDI, y = BiocapacityMedian, color = Region, size = MedianPop)) +
  geom_point() +
  theme(axis.text.x = element_text(angle = 90),
        plot.title = element_text(size = 8),
        panel.background = element_rect(fill = "black"),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank())

Comparing Total Biocapacities across Regions

Comparing total biocapacities across Regions

I would like to compare total biocapacities across Regions.

ggplot(country, aes(x = Region, y = `Total Biocapacity`, color = Region)) +
  geom_boxplot(fill = colors) +  # `colors` is assumed to be a fill vector defined elsewhere in the notebook
  geom_jitter() +                # overlay the individual countries
  scale_y_log10() +
  ggtitle("Comparing Total Biocapacity across Regions") +
  theme(axis.text.x = element_text(angle = 90),
        plot.title = element_text(size = 8),
        panel.background = element_rect(fill = "black"),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        legend.position = "none")

Comparing Total Ecofootprints across Regions

I would like to compare total ecological footprints across Regions.

ggplot(country, aes(x = Region, y = `Total Ecological Footprint`, color = Region)) +
  geom_boxplot(fill = colors) +  # `colors` is assumed to be a fill vector defined elsewhere in the notebook
  geom_jitter() +                # overlay the individual countries
  scale_y_log10() +
  ggtitle("Comparing Total Ecofootprints across Regions") +
  theme(axis.text.x = element_text(angle = 90),
        plot.title = element_text(size = 8),
        panel.background = element_rect(fill = "black"),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        legend.position = "none")

Comparing GDP per Capita across Regions

I would like to compare GDP per capita across Regions.

ggplot(country, aes(x = Region, y = `GDP per Capita`, color = Region)) +
  geom_boxplot(fill = colors) +  # `colors` is assumed to be a fill vector defined elsewhere in the notebook
  geom_jitter() +                # overlay the individual countries
  scale_y_log10() +
  ggtitle("Comparing GDP per Capita across Regions") +
  theme(axis.text.x = element_text(angle = 90),
        plot.title = element_text(size = 8),
        panel.background = element_rect(fill = "black"),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        legend.position = "none")
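
The log scale and the medians used here need `GDP per Capita` to be numeric. If the raw column is stored as text with a currency symbol and thousands separators (an assumption about the source file, with invented values below), a small conversion step along these lines would be needed first:

```r
# Hypothetical raw values; the "$1,234.56" format is an assumption.
gdp_raw <- c("$950.50", "$12,345.60", NA, "$57,600.00")

# Strip dollar signs and thousands separators, then convert to numeric.
gdp_num <- as.numeric(gsub("[$,]", "", gdp_raw))

gdp_num
```

Missing entries stay `NA` after the conversion, so `na.rm = TRUE` is still needed in later summaries.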

HDI, Region, GDP per Capita and Population

We will explore the relationship between Population and GDP per Capita for each combination of Region and HDI.

by_region <- country %>%
  group_by(HDI, Region) %>%
  # na.rm = TRUE so a single missing value does not turn a group's median into NA
  summarize(GDPperCapitaMedian = median(`GDP per Capita`, na.rm = TRUE),
            MedianPop = median(`Population (millions)`, na.rm = TRUE))

# Plot GDPperCapitaMedian against HDI, coloured by Region and sized by median population
ggplot(by_region, aes(x = HDI, y = GDPperCapitaMedian, color = Region, size = MedianPop)) +
  geom_point() +
  theme(axis.text.x = element_text(angle = 90),
        plot.title = element_text(size = 8),
        panel.background = element_rect(fill = "black"),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank())

Distribution of Different Variables

Distribution of Population

ggplot(country, aes(`Population (millions)`)) +
  geom_histogram(alpha = 0.8, fill = "orange2") +
  # vertical line at the arithmetic mean of the raw values, ignoring missing values
  geom_vline(xintercept = mean(country$`Population (millions)`, na.rm = TRUE),
             col = "blue", size = 1.5) +
  scale_x_log10() +
  labs(x = 'Population (millions)', y = 'Count', title = 'Distribution of Population') +
  theme(axis.text.x = element_text(angle = 90),
        plot.title = element_text(size = 8),
        panel.background = element_rect(fill = "black"),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank())
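
One thing to keep in mind when reading the blue line: it marks the arithmetic mean of the raw values, while the x-axis is log-scaled. For right-skewed data the arithmetic mean sits well to the right of the bulk of the histogram, and the geometric mean or the median is usually closer to its visual centre. A small sketch with an invented sample:

```r
# Invented, right-skewed sample standing in for `Population (millions)`.
pop <- c(0.5, 1, 2, 4, 8, 16, 1300)

arith_mean <- mean(pop, na.rm = TRUE)             # pulled up by the largest value
geo_mean   <- exp(mean(log(pop), na.rm = TRUE))   # mean taken on the log scale
med        <- median(pop, na.rm = TRUE)

c(mean = arith_mean, geometric_mean = geo_mean, median = med)
```

Passing `geo_mean` or `med` as `xintercept` instead would be one way to mark the centre of a log-scaled histogram.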

Distribution of GDP per Capita

ggplot(country, aes(`GDP per Capita`)) +
  geom_histogram(alpha = 0.8, fill = "orange2") +
  labs(x = 'GDP per Capita', y = 'Count', title = 'Distribution of GDP per Capita') +
  theme(axis.text.x = element_text(angle = 90),
        plot.title = element_text(size = 8),
        panel.background = element_rect(fill = "black"),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank())

Distribution of Total Ecofootprint

ggplot(country, aes(`Total Ecological Footprint`)) +
  geom_histogram(alpha = 0.8, fill = "yellow2") +
  # vertical line at the arithmetic mean of the raw values, ignoring missing values
  geom_vline(xintercept = mean(country$`Total Ecological Footprint`, na.rm = TRUE),
             col = "blue", size = 1.5) +
  scale_x_log10() +
  labs(x = 'Total Ecofootprint', y = 'Count', title = 'Distribution of Total Ecofootprint') +
  theme(axis.text.x = element_text(angle = 90),
        plot.title = element_text(size = 8),
        panel.background = element_rect(fill = "black"),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank())

Distribution of Total Biocapacity

ggplot(country, aes(`Total Biocapacity`)) +
  geom_histogram(alpha = 0.8, fill = "yellow2") +
  # vertical line at the arithmetic mean of the raw values, ignoring missing values
  geom_vline(xintercept = mean(country$`Total Biocapacity`, na.rm = TRUE),
             col = "blue", size = 1.5) +
  scale_x_log10() +
  labs(x = 'Total Biocapacity', y = 'Count', title = 'Distribution of Total Biocapacity') +
  theme(axis.text.x = element_text(angle = 90),
        plot.title = element_text(size = 8),
        panel.background = element_rect(fill = "black"),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank())

Distribution of Region and Different Ecofootprint variables

Distribution of Each Eco-Footprint variable

ggplot(country1, aes(x = HDI, y = `Total Ecological Footprint`, color = Region, size = Area)) +
  geom_point() +
  facet_wrap(~Footprint) +   # one panel per footprint component
  scale_y_log10() +
  labs(title = "Distribution of Each Eco-footprint") +
  theme(axis.text.x = element_text(angle = 90),
        plot.title = element_text(size = 12),
        panel.background = element_rect(fill = "black"),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        legend.position = "bottom")

The faceted plots above show, for each footprint component, Total Ecological Footprint plotted against the Human Development Index (HDI), and suggest a strong relationship between the two variables. Colour indicates Region and point size indicates area. The European Union, Middle East/Central Asia and North America regions have high total ecological footprints across all 5 footprint components and also rank high in HDI. From the plot, the least developed regions (Africa, Asia-Pacific, Middle East/Central Asia) show lower land areas for each individual footprint. Something has to be done here, I guess, to improve these numbers.
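
The plot relies on a long-format table, `country1`, with a `Footprint` column naming the component and an `Area` column holding its value; the code that builds it is not shown. A minimal sketch of how such a table might be derived with `tidyr::pivot_longer()`. The component column names and all values are assumptions for illustration, and the result is called `country_long` so it is not mistaken for the original `country1`:

```r
library(dplyr)
library(tidyr)

# Toy wide table; column names and values are assumptions for illustration.
country <- tibble(
  Country = c("Aland", "Borduria"),
  Region = c("Region A", "Region B"),
  HDI = c(0.80, 0.55),
  `Total Ecological Footprint` = c(4.2, 1.1),
  `Cropland Footprint` = c(0.9, 0.4),
  `Carbon Footprint` = c(2.8, 0.3)
)

# One row per country and footprint component, as facet_wrap(~Footprint) expects.
country_long <- country %>%
  pivot_longer(cols = c(`Cropland Footprint`, `Carbon Footprint`),
               names_to = "Footprint", values_to = "Area")

country_long
```

The same reshaping, applied to the biocapacity component columns with `names_to = "Biocapacity"`, would give the layout used for the biocapacity facets.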

Distribution of Each Biocapacity variable

ggplot(country1, aes(x = HDI, y = `Total Biocapacity`, color = Region, size = Area)) +
  geom_point() +
  facet_wrap(~Biocapacity) +   # one panel per biocapacity component
  scale_y_log10() +
  labs(title = "Distribution of Each Biocapacity") +
  theme(axis.text.x = element_text(angle = 90),
        plot.title = element_text(size = 12),
        panel.background = element_rect(fill = "black"),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        legend.position = "bottom")

The faceted plots above show, for each biocapacity component, Total Biocapacity plotted against the Human Development Index (HDI), and suggest only a weak relationship between the two variables. Colour indicates Region and point size indicates area. The European Union, Middle East/Central Asia and North America regions still rank high in HDI, but they are joined by the Asia-Pacific region, which has more land area and ranks high in HDI yet has low total biocapacity. The Africa region still ranks low in HDI but has total biocapacity almost equal to that of the more advanced regions.
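
The "strong" and "weak" relationships described above are read off the plots by eye. One way to put a number on them is a rank correlation between HDI and each total. A minimal sketch with invented vectors in place of the real columns; Spearman's rho is my choice here because it does not assume a linear relationship:

```r
# Invented values standing in for HDI, Total Ecological Footprint and Total Biocapacity.
hdi         <- c(0.40, 0.50, 0.60, 0.70, 0.80, 0.90, NA)
footprint   <- c(0.9, 1.2, 2.0, 3.1, 5.0, 6.5, 2.2)
biocapacity <- c(2.5, 0.8, 3.0, 1.1, 2.9, 1.0, 4.0)

# use = "complete.obs" drops pairs in which either value is missing.
rho_footprint   <- cor(hdi, footprint,   method = "spearman", use = "complete.obs")
rho_biocapacity <- cor(hdi, biocapacity, method = "spearman", use = "complete.obs")

c(footprint = rho_footprint, biocapacity = rho_biocapacity)
```

In this toy sample the footprint rises steadily with HDI while the biocapacity does not, which mirrors the pattern described above. The real columns would have to be substituted to check whether the data support it.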