Introduction

The Global Deforestation dataset was retrieved from TidyTuesday and was originally acquired from OurWorldInData.org (Hannah Ritchie and Max Roser, 2021). The purpose of this project is to take a close look at global deforestation trends from 1990 to 2015. The data consist of four datasets: forest, forest_area, brazil_loss, and soybean_use. The forest dataset records the net change in forest area (net forest conversion, in hectares) for each country in 1990, 2000, 2010, and 2015, along with the country code and year. The forest_area dataset contains the change in global forest area, expressed as a percentage of the total global forest area. The brazil_loss dataset contains Brazilian forest loss broken down by cause, such as flooding, mining, pasture, and others. The soybean_use dataset contains soybean production and use by year and country.

# Loading the packages
library(RColorBrewer)
library(modelsummary)
library(plotly)
library(gganimate)
library(ggplot2)
library(animation)
library(data.table)
library(kableExtra)
library(ggthemes)
library(readr)
library(stringr)
library(tidyr)
library(dplyr)

# Downloading the datasets
forest <- fread('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-04-06/forest.csv')
forest_area <- fread('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-04-06/forest_area.csv')
brazil_loss <- fread('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-04-06/brazil_loss.csv')
soybean_use <- fread('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-04-06/soybean_use.csv')

Analysis

The following table shows that although the average net forest loss per country declined from 1990 to 2010, it rose again in 2015, to about 65,900 hectares (a mean net forest conversion of -65,900.99). I want to examine which countries contribute the most to deforestation and which factors are driving this renewed increase.

forest_summaries <- forest[, .(
  min = min(net_forest_conversion),
  max = max(net_forest_conversion),
  mean = mean(net_forest_conversion),
  N = .N
), by = year]
knitr::kable(forest_summaries, caption = "Forest Conversion for Each Year", digits = 4, align = "lcccr", booktabs=T) %>% kable_styling(latex_options = c("HOLD_position", "resizebox=2.5\\textwidth"), font_size = 12, full_width = T) 
Forest Conversion for Each Year
year  min       max      mean        N
1990  -7818000  1986000  -137988.65  104
2000  -5117000  2360980  -68698.77   122
2010  -4801000  1936770  -45594.92   128
2015  -5150000  1936790  -65900.99   121
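As a quick check of the trend described above, the base-R sketch below re-types the four yearly means from the table and computes the relative change in the average net loss between consecutive reporting years. Only the numbers come from the table; the variable names are illustrative.

```r
# Yearly mean net forest conversion (hectares), copied from the table above
yearly_mean <- c("1990" = -137988.65, "2000" = -68698.77,
                 "2010" = -45594.92, "2015" = -65900.99)

# Average net loss per country is the negative of the mean conversion
avg_loss <- -yearly_mean

# Relative change in average net loss between consecutive reporting years
# (each element is named after the later year of the pair)
rel_change <- diff(avg_loss) / head(avg_loss, -1)
round(100 * rel_change, 1)
```

By this arithmetic, the average net loss roughly halved from 1990 to 2000, fell by about a third from 2000 to 2010, and then rose by about 45% between 2010 and 2015.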

Visual Analysis and Analytical Questions

How has the Net Forest Conversion changed over the years?

To answer this question, the map below shows net forest conversion (NFC) around the world; negative values indicate a net loss of forest area. According to the map, China has the highest NFC (the largest net gain in forest area), while Brazil has the lowest (the largest net loss). The slider selects the year, and hovering the cursor over a country shows its NFC value.

# creating custom theme
custom_theme <- function() {
  theme_classic() + 
  theme(
    text = element_text(size = 12, color = "darkslategrey", face = "bold", family = "mono"),
    plot.title = element_text(size = 15, color = "darkslategrey", face = "bold", family = "serif"),
    panel.background = element_rect(fill = "white", color = NA),
    axis.text = element_text(size = 10, color = "darkslategrey", family = "mono"),
    axis.title = element_text(size = 10, color = "darkslategrey", family = "mono")
  )
}
# Keep country-level rows only (three-letter ISO codes)
map <- forest[code != "" & str_length(code) == 3]
# Hover label: country name and NFC value (plotly uses <br> for line breaks)
map[, hover := paste0(entity, "<br>", net_forest_conversion)]
pal <- brewer.pal(n = 9, name = "YlGnBu")

# Choropleth keyed on ISO-3 codes, with one animation frame per year
map1 <- plot_geo(map, frame = ~year, width = 800, height = 380) %>%
  add_trace(locations = ~code, locationmode = "ISO-3",
            z = ~net_forest_conversion,
            zmax = max(map$net_forest_conversion),
            zmin = min(map$net_forest_conversion),
            color = ~net_forest_conversion,
            colors = pal,
            text = ~hover,
            hoverinfo = "text") %>%
  layout(geo = list(scope = "world"),
         title = list(text = "Global Deforestation Trend", y = 1, x = 0.3),
         legend = list(font = list(size = 10), x = 0.9, y = 0.3)) %>%
  # Frame duration and play-button position for the year animation
  animation_opts(frame = 1000, redraw = TRUE) %>%
  animation_button(x = 0.1, y = 0.9, label = "Play")
map1
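The two steps behind the map, keeping only country-level rows and then reading off the highest and lowest NFC per year, can be sketched in base R on a small hypothetical table. The rows and values in `forest_toy` are invented and only mimic the columns of the forest dataset; the aggregate code `OWID_WRL` stands in for an OWID-style code longer than three characters.

```r
# Hypothetical toy rows shaped like the forest table (values invented)
forest_toy <- data.frame(
  entity = rep(c("World", "Country A", "Country B", "Country C"), times = 2),
  code   = rep(c("OWID_WRL", "AAA", "BBB", "CCC"), times = 2),
  year   = rep(c(2010, 2015), each = 4),
  net_forest_conversion = c(-500, 300, -200, 10,
                            -450, 250, -150, 20)
)

# Keep rows whose code is present and exactly three characters long
is_country <- !is.na(forest_toy$code) & nchar(forest_toy$code) == 3
countries <- forest_toy[is_country, ]

# For each year, find the country with the highest and the lowest NFC
by_year <- split(countries, countries$year)
extremes <- do.call(rbind, lapply(by_year, function(d) {
  data.frame(year    = d$year[1],
             highest = d$entity[which.max(d$net_forest_conversion)],
             lowest  = d$entity[which.min(d$net_forest_conversion)])
}))
extremes
```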

Which countries have the highest and lowest NFC?

The ten countries with the highest total NFC and the ten with the lowest were extracted from the forest dataset and combined into a new dataset. For readability, NFC was divided by 1,000 (so values are in thousands of hectares) before creating the bar plot below. It shows that Brazil has the highest level of deforestation, followed by Indonesia, Tanzania, Myanmar, and others, while China has the highest NFC, meaning its forest area grew on net (a positive afforestation rate).

setDT(forest)
# Getting the 10 countries with the highest net forest conversion
net_forest_max <- forest[entity != "World", .(total = sum(net_forest_conversion)), by = entity][order(-total)][1:10]

# Getting the 10 countries with the lowest net forest conversion
net_forest_min <- forest[entity != "World", .(total = sum(net_forest_conversion)), by = entity][order(total)][1:10]
top10 <- rbindlist(list(net_forest_max,net_forest_min), fill=T)
top10 <- top10[, total_1000:=total/1000]
# reorder() sorts the bars by NFC; fill separates net gains (TRUE) from net losses (FALSE)
ggplot(top10, aes(total_1000, reorder(entity, total_1000), fill = total_1000 > 0)) +
  geom_col() +
  labs(x = "Net Forest Conversion (in Thousands of Hectares)", y = "", subtitle = "From 1990 to 2015") +
  ggtitle("Top 20 Countries with Highest and Lowest NFC") +
  custom_theme() +
  theme(legend.position = "none") +
  scale_fill_manual(values = c("#0072B2", "#FEE391"))
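The top/bottom selection can also be written without data.table. The base-R sketch below, on an invented five-country table, sums NFC per country, sorts the totals, takes the `n` largest and `n` smallest, and rescales to thousands of hectares. `toy` and `n` are hypothetical stand-ins for the forest data and the value 10 used above.

```r
# Hypothetical toy data: net forest conversion (hectares) per country and year
toy <- data.frame(
  entity = rep(c("A", "B", "C", "D", "E"), times = 2),
  year   = rep(c(2010, 2015), each = 5),
  net_forest_conversion = c(4000, -9000, 1000, -2000, 500,
                            3000, -7000, 1500, -1000, 250)
)

# Sum over the reporting years for each country, then sort descending
totals <- aggregate(net_forest_conversion ~ entity, data = toy, FUN = sum)
names(totals)[2] <- "total"
totals <- totals[order(-totals$total), ]

n <- 2                        # the analysis above uses n = 10
top_n    <- head(totals, n)   # largest net gains
bottom_n <- tail(totals, n)   # largest net losses
combined <- rbind(top_n, bottom_n)
combined$total_1000 <- combined$total / 1000
combined
```

If a dataset has fewer than `2 * n` countries, the head and tail overlap and some countries appear twice, so it is worth checking `nrow(totals)` first.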

What are the causes of deforestation in Brazil?

Now that it is apparent that Brazil has the highest deforestation, I want to examine what causes its loss of forest land. The brazil_loss dataset is used for this task: forest loss for each cause of deforestation is plotted in units of 10,000 hectares, with each cause in its own facet, forest loss on the vertical axis, and year on the horizontal axis. Although the causes differ, some, such as fire, roads, and selective logging, have peaked several times over the period. In recent years, mining, pasture, small-scale clearing, and tree plantations including palm show an increasing trend and are the main causes of deforestation in Brazil, while selective logging has a decreasing trend but still accounts for one of the largest shares of forest loss.

# Reshape brazil_loss to long format: one column names the cause of forest loss and another holds its value.
# The helper takes a column name and returns that column as rows of (entity, year, cause, loss).
brazil_loss_cause <- function(variable){
  setDT(brazil_loss)[, .(year, cause = as.character(variable), loss = get(variable)), by = entity][, cause := as.factor(cause)][]
}
variable_names <- names(brazil_loss)[4:14]
brazil_list <- lapply(variable_names, brazil_loss_cause)
brazil_causes <- rbindlist(brazil_list)
setDT(brazil_causes)
# Dividing the loss by 10,000 to simplify interpretation
brazil_causes <- brazil_causes[,loss_10000:=loss/10000]

ggplot(brazil_causes, aes(year, loss_10000, colour = cause)) +
  # ggplot2 >= 3.4 sets line thickness with linewidth rather than size
  geom_line(linewidth = 0.8) +
  facet_wrap(~cause, scales = "free") +
  guides(colour = "none") +
  labs(y = "Forest loss (in 10,000 hectares)", x = "Year",
       title = "Forest Loss Causes in Brazil",
       subtitle = "Lost Forest Hectares are Divided by 10,000") + custom_theme()
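The reshaping step above (one row per year and cause) and a per-cause share of total loss can be sketched in base R as follows. The `wide` table and its numbers are invented; in the report, the equivalent columns are columns 4 to 14 of brazil_loss.

```r
# Hypothetical toy wide table: one column per cause (hectares, values invented)
wide <- data.frame(
  year    = c(2001, 2002, 2003),
  pasture = c(200000, 250000, 300000),
  mining  = c(5000, 7000, 9000),
  fire    = c(30000, 90000, 20000)
)
causes <- setdiff(names(wide), "year")

# Stack the cause columns: one row per (year, cause) pair
long <- data.frame(
  year  = rep(wide$year, times = length(causes)),
  cause = rep(causes, each = nrow(wide)),
  loss  = unlist(wide[causes], use.names = FALSE)
)
long$loss_10000 <- long$loss / 10000

# Share of total loss attributable to each cause over the whole period
by_cause <- aggregate(loss ~ cause, data = long, FUN = sum)
by_cause$share <- by_cause$loss / sum(by_cause$loss)
by_cause
```

Because `facet_wrap(scales = "free")` gives every panel its own y-axis, panel heights are not comparable across causes; a share table like `by_cause` makes the comparison of magnitudes explicit.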

Which soybean application contributes the most to global deforestation across continents?

To answer this question, the data were filtered to six continents and faceted by continent. The animation below shows that soybean use, particularly for processed food, has increased substantially in Asia, North America, and South America. However, this plot alone does not establish a link between soybean production and deforestation.

soybean_use_dt <- data.table(soybean_use)
soybean_use_long <- melt(soybean_use_dt, id.vars = c("entity", "code", "year"), measure.vars = c("human_food", "animal_feed", "processed"))

# Filtering the data to only include continents
continents <- c("Africa", "Asia","Europe", "Northern America", "Oceania", "South America")
# Keep continent rows with a recorded value (a complete.cases() filter returns zero rows for these data)
soybean_use_long_cont <- soybean_use_long[entity %in% continents & !is.na(value)]
p <- ggplot(soybean_use_long_cont, aes(x = year, y = value/10000, color = variable)) +
      geom_point() +
      facet_wrap(~ entity) +
      xlab("Year") +
      ylab("Soybean Use (in 10,000 tonnes)") +
      labs(title = "Soybean Use by Continent", subtitle = "Finding the Dominant Soybean Usage") +
      # A single colour scale; breaks pins each label to the matching variable
      scale_color_manual(breaks = c("human_food", "animal_feed", "processed"),
                         values = c(human_food = "#808080", animal_feed = "#FEE391", processed = "#0072B2"),
                         labels = c("Human Food", "Animal Feed", "Processed")) +
      # custom_theme() builds on a complete theme, so it goes before the tweaks it would otherwise overwrite
      custom_theme() +
      theme(
        plot.title = element_text(face = "bold", hjust = 0.5),
        legend.position = "bottom",
        legend.box = "horizontal"
      )

p +transition_states(variable, wrap =FALSE)+ shadow_mark(alpha=0.5)+
  enter_grow()+
  exit_fade()+
  ease_aes("back-out")
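One way to quantify the growth described above is a least-squares slope of use over time for each continent. The sketch below fits `lm(value ~ year)` separately per continent on invented data; `soy_long` only mimics the long table built with melt() above. A slope summarizes the average rate of change, not statistical significance.

```r
# Hypothetical toy long table (tonnes; values invented)
soy_long <- data.frame(
  entity   = rep(c("Asia", "Europe"), each = 4),
  year     = rep(c(1970, 1980, 1990, 2000), times = 2),
  variable = "processed",
  value    = c(1e6, 4e6, 9e6, 2e7,
               5e6, 6e6, 5.5e6, 6.5e6)
)

# Linear trend (tonnes per year) of processed use within each continent
slopes <- sapply(split(soy_long, soy_long$entity), function(d) {
  coef(lm(value ~ year, data = d))[["year"]]
})
slopes
```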

What are the top six countries in soybean production for processed food?

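Because processed food is the most important application of soybeans, the table below shows each country's total soybean use for processing, in 1,000 tonnes. In the source data, this processed category covers soybeans processed into products such as vegetable oil, biofuel, and processed animal feed. The United States leads by a wide margin, with roughly twice Brazil's total.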

cumulatives <- c("World", "Northern America", "Asia", "Asia, Central", "Western Asia", "USSR",
                 "South America", "South Eastern Asia", "Southern Europe", "Northern Europe",
                 "Northern Africa", "Low Income Food Deficit Countries", "Europe, Western",
                 "European Union", "Eastern Europe", "Eastern Africa", "Eastern Asia",
                 "Americas", "Europe", "Southern Asia")
# Drop regional and income-group aggregates so that only countries are ranked
soybean_use_filtered <- soybean_use[!entity %in% cumulatives]
net_process_max <- soybean_use_filtered[, .(total = sum(processed)/1000), by = entity][order(-total)][1:6]

knitr::kable(net_process_max, caption = "Sum of Soybean Produced for Processed Food by Country (in 1000 tonnes)", digits = 4, align = "lr", booktabs = TRUE) %>%
  kable_styling(latex_options = c("HOLD_position", "resizebox=2.5\\textwidth"), font_size = 16)
Sum of Soybean Produced for Processed Food by Country (in 1000 tonnes)
entity         total
United States  1647179
Brazil          780793
China           658969
Argentina       545039
India           163760
Japan           155933
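The gap between the United States and the other countries can be quantified directly from the table. The base-R sketch below copies the six totals and computes the US-to-Brazil ratio and each country's share of the combined top-six total.

```r
# Totals (in 1000 tonnes) copied from the table above
processed_total <- c("United States" = 1647179, "Brazil" = 780793,
                     "China" = 658969, "Argentina" = 545039,
                     "India" = 163760, "Japan" = 155933)

# How many times larger the US total is than Brazil's
us_vs_brazil <- processed_total[["United States"]] / processed_total[["Brazil"]]

# Each country's share of the combined top-six total
shares <- processed_total / sum(processed_total)
round(100 * shares, 1)
```

With these figures the US total is about 2.1 times Brazil's, and the United States accounts for roughly 42% of the top-six total.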

I was also curious about the distribution of soybean use for processed food over the years. The box plots for the United States and Brazil are roughly symmetric, with a large spread between the minimum and maximum values; the United States has the highest soybean use for processed food. The distribution for China is skewed, with many points far from the median that may be outliers or extreme values. As noted earlier, China has the highest net forest conversion, implying the highest afforestation; thus, there appears to be no simple relationship between soybean production and deforestation.

# Keep the six countries from the table above
top6 <- soybean_use[code %in% c("USA", "BRA", "CHN", "ARG", "IND", "JPN")]
# Columns 4:6 (human_food, animal_feed, processed) become category/value pairs
soy_dt <- gather(top6, "category", "value", 4:6)

boxplot <- soy_dt %>%
  filter(category == "processed") %>%
  group_by(entity, year) %>%
  summarize(value = mean(value, na.rm = TRUE), .groups = "drop") %>%
  ggplot() +
  aes(x = entity, y = value/10000, fill = entity) +
  geom_boxplot(alpha = 0.90) +
  # Mark each country's mean; geom = "point" avoids pointrange warnings when only fun is given
  stat_summary(fun = mean, geom = "point", alpha = 0.9, size = 1.5) +
  ggtitle("Distribution of Top 6 Processed Food Producing Countries") +
  labs(x = "", y = "Processed Food Production (in 10,000 tonnes)") +
  custom_theme() +
  scale_fill_manual(values = c("#0072B2", "#808080", "#FEE391", "#2E8B57", "#FFCBA4", "#FFCBA5")) +
  theme(legend.position = "none")


boxplot+transition_states(entity, wrap =FALSE)+ shadow_mark(alpha=0.5)+
  enter_grow()+
  exit_fade()+
  ease_aes("back-out")
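The outlier points in a box plot come from Tukey's rule: values more than 1.5 times the interquartile range beyond the box are drawn individually. Base R's boxplot.stats() implements this rule, so it can list the points a box plot would flag. The vector `x` below is invented and deliberately right-skewed. Note that boxplot.stats() computes the hinges with fivenum(), whereas ggplot2's geom_boxplot() uses quantile(), so borderline points can be classified differently.

```r
# Hypothetical yearly values for one country (invented, right-skewed)
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 20)

# Tukey's 1.5 * IQR rule as implemented by boxplot.stats()
bs <- boxplot.stats(x, coef = 1.5)
bs$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out     # points that would be drawn individually as outliers

# A mean well above the median is a simple sign of right skew
c(mean = mean(x), median = median(x))
```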

Is There a Correlation Between Deforestation and Soybean Production?

After joining the six countries with the highest soybean use for processed food to their total net forest conversion, the scatter plot below shows no clear correlation between soybean production and deforestation. With only six countries, however, this comparison is suggestive rather than conclusive.

country <- c("Argentina", "Brazil", "China", "India", "Japan", "United States")
forest_filtered <- forest[entity %in% country, .(total = sum(net_forest_conversion)/1000), by = entity]

setnames(forest_filtered, "total", "NFC")
setnames(net_process_max, "total", "processed")
# Inner join on entity (nomatch = 0 drops countries missing from either table)
correlation <- forest_filtered[net_process_max, on = "entity", nomatch = 0]
# Scatter plot with a least-squares line
ggplot(correlation, aes(x = processed, y = NFC)) +
  geom_point(fill = "lightblue", color = "blue", size = 2) +
  geom_smooth(method = "lm", formula = y ~ x, color = "red") +
  labs(x = "Processed Food Production (in 1000 tonnes)", y = "Net Forest Conversion (in 1000 hectares)") +
  ggtitle("Relationship between Deforestation and Processed Food Production") + custom_theme()
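How strong would a correlation have to be before six countries could reveal it? Under the usual Pearson test, r is significant at the two-sided 5% level only if |r| exceeds t / sqrt(t^2 + df), where t is the 97.5% quantile of Student's t distribution with df = n - 2 degrees of freedom. The base-R sketch below computes that threshold for n = 6; the variable names are illustrative.

```r
# Number of countries in the scatter plot above
n <- 6
dof <- n - 2

# Two-sided 5% critical value of Pearson's r, from r = t / sqrt(t^2 + dof)
t_crit <- qt(0.975, dof)
r_crit <- t_crit / sqrt(t_crit^2 + dof)
r_crit  # about 0.81
```

So with six countries only a very strong linear relationship (|r| above about 0.81) would be distinguishable from noise. On the joined table itself, `cor.test(correlation$processed, correlation$NFC)` from the stats package reports r and its p-value directly.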

Conclusion

To summarize, the global deforestation dataset was used to examine the trend and causes of deforestation over time. Brazil contributes the most to deforestation, whereas China shows a positive afforestation trend. Furthermore, soybean production for processed food does not appear to drive deforestation, as it shows no clear correlation with net forest conversion.