Load required libraries

library(conflicted)
library(dplyr)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ readr     2.1.4
## ✔ ggplot2   3.4.4     ✔ stringr   1.5.1
## ✔ lubridate 1.9.3     ✔ tibble    3.2.1
## ✔ purrr     1.0.2     ✔ tidyr     1.3.0
library(ggplot2)
library(corrplot)
## corrplot 0.92 loaded

Read the data

The table below shows 14 columns: domain code, domain, area code, area, element code, element, item code, item, year code, year, unit, value, flag, and flag description. Item appears to indicate the type of bird, while Value appears to be the count for that category in the given year. This assumption still needs to be verified.

birds <- read.csv("challenge_datasets/birds.csv")
as_tibble(birds)
## # A tibble: 30,977 × 14
##    Domain.Code Domain       Area.Code Area  Element.Code Element Item.Code Item 
##    <chr>       <chr>            <int> <chr>        <int> <chr>       <int> <chr>
##  1 QA          Live Animals         2 Afgh…         5112 Stocks       1057 Chic…
##  2 QA          Live Animals         2 Afgh…         5112 Stocks       1057 Chic…
##  3 QA          Live Animals         2 Afgh…         5112 Stocks       1057 Chic…
##  4 QA          Live Animals         2 Afgh…         5112 Stocks       1057 Chic…
##  5 QA          Live Animals         2 Afgh…         5112 Stocks       1057 Chic…
##  6 QA          Live Animals         2 Afgh…         5112 Stocks       1057 Chic…
##  7 QA          Live Animals         2 Afgh…         5112 Stocks       1057 Chic…
##  8 QA          Live Animals         2 Afgh…         5112 Stocks       1057 Chic…
##  9 QA          Live Animals         2 Afgh…         5112 Stocks       1057 Chic…
## 10 QA          Live Animals         2 Afgh…         5112 Stocks       1057 Chic…
## # ℹ 30,967 more rows
## # ℹ 6 more variables: Year.Code <int>, Year <int>, Unit <chr>, Value <int>,
## #   Flag <chr>, Flag.Description <chr>
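
One way to start checking that assumption is to tabulate Item against Unit and confirm that every row uses the same unit. The sketch below runs on a few hypothetical toy rows (`toy_birds`) that only mimic the column names of `birds`; the same calls could be pointed at the real data frame.

```r
library(dplyr)

# Hypothetical toy rows that mimic a few columns of `birds` (not real data)
toy_birds <- data.frame(
  Item  = c("Chickens", "Chickens", "Ducks", "Turkeys"),
  Unit  = c("1000 Head", "1000 Head", "1000 Head", "1000 Head"),
  Year  = c(1961L, 1962L, 1961L, 1961L),
  Value = c(4700L, 4900L, 120L, NA)
)

# Number of rows for each Item / Unit combination
item_units <- toy_birds %>%
  count(Item, Unit, name = "n_rows")

# A single distinct Unit would mean every Value is on the same scale
n_units <- n_distinct(toy_birds$Unit)

print(item_units)
print(n_units)
```

If `n_units` came back as 1 on the real data, Value could be compared across items without unit conversion.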

Analyzing data

In this section, we will do an elementary analysis to understand the data.

Basic Exploration

# Preview the first few rows of the data
head(birds)
##   Domain.Code       Domain Area.Code        Area Element.Code Element Item.Code
## 1          QA Live Animals         2 Afghanistan         5112  Stocks      1057
## 2          QA Live Animals         2 Afghanistan         5112  Stocks      1057
## 3          QA Live Animals         2 Afghanistan         5112  Stocks      1057
## 4          QA Live Animals         2 Afghanistan         5112  Stocks      1057
## 5          QA Live Animals         2 Afghanistan         5112  Stocks      1057
## 6          QA Live Animals         2 Afghanistan         5112  Stocks      1057
##       Item Year.Code Year      Unit Value Flag Flag.Description
## 1 Chickens      1961 1961 1000 Head  4700    F     FAO estimate
## 2 Chickens      1962 1962 1000 Head  4900    F     FAO estimate
## 3 Chickens      1963 1963 1000 Head  5000    F     FAO estimate
## 4 Chickens      1964 1964 1000 Head  5300    F     FAO estimate
## 5 Chickens      1965 1965 1000 Head  5500    F     FAO estimate
## 6 Chickens      1966 1966 1000 Head  5800    F     FAO estimate

The Year column spans 1961 to 2018, so the data covers nearly six decades. The Value field likely represents bird stocks; since the Unit column reads "1000 Head", the values appear to be in thousands of birds. Value ranges from 0 to over 23 million and has 1,036 missing values (NAs).

# Get a summary of the dataset
summary(birds)
##  Domain.Code           Domain            Area.Code        Area          
##  Length:30977       Length:30977       Min.   :   1   Length:30977      
##  Class :character   Class :character   1st Qu.:  79   Class :character  
##  Mode  :character   Mode  :character   Median : 156   Mode  :character  
##                                        Mean   :1202                     
##                                        3rd Qu.: 231                     
##                                        Max.   :5504                     
##                                                                         
##   Element.Code    Element            Item.Code        Item          
##  Min.   :5112   Length:30977       Min.   :1057   Length:30977      
##  1st Qu.:5112   Class :character   1st Qu.:1057   Class :character  
##  Median :5112   Mode  :character   Median :1068   Mode  :character  
##  Mean   :5112                      Mean   :1066                     
##  3rd Qu.:5112                      3rd Qu.:1072                     
##  Max.   :5112                      Max.   :1083                     
##                                                                     
##    Year.Code         Year          Unit               Value         
##  Min.   :1961   Min.   :1961   Length:30977       Min.   :       0  
##  1st Qu.:1976   1st Qu.:1976   Class :character   1st Qu.:     171  
##  Median :1992   Median :1992   Mode  :character   Median :    1800  
##  Mean   :1991   Mean   :1991                      Mean   :   99411  
##  3rd Qu.:2005   3rd Qu.:2005                      3rd Qu.:   15404  
##  Max.   :2018   Max.   :2018                      Max.   :23707134  
##                                                   NA's   :1036      
##      Flag           Flag.Description  
##  Length:30977       Length:30977      
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
##                                       
## 
# View the structure of the data
str(birds)
## 'data.frame':    30977 obs. of  14 variables:
##  $ Domain.Code     : chr  "QA" "QA" "QA" "QA" ...
##  $ Domain          : chr  "Live Animals" "Live Animals" "Live Animals" "Live Animals" ...
##  $ Area.Code       : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ Area            : chr  "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
##  $ Element.Code    : int  5112 5112 5112 5112 5112 5112 5112 5112 5112 5112 ...
##  $ Element         : chr  "Stocks" "Stocks" "Stocks" "Stocks" ...
##  $ Item.Code       : int  1057 1057 1057 1057 1057 1057 1057 1057 1057 1057 ...
##  $ Item            : chr  "Chickens" "Chickens" "Chickens" "Chickens" ...
##  $ Year.Code       : int  1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 ...
##  $ Year            : int  1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 ...
##  $ Unit            : chr  "1000 Head" "1000 Head" "1000 Head" "1000 Head" ...
##  $ Value           : int  4700 4900 5000 5300 5500 5800 6600 6290 6300 6000 ...
##  $ Flag            : chr  "F" "F" "F" "F" ...
##  $ Flag.Description: chr  "FAO estimate" "FAO estimate" "FAO estimate" "FAO estimate" ...
# List the first few unique areas
head(unique(birds$Area))
## [1] "Afghanistan"         "Albania"             "Algeria"            
## [4] "American Samoa"      "Angola"              "Antigua and Barbuda"
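
The call above lists area names rather than counting them. A minimal sketch of how the count itself, and a quick look at the area codes, might be obtained is shown here on hypothetical toy rows; the specific codes in `toy_birds` are illustrative assumptions, not values taken from the dataset.

```r
library(dplyr)

# Hypothetical toy rows (codes and names are illustrative only)
toy_birds <- data.frame(
  Area.Code = c(2L, 2L, 3L, 5000L, 5300L),
  Area      = c("Afghanistan", "Afghanistan", "Albania", "World", "Asia")
)

# Number of distinct areas
n_areas <- n_distinct(toy_birds$Area)

# One row per area, highest codes first, to eyeball which codes
# belong to aggregates such as "World" rather than single countries
area_codes <- toy_birds %>%
  distinct(Area.Code, Area) %>%
  arrange(desc(Area.Code))

print(n_areas)
print(area_codes)
```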

Given the types of birds it covers, this kind of dataset is useful for understanding global trends in poultry farming.

# Get the birds discussed in the data
unique(birds$Item)
## [1] "Chickens"               "Ducks"                  "Geese and guinea fowls"
## [4] "Turkeys"                "Pigeons, other birds"

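Judging by the flag descriptions, which mention the FAO (Food and Agriculture Organization of the United Nations), this data seems to have been collected by organizations related to food or poultry farming.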

unique(birds$Flag.Description)
## [1] "FAO estimate"                                                                
## [2] "Official data"                                                               
## [3] "FAO data based on imputation methodology"                                    
## [4] "Data not available"                                                          
## [5] "Unofficial figure"                                                           
## [6] "Aggregate, may include official, semi-official, estimated or calculated data"
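
Since one of the flags is "Data not available", it may be worth checking whether the missing values in Value line up with that flag. The following is a minimal sketch on hypothetical toy rows (`toy_birds`); whether the real data behaves this way is an assumption that would need to be confirmed by running the same summary on `birds`.

```r
library(dplyr)

# Hypothetical toy rows that mimic two columns of `birds` (not real data)
toy_birds <- data.frame(
  Flag.Description = c("FAO estimate", "Official data",
                       "Data not available", "Data not available",
                       "Official data"),
  Value = c(4700L, 5100L, NA, NA, 0L)
)

# Count rows and missing values per flag description
missing_by_flag <- toy_birds %>%
  group_by(Flag.Description) %>%
  summarise(
    n_rows    = n(),
    n_missing = sum(is.na(Value)),
    .groups   = "drop"
  ) %>%
  arrange(desc(n_missing))

print(missing_by_flag)
```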

Deep Dive

The graph below captures the changes in populations of various bird types from 1961 to 2018. Notably, the chicken population has experienced a significant increase, dominating the chart with the steepest growth. Other bird types, such as ducks, geese and guinea fowls, pigeons, and turkeys, show relatively stable and much lower population levels over time.

Surprisingly, the chicken population has decreased three times (most steeply in 1996). Further analysis is required to understand the reasons for these dips.

ggplot(birds, aes(x = Year, y = Value, group = Item, color = Item)) +
  geom_line() +
  theme_minimal() +
  labs(title = "Bird Population Trends Over Time", x = "Year", y = "Population")
## Warning: Removed 3 rows containing missing values (`geom_line()`).
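
Because `birds` holds one row per area, item, and year, grouping the line only by Item connects points from many different areas. A possible alternative, sketched here on hypothetical toy rows, is to keep a single area (for example the "World" rows) and sum Value per year and item before plotting. `dplyr::filter()` is written with its namespace so the call stays unambiguous when the conflicted package is loaded.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy rows that mimic a few columns of `birds` (not real data)
toy_birds <- data.frame(
  Area  = c("World", "World", "World", "World", "Afghanistan", "Afghanistan"),
  Item  = c("Chickens", "Chickens", "Ducks", "Ducks", "Chickens", "Chickens"),
  Year  = c(1961L, 1962L, 1961L, 1962L, 1961L, 1962L),
  Value = c(3900000L, 4000000L, 190000L, 200000L, 4700L, 4900L)
)

# Keep one area, drop missing values, and get one total per year and item
world_trend <- toy_birds %>%
  dplyr::filter(Area == "World", !is.na(Value)) %>%
  group_by(Year, Item) %>%
  summarise(Total = sum(Value), .groups = "drop")

# One line per item, one point per year
trend_plot <- ggplot(world_trend, aes(x = Year, y = Total, color = Item)) +
  geom_line() +
  theme_minimal() +
  labs(title = "World Bird Stocks Over Time (toy data)",
       x = "Year", y = "Stocks (1000 head)")

print(world_trend)
```

Filtering out missing values before plotting also avoids the `geom_line()` warning about removed rows.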

We next compare total bird populations across the top 10 areas. “World” has the highest count, followed by “Asia” and “Americas”. More specific areas such as “Eastern Asia”, “China, mainland”, and “Europe” also contribute substantially to the total. Note that this list mixes aggregates (the world, continents, and subregions) with individual countries, and that each total sums Value over all years and bird types.

# Summarize the data by Area, then arrange in descending order of total value
country_wise <- birds %>%
  group_by(Area) %>%
  summarize(Total = sum(Value, na.rm = TRUE)) %>%
  arrange(desc(Total))

# Select only the top 10 Areas
top_areas <- head(country_wise, 10)

# Plot the data for these top Areas
ggplot(top_areas, aes(x = reorder(Area, Total), y = Total)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  labs(title = "Area-wise Total Population of Birds (Top 10 Areas)", x = "Area", y = "Total Population")
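
To rank individual countries instead of aggregates, one option is to drop the aggregate rows and look at a single year rather than a sum over all years. The sketch below uses hypothetical toy rows and assumes that aggregate areas carry an Area.Code of 5000 or above; that threshold is an assumption and should be checked against the actual codes in `birds` before relying on it.

```r
library(dplyr)

# Hypothetical toy rows that mimic a few columns of `birds` (not real data)
toy_birds <- data.frame(
  Area.Code = c(2L, 2L, 3L, 3L, 5000L, 5000L),
  Area      = c("Afghanistan", "Afghanistan", "Albania", "Albania",
                "World", "World"),
  Year      = c(2017L, 2018L, 2017L, 2018L, 2017L, 2018L),
  Value     = c(13000L, 13500L, 8000L, 8200L, 22000000L, 23000000L)
)

latest_year <- max(toy_birds$Year)

# ASSUMPTION: Area.Code >= 5000 marks aggregates such as "World"
top_countries <- toy_birds %>%
  dplyr::filter(Area.Code < 5000, Year == latest_year) %>%
  group_by(Area) %>%
  summarise(Total = sum(Value, na.rm = TRUE), .groups = "drop") %>%
  slice_max(Total, n = 10)

print(top_countries)
```

The resulting `top_countries` could be passed to the same bar chart code in place of `top_areas`.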