library(conflicted)
library(dplyr)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ readr 2.1.4
## ✔ ggplot2 3.4.4 ✔ stringr 1.5.1
## ✔ lubridate 1.9.3 ✔ tibble 3.2.1
## ✔ purrr 1.0.2 ✔ tidyr 1.3.0
library(ggplot2)
library(corrplot)
## corrplot 0.92 loaded
The table below shows 14 columns: domain code, domain, area code, area, element code, element, item code, item, year code, year, unit, value, flag, and flag description. Item appears to indicate the type of bird, while Value appears to give the count for that item in the specified year. This assumption remains to be verified.
birds <- read.csv("challenge_datasets/birds.csv")
as_tibble(birds)
## # A tibble: 30,977 × 14
## Domain.Code Domain Area.Code Area Element.Code Element Item.Code Item
## <chr> <chr> <int> <chr> <int> <chr> <int> <chr>
## 1 QA Live Animals 2 Afgh… 5112 Stocks 1057 Chic…
## 2 QA Live Animals 2 Afgh… 5112 Stocks 1057 Chic…
## 3 QA Live Animals 2 Afgh… 5112 Stocks 1057 Chic…
## 4 QA Live Animals 2 Afgh… 5112 Stocks 1057 Chic…
## 5 QA Live Animals 2 Afgh… 5112 Stocks 1057 Chic…
## 6 QA Live Animals 2 Afgh… 5112 Stocks 1057 Chic…
## 7 QA Live Animals 2 Afgh… 5112 Stocks 1057 Chic…
## 8 QA Live Animals 2 Afgh… 5112 Stocks 1057 Chic…
## 9 QA Live Animals 2 Afgh… 5112 Stocks 1057 Chic…
## 10 QA Live Animals 2 Afgh… 5112 Stocks 1057 Chic…
## # ℹ 30,967 more rows
## # ℹ 6 more variables: Year.Code <int>, Year <int>, Unit <chr>, Value <int>,
## # Flag <chr>, Flag.Description <chr>
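One way to check the assumption about Item and Value is to tabulate which units occur for each item. The sketch below assumes the `birds` data frame loaded above; `count_item_units()` is a hypothetical helper name, not a function from any package. Calls are namespaced with `dplyr::` so that they do not trip the conflicted package.

```r
library(dplyr)

# Hypothetical helper: count rows for each Item/Unit combination.
# If every Item is a bird type measured in a single unit, each Item
# should appear exactly once in the result.
count_item_units <- function(df) {
  df %>%
    dplyr::count(Item, Unit, name = "n_rows") %>%
    dplyr::arrange(Item, Unit)
}

# Usage with the data loaded above:
# count_item_units(birds)
```

If an item shows up with more than one unit, Value cannot be compared across its rows without converting first.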
In this section, we will do an elementary analysis to understand the data.
# Preview the first few rows of the data
head(birds)
## Domain.Code Domain Area.Code Area Element.Code Element Item.Code
## 1 QA Live Animals 2 Afghanistan 5112 Stocks 1057
## 2 QA Live Animals 2 Afghanistan 5112 Stocks 1057
## 3 QA Live Animals 2 Afghanistan 5112 Stocks 1057
## 4 QA Live Animals 2 Afghanistan 5112 Stocks 1057
## 5 QA Live Animals 2 Afghanistan 5112 Stocks 1057
## 6 QA Live Animals 2 Afghanistan 5112 Stocks 1057
## Item Year.Code Year Unit Value Flag Flag.Description
## 1 Chickens 1961 1961 1000 Head 4700 F FAO estimate
## 2 Chickens 1962 1962 1000 Head 4900 F FAO estimate
## 3 Chickens 1963 1963 1000 Head 5000 F FAO estimate
## 4 Chickens 1964 1964 1000 Head 5300 F FAO estimate
## 5 Chickens 1965 1965 1000 Head 5500 F FAO estimate
## 6 Chickens 1966 1966 1000 Head 5800 F FAO estimate
The Year column spans 1961 to 2018, so the data cover nearly six decades. The Value field holds the stock size for each item; in the rows shown above it is recorded in units of 1000 Head, so a Value of 4700 corresponds to about 4.7 million birds. Value ranges from 0 to over 23 million (in those units) and has 1036 missing values (NAs).
# Get a summary of the dataset
summary(birds)
## Domain.Code Domain Area.Code Area
## Length:30977 Length:30977 Min. : 1 Length:30977
## Class :character Class :character 1st Qu.: 79 Class :character
## Mode :character Mode :character Median : 156 Mode :character
## Mean :1202
## 3rd Qu.: 231
## Max. :5504
##
## Element.Code Element Item.Code Item
## Min. :5112 Length:30977 Min. :1057 Length:30977
## 1st Qu.:5112 Class :character 1st Qu.:1057 Class :character
## Median :5112 Mode :character Median :1068 Mode :character
## Mean :5112 Mean :1066
## 3rd Qu.:5112 3rd Qu.:1072
## Max. :5112 Max. :1083
##
## Year.Code Year Unit Value
## Min. :1961 Min. :1961 Length:30977 Min. : 0
## 1st Qu.:1976 1st Qu.:1976 Class :character 1st Qu.: 171
## Median :1992 Median :1992 Mode :character Median : 1800
## Mean :1991 Mean :1991 Mean : 99411
## 3rd Qu.:2005 3rd Qu.:2005 3rd Qu.: 15404
## Max. :2018 Max. :2018 Max. :23707134
## NA's :1036
## Flag Flag.Description
## Length:30977 Length:30977
## Class :character Class :character
## Mode :character Mode :character
##
##
##
##
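Because the rows printed earlier use the unit 1000 Head, Value has to be scaled to get individual birds. A minimal sketch of that conversion, assuming every row uses the same unit (worth confirming with `unique(birds$Unit)` first); `to_head()` is a hypothetical helper:

```r
# Hypothetical helper: convert values recorded in "1000 Head" to individual birds.
# Assumes Unit is "1000 Head" on every row and stops with an error otherwise.
to_head <- function(value, unit) {
  stopifnot(all(unit == "1000 Head"))
  # Multiplying by the double 1000 (not the integer 1000L) avoids integer overflow;
  # missing values stay NA.
  value * 1000
}

# Usage with the data loaded above:
# sum(is.na(birds$Value))                              # number of missing values
# max(to_head(birds$Value, birds$Unit), na.rm = TRUE)  # largest stock, in birds
```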
# View the structure of the data
str(birds)
## 'data.frame': 30977 obs. of 14 variables:
## $ Domain.Code : chr "QA" "QA" "QA" "QA" ...
## $ Domain : chr "Live Animals" "Live Animals" "Live Animals" "Live Animals" ...
## $ Area.Code : int 2 2 2 2 2 2 2 2 2 2 ...
## $ Area : chr "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
## $ Element.Code : int 5112 5112 5112 5112 5112 5112 5112 5112 5112 5112 ...
## $ Element : chr "Stocks" "Stocks" "Stocks" "Stocks" ...
## $ Item.Code : int 1057 1057 1057 1057 1057 1057 1057 1057 1057 1057 ...
## $ Item : chr "Chickens" "Chickens" "Chickens" "Chickens" ...
## $ Year.Code : int 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 ...
## $ Year : int 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 ...
## $ Unit : chr "1000 Head" "1000 Head" "1000 Head" "1000 Head" ...
## $ Value : int 4700 4900 5000 5300 5500 5800 6600 6290 6300 6000 ...
## $ Flag : chr "F" "F" "F" "F" ...
## $ Flag.Description: chr "FAO estimate" "FAO estimate" "FAO estimate" "FAO estimate" ...
# List the first few unique areas
head(unique(birds$Area))
## [1] "Afghanistan" "Albania" "Algeria"
## [4] "American Samoa" "Angola" "Antigua and Barbuda"
Given the bird types it covers, this kind of dataset is useful for understanding global trends in poultry farming.
# Get the birds discussed in the data
unique(birds$Item)
## [1] "Chickens" "Ducks" "Geese and guinea fowls"
## [4] "Turkeys" "Pigeons, other birds"
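The flag descriptions suggest that the data were compiled by the FAO (the Food and Agriculture Organization of the United Nations), drawing on official figures, estimates, and imputed values.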
unique(birds$Flag.Description)
## [1] "FAO estimate"
## [2] "Official data"
## [3] "FAO data based on imputation methodology"
## [4] "Data not available"
## [5] "Unofficial figure"
## [6] "Aggregate, may include official, semi-official, estimated or calculated data"
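The flags may also explain the missing values noted earlier. The sketch below counts rows and NAs per flag description; it assumes the `birds` data frame loaded above, and `missing_by_flag()` is a hypothetical helper. Whether the NAs actually line up with "Data not available" is something to check, not something the output above confirms.

```r
library(dplyr)

# Hypothetical helper: for each flag description, count rows and missing values,
# with the flags that have the most missing values listed first.
missing_by_flag <- function(df) {
  df %>%
    dplyr::group_by(Flag.Description) %>%
    dplyr::summarize(
      n_rows    = dplyr::n(),
      n_missing = sum(is.na(Value)),
      .groups   = "drop"
    ) %>%
    dplyr::arrange(dplyr::desc(n_missing))
}

# Usage with the data loaded above:
# missing_by_flag(birds)
```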
The graph below captures the changes in populations of various bird types from 1961 to 2018. Notably, the chicken population has experienced a significant increase, dominating the chart with the steepest growth. Other bird types, such as ducks, geese and guinea fowls, pigeons, and turkeys, show relatively stable and much lower population levels over time.
Surprisingly, the chicken population decreased three times (the steepest drop being in 1996). More analysis or research is required to understand the reason for these dips.
ggplot(birds, aes(x = Year, y = Value, group = Item, color = Item)) +
  geom_line() +
  theme_minimal() +
  labs(title = "Bird Population Trends Over Time", x = "Year", y = "Population")
## Warning: Removed 3 rows containing missing values (`geom_line()`).
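The plot above draws one line per bird type across every area at once. To look at the dips more closely, one option is to restrict the data to a single area and compute year-over-year changes. The sketch below assumes the `birds` data frame loaded above and that the World rows carry the global totals; `yearly_change()` and `declining_years()` are hypothetical helpers.

```r
library(dplyr)

# Hypothetical helper: year-over-year change in Value for one area and one item.
# Rows with a missing Value are dropped before differencing.
yearly_change <- function(df, area_name = "World", item_name = "Chickens") {
  df %>%
    dplyr::filter(Area == area_name, Item == item_name, !is.na(Value)) %>%
    dplyr::arrange(Year) %>%
    dplyr::mutate(Change = Value - dplyr::lag(Value))
}

# Hypothetical helper: years in which the stock fell relative to the
# previous recorded year.
declining_years <- function(df, ...) {
  yearly_change(df, ...) %>%
    dplyr::filter(!is.na(Change), Change < 0) %>%
    dplyr::select(Year, Value, Change)
}

# Usage with the data loaded above:
# declining_years(birds)                       # World chickens by default
# declining_years(birds, item_name = "Ducks")
# ggplot(yearly_change(birds), aes(x = Year, y = Value)) + geom_line()
```

Sorting the result by Change would show which decline was steepest.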
We next compare total bird stocks across the top 10 areas. The totals sum Value over all years and bird types, so they are cumulative figures rather than a population at one point in time. “World” has the highest total, followed by “Asia” and “Americas”; entries such as “Eastern Asia,” “China, mainland,” and “Europe” also appear. Because World and the regional entries are aggregates that overlap with the individual countries, the ranking mixes regions and countries.
# Summarize the data by Area and then arrange in descending order of total value
country_wise <- birds %>%
  group_by(Area) %>%
  summarize(Total = sum(Value, na.rm = TRUE)) %>%
  arrange(desc(Total))
# Select only the top 10 Areas
top_areas <- head(country_wise, 10)
# Plot the data for these top Areas
ggplot(top_areas, aes(x = reorder(Area, Total), y = Total)) +
  geom_col() +
  coord_flip() +
  labs(title = "Area-wise Total Population of Birds (Top 10 Areas)", x = "Area", y = "Total Population")
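A ranking of individual countries may be easier to interpret if aggregates are dropped and a single year is used. The sketch below assumes that aggregate rows can be identified by the flag description "Aggregate, may include official, semi-official, estimated or calculated data" listed earlier; that is an assumption about how the data are flagged, and some aggregates may slip through. `top_areas_in_year()` is a hypothetical helper and requires dplyr 1.0.0 or later for `slice_max()`.

```r
library(dplyr)

# Hypothetical helper: top areas by total stock in a single year,
# excluding rows flagged as aggregates (assumption: the flag marks them reliably).
top_areas_in_year <- function(df, top_n = 10, target_year = max(df$Year)) {
  aggregate_flag <- "Aggregate, may include official, semi-official, estimated or calculated data"
  df %>%
    dplyr::filter(Year == target_year, Flag.Description != aggregate_flag) %>%
    dplyr::group_by(Area) %>%
    dplyr::summarize(Total = sum(Value, na.rm = TRUE), .groups = "drop") %>%
    dplyr::slice_max(Total, n = top_n, with_ties = FALSE)
}

# Usage with the data loaded above (defaults to the most recent year):
# top_areas_in_year(birds)
```

The result can be passed to the same bar chart code in place of top_areas.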