Challenge2

Reading In Data

For this challenge, I will read in the FAOSTAT egg chicken dataset to stay on theme with the birds data from the last challenge.

The call that reads the dataset and the column specification reported by readr are shown below.

library(tidyverse)  # provides read_csv() and the dplyr verbs used below
egg_data <- read_csv("../challenge_datasets/FAOSTAT_egg_chicken.csv")

## Rows: 38170 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (8): Domain Code, Domain, Area, Element, Item, Unit, Flag, Flag Description
## dbl (6): Area Code, Element Code, Item Code, Year Code, Year, Value
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Describe The Data

Like the birds.csv dataset from the last challenge, this data appears to have been collected by the FAO, as hinted at by the FAO estimate description in the flag columns. It has 38,170 rows and 14 columns. The years increase within each country, and each row measures an aspect of egg production in that country for that year. Overall, it records the number of hens, the yield per hen, and the total production for a variety of countries.

Provided Grouped Summary Statistics (Central Tendency)

# Keep only the rows measured in tonnes, then average Value per Area
tonnes_data <- egg_data %>% filter(Unit == "tonnes")

mean_tonnes_data <- tonnes_data %>%
  group_by(Area) %>%
  summarize(mean_yield = mean(Value))

top_countries <- mean_tonnes_data %>% 
                   filter(Area != "World") %>%
                   arrange(desc(mean_yield)) %>%
                   head(5) 
print(top_countries)

## # A tibble: 5 × 2
##   Area            mean_yield
##   <chr>                <dbl>
## 1 Asia             18896761.
## 2 Eastern Asia     13566855.
## 3 China, mainland  10744941.
## 4 Europe            9783322.
## 5 Americas          8943798.

bottom_countries <- mean_tonnes_data %>% 
                    arrange(mean_yield) %>%
                    head(5)
print(bottom_countries)

## # A tibble: 5 × 2
##   Area                      mean_yield
##   <chr>                          <dbl>
## 1 Tokelau                         6.84
## 2 Tuvalu                         13.6 
## 3 Saint Pierre and Miquelon      13.6 
## 4 Nauru                          17.2 
## 5 Niue                           19.9

The Unit column contains several different categories, so values are only comparable within a single unit. For any meaningful analysis, the data must first be filtered to one unit. In this case, I narrowed the scope to the rows labeled tonnes.
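One way to see which unit categories exist before filtering is to count the rows for each Element and Unit pair. The sketch below uses a small made-up tibble, `toy_eggs`, as a stand-in for `egg_data`. Apart from "tonnes", the Element and Unit labels in it are assumptions and may not match the labels in the real file.

```r
library(dplyr)

# Hypothetical stand-in for egg_data; every value here is made up.
# Only the "tonnes" label is confirmed by the analysis above.
toy_eggs <- tibble(
  Area    = c("A", "A", "A", "B", "B", "B"),
  Element = c("Laying", "Yield", "Production", "Laying", "Yield", "Production"),
  Unit    = c("1000 Head", "100mg/An", "tonnes", "1000 Head", "100mg/An", "tonnes"),
  Value   = c(10, 50, 5, 20, 40, 8)
)

# Count rows per Element/Unit pair to see which measurements share a unit.
unit_counts <- toy_eggs %>% count(Element, Unit)
print(unit_counts)

# Keep only the rows measured in tonnes, mirroring the filter used above.
toy_tonnes <- toy_eggs %>% filter(Unit == "tonnes")
print(toy_tonnes)
```

Running the same `count(Element, Unit)` on `egg_data` would show whether each unit maps to exactly one element, which is what makes filtering on Unit alone safe.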

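The tables above show the five areas with the highest and the lowest mean annual egg production in tonnes, averaged over the years in the data. Four of the top five (Asia, Eastern Asia, Europe and the Americas) are regional aggregates rather than individual countries, because only "World" was excluded. As expected, areas with more land tend to produce more eggs, while the bottom five are all small island territories.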
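Because the top five is dominated by regional aggregates such as Asia and Europe, a country-level ranking needs those rows removed as well. The following is a minimal sketch, not the method used above. The means in `known_means` are copied from the printed output, and `aggregate_areas` is a hypothetical, hand-written list that would need to be extended to cover every aggregate in the real data.

```r
library(dplyr)

# Mean production values copied from the printed top/bottom tables above.
known_means <- tibble(
  Area = c("Asia", "Eastern Asia", "China, mainland", "Europe",
           "Americas", "Tokelau", "Tuvalu"),
  mean_yield = c(18896761, 13566855, 10744941, 9783322,
                 8943798, 6.84, 13.6)
)

# Assumed list of aggregate areas; incomplete and for illustration only.
aggregate_areas <- c("World", "Africa", "Americas", "Asia",
                     "Eastern Asia", "Europe")

# Drop the aggregates, then rank the remaining areas by mean production.
top_individual <- known_means %>%
  filter(!Area %in% aggregate_areas) %>%
  arrange(desc(mean_yield)) %>%
  head(5)
print(top_individual)
```

Applied to `mean_tonnes_data`, the same filter would replace the single `Area != "World"` condition.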

Summary regarding Dispersion

egg_tonnes <- egg_data %>%
              filter(!is.na(Value)) %>% 
              filter(Unit == "tonnes")

egg_tonnes %>% 
  group_by(Area) %>%
  summarize(
    mean = mean(Value), 
    median = median(Value),
    sd = sd(Value),
    min = min(Value),
    max = max(Value),
    q25 = quantile(Value, 0.25),
    q75 = quantile(Value, 0.75)  
  )

## # A tibble: 244 × 8
##    Area                     mean   median         sd    min    max    q25    q75
##    <chr>                   <dbl>    <dbl>      <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
##  1 Afghanistan           15264.    14325     2589.   1   e4 2.24e4 1.36e4 1.68e4
##  2 Africa              1580155.  1470368   919353.   3.92e5 3.31e6 7.59e5 2.24e6
##  3 Albania               17902.    13430    14941.   2.74e3 5.29e4 6.90e3 2.66e4
##  4 Algeria              114104.    94000   113206.   7.5 e3 3.90e5 1.58e4 1.73e5
##  5 American Samoa           31.5      32        5.02 1.8 e1 4.5 e1 3   e1 3.5 e1
##  6 Americas            8943798.  7986902  3265157.   4.85e6 1.63e7 5.95e6 1.13e7
##  7 Angola                 3968.     3900      826.   2.38e3 5.25e3 3.41e3 4.67e3
##  8 Antigua and Barbuda     186.      170       68.0  9.5 e1 3   e2 1.26e2 2.48e2
##  9 Argentina            327012.   275302.  177930.   1.41e5 8.29e5 2.03e5 3.49e5
## 10 Armenia               25861.    28784     9911.   1.04e4 3.82e4 1.60e4 3.47e4
## # ℹ 234 more rows

The dispersion summary again shows a wide range across areas. The standard deviation is generally largest for areas with high production, which is expected because it scales with the size of the values. Dividing the standard deviation by the mean (the coefficient of variation) would make the spread comparable across areas of very different sizes. The minimum and maximum also differ vastly between areas, which reflects how much larger the egg output of the bigger countries and regions is.
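The idea of dividing the standard deviation by the mean can be sketched as follows. The means and standard deviations are copied (rounded as printed) from four rows of the summary table above. The names `printed_summary` and `cv_table` are introduced here for illustration only.

```r
library(dplyr)

# Mean and sd values as printed in the dispersion summary above.
printed_summary <- tibble(
  Area       = c("Afghanistan", "Africa", "Algeria", "Americas"),
  mean_value = c(15264, 1580155, 114104, 8943798),
  sd_value   = c(2589, 919353, 113206, 3265157)
)

# Coefficient of variation: spread relative to the size of the mean.
cv_table <- printed_summary %>%
  mutate(cv = sd_value / mean_value) %>%
  arrange(desc(cv))
print(cv_table)
```

On the full data, the same ratio could be added inside the existing `summarize()` call as `cv = sd(Value) / mean(Value)`. Ranking by it highlights areas whose production changed most relative to their own size, not those that are simply large.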

Alex Kim

2023-12-21
