For this challenge, I will be loading the birds.csv file. With no previous experience in R but some Python programming experience, I thought this file would provide enough of a challenge and a good introduction to R.
The dataset is stored in a directory shared by all the challenge datasets, so I load it from there.
birds <- read_csv("../challenge_datasets/birds.csv")
## Rows: 30977 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (8): Domain Code, Domain, Area, Element, Item, Unit, Flag, Flag Description
## dbl (6): Area Code, Element Code, Item Code, Year Code, Year, Value
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
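As the message above suggests, the column specification can be silenced with `show_col_types = FALSE`. Here is a minimal sketch, assuming the readr package is installed; the temporary file and its two rows are invented for illustration and only stand in for birds.csv.

```r
library(readr)

# Hypothetical stand-in for birds.csv: a tiny CSV written to a temporary file.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Area,Item,Year,Unit,Value",
  "Afghanistan,Chickens,1961,1000 Head,4700",
  "Afghanistan,Chickens,1962,1000 Head,4900"
), tmp)

# show_col_types = FALSE suppresses the column specification message.
birds_quiet <- read_csv(tmp, show_col_types = FALSE)

# Rows and columns that were read
dim(birds_quiet)
```

With the real file, the same argument would simply be added to the `read_csv()` call used above.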
The loaded data has 14 columns, listed here:
colnames(birds)
## [1] "Domain Code" "Domain" "Area Code" "Area"
## [5] "Element Code" "Element" "Item Code" "Item"
## [9] "Year Code" "Year" "Unit" "Value"
## [13] "Flag" "Flag Description"
From these columns, it seems likely that the data was gathered by the FAO, the Food and Agriculture Organization of the United Nations. The data appears to track the inventory of bird species in each country over a period of time. The values seem to have been collected either through direct reporting or as estimates of the inventory.
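One way to check this reading of the data is to look at the year range, the items, and the number of areas. The sketch below uses base R on a small made-up data frame called `toy` (a hypothetical name, with invented rows that borrow a few of the column names above), so it runs on its own.

```r
# Hypothetical rows using a few of the real column names; the values are invented.
toy <- data.frame(
  Area  = c("Afghanistan", "Afghanistan", "Albania", "Albania"),
  Item  = c("Chickens", "Chickens", "Chickens", "Ducks"),
  Year  = c(1961, 1962, 1961, 1961),
  Value = c(4700, 4900, 1580, 95)
)

# Earliest and latest year covered
range(toy$Year)

# Number of rows recorded for each item
table(toy$Item)

# Number of distinct areas
length(unique(toy$Area))
```

Running the same three calls on `birds` instead of `toy` should show how many years, items, and areas the real dataset covers.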
head(unique(birds$Area))
## [1] "Afghanistan" "Albania" "Algeria"
## [4] "American Samoa" "Angola" "Antigua and Barbuda"
head(unique(birds$Unit))
## [1] "1000 Head"
head(unique(birds$Value))
## [1] 4700 4900 5000 5300 5500 5800
unique(birds$`Flag Description`)
## [1] "FAO estimate"
## [2] "Official data"
## [3] "FAO data based on imputation methodology"
## [4] "Data not available"
## [5] "Unofficial figure"
## [6] "Aggregate, may include official, semi-official, estimated or calculated data"
Due to the size of the dataset, I am displaying only the first few values of the larger columns. As we can see, the Area column lists different countries, and the Value column holds the figure recorded for each of them. The unit is always "1000 Head", which suggests that Value is the number of birds in stock, counted in thousands. The Flag Description column, shown above, indicates the source of each value.
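These observations can be checked with a few lines of base R. The sketch below again uses a small invented data frame named `toy` rather than the real file. Treating "Data not available" rows as `NA` and the new `Head` column are assumptions made for illustration only.

```r
# Hypothetical rows; check.names = FALSE keeps the space in "Flag Description".
toy <- data.frame(
  Unit  = c("1000 Head", "1000 Head", "1000 Head"),
  Value = c(4700, 4900, NA),  # assumption: unavailable values are stored as NA
  `Flag Description` = c("FAO estimate", "Official data", "Data not available"),
  check.names = FALSE
)

# Confirm that only one unit is used
unique(toy$Unit)

# Count rows per flag description, most common first
sort(table(toy$`Flag Description`), decreasing = TRUE)

# Convert thousands of head into individual birds (illustrative column name)
toy$Head <- toy$Value * 1000
toy$Head
```

Backticks are needed around `Flag Description` because the column name contains a space, the same way it is written in the call above.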