── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library("countrycode")
2. Import and clean the data
I clean the data with the following steps:
Delete the columns whose names contain “Code,” which repeat information held in other columns.
Delete the “Unit,” “Element,” and “Domain” columns, which contain only one value each (in the “Livestock” data set) or might not be used in the analysis.
Filter out the aggregated data.
Delete the “Flag” and “Flag Description” columns, which are not very helpful for data analysis.
Keep the values in the “Country Group” column of the “Country_groups” data set only if they refer to a continent.
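The cleaning steps above can be sketched with dplyr roughly as follows. This is a minimal illustration on made-up tables, not the code that produced the output in this section: the object names `livestock_raw`, `country_groups_raw`, and `continents`, the toy values, and the list of continent labels are all assumptions. The aggregate filter is left out because the rule used for it is not stated here.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for one raw file (column names as in the import
# messages of this section; values invented for illustration).
livestock_raw <- tibble(
  `Domain Code` = c("QA", "QA"),
  Domain = c("Live Animals", "Live Animals"),
  `Area Code` = c(2, 3),
  Area = c("Afghanistan", "Albania"),
  `Element Code` = c(5111, 5111),
  Element = c("Stocks", "Stocks"),
  `Item Code` = c(1107, 1107),
  Item = c("Asses", "Asses"),
  `Year Code` = c(1961, 1961),
  Year = c(1961, 1961),
  Unit = c("Head", "Head"),
  Value = c(1300000, 71000),
  Flag = c(NA, "F"),
  `Flag Description` = c("Official data", "FAO estimate")
)

livestock_clean <- livestock_raw %>%
  select(-contains("Code")) %>%           # drop every column with "Code" in its name
  select(-c(Unit, Element, Domain)) %>%   # drop the single-valued columns
  select(-c(Flag, `Flag Description`))    # drop the flag columns

# Hypothetical stand-in for the country-group file.
country_groups_raw <- tibble(
  `Country Group` = c("Africa", "Eastern Africa", "Asia"),
  Country = c("Burundi", "Burundi", "Afghanistan")
)

# Assumed list of continent labels; adjust to the labels in the real file.
continents <- c("Africa", "Americas", "Asia", "Europe", "Oceania")

country_groups_toy <- country_groups_raw %>%
  filter(`Country Group` %in% continents) %>%  # keep continent-level groups only
  select(`Country Group`, Area = Country)      # rename Country to Area for later joins

names(livestock_clean)
country_groups_toy
```

Renaming `Country` to `Area` in the lookup table is a design choice that lets later joins use `by = "Area"` directly.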
Rows: 36449 Columns: 14
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (8): Domain Code, Domain, Area, Element, Item, Unit, Flag, Flag Description
dbl (6): Area Code, Element Code, Item Code, Year Code, Year, Value
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 1943 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): Country Group, Country, M49 Code, ISO2 Code, ISO3 Code
dbl (2): Country Group Code, Country Code
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Country_groups
# A tibble: 276 × 2
`Country Group` Area
<chr> <chr>
1 Africa Algeria
2 Africa Angola
3 Africa Benin
4 Africa Botswana
5 Africa Burkina Faso
6 Africa Burundi
7 Africa Cabo Verde
8 Africa Cameroon
9 Africa Central African Republic
10 Africa Chad
# ℹ 266 more rows
Rows: 38170 Columns: 14
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (8): Domain Code, Domain, Area, Element, Item, Unit, Flag, Flag Description
dbl (6): Area Code, Element Code, Item Code, Year Code, Year, Value
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Egg_chicken
# A tibble: 27,436 × 4
Area Item Year Value
<chr> <chr> <dbl> <dbl>
1 Afghanistan Eggs, hen, in shell 1961 4000
2 Afghanistan Eggs, hen, in shell 1961 25000
3 Afghanistan Eggs, hen, in shell 1961 10000
4 Afghanistan Eggs, hen, in shell 1962 4400
5 Afghanistan Eggs, hen, in shell 1962 25000
6 Afghanistan Eggs, hen, in shell 1962 11000
7 Afghanistan Eggs, hen, in shell 1963 4600
8 Afghanistan Eggs, hen, in shell 1963 25000
9 Afghanistan Eggs, hen, in shell 1963 11500
10 Afghanistan Eggs, hen, in shell 1964 4800
# ℹ 27,426 more rows
Rows: 82116 Columns: 14
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (8): Domain Code, Domain, Area, Element, Item, Unit, Flag, Flag Description
dbl (6): Area Code, Element Code, Item Code, Year Code, Year, Value
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
I use left_join(), right_join(), and full_join() to add continent information to the summary tables and to join several data sets together for analysis. I use inner_join(), semi_join(), and anti_join() to drop rows with no matching information and to filter the values in the “Area” column by whether or not they represent a country.
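To make the difference between these six joins concrete, here is a small sketch on toy tables. The objects `stats` and `lookup` and their values are invented for illustration and only mimic the shape of the summary tables and of Country_groups.

```r
library(dplyr)
library(tibble)

# Toy summary table: two countries and one regional aggregate.
stats <- tibble(
  Area   = c("Algeria", "Albania", "Africa"),
  median = c(27600, 12200, 47052)
)

# Toy country-to-continent lookup; "Afghanistan" has no row in stats.
lookup <- tibble(
  `Country Group` = c("Africa", "Europe", "Asia"),
  Area            = c("Algeria", "Albania", "Afghanistan")
)

# Mutating joins: add the columns of lookup to stats.
joined_left  <- left_join(stats, lookup, by = "Area")   # all rows of stats; "Africa" gets NA
joined_right <- right_join(stats, lookup, by = "Area")  # all rows of lookup; "Afghanistan" gets NA
joined_full  <- full_join(stats, lookup, by = "Area")   # all rows of both tables
joined_inner <- inner_join(stats, lookup, by = "Area")  # only rows present in both

# Filtering joins: keep or drop rows of stats, adding no columns.
only_countries <- semi_join(stats, lookup, by = "Area") # rows with a match in lookup
not_countries  <- anti_join(stats, lookup, by = "Area") # rows without a match

nrow(joined_full)
not_countries
```

In this toy setup the regional aggregate "Africa" is the only row that anti_join() returns, which is the same idea used to separate countries from regions in the “Area” column.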
Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
dplyr 1.1.0.
ℹ Please use `reframe()` instead.
ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
always returns an ungrouped data frame and adjust accordingly.
`summarise()` has grouped output by 'Area'. You can override using the
`.groups` argument.
Egg_summary
# A tibble: 492 × 5
# Groups: Area [276]
Area median sd `range(Value)` `Country Group`
<chr> <dbl> <dbl> <dbl> <chr>
1 Afghanistan 14200 7625. 4000 Asia
2 Afghanistan 14200 7625. 29048 Asia
3 Albania 12200 35029. 420 Europe
4 Albania 12200 35029. 110543 Europe
5 Algeria 27600 57487. 4600 Africa
6 Algeria 27600 57487. 390000 Africa
7 American Samoa 32 18858. 5 Oceania
8 American Samoa 32 18858. 51648 Oceania
9 Angola 3900 19755. 654 Africa
10 Angola 3900 19755. 63067 Africa
# ℹ 482 more rows
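Each area appears twice in Egg_summary because range() returns two values, the minimum and the maximum, so summarise() emits two rows per group and raises the deprecation warning shown above. One way to avoid this, sketched here on toy data, is to compute the minimum and maximum as separate columns. The object names `eggs` and `egg_summary` and the column names `min_value` and `max_value` are hypothetical, and the values are invented.

```r
library(dplyr)
library(tibble)

# Toy data standing in for Egg_chicken.
eggs <- tibble(
  Area  = c("A", "A", "A", "B", "B"),
  Value = c(10, 30, 20, 5, 7)
)

egg_summary <- eggs %>%
  group_by(Area) %>%
  summarise(
    median    = median(Value, na.rm = TRUE),
    sd        = sd(Value, na.rm = TRUE),
    min_value = min(Value, na.rm = TRUE),  # one value per group instead of range()
    max_value = max(Value, na.rm = TRUE),
    .groups   = "drop"                     # return an ungrouped tibble
  )

egg_summary
```

With one value per group in every column, the result has exactly one row per area, and `.groups = "drop"` also silences the message about grouped output.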
`summarise()` has grouped output by 'Area'. You can override using the
`.groups` argument.
summary_total
# A tibble: 1,613 × 7
# Groups: Area [305]
Area Item avg_stocks med_stocks sd_stock n_missing `Country Group`
<chr> <chr> <dbl> <dbl> <dbl> <int> <chr>
1 Afghanistan Asses 1035797. 1250000 319734. 0 Asia
2 Afghanistan Camels 246944. 250000 43255. 0 Asia
3 Afghanistan Cattle 2672409. 2797500 928853. 0 Asia
4 Afghanistan Eggs, h… 15398. 14200 7625. 0 Asia
5 Afghanistan Goats 4018809. 3750000 1570651. 0 Asia
6 Afghanistan Horses 337873. 370000 96615. 0 Asia
7 Afghanistan Milk, w… 873703. 556000 1069804. 0 Asia
8 Afghanistan Mules 27473. 27500 3471. 0 Asia
9 Afghanistan Sheep 15680483. 15055000 2387024. 0 Asia
10 Africa Eggs, h… 46818. 47052 9912. 0 <NA>
# ℹ 1,603 more rows
summary_total %>% distinct(Area)
# A tibble: 305 × 1
# Groups: Area [305]
Area
<chr>
1 Afghanistan
2 Africa
3 Albania
4 Algeria
5 American Samoa
6 Americas
7 Andorra
8 Angola
9 Anguilla
10 Antigua and Barbuda
# ℹ 295 more rows
# Keep the rows whose "Area" value matches the continent information,
# in other words the rows where "Area" is a country.
summary_total_1 <- summary_total %>% semi_join(Country_groups, by = "Area")
summary_total_1
# A tibble: 1,556 × 7
# Groups: Area [276]
Area Item avg_stocks med_stocks sd_stock n_missing `Country Group`
<chr> <chr> <dbl> <dbl> <dbl> <int> <chr>
1 Afghanistan Asses 1035797. 1250000 319734. 0 Asia
2 Afghanistan Camels 246944. 250000 43255. 0 Asia
3 Afghanistan Cattle 2672409. 2797500 928853. 0 Asia
4 Afghanistan Eggs, h… 15398. 14200 7625. 0 Asia
5 Afghanistan Goats 4018809. 3750000 1570651. 0 Asia
6 Afghanistan Horses 337873. 370000 96615. 0 Asia
7 Afghanistan Milk, w… 873703. 556000 1069804. 0 Asia
8 Afghanistan Mules 27473. 27500 3471. 0 Asia
9 Afghanistan Sheep 15680483. 15055000 2387024. 0 Asia
10 Albania Asses 79896. 78000 20242. 0 Europe
# ℹ 1,546 more rows
summary_total_1 %>% distinct(Area)
# A tibble: 276 × 1
# Groups: Area [276]
Area
<chr>
1 Afghanistan
2 Albania
3 Algeria
4 American Samoa
5 Andorra
6 Angola
7 Anguilla
8 Antigua and Barbuda
9 Argentina
10 Armenia
# ℹ 266 more rows
# Keep the rows whose "Area" value does not match the continent information,
# in other words the rows where "Area" does not represent a country.
summary_total_2 <- summary_total %>% anti_join(Country_groups, by = "Area")
summary_total_2
# A tibble: 29 × 1
# Groups: Area [29]
Area
<chr>
1 Africa
2 Americas
3 Asia
4 Australia and New Zealand
5 Caribbean
6 Central America
7 Central Asia
8 Eastern Africa
9 Eastern Asia
10 Eastern Europe
# ℹ 19 more rows
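semi_join() and anti_join() with the same arguments split a table into two non-overlapping parts, which agrees with the counts above: 276 country areas plus 29 non-country areas give the 305 areas in summary_total. The sketch below checks this property on toy data. The objects `summary_toy`, `lookup`, `countries`, and `others` are hypothetical, and ungroup() is added as an assumption because the tables above are still grouped by Area.

```r
library(dplyr)
library(tibble)

# Toy grouped summary: two countries and one regional aggregate.
summary_toy <- tibble(
  Area = c("Algeria", "Algeria", "Africa", "Albania"),
  Item = c("Asses", "Goats", "Asses", "Asses")
) %>%
  group_by(Area)

# Toy country-to-continent lookup.
lookup <- tibble(
  Area            = c("Algeria", "Albania"),
  `Country Group` = c("Africa", "Europe")
)

countries <- summary_toy %>% ungroup() %>% semi_join(lookup, by = "Area")
others    <- summary_toy %>% ungroup() %>% anti_join(lookup, by = "Area")

# The two parts together account for every row and every area.
nrow(countries) + nrow(others) == nrow(summary_toy)
n_distinct(countries$Area) + n_distinct(others$Area) == n_distinct(summary_toy$Area)
```

If either comparison returned FALSE on real data, that would suggest the two joins were not run against the same table or key.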