Fruits & Vegetables Prices and Calorie Guidelines

How much do fruits and vegetables cost? The USDA Economic Research Service (ERS) estimated average prices for over 150 commonly consumed fresh and processed fruits and vegetables. Reported estimates include each product’s average retail price per pound and per edible cup equivalent (the unit of measurement for Federal recommendations for fruit and vegetable consumption). For many fruits and vegetables, a 1-cup equivalent equals the weight of enough edible food to fill a measuring cup. ERS calculated average prices at retail stores using 2013 and 2016 retail scanner data from Information Resources, Inc. (IRI). A selection of retail establishments across the U.S. (grocery stores, supermarkets, supercenters, convenience stores, drug stores, and liquor stores) provides IRI with weekly retail sales data (revenue and quantity).

ERS reports average prices per edible cup equivalent to inform policymakers and nutritionists about how much money it costs Americans to eat a sufficient quantity and variety of fruits and vegetables. Every five years the Departments of Agriculture and Health and Human Services release a new version of the Dietary Guidelines for Americans with information about how individuals can achieve a healthy diet. However, the average American falls short in meeting these recommendations. Many people consume too many calories from refined grains, solid fats, and added sugars, and do not eat enough whole grains, fruits, and vegetables.

Are food prices a barrier to eating a healthy diet? ERS research using this data set shows that, in 2013, it was possible for a person on a 2,000-calorie diet to eat a sufficient quantity and variety of fruits and vegetables for about $2.10 to $2.60 per day. The report also illustrates the variety of fruits and vegetables affordable to a family on a limited budget.

Analysis of the Impact on Caloric Intake and Cost of Substituting Selected Snacks with Selected Fruits & Vegetables

Installing and Loading necessary packages

#install.packages("dplyr")
#install.packages("tidyr")
#install.packages("stringr")
#install.packages("readxl")
#install.packages("ggplot2")
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyr)
library(stringr)
library(readxl)
library(ggplot2)

1st Data set - Calorie Intake Impact on Snack Substitution

Load data set (Excel) into R

Download the target file from GitHub to the local working directory: “caloricimpacts (untidy dataset).xls”

setwd("C:/DATA/HHP/Personal/Degrees/Ms. Data Science (CUNY)/R Working Dir")
calorieimpact_raw <- data.frame(read_excel("caloricimpacts (untidy dataset).xls", skip = 1, col_names = TRUE))
dim(calorieimpact_raw)
## [1] 26 23

Data Wrangling

Eliminating Headers, Footers and Calculated columns

calorieimpact_raw <- calorieimpact_raw[-(22:26),]
calorieimpact_raw <- calorieimpact_raw[,-23]
calorieimpact_raw <- calorieimpact_raw[-1,]

Pivoting columns to a key column and Renaming columns

calorieimpact <- gather(calorieimpact_raw, "FruitVeg", "DiffCalories", 3:22)
names(calorieimpact) <- c("Snack", "SnackCalories", "FruitVeg", "DiffCalories")
head(calorieimpact,22)
##                                           Snack SnackCalories
## 1     Chocolate  candy (milk chocolate candies)           262
## 2                Cookies (chocolate chip, soft)           123
## 3                                    Corn chips           140
## 4                              Crackers (wheat)           114
## 5  Cupcakes  (chocolate, with low-fat frosting)           174
## 6                           Danish (with fruit)           271
## 7               Donuts (yeast-leavened, glazed)           235
## 8                                   Fruit rolls            82
## 9                               Graham crackers           102
## 10         Granola  bars (oats, fruit, and nut)           119
## 11                   Ice cream (vanilla, light)           196
## 12                          Muffins (blueberry)           369
## 13   Pizza, from frozen (cheese, regular crust)           252
## 14        Popsicles and bars (fruit/juice bars)            80
## 15                 Potato chips (plain, salted)           169
## 16               Pretzels (hard, plain, salted)           168
## 17              Pudding, ready-to-eat (vanilla)           152
## 18  Sandwich crackers (rye with cheese filling)           183
## 19             Toaster pastries (fruit frosted)           299
## 20                       Tortilla chips (white)           161
## 21    Chocolate  candy (milk chocolate candies)           262
## 22               Cookies (chocolate chip, soft)           123
##                  FruitVeg DiffCalories
## 1                  Apples          185
## 2                  Apples           46
## 3                  Apples           63
## 4                  Apples           37
## 5                  Apples           97
## 6                  Apples          194
## 7                  Apples          158
## 8                  Apples            5
## 9                  Apples           25
## 10                 Apples           42
## 11                 Apples          119
## 12                 Apples          292
## 13                 Apples          175
## 14                 Apples            3
## 15                 Apples           92
## 16                 Apples           91
## 17                 Apples           75
## 18                 Apples          106
## 19                 Apples          222
## 20                 Apples           84
## 21 Applesauce..sweetened.          162
## 22 Applesauce..sweetened.           23
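In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(). A minimal sketch of the same pivot, assuming a recent tidyr and using a toy table with two snacks and two fruit columns (values copied from the output above; the column name Applesauce is shortened for illustration):

```r
library(tidyr)

# Toy wide table: one row per snack, one column per fruit/vegetable.
# Values are taken from the head() output above.
wide <- data.frame(
  Snack         = c("Chocolate candy", "Cookies"),
  SnackCalories = c(262, 123),
  Apples        = c(185, 46),
  Applesauce    = c(162, 23)
)

# Every column except the two identifier columns becomes a
# (FruitVeg, DiffCalories) pair.
long <- pivot_longer(
  wide,
  cols      = -c(Snack, SnackCalories),
  names_to  = "FruitVeg",
  values_to = "DiffCalories"
)

print(long)
```

Unlike gather(), pivot_longer() returns rows grouped by snack rather than by fruit/vegetable, so the row order differs even though the content is the same.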

Casting data types, creating calculated-columns, arranging column layout

calorieimpact$SnackCalories <- as.numeric(calorieimpact$SnackCalories)
calorieimpact$DiffCalories <- as.numeric(calorieimpact$DiffCalories)
calorieimpact <- mutate(calorieimpact, FruitVegCalories = SnackCalories-DiffCalories)
avgCalChgbyFruitVeg <- summarise(group_by(calorieimpact, FruitVeg), AvgCalChgbyFruitVeg = round(mean(DiffCalories)))
calorieimpact <- calorieimpact %>% inner_join(avgCalChgbyFruitVeg, by = c("FruitVeg"))
calorieimpactfinal <- select(calorieimpact, Snack, SnackCalories, FruitVeg, FruitVegCalories, AvgCalChgbyFruitVeg, DiffCalories)
head(calorieimpactfinal,22)
##                                           Snack SnackCalories
## 1     Chocolate  candy (milk chocolate candies)           262
## 2                Cookies (chocolate chip, soft)           123
## 3                                    Corn chips           140
## 4                              Crackers (wheat)           114
## 5  Cupcakes  (chocolate, with low-fat frosting)           174
## 6                           Danish (with fruit)           271
## 7               Donuts (yeast-leavened, glazed)           235
## 8                                   Fruit rolls            82
## 9                               Graham crackers           102
## 10         Granola  bars (oats, fruit, and nut)           119
## 11                   Ice cream (vanilla, light)           196
## 12                          Muffins (blueberry)           369
## 13   Pizza, from frozen (cheese, regular crust)           252
## 14        Popsicles and bars (fruit/juice bars)            80
## 15                 Potato chips (plain, salted)           169
## 16               Pretzels (hard, plain, salted)           168
## 17              Pudding, ready-to-eat (vanilla)           152
## 18  Sandwich crackers (rye with cheese filling)           183
## 19             Toaster pastries (fruit frosted)           299
## 20                       Tortilla chips (white)           161
## 21    Chocolate  candy (milk chocolate candies)           262
## 22               Cookies (chocolate chip, soft)           123
##                  FruitVeg FruitVegCalories AvgCalChgbyFruitVeg
## 1                  Apples               77                 106
## 2                  Apples               77                 106
## 3                  Apples               77                 106
## 4                  Apples               77                 106
## 5                  Apples               77                 106
## 6                  Apples               77                 106
## 7                  Apples               77                 106
## 8                  Apples               77                 106
## 9                  Apples               77                 106
## 10                 Apples               77                 106
## 11                 Apples               77                 106
## 12                 Apples               77                 106
## 13                 Apples               77                 106
## 14                 Apples               77                 106
## 15                 Apples               77                 106
## 16                 Apples               77                 106
## 17                 Apples               77                 106
## 18                 Apples               77                 106
## 19                 Apples               77                 106
## 20                 Apples               77                 106
## 21 Applesauce..sweetened.              100                  83
## 22 Applesauce..sweetened.              100                  83
##    DiffCalories
## 1           185
## 2            46
## 3            63
## 4            37
## 5            97
## 6           194
## 7           158
## 8             5
## 9            25
## 10           42
## 11          119
## 12          292
## 13          175
## 14            3
## 15           92
## 16           91
## 17           75
## 18          106
## 19          222
## 20           84
## 21          162
## 22           23
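Since DiffCalories is the snack's calories minus the fruit/vegetable's calories, the fruit/vegetable value is recovered as SnackCalories - DiffCalories (for example, 262 - 185 = 77 for apples). The summarise-then-join step can also be written as a grouped mutate. A minimal sketch on a four-row toy subset (so the averages differ from the 106 and 83 shown above, which use all 20 snacks); the column name AvgCalChg is illustrative:

```r
library(dplyr)

# Toy subset: two snacks against two fruit/vegetable columns,
# values taken from the output above.
calories <- data.frame(
  Snack         = c("Chocolate candy", "Cookies", "Chocolate candy", "Cookies"),
  SnackCalories = c(262, 123, 262, 123),
  FruitVeg      = c("Apples", "Apples", "Applesauce", "Applesauce"),
  DiffCalories  = c(185, 46, 162, 23)
)

# A grouped mutate keeps every row and attaches the per-group mean,
# which avoids the separate summarise() + inner_join().
result <- calories %>%
  group_by(FruitVeg) %>%
  mutate(
    FruitVegCalories = SnackCalories - DiffCalories,
    AvgCalChg        = round(mean(DiffCalories))
  ) %>%
  ungroup()

print(result)
```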

Analysis of the calorie intake impact of substituting Snacks with Fruits/Vegetables

ggplot(calorieimpactfinal, aes(x=DiffCalories, y=Snack, colour = FruitVeg)) + geom_point() + facet_wrap(~FruitVeg) + labs(y = "Snack", x = "Differential of Calories Snack vs Fruit/Vegetable", title = "Analysis of the calorie intake impact of substituting Snacks with Fruits/Vegetables") + theme_bw()

(Values < 0 mean calorie intake increase due to substitution)

Conclusion: With few exceptions (such as sweet potatoes, raisins, and applesauce), substituting fruits or vegetables for snacks substantially reduces calorie intake.
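The exceptions can also be listed directly instead of being read off the plot. A minimal sketch, counting the snack pairings per fruit/vegetable where the differential is negative; the Apples values come from the output above, while the Raisins values are hypothetical and only illustrate the negative case:

```r
library(dplyr)

# Apples rows are from the output above; Raisins rows are made-up
# values used only to show what a calorie increase looks like.
calories <- data.frame(
  Snack        = c("Fruit rolls", "Popsicles and bars",
                   "Fruit rolls", "Popsicles and bars"),
  FruitVeg     = c("Apples", "Apples", "Raisins", "Raisins"),
  DiffCalories = c(5, 3, -20, -22)
)

# Keep only fruits/vegetables with at least one pairing where the
# substitution increases calories (DiffCalories < 0).
increases <- calories %>%
  group_by(FruitVeg) %>%
  summarise(
    n_increase = sum(DiffCalories < 0),
    n_pairs    = n()
  ) %>%
  filter(n_increase > 0)

print(increases)
```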

2nd Data set - Cost Impact on Snack Substitution

Load data set (Excel) into R

Download the target file from GitHub to the local working directory: “costimpacts (untidy dataset).xls”

setwd("C:/DATA/HHP/Personal/Degrees/Ms. Data Science (CUNY)/R Working Dir")
costimpact_raw <- data.frame(read_excel("costimpacts (untidy dataset).xls", skip = 1, col_names = TRUE))
dim(costimpact_raw)
## [1] 25 23

Data Wrangling

Eliminating Headers, Footers and Calculated columns

costimpact_raw <- costimpact_raw[-(22:25),]
costimpact_raw <- costimpact_raw[,-23]
costimpact_raw <- costimpact_raw[-1,]

Pivoting columns to a key column and Renaming columns

costimpact <- gather(costimpact_raw, "FruitVeg", "DiffCost", 3:22)
names(costimpact) <- c("Snack", "SnackCost", "FruitVeg", "DiffCost")
head(costimpact,22)
##                    Snack           SnackCost           FruitVeg DiffCost
## 1       Chocolate  candy 0.23999999999999999             Apples    -0.12
## 2                Cookies                0.16             Apples    -0.20
## 3             Corn chips 0.20999999999999999             Apples    -0.15
## 4               Crackers                0.16             Apples    -0.20
## 5               Cupcakes 0.34000000000000002             Apples    -0.02
## 6                 Danish 0.46999999999999997             Apples     0.11
## 7                 Donuts 0.35999999999999999             Apples     0.00
## 8            Fruit rolls 0.28000000000000003             Apples    -0.08
## 9        Graham crackers 0.14000000000000001             Apples    -0.22
## 10         Granola  bars 0.29999999999999999             Apples    -0.06
## 11             Ice cream 0.39000000000000001             Apples     0.03
## 12               Muffins 0.82999999999999996             Apples     0.47
## 13    Pizza, from frozen                0.63             Apples     0.27
## 14    Popsicles and bars 0.34000000000000002             Apples    -0.02
## 15          Potato chips 0.27000000000000002             Apples    -0.09
## 16              Pretzels                0.25             Apples    -0.11
## 17 Pudding, ready-to-eat                0.38             Apples     0.02
## 18     Sandwich crackers 0.20000000000000001             Apples    -0.16
## 19      Toaster pastries 0.34999999999999998             Apples    -0.01
## 20        Tortilla chips 0.20999999999999999             Apples    -0.15
## 21      Chocolate  candy 0.23999999999999999 Applesauce..jarred     0.02
## 22               Cookies                0.16 Applesauce..jarred    -0.06

Casting data types, creating calculated-columns, arranging column layout

costimpact$SnackCost <- as.numeric(format(costimpact$SnackCost, nsmall = 2))
costimpact$DiffCost <- as.numeric(format(costimpact$DiffCost, nsmall = 2))
costimpact <- mutate(costimpact, FruitVegCost = SnackCost-DiffCost)
totCostchgbyFruitVeg <- summarise(group_by(costimpact, FruitVeg), TotCostChgbyFruitVeg = sum(DiffCost))
costimpact <- costimpact %>% inner_join(totCostchgbyFruitVeg, by = c("FruitVeg"))
costimpactfinal <- select(costimpact, Snack, SnackCost, FruitVeg, FruitVegCost, TotCostChgbyFruitVeg, DiffCost)
head(costimpactfinal,22)
##                    Snack SnackCost           FruitVeg FruitVegCost
## 1       Chocolate  candy      0.24             Apples         0.36
## 2                Cookies      0.16             Apples         0.36
## 3             Corn chips      0.21             Apples         0.36
## 4               Crackers      0.16             Apples         0.36
## 5               Cupcakes      0.34             Apples         0.36
## 6                 Danish      0.47             Apples         0.36
## 7                 Donuts      0.36             Apples         0.36
## 8            Fruit rolls      0.28             Apples         0.36
## 9        Graham crackers      0.14             Apples         0.36
## 10         Granola  bars      0.30             Apples         0.36
## 11             Ice cream      0.39             Apples         0.36
## 12               Muffins      0.83             Apples         0.36
## 13    Pizza, from frozen      0.63             Apples         0.36
## 14    Popsicles and bars      0.34             Apples         0.36
## 15          Potato chips      0.27             Apples         0.36
## 16              Pretzels      0.25             Apples         0.36
## 17 Pudding, ready-to-eat      0.38             Apples         0.36
## 18     Sandwich crackers      0.20             Apples         0.36
## 19      Toaster pastries      0.35             Apples         0.36
## 20        Tortilla chips      0.21             Apples         0.36
## 21      Chocolate  candy      0.24 Applesauce..jarred         0.22
## 22               Cookies      0.16 Applesauce..jarred         0.22
##    TotCostChgbyFruitVeg DiffCost
## 1                 -0.69    -0.12
## 2                 -0.69    -0.20
## 3                 -0.69    -0.15
## 4                 -0.69    -0.20
## 5                 -0.69    -0.02
## 6                 -0.69     0.11
## 7                 -0.69     0.00
## 8                 -0.69    -0.08
## 9                 -0.69    -0.22
## 10                -0.69    -0.06
## 11                -0.69     0.03
## 12                -0.69     0.47
## 13                -0.69     0.27
## 14                -0.69    -0.02
## 15                -0.69    -0.09
## 16                -0.69    -0.11
## 17                -0.69     0.02
## 18                -0.69    -0.16
## 19                -0.69    -0.01
## 20                -0.69    -0.15
## 21                 2.11     0.02
## 22                 2.11    -0.06
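The long digit strings in the raw SnackCost column (such as 0.23999999999999999) are floating-point artifacts of the spreadsheet import. A minimal sketch of one way to clean them, assuming that rounding to two decimals (cents) is acceptable; this uses round() rather than the format() call above:

```r
# Raw character values as they appear in the head() output above.
raw_cost  <- c("0.23999999999999999", "0.16", "0.34000000000000002")
diff_cost <- c(-0.12, -0.20, -0.02)

# Convert to numeric and round to cents.
snack_cost <- round(as.numeric(raw_cost), 2)

# DiffCost = SnackCost - FruitVegCost, so the fruit/vegetable cost
# is recovered by subtraction (rounded again to remove float noise).
fruit_cost <- round(snack_cost - diff_cost, 2)

print(snack_cost)
print(fruit_cost)
```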

Analysis of the cost impact of substituting Snacks with Fruits/Vegetables

ggplot(costimpactfinal, aes(x=DiffCost, y=Snack, colour = FruitVeg)) + geom_point() + facet_wrap(~FruitVeg)+ labs(y = "Snack", x = "Differential of Cost Snack vs Fruit/Vegetable ", 
        title = "Analysis of the cost impact of substituting Snacks with Fruits/Vegetables") + theme_bw()

(Values < 0 mean cost increase due to substitution)

ggplot(costimpactfinal, aes(y = FruitVeg, x = TotCostChgbyFruitVeg)) + geom_point(aes(col=FruitVeg, size=TotCostChgbyFruitVeg)) + labs(y = "Fruit/Vegetable", x = "Total Cost Replacing all Snacks with specific Fruit/Veg", 
        title = "Total Cost of Replacing all Snacks with a specific Fruit/Vegetable") + theme_bw()

(Values < 0 mean cost increase due to substitution)

Conclusion: With few exceptions (such as red peppers, tangerines, and tomatoes), substituting fruits or vegetables for snacks substantially reduces cost. Among the largest savings contributors are celery, carrots, broccoli, and bananas.

3rd Data set - United Nations Migration Data

Load data set (Excel) into R and eliminate headers

Download the target file from GitHub to the local working directory: “UN_MigrantStockByOriginAndDestination_2017.xlsx”

setwd("C:/DATA/HHP/Personal/Degrees/Ms. Data Science (CUNY)/R Working Dir")
migration_raw <- data.frame(read_excel("UN_MigrantStockByOriginAndDestination_2017.xlsx", sheet = "Table 1", skip = 15, col_names = TRUE))
dim(migration_raw)
## [1] 1917  241
head(migration_raw,1)
##   X__1    X__2  X__3 X__4 X__5 X__6     Total Other.North Other.South
## 1 1990 1990001 WORLD <NA>  900 <NA> 152542373     2144536     6342531
##   Afghanistan Albania Algeria American.Samoa Andorra Angola Anguilla
## 1     6724681  179490  906030           2041    3792 809942     2047
##   Antigua.and.Barbuda Argentina Armenia Aruba Australia Austria Azerbaijan
## 1               21747    430322  899683 10597    303406  505818    1634120
##   Bahamas Bahrain Bangladesh Barbados Belarus Belgium Belize  Benin
## 1   25172   12767    5451546    84917 1769029  365414  36114 233642
##   Bermuda Bhutan Bolivia..Plurinational.State.of. Bosnia.and.Herzegovina
## 1   71703  28366                           224687                 863399
##   Botswana Brazil British.Virgin.Islands Brunei.Darussalam Bulgaria
## 1    26037 500149                   3094             26323   617155
##   Burkina.Faso Burundi Cabo.Verde Cambodia Cameroon Canada
## 1      1018868  337199      91681   354608   115311 997144
##   Caribbean.Netherlands Cayman.Islands Central.African.Republic   Chad
## 1                  4325            373                    46240 336916
##   Channel.Islands  Chile   China China..Hong.Kong.SAR China..Macao.SAR
## 1           18727 489955 4229860               551080            95648
##   Colombia Comoros Congo Cook.Islands Costa.Rica Côte.d.Ivoire Croatia
## 1  1009148   40076 91302        17488      69724        370866  426201
##     Cuba Curaçao Cyprus Czechia Dem..People.s.Republic.of.Korea
## 1 835546   41878 174364  277514                           38574
##   Democratic.Republic.of.the.Congo Denmark Djibouti Dominica
## 1                           401962  202194     5281    42366
##   Dominican.Republic Ecuador   Egypt El.Salvador Equatorial.Guinea Eritrea
## 1             465022  213731 1321128     1241877             34972  170617
##   Estonia Ethiopia Faeroe.Islands Falkland.Islands..Malvinas.  Fiji
## 1  113928  1687517           7524                         260 90156
##   Finland  France French.Guiana French.Polynesia Gabon Gambia Georgia
## 1  250690 1197097          2848             3149 15208  36149  920441
##   Germany  Ghana Gibraltar  Greece Greenland Grenada Guadeloupe Guam
## 1 3277677 373303     11994 1041455      9510   43250       5828 1376
##   Guatemala Guinea Guinea.Bissau Guyana  Haiti Holy.See Honduras Hungary
## 1    343623 352590         55320 233660 527307       32   156553  386934
##   Iceland   India Indonesia Iran..Islamic.Republic.of.    Iraq Ireland
## 1   17635 6718862   1636326                     629834 1506679  917639
##   Isle.of.Man Israel   Italy Jamaica  Japan Jordan Kazakhstan  Kenya
## 1       10735 278956 3416421  588820 609032 313737    2972433 242119
##   Kiribati Kuwait Kyrgyzstan Lao.People.s.Democratic.Republic Latvia
## 1     4053  81482     522615                           482094 215165
##   Lebanon Lesotho Liberia Libya Liechtenstein Lithuania Luxembourg
## 1  506600  191993  516719 76071          3430    341083      27828
##   Madagascar Malawi Malaysia Maldives   Mali  Malta Marshall.Islands
## 1      58633 121365   562617     2192 647744 110735             1428
##   Martinique Mauritania Mauritius Mayotte  Mexico
## 1      11041     134129    108385    1835 4394684
##   Micronesia..Fed..States.of. Monaco Mongolia Montenegro Montserrat
## 1                        7714   4407    24155      78678       7188
##   Morocco Mozambique Myanmar Namibia Nauru  Nepal Netherlands
## 1 1588218    2218009  685288   16057  1412 748046      728810
##   New.Caledonia New.Zealand Nicaragua  Niger Nigeria Niue
## 1          4151      388089    442037 132726  448460 5860
##   Northern.Mariana.Islands Norway  Oman Pakistan Palau Panama
## 1                     2525 138537 12534  3341574  2958 134703
##   Papua.New.Guinea Paraguay   Peru Philippines  Poland Portugal
## 1             3015   297925 313595     2029190 1533306  1880727
##   Puerto.Rico Qatar Republic.of.Korea Republic.of.Moldova Réunion Romania
## 1     1200835 12204           1624797              625810    3087  813087
##   Russian.Federation Rwanda Saint.Helena Saint.Kitts.and.Nevis Saint.Lucia
## 1           12664537 550719          884                 20714       22006
##   Saint.Pierre.and.Miquelon Saint.Vincent.and.the.Grenadines Samoa
## 1                       485                            37043 74861
##   San.Marino Sao.Tome.and.Principe Saudi.Arabia Senegal Serbia Seychelles
## 1       1415                 13941       107037  369263 708804      29376
##   Sierra.Leone Singapore Sint.Maarten..Dutch.part. Slovakia Slovenia
## 1        60952    156201                     14368   133461    91652
##   Solomon.Islands Somalia South.Africa South.Sudan   Spain Sri.Lanka
## 1            2212  846967       327661      514943 1449316    885836
##   State.of.Palestine  Sudan Suriname Swaziland Sweden Switzerland
## 1            1813068 587063   179657     35184 207067      323233
##   Syrian.Arab.Republic Tajikistan TFYR.Macedonia Thailand Timor.Leste
## 1               620868     536252         429555   309088       11261
##     Togo Tokelau Tonga Trinidad.and.Tobago Tunisia  Turkey Turkmenistan
## 1 193369    1684 32666              197522  453933 2530619       259991
##   Turks.and.Caicos.Islands Tuvalu Uganda Ukraine United.Arab.Emirates
## 1                     2311   2350 311490 5549477                79540
##   United.Kingdom United.Republic.of.Tanzania United.States.of.America
## 1        3795662                      204272                  1736288
##   United.States.Virgin.Islands Uruguay Uzbekistan Vanuatu
## 1                         2362  237258    1429956    5060
##   Venezuela..Bolivarian.Republic.of. Viet.Nam Wallis.and.Futuna.Islands
## 1                             185888  1235348                      6484
##   Western.Sahara  Yemen Zambia Zimbabwe
## 1         168239 455442  83210   176697

Data Wrangling

Eliminating irrelevant columns, including totals

migration_raw <- migration_raw[,-2]
migration_raw <- migration_raw[,-(3:6)]

Pivoting columns to a key column and Renaming columns

migration <- gather(migration_raw, "CountryOrig", "MigrantStock", 3:236)
names(migration) <- c("Year", "CountryDest", "CountryOrig", "MigrantStock")
head(migration,5)
##   Year                                                 CountryDest
## 1 1990                                                       WORLD
## 2 1990                                      More developed regions
## 3 1990                                      Less developed regions
## 4 1990                                   Least developed countries
## 5 1990 Less developed regions, excluding least developed countries
##   CountryOrig MigrantStock
## 1 Other.North      2144536
## 2 Other.North      1094823
## 3 Other.North      1049713
## 4 Other.North       250575
## 5 Other.North       799138

Casting data types and filtering NAs and certain summary categories like: World, Developed, High Income Regions, Continents & Subcontinents, etc.

str(migration)
## 'data.frame':    448578 obs. of  4 variables:
##  $ Year        : num  1990 1990 1990 1990 1990 1990 1990 1990 1990 1990 ...
##  $ CountryDest : chr  "WORLD" "More developed regions" "Less developed regions" "Least developed countries" ...
##  $ CountryOrig : chr  "Other.North" "Other.North" "Other.North" "Other.North" ...
##  $ MigrantStock: chr  "2144536" "1094823" "1049713" "250575" ...
migration$MigrantStock <- as.numeric(migration$MigrantStock)
## Warning: NAs introduced by coercion
migration_final <- migration %>% filter(!is.na(MigrantStock))
migration_final <- migration_final %>% filter(!str_detect(CountryDest,"regions|countries|WORLD"))
migration_final <- migration_final %>% filter(!str_detect(CountryDest,"ASIA|AFRICA|EUROPE|LATIN|CARIBBEAN|OCEANIA|Central|Eastern|Western|Southern|Northern|NORTHERN|Other|Sub"))
migration_final <- migration_final %>% filter(!str_detect(CountryOrig,"Other"))
head(migration_final,10)
##    Year                CountryDest CountryOrig MigrantStock
## 1  1990                      Egypt Afghanistan          237
## 2  1990                      Libya Afghanistan          677
## 3  1990                    Namibia Afghanistan           64
## 4  1990               South Africa Afghanistan           59
## 5  1990                 Tajikistan Afghanistan         8485
## 6  1990                      India Afghanistan        14159
## 7  1990 Iran (Islamic Republic of) Afghanistan      3123968
## 8  1990                   Pakistan Afghanistan      3276673
## 9  1990                   Malaysia Afghanistan           32
## 10 1990                Philippines Afghanistan           25
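One caveat with keyword patterns: str_detect() matches anywhere in the name, so a keyword such as Central or Western can also remove real countries (Central African Republic and Western Sahara both appear among the origin columns above). A minimal sketch comparing the pattern filter with an exact-match list; the destination vector and the aggregates list are illustrative assumptions, not the full set of labels in the UN file:

```r
library(stringr)

# Illustrative destination labels: three aggregates and four countries.
dest <- c("WORLD", "More developed regions", "Eastern Africa", "Egypt",
          "South Africa", "Central African Republic", "Western Sahara")

# Same keywords as the two CountryDest filters above, combined.
pattern <- paste0(
  "regions|countries|WORLD|ASIA|AFRICA|EUROPE|LATIN|CARIBBEAN|OCEANIA|",
  "Central|Eastern|Western|Southern|Northern|NORTHERN|Other|Sub"
)
kept_by_pattern <- dest[!str_detect(dest, pattern)]

# Alternative: drop only labels that exactly match a known aggregate.
aggregates   <- c("WORLD", "More developed regions", "Eastern Africa")
kept_by_list <- dest[!dest %in% aggregates]

print(kept_by_pattern)
print(kept_by_list)
```

The exact-match list needs more upkeep, but it cannot remove a country by accident.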

Preparing datasets for ranking the top CountryDest and CountryOrig by MigrantStock

TopCountryDest <- summarise(group_by(migration_final, CountryDest), MigrantStock=sum(MigrantStock)) %>% arrange(desc(MigrantStock)) %>% head(., 20)
TopCountryOrig <- summarise(group_by(migration_final, CountryOrig), MigrantStock=sum(MigrantStock)) %>% arrange(desc(MigrantStock)) %>% head(., 20)
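The arrange-then-head step can also be written with slice_max(), assuming dplyr 1.0.0 or later. A minimal sketch on four rows taken from the filtered output above, keeping the top two destinations:

```r
library(dplyr)

# Four destination rows for origin Afghanistan, from the output above.
stock <- data.frame(
  CountryDest  = c("Tajikistan", "India",
                   "Iran (Islamic Republic of)", "Pakistan"),
  MigrantStock = c(8485, 14159, 3123968, 3276673)
)

# Sum per destination, then keep the n largest totals.
# with_ties = FALSE guarantees exactly n rows.
top_dest <- stock %>%
  group_by(CountryDest) %>%
  summarise(MigrantStock = sum(MigrantStock)) %>%
  slice_max(MigrantStock, n = 2, with_ties = FALSE)

print(top_dest)
```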

Preparing dataset for Migration by CountryDest over time

TopCountryDestOT <- summarise(group_by(migration_final, CountryDest, Year), MigrantStock=sum(MigrantStock)) %>% arrange(desc(MigrantStock)) %>% filter(str_detect(CountryDest,"United States of America|Russia|Germany|Saudi|France|United Kingdom|Canada|Australia|India|New Zealand|Ukraine|United Arab|Pakistan|Italy|Spain|Iran|China"))

Analysis of the Migration Patterns

Most Attractive Countries for Immigrants

ggplot(TopCountryDest, aes(x=reorder(CountryDest, MigrantStock), y=MigrantStock, fill=CountryDest)) + geom_col(position = "dodge") + coord_flip() + labs(y = "Migration Stock", x = "Destination Country", title = "Most Attractive Countries for Immigrants") + theme(axis.text.y=element_blank())

Top Countries by # of Migrants

ggplot(TopCountryOrig, aes(x=reorder(CountryOrig, MigrantStock), y=MigrantStock, fill=CountryOrig)) + geom_col(position = "dodge") + coord_flip() + labs(y = "Migration Stock", x = "Origin Country", title = "Top Countries by # of Migrants") + theme(axis.text.y=element_blank())

MigrationStock over the Years for Top Destination Countries

ggplot(TopCountryDestOT, aes(x=Year, y=MigrantStock, group=CountryDest)) + geom_line(aes(color=CountryDest)) + geom_point(aes(color=CountryDest)) + labs(y = "Migration Stock", x = "Year", title = "MigrationStock over the Years for Top Destination Countries") + theme_bw() 

Faceted View

ggplot(TopCountryDestOT, aes(x=Year, y=MigrantStock, group=CountryDest)) + geom_line(aes(color=CountryDest)) + geom_point(aes(color=CountryDest)) + facet_wrap(~CountryDest) + labs(y = "Migration Stock", x = "Year", title = "MigrationStock over the Years for Top Destination Countries") + theme_bw()

Conclusion: The United States of America remains the top destination country for immigrants worldwide, with roughly linear growth over the period covered by the data (1990 onward). Ukraine, Pakistan, India, China, and Iran show a declining or flat trend. Developed economies such as the UK, UAE, Canada, Australia, France, and Germany show steady growth over recent decades. Saudi Arabia shows almost linear growth during the last two decades.
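The trends read off the plots can be checked numerically by comparing the first and last year for each destination. A minimal sketch, using hypothetical destinations A and B with made-up totals; the column names start_stock, end_stock, and pct_change are illustrative:

```r
library(dplyr)

# Hypothetical totals for two destinations, for illustration only.
trend <- data.frame(
  CountryDest  = c("A", "A", "B", "B"),
  Year         = c(1990, 2017, 1990, 2017),
  MigrantStock = c(100, 250, 400, 380)
)

# Order each destination's rows by year, then compare the
# earliest and latest stock figures.
change <- trend %>%
  group_by(CountryDest) %>%
  arrange(Year, .by_group = TRUE) %>%
  summarise(
    start_stock = first(MigrantStock),
    end_stock   = last(MigrantStock),
    pct_change  = round(100 * (end_stock - start_stock) / start_stock, 1)
  )

print(change)
```

A positive pct_change indicates growth over the period, while a value near zero or below corresponds to the flat or declining cases.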