
title: “Data 607 - Project 2a”

author: “Sufian”

date: “9/30/2019”

output: html_document

Rpub link:

Source: United Nations Population Division, Department of Economic and Social Affairs: UN International Migrant Stock

Jai’s list of Questions from his Discussion 5 post:

NOTE: Migrant data refers to immigration into the area of destination (area, region, or country).

Jai posed these questions (summaries could be done by gender at any of the aggregation levels below):

• Gender ratios of migrant stock for each region of the world, for each income group, etc.

• Average gender ratios of world migrant stock.

• What is the variance across countries?

• Is there a trend across years for any of these sequences?
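
The "gender ratio" in these questions can be expressed in more than one way. A minimal sketch, assuming it is taken as males per 100 females, with the female share of the total shown alongside. The helper names `gender_ratio` and `female_share` are illustrative and not part of the original analysis; the inputs are the 1990 WORLD figures from the UN table used in this project.

```r
# Hypothetical helpers for illustration; not part of the original analysis
gender_ratio <- function(male, female) 100 * male / female   # males per 100 females
female_share <- function(female, total) female / total       # women as a share of total stock

# WORLD, 1990 (values from the UN table used in this project)
male_1990   <- 77661689
female_1990 <- 75349784
total_1990  <- 153011473

ratio_1990 <- gender_ratio(male_1990, female_1990)
share_1990 <- female_share(female_1990, total_1990)

round(ratio_1990, 1)
round(share_1990, 3)
```

Either measure can be computed at any aggregation level (world, region, income group, country), since each row of the table carries its own Male, Female and Total counts.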

This project will focus only on the following two comparisons, each contrasting high vs. low income or development:

• Developed vs. Less Developed Regions

• High vs. Low Income Countries

Loading libraries
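
The library chunk itself does not appear in the rendered output. Judging from the functions called in the rest of this report, the packages below are the likely set; this is an assumption, not the author's actual chunk. The sketch reports any package that is not installed instead of stopping with an error.

```r
# Packages inferred from the functions used later in this report (an assumption)
pkgs <- c("dplyr",     # %>%, filter(), mutate(), group_by(), summarise()
          "tidyr",     # gather(), spread(), separate()
          "stringr",   # str_replace_all()
          "magrittr",  # additional pipe operators such as %<>%
          "ggplot2")   # ggplot(), geom_line(), geom_bar(), geom_boxplot()

# Which of them are installed on this machine?
installed   <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkg <- pkgs[!installed]
if (length(missing_pkg) > 0) {
  message("Not installed: ", paste(missing_pkg, collapse = ", "))
}

# Attach the ones that are available
for (pkg in pkgs[installed]) {
  library(pkg, character.only = TRUE)
}
```

`ggthemes` and `formattable` are loaded explicitly further down, so they are left out of this list.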

Loading Data from United Nations Reports

url <- 'https://raw.githubusercontent.com/ssufian/Data_607/master/UN_MigrantStockTotal_2019%20(1).csv'

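# Read the CSV, skipping the 13 lines above the first data row (WORLD).
# header = FALSE, so columns arrive as V1..V26 and are relabeled below.
df <- read.csv(file = url, sep = ",", na.strings = c("NA", " ", ""),
               strip.white = TRUE, stringsAsFactors = FALSE,
               skip = 13, header = FALSE)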

head(df)

##   V1                                                          V2   V3  V4
## 1  1                                                       WORLD <NA> 900
## 2  2                                       UN development groups <NA>  NA
## 3  3                                      More developed regions    b 901
## 4  4                                      Less developed regions    c 902
## 5  5                                   Least developed countries    d 941
## 6  6 Less developed regions, excluding least developed countries <NA> 934
##     V5          V6          V7          V8          V9         V10
## 1 <NA> 153,011,473 161,316,895 173,588,441 191,615,574 220,781,909
## 2 <NA>          ..          ..          ..          ..          ..
## 3 <NA>  82,767,216  92,935,095 103,961,989 116,687,616 130,613,460
## 4 <NA>  70,244,257  68,381,800  69,626,452  74,927,958  90,168,449
## 5 <NA>  11,060,221  11,681,777  10,063,948   9,833,150  10,432,671
## 6 <NA>  59,184,036  56,700,023  59,562,504  65,094,808  79,735,778
##           V11         V12        V13        V14        V15        V16
## 1 248,861,296 271,642,105 77,661,689 81,686,116 88,029,221 97,860,838
## 2          ..          ..         ..         ..         ..         ..
## 3 140,643,317 152,069,261 40,426,798 45,377,588 50,801,898 57,078,401
## 4 108,217,979 119,572,844 37,234,891 36,308,528 37,227,323 40,782,437
## 5  13,631,349  16,289,023  5,550,233  5,824,077  5,033,932  4,987,537
## 6  94,586,630 103,283,821 31,684,658 30,484,451 32,193,391 35,794,900
##           V17         V18         V19        V20        V21        V22
## 1 114,061,680 128,863,389 141,488,004 75,349,784 79,630,779 85,559,220
## 2          ..          ..          ..         ..         ..         ..
## 3  63,408,858  67,824,389  73,765,353 42,340,418 47,557,507 53,160,091
## 4  50,652,822  61,039,000  67,722,651 33,009,366 32,073,272 32,399,129
## 5   5,185,496   6,784,461   8,086,158  5,509,988  5,857,700  5,030,016
## 6  45,467,326  54,254,539  59,636,493 27,499,378 26,215,572 27,369,113
##          V23         V24         V25         V26
## 1 93,754,736 106,720,229 119,997,907 130,154,101
## 2         ..          ..          ..          ..
## 3 59,609,215  67,204,602  72,818,928  78,303,908
## 4 34,145,521  39,515,627  47,178,979  51,850,193
## 5  4,845,613   5,247,175   6,846,888   8,202,865
## 6 29,299,908  34,268,452  40,332,091  43,647,328

# Column labels: five identifier columns, then Total/Male/Female counts per survey year
new_name <- c("Sort", "Region", "Notes", "Code", "Data_type",
              "1990.Total", "1995.Total", "2000.Total", "2005.Total", "2010.Total", "2015.Total", "2019.Total",
              "1990.Male", "1995.Male", "2000.Male", "2005.Male", "2010.Male", "2015.Male", "2019.Male",
              "1990.Female", "1995.Female", "2000.Female", "2005.Female", "2010.Female", "2015.Female", "2019.Female")

# Rename columns by position (V1..V26 -> new_name)
df <- setNames(df, new_name)

head(df)

##   Sort                                                      Region Notes
## 1    1                                                       WORLD  <NA>
## 2    2                                       UN development groups  <NA>
## 3    3                                      More developed regions     b
## 4    4                                      Less developed regions     c
## 5    5                                   Least developed countries     d
## 6    6 Less developed regions, excluding least developed countries  <NA>
##   Code Data_type  1990.Total  1995.Total  2000.Total  2005.Total
## 1  900      <NA> 153,011,473 161,316,895 173,588,441 191,615,574
## 2   NA      <NA>          ..          ..          ..          ..
## 3  901      <NA>  82,767,216  92,935,095 103,961,989 116,687,616
## 4  902      <NA>  70,244,257  68,381,800  69,626,452  74,927,958
## 5  941      <NA>  11,060,221  11,681,777  10,063,948   9,833,150
## 6  934      <NA>  59,184,036  56,700,023  59,562,504  65,094,808
##    2010.Total  2015.Total  2019.Total  1990.Male  1995.Male  2000.Male
## 1 220,781,909 248,861,296 271,642,105 77,661,689 81,686,116 88,029,221
## 2          ..          ..          ..         ..         ..         ..
## 3 130,613,460 140,643,317 152,069,261 40,426,798 45,377,588 50,801,898
## 4  90,168,449 108,217,979 119,572,844 37,234,891 36,308,528 37,227,323
## 5  10,432,671  13,631,349  16,289,023  5,550,233  5,824,077  5,033,932
## 6  79,735,778  94,586,630 103,283,821 31,684,658 30,484,451 32,193,391
##    2005.Male   2010.Male   2015.Male   2019.Male 1990.Female 1995.Female
## 1 97,860,838 114,061,680 128,863,389 141,488,004  75,349,784  79,630,779
## 2         ..          ..          ..          ..          ..          ..
## 3 57,078,401  63,408,858  67,824,389  73,765,353  42,340,418  47,557,507
## 4 40,782,437  50,652,822  61,039,000  67,722,651  33,009,366  32,073,272
## 5  4,987,537   5,185,496   6,784,461   8,086,158   5,509,988   5,857,700
## 6 35,794,900  45,467,326  54,254,539  59,636,493  27,499,378  26,215,572
##   2000.Female 2005.Female 2010.Female 2015.Female 2019.Female
## 1  85,559,220  93,754,736 106,720,229 119,997,907 130,154,101
## 2          ..          ..          ..          ..          ..
## 3  53,160,091  59,609,215  67,204,602  72,818,928  78,303,908
## 4  32,399,129  34,145,521  39,515,627  47,178,979  51,850,193
## 5   5,030,016   4,845,613   5,247,175   6,846,888   8,202,865
## 6  27,369,113  29,299,908  34,268,452  40,332,091  43,647,328

Data Munging using dplyr & tidyr

df1 <- df

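# ".." is the placeholder in rows without figures (e.g. the group header row
# "UN development groups"); recode it to "0" so those rows can be filtered out later
df1[df1 == ".."] <- "0"

# Reshape to long format: one row per region and year/gender column
df1 <- gather(df1, "year_types", "n_years", 6:26)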
head(df1)

##   Sort                                                      Region Notes
## 1    1                                                       WORLD  <NA>
## 2    2                                       UN development groups  <NA>
## 3    3                                      More developed regions     b
## 4    4                                      Less developed regions     c
## 5    5                                   Least developed countries     d
## 6    6 Less developed regions, excluding least developed countries  <NA>
##   Code Data_type year_types     n_years
## 1  900      <NA> 1990.Total 153,011,473
## 2   NA      <NA> 1990.Total           0
## 3  901      <NA> 1990.Total  82,767,216
## 4  902      <NA> 1990.Total  70,244,257
## 5  941      <NA> 1990.Total  11,060,221
## 6  934      <NA> 1990.Total  59,184,036

# Strip the thousands separators and convert the counts to numeric
df2 <- df1 %>%
  mutate(n_years = as.numeric(str_replace_all(n_years, ",", "")))

head(df2)

##   Sort                                                      Region Notes
## 1    1                                                       WORLD  <NA>
## 2    2                                       UN development groups  <NA>
## 3    3                                      More developed regions     b
## 4    4                                      Less developed regions     c
## 5    5                                   Least developed countries     d
## 6    6 Less developed regions, excluding least developed countries  <NA>
##   Code Data_type year_types   n_years
## 1  900      <NA> 1990.Total 153011473
## 2   NA      <NA> 1990.Total         0
## 3  901      <NA> 1990.Total  82767216
## 4  902      <NA> 1990.Total  70244257
## 5  941      <NA> 1990.Total  11060221
## 6  934      <NA> 1990.Total  59184036
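
Recoding `..` as `"0"` works here because rows with a zero count are filtered out later, but it also makes a missing figure indistinguishable from a true zero. An alternative sketch, assuming one would rather keep missing values as `NA`. The helper name `parse_count` is illustrative and the inputs mimic the raw strings shown above.

```r
# Hypothetical helper: turn "153,011,473"-style strings into numbers,
# treating the ".." placeholder as missing rather than zero
parse_count <- function(x) {
  x[x %in% ".."] <- NA_character_
  as.numeric(gsub(",", "", x, fixed = TRUE))
}

raw    <- c("153,011,473", "..", "82,767,216")
counts <- parse_count(raw)
counts

# Rows without figures can then be dropped explicitly
counts[!is.na(counts)]
```

With this approach the later filter would test `!is.na(Female)` rather than `Female != 0`.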

# Split year_types (e.g. "1990.Total") into separate Year and gender columns

separate_DF <- df2 %>% separate(year_types, c("Year", "gender"))
head(separate_DF)

##   Sort                                                      Region Notes
## 1    1                                                       WORLD  <NA>
## 2    2                                       UN development groups  <NA>
## 3    3                                      More developed regions     b
## 4    4                                      Less developed regions     c
## 5    5                                   Least developed countries     d
## 6    6 Less developed regions, excluding least developed countries  <NA>
##   Code Data_type Year gender   n_years
## 1  900      <NA> 1990  Total 153011473
## 2   NA      <NA> 1990  Total         0
## 3  901      <NA> 1990  Total  82767216
## 4  902      <NA> 1990  Total  70244257
## 5  941      <NA> 1990  Total  11060221
## 6  934      <NA> 1990  Total  59184036
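
`separate()` is called above without a `sep` argument, so it relies on tidyr's default of splitting at runs of non-alphanumeric characters; here that is the dot in keys such as `1990.Total`. The same split, made explicit in base R as a sketch (the `keys` vector is illustrative):

```r
# Illustrative keys in the same "<year>.<gender>" form as the year_types column
keys <- c("1990.Total", "1995.Male", "2019.Female")

# Split each key at the literal dot and bind the pieces into two columns
parts <- do.call(rbind, strsplit(keys, ".", fixed = TRUE))
split_keys <- data.frame(Year   = as.integer(parts[, 1]),
                         gender = parts[, 2],
                         stringsAsFactors = FALSE)
split_keys
```

Passing `sep = "\\."` to `separate()` would state the same intent inside the tidyr call.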

# One column per gender (Female, Male, Total) for each Region/Year
wide_DF <- separate_DF %>% spread(gender, n_years)
head(wide_DF)

##   Sort                                                      Region Notes
## 1    1                                                       WORLD  <NA>
## 2    2                                       UN development groups  <NA>
## 3    3                                      More developed regions     b
## 4    4                                      Less developed regions     c
## 5    5                                   Least developed countries     d
## 6    6 Less developed regions, excluding least developed countries  <NA>
##   Code Data_type Year   Female     Male     Total
## 1  900      <NA> 1990 75349784 77661689 153011473
## 2   NA      <NA> 1990        0        0         0
## 3  901      <NA> 1990 42340418 40426798  82767216
## 4  902      <NA> 1990 33009366 37234891  70244257
## 5  941      <NA> 1990  5509988  5550233  11060221
## 6  934      <NA> 1990 27499378 31684658  59184036
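
In tidyr 1.0.0 and later, `gather()` and `spread()` are superseded by `pivot_longer()` and `pivot_wider()`. A sketch of how the gather/separate/spread sequence above could be written with the newer functions, assuming tidyr >= 1.0.0 is installed. The toy table `toy` is illustrative and uses WORLD figures from the output above.

```r
library(tidyr)

# Toy input with the same "<year>.<gender>" column naming as df
toy <- data.frame(Region        = "WORLD",
                  `1990.Male`   = 77661689,
                  `1990.Female` = 75349784,
                  `1995.Male`   = 81686116,
                  `1995.Female` = 79630779,
                  check.names = FALSE)

# Long format, splitting the column names into Year and gender in one step
toy_long <- pivot_longer(toy, cols = -Region,
                         names_to = c("Year", "gender"),
                         names_sep = "\\.",
                         values_to = "n_years")

# Back to one row per Region/Year with one column per gender
toy_wide <- pivot_wider(toy_long, names_from = gender, values_from = n_years)
toy_wide
```

The design difference is that `names_sep` folds the separate step into the reshape, so no intermediate `year_types` column is needed.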

# Drop the placeholder rows (their counts were recoded to 0 above)
no_zero_DF_wide <- wide_DF %>% filter(Female != 0)

# Drop the columns that are not needed and store Year as an integer
no_zero_DF_wide <- no_zero_DF_wide %>%
  select(-c(Notes, Code, Data_type)) %>%
  mutate(Year = as.integer(Year))
              
head(no_zero_DF_wide)

##   Sort                                                      Region Year
## 1    1                                                       WORLD 1990
## 2    3                                      More developed regions 1990
## 3    4                                      Less developed regions 1990
## 4    5                                   Least developed countries 1990
## 5    6 Less developed regions, excluding least developed countries 1990
## 6    8                                       High-income countries 1990
##     Female     Male     Total
## 1 75349784 77661689 153011473
## 2 42340418 40426798  82767216
## 3 33009366 37234891  70244257
## 4  5509988  5550233  11060221
## 5 27499378 31684658  59184036
## 6 37812794 39990074  77802868

# Back to long format: one row per Region, Year and gender
no_zero_DF1 <- gather(no_zero_DF_wide, "gender", "N_years", 4:6)
                     
head(no_zero_DF1)

##   Sort                                                      Region Year
## 1    1                                                       WORLD 1990
## 2    3                                      More developed regions 1990
## 3    4                                      Less developed regions 1990
## 4    5                                   Least developed countries 1990
## 5    6 Less developed regions, excluding least developed countries 1990
## 6    8                                       High-income countries 1990
##   gender  N_years
## 1 Female 75349784
## 2 Female 42340418
## 3 Female 33009366
## 4 Female  5509988
## 5 Female 27499378
## 6 Female 37812794

Migration Trend Studies (by income level and development region)

require(ggthemes)

## Loading required package: ggthemes

# Male and female WORLD rows; each share is relative to the sum over all years and both genders
world_trend <- no_zero_DF1 %>%
  filter(Region == "WORLD", gender %in% c("Male", "Female")) %>%
  mutate(percent_migration_trends = N_years / sum(N_years))
               
head(world_trend)

##   Sort Region Year gender   N_years percent_migration_trends
## 1    1  WORLD 1990 Female  75349784               0.05303269
## 2    1  WORLD 1995 Female  79630779               0.05604574
## 3    1  WORLD 2000 Female  85559220               0.06021830
## 4    1  WORLD 2005 Female  93754736               0.06598646
## 5    1  WORLD 2010 Female 106720229               0.07511184
## 6    1  WORLD 2015 Female 119997907               0.08445693
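
The `percent_migration_trends` column above divides each count by the sum over all years and both genders, which is why the values sit between roughly 0.05 and 0.08 rather than near 0.5. If a within-year gender share is wanted instead, a sketch in base R using `ave()` follows. The small `w` data frame is illustrative, filled with WORLD figures from the tables above.

```r
# WORLD male/female stock for two years (values from the tables above)
w <- data.frame(
  Year    = c(1990, 1990, 1995, 1995),
  gender  = c("Female", "Male", "Female", "Male"),
  N_years = c(75349784, 77661689, 79630779, 81686116)
)

# Share of each gender within its own year: the denominator is the per-year sum
w$share_within_year <- w$N_years / ave(w$N_years, w$Year, FUN = sum)
w
```

The income and development groupings later in this report get the same effect in dplyr by calling `group_by(Year)` before `mutate()`.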

#trendline plots of World Migrants
# Multiple line plot
ggplot(world_trend , aes(x = Year, y = percent_migration_trends)) + 
  geom_line(aes(color = gender), size = 1) +
  scale_color_manual(values = c("#00AFBB", "#E7B800")) +
  ggtitle("World - Migration Trendlines % terms: Men vs Women")+
  theme_excel()

Observation 1a:

World migrant stock for both men and women rose from 1990 through 2019, with men consistently higher than women.

Men seem to be more mobile than women across time.

Drivers of World Migration Trends by gender (abs numbers)

require(ggthemes)
library(ggplot2)
library(formattable)
#world migration over the years male vs. female
# Basic barplot - migration patterns over the years; men vs women

world <- no_zero_DF1 %>% filter(Region == "WORLD")
head(world)

##   Sort Region Year gender   N_years
## 1    1  WORLD 1990 Female  75349784
## 2    1  WORLD 1995 Female  79630779
## 3    1  WORLD 2000 Female  85559220
## 4    1  WORLD 2005 Female  93754736
## 5    1  WORLD 2010 Female 106720229
## 6    1  WORLD 2015 Female 119997907

# Dodged bars by year; the y-axis limit is the 2019 WORLD total (271,642,105)
world %>%
  ggplot(aes(x = Year, y = N_years, fill = gender)) +
  geom_bar(stat = "identity", position = "dodge") +
  xlab("Year") + ylab("Int'l Migrant Stock") +
  ggtitle("Drivers of World Migration Trends by Gender") +
  ylim(0, 271642105) +
  theme_excel()

Observation 1b:

Growth in total migrant stock was driven mainly by men in absolute terms across time, confirming the first chart.

Variances of World Migration Trends by gender

# Investigating variances of world migration trends by income & development regions

# Standard deviation of N_years by gender, restricted to the four groupings of interest
world_trend_variance <- no_zero_DF1 %>%
  group_by(gender) %>%
  filter(gender %in% c("Male", "Female")) %>%
  filter(Region %in% c("High-income countries", "Low-income countries",
                       "More developed regions", "Less developed regions")) %>%
  mutate(std_dev = sd(N_years))

# Boxplots to show the spread across regions, faceted by gender
p <- ggplot(world_trend_variance, aes(y = N_years, x = Region, fill = Region)) +
  geom_boxplot() +
  ggtitle("Variances of World Migration by Gender, 1990 - 2019") +
  facet_grid(~ gender) +
  theme_excel()
p <- p + theme(axis.text = element_text(size = 10, angle = 45, hjust = 1))

p
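
The `std_dev` column above is computed per gender only, so it pools all four groupings. To put numbers next to the boxplots, the spread can also be summarised per grouping. A sketch in base R, using the 1990-2015 female counts printed further down in this report for two of the groupings (the `fem` data frame and the result name are illustrative):

```r
# Female migrant stock, 1990-2015 (values from the head() outputs in this report)
fem <- data.frame(
  Region  = rep(c("High-income countries", "Low-income countries"), each = 6),
  Year    = rep(c(1990, 1995, 2000, 2005, 2010, 2015), times = 2),
  N_years = c(37812794, 43679245, 50568214, 58753262, 69216394, 76927104,
              4909022, 5347591, 4526567, 4463965, 5094169, 6027533)
)

# Median, standard deviation and coefficient of variation per grouping
med <- tapply(fem$N_years, fem$Region, median)
sdv <- tapply(fem$N_years, fem$Region, sd)
avg <- tapply(fem$N_years, fem$Region, mean)

spread_by_region <- data.frame(Region = names(med),
                               median = as.vector(med),
                               sd     = as.vector(sdv),
                               cv     = as.vector(sdv / avg))
spread_by_region
```

The coefficient of variation is included because the groupings differ by an order of magnitude in size, which makes raw standard deviations hard to compare.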

Observation 2a:

Both high-income countries and more developed regions experienced greater immigration influxes, based on their medians, with men showing greater variance. Not surprisingly, less developed regions and low-income countries saw lower immigration, based on their much lower medians, and also had a narrower spread in their distributions.

Let's go underneath the regions and the income groupings to better understand the underlying migration patterns of the two genders.

# Trendlines by Hi Income Group
Hi_Income_trend <- no_zero_DF1 %>% filter( Region == 'High-income countries') %>% 
              group_by(Year) %>% 
               filter(gender == "Male" | gender == "Female") %>% 
               mutate(percent_migration_trends = N_years/sum(N_years))
            
head(Hi_Income_trend)

## # A tibble: 6 x 6
## # Groups:   Year [6]
##    Sort Region                 Year gender  N_years percent_migration_tren~
##   <int> <chr>                 <int> <chr>     <dbl>                   <dbl>
## 1     8 High-income countries  1990 Female 37812794                   0.486
## 2     8 High-income countries  1995 Female 43679245                   0.489
## 3     8 High-income countries  2000 Female 50568214                   0.491
## 4     8 High-income countries  2005 Female 58753262                   0.487
## 5     8 High-income countries  2010 Female 69216394                   0.480
## 6     8 High-income countries  2015 Female 76927104                   0.480

#trendline plots of Hi Income Migrants

ggplot(Hi_Income_trend , aes(x = Year, y = percent_migration_trends)) + 
  geom_line(aes(color = gender), size = 1) +
  scale_color_manual(values = c("#00AFBB", "#E7B800")) +
  ggtitle("Hi Income Group - Migration Trendlines % terms: Men vs Women")+
  theme_excel()

# trendline plots of More Developed Regions Migrants
More_developed_trend  <- no_zero_DF1 %>% filter( Region == 'More developed regions') %>% 
               group_by(Year) %>%
               filter(gender == "Male" | gender == "Female") %>% 
               mutate(percent_migration_trends = N_years/sum(N_years))
               
head(More_developed_trend)

## # A tibble: 6 x 6
## # Groups:   Year [6]
##    Sort Region                 Year gender  N_years percent_migration_tren~
##   <int> <chr>                 <int> <chr>     <dbl>                   <dbl>
## 1     3 More developed regio~  1990 Female 42340418                   0.512
## 2     3 More developed regio~  1995 Female 47557507                   0.512
## 3     3 More developed regio~  2000 Female 53160091                   0.511
## 4     3 More developed regio~  2005 Female 59609215                   0.511
## 5     3 More developed regio~  2010 Female 67204602                   0.515
## 6     3 More developed regio~  2015 Female 72818928                   0.518

#trendline plots of More Developed Region Migrants

ggplot(More_developed_trend , aes(x = Year, y = percent_migration_trends)) + 
  geom_line(aes(color = gender), size = 1) +
  scale_color_manual(values = c("#00AFBB", "#E7B800")) +
  ggtitle("More Developed Region - Migration Trendlines % terms: Men vs Women")+
  theme_excel()

#--------------------------------------------------------------------------------------------
# trendline plots of Low Income Migrants
Lo_Income_trend  <- no_zero_DF1%>% filter(Region == 'Low-income countries') %>% 
               group_by(Year) %>%
               filter(gender == "Male" | gender == "Female") %>% 
               mutate(percent_migration_trends = N_years/sum(N_years))
               
head(Lo_Income_trend)

## # A tibble: 6 x 6
## # Groups:   Year [6]
##    Sort Region                Year gender N_years percent_migration_trends
##   <int> <chr>                <int> <chr>    <dbl>                    <dbl>
## 1    12 Low-income countries  1990 Female 4909022                    0.501
## 2    12 Low-income countries  1995 Female 5347591                    0.506
## 3    12 Low-income countries  2000 Female 4526567                    0.504
## 4    12 Low-income countries  2005 Female 4463965                    0.498
## 5    12 Low-income countries  2010 Female 5094169                    0.507
## 6    12 Low-income countries  2015 Female 6027533                    0.508

#trendline plots of low Income Migrants

ggplot(Lo_Income_trend , aes(x = Year, y = percent_migration_trends)) + 
  geom_line(aes(color = gender), size = 1) +
  scale_color_manual(values = c("#00AFBB", "#E7B800")) +
  ggtitle("Low Income Group - Migration Trendlines % terms: Men vs Women")+
  theme_excel()

# trendline plots of Less Developed Regions Migrants
Less_developed_trend  <- no_zero_DF1 %>%  filter(Region == 'Less developed regions') %>% 
               group_by(Year) %>%
               filter(gender == "Male" | gender == "Female") %>% 
               mutate(percent_migration_trends = N_years/sum(N_years))
               
head(Less_developed_trend)

## # A tibble: 6 x 6
## # Groups:   Year [6]
##    Sort Region                 Year gender  N_years percent_migration_tren~
##   <int> <chr>                 <int> <chr>     <dbl>                   <dbl>
## 1     4 Less developed regio~  1990 Female 33009366                   0.470
## 2     4 Less developed regio~  1995 Female 32073272                   0.469
## 3     4 Less developed regio~  2000 Female 32399129                   0.465
## 4     4 Less developed regio~  2005 Female 34145521                   0.456
## 5     4 Less developed regio~  2010 Female 39515627                   0.438
## 6     4 Less developed regio~  2015 Female 47178979                   0.436

#trendline plots of Less Developed Migrants

ggplot(Less_developed_trend  , aes(x = Year, y = percent_migration_trends)) + 
  geom_line(aes(color = gender), size = 1) +
  scale_color_manual(values = c("#00AFBB", "#E7B800")) +
  ggtitle("Less Developed Region - Migration Trendlines % terms: Men vs Women")+
  theme_excel()

Observation 2b:

The charts above were produced within each income group and development region to see how each gender's (men vs. women) migratory patterns changed over time.

High-income countries: Men are the dominant immigrant group relative to women, and the two diverged after 2000, with men's share growing while women's decreased.

Less developed regions: Similarly, men are again the dominant immigrant group, with men's share increasing after 2000 while women's decreased; a divergent trendline similar to that of the high-income countries.

More developed regions: Surprisingly, women are the dominant group here, trending at an elevated level vs. men. Interestingly, neither gender's share changed much (a flat line) for almost two decades, before women's share started to grow and men's, surprisingly, decreased around 2005. It was not until the latest data point (2019) that men rebounded slightly while women decreased correspondingly.

Low-income countries: Perhaps the most interesting trends of all are in this sub-group. Women were trending higher through 2000 and then criss-crossed with men around 2005. After 2005, men's share decreased while women's rose.
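
The crossover in the low-income countries can be read straight off the female shares printed above: a year in which the share drops below 0.5 is a year in which men outnumber women. A sketch, using the rounded 1990-2015 shares from the `head(Lo_Income_trend)` output (the vector names are illustrative):

```r
years           <- c(1990, 1995, 2000, 2005, 2010, 2015)
lo_female_share <- c(0.501, 0.506, 0.504, 0.498, 0.507, 0.508)

# Years in which men are the larger group
men_larger <- years[lo_female_share < 0.5]
men_larger

# Years where the majority flips relative to the previous data point
majority_female <- as.integer(lo_female_share > 0.5)
flips <- years[-1][diff(majority_female) != 0]
flips
```

Because the shares are rounded to three decimals and stay within about one percentage point of 0.5, the crossings are small movements and should be read with that in mind.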

How can the contradictory migration patterns between more developed regions vs. high-income countries, and less developed regions vs. low-income countries, be understood?

Let's group known developed, high-income nations together and set them next to known less developed nations that are also low income: for example, the G7 countries vs. 7 less developed, low-income countries randomly selected from each hemisphere.

Then make a migratory trend comparison along gender lines to see if there is any discernible (clear) difference in migration behavior between genders.

Note: My hunch is that the way the United Nations grouped the countries into various categories may have cross-listed countries, leading to this contradiction.

# Segregating countries into poor (randomly picked from each hemisphere) vs. rich (G7 nations)

# G7 countries; shares are computed within each year across the seven countries
rich_countires <- no_zero_DF1 %>%
  filter(Region %in% c("United States of America", "Canada", "France", "Germany",
                       "Italy", "United Kingdom", "Japan")) %>%
  group_by(Year) %>%
  filter(gender %in% c("Male", "Female")) %>%
  mutate(percent_migration_trends = N_years / sum(N_years))

# Seven poorer countries, same treatment
poor_countries <- no_zero_DF1 %>%
  filter(Region %in% c("Albania", "Venezuela (Bolivarian Republic of)", "Mexico",
                       "Honduras", "Syrian Arab Republic", "Egypt", "Senegal")) %>%
  group_by(Year) %>%
  filter(gender %in% c("Male", "Female")) %>%
  mutate(percent_migration_trends = N_years / sum(N_years))

# Totals by gender over all years (both genders are kept, despite the "_men" suffix)
poor_countries_men <- poor_countries %>%
  group_by(gender) %>%
  summarise(sum(N_years))

rich_countries_men <- rich_countires %>%
  group_by(gender) %>%
  summarise(sum(N_years))

head(poor_countries)

## # A tibble: 6 x 6
## # Groups:   Year [1]
##    Sort Region                    Year gender N_years percent_migration_tr~
##   <int> <chr>                    <int> <chr>    <dbl>                 <dbl>
## 1    75 Senegal                   1990 Female  131570                0.0409
## 2    81 Egypt                     1990 Female   81838                0.0255
## 3   102 Syrian Arab Republic      1990 Female  350063                0.109 
## 4   177 Honduras                  1990 Female  132850                0.0413
## 5   178 Mexico                    1990 Female  347321                0.108 
## 6   195 Venezuela (Bolivarian R~  1990 Female  507430                0.158

head(rich_countires)

## # A tibble: 6 x 6
## # Groups:   Year [1]
##    Sort Region          Year gender N_years percent_migration_trends
##   <int> <chr>          <int> <chr>    <dbl>                    <dbl>
## 1   129 Japan           1990 Female  536552                   0.0118
## 2   250 United Kingdom  1990 Female 1893838                   0.0416
## 3   259 Italy           1990 Female  785805                   0.0172
## 4   271 France          1990 Female 2897891                   0.0636
## 5   272 Germany         1990 Female 2643053                   0.0580
## 6   280 Canada          1990 Female 2223666                   0.0488

# Bar plots of poor countries (y is each row's share of the year's seven-country total)
ggplot(poor_countries, aes(x = Year, y = percent_migration_trends, fill = gender)) +
  geom_bar(stat = "identity", position = "dodge") +
  xlab("Year") + ylab("Share of Int'l Migrant Stock") +
  ggtitle("Drivers of World Migration in poor countries by Gender") +
  theme_excel()

# Bar plots of G7 "rich" countries (same share measure)
ggplot(rich_countires, aes(x = Year, y = percent_migration_trends, fill = gender)) +
  geom_bar(stat = "identity", position = "dodge") +
  xlab("Year") + ylab("Share of Int'l Migrant Stock") +
  ggtitle("Drivers of World Migration in G7 countries by Gender") +
  theme_excel()

Observation 3a:

By randomly picking known poor and less developed countries vs. known rich and developed countries such as the G7 nations, it became more visible which gender was more mobile (dominant). It should also be noted that a more rigorous and robust way to select poor vs. rich countries is via each country's GDP or other socio-economic metrics. This extra step was taken to quickly address the very confusing and contradictory patterns that emerged from the deep dive into the constituents of the different groupings in Observation 2b.

Visually, from the bar charts, males were more mobile over the last three decades in poorer nations, while the opposite is true for the richer nations that make up the G7. In fact, now that rich and poor countries were segregated distinctly to avoid cross-over categorization, women were actually slightly more mobile, 51% to 49% relative to men, in the rich nations, while the opposite is true in the poorer countries.

One last question: among the poorer nations, which country experienced the most migration?

The top 3 countries that experienced the largest migration flows over the last three decades, 1990 to 2019, were Venezuela, Syria and Mexico. This was not surprising as these 3 were conflict nations: Venezuela had the most punishing economic collapse, the Syrians had a massive civil war and, sadly enough, Mexico had the worst civilian violence stemming from the narco trade.

# Total migrant stock for the seven poorer countries, largest first
most_migrant_countries <- no_zero_DF1 %>%
  filter(Region %in% c("Albania", "Venezuela (Bolivarian Republic of)", "Mexico",
                       "Honduras", "Syrian Arab Republic", "Egypt", "Senegal")) %>%
  group_by(Year) %>%
  filter(gender == "Total") %>%
  arrange(desc(N_years))

most_migrant_countries

## # A tibble: 49 x 5
## # Groups:   Year [7]
##     Sort Region                              Year gender N_years
##    <int> <chr>                              <int> <chr>    <dbl>
##  1   102 Syrian Arab Republic                2010 Total  1787561
##  2   195 Venezuela (Bolivarian Republic of)  2015 Total  1404448
##  3   195 Venezuela (Bolivarian Republic of)  2019 Total  1375690
##  4   195 Venezuela (Bolivarian Republic of)  2010 Total  1347347
##  5   195 Venezuela (Bolivarian Republic of)  2005 Total  1076474
##  6   178 Mexico                              2019 Total  1060707
##  7   178 Mexico                              2015 Total  1028803
##  8   195 Venezuela (Bolivarian Republic of)  1990 Total  1025009
##  9   195 Venezuela (Bolivarian Republic of)  1995 Total  1019996
## 10   195 Venezuela (Bolivarian Republic of)  2000 Total  1013738
## # ... with 39 more rows
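
To turn the sorted table above into a per-country ranking, each country can be reduced to a single figure. A sketch that ranks countries by their peak migrant stock, using only the ten rows printed above, so countries that do not appear in those rows are not represented (`top_rows` and `peak` are illustrative names):

```r
# The ten rows shown in the output above
top_rows <- data.frame(
  Region  = c("Syrian Arab Republic",
              rep("Venezuela (Bolivarian Republic of)", 4),
              "Mexico", "Mexico",
              rep("Venezuela (Bolivarian Republic of)", 3)),
  Year    = c(2010, 2015, 2019, 2010, 2005, 2019, 2015, 1990, 1995, 2000),
  N_years = c(1787561, 1404448, 1375690, 1347347, 1076474,
              1060707, 1028803, 1025009, 1019996, 1013738)
)

# Peak stock per country, largest first
peak <- tapply(top_rows$N_years, top_rows$Region, max)
peak <- peak[order(peak, decreasing = TRUE)]
peak
```

Ranking by the peak year is one choice among several; an average over all seven survey years would weight sustained levels more than a single spike.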

ggplot(most_migrant_countries, aes(x=Year, y=N_years/sum(N_years),fill = Region)) + 
    geom_bar(stat = "identity", position = "dodge") +
  xlab("Year") + ylab(" Migrant Stock") +
  ggtitle("Driver countries of Migration in poorer nations") +
   theme_excel()

World migratory Variations over time

# comparing world migration variances over ALL Regions over time

world_migration_wide <- no_zero_DF_wide %>% 
                   group_by(Region, Year) %>% 
                   mutate(femalepct = Female/Total) %>% 
                   mutate(malepct = Male/Total) 
                  

head(world_migration_wide)

## # A tibble: 6 x 8
## # Groups:   Region, Year [6]
##    Sort Region                 Year  Female   Male  Total femalepct malepct
##   <int> <chr>                 <int>   <dbl>  <dbl>  <dbl>     <dbl>   <dbl>
## 1     1 WORLD                  1990  7.53e7 7.77e7 1.53e8     0.492   0.508
## 2     3 More developed regio~  1990  4.23e7 4.04e7 8.28e7     0.512   0.488
## 3     4 Less developed regio~  1990  3.30e7 3.72e7 7.02e7     0.470   0.530
## 4     5 Least developed coun~  1990  5.51e6 5.55e6 1.11e7     0.498   0.502
## 5     6 Less developed regio~  1990  2.75e7 3.17e7 5.92e7     0.465   0.535
## 6     8 High-income countries  1990  3.78e7 4.00e7 7.78e7     0.486   0.514

# Regroup by Year for the per-year histograms
dfhist <- world_migration_wide %>%
  group_by(Year)

# Overlaid histograms
pf <- ggplot(dfhist, aes(x=femalepct, color=Year)) +
  geom_histogram(fill="red", alpha=0.5, position="identity")+facet_grid(Year ~ .)
pf

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

pm <- ggplot(dfhist, aes(x=malepct, color=Year)) +
  geom_histogram(fill="yellow", alpha=0.5, position="identity")+facet_grid(Year ~ .)
pm

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Observation 3b:

Variance analysis showed that the distributions of both men's and women's shares of total migrant stock appeared approximately normal in the histograms for every year.

The only difference was that women had a left skew while men had a right skew. Their central tendencies were very similar as well. The right skew in men also agreed with the earlier box plots of the high-income and more developed region groupings, which showed that men, relative to women, had a higher spread due to longer positive tails.
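
The normality and skew statements rest on visual inspection of the histograms. One way to put numbers on them is a moment-based sample skewness together with a Shapiro-Wilk test (`shapiro.test()` from base R's stats package). The sketch below uses the six 1990 `femalepct` values printed above purely as an illustration; the `skewness` helper is written here and is not from the original analysis. Because `malepct` equals 1 minus `femalepct` whenever Male and Female add up to Total, the male distribution is the mirror image of the female one, so opposite skews are expected by construction.

```r
# Illustrative helper: moment-based sample skewness
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / (mean((x - m)^2))^1.5
}

# femalepct for the six 1990 rows shown above
femalepct <- c(0.492, 0.512, 0.470, 0.498, 0.465, 0.486)
malepct   <- 1 - femalepct

skew_f <- skewness(femalepct)
skew_m <- skewness(malepct)
c(female = skew_f, male = skew_m)

# Shapiro-Wilk test of normality (small p-values argue against normality)
sw <- shapiro.test(femalepct)
sw$p.value
```

Applied to the full `dfhist` data, the same two calls could be run once per year to check each facet of the histograms.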

Summary Page

This short study showed very interesting migratory behaviors between men and women over the period 1990 to 2019. In general, men seem to be more mobile and were able to move into higher income countries. It was also shown that men moved more into the “less developed regions” as well. What is paradoxical, however, were the trends showing that women outpaced men in the “developed regions” and also in the low-income countries. This was an ironic “finding” that deserved further investigation and analysis, to say the least, because these two sets of findings seem to be in contradiction. The next step of this study was to truly separate out the traditionally known rich nations from the poorer nations. I randomly picked 7 poor nations from each hemisphere and compared them to the G7 countries.

The extra step showed the following observations:

Once countries were unpacked, we clearly saw that in the rich nations women tend to be more mobile relative to men, at 51% vs. 49%. Curiously, this statistic was exactly the opposite in poorer countries, with men having the slight advantage.

It was also shown that, within the poor countries, Venezuela, Syria and Mexico had the most migration flow activity, most probably stemming from their internal socio-economic issues.