DATA 607 Project 2.

The goal of this assignment is to give you practice in preparing different datasets for downstream analysis work. Your task is to:

(1) Choose any three of the “wide” datasets identified in the Week 5 Discussion items. (You may use your own dataset; please don’t use my Sample Post dataset, since that was used in your Week 6 assignment!) For each of the three chosen datasets:
- Create a .CSV file (or optionally, a MySQL database!) that includes all of the information included in the dataset. You’re encouraged to use a “wide” structure similar to how the information appears in the discussion item, so that you can practice tidying and transformations as described below.
- Read the information from your .CSV file into R, and use tidyr and dplyr as needed to tidy and transform your data. [Most of your grade will be based on this step!]
- Perform the analysis requested in the discussion item.
- Your code should be in an R Markdown file, posted to rpubs.com, and should include narrative descriptions of your data cleanup work, analysis, and conclusions.

(2) Please include in your homework submission, for each of the three chosen datasets:
- The URL to the .Rmd file in your GitHub repository, and
- The URL for your rpubs.com web page.

Set the working directory, install all the relevant packages, and load their libraries into R.

Male migrants

Load the following libraries

library(stringr)

library(tidyr)

library(dplyr)

library(tidyverse)

library(tibble)

library(caret)

library(readr)

Upload the data to GitHub

Hosting the data on GitHub lets anyone with access to the repository audit or retest the analysis. The uploaded male migrants .csv file is available at this link (https://raw.githubusercontent.com/igukusamuel/DATA-607-Project-2/master/UN_MigrantStockMale_2019.csv)

male_migrants <- read_csv("https://raw.githubusercontent.com/igukusamuel/DATA-607-Project-2/master/UN_MigrantStockMale_2019.csv")
head(male_migrants)
#view(head(male_migrants, 20)) # uncomment to view the data frame structure and see how many rows to skip.

Skip first 15 rows

As part of data cleanup, skip the first 15 rows, which contain source information not relevant to our analysis.

male_migrants <- read_csv("https://raw.githubusercontent.com/igukusamuel/DATA-607-Project-2/master/UN_MigrantStockMale_2019.csv", skip = 15)

head(male_migrants) #Print out first few rows to confirm that the data have been loaded correctly.
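To illustrate what the skip argument does, here is a minimal sketch on an inline string rather than the UN file. The toy text, its two preamble lines, and the column names are made up for illustration, and the use of I() and show_col_types assumes readr 2.0 or later.

```r
library(readr)

# Hypothetical miniature of a file with two preamble lines above the header row.
raw_text <- "Source: example agency\nTable 1\nyear,country,value\n1990,A,10\n1995,B,20\n"

# I() marks the string as literal data rather than a file path.
# skip = 2 discards the two preamble lines, so the third line becomes the header.
toy <- read_csv(I(raw_text), skip = 2, show_col_types = FALSE)

toy
```

Note that column auto-naming depends on the readr version: releases from 2.0 onward typically name blank headers ...1, ...2 rather than X1, X2, so the X-prefixed names used in this report may differ on a newer installation.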

Filter for N/As in column X6

Careful review of the data shows that the column named X6 contains values only for rows that relate to countries and is NA for rows that relate to regions and regional totals. Filtering out all NAs in column X6 therefore leaves country data only, which is the basis of our analysis. We first view the rows where X6 is NA to confirm that none of them relate to country information.

colX6 <- filter(male_migrants, is.na(X6))

x <- length(colX6) # length() of a data frame counts columns; use nrow(colX6) for the number of rows where X6 is NA
x
## [1] 530
head(colX6)
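Because length() applied to a data frame returns the number of columns, the figure printed above describes the width of the filtered table rather than how many rows have NA in X6. A small sketch of the difference, using a made-up three-column table (all names and values here are hypothetical):

```r
library(dplyr)

# Hypothetical table: aggregate rows have NA in type_of_data, country rows do not.
toy <- tibble(
  year = c(1990, 1990, 1990, 1990),
  area = c("WORLD", "Kenya", "Africa", "France"),
  type_of_data = c(NA, "B R", NA, "B")
)

aggregates <- filter(toy, is.na(type_of_data))

n_cols <- length(aggregates)  # number of columns in the filtered table
n_rows <- nrow(aggregates)    # number of rows where type_of_data is NA

n_cols
n_rows
```

Here n_cols is 3 and n_rows is 2, so nrow() is the call to use when the goal is to count the regional rows being removed.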

Exclude N/As in column X6

We then exclude all N/A’s in column X6 and print out the first 6 rows using the head() function.

male_migrants_by_country <- filter(male_migrants, !is.na(X6))

head(male_migrants_by_country)

Rename column X1 and X3

From the above printout, columns X1 and X3 need to be renamed to year and country_to respectively.

male_migrants_by_country <- male_migrants_by_country %>% 
        rename(
                year = X1,
                country_to = X3
        )
head(male_migrants_by_country)

View all columns

The above printout shows a number of columns that are not necessary for our analysis. Let's print the column names and drop the unnecessary ones to get a cleaner data set.

column_names <- colnames(male_migrants_by_country)
#column_names # uncomment to view entire list of column names
head(column_names)
## [1] "year"       "X2"         "country_to" "X4"         "X5"        
## [6] "X6"

Exclude irrelevant columns

The above printout reveals that we do not need the columns whose names start with “X”, “Total” or “Other”. We drop these columns using the starts_with() helper inside select().

male_migrants_by_country <- male_migrants_by_country %>% 
        select(-starts_with("X"), -starts_with("Other"), -starts_with("Total"))

head(male_migrants_by_country)
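One thing to keep in mind with starts_with("X") is that it drops every column whose name begins with X, not only the auto-generated X1, X2, and so on. No country column in this data set appears to start with X, so the result is the same here, but a stricter pattern is a cheap safeguard. The sketch below uses a made-up table in which "Xanadu" stands in for a genuine column that happens to start with X.

```r
library(dplyr)

# Hypothetical columns; "Xanadu" is an invented name used only to show the difference.
toy <- tibble(X1 = 1, Total = 2, `Other South` = 3, Xanadu = 4, Kenya = 5)

# Prefix matching, as in the report: every name beginning with X is dropped.
loose <- select(toy, -starts_with("X"), -starts_with("Other"), -starts_with("Total"))

# Regex matching: only names of the form X followed by digits are dropped.
strict <- select(toy, -matches("^X[0-9]+$"), -starts_with("Other"), -starts_with("Total"))

names(loose)   # "Kenya"
names(strict)  # "Xanadu" "Kenya"
```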

View dimensions of resulting data frame

We use the dim() function to see how many rows and columns remain for our analysis.

dim(male_migrants_by_country)
## [1] 1624  234

Confirm column names

These are the columns we need for our analysis.

column_names_clean <- colnames(male_migrants_by_country)
#column_names_clean # uncomment to view entire list of cleaned up column names
head(column_names_clean)
## [1] "year"           "country_to"     "Afghanistan"    "Albania"       
## [5] "Algeria"        "American Samoa"

View number of columns

Get the length of the column names to be used in the next line of code.

y <- length(colnames(male_migrants_by_country))

y 
## [1] 234

Gather relevant columns

Let us use the gather() function to collect all country-name columns, from the 3rd column through the last, into a single key column (country_from) with a matching value column (no_of_migrants), dropping all NAs to obtain clean data.

no_of_migrants_per_country <- gather(male_migrants_by_country, "country_from", "no_of_migrants", 3:y, na.rm = TRUE)

head(no_of_migrants_per_country)
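gather() still works, but tidyr 1.0.0 and later recommend pivot_longer() for the same reshaping. A minimal sketch of the equivalent call on a made-up wide table follows; the two origin columns and all values are hypothetical, and the column names simply mirror the ones used in this report.

```r
library(tidyr)

# Hypothetical wide table: one row per year and destination, one column per origin country.
wide <- tribble(
  ~year, ~country_to, ~Afghanistan, ~Albania,
  1990,  "Kenya",     "1,200",      NA,
  1990,  "France",    "3,400",      "560"
)

# Every column except year and country_to is stacked into country_from / no_of_migrants.
# values_drop_na = TRUE plays the role of na.rm = TRUE in gather().
long <- pivot_longer(
  wide,
  cols = -c(year, country_to),
  names_to = "country_from",
  values_to = "no_of_migrants",
  values_drop_na = TRUE
)

long
```

Selecting the columns by exclusion, as above, avoids having to compute the column count first, which is what the 3:y range in the gather() call relies on.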

Conversion of chr to dbl

Convert the no_of_migrants column from character to double for statistical analysis using the parse_number() function, then print the first 6 rows with head() to confirm the conversion.

no_of_migrants_per_country$no_of_migrants <- parse_number(no_of_migrants_per_country$no_of_migrants)

clean_male_data <- no_of_migrants_per_country

head(clean_male_data)
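As a quick illustration of what parse_number() does to formatted counts, the sketch below runs it on a few made-up strings. The assumption that missing values are written as ".." is mine, for illustration only; check the raw file for the placeholder it actually uses.

```r
library(readr)

# Hypothetical raw values: comma-grouped counts plus a ".." placeholder for missing data.
raw_counts <- c("1,234", "56", "..", "7,890,123")

# Commas are treated as grouping marks by the default locale.
# Listing ".." in na turns it into NA instead of a parsing failure.
parsed <- parse_number(raw_counts, na = c("", "NA", ".."))

parsed
```

The result is a double vector with NA in the third position, which sum() will propagate unless na.rm = TRUE is set or the NA rows are dropped beforehand.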

Downstream analysis

Ordering of data

Ordering the data by the largest number of male migrants for each year and origin-destination pair

by_country_to <- clean_male_data %>% 
        group_by(year, country_from, country_to) %>% 
        summarise(total_male_migrants = sum(no_of_migrants)) %>% 
        arrange(desc(total_male_migrants))
head(by_country_to)

Ordering the data by the total number of male migrants for each origin-destination pair from 1995 to 2019.

total_migrants_since_1995 <- clean_male_data %>% 
        group_by(country_from, country_to) %>% 
        summarise(total_male_migrants = sum(no_of_migrants)) %>% 
        arrange(desc(total_male_migrants))
head(total_migrants_since_1995)

Ordering the data by the countries sending out the least number of migrants

least_no_migrants_from <- clean_male_data %>% 
        group_by(country_from) %>% 
        summarise(total_migrants_since_1995 = sum(no_of_migrants)) %>% 
        arrange(total_migrants_since_1995)
head(least_no_migrants_from)

Ordering the data by the countries receiving the largest number of immigrants since 1995.

largest_no_migrants_to <- clean_male_data %>% 
        group_by(country_to) %>% 
        summarise(total_migrants_since_1995 = sum(no_of_migrants)) %>% 
        arrange(desc(total_migrants_since_1995))
head(largest_no_migrants_to)

Ordering the data by the countries receiving the least number of immigrants since 1995.

least_no_migrants_to <- clean_male_data %>% 
        group_by(country_to) %>% 
        summarise(total_migrants_since_1995 = sum(no_of_migrants)) %>% 
        arrange(total_migrants_since_1995)

head(least_no_migrants_to)
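The four orderings above share one pattern: group, sum, then sort. A compact sketch of that pattern on a made-up table is shown below. The data values are hypothetical, and .groups and slice_max() assume dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical tidy data in the same shape as clean_male_data.
toy <- tibble(
  year = c(1990, 1990, 1995, 1995),
  country_to = c("Kenya", "France", "Kenya", "France"),
  country_from = c("Uganda", "Algeria", "Uganda", "Algeria"),
  no_of_migrants = c(100, 500, 150, 700)
)

# Total per destination, sorted from largest to smallest.
# .groups = "drop" returns an ungrouped tibble and silences the regrouping message.
totals_to <- toy %>%
  group_by(country_to) %>%
  summarise(total = sum(no_of_migrants), .groups = "drop") %>%
  arrange(desc(total))

# slice_max() picks the top n rows directly, here the single largest destination.
top_destination <- slice_max(totals_to, total, n = 1)

totals_to
top_destination
```

Swapping slice_max() for slice_min() gives the smallest totals, which covers the "least number" tables without a separate arrange() call.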

Conclusion:

The top 5 countries receiving the largest number of male migrants are the USA, the Russian Federation, Saudi Arabia, Germany and France. The 5 countries receiving the fewest male migrants are Tuvalu, Saint Helena, Tokelau, Niue and Saint Pierre and Miquelon.

Female migrants

The second section will involve replicating the code above to analyse the immigration data on women migrants. This will serve as a confirmation of the replicability of the code to similar data.

Follow this link to see uploaded female migrants .csv file (https://raw.githubusercontent.com/igukusamuel/DATA-607-Project-2/master/UN_MigrantStockFemale_2019.csv)

female_migrants <- read_csv("https://raw.githubusercontent.com/igukusamuel/DATA-607-Project-2/master/UN_MigrantStockFemale_2019.csv")

#view(head(female_migrants, 20)) # uncomment to view data frame structure and see how many rows to skip.

Skip first 15 rows

As part of data cleanup, skip the first 15 rows, which contain source information not relevant to our analysis.

female_migrants <- read_csv("https://raw.githubusercontent.com/igukusamuel/DATA-607-Project-2/master/UN_MigrantStockFemale_2019.csv", skip = 15)

head(female_migrants) #Print out first few rows to confirm that the data have been loaded correctly.

Filter for N/As in column X6

Careful review of the data shows that the column named X6 contains values only for rows that relate to countries and is NA for rows that relate to regions and regional totals. Filtering out all NAs in column X6 therefore leaves country data only, which is the basis of our analysis. We first view the rows where X6 is NA to confirm that none of them relate to country information.

colX6 <- filter(female_migrants, is.na(X6))

a <- length(colX6) # length() of a data frame counts columns; use nrow(colX6) for the number of rows where X6 is NA
a
## [1] 530
head(colX6)

Exclude N/As in column X6

We then exclude all N/A’s in column X6 and print out the first 6 rows using the head() function.

female_migrants_by_country <- filter(female_migrants, !is.na(X6))

head(female_migrants_by_country)

Rename column X1 and X3

From the above printout, columns X1 and X3 need to be renamed to year and country_to respectively.

female_migrants_by_country <- female_migrants_by_country %>% 
        rename(
                year = X1,
                country_to = X3
        )
head(female_migrants_by_country)

View all columns

The above printout shows a number of columns that are not necessary for our analysis. Let's print the column names and drop the unnecessary ones to get a cleaner data set.

female_col_names <- colnames(female_migrants_by_country)
#female_col_names # uncomment to view entire list of column names
head(female_col_names)
## [1] "year"       "X2"         "country_to" "X4"         "X5"        
## [6] "X6"

Exclude irrelevant columns

The above printout reveals that we do not need the columns whose names start with “X”, “Total” or “Other”. We drop these columns using the starts_with() helper inside select().

female_migrants_by_country <- female_migrants_by_country %>% 
        select(-starts_with("X"), -starts_with("Other"), -starts_with("Total"))

head(female_migrants_by_country)

View dimensions of resulting data frame

We use the dim() function to see how many rows and columns remain for our analysis.

dim(female_migrants_by_country)
## [1] 1624  234

Confirm column names

The printout below confirms the column names. These are the columns we need for our analysis.

clean_female_col_name <- colnames(female_migrants_by_country)
#clean_female_col_name # uncomment to view entire list of clean column names
clean_female_col_name
##   [1] "year"                              
##   [2] "country_to"                        
##   [3] "Afghanistan"                       
##   [4] "Albania"                           
##   [5] "Algeria"                           
##   [6] "American Samoa"                    
##   [7] "Andorra"                           
##   [8] "Angola"                            
##   [9] "Anguilla"                          
##  [10] "Antigua and Barbuda"               
##  [11] "Argentina"                         
##  [12] "Armenia"                           
##  [13] "Aruba"                             
##  [14] "Australia"                         
##  [15] "Austria"                           
##  [16] "Azerbaijan"                        
##  [17] "Bahamas"                           
##  [18] "Bahrain"                           
##  [19] "Bangladesh"                        
##  [20] "Barbados"                          
##  [21] "Belarus"                           
##  [22] "Belgium"                           
##  [23] "Belize"                            
##  [24] "Benin"                             
##  [25] "Bermuda"                           
##  [26] "Bhutan"                            
##  [27] "Bolivia (Plurinational State of)"  
##  [28] "Bonaire, Sint Eustatius and Saba"  
##  [29] "Bosnia and Herzegovina"            
##  [30] "Botswana"                          
##  [31] "Brazil"                            
##  [32] "British Virgin Islands"            
##  [33] "Brunei Darussalam"                 
##  [34] "Bulgaria"                          
##  [35] "Burkina Faso"                      
##  [36] "Burundi"                           
##  [37] "Cabo Verde"                        
##  [38] "Cambodia"                          
##  [39] "Cameroon"                          
##  [40] "Canada"                            
##  [41] "Cayman Islands"                    
##  [42] "Central African Republic"          
##  [43] "Chad"                              
##  [44] "Channel Islands"                   
##  [45] "Chile"                             
##  [46] "China"                             
##  [47] "China, Hong Kong SAR"              
##  [48] "China, Macao SAR"                  
##  [49] "Colombia"                          
##  [50] "Comoros"                           
##  [51] "Congo"                             
##  [52] "Cook Islands"                      
##  [53] "Costa Rica"                        
##  [54] "Côte d'Ivoire"                     
##  [55] "Croatia"                           
##  [56] "Cuba"                              
##  [57] "Curaçao"                           
##  [58] "Cyprus"                            
##  [59] "Czechia"                           
##  [60] "Dem. People's Republic of Korea"   
##  [61] "Democratic Republic of the Congo"  
##  [62] "Denmark"                           
##  [63] "Djibouti"                          
##  [64] "Dominica"                          
##  [65] "Dominican Republic"                
##  [66] "Ecuador"                           
##  [67] "Egypt"                             
##  [68] "El Salvador"                       
##  [69] "Equatorial Guinea"                 
##  [70] "Eritrea"                           
##  [71] "Estonia"                           
##  [72] "Eswatini"                          
##  [73] "Ethiopia"                          
##  [74] "Falkland Islands (Malvinas)"       
##  [75] "Faroe Islands"                     
##  [76] "Fiji"                              
##  [77] "Finland"                           
##  [78] "France"                            
##  [79] "French Guiana"                     
##  [80] "French Polynesia"                  
##  [81] "Gabon"                             
##  [82] "Gambia"                            
##  [83] "Georgia"                           
##  [84] "Germany"                           
##  [85] "Ghana"                             
##  [86] "Gibraltar"                         
##  [87] "Greece"                            
##  [88] "Greenland"                         
##  [89] "Grenada"                           
##  [90] "Guadeloupe"                        
##  [91] "Guam"                              
##  [92] "Guatemala"                         
##  [93] "Guinea"                            
##  [94] "Guinea-Bissau"                     
##  [95] "Guyana"                            
##  [96] "Haiti"                             
##  [97] "Holy See"                          
##  [98] "Honduras"                          
##  [99] "Hungary"                           
## [100] "Iceland"                           
## [101] "India"                             
## [102] "Indonesia"                         
## [103] "Iran (Islamic Republic of)"        
## [104] "Iraq"                              
## [105] "Ireland"                           
## [106] "Isle of Man"                       
## [107] "Israel"                            
## [108] "Italy"                             
## [109] "Jamaica"                           
## [110] "Japan"                             
## [111] "Jordan"                            
## [112] "Kazakhstan"                        
## [113] "Kenya"                             
## [114] "Kiribati"                          
## [115] "Kuwait"                            
## [116] "Kyrgyzstan"                        
## [117] "Lao People's Democratic Republic"  
## [118] "Latvia"                            
## [119] "Lebanon"                           
## [120] "Lesotho"                           
## [121] "Liberia"                           
## [122] "Libya"                             
## [123] "Liechtenstein"                     
## [124] "Lithuania"                         
## [125] "Luxembourg"                        
## [126] "Madagascar"                        
## [127] "Malawi"                            
## [128] "Malaysia"                          
## [129] "Maldives"                          
## [130] "Mali"                              
## [131] "Malta"                             
## [132] "Marshall Islands"                  
## [133] "Martinique"                        
## [134] "Mauritania"                        
## [135] "Mauritius"                         
## [136] "Mayotte"                           
## [137] "Mexico"                            
## [138] "Micronesia (Fed. States of)"       
## [139] "Monaco"                            
## [140] "Mongolia"                          
## [141] "Montenegro"                        
## [142] "Montserrat"                        
## [143] "Morocco"                           
## [144] "Mozambique"                        
## [145] "Myanmar"                           
## [146] "Namibia"                           
## [147] "Nauru"                             
## [148] "Nepal"                             
## [149] "Netherlands"                       
## [150] "New Caledonia"                     
## [151] "New Zealand"                       
## [152] "Nicaragua"                         
## [153] "Niger"                             
## [154] "Nigeria"                           
## [155] "Niue"                              
## [156] "North Macedonia"                   
## [157] "Northern Mariana Islands"          
## [158] "Norway"                            
## [159] "Oman"                              
## [160] "Pakistan"                          
## [161] "Palau"                             
## [162] "Panama"                            
## [163] "Papua New Guinea"                  
## [164] "Paraguay"                          
## [165] "Peru"                              
## [166] "Philippines"                       
## [167] "Poland"                            
## [168] "Portugal"                          
## [169] "Puerto Rico"                       
## [170] "Qatar"                             
## [171] "Republic of Korea"                 
## [172] "Republic of Moldova"               
## [173] "Réunion"                           
## [174] "Romania"                           
## [175] "Russian Federation"                
## [176] "Rwanda"                            
## [177] "Saint Helena"                      
## [178] "Saint Kitts and Nevis"             
## [179] "Saint Lucia"                       
## [180] "Saint Pierre and Miquelon"         
## [181] "Saint Vincent and the Grenadines"  
## [182] "Samoa"                             
## [183] "San Marino"                        
## [184] "Sao Tome and Principe"             
## [185] "Saudi Arabia"                      
## [186] "Senegal"                           
## [187] "Serbia"                            
## [188] "Seychelles"                        
## [189] "Sierra Leone"                      
## [190] "Singapore"                         
## [191] "Sint Maarten (Dutch part)"         
## [192] "Slovakia"                          
## [193] "Slovenia"                          
## [194] "Solomon Islands"                   
## [195] "Somalia"                           
## [196] "South Africa"                      
## [197] "South Sudan"                       
## [198] "Spain"                             
## [199] "Sri Lanka"                         
## [200] "State of Palestine"                
## [201] "Sudan"                             
## [202] "Suriname"                          
## [203] "Sweden"                            
## [204] "Switzerland"                       
## [205] "Syrian Arab Republic"              
## [206] "Tajikistan"                        
## [207] "Thailand"                          
## [208] "Timor-Leste"                       
## [209] "Togo"                              
## [210] "Tokelau"                           
## [211] "Tonga"                             
## [212] "Trinidad and Tobago"               
## [213] "Tunisia"                           
## [214] "Turkey"                            
## [215] "Turkmenistan"                      
## [216] "Turks and Caicos Islands"          
## [217] "Tuvalu"                            
## [218] "Uganda"                            
## [219] "Ukraine"                           
## [220] "United Arab Emirates"              
## [221] "United Kingdom"                    
## [222] "United Republic of Tanzania"       
## [223] "United States of America"          
## [224] "United States Virgin Islands"      
## [225] "Uruguay"                           
## [226] "Uzbekistan"                        
## [227] "Vanuatu"                           
## [228] "Venezuela (Bolivarian Republic of)"
## [229] "Viet Nam"                          
## [230] "Wallis and Futuna Islands"         
## [231] "Western Sahara"                    
## [232] "Yemen"                             
## [233] "Zambia"                            
## [234] "Zimbabwe"

View number of columns

Get the length of the column names to be used in the next line of code.

y <- length(colnames(female_migrants_by_country))

y 
## [1] 234

Gather relevant columns

Let us use the gather() function to collect all country-name columns, from the 3rd column through the last, into a single key column (country_from) with a matching value column (no_of_female_migrants), dropping all NAs to obtain clean data.

no_of_female_migrants_per_country <- gather(female_migrants_by_country, "country_from", "no_of_female_migrants", 3:y, na.rm = TRUE)

head(no_of_female_migrants_per_country)

Conversion of chr to dbl

Convert the no_of_female_migrants column from character to double for statistical analysis using the parse_number() function, then print the first 6 rows with head() to confirm the conversion.

no_of_female_migrants_per_country$no_of_female_migrants <- parse_number(no_of_female_migrants_per_country$no_of_female_migrants)

clean_female_data <- no_of_female_migrants_per_country

head(clean_female_data)

Downstream analysis

Ordering of data

Ordering the data by the largest number of female migrants for each year and origin-destination pair

female_by_country_to <- clean_female_data %>% 
        group_by(year, country_from, country_to) %>% 
        summarise(total_female_migrants = sum(no_of_female_migrants)) %>% 
        arrange(desc(total_female_migrants))
head(female_by_country_to)

Ordering the data by the total number of female migrants for each origin-destination pair from 1995 to 2019.

total_female_migrants_since_1995 <- clean_female_data %>% 
        group_by(country_from, country_to) %>% 
        summarise(total_female_migrants = sum(no_of_female_migrants)) %>% 
        arrange(desc(total_female_migrants))
head(total_female_migrants_since_1995)

Ordering the data by the countries sending out the least number of migrants

least_no_female_migrants_from <- clean_female_data %>% 
        group_by(country_from) %>% 
        summarise(total_female_migrants_since_1995 = sum(no_of_female_migrants)) %>% 
        arrange(total_female_migrants_since_1995)
head(least_no_female_migrants_from)

Ordering the data by the countries receiving the largest number of immigrants since 1995.

largest_no_female_migrants_to <- clean_female_data %>% 
        group_by(country_to) %>% 
        summarise(total_female_migrants_since_1995 = sum(no_of_female_migrants)) %>% 
        arrange(desc(total_female_migrants_since_1995))
head(largest_no_female_migrants_to)

Ordering the data by the countries receiving the least number of immigrants since 1995.

least_no_female_migrants_to <- clean_female_data %>% 
        group_by(country_to) %>% 
        summarise(total_female_migrants_since_1995 = sum(no_of_female_migrants)) %>% 
        arrange(total_female_migrants_since_1995)

head(least_no_female_migrants_to)

Conclusion:

The top 5 countries receiving the largest number of female migrants are the USA, the Russian Federation, Germany, France and the United Kingdom. The 5 countries receiving the fewest female migrants are Saint Helena, Tuvalu, Tokelau, Niue and Micronesia (Fed. States of).


Migrants by destination country

Upload the data to GitHub

Hosting the data on GitHub lets anyone with access to the repository audit or retest the analysis. The uploaded migrants by destination .csv file is available at this link (https://raw.githubusercontent.com/igukusamuel/DATA-607-Project-2/master/UN_MigrantStockBySexByDestination_2019.csv)

migrants <- read_csv("https://raw.githubusercontent.com/igukusamuel/DATA-607-Project-2/master/UN_MigrantStockBySexByDestination_2019.csv")
head(migrants)
#view(head(migrants, 20)) # uncomment to view the data frame structure and see how many rows to skip.

Skip first 15 rows

As part of data cleanup, skip the first 15 rows, which contain source information not relevant to our analysis.

migrants <- read_csv("https://raw.githubusercontent.com/igukusamuel/DATA-607-Project-2/master/UN_MigrantStockBySexByDestination_2019.csv", skip = 15)

head(migrants) #Print out first few rows to confirm that the data have been loaded correctly.

Filter for N/As in column X5

Careful review of the data shows that the column named X5 contains values only for rows that relate to countries and is NA for rows that relate to regions and regional totals. Filtering out all NAs in column X5 therefore leaves country data only, which is the basis of our analysis. We first view the rows where X5 is NA to confirm that none of them relate to country information.

colX5 <- filter(migrants, is.na(X5))

x <- length(colX5) # length() of a data frame counts columns; use nrow(colX5) for the number of rows where X5 is NA
x
## [1] 26
head(colX5)

Exclude N/As in column X5

We then exclude all NAs in column X5 and print out the first 6 rows using the head() function.

migrants_by_country <- filter(migrants, !is.na(X5))

head(migrants_by_country)

Rename column X2

From the above printout, column X2 needs to be renamed to dest_country.

migrants_by_country <- migrants_by_country %>% 
        rename(
                dest_country = X2
        )
head(migrants_by_country)

View all columns

The above printout shows a number of columns that are not necessary for our analysis. Let's print the column names and drop the unnecessary ones to get a cleaner data set.

column_names <- colnames(migrants_by_country)
#column_names # uncomment to view entire list of column names
head(column_names)
## [1] "X1"           "dest_country" "X3"           "X4"          
## [5] "X5"           "1990"

Exclude irrelevant columns

The above printout reveals that we do not need the columns whose names start with “X”. We drop these columns using the starts_with() helper inside select().

migrants_by_country <- migrants_by_country %>% 
        select(-starts_with("X"))


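# Drop columns 2 to 8, the unsuffixed year columns, so that only the "_1" and "_2" year columns remain.
migrants_by_country <- migrants_by_country %>% 
        select(-c(2:8))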

migrants_by_country

View dimensions of resulting data frame

We use the dim() function to see how many rows and columns remain for our analysis.

dim(migrants_by_country)
## [1] 232  15

Confirm column names

These are the columns we need for our analysis.

column_names_clean <- colnames(migrants_by_country)
#column_names_clean # uncomment to view entire list of cleaned up column names
head(column_names_clean)
## [1] "dest_country" "1990_1"       "1995_1"       "2000_1"      
## [5] "2005_1"       "2010_1"

View number of columns

Get the length of the column names to be used in the next line of code.

y <- length(colnames(migrants_by_country))

y 
## [1] 15

Clean up data

Let us use the gather() function to collect all year columns into a single column, dropping all NAs to obtain clean data. We then separate the suffix of each year label into a sex column, spread the data by sex, and rename “1” as male and “2” as female.

no_of_migrants_per_country <- gather(migrants_by_country, "year", "no_of_migrants", 2:y, na.rm = TRUE)

head(no_of_migrants_per_country)
no_of_migrants_per_country <- no_of_migrants_per_country %>%
        separate(year, c("year", "sex"), sep = "_")

no_of_migrants_per_country

Convert the years column to number format

no_of_migrants_per_country$year <- parse_number(no_of_migrants_per_country$year)

no_of_migrants_per_country
no_of_migrants_per_country <- no_of_migrants_per_country %>%
        spread(sex, no_of_migrants)
        

names(no_of_migrants_per_country)
## [1] "dest_country" "year"         "1"            "2"
no_of_migrants_per_country <- no_of_migrants_per_country %>% 
        rename(
                male = "1",
                female = "2"
        )
head(no_of_migrants_per_country)
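The gather(), separate(), spread() and rename() steps above can also be written as one pivot_longer() followed by one pivot_wider(). The sketch below shows that alternative on a made-up table with two years; the values are hypothetical, and the "_1" and "_2" suffixes follow the male and female coding described in this section.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide table with year_sex column names, as produced by the steps above.
wide <- tribble(
  ~dest_country, ~`1990_1`, ~`1995_1`, ~`1990_2`, ~`1995_2`,
  "Kenya",       "100",     "150",     "90",      "140",
  "France",      "500",     "700",     "520",     "710"
)

tidy <- wide %>%
  # names_sep splits each column name at "_" into a year part and a sex part.
  pivot_longer(-dest_country,
               names_to = c("year", "sex"),
               names_sep = "_",
               values_to = "no_of_migrants") %>%
  # One column per sex code, one row per destination and year.
  pivot_wider(names_from = sex, values_from = no_of_migrants) %>%
  rename(male = `1`, female = `2`)

tidy
```

The year, male and female columns are still character at this point, so the parse_number() conversions used in this report apply unchanged.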

Conversion of chr to dbl

Convert the male and female columns from character to double for statistical analysis using the parse_number() function, then print the first 6 rows with head() to confirm the conversion.

no_of_migrants_per_country$male <- parse_number(no_of_migrants_per_country$male)
no_of_migrants_per_country$female <- parse_number(no_of_migrants_per_country$female)

clean_migrants_data <- no_of_migrants_per_country

head(clean_migrants_data)

Downstream analysis

Ordering of data

Ordering the data by the total number of migrants (male plus female) for each destination country and year

by_country <- clean_migrants_data %>% 
        group_by(year, dest_country) %>% 
        summarise(total_migrants = male + female) %>% 
        arrange(desc(total_migrants))
head(by_country)

Ordering the data of male migrants by the destination countries by year

male_by_country <- clean_migrants_data %>% 
        group_by(dest_country, year) %>% 
        summarise(male = male) %>% 
        arrange(desc(male))
head(male_by_country)

Ordering the data of female migrants by the destination countries by year

female_by_country <- clean_migrants_data %>% 
        group_by(dest_country, year) %>% 
        summarise(female = female) %>% 
        arrange(desc(female))
head(female_by_country)

Ordering the data by % of male migrants by the destination countries by year

Perc_male_by_country <- clean_migrants_data %>% 
        group_by(dest_country, year) %>% 
        summarise(perc_male = male/(male + female)) %>% 
        arrange(desc(perc_male))
head(Perc_male_by_country)

Ordering the data by % female migrants by the destination countries by year

Perc_female_by_country <- clean_migrants_data %>% 
        group_by(dest_country, year) %>% 
        summarise(perc_female = female/(male + female)) %>% 
        arrange(desc(perc_female))
head(Perc_female_by_country)
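Since each destination and year already occupies exactly one row, the shares can also be computed with mutate(), which keeps the male and female counts alongside the percentages. A minimal sketch on made-up numbers (the two destinations and their counts are hypothetical):

```r
library(dplyr)

# Hypothetical data in the same shape as clean_migrants_data.
toy <- tibble(
  dest_country = c("A", "B"),
  year = c(2019, 2019),
  male = c(75, 40),
  female = c(25, 60)
)

# Row-wise arithmetic: no grouping is needed because each row is one country-year.
shares <- toy %>%
  mutate(
    total = male + female,
    perc_male = male / total,
    perc_female = female / total
  ) %>%
  arrange(desc(perc_male))

shares
```

With these numbers, destination A has a male share of 0.75 and B has 0.40, and the two shares in each row sum to 1, which is a convenient sanity check on the real data as well.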

Conclusion

Maldives received the highest % of male migrants, while Nepal received the highest % of female migrants.