Data Source: United Nations, Department of Economic and Social Affairs. Population Division (2019). International Migrant Stock 2019 (United Nations database, POP/DB/MIG/Stock/Rev.2019).
The dataset presents estimates of international migrants by age, sex and origin. Estimates are presented for 1990, 1995, 2000, 2005, 2010, 2015 and 2019 and are available for all countries and areas of the world. The estimates are based on official statistics on the foreign-born or the foreign population.
I will be using the International migrant stock by destination and origin data file for this project.
Here I import a .csv file with UN migration data for 1990 to 2019. It has three columns: year, destination region and total number of migrants. The data come from the United Nations, Department of Economic and Social Affairs, Population Division (2019) website.
UN_migration_data <- read.csv("C:/Users/Staff/Documents/Geeth/SPS-Fall 19/DATA 607/UN_migration_data.csv")
head(UN_migration_data)
## ï..Year Major.area..region..country.or.area.of.destination origin.Total
## 1 1990 Geographic regions ..
## 2 1990 Africa 15,689,666
## 3 1990 Asia 48,209,949
## 4 1990 Europe 49,608,231
## 5 1990 Latin America and the Caribbean 7,161,371
## 6 1990 Northern America 27,610,408
View(UN_migration_data)
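The `ï..` prefix on the first column name is what a byte order mark (BOM) at the start of a UTF-8 file usually looks like after import. A minimal sketch, assuming the file was saved as UTF-8 with a BOM (not confirmed for this file): passing `fileEncoding = "UTF-8-BOM"` to read.csv() lets R drop the mark. The temporary file and its contents below are made up for illustration.

```r
# Build a small, made-up CSV that starts with a UTF-8 byte order mark.
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("Year,Region,Total\n1990,Africa,\"15,689,666\"\n"), con)
close(con)

# "UTF-8-BOM" tells R to strip the mark while reading;
# stringsAsFactors = FALSE keeps text columns as character.
clean <- read.csv(tmp, fileEncoding = "UTF-8-BOM", stringsAsFactors = FALSE)
names(clean)
```

With this option the first column would arrive already named Year, so only the two long column names would still need renaming.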
In the next several sections I tidy my data using some data wrangling functions that come with dplyr and tidyr.
names(UN_migration_data)
## [1] "ï..Year"
## [2] "Major.area..region..country.or.area.of.destination"
## [3] "origin.Total"
UN_migration_data$Major.area..region..country.or.area.of.destination <- as.character(UN_migration_data$Major.area..region..country.or.area.of.destination)
First of all, I am going to rename my columns. By default my data table has column titles with symbols and some long names. In this section I replace all of them with simple titles that make sense.
library(dplyr)
# Rename all three columns in a single rename() call
UN_migration <- UN_migration_data %>%
  rename(Year = ï..Year,
         Region = Major.area..region..country.or.area.of.destination,
         Total.migrations = origin.Total)
UN_migration
## Year Region Total.migrations
## 1 1990 Geographic regions ..
## 2 1990 Africa 15,689,666
## 3 1990 Asia 48,209,949
## 4 1990 Europe 49,608,231
## 5 1990 Latin America and the Caribbean 7,161,371
## 6 1990 Northern America 27,610,408
## 7 1990 Oceania 4,731,848
## 8 1995 Geographic regions ..
## 9 1995 Africa 16,357,077
## 10 1995 Asia 46,418,044
## 11 1995 Europe 53,489,829
## 12 1995 Latin America and the Caribbean 6,688,710
## 13 1995 Northern America 33,340,948
## 14 1995 Oceania 5,022,287
## 15 2000 Geographic regions ..
## 16 2000 Africa 15,051,677
## 17 2000 Asia 49,394,322
## 18 2000 Europe 56,858,788
## 19 2000 Latin America and the Caribbean 6,570,729
## 20 2000 Northern America 40,351,694
## 21 2000 Oceania 5,361,231
## 22 2005 Geographic regions ..
## 23 2005 Africa 15,969,835
## 24 2005 Asia 53,439,306
## 25 2005 Europe 63,594,822
## 26 2005 Latin America and the Caribbean 7,224,942
## 27 2005 Northern America 45,363,257
## 28 2005 Oceania 6,023,412
## 29 2010 Geographic regions ..
## 30 2010 Africa 17,804,198
## 31 2010 Asia 65,938,712
## 32 2010 Europe 70,678,025
## 33 2010 Latin America and the Caribbean 8,262,433
## 34 2010 Northern America 50,970,861
## 35 2010 Oceania 7,127,680
## 36 2015 Geographic regions ..
## 37 2015 Africa 23,476,251
## 38 2015 Asia 77,231,760
## 39 2015 Europe 75,008,219
## 40 2015 Latin America and the Caribbean 9,441,679
## 41 2015 Northern America 55,633,443
## 42 2015 Oceania 8,069,944
## 43 2019 Geographic regions ..
## 44 2019 Africa 26,529,334
## 45 2019 Asia 83,559,197
## 46 2019 Europe 82,304,539
## 47 2019 Latin America and the Caribbean 11,673,288
## 48 2019 Northern America 58,647,822
## 49 2019 Oceania 8,927,925
While going through my data, I found a subtitle row called "Geographic regions" at the start of each year's block. I am going to delete those rows so that the Region column contains only region names.
UN_migration <- UN_migration[UN_migration$Region != "Geographic regions",]
UN_migration
## Year Region Total.migrations
## 2 1990 Africa 15,689,666
## 3 1990 Asia 48,209,949
## 4 1990 Europe 49,608,231
## 5 1990 Latin America and the Caribbean 7,161,371
## 6 1990 Northern America 27,610,408
## 7 1990 Oceania 4,731,848
## 9 1995 Africa 16,357,077
## 10 1995 Asia 46,418,044
## 11 1995 Europe 53,489,829
## 12 1995 Latin America and the Caribbean 6,688,710
## 13 1995 Northern America 33,340,948
## 14 1995 Oceania 5,022,287
## 16 2000 Africa 15,051,677
## 17 2000 Asia 49,394,322
## 18 2000 Europe 56,858,788
## 19 2000 Latin America and the Caribbean 6,570,729
## 20 2000 Northern America 40,351,694
## 21 2000 Oceania 5,361,231
## 23 2005 Africa 15,969,835
## 24 2005 Asia 53,439,306
## 25 2005 Europe 63,594,822
## 26 2005 Latin America and the Caribbean 7,224,942
## 27 2005 Northern America 45,363,257
## 28 2005 Oceania 6,023,412
## 30 2010 Africa 17,804,198
## 31 2010 Asia 65,938,712
## 32 2010 Europe 70,678,025
## 33 2010 Latin America and the Caribbean 8,262,433
## 34 2010 Northern America 50,970,861
## 35 2010 Oceania 7,127,680
## 37 2015 Africa 23,476,251
## 38 2015 Asia 77,231,760
## 39 2015 Europe 75,008,219
## 40 2015 Latin America and the Caribbean 9,441,679
## 41 2015 Northern America 55,633,443
## 42 2015 Oceania 8,069,944
## 44 2019 Africa 26,529,334
## 45 2019 Asia 83,559,197
## 46 2019 Europe 82,304,539
## 47 2019 Latin America and the Caribbean 11,673,288
## 48 2019 Northern America 58,647,822
## 49 2019 Oceania 8,927,925
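The same row filter can also be written with dplyr's filter(), which keeps the step in the pipe style used for the renaming. This is only a sketch on a made-up three-row data frame (`toy` and `toy_regions` are illustrative names, not objects from the analysis).

```r
library(dplyr)

# Made-up rows that mimic the layout above: one subtitle row per year block.
toy <- data.frame(
  Year = c(1990, 1990, 1990),
  Region = c("Geographic regions", "Africa", "Asia"),
  Total.migrations = c("..", "15,689,666", "48,209,949"),
  stringsAsFactors = FALSE
)

# Keep only the rows whose Region is an actual region name.
toy_regions <- toy %>%
  filter(Region != "Geographic regions")
toy_regions
```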
In this section I change my data from a long view to a wide view using the spread() function. The Year column in my original data set is split into 7 columns, one for each year.
library(tidyr)
# UN_migration is already a data frame, so it can be piped straight into spread()
UN_Migration_byYear <- UN_migration %>%
  spread(key = Year, value = Total.migrations)
head(UN_Migration_byYear)
## Region 1990 1995 2000
## 1 Africa 15,689,666 16,357,077 15,051,677
## 2 Asia 48,209,949 46,418,044 49,394,322
## 3 Europe 49,608,231 53,489,829 56,858,788
## 4 Latin America and the Caribbean 7,161,371 6,688,710 6,570,729
## 5 Northern America 27,610,408 33,340,948 40,351,694
## 6 Oceania 4,731,848 5,022,287 5,361,231
## 2005 2010 2015 2019
## 1 15,969,835 17,804,198 23,476,251 26,529,334
## 2 53,439,306 65,938,712 77,231,760 83,559,197
## 3 63,594,822 70,678,025 75,008,219 82,304,539
## 4 7,224,942 8,262,433 9,441,679 11,673,288
## 5 45,363,257 50,970,861 55,633,443 58,647,822
## 6 6,023,412 7,127,680 8,069,944 8,927,925
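In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). A minimal sketch of the same long-to-wide reshape, assuming a recent tidyr is installed; `toy_long` and `toy_wide` are made-up names and the four rows are copied from the table above.

```r
library(tidyr)

# A small long-format sample: one row per (Year, Region) pair.
toy_long <- data.frame(
  Year = c(1990, 1990, 1995, 1995),
  Region = c("Africa", "Asia", "Africa", "Asia"),
  Total.migrations = c("15,689,666", "48,209,949", "16,357,077", "46,418,044"),
  stringsAsFactors = FALSE
)

# names_from plays the role of spread()'s key, values_from the role of value.
toy_wide <- pivot_wider(toy_long,
                        names_from = Year,
                        values_from = Total.migrations)
toy_wide
```

The result has one row per region and one column per year, the same shape that spread() produces here.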
The totals are still stored as text with thousands separators, so I remove every comma and convert each year column to integers. str_remove() drops only the first comma in each value, which leaves strings that cannot be converted, so str_remove_all() is used here, and the converted values are assigned back to the data frame.
library(stringr)
# Every column except Region holds the totals for one year
year_cols <- setdiff(names(UN_Migration_byYear), "Region")
# Strip all commas, convert to integer, and store the result back in place
UN_Migration_byYear[year_cols] <- lapply(UN_Migration_byYear[year_cols], function(x) as.integer(str_remove_all(x, ",")))
str(UN_Migration_byYear)
## 'data.frame': 6 obs. of 8 variables:
## $ Region: chr "Africa" "Asia" "Europe" "Latin America and the Caribbean" ...
## $ 1990 : int 15689666 48209949 49608231 7161371 27610408 4731848
## $ 1995 : int 16357077 46418044 53489829 6688710 33340948 5022287
## $ 2000 : int 15051677 49394322 56858788 6570729 40351694 5361231
## $ 2005 : int 15969835 53439306 63594822 7224942 45363257 6023412
## $ 2010 : int 17804198 65938712 70678025 8262433 50970861 7127680
## $ 2015 : int 23476251 77231760 75008219 9441679 55633443 8069944
## $ 2019 : int 26529334 83559197 82304539 11673288 58647822 8927925
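A minimal sketch of the difference between str_remove() and str_remove_all() on values with two thousands separators. The vector `x` is a two-value sample taken from the 1990 column, and the variable names are made up for illustration.

```r
library(stringr)

x <- c("15,689,666", "7,161,371")

# str_remove() drops only the first match, so one comma is left behind.
first_only <- str_remove(x, ",")

# str_remove_all() drops every match.
all_removed <- str_remove_all(x, ",")

# A leftover comma makes the conversion fail with NA (and a warning).
suppressWarnings(as.numeric(first_only))

# With all commas gone the conversion succeeds.
as.numeric(all_removed)
```

Note that calling as.numeric() on its own only prints the result. To change the column, the converted values have to be assigned back to it.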