Missing Migrants Exploratory Data Analysis

In this notebook, I will explore data on migrants who went missing or died along migration routes worldwide between 2014 and 2021.

Introduction

The dataset comes from the Missing Migrants Project, which tracks deaths of migrants, including refugees and asylum-seekers. The project is a joint initiative of IOM’s Global Migration Data Analysis Centre (GMDAC) and Media and Communications Division (MCD). For more information, see https://missingmigrants.iom.int/

Data Preparation

library(flexdashboard)
library(lubridate)
library(readr) 
library(ggplot2)
library(dplyr)
library(viridis)
library(reshape2) 
migrants <- read.csv("data/missing_migrants.csv")
head(migrants)
##   X       Main.ID   Incident.ID        Region           Incident.Date Year
## 1 0 2014.MMP00001 2014.MMP00001 North America Mon, 01/06/2014 - 12:00 2014
## 2 1 2014.MMP00002 2014.MMP00002 North America Sun, 01/12/2014 - 12:00 2014
## 3 2 2014.MMP00003 2014.MMP00003 North America Tue, 01/14/2014 - 12:00 2014
## 4 3 2014.MMP00004 2014.MMP00004 North America Thu, 01/16/2014 - 12:00 2014
## 5 4 2014.MMP00005 2014.MMP00005        Europe Thu, 01/16/2014 - 12:00 2014
## 6 5 2014.MMP00006 2014.MMP00006 North America Fri, 01/17/2014 - 12:00 2014
##   Reported.Month Number.Dead Minimum.Estimated.Number.of.Missing
## 1        January           1                                  NA
## 2        January           1                                  NA
## 3        January           1                                  NA
## 4        January           1                                  NA
## 5        January           1                                   0
## 6        January           1                                  NA
##   Total.Number.of.Dead.and.Missing Number.of.Survivors Number.of.Females
## 1                                1                  NA                NA
## 2                                1                  NA                NA
## 3                                1                  NA                NA
## 4                                1                  NA                NA
## 5                                1                   2                NA
## 6                                1                  NA                NA
##   Number.of.Males Number.of.Children
## 1               1                 NA
## 2              NA                 NA
## 3              NA                 NA
## 4               1                 NA
## 5               1                 NA
## 6              NA                 NA
##                                                           Cause.of.Death
## 1                                                       Mixed or unknown
## 2                                                       Mixed or unknown
## 3                                                       Mixed or unknown
## 4                                                               Violence
## 5 Harsh environmental conditions / lack of adequate shelter, food, water
## 6                                                               Violence
##                                                                                             Location.of.death
## 1 Pima Country Office of the Medical Examiner jurisdiction, Arizona, USA (see coordinates for exact location)
## 2 Pima Country Office of the Medical Examiner jurisdiction, Arizona, USA (see coordinates for exact location)
## 3 Pima Country Office of the Medical Examiner jurisdiction, Arizona, USA (see coordinates for exact location)
## 4                                                                                  near Douglas, Arizona, USA
## 5                                                                           Border between Russia and Estonia
## 6 Pima Country Office of the Medical Examiner jurisdiction, Arizona, USA (see coordinates for exact location)
##                                                                       Information.Source
## 1                                     Pima County Office of the Medical Examiner (PCOME)
## 2                                     Pima County Office of the Medical Examiner (PCOME)
## 3                                     Pima County Office of the Medical Examiner (PCOME)
## 4 Ministry of Foreign Affairs Mexico, Pima County Office of the Medical Examiner (PCOME)
## 5                                                      EUBusiness (Agence France-Presse)
## 6                                     Pima County Office of the Medical Examiner (PCOME)
##              Coordinates          Migrantion.route                        URL
## 1 31.650259, -110.366453 US-Mexico border crossing http://humaneborders.info/
## 2   31.59713, -111.73756 US-Mexico border crossing                           
## 3   31.94026, -113.01125 US-Mexico border crossing                           
## 4 31.506777, -109.315632 US-Mexico border crossing      http://bit.ly/1qfIw00
## 5            59.1551, 28                                http://bit.ly/1rTFTjR
## 6   32.45435, -113.18402 US-Mexico border crossing                           
##   UNSD.Geographical.Grouping Source.Quality
## 1           Northern America              5
## 2           Northern America              5
## 3           Northern America              5
## 4           Northern America              5
## 5            Northern Europe              1
## 6           Northern America              5

Dataset Information

The variables in the dataset are described below:

  • Main.ID : An automatically generated number used to identify each unique entry in the dataset.
  • Incident.ID : An automatically generated number used to identify each unique entry in the dataset.
  • Region : The region in which an incident took place.
  • Incident.Date : Estimated date of death. In cases where the exact date of death is not known, this variable indicates the date on which the body or bodies were found.
  • Year : The year in which the incident occurred.
  • Reported.Month : The month in which the incident occurred.
  • Number.Dead : The total number of people confirmed dead in one incident, i.e. the number of bodies recovered.
  • Minimum.Estimated.Number.of.Missing : The total number of those who are missing and are thus assumed to be dead.
  • Total.Number.of.Dead.and.Missing : The sum of the ‘number dead’ and ‘number missing’ variables.
  • Number.of.Survivors : The number of migrants that survived the incident, if known.
  • Number.of.Females : Indicates the number of females found dead or missing. If unknown, it is left blank.
  • Number.of.Males : Indicates the number of males found dead or missing. If unknown, it is left blank.
  • Number.of.Children : Indicates the number of individuals under the age of 18 found dead or missing. If unknown, it is left blank.
  • Cause.of.Death : The determination of conditions resulting in the migrant’s death i.e. the circumstances of the event.
  • Location.of.death : Place where the death(s) occurred or where the body or bodies were found.
  • Information.Source : The source of information.
  • Coordinates : Latitude and longitude of the place where the death(s) occurred or where the body or bodies were found.
  • Migrantion.route : Name of the migrant route on which the incident occurred, if known. If unknown, it is left blank.
  • URL : Link to sources.
  • UNSD.Geographical.Grouping : Geographical region in which the incident took place, as designated by the United Nations Statistics Division (UNSD).
  • Source.Quality : Incidents are ranked on a scale from 1-5 based on the source(s) of information available.

Check Missing Values

Let’s check how many missing values each column contains.

colSums(is.na(migrants))
##                                   X                             Main.ID 
##                                   0                                   0 
##                         Incident.ID                              Region 
##                                   0                                   0 
##                       Incident.Date                                Year 
##                                   0                                   0 
##                      Reported.Month                         Number.Dead 
##                                   0                                 552 
## Minimum.Estimated.Number.of.Missing    Total.Number.of.Dead.and.Missing 
##                                8871                                   0 
##                 Number.of.Survivors                   Number.of.Females 
##                                8427                                8020 
##                     Number.of.Males                  Number.of.Children 
##                                4080                                8670 
##                      Cause.of.Death                   Location.of.death 
##                                   0                                   0 
##                  Information.Source                         Coordinates 
##                                   0                                   0 
##                    Migrantion.route                                 URL 
##                                   0                                   0 
##          UNSD.Geographical.Grouping                      Source.Quality 
##                                   0                                  42
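Note that is.na() only detects true NA values. Character columns such as URL and Migrantion.route report 0 missing values above, even though the head() output shows blank entries in both: read.csv() keeps empty character fields as "" rather than NA. The sketch below uses a small hypothetical toy data frame (not the real data) to show how to count those blanks and recode them as NA. Passing na.strings = c("NA", "") to read.csv() is an alternative that does the same at load time.

```r
# Hypothetical toy frame mimicking two character columns of `migrants`
toy <- data.frame(
  URL              = c("http://humaneborders.info/", "", "http://bit.ly/1qfIw00"),
  Migrantion.route = c("US-Mexico border crossing", "", ""),
  stringsAsFactors = FALSE
)

# is.na() ignores empty strings, so both columns look complete
na_before <- colSums(is.na(toy))

# Count the blank strings explicitly
blank_counts <- colSums(toy == "")

# Recode blanks as NA so they are counted as missing from now on
toy[toy == ""] <- NA
na_after <- colSums(is.na(toy))

print(na_before)
print(blank_counts)
print(na_after)
```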
# Build the summary from the NA counts instead of typing the numbers by hand:
# keep only columns with at least one NA, largest count first
na_counts <- colSums(is.na(migrants))
missing_value <- data.frame(Variable_names = names(na_counts),
                            Missing = unname(na_counts)) %>%
  filter(Missing > 0) %>%
  arrange(desc(Missing))

missing_value
##                        Variable_names Missing
## 1 Minimum.Estimated.Number.of.Missing    8871
## 2                  Number.of.Children    8670
## 3                 Number.of.Survivors    8427
## 4                   Number.of.Females    8020
## 5                     Number.of.Males    4080
## 6                         Number.Dead     552
## 7                      Source.Quality      42

Out of 9,906 rows, four variables are more than 80% missing: Minimum.Estimated.Number.of.Missing, Number.of.Children, Number.of.Survivors and Number.of.Females. Number.of.Males is missing in about 41% of rows.
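The 80% threshold can be checked by dividing each NA count by the number of rows; colMeans(is.na(...)) does this in one step because the mean of a logical column is the share of TRUE values. A minimal sketch on a hypothetical toy data frame (column names borrowed from the dataset, values made up):

```r
# Hypothetical toy frame standing in for `migrants` (5 rows, made-up values)
toy <- data.frame(
  Number.Dead        = c(1, NA, 3, 1, 2),
  Number.of.Children = c(NA, NA, NA, 2, 1),
  Number.of.Females  = c(NA, NA, NA, NA, NA),
  Region             = c("Europe", "Caribbean", "Europe", "Mediterranean", "Europe")
)

# Share of missing values per column
missing_share <- colMeans(is.na(toy))

# Columns where more than 80% of the values are missing
high_missing <- names(missing_share)[missing_share > 0.8]

print(round(missing_share, 2))
print(high_missing)
```

Applied to the real counts, the same division gives 8,871 / 9,906 ≈ 0.90 for Minimum.Estimated.Number.of.Missing, 0.88 for Number.of.Children, 0.85 for Number.of.Survivors and 0.81 for Number.of.Females, versus about 0.41 for Number.of.Males.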

Data Preprocessing

First, I want to check the data types in this dataset.

str(migrants)
## 'data.frame':    9906 obs. of  22 variables:
##  $ X                                  : int  0 1 2 3 4 5 6 7 8 9 ...
##  $ Main.ID                            : chr  "2014.MMP00001" "2014.MMP00002" "2014.MMP00003" "2014.MMP00004" ...
##  $ Incident.ID                        : chr  "2014.MMP00001" "2014.MMP00002" "2014.MMP00003" "2014.MMP00004" ...
##  $ Region                             : chr  "North America" "North America" "North America" "North America" ...
##  $ Incident.Date                      : chr  "Mon, 01/06/2014 - 12:00" "Sun, 01/12/2014 - 12:00" "Tue, 01/14/2014 - 12:00" "Thu, 01/16/2014 - 12:00" ...
##  $ Year                               : int  2014 2014 2014 2014 2014 2014 2014 2014 2014 2014 ...
##  $ Reported.Month                     : chr  "January" "January" "January" "January" ...
##  $ Number.Dead                        : num  1 1 1 1 1 1 12 1 1 1 ...
##  $ Minimum.Estimated.Number.of.Missing: num  NA NA NA NA 0 NA NA NA NA NA ...
##  $ Total.Number.of.Dead.and.Missing   : int  1 1 1 1 1 1 12 1 1 1 ...
##  $ Number.of.Survivors                : num  NA NA NA NA 2 NA NA NA NA NA ...
##  $ Number.of.Females                  : num  NA NA NA NA NA NA 9 NA NA NA ...
##  $ Number.of.Males                    : num  1 NA NA 1 1 NA NA NA NA NA ...
##  $ Number.of.Children                 : num  NA NA NA NA NA NA 3 NA NA NA ...
##  $ Cause.of.Death                     : chr  "Mixed or unknown" "Mixed or unknown" "Mixed or unknown" "Violence" ...
##  $ Location.of.death                  : chr  "Pima Country Office of the Medical Examiner jurisdiction, Arizona, USA (see coordinates for exact location)" "Pima Country Office of the Medical Examiner jurisdiction, Arizona, USA (see coordinates for exact location)" "Pima Country Office of the Medical Examiner jurisdiction, Arizona, USA (see coordinates for exact location)" "near Douglas, Arizona, USA" ...
##  $ Information.Source                 : chr  "Pima County Office of the Medical Examiner (PCOME)" "Pima County Office of the Medical Examiner (PCOME)" "Pima County Office of the Medical Examiner (PCOME)" "Ministry of Foreign Affairs Mexico, Pima County Office of the Medical Examiner (PCOME)" ...
##  $ Coordinates                        : chr  "31.650259, -110.366453" "31.59713, -111.73756" "31.94026, -113.01125" "31.506777, -109.315632" ...
##  $ Migrantion.route                   : chr  "US-Mexico border crossing" "US-Mexico border crossing" "US-Mexico border crossing" "US-Mexico border crossing" ...
##  $ URL                                : chr  "http://humaneborders.info/" "" "" "http://bit.ly/1qfIw00" ...
##  $ UNSD.Geographical.Grouping         : chr  "Northern America" "Northern America" "Northern America" "Northern America" ...
##  $ Source.Quality                     : num  5 5 5 5 1 5 5 5 5 5 ...

I will convert the following variables to more appropriate data types:

  • Region -> Factor
  • Incident.Date -> datetime
  • Year -> Factor
  • Reported.Month -> Factor
  • Cause.of.Death -> Factor
  • UNSD.Geographical.Grouping -> Factor
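The raw Incident.Date strings look like "Mon, 01/06/2014 - 12:00" (see the head() output above): a weekday, then month/day/year, then hour:minute. The notebook parses them with lubridate's mdy_hm(), which returns UTC date-times by default. To make the format explicit, here is a base R sketch of the same parse; it assumes every value follows exactly this pattern and is an alternative, not what the notebook runs.

```r
# Two Incident.Date strings copied from the head() output
raw <- c("Mon, 01/06/2014 - 12:00", "Fri, 01/17/2014 - 12:00")

# Drop the three-letter weekday prefix ("Mon, "); it is locale-dependent
# and redundant once the full date is known
no_weekday <- sub("^[A-Za-z]{3}, ", "", raw)

# Parse "month/day/year - hour:minute" as UTC, matching lubridate's default
parsed <- as.POSIXct(no_weekday, format = "%m/%d/%Y - %H:%M", tz = "UTC")

print(parsed)
```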
# Convert the categorical columns to factors
migrants <- migrants %>%
  mutate(Region = as.factor(Region),
         Year = as.factor(Year),
         Reported.Month = as.factor(Reported.Month),
         Cause.of.Death = as.factor(Cause.of.Death),
         UNSD.Geographical.Grouping = as.factor(UNSD.Geographical.Grouping))
# Parse strings such as "Mon, 01/06/2014 - 12:00" into POSIXct date-times (UTC by default)
migrants$Incident.Date <- mdy_hm(migrants$Incident.Date)
str(migrants)
## 'data.frame':    9906 obs. of  22 variables:
##  $ X                                  : int  0 1 2 3 4 5 6 7 8 9 ...
##  $ Main.ID                            : chr  "2014.MMP00001" "2014.MMP00002" "2014.MMP00003" "2014.MMP00004" ...
##  $ Incident.ID                        : chr  "2014.MMP00001" "2014.MMP00002" "2014.MMP00003" "2014.MMP00004" ...
##  $ Region                             : Factor w/ 16 levels "Caribbean","Central America",..: 9 9 9 9 6 9 7 9 9 9 ...
##  $ Incident.Date                      : POSIXct, format: "2014-01-06 12:00:00" "2014-01-12 12:00:00" ...
##  $ Year                               : Factor w/ 8 levels "2014","2015",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Reported.Month                     : Factor w/ 12 levels "April","August",..: 5 5 5 5 5 5 5 5 5 5 ...
##  $ Number.Dead                        : num  1 1 1 1 1 1 12 1 1 1 ...
##  $ Minimum.Estimated.Number.of.Missing: num  NA NA NA NA 0 NA NA NA NA NA ...
##  $ Total.Number.of.Dead.and.Missing   : int  1 1 1 1 1 1 12 1 1 1 ...
##  $ Number.of.Survivors                : num  NA NA NA NA 2 NA NA NA NA NA ...
##  $ Number.of.Females                  : num  NA NA NA NA NA NA 9 NA NA NA ...
##  $ Number.of.Males                    : num  1 NA NA 1 1 NA NA NA NA NA ...
##  $ Number.of.Children                 : num  NA NA NA NA NA NA 3 NA NA NA ...
##  $ Cause.of.Death                     : Factor w/ 7 levels "Accidental death",..: 4 4 4 7 3 7 2 4 4 4 ...
##  $ Location.of.death                  : chr  "Pima Country Office of the Medical Examiner jurisdiction, Arizona, USA (see coordinates for exact location)" "Pima Country Office of the Medical Examiner jurisdiction, Arizona, USA (see coordinates for exact location)" "Pima Country Office of the Medical Examiner jurisdiction, Arizona, USA (see coordinates for exact location)" "near Douglas, Arizona, USA" ...
##  $ Information.Source                 : chr  "Pima County Office of the Medical Examiner (PCOME)" "Pima County Office of the Medical Examiner (PCOME)" "Pima County Office of the Medical Examiner (PCOME)" "Ministry of Foreign Affairs Mexico, Pima County Office of the Medical Examiner (PCOME)" ...
##  $ Coordinates                        : chr  "31.650259, -110.366453" "31.59713, -111.73756" "31.94026, -113.01125" "31.506777, -109.315632" ...
##  $ Migrantion.route                   : chr  "US-Mexico border crossing" "US-Mexico border crossing" "US-Mexico border crossing" "US-Mexico border crossing" ...
##  $ URL                                : chr  "http://humaneborders.info/" "" "" "http://bit.ly/1qfIw00" ...
##  $ UNSD.Geographical.Grouping         : Factor w/ 20 levels "","Caribbean",..: 10 10 10 10 11 10 17 10 10 10 ...
##  $ Source.Quality                     : num  5 5 5 5 1 5 5 5 5 5 ...
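One caveat from the str() output: as.factor() sorts levels alphabetically, so Reported.Month starts with "April", "August", ... and any plot by month would follow that order. A minimal sketch, on a few hypothetical month values, of using base R's built-in month.name vector to get calendar order instead:

```r
# A few Reported.Month values (hypothetical sample)
months <- c("March", "January", "February", "January")

# as.factor() sorts levels alphabetically, as in the str() output
alphabetical <- as.factor(months)

# month.name is base R's built-in "January", ..., "December" vector;
# using it as the level set gives calendar order
chronological <- factor(months, levels = month.name)

print(levels(alphabetical))
print(levels(chronological)[1:3])
```

In the notebook's mutate() call this would read Reported.Month = factor(Reported.Month, levels = month.name). Any value not spelled exactly like an English month name would become NA, so checking anyNA() afterwards is worthwhile.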

From the given data, there are two questions that I want to explore:

  • Where did most migrant deaths and disappearances happen?
  • What were the causes of death of migrants in each region?

Exploratory Analysis and Plot

Where it usually happens

# Total number of dead and missing migrants per region, largest first
# (Region is already a factor after the conversion step)
location <- migrants %>%
  group_by(Region) %>%
  summarise(total = sum(Total.Number.of.Dead.and.Missing)) %>%
  ungroup() %>%
  arrange(desc(total))

location
## # A tibble: 16 x 2
##    Region             total
##    <fct>              <int>
##  1 Mediterranean      23312
##  2 Northern Africa     6460
##  3 North America       3018
##  4 South-eastern Asia  2851
##  5 Western Africa      2457
##  6 Eastern Africa      1738
##  7 Central America     1499
##  8 Western Asia        1238
##  9 Southern Asia       1124
## 10 Caribbean            957
## 11 Europe               766
## 12 South America        357
## 13 Middle Africa        181
## 14 Central Asia          52
## 15 Eastern Asia          38
## 16 Southern Africa       34
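missing_region <- location %>% 
  ggplot(aes(x = total, 
             y = reorder(Region, total),
             fill = total)) +
  geom_col() +
  # Map the label inside aes() and place it just past the end of each bar
  geom_text(aes(label = total), 
            size = 3, 
            hjust = -0.1) +
  # Leave room on the right so the longest label is not clipped
  scale_x_continuous(expand = expansion(mult = c(0, 0.1))) +
  scale_fill_gradientn(colors = c("bisque", "bisque4")) +
  theme_minimal() +
  theme(legend.position = "none") +
  labs(title = 'Total Dead and Missing Migrants by Region',
       x = 'Total dead and missing',
       y = 'Region')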

missing_region

According to the plot above, the Mediterranean has the highest number of dead and missing migrants: 23,312 people, about 3.6 times the 6,460 recorded in the second-ranked region, Northern Africa.
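The comparison can be checked against the totals printed in the location tibble. This short sketch copies those 16 values by hand (rather than recomputing them from the CSV) and computes the ratio between the first two regions and the Mediterranean's share of the overall total:

```r
# Totals copied from the `location` tibble printed above
totals <- c(
  "Mediterranean" = 23312, "Northern Africa" = 6460, "North America" = 3018,
  "South-eastern Asia" = 2851, "Western Africa" = 2457, "Eastern Africa" = 1738,
  "Central America" = 1499, "Western Asia" = 1238, "Southern Asia" = 1124,
  "Caribbean" = 957, "Europe" = 766, "South America" = 357,
  "Middle Africa" = 181, "Central Asia" = 52, "Eastern Asia" = 38,
  "Southern Africa" = 34
)

# How many times larger the Mediterranean total is than the runner-up
ratio <- totals[["Mediterranean"]] / totals[["Northern Africa"]]

# Share of all recorded dead and missing migrants per region
share <- totals / sum(totals)

print(sum(totals))                        # 46082
print(round(ratio, 2))                    # 3.61
print(round(share[["Mediterranean"]], 3)) # 0.506
```

So the Mediterranean alone accounts for roughly half (about 50.6%) of the 46,082 dead and missing migrants recorded across all regions.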

Causes of death of migrants

death_reg <- migrants %>%
  # Keep the causes in order of first appearance (used for the fill order)
  mutate(Cause.of.Death = factor(Cause.of.Death, levels = unique(Cause.of.Death))) %>%
  group_by(Cause.of.Death, Region) %>%
  # n() counts rows, i.e. incidents, not people
  summarise(Counts = n(), .groups = "drop") %>%
  arrange(desc(Counts))
death_reg
## # A tibble: 97 x 3
##    Cause.of.Death                                                  Region Counts
##    <fct>                                                           <fct>   <int>
##  1 Drowning                                                        Medit~   1442
##  2 Mixed or unknown                                                North~   1260
##  3 Mixed or unknown                                                North~    725
##  4 Harsh environmental conditions / lack of adequate shelter, foo~ North~    461
##  5 Drowning                                                        Centr~    414
##  6 Vehicle accident / death linked to hazardous transport          Centr~    282
##  7 Sickness / lack of access to adequate healthcare                Weste~    278
##  8 Sickness / lack of access to adequate healthcare                North~    269
##  9 Harsh environmental conditions / lack of adequate shelter, foo~ North~    256
## 10 Drowning                                                        North~    239
## # ... with 87 more rows
death_reg %>%
  ggplot(aes(x = Region,
             y = Counts,
             fill = Cause.of.Death)) +
  geom_col(position = "stack", width = 0.7) +
  scale_fill_viridis(discrete = TRUE, option = "D") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "right") +
  guides(fill = guide_legend(title = "Causes")) +
  labs(title = 'Migrant Causes of Death by Region',
       x = NULL,
       y = 'Number of incidents')

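Consistent with the Mediterranean having the most dead and missing migrants, drowning in the Mediterranean is the most frequently recorded cause-region combination (1,442 incidents), followed by incidents with mixed or unknown causes in North America. Note that these bars count incidents, not people.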
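Because death_reg is built with n(), each bar counts incidents (rows), not people, so one shipwreck with hundreds of missing migrants weighs the same as a single death in the desert. A minimal base R sketch with hypothetical toy incidents, showing how the two measures can rank cause-region combinations differently:

```r
# Hypothetical toy incidents: one large shipwreck and several small incidents
toy <- data.frame(
  Region = c("Mediterranean", "North America", "North America", "North America"),
  Cause.of.Death = c("Drowning", "Mixed or unknown", "Mixed or unknown", "Mixed or unknown"),
  Total.Number.of.Dead.and.Missing = c(120, 1, 1, 2)
)

# One label per region/cause combination
key <- paste(toy$Region, toy$Cause.of.Death, sep = " / ")

# Equivalent of n(): number of incidents (rows) per combination
incidents <- table(key)

# Equivalent of sum(): number of people dead or missing per combination
people <- tapply(toy$Total.Number.of.Dead.and.Missing, key, sum)

print(incidents)
print(people)
```

In the notebook's dplyr style, both measures can be computed in one step with summarise(Incidents = n(), People = sum(Total.Number.of.Dead.and.Missing), .groups = "drop").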