In this notebook, I explore data on migrants who died or went missing along migration routes worldwide between 2014 and 2021.
The dataset comes from the Missing Migrants Project, which tracks deaths of migrants, including refugees and asylum-seekers. The project is a joint initiative of IOM’s Global Migration Data Analysis Centre (GMDAC) and Media and Communications Division (MCD). For more information, see https://missingmigrants.iom.int/
library(flexdashboard)
library(lubridate)
library(readr)
library(ggplot2)
library(dplyr)
library(viridis)
library(reshape2)
migrants <- read.csv("data/missing_migrants.csv")
head(migrants)
## X Main.ID Incident.ID Region Incident.Date Year
## 1 0 2014.MMP00001 2014.MMP00001 North America Mon, 01/06/2014 - 12:00 2014
## 2 1 2014.MMP00002 2014.MMP00002 North America Sun, 01/12/2014 - 12:00 2014
## 3 2 2014.MMP00003 2014.MMP00003 North America Tue, 01/14/2014 - 12:00 2014
## 4 3 2014.MMP00004 2014.MMP00004 North America Thu, 01/16/2014 - 12:00 2014
## 5 4 2014.MMP00005 2014.MMP00005 Europe Thu, 01/16/2014 - 12:00 2014
## 6 5 2014.MMP00006 2014.MMP00006 North America Fri, 01/17/2014 - 12:00 2014
## Reported.Month Number.Dead Minimum.Estimated.Number.of.Missing
## 1 January 1 NA
## 2 January 1 NA
## 3 January 1 NA
## 4 January 1 NA
## 5 January 1 0
## 6 January 1 NA
## Total.Number.of.Dead.and.Missing Number.of.Survivors Number.of.Females
## 1 1 NA NA
## 2 1 NA NA
## 3 1 NA NA
## 4 1 NA NA
## 5 1 2 NA
## 6 1 NA NA
## Number.of.Males Number.of.Children
## 1 1 NA
## 2 NA NA
## 3 NA NA
## 4 1 NA
## 5 1 NA
## 6 NA NA
## Cause.of.Death
## 1 Mixed or unknown
## 2 Mixed or unknown
## 3 Mixed or unknown
## 4 Violence
## 5 Harsh environmental conditions / lack of adequate shelter, food, water
## 6 Violence
## Location.of.death
## 1 Pima Country Office of the Medical Examiner jurisdiction, Arizona, USA (see coordinates for exact location)
## 2 Pima Country Office of the Medical Examiner jurisdiction, Arizona, USA (see coordinates for exact location)
## 3 Pima Country Office of the Medical Examiner jurisdiction, Arizona, USA (see coordinates for exact location)
## 4 near Douglas, Arizona, USA
## 5 Border between Russia and Estonia
## 6 Pima Country Office of the Medical Examiner jurisdiction, Arizona, USA (see coordinates for exact location)
## Information.Source
## 1 Pima County Office of the Medical Examiner (PCOME)
## 2 Pima County Office of the Medical Examiner (PCOME)
## 3 Pima County Office of the Medical Examiner (PCOME)
## 4 Ministry of Foreign Affairs Mexico, Pima County Office of the Medical Examiner (PCOME)
## 5 EUBusiness (Agence France-Presse)
## 6 Pima County Office of the Medical Examiner (PCOME)
## Coordinates Migrantion.route URL
## 1 31.650259, -110.366453 US-Mexico border crossing http://humaneborders.info/
## 2 31.59713, -111.73756 US-Mexico border crossing
## 3 31.94026, -113.01125 US-Mexico border crossing
## 4 31.506777, -109.315632 US-Mexico border crossing http://bit.ly/1qfIw00
## 5 59.1551, 28 http://bit.ly/1rTFTjR
## 6 32.45435, -113.18402 US-Mexico border crossing
## UNSD.Geographical.Grouping Source.Quality
## 1 Northern America 5
## 2 Northern America 5
## 3 Northern America 5
## 4 Northern America 5
## 5 Northern Europe 1
## 6 Northern America 5
Here is a description of the variables in the dataset:
Main.ID : An automatically generated number used to identify each unique entry in the dataset.
Incident.ID : An automatically generated number used to identify each unique entry in the dataset.
Region : The region in which an incident took place.
Incident.Date : Estimated date of death. In cases where the exact date of death is not known, this variable indicates the date in.
Year : The year in which the incident occurred.
Reported.Month : The month in which the incident occurred.
Number.Dead : The total number of people confirmed dead in one incident, i.e. the number of bodies recovered.
Minimum.Estimated.Number.of.Missing : The total number of those who are missing and are thus assumed to be dead.
Total.Number.of.Dead.and.Missing : The sum of the ‘number dead’ and ‘number missing’ variables.
Number.of.Survivors : The number of migrants that survived the incident, if known.
Number.of.Females : Indicates the number of females found dead or missing. If unknown, it is left blank.
Number.of.Males : Indicates the number of males found dead or missing. If unknown, it is left blank.
Number.of.Children : Indicates the number of individuals under the age of 18 found dead or missing. If unknown, it is left blank.
Cause.of.Death : The determination of conditions resulting in the migrant’s death, i.e. the circumstances of the event.
Location.of.death : Place where the death(s) occurred or where the body or bodies were found.
Information.Source : The source of information.
Coordinates : Place where the death(s) occurred or where the body or bodies were found.
Migrantion.route : Name of the migrant route on which incident occurred, if known. If unknown, it is left blank.
URL : Link to sources.
UNSD.Geographical.Grouping : Geographical region in which the incident took place, as designated by the United Nations Statistics Division (UNSD).
Source.Quality : Incidents are ranked on a scale from 1-5 based on the source(s) of information available.
Let’s check how many missing values there are in this dataset.
colSums(is.na(migrants))
## X Main.ID
## 0 0
## Incident.ID Region
## 0 0
## Incident.Date Year
## 0 0
## Reported.Month Number.Dead
## 0 552
## Minimum.Estimated.Number.of.Missing Total.Number.of.Dead.and.Missing
## 8871 0
## Number.of.Survivors Number.of.Females
## 8427 8020
## Number.of.Males Number.of.Children
## 4080 8670
## Cause.of.Death Location.of.death
## 0 0
## Information.Source Coordinates
## 0 0
## Migrantion.route URL
## 0 0
## UNSD.Geographical.Grouping Source.Quality
## 0 42
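Note that `is.na()` only counts `NA` cells. Character columns such as URL contain empty strings (`""`) in the rows shown earlier, and those are reported as 0 missing here. The following is a minimal sketch of one way to count both kinds of gaps. The `toy` data frame and the `count_missing()` helper are hypothetical names introduced for illustration and are not part of the original notebook.

```r
# Toy data frame standing in for `migrants`; the values are illustrative only.
toy <- data.frame(
  URL = c("http://example.org", "", ""),
  Number.Dead = c(1, NA, 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: a cell counts as missing if it is NA or,
# for character columns, an empty (or whitespace-only) string.
count_missing <- function(df) {
  sapply(df, function(col) {
    if (is.character(col)) {
      sum(is.na(col) | trimws(col) == "")
    } else {
      sum(is.na(col))
    }
  })
}

missing_counts <- count_missing(toy)
missing_counts
```

Applied to the real data, the same helper could be called as `count_missing(migrants)`, assuming the character columns have not yet been converted to factors.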
missing_value <- data.frame(Variable_names = c("Minimum.Estimated.Number.of.Missing",
"Number.of.Children",
"Number.of.Survivors",
"Number.of.Females",
"Number.of.Males",
"Number.Dead",
"Source.Quality"),
Missing = c(8871, 8670, 8427, 8020, 4080, 552, 42))
missing_value
## Variable_names Missing
## 1 Minimum.Estimated.Number.of.Missing 8871
## 2 Number.of.Children 8670
## 3 Number.of.Survivors 8427
## 4 Number.of.Females 8020
## 5 Number.of.Males 4080
## 6 Number.Dead 552
## 7 Source.Quality 42
The counts above show that four variables are missing in more than 80% of the records: Minimum.Estimated.Number.of.Missing, Number.of.Children, Number.of.Survivors, and Number.of.Females.
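The 80% figure can be checked with a little arithmetic. The sketch below copies the counts from the `colSums()` output and assumes 9906 rows, the number reported by `str(migrants)`. The names `na_counts`, `n_rows`, and `missing_pct` are introduced here for illustration only.

```r
# Missing-value counts copied from the colSums() output.
na_counts <- c(
  Minimum.Estimated.Number.of.Missing = 8871,
  Number.of.Children = 8670,
  Number.of.Survivors = 8427,
  Number.of.Females = 8020,
  Number.of.Males = 4080,
  Number.Dead = 552,
  Source.Quality = 42
)

# Row count as reported by str(migrants).
n_rows <- 9906

# Share of rows that are missing, in percent, rounded to one decimal place.
missing_pct <- round(100 * na_counts / n_rows, 1)

# Variables that are missing in more than 80% of rows.
missing_pct[missing_pct > 80]
```

In the notebook itself, the same table could be derived directly from the data with `100 * colSums(is.na(migrants)) / nrow(migrants)` rather than typing the counts by hand.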
Next, I want to check the data types in this dataset.
str(migrants)
## 'data.frame': 9906 obs. of 22 variables:
## $ X : int 0 1 2 3 4 5 6 7 8 9 ...
## $ Main.ID : chr "2014.MMP00001" "2014.MMP00002" "2014.MMP00003" "2014.MMP00004" ...
## $ Incident.ID : chr "2014.MMP00001" "2014.MMP00002" "2014.MMP00003" "2014.MMP00004" ...
## $ Region : chr "North America" "North America" "North America" "North America" ...
## $ Incident.Date : chr "Mon, 01/06/2014 - 12:00" "Sun, 01/12/2014 - 12:00" "Tue, 01/14/2014 - 12:00" "Thu, 01/16/2014 - 12:00" ...
## $ Year : int 2014 2014 2014 2014 2014 2014 2014 2014 2014 2014 ...
## $ Reported.Month : chr "January" "January" "January" "January" ...
## $ Number.Dead : num 1 1 1 1 1 1 12 1 1 1 ...
## $ Minimum.Estimated.Number.of.Missing: num NA NA NA NA 0 NA NA NA NA NA ...
## $ Total.Number.of.Dead.and.Missing : int 1 1 1 1 1 1 12 1 1 1 ...
## $ Number.of.Survivors : num NA NA NA NA 2 NA NA NA NA NA ...
## $ Number.of.Females : num NA NA NA NA NA NA 9 NA NA NA ...
## $ Number.of.Males : num 1 NA NA 1 1 NA NA NA NA NA ...
## $ Number.of.Children : num NA NA NA NA NA NA 3 NA NA NA ...
## $ Cause.of.Death : chr "Mixed or unknown" "Mixed or unknown" "Mixed or unknown" "Violence" ...
## $ Location.of.death : chr "Pima Country Office of the Medical Examiner jurisdiction, Arizona, USA (see coordinates for exact location)" "Pima Country Office of the Medical Examiner jurisdiction, Arizona, USA (see coordinates for exact location)" "Pima Country Office of the Medical Examiner jurisdiction, Arizona, USA (see coordinates for exact location)" "near Douglas, Arizona, USA" ...
## $ Information.Source : chr "Pima County Office of the Medical Examiner (PCOME)" "Pima County Office of the Medical Examiner (PCOME)" "Pima County Office of the Medical Examiner (PCOME)" "Ministry of Foreign Affairs Mexico, Pima County Office of the Medical Examiner (PCOME)" ...
## $ Coordinates : chr "31.650259, -110.366453" "31.59713, -111.73756" "31.94026, -113.01125" "31.506777, -109.315632" ...
## $ Migrantion.route : chr "US-Mexico border crossing" "US-Mexico border crossing" "US-Mexico border crossing" "US-Mexico border crossing" ...
## $ URL : chr "http://humaneborders.info/" "" "" "http://bit.ly/1qfIw00" ...
## $ UNSD.Geographical.Grouping : chr "Northern America" "Northern America" "Northern America" "Northern America" ...
## $ Source.Quality : num 5 5 5 5 1 5 5 5 5 5 ...
I will convert the categorical variables to factors and parse Incident.Date as a date-time.
migrants <- migrants%>%
mutate(Region = as.factor(Region),
Year = as.factor(Year),
Reported.Month = as.factor(Reported.Month),
Cause.of.Death = as.factor(Cause.of.Death),
UNSD.Geographical.Grouping = as.factor(UNSD.Geographical.Grouping))
migrants$Incident.Date <- mdy_hm(migrants$Incident.Date)
str(migrants)
## 'data.frame': 9906 obs. of 22 variables:
## $ X : int 0 1 2 3 4 5 6 7 8 9 ...
## $ Main.ID : chr "2014.MMP00001" "2014.MMP00002" "2014.MMP00003" "2014.MMP00004" ...
## $ Incident.ID : chr "2014.MMP00001" "2014.MMP00002" "2014.MMP00003" "2014.MMP00004" ...
## $ Region : Factor w/ 16 levels "Caribbean","Central America",..: 9 9 9 9 6 9 7 9 9 9 ...
## $ Incident.Date : POSIXct, format: "2014-01-06 12:00:00" "2014-01-12 12:00:00" ...
## $ Year : Factor w/ 8 levels "2014","2015",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Reported.Month : Factor w/ 12 levels "April","August",..: 5 5 5 5 5 5 5 5 5 5 ...
## $ Number.Dead : num 1 1 1 1 1 1 12 1 1 1 ...
## $ Minimum.Estimated.Number.of.Missing: num NA NA NA NA 0 NA NA NA NA NA ...
## $ Total.Number.of.Dead.and.Missing : int 1 1 1 1 1 1 12 1 1 1 ...
## $ Number.of.Survivors : num NA NA NA NA 2 NA NA NA NA NA ...
## $ Number.of.Females : num NA NA NA NA NA NA 9 NA NA NA ...
## $ Number.of.Males : num 1 NA NA 1 1 NA NA NA NA NA ...
## $ Number.of.Children : num NA NA NA NA NA NA 3 NA NA NA ...
## $ Cause.of.Death : Factor w/ 7 levels "Accidental death",..: 4 4 4 7 3 7 2 4 4 4 ...
## $ Location.of.death : chr "Pima Country Office of the Medical Examiner jurisdiction, Arizona, USA (see coordinates for exact location)" "Pima Country Office of the Medical Examiner jurisdiction, Arizona, USA (see coordinates for exact location)" "Pima Country Office of the Medical Examiner jurisdiction, Arizona, USA (see coordinates for exact location)" "near Douglas, Arizona, USA" ...
## $ Information.Source : chr "Pima County Office of the Medical Examiner (PCOME)" "Pima County Office of the Medical Examiner (PCOME)" "Pima County Office of the Medical Examiner (PCOME)" "Ministry of Foreign Affairs Mexico, Pima County Office of the Medical Examiner (PCOME)" ...
## $ Coordinates : chr "31.650259, -110.366453" "31.59713, -111.73756" "31.94026, -113.01125" "31.506777, -109.315632" ...
## $ Migrantion.route : chr "US-Mexico border crossing" "US-Mexico border crossing" "US-Mexico border crossing" "US-Mexico border crossing" ...
## $ URL : chr "http://humaneborders.info/" "" "" "http://bit.ly/1qfIw00" ...
## $ UNSD.Geographical.Grouping : Factor w/ 20 levels "","Caribbean",..: 10 10 10 10 11 10 17 10 10 10 ...
## $ Source.Quality : num 5 5 5 5 1 5 5 5 5 5 ...
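The `str()` output lists the Reported.Month levels alphabetically ("April", "August", ...), and that is also the order a plot axis or table would use. Here is a minimal sketch of one way to get calendar order instead, assuming the column holds full English month names as in the `head()` output. The `reported_month` vector is a made-up stand-in for the real column.

```r
# Illustrative vector standing in for migrants$Reported.Month.
reported_month <- c("March", "January", "December", "January")

# month.name is a base R constant with the English month names in calendar order.
month_factor <- factor(reported_month, levels = month.name)

# Levels now run January, February, ..., December.
levels(month_factor)

# Tabulating respects the calendar order of the levels.
table(month_factor)
```

In the `mutate()` call, this would amount to replacing `as.factor(Reported.Month)` with `factor(Reported.Month, levels = month.name)`. Any value not spelled exactly like an entry of `month.name` would become `NA`, so it is worth checking for new missing values afterwards.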
From the given data, there are 2 questions that I want to explore:
location <- migrants %>%
select(c(Region, Location.of.death, Migrantion.route, Total.Number.of.Dead.and.Missing)) %>%
group_by(Region) %>%
summarise(total = sum(Total.Number.of.Dead.and.Missing))%>%
ungroup()%>%
arrange(desc(total))
location$Region <- as.factor(location$Region)
location
## # A tibble: 16 x 2
## Region total
## <fct> <int>
## 1 Mediterranean 23312
## 2 Northern Africa 6460
## 3 North America 3018
## 4 South-eastern Asia 2851
## 5 Western Africa 2457
## 6 Eastern Africa 1738
## 7 Central America 1499
## 8 Western Asia 1238
## 9 Southern Asia 1124
## 10 Caribbean 957
## 11 Europe 766
## 12 South America 357
## 13 Middle Africa 181
## 14 Central Asia 52
## 15 Eastern Asia 38
## 16 Southern Africa 34
missing_region <- location %>%
ggplot(aes(x = total,
y = reorder(Region, total),
fill = total
)) +
geom_col() +
geom_text(aes(label = total),
size = 3,
nudge_x = 50) +
scale_fill_gradientn(colors = c("bisque", "bisque4")) +
theme_minimal() +
theme(legend.position = "none") +
labs(title = 'Total Dead and Missing Migrants by Region',
x = 'Total Dead and Missing',
y = 'Region')
missing_region
According to the plot above, the Mediterranean has the highest total of dead and missing migrants: 23,312, about 3.6 times the total for the second-ranked region, Northern Africa (6,460).
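The comparison between the two largest regions can be checked against the summary table. The sketch below copies the totals from the `location` tibble printed above. The variable names are introduced for illustration only.

```r
# Regional totals copied from the `location` tibble, in the order printed.
region_totals <- c(
  23312, 6460, 3018, 2851, 2457, 1738, 1499, 1238,
  1124, 957, 766, 357, 181, 52, 38, 34
)

mediterranean   <- region_totals[1]
northern_africa <- region_totals[2]

# How many times larger the Mediterranean total is than Northern Africa's.
ratio <- mediterranean / northern_africa

# The Mediterranean's share of all recorded dead and missing.
share <- mediterranean / sum(region_totals)

round(ratio, 1)        # 3.6
round(100 * share, 1)  # 50.6 (percent)
```

By these figures, the Mediterranean alone accounts for roughly half of all dead and missing migrants recorded in the dataset.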
death_reg <- migrants%>%
mutate(Cause.of.Death = factor(Cause.of.Death, levels = unique(migrants$Cause.of.Death)),
Region = as.factor(Region))%>%
group_by(Cause.of.Death, Region)%>%
summarise("Counts" = n())%>%
arrange(desc(Counts))
## `summarise()` has grouped output by 'Cause.of.Death'. You can override using
## the `.groups` argument.
death_reg
## # A tibble: 97 x 3
## # Groups: Cause.of.Death [7]
## Cause.of.Death Region Counts
## <fct> <fct> <int>
## 1 Drowning Medit~ 1442
## 2 Mixed or unknown North~ 1260
## 3 Mixed or unknown North~ 725
## 4 Harsh environmental conditions / lack of adequate shelter, foo~ North~ 461
## 5 Drowning Centr~ 414
## 6 Vehicle accident / death linked to hazardous transport Centr~ 282
## 7 Sickness / lack of access to adequate healthcare Weste~ 278
## 8 Sickness / lack of access to adequate healthcare North~ 269
## 9 Harsh environmental conditions / lack of adequate shelter, foo~ North~ 256
## 10 Drowning North~ 239
## # ... with 87 more rows
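The `summarise()` message above appears because the result is still grouped by Cause.of.Death. Below is a minimal sketch of how the `.groups` argument mentioned in that message could be used to return an ungrouped result. The `toy` data frame is a made-up stand-in for `migrants`.

```r
library(dplyr)

# Toy incident records standing in for `migrants`; values are illustrative only.
toy <- data.frame(
  Cause.of.Death = c("Drowning", "Drowning", "Violence"),
  Region = c("Mediterranean", "Mediterranean", "North America"),
  stringsAsFactors = FALSE
)

counts <- toy %>%
  group_by(Cause.of.Death, Region) %>%
  # .groups = "drop" removes all grouping and silences the message.
  summarise(Counts = n(), .groups = "drop") %>%
  arrange(desc(Counts))

counts
```

Dropping the grouping also means that later verbs such as `arrange()` or `slice_max()` operate on the whole table rather than within each cause.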
death_reg%>%
ggplot(aes(x = Region,
y = Counts,
fill= Cause.of.Death
)) +
geom_col(position = "stack", width = 0.7)+
scale_fill_viridis(discrete = TRUE, option = "D")+
theme(axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "right")+
guides(fill=guide_legend(title="Causes"))+
labs(title = 'Migrants Cause of Death Based on Region' ,
x = NULL,
y = 'Number of Incidents')
The most frequent combination of cause and region is drowning in the Mediterranean, with 1,442 recorded incidents, followed by mixed or unknown causes in North America. Note that these counts are numbers of incident records, not numbers of deaths.
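To read off the leading cause for every region rather than only the largest bars, one option is to keep the top row per region. This is a minimal sketch assuming dplyr 1.0.0 or later, where `slice_max()` is available. The `toy_counts` data frame is a made-up stand-in for `death_reg`, and its numbers are illustrative only.

```r
library(dplyr)

# Toy incident counts standing in for `death_reg`; values are illustrative only.
toy_counts <- data.frame(
  Region = c("A", "A", "B", "B"),
  Cause.of.Death = c("Drowning", "Violence", "Drowning", "Mixed or unknown"),
  Counts = c(10, 3, 2, 7),
  stringsAsFactors = FALSE
)

top_cause <- toy_counts %>%
  group_by(Region) %>%
  # Keep the single row with the largest count in each region.
  slice_max(Counts, n = 1, with_ties = FALSE) %>%
  ungroup()

top_cause
```

With `with_ties = FALSE`, a region whose top causes are tied returns only one of them. Setting it to `TRUE` would keep every tied cause.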