The data on migrant deaths come from the Arizona OpenGIS Project, cosponsored by Humane Borders and the Pima County Office of Medical Examiner. I’ve edited the data set to include all recorded migrant deaths from 2000 through March 2025. The objective of this project is to give you continued experience in working with and making sense of quantitative data while, at the same time, giving you the opportunity to understand at a deep level the nature of the migrant death issue. Above all else, my goal is to always honor and humanize those who have died in the desert. These data permit a better understanding of the death crisis.
For purposes of this project, you will be asked to produce a number of high-quality visualizations and to compute some basic univariate statistics. Your job is to take what I have provided, analyze it, and tell the story. I want to know something important, interesting, and useful about the migrant death crisis given the tasks assigned to you. Ultimately, when presented with information, you need practice in learning how to engage with it. While this assignment is tethered to the migrant death data set, all skills used here are transferable to any data set.
What I am looking for here is something other than mechanical interpretations. I want to see creativity. What do I mean by creativity? To be blunt, creativity is not repeating numbers or statistics you produce in a table. I don’t need anyone to do that, as I can see it with my own eyes. Creativity paints the human picture of the death crisis from the observed data. Answers that are mechanical are usually nonanalytical. Creativity means bringing in information and context from outside the confines of the specific charts, plots, or questions. In the end, if you simply report the results with no attempt at interpretation (i.e., you just reproduce in words what I see in the tables or charts), I would not expect anything above a “C”-level grade. In class, we will go over in great detail tips on how to avoid these problems, so follow those tips and you’re going to be well on your way.
This assignment is worth 800 points and is due by 11:59 PM on April 29.
The data file is a csv file saved to my GitHub site.
## Rows: 4033 Columns: 21
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (14): ML Number, Name, Sex, Reporting Date, Surface Management, Location...
## dbl (7): Age, Corridor Code, Condition Code, Latitude, Longitude, UTM X, UTM Y
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## ML Number Name Sex Age
## Length:4033 Length:4033 Length:4033 Min. : 0.00
## Class :character Class :character Class :character 1st Qu.:24.00
## Mode :character Mode :character Mode :character Median :30.00
## Mean :31.46
## 3rd Qu.:38.00
## Max. :99.00
## NA's :1398
## Reporting Date Surface Management Location Location Precision
## Length:4033 Length:4033 Length:4033 Length:4033
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## Corridor Code Corridor Cause of Death OME Determined COD
## Min. : 0.00 Length:4033 Length:4033 Length:4033
## 1st Qu.: 6.00 Class :character Class :character Class :character
## Median : 7.00 Mode :character Mode :character Mode :character
## Mean : 7.03
## 3rd Qu.: 8.00
## Max. :13.00
##
## Condition Code Body Condition Post Mortem Interval State
## Min. :1.00 Length:4033 Length:4033 Length:4033
## 1st Qu.:1.00 Class :character Class :character Class :character
## Median :4.00 Mode :character Mode :character Mode :character
## Mean :3.84
## 3rd Qu.:6.00
## Max. :8.00
##
## County Latitude Longitude UTM X
## Length:4033 Min. :31.33 Min. :-114.8 Min. :143294
## Class :character 1st Qu.:31.75 1st Qu.:-112.3 1st Qu.:381244
## Mode :character Median :31.96 Median :-111.8 Median :424426
## Mean :32.00 Mean :-111.8 Mean :422503
## 3rd Qu.:32.21 3rd Qu.:-111.3 3rd Qu.:468782
## Max. :33.86 Max. :-109.1 Max. :685151
##
## UTM Y
## Min. :3466477
## 1st Qu.:3513324
## Median :3536711
## Mean :3540649
## 3rd Qu.:3565118
## Max. :3751034
##
Because we may want to look at yearly data, it’s useful to generate a variable that records the calendar year in which migrant remains were found. The code in the chunk below does just this. In addition to creating the new variable, I use the R command tabyl to produce a table of migrant deaths by year. In all, there are about 4,000 recorded deaths in the time frame of the data set I’ve created. The table will show you the number of remains recovered by year along with the proportion of total deaths each year accounts for. So in 2010, we see 222 remains were recovered. The proportion of the total number of deaths accounted for by this year is 0.055045872 (i.e. \(\frac{222}{4033}\)). Multiply this proportion by 100 and you get the percent contribution: about 5.5% of all the remains were recovered in 2010. One way to quickly assess the persistence of the death crisis is to inspect the proportions. If the crisis were abating, we’d expect to see a substantial decline in the proportion. If the crisis is persistent, we’d expect these proportions to be very similar across time. (Note that 2025 will be a very small number because we only have partial data for this year.) What do you see when you look at these proportions?
md$date <- as.Date(md$`Reporting Date`,'%m/%d/%y')
md$year <- as.numeric(format(md$date,'%y'))
md$month <- as.numeric(format(md$date, '%m'))
#This ensures the year is reported as "2000, 2001, ... ,2024" instead of "0,1, ... , 24".
md$year20 <- 2000 + md$year
tabyl(md$year20)
## md$year20 n percent
## 2000 74 0.018348624
## 2001 77 0.019092487
## 2002 147 0.036449293
## 2003 155 0.038432928
## 2004 169 0.041904290
## 2005 196 0.048599058
## 2006 168 0.041656335
## 2007 214 0.053062237
## 2008 162 0.040168609
## 2009 190 0.047111332
## 2010 222 0.055045872
## 2011 178 0.044135879
## 2012 154 0.038184974
## 2013 165 0.040912472
## 2014 127 0.031490206
## 2015 135 0.033473841
## 2016 147 0.036449293
## 2017 118 0.029258616
## 2018 120 0.029754525
## 2019 136 0.033721795
## 2020 213 0.052814282
## 2021 215 0.053310191
## 2022 171 0.042400198
## 2023 197 0.048847012
## 2024 155 0.038432928
## 2025 28 0.006942723
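The year extraction and the proportion arithmetic described above can be checked by hand. The sketch below is only an illustration: the `reporting_date` values are hypothetical, and it uses base R's `table()` and `prop.table()` as a stand-in for `tabyl`. It also shows that `format(date, "%Y")` returns the four-digit year directly, which is an alternative to adding 2000 to the two-digit year.

```r
# Hypothetical reporting dates in the same month/day/two-digit-year layout
reporting_date <- c("06/15/10", "07/02/10", "01/20/24", "03/05/25")

# Parse the strings, then pull out the four-digit year directly with %Y
date <- as.Date(reporting_date, format = "%m/%d/%y")
year <- as.numeric(format(date, "%Y"))

# Counts and proportions by year (base R counterpart to tabyl)
counts <- table(year)
props  <- prop.table(counts)
print(counts)
print(round(props, 3))

# Check of the 2010 row in the table above: 222 of 4,033 records
p2010 <- 222 / 4033
print(round(p2010, 9))        # proportion
print(round(100 * p2010, 1))  # percent
```

The last two lines reproduce the 2010 row of the table: the count divided by the total number of records, then multiplied by 100.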
For this task, I want you to produce a barplot of migrant remains recovered by year. You can use the code below to generate the plot. It is almost always easier to visualize quantitative data than to read a table of numbers. The code in the chunk below will create a barplot where each bar corresponds to the number of migrant remains recovered in each year. When you look at this plot, what do you see? What interpretation would you give to this? Does the plot show the crisis abating? Does it seem persistent? Is it getting worse? Your interpretation should be well-written, clear, and non-mechanical. The plot and the interpretation are worth 100 points.
labelsforx<- c("2000", "2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009", "2010",
"2011", "2012", "2013", "2014", "2015",
"2016", "2017", "2018", "2019", "2020",
"2021", "2022", "2023", "2024", "2025")
ggplot(md, aes(year20)) +
geom_bar(fill = "lightskyblue4") +
scale_x_continuous(breaks=seq(2000,2025,1), labels= labelsforx) +
labs(title="Migrant Remains Recovered By Year Along The Arizona/Mexico Border, 2000-2025",
subtitle="Data from the Arizona OpenGIS Project (https://humaneborders.info/app/map.asp)",
y="Number recovered",
x="Year") +
theme_classic() +
theme(axis.text.x = element_text(size=8, angle=45, hjust=1),
title=element_text(size=10),
axis.ticks.length = unit(0.2, "cm"))
Looking at the graph of migrant remains recovered each year along the Arizona/Mexico border, the counts vary from year to year. In the early 2000s, there was a rapid increase in the number of remains recovered, which could reflect Prevention Through Deterrence. By ‘fortifying’ areas of high crossings, the policy forced migrants onto harsher and often unfamiliar paths in their attempt to enter the United States. Consequently, the journey became riskier and more dangerous, which resulted in more deaths. The number of remains recovered then plateaued at this higher level, reinforcing the idea that these policies made the journey harder. Following the plateau, there was a decline in the number of remains recovered, which could suggest that fewer migrants were crossing or that remains were harder to find as migrants ventured onto more clandestine paths. Those paths could also leave them more exposed to organized crime or violence along the way. After the decline, the number of remains recovered rose again, which could stem from Trump’s policies that reinforced the border and forced migrants onto far more dangerous paths. Climate change could also be a contributing factor, as hotter summers make migrants more susceptible to dehydration and hyperthermia. Overall, this chart tells the story of a persistent crisis that is not abating anytime soon, with multiple factors driving the data.
For this task, I want you to create a barplot of the number of migrant remains recovered by calendar month. The code in the previous chunk will be useful, but it will need to be modified. After you create the plot, provide a substantive interpretation of the results. Your score for this will be based on the quality of the plot (100 points) and the quality of the write-up (100 points).
labelsforx<- c("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December")
ggplot(md, aes(month)) +
geom_bar(fill = "lightskyblue4") +
scale_x_continuous(breaks=seq(1,12,1), labels= labelsforx) +
scale_y_continuous(breaks=seq(0,700,100)) +
labs(title="Migrant Remains Recovered By Month Along The Arizona/Mexico Border, January-December",
subtitle="Data from the Arizona OpenGIS Project (https://humaneborders.info/app/map.asp)",
y="Number recovered",
x="Month") +
theme_classic() +
theme(axis.text.x = element_text(size=9, angle=45, hjust=1),
title=element_text(size=8.5),
axis.ticks.length = unit(0.2, "cm"))
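Hand-typed month labels are easy to misspell. As a small sketch on hypothetical month numbers, base R's built-in `month.name` constant can supply the labels instead, and turning the month into a factor with all twelve levels keeps months with zero records visible in a count.

```r
# Hypothetical month numbers like those stored in md$month
month <- c(6, 7, 7, 1, 12, 6)

# month.name is a base R constant holding the twelve English month names
month_f <- factor(month, levels = 1:12, labels = month.name)

# Months with no records still appear, with a count of zero
month_counts <- table(month_f)
print(month_counts)
```

In the plotting chunk, `month.name` could be passed as `labels` to `scale_x_continuous(breaks = 1:12, ...)` in place of a typed vector.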
The barplot shows a roughly bell-shaped seasonal pattern in which the most remains are recovered in the middle of the year, corresponding to the hottest months: June and July. These results illustrate how the hottest months become the most dangerous because of the limited resources migrants have access to while attempting to cross the border. Their supplies dwindle faster in the warmer months because migrants consume more water and food to stay hydrated and energized. In the cooler months, migrants are able to consume less because their bodies are not constantly sweating to cool down. Instead, their bodies work to retain heat during the winter, and the cold can kill them if they are not properly equipped. Moreover, the chart shows that the terrain migrants cross becomes substantially more dangerous in warmer weather, with somewhat fewer deaths in the cooler months. Overall, this reveals temperature as one of the largest factors migrants face, one that ultimately dictates their survival: as temperature increases, so do migrant deaths.
In the dataset, there is a variable called “Age.” For this task we will analyze this item. In the chunk below I want you to: 1) compute the average and median age of migrants at death (note that there is missing data that you need to account for); and 2) create a boxplot of the age variable using ggplot2. Provide a substantive interpretation of the information you compute. This task is worth 50 points.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00 24.00 30.00 31.46 38.00 99.00 1398
## [1] 31.45958
## [1] 30
## Warning: Removed 1398 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
From the data, we are able to draw conclusions about the age demographics of migrants crossing the border. Here, we can see that most of the remains recovered are those of young adults, given a median of 30 years and a mean of about 31.5 years. Since the mean is greater than the median, the data are slightly right-skewed, meaning younger individuals make up the majority of the remains recovered. Often, these are young individuals who are looking to provide for their families by sending remittances back to them, or who are fleeing organized crime, gang violence, or poor economic conditions. Families sometimes send their younger children away so they can have a better life, away from gangs, without weighing the dangers that lie ahead on the journey. The older migrants whose remains were found inflate the mean so that it is higher than the median, but the median tells us younger migrants are more common. The extremes could also result from entire families embarking on the dangerous paths together, as hinted at by the migrant older than 95 years. The outliers near 0 years old likewise hint at families attempting to cross the desert together.
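The handling of missing ages and the mean-versus-median comparison can be sketched on a small example. The `age` vector below is hypothetical and only stands in for `md$Age`. In the summary output above, the mean (31.46) sits above the median (30), which is the same pattern this toy example produces.

```r
# Hypothetical ages with missing values, standing in for md$Age
age <- c(19, 22, 24, 27, 30, 30, 33, 38, 45, 67, NA, NA)

# Without na.rm = TRUE, any NA makes the result NA
print(mean(age))

# Drop missing values before summarizing
age_mean   <- mean(age, na.rm = TRUE)
age_median <- median(age, na.rm = TRUE)
n_missing  <- sum(is.na(age))
print(c(mean = age_mean, median = age_median, missing = n_missing))

# A mean above the median points to a longer right tail
skew_direction <- if (age_mean > age_median) "right" else "left or none"
print(skew_direction)
```

The boxplot warning about removed rows reflects the same issue: ggplot2 drops the records with a missing age before drawing.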
Below I will ask you to analyze migrant deaths by age group.
The chunk below creates a factor-level variable for 5-year age groups. I
give you the beginning and ending code; I want you to supply the
intermediate lines of code. Hint: 8 additional lines of code are
needed.
| md$agegroup | n | percent |
|---|---|---|
| 0-9 | 7 | 0% |
| 10-14 | 21 | 1% |
| 15-19 | 248 | 6% |
| 60 and over | 23 | 1% |
| NA | 3734 | 93% |
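One way to build this kind of grouping is with `cut()`, sketched here on hypothetical ages. The assignment's own chunk is not reproduced in this document, so the eight intermediate groups (20-24 through 55-59) are an assumption inferred from the labels in the table and from the hint that 8 additional lines are needed.

```r
# Hypothetical ages, including a missing value
age <- c(3, 12, 17, 22, 34, 58, 60, 81, NA)

# Lower edges of each group: 0, 10, then every 5 years up to 60
breaks <- c(0, 10, seq(15, 60, by = 5), Inf)

# Labels for the 5-year groups between 10 and 59
lower  <- seq(10, 55, by = 5)
labels <- c("0-9", paste(lower, lower + 4, sep = "-"), "60 and over")

# right = FALSE makes each interval include its lower edge: [10, 15), [15, 20), ...
agegroup <- cut(age, breaks = breaks, labels = labels, right = FALSE)

print(table(agegroup, useNA = "ifany"))
```

In the table above, 93% of records fall under NA, far more than the 1,398 records with a missing Age. That is the pattern one would expect if the intermediate groups were never assigned, though this is an inference from the output rather than something the document states.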
Using this information, compute a barplot for each age group and provide a substantive interpretation. Do not include the missing data (i.e. NAs) in this plot. Below is code to get you started! This task is worth 50 points.
#Keep only the rows with a non-missing age group
md_ssage<-md[!is.na(md$agegroup),]
ggplot(md_ssage, aes(agegroup)) +
geom_bar(fill = "lightskyblue4") +
labs(title="Age Groups Of Migrants Recovered Along The Arizona/Mexico Border",
subtitle="Data from the Arizona OpenGIS Project (https://humaneborders.info/app/map.asp)",
y="Number recovered",
x="Age Group") +
theme_classic()
Looking at the data, the largest age group is 15-19 years old, followed by those 60 and over and then those aged 10 to 14. This reveals how necessity and aspirations for a better future push the younger generation to brave the desert in the hope of improving their families’ lives. These individuals are often the most susceptible to gang violence in their home countries and could be fleeing in search of safer conditions. Other driving forces could be political instability, lack of work opportunities, or poverty. Oftentimes, families send their younger children in the hope of providing them with a better future and education, without knowing the dangers that lie ahead of them. They may believe the young are more tolerant of and resistant to the elements of the desert. We can also see how the geography takes a toll on younger individuals, as they make up an overwhelming majority of the remains assigned to an age group here. I believe these individuals are more susceptible to dehydration and violence along the way. For the older group, 60 and above, health conditions and physical state might have affected their journey and their ability to make it through. The youngest children are similarly susceptible to the elements because they depend on their parents. If something happens to their parents, the likelihood of the youngest children surviving decreases significantly.
Suppose we are interested in seeing whether there are any trends in the average age of deceased migrants. One way to do this is through a simple regression model. This model will give us an estimate of the average age conditional on year. The code in the chunk will do this along with the resulting starter plot. What do you see in this graph? Are there any trends you spot?
##
## Call:
## lm(formula = Age ~ YEAR, data = md)
##
## Residuals:
## Min 1Q Median 3Q Max
## -32.750 -7.832 -1.158 6.625 66.404
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 28.230 1.305 21.624 < 0.0000000000000002 ***
## YEAR1 2.686 1.862 1.443 0.14927
## YEAR2 1.903 1.601 1.188 0.23479
## YEAR3 1.770 1.601 1.106 0.26891
## YEAR4 2.891 1.577 1.833 0.06685 .
## YEAR5 1.843 1.547 1.192 0.23350
## YEAR6 1.797 1.615 1.112 0.26606
## YEAR7 3.929 1.537 2.556 0.01064 *
## YEAR8 2.602 1.606 1.621 0.10517
## YEAR9 2.324 1.579 1.472 0.14117
## YEAR10 3.277 1.537 2.132 0.03310 *
## YEAR11 3.891 1.633 2.383 0.01726 *
## YEAR12 4.434 1.663 2.666 0.00772 **
## YEAR13 3.990 1.641 2.430 0.01515 *
## YEAR14 5.564 1.832 3.038 0.00241 **
## YEAR15 3.459 1.691 2.046 0.04088 *
## YEAR16 4.366 1.695 2.576 0.01005 *
## YEAR17 2.946 1.878 1.568 0.11692
## YEAR18 7.145 1.774 4.027 0.0000581 ***
## YEAR19 3.360 1.743 1.928 0.05395 .
## YEAR20 2.346 1.639 1.432 0.15236
## YEAR21 3.270 1.595 2.051 0.04037 *
## YEAR22 3.411 1.647 2.071 0.03847 *
## YEAR23 4.520 1.579 2.864 0.00422 **
## YEAR24 5.083 1.733 2.933 0.00339 **
## YEAR25 3.270 7.327 0.446 0.65537
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.2 on 2609 degrees of freedom
## (1398 observations deleted due to missingness)
## Multiple R-squared: 0.01562, Adjusted R-squared: 0.006183
## F-statistic: 1.656 on 25 and 2609 DF, p-value: 0.02164
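When year enters the model as a factor, the intercept is the mean age in the baseline year and each `YEAR` coefficient is the difference from that baseline. The sketch below illustrates this on hypothetical data and then repeats the arithmetic with the printed estimates, under the assumption that `YEAR18` corresponds to 2018 and the baseline to 2000.

```r
# Hypothetical ages for three years, coded 0, 1, 2 like the YEAR factor above
toy <- data.frame(
  Age  = c(25, 27, 29, 30, 32, 34, 31, 35, 39),
  YEAR = factor(rep(0:2, each = 3))
)

fit <- lm(Age ~ YEAR, data = toy)
b <- coef(fit)

# Intercept = mean age in the baseline year; other terms are differences from it
est_means <- c(b[["(Intercept)"]],
               b[["(Intercept)"]] + b[["YEAR1"]],
               b[["(Intercept)"]] + b[["YEAR2"]])
group_means <- as.numeric(tapply(toy$Age, toy$YEAR, mean))
print(rbind(est_means, group_means))

# Same arithmetic with the printed estimates, assuming YEAR18 corresponds to 2018
est_2018 <- 28.230 + 7.145
print(est_2018)
```

The printed R-squared of 0.01562 suggests that year accounts for under 2% of the variation in age, so differences between years are small relative to the spread of ages within any one year.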
The data reveal a relatively consistent estimated age of migrants recovered along the Arizona/Mexico border. Across the 25 years, despite changes in policy and administration, the age of migrants attempting to cross remains about the same, with the group consisting primarily of young adults. I believe this number remains relatively constant because individuals in their 30s are in their prime in terms of health and productivity, so they feel able to cross the border in the hope of providing for their families. The vertical lines at each point on the graph show little variation from the line of best fit. Most notably, there is an uptick in age in 2018. This could have resulted from a change in the demographics of migrants attempting to cross the border due to changing circumstances in their home countries, such as violence or a desire for family reunification. Another potential contributing factor could have been a change in policy, such as a strengthened Border Patrol, which would require older, more experienced individuals to make the crossing.
How do migrant deaths and gender relate to one another? The code in the chunk below creates a “factor-level” variable recording the gender of the migrant. Since gender is not always determined, there is a category called “undetermined.” Using the tabyl function, I create a table showing the total number of remains recovered that are male, female, and undetermined. For this task, reproduce the code below and provide a brief interpretation of the results. This task is worth 50 points.
md$gender<-factor(md$Sex,
levels=c("male", "female", "undetermined"),
labels=c("male", "female", "undetermined"))
tabyl(md$gender)
## md$gender n percent
## male 3292 0.8162658
## female 605 0.1500124
## undetermined 136 0.0337218
The data reveal that the majority of the migrant remains recovered were male, with a lower percentage being female. Unfortunately, some remain undetermined. One possible explanation stems from the traditional role of men in Central and South America. Men are often seen as the provider, and migration is seen as a male-dominated act in their cultures, which drives men to attempt the journey despite the imminent dangers. They brave the journey in the hope of sending remittances back to their families to improve their quality of life. Although women make up a smaller percentage of the remains found, the data show that they, like men, are embarking on the dangerous journey too. Oftentimes, women stay home to take care of their families, but there are times when domestic violence, femicide, or economic circumstances push them out of their homes. Even though they make the journey, fewer of their remains are recovered, which could result from increased decomposition or from remains being harder to find because of the violence women are susceptible to.
On GitHub, there is a .csv file called mdg.csv. You will need to access this file. The code below will do this.
rm(list=ls(all=FALSE)) #Keep this line in
rm(list=ls(all=TRUE)) #Keep this line in
mdg="https://raw.githubusercontent.com/mightyjoemoon/POL51/main/mdg.csv"
mdg<-read_csv(url(mdg))
## New names:
## Rows: 25 Columns: 5
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," dbl
## (4): year, male, female, undetermined lgl (1): ...5
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...5`
summary(mdg)
## year male female undetermined
## Min. :2000 Min. :0.6400 Min. :0.0300 Min. :0.0000
## 1st Qu.:2006 1st Qu.:0.7700 1st Qu.:0.1000 1st Qu.:0.0000
## Median :2012 Median :0.8100 Median :0.1400 Median :0.0000
## Mean :2012 Mean :0.8244 Mean :0.1496 Mean :0.0272
## 3rd Qu.:2018 3rd Qu.:0.8800 3rd Qu.:0.2000 3rd Qu.:0.0300
## Max. :2024 Max. :0.9700 Max. :0.2400 Max. :0.1900
## ...5
## Mode:logical
## NA's:25
##
##
##
##
This data set has 4 variables: year, male, female, and undetermined. The variables “male,” “female”, and “undetermined” give the proportion of remains recovered for each group by year. So if in a given year, 0.76 of the remains recovered were male, this implies 76 percent were male. To tidy up the plot I am going to ask you to create three new objects called “Male,” “Female,” and “Undetermined” that convert the proportions into percentages.
It may be useful to visualize these data by way of a line plot. Below is the start of some code to create a time-series plot. Provide the relevant code to complete the plot. Your plot, if properly done will have three lines corresponding to the three groups (Male, Female, Undetermined) and have informative main, \(y\), and \(x\) titling. After creation of the plot, provide a substantive interpretation of the results.
The R tasks of reading in the data, creating the new objects, and creating the plot will be worth 100 points. The analysis will be worth 100 points.
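The conversion and the three-line plot described above might be sketched as follows. The `toy` data frame and its values are hypothetical, as are the names `long` and `p_gender`; only the column layout mirrors `mdg`. The plot is built only if ggplot2 is installed, and it uses `linewidth` rather than the deprecated `size` for line thickness.

```r
# Hypothetical proportions in the same layout as mdg (year, male, female, undetermined)
toy <- data.frame(
  year         = 2000:2002,
  male         = c(0.80, 0.76, 0.85),
  female       = c(0.17, 0.20, 0.12),
  undetermined = c(0.03, 0.04, 0.03)
)

# Proportions to percentages
Male         <- toy$male * 100
Female       <- toy$female * 100
Undetermined <- toy$undetermined * 100

# Long format: one row per year and group, which suits one line per group
long <- data.frame(
  year    = rep(toy$year, times = 3),
  group   = rep(c("Male", "Female", "Undetermined"), each = nrow(toy)),
  percent = c(Male, Female, Undetermined)
)

# Build the plot only if ggplot2 is installed; print(p_gender) would draw it
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p_gender <- ggplot(long, aes(x = year, y = percent, colour = group)) +
    geom_line(linewidth = 1) +
    labs(title = "Share of remains recovered by sex (toy data)",
         x = "Year", y = "Percent of remains recovered", colour = "Group")
}
```

Mapping `group` to `colour` gives one line per group from a single `geom_line()` call, which is an alternative to adding three separate line layers.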
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## List of 1
## $ title:List of 11
## ..$ family : NULL
## ..$ face : NULL
## ..$ colour : NULL
## ..$ size : num 9
## ..$ hjust : NULL
## ..$ vjust : NULL
## ..$ angle : NULL
## ..$ lineheight : NULL
## ..$ margin : NULL
## ..$ debug : NULL
## ..$ inherit.blank: logi FALSE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## - attr(*, "class")= chr [1:2] "theme" "gg"
## - attr(*, "complete")= logi FALSE
## - attr(*, "validate")= logi TRUE
The data in the chart reveal how, from the beginning, men have made up the majority of the remains recovered every year. This reflects the general stereotype of migration being male-dominated, as men, from a cultural perspective, head the family. Men often take on the dangerous journey in the hope of providing for their families from the United States and giving them a better life. However, the proportion of females is steadily increasing, which could come from changing social roles in which women are doing what men used to do. The current social and political turmoil in South and Central America has women facing femicide and sexual assault, which can prompt more women to flee in search of a safer home for themselves and their families. Climate change reduces the working opportunities available, which prompts women, like men, to look elsewhere for jobs to sustain their families. Women also want their children to grow up in a better environment where they are not exposed to gang-related violence, so their children may join them on the journey without the risks being weighed. With current policies in place, women are forced to face this difficult and often deadly terrain. The increase in undetermined remains tells a frightening story that potentially stems from underfunded agencies or from intentional neglect in accurately logging migrant remains to favor particular administrations.
I have created a dataset called “monthlydeathsbyyear.csv,” which can be found on my GitHub site. The code in the chunk below will access the file.
rm(list=ls(all=TRUE)) #Keep this line in
mdy="https://raw.githubusercontent.com/mightyjoemoon/POL51/main/monthlydeathsbyyear.csv"
mdy<-read_csv(url(mdy))
## Rows: 299 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (5): year, month, obs, yearmonth, remainsmonth
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
summary(mdy)
## year month obs yearmonth remainsmonth
## Min. :2000 Min. : 1.000 Min. :1 Min. :200001 Min. : 1.00
## 1st Qu.:2006 1st Qu.: 4.000 1st Qu.:1 1st Qu.:200604 1st Qu.: 7.00
## Median :2012 Median : 7.000 Median :1 Median :201207 Median :12.00
## Mean :2012 Mean : 6.512 Mean :1 Mean :201210 Mean :13.39
## 3rd Qu.:2018 3rd Qu.: 9.500 3rd Qu.:1 3rd Qu.:201810 3rd Qu.:16.00
## Max. :2024 Max. :12.000 Max. :1 Max. :202412 Max. :69.00
In order to take advantage of some R capabilities with time series data, I am creating a date variable that will allow us to assess migrant remains recovered by month and year. The code in the chunk below will do this. I will discuss the logic of this in class.
mdy$ym<-as.Date(with(mdy,paste(year,month,obs, sep="-")),"%Y-%m-%d")
mdy
## # A tibble: 299 × 6
## year month obs yearmonth remainsmonth ym
## <dbl> <dbl> <dbl> <dbl> <dbl> <date>
## 1 2000 1 1 200001 4 2000-01-01
## 2 2000 2 1 200002 7 2000-02-01
## 3 2000 3 1 200003 6 2000-03-01
## 4 2000 4 1 200004 5 2000-04-01
## 5 2000 5 1 200005 11 2000-05-01
## 6 2000 6 1 200006 15 2000-06-01
## 7 2000 7 1 200007 5 2000-07-01
## 8 2000 8 1 200008 7 2000-08-01
## 9 2000 9 1 200009 6 2000-09-01
## 10 2000 10 1 200010 4 2000-10-01
## # ℹ 289 more rows
View(mdy)
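The date construction above can be unpacked on a small example. The `toy` values are hypothetical; the steps mirror the chunk: `paste()` builds a year-month-day string from the three columns, and `as.Date()` parses it.

```r
# Hypothetical year and month columns; obs is always 1, as in mdy
toy <- data.frame(year = c(2000, 2000, 2024), month = c(1, 12, 7), obs = 1)

# paste() builds strings such as "2000-1-1"; as.Date() parses them
stamp <- with(toy, paste(year, month, obs, sep = "-"))
ym <- as.Date(stamp, format = "%Y-%m-%d")

print(stamp)
print(ym)

# Date objects are spaced by elapsed time, unlike a number such as 200012
print(diff(ym))
```

A numeric code like `yearmonth` jumps from 200012 to 200101 between December and January, so plotting it directly would space the months unevenly; a Date column avoids that.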
The code in the chunk below will produce a time series plot of the migrant death data by month and year. The plot is saved as an object I called “p.” Run this code to verify it works.
# Usual area chart
p <- mdy %>%
ggplot( aes(x=ym, y=remainsmonth)) +
geom_area(fill="#69b3a2", alpha=0.5) +
geom_line(color="#69b3a2") +
labs(title="Monthly Migrant remains recovered on the AZ/Mex border, 2000-2025",
subtitle="Data from the Arizona OpenGIS Project (https://humaneborders.info/app/map.asp)",
y="Remains recovered", x="Year and month") +
theme_bw() +
theme(title=element_text(size=11))
p
Next, run this code and see what it is you produce! For this part of the task, write an interpretation of this plot. What do we learn substantively about the migrant death crisis over time? This write-up is worth 100 points.
# Turn it interactive with ggplotly
# ggplotly() carries over the title set in labs(); it has no title or subtitle arguments of its own
p <- ggplotly(p)
p
This chart illustrates the variation in migrant deaths at different moments across 25 years. One potential contributing factor is the set of policies established by different administrations. By the early 2000s, Prevention Through Deterrence was in place. It was designed to decrease the number of crossings, but it instead pushed crossings onto clandestine routes, which put migrants in more danger. The chart is consistent with this hypothesis: migrant deaths increased significantly in the early 2000s. A similar pattern emerged during the pandemic (2020), when the border was closed down, making legal paths harder to use and forcing migrants onto deadlier ones, which parallels the spikes in remains recovered. Another influential element behind these data is climate, as climate change is affecting global temperatures at an alarming rate. As summers get hotter, the higher temperatures make the journey more dangerous with each passing year. The worsening of climate change also prolongs the extreme seasons, making migrants’ journeys significantly more dangerous. After 2015, the spikes remained at a relatively stable level, where the crisis ‘normalized’ such that there were no dramatic changes in the number of remains recovered. Overall, the graph illustrates how migrant deaths are subject to environmental, social, and political pressures.
What is this plot showing us? What is going on in the plot? This task is worth 50 points.
pal <- c("green", "blue", "red")
fig <- plot_ly(data = mdy, x = ~ym, y = ~remainsmonth, color = ~remainsmonth, colors = pal, type = 'scatter', mode = 'markers') %>%
layout(title = 'Monthly/Yearly Recovery of Migrant Remains Along AZ/MEX Border 2000-2025', plot_bgcolor = "white",
xaxis = list(range = list("2000-01-01", "2025-03-01")))
fig
From the graph, we are able to see a pattern in remains recovered across both months and years. Rather than a steady upward or downward trend, the number of remains recovered stays fairly constant, with occasional peaks. The consistency can be explained by looking at immigration policy and how border enforcement hasn’t changed much. As Prevention Through Deterrence took hold in the early 2000s, the number of remains recovered nearly doubled, which reflects how migrants were forced to take far more dangerous paths. Beforehand, remains recovered hovered around 20 per month; afterward they nearly doubled, approaching 40 per month. The subtle increase in remains recovered per month across the 25 years of data coincides with rising temperatures under climate change. As time passes, climate change is worsening and making the journey more dangerous for migrants of all ages. The effect is clearest in June and July, which account for the highest points in the scatter plot. Rarely do we see a point near 60 in a cooler month; those months stay between 0 and 40, yet those numbers are still significant. The years in which the counts peak are also significant. Peaks occurred around 2008, 2011, and 2020. In 2008, there was a global economic crash that significantly changed the lives of people in Central and South America, which could have prompted them to try to enter the United States despite the risks. Around 2011, cartel violence increased in Mexico, which could have resulted in more migrants taking dangerous paths to enter the United States. All of these factors may contribute to the increases in migrant deaths.
For this task, I want you to provide a substantive answer to the following questions (10.1 and 10.2 are worth 50 points each; 10.3 is worth 100 points):
What are the mean, standard deviation, median, and IQR for the variable called “remainsmonth”? This variable records the number of deaths by month by year. What do we learn from these statistics? Is there evidence of skewness in the data?
#Insert code to answer this here
summary(mdy$remainsmonth, na.rm = TRUE)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 7.00 12.00 13.39 16.00 69.00
mean(mdy$remainsmonth, na.rm = TRUE)
## [1] 13.39465
sd(mdy$remainsmonth, na.rm = TRUE)
## [1] 9.028187
IQR(mdy$remainsmonth, na.rm = TRUE)
## [1] 9
From the data, we are able to see that the distribution is skewed to the right because the mean is 13.39 remains recovered per month, or about 13, compared with a median of 12. The gap between the two values arises because fewer remains are recovered in most months of each year than in the two hottest months, June and July. The number of remains recovered increases substantially in June and July in nearly every year due to the hot temperatures that make migrants more susceptible to dehydration. As climate change gets worse, the toll on migrants increases too. The number of remains recovered during these months inflates the mean, resulting in a mean that is larger than the median. A standard deviation of about 9 indicates considerable variation in the data. This often results from the difference between cooler and warmer months. In cooler months, fewer remains are found, which can result from the lower temperatures making the journey less dangerous or from remains being harder to find, as water can accelerate decomposition. In warmer weather, migrants are more susceptible to the consequences of prolonged heat exposure, and their remains may be easier to find because there is no rain obstructing searches.
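The skewness question can also be approached through the quartiles printed above. As a rough sketch, the IQR is the gap between the third and first quartiles, and a conventional boxplot rule flags values more than 1.5 IQRs above the third quartile. The vector `x` is hypothetical and is used only to show that `IQR()` matches the quartile difference.

```r
# Quartiles of remainsmonth taken from the summary() output above
q1 <- 7
q3 <- 16
iqr <- q3 - q1

# Conventional boxplot fence: values above Q3 + 1.5 * IQR are flagged as outliers
upper_fence <- q3 + 1.5 * iqr
print(c(IQR = iqr, upper_fence = upper_fence))

# The reported maximum of 69 lies well past that fence
print(69 > upper_fence)

# On any vector, IQR() matches the difference between the quartiles
x <- c(1, 4, 7, 9, 12, 14, 16, 25, 40, 69)  # hypothetical monthly counts
q <- quantile(x, c(0.25, 0.75), names = FALSE)
print(isTRUE(all.equal(IQR(x), q[2] - q[1])))
```

A maximum that far above the fence, together with a mean above the median, is the kind of evidence that points to a long right tail.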
Compute the mean number of remains recovered by year. Based on these results, what do we learn? What years stand out to you?
#Insert code to answer this here
#Produces the table of mean monthly remains by year
avg_remains_by_year <- mdy %>%
group_by(year) %>%
summarize(avg_remains = mean(remainsmonth, na.rm = TRUE))
view(avg_remains_by_year)
#Produces a visual graph
ggplot(avg_remains_by_year, aes(x = year, y = avg_remains)) +
geom_bar(stat = "identity", fill = "lightskyblue4") +
scale_x_continuous(breaks = seq(2000, 2025, 1)) +
scale_y_continuous(breaks = seq(0, 30, 2)) +
labs(title = "Average Monthly Migrant Remains Recovered by Year Along the AZ/MEX Border, 2000–2025",
subtitle = "Data from the Arizona OpenGIS Project (https://humaneborders.info/app/map.asp)",
x = "Year",
y = "Average Remains per Month") +
theme_classic() +
theme(axis.text.x = element_text(size = 8, angle = 45, hjust = 1),
axis.line = element_line(color = "black"),
axis.ticks = element_line(color = "black"),
axis.ticks.length = unit(0.2, "cm"),
title = element_text(size = 10))
The mean number of remains recovered per month rose substantially after 2000, which plausibly reflects the lasting effects of Operation Hold the Line (El Paso, 1993) and Operation Gatekeeper (San Diego, 1994), implemented in tandem with the Prevention Through Deterrence strategy. These operations did not address why migrants are moving and instead forced them onto dangerous routes in their attempts to enter the United States. The fortification of other border areas funneled migrants through Arizona, and as a result the number of remains recovered skyrocketed. The years that stand out as alarming are 2010, 2020, and 2021. In 2010, following the 2008 economic crash, there was talk of immigration reform and migrant integration, which could have motivated more people to attempt the journey. In 2020 and 2021, recoveries reached another high, which I suspect was a result of the pandemic. The pandemic damaged countries' economies for years to come, which could have worsened living conditions and pushed people to look for more opportunities in the United States. As the pandemic eased and the protocols surrounding it were relaxed, more migrants may also have attempted to enter the United States, exposing them to dangerous conditions that cost them their lives.
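One way to decide which years "stand out" more systematically is to rank the yearly means and compare each one to the overall monthly mean. The sketch below uses base R and a small toy data frame with made-up numbers (not the real records), so only the approach carries over; the column names year and remainsmonth mirror those in mdy, and diff_from_overall is a name I introduced.

```r
# Toy data: made-up values for illustration only, NOT the real records
toy <- data.frame(
  year         = c(2001, 2001, 2002, 2002, 2003, 2003),
  remainsmonth = c(4, 6, 10, 20, 8, NA)
)

# Mean monthly remains by year; the formula interface drops NA rows by default
yearly <- aggregate(remainsmonth ~ year, data = toy, FUN = mean)
names(yearly)[2] <- "avg_remains"

# How far each year sits above or below the overall monthly mean
overall_mean <- mean(toy$remainsmonth, na.rm = TRUE)
yearly$diff_from_overall <- yearly$avg_remains - overall_mean

# Sort so the highest years come first
yearly <- yearly[order(-yearly$avg_remains), ]
yearly
```

Running the same steps on mdy instead of toy would list the years with the highest average monthly recoveries at the top, which makes it easier to justify singling out particular years.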
Create a boxplot of monthly remains recovered by year. Below is shell code to get you started. This will produce an unacceptable plot but it will get you on your way. Adorn this plot with titles, color (if you wish), and any other fancy things to make it more interesting. Following this, provide an interpretation of the plot. What do we learn from this plot about migrant remains recovered?
#Shell code
mdy$year<-as.factor(mdy$year)
bp<-ggplot(mdy, aes(remainsmonth, year)) +
geom_boxplot(fill = "lightskyblue4") +
scale_x_continuous(breaks=seq(0,70,5)) +
labs(title="Monthly Migrant Remains Recovered per Year, 2000-2025",
subtitle="Data from the Arizona OpenGIS Project (https://humaneborders.info/app/map.asp)",
y="Year",
x="Remains Recovered per Month") +
theme(axis.text.x = element_text(size=9))
bp
The plot reveals variation in the number of remains recovered each month, both within and across years. The median monthly count stays within a similar range from year to year, while the interquartile ranges and outliers differ more. These patterns are likely shaped by the political climate in the United States and the conditions of the routes migrants take. We must also consider conditions in migrants' home countries and how those circumstances often push them to brave the dangerous paths of the desert. The high outliers pull each year's mean above its median, because in most months fewer remains are found than in the two hottest months of the year, June and July. That is also why the whiskers (the thin lines extending from the boxes) are sometimes long: the number of remains found varies more across months in those years.
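To back up a reading of the boxplot with numbers, the median, IQR, and count of high outliers can be computed for each year. The sketch below is a base R illustration on toy data with made-up values (not the real records); box_stats is a hypothetical helper of my own, and it applies the 1.5 × IQR rule that geom_boxplot uses by default to set the whisker limits.

```r
# Toy data: made-up values for illustration only, NOT the real records
toy <- data.frame(
  year         = rep(c(2001, 2002), each = 6),
  remainsmonth = c(2, 3, 4, 5, 6, 30,  5, 6, 7, 8, 9, 10)
)

# Hypothetical helper: median, IQR, and number of points above the upper fence
box_stats <- function(x) {
  x <- x[!is.na(x)]
  q <- quantile(x, c(0.25, 0.5, 0.75))
  fence <- q[3] + 1.5 * (q[3] - q[1])
  c(median = unname(q[2]),
    iqr = unname(q[3] - q[1]),
    n_high_outliers = sum(x > fence))
}

# One row per year, one column per statistic
stats_by_year <- t(sapply(split(toy$remainsmonth, toy$year), box_stats))
stats_by_year
```

In the toy example, the first year has a single extreme month that counts as an outlier while the second has none, even though both years have the same IQR. Applied to mdy, the same table would show which years have unusually wide boxes or many extreme months.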