Project 1: Migrant deaths

Understanding the migrant death issue

The data on migrant deaths come from the Arizona OpenGIS Project, cosponsored by Humane Borders and the Pima County Office of the Medical Examiner. I’ve edited the data set to include all recorded migrant deaths from 2000 through March 2025. The objective of this project is to give you continued experience in working with and making sense of quantitative data while also giving you the opportunity to understand, at a deep level, the nature of the migrant death issue. Above all else, my goal is always to honor and humanize those who have died in the desert. These data permit a better understanding of the death crisis.

For purposes of this project, you will be asked to produce a number of high-quality visualizations as well as compute some basic univariate statistics. Your job is to take what I have provided, analyze it, and tell the story. I want to know something important, interesting, and useful about the migrant death crisis given the tasks assigned to you. Ultimately, when presented with information, you need practice in learning how to engage with it. While this assignment is tethered to the migrant death data set, all skills used here are transferable to any data set.

What I am looking for here is something other than mechanical interpretations. I want to see creativity. What do I mean by creativity? To be blunt, creativity is not repeating numbers or statistics you produce in a table. I don’t need anyone to do that, as I can see it with my own eyes. Creativity paints a human picture of the death crisis based on the observed data. Answers that are mechanical are usually nonanalytical. Creativity means bringing in information and context from outside the confines of the specific charts, plots, or questions. In the end, if you simply report the results with no attempt at interpretation (i.e., you just reproduce in words what I see in the tables or charts), I would not expect anything above a “C”-level grade. In class, we will go over in great detail tips on how to avoid these problems, so follow those tips and you’re going to be well on your way.

This assignment is worth 800 points and is due by 11:59 PM on April 29.

Getting started

The data file is a csv file saved to my GitHub site.

## Rows: 4033 Columns: 21
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (14): ML Number, Name, Sex, Reporting Date, Surface Management, Location...
## dbl  (7): Age, Corridor Code, Condition Code, Latitude, Longitude, UTM X, UTM Y
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

##   ML Number             Name               Sex                 Age       
##  Length:4033        Length:4033        Length:4033        Min.   : 0.00  
##  Class :character   Class :character   Class :character   1st Qu.:24.00  
##  Mode  :character   Mode  :character   Mode  :character   Median :30.00  
##                                                           Mean   :31.46  
##                                                           3rd Qu.:38.00  
##                                                           Max.   :99.00  
##                                                           NA's   :1398   
##  Reporting Date     Surface Management   Location         Location Precision
##  Length:4033        Length:4033        Length:4033        Length:4033       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  Corridor Code     Corridor         Cause of Death     OME Determined COD
##  Min.   : 0.00   Length:4033        Length:4033        Length:4033       
##  1st Qu.: 6.00   Class :character   Class :character   Class :character  
##  Median : 7.00   Mode  :character   Mode  :character   Mode  :character  
##  Mean   : 7.03                                                           
##  3rd Qu.: 8.00                                                           
##  Max.   :13.00                                                           
##                                                                          
##  Condition Code Body Condition     Post Mortem Interval    State          
##  Min.   :1.00   Length:4033        Length:4033          Length:4033       
##  1st Qu.:1.00   Class :character   Class :character     Class :character  
##  Median :4.00   Mode  :character   Mode  :character     Mode  :character  
##  Mean   :3.84                                                             
##  3rd Qu.:6.00                                                             
##  Max.   :8.00                                                             
##                                                                           
##     County             Latitude       Longitude          UTM X       
##  Length:4033        Min.   :31.33   Min.   :-114.8   Min.   :143294  
##  Class :character   1st Qu.:31.75   1st Qu.:-112.3   1st Qu.:381244  
##  Mode  :character   Median :31.96   Median :-111.8   Median :424426  
##                     Mean   :32.00   Mean   :-111.8   Mean   :422503  
##                     3rd Qu.:32.21   3rd Qu.:-111.3   3rd Qu.:468782  
##                     Max.   :33.86   Max.   :-109.1   Max.   :685151  
##                                                                      
##      UTM Y        
##  Min.   :3466477  
##  1st Qu.:3513324  
##  Median :3536711  
##  Mean   :3540649  
##  3rd Qu.:3565118  
##  Max.   :3751034  
##

Creating year variable from date function and inspecting migrant deaths by year

Because we may want to look at yearly data, it’s useful to generate a variable that records the calendar year in which migrant remains were found. The code in the chunk below does just this. In addition to creating the new variable, I use the R command tabyl to produce a table of migrant deaths by year. In all, there are about 4,000 recorded deaths in the time frame of the data set I’ve created. The table shows the number of remains recovered by year along with the proportion of total deaths each year accounts for. So in 2010, we see 222 remains were recovered. The proportion of the total number of deaths accounted for by this year is 0.055045872 (i.e. \(\frac{222}{4033}\)). Multiply this proportion by 100 and you get the percent contribution. For 2010, this is about 5.5% of all the recovered remains. One way to quickly assess the persistence of the death crisis is to inspect the proportions. If the crisis were abating, we’d expect to see a substantial decline in the proportion. If the crisis is persistent, we’d expect these proportions to be very similar across time. (Note that 2025 will be a very small number because we only have partial data for this year.) What do you see when you look at these proportions?
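The proportion-to-percent arithmetic can be checked directly in R. This is only a minimal sketch: the two counts are copied from the tabyl output for md$year20 (222 remains in 2010, 4,033 rows in total), and the object names are illustrative rather than part of the assignment code.

```r
# Counts copied from the tabyl() output for md$year20
n_2010  <- 222    # remains recovered in 2010
n_total <- 4033   # total rows in the data set

prop_2010 <- n_2010 / n_total   # proportion of all remains
pct_2010  <- 100 * prop_2010    # same quantity as a percent

round(prop_2010, 4)
round(pct_2010, 1)
```

The same two lines work for any year in the table: swap in that year's count and keep the total fixed.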

md$date <-  as.Date(md$`Reporting Date`,'%m/%d/%y')

md$year <- as.numeric(format(md$date,'%y'))

md$month <- as.numeric(format(md$date, '%m'))


#This ensures the year is reported as "2000, 2001, ... ,2024" instead of "0,1, ... , 24".
md$year20 <- 2000 + md$year



tabyl(md$year20)

##  md$year20   n     percent
##       2000  74 0.018348624
##       2001  77 0.019092487
##       2002 147 0.036449293
##       2003 155 0.038432928
##       2004 169 0.041904290
##       2005 196 0.048599058
##       2006 168 0.041656335
##       2007 214 0.053062237
##       2008 162 0.040168609
##       2009 190 0.047111332
##       2010 222 0.055045872
##       2011 178 0.044135879
##       2012 154 0.038184974
##       2013 165 0.040912472
##       2014 127 0.031490206
##       2015 135 0.033473841
##       2016 147 0.036449293
##       2017 118 0.029258616
##       2018 120 0.029754525
##       2019 136 0.033721795
##       2020 213 0.052814282
##       2021 215 0.053310191
##       2022 171 0.042400198
##       2023 197 0.048847012
##       2024 155 0.038432928
##       2025  28 0.006942723
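As an alternative sketch (not the code used above), base R can return the four-digit year directly with the "%Y" output format, which avoids adding 2000 by hand. The sample dates below are hypothetical and only mimic the month/day/two-digit-year layout of the Reporting Date column. When parsing "%y", R maps 00 to 68 onto 2000 to 2068, so every year in this data set lands in the 2000s.

```r
# Hypothetical dates in the same "mm/dd/yy" layout as `Reporting Date`
sample_dates <- c("01/15/00", "07/04/10", "03/02/25")

# Parse the two-digit-year strings into Date objects
parsed <- as.Date(sample_dates, format = "%m/%d/%y")

# "%Y" formats the full four-digit year; "%m" formats the month number
year4     <- as.integer(format(parsed, "%Y"))
month_num <- as.integer(format(parsed, "%m"))

year4
month_num
```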

Task 1: Visualizing migrant deaths

For this task, I want you to produce a barplot of migrant remains recovered by year. You can use the code below to generate the plot. Often (almost always), it’s easier to understand quantitative data from a visualization than from a table. The code in the chunk below will create a barplot where each bar corresponds to the number of migrant remains recovered in each year. When you look at this plot, what do you see? What interpretation would you give to this? Does the plot show the crisis abating? Does it seem persistent? Is it getting worse? Your interpretation should be well-written, clear, and non-mechanical. The plot and the interpretation are worth 100 points.

# Axis labels: one per year from 2000 through 2025
labelsforx <- c("2000", "2001", "2002", "2003", "2004",
                "2005", "2006", "2007", "2008", "2009", "2010",
                "2011", "2012", "2013", "2014", "2015",
                "2016", "2017", "2018", "2019", "2020",
                "2021", "2022", "2023", "2024", "2025")

# Bar height = number of rows (remains recovered) in each year
ggplot(md, aes(year20)) +
  geom_bar(fill = "lightskyblue4") +
  scale_x_continuous(breaks = seq(2000, 2025, 1), labels = labelsforx) +
  labs(title = "Migrant remains recovered by year along the Arizona/Mexico border, 2000-2025",
       subtitle = "Data from the Arizona OpenGIS Project (https://humaneborders.info/app/map.asp)",
       y = "Number recovered",
       x = "Year") +
  theme_classic() +
  theme(axis.text.x = element_text(size = 8, angle = 45, hjust = 1),
        axis.ticks = element_blank())

Task 1 answer should go here.

In our data, we find a large spike in migrant remains recovered along the border starting in 2002. Since then, the numbers have never come close to the lows seen before 2002, which could be due to multiple factors. One possible interpretation is that the numbers simply track rising immigration from Mexico to Arizona; however, published data suggest otherwise. From the Pew Research Center (https://www.pewresearch.org/chart/us-hispanics-mexican-origin-population), we see that between 2000 and 2010 there was only a 34.48% increase in the foreign-born population of Mexican origin in the United States, while between 2001 and 2002 alone the number of migrant remains recovered rose by roughly 91% (from 77 to 147). This comparison pushes us to consider other possible causes for the sudden increase in remains found along the border. Examining the rest of the graph, we find stretches of time with clear trends: remains found generally increase from 2000 to 2010 and decrease over most of the 2010s. These trends correlate to some degree with presidential terms. From 2009 to 2017, Obama was president, and we see a marked decline, which suggests that a portion of this trend can be attributed to the presidency and its policies. However, we see a large spike in 2020, which is likely due to COVID-19. Under Title 42, many migrants were prevented from entering even legally, pushing them to attempt illegal border crossings, which proved much more dangerous and could explain the spike in deaths.
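The year-over-year comparison in this answer rests on a percent-change calculation, which can be sketched as follows. The helper name pct_change is hypothetical; the two counts (77 remains in 2001, 147 in 2002) are taken from the tabyl output earlier in the document.

```r
# Hypothetical helper: percent change from an old value to a new value
pct_change <- function(old, new) {
  100 * (new - old) / old
}

# Remains recovered in 2001 and 2002, from the yearly table
change_01_02 <- pct_change(77, 147)
round(change_01_02, 1)
```

Putting both comparisons on the same percent-change scale makes it easier to set the one-year jump in remains against the ten-year population change cited from Pew.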

Task 2: Lethality by month

For this task, I want you to create a barplot of the number of migrant remains recovered by calendar month. The code in the previous chunk will be useful, but it will need to be modified. After you create the plot, provide a substantive interpretation of the results. Your score for this will be based on the quality of the plot (100 points) and the quality of the write-up (100 points).

#Type code in this chunk
# Axis labels: one per calendar month
labelsforxm <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
                 "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

# Bar height = number of rows (remains recovered) in each calendar month
ggplot(md, aes(month)) +
  geom_bar(fill = "lightskyblue4") +
  scale_x_continuous(breaks = seq(1, 12, 1), labels = labelsforxm) +
  labs(title = "Migrant remains recovered by month along the Arizona/Mexico border, 2000-2025",
       subtitle = "Data from the Arizona OpenGIS Project (https://humaneborders.info/app/map.asp)",
       y = "Number recovered",
       x = "Calendar Month") +
  theme_classic() +
  theme(axis.text.x = element_text(size = 8, angle = 45, hjust = 1),
        axis.ticks = element_blank())
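A small aside on the month labels: base R ships a built-in constant, month.abb, holding the same twelve abbreviations, so the label vector does not have to be typed by hand. The sketch below uses a made-up vector of month numbers (toy_months is hypothetical) to show how counts per calendar month could be tabulated with labeled levels.

```r
# Built-in constant with "Jan", "Feb", ..., "Dec"
labels_from_base <- month.abb

# Hypothetical month numbers standing in for md$month
toy_months <- c(6, 7, 7, 1, 12, 7)

# Fixing the levels at 1:12 keeps months with zero cases in the table
month_counts <- table(factor(toy_months, levels = 1:12, labels = month.abb))
july_count <- as.integer(month_counts["Jul"])

month_counts
july_count
```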

Task 2 answer should go here.

In our graph above we find a rather interesting trend: migrant remains recovered peak around the summer months. Looking at the average temperature in southern Arizona during the summer, we find that it ranges from 74 degrees Fahrenheit, which would be bearable, to a whopping 105, with a record high of 119. These conditions could have led to far more migrant deaths during the crossing into the United States, as the journey becomes dangerous due to the heat. We also see a relatively level number of deaths in the other months, with small increases in the months leading up to summer. This is likely because no matter how cold it gets, it is always possible to dress for the cold and bundle up, whereas we as humans can only shed so many layers. The spike in the summer months could also stem from migrants' desire to support their families during the harsher winter months. Migrating in the early summer months of June and July not only allows migrants to stay home during the winter, helping support their families, but, if the crossing is successful, also allows them to secure work sooner, meaning they can still support their families during the harsher winter months, albeit from afar.

Task 3: Gender and migrant deaths

How do migrant deaths and gender relate to one another? The code in the chunk below creates a “factor-level” variable recording the gender of the migrant. Since gender is not always determined, there is a category called “undetermined.” Using the tabyl function, I create a table showing the total number of remains recovered that are male, female, and undetermined. For this task, reproduce the code below and provide a brief interpretation of the results. This task is worth 50 points.

md$gender<-factor(md$Sex,
                  levels=c("male", "female", "undetermined"),
                  labels=c("male", "female", "undetermined"))
tabyl(md$gender)

##     md$gender    n   percent
##          male 3292 0.8162658
##        female  605 0.1500124
##  undetermined  136 0.0337218
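One detail worth checking with factor() is that any value not listed in levels silently becomes NA. In the table above the three counts sum to 4,033, the number of rows, so nothing appears to have been dropped here. The sketch below uses a made-up vector (sex_raw is hypothetical) to show what a mismatch would look like.

```r
# Hypothetical raw values; "Male" (capitalized) is not among the levels
sex_raw <- c("male", "female", "Male", "undetermined", "male")

gender <- factor(sex_raw, levels = c("male", "female", "undetermined"))

# useNA = "ifany" makes any unmatched values visible in the table
table(gender, useNA = "ifany")
n_unmatched <- sum(is.na(gender))
n_unmatched
```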

Task 3 answer should go here.

What we can gain from interpreting these data is insight into the gender roles and gender biases found in the region, and how they may have influenced the migration of people from Mexico to the United States through Arizona. As a general trend in the data, a large percentage of the remains found are male. For one, this could be because of traditional gender roles, under which the man would be expected to take the risk of traveling to the border and making it across. On top of this, it would also be seen as more traditional for the man to work in the U.S. and support his family from afar rather than the other way around. Another reason could be the wage gap, or even hiring practices, in the U.S. Men, on average, are paid more than women in the U.S. and are often prioritized in hiring, which creates more incentive and a larger payoff for men making the journey than for women.

Based purely on the three statistics we see here, a key takeaway should be the brutality and lack of respect American officials show toward these migrants. About 3% of the people in this table, one hundred and thirty-six people, are marked as undetermined. This demonstrates a blatant disregard for human life and its inherent value: when these bodies are found, rather than attempting to discover more about them or return them to their families, they are treated so inhumanely that they are just marked down as a number, without even a gender assigned to them. This third statistic is, from this perspective, born purely out of a lack of respect for these migrants and a laziness in the system that recovers them.

Task 4: Gender, death and time

On GitHub, there is a .csv file called mdg.csv. You will need to access this file. The code below will do this.

rm(list=ls(all=FALSE)) #Keep this line in 
rm(list=ls(all=TRUE)) #Keep this line in 

mdg="https://raw.githubusercontent.com/mightyjoemoon/POL51/main/mdg.csv"

mdg<-read_csv(url(mdg))

## New names:
## Rows: 25 Columns: 5
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," dbl
## (4): year, male, female, undetermined lgl (1): ...5
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...5`

summary(mdg)

##       year           male            female        undetermined   
##  Min.   :2000   Min.   :0.6400   Min.   :0.0300   Min.   :0.0000  
##  1st Qu.:2006   1st Qu.:0.7700   1st Qu.:0.1000   1st Qu.:0.0000  
##  Median :2012   Median :0.8100   Median :0.1400   Median :0.0000  
##  Mean   :2012   Mean   :0.8244   Mean   :0.1496   Mean   :0.0272  
##  3rd Qu.:2018   3rd Qu.:0.8800   3rd Qu.:0.2000   3rd Qu.:0.0300  
##  Max.   :2024   Max.   :0.9700   Max.   :0.2400   Max.   :0.1900  
##    ...5        
##  Mode:logical  
##  NA's:25       
##                
##                
##                
##

This data set has 4 variables: year, male, female, and undetermined. The variables “male,” “female”, and “undetermined” give the proportion of remains recovered for each group by year. So if in a given year, 0.76 of the remains recovered were male, this implies 76 percent were male. To tidy up the plot I am going to ask you to create three new objects called “Male,” “Female,” and “Undetermined” that convert the proportions into percentages.

It may be useful to visualize these data by way of a line plot. Below is the start of some code to create a time-series plot. Provide the relevant code to complete the plot. Your plot, if properly done, will have three lines corresponding to the three groups (Male, Female, Undetermined) and have informative main, \(y\), and \(x\) titling. After creating the plot, provide a substantive interpretation of the results. What do we see, and what do we learn?

R tasks of reading in the data, creating the new objects, and creating the plot will be worth 100 points. The analysis will be worth 100 points.

## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
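The plotting code for this task is not shown in the output, so the following is only one possible sketch of the steps described above, not the original chunk. It uses a made-up stand-in for mdg (mdg_toy and its values are hypothetical), converts the proportions to percentages, and draws the three lines with linewidth rather than size, which is the replacement the deprecation warning recommends. The ggplot2 call is guarded so the conversion step still runs where ggplot2 is not installed.

```r
# Made-up stand-in for mdg; the values are illustrative only
mdg_toy <- data.frame(
  year         = 2000:2003,
  male         = c(0.76, 0.80, 0.82, 0.79),
  female       = c(0.20, 0.17, 0.15, 0.18),
  undetermined = c(0.04, 0.03, 0.03, 0.03)
)

# Convert proportions to percentages (stored here as new columns)
mdg_toy$Male         <- 100 * mdg_toy$male
mdg_toy$Female       <- 100 * mdg_toy$female
mdg_toy$Undetermined <- 100 * mdg_toy$undetermined

# Draw one line per group; `linewidth` replaces the deprecated `size`
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  plt <- ggplot(mdg_toy, aes(x = year)) +
    geom_line(aes(y = Male, color = "Male"), linewidth = 1) +
    geom_line(aes(y = Female, color = "Female"), linewidth = 1) +
    geom_line(aes(y = Undetermined, color = "Undetermined"), linewidth = 1) +
    labs(title = "Share of migrant remains recovered by sex and year",
         x = "Year", y = "Percent of remains recovered", color = "Group") +
    theme_classic()
  print(plt)
}
```

Mapping color inside aes() to a fixed label is one simple way to get a legend without reshaping the data; reshaping to long format first would be an equally reasonable choice.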

Task 4 answer should go here.

There seems to be a spike in male relative to female migrant remains found between 2015 and 2020. This is likely attributable to President Trump's term starting in 2017, when border policy became much stricter and crossings were cracked down upon. The more dangerous conditions for migrating led more men, rather than women, to make the journey and subject themselves to its dangers. In addition, since the year 2000 the male share of deaths has been steadily increasing, while the female share has been steadily decreasing. This could be attributed not only to a disproportionate ratio of men to women making the trip to the border, but also to the possibility that, prior to around 2017, U.S. border policy was more generous to women seeking asylum in the United States, especially if they traveled with their children. However, starting in 2017, when President Trump first took office, border policy became much harsher and stricter, leading more women to make the more perilous journey of crossing illegally, which was more likely to end in death.

Task 5: Visualizing deaths by month and year

I have created a data set called “monthlydeathsbyyear.csv,” which can be found on my GitHub site. The code in the chunk below will access the file.

rm(list=ls(all=TRUE)) #Keep this line in 

mdy="https://raw.githubusercontent.com/mightyjoemoon/POL51/main/monthlydeathsbyyear.csv"

mdy<-read_csv(url(mdy))

## Rows: 299 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (5): year, month, obs, yearmonth, remainsmonth
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

summary(mdy)

##       year          month             obs      yearmonth       remainsmonth  
##  Min.   :2000   Min.   : 1.000   Min.   :1   Min.   :200001   Min.   : 1.00  
##  1st Qu.:2006   1st Qu.: 4.000   1st Qu.:1   1st Qu.:200605   1st Qu.: 7.00  
##  Median :2012   Median : 7.000   Median :1   Median :201207   Median :12.00  
##  Mean   :2012   Mean   : 6.512   Mean   :1   Mean   :201210   Mean   :13.39  
##  3rd Qu.:2018   3rd Qu.: 9.500   3rd Qu.:1   3rd Qu.:201810   3rd Qu.:16.00  
##  Max.   :2024   Max.   :12.000   Max.   :1   Max.   :202412   Max.   :69.00

In order to take advantage of some R capabilities with time series data, I am creating a date variable that will allow us to assess migrant remains recovered by month and year. The code in the chunk below will do this. I will discuss the logic of this in class.

mdy$ym<-as.Date(with(mdy,paste(year,month,obs, sep="-")),"%Y-%m-%d")
mdy

## # A tibble: 299 × 6
##     year month   obs yearmonth remainsmonth ym        
##    <dbl> <dbl> <dbl>     <dbl>        <dbl> <date>    
##  1  2000     1     1    200001            4 2000-01-01
##  2  2000     2     1    200002            7 2000-02-01
##  3  2000     3     1    200003            6 2000-03-01
##  4  2000     4     1    200004            5 2000-04-01
##  5  2000     5     1    200005           11 2000-05-01
##  6  2000     6     1    200006           15 2000-06-01
##  7  2000     7     1    200007            5 2000-07-01
##  8  2000     8     1    200008            7 2000-08-01
##  9  2000     9     1    200009            6 2000-09-01
## 10  2000    10     1    200010            4 2000-10-01
## # ℹ 289 more rows

View(mdy)
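The logic of the date construction can be seen on a small example. paste() glues year, month, and obs (always 1, standing in for the first day of the month) into strings such as "2000-1-1", and as.Date() then parses each string into a real Date. The toy data frame below is hypothetical and only mirrors the three columns used above.

```r
# Hypothetical rows with the same three columns used to build mdy$ym
toy <- data.frame(year = c(2000, 2000, 2024),
                  month = c(1, 12, 7),
                  obs = 1)

# Step 1: build "year-month-day" strings
date_strings <- with(toy, paste(year, month, obs, sep = "-"))
date_strings

# Step 2: parse the strings into Date objects
ym <- as.Date(date_strings, format = "%Y-%m-%d")
format(ym)
```

Because ym is a Date rather than a plain number, plotting functions can space the months correctly along a time axis.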

The code in the chunk below will produce a time series plot of the migrant death data by month and year. The plot is saved as an object I called “p.” Run this code to verify it works.

# Usual area chart
p <- mdy %>%
  ggplot( aes(x=ym, y=remainsmonth)) +
  geom_area(fill="#69b3a2", alpha=0.5) +
  geom_line(color="#69b3a2") +
  labs(title="Migrant remains recovered on the Arizona/Mexico border, 2000-2025", 
subtitle="Data from the Arizona OpenGIS Project (https://humaneborders.info/app/map.asp)",
  y="Remains recovered", x="Year and month") +
  theme_bw()
p

Next, run this code and see what it is you produce! For this part of the task, write an interpretation of this plot. What do we learn substantively about the migrant death crisis over time? This write-up is worth 100 points.

# Turn it interactive with ggplotly
p <- ggplotly(p)
p

Task 5 write-up should go here.

This graph further illustrates what we uncovered earlier, showing approximately one large spike per year in migrant remains found on the border, which falls in the summer months. However, it also gives a better picture of how bad the crisis is today and is becoming. Although we still have peaks in the data every year, the peaks are on average lower than in the 2000-2010 range, while the dips are higher. This can lead us to believe that although the trip is becoming more manageable, or fewer people may be migrating in the summer months, more people are attempting to migrate year-round. We are also seeing what can be described as crisis normalization. Although the yearly spikes have come down over time, the average number of remains found has gone up, to the point where a low month after 2020 can be comparable to a peak around 2000. Assuming travel conditions have gotten better over time, other reasons for this rise in remains could include stricter border policy, or even more brutality from border enforcement.

Task 6: See if you can do this!

What is this plot showing us? What is going on in the plot? This task is worth 50 points.

pal <- c("green", "blue", "red")

# Scatterplot of monthly counts over time; marker color scales with the count
fig <- plot_ly(data = mdy, x = ~ym, y = ~remainsmonth,
               color = ~remainsmonth, colors = pal,
               type = 'scatter', mode = 'markers') %>%
  layout(title = 'Scatterplot of migrant remains recovered by month and year',
         plot_bgcolor = "white",
         xaxis = list(range = list("2000-01-01", "2025-03-01")))

fig
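When reading a plot like this for unusual months, one common convention (an assumption here, not something the assignment prescribes) is the 1.5 × IQR rule: a point counts as a high outlier if it lies above the third quartile plus 1.5 times the IQR. The quartiles below are taken from the summary(mdy) output for remainsmonth (1st Qu. 7, 3rd Qu. 16); the toy_counts vector is made up for illustration.

```r
# Quartiles of remainsmonth, copied from the summary(mdy) output
q1 <- 7
q3 <- 16
iqr <- q3 - q1

# Fences under the 1.5 * IQR convention
upper_fence <- q3 + 1.5 * iqr
lower_fence <- q1 - 1.5 * iqr

# Hypothetical monthly counts; flag those above the upper fence
toy_counts <- c(4, 12, 16, 30, 69)
flagged <- toy_counts[toy_counts > upper_fence]

upper_fence
flagged
```

Since counts cannot be negative and the lower fence falls below zero, only the upper fence matters in practice for these data.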

This graph not only helps us precisely identify the migrant remains recovered per month per year, but also gives a clear-cut view of outliers. For example, we see outliers in July of 2005, 2010, and 2023, as well as one in June of 2021. These outliers lead us to look for other explanations of why these migrants are dying along or near the border, since even though July is a summer month, its temperatures and environment, on average, do not differ greatly from the other summer months. To pick one month to dissect, we can turn to July 2005, the largest outlier in this graph and one that is far removed from the rest of the points. Possible reasons for this include a temperature spike, with record temperatures recorded in the area around July 2005, as well as a crackdown on border operations by the United States in that period, such as Operation Streamline, which began in 2005. We also see a dramatic drop in migrant deaths in the following month, August 2005, with fewer than 20 remains found. This was likely due to alarm, or a chain of information, about the dire conditions of the month prior, encouraging migrants to lie low for a while.

Task 7: Univariate statistics

For this task, I want you to provide a substantive answer to the following questions (7.1 and 7.2 are worth 50 points each; 7.3 is worth 100 points):

Task 7.1: Statistics for all months/years

7.1. What are the mean, standard deviation, median, and IQR for the variable called “remainsmonth”? This variable records the number of deaths by month by year. What do we learn from these statistics? Is there evidence of skewness in the data?

#Insert code to answer this here
# One-row summary of the monthly counts across all months and years
mdy %>%
  summarize(
    mean_value = mean(remainsmonth),
    median_value = median(remainsmonth),
    sd_value = sd(remainsmonth),
    iqr_value = IQR(remainsmonth),
    count = n()
  )

## # A tibble: 1 × 5
##   mean_value median_value sd_value iqr_value count
##        <dbl>        <dbl>    <dbl>     <dbl> <int>
## 1       13.4           12     9.03         9   299

Task 7.1 answer should go here.

From these statistics we get a better grasp of the average number of migrant remains found per month. The mean tells us that around 13.4 migrant remains are found per month. This allows us to grasp the gravity of the situation, and especially of not addressing it. Each month a solution is not found or even worked toward, we, as human beings and American citizens, compromise our core human values by allowing another 13 people to pass away due to this crisis. In addition, the data give us more insight into how deaths are distributed throughout the year. The mean is greater than the median, meaning the distribution is skewed to the right, which tells us there must be some unusually high months, or outliers. We can use this to argue that there is significant danger and urgency to solve this problem before more outliers appear, especially considering that the standard deviation is close to 9 migrants a month: each month wasted is a gamble that 9 or more additional people could die.
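The mean-versus-median comparison can be turned into a single number. One option, used here as an illustration rather than as the method the assignment requires, is Pearson's second skewness coefficient, \(3(\text{mean} - \text{median})/\text{sd}\), where positive values point to right skew. The inputs are the rounded values reported for remainsmonth above (mean 13.39 from summary(mdy), median 12, standard deviation 9.03).

```r
# Rounded summary statistics for remainsmonth, copied from the output above
mean_val   <- 13.39
median_val <- 12
sd_val     <- 9.03

# Pearson's second (median) skewness coefficient
skew2 <- 3 * (mean_val - median_val) / sd_val
round(skew2, 2)
```

A positive value is consistent with a long right tail, which also fits a maximum of 69 sitting far above a third quartile of 16.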

Task 7.2: Univariate statistics over time

7.2. Compute the mean number of remains recovered by year. Based on these results, what do we learn? What years stand out to you?

#Insert code to answer this here
mdy %>%
  group_by(year) %>%
  summarize(
    mean_value = mean(remainsmonth)
  )

## # A tibble: 25 × 2
##     year mean_value
##    <dbl>      <dbl>
##  1  2000       6.17
##  2  2001       7   
##  3  2002      12.2 
##  4  2003      12.9 
##  5  2004      14.1 
##  6  2005      16.3 
##  7  2006      14   
##  8  2007      17.8 
##  9  2008      13.5 
## 10  2009      15.8 
## # ℹ 15 more rows
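These yearly means can be cross-checked against the yearly totals from the first table, since each mean is the year's total divided by the number of months recorded for that year. The totals below are copied from the tabyl output. The months_recorded vector is an assumption: mdy has 299 rows rather than 300, and 77 / 11 = 7 matches the 2001 mean, which suggests (but does not prove) that 2001 has 11 monthly rows.

```r
# Yearly totals copied from the tabyl() output for selected years
totals <- c(`2000` = 74, `2001` = 77, `2002` = 147, `2007` = 214)

# ASSUMPTION: 12 monthly rows per year, except 11 for 2001
months_recorded <- c(12, 11, 12, 12)

yearly_means <- round(totals / months_recorded, 2)
yearly_means
```

The results line up with the printed means once the tibble's three-significant-digit display is taken into account.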

Task 7.2 answer should go here.

Based on these results, we learn about trends across years and gain insight into possible reasons for spikes or dips in migrant deaths. We can use these data to identify strong downward trends and then look into the policy or world events of that time, to determine whether the cause is something we can control and, ideally, institute to lower the migrant death toll. One year that stands out to me is 2018, which never had any big outliers on our graphs yet seems to have the highest average death toll per month compared to the other years. It is important for us to look into this in the pursuit of a better migratory process. What also stands out to me is 2024, the most recent full year in our data. It shows a significant drop compared to the years before, and as our country progresses, our records of actions, policy, and enforcement only improve. A recent change stands out and is incredibly important because it is easy to track and will be even easier to institute as a permanent change, since the infrastructure is already there.

Task 7.3: Time-series box-plot

7.3. Create a boxplot of monthly remains recovered by year. Below is shell code to get you started. This will produce an unacceptable plot but it will get you on your way. Adorn this plot with titles, color (if you wish), and any other fancy things to make it more interesting. Following this, provide an interpretation of the plot. What do we learn from this plot about migrant remains recovered?

#Shell code

mdy$year<-as.factor(mdy$year)

bp<-ggplot(mdy, aes(remainsmonth, year)) + 
  geom_boxplot()+
  ggtitle("Distribution of Migrant Remains Recovered by Year, 2000-2025")

bp

Task 7.3 answer should go here.

This graph allows us to see the difference between policy affecting migrant deaths and natural causes that are often out of our hands. For example, when the box for a year is smaller, that year has a lower IQR, meaning the middle half of its monthly counts is tightly clustered. This means that a similar number of migrants died each month, which can likely be attributed to the policy and border enforcement routine during the year. However, when we see a larger box, there were many more variables at play, likely natural causes such as a heat wave or bad weather conditions, or, in 2020, COVID-19 and its impacts, which could produce outliers that skew the data. We can also see that as more generous policies aimed at abating this crisis were introduced during the period from 2010 to 2020, the spread across months also went down.
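The link between box size and spread can be made concrete by computing the IQR for each year, since the box in a boxplot spans the first to third quartile. The sketch below uses made-up monthly counts (the toy data frame and its values are hypothetical), and the adorned plot is just one way the shell code might be dressed up. The ggplot2 call is guarded so the IQR step runs even without the package.

```r
# Made-up monthly counts for two years: one spread out, one tightly clustered
toy <- data.frame(
  year = factor(rep(c(2005, 2018), each = 6)),
  remainsmonth = c(8, 10, 14, 20, 69, 12,
                   9, 10, 10, 11, 12, 10)
)

# IQR per year: a larger IQR corresponds to a longer box
spread <- aggregate(remainsmonth ~ year, data = toy, FUN = IQR)
spread

# One possible adorned version of the shell boxplot
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  bp_toy <- ggplot(toy, aes(x = remainsmonth, y = year)) +
    geom_boxplot(fill = "lightskyblue3") +
    labs(title = "Distribution of monthly remains recovered by year (toy data)",
         x = "Remains recovered per month", y = "Year") +
    theme_classic()
  print(bp_toy)
}
```

In the toy data the single extreme month in the first year shows up as an outlier point, while the length of the box itself reflects only the middle half of the months.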