What may influence the numbers in deportation statistics?
I will attempt to uncover any hidden factors that affect those numbers.
I have three datasets to work with. Two of them are ICE detention facility lists, one for 2019 and one for 2024; the third is a border crossing entry dataset.
The detainee reports indicate that most of these individuals were arrested at the border, while the remainder were apprehended in ICE's area of responsibility (AOR). The lists cover both male and female detainees, with none classified as minors or children, and they name the facilities in use in 2019 and 2024. A reporter from the New York Times claimed that Biden deported more immigrants than Trump did during his presidency.
We will examine these numbers and the differences in size, and correlate them with the border crossing entry dataset. That dataset records the traffic type, the port, and the vehicles coming in across the Mexican and Canadian borders.
It seems appropriate to state that those identified as criminals include immigrants who were arrested with prior convictions. Some individuals in the criminal category may have been arrested for repeatedly crossing the border, which by itself results in a criminal record.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(scales)
Attaching package: 'scales'
The following object is masked from 'package:purrr':
discard
The following object is masked from 'package:readr':
col_factor
New names:
• `` -> `...29`
Rows: 109 Columns: 29
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (13): Name, Address, City, State, AOR, Type Detailed, Male/Female, Guara...
dbl (15): Zip, FY24 ALOS, Level A, Level B, Level C, Level D, Male Crim, Mal...
lgl  (1): ...29
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Facilities_2019 <- read_csv('2019_Facilities.csv')
New names:
• `` -> `...32`
Rows: 213 Columns: 32
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (15): Name, Address, City, State, AOR, Type Detailed, Male/Female, Last ...
dbl (11): Zip, FY19 ALOS, Level B, Level C, Level D, Male Crim, Female Crim,...
num  (5): Level A, Male Non-Crim, No ICE Threat Level, Mandatory, Guaranteed...
lgl  (1): ...32
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 52767 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (6): Port Name, State, Border, Date, Measure, Point
dbl (4): Port Code, Value, Latitude, Longitude
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Looking at the Measure column gives an idea of what the crossing data counts. People crossing on foot are recorded as pedestrians, while vehicles and the passengers in those vehicles are counted separately. I will focus on the counts of pedestrians and passengers only, which leaves four measure types with confirmed counts of individuals. Following the opening question, these totals should track the rise or fall of detainees for that year by state.
`summarise()` has grouped output by 'state'. You can override using the
`.groups` argument.
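The filtering and aggregation described above can be sketched roughly as follows. This is a minimal base R illustration on made-up rows, not the code used for this report: the lowercase column names, the four measure labels, and every value are assumptions chosen only to show the shape of the step.

```r
# Toy stand-in for the border crossing data. Column names, measure labels,
# and values are assumptions for illustration, not taken from the real file.
border_data <- data.frame(
  state   = c("Texas", "Texas", "Texas", "Montana", "Montana", "Texas"),
  date    = c(2019, 2019, 2024, 2019, 2024, 2024),
  measure = c("Pedestrians", "Trucks", "Bus Passengers",
              "Personal Vehicle Passengers", "Train Passengers", "Pedestrians"),
  value   = c(100, 999, 40, 10, 5, 60)
)

# Measures that count people rather than vehicles (assumed labels).
people_measures <- c("Pedestrians", "Personal Vehicle Passengers",
                     "Bus Passengers", "Train Passengers")

# Keep only the rows whose measure counts individuals.
people_only <- border_data[border_data$measure %in% people_measures, ]

# Sum the counts per state and year.
border_data_total <- aggregate(value ~ state + date, data = people_only, FUN = sum)
names(border_data_total)[names(border_data_total) == "value"] <- "total_people_income"

print(border_data_total)
```

The "Trucks" row is dropped before summing, so vehicle counts never inflate the number of people.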
Plotting a horizontal bar graph lets us compare the total number of people crossing at the borders in each state for the two years.
ggplot(data = border_data_total,
       aes(x = state, y = total_people_income, fill = factor(date))) +
  geom_col(position = position_dodge(width = 0.7)) +
  coord_flip() +
  labs(
    title = "Comparison of Border Crossings by State (2019 vs 2024)",
    x = "State",
    y = "Total People",
    fill = "Year"
  ) +
  theme_minimal() +
  scale_y_continuous(labels = label_number(scale_cut = cut_short_scale()))
Texas, California, and Arizona show the highest numbers of people documented entering in those years.
To prepare the data for the regression model, I filter each detainee dataset down to its 2019 or 2024 entries and then bind the two detainee datasets together.
To merge the border crossing values with the facilities dataset, I convert the state names to abbreviations that match the facilities data frame. This results in a single, fully merged data frame.
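One way to bind two yearly tables whose columns do not fully match (here 32 columns for 2019 and 29 for 2024) is to tag each with its year and keep only the columns they share. The sketch below uses base R on hypothetical miniature tables; the `Detainees` column and all values are invented for illustration and are not the report's actual fields.

```r
# Hypothetical miniature versions of the two facility tables.
facilities_2019 <- data.frame(
  Name = c("A", "B"), State = c("TX", "CA"),
  `FY19 ALOS` = c(30, 45), Detainees = c(500, 200),
  check.names = FALSE
)
facilities_2024 <- data.frame(
  Name = c("A", "C"), State = c("TX", "AZ"),
  `FY24 ALOS` = c(40, 20), Detainees = c(650, 300),
  check.names = FALSE
)

# Tag each table with its year so the rows stay distinguishable after binding.
facilities_2019$Year <- 2019
facilities_2024$Year <- 2024

# Year-specific columns (FY19 ALOS, FY24 ALOS) are dropped; shared ones are kept.
shared_cols <- intersect(names(facilities_2019), names(facilities_2024))
facilities_all <- rbind(facilities_2019[, shared_cols],
                        facilities_2024[, shared_cols])

print(facilities_all)
```

With the tidyverse already loaded, `dplyr::bind_rows()` would be an alternative that keeps non-shared columns and fills them with `NA`.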
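The name-to-abbreviation step can be sketched with R's built-in `state.name` and `state.abb` vectors. The two small tables and their numbers below are made up for illustration, and the column names are assumptions rather than the report's actual ones.

```r
# Made-up border totals keyed by full state name.
border_totals <- data.frame(
  state = c("Texas", "Arizona", "Montana"),
  total_people_income = c(5000000, 1200000, 30000)
)

# Made-up detainee totals keyed by two-letter abbreviation.
detainee_totals <- data.frame(
  State = c("TX", "AZ", "MT"),
  total_detainees = c(14000, 3000, 2)
)

# state.name and state.abb are parallel built-in vectors, so match() gives
# the position of each full name and state.abb returns its abbreviation.
border_totals$State <- state.abb[match(border_totals$state, state.name)]

# Inner join on the shared abbreviation column.
merged <- merge(detainee_totals,
                border_totals[, c("State", "total_people_income")],
                by = "State")

print(merged)
```

Names not found in `state.name` (territories or misspellings, for example) come back as `NA`, so it is worth checking `anyNA(border_totals$State)` before merging.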
Call:
lm(formula = total_detainees ~ total_people_income, data = df_2024)
Residuals:
Min 1Q Median 3Q Max
-4028.2 -367.7 25.2 298.6 3335.0
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.890e+02 7.822e+02 -0.369 0.72141
total_people_income 1.013e-04 2.163e-05 4.683 0.00158 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1938 on 8 degrees of freedom
Multiple R-squared: 0.7327, Adjusted R-squared: 0.6993
F-statistic: 21.93 on 1 and 8 DF, p-value: 0.001576
hist(residuals(model_2024))
For 2019: The multiple R-squared is 0.7308, indicating that approximately 73% of the variation in detainee counts is explained by the incoming traffic variable.
For 2024: The multiple R-squared is 0.7327, which similarly indicates around 73%.
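To make the "share of variation explained" reading concrete, the sketch below fits a one-predictor model on invented toy data and checks that the multiple R-squared reported by `summary()` equals 1 minus the ratio of the residual sum of squares to the total sum of squares. The data are not from this report; only the variable names mirror the model formula shown above.

```r
# Invented toy data; only the variable names mirror the report's model.
df <- data.frame(
  total_people_income = c(1, 2, 3, 4, 5, 6),
  total_detainees     = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2)
)

model <- lm(total_detainees ~ total_people_income, data = df)

# R-squared as reported by summary().
r2_summary <- summary(model)$r.squared

# The same quantity by hand: 1 - SS_residual / SS_total.
ss_res <- sum(residuals(model)^2)
ss_tot <- sum((df$total_detainees - mean(df$total_detainees))^2)
r2_manual <- 1 - ss_res / ss_tot

print(c(summary = r2_summary, manual = r2_manual))
```

With only about ten states per model (8 residual degrees of freedom in the 2024 output), a single large state such as Texas can move this value considerably.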
Here I tried to filter as carefully as I could without losing valuable data, because the scale differs so much between states: Texas has detainee counts of over 14,000, while Montana has only about 2.
A simple scatter plot shows the relationship and any possible differences between the two years.
ggplot(df_filtered_bob, aes(x = total_people_income, y = total_detainees)) +
  geom_point() +
  geom_smooth(method = "lm", color = "red", se = TRUE) +
  facet_wrap(~ Year) +
  labs(
    title = "Relationship Between Incoming Traffic and Detainee Counts",
    x = "Total People (Incoming Traffic)",
    y = "Total Detainees"
  ) +
  theme_minimal() +
  scale_x_continuous(labels = comma)
`geom_smooth()` using formula = 'y ~ x'
My linear models seem to indicate a relationship between incoming border traffic and the number of detainees, as stated on the ICE website. The southern border has more incoming traffic, and the southern states hold higher numbers of detainees. For this linear model, I did not filter out the outliers. Even though the numbers from 2019 appear to be greater than those of 2024, who holds office seems to make little difference. As long as people keep arriving from the neighboring countries (Canada and Mexico), the number of immigrants in these detention centers will increase. What is the alternative here? When a president's numbers show that more illegal immigrants were deported during their term, they do not mention that the consequence for those who are detained would be longer incarceration.
For those in the non-criminal category, as I stated earlier, if they are deported and return, they will be arrested again and booked with a prior criminal record.
I spent considerable time figuring out how to plot my final visualizations. My original plan was to display the incoming traffic numbers and detainee counts per state on a hexagonal tile map, but I did not have enough time to complete the code. Most of the time went into cleaning and organizing my datasets.
There are many other variables and methods I did not consider, and they could have produced different results, so the linear regression models can be improved. Both linear models show a p-value well under 0.05 for the rise in detainees with incoming border traffic. Because I aggregated the values so heavily, especially the incoming crossing data, my plot shows anomalies and widely spread points.
In conclusion, my results do not answer my proposed question, but they suggest that even if one president's deportation statistics are higher than the other's, relative to incoming traffic both administrations seem to take almost equal measures with detainees before eventually deporting them from these facilities. I should note that deportation statistics include many other groups besides detained immigrants.