ICE_Reports

Author

Nseyo O

Published

March 30, 2025

What may influence the numbers in deportation statistics?

I will attempt to uncover any hidden factors that might affect those numbers.

I have three datasets to work with. Two of them are ICE detention facility reports, one for 2019 and one for 2024. The third is the border crossing entry dataset.

The detainee reports indicate that most of these individuals were arrested at the border, while the remainder were apprehended in ICE’s area of responsibility (AOR). The lists comprise both male and female detainees, with none classified as minors or children, and they are broken down by facility for 2019 and 2024. A reporter from the New York Times claimed that Biden deported more immigrants than Trump did during his presidency.

We will examine these numbers and the differences in size, and relate them to the border crossing entry dataset. This dataset includes different measures of traffic type, port, and vehicles coming in across the Mexican and Canadian borders.

It seems appropriate to state that among those identified as criminals are immigrants who have been arrested with prior convictions. Some individuals in the criminal category may have been arrested for repeatedly crossing the border, resulting in criminal records.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(scales)

Attaching package: 'scales'

The following object is masked from 'package:purrr':

    discard

The following object is masked from 'package:readr':

    col_factor
setwd('/Users/oworenibanseyo/Desktop/Data 110 2025/Datasets')
Facilities_2024 <- read_csv('2024_Facilities.csv')
New names:
• `` -> `...29`
Rows: 109 Columns: 29
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (13): Name, Address, City, State, AOR, Type Detailed, Male/Female, Guara...
dbl (15): Zip, FY24 ALOS, Level A, Level B, Level C, Level D, Male Crim, Mal...
lgl  (1): ...29

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Facilities_2019 <- read_csv('2019_Facilities.csv')
New names:
• `` -> `...32`
Rows: 213 Columns: 32
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (15): Name, Address, City, State, AOR, Type Detailed, Male/Female, Last ...
dbl (11): Zip, FY19 ALOS, Level B, Level C, Level D, Male Crim, Female Crim,...
num  (5): Level A, Male Non-Crim, No ICE Threat Level, Mandatory, Guaranteed...
lgl  (1): ...32

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Border_crossing <- read_csv('border_crossing_data.csv')
Rows: 52767 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (6): Port Name, State, Border, Date, Measure, Point
dbl (4): Port Code, Value, Latitude, Longitude

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Links to datasets:

  1. https://data.bts.gov/Research-and-Statistics/Border-Crossing-Entry-Data/
  2. https://www.ice.gov/detain/detention-management#stats

Next, I will clean the detention data one by one and perform summary statistics.

head(Facilities_2019)
# A tibble: 6 × 32
  Name             Address City  State   Zip AOR   `Type Detailed` `Male/Female`
  <chr>            <chr>   <chr> <chr> <dbl> <chr> <chr>           <chr>        
1 STEWART DETENTI… 146 CC… LUMP… GA    31815 ATL   DIGSA           Male         
2 SOUTH TEXAS ICE… 566 VE… PEAR… TX    78061 SNA   CDF             Female/Male  
3 ADELANTO ICE PR… 10250 … ADEL… CA    92301 LOS   DIGSA           Female/Male  
4 ELOY FEDERAL CO… 1705 E… ELOY  AZ    85131 PHO   DIGSA           Female/Male  
5 TACOMA ICE PROC… 1623 E… TACO… WA    98421 SEA   CDF             Female/Male  
6 PORT ISABEL      27991 … LOS … TX    78566 SNA   SPC             Female/Male  
# ℹ 24 more variables: `FY19 ALOS` <dbl>, `Level A` <dbl>, `Level B` <dbl>,
#   `Level C` <dbl>, `Level D` <dbl>, `Male Crim` <dbl>, `Male Non-Crim` <dbl>,
#   `Female Crim` <dbl>, `Female Non-Crim` <dbl>, `ICE Threat Level 1` <dbl>,
#   `ICE Threat Level 2` <dbl>, `ICE Threat Level 3` <dbl>,
#   `No ICE Threat Level` <dbl>, Mandatory <dbl>, `Guaranteed Minimum` <dbl>,
#   `Last Inspection Type` <chr>, `Last Inspection Standard` <chr>,
#   `Last Inspection Rating - Final` <chr>, `Last Inspection Date` <chr>, …

I’ll use the str() and summary() functions to look at the data types and value ranges.

# str(Facilities_2019)
# summary(Facilities_2019)

Let’s go ahead and lowercase the variable names while removing all spaces from them.

names(Facilities_2019) <- tolower(names(Facilities_2019))
names(Facilities_2019) <- gsub(" ","",names(Facilities_2019))
head(Facilities_2019)
# A tibble: 6 × 32
  name       address city  state   zip aor   typedetailed `male/female` fy19alos
  <chr>      <chr>   <chr> <chr> <dbl> <chr> <chr>        <chr>            <dbl>
1 STEWART D… 146 CC… LUMP… GA    31815 ATL   DIGSA        Male                45
2 SOUTH TEX… 566 VE… PEAR… TX    78061 SNA   CDF          Female/Male         30
3 ADELANTO … 10250 … ADEL… CA    92301 LOS   DIGSA        Female/Male         54
4 ELOY FEDE… 1705 E… ELOY  AZ    85131 PHO   DIGSA        Female/Male         40
5 TACOMA IC… 1623 E… TACO… WA    98421 SEA   CDF          Female/Male         74
6 PORT ISAB… 27991 … LOS … TX    78566 SNA   SPC          Female/Male          7
# ℹ 23 more variables: levela <dbl>, levelb <dbl>, levelc <dbl>, leveld <dbl>,
#   malecrim <dbl>, `malenon-crim` <dbl>, femalecrim <dbl>,
#   `femalenon-crim` <dbl>, icethreatlevel1 <dbl>, icethreatlevel2 <dbl>,
#   icethreatlevel3 <dbl>, noicethreatlevel <dbl>, mandatory <dbl>,
#   guaranteedminimum <dbl>, lastinspectiontype <chr>,
#   lastinspectionstandard <chr>, `lastinspectionrating-final` <chr>,
#   lastinspectiondate <chr>, secondtolastinspectiontype <chr>, …
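
As a minimal sketch of this renaming step, the toy table below (hypothetical values, not the ICE data) carries two headers written in the same style as the facilities files. It also shows why backticks are still needed afterwards: only spaces are removed, so a name such as malenon-crim keeps its hyphen.

```r
# Hypothetical toy table; check.names = FALSE keeps the headers exactly as typed
toy <- data.frame(
  `Male Crim`     = c(3, 5),
  `Male Non-Crim` = c(1, 2),
  check.names = FALSE
)

# Lowercase every column name, then strip the spaces
names(toy) <- tolower(names(toy))
names(toy) <- gsub(" ", "", names(toy))

names(toy)

# The hyphen survives, so this column must still be quoted with backticks
toy$`malenon-crim`
```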

Since there are no totals on the given detention table, I will combine males and females, criminals and non-criminals after renaming.

F_2019 <- Facilities_2019 |>
  rename(
    male_noncrim = `malenon-crim`,female_noncrim = `femalenon-crim`
  ) |>
  select(state, city, malecrim, male_noncrim, femalecrim, female_noncrim) |>
  drop_na()

Using group_by I will categorize by state and then collect totals for my four detainee categories.

F_2019_State <- F_2019 |>
  group_by(state) |>
  summarise(
    total_male_crim      = sum(malecrim),
    total_male_noncrim   = sum(male_noncrim),
    total_female_crim    = sum(femalecrim),
    total_female_noncrim = sum(female_noncrim)
  ) |>
  mutate(total_detainees = total_male_crim + total_male_noncrim + total_female_crim + total_female_noncrim)

head(F_2019_State)
# A tibble: 6 × 6
  state total_male_crim total_male_noncrim total_female_crim
  <chr>           <dbl>              <dbl>             <dbl>
1 AL                179                134                 4
2 AR                  2                  2                 0
3 AZ               1055               2571               114
4 CA               1646               2121               141
5 CO                427                593                30
6 FL               1212               1033                64
# ℹ 2 more variables: total_female_noncrim <dbl>, total_detainees <dbl>
summary(F_2019_State)
    state           total_male_crim   total_male_noncrim total_female_crim
 Length:48          Min.   :   0.00   Min.   :   0.00    Min.   :  0.00   
 Class :character   1st Qu.:  15.75   1st Qu.:   4.75    1st Qu.:  0.00   
 Mode  :character   Median : 135.00   Median :  90.00    Median :  5.00   
                    Mean   : 369.27   Mean   : 498.90    Mean   : 32.88   
                    3rd Qu.: 426.25   3rd Qu.: 365.75    3rd Qu.: 19.75   
                    Max.   :3893.00   Max.   :6929.00    Max.   :674.00   
 total_female_noncrim total_detainees  
 Min.   :   0.00      Min.   :    1.0  
 1st Qu.:   0.00      1st Qu.:   27.5  
 Median :   5.50      Median :  257.5  
 Mean   : 122.60      Mean   : 1023.6  
 3rd Qu.:  44.75      3rd Qu.: 1020.2  
 Max.   :3340.00      Max.   :14836.0  

These are quite depressing summary statistics.
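
The group-and-sum step can be sketched on a small made-up table. The values and the names toy_facilities, toy_dropped, and toy_kept are hypothetical. The sketch also illustrates one consequence of calling drop_na() before summing: a facility with a single missing count is removed entirely, whereas sum(..., na.rm = TRUE) would keep its other counts. Which behaviour is preferable depends on how the missing cells should be read, so this is an alternative to consider rather than a correction.

```r
library(dplyr)
library(tidyr)

# Hypothetical facility-level counts (not the ICE data)
toy_facilities <- tibble(
  state          = c("TX", "TX", "AZ"),
  malecrim       = c(10, 5, 2),
  female_noncrim = c(4, NA, 1)   # the second TX facility has a missing count
)

# Approach used above: drop incomplete facilities, then sum per state
toy_dropped <- toy_facilities |>
  drop_na() |>
  group_by(state) |>
  summarise(total_male_crim = sum(malecrim), .groups = "drop")

# Alternative: keep every facility and skip only the missing cells
toy_kept <- toy_facilities |>
  group_by(state) |>
  summarise(total_male_crim = sum(malecrim, na.rm = TRUE), .groups = "drop")

toy_dropped
toy_kept
```

In this toy case the Texas total differs between the two approaches because the incomplete facility contributes 5 to the second total but nothing to the first.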

I’ll do exactly the same thing with the 2024 facilities dataset.

names(Facilities_2024) <- tolower(names(Facilities_2024))
names(Facilities_2024) <- gsub(" ","",names(Facilities_2024))

F_2024 <- Facilities_2024 |>
  rename(
    male_noncrim = `malenon-crim`,female_noncrim = `femalenon-crim`
  ) |>
  select(state, city, malecrim, male_noncrim, femalecrim, female_noncrim) |>
  drop_na()

F_2024_State <- F_2024 |>
  group_by(state) |>
  summarise(
    total_male_crim      = sum(malecrim),
    total_male_noncrim   = sum(male_noncrim),
    total_female_crim    = sum(femalecrim),
    total_female_noncrim = sum(female_noncrim)
  ) |>
  mutate(total_detainees = total_male_crim + total_male_noncrim + total_female_crim + total_female_noncrim)

head(F_2024_State)
# A tibble: 6 × 6
  state total_male_crim total_male_noncrim total_female_crim
  <chr>           <dbl>              <dbl>             <dbl>
1 AL                 15                 11                 2
2 AR                  1                  1                 0
3 AZ                510               1362                60
4 CA                753               1615                23
5 CO                230                705                17
6 FL                684                630                33
# ℹ 2 more variables: total_female_noncrim <dbl>, total_detainees <dbl>
summary(F_2024_State)
    state           total_male_crim  total_male_noncrim total_female_crim
 Length:41          Min.   :   0.0   Min.   :   0.0     Min.   :  0.00   
 Class :character   1st Qu.:  12.0   1st Qu.:   4.0     1st Qu.:  0.00   
 Mode  :character   Median :  52.0   Median :  35.0     Median :  1.00   
                    Mean   : 228.1   Mean   : 541.6     Mean   : 13.24   
                    3rd Qu.: 240.0   3rd Qu.: 356.0     3rd Qu.:  8.00   
                    Max.   :2244.0   Max.   :7818.0     Max.   :162.00   
 total_female_noncrim total_detainees  
 Min.   :   0.0       Min.   :    2.0  
 1st Qu.:   0.0       1st Qu.:   15.0  
 Median :   1.0       Median :  114.0  
 Mean   : 101.5       Mean   :  884.5  
 3rd Qu.:  16.0       3rd Qu.:  719.0  
 Max.   :1439.0       Max.   :11663.0  

Now let’s clean up the border crossings to give us crossings by state.

I will need to look at the Measure column to see the type of incoming traffic from various ports. I will aggregate by state.

names(Border_crossing) <- tolower(names(Border_crossing))
names(Border_crossing) <- gsub(" ","",names(Border_crossing))

What are the different crossing Measures entered?

B_crossing_df <- Border_crossing 
  
measures_1 <- B_crossing_df |>
  distinct(measure) |>
  arrange(measure)

print(measures_1)
# A tibble: 12 × 1
   measure                    
   <chr>                      
 1 Bus Passengers             
 2 Buses                      
 3 Pedestrians                
 4 Personal Vehicle Passengers
 5 Personal Vehicles          
 6 Rail Containers Empty      
 7 Rail Containers Loaded     
 8 Train Passengers           
 9 Trains                     
10 Truck Containers Empty     
11 Truck Containers Loaded    
12 Trucks                     

Looking at the measures, I can get an idea of the crossing data. Crossings on foot are counted as pedestrians, while vehicles and the passengers in those vehicles are counted separately. I will focus on the counts of pedestrians and passengers only, so four measure types give counts of individuals. Following the earlier question, I expect this number to track the rise or fall of detainees for that year by state.

border_data_year <- B_crossing_df |>
  mutate(date = as.numeric(str_extract(date, "[0-9]{4}")))

people_measures <- c("Pedestrians", 
                     "Bus Passengers", 
                     "Personal Vehicle Passengers", 
                     "Train Passengers")

border_data_filtered <- border_data_year |>
  filter(date %in% c(2019, 2024),
         measure %in% people_measures)

border_data_summarized <- border_data_filtered |>
  group_by(state, date, measure) |>
  summarise(total_value = sum(value, na.rm = TRUE)) |>
  ungroup()
`summarise()` has grouped output by 'state', 'date'. You can override using the
`.groups` argument.
border_data_total <- border_data_summarized |>
  group_by(state, date) |>
  summarise(total_people_income = sum(total_value, na.rm = TRUE)) |>
  ungroup()
`summarise()` has grouped output by 'state'. You can override using the
`.groups` argument.
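
The year extraction, filtering, and aggregation can be sketched on a few made-up rows. The "Mon YYYY" date format, the values, and the names toy_crossings and toy_totals are assumptions for illustration only. The sketch sums straight to state and year in one step, which gives the same totals as summing per measure first, and passes .groups = "drop" so that summarise() returns an ungrouped table without the grouping message.

```r
library(dplyr)
library(stringr)

# Hypothetical rows; the date format is an assumption about the source file
toy_crossings <- tibble(
  state   = c("Texas", "Texas", "Texas", "Maine"),
  date    = c("Jan 2019", "Feb 2019", "Jan 2024", "Jan 2019"),
  measure = c("Pedestrians", "Trucks", "Bus Passengers", "Pedestrians"),
  value   = c(100, 999, 40, 7)
)

toy_totals <- toy_crossings |>
  # Pull the first run of four digits out of the date string
  mutate(year = as.numeric(str_extract(date, "[0-9]{4}"))) |>
  # Keep the two study years and only measures that count people
  filter(year %in% c(2019, 2024),
         measure %in% c("Pedestrians", "Bus Passengers")) |>
  group_by(state, year) |>
  summarise(total_people = sum(value, na.rm = TRUE), .groups = "drop")

toy_totals
```

The "Trucks" row is excluded by the measure filter, so it does not inflate the Texas total for 2019.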

A horizontal bar graph lets us compare the total number of people crossing the borders in each state for the two years.

ggplot(data = border_data_total, aes(x = state, y = total_people_income, fill = factor(date))) +
  geom_col(position = position_dodge(width = 0.7)) +
  coord_flip() +
  labs(
    title = "Comparison of Border Crossings by State (2019 vs 2024)",
    x = "State",
    y = "Total People",
    fill = "Year"
  ) +
  theme_minimal() +
  scale_y_continuous(
    labels = label_number(scale_cut = cut_short_scale())
  )

Texas, California, and Arizona show the highest numbers of people documented entering in those years.

To prepare my dataset for the regression model, I tag each state-level detainee table with its year and then bind the 2019 and 2024 tables together.

df_detainees_2019 <- F_2019_State |>
  mutate(Year = 2019)

df_detainees_2024 <- F_2024_State |>
  mutate(Year = 2024)

df_detainees_all <- bind_rows(df_detainees_2019, df_detainees_2024)

To merge my border crossing values with my facilities dataset, I will convert the full state names to abbreviations that match the facilities data frame. The join on state and year then gives a fully merged data frame.

df_people <- border_data_total |>
  rename(Year = date)

df_people_1 <- df_people |>
  mutate(state = state.abb[match(state, state.name)])

df_merged <- df_detainees_all |>
  left_join(df_people_1, by = c("state", "Year"))

df_merged_complete <- df_merged |>
  filter(!is.na(total_people_income), !is.na(total_detainees))


glimpse(df_merged_complete)
Rows: 22
Columns: 8
$ state                <chr> "AZ", "CA", "ID", "ME", "MI", "MN", "MT", "ND", "…
$ total_male_crim      <dbl> 1055, 1646, 12, 0, 159, 261, 1, 2, 550, 426, 3893…
$ total_male_noncrim   <dbl> 2571, 2121, 3, 1, 132, 162, 1, 3, 816, 307, 6929,…
$ total_female_crim    <dbl> 114, 141, 0, 0, 19, 17, 0, 0, 0, 37, 674, 55, 60,…
$ total_female_noncrim <dbl> 681, 475, 0, 0, 42, 8, 0, 0, 0, 56, 3340, 175, 51…
$ total_detainees      <dbl> 4421, 4383, 15, 1, 352, 448, 2, 5, 1366, 826, 148…
$ Year                 <dbl> 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2…
$ total_people_income  <dbl> 21527115, 71017490, 415516, 3126218, 11169480, 16…
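
The abbreviation lookup and the join can be sketched with made-up values. The tables toy_people, toy_detainees, toy_merged, and toy_complete are hypothetical. The built-in vectors state.name and state.abb are aligned, so match() finds the position of each full name and state.abb returns the abbreviation at that position. A state with detainees but no border ports gets NA after the left join and is then filtered out, which is presumably why only 22 rows remain in the merged table above.

```r
library(dplyr)

# Hypothetical crossing totals keyed by full state name
toy_people <- tibble(
  state               = c("Texas", "North Dakota"),
  Year                = c(2019, 2019),
  total_people_income = c(500, 20)
) |>
  mutate(state = state.abb[match(state, state.name)])

# Hypothetical detainee totals keyed by abbreviation
toy_detainees <- tibble(
  state           = c("TX", "ND", "GA"),
  Year            = c(2019, 2019, 2019),
  total_detainees = c(50, 2, 9)
)

toy_merged <- toy_detainees |>
  left_join(toy_people, by = c("state", "Year"))

# GA has no crossing data in this toy example, so its row is dropped here
toy_complete <- toy_merged |>
  filter(!is.na(total_people_income))

toy_complete
```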

We fit our first regression model, for 2019.

df_2019 <- df_merged_complete |>
  filter(Year == 2019)
model_2019 <- lm(total_detainees ~ total_people_income, data = df_2019)
summary(model_2019)

Call:
lm(formula = total_detainees ~ total_people_income, data = df_2019)

Residuals:
    Min      1Q  Median      3Q     Max 
-5028.3  -472.9   -20.4   557.7  4581.5 

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)         -1.284e+02  8.173e+02  -0.157 0.878312    
total_people_income  1.343e-04  2.578e-05   5.211 0.000395 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2310 on 10 degrees of freedom
Multiple R-squared:  0.7308,    Adjusted R-squared:  0.7039 
F-statistic: 27.15 on 1 and 10 DF,  p-value: 0.0003952

A histogram shows the distribution of the residuals.

hist(residuals(model_2019))

The same is done for 2024.

df_2024 <- df_merged_complete |> 
  filter(Year == 2024)
model_2024 <- lm(total_detainees ~ total_people_income, data = df_2024)

summary(model_2024)

Call:
lm(formula = total_detainees ~ total_people_income, data = df_2024)

Residuals:
    Min      1Q  Median      3Q     Max 
-4028.2  -367.7    25.2   298.6  3335.0 

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)   
(Intercept)         -2.890e+02  7.822e+02  -0.369  0.72141   
total_people_income  1.013e-04  2.163e-05   4.683  0.00158 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1938 on 8 degrees of freedom
Multiple R-squared:  0.7327,    Adjusted R-squared:  0.6993 
F-statistic: 21.93 on 1 and 8 DF,  p-value: 0.001576
hist(residuals(model_2024))

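For 2019: The multiple R-squared is 0.7308, indicating that approximately 73% of the variation in detainee counts across the 12 states in the model is explained by the incoming traffic variable. The slope of 1.343e-04 corresponds to roughly 134 detainees per million recorded crossings.

For 2024: The multiple R-squared is 0.7327, which similarly indicates around 73%, here across 10 states. The slope of 1.013e-04 corresponds to roughly 101 detainees per million recorded crossings.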
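
To make the R-squared figure concrete, here is a minimal sketch on made-up numbers (the data frame toy and its columns are hypothetical, not the merged table). R-squared is one minus the ratio of the residual sum of squares to the total sum of squares, and for a regression with a single predictor it equals the squared correlation between the two variables.

```r
# Hypothetical toy data, not the merged detainee table
toy <- data.frame(
  crossings = c(1, 2, 3, 4, 5),
  detainees = c(2.1, 3.9, 6.2, 7.8, 10.1)
)

fit <- lm(detainees ~ crossings, data = toy)

# Variation left unexplained by the fitted line
ss_res <- sum(residuals(fit)^2)
# Total variation of the response around its mean
ss_tot <- sum((toy$detainees - mean(toy$detainees))^2)

r2_manual <- 1 - ss_res / ss_tot

# Both comparisons should agree with the value reported by summary()
all.equal(r2_manual, summary(fit)$r.squared)
all.equal(r2_manual, cor(toy$crossings, toy$detainees)^2)
```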

df_filtered_bob <- df_merged_complete |>
  filter(total_people_income >= 1000)

Here I tried to filter as well as I could without losing valuable data, keeping only rows with at least 1,000 recorded crossings. The scale varies widely: states like Texas have detainee counts of over 14,000, while states like Montana have only about 2 detainees.

A simple scatter plot to show the relationship and any possible differences between both years.

ggplot(df_filtered_bob, aes(x = total_people_income, y = total_detainees)) +
  geom_point() +
  geom_smooth(method = "lm", color = "red", se = TRUE) +
  facet_wrap(~ Year) +
  labs(
    title = "Relationship Between Incoming Traffic and Detainee Counts",
    x = "Total People (Incoming Traffic)",
    y = "Total Detainees"
  ) +
  theme_minimal() +
  scale_x_continuous(labels = comma)
`geom_smooth()` using formula = 'y ~ x'

My linear models seem to indicate that there is a relationship between incoming border traffic and the number of detainees reported on the ICE website. The southern border has more incoming traffic, and the southern states have higher numbers of incarcerated detainees. For these linear models, I did not filter out the outliers. Even though the numbers from 2019 appear to be greater than those of 2024, it seems to make little difference who holds office. As long as there is an influx of people coming in from the neighboring countries (Canada and Mexico), the number of immigrants in these detention centers will increase. What is the alternative here? When a president’s numbers indicate that more illegal immigrants were deported during their term, they do not mention that the consequence for those who are detained would be longer incarceration.

For those in the non-criminal category, as I stated earlier, if they are deported and return, they will be arrested again and booked with a prior criminal record. I spent considerable time figuring out how to plot my final visualizations. My original plan was to display the incoming traffic and detainee counts per state on a hexagonal tile map, but I did not have enough time to complete the code. Most of the time spent here went to cleaning and organizing my datasets. Other variables and methods that I did not consider could have produced different results, and the linear regression models can be improved. Both linear models show a p-value well under 0.05 for the rise in detainees with incoming border traffic. Because I aggregated the values so heavily, especially the incoming crossing data, my plot has widely spaced points and apparent anomalies.

In conclusion, my results do not answer my proposed question, but they show that even if one president’s deportation statistics are higher than the other’s, based on the incoming traffic, both administrations seem to take almost equal measures with detainees before eventually deporting them from these facilities. I have to state that deportation statistics include many other groups aside from detained immigrants.

References and sources

  1. https://www.nytimes.com/interactive/2025/03/04/us/politics/trump-immigration-policies-deportations-data.html?campaign_id=9&emc=edit_nn_20250318&instance_id=150288&nl=the-morning&regi_id=274498566&segment_id=193736&user_id=bd0ffc61867e6371fa88bb15f0a96a48

  2. OpenAI. (2025, March 27). How U.S. presidents control the Mexican and Canadian borders [AI-generated response]. ChatGPT. https://chat.openai.com/

  3. OpenAI. (2025, March 27). Clarifying predictors in linear regression using RStudio [AI-generated response]. ChatGPT. https://chat.openai.com/