Immigration Enforcement Patterns and Case Outcomes in the United States
Author
Flioria Akesse
Introduction
This study investigates trends in immigration enforcement, threat levels, and case results in the United States through the use of data from ICE Enforcement and Removal Operations for the fiscal year 2026.
The dataset covers administrative immigration arrests made between October 1, 2025 and March 10, 2026. Each observation is a single arrest and includes variables on demographics, arrest location, apprehension method, criminality level, and case outcome.
The objective of this study is to examine how demographic and geographic characteristics relate to threat levels and case outcomes.
Dataset Source
Data Source:
Immigration and Customs Enforcement (ICE), Enforcement and Removal Operations dataset.
The dataset contains approximately 191,000 observations and 25 variables.
Variables and Research Questions
Both categorical and quantitative variables are included in this dataset.
Some of the categorical variables include:
- Place of arrest
- Country of citizenship
- Gender
- Threat category
- Criminality classification
- Method of apprehension
- Status of the case
Some of the quantitative or date variables include:
- Birth year
- Birth date
- Arrest date
- Date of final order
Other variables like age at the time of arrest and case processing time would be generated during the data cleaning process.
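The text does not show how these derived variables are built. As a rough sketch of one possible approach, the code below computes them on a small made-up table. The column names follow the dataset preview, while the toy values and the names `toy`, `toy_derived`, and `Processing_Days` are assumptions for illustration. Age is approximated from birth year alone, so it can be off by up to one year.

```r
library(dplyr)

# Toy rows standing in for the real data (values are invented)
toy <- tibble(
  `Apprehension Date` = as.POSIXct(c("2025-11-16 14:21:59", "2025-12-14 08:18:33"), tz = "UTC"),
  `Birth Year` = c(1990, 1985),
  `Final Order Date` = as.POSIXct(c("2025-12-01 00:00:00", NA), tz = "UTC")
)

toy_derived <- toy %>%
  mutate(
    # Approximate age at arrest: arrest year minus birth year
    Age = as.numeric(format(`Apprehension Date`, "%Y")) - `Birth Year`,
    # Days from arrest to final order; NA when no final order date is recorded
    Processing_Days = as.numeric(
      difftime(`Final Order Date`, `Apprehension Date`, units = "days")
    )
  )

toy_derived
```

Cases without a final order date keep a missing processing time, which matters later because modeling functions such as `lm()` drop rows with missing values by default.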
Questions addressed by this research include:
- What are the states where immigration arrests happen the most?
- Is there any association between threat categories and criminality classifications?
- Do demographic variables influence case outcomes?
- Which variables influence case processing time?
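The question about threat categories and criminality classifications involves two categorical variables, so one common way to approach it is a chi-squared test of independence. The sketch below uses made-up counts and hypothetical category labels. It illustrates the technique only and is not an analysis performed in this report.

```r
# Hypothetical cross-tabulation: rows are criminality classes,
# columns are threat levels (all counts are invented)
toy_counts <- matrix(
  c(60, 25, 15,
    20, 30, 50),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Criminality = c("Convicted", "Not convicted"),
    Threat = c("1", "2", "3")
  )
)

# Chi-squared test of independence between the two classifications
assoc_test <- chisq.test(toy_counts)

assoc_test$statistic  # test statistic
assoc_test$parameter  # degrees of freedom: (rows - 1) * (columns - 1)
assoc_test$p.value    # a small p-value suggests the variables are associated
```

With the real data, the input table could presumably be built with `table()` on the `Apprehension Criminality` and `Case Threat Level` columns, after deciding how to handle missing threat levels.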
Data Collection Methodology
The data set was compiled and released through the Deportation Data Project based on Immigration and Customs Enforcement (ICE) Enforcement and Removal Operations data.
Although the documentation describes the variables in the dataset, it provides little information on how the data were compiled and reported.
Why I Chose This Dataset
The reason for selecting this dataset is that the topic is a significant public policy problem in the United States. Using this dataset, one can explore the realities of demographics, geography, and law from a statistical and visualization perspective.
I wanted to analyze the effects of various factors on the outcome and classification of the immigration cases. The large sample size makes the dataset suitable for regression analysis and visualization in addition to data cleaning with the help of R.
Load Libraries and Dataset
# Load libraries
library(tidyverse)
Warning: package 'tidyverse' was built under R version 4.5.3
Warning: package 'ggplot2' was built under R version 4.5.3
Warning: package 'tidyr' was built under R version 4.5.3
Warning: package 'readr' was built under R version 4.5.3
Warning: package 'purrr' was built under R version 4.5.3
Warning: package 'stringr' was built under R version 4.5.3
Warning: package 'forcats' was built under R version 4.5.3
Warning: package 'lubridate' was built under R version 4.5.3
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.2.0 ✔ readr 2.2.0
✔ forcats 1.0.1 ✔ stringr 1.6.0
✔ ggplot2 4.0.2 ✔ tibble 3.3.1
✔ lubridate 1.9.5 ✔ tidyr 1.3.2
✔ purrr 1.2.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(plotly)
Warning: package 'plotly' was built under R version 4.5.3
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:stats':
filter
The following object is masked from 'package:graphics':
layout
library(readxl)
Warning: package 'readxl' was built under R version 4.5.3
# A tibble: 6 × 25
`Apprehension Date` `Apprehension Type` State County `TOA Current Duty AOR`
<dttm> <chr> <chr> <lgl> <chr>
1 2025-11-16 14:21:59 Targeted TEXAS NA San Antonio Area of R…
2 2025-11-05 14:29:30 Collateral TEXAS NA San Antonio Area of R…
3 2025-12-14 08:18:33 Targeted UTAH NA Salt Lake City Area o…
4 2026-02-14 10:27:08 Targeted FLORIDA NA Miami Area of Respons…
5 2025-11-19 07:44:46 Targeted ILLINOIS NA Chicago Area of Respo…
6 2025-10-09 14:24:47 Targeted TEXAS NA El Paso Area of Respo…
# ℹ 20 more variables: `Apprehension Final Program` <chr>,
# `Arresting Agency` <chr>, `Apprehension Method` <chr>,
# `Apprehension Criminality` <chr>, `Case Status` <chr>,
# `Case Category` <chr>, `Departure Country` <chr>,
# `Final Order Yes No` <chr>, `Birth Date` <chr>, `Birth Year` <dbl>,
# `Citizenship Country` <chr>, Gender <chr>, `Departed Date` <dttm>,
# `Final Order Date` <dttm>, `Apprehension Site Landmark` <chr>, …
# Number of arrests by state
state_summary <- ice_clean %>%
  group_by(State) %>%
  summarize(Total_Arrests = n()) %>%
  arrange(desc(Total_Arrests))

head(state_summary)
# A tibble: 6 × 2
State Total_Arrests
<chr> <int>
1 TEXAS 48231
2 FLORIDA 17713
3 CALIFORNIA 14373
4 NEW YORK 7020
5 GEORGIA 6205
6 NEW JERSEY 5875
Visualization 1: Immigration Arrests by State
# Bar chart of arrests by state
top_states <- state_summary %>%
  slice_head(n = 10)

ggplot(top_states, aes(x = reorder(State, Total_Arrests), y = Total_Arrests, fill = State)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Top 10 States with the Highest Number of Immigration Arrests",
    x = "State",
    y = "Number of Arrests",
    caption = "Source: ICE Enforcement and Removal Operations Data"
  ) +
  theme_minimal() +
  scale_fill_brewer(palette = "Paired")
This bar chart shows the 10 states with the most administrative immigration arrests. Texas ranks first with 48,231 arrests, more than twice as many as Florida (17,713) and more than three times as many as California (14,373). The chart indicates that enforcement activity is concentrated in a small number of states rather than spread evenly across the country.
Visualization 2: Threat Level by Gender
# Create summary for threat level by gender
gender_threat <- ice_clean %>%
  group_by(Gender, `Case Threat Level`) %>%
  summarize(Count = n())
`summarise()` has regrouped the output.
ℹ Summaries were computed grouped by Gender and Case Threat Level.
ℹ Output is grouped by Gender.
ℹ Use `summarise(.groups = "drop_last")` to silence this message.
ℹ Use `summarise(.by = c(Gender, Case Threat Level))` for per-operation
grouping (`?dplyr::dplyr_by`) instead.
This interactive visualization compares immigration case threat levels across gender categories. The chart allows viewers to explore the distribution of threat classifications interactively. Most cases appear within lower threat categories, while higher threat classifications occur less frequently.
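The plotting code for this interactive chart is not shown in the text. The sketch below is one way such a chart could be built, assuming a ggplot2 bar chart converted with plotly's `ggplotly()`. The `toy_gender_threat` data frame, its labels, and its counts are hypothetical and are not results from the dataset.

```r
library(ggplot2)
library(plotly)

# Made-up counts by gender and threat level (illustration only)
toy_gender_threat <- data.frame(
  Gender = rep(c("Female", "Male"), each = 3),
  Threat_Level = rep(c("1", "2", "3"), times = 2),
  Count = c(10, 20, 30, 40, 50, 60)
)

# Static grouped bar chart
p <- ggplot(toy_gender_threat, aes(x = Threat_Level, y = Count, fill = Gender)) +
  geom_col(position = "dodge") +
  labs(
    title = "Case Threat Level by Gender (toy data)",
    x = "Threat Level",
    y = "Number of Cases"
  ) +
  theme_minimal()

# Convert to an interactive widget with hover tooltips
p_interactive <- ggplotly(p)
p_interactive
```

A grouped layout (`position = "dodge"`) keeps the gender categories side by side within each threat level, which makes the counts easier to compare on hover than a stacked layout would.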
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `Threat_Level_Numeric = as.numeric(`Case Threat Level`)`.
Caused by warning:
! NAs introduced by coercion
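The coercion warning above typically appears when a text column holds values that cannot be read as numbers. The sketch below reproduces the behavior on a made-up vector. The label "No threat level" is a hypothetical placeholder, since the actual non-numeric values in `Case Threat Level` are not shown in the text.

```r
# Hypothetical raw values: three numeric strings, one text label, one missing
threat_raw <- c("1", "2", "3", "No threat level", NA)

# as.numeric() turns anything it cannot parse into NA (and warns about it);
# suppressWarnings() is used here only to keep the sketch output quiet
threat_num <- suppressWarnings(as.numeric(threat_raw))

threat_num
sum(is.na(threat_num))  # number of values that end up missing
```

Rows that become `NA` this way are later excluded by `lm()`, which is consistent with the large number of observations the regression output reports as deleted due to missingness.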
# Multiple linear regression model
model <- lm(Threat_Level_Numeric ~ Age + Gender, data = regression_data)

# Model summary
summary(model)
Call:
lm(formula = Threat_Level_Numeric ~ Age + Gender, data = regression_data)
Residuals:
Min 1Q Median 3Q Max
-1.54414 -0.85333 -0.02363 0.84391 1.71433
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.9036613 0.0205200 141.504 < 2e-16 ***
Age -0.0189221 0.0003371 -56.128 < 2e-16 ***
GenderMale -0.1420647 0.0164010 -8.662 < 2e-16 ***
GenderUnknown -0.3870093 0.0505684 -7.653 1.99e-14 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.8444 on 51974 degrees of freedom
(130769 observations deleted due to missingness)
Multiple R-squared: 0.0597, Adjusted R-squared: 0.05964
F-statistic: 1100 on 3 and 51974 DF, p-value: < 2.2e-16
A multiple linear regression was fitted to examine how age and gender relate to the numeric threat level. Both predictors are statistically significant (p < 0.05): each additional year of age is associated with a threat level about 0.019 lower, and cases recorded as male average about 0.14 lower than the reference gender category, holding the other predictor fixed. The adjusted R-squared of about 0.06 means the model explains only about 6% of the variation in threat level. The model was fitted to 51,978 cases; 130,769 observations were dropped because of missing values.
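To make the coefficients concrete, the sketch below plugs the reported estimates into the fitted equation by hand. The helper `predict_threat()` is a hypothetical function written for this illustration. It also assumes that Female is the reference gender level, which is inferred from the output showing only `GenderMale` and `GenderUnknown` terms.

```r
# Estimates copied from the regression output above
coefs <- c(
  intercept = 2.9036613,
  age       = -0.0189221,
  male      = -0.1420647,
  unknown   = -0.3870093
)

# Hypothetical helper: fitted numeric threat level for a given age and gender
predict_threat <- function(age, gender = c("Female", "Male", "Unknown")) {
  gender <- match.arg(gender)
  unname(
    coefs["intercept"] +
      coefs["age"] * age +
      coefs["male"] * (gender == "Male") +
      coefs["unknown"] * (gender == "Unknown")
  )
}

predict_threat(30, "Male")    # about 2.19
predict_threat(30, "Female")  # about 2.34
```

The gap between the two fitted values equals the `GenderMale` coefficient. Because threat level is an ordered category rather than a continuous measurement, these fitted values are best read as rough averages and not as predicted categories.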
# Diagnostic plots for regression model
par(mfrow = c(2, 2))
plot(model)

# Export cleaned dataset for Tableau
write.csv(ice_clean, "ice_clean.csv", row.names = FALSE)
Immigration enforcement has become a major public policy issue in the United States. According to the American Immigration Council, immigration enforcement policies affect millions of individuals and families and have major economic and social impacts across communities. Researchers continue to study how immigration enforcement patterns vary geographically and how enforcement priorities influence arrest outcomes and detention practices.
The purpose of this project was to understand patterns in immigration enforcement, threat classifications, and case outcomes using ICE Enforcement and Removal Operations data. The analysis shows that immigration arrests are concentrated in a small number of states, particularly Texas and Florida. The second visualization compared case threat levels across gender categories.
The regression results show that age and gender are statistically significant predictors of threat level. However, the model explains only about 6% of the variance, so variables not included in the model likely account for most of the variation in threat classification.
The main limitations of this project are missing values in several of the variables used and the scarcity of documentation on how the original dataset was constructed.