Setup and Libraries

knitr::opts_chunk$set(echo = TRUE, error = TRUE, cache=TRUE, autodep = TRUE, message = FALSE, progress_bar = FALSE)

library(readr)
library(tidyverse) 
library(dplyr) 
library(ggplot2) 
library(knitr) 
library(psych)
library(lme4)
library(boot)
library(ggeffects) 
library(DescTools)
library(tidycensus)
library(xml2)
library(tmap)
library(tmaptools)
library(leaflet)
library(sf)
library(leaflet.extras)
library(raster)
library(tigris)
library(sp)
library(scales)
library(stringi)
library(leaflegend)
library(readxl)
library(readr)
library(stringr)
library(janitor)
library(ggrepel)
setwd("/Users/eleanorprickettmorgan/Desktop/DeportationData/")

Importing Data

I’m skipping 6 rows because the first 6 rows of the Excel file hold summary information; the data itself doesn’t start until after them. Then I’m using summary() to check what my time range is, looking specifically at the Apprehension Date column.

AdminArrest <- read_excel("/Users/eleanorprickettmorgan/Desktop/DeportationData/ERO Admin Arrests_LESA-STU-FINAL Release_raw.xlsx", skip = 6)
View(AdminArrest)
summary(AdminArrest)
 Apprehension Date             Apprehension State Apprehension County Apprehension AOR   Final Program      Final Program Group Apprehension Method
 Min.   :2023-09-01 00:00:00   Length:377067      Mode:logical        Length:377067      Length:377067      Length:377067       Length:377067      
 1st Qu.:2024-06-19 12:29:00   Class :character   NA's:377067         Class :character   Class :character   Class :character    Class :character   
 Median :2025-03-12 13:50:00   Mode  :character                       Mode  :character   Mode  :character   Mode  :character    Mode  :character   
 Mean   :2024-12-31 02:24:47                                                                                                                       
 3rd Qu.:2025-07-14 11:13:21                                                                                                                       
 Max.   :2025-10-16 00:31:07                                                                                                                       
                                                                                                                                                   
 Apprehension Criminality Case Status        Case Category      Departed Date                 Departure Country  Final Order Yes No
 Length:377067            Length:377067      Length:377067      Min.   :1923-09-14 00:00:00   Length:377067      Length:377067     
 Class :character         Class :character   Class :character   1st Qu.:2024-10-02 00:00:00   Class :character   Class :character  
 Mode  :character         Mode  :character   Mode  :character   Median :2025-05-04 00:00:00   Mode  :character   Mode  :character  
                                                                Mean   :2025-02-18 23:21:58                                        
                                                                3rd Qu.:2025-08-04 00:00:00                                        
                                                                Max.   :2025-10-15 00:00:00                                        
                                                                NA's   :140929                                                     
 Final Order Date               Birth Date          Birth Year   Citizenship Country    Gender          Apprehension Site Landmark Alien File Number 
 Min.   :1967-01-06 00:00:00   Length:377067      Min.   :1932   Length:377067       Length:377067      Length:377067              Length:377067     
 1st Qu.:2019-09-26 00:00:00   Class :character   1st Qu.:1983   Class :character    Class :character   Class :character           Class :character  
 Median :2024-06-20 00:00:00   Mode  :character   Median :1991   Mode  :character    Mode  :character   Mode  :character           Mode  :character  
 Mean   :2021-06-04 17:40:28                      Mean   :1990                                                                                       
 3rd Qu.:2025-05-09 00:00:00                      3rd Qu.:1998                                                                                       
 Max.   :2025-10-16 00:00:00                      Max.   :2025                                                                                       
 NA's   :147872                                   NA's   :1                                                                                          
 EID Case ID        EID Subject ID     Unique Identifier 
 Length:377067      Length:377067      Length:377067     
 Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character  
                                                         
                                                         
                                                         
                                                         

Cleaning the Data

Checking for Duplicates

To check for duplicates I’m going to look at the unique identifier column.

AdminArrest %>%
  group_by(`Unique Identifier`) %>%
  filter(n()==2)

I get back 27,214 rows whose Unique Identifier appears exactly twice. But I can tell that there are certain instances where the same identifier appears twice with a different apprehension date. So I’m going to try again, looking at Apprehension Date in addition to the Unique Identifier column.

AdminArrest %>%
  group_by(`Unique Identifier`, `Apprehension Date`) %>%
  filter(n()==2)

When I filter by both of these things I get 3,304 rows. That suggests that certain individuals were arrested more than once, or have been in the system multiple times over the course of many years, so this is closer to the true number of duplicates. Now I’m going to remove those potential duplicates using the distinct function.

cleaned_AdminArrest <- AdminArrest %>%
  distinct(`Unique Identifier`,`Apprehension Date`, .keep_all = TRUE)
View(cleaned_AdminArrest)

When I run this I now have 289,927 entries instead of the original 291,722. I would probably still ask an expert who works with this data if there are other ways I might not be catching duplicates, but this makes the most sense to me.
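Two things could slip past the check above: `n()==2` only matches keys that appear exactly twice, and rows with a missing Unique Identifier are treated by `distinct()` as sharing one key. Here is a minimal sketch of a stricter check on toy data. The values are invented for illustration, and keeping the NA-identifier rows untouched is an assumption about what is wanted, not a rule from the data documentation.

```r
library(dplyr)

# Toy data with made-up values; column names mirror the real file.
toy <- tibble(
  `Unique Identifier` = c("a1", "a1", "a1", "b2", "b2", NA, NA),
  `Apprehension Date` = as.Date(c("2025-01-05", "2025-01-05", "2025-01-05",
                                  "2025-02-01", "2025-03-09",
                                  "2025-04-01", "2025-04-01"))
)

# n() == 2 misses "a1", which appears three times on the same date.
exact_two <- toy %>%
  filter(!is.na(`Unique Identifier`)) %>%
  group_by(`Unique Identifier`, `Apprehension Date`) %>%
  filter(n() == 2) %>%
  ungroup()

# n() > 1 catches any key that repeats.
any_repeat <- toy %>%
  filter(!is.na(`Unique Identifier`)) %>%
  group_by(`Unique Identifier`, `Apprehension Date`) %>%
  filter(n() > 1) %>%
  ungroup()

# Deduplicate only rows that have an identifier; keep NA-identifier rows as they are.
deduped <- bind_rows(
  toy %>%
    filter(!is.na(`Unique Identifier`)) %>%
    distinct(`Unique Identifier`, `Apprehension Date`, .keep_all = TRUE),
  toy %>%
    filter(is.na(`Unique Identifier`))
)
```

Whether two NA-identifier rows with the same timestamp are really the same arrest is a question for someone who knows the data, which is why the sketch leaves them alone.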

For later on I’m going to make a version of this national admin arrest data that I can merge with a shapefile for mapping.

Filtered_AdminArrest <- cleaned_AdminArrest %>%
  drop_na(`Apprehension AOR`)

View(Filtered_AdminArrest)
Mutate_AdminArrest <- Filtered_AdminArrest %>%
  filter(`Apprehension AOR`!= "HQ Area of Responsibility")  %>%
  mutate(ApprehensionAORcleaned1 = str_remove(`Apprehension AOR`,"Area of Responsibility")) %>%
  mutate(ApprehensionAORcleaned2 = str_remove(`ApprehensionAORcleaned1`,"HQ")) %>%
  mutate(ApprehensionAORcleaned3 = str_replace_all(`ApprehensionAORcleaned2`,"St. Paul", "St Paul"))
  

Mutate_AdminArrest %>%
  group_by(ApprehensionAORcleaned3) %>%
  tally()
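Because these cleaned names are meant to be joined to a shapefile, stray whitespace matters: removing "Area of Responsibility" leaves a trailing space behind. Also, in a regular expression the "." in "St. Paul" matches any character. A small sketch of a stricter cleanup, using made-up input strings and stringr's `fixed()` and `str_squish()`:

```r
library(stringr)

# Two example labels in the same shape as the Apprehension AOR column.
aor_raw <- c("San Francisco Area of Responsibility",
             "St. Paul Area of Responsibility")

# fixed() matches the text literally instead of as a regular expression.
step1 <- str_remove(aor_raw, fixed("Area of Responsibility"))
step2 <- str_replace_all(step1, fixed("St. Paul"), "St Paul")

# str_squish() trims leading/trailing whitespace and collapses inner runs.
aor_clean <- str_squish(step2)
```

The target spelling ("St Paul") is taken from the code above; whether it matches the shapefile's own labels still has to be checked against that file.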

I’m going to use this for QGIS mapping, so here I’m exporting this data frame as a CSV.

write_csv(Mutate_AdminArrest, "/Users/eleanorprickettmorgan/Desktop/DeportationData/AdminArrestMapping.csv")

Checking for Consistency Issues

Apprehension State

To check for consistency issues, I’m going to look at the columns that primarily hold character data. So for example, for states I know that I ideally want 50, and I would check by doing the following:

cleaned_AdminArrest %>%
  group_by(`Apprehension State`) %>%
  tally()

Here there are a couple of consistency issues. Some of the entries in the state column are three locations labeled “ARMED FORCES - EUROPE, ARMED FORCES - THE AMERICAS, ARMED SERVICES - THE PACIFIC.” I have no idea what differentiates armed forces vs. services, nor am I clear whether these are arrests that took place on military bases or as a part of military operations. Worth noting that we have 62 rows instead of 50, which is partially because of territories and other jurisdictions associated with the US (e.g. Puerto Rico, Guam, the Federated States of Micronesia). District of Columbia is also its own entry, and it’s not a state. There are also two lone entries in areas outside of US control, one just labeled “MEXICO,” the other labeled “TAMAULIPAS,” which is a state in Mexico. Those seem like individual data entry issues or miscategorizations. Aside from that, the states themselves are spelled in consistent ways.
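Rather than scanning 62 rows by eye, the non-state entries can be pulled out by comparing against R's built-in `state.name` vector. This is a sketch on a made-up set of observed values; it assumes the column is upper-case, as the labels quoted above suggest.

```r
# Hypothetical sample of values from the Apprehension State column.
observed <- c("CALIFORNIA", "TEXAS", "DISTRICT OF COLUMBIA",
              "GUAM", "TAMAULIPAS", NA)

# state.name ships with base R and holds the 50 state names.
states <- toupper(state.name)

# Entries that are not one of the 50 states (ignoring blanks).
not_states <- setdiff(observed[!is.na(observed)], states)

# States that never show up in the data.
missing_states <- setdiff(states, observed)
```

On the real data, `observed` would be `unique(cleaned_AdminArrest$"Apprehension State")`.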

AOR (Area of Responsibility)

Now I’m going to perform a similar test on the AOR column for consistency. From the ICE.gov website I’m expecting 25 AORs.

cleaned_AdminArrest %>%
  group_by(`Apprehension AOR`) %>%
  tally()

Here I got 27 rows back. One corresponds to N/A, with 5223 entries not assigned to a specific geography. There’s also a mysterious HQ Area of Responsibility with only 50 entries, which is not listed online. I would again have to ask someone who works with this data regularly what this means.

Final Program

Now I’ll do the same with the final program column:

cleaned_AdminArrest %>%
  group_by(`Final Program`) %>%
  tally()

Here the most obvious question, which comes up first, is what the difference is between “287G Program” and “287g Task Force.” Because my project is in California, which is (allegedly) not participating in 287g, I’m not worried about that. I would want to double check with an expert that some of these categories do not refer to the same thing in practice. I am concerned about the 487 Juveniles listed here.

Final Program Group

Because this is ICE data, I would assume everything in the final program group column is ICE, but I’m just double checking.

cleaned_AdminArrest %>%
  group_by(`Final Program Group`) %>%
  tally()

That produced just one row, so that’s consistent.

Apprehension Method

Now I’m checking for apprehension method:

cleaned_AdminArrest %>%
  group_by(`Apprehension Method`) %>%
  tally()

In terms of how things are literally spelled yes there is consistency. In terms of what the data is saying here, I do have a lot of questions. For example only one person is listed as arrested in the “Presented During Inspection” category, which feels like a contradiction with the courthouse and field office check-ins having so many individuals grabbed by ICE. However, those might be a different category of enforcement that’s included in another data set (i.e. detentions). This is probably my flawed understanding of the legal system here, but still something I want to note.

Apprehension Criminality

Now I’m going to check for apprehension criminality:

cleaned_AdminArrest %>%
  group_by(`Apprehension Criminality`) %>%
  tally()

Here I get back only three rows, which is consistent with my understanding of how folks are categorized in these situations.

Case Status

Now I’ll check for case status:

cleaned_AdminArrest %>%
  group_by(`Case Status`) %>%
  tally()

Here I get back 14 rows with categories listed. The first chunk are numbered starting at 0, but notably missing 1 and 2. I don’t know what those categories could be. There are 4462 who are not in a category at all, which is confusing because one category is literally just for active cases. There are also some acronyms I don’t understand. I assume “9-VR Witnessed” is voluntary removal witnessed, but how is that different from voluntary departure? I’m also concerned about the 41 deaths listed. I’d also like to understand the difference between “L-Legalization - Permanent Residence Granted” and “Z-SAW - Permanent Residence Granted.”

Case Category

Now I’ll look at Case Category:

cleaned_AdminArrest %>%
  group_by(`Case Category`) %>%
  tally()

Here there are no glaring consistency issues except the 4412 uncategorized cases.

Departure Country

Now I’ll check for Departure Country:

cleaned_AdminArrest %>%
  group_by(`Departure Country`) %>%
  tally()

I got back 193 rows which is in line with the number of countries in the world (depending on who you ask, but at least in line with the number of countries the US recognizes). Clicking through I didn’t see any super weird duplicates. One row is N/A but from looking at the other columns in conjunction, I think a departure country is only entered when the departure happens. So the active cases don’t have a departure country listed.

Final Order Yes or No

Now I’ll check final order Yes or No (hopefully 2 rows).

cleaned_AdminArrest %>%
  group_by(`Final Order Yes No`) %>%
  tally()

Here we get back Yes, No, or Blank (4412 rows).

Citizenship Country

Now I’ll look at citizenship country

cleaned_AdminArrest %>%
  group_by(`Citizenship Country`) %>%
  tally()

My citizenship countries outnumber my departure countries by 3. I would want to compare those and see where I have additional citizenship but not departure. Notably no one here is uncategorized.

Gender

Now I’m checking for gender which I’m expecting 2 rows for:

cleaned_AdminArrest %>%
  group_by(Gender) %>%
  tally()

We get back three rows: Female, Male, and Unknown. Important to note that unknown is written in, and isn’t just a blank.

Apprehension Site Landmark

Now I’m checking for Apprehension Site Landmark. This is where I’m expecting the most inconsistencies.

cleaned_AdminArrest %>%
  group_by(`Apprehension Site Landmark`) %>%
  tally()

Here I get back 500 rows, and there are major differences in capitalization and usage of dashes. Some include the state they were in, others do not. Some also include the program they were in (287(g), CAP, etc.). For the purposes of my project I’d probably take the California subset and clean that, but this would likely be the most onerous part of working with this data.

Checking for Missing Data

Now to check for missing data, I’m going to use/refer to the original unclean data “AdminArrest”:

MissingData <-colSums(is.na(AdminArrest))
print(MissingData)
         Apprehension Date         Apprehension State        Apprehension County           Apprehension AOR              Final Program 
                         0                      60154                     377067                       5887                          0 
       Final Program Group        Apprehension Method   Apprehension Criminality                Case Status              Case Category 
                         0                          0                          0                       9372                       9373 
             Departed Date          Departure Country         Final Order Yes No           Final Order Date                 Birth Date 
                    140929                     140983                       9372                     147872                          1 
                Birth Year        Citizenship Country                     Gender Apprehension Site Landmark          Alien File Number 
                         1                          0                          0                       8767                       5410 
               EID Case ID             EID Subject ID          Unique Identifier 
                      9372                          0                       5410 
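Raw counts are hard to compare at a glance, so it can help to express missingness as a share of rows and sort it. A minimal sketch on a toy data frame (the columns `a`, `b`, `c` are placeholders, not columns from the arrest file):

```r
# Toy data frame with some missing values.
toy <- data.frame(
  a = c(1, NA, 3, NA),
  b = c("x", "y", NA, "z"),
  c = 1:4
)

# colMeans() of a logical matrix gives the proportion of TRUE per column,
# i.e. the share of rows that are missing.
missing_share <- sort(colMeans(is.na(toy)), decreasing = TRUE)
print(round(missing_share, 2))
```

Applied to `AdminArrest`, this would put a column that is missing in every row, such as Apprehension County above, at the top with a share of 1.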

Additional Altering

Creating a New Column to Look at Time of Arrest

I want to mutate the apprehension date column so that I can filter later by year and month, and even potentially the time of day an administrative arrest took place.

new_AdminArrest <-cleaned_AdminArrest%>% 
  mutate(ApprehensionYear = year(`Apprehension Date`))%>% 
  mutate(ApprehensionMonth = month(`Apprehension Date`))%>% 
  mutate(ApprehensionDay = day(`Apprehension Date`))%>% 
  mutate(ApprehensionHour = hour(`Apprehension Date`))

View(new_AdminArrest)

Breaking out California and San Francisco Area of Responsibility Data.

Now I want to look at just administrative arrests in California, so I’m going to filter by the apprehension state column.

CA_AdminArrest <- new_AdminArrest %>%
  filter(`Apprehension State` == "CALIFORNIA")
View(CA_AdminArrest)

I want to filter that further to just the administrative arrests in the SF Area of Responsibility. Even though that’s actually a lot larger than the geographic scope my project is looking at, it at least includes my area, the Bay Area.

SFAOR_AdminArrest <- CA_AdminArrest %>% 
  filter(`Apprehension AOR` == "San Francisco Area of Responsibility")
View(SFAOR_AdminArrest)

Analysis

Administrative Arrests Over Time

A really basic comparison I might make to start is comparing the number of Administrative Arrests by year.

SFAOR_AdminArrest %>% 
  group_by(ApprehensionYear) %>% 
  tally()

But the table I get here actually isn’t that informative given that I only have data from three years. Maybe I just want to see the change over time by month under just this new administration (2025), so I’d likely do the following:

SFAOR_AdminArrestbyMonth <-SFAOR_AdminArrest %>% 
  filter(ApprehensionYear == "2025") %>% 
  group_by(ApprehensionMonth) %>% 
  tally()
View(SFAOR_AdminArrestbyMonth)

Just looking at the table I see there’s an increase in June which held in July, but it might be helpful to visualize it, so I’m going to plot it out:

ggplot(data=SFAOR_AdminArrestbyMonth) +
  geom_line(aes(x=ApprehensionMonth, y=n)) +
  # Set limits and breaks in a single x scale; a separate xlim() call would be replaced by scale_x_continuous()
  scale_x_continuous(limits = c(1, 11), breaks = 1:11) +
  scale_y_continuous(limits = c(0,800), breaks = seq(0, 800, by = 100)) +
  labs( x = "Month", y = "Number of Arrests") +
  ggtitle("Administrative Arrests by Month") +
  theme(plot.title = element_text(hjust = 0.5))


Administrative Arrests by Hour

Anecdotally, I’ve also heard this thing from a number of advocates that ICE actions (arrests) tend to happen in the early morning. I want to check if that’s true, so I’m going to look at the ApprehensionHour column that I mutated out of the original Apprehension Date column. It’s on a 24 hour clock (no am/pm). I’m going to use the CA_AdminArrest data because according to advocates this represents ICE policy so it should be true across the state.

CA_AdminArrest_byHour <- CA_AdminArrest %>%
  group_by(ApprehensionHour) %>%
  tally()

View(CA_AdminArrest_byHour)

To see that visually I’m going to chart it.

ggplot(data=CA_AdminArrest_byHour, aes(x = ApprehensionHour, y = n)) +
  geom_bar(stat="identity", width=0.5, fill= "steelblue") +
  labs( x = "Hour of the Day", y = "Number of Arrests") +
  ggtitle("Administrative Arrests by Hour of the Day from Sept 2023 through Oct 2025") +
  scale_x_continuous(n.breaks=23)

This shows that the number of arrests peaks between 9 and 10 am, and if the actual arrests are taking place at that hour we can infer that ICE/DHS had to be active before then. The majority of the activity seems to happen between 7am and 1pm.
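One caveat worth checking before leaning on the hourly chart: the summary at the top shows a minimum Apprehension Date of exactly 00:00:00, so some records may carry a date with no real time of day. That is an assumption on my part, not something confirmed by the data documentation. A sketch of how to set those rows aside, on toy timestamps (the column name `ts` is a placeholder):

```r
library(dplyr)
library(lubridate)

# Made-up timestamps; two of them sit at exactly midnight.
toy <- tibble(
  ts = ymd_hms(c("2025-06-01 00:00:00", "2025-06-01 09:15:00",
                 "2025-06-02 09:40:00", "2025-06-03 00:00:00",
                 "2025-06-03 13:05:30"))
)

by_hour <- toy %>%
  # Flag timestamps with no hour, minute, or second component.
  mutate(midnight_exact = hour(ts) == 0 & minute(ts) == 0 & second(ts) == 0) %>%
  filter(!midnight_exact) %>%
  count(hour = hour(ts))
```

If the hour-0 bar shrinks a lot after this filter, the midnight entries were probably placeholders rather than real midnight arrests.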

Countries of Origin

Another thing I’m interested in is the top countries of origin for folks being arrested under the current administration. I’m interested both statewide and for just the San Francisco AOR, but I’ll start with just California.

CaliforniaCitizenshipCountry2025 <- CA_AdminArrest %>%
  filter(ApprehensionYear == "2025") %>%
  group_by(`Citizenship Country`) %>%
  tally() 

View(CaliforniaCitizenshipCountry2025)

When I do this and sort by values, my top 10 citizenship countries are Mexico, Guatemala, El Salvador, Colombia, Honduras, India, Venezuela, China, Nicaragua and Peru. To compare this to the SFAOR I’m doing a similar analysis:

SFAORCitizenshipCountry2025 <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025") %>%
  group_by(`Citizenship Country`) %>%
  tally() 

View(SFAORCitizenshipCountry2025)

When I sort this table by the count, I get slightly different results, but there is a lot of overlap. In the San Francisco Area of Responsibility the top 10 citizenship countries for arrested folks are Mexico, India, Guatemala, El Salvador, Colombia, Honduras, Peru, Nicaragua, Venezuela, and China. Mexico notably outpaces the rest of these places in number of arrests.
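The sorting I did by hand in the viewer can also be done in code, which makes the top-10 list reproducible. A minimal sketch on toy data using dplyr's `count(sort = TRUE)` and `slice_head()`; the rows are invented and the real call would use `n = 10`:

```r
library(dplyr)

# Made-up rows standing in for arrests.
toy <- tibble(
  `Citizenship Country` = c("MEXICO", "MEXICO", "MEXICO",
                            "INDIA", "INDIA", "PERU")
)

top <- toy %>%
  count(`Citizenship Country`, sort = TRUE) %>%  # tally and order by n, descending
  slice_head(n = 2)                              # keep the first rows only
```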

Criminality

Something else I’m interested in is the “criminality” of those arrested in the San Francisco Area of Responsibility. Again I want to look just in the context of the current administration. Those who are arrested are assigned to one of three categories of criminality, and I’m interested in the proportion in which those categories show up in arrests.

SFAORArrestCriminality2025 <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAORArrestCriminality2025)

I want to see this visually, so even though I’m not a fan of pie charts I’m going to look at it in a pie chart.

ggplot(SFAORArrestCriminality2025, aes(x="Apprehension Criminality", y=n, fill=`Apprehension Criminality`)) +
  geom_bar(stat="identity", width=1, color="white") +
  coord_polar("y", start=0) +
  ggtitle("Apprehension Criminality SFAOR 2025") +
  theme_void()

For those arrested, the majority are convicted criminals, according to the data, but it’s notable that a sizable chunk are categorized under “other immigration violation.” Arrests are supposed to represent individuals who have a real case against them, so the convicted criminal category should represent the majority. And these arrests are not the totality of deportations. If you look at the deportation data set in addition to this arrests data, you would find the majority of folks coming into contact with ICE/DHS do not have any criminal convictions. It’s also worth noting that the convicted criminal category doesn’t break down further into what crimes folks were convicted of. Other reporting has confirmed that there are many folks who end up arrested where the basis of their conviction is a traffic violation. I talked to one lawyer in the Bay Area who confirmed they had a client where a prior conviction came in the form of a citation for failing to pay for BART. Are those crimes the same as murder? No, but under this categorization they’re flattened into one category together.

Just to make sure my analysis isn’t off I’m going to compare SF in 2025 to all of California.

CA_ArrestCriminality2025 <- CA_AdminArrest %>%
  filter(ApprehensionYear == "2025") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(CA_ArrestCriminality2025)

Here is that information visualized:

ggplot(CA_ArrestCriminality2025, aes(x="Apprehension Criminality", y=n, fill=`Apprehension Criminality`)) +
  geom_bar(stat="identity", width=1, color="white") +
  coord_polar("y", start=0) +
  ggtitle("Apprehension Criminality All of California 2025") +
  theme_void()

The proportions I get here are very different. Other Immigration Violator is nearly the same value as Convicted Criminal. I’m curious as to what’s at the root of this discrepancy in San Francisco. I do wonder if this has something to do with the lack of indiscriminate raids in the Bay Area, which means that in places like LA there is a much higher proportion of people who are being swept up by this system.
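Comparing two pie charts by eye is rough, so it may be clearer to compute the shares directly and put both regions in one table. A sketch on invented rows; the `region` and `criminality` column names and the category labels are placeholders for illustration:

```r
library(dplyr)

# Made-up arrests: 4 in the SF AOR, 4 statewide.
toy <- tibble(
  region      = c(rep("SF AOR", 4), rep("California", 4)),
  criminality = c("Convicted", "Convicted", "Convicted", "Other",
                  "Convicted", "Convicted", "Other", "Other")
)

shares <- toy %>%
  count(region, criminality) %>%
  group_by(region) %>%
  mutate(share = n / sum(n)) %>%  # proportion within each region
  ungroup()
```

On the real data the SF AOR rows are a subset of the California rows, so a cleaner comparison might be SF AOR against the rest of California rather than against the whole state.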

I want to do a visualization showing how arrest criminality changes by month under this administration so I’m going to filter to make multiple tables:

SFAORJan25Crim <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025", ApprehensionMonth == "1") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAORJan25Crim)
SFAORFeb25Crim <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025", ApprehensionMonth == "2") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAORFeb25Crim)
SFAORMar25Crim <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025", ApprehensionMonth == "3") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAORMar25Crim)
SFAORApr25Crim <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025", ApprehensionMonth == "4") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAORApr25Crim)
SFAORMay25Crim <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025", ApprehensionMonth == "5") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAORMay25Crim)
SFAORJun25Crim <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025", ApprehensionMonth == "6") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAORJun25Crim)
SFAORJul25Crim <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025", ApprehensionMonth == "7") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAORJul25Crim)
SFAORAug25Crim <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025", ApprehensionMonth == "8") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAORAug25Crim)
SFAORSept25Crim <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025", ApprehensionMonth == "9") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAORSept25Crim)
SFAOROct25Crim <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025", ApprehensionMonth == "10") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAOROct25Crim)
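The ten tables above can also be produced as one long table by grouping on month and criminality together, which is the shape ggplot wants for a chart over time. A minimal sketch on toy data (values are invented; the column names match the ones created earlier):

```r
library(dplyr)
library(ggplot2)

# Made-up rows; one falls outside 2025 and should be dropped.
toy <- tibble(
  ApprehensionYear  = c(2025, 2025, 2025, 2025, 2024),
  ApprehensionMonth = c(1, 1, 2, 2, 12),
  `Apprehension Criminality` = c("A", "B", "A", "A", "A")
)

by_month <- toy %>%
  filter(ApprehensionYear == 2025) %>%
  count(ApprehensionMonth, `Apprehension Criminality`)

# Stacked bars: one bar per month, split by criminality category.
p <- ggplot(by_month,
            aes(x = factor(ApprehensionMonth), y = n,
                fill = `Apprehension Criminality`)) +
  geom_col() +
  labs(x = "Month", y = "Number of Arrests")
```

Swapping `geom_col()` for `geom_col(position = "fill")` would show each month as proportions instead of counts.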

Arrest Pickup Location and Apprehension Method

SFAOR_AdminArrest2025 <-SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025") %>%
  group_by(`Apprehension Method`,`Apprehension Site Landmark`)

view(SFAOR_AdminArrest2025)

I want to look at where the apprehension method is 287g.

SFAOR_AdminArrest2025_287g <-SFAOR_AdminArrest2025 %>%
  filter(`Apprehension Method`== "287(g) Program")

view(SFAOR_AdminArrest2025_287g)

I want to look at the apprehension method involving local incarceration.

SFAOR_AdminArrest2025_local <- SFAOR_AdminArrest2025 %>%
  filter(`Apprehension Method`== "CAP Local Incarceration")

View (SFAOR_AdminArrest2025_local)

Now I want a tally of apprehension site landmarks.

SFAPLandmarks <- SFAOR_AdminArrest2025 %>%
  group_by(`Apprehension Site Landmark`)%>%
  tally()

view(SFAPLandmarks)

These are unfortunately vague, and they’re worth looking at for me personally, but they do not make for a great visualization.

Mapping Arrest AOR against detention facilities

I’m going to start by taking a shape file of the AOR facilities and reading it in.

ICE_AOR <- st_read("/Users/eleanorprickettmorgan/Desktop/DeportationData/ERO files/ice-aor-shp/ice-aor-shp.shp")
Reading layer `ice-aor-shp' from data source `/Users/eleanorprickettmorgan/Desktop/DeportationData/ERO files/ice-aor-shp/ice-aor-shp.shp' using driver `ESRI Shapefile'
Simple feature collection with 26 features and 2 fields (with 1 geometry empty)
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -179.1467 ymin: 13.23419 xmax: 179.7785 ymax: 71.38782
Geodetic CRS:  NAD83
View(ICE_AOR)
ICE_AOR_CLEAN <- ICE_AOR %>%
  drop_na(offc_nm)

View(ICE_AOR_CLEAN)

I’m going to use this to map in QGIS.

# append = FALSE overwrites the layer if ICE_AOR_CLEAN.shp already exists
st_write(ICE_AOR_CLEAN, "ICE_AOR_CLEAN.shp", append = FALSE)

Demographics in SFAOR

Gender Breakdown.

SFAORArrestGender2025 <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025") %>%
  group_by(`Gender`) %>%
  tally() 

View(SFAORArrestGender2025)
ggplot(SFAORArrestGender2025, aes(x="Gender", y=n, fill=`Gender`)) +
  geom_bar(stat="identity", width=1, color="white") +
  coord_polar("y", start=0) +
  ggtitle("Arrest Gender SFAOR 2025") +
  theme_void()

Age Breakdown.

SFAOR_AdminArrest_Age2025 <- SFAOR_AdminArrest %>%
  mutate(Age = 2025 - `Birth Year`) %>%
  filter(ApprehensionYear == "2025") %>%
  group_by(Age)  %>%
  tally()

View(SFAOR_AdminArrest_Age2025)

This table is not so useful; I think I’d like to see it visually.

ggplot(data=SFAOR_AdminArrest_Age2025, aes(x = Age, y =n)) +
  geom_bar(stat="identity", width=0.5, fill= "steelblue") +
  labs( x = "Age", y = "Number of Arrests") +
  ggtitle("Arrests by Age SFAOR") +
  scale_x_continuous(n.breaks=20)

summary(SFAOR_AdminArrest_Age2025$Age)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   3.00   22.25   40.50   40.59   58.75   84.00 
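Note that `summary()` above is run on the tally table, so it describes the list of distinct ages (one row per age), not the ages of the people arrested. To get the median and mean age of arrestees, the summary has to be taken on row-level data or weighted by the counts. A sketch on invented birth years, where age is approximated as arrest year minus birth year:

```r
library(dplyr)

# Made-up rows: three people born in 1990, one in 1985, one in 2000.
toy <- tibble(
  `Birth Year`     = c(1990, 1990, 1990, 1985, 2000),
  ApprehensionYear = 2025
)

ages <- toy %>%
  mutate(Age = ApprehensionYear - `Birth Year`)  # approximate age at arrest

tally_tbl <- ages %>% count(Age)

mean_of_distinct <- mean(tally_tbl$Age)                     # what the tally-based summary reports
mean_row_level   <- mean(ages$Age)                          # mean over people
mean_weighted    <- weighted.mean(tally_tbl$Age, tally_tbl$n)  # same result, from the tally
```

In this toy case the mean over distinct ages is about 33.3 while the mean over people is 34; on the real data the gap could be larger if arrests cluster in a narrow age band.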

How Many Arrests result in Deportation

SFAORDepartureCountry <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025") %>%
  group_by(`Departure Country`) %>%
  tally()
ggplot(data=SFAORDepartureCountry, aes(x = `Departure Country`, y =n)) +
  geom_bar(stat="identity", width=0.5, fill= "steelblue") +
  labs( x = "Country", y = "Number of Arrests") +
  ggtitle("Departure Country SFAOR 2025") 

SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025") %>%
  group_by(`Case Status`) %>%
  tally()
---
title: "J221 EPM Final Project — Administrative Arrests"
author: "Ellie Prickett-Morgan"
date: "2025-12-10"
output: html_notebook
---

## Setup and Libraries

```{r}
knitr::opts_chunk$set(echo = TRUE, error = TRUE, cache=TRUE, autodep = TRUE, message = FALSE, progress_bar = FALSE)

library(readr)
library(tidyverse) 
library(dplyr) 
library(ggplot2) 
library(knitr) 
library(psych)
library(lme4)
library(boot)
library(ggeffects) 
library(DescTools)
library(tidycensus)
library(xml2)
library(tmap)
library(tmaptools)
library(leaflet)
library(sf)
library(leaflet.extras)
library(raster)
library(tigris)
library(sp)
library(scales)
library(stringi)
library(leaflegend)
library(readxl)
library(readr)
library(stringr)
library(janitor)
library(ggrepel)
```

```{r}
setwd("/Users/eleanorprickettmorgan/Desktop/DeportationData/")
```

## Importing Data

I'm skipping 6 rows because the first 6 rows of the Excel file hold summary information; the data itself doesn't start until after them. Then I'm using summary() to check what my time range is, looking specifically at the Apprehension Date column.

```{r}
AdminArrest <- read_excel("/Users/eleanorprickettmorgan/Desktop/DeportationData/ERO Admin Arrests_LESA-STU-FINAL Release_raw.xlsx", skip = 6)
View(AdminArrest)
summary(AdminArrest)
```

## Cleaning the Data

### Checking for Duplicates

To check for duplicates I'm going to look at the unique identifier column. 

```{r}
AdminArrest %>%
  group_by(`Unique Identifier`) %>%
  filter(n()==2)

```

I get back 27,214 rows whose Unique Identifier appears exactly twice. But I can tell that there are certain instances where the same identifier appears twice with a different apprehension date. So I'm going to try again, looking at Apprehension Date in addition to the Unique Identifier column.

```{r}
AdminArrest %>%
  group_by(`Unique Identifier`, `Apprehension Date`) %>%
  filter(n()==2)
```

When I filter by both of these things I get 3,304 rows. That suggests that certain individuals were arrested more than once, or have been in the system multiple times over the course of many years, so this is closer to the true number of duplicates. Now I'm going to remove those potential duplicates using the distinct function.

```{r}
cleaned_AdminArrest <- AdminArrest %>%
  distinct(`Unique Identifier`,`Apprehension Date`, .keep_all = TRUE)
View(cleaned_AdminArrest)
```

When I run this I now have 289,927 entries instead of the original 291,722. I would probably still ask an expert who works with this data if there are other ways I might not be catching duplicates, but this makes the most sense to me.

For later on I'm going to make a version of this national admin arrest data that I can merge with a shapefile for mapping. 

```{r}
Filtered_AdminArrest <- cleaned_AdminArrest %>%
  drop_na(`Apprehension AOR`)

View(Filtered_AdminArrest)
```

```{r}
Mutate_AdminArrest <- Filtered_AdminArrest %>%
  filter(`Apprehension AOR`!= "HQ Area of Responsibility")  %>%
  mutate(ApprehensionAORcleaned1 = str_remove(`Apprehension AOR`,"Area of Responsibility")) %>%
  mutate(ApprehensionAORcleaned2 = str_remove(`ApprehensionAORcleaned1`,"HQ")) %>%
  mutate(ApprehensionAORcleaned3 = str_replace_all(`ApprehensionAORcleaned2`,"St. Paul", "St Paul"))
  

Mutate_AdminArrest %>%
  group_by(ApprehensionAORcleaned3) %>%
  tally()

```

I'm going to use this for QGIS mapping, so here I'm exporting this data frame as a CSV.

```{r}
write_csv(Mutate_AdminArrest, "/Users/eleanorprickettmorgan/Desktop/DeportationData/AdminArrestMapping.csv")
```


### Checking for Consistency Issues

#### Apprehension State

To check for consistency issues, I'm going to look at the columns that primarily hold character data. So for example, for states I know that I ideally want 50, and I would check by doing the following:

```{r}
cleaned_AdminArrest %>%
  group_by(`Apprehension State`) %>%
  tally()
```

Here there are a couple of consistency issues. The State column includes three locations labeled "ARMED FORCES - EUROPE," "ARMED FORCES - THE AMERICAS," and "ARMED SERVICES - THE PACIFIC." I have no idea what differentiates armed forces from armed services, nor am I clear whether these are arrests that took place on military bases or as part of military operations. It's worth noting that we have 62 rows instead of 50, which is partly because of US territories and associated jurisdictions (e.g. Puerto Rico, Federated States of Micronesia, Guam). The District of Columbia is also its own entry, and it's not a state. There are also two lone entries for places outside of US control, one just labeled "MEXICO," the other labeled "TAMAULIPAS," which is a state in Mexico. Those seem like individual data entry issues or miscategorizations. Aside from that, the states themselves are spelled consistently.

#### AOR (Area of Responsibility)

Now I'm going to perform a similar test on the AOR column for consistency. From the ICE.gov website I'm expecting 25 AORs.

```{r}
cleaned_AdminArrest %>%
  group_by(`Apprehension AOR`) %>%
  tally()

```

Here I got 27 rows back. One corresponds to N/A, with 5223 entries not assigned to a specific geography. There's also a mysterious "HQ Area of Responsibility," which is not listed online and has only 50 entries. I would again have to ask someone who works with this data regularly what this means.

#### Final Program

Now I'll do the same with the final program column: 

```{r}
cleaned_AdminArrest %>%
  group_by(`Final Program`) %>%
  tally()
```

Here the most obvious issue, which shows up first, is the difference between "287G Program" and "287g Task Force." Because my project is in California, which is (allegedly) not participating in 287(g), I'm not worried about that, but I would want to double check with an expert whether any of these categories refer to the same thing in practice. I am concerned about the 487 Juveniles listed here.

#### Final Program Group

Because this is ICE data, I would assume everything in the final program group column is ICE, but I'm just double checking. 

```{r}
cleaned_AdminArrest %>%
  group_by(`Final Program Group`) %>%
  tally()
```
That produced just one row, so that's consistent. 

#### Apprehension Method

Now I'm checking for apprehension method:

```{r}
cleaned_AdminArrest %>%
  group_by(`Apprehension Method`) %>%
  tally()
```

In terms of how things are literally spelled, yes, there is consistency. In terms of what the data is saying, I do have a lot of questions. For example, only one person is listed as arrested in the "Presented During Inspection" category, which feels like a contradiction given how many individuals have been grabbed by ICE at courthouses and field office check-ins. However, those might be a different category of enforcement that's included in another data set (i.e. detentions). This is probably my flawed understanding of the legal system, but it's still something I want to note.

#### Apprehension Criminality

Now I'm going to check for apprehension criminality: 

```{r}
cleaned_AdminArrest %>%
  group_by(`Apprehension Criminality`) %>%
  tally()
```

Here I get back only three rows, which is consistent with my understanding of how folks are categorized in these situations.

#### Case Status

Now I'll check for case status:

```{r}
cleaned_AdminArrest %>%
  group_by(`Case Status`) %>%
  tally()
```

Here I get back 14 rows with categories listed. The first chunk is numbered starting at 0, but 1 and 2 are notably missing, and I don't know what those categories could be. There are 4462 rows that are not in a category at all, which is confusing because one category is literally just for active cases. There are also some acronyms I don't understand: I assume "9-VR Witnessed" is voluntary removal witnessed, but how is that different from voluntary departure? I'm also concerned about the 41 deaths listed. Finally, I'd like to understand the difference between "L-Legalization - Permanent Residence Granted" and "Z-SAW - Permanent Residence Granted."

#### Case Category

Now I'll look at Case Category: 

```{r}
cleaned_AdminArrest %>%
  group_by(`Case Category`) %>%
  tally()
```
Here there are no glaring consistency issues except the 4412 uncategorized cases. 

#### Departure Country

Now I'll check for Departure Country:

```{r}
cleaned_AdminArrest %>%
  group_by(`Departure Country`) %>%
  tally()
```

I got back 193 rows which is in line with the number of countries in the world (depending on who you ask, but at least in line with the number of countries the US recognizes). Clicking through I didn't see any super weird duplicates. One row is N/A but from looking at the other columns in conjunction, I think a departure country is only entered when the departure happens. So the active cases don't have a departure country listed. 

#### Final Order Yes or No

Now I'll check final order Yes or No (hopefully 2 rows).

```{r}
cleaned_AdminArrest %>%
  group_by(`Final Order Yes No`) %>%
  tally()
```
Here we get back Yes, No, or Blank (4412 rows).

#### Citizenship Country

Now I'll look at citizenship country:

```{r}
cleaned_AdminArrest %>%
  group_by(`Citizenship Country`) %>%
  tally()
```

My citizenship countries outnumber my departure countries by 3. I would want to compare the two lists and see which countries appear as a citizenship but not as a departure country. Notably, no one here is uncategorized.
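
One way that comparison could be done is with `anti_join()`, which returns the rows of one table that have no match in another. This is only a sketch on a made-up four-row table (`toy_cases` and its values are hypothetical); on the real data the same steps would start from `cleaned_AdminArrest`.

```r
library(dplyr)
library(tibble)

# Toy data (hypothetical): citizenship is always filled in,
# departure country is sometimes missing.
toy_cases <- tibble(
  `Citizenship Country` = c("MEXICO", "INDIA", "PERU", "CHINA"),
  `Departure Country`   = c("MEXICO", NA, "MEXICO", "INDIA")
)

# One row per country seen as a citizenship.
citizenship <- toy_cases %>%
  transmute(country = `Citizenship Country`) %>%
  distinct()

# One row per country seen as a departure country, ignoring NA.
departure <- toy_cases %>%
  filter(!is.na(`Departure Country`)) %>%
  transmute(country = `Departure Country`) %>%
  distinct()

# Countries that appear as a citizenship but never as a departure country.
citizenship_only <- anti_join(citizenship, departure, by = "country")
citizenship_only
```

Swapping the two arguments would show the reverse case, departure countries that never appear as a citizenship.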

#### Gender

Now I'm checking for gender which I'm expecting 2 rows for:
```{r}
cleaned_AdminArrest %>%
  group_by(Gender) %>%
  tally()
```

We get back three rows: Female, Male, and Unknown. Important to note that unknown is written in, and isn't just a blank. 

#### Apprehension Site Landmark

Now I'm checking for Apprehension Site Landmark. This is where I'm expecting the most inconsistencies.

```{r}
cleaned_AdminArrest %>%
  group_by(`Apprehension Site Landmark`) %>%
  tally()
```

Here I get back 500 rows, and there are major differences in capitalization and use of dashes. Some include the state they were in, others do not. Some also include the program they were in (287(g), CAP, etc.). For the purposes of my project I'd probably take the California subset and clean that, but this would likely be the most onerous part of working with this data.
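
A first pass at that cleanup could standardize case and dash spacing before doing anything by hand. The sketch below uses three invented landmark strings (`landmarks_raw` is hypothetical, not taken from the data) to show the idea; the real column would need more rules than this.

```r
library(stringr)

# Hypothetical landmark labels that differ only in case and dash spacing.
landmarks_raw <- c("SANTA CLARA - CAP", "Santa Clara-CAP", "santa clara  -  cap")

landmarks_clean <- landmarks_raw %>%
  str_to_upper() %>%                      # one capitalization everywhere
  str_replace_all("\\s*-\\s*", " - ") %>% # one spacing convention around dashes
  str_squish()                            # drop stray or doubled spaces

unique(landmarks_clean)
```

One caveat with this rule is that it also pads dashes inside hyphenated words, so a label like "NON-SPECIFIC" would become "NON - SPECIFIC". That is consistent, but it may not be what you want to display.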

### Checking for Missing Data

Now to check for missing data, I'm going to use/refer to the original unclean data "AdminArrest":

```{r}
MissingData <- colSums(is.na(AdminArrest))
print(MissingData)
```
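
Raw counts of missing values are easier to read next to the share of rows they represent. Here is a small base R sketch on a made-up data frame (`toy` and its columns are hypothetical) that puts counts and percentages in one sorted table.

```r
# Toy data (hypothetical) with some missing values.
toy <- data.frame(
  state  = c("CALIFORNIA", NA, "TEXAS", "CALIFORNIA"),
  aor    = c(NA, NA, "Dallas", "San Francisco"),
  gender = c("Male", "Female", "Unknown", "Male"),
  stringsAsFactors = FALSE
)

missing_count <- colSums(is.na(toy))   # number of NA values per column
missing_share <- colMeans(is.na(toy))  # fraction of rows that are NA per column

missing_summary <- data.frame(
  column      = names(missing_count),
  n_missing   = unname(missing_count),
  pct_missing = unname(round(100 * missing_share, 1))
)

# Sort so the columns with the most missing values come first.
missing_summary <- missing_summary[order(-missing_summary$n_missing), ]
missing_summary
```

Note that a written-in value such as "Unknown" in the Gender column is not counted by `is.na()`, so it would need its own check.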

### Additional Altering

#### Creating a New Column to Look at Time of Arrest

I want to mutate the apprehension date column so that I can filter later by year and month, and even potentially the time of day an administrative arrest took place.

```{r}
new_AdminArrest <- cleaned_AdminArrest %>%
  mutate(
    ApprehensionYear = year(`Apprehension Date`),
    ApprehensionMonth = month(`Apprehension Date`),
    ApprehensionDay = day(`Apprehension Date`),
    ApprehensionHour = hour(`Apprehension Date`)
  )

View(new_AdminArrest)

```

#### Breaking out California and San Francisco Area of Responsibility Data

Now I want to look at just administrative arrests in California, so I'm going to filter by the apprehension state column.

```{r}
CA_AdminArrest <- new_AdminArrest %>%
  filter(`Apprehension State` == "CALIFORNIA")
View(CA_AdminArrest)

```

I want to filter that further to just the administrative arrests in the SF Area of Responsibility. Even though that's a lot larger than the geographic scope of my project, it at least includes my area, the Bay Area.

```{r}
SFAOR_AdminArrest <- CA_AdminArrest %>% 
  filter(`Apprehension AOR` == "San Francisco Area of Responsibility")
View(SFAOR_AdminArrest)
```

## Analysis

### Administrative Arrests Over Time

A really basic comparison I might make to start is comparing the number of Administrative Arrests by year.

```{r}
SFAOR_AdminArrest %>% 
  group_by(ApprehensionYear) %>% 
  tally()
```

But the table I get here isn't that informative, given that I only have data from three years. Maybe I just want to see the change by month under this new administration (2025), so I'd likely do the following:

```{r}
SFAOR_AdminArrestbyMonth <-SFAOR_AdminArrest %>% 
  filter(ApprehensionYear == "2025") %>% 
  group_by(ApprehensionMonth) %>% 
  tally()
View(SFAOR_AdminArrestbyMonth)

```

Just looking at the table I see there's an increase in June which held in July, but it might be helpful to visualize it, so I'm going to plot it out:

```{r}
ggplot(data = SFAOR_AdminArrestbyMonth) +
  geom_line(aes(x = ApprehensionMonth, y = n)) +
  # Limits and breaks go in one x scale; a separate xlim() call would be
  # replaced by scale_x_continuous() and its limits dropped.
  scale_x_continuous(limits = c(1, 11), breaks = 1:11) +
  scale_y_continuous(limits = c(0, 800), breaks = seq(0, 800, by = 100)) +
  labs(x = "Month", y = "Number of Arrests") +
  ggtitle("Administrative Arrests by Month") +
  theme(plot.title = element_text(hjust = 0.5))
```


#### Administrative Arrests by Hour

Anecdotally, I've also heard from a number of advocates that ICE actions (arrests) tend to happen in the early morning. I want to check if that's true, so I'm going to look at the ApprehensionHour column that I derived from the original Apprehension Date column. It's on a 24-hour clock (no am/pm). I'm going to use the CA_AdminArrest data because, according to advocates, this represents ICE policy, so it should be true across the state.

```{r}
CA_AdminArrest_byHour <- CA_AdminArrest %>%
  group_by(ApprehensionHour) %>%
  tally()

View(CA_AdminArrest_byHour)

```

To see that visually I'm going to chart it. 

```{r}
ggplot(data=CA_AdminArrest_byHour, aes(x = ApprehensionHour, y = n)) +
  geom_bar(stat="identity", width=0.5, fill= "steelblue") +
  labs( x = "Hour of the Day", y = "Number of Arrests") +
  ggtitle("Administrative Arrests by Hour of the Day from Sept 2023 through Oct 2025") +
  scale_x_continuous(n.breaks=23)
```
This shows that the number of arrests peaks between 9 and 10 am, and if the actual arrests are taking place at that hour, we can infer that ICE/DHS had to be active before then. The majority of the activity seems to happen between 7 am and 1 pm.
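
To put a number on "the majority," the share of arrests in a window of hours can be computed straight from the hourly tally. The sketch below uses invented counts (`toy_by_hour` is hypothetical) and treats hours 7 through 12 as the 7 am to 1 pm window, since hour 12 covers 12:00 to 12:59.

```r
library(dplyr)
library(tibble)

# Toy hourly tally (hypothetical counts), shaped like CA_AdminArrest_byHour.
toy_by_hour <- tibble(
  ApprehensionHour = c(5, 8, 9, 10, 12, 15, 20),
  n                = c(10, 30, 50, 40, 20, 30, 20)
)

# Share of all arrests recorded from 7:00 up to (but not including) 13:00.
share_7_to_13 <- toy_by_hour %>%
  summarise(share = sum(n[ApprehensionHour >= 7 & ApprehensionHour < 13]) / sum(n)) %>%
  pull(share)

share_7_to_13
```

It may also be worth looking at hour 0 separately, on the assumption that any records entered with a date but no time would land there.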

#### Countries of Origin

Another thing I'm interested in is the top countries of origin for folks being arrested under the current administration. I'm interested both statewide and in just the San Francisco AOR, but I'll start with just California.

```{r}
CaliforniaCitizenshipCountry2025 <- CA_AdminArrest %>%
  filter(ApprehensionYear == "2025") %>%
  group_by(`Citizenship Country`) %>%
  tally() 

View(CaliforniaCitizenshipCountry2025)
```

When I do this and sort by values, my top 10 citizenship countries are Mexico, Guatemala, El Salvador, Colombia, Honduras, India, Venezuela, China, Nicaragua and Peru. To compare this to the SFAOR, I'm doing a similar analysis:


```{r}
SFAORCitizenshipCountry2025 <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025") %>%
  group_by(`Citizenship Country`) %>%
  tally() 

View(SFAORCitizenshipCountry2025)
```

When I sort this table by the count, I get slightly different results, but there is a lot of overlap. In the San Francisco Area of Responsibility the top 10 citizenship countries for arrested folks are Mexico, India, Guatemala, El Salvador, Colombia, Honduras, Peru, Nicaragua, Venezuela, and China. Mexico notably outpaces the rest of these places in number of arrests. 
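
The sort-and-read-off step for both tallies could also be done in code with `slice_max()`, which keeps the rows with the largest values of a column. This is a sketch on invented counts (`toy_country_counts` is hypothetical); with the real tallies, `n = 10` would give the top 10.

```r
library(dplyr)
library(tibble)

# Toy tally (hypothetical counts), shaped like the citizenship country tables.
toy_country_counts <- tibble(
  `Citizenship Country` = c("MEXICO", "GUATEMALA", "INDIA", "PERU"),
  n                     = c(120, 45, 60, 12)
)

# The first argument is the column to rank by; n = 3 asks for the top three rows.
top_countries <- toy_country_counts %>%
  slice_max(n, n = 3)

top_countries
```

By default `slice_max()` keeps ties, so it can return more rows than requested; `with_ties = FALSE` forces an exact count.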

#### Criminality

Something else I'm interested in is the "criminality" of those arrested in the San Francisco Area of Responsibility. Again, I want to look just in the context of the current administration. Those who are arrested are assigned to one of three categories of criminality, and I'm interested in the proportion of arrests that falls into each category.

```{r}
SFAORArrestCriminality2025 <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAORArrestCriminality2025)
```
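
Since the question is about proportions, the tally can be turned into shares by dividing each count by the total. A minimal sketch on invented counts follows; `toy_crim` is hypothetical, and the third category label is a placeholder rather than the label used in the data.

```r
library(dplyr)
library(tibble)

# Toy tally (hypothetical counts); the third label is a placeholder.
toy_crim <- tibble(
  `Apprehension Criminality` = c("Convicted Criminal",
                                 "Other Immigration Violator",
                                 "Third category (placeholder)"),
  n = c(50, 20, 30)
)

# Each category's share of all arrests in the table.
toy_crim_share <- toy_crim %>%
  mutate(share = n / sum(n))

toy_crim_share
```

Having the shares as numbers also makes the later comparison between the SF AOR and all of California easier to state precisely than reading slices off two charts.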

I want to see this visually, so even though I'm not a fan of pie charts I'm going to look at it in a pie chart.

```{r}
ggplot(SFAORArrestCriminality2025, aes(x="Apprehension Criminality", y=n, fill=`Apprehension Criminality`)) +
  geom_bar(stat="identity", width=1, color="white") +
  coord_polar("y", start=0) +
  ggtitle("Apprehension Criminality SFAOR 2025") +
  theme_void()
```


For those arrested, the majority are convicted criminals according to the data, but it's notable that a sizable chunk are categorized under "other immigration violation." Arrests are supposed to represent individuals who have a real case against them, so the convicted criminal category should represent the majority. These arrests are also not the totality of deportations: if you look at the deportation data set in addition to this arrests data, you would find that the majority of folks coming into contact with ICE/DHS do not have any criminal convictions. It's also worth noting that the convicted criminal category doesn't break down further into what crimes folks were convicted of. Other reporting has confirmed that many folks end up arrested where the basis of their conviction is a traffic violation. I talked to one lawyer in the Bay Area who confirmed they had a client whose prior conviction came in the form of a citation for failing to pay for BART. Are those crimes the same as murder? No, but under this categorization they're flattened into one category.

Just to make sure my analysis isn't off I'm going to compare SF in 2025 to all of California.

```{r}
CA_ArrestCriminality2025 <- CA_AdminArrest %>%
  filter(ApprehensionYear == "2025") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(CA_ArrestCriminality2025)
```

Here is that information visualized:

```{r}
ggplot(CA_ArrestCriminality2025, aes(x="Apprehension Criminality", y=n, fill=`Apprehension Criminality`)) +
  geom_bar(stat="identity", width=1, color="white") +
  coord_polar("y", start=0) +
  ggtitle("Apprehension Criminality All of California 2025") +
  theme_void()
```

The proportions I get here are very different. Other Immigration Violator is nearly the same value as Convicted Criminal. I'm curious as to what's at the root of this discrepancy in San Francisco. I do wonder if this has something to do with the lack of indiscriminate raids in the Bay Area, which means that in places like LA there is a much higher proportion of people who are being swept up by this system.

I want to do a visualization showing how arrest criminality changes by month under this administration so I'm going to filter to make multiple tables:

```{r}
SFAORJan25Crim <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025", ApprehensionMonth == "1") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAORJan25Crim)
```

```{r}
SFAORFeb25Crim <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025", ApprehensionMonth == "2") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAORFeb25Crim)
```

```{r}
SFAORMar25Crim <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025", ApprehensionMonth == "3") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAORMar25Crim)
```

```{r}
SFAORApr25Crim <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025", ApprehensionMonth == "4") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAORApr25Crim)
```

```{r}
SFAORMay25Crim <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025", ApprehensionMonth == "5") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAORMay25Crim)
```

```{r}
SFAORJun25Crim <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025", ApprehensionMonth == "6") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAORJun25Crim)
```

```{r}
SFAORJul25Crim <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025", ApprehensionMonth == "7") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAORJul25Crim)
```

```{r}
SFAORAug25Crim <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025", ApprehensionMonth == "8") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAORAug25Crim)
```

```{r}
SFAORSept25Crim <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025", ApprehensionMonth == "9") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAORSept25Crim)
```

```{r}
SFAOROct25Crim <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025", ApprehensionMonth == "10") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAOROct25Crim)
```
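
The ten monthly tables above could also be produced as one long table by counting on month and criminality together. This is a sketch on a tiny invented data set (`toy_sfaor` is hypothetical and only mimics the relevant columns of `SFAOR_AdminArrest`), with a within-month share added.

```r
library(dplyr)
library(tibble)

# Toy data (hypothetical), one row per arrest.
toy_sfaor <- tibble(
  ApprehensionYear  = c(2025, 2025, 2025, 2025, 2024),
  ApprehensionMonth = c(1, 1, 2, 2, 1),
  `Apprehension Criminality` = c("Convicted Criminal",
                                 "Other Immigration Violator",
                                 "Convicted Criminal",
                                 "Convicted Criminal",
                                 "Convicted Criminal")
)

crim_by_month <- toy_sfaor %>%
  filter(ApprehensionYear == 2025) %>%
  # One row per month/category combination, with its count in n.
  count(ApprehensionMonth, `Apprehension Criminality`) %>%
  # Share of each category within its month.
  group_by(ApprehensionMonth) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()

crim_by_month
```

A long table like this can go straight into a single plot with month on the x axis and criminality as the fill, which avoids stitching ten separate tables back together.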


#### Arrest Pickup Location and Apprehension Method

```{r}
SFAOR_AdminArrest2025 <-SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025") %>%
  group_by(`Apprehension Method`,`Apprehension Site Landmark`)

view(SFAOR_AdminArrest2025)
```

I want to look at where the apprehension method is 287g.

```{r}
SFAOR_AdminArrest2025_287g <-SFAOR_AdminArrest2025 %>%
  filter(`Apprehension Method`== "287(g) Program")

view(SFAOR_AdminArrest2025_287g)
```

I also want to look at arrests where the apprehension method involves local incarceration.

```{r}
SFAOR_AdminArrest2025_local <- SFAOR_AdminArrest2025 %>%
  filter(`Apprehension Method`== "CAP Local Incarceration")

View(SFAOR_AdminArrest2025_local)
```

Now I want a tally of apprehension site landmarks.

```{r}
SFAPLandmarks <- SFAOR_AdminArrest2025 %>%
  group_by(`Apprehension Site Landmark`)%>%
  tally()

view(SFAPLandmarks)

```

These are unfortunately vague, and they're worth looking at for me personally, but they do not make for a great visualization. 

#### Mapping Arrest AOR against detention facilities

I'm going to start by taking a shape file of the AOR facilities and reading it in.

```{r}
ICE_AOR <- st_read("/Users/eleanorprickettmorgan/Desktop/DeportationData/ERO files/ice-aor-shp/ice-aor-shp.shp")
View(ICE_AOR)
```
```{r}
ICE_AOR_CLEAN <- ICE_AOR %>%
  drop_na(offc_nm)

View(ICE_AOR_CLEAN)
```

I'm going to use this to map in QGIS.

```{r}
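# delete_dsn = TRUE overwrites an existing file so re-running this chunk does not error
st_write(ICE_AOR_CLEAN, "ICE_AOR_CLEAN.shp", delete_dsn = TRUE)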
```

#### Demographics in SFAOR

Gender Breakdown.

```{r}
SFAORArrestGender2025 <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025") %>%
  group_by(`Gender`) %>%
  tally() 

View(SFAORArrestGender2025)

```

```{r}
ggplot(SFAORArrestGender2025, aes(x="Gender", y=n, fill=`Gender`)) +
  geom_bar(stat="identity", width=1, color="white") +
  coord_polar("y", start=0) +
  ggtitle("Arrest Gender SFAOR 2025") +
  theme_void()
```


Age Breakdown.

```{r}
SFAOR_AdminArrest_Age2025 <- SFAOR_AdminArrest %>%
  mutate(Age = 2025 - `Birth Year`) %>%
  filter(ApprehensionYear == "2025") %>%
  group_by(Age)  %>%
  tally()

View(SFAOR_AdminArrest_Age2025)
```

This table is not so useful, so I'd like to see it visually.

```{r}
ggplot(data=SFAOR_AdminArrest_Age2025, aes(x = Age, y =n)) +
  geom_bar(stat="identity", width=0.5, fill= "steelblue") +
  labs(x = "Age", y = "Number of Arrests") +
  ggtitle("Arrests by Age SFAOR") +
  scale_x_continuous(n.breaks=20)
```

```{r}
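# Weight each age by its count; the tally table has only one row per distinct age.
summary(rep(SFAOR_AdminArrest_Age2025$Age, times = SFAOR_AdminArrest_Age2025$n))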
```


#### How Many Arrests result in Deportation

```{r}
SFAORDepartureCountry <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025") %>%
  group_by(`Departure Country`) %>%
  tally()
```

```{r}
ggplot(data = SFAORDepartureCountry, aes(x = `Departure Country`, y = n)) +
  geom_bar(stat = "identity", width = 0.5, fill = "steelblue") +
  labs(x = "Country", y = "Number of Arrests") +
  ggtitle("Departure Country SFAOR 2025") +
  coord_flip()  # horizontal bars keep the country names readable
```


```{r}
SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025") %>%
  group_by(`Case Status`) %>%
  tally()
```
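
To get a single number for this section's question, one rough proxy is the share of 2025 arrests that have a departure country recorded. This rests on the assumption made earlier in these notes that a departure country is only entered once a departure has happened, which would need confirming with someone who knows the data. The sketch uses an invented five-row table (`toy_2025` is hypothetical).

```r
library(dplyr)
library(tibble)

# Toy data (hypothetical): NA means no departure country was recorded.
toy_2025 <- tibble(
  `Departure Country` = c("MEXICO", NA, "INDIA", NA, NA)
)

departure_summary <- toy_2025 %>%
  summarise(
    arrests        = n(),
    with_departure = sum(!is.na(`Departure Country`)),
    share_departed = with_departure / arrests
  )

departure_summary
```

Cross-checking this share against the Case Status tally above would show whether the two columns tell the same story.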

