Setup and Libraries

knitr::opts_chunk$set(echo = TRUE, error = TRUE, cache=TRUE, autodep = TRUE, message = FALSE, progress_bar = FALSE)

library(readr)
library(tidyverse) 
library(dplyr) 
library(ggplot2) 
library(knitr) 
library(psych)
library(lme4)
library(boot)
library(ggeffects) 
library(DescTools)
library(tidycensus)
library(xml2)
library(tmap)
library(tmaptools)
library(leaflet)
library(sf)
library(leaflet.extras)
library(raster)
library(tigris)
library(sp)
library(scales)
library(stringi)
library(leaflegend)
library(readxl)
library(stringr)
library(janitor)
library(ggrepel)
setwd("/Users/eleanorprickettmorgan/Desktop/DeportationData/")

Importing Data

I’m skipping the first 6 rows of the Excel file because they hold summary information rather than the data itself. Then I’m using summary() to check what my time range is, looking specifically at the Apprehension Date column.

AdminArrest <- read_excel("/Users/eleanorprickettmorgan/Desktop/DeportationData/ERO Admin Arrests_LESA-STU-FINAL Release_raw.xlsx", skip = 6)
View(AdminArrest)
summary(AdminArrest)
 Apprehension Date             Apprehension State Apprehension County Apprehension AOR   Final Program      Final Program Group Apprehension Method Apprehension Criminality Case Status        Case Category     
 Min.   :2023-09-01 00:00:00   Length:377067      Mode:logical        Length:377067      Length:377067      Length:377067       Length:377067       Length:377067            Length:377067      Length:377067     
 1st Qu.:2024-06-19 12:29:00   Class :character   NA's:377067         Class :character   Class :character   Class :character    Class :character    Class :character         Class :character   Class :character  
 Median :2025-03-12 13:50:00   Mode  :character                       Mode  :character   Mode  :character   Mode  :character    Mode  :character    Mode  :character         Mode  :character   Mode  :character  
 Mean   :2024-12-31 02:24:47                                                                                                                                                                                      
 3rd Qu.:2025-07-14 11:13:21                                                                                                                                                                                      
 Max.   :2025-10-16 00:31:07                                                                                                                                                                                      
                                                                                                                                                                                                                  
 Departed Date                 Departure Country  Final Order Yes No Final Order Date               Birth Date          Birth Year   Citizenship Country    Gender          Apprehension Site Landmark Alien File Number 
 Min.   :1923-09-14 00:00:00   Length:377067      Length:377067      Min.   :1967-01-06 00:00:00   Length:377067      Min.   :1932   Length:377067       Length:377067      Length:377067              Length:377067     
 1st Qu.:2024-10-02 00:00:00   Class :character   Class :character   1st Qu.:2019-09-26 00:00:00   Class :character   1st Qu.:1983   Class :character    Class :character   Class :character           Class :character  
 Median :2025-05-04 00:00:00   Mode  :character   Mode  :character   Median :2024-06-20 00:00:00   Mode  :character   Median :1991   Mode  :character    Mode  :character   Mode  :character           Mode  :character  
 Mean   :2025-02-18 23:21:58                                         Mean   :2021-06-04 17:40:28                      Mean   :1990                                                                                       
 3rd Qu.:2025-08-04 00:00:00                                         3rd Qu.:2025-05-09 00:00:00                      3rd Qu.:1998                                                                                       
 Max.   :2025-10-15 00:00:00                                         Max.   :2025-10-16 00:00:00                      Max.   :2025                                                                                       
 NA's   :140929                                                      NA's   :147872                                   NA's   :1                                                                                          
 EID Case ID        EID Subject ID     Unique Identifier 
 Length:377067      Length:377067      Length:377067     
 Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character  
                                                         
                                                         
                                                         
                                                         

Cleaning the Data

Checking for Duplicates

To check for duplicates I’m going to look at the unique identifier column.

AdminArrest %>%
  group_by(`Unique Identifier`) %>%
  filter(n()==2)

I get back 36,984 rows where the Unique Identifier appears exactly twice. But I can tell that there are certain instances where the same identifier appears twice with a different apprehension date. So I’m going to try again, grouping by Apprehension Date in addition to the Unique Identifier column.

AdminArrest %>%
  group_by(`Unique Identifier`, `Apprehension Date`) %>%
  filter(n()==2)

When I filter by both of these things I get 4,576 rows. That suggests that certain individuals were arrested twice, or have been in the system multiple times over the course of many years, so this is closer to the true number of duplicates. Now I’m going to remove those potential duplicates using the distinct function.

cleaned_AdminArrest <- AdminArrest %>%
  distinct(`Unique Identifier`,`Apprehension Date`, .keep_all = TRUE)
View(cleaned_AdminArrest)

When I run this I now have 374,561 entries instead of the original 377,067. I would probably still ask an expert who works with this data if there are other ways I might not be catching duplicates, but this makes the most sense to me. The Deportation Data Project also releases a “processed” version of their data that I could compare this to, but the processed data drops certain columns that I’m interested in looking at.
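
One caveat about the checks above: `filter(n()==2)` only returns identifier/date pairs that appear exactly twice, and `distinct()` treats rows with a missing identifier as equal to each other. A minimal sketch of a broader check, run on a made-up toy table (the values are invented for illustration, not taken from the ICE file), might look like this:

```r
library(dplyr)

# Toy data, invented for illustration: id "A" appears three times on the
# same date, and two rows have no identifier at all.
toy <- tibble::tibble(
  `Unique Identifier` = c("A", "A", "A", "B", NA, NA),
  `Apprehension Date` = as.Date(c("2025-01-01", "2025-01-01", "2025-01-01",
                                  "2025-02-01", "2025-03-01", "2025-03-01"))
)

# n() > 1 catches every repeated identifier/date pair, not just pairs of two.
# Rows with a missing identifier are set aside so they are not matched to each other.
dupes <- toy %>%
  filter(!is.na(`Unique Identifier`)) %>%
  group_by(`Unique Identifier`, `Apprehension Date`) %>%
  filter(n() > 1) %>%
  ungroup()

# De-duplicate only the rows that have an identifier, then add the rest back.
deduped <- bind_rows(
  toy %>%
    filter(!is.na(`Unique Identifier`)) %>%
    distinct(`Unique Identifier`, `Apprehension Date`, .keep_all = TRUE),
  toy %>% filter(is.na(`Unique Identifier`))
)
```

Whether rows without an identifier should be kept apart like this is a judgment call; it is an assumption of the sketch, not something the data documentation confirms.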

Checking for Consistency Issues

When looking at consistency issues, I’m looking at the columns that have characters. I have broken out below, by column, how I checked for issues with characters.

Apprehension State

There are 50 states in the US, so I’m looking for a result that gives back 50 rows, maybe 51 for arrests that take place in DC.

cleaned_AdminArrest %>%
  group_by(`Apprehension State`) %>%
  tally()

Here there are a couple of consistency issues. Some rows fall under “ARMED FORCES - EUROPE,” “ARMED FORCES - THE AMERICAS,” and “ARMED SERVICES - THE PACIFIC,” none of which are states in any traditional sense. I also have no idea what differentiates armed forces vs. services, nor am I clear if these are arrests that took place on military bases or as a part of military operations. It’s worth noting that we have 62 rows instead of 50, partially because of territories and other jurisdictions tied to the US (e.g. Puerto Rico, Guam, and the Federated States of Micronesia, which is a freely associated state rather than a territory). The District of Columbia is also its own entry, and it’s not a state. There are also two lone entries in areas outside of US control, one just labeled “MEXICO,” the other labeled “TAMAULIPAS,” which is a state in Mexico. Those seem like individual data entry issues or miscategorizations. Aside from that, the states themselves are spelled in consistent ways.
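
Rather than scanning all 62 rows by eye, the non-state entries can be pulled out by comparing against R’s built-in `state.name` vector. This is a rough sketch on invented toy values; it assumes the column is stored in upper case, as the entries quoted above suggest.

```r
library(dplyr)

# Toy stand-in for the `Apprehension State` column (values invented for illustration).
toy_states <- tibble::tibble(
  `Apprehension State` = c("CALIFORNIA", "TEXAS", "DISTRICT OF COLUMBIA",
                           "PUERTO RICO", "TAMAULIPAS", NA)
)

# state.name ships with base R and holds the 50 state names.
expected <- toupper(state.name)

# Keep only the non-missing values that are not one of the 50 states.
non_states <- toy_states %>%
  filter(!is.na(`Apprehension State`),
         !`Apprehension State` %in% expected) %>%
  count(`Apprehension State`, sort = TRUE)
```

On the real data the same pipeline would start from cleaned_AdminArrest instead of the toy table.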

AOR (Area of Responsibility)

From the ICE.gov website I’m expecting 25 AORs, so I want 25 rows back.

cleaned_AdminArrest %>%
  group_by(`Apprehension AOR`) %>%
  tally()

Here I got 27 rows back. One corresponds to N/A, with 5,223 entries not assigned to a specific geography. There’s also an HQ Area of Responsibility, which does appear on the ICE website but doesn’t correspond to a specific geography.

Final Program

I don’t know how many final program categories I’m expecting, so here I have to check in a more granular way that there aren’t spelling or capitalization variations.

cleaned_AdminArrest %>%
  group_by(`Final Program`) %>%
  tally()

Here the most obvious issue, which shows up first, is the difference between “287G Program” and “287g Task Force.” Because my project is in California, which is (allegedly) not participating in 287(g), I’m not worried about that. I would want to double check with an expert whether any of these labels refer to the same practical categories. I am concerned about the 487 Juveniles listed here.

Final Program Group

Because this is ICE data, I would assume everything in the final program group column is ICE, but I’m just double checking.

cleaned_AdminArrest %>%
  group_by(`Final Program Group`) %>%
  tally()

That produced just one row, so that’s consistent.

Apprehension Method

Again I don’t have a set number of apprehension methods that I’m expecting from a data dictionary, so I am checking for spelling/capitalization consistency.

cleaned_AdminArrest %>%
  group_by(`Apprehension Method`) %>%
  tally()

In terms of how things are literally spelled, yes, there is consistency. In terms of what the data is saying, I do have a lot of questions. For example, only one person is listed as arrested in the “Presented During Inspection” category, which feels like a contradiction given how many individuals have been grabbed by ICE at courthouses and field office check-ins. However, those might fall under a different category of enforcement that’s included in another data set (e.g. detentions). This is probably my flawed understanding of the legal system, but still something I want to note.

Apprehension Criminality

Apprehension Criminality is assigned to one of three categories, so I want back 3 rows.

cleaned_AdminArrest %>%
  group_by(`Apprehension Criminality`) %>%
  tally()

This looks consistent, but I am wary of the fact that there are no blanks in anyone’s purported criminality, because these same rows can have a lot of missing information. ICE has an incentive to overstate the proportion of people in this data who have criminal convictions.

Case Status

Now I’ll check for case status:

cleaned_AdminArrest %>%
  group_by(`Case Status`) %>%
  tally()

Here I get back 14 rows with categories listed. The first chunk are numbered starting at 0, but 1 and 2 are notably missing, and I don’t know what those categories could be. There are 9,272 rows that are not in a category at all, which is confusing because one category is literally just for active cases. I’m also concerned about the 54 deaths listed.

Case Category

I don’t have a set number of case categories I’m looking for so this is a regular consistency check.

cleaned_AdminArrest %>%
  group_by(`Case Category`) %>%
  tally()

Here there are no glaring consistency issues except the 4412 uncategorized cases.

Departure Country

The US recognizes 197 independent states, and the UN has 193 member states. I for sure don’t want the number of rows here to exceed 197.

cleaned_AdminArrest %>%
  group_by(`Departure Country`) %>%
  tally()

I got back 193 rows which is in line with the number of UN member countries. Clicking through I didn’t see any super weird duplicates. One row is N/A but from looking at the other columns in conjunction, I think a departure country is only entered when the departure happens, so the active cases don’t have a departure country listed.

Final Order Yes or No

Now I’ll check final order Yes or No (hopefully 2 rows).

cleaned_AdminArrest %>%
  group_by(`Final Order Yes No`) %>%
  tally()

Here we get back Yes, No, or Blank (4412 rows).

Citizenship Country

Same as with departure country I don’t want the number of countries to exceed 197.

cleaned_AdminArrest %>%
  group_by(`Citizenship Country`) %>%
  tally()

My citizenship countries outnumber my departure countries by 3. I would want to compare those and see where I have additional citizenship countries but no matching departure country. One issue is that Serbia, Montenegro, and “Serbia and Montenegro” (which dissolved in 2006) all appear in the data. Similarly, USSR and Yugoslavia also appear. I imagine that these represent people who came to the US before those countries dissolved, which accounts for the extra rows. Additionally, there are 394 people whose citizenship country is listed as “unknown,” but no one has a blank entry for citizenship country.
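
The comparison between the two country columns can be done with `setdiff()`. The sketch below uses short invented vectors in place of the real columns; on the actual data the inputs would be something like `unique(cleaned_AdminArrest$`Citizenship Country`)` and the matching departure column.

```r
# Toy country lists, invented for illustration.
citizenship <- c("MEXICO", "SERBIA", "SERBIA AND MONTENEGRO", "USSR", "INDIA")
departure   <- c("MEXICO", "SERBIA", "INDIA", NA)

# Drop the missing value so NA is not treated as a country.
departure <- departure[!is.na(departure)]

# Countries that appear as a citizenship but never as a departure country.
only_citizenship <- setdiff(citizenship, departure)

# And the reverse, in case a departure country never appears as a citizenship.
only_departure <- setdiff(departure, citizenship)
```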

Gender

For gender I expect 2 rows.

cleaned_AdminArrest %>%
  group_by(Gender) %>%
  tally()

We get back three rows: Female, Male, and Unknown. Important to note that unknown is written in, and isn’t just a blank.

Apprehension Site Landmark

For Apprehension Site Landmark I have very few expectations about what will come back, and this is where I expect the majority of inconsistencies to be.

cleaned_AdminArrest %>%
  group_by(`Apprehension Site Landmark`) %>%
  tally()

Here I get back 5590 rows, and there are major differences in capitalization and usage of dashes. Some include the state they were in, others do not. Some also include the program they were in (287(g), CAP, etc.). For the purposes of my project I’d probably take the California subset and clean that, but this would likely be the most onerous part of working with this data.
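
As a first pass at the capitalization and dash differences, the landmark strings could be normalized with stringr before tallying. This is only a sketch on invented landmark values, and the cleaning rules (upper case, one spacing style around dashes) are assumptions about what would help, not a tested recipe for this column.

```r
library(dplyr)
library(stringr)

# Toy landmark values, invented for illustration.
toy_landmarks <- tibble::tibble(
  `Apprehension Site Landmark` = c("SFR - General Area", "sfr-general area",
                                   "SFR  -  GENERAL AREA", "Fresno County Jail")
)

normalized <- toy_landmarks %>%
  mutate(
    landmark_clean = `Apprehension Site Landmark` %>%
      str_to_upper() %>%                        # ignore capitalization differences
      str_replace_all("\\s*-\\s*", " - ") %>%   # one spacing style around dashes
      str_squish()                              # collapse repeated whitespace
  ) %>%
  count(landmark_clean, sort = TRUE)
```

In the toy table the three spellings of the same landmark collapse into a single row with a count of 3.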

Checking for Missing Data

Now to check for missing data, I’m going to use/refer to the original unclean data “AdminArrest”:

MissingData <-colSums(is.na(AdminArrest))
print(MissingData)
         Apprehension Date         Apprehension State        Apprehension County           Apprehension AOR              Final Program        Final Program Group        Apprehension Method   Apprehension Criminality 
                         0                      60154                     377067                       5887                          0                          0                          0                          0 
               Case Status              Case Category              Departed Date          Departure Country         Final Order Yes No           Final Order Date                 Birth Date                 Birth Year 
                      9372                       9373                     140929                     140983                       9372                     147872                          1                          1 
       Citizenship Country                     Gender Apprehension Site Landmark          Alien File Number                EID Case ID             EID Subject ID          Unique Identifier 
                         0                          0                       8767                       5410                       9372                          0                       5410 

The big deal here is that there are 5,410 people missing an A number or a Unique Identifier. I am less worried because the EID Subject ID has no missing data, but that is something to keep note of.
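
Raw counts are easier to judge as a share of all rows. A small sketch of the same check expressed as percentages, on an invented three-column table:

```r
# Toy table, invented for illustration; county is entirely missing,
# like Apprehension County in the real file.
toy <- data.frame(
  id     = c("a", NA, "c", "d"),
  county = c(NA, NA, NA, NA),
  state  = c("CA", "CA", NA, "TX")
)

# colMeans() of a logical matrix gives the fraction of TRUE values per column.
missing_share <- round(colMeans(is.na(toy)) * 100, 1)

# Largest gaps first.
sort(missing_share, decreasing = TRUE)
```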

Additional Altering

Creating a New Column to Look at Time of Arrest

I want to mutate the apprehension date column so that I can filter later by year and month, and even potentially the time of day an administrative arrest took place.

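# year(), month(), day() and hour() come from lubridate; the prefix makes
# that explicit in case lubridate is not attached in this session
new_AdminArrest <- cleaned_AdminArrest %>%
  mutate(
    ApprehensionYear  = lubridate::year(`Apprehension Date`),
    ApprehensionMonth = lubridate::month(`Apprehension Date`),
    ApprehensionDay   = lubridate::day(`Apprehension Date`),
    ApprehensionHour  = lubridate::hour(`Apprehension Date`)
  )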

View(new_AdminArrest)

Breaking out California and San Francisco Area of Responsibility Data

Now I want to look at just administrative arrests in California, so I’m going to filter by the apprehension state column.

CA_AdminArrest <- new_AdminArrest %>%
  filter(`Apprehension State` == "CALIFORNIA")
View(CA_AdminArrest)

I want to filter that further to just the administrative arrests in the SF Area of Responsibility. Even though that’s a lot larger than the geographic scope my project is looking at, it at least includes my area, the Bay Area.

SFAOR_AdminArrest <- CA_AdminArrest %>% 
  filter(`Apprehension AOR` == "San Francisco Area of Responsibility")
View(SFAOR_AdminArrest)

Analysis

Administrative Arrests Over Time

A really basic comparison I might make to start is comparing the number of Administrative Arrests by year.

SFAOR_AdminArrest %>% 
  group_by(ApprehensionYear) %>% 
  tally()

The table I get here isn’t that informative, given that I only have data from three years. But maybe I just want to see the change by month under this new administration (2025), so I’d likely do the following:

SFAOR_AdminArrestbyMonth <-SFAOR_AdminArrest %>% 
  filter(ApprehensionYear == "2025") %>% 
  group_by(ApprehensionMonth) %>% 
  tally()
View(SFAOR_AdminArrestbyMonth)

Just looking at the table I see there’s an increase in June which held in July, but it might be helpful to visualize it, so I’m going to plot it out:

ggplot(data=SFAOR_AdminArrestbyMonth) +
  geom_line(aes(x=ApprehensionMonth, y=n)) +
  # one x scale only: a separate xlim() call would be replaced by scale_x_continuous()
  scale_x_continuous(limits = c(1,11), breaks = 1:11) +
  scale_y_continuous(limits = c(0,800), breaks = seq(0, 800, by = 100)) +
  labs( x = "Month", y = "Number of Arrests") +
  ggtitle("Administrative Arrests by Month") +
  theme(plot.title = element_text(hjust = 0.5))


Administrative Arrests by Hour

Anecdotally, I’ve also heard from a number of advocates that ICE actions (arrests) tend to happen in the early morning. I want to check if that’s true, so I’m going to look at the ApprehensionHour column that I created from the original Apprehension Date column. It’s on a 24-hour clock (no am/pm). I’m going to use the CA_AdminArrest data because, according to advocates, this represents ICE policy, so it should be true across the state.

CA_AdminArrest_byHour <- CA_AdminArrest %>%
  group_by(ApprehensionHour) %>%
  tally()

View(CA_AdminArrest_byHour)

To see that visually I’m going to chart it.

ggplot(data=CA_AdminArrest_byHour, aes(x = ApprehensionHour, y = n)) +
  geom_bar(stat="identity", width=0.5, fill= "steelblue") +
  labs( x = "Hour of the Day", y = "Number of Arrests") +
  ggtitle("Administrative Arrests by Hour of the Day from Sept 2023 through Oct 2025") +
  scale_x_continuous(n.breaks=23)

This shows that the number of arrests peaks between 9 and 10 am, and if the actual arrests are taking place at that hour we can infer that ICE/DHS had to be active before then. The majority of the activity seems to happen between 7 am and 1 pm.
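
The “majority of the activity” reading can be checked with a number instead of by eye. Here is a minimal sketch that computes the share of arrests in a time window from an hourly tally; the counts are invented, and the 7 am to 1 pm window is just the one mentioned above.

```r
# Toy hourly tally shaped like CA_AdminArrest_byHour (counts invented for illustration).
by_hour <- data.frame(
  ApprehensionHour = c(5, 8, 9, 10, 14, 22),
  n                = c(10, 30, 40, 30, 20, 10)
)

# Hours 7 through 12 cover 7:00 am up to, but not including, 1:00 pm.
in_window <- by_hour$ApprehensionHour >= 7 & by_hour$ApprehensionHour < 13

# Share of all arrests that fall inside the window.
window_share <- sum(by_hour$n[in_window]) / sum(by_hour$n)
```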

Countries of Origin

Another thing I’m interested in is the top countries of origin for folks being arrested under the current administration. I’m interested in both the statewide picture and just the San Francisco AOR, but I’ll start with California.

CaliforniaCitizenshipCountry2025 <- CA_AdminArrest %>%
  filter(ApprehensionYear == "2025") %>%
  group_by(`Citizenship Country`) %>%
  tally() 

View(CaliforniaCitizenshipCountry2025)

When I do this and sort by count, my top 10 citizenship countries are Mexico, Guatemala, El Salvador, Colombia, Honduras, India, Venezuela, China, Nicaragua, and Peru. To compare this to the SFAOR I’m doing a similar analysis.

SFAORCitizenshipCountry2025 <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025") %>%
  group_by(`Citizenship Country`) %>%
  tally() 

View(SFAORCitizenshipCountry2025)

When I sort this table by the count, I get slightly different results, but there is a lot of overlap. In the San Francisco Area of Responsibility the top 10 citizenship countries for arrested folks are Mexico, India, Guatemala, El Salvador, Colombia, Honduras, Peru, Nicaragua, Venezuela, and China. Mexico notably outpaces the rest of these places in number of arrests.
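
The sorting step can also happen in code rather than in the viewer. A small sketch using `count(sort = TRUE)` and `slice_head()` on invented rows; the `arrests` and `share` column names are made up for this example.

```r
library(dplyr)

# Toy arrests, one row per arrest (values invented for illustration).
toy <- tibble::tibble(
  `Citizenship Country` = c("MEXICO", "MEXICO", "MEXICO", "INDIA", "INDIA", "PERU")
)

top_countries <- toy %>%
  count(`Citizenship Country`, sort = TRUE, name = "arrests") %>%  # largest first
  mutate(share = arrests / sum(arrests)) %>%                       # share of all arrests
  slice_head(n = 2)                                                # use n = 10 for a top 10
```

Adding the share column makes it easier to say how far Mexico outpaces the other countries.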

Criminality

Something else I’m interested in is the “criminality” of those arrested in the San Francisco Area of Responsibility. Again I want to look just in the context of the current administration. Those who are arrested are assigned to one of three categories of criminality, and I’m interested in the proportion of arrests that falls into each category.

SFAORArrestCriminality2025 <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAORArrestCriminality2025)

I want to see this visually, so even though I’m not a fan of pie charts I’m going to look at it in a pie chart.

ggplot(SFAORArrestCriminality2025, aes(x="Apprehension Criminality", y=n, fill=`Apprehension Criminality`)) +
  geom_bar(stat="identity", width=1, color="white") +
  coord_polar("y", start=0) +
  ggtitle("Apprehension Criminality SFAOR 2025") +
  theme_void()

For those arrested, the majority are convicted criminals according to the data, but it’s notable that a sizable chunk are categorized under “other immigration violation.” Arrests are supposed to represent individuals who have a real case against them, so the convicted criminal category should represent the majority. These arrests are also not the totality of deportations: if you look at the deportation data set in addition to this arrests data, you would find the majority of folks coming into contact with ICE/DHS do not have any criminal convictions. It’s also worth noting that the convicted criminal category doesn’t break down further into what crimes folks were convicted of. Other reporting has confirmed that there are many folks who end up arrested where the basis of their conviction is a traffic violation. I talked to one lawyer in the Bay Area who confirmed they had a client whose prior conviction came in the form of a citation for failing to pay for BART. Are those crimes the same as murder? No, but under this categorization they’re flattened into one category together.

Just to make sure my analysis isn’t off I’m going to compare SF in 2025 to all of California.

CA_ArrestCriminality2025 <- CA_AdminArrest %>%
  filter(ApprehensionYear == "2025") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(CA_ArrestCriminality2025)

Here is that information visualized:

ggplot(CA_ArrestCriminality2025, aes(x="Apprehension Criminality", y=n, fill=`Apprehension Criminality`)) +
  geom_bar(stat="identity", width=1, color="white") +
  coord_polar("y", start=0) +
  ggtitle("Apprehension Criminality All of California 2025") +
  theme_void()

The proportions I get here are different. Other Immigration Violator is nearly the same value as Convicted Criminal. I’m curious as to what’s at the root of this discrepancy in San Francisco. I do wonder if this has something to do with the lack of indiscriminate raids in the Bay Area, which means that in places like LA there is a much higher proportion of people who are being swept up by this system.
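
Since the two pie charts have to be compared by eye, it may help to put the proportions side by side in one table. This sketch stacks two toy tallies and computes the share within each region; the category labels and counts are placeholders, not values from the data.

```r
library(dplyr)

# Toy rows, one per arrest (labels and counts invented for illustration).
sf_toy <- tibble::tibble(
  criminality = c(rep("Convicted Criminal", 3), "Other Immigration Violator")
)
ca_toy <- tibble::tibble(
  criminality = c(rep("Convicted Criminal", 3), rep("Other Immigration Violator", 3))
)

# Stack the two tables, label each with its region, then compute shares per region.
shares <- bind_rows(list(`SF AOR` = sf_toy, California = ca_toy), .id = "region") %>%
  count(region, criminality) %>%
  group_by(region) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()
```

On the real data the inputs would be the 2025 rows of SFAOR_AdminArrest and CA_AdminArrest, keeping in mind that the SF AOR rows are also part of the California total.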

Changes to Criminality by Month

After looking at the general summary of criminality information, I want to see how the criminality of who is arrested has changed in the SFAOR by month. The question I’d want answered is whether the proportion of people being arrested who have criminal convictions is changing. So much reporting has said that regular people with no convictions are getting swept up in arrests under this administration. So, I started with the SFAOR admin arrest data, filtered it to the current year, and then for each month so far I had it give me the number of arrests in each criminality category, which can then be turned into proportions.

January

SFAORJan25Crim <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025", ApprehensionMonth == "1") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAORJan25Crim)

February

SFAORFeb25Crim <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025", ApprehensionMonth == "2") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAORFeb25Crim)

March

SFAORMar25Crim <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025", ApprehensionMonth == "3") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAORMar25Crim)

April

SFAORApr25Crim <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025", ApprehensionMonth == "4") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAORApr25Crim)

May

SFAORMay25Crim <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025", ApprehensionMonth == "5") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAORMay25Crim)

June

SFAORJun25Crim <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025", ApprehensionMonth == "6") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAORJun25Crim)

July

SFAORJul25Crim <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025", ApprehensionMonth == "7") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAORJul25Crim)

August

SFAORAug25Crim <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025", ApprehensionMonth == "8") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAORAug25Crim)

September

SFAORSept25Crim <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025", ApprehensionMonth == "9") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAORSept25Crim)

October, but I’m noting that the October data is incomplete.

SFAOROct25Crim <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025", ApprehensionMonth == "10") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAOROct25Crim)

There was probably a faster way to iterate this process, but I took the results of each of these and made a spreadsheet to visualize.
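
One faster way, sketched here on invented rows, is to group by month and criminality in a single pipeline instead of writing one chunk per month. The column names match the ones created earlier; the category labels are placeholders.

```r
library(dplyr)

# Toy arrests, one row per arrest (values invented for illustration).
toy <- tibble::tibble(
  ApprehensionYear  = c(2025, 2025, 2025, 2025, 2025, 2024),
  ApprehensionMonth = c(1, 1, 1, 2, 2, 1),
  `Apprehension Criminality` = c("Convicted", "Convicted", "Other",
                                 "Other", "Other", "Convicted")
)

by_month <- toy %>%
  filter(ApprehensionYear == 2025) %>%
  count(ApprehensionMonth, `Apprehension Criminality`) %>%  # one row per month/category
  group_by(ApprehensionMonth) %>%
  mutate(share = n / sum(n)) %>%                            # proportion within each month
  ungroup()
```

The resulting table has every month in one place and could be exported for the spreadsheet or passed straight to ggplot.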

Arrest Pickup Location and Apprehension Method

SFAOR_AdminArrest2025 <-SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025") %>%
  group_by(`Apprehension Method`,`Apprehension Site Landmark`)

View(SFAOR_AdminArrest2025)

I want to look at where the apprehension method is 287g.

SFAOR_AdminArrest2025_287g <-SFAOR_AdminArrest2025 %>%
  filter(`Apprehension Method`== "287(g) Program")

View(SFAOR_AdminArrest2025_287g)

I want to look at the apprehension method involving local incarceration.

SFAOR_AdminArrest2025_local <- SFAOR_AdminArrest2025 %>%
  filter(`Apprehension Method`== "CAP Local Incarceration")

View(SFAOR_AdminArrest2025_local)

Now I want a tally of apprehension site landmarks.

SFAPLandmarks <- SFAOR_AdminArrest2025 %>%
  group_by(`Apprehension Site Landmark`)%>%
  tally()

View(SFAPLandmarks)

These are unfortunately vague, and they’re worth looking at for me personally, but they do not make for a great visualization.

Demographics in SFAOR

This is the gender of SFAOR admin arrests in 2025.

SFAORArrestGender2025 <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025") %>%
  group_by(`Gender`) %>%
  tally() 

View(SFAORArrestGender2025)

Here’s the gender information graphed.

ggplot(SFAORArrestGender2025, aes(x="Gender", y=n, fill=`Gender`)) +
  geom_bar(stat="identity", width=1, color="white") +
  coord_polar("y", start=0) +
  ggtitle("Arrest Gender SFAOR 2025") +
  theme_void()

This is an age breakdown of SFAOR admin arrests in 2025.

SFAOR_AdminArrest_Age2025 <- SFAOR_AdminArrest %>%
  mutate(Age = 2025 - `Birth Year`) %>%
  filter(ApprehensionYear == "2025") %>%
  group_by(Age)  %>%
  tally()

View(SFAOR_AdminArrest_Age2025)

Here is that same information graphed.

ggplot(data=SFAOR_AdminArrest_Age2025, aes(x = Age, y =n)) +
  geom_bar(stat="identity", width=0.5, fill= "steelblue") +
  labs( x = "Age Range", y = "Number of Arrests") +
  ggtitle("Arrests by Age SFAOR") +
  scale_x_continuous(n.breaks=20)

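I want to know what the median and mean age at arrest are so I can incorporate them into a better graphic. Note that the summary below runs on the tallied table, which has one row per distinct age, so it describes the spread of age values rather than the ages of the people arrested. Weighting each age by its count n, or summarizing the untallied rows, would give the actual median and mean.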

summary(SFAOR_AdminArrest_Age2025$Age)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   3.00   22.25   40.50   40.59   58.75   84.00 
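
Because the tally has one row per distinct age, `summary()` on its Age column treats every age as equally common. A sketch of a count-weighted version, using an invented three-row tally to show the difference:

```r
# Toy tally shaped like SFAOR_AdminArrest_Age2025 (values invented for illustration).
age_counts <- data.frame(
  Age = c(20, 30, 60),
  n   = c(1, 8, 1)
)

# Unweighted: every distinct age counts once, regardless of how many arrests it has.
unweighted_mean <- mean(age_counts$Age)

# Weighted: each age counts once per arrest.
weighted_mean <- weighted.mean(age_counts$Age, age_counts$n)

# For the median, expand the tally back to one value per arrest.
ages_per_arrest <- rep(age_counts$Age, times = age_counts$n)
weighted_median <- median(ages_per_arrest)
```

An equivalent route on the real data would be to call `summary()` on the Age column before the `group_by()` and `tally()` steps.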

Where do Arrested People in the SFAOR Go?

I’m going to do some analysis with the intention of putting it into a prettier program, like Flourish. The only surefire way to tell if an arrest resulted in a deportation is that there is a departure country or departed date listed. I want to start by creating two new data sets: one with the SFAOR 2025 arrests that resulted in a deportation, and the other with the SFAOR 2025 arrests that did not. Here’s how I would make each of those.

Deported Data Set:

Deported_SFAOR_25 <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025") %>% 
  drop_na(`Departed Date`)

View(Deported_SFAOR_25)

Non Deported Data Set:

NotDeported_SFAOR_25 <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025", is.na(`Departed Date`)) 
 
View(NotDeported_SFAOR_25)

I want to note that 2,086 arrests in the SFAOR resulted in deportation, and 1,756 did not. Now I want to look at the final program that people who were deported ended up in.
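
An alternative to keeping two separate data sets is a single flag column, which makes the two counts come out of one `count()` call. A minimal sketch on invented dates; the `Deported` column name is made up for this example.

```r
library(dplyr)

# Toy arrests (dates invented for illustration); NA means no departure is recorded.
toy <- tibble::tibble(
  `Departed Date` = as.Date(c("2025-03-01", NA, "2025-05-10", NA, NA))
)

# TRUE when a departed date is present, FALSE otherwise.
outcome <- toy %>%
  mutate(Deported = !is.na(`Departed Date`)) %>%
  count(Deported)
```

The flag can also be used as a grouping variable in the breakdowns that follow, so the deported and non-deported groups share one pipeline.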

FPDeported_SFAOR_25 <- Deported_SFAOR_25 %>%
  group_by(`Final Program`) %>%
  tally()

View(FPDeported_SFAOR_25)

It’s not surprising that ERO Criminal Alien Program was the largest category because it’s the vaguest, so within that I wanted to see what the top apprehension methods were.

ERODeported_SFAOR_25 <- Deported_SFAOR_25 %>%
  filter(`Final Program`== "ERO Criminal Alien Program") %>%
  group_by(`Apprehension Method`) %>%
  tally()

View(ERODeported_SFAOR_25)

I also want to see the apprehension method breakdown for people who were deported.

AMDeported_SFAOR_25 <- Deported_SFAOR_25 %>%
  group_by(`Apprehension Method`) %>%
  tally()

View(AMDeported_SFAOR_25)

Out of curiosity I do want to see the final program breakdown for people who were not deported.

FPNotDeported_SFAOR_25 <- NotDeported_SFAOR_25 %>%
  group_by(`Final Program`) %>%
  tally()

View(FPNotDeported_SFAOR_25)

For both groups I also want to see what the criminality breakdown looks like. Here first is the Deported Group’s criminality:

CrimDeported_SFAOR_25 <- Deported_SFAOR_25 %>%
  group_by(`Apprehension Criminality`) %>%
  tally()

View(CrimDeported_SFAOR_25)

And this is the non deported group’s criminality.

CrimNotDeported_SFAOR_25 <- NotDeported_SFAOR_25 %>%
  group_by(`Apprehension Criminality`) %>%
  tally()

View(CrimNotDeported_SFAOR_25)

Here it is interesting to note that the vast majority of people who were arrested and deported (for now) do seem to have a criminal conviction, while the arrested but not deported group includes many more folks with just immigration violations. I did a bunch of these breakouts for the purposes of my own understanding, but as for making a visual that I think has more capacity to be informative, I’m going to work with the final program categorization to make a

---
title: "J221 EPM Final Project — Administrative Arrests"
author: "Ellie Prickett-Morgan"
date: "2025-12-10"
output: html_notebook
---

## Setup and Libraries

```{r}
knitr::opts_chunk$set(echo = TRUE, error = TRUE, cache=TRUE, autodep = TRUE, message = FALSE, progress_bar = FALSE)

library(readr)
library(tidyverse) 
library(dplyr) 
library(ggplot2) 
library(knitr) 
library(psych)
library(lme4)
library(boot)
library(ggeffects) 
library(DescTools)
library(tidycensus)
library(xml2)
library(tmap)
library(tmaptools)
library(leaflet)
library(sf)
library(leaflet.extras)
library(raster)
library(tigris)
library(sp)
library(scales)
library(stringi)
library(leaflegend)
library(readxl)
library(stringr)
library(janitor)
library(ggrepel)
```

```{r}
setwd("/Users/eleanorprickettmorgan/Desktop/DeportationData/")
```

## Importing Data

I'm skipping the first 6 rows of the Excel file because they hold summary information rather than the data itself. Then I'm using summary() to check what my time range is, looking specifically at the Apprehension Date column.

```{r}
AdminArrest <- read_excel("/Users/eleanorprickettmorgan/Desktop/DeportationData/ERO Admin Arrests_LESA-STU-FINAL Release_raw.xlsx", skip = 6)
View(AdminArrest)
summary(AdminArrest)
```

## Cleaning the Data

### Checking for Duplicates

To check for duplicates I'm going to look at the unique identifier column. 

```{r}
AdminArrest %>%
  group_by(`Unique Identifier`) %>%
  filter(n()==2)

```

I get back 36,984 rows where the Unique Identifier appears exactly twice (that's what `n()==2` keeps). But I can tell that there are instances where the same identifier appears twice with a different apprehension date. So I'm going to try again, grouping by Apprehension Date in addition to the Unique Identifier column.

```{r}
AdminArrest %>%
  group_by(`Unique Identifier`, `Apprehension Date`) %>%
  filter(n()==2)
```

When I filter by both of these things I get 4,576 rows. That suggests that some individuals were arrested more than once, or have been in the system multiple times, so this is closer to the true number of duplicates. Now I'm going to remove those potential duplicates using the `distinct()` function.

```{r}
cleaned_AdminArrest <- AdminArrest %>%
  distinct(`Unique Identifier`,`Apprehension Date`, .keep_all = TRUE)
View(cleaned_AdminArrest)
```

When I run this I have 374,561 entries instead of the original 377,067. I would still ask an expert who works with this data whether there are other duplicates I might not be catching, but this approach makes the most sense to me. The Deportation Data Project also releases a "processed" version of their data that I could compare this to, BUT the processed data drops certain columns that I'm interested in looking at.
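
One way to probe for duplicates the checks above might miss is sketched below. It rests on two assumptions I have not verified against the real file: that some identifier/date pairs could appear more than twice (which `n()==2` would skip), and that rows with a missing Unique Identifier could be collapsed by `distinct()` when they happen to share a timestamp. The toy values are made up; only the column names mirror the data.

```{r}
library(dplyr)
library(tibble)

# Hypothetical toy data: values are invented, dates are kept as plain text for brevity.
toy_arrests <- tribble(
  ~`Unique Identifier`, ~`Apprehension Date`,
  "a1", "2025-01-05 09:00",
  "a1", "2025-01-05 09:00",  # exact repeat
  "a1", "2025-01-05 09:00",  # third copy, which n() == 2 would not flag
  "b2", "2025-02-01 07:30",
  "b2", "2025-06-11 10:15",  # same person, different arrest
  NA,   "2025-03-03 08:00",
  NA,   "2025-03-03 08:00"   # two unidentified rows sharing a timestamp
)

# Rows whose identifier/date pair appears more than once, however many copies there are.
repeated_rows <- toy_arrests %>%
  filter(!is.na(`Unique Identifier`)) %>%
  group_by(`Unique Identifier`, `Apprehension Date`) %>%
  filter(n() > 1) %>%
  ungroup()

# De-duplicate only the rows that have an identifier; leave unidentified rows untouched.
deduped <- bind_rows(
  toy_arrests %>%
    filter(!is.na(`Unique Identifier`)) %>%
    distinct(`Unique Identifier`, `Apprehension Date`, .keep_all = TRUE),
  toy_arrests %>%
    filter(is.na(`Unique Identifier`))
)

nrow(repeated_rows)
nrow(deduped)
```

Whether unidentified rows with matching timestamps are true duplicates or separate people is a question for someone who knows the data, so this sketch errs on the side of keeping them.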

### Checking for Consistency Issues

When looking at consistency issues, I'm looking at the columns that have characters. I have broken out below, by column, how I checked for issues with characters.

#### Apprehension State

There are 50 states in the US, so I'm looking for a result that gives back 50 rows, maybe 51 for arrests that take place in DC.

```{r}
cleaned_AdminArrest %>%
  group_by(`Apprehension State`) %>%
  tally()
```

Here there are a couple of consistency issues. Some rows fall under "ARMED FORCES - EUROPE," "ARMED FORCES - THE AMERICAS," and "ARMED SERVICES - THE PACIFIC," none of which are states in any traditional sense. I also have no idea what differentiates armed forces from armed services, nor is it clear whether these arrests took place on military bases or as part of military operations. Worth noting that we have 62 rows instead of 50, partly because of US territories and freely associated states (e.g. Puerto Rico, Guam, Federated States of Micronesia). The District of Columbia also has its own entry, and it's not a state. There are also two lone entries in areas outside of US control, one labeled "MEXICO" and the other labeled "TAMAULIPAS," which is a state in Mexico. Those look like individual data entry issues or miscategorizations. Aside from that, the state names are spelled consistently.

#### AOR (Area of Responsibility)

From the ICE.gov website I'm expecting 25 AORs, so I want 25 rows back.

```{r}
cleaned_AdminArrest %>%
  group_by(`Apprehension AOR`) %>%
  tally()

```

Here I got 27 rows back. One corresponds to N/A, with 5,223 entries not assigned to a specific geography. There's also an HQ Area of Responsibility, which does appear on the ICE website but doesn't correspond to a specific geography.

#### Final Program

I don't know how many final program categories I'm expecting, so here I have to check in a more granular way that there aren't spelling or capitalization variations. 

```{r}
cleaned_AdminArrest %>%
  group_by(`Final Program`) %>%
  tally()
```

Here the most obvious question, which shows up first in the list, is what the difference is between "287G Program" and "287g Task Force." Because my project is in California, which is (allegedly) not participating in 287g, I'm not worried about that. I would want to double check with an expert that some of these labels do not refer to the same practical categories. I am concerned about the 487 Juveniles listed here.

#### Final Program Group

Because this is ICE data, I would assume everything in the final program group column is ICE, but I'm just double checking. 

```{r}
cleaned_AdminArrest %>%
  group_by(`Final Program Group`) %>%
  tally()
```
That produced just one row, so that's consistent. 

#### Apprehension Method

Again I don't have a set number of apprehension methods that I'm expecting from a data dictionary, so I am checking for spelling/capitalization consistency.

```{r}
cleaned_AdminArrest %>%
  group_by(`Apprehension Method`) %>%
  tally()
```

In terms of how things are literally spelled, yes, there is consistency. In terms of what the data is saying here, I do have a lot of questions. For example, only one person is listed as arrested in the "Presented During Inspection" category, which feels like a contradiction given how many individuals have been grabbed by ICE at courthouses and field office check-ins. However, those might be a different category of enforcement that's included in another data set (i.e. detentions). This is probably my flawed understanding of the legal system, but it's still something I want to note.

#### Apprehension Criminality

Apprehension Criminality is assigned to one of three categories, so I want 3 rows back.

```{r}
cleaned_AdminArrest %>%
  group_by(`Apprehension Criminality`) %>%
  tally()
```

This looks consistent, but I am wary of the fact that there are no blanks in anyone's purported criminality, because these same rows can have a lot of missing information. ICE has an incentive to overstate the proportion of people in this data who have criminal convictions.

#### Case Status

Now I'll check for case status:

```{r}
cleaned_AdminArrest %>%
  group_by(`Case Status`) %>%
  tally()
```

Here I get back 14 rows with categories listed. The first chunk are numbered starting at 0, but notably missing 1 and 2. I don't know what those categories could be. There are 9272 who are not in a category at all, which is confusing because one category is literally just for active cases. I'm concerned also about the 54 deaths listed.

#### Case Category

I don't have a set number of case categories I'm looking for so this is a regular consistency check. 

```{r}
cleaned_AdminArrest %>%
  group_by(`Case Category`) %>%
  tally()
```

Here there are no glaring consistency issues except the 4412 uncategorized cases. 

#### Departure Country

The US recognizes 197 independent states, and the UN has 193 member countries. I for sure don't want the number of rows here to exceed 197.

```{r}
cleaned_AdminArrest %>%
  group_by(`Departure Country`) %>%
  tally()
```

I got back 193 rows which is in line with the number of UN member countries. Clicking through I didn't see any super weird duplicates. One row is N/A but from looking at the other columns in conjunction, I think a departure country is only entered when the departure happens, so the active cases don't have a departure country listed. 

#### Final Order Yes or No

Now I'll check final order Yes or No (hopefully 2 rows).

```{r}
cleaned_AdminArrest %>%
  group_by(`Final Order Yes No`) %>%
  tally()
```
Here we get back Yes, No, or Blank (4412 rows).

#### Citizenship Country

Same as with departure country I don't want the number of countries to exceed 197.

```{r}
cleaned_AdminArrest %>%
  group_by(`Citizenship Country`) %>%
  tally()
```

My citizenship countries outnumber my departure countries by 3, so I would want to compare the two lists and see which countries appear under citizenship but not departure. One issue is that Serbia, Montenegro, and "Serbia and Montenegro" (which dissolved in 2006) all appear in the data. Similarly, USSR and Yugoslavia also appear. I imagine that these represent people who came to the US and during their time here those countries dissolved, which accounts for the extra rows. Additionally, there are 394 people whose citizenship country is listed as "unknown," but no one has a blank entry for citizenship country.
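
The comparison between the two country lists could be done with a set difference. This is a minimal sketch on made-up rows, assuming the country names are spelled the same way in both columns; `toy_countries` and `citizenship_only` are hypothetical names.

```{r}
library(tibble)

# Hypothetical toy data: a few rows with invented pairings of the two columns.
toy_countries <- tibble(
  `Citizenship Country` = c("MEXICO", "SERBIA", "SERBIA AND MONTENEGRO", "USSR", "INDIA"),
  `Departure Country`   = c("MEXICO", "SERBIA", NA, NA, "INDIA")
)

# Countries that show up as a citizenship but never as a departure destination.
citizenship_only <- base::setdiff(
  unique(toy_countries$`Citizenship Country`),
  unique(toy_countries$`Departure Country`)
)

citizenship_only
```

On the real data the same call would take the two columns of `cleaned_AdminArrest`; swapping the argument order would list departure countries that never appear as a citizenship.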

#### Gender

For Gender I expect 2 rows.

```{r}
cleaned_AdminArrest %>%
  group_by(Gender) %>%
  tally()
```

We get back three rows: Female, Male, and Unknown. Important to note that unknown is written in, and isn't just a blank. 

#### Apprehension Site Landmark

For Apprehension Site Landmark I have very few expectations about what will come back, and this is where I expect the majority of inconsistencies to be.

```{r}
cleaned_AdminArrest %>%
  group_by(`Apprehension Site Landmark`) %>%
  tally()
```

Here I get back 5590 rows, and there are major differences in capitalization and usage of dashes. Some include the state they were in, others do not. Some also include the program they were in (287(g), CAP, etc.). For the purposes of my project I'd probably take the California subset and clean that, but this would likely be the most onerous part of working with this data. 
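
A first pass at that cleanup might look like the sketch below: uppercase everything, treat dashes as spaces, and collapse repeated whitespace before tallying. The landmark strings here are invented to imitate the kinds of variation described above, and the new column name `LandmarkClean` is hypothetical. It would not resolve differences like a state or program name being appended.

```{r}
library(dplyr)
library(stringr)
library(tibble)

# Hypothetical toy data: made-up landmark labels with capitalization and dash differences.
toy_landmarks <- tibble(
  `Apprehension Site Landmark` = c(
    "SFR - General Area",
    "sfr-general area",
    "SFR  GENERAL AREA ",
    "Fresno County Jail"
  )
)

normalized <- toy_landmarks %>%
  mutate(
    LandmarkClean = str_to_upper(`Apprehension Site Landmark`),        # unify capitalization
    LandmarkClean = str_replace_all(LandmarkClean, "\\s*-\\s*", " "),  # dashes become single spaces
    LandmarkClean = str_squish(LandmarkClean)                          # trim and collapse whitespace
  ) %>%
  group_by(LandmarkClean) %>%
  tally(sort = TRUE)

normalized
```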

### Checking for Missing Data

Now to check for missing data, I'm going to use/refer to the original unclean data "AdminArrest":

```{r}
MissingData <-colSums(is.na(AdminArrest))
print(MissingData)
```

Here what's a big deal is that there are 5410 people missing an A number or a Unique Identifier. I am less worried because the EID subject ID has no missing data, but that is something to keep note of. 

### Additional Altering

#### Creating a New Column to Look at Time of Arrest

I want to mutate the apprehension date column so that I can filter later by year and month, and even potentially the time of day an administrative arrest took place.

```{r}
new_AdminArrest <-cleaned_AdminArrest%>% 
  mutate(ApprehensionYear = year(`Apprehension Date`))%>% 
  mutate(ApprehensionMonth = month(`Apprehension Date`))%>% 
  mutate(ApprehensionDay = day(`Apprehension Date`))%>% 
  mutate(ApprehensionHour = hour(`Apprehension Date`))

View(new_AdminArrest)

```

#### Breaking out California and San Francisco Area of Responsibility Data.

Now I want to look at just administrative arrests in California, so I'm going to filter by the apprehension state column.

```{r}
CA_AdminArrest <- new_AdminArrest %>%
  filter(`Apprehension State` == "CALIFORNIA")
View(CA_AdminArrest)

```

I want to filter that further to just the administrative arrests in the SF Area of Responsibility. Even though that's a lot larger than the geographic scope my project is looking at, it at least includes my area, the Bay Area.

```{r}
SFAOR_AdminArrest <- CA_AdminArrest %>% 
  filter(`Apprehension AOR` == "San Francisco Area of Responsibility")
View(SFAOR_AdminArrest)
```

## Analysis

### Administrative Arrests Over Time

A really basic comparison I might make to start is comparing the number of Administrative Arrests by year.

```{r}
SFAOR_AdminArrest %>% 
  group_by(ApprehensionYear) %>% 
  tally()
```

But the table I get here isn't that informative, given that I only have data from three calendar years. What I really want to see is the change by month under just this new administration (2025), so I'd do the following:

```{r}
SFAOR_AdminArrestbyMonth <-SFAOR_AdminArrest %>% 
  filter(ApprehensionYear == "2025") %>% 
  group_by(ApprehensionMonth) %>% 
  tally()
View(SFAOR_AdminArrestbyMonth)

```

Just looking at the table I see there's an increase in June which held in July, but it might be helpful to visualize it, so I'm going to plot it out:

```{r}
ggplot(data=SFAOR_AdminArrestbyMonth) +
  geom_line(aes(x=ApprehensionMonth, y=n)) +
  # Set x limits and breaks in one scale; a separate xlim() gets replaced by scale_x_continuous()
  scale_x_continuous(limits = c(1, 11), breaks = 1:11) +
  scale_y_continuous(limits = c(0,800), breaks = seq(0, 800, by = 100)) +
  labs( x = "Month", y = "Number of Arrests") +
  ggtitle("Administrative Arrests by Month") +
  theme(plot.title = element_text(hjust = 0.5))
  
```

#### Administrative Arrests by Hour

Anecdotally, I've also heard this thing from a number of advocates that ICE actions (arrests) tend to happen in the early morning. I want to check if that's true, so I'm going to look at the ApprehensionHour column that I mutated out of the original Apprehension Date column. It's on a 24 hour clock (no am/pm). I'm going to use the CA_AdminArrest data because according to advocates this represents ICE policy so it should be true across the state. 

```{r}
CA_AdminArrest_byHour <- CA_AdminArrest %>%
  group_by(ApprehensionHour) %>%
  tally()

View(CA_AdminArrest_byHour)

```

To see that visually I'm going to chart it. 

```{r}
ggplot(data=CA_AdminArrest_byHour, aes(x = ApprehensionHour, y = n)) +
  geom_bar(stat="identity", width=0.5, fill= "steelblue") +
  labs( x = "Hour of the Day", y = "Number of Arrests") +
  ggtitle("Administrative Arrests by Hour of the Day from Sept 2023 through Oct 2025") +
  scale_x_continuous(n.breaks=23)
```
This shows that the number of arrests peaks between 9 and 10 am, and if the arrests themselves are taking place at that hour we can infer that ICE/DHS had to be active before then. The majority of the activity seems to happen between 7am and 1pm.
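
To put a number on "the majority," the hourly tally can be turned into a share for a time window. This is a sketch on invented counts, assuming the window means hours 7 through 12 on the 24 hour clock (7:00 to 12:59); `toy_by_hour` and `morning_share` are hypothetical names.

```{r}
library(dplyr)
library(tibble)

# Hypothetical toy tally shaped like CA_AdminArrest_byHour; the counts are made up.
toy_by_hour <- tibble(
  ApprehensionHour = c(5, 7, 9, 10, 13, 18),
  n                = c(10, 40, 80, 70, 30, 20)
)

# Share of all arrests that fall in the 7am to 1pm window.
morning_share <- toy_by_hour %>%
  summarise(share = sum(n[ApprehensionHour >= 7 & ApprehensionHour < 13]) / sum(n)) %>%
  pull(share)

morning_share
```

Running the same `summarise()` on `CA_AdminArrest_byHour` would give the real share for the statewide data.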

#### Countries of Origin

Another thing I'm interested in is, under the current administration, what the top countries of origin are for folks being arrested. I'm interested both statewide and for just the San Francisco AOR, but I'll start with just California.

```{r}
CaliforniaCitizenshipCountry2025 <- CA_AdminArrest %>%
  filter(ApprehensionYear == "2025") %>%
  group_by(`Citizenship Country`) %>%
  tally() 

View(CaliforniaCitizenshipCountry2025)
```

When I do this and sort by values my top 10 citizenship countries are Mexico, Guatemala, El Salvador, Colombia, Honduras, India, Venezuela, China, Nicaragua and Peru. To compare this to the SFAOR I'm doing a similar analysis


```{r}
SFAORCitizenshipCountry2025 <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025") %>%
  group_by(`Citizenship Country`) %>%
  tally() 

View(SFAORCitizenshipCountry2025)
```

When I sort this table by the count, I get slightly different results, but there is a lot of overlap. In the San Francisco Area of Responsibility the top 10 citizenship countries for arrested folks are Mexico, India, Guatemala, El Salvador, Colombia, Honduras, Peru, Nicaragua, Venezuela, and China. Mexico notably outpaces the rest of these places in number of arrests. 
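
Instead of sorting each table by hand in the viewer, the ranking could be done in code. A minimal sketch, using a hypothetical helper `top_citizenship()` and made-up rows:

```{r}
library(dplyr)
library(tibble)

# Hypothetical helper: tally citizenship countries and keep the k most frequent.
top_citizenship <- function(df, k = 10) {
  df %>%
    group_by(`Citizenship Country`) %>%
    tally(sort = TRUE) %>%   # sort = TRUE orders by descending count
    slice_head(n = k)
}

# Hypothetical toy data with invented rows.
toy_citizenship <- tibble(
  `Citizenship Country` = c("MEXICO", "MEXICO", "MEXICO", "INDIA", "INDIA", "PERU")
)

top2 <- top_citizenship(toy_citizenship, k = 2)
top2
```

The same helper could be applied to the 2025 subsets of `CA_AdminArrest` and `SFAOR_AdminArrest` so the two top 10 lists come from identical logic.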

#### Criminality

Something else I'm interested in is the "criminality" of those arrested in the San Francisco Area of responsibility. Again I want to look just in the context of the current administration. Those who are arrested are assigned to one of three categories of criminality, but I'm interested in the proportion that those categories show up in arrests. 

```{r}
SFAORArrestCriminality2025 <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAORArrestCriminality2025)
```

I want to see this visually, so even though I'm not a fan of pie charts I'm going to look at it in a pie chart.

```{r}
ggplot(SFAORArrestCriminality2025, aes(x="Apprehension Criminality", y=n, fill=`Apprehension Criminality`)) +
  geom_bar(stat="identity", width=1, color="white") +
  coord_polar("y", start=0) +
  ggtitle("Apprehension Criminality SFAOR 2025") +
  theme_void()
```


For those arrested, the majority are convicted criminals according to the data, but it's notable that a sizable chunk are categorized under "other immigration violation." Arrests are supposed to represent individuals who have a real case against them, so the convicted criminal category should represent the majority. And these arrests are not the totality of deportations; if you look at the deportation data set in addition to this arrests data, you would find the majority of folks coming into contact with ICE/DHS do not have any criminal convictions. It's also worth noting that the convicted criminal category doesn't break down further into what crimes folks were convicted of. Other reporting has confirmed that there are many folks who end up arrested where the basis of their conviction is a traffic violation. I talked to one lawyer in the Bay Area who confirmed they had a client whose prior conviction came in the form of a citation for failing to pay for BART. Are those crimes the same as murder? No, but under this categorization they're flattened into one category together.

Just to make sure my analysis isn't off I'm going to compare SF in 2025 to all of California.

```{r}
CA_ArrestCriminality2025 <- CA_AdminArrest %>%
  filter(ApprehensionYear == "2025") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(CA_ArrestCriminality2025)
```

Here is that information visualized:

```{r}
ggplot(CA_ArrestCriminality2025, aes(x="Apprehension Criminality", y=n, fill=`Apprehension Criminality`)) +
  geom_bar(stat="identity", width=1, color="white") +
  coord_polar("y", start=0) +
  ggtitle("Apprehension Criminality All of California 2025") +
  theme_void()
```

The proportions I get here are different. Other Immigration Violator is nearly the same value as Convicted Criminal. I'm curious as to what's at the root of this discrepancy in San Francisco. I do wonder if this has something to do with the lack of indiscriminate raids in the Bay Area, which means that in places like LA there is a much higher proportion of people who are being swept up by this system.

#### Changes to Criminality by Month

After looking at the general summary of criminality information, I want to see how the criminality of who is arrested has changed in the SFAOR by month. The question I'd want answered is if the proportion of people being arrested who have criminal convictions is changing. So much reporting has said that regular people with no convictions are getting swept up in arrests under this administration. So, I started with SFAOR Admin Arrest Data, filtered it by the current year, and then for each month so far I had it give me the proportion of arrests based on criminality.

January

```{r}
SFAORJan25Crim <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025", ApprehensionMonth == "1") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAORJan25Crim)
```

February

```{r}
SFAORFeb25Crim <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025", ApprehensionMonth == "2") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAORFeb25Crim)
```

March

```{r}
SFAORMar25Crim <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025", ApprehensionMonth == "3") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAORMar25Crim)
```

April

```{r}
SFAORApr25Crim <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025", ApprehensionMonth == "4") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAORApr25Crim)
```

May

```{r}
SFAORMay25Crim <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025", ApprehensionMonth == "5") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAORMay25Crim)
```

June

```{r}
SFAORJun25Crim <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025", ApprehensionMonth == "6") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAORJun25Crim)
```

July

```{r}
SFAORJul25Crim <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025", ApprehensionMonth == "7") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAORJul25Crim)
```

August

```{r}
SFAORAug25Crim <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025", ApprehensionMonth == "8") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAORAug25Crim)
```

September

```{r}
SFAORSept25Crim <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025", ApprehensionMonth == "9") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAORSept25Crim)
```

October, but I'm noting that the October data is incomplete.

```{r}
SFAOROct25Crim <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025", ApprehensionMonth == "10") %>%
  group_by(`Apprehension Criminality`) %>%
  tally() 

View(SFAOROct25Crim)
```

There was probably a faster way to iterate this process, but I took the results of each of these and made a spreadsheet to visualize. 
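
For reference, one faster way would be to group by month and criminality in a single pass and compute each category's share within its month. This is a sketch on made-up rows; the criminality labels are written the way I refer to them in this notebook and may not match the exact strings in the file, and `share` is a hypothetical column name.

```{r}
library(dplyr)
library(tibble)

# Hypothetical toy data shaped like SFAOR_AdminArrest; all values are invented.
toy_sfaor <- tibble(
  ApprehensionYear  = c(2025, 2025, 2025, 2025, 2025, 2025, 2024),
  ApprehensionMonth = c(1, 1, 1, 1, 2, 2, 1),
  `Apprehension Criminality` = c(
    "Convicted Criminal", "Convicted Criminal", "Other Immigration Violator",
    "Convicted Criminal", "Convicted Criminal", "Other Immigration Violator",
    "Convicted Criminal"
  )
)

crim_by_month <- toy_sfaor %>%
  filter(ApprehensionYear == 2025) %>%
  group_by(ApprehensionMonth, `Apprehension Criminality`) %>%
  tally() %>%
  group_by(ApprehensionMonth) %>%
  mutate(share = n / sum(n)) %>%   # proportion of that month's arrests
  ungroup()

crim_by_month
```

The result has one row per month and category, which could be written out with `write_csv()` for the spreadsheet step.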


#### Arrest Pickup Location and Apprehension Method

```{r}
SFAOR_AdminArrest2025 <-SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025") %>%
  group_by(`Apprehension Method`,`Apprehension Site Landmark`)

view(SFAOR_AdminArrest2025)
```

I want to look at where the apprehension method is 287g.

```{r}
SFAOR_AdminArrest2025_287g <-SFAOR_AdminArrest2025 %>%
  filter(`Apprehension Method`== "287(g) Program")

view(SFAOR_AdminArrest2025_287g)
```

I want to look at apprehension method involving local incarceration

```{r}
SFAOR_AdminArrest2025_local <- SFAOR_AdminArrest2025 %>%
  filter(`Apprehension Method`== "CAP Local Incarceration")

View (SFAOR_AdminArrest2025_local)
```

Now I want a tally of apprehension site landmarks.

```{r}
SFAPLandmarks <- SFAOR_AdminArrest2025 %>%
  group_by(`Apprehension Site Landmark`)%>%
  tally()

view(SFAPLandmarks)

```

These are unfortunately vague, and they're worth looking at for me personally, but they do not make for a great visualization. 

#### Demographics in SFAOR

This is the gender of SFAOR admin arrests in 2025.

```{r}
SFAORArrestGender2025 <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025") %>%
  group_by(`Gender`) %>%
  tally() 

View(SFAORArrestGender2025)

```

Here's the gender information graphed

```{r}
ggplot(SFAORArrestGender2025, aes(x="Gender", y=n, fill=`Gender`)) +
  geom_bar(stat="identity", width=1, color="white") +
  coord_polar("y", start=0) +
  ggtitle("Arrest Gender SFAOR 2025") +
  theme_void()
```

This is an age breakdown of SFAOR admin arrests in 2025

```{r}
SFAOR_AdminArrest_Age2025 <- SFAOR_AdminArrest %>%
  mutate(Age = 2025 - `Birth Year`) %>%
  filter(ApprehensionYear == "2025") %>%
  group_by(Age)  %>%
  tally()

View(SFAOR_AdminArrest_Age2025)
```

Here is that same information graphed.

```{r}
ggplot(data=SFAOR_AdminArrest_Age2025, aes(x = Age, y =n)) +
  geom_bar(stat="identity", width=0.5, fill= "steelblue") +
  labs( x = "Age Range", y = "Number of Arrests") +
  ggtitle("Arrests by Age SFAOR") +
  scale_x_continuous(n.breaks=20)
```

I want to know what the Median and Mean age of arrest are so I can incorporate them into a better graphic

```{r}
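# Weight each age by its arrest count; summarizing the tally's Age column alone would treat every distinct age equally
summary(with(SFAOR_AdminArrest_Age2025, rep(Age, n)))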
```


#### Where do Arrested People in the SFAOR Go?

I'm going to do some analysis with the intention of putting it into a prettier program, like Flourish. The only surefire way to tell if an arrest resulted in a deportation is that there is a departure country or departed date listed. I want to start by creating two new data sets: the SFAOR 2025 arrests that resulted in a deportation, and the SFAOR 2025 arrests that did not. Here's how I would make each of those.

Deported Data Set:

```{r}
Deported_SFAOR_25 <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025") %>% 
  drop_na(`Departed Date`)

View(Deported_SFAOR_25)
```

Non Deported Data Set: 

```{r}
NotDeported_SFAOR_25 <- SFAOR_AdminArrest %>%
  filter(ApprehensionYear == "2025", is.na(`Departed Date`)) 
 
View(NotDeported_SFAOR_25)
```

I want to note that 2,086 arrests in the SFAOR resulted in deportation, and 1,756 did not. Now I want to look at the final program that people who were deported ended up in.

```{r}
FPDeported_SFAOR_25 <- Deported_SFAOR_25 %>%
  group_by(`Final Program`) %>%
  tally()

View(FPDeported_SFAOR_25)
```

It's not surprising that ERO Criminal Alien Program was the largest category because it's the vaguest, so within that I wanted to see what the top apprehension methods were. 

```{r}
ERODeported_SFAOR_25 <- Deported_SFAOR_25 %>%
  filter(`Final Program`== "ERO Criminal Alien Program") %>%
  group_by(`Apprehension Method`) %>%
  tally()

View(ERODeported_SFAOR_25)
```


I also want to see the apprehension method breakdown for people who were deported.

```{r}
AMDeported_SFAOR_25 <- Deported_SFAOR_25 %>%
  group_by(`Apprehension Method`) %>%
  tally()

View(AMDeported_SFAOR_25)
```

Out of curiosity I do want to see the final program breakdown for people who were not deported.

```{r}
FPNotDeported_SFAOR_25 <- NotDeported_SFAOR_25 %>%
  group_by(`Final Program`) %>%
  tally()

View(FPNotDeported_SFAOR_25)
```

For both groups I also want to see what the criminality breakdown looks like. Here first is the Deported Group's criminality:

```{r}
CrimDeported_SFAOR_25 <- Deported_SFAOR_25 %>%
  group_by(`Apprehension Criminality`) %>%
  tally()

View(CrimDeported_SFAOR_25)
```

And this is the non deported group's criminality. 

```{r}
CrimNotDeported_SFAOR_25 <- NotDeported_SFAOR_25 %>%
  group_by(`Apprehension Criminality`) %>%
  tally()

View(CrimNotDeported_SFAOR_25)
```
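
The two criminality tallies above could also be produced side by side from a single table, which makes the proportions directly comparable. This is a sketch on made-up rows, assuming (as above) that a missing Departed Date means no deportation so far; `Outcome` and `share` are hypothetical column names, and the criminality labels may not match the exact strings in the file.

```{r}
library(dplyr)
library(tibble)

# Hypothetical toy data; dates and labels are invented.
toy_outcomes <- tibble(
  `Departed Date` = as.Date(c("2025-03-01", NA, "2025-04-10", NA, NA)),
  `Apprehension Criminality` = c(
    "Convicted Criminal", "Other Immigration Violator", "Convicted Criminal",
    "Other Immigration Violator", "Convicted Criminal"
  )
)

outcome_by_crim <- toy_outcomes %>%
  mutate(Outcome = if_else(is.na(`Departed Date`), "Not deported", "Deported")) %>%
  group_by(Outcome, `Apprehension Criminality`) %>%
  tally() %>%
  group_by(Outcome) %>%
  mutate(share = n / sum(n)) %>%   # proportion within each outcome group
  ungroup()

outcome_by_crim
```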

Here it is interesting to note that the vast majority of people who are arrested and deported (for now) do seem to have a criminal conviction, but the arrested, non-deported group has many more folks with just immigration violations. I did a bunch of these breakouts for the purposes of my own understanding, but as for making a visual that I think has more capacity to be informative, I'm going to work with the final program categorization to make a 
