Overview

This project will allow you to access data on deportations in the United States from 2003 to 2024. In your analysis, you will be able to assess several claims that have been made regarding deportation. Are common narratives about deportation sustainable given the observed data? This is what social scientists do: we make, or attempt to make, evidence-based claims. The tasks I am asking you to do here are portable to any (or most any) data set you might encounter whether it is in international relations, comparative politics, economics, sociology, and so forth. The “POL 51” aspect of this assignment is to give you hands-on experience in interpreting plots, univariate statistics, and rudimentary hypothesis testing. In class, we will cover this extensively. I have assigned an article by Patler and Jones and it is posted on Canvas. It is expected you read this article in advance of writing up your responses.

Project 1 will roll out in two parts. This is the second and it is worth 400 points. You will submit the HTML file you generate on Canvas by October 31.

Part 2: The Criminality Narrative

A common trope in the immigration debate is that undocumented immigrants commit, at high rates, violent crimes. Therefore, the supposition is that migrants who are deported are migrants who have committed serious criminal infractions. This idea is prevalent in political rhetoric surrounding the issue of deportation. But is the claim consistent with the actual data?

This assignment asks you to analyze real-world data on deportations in the United States between the years 2003 and 2024. The data you access record annual ICE removals (deportations) based on what ICE records as the “Most Serious Criminal Conviction” for someone who is deported. The following information is from TRAC (Transactional Records Access Clearinghouse) and describes what the classification levels mean:

“Seriousness Level of MSCC Conviction. ICE classifies National Crime Information Center (NCIC) offense codes into three seriousness levels. The most serious (Level 1) covers what ICE considers to be “aggravated felonies.” Level 2 offenses cover other felonies, while Level 3 offenses are misdemeanors, including petty and other minor violations of the law. TRAC uses ICE’s “business rules” to group recorded NCIC offense codes into these three seriousness levels.”

Essentially what this loosely means is that “Level 1” convictions are the most serious and “Level 3” convictions are generally minor legal infractions.

This chunk of code will access the data set.

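library(tidyverse) # read_csv(), mutate(), and ggplot() used below
library(plotly)    # ggplotly() used in Tasks 2 and 3
reasons_url <- "https://raw.githubusercontent.com/mightyjoemoon/POL51/main/ICE_reasonforremoval.csv"
reasons <- read_csv(reasons_url)
summary(reasons)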

##       Year       President              All              None       
##  Min.   :2003   Length:22          Min.   : 56882   Min.   : 19495  
##  1st Qu.:2008   Class :character   1st Qu.:178148   1st Qu.: 85446  
##  Median :2014   Mode  :character   Median :238765   Median :106426  
##  Mean   :2014                      Mean   :248987   Mean   :122287  
##  3rd Qu.:2019                      3rd Qu.:356423   3rd Qu.:165287  
##  Max.   :2024                      Max.   :407821   Max.   :253342  
##      Level1          Level2          Level3        Undocumented     
##  Min.   : 9819   Min.   : 3846   Min.   : 11045   Min.   :10100000  
##  1st Qu.:38484   1st Qu.: 9056   1st Qu.: 34978   1st Qu.:10500000  
##  Median :46743   Median :17480   Median : 63186   Median :11050000  
##  Mean   :46534   Mean   :15601   Mean   : 64541   Mean   :11015455  
##  3rd Qu.:57148   3rd Qu.:20342   3rd Qu.: 90950   3rd Qu.:11375000  
##  Max.   :75590   Max.   :29436   Max.   :130251   Max.   :12200000  
##      ER_Non     
##  Min.   : 4018  
##  1st Qu.:28563  
##  Median :41647  
##  Mean   :38980  
##  3rd Qu.:50230  
##  Max.   :71686

Task 1: Univariate statistics

For this task, you are going to summarize the data by way of analyzing descriptive, univariate statistics (i.e. mean, s.d., median, iqr). To do this:

First, create four variables recording the percentage of all deportations that are: Level 1, Level 2, Level 3, and None. Put your code in the chunk below to do this.

#Insert code for task 1.1 here
reasons <- reasons %>%
  mutate(percent_level1 = (Level1/All) * 100,
         percent_level2 = (Level2/All) * 100,
         percent_level3 = (Level3/All) * 100,
         percent_none = (None/All) * 100)
reasons

## # A tibble: 22 × 13
##     Year President    All   None Level1 Level2 Level3 Undocumented ER_Non
##    <dbl> <chr>      <dbl>  <dbl>  <dbl>  <dbl>  <dbl>        <dbl>  <dbl>
##  1  2024 Biden     102204  77494   9819   3846  11045     11200000  54870
##  2  2023 Biden     145578  88697  23936   9044  23901     11000000  54933
##  3  2022 Biden      73432  35109  18895   6433  12995     11000000  18874
##  4  2021 Biden      56882  19495  19802   5788  11797     10500000   7607
##  5  2020 Trump     180313  78725  29956  18247  53568     10350000  31520
##  6  2019 Trump     269823 117858  37720  21462  92783     10200000  51387
##  7  2018 Trump     257239 110178  40778  20132  85451     10500000  41665
##  8  2017 Trump     224503  97850  43247  18625  64781     10500000  42504
##  9  2016 Obama2    241258 102673  45149  20352  73084     10700000  46759
## 10  2015 Obama2    236272  96266  46758  20314  72934     11000000  45743
## # ℹ 12 more rows
## # ℹ 4 more variables: percent_level1 <dbl>, percent_level2 <dbl>,
## #   percent_level3 <dbl>, percent_none <dbl>

Second, compute the mean, standard deviation, median, and iqr for each of the newly created variables (this means you will have \(4 \times 4=16\) statistics). Put your code to do this in the chunk below.

#Insert code for task 1.2 here
mean_percent_level1 <- mean(reasons$percent_level1)
mean_percent_level2 <- mean(reasons$percent_level2)
mean_percent_level3 <- mean(reasons$percent_level3)
mean_percent_none <- mean(reasons$percent_none)
sd_percent_level1 <- sd(reasons$percent_level1)
sd_percent_level2 <- sd(reasons$percent_level2)
sd_percent_level3 <- sd(reasons$percent_level3)
sd_percent_none <- sd(reasons$percent_none)
median_percent_level1 <- median(reasons$percent_level1)
median_percent_level2 <- median(reasons$percent_level2)
median_percent_level3 <- median(reasons$percent_level3)
median_percent_none <- median(reasons$percent_none)
iqr_percent_level1 <- IQR(reasons$percent_level1)
iqr_percent_level2 <- IQR(reasons$percent_level2)
iqr_percent_level3 <- IQR(reasons$percent_level3)
iqr_percent_none <- IQR(reasons$percent_none)

print(mean_percent_level1 <- mean(reasons$percent_level1))

## [1] 19.77446

print(mean_percent_level2 <- mean(reasons$percent_level2))

## [1] 6.53759

print(mean_percent_level3 <- mean(reasons$percent_level3))

## [1] 24.50703

print(mean_percent_none <- mean(reasons$percent_none))

## [1] 49.17316

print(sd_percent_level1 <- sd(reasons$percent_level1))

## [1] 5.625592

print(sd_percent_level2 <- sd(reasons$percent_level2))

## [1] 2.10969

print(sd_percent_level3 <- sd(reasons$percent_level3))

## [1] 7.295998

print(sd_percent_none <- sd(reasons$percent_none))

## [1] 9.966849

print(median_percent_level1 <- median(reasons$percent_level1))

## [1] 18.3565

print(median_percent_level2 <- median(reasons$percent_level2))

## [1] 6.715175

print(median_percent_level3 <- median(reasons$percent_level3))

## [1] 24.30652

print(median_percent_none <- median(reasons$percent_none))

## [1] 45.3763

print(iqr_percent_level1 <- IQR(reasons$percent_level1))

## [1] 4.999591

print(iqr_percent_level2 <- IQR(reasons$percent_level2))

## [1] 3.27075

print(iqr_percent_level3 <- IQR(reasons$percent_level3))

## [1] 12.03445

print(iqr_percent_none <- IQR(reasons$percent_none))

## [1] 10.11668
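The sixteen statistics can also be computed in one pass rather than with sixteen separate assignments. The following is a minimal sketch in base R, not the method used above. It assumes only the ten rows (2015 to 2024) that are visible in the printed tibble, so its numbers will not match the full 22-year results. The names `sample_rows`, `categories`, `pct`, and `stat_table` are illustrative.

```r
# Ten rows copied from the printed tibble above (2024 down to 2015).
# The full data set has 22 rows, so these results differ from the full-data statistics.
sample_rows <- data.frame(
  Year   = 2024:2015,
  All    = c(102204, 145578, 73432, 56882, 180313, 269823, 257239, 224503, 241258, 236272),
  None   = c(77494, 88697, 35109, 19495, 78725, 117858, 110178, 97850, 102673, 96266),
  Level1 = c(9819, 23936, 18895, 19802, 29956, 37720, 40778, 43247, 45149, 46758),
  Level2 = c(3846, 9044, 6433, 5788, 18247, 21462, 20132, 18625, 20352, 20314),
  Level3 = c(11045, 23901, 12995, 11797, 53568, 92783, 85451, 64781, 73084, 72934)
)

categories <- c("Level1", "Level2", "Level3", "None")

# Percentage of all deportations in each category, one column per category.
pct <- sapply(categories, function(v) sample_rows[[v]] / sample_rows$All * 100)

# One row per category, one column per statistic.
stat_table <- t(apply(pct, 2, function(x) {
  c(Mean = mean(x), SD = sd(x), Median = median(x), IQR = IQR(x))
}))

print(round(stat_table, 1))
```

Keeping the results in one matrix makes it harder to store a statistic under the wrong variable name, and the rounded matrix can be copied straight into the table.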

Third, using the shell table code below, enter the statistics you computed, rounding to one decimal place.

Type	Mean	s.d.	Median	IQR
Level 1	19.8	5.6	18.4	5.0
Level 2	6.5	2.1	6.7	3.3
Level 3	24.5	7.3	24.3	12.0
None	49.2	10.0	45.4	10.1

Fourth, provide a thorough and substantive interpretation of the tabularized data. What do we learn? How does your analysis square with the criminality narrative?

Enter the text for Task 1 below. This task is worth 100 points.

Univariate statistics are particularly useful in analyzing the deportation data from the Department of Homeland Security from 2003 to 2024, and they offer valuable insight into whether the criminality narrative in the United States holds up. The table above displays the mean, standard deviation, median, and interquartile range for the annual percentage of deportees in each of the four conviction categories. One thing immediately noticeable is that the mean percentage of deportees with no criminal conviction (49.2) is substantially larger than the mean for any of the three conviction levels. In an average year, roughly half of the immigrants deported from the US between 2003 and 2024 had no criminal conviction on record. This figure is even more striking given that the criminality narrative holds that immigrants in the US are closely associated with serious crime. The second highest mean belongs to deportees with Level 3 convictions (24.5), which are low-level convictions that can be as minor as breaking a traffic rule. Such offenses are far from the crimes that the criminality narrative attributes to immigrants. Pooling the means for the None and Level 3 categories gives about 73.7 percent, compared with about 26.3 percent for Level 1 and Level 2 combined. With that said, the data above are highly inconsistent with the criminality narrative.
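The pooling step can be checked with a few lines of arithmetic. This is a small sketch that uses only the rounded means from the table; the variable names are illustrative. Adding the means is valid because the mean of a yearly sum equals the sum of the yearly means.

```r
# Rounded means taken from the table above (percent of all deportations).
means <- c(Level1 = 19.8, Level2 = 6.5, Level3 = 24.5, None = 49.2)

# No conviction or a Level 3 (minor) conviction.
minor_or_none <- means[["None"]] + means[["Level3"]]

# Level 1 or Level 2 (felony) conviction.
felony <- means[["Level1"]] + means[["Level2"]]

cat("None + Level 3:    ", minor_or_none, "\n")
cat("Level 1 + Level 2: ", felony, "\n")
cat("Ratio:             ", round(minor_or_none / felony, 1), "\n")
```

On these rounded figures, the no-conviction and minor-conviction share is a little under three times the felony share.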

The median for each category is close to its mean, which suggests that the distributions are not strongly skewed. The largest gap is in the None category (mean 49.2, median 45.4), which hints at a handful of years with unusually high shares of deportees without convictions rather than at extreme outliers throughout. In addition, the standard deviations (which measure the typical spread of the yearly values around the mean) and the interquartile ranges (which measure the spread of the middle 50 percent of the yearly values) are relatively low given the context of the data set. If the standard deviation or the interquartile range for the percentage of deportees with no conviction were especially high (for example, 20), it would be inappropriate to conclude that this percentage was consistent over time. That is not the case, so it is reasonable to infer that, across presidential terms, the yearly percentages for each level of conviction stay fairly close to their means. As a result, the means in the table above are representative: the percentage of deportees with no criminal conviction was consistently high, and the percentage with serious criminal convictions was consistently low, throughout the time frame (2003-2024). Essentially, the data run strongly against the claim that deported immigrants are predominantly serious criminals, which is the basis of the criminality narrative.
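One way to put a number on the mean-versus-median comparison is the nonparametric skew, (mean - median) / s.d., which always falls between -1 and 1. The sketch below applies it to the rounded table values only; it is an illustrative check added here, not part of the assigned analysis, and the object names are assumptions.

```r
# Rounded statistics from the table above.
tab <- data.frame(
  Type   = c("Level 1", "Level 2", "Level 3", "None"),
  Mean   = c(19.8, 6.5, 24.5, 49.2),
  SD     = c(5.6, 2.1, 7.3, 10.0),
  Median = c(18.4, 6.7, 24.3, 45.4)
)

# Nonparametric skew: positive values mean the mean sits above the median (right tail).
tab$np_skew <- (tab$Mean - tab$Median) / tab$SD

print(tab[, c("Type", "np_skew")], digits = 2)
```

Values near zero support the reading that a category is roughly symmetric. The None category has the largest value, which is consistent with a modest right tail.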

Task 2

For this part of the project, we will consider deportations in the context of who the President of the United States is. First, in order to do this, we need to create a factor-level variable. In the data set, there is a variable called “President” that records each president as: “Bush1”, “Bush2”, “Obama1”, “Obama2”, “Trump”, “Biden.” In the chunk below, insert code to create the factor-level variable.

#Insert code for task 2.1 in the chunk below

reasons$factor_president <- factor(reasons$President,
                                   levels = c("Bush1", "Bush2", "Obama1",
                                              "Obama2", "Trump", "Biden"),
                                   labels = c("Bush term 1", "Bush term 2", "Obama term 1",
                                              "Obama term 2", "Trump", "Biden"))
print(levels(reasons$factor_president))

## [1] "Bush term 1"  "Bush term 2"  "Obama term 1" "Obama term 2" "Trump"       
## [6] "Biden"

Second, create a boxplot of the variable percent_minor for each Presidential administration. In the boxplot, include a symbol indicating the mean of the variable percent_minor.

#Insert code for task 2.2 in the chunk below
reasons <- reasons %>%
  mutate(percent_minor = ((None + Level3)/(All)) * 100)

plot1 <- ggplot(reasons, aes(x = factor_president, y = percent_minor)) +
  geom_boxplot(fill = c("tomato2", "tomato2", "cornflowerblue",
                        "cornflowerblue", "tomato2", "cornflowerblue")) +
  labs(x = "Presidential Administrations",
       y = "Percentage of Deportees with Minor Infractions",
       title = "Each Presidential Term Yielded Similarly High Percentages of Deportees\nwith Minor Criminal Infractions, Never Dropping Below 50%") +
  stat_summary(fun = mean, geom = "point", shape = 20) +
  scale_y_continuous(limits = c(30, 100),
                     breaks = seq(30, 100, 10)) +
  theme_bw() +
  theme(plot.margin = margin(t = 15, r = 10, b = 10, l = 10)) # This line adds space between the labels and the graph because otherwise the title overlaps with the top line of the graph

plot1 <- ggplotly(plot1)
plot1
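The fill colors above are passed as a positional vector, which works only while there are exactly six boxes in that order. A possible alternative is to derive the color from a party variable. The following is a sketch under stated assumptions: `party_lookup`, `party`, and `demo` are illustrative names, the data are the ten rows (2015 to 2024) visible in the printed tibble rather than the full data set, and the plot is built only if ggplot2 is installed.

```r
# Map each value of President to a party (lookup table is an illustrative helper).
party_lookup <- c(Bush1 = "Republican", Bush2 = "Republican",
                  Obama1 = "Democratic", Obama2 = "Democratic",
                  Trump = "Republican", Biden = "Democratic")

# Ten rows copied from the printed tibble (2024 down to 2015); the full data has 22 rows.
demo <- data.frame(
  President = c(rep("Biden", 4), rep("Trump", 4), rep("Obama2", 2)),
  All    = c(102204, 145578, 73432, 56882, 180313, 269823, 257239, 224503, 241258, 236272),
  None   = c(77494, 88697, 35109, 19495, 78725, 117858, 110178, 97850, 102673, 96266),
  Level3 = c(11045, 23901, 12995, 11797, 53568, 92783, 85451, 64781, 73084, 72934)
)
demo$percent_minor <- (demo$None + demo$Level3) / demo$All * 100
demo$party <- unname(party_lookup[demo$President])

# Build the plot only when ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(demo, aes(x = President, y = percent_minor, fill = party)) +
    geom_boxplot() +
    stat_summary(fun = mean, geom = "point", shape = 20) +
    scale_fill_manual(values = c(Republican = "tomato2",
                                 Democratic = "cornflowerblue")) +
    theme_bw()
  print(p)
}
```

Mapping `fill` to a variable also produces a legend, so the reader does not have to be told which color stands for which party.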

Third, provide a thorough and substantive interpretation of the plot. What do we learn about variation across Presidencies regarding deportation of individuals with no, or minor, criminal convictions?

Enter the text for Task 2 below. This task is worth 150 points.

The graph above uses data from the Department of Homeland Security and displays box plots of the percentage of deportees with minor criminal convictions (minor meaning either no conviction or a Level 3 conviction) for each presidential term between 2003 and 2024. The red box plots represent Republican presidents and the blue box plots represent Democratic presidents. For one, this plot reinforces the findings in the table from before by showing that the criminality narrative does not match the data. The percentages are consistently very high across the presidential terms, and within each presidential term the interquartile ranges (represented by the distance between the top and the bottom of each box) are quite narrow. This demonstrates that the yearly values in each presidential term are concentrated around the median. The data point to consistency not just across presidential terms but also within them.

Joe Biden’s presidential term offers a slight exception to this. As visible in the graph, his box plot is more spread out than the others, suggesting more year-to-year variation in the percentage of deportees with minor criminal convictions during his term. His plot also has a minimum of 55%, meaning that in the year of his term with the lowest share of deportees with minor convictions, that share was 55%. This may be attributable to the fact that Biden’s term started right when the COVID-19 pandemic was at its peak. Because of this, the inflow of immigrants to the US was low, and as a result the number of immigrants deported dropped as well (with deportations of those with minor convictions dropping more sharply than deportations of those with serious convictions). Once the effects of COVID dwindled, more immigrants came to the US, more deportations occurred, and the percentage of deportees with minor convictions went up again. Despite all of this, even the lowest yearly percentage of deportees with minor criminal convictions was still 55%, which is a majority of deportations.
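The spread described here can be checked against the rows of the printed tibble. The sketch below uses only the eight visible rows for the Trump and Biden terms (2017 to 2024); `yrs` and `spread` are illustrative names, and `IQR()` uses R's default quantile method.

```r
# Rows for 2024 down to 2017, copied from the printed tibble above.
yrs <- data.frame(
  Year      = 2024:2017,
  President = c(rep("Biden", 4), rep("Trump", 4)),
  All       = c(102204, 145578, 73432, 56882, 180313, 269823, 257239, 224503),
  None      = c(77494, 88697, 35109, 19495, 78725, 117858, 110178, 97850),
  Level3    = c(11045, 23901, 12995, 11797, 53568, 92783, 85451, 64781)
)

# Share of deportees with no conviction or a Level 3 conviction.
yrs$percent_minor <- (yrs$None + yrs$Level3) / yrs$All * 100

# Summary of the yearly values within each term (columns are presidents).
spread <- sapply(split(yrs$percent_minor, yrs$President), function(x) {
  c(Min = min(x), Median = median(x), Max = max(x), IQR = IQR(x))
})

print(round(spread, 1))
```

On these rows the Biden minimum comes out at about 55% (the 2021 value), and the Biden interquartile range is several times wider than the Trump one, which matches the shape of the two boxes.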

If the criminality narrative were correct, we would expect the data to show much lower percentages for deportees with minor convictions and higher percentages for those with serious convictions. We might also expect the box plots to be more vertically elongated, which would suggest that the data are inconsistent and that there is no sustained pattern of high percentages of deportees with minor convictions. However, neither is the case, and the data provide strong evidence that most people deported from the US have little or no record of serious crime. It should also be noted that there is no obvious difference in the percentages between Republican and Democratic presidential terms. This might suggest that differences in policy between Democratic and Republican administrations in the US have little effect on the percentage of deportees with minor criminal convictions.

Task 3: Total deportations by Presidential Administration

Using the factor-level variable you created in Task 2, create a boxplot of total deportations (this is the variable called All in the dataset) and, as in Task 2, provide a thorough and substantive interpretation. Are there major differences across presidencies? Across parties? Is there a basis for this claim? This task is worth 150 points.

#Insert code for task 3.1 in the chunk below

plot2 <- ggplot(reasons, aes(x = factor_president, y = All)) +
  geom_boxplot(fill = c("tomato2", "tomato2", "cornflowerblue",
                        "cornflowerblue", "tomato2", "cornflowerblue")) +
  labs(x = "Presidential Administration",
       y = "Total Deportations",
       title = "In Obama's First Term, the Year with the Fewest Deportations Exceeded\nNearly Every Year Under the Other Three Presidents") +
  stat_summary(fun = mean, geom = "point", shape = 20) +
  theme_bw() +
  theme(plot.margin = margin(t = 15, r = 10, b = 10, l = 10)) # Same reasoning as previous graph
  
plot2 <- ggplotly(plot2)
plot2

Enter the text for Task 3 below. This task is worth 150 points.

The graph above takes the same data from the Department of Homeland Security and displays box plots of the total deportations that occurred in each presidential term between 2003 and 2024. Obama’s first term stands out in the graph as having the most total deportations of any presidential term between 2003 and 2024. In Obama’s first term, the year with the fewest deportations (383,847) still exceeded all but one year in any of the other presidential terms (one of the years in Bush’s second term produced 385,711 deportations). Multiple factors influenced this. For one, in his first term Obama relied heavily on the 287(g) agreements, which were a part of the Illegal Immigration Reform and Immigrant Responsibility Act of 1996 (IIRIRA). The 287(g) agreements allowed local law enforcement to coordinate with immigration enforcement officials, effectively making it easier to identify and deport immigrants in the US (Patler and Jones, 2025). Additionally, during Obama’s first term, the inflow of immigrants to the US was at an all-time high. The more immigrants there were, the more deportations there were bound to be. In Obama’s second term, he realized that the use of the 287(g) agreements was unpopular among his supporters, so he stopped using them (Professor Jones in lecture). This explains why the box plot representing Obama’s second term has lower total deportations and is more spread out (deportations decreased over the course of the term, leading to a larger interquartile range). The box plot representing Bush’s second term also shows a lot of variation (a large interquartile range). This is because Bush started using the 287(g) agreements midway through his second term, leading to a large increase in total deportations (Jones in lecture).

Throughout Trump’s political career, he has been largely critical of immigrants, often using harsh language and associating immigrants with terrorism and serious crimes (Jones, Sherman, Rojas, Hosek, Vannette, Rocha, García-Ponce, Pantoja, García-Amador, 2020). It’s important to note that this kind of rhetoric from powerful political actors such as Donald Trump fuels the criminality narrative in the United States and the stigma surrounding immigrants. The idea that immigrants are synonymous with crime has taken hold in much of the American population, even though this idea is false, as previously established. In his 2016 presidential campaign, Trump promised mass deportations if he were to enter office. While the data show that a substantial number of immigrants were deported during his first term as president (931,878), that number is relatively small compared to the numbers produced during either of Obama’s terms in this time frame (in which deportation wasn’t as emphasized, nor were immigrants as criminalized). This may have been simply because there wasn’t as much immigration to the US during Trump’s first term, and therefore a smaller pool of immigrants to deport. This decrease in immigration may be in part because the end of Trump’s first term overlapped with the beginning of the COVID-19 pandemic in the US. One implication that can be drawn from the data is that the total number of deportations at a given time might be driven largely by the inflow of immigrants at that time: the more immigrants there are, the more deportations there will be. Even though deportation of immigrants has historically been emphasized more by the Republican Party than by the Democratic Party, between 2003 and 2024 there were more total deportations under Democratic presidents (3,128,036) than under Republican presidents (2,349,675). This is consistent with that implication.
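The totals quoted here can be cross-checked, and the party comparison can be put on a per-year basis, since the two parties did not hold office for the same number of years in this window. This is a rough sketch. The Trump yearly figures come from the printed tibble and the party totals from the paragraph above. The year counts (12 Democratic, 10 Republican) are an assumption based on the six term labels and the 22 rows in the data, and `party_years` is an illustrative name.

```r
# Yearly totals for 2017-2020 from the printed tibble (column All).
trump_years <- c(`2017` = 224503, `2018` = 257239, `2019` = 269823, `2020` = 180313)
trump_total <- sum(trump_years)

# Party totals as reported in the text above.
party_totals <- c(Democratic = 3128036, Republican = 2349675)

# ASSUMPTION: Obama1, Obama2, and Biden cover 12 of the 22 years;
# Bush1, Bush2, and Trump cover the remaining 10.
party_years <- c(Democratic = 12, Republican = 10)

per_year <- party_totals / party_years

cat("Trump term total:", trump_total, "\n")
print(round(per_year))
```

Under that assumption, the Democratic per-year average is still higher than the Republican one, so the gap in totals is not only a product of the two extra years.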

The box plot that represents Biden’s presidential term has the lowest total deportations in the data set. This can also be attributed to the inflow of immigrants, because COVID-19 was peaking during his term and immigration to the US was much lower during this time. In the context of the criminality narrative in the US, this graph displays the exceedingly large number of deportations that occurred between 2003 and 2024. As previously established in this project, deportees with no criminal conviction or only a minor conviction remained the majority throughout each president’s term between 2003 and 2024. Comparing that to the total number of immigrants who were deported during this time frame puts into perspective the extent to which innocent immigrants in the US are exploited and tyrannized solely because of a criminality narrative that is false.

Project 1: The criminality narrative part 2

Miguel Greenberg

October 15, 2025