This project will allow you to access data on deportations in the United States from 2003 to 2024. In your analysis, you will be able to assess several claims that have been made regarding deportation. Are common narratives about deportation sustainable given the observed data? This is what social scientists do: we make, or attempt to make, evidence-based claims. The tasks I am asking you to do here are portable to any (or most any) data set you might encounter whether it is in international relations, comparative politics, economics, sociology, and so forth. The “POL 51” aspect of this assignment is to give you hands-on experience in interpreting plots, univariate statistics, and rudimentary hypothesis testing. In class, we will cover this extensively. I have assigned an article by Patler and Jones and it is posted on Canvas. It is expected you read this article in advance of writing up your responses.
Project 1 will roll out in two parts. This is the second and it is worth 400 points. You will submit the HTML file you generate on Canvas by October 31.
A common trope in the immigration debate is that undocumented immigrants commit, at high rates, violent crimes. Therefore, the supposition is that migrants who are deported are migrants who have committed serious criminal infractions. This idea is prevalent in political rhetoric surrounding the issue of deportation. But is the claim consistent with the actual data?
Part 1 of this assignment is asking you to analyze real-world data on deportations in the United States between the years 2003 and 2024. The data you access records annual ICE removals (deportations) based on what ICE records as the “Most Serious Criminal Conviction” for someone who is deported. The following information is from TRAC (Transactional Records Access Clearinghouse) and describes what the classification levels mean:
“Seriousness Level of MSCC Conviction. ICE classifies National Crime Information Center (NCIC) offense codes into three seriousness levels. The most serious (Level 1) covers what ICE considers to be “aggravated felonies.” Level 2 offenses cover other felonies, while Level 3 offenses are misdemeanors, including petty and other minor violations of the law. TRAC uses ICE’s “business rules” to group recorded NCIC offense codes into these three seriousness levels.”
Loosely, this means that “Level 1” convictions are the most serious and “Level 3” convictions are generally minor legal infractions.
This chunk of code will access the data set.
library(tidyverse) # read_csv(), mutate(), ggplot()
library(plotly)    # ggplotly()
reasons_url <- "https://raw.githubusercontent.com/mightyjoemoon/POL51/main/ICE_reasonforremoval.csv"
reasons <- read_csv(reasons_url)
summary(reasons)
## Year President All None
## Min. :2003 Length:22 Min. : 56882 Min. : 19495
## 1st Qu.:2008 Class :character 1st Qu.:178148 1st Qu.: 85446
## Median :2014 Mode :character Median :238765 Median :106426
## Mean :2014 Mean :248987 Mean :122287
## 3rd Qu.:2019 3rd Qu.:356423 3rd Qu.:165287
## Max. :2024 Max. :407821 Max. :253342
## Level1 Level2 Level3 Undocumented
## Min. : 9819 Min. : 3846 Min. : 11045 Min. :10100000
## 1st Qu.:38484 1st Qu.: 9056 1st Qu.: 34978 1st Qu.:10500000
## Median :46743 Median :17480 Median : 63186 Median :11050000
## Mean :46534 Mean :15601 Mean : 64541 Mean :11015455
## 3rd Qu.:57148 3rd Qu.:20342 3rd Qu.: 90950 3rd Qu.:11375000
## Max. :75590 Max. :29436 Max. :130251 Max. :12200000
## ER_Non
## Min. : 4018
## 1st Qu.:28563
## Median :41647
## Mean :38980
## 3rd Qu.:50230
## Max. :71686
For this task, you are going to summarize the data by way of analyzing descriptive, univariate statistics (i.e. mean, s.d., median, iqr). To do this:
First, create four variables recording the percentage of all deportations that are: Level 1, Level 2, Level 3, and None. Put your code in the chunk below to do this.
#Insert code for task 1.1 here
reasons <- reasons %>%
mutate(percent_level1 = (Level1/All) * 100,
percent_level2 = (Level2/All) * 100,
percent_level3 = (Level3/All) * 100,
percent_none = (None/All) * 100)
reasons
## # A tibble: 22 × 13
## Year President All None Level1 Level2 Level3 Undocumented ER_Non
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2024 Biden 102204 77494 9819 3846 11045 11200000 54870
## 2 2023 Biden 145578 88697 23936 9044 23901 11000000 54933
## 3 2022 Biden 73432 35109 18895 6433 12995 11000000 18874
## 4 2021 Biden 56882 19495 19802 5788 11797 10500000 7607
## 5 2020 Trump 180313 78725 29956 18247 53568 10350000 31520
## 6 2019 Trump 269823 117858 37720 21462 92783 10200000 51387
## 7 2018 Trump 257239 110178 40778 20132 85451 10500000 41665
## 8 2017 Trump 224503 97850 43247 18625 64781 10500000 42504
## 9 2016 Obama2 241258 102673 45149 20352 73084 10700000 46759
## 10 2015 Obama2 236272 96266 46758 20314 72934 11000000 45743
## # ℹ 12 more rows
## # ℹ 4 more variables: percent_level1 <dbl>, percent_level2 <dbl>,
## # percent_level3 <dbl>, percent_none <dbl>
Second, compute the mean, standard deviation, median, and iqr for each of the newly created variables (this means you will have \(4 \times 4=16\) statistics). Put your code to do this in the chunk below.
#Insert code for task 1.2 here
mean_percent_level1 <- mean(reasons$percent_level1)
mean_percent_level2 <- mean(reasons$percent_level2)
mean_percent_level3 <- mean(reasons$percent_level3)
mean_percent_none <- mean(reasons$percent_none)
sd_percent_level1 <- sd(reasons$percent_level1)
sd_percent_level2 <- sd(reasons$percent_level2)
sd_percent_level3 <- sd(reasons$percent_level3)
sd_percent_none <- sd(reasons$percent_none)
median_percent_level1 <- median(reasons$percent_level1)
median_percent_level2 <- median(reasons$percent_level2)
median_percent_level3 <- median(reasons$percent_level3)
median_percent_none <- median(reasons$percent_none)
iqr_percent_level1 <- IQR(reasons$percent_level1)
iqr_percent_level2 <- IQR(reasons$percent_level2)
iqr_percent_level3 <- IQR(reasons$percent_level3)
iqr_percent_none <- IQR(reasons$percent_none)
print(mean_percent_level1 <- mean(reasons$percent_level1))
## [1] 19.77446
print(mean_percent_level2 <- mean(reasons$percent_level2))
## [1] 6.53759
print(mean_percent_level3 <- mean(reasons$percent_level3))
## [1] 24.50703
print(mean_percent_none <- mean(reasons$percent_none))
## [1] 49.17316
print(sd_percent_level1 <- sd(reasons$percent_level1))
## [1] 5.625592
print(sd_percent_level2 <- sd(reasons$percent_level2))
## [1] 2.10969
print(sd_percent_level3 <- sd(reasons$percent_level3))
## [1] 7.295998
print(sd_percent_none <- sd(reasons$percent_none))
## [1] 9.966849
print(median_percent_level1 <- median(reasons$percent_level1))
## [1] 18.3565
print(median_percent_level2 <- median(reasons$percent_level2))
## [1] 6.715175
print(median_percent_level3 <- median(reasons$percent_level3))
## [1] 24.30652
print(median_percent_none <- median(reasons$percent_none))
## [1] 45.3763
print(iqr_percent_level1 <- IQR(reasons$percent_level1))
## [1] 4.999591
print(iqr_percent_level2 <- IQR(reasons$percent_level2))
## [1] 3.27075
print(iqr_percent_level3 <- IQR(reasons$percent_level3))
## [1] 12.03445
print(iqr_percent_none <- IQR(reasons$percent_none))
## [1] 10.11668
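The sixteen statistics above can also be computed in one pass. The following is a minimal sketch in base R, not the method the assignment requires. It assumes a hypothetical data frame `reasons_shown` built only from the ten rows (2015 to 2024) printed earlier in this document, so its output will not match the 22-year table below.

```r
# Ten rows copied from the tibble printed above (2015-2024 only).
# ASSUMPTION: this is a partial copy of the data, used for illustration,
# so the results will NOT match the statistics for the full 22 years.
reasons_shown <- data.frame(
  Year   = 2024:2015,
  All    = c(102204, 145578, 73432, 56882, 180313,
             269823, 257239, 224503, 241258, 236272),
  None   = c(77494, 88697, 35109, 19495, 78725,
             117858, 110178, 97850, 102673, 96266),
  Level1 = c(9819, 23936, 18895, 19802, 29956,
             37720, 40778, 43247, 45149, 46758),
  Level2 = c(3846, 9044, 6433, 5788, 18247,
             21462, 20132, 18625, 20352, 20314),
  Level3 = c(11045, 23901, 12995, 11797, 53568,
             92783, 85451, 64781, 73084, 72934)
)

# Percentage of all removals in each category, one column per category
pct <- 100 * reasons_shown[c("Level1", "Level2", "Level3", "None")] /
  reasons_shown$All

# One row per category, one column per statistic
stats_table <- t(sapply(pct, function(x) {
  c(mean = mean(x), sd = sd(x), median = median(x), iqr = IQR(x))
}))

print(round(stats_table, 1))
```

Because every statistic is produced by the same function, there is no opportunity to store a result under the wrong variable name.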
Third, using the shell table code below, enter the statistics you computed, rounding to one decimal place.
| Type | Mean | s.d. | Median | IQR |
|---|---|---|---|---|
| Level 1 | 19.8 | 5.6 | 18.4 | 5.0 |
| Level 2 | 6.5 | 2.1 | 6.7 | 3.3 |
| Level 3 | 24.5 | 7.3 | 24.3 | 12.0 |
| None | 49.2 | 10.0 | 45.4 | 10.1 |
Fourth, provide a thorough and substantive interpretation of the tabularized data. What do we learn? How does your analysis square with the criminality narrative?
Univariate statistics are a useful way to summarize the deportation data from the Department of Homeland Security from 2003 to 2024, and they offer valuable insight into the truthfulness of the criminality narrative in the United States. The table above displays the mean, standard deviation, median, and interquartile range of the annual percentage of deportees in each of four categories: no conviction, Level 1, Level 2, and Level 3. One thing immediately noticeable is that the mean percentage of deportees with no criminal conviction (49.2) is substantially larger than the mean for any of the three conviction levels. In an average year, roughly half of the people deported from the US between 2003 and 2024 had no criminal conviction on record. This number is even more striking given that the criminality narrative holds that deported immigrants are closely associated with serious crime. The second highest mean (24.5) is for deportees with Level 3 convictions, which are misdemeanors and can include petty and other minor violations of the law. Such offenses are far from those attributed to immigrants in the criminality narrative. Pooling the means for the None and Level 3 categories accounts for roughly three-quarters of all deportations, compared with about one-quarter for Level 1 and Level 2 combined. With that said, the data above is highly inconsistent with the criminality narrative.
The median for each category is close to its mean, which suggests that the distributions are not strongly skewed and that no single extreme year is driving the averages. The largest gap is for the None category (mean 49.2, median 45.4), which points to some years with unusually high shares of deportees without convictions. In addition, the standard deviations (which measure variation around the mean) and the interquartile ranges (which measure the spread of the middle 50% of the data) are relatively low given the context of the data set. If the standard deviation or the interquartile range were especially high for the percentage of deportees with no conviction (for example, 20), it would be inappropriate to conclude that this percentage was consistent over time. That is not the case here, so the yearly percentages for each category appear to be fairly concentrated around their means. As a result, the mean values are reasonably representative of the whole time frame (2003-2024): the share of deportees with no criminal conviction was consistently high, and the share with serious convictions was consistently low. Essentially, the data strongly oppose the claim that the immigrants who are deported are mostly serious criminals, which is the basis of the criminality narrative.
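The pooled comparison and the mean-versus-median comparison described above can be checked with a few lines of arithmetic. This is a small sketch that uses only the rounded values from the table; the object names are hypothetical, and because the inputs are rounded to one decimal place the results are approximate.

```r
# Rounded means and medians copied from the table above
means   <- c(Level1 = 19.8, Level2 = 6.5, Level3 = 24.5, None = 49.2)
medians <- c(Level1 = 18.4, Level2 = 6.7, Level3 = 24.3, None = 45.4)

# Pooled shares: no conviction or Level 3 versus Level 1 or Level 2
pooled_minor   <- unname(means["None"] + means["Level3"])
pooled_serious <- unname(means["Level1"] + means["Level2"])

# Mean minus median: a positive gap hints at a longer upper tail
skew_gap <- means - medians

print(c(minor = pooled_minor, serious = pooled_serious))
print(skew_gap)
```

The pooled shares come to about 73.7 and 26.3 percent. The gap between mean and median is largest for the None category, which is worth keeping in mind when describing the data as unskewed.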
For this part of the project, we will consider deportations in the context of who the President of the United States is. First, in order to do this, we need to create a factor-level variable. In the data set, there is a variable called “President” that records each president as: “Bush1”, “Bush2”, “Obama1”, “Obama2”, “Trump”, “Biden.” In the chunk below, insert code to create the factor-level variable.
#Insert code for task 2.1 in the chunk below
reasons$factor_president <- factor(reasons$President,
levels = c("Bush1", "Bush2", "Obama1",
"Obama2", "Trump", "Biden"),
labels = c("Bush term 1", "Bush term 2", "Obama term 1",
"Obama term 2", "Trump", "Biden"))
print(levels(reasons$factor_president))
## [1] "Bush term 1" "Bush term 2" "Obama term 1" "Obama term 2" "Trump"
## [6] "Biden"
Second, create a boxplot of the variable percent_minor for each Presidential administration. In the boxplot, include a symbol indicating the mean of the variable percent_minor.
#Insert code for task 2.2 in the chunk below
reasons <- reasons %>%
mutate(percent_minor = ((None + Level3)/(All)) * 100)
plot1 <- ggplot(reasons, aes(x = factor_president, y = percent_minor)) +
geom_boxplot(fill = c("tomato2", "tomato2", "cornflowerblue",
"cornflowerblue", "tomato2", "cornflowerblue")) +
labs(x = "Presidential Administrations",
y = "Percentage of Deportees with Minor Infractions",
title = "Each Presidential Term Yielded Similarly High Percentages of Deportees\nwith Minor Criminal Infractions, Never Dropping Below 50%") +
stat_summary(fun = mean, geom = "point", shape = 20) +
scale_y_continuous(limits = c(30, 100),
breaks = seq(30, 100, 10)) +
theme_bw() +
theme(plot.margin = margin(t = 15, r = 10, b = 10, l = 10)) # This line adds space between the labels and the graph because otherwise the title overlaps with the top line of the graph
plot1 <- ggplotly(plot1)
plot1
Third, provide a thorough and substantive interpretation of the plot. What do we learn about variation across Presidencies regarding deportation of individuals with no, or minor, criminal convictions?
The graph above uses data from the Department of Homeland Security and displays box plots of the percentage of deportees with minor criminal convictions (minor being either no conviction or a Level 3 conviction) for each presidential term between 2003 and 2024. The red box plots represent Republican presidents and the blue box plots represent Democratic presidents. For one, this plot reinforces the findings in the tabularized data from before by showing that the criminality narrative is not supported by the data. The percentages are consistently very high across the presidential terms, and within each presidential term, the interquartile ranges (which are represented by the distance between the top and the bottom of the boxes) are quite narrow. This shows that the data in each presidential term are concentrated around the median. The data points to consistency not just across presidential terms but also within them.
Joe Biden’s presidential term offers a slight exception to this. As visible in the graph, his box plot is more spread out than the others, suggesting more variation in the percentage of deportees with minor criminal convictions across the years in his term. His plot also features a minimum of 55%, which means that in the year of his term with the lowest share, 55% of deportees had no conviction or only a minor one. This can be attributed to the fact that Biden’s term started right when the COVID-19 virus was at its peak. Because of this, the influx of immigrants to the US was at a low, and as a result, the number of immigrants deported dropped as well (those with minor convictions dropping more significantly than those with serious convictions). Once the effects of COVID dwindled, more immigrants came to the US, more deportations occurred, and with that, the percentage of deportees with minor convictions went up again. Despite all of this, even in that lowest year deportees with no or minor convictions still made up 55% of deportations, which is a majority.
If the criminality narrative were correct, it would be expected that the data would display much lower percentages for deportees with minor convictions and higher percentages for those with serious convictions. It might also be expected that the box plots would be more vertically elongated, which would suggest that the data is inconsistent and that there is no trend of high percentages of deportees who had minor convictions. However, neither of these is the case, and the data provides strong evidence that most deportees have little connection to serious crime. It should also be noted that there is no obvious difference in the percentages between Republican and Democratic presidential terms. This might suggest that differences in policy between Democratic and Republican administrations have little effect on the percentage of deportees with minor criminal convictions.
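The year-by-year pattern described above for the Biden term can be reproduced from the four Biden rows printed earlier in this document. This is a minimal sketch in base R; the data frame `biden` is a hypothetical hand-typed copy of those rows, and `percent_minor` is defined the same way as in the Task 2.2 chunk.

```r
# Biden-era rows copied from the tibble printed earlier in this document
biden <- data.frame(
  Year   = c(2021, 2022, 2023, 2024),
  All    = c(56882, 73432, 145578, 102204),
  None   = c(19495, 35109, 88697, 77494),
  Level3 = c(11797, 12995, 23901, 11045)
)

# Share of removals with no conviction or only a Level 3 conviction
biden$percent_minor <- 100 * (biden$None + biden$Level3) / biden$All

print(round(biden[c("Year", "All", "percent_minor")], 1))

# The year behind the 55% minimum in the box plot
min_row <- biden[which.min(biden$percent_minor), ]
print(min_row$Year)
```

On these rows the share rises in every year of the term, from about 55% in 2021 to about 87% in 2024, which is why the Biden box is taller than the others.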
Using the factor-level variable you created in Task 2, create a boxplot of total deportations (this is the variable called All in the dataset) and, as in Task 2, provide a thorough and substantive interpretation. Are there major differences across presidencies? Across parties? Is there a basis for this claim? This task is worth 150 points.
#Insert code for task 3.1 in the chunk below
plot2 <- ggplot(reasons, aes(x = factor_president, y = All)) +
geom_boxplot(fill = c("tomato2", "tomato2", "cornflowerblue",
"cornflowerblue", "tomato2", "cornflowerblue")) +
labs(x = "Presidential Administration",
y = "Total Deportations",
title = "In Obama's First Term, the Year with the Fewest Deportations Exceeded\nNearly Every Year Under the Other Three Presidents") +
stat_summary(fun = mean, geom = "point", shape = 20) +
theme_bw() +
theme(plot.margin = margin(t = 15, r = 10, b = 10, l = 10)) # Same reasoning as previous graph
plot2 <- ggplotly(plot2)
plot2
The graph above takes the same data from the Department of Homeland Security and displays box plots of total deportations in each presidential term between 2003 and 2024. Obama’s first term stands out in the graph as having the most total deportations of any presidential term in this period. In Obama’s first term, the year with the fewest deportations (383,847) still exceeded all but one year in any of the other presidential terms (one of the years in Bush’s second term produced 385,711 deportations). Multiple factors influenced this. For one, in his first term Obama relied heavily on the 287(g) agreements, which were a part of the Illegal Immigration Reform and Immigrant Responsibility Act of 1996 (IIRIRA). The 287(g) agreements allowed local law enforcement to coordinate with immigration enforcement officials, effectively making it easier to identify and deport immigrants in the US (Patler and Jones, 2025). Additionally, during Obama’s first term, the inflow of immigrants to the US was at an all-time high. The more immigrants there were, the more deportations there were bound to be. In Obama’s second term, he realized that the use of the 287(g) agreements was unpopular among his supporters, so he stopped using them (Professor Jones in lecture). This explains why the box plot representing Obama’s second term has lower total deportations and is more spread out (deportations decreased over the course of the term, leading to a higher interquartile range). The box plot representing Bush’s second term also shows a lot of variation (a large interquartile range). This is because Bush started using the 287(g) agreements midway through his second term, leading to a large increase in total deportations (Jones in lecture).
Throughout Trump’s political career, he has been largely critical of immigrants, often using harsh language and associating immigrants with terrorism and serious crimes (Jones, Sherman, Rojas, Hosek, Vannette, Rocha, García-Ponce, Pantoja, García-Amador, 2020). It’s important to note that insinuations about immigrants made by powerful political actors such as Donald Trump fuel the criminality narrative in the United States and the stigma surrounding immigrants. The idea that immigrants are synonymous with crime has been instilled into much of the American population, even though this idea is not supported by the data, as previously established. In Trump’s 2016 presidential campaign, he promised mass deportation if he were to enter office. While the data shows that a substantial number of immigrants were deported during his first term as president (931,878), that number is relatively small compared to the numbers produced during other presidential terms in this time frame, in which deportation wasn’t as emphasized and immigrants weren’t as criminalized. This may have been simply because there wasn’t as much immigration to the US during Trump’s first term, and therefore a smaller pool of immigrants to deport. This decrease in immigration may be in part because of the overlap between the end of Trump’s first term and the beginning of the COVID-19 pandemic in the US. One implication that can be drawn from the data is that the total number of deportations at a given time might be driven largely by the influx of immigrants at that time: the more immigrants there are, the more deportations there will be. Even though deportation of immigrants has historically been emphasized more by the Republican Party than by the Democratic Party, between 2003 and 2024 there were more total deportations under Democratic presidents (3,128,036) than under Republican presidents (2,349,675). This is evidence to support the previous implication.
The box plot that represents Biden’s presidential term has the lowest total deportations in the data set. This can also be attributed to the immigrant influx, because COVID-19 was peaking during his term and immigration to the US was much lower during this time. In the context of the criminality narrative in the US, this graph displays the exceedingly large number of deportations that occurred between 2003 and 2024. As previously established in this project, deportees with no criminal conviction or only a minor conviction remained the majority throughout each president’s term between 2003 and 2024. Comparing that to the total number of immigrants who were deported during this time frame puts into perspective the extent to which innocent immigrants in the US are exploited and tyrannized solely because of a criminality narrative that is false.
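The totals quoted above can be partly checked from the rows printed earlier in this document, and the party comparison can be put on a per-year basis. This is a rough sketch in base R with hypothetical object names. It assumes that the 22 years split into 12 under Democratic presidents (Obama 2009 to 2016, Biden 2021 to 2024) and 10 under Republican presidents (Bush 2003 to 2008, Trump 2017 to 2020), following the pattern of the rows that are visible.

```r
# Trump and Biden rows copied from the tibble printed earlier
shown <- data.frame(
  Year      = 2024:2017,
  President = rep(c("Biden", "Trump"), each = 4),
  All       = c(102204, 145578, 73432, 56882,
                180313, 269823, 257239, 224503)
)

# Total and average annual removals per term
term_totals <- tapply(shown$All, shown$President, sum)
term_means  <- tapply(shown$All, shown$President, mean)
print(term_totals)
print(term_means)

# Party totals as quoted in the text above.
# ASSUMPTION: 12 Democratic years and 10 Republican years (see lead-in).
party <- data.frame(
  party = c("Democratic", "Republican"),
  total = c(3128036, 2349675),
  years = c(12, 10)
)
party$per_year <- party$total / party$years
print(party)
```

The Trump total matches the 931,878 cited above. Under the stated year counts, the Democratic average per year is still higher than the Republican one, so the comparison is not simply a result of Democrats holding office for more of the period.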