I have compiled data of deportations by fiscal year from 1948 to 2022. Your job is to use some of the skills we are learning in this class to better understand these data. As such, I will be asking you to engage in a number of tasks requiring the use of \(t\)-tests and simple regression. Your grade will be based on analysis and presentation of the data. This assignment is worth 600 points. It will be due May 30 by 11:59 PM. You need to submit an HTML document or a document that includes code and viewable output.
This chunk reads in the data on deportations from 1948 to 2022.
library(tidyverse)  # read_csv(), dplyr verbs, ggplot2
library(sjPlot)     # plot_model()
urlfile <- "https://raw.githubusercontent.com/mightyjoemoon/POL51/main/ICE_removals_1948.csv"
remove.1 <- read_csv(url(urlfile))
summary(remove.1)
## Year Apprehensions President Party
## Min. :1948 Min. : 45336 Length:75 Min. :0.0000
## 1st Qu.:1966 1st Qu.: 444233 Class :character 1st Qu.:0.0000
## Median :1985 Median : 889212 Mode :character Median :0.0000
## Mean :1985 Mean : 852071 Mean :0.4667
## 3rd Qu.:2004 3rd Qu.:1194182 3rd Qu.:1.0000
## Max. :2022 Max. :2584220 Max. :1.0000
##
## PCGdp Decade Deportations VR
## Min. : 1833 Min. :1940 Min. : 5989 Min. : 52383
## 1st Qu.: 4231 1st Qu.:1960 1st Qu.: 17363 1st Qu.: 174562
## Median :18237 Median :1980 Median : 29277 Median : 673169
## Mean :24128 Mean :1978 Mean :109287 Mean : 648029
## 3rd Qu.:40607 3rd Qu.:2000 3rd Qu.:188747 3rd Qu.:1017325
## Max. :77247 Max. :2010 Max. :432334 Max. :1675876
## NA's :2
## Administrative EnforcementReturns Criminal Noncriminal
## Min. : 15072 Min. : 49664 Min. : 61117 Min. : 24666
## 1st Qu.: 44947 1st Qu.: 81191 1st Qu.:114680 1st Qu.:161440
## Median : 60150 Median : 86801 Median :135509 Median :190058
## Mean : 70965 Mean :159377 Mean :139193 Mean :168409
## 3rd Qu.: 85478 3rd Qu.:171374 3rd Qu.:176722 3rd Qu.:215554
## Max. :180266 Max. :523153 Max. :200039 Max. :233846
## NA's :61 NA's :61 NA's :63 NA's :63
## Title 42 Foreign Born Naturalized Noncitizen
## Min. : 206770 Min. : 9619300 Min. :14967828 Min. :20722014
## 1st Qu.: 638922 1st Qu.: 9738100 1st Qu.:17003818 1st Qu.:21671389
## Median :1071074 Median :19767300 Median :19639724 Median :21965584
## Mean : 793937 Mean :23434849 Mean :19752182 Mean :21939190
## 3rd Qu.:1087520 3rd Qu.:36154329 3rd Qu.:22459486 3rd Qu.:22364709
## Max. :1103966 Max. :46182177 Max. :24509131 Max. :22593269
## NA's :72 NA's :7 NA's :57 NA's :57
## Unauthorized population US Population App_lagged
## Min. : 3500000 Min. :146631302 Min. : 45336
## 1st Qu.:10237500 1st Qu.:197636197 1st Qu.: 382740
## Median :10850000 Median :237923795 Median : 885587
## Mean :10168182 Mean :241806480 Mean : 820197
## 3rd Qu.:11375000 3rd Qu.:291456616 3rd Qu.:1183165
## Max. :12200000 Max. :333287557 Max. :1865379
## NA's :53
Below is code to produce a barplot of deportations over the time frame. I want you to provide a professional-grade interpretation of the plot you are seeing. This task is worth 100 points.
# One row per fiscal year, so the yearly "mean" is simply that year's count
df_melted <- aggregate(Deportations ~ Year, data = remove.1, FUN = mean)
names(df_melted) <- c("Year", "mean_Deportations")
ggplot(df_melted, aes(x = Year, y = mean_Deportations)) +
geom_col(width = 1) +
scale_x_continuous(n.breaks = 10) +
labs(title = "Figure 1: Deportations by year (FY 1948-2022)",
y = "Number of deportations", x = "Fiscal year") +
theme_bw() +
theme(panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank(),
axis.text.y = element_text(size = 9),
axis.text.x = element_text(size = 9),
plot.title = element_text(size = 12))
Figure 1 shows the number of deportations per fiscal year. For the first few decades, from 1948 through the 1980s, deportations remained relatively low and stable and rarely exceeded 100,000 in a year. Beginning in the 1990s, however, the number of deportations increased drastically and climbed rapidly until peaking in the 2010s, when annual deportations surpassed 400,000. Deportations then began to decline gradually, with a sharp drop in 2020 that is likely due to the COVID-19 pandemic. The large increase in the 1990s coincides with major immigration policy changes of that period, which greatly broadened enforcement authority and could account for the spike. Overall, the figure shows how deportation became increasingly institutionalized. It shifted from a relatively rare legal consequence to a routine exercise of enforcement authority and a demonstration of power and inequality in the United States.
Create a factor-level variable for Party of the President labeled “Republican” for Republicans and “Democrat” for Democrats. Following this, compute a two-group difference-in-means test assessing the following research question: Are the number of Deportations under a Democratic Presidency significantly different from Deportations under a Republican Presidency? In a paragraph, report results from the analysis using substantive language that could be understandable to a lay-person. This task is worth 100 points.
#Insert code to do this task in this chunk
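# Note: ordered = TRUE makes lm() use polynomial contrasts (coefficient "PresidentParty.L");
# ordered = FALSE would give ordinary dummy (treatment) coding with Republican as the baseline.
PresidentParty <- factor(remove.1$Party, levels = c(0, 1), labels = c("Republican", "Democrat"), ordered = TRUE)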
levels(PresidentParty)
## [1] "Republican" "Democrat"
t.test(remove.1$Deportations~PresidentParty)
##
## Welch Two Sample t-test
##
## data: remove.1$Deportations by PresidentParty
## t = -0.97685, df = 64.521, p-value = 0.3323
## alternative hypothesis: true difference in means between group Republican and group Democrat is not equal to 0
## 95 percent confidence interval:
## -94518.08 32432.22
## sample estimates:
## mean in group Republican mean in group Democrat
## 94800.32 125843.26
Based on these data, years with a Democratic president averaged about 125,800 deportations. Years with a Republican president averaged about 94,800, a gap of roughly 31,000 per year. However, the number of deportations varied enormously from year to year under both parties. As a result, the 95% confidence interval for the gap (about -94,500 to 32,400, Republican minus Democrat) includes zero, and the p-value (0.33) is well above 0.05. We therefore cannot reject the null hypothesis that average deportations were the same under both parties. In plain terms, the data are too noisy to show that either party deported significantly more people than the other. This does not prove that the two parties had an equal impact on deportations. It only means that the difference we observe could plausibly be due to chance.
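The size of the gap and the check on the interval can be reproduced from the numbers printed by `t.test()`. This is a minimal sketch that hard-codes the reported group means and confidence bounds, copied from the output above rather than recomputed from the data:

```r
# Group means and 95% CI copied from the Welch t-test output (as printed)
mean_rep <- 94800.32
mean_dem <- 125843.26
ci <- c(-94518.08, 32432.22)  # CI for (Republican mean - Democrat mean)

# Difference in means, Democrat minus Republican
diff_dem_rep <- mean_dem - mean_rep
round(diff_dem_rep, 2)

# Welch CIs are centered on the point estimate, so the midpoint of the
# interval should equal minus the Democrat - Republican gap
mean(ci)

# Zero inside the interval means the gap is not significant at the 5% level
contains_zero <- ci[1] < 0 && ci[2] > 0
contains_zero
```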
Estimate a bivariate regression model of the form: \(\hat{Deportations}=\beta_0 + \beta_1*Party~of~President\) and report the results from the regression model by summarizing the regression object. Based on the table of results, what would be the predicted number of deportations for Republicans and for Democrats? What do \(\beta_0\) and \(\beta_1\) tell us? Based on the model, is there evidence to reject the null hypothesis that \(\overline{D}_{Dem}=\overline{D}_{Rep}\)? This task is worth 100 points. Before doing this, you should read “The US Deportation System: History, Impacts, and New Empirical Research” by Caitlin Patler and Bradford Jones.
#Insert code to do this task in this chunk
reg1<- lm(Deportations~PresidentParty,data=remove.1)
summary(reg1)
##
## Call:
## lm(formula = Deportations ~ PresidentParty, data = remove.1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -118080 -87243 -71318 82297 306491
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 110322 15643 7.052 0.000000000833 ***
## PresidentParty.L 21951 22123 0.992 0.324
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 135200 on 73 degrees of freedom
## Multiple R-squared: 0.01331, Adjusted R-squared: -0.0002093
## F-statistic: 0.9845 on 1 and 73 DF, p-value: 0.3244
This regression model estimates the association between the president's party and the number of deportations in the United States each year. PresidentParty was created as an ordered factor, so R coded it with a polynomial contrast (PresidentParty.L) rather than a simple 0/1 dummy. As a result, the coefficients cannot be read directly as a baseline plus a difference. Under this coding, \(\beta_0\) = 110,322 is the average of the two party means. The Democrat-minus-Republican difference equals \(\sqrt{2}\times\beta_1 = \sqrt{2}\times 21{,}951 \approx 31{,}043\). The predicted number of deportations is therefore 110,322 - 15,522, or roughly 94,800 per year, under Republican presidents. Under Democratic presidents it is 110,322 + 15,522, or roughly 125,800 per year. These are the same group means reported by the t-test. With an unordered factor, \(\beta_0\) would be the Republican mean and \(\beta_1\) the roughly 31,043-deportation difference. The party coefficient is not statistically significant (p = 0.324), so we cannot be confident that this difference is real rather than due to chance. In addition, the R-squared of 0.013 shows that party explains only about 1% of the year-to-year variation in deportations. Most of that variation must therefore come from factors other than the president's party. Based on these findings, we fail to reject the null hypothesis that \(\overline{D}_{Dem}=\overline{D}_{Rep}\).
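PresidentParty was built with `ordered = TRUE`, so `lm()` falls back on R's default contrast for ordered factors, `contr.poly`. For two levels, that contrast assigns -1/√2 to Republican and +1/√2 to Democrat. The first part of the sketch below uses only the coefficients printed by `summary(reg1)` to recover the two group means. The second part fits a small data frame, `toy`, whose values are hypothetical and unrelated to the deportation data. It shows that refitting with an unordered factor makes the slope equal the raw difference in means.

```r
# Coefficients copied from summary(reg1)
b0 <- 110322  # (Intercept)
bL <- 21951   # PresidentParty.L

# Polynomial contrast values for a 2-level ordered factor
L <- contr.poly(2)[, 1]   # Republican, Democrat: -1/sqrt(2), +1/sqrt(2)

# Predicted mean for each party = intercept + contrast value * slope
pred_rep <- b0 + L[1] * bL
pred_dem <- b0 + L[2] * bL
round(c(Republican = pred_rep, Democrat = pred_dem))
# Implied Democrat - Republican gap equals sqrt(2) * bL
round(pred_dem - pred_rep)

# Hypothetical toy data (NOT the deportation data) to compare the two codings
toy <- data.frame(
  y     = c(10, 12, 14, 20, 22, 24),
  party = c(0, 0, 0, 1, 1, 1)
)
toy$ord   <- factor(toy$party, levels = c(0, 1),
                    labels = c("Republican", "Democrat"), ordered = TRUE)
toy$unord <- factor(toy$party, levels = c(0, 1),
                    labels = c("Republican", "Democrat"))

fit_ord   <- lm(y ~ ord, data = toy)    # slope is named "ord.L"
fit_unord <- lm(y ~ unord, data = toy)  # slope is named "unordDemocrat"
coef(fit_ord)
coef(fit_unord)  # intercept = Republican mean (12), slope = gap (10)
```

The two models have identical fitted values and p-values for the party term. Only the meaning of the coefficients changes, which is why the unordered (treatment-coded) version is easier to interpret for a two-group comparison.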
Using plot_model (from the sjPlot package), provide a professional-grade plot of the regression model along with an interpretation of the plot. Which hypothesis is the plot most consistent with? This task is worth 100 points.
#Insert code to do this task in this chunk
plot_model(reg1, type = "est", show.values = TRUE, value.offset = 0.3, title = "Effect of President's Party on Deportations") +
labs(
x= "Democratic President - Republican President",
y="Estimated effect on deportations/year"
) +
theme(axis.text.y= element_blank())
The plot shows the estimated party effect on annual deportations. The party variable entered the model as an ordered factor, so the plotted point estimate of 21,951 is the polynomial contrast (PresidentParty.L), not the raw gap. Multiplying it by \(\sqrt{2}\) gives the actual difference: about 31,043 more deportations per year under Democratic presidents than under Republican ones. Either way, the confidence interval around the estimate is wide and crosses zero, so the estimate is not statistically significant. Visually, the point estimate suggests more deportations under Democratic presidents, but the data are too noisy to support a firm conclusion. The plot is therefore most consistent with the null hypothesis that average deportations do not differ by the president's party.
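The interval drawn by `plot_model` can be approximated by hand. sjPlot's default is, to our understanding, a 95% interval (`ci.lvl = .95`). This is a minimal sketch using the estimate, standard error, and residual degrees of freedom copied from `summary(reg1)`. It follows the same estimate ± t × SE formula that `confint()` applies to an `lm` fit.

```r
# Estimate, standard error and residual df for PresidentParty.L, from summary(reg1)
est      <- 21951
se       <- 22123
df_resid <- 73

# 95% CI: estimate +/- t critical value * SE
t_crit <- qt(0.975, df = df_resid)
ci_L   <- est + c(-1, 1) * t_crit * se
round(ci_L)

# Rescale to the Democrat - Republican gap: the 2-level polynomial contrast
# spans 2/sqrt(2) = sqrt(2) units, so the gap is sqrt(2) times the .L coefficient
round(ci_L * sqrt(2))

# The interval crosses zero, so the effect is not significant at the 5% level
ci_L[1] < 0 && ci_L[2] > 0
```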
In the Patler and Jones article I asked you to read, they point out that several policies were enacted that made deportations easier to carry out. One of the most important was the Illegal Immigration Reform and Immigrant Responsibility Act of 1996 (IIRIRA). One prediction, then, is that after policy changes in the 1990s (like IIRIRA), we should observe an increase in deportations starting in the 1990s. To assess this claim, do the following:
Create a well-labeled factor-level variable denoting each decade starting with the 1950s (1951-1960) going up to the 2010s (2011-2020), and then estimate a regression model treating the dependent variable (i.e., the number of deportations) as a function of the decade factor-level variable. Following this, plot the regression model using plot_model. Provide a thorough interpretation of the regression model with a focus on the claims made in the paragraph above. Are the results consistent with the basic claim made? This task is worth 100 points.
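In this prompt's decade scheme, 1951-1960 counts as the 1950s, which moves the usual decade boundary by one year. A long `case_when()` works, but the label can also be computed arithmetically. This is a minimal sketch; the helper name `decade_label` is hypothetical and not part of the assignment code.

```r
# Hypothetical helper: map a year to a decade label where 1951-1960 = "1950s"
decade_label <- function(year) {
  start <- floor((year - 1) / 10) * 10   # e.g. 1951..1960 -> 1950
  paste0(start, "s")
}

years <- c(1951, 1960, 1961, 1996, 2010, 2011, 2020)
decade_label(years)

# Factor with chronological levels, so the 1950s is the reference category in lm()
decade_fac <- factor(decade_label(years),
                     levels = paste0(seq(1950, 2010, by = 10), "s"))
levels(decade_fac)
```

Listing the levels explicitly matters. Otherwise `factor()` sorts labels alphabetically, and any unobserved decade would silently disappear from the level set.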
#Insert code to do this task in this chunk
# Keep FY 1951-2020 only; this overwrites remove.1, so later models also use these 70 years
remove.1 <- remove.1 %>%
filter(Year >= 1951 & Year <= 2020) %>%
mutate(Decade = case_when(
Year >= 1951 & Year <= 1960 ~ "1950s",
Year >= 1961 & Year <= 1970 ~ "1960s",
Year >= 1971 & Year <= 1980 ~ "1970s",
Year >= 1981 & Year <= 1990 ~ "1980s",
Year >= 1991 & Year <= 2000 ~ "1990s",
Year >= 2001 & Year <= 2010 ~ "2000s",
Year >= 2011 & Year <= 2020 ~ "2010s"
))
remove.1$Decade <- factor(remove.1$Decade,
levels = c("1950s", "1960s", "1970s", "1980s", "1990s", "2000s", "2010s"),
ordered = FALSE)
reg3 <- lm(Deportations ~ Decade, data = remove.1)
summary(reg3)
##
## Call:
## lm(formula = Deportations ~ Decade, data = remove.1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -112306 -8002 -742 7322 104976
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 15047 14387 1.046 0.299619
## Decade1960s -4927 20347 -0.242 0.809460
## Decade1970s 8974 20347 0.441 0.660666
## Decade1980s 8236 20347 0.405 0.687015
## Decade1990s 79603 20347 3.912 0.000227 ***
## Decade2000s 262426 20347 12.898 < 0.0000000000000002 ***
## Decade2010s 334623 20347 16.446 < 0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 45500 on 63 degrees of freedom
## Multiple R-squared: 0.9016, Adjusted R-squared: 0.8923
## F-statistic: 96.25 on 6 and 63 DF, p-value: < 0.00000000000000022
plot_model(reg3, type = "est", show.values = TRUE, value.offset = 0.3,
title = "Effect of Decade on Deportations")
This model analyzes how average deportations per year changed from decade to decade, from the 1950s through the 2010s. The baseline is the 1950s (1951-1960), whose average of about 15,047 deportations per year is given by the intercept. Each decade coefficient shows how many more (or fewer) deportations occurred per year, on average, in that decade compared with the 1950s. The coefficients for the 1960s (-4,927), 1970s (8,974), and 1980s (8,236) are small and not statistically significant (p > 0.6), and their 95% confidence intervals all include zero. Deportations in these decades were therefore not meaningfully different from the 1950s. The increase begins in the 1990s. The 1990s coefficient of 79,603 (p < 0.001) implies about 94,650 deportations per year, roughly six times the 1950s level. The coefficients for the 2000s (262,426) and 2010s (334,623) are far larger, implying about 277,500 and 349,700 deportations per year. The decade variable alone explains about 90% of the variation in deportations (R-squared = 0.90). These results are consistent with the claim in the Patler and Jones article: deportations stayed low and stable for four decades and then rose sharply from the 1990s onward. This rise could reflect the lasting impact of policies such as the Illegal Immigration Reform and Immigrant Responsibility Act of 1996. However, the 1990s decade contains years both before and after 1996, so this model cannot isolate the effect of IIRIRA itself.
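The predicted average for each decade is the intercept plus that decade's coefficient. A short sketch using the coefficients printed by `summary(reg3)` (copied by hand, not recomputed from the data):

```r
# Coefficients copied from summary(reg3); the 1950s is the reference decade
b0 <- 15047
shifts <- c("1960s" = -4927, "1970s" = 8974, "1980s" = 8236,
            "1990s" = 79603, "2000s" = 262426, "2010s" = 334623)

# Predicted mean deportations per year = intercept + decade coefficient
pred <- c("1950s" = b0, b0 + shifts)
pred

# How many times larger each decade's average is than the 1950s average
round(pred / pred[["1950s"]], 1)
```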
Create a dummy variable (or binary variable) coded 1 if the year is 1996 or later and 0 otherwise. Estimate a regression model treating deportations as a function of this dummy variable. Plot the regression model and provide a thorough substantive interpretation of the regression results. To start, what would be the null and alternative hypotheses for \(\beta_1\) given the research question? One suggested approach is to report the predicted number of deportations in the later period compared to the earlier period, as well as discussing the coefficient showing the difference. You should tie your interpretation back to the regression estimates. This task is worth 100 points.
#Insert code to do this task in this chunk
remove.1$Post1996 <- ifelse(remove.1$Year >= 1996, 1, 0)
reg4 <- lm(Deportations ~ Post1996, data = remove.1)
summary(reg4)
##
## Call:
## lm(formula = Deportations ~ Post1996, data = remove.1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -210398 -12302 -1951 13213 152256
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 20905 8972 2.33 0.0228 *
## Post1996 259173 15013 17.26 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 60190 on 68 degrees of freedom
## Multiple R-squared: 0.8142, Adjusted R-squared: 0.8115
## F-statistic: 298 on 1 and 68 DF, p-value: < 0.00000000000000022
plot_model(reg4, type = "est", show.values = TRUE, value.offset = 0.3,
title = "Deportations Before and After 1996")
Null hypothesis (\(H_0: \beta_1 = 0\)): There is no difference in average annual deportations before and after 1996. Alternative hypothesis (\(H_A: \beta_1 \neq 0\)): Average annual deportations differ before and after 1996.
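In regression terms, both hypotheses concern the Post1996 coefficient. This follows because the dummy splits the model into two period means, written here with \(D_t\) for deportations in fiscal year \(t\):

```latex
\begin{aligned}
\widehat{D}_t &= \beta_0 + \beta_1\,\text{Post1996}_t \\
E[D_t \mid \text{Post1996}_t = 0] &= \beta_0, \qquad
E[D_t \mid \text{Post1996}_t = 1] = \beta_0 + \beta_1 \\
H_0:\ \beta_1 &= 0 \quad \text{(equal period means)}, \qquad
H_A:\ \beta_1 \neq 0
\end{aligned}
```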
The model above treats deportations as a function of whether the year fell before 1996 or in 1996 and later. IIRIRA (the Illegal Immigration Reform and Immigrant Responsibility Act) was passed in 1996. The data were restricted to FY 1951-2020 in the previous step, so the earlier period covers 1951-1995 and the later period covers 1996-2020. According to the regression output, the intercept, which is the average number of deportations per year before 1996, is 20,905. The coefficient on Post1996 is 259,173. This means that from 1996 on there were, on average, 259,173 more deportations per year than before. Adding the two, the model predicts about 280,078 deportations per year after 1996, more than 13 times the earlier average. The p-value for this coefficient is below 0.001 (t = 17.26), so the difference is highly statistically significant. We therefore reject the null hypothesis in favor of the alternative. The dummy variable alone accounts for about 81% of the variation in annual deportations (R-squared = 0.81). These results are consistent with Patler and Jones's argument that IIRIRA fundamentally transformed the deportation system in America. As a caveat, however, a before-and-after comparison cannot by itself attribute the change to IIRIRA. Other changes that occurred around the same time would also need to be considered.
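The predicted values and test statistic quoted above follow directly from the `summary(reg4)` table. A minimal sketch with the coefficients copied by hand:

```r
# Coefficients copied from summary(reg4)
b0    <- 20905    # mean deportations per year, 1951-1995
b1    <- 259173   # Post1996 coefficient
se_b1 <- 15013    # standard error of the Post1996 coefficient

# Predicted deportations per year in each period
pred_pre  <- b0
pred_post <- b0 + b1
c(pre_1996 = pred_pre, post_1996 = pred_post)

# How many times larger the later average is than the earlier one
round(pred_post / pred_pre, 1)

# t statistic = estimate / standard error, as reported by summary()
round(b1 / se_b1, 2)
```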