Overview

I have compiled data on deportations by fiscal year from 1948 to 2022. Your job is to use some of the skills we are learning in this class to better understand these data. As such, I will be asking you to engage in a number of tasks requiring the use of \(t\)-tests and simple regression. Your grade will be based on analysis and presentation of the data. This assignment is worth 600 points. It will be due May 30 by 11:59 PM. You need to submit an HTML document or a document that includes code and viewable output.

Reading in the deportation data

This chunk reads in the data on deportations from 1948 to 2022.

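library(tidyverse)  # read_csv(), the dplyr verbs, and ggplot2 used below
library(sjPlot)     # plot_model() used in Tasks 4-6

urlfile="https://raw.githubusercontent.com/mightyjoemoon/POL51/main/ICE_removals_1948.csv"

remove.1<-read_csv(url(urlfile))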

summary(remove.1)
##       Year      Apprehensions      President             Party       
##  Min.   :1948   Min.   :  45336   Length:75          Min.   :0.0000  
##  1st Qu.:1966   1st Qu.: 444233   Class :character   1st Qu.:0.0000  
##  Median :1985   Median : 889212   Mode  :character   Median :0.0000  
##  Mean   :1985   Mean   : 852071                      Mean   :0.4667  
##  3rd Qu.:2004   3rd Qu.:1194182                      3rd Qu.:1.0000  
##  Max.   :2022   Max.   :2584220                      Max.   :1.0000  
##                                                                      
##      PCGdp           Decade      Deportations          VR         
##  Min.   : 1833   Min.   :1940   Min.   :  5989   Min.   :  52383  
##  1st Qu.: 4231   1st Qu.:1960   1st Qu.: 17363   1st Qu.: 174562  
##  Median :18237   Median :1980   Median : 29277   Median : 673169  
##  Mean   :24128   Mean   :1978   Mean   :109287   Mean   : 648029  
##  3rd Qu.:40607   3rd Qu.:2000   3rd Qu.:188747   3rd Qu.:1017325  
##  Max.   :77247   Max.   :2010   Max.   :432334   Max.   :1675876  
##                  NA's   :2                                        
##  Administrative   EnforcementReturns    Criminal       Noncriminal    
##  Min.   : 15072   Min.   : 49664     Min.   : 61117   Min.   : 24666  
##  1st Qu.: 44947   1st Qu.: 81191     1st Qu.:114680   1st Qu.:161440  
##  Median : 60150   Median : 86801     Median :135509   Median :190058  
##  Mean   : 70965   Mean   :159377     Mean   :139193   Mean   :168409  
##  3rd Qu.: 85478   3rd Qu.:171374     3rd Qu.:176722   3rd Qu.:215554  
##  Max.   :180266   Max.   :523153     Max.   :200039   Max.   :233846  
##  NA's   :61       NA's   :61         NA's   :63       NA's   :63      
##     Title 42        Foreign Born       Naturalized         Noncitizen      
##  Min.   : 206770   Min.   : 9619300   Min.   :14967828   Min.   :20722014  
##  1st Qu.: 638922   1st Qu.: 9738100   1st Qu.:17003818   1st Qu.:21671389  
##  Median :1071074   Median :19767300   Median :19639724   Median :21965584  
##  Mean   : 793937   Mean   :23434849   Mean   :19752182   Mean   :21939190  
##  3rd Qu.:1087520   3rd Qu.:36154329   3rd Qu.:22459486   3rd Qu.:22364709  
##  Max.   :1103966   Max.   :46182177   Max.   :24509131   Max.   :22593269  
##  NA's   :72        NA's   :7          NA's   :57         NA's   :57        
##  Unauthorized population US Population         App_lagged     
##  Min.   : 3500000        Min.   :146631302   Min.   :  45336  
##  1st Qu.:10237500        1st Qu.:197636197   1st Qu.: 382740  
##  Median :10850000        Median :237923795   Median : 885587  
##  Mean   :10168182        Mean   :241806480   Mean   : 820197  
##  3rd Qu.:11375000        3rd Qu.:291456616   3rd Qu.:1183165  
##  Max.   :12200000        Max.   :333287557   Max.   :1865379  
##  NA's   :53

Task 1: Interpret barplot of deportations

Below is code to produce a barplot of deportations over the time frame. I want you to provide a professional-grade interpretation of the plot you are seeing. This task is worth 100 points.

df_melted <- aggregate(data = remove.1, Deportations ~ Year, mean)
names(df_melted) <- c("Year", "mean_Deportations")

ggplot(df_melted, aes(x = Year, y = mean_Deportations, width=1)) +
  geom_bar(stat = "identity") +
  scale_x_continuous(n.breaks = 10) +
  labs(title="Figure 1: Deportations by year (FY 1948-2022)",
       y="Number of deportations", x="Fiscal year",
       color="") +
  theme_bw() +
  theme(#panel.grid.major.y = element_line(colour = "grey", linetype = "dashed"),
    panel.grid.major.x = element_blank(),
    panel.grid.minor.x = element_blank(),
    axis.text.y = element_text(size=9),
    axis.text.x = element_text(size=9),
    #axis.title.y=element_blank(),
    #axis.title.x=element_blank(),
    #legend.title=element_blank(),
    #legend.position=c(.01, .77),
    #legend.justification=c("left", "bottom"),
    #legend.title = element_text(size = 5), 
    #legend.text = element_text(size = 5),
    #legend.margin=margin(0,0,0,0),
    #legend.box.margin=margin(-1,-1,-1,-1),
    plot.title = element_text(size=12))  

Task 1 answer here

Figure 1 shows the number of deportations in each fiscal year from 1948 to 2022. For the first four decades, from the late 1940s through the 1980s, deportations remained relatively low and stable, rarely exceeding 100,000 in a year. Beginning in the 1990s, however, deportations rose sharply and continued to climb until they peaked in the 2010s at more than 400,000 per year. They then declined gradually, with a sharp drop in 2020 that likely reflects the COVID-19 pandemic. The increase in the 1990s coincides with major changes in immigration policy that greatly broadened enforcement authority, which may account for the spike. Overall, the figure suggests that deportation became increasingly institutionalized, shifting from a relatively rare legal event to a routine exercise of enforcement authority and, arguably, a demonstration of power and inequality in the United States.

Task 2: T-test by Party

Create a factor-level variable for Party of the President labeled “Republican” for Republicans and “Democrat” for Democrats. Following this, compute a two-group difference-in-means test assessing the following research question: Is the number of deportations under a Democratic presidency significantly different from the number of deportations under a Republican presidency? In a paragraph, report results from the analysis using substantive language that a layperson could understand. This task is worth 100 points.

#Insert code to do this task in this chunk 
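# Party is coded 0 = Republican, 1 = Democrat.
# Note: ordered=TRUE makes lm() use a polynomial contrast for this factor, which is why
# the Task 3 output shows PresidentParty.L instead of a 0/1 dummy coefficient.
PresidentParty<-factor(remove.1$Party,levels=c(0,1),labels = c("Republican","Democrat"),ordered=TRUE)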
levels(PresidentParty)
## [1] "Republican" "Democrat"
t.test(remove.1$Deportations~PresidentParty)
## 
##  Welch Two Sample t-test
## 
## data:  remove.1$Deportations by PresidentParty
## t = -0.97685, df = 64.521, p-value = 0.3323
## alternative hypothesis: true difference in means between group Republican and group Democrat is not equal to 0
## 95 percent confidence interval:
##  -94518.08  32432.22
## sample estimates:
## mean in group Republican   mean in group Democrat 
##                 94800.32                125843.26

Task 2 answer goes here

Based on these data, years with a Democratic president averaged about 125,800 deportations, roughly 31,000 more than the average of about 94,800 in years with a Republican president. However, the number of deportations varied widely from year to year under presidents of both parties. Because of this variation, the difference is not statistically significant (t = -0.98, p = 0.33), and the 95% confidence interval for the difference, which runs from about -94,500 to 32,400, includes zero. We therefore do not have sufficient evidence to reject the null hypothesis that average deportations were the same under both parties. In plain terms, the data are too noisy to conclude that one party deported more people than the other. This does not prove that the two parties had an equal impact on deportations; it only means that the data cannot distinguish the observed gap from chance.
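The pieces of the Welch test can be tied together with a little arithmetic. The sketch below is only an illustration: it copies the rounded numbers printed by t.test() above rather than re-reading the data, and the variable names are chosen here for readability, so the results should match the output only up to rounding.

```r
# Values copied from the t.test() output above (rounded as printed)
mean_rep <- 94800.32
mean_dem <- 125843.26
t_stat   <- -0.97685
df_welch <- 64.521

# Difference in means, Republican minus Democrat (the order R reports)
diff_means <- mean_rep - mean_dem

# Implied standard error of the difference, since t = difference / SE
se_diff <- diff_means / t_stat

# 95% confidence interval: difference +/- critical t value * SE
t_crit <- qt(0.975, df = df_welch)
ci <- diff_means + c(-1, 1) * t_crit * se_diff

# Two-sided p-value from the t distribution with the Welch degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

print(round(diff_means, 2))
print(round(ci, 0))
print(round(p_value, 4))
```

The interval is centered on the difference in means and is wide because the implied standard error is about as large as the difference itself.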

Task 3: Regression with a dummy variable

Estimate a bivariate regression model of the form: \(\hat{Deportations}=\beta_0 + \beta_1*Party~of~President\) and report the results from the regression model by summarizing the regression object. Based on the table of results, what would be the predicted number of deportations for Republicans and for Democrats? What do \(\beta_0\) and \(\beta_1\) tell us? Based on the model, is there evidence to reject the null hypothesis that \(\overline{D}_{Dem}=\overline{D}_{Rep}\)? This task is worth 100 points. Before doing this, you should read “The US Deportation System: History, Impacts, and New Empirical Research” by Caitlin Patler and Bradford Jones.

#Insert code to do this task in this chunk 
reg1<- lm(Deportations~PresidentParty,data=remove.1)
summary(reg1)
## 
## Call:
## lm(formula = Deportations ~ PresidentParty, data = remove.1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -118080  -87243  -71318   82297  306491 
## 
## Coefficients:
##                  Estimate Std. Error t value       Pr(>|t|)    
## (Intercept)        110322      15643   7.052 0.000000000833 ***
## PresidentParty.L    21951      22123   0.992          0.324    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 135200 on 73 degrees of freedom
## Multiple R-squared:  0.01331,    Adjusted R-squared:  -0.0002093 
## F-statistic: 0.9845 on 1 and 73 DF,  p-value: 0.3244

Task 3 answer goes here

This regression model estimates the relationship between the president's political party and the number of deportations in the United States each fiscal year. Because PresidentParty was created as an ordered factor, R fit the model with a polynomial contrast (the PresidentParty.L term) rather than a 0/1 dummy variable, so the coefficients need care. Here \(\beta_0\) = 110,322 is the midpoint of the two party means, not the Republican average, and \(\beta_1\) = 21,951 is the Democrat-minus-Republican difference divided by \(\sqrt{2}\). The predicted number of deportations is therefore 110,322 - 21,951/\(\sqrt{2}\), or about 94,800, under Republican presidents and 110,322 + 21,951/\(\sqrt{2}\), or about 125,800, under Democratic presidents. The difference of about 31,000 matches the group means from Task 2. The party coefficient is not statistically significant (p = 0.324), so we cannot be confident that this difference is not due to chance. In addition, the R-squared of 0.013 shows that party explains only about 1% of the year-to-year variation in deportations, meaning that nearly all of the variation is attributable to factors other than the president's party. Based on these findings, we fail to reject the null hypothesis that \(\overline{D}_{Dem}=\overline{D}_{Rep}\).
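The regression output above reports a term named PresidentParty.L, which is what lm() produces for an ordered factor under R's default contrast settings. The sketch below uses a small made-up data set (not the deportation data; the names toy, party_ord, and party_unord are hypothetical) to show how the same two group means are reported under an ordered factor versus an unordered one.

```r
# Hypothetical toy data (NOT the deportation data): two groups with known means
toy <- data.frame(
  y     = c(10, 12, 14, 20, 22, 24, 26),
  party = c(0, 0, 0, 1, 1, 1, 1)
)

# Same labels, created once as an ordered factor and once as an unordered factor
toy$party_ord   <- factor(toy$party, levels = c(0, 1),
                          labels = c("Republican", "Democrat"), ordered = TRUE)
toy$party_unord <- factor(toy$party, levels = c(0, 1),
                          labels = c("Republican", "Democrat"))

# Group means: Republican = 12, Democrat = 23
group_means <- tapply(toy$y, toy$party_unord, mean)

# Ordered factor: polynomial contrast, coefficient named "party_ord.L"
coef_ord <- coef(lm(y ~ party_ord, data = toy))

# Unordered factor: treatment (dummy) coding, coefficient named "party_unordDemocrat"
coef_unord <- coef(lm(y ~ party_unord, data = toy))

print(group_means)
print(coef_ord)
print(coef_unord)
```

With the unordered factor, the intercept is the Republican mean and the slope is the Democrat-minus-Republican difference. With the ordered factor, the intercept is the midpoint of the two means and the .L coefficient is the difference divided by \(\sqrt{2}\). Both versions give the same fitted values, t value, and p-value for the party term.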

Task 4: Plot regression object

Using \(\textrm{plot_model}\) (from the \(\textrm{sjPlot}\) package), provide a professional-grade plot of the regression model along with an interpretation of the plot. Which hypothesis is the plot most consistent with? This task is worth 100 points.

#Insert code to do this task in this chunk 
plot_model(reg1, type = "est", show.values = TRUE, value.offset = 0.3, title = "Effect of President's Party on Deportations") +
labs(
  x= "Democratic President - Republican President",
  y="Estimated effect on deportations/year"
     ) + 
  theme(axis.text.y= element_blank())

Task 4 answer goes here

The plot shows the estimated party coefficient from the regression in Task 3 along with its 95% confidence interval. The point estimate of 21,951 is the linear contrast for the ordered party factor; multiplying it by \(\sqrt{2}\) gives the underlying difference of about 31,000 more deportations per year under Democratic presidents than under Republican ones. However, the confidence interval is wide and crosses zero, meaning the estimate is not statistically significant. Although the point estimate suggests more deportations under Democratic presidents, it is too imprecise to support a firm conclusion. The plot is therefore most consistent with the null hypothesis that there is no difference in deportations between presidents of the two parties.
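To read the plotted coefficient on the scale of "Democratic mean minus Republican mean", the estimate and its standard error can both be rescaled. This is a rough sketch that copies the rounded values from summary(reg1) and assumes the default polynomial contrast for a two-level ordered factor, which codes the groups as \(-1/\sqrt{2}\) and \(+1/\sqrt{2}\).

```r
# Values copied from summary(reg1): the linear-contrast term PresidentParty.L
est_L    <- 21951
se_L     <- 22123
df_resid <- 73

# With two ordered levels coded -1/sqrt(2) and +1/sqrt(2), the difference
# in group means equals sqrt(2) times the .L coefficient
diff_est <- sqrt(2) * est_L
diff_se  <- sqrt(2) * se_L

# 95% confidence interval for the difference in means
t_crit  <- qt(0.975, df = df_resid)
diff_ci <- diff_est + c(-1, 1) * t_crit * diff_se

print(round(diff_est))
print(round(diff_ci))
```

Rescaling changes the size of the estimate and its interval by the same factor, so the conclusion is unchanged: the interval still spans zero.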

Task 5: Regression by decade

In the Patler and Jones article I asked you to read, they point out that several policies were enacted that made deportations easier to carry out. Among the most important of these policies was the Illegal Immigration Reform and Immigrant Responsibility Act of 1996 (IIRIRA). One prediction might be that after the policy changes of the 1990s (like the IIRIRA), we should observe an increase in deportations starting in the 1990s. To assess this claim, do the following:

Create a well-labeled factor-level variable denoting each decade starting with the 1950s (1951-1960) going up to the 2010s (2011-2020) and then estimate a regression model treating the dependent variable (i.e., the number of deportations) as a function of the decade factor-level variable. Following this, plot the regression model using \(\textrm{plot_model}\). Provide a thorough interpretation of the regression model with a focus on the claims made in the paragraph above. Are the results consistent with the basic claim made? This task is worth 100 points.

#Insert code to do this task in this chunk 
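# Keep FY 1951-2020 only. Note: this overwrites remove.1, so the Task 6 model
# below is also estimated on these 70 years rather than the full 1948-2022 series.
remove.1 <- remove.1 %>%
  filter(Year >= 1951 & Year <= 2020) %>%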
  mutate(Decade = case_when(
    Year >= 1951 & Year <= 1960 ~ "1950s",
    Year >= 1961 & Year <= 1970 ~ "1960s",
    Year >= 1971 & Year <= 1980 ~ "1970s",
    Year >= 1981 & Year <= 1990 ~ "1980s",
    Year >= 1991 & Year <= 2000 ~ "1990s",
    Year >= 2001 & Year <= 2010 ~ "2000s",
    Year >= 2011 & Year <= 2020 ~ "2010s"
  ))
remove.1$Decade <- factor(remove.1$Decade,
                          levels = c("1950s", "1960s", "1970s", "1980s", "1990s", "2000s", "2010s"),
                          ordered = FALSE)
reg3 <- lm(Deportations ~ Decade, data = remove.1)
summary(reg3)
## 
## Call:
## lm(formula = Deportations ~ Decade, data = remove.1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -112306   -8002    -742    7322  104976 
## 
## Coefficients:
##             Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)    15047      14387   1.046             0.299619    
## Decade1960s    -4927      20347  -0.242             0.809460    
## Decade1970s     8974      20347   0.441             0.660666    
## Decade1980s     8236      20347   0.405             0.687015    
## Decade1990s    79603      20347   3.912             0.000227 ***
## Decade2000s   262426      20347  12.898 < 0.0000000000000002 ***
## Decade2010s   334623      20347  16.446 < 0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 45500 on 63 degrees of freedom
## Multiple R-squared:  0.9016, Adjusted R-squared:  0.8923 
## F-statistic: 96.25 on 6 and 63 DF,  p-value: < 0.00000000000000022
plot_model(reg3, type = "est", show.values = TRUE, value.offset = 0.3,
           title = "Effect of Decade on Deportations")

Task 5 answer goes here

This model estimates how average deportations per year changed from decade to decade, from the 1950s through the 2010s. The baseline is the 1950s (1951-1960), whose average of about 15,047 deportations per year is given by the intercept. Each coefficient shows how many more or fewer deportations occurred per year, on average, in that decade compared with the 1950s. The coefficients for the 1960s, 1970s, and 1980s are small and not statistically significant (their 95% confidence intervals all include zero), which indicates that average annual deportations in those decades were not significantly different from the 1950s. The coefficient for the 1990s, however, is large (79,603) and statistically significant (p < 0.001), with a 95% confidence interval that excludes zero; average deportations rose to roughly 94,650 per year. The coefficients for the 2000s (262,426) and 2010s (334,623) are larger still, implying averages of roughly 277,000 and 350,000 per year. The model therefore does not show an increase in every decade; instead, deportations were flat through the 1980s and then rose sharply beginning in the 1990s, which is consistent with the claim drawn from the Patler and Jones article. The continued growth in the 2000s and 2010s could reflect the lasting impact of policies such as the Illegal Immigration Reform and Immigrant Responsibility Act of 1996, although the model by itself cannot establish that the policy caused the increase.
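Because the 1950s are the reference category, each decade's predicted average is the intercept plus that decade's coefficient. The short sketch below does that addition using the rounded estimates copied from summary(reg3); the object names are chosen here for illustration.

```r
# Coefficients copied from summary(reg3); the 1950s are the reference decade
intercept <- 15047
decade_coefs <- c("1960s" = -4927, "1970s" = 8974, "1980s" = 8236,
                  "1990s" = 79603, "2000s" = 262426, "2010s" = 334623)

# Predicted mean for each decade = intercept + that decade's coefficient
decade_means <- c("1950s" = intercept, intercept + decade_coefs)

print(decade_means)
```

Laid out this way, the flat period from the 1950s through the 1980s and the jump beginning in the 1990s are visible directly in the predicted averages.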

Task 6: Pre-post 1996

Create a dummy variable (or binary variable) coded 1 if the year is 1996 or later and 0 otherwise. Estimate a regression model treating deportations as a function of this dummy variable. Plot the regression model and provide a thorough substantive interpretation of the regression results. To start, what would be the null and alternative hypotheses for \(\beta_1\) given the research question? Suggested ways to interpret this would be to report the predicted number of deportations in the later period compared to the earlier period, as well as discussing the coefficient showing the difference. You should tie your interpretation back to the regression estimates. This task is worth 100 points.

#Insert code to do this task in this chunk 
remove.1$Post1996 <- ifelse(remove.1$Year >= 1996, 1, 0)
reg4 <- lm(Deportations ~ Post1996, data = remove.1)
summary(reg4)
## 
## Call:
## lm(formula = Deportations ~ Post1996, data = remove.1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -210398  -12302   -1951   13213  152256 
## 
## Coefficients:
##             Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)    20905       8972    2.33              0.0228 *  
## Post1996      259173      15013   17.26 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 60190 on 68 degrees of freedom
## Multiple R-squared:  0.8142, Adjusted R-squared:  0.8115 
## F-statistic:   298 on 1 and 68 DF,  p-value: < 0.00000000000000022
plot_model(reg4, type = "est", show.values = TRUE, value.offset = 0.3,
           title = "Deportations Before and After 1996")

Task 6 answer goes here

Null hypothesis (\(H_0: \beta_1 = 0\)): There is no difference in average deportations before and after 1996. Alternative hypothesis (\(H_A: \beta_1 \neq 0\)): There is a difference in average deportations before and after 1996.

The model above treats deportations as a function of whether the fiscal year fell before 1996 or in 1996 and later; 1996 is the year the Illegal Immigration Reform and Immigrant Responsibility Act (IIRIRA) was passed. Because the data were restricted to fiscal years 1951-2020 in Task 5, the estimates cover that period. According to the regression output, the intercept, which is the average number of deportations per year before 1996, is 20,905. The coefficient on Post1996 is 259,173, meaning that from 1996 onward there were, on average, 259,173 more deportations per year than before 1996. Adding the two values gives a predicted average of about 280,000 deportations per year in the later period. The p-value for the coefficient is less than 0.001, so the difference is highly statistically significant, and we reject the null hypothesis in favor of the alternative. The R-squared of 0.81 indicates that this single before/after indicator accounts for about 81% of the variation in annual deportations. These results are consistent with the claim in the Patler and Jones article that the IIRIRA fundamentally transformed the deportation system in the United States. As a caveat, a before/after comparison cannot by itself attribute the change to this policy; to do so, we would need to account for other factors that changed over the same period.
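The predicted values and the test statistic follow directly from the two coefficients. The sketch below is an illustration that copies the rounded estimates from summary(reg4); the names b0, b1, and se_b1 are labels chosen here, not objects from the analysis.

```r
# Values copied from summary(reg4)
b0       <- 20905    # intercept: mean deportations per year before FY 1996
b1       <- 259173   # Post1996 coefficient: change in the mean from FY 1996 onward
se_b1    <- 15013    # standard error of the Post1996 coefficient
df_resid <- 68

# Predicted deportations when Post1996 = 0 and when Post1996 = 1
pred_pre  <- b0 + b1 * 0
pred_post <- b0 + b1 * 1

# t statistic and two-sided p-value for H0: beta_1 = 0
t_stat  <- b1 / se_b1
p_value <- 2 * pt(-abs(t_stat), df = df_resid)

# How many times larger the later-period average is
ratio <- pred_post / pred_pre

print(c(pre = pred_pre, post = pred_post))
print(round(t_stat, 2))
print(round(ratio, 1))
```

The later-period average works out to roughly thirteen times the earlier one, and squaring the t statistic gives approximately the F-statistic of 298 reported in the output, as expected for a model with a single predictor.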