LINAS Project: Deliverable 2

Objective: This project will give the student the opportunity to apply statistical modeling techniques to real-world public opinion data. Each student will estimate and, more importantly, interpret a model they proposed in Deliverable 1 using the Latino Immigrant National Attitude Survey. The first deliverable was worth 100 points; this second component will be worth 500 points. This RMD file is critical because it contains recodes of several independent variables that some of you will use. I will update its recoding section to accommodate some of the proposed independent variables, but I will not recode all of them.

You are to use this RMD file to produce a final HTML file that will be submitted on Canvas by Wednesday, December 10 at 11:59 PM. However, there is a series of extra credit incentives in place to discourage turning it in at the last minute. They are:

Should you submit the HTML for part 2 on Canvas by Friday, Dec. 5, 11:59 PM, you will receive an 8% bonus to your grade. Eight percent of 500 is 40 points.

Should you submit the HTML for part 2 on Canvas by Sunday, Dec. 7, 11:59 PM, you will receive a 5% bonus to your grade. Five percent of 500 is 25 points.

Should you submit the HTML for part 2 on Canvas by Tuesday, Dec. 9, 11:59 PM, you will receive a 3% bonus to your grade. Three percent of 500 is 15 points.

Should you submit the HTML for part 2 on Canvas by Thursday, Dec. 11, 11:59 PM, you will receive a 0% bonus to your grade. This will be the final submission time.

On Canvas, there will be 4 portals for submission, one each for these options.

Accessing LINAS 2025 data

The following chunk of code will access the LINAS data.

library(readr)  # provides read_csv()

linas_url <- "https://raw.githubusercontent.com/mightyjoemoon/LINAS2025/main/linas_may2025_weighted_csv.csv"

linas.1 <- read_csv(url(linas_url))

head(linas.1)
## # A tibble: 6 × 278
##   record uuid  date     s2    s3    s4  s5r1  s5r2  s5r3  s5r4  s5r5  s5r6  s5r7
##    <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1     88 qqsg… 4/22…     2     1     2     0     1     0     0     0     0     0
## 2    244 xz2n… 4/24…     4     1     2     0     1     0     0     0     0     0
## 3    260 5qxf… 4/24…     5     1     2     0     1     0     0     0     0     0
## 4    285 cgbt… 4/24…     5     2     2     0     1     0     0     0     0     0
## 5    382 sgrg… 4/25…     3     1     2     0     1     0     0     0     0     0
## 6    615 zxz4… 4/30…     5     2     2     0     1     0     0     0     0     0
## # ℹ 265 more variables: s6 <dbl>, s6r22oe <chr>, origcat <dbl>, state <dbl>,
## #   akc <lgl>, alc <dbl>, arc <dbl>, azc <dbl>, cac <dbl>, coc <dbl>,
## #   ctc <dbl>, dec <lgl>, flc <dbl>, gac <dbl>, hic <dbl>, iac <dbl>,
## #   idc <dbl>, ilc <dbl>, inc <dbl>, ksc <dbl>, kyc <dbl>, lac <dbl>,
## #   mac <dbl>, mec <dbl>, mdc <dbl>, mic <dbl>, mnc <dbl>, moc <dbl>,
## #   msc <dbl>, mtc <dbl>, ncc <dbl>, ndc <dbl>, nec <dbl>, nhc <dbl>,
## #   nmc <dbl>, nvc <dbl>, njc <dbl>, nyc <dbl>, ohc <dbl>, okc <dbl>, …

Dependent variables

The following chunks of code produce the three dependent variables you selected from. Do not alter this code, as doing so may change the meaning of the scale (or corrupt it). In Deliverable 1, you chose one of these dependent variables for your analysis.

Proactive response to enforcement

The proactive response scale is based on questions q8r1 through q8r9. If you are using this measure as the dependent measure, it is your responsibility to assess what the variable is measuring. The name of this variable is proactive_scale.

linas.1$proactive1 <- ifelse(linas.1$q8r1==1, 1, 0)
linas.1$proactive2 <- ifelse(linas.1$q8r2==1, 1, 0)
linas.1$proactive3 <- ifelse(linas.1$q8r3==1, 1, 0)
linas.1$proactive4 <- ifelse(linas.1$q8r4==1, 1, 0)
linas.1$proactive5 <- ifelse(linas.1$q8r5==1, 1, 0)
linas.1$proactive6 <- ifelse(linas.1$q8r6==1, 1, 0)
linas.1$proactive7 <- ifelse(linas.1$q8r7==1, 1, 0)
linas.1$proactive8 <- ifelse(linas.1$q8r8==1, 1, 0)
linas.1$proactive9 <- ifelse(linas.1$q8r9==1, 1, 0)
#linas.1$proactive10 <- ifelse(linas.1$q8r10==1, 1, 0)
#linas.1$proactive11 <- ifelse(linas.1$q8r11==1, 1, 0)
#linas.1$proactive12 <- ifelse(linas.1$q8r12==1, 1, 0)

linas.1$proactive_scale <- (linas.1$proactive1 + linas.1$proactive2 + linas.1$proactive3  + linas.1$proactive4 + linas.1$proactive5 + linas.1$proactive6 + linas.1$proactive7 + linas.1$proactive8  + linas.1$proactive9)

table(linas.1$proactive_scale)
## 
##   0   1   2   3   4   5   6   7   9 
## 413 275 146  90  41  20  10   3   2
summary(linas.1$proactive_scale)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0     0.0     1.0     1.2     2.0     9.0
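Because each proactive item is an indicator for answering 1, the scale is simply a count of "yes" answers across q8r1 through q8r9. The sketch below checks this on a small hypothetical data frame (`toy`, with made-up responses, not LINAS data) by comparing the item-by-item recode and sum with a one-line `rowSums()` count. As in the original chunk, a missing answer on any item would make the total `NA` in both versions.

```r
# Hypothetical toy data: three respondents, items q8r1..q8r9 (1 = yes, 2 = no)
toy <- data.frame(matrix(c(1, 2, 2, 2, 2, 2, 2, 2, 2,
                           2, 2, 2, 2, 2, 2, 2, 2, 2,
                           1, 1, 1, 2, 2, 2, 2, 2, 1),
                         nrow = 3, byrow = TRUE))
names(toy) <- paste0("q8r", 1:9)

# Item-by-item recode (1 if the answer is 1, else 0), then sum across items
items <- sapply(names(toy), function(v) ifelse(toy[[v]] == 1, 1, 0))
scale_sum <- rowSums(items)

# Compact equivalent: count the items answered 1 in each row
scale_compact <- rowSums(toy == 1)

print(unname(scale_compact))
```

The instructor's explicit version is easier to read item by item; the compact form is shown only to make clear what the scale counts.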

Independent variables

Each student is required to analyze how gender and party affiliation relate to their dependent variable. In the next two chunks, you will see code producing these two variables.

Gender

Gender is a two-level factor variable recorded as “Male” and “Female”. For purposes of statistical analysis, “Male” is the baseline (reference) category. It is the student’s responsibility to understand what this means. This variable is called gender.

linas.1$gender <- factor(linas.1$s3,
                                levels=c("1", "2"),
                                labels=c("Male", "Female"))

table(linas.1$gender)
## 
##   Male Female 
##    485    514
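Because “Male” is listed first in `levels`, R's default treatment coding makes men the reference group: a regression on gender estimates an intercept (the male mean) and a Female coefficient (the female-minus-male difference). A minimal sketch with made-up outcome values (hypothetical `y`, not survey results):

```r
# Hypothetical outcomes for two men and two women
g <- factor(c("Male", "Male", "Female", "Female"), levels = c("Male", "Female"))
y <- c(1, 3, 2, 6)

# Treatment contrasts: Male is coded 0 (baseline), Female is coded 1
print(contrasts(g))

fit <- lm(y ~ g)
# (Intercept) = mean for men; gFemale = mean for women minus mean for men
print(coef(fit))
```

Here the male mean is 2 and the female mean is 4, so the intercept is 2 and gFemale is 2. In a model with other predictors, the Female coefficient is the same kind of difference, holding those predictors constant.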

Party affiliation

Party affiliation is a three-level factor variable recorded as “Republican” for Republicans, “Democrat” for Democrats, and “Ind./Other” for Independents and other identifiers. This factor variable treats partisan “leaners” as partisans. This will be explained in class. The code is a bit lengthy, so don’t alter it. To derive the three-level factor, I first created a variable to identify the leaners. From this I created the variable for party identification; this variable is called pidthree.

##Coding for party: multi levels

linas.1$pid[linas.1$q65==1 & linas.1$q66==1] <- 1 
linas.1$pid[linas.1$q65==1 & linas.1$q66==2] <- 2
linas.1$pid[linas.1$q65==3 & linas.1$q67==1] <- 3
linas.1$pid[linas.1$q65==3 & linas.1$q67==3] <- 4
linas.1$pid[linas.1$q65==3 & linas.1$q67==2] <- 5
linas.1$pid[linas.1$q65==2 & linas.1$q66==2] <- 6
linas.1$pid[linas.1$q65==2 & linas.1$q66==1] <- 7
linas.1$pid[linas.1$q65==3 & linas.1$q67==4] <- 8  #Independent leans other
linas.1$pid[linas.1$q65==4 & linas.1$q67==4] <- 9  #Other leans other
linas.1$pid[linas.1$q65==4 & linas.1$q67==1] <- 3  #Other leans Rep
linas.1$pid[linas.1$q65==4 & linas.1$q67==2] <- 5  #Other leans Dem
linas.1$pid[linas.1$q65==4 & linas.1$q67==3] <- 12 #Other leans Independent

## Note that the code below will exclude: Independents who lean "other"; "Other" identifiers who lean "other"; and "Other that leans Independent" 

linas.1$pidseven <- factor(linas.1$pid,
                             levels=c(1,2,3,4,5,6,7),
                             labels=c("SR", "R", "LR", "I", "LD", "D", "SD"))

## Coding for party: 3 levels.  Note that leaners are treated as partisans. Republicans are baseline category

linas.1$pidthree<- factor(linas.1$pid,
                       levels=c(1,2,3,4,5,6,7, 8, 9, 12),
                       labels=c("Republican", "Republican", "Republican", "Ind./Other",
                                "Democrat", "Democrat", "Democrat", "Ind./Other",
                                 "Ind./Other", "Ind./Other"))

table(linas.1$pidthree)
## 
## Republican Ind./Other   Democrat 
##        236        293        471
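The pidthree recode relies on a feature of `factor()` available since R 3.4.0: when several `levels` share the same label, they are merged into a single level, and the resulting levels appear in the order the distinct labels first occur. A short sketch with a hypothetical vector of pid codes:

```r
# Hypothetical pid codes using the same scheme as the recode above
pid <- c(1, 3, 4, 5, 7, 8, 12, 2)

pid3 <- factor(pid,
               levels = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 12),
               labels = c("Republican", "Republican", "Republican", "Ind./Other",
                          "Democrat", "Democrat", "Democrat", "Ind./Other",
                          "Ind./Other", "Ind./Other"))

print(levels(pid3))  # three merged levels; Republican comes first (baseline)
print(table(pid3))
```

This is why Republicans end up as the baseline category in the regression: “Republican” is the first distinct label.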

Additional independent variables

Apart from gender and party affiliation, each student selected between 2 and 4 additional independent variables. I have precoded many, but not all, of these items. The following chunks of code will produce several variables, some of which you may have selected.

Time in the US variable (q1)

“What year did you first arrive to live in the United States?” is how we measure time in the United States. This variable is based on q1, which is coded 1 = 2025, 2 = 2024, and so on down to 101 = 1925. If we subtract 1 from this variable, we get an approximation of the number of years spent in the US. The name of this variable is timefrom2025.

#table(linas.1$q1)

linas.1$timefrom2025 <- linas.1$q1-1

table(linas.1$timefrom2025)
## 
##  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 
##  7 35 43 33 22 19 19 25 17 16 17 13 19 17 14 45 12 12 18 13 17 14 18 24 29 58 
## 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 
## 24 22 10 11 24 20  9 17 14 28 14 17  7 18 17 14  7  8  8 16  5  7 11  9 10  9 
## 52 53 54 55 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 75 76 78 88 
##  4  7  9  4  1  4  5  4  2  2  3  4  6  1  1  1  2  1  1  1  1  1  1  1  1
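Under this coding the arrival year is 2026 minus q1, so q1 minus 1 is the number of years between arrival and 2025. A quick check with a few hypothetical codes:

```r
q1 <- c(1, 2, 26, 101)         # hypothetical responses
arrival_year <- 2026 - q1      # 2025, 2024, 2000, 1925
timefrom2025 <- q1 - 1         # 0, 1, 25, 100 years
print(data.frame(q1, arrival_year, timefrom2025))
```

The result is approximate because the survey was fielded partway through 2025 and the month of arrival is not recorded.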

Anxiety about deportation (q46 and q47)

Anxiety about deportation is based on questions q46 and q47. I wrote these questions, and ideally they are meant to be used in conjunction with one another. Some students are using one or both of them. The variable personal_anxiety records anxiety about one’s own deportation; the variable ff_anxiety records anxiety about the deportation of family or friends. High scores reflect greater anxiety.

linas.1$personal_anxiety <- 5-linas.1$q46


linas.1$ff_anxiety <- 5-linas.1$q47

table(linas.1$personal_anxiety)
## 
##   1   2   3   4 
## 348 298 234 120
table(linas.1$ff_anxiety)
## 
##   1   2   3   4 
## 230 243 340 187
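Subtracting from 5 reverses a 1 to 4 scale (1 becomes 4, 2 becomes 3, and so on), which is what makes high scores on personal_anxiety and ff_anxiety mean more worry. A sketch of the mapping over the four possible codes:

```r
q46 <- 1:4                  # the four original response codes
personal_anxiety <- 5 - q46 # reversed: 4, 3, 2, 1
print(data.frame(q46, personal_anxiety))
```

When describing this variable in words, the labels have to follow the reversed values, not the original q46 codes.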

Income level (q59)

table(linas.1$q59)
## 
##   1   2   3   4   5   6   7   8   9  10  11  12  13 
## 132 103  83  88 107  56  79  61  48  91  54  16  82
linas.1$income_level[linas.1$q59 <= 4] <- "A. Low"
linas.1$income_level[linas.1$q59 >=5 & linas.1$q59 <=9] <- "B. Medium"
linas.1$income_level[linas.1$q59 >9 & linas.1$q59 <=12] <- "C. High"

table(linas.1$income_level)
## 
##    A. Low B. Medium   C. High 
##       406       351       161
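The three comparisons above can also be written with `cut()`. This sketch is an alternative, not the instructor's code, and it uses one hypothetical respondent per response code. It shows that code 13 falls outside every bracket and is left missing, as in the original recode; in the actual data those 82 respondents are therefore dropped from any model that uses income_level.

```r
q59 <- 1:13   # hypothetical: one respondent per response code

# Brackets (0,4], (4,9], (9,12]; code 13 is outside all of them and becomes NA
income_level <- cut(q59,
                    breaks = c(0, 4, 9, 12),
                    labels = c("A. Low", "B. Medium", "C. High"))

print(table(income_level, useNA = "ifany"))
```

`cut()` returns a factor, while the original recode produces a character vector; `lm()` converts character predictors to factors with the same alphabetical level order, so "A. Low" is the baseline either way.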

ANALYSIS:

Here is where your story begins. You will describe your data and then estimate and interpret a linear regression model.

Part 1: Overview of research and dependent variable

What is your research question and what are the main features of your dependent variable? You should follow my example but use your own data and language. Do not cut and paste what I write; this will lead you down a path you don’t want to go down. This section is worth 100 points.

Research question:

What is the research question you are addressing?

What factors alter the extent to which Latino immigrants engage in proactive strategies in response to Trump’s immigration enforcement?

Why should we care?

Why should anyone care about what it is you’re doing?

By studying the factors that influence the extent to which Latino immigrants engage in proactive strategies in response to Trump’s immigration enforcement, we gain valuable insight into how a large, marginalized group in the US navigates state power and oppression. Additionally, this study shifts the focus from portraying Latino immigrants simply as victims of immigration enforcement to portraying them as important political actors.

Characteristics of your dependent variable

What is your dependent variable measuring, and what does the distribution of the variable look like? Below is shell code that produces a barplot using a variable called “endorse_narrative.” You will plot your dependent variable, obviously.

My dependent variable is the number of proactive strategies used by Latino immigrants. This variable is based on survey question 8: “Since the new Trump Administration began in January 2025, have you done any of the following?” The variable is coded such that 0 denotes respondents who have not engaged in any of the listed proactive strategies and 9 denotes those who have engaged in all 9 of them. Below is a bar plot representing this variable.

library(ggplot2)  # ggplot(), geom_bar()
library(scales)   # percent() axis label formatter

proactive <- ggplot(linas.1, aes(x = proactive_scale, y = after_stat(count/sum(count)))) +
  geom_bar(fill = "lightblue4") +
  scale_y_continuous(labels=percent) +
  scale_x_continuous(breaks = seq(0, 9, 1)) +
  labs(title = "More than 40% of Respondents Don't Participate in Any Forms of Proactive\nStrategies",
       x = "Number of Proactive Strategies that Respondents Engage in",
       y = "Percentage of Sample") +
  theme_classic()

proactive

Provide an interpretation of this plot here

What are the main features of your plot?

This plot shows the distribution of the dependent variable. About 41% of respondents engage in no proactive strategies, while the remaining 59% engage in at least one form of proactive response to Trump’s immigration enforcement. Most of this 59% (about 87% of them) have engaged in 1, 2, or 3 different strategies. This divide between those who do and do not respond proactively to immigration enforcement raises the question: who are the Latino immigrants who engage in proactive strategies?
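The shares described above can be computed directly from the frequency table of proactive_scale printed earlier; the counts below are copied from that output.

```r
# Counts of respondents by number of proactive strategies (from the table() output)
counts <- c(`0` = 413, `1` = 275, `2` = 146, `3` = 90, `4` = 41,
            `5` = 20, `6` = 10, `7` = 3, `9` = 2)

shares   <- prop.table(counts)
none     <- shares[["0"]]   # share using no strategies
some     <- 1 - none        # share using at least one strategy
# Among those using at least one strategy, the share using 1 to 3
one_to_3 <- sum(counts[c("1", "2", "3")]) / sum(counts[-1])

print(round(c(none = none, some = some, one_to_three_among_users = one_to_3), 3))
```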

Part 2: Analysis

In this section you will assess the relationship between your dependent variable and your independent variables. This section is worth 500 points.

Independent variables (100 points)

What are your independent variables? Use natural language rather than the literal name of the variable you are interpreting (i.e., if you’re using the variable “endorse_narrative”, do not use the language “endorse_narrative”, since no one knows what this means; use substantive language).

Five independent variables that I predict will be related to proactive responses by Latino immigrants to Trump’s immigration enforcement are gender, party affiliation, time spent in the US, anxiety about deportation, and income level. Gender is coded as a binary variable denoting males (n = 485) and females (n = 514). Party affiliation is a three-level factor variable denoting Republicans (n = 236), Democrats (n = 471), and Independents/others (n = 293). Time spent in the US is based on year of arrival, coded such that 1 denotes arrival in 2025, 2 denotes arrival in 2024, and so on; subtracting one from this code yields the approximate number of years an immigrant has spent in the US. Anxiety about deportation is measured as personal anxiety and is reverse coded so that higher values indicate more worry: 1 denotes “I don’t worry at all” (n = 348), 2 denotes “I don’t worry too much” (n = 298), 3 denotes “I worry a lot” (n = 234), and 4 denotes “I worry a great deal/all the time” (n = 120). Income level is a three-level variable where “Low” denotes a total household income below $50,000 (n = 406), “Medium” denotes a total household income between $50,000 and $100,000 (n = 351), and “High” denotes a total household income above $100,000 (n = 161). As expressed in my first deliverable, I predict that: 1) female immigrants are more likely to respond proactively, 2) Republicans are less likely to respond proactively, 3) immigrants who have lived in the US longer are more likely to respond proactively, 4) immigrants with more anxiety about deportation are less likely to respond proactively, and 5) immigrants with medium incomes are the most likely to respond proactively and immigrants with high incomes the least likely, with low-income immigrants somewhere in between.

Regression analysis (400 points)

Here is where you will estimate the linear regression model. The code below is based on my worked example; yours will be based on your deliverable 1 proposal.

reg1 <- lm(proactive_scale ~ gender + pidthree + timefrom2025 + personal_anxiety + income_level, data=linas.1, weights=weight)

summary(reg1)
## 
## Call:
## lm(formula = proactive_scale ~ gender + pidthree + timefrom2025 + 
##     personal_anxiety + income_level, data = linas.1, weights = weight)
## 
## Weighted Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.4008 -0.9005 -0.2347  0.6257  7.8667 
## 
## Coefficients:
##                        Estimate Std. Error t value          Pr(>|t|)    
## (Intercept)           -0.080397   0.163513  -0.492             0.623    
## genderFemale          -0.146112   0.089722  -1.628             0.104    
## pidthreeInd./Other     0.076583   0.128725   0.595             0.552    
## pidthreeDemocrat       0.544183   0.116563   4.669 0.000003489686727 ***
## timefrom2025           0.002101   0.002728   0.770             0.442    
## personal_anxiety       0.330485   0.045516   7.261 0.000000000000826 ***
## income_levelB. Medium  0.503591   0.098561   5.109 0.000000393837642 ***
## income_levelC. High    0.801830   0.135439   5.920 0.000000004553788 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.342 on 909 degrees of freedom
##   (83 observations deleted due to missingness)
## Multiple R-squared:  0.1353, Adjusted R-squared:  0.1286 
## F-statistic: 20.31 on 7 and 909 DF,  p-value: < 0.00000000000000022

I estimated a weighted linear regression model, and the results are displayed above. The results show no statistically significant gender gap in the proactive strategies used in response to immigration enforcement. Holding the other variables constant, women score about 0.15 points lower than men, a small difference on a scale that runs from 0 to 9. Moreover, the p-value of 0.104 is above the 0.05 significance level, so we cannot reject the null hypothesis that men and women engage in the same number of proactive strategies. This is easier to see visually.
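The p-value in the output comes from the t statistic (the estimate divided by its standard error), compared against a t distribution with the model's residual degrees of freedom (909). Recomputing it for the gender coefficient from the printed numbers:

```r
est <- -0.146112   # genderFemale estimate from summary(reg1)
se  <-  0.089722   # its standard error
df  <-  909        # residual degrees of freedom

t_stat  <- est / se                        # about -1.63
p_value <- 2 * pt(-abs(t_stat), df = df)   # two-sided p-value
print(round(c(t = t_stat, p = p_value), 3))
```

A |t| below roughly 1.96 corresponds to a two-sided p-value above 0.05, which is why this coefficient is not statistically significant.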

library(ggplot2)  # theme_classic(), geom_line()
library(sjPlot)   # plot_model()

plot_model(reg1, type = "pred", 
           terms = c("gender"), ci.lvl = .95, 
           title="There are no Gender Differences in Proactive Strategies Used by Latino\nImmigrants", axis.title=c("Gender", "Predicted Level of Proactive Engagement"), colors=c("skyblue4")) + geom_line(color="skyblue4", linetype=3, linewidth=.4)  +
   #ylim(0,4) +
  theme_classic() +
  theme(axis.text.x = element_text(size=10, angle=0, hjust=.5),
            axis.ticks = element_blank())

As seen in the graph, there is very little difference between males and females in engagement in proactive strategies. This is demonstrated by the largely overlapping confidence intervals and a nearly flat line between the two groups. These data are not consistent with my prediction that women would engage in more proactive strategies in response to immigration enforcement.

Next, turning to partisanship, I find differences between political parties in the extent to which Latino immigrants engage in proactive strategies. Holding the other variables constant, Democrats score about 0.54 points higher than Republicans, which is substantial given that the average respondent uses only 1.2 strategies, and this difference is highly statistically significant (p < 0.001). Independents and other identifiers score only 0.08 points higher than Republicans, a small difference that is not statistically significant (p = 0.55, far above the 0.05 level). These results are easier to see in a regression plot.

plot_model(reg1, type = "pred", 
           terms = c("pidthree"), ci.lvl = .95, 
           title="Democrats Engage in Proactive Strategies More than Independents and\nRepublicans", axis.title=c("Party Identification", "Predicted Level of Proactive Engagement"), colors=c("skyblue4")) + geom_line(color="skyblue4", linetype=3, linewidth=.4)  +
   #ylim(0,4) +
  theme_classic() +
  theme(axis.text.x = element_text(size=10, angle=0, hjust=.5),
            axis.ticks = element_blank())

The graph shows Republicans and Independents engaging in proactive strategies at similarly low levels, while Democrats engage in proactive strategies at higher levels than both. This is supported by the increase from Independents to Democrats and the lack of overlap between their confidence intervals. These data are consistent with my hypothesis that Democrats would engage in more proactive strategies than Republicans.
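The regression output reports each party only relative to the Republican baseline. One way to test Democrats against Independents/others directly is to change the reference level with `relevel()` and refit. The sketch below uses simulated data (hypothetical, not LINAS) to show that the new Democrat coefficient equals the difference between the two original party coefficients; for reg1 that difference is 0.544 − 0.077 ≈ 0.47, and refitting also supplies the standard error and p-value for that comparison.

```r
set.seed(1)
n <- 300
# Simulated party labels and outcome (hypothetical effect sizes)
party <- factor(sample(c("Republican", "Ind./Other", "Democrat"), n, replace = TRUE),
                levels = c("Republican", "Ind./Other", "Democrat"))
y <- 1 + 0.1 * (party == "Ind./Other") + 0.5 * (party == "Democrat") + rnorm(n)

fit_rep <- lm(y ~ party)                       # Republican baseline

party_ind <- relevel(party, ref = "Ind./Other")
fit_ind <- lm(y ~ party_ind)                   # Ind./Other baseline

diff_from_rep <- coef(fit_rep)[["partyDemocrat"]] - coef(fit_rep)[["partyInd./Other"]]
dem_vs_ind    <- coef(fit_ind)[["party_indDemocrat"]]

# Estimate, standard error, t value, and p-value for Democrat vs. Ind./Other
print(summary(fit_ind)$coefficients["party_indDemocrat", ])
```

With the real data, the same idea would mean refitting reg1 with `relevel(pidthree, ref = "Ind./Other")` in place of pidthree, keeping `weights = weight`.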

Next, I analyze the relationship between time spent in the US and engagement in proactive strategies. In the regression output, the coefficient of 0.0021 is substantively tiny: each additional year in the US is associated with only a 0.0021-point increase on the 0 to 9 scale, suggesting that Latino immigrants do not engage in more proactive strategies as their time in the US increases. Additionally, the p-value of 0.44 is far above the 0.05 significance level, so the relationship is not statistically significant. This relationship is easier to see in the following regression plot.

plot_model(reg1, type = "pred", 
           terms = c("timefrom2025"), ci.lvl = .95, 
           title="Proactive Behavior by Latino Immigrants Does not Change as a Function of Time\nSpent in the US", axis.title=c("Time Spent in US (Years)", "Predicted Level of Proactive Engagement"), colors=c("skyblue4")) + geom_line(color="skyblue4", linetype=3, linewidth=.4)  +
   #ylim(0,4) +
  theme_classic() +
  theme(axis.text.x = element_text(size=10, angle=0, hjust=.5),
            axis.ticks = element_blank())

As can be seen in the plot, the line relating time spent in the US to proactive engagement is essentially flat (the coefficient implies only about a 0.16-point change in the number of proactive strategies over 75 years), suggesting no relationship once the other variables are held constant. This finding is not consistent with my hypothesis that more time spent in the US would result in more proactive engagement in response to Trump’s immigration enforcement.
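Because the model is linear, the predicted change over a span of years is the slope multiplied by the number of years. Using the timefrom2025 estimate from the output:

```r
slope <- 0.002101        # timefrom2025 estimate from summary(reg1)
years <- c(10, 25, 75)   # illustrative spans of time in the US
change <- slope * years  # predicted change in the number of strategies
print(data.frame(years, change = round(change, 3)))
```

Even over 75 years, the implied change is well under a single strategy.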

Next, observing the relationship between personal anxiety about deportation and the extent to which Latino immigrants engage in proactive strategies, I find that each one-point increase in anxiety is associated with a 0.33-point increase in proactive behavior, holding the other variables constant. Across the full range of the anxiety scale (from 1 to 4), this amounts to about one additional strategy (3 × 0.33 ≈ 0.99). The relationship is highly statistically significant, with a p-value that is effectively zero (p < 0.001). This relationship is better understood with a regression plot.

plot_model(reg1, type = "pred", 
           terms = c("personal_anxiety"), ci.lvl = .95, 
           title="There is a Strong Correlation Between Personal Anxiety About Deportation and\nProactive Engagement", axis.title=c("Level of Personal Anxiety (1 is Low, 4 is High)", "Predicted Level of Proactive Engagement"), colors=c("skyblue4")) + geom_line(color="skyblue4", linetype=3, linewidth=.4)  +
   #ylim(0,4) +
  theme_classic() +
  theme(axis.text.x = element_text(size=10, angle=0, hjust=.5),
            axis.ticks = element_blank())

As visualized, there is a strong positive relationship between personal anxiety about deportation and engagement in proactive strategies in response to immigration enforcement. This is one of the more surprising results in my analysis, as I hypothesized that lower anxiety about deportation would result in more proactive behavior. I predicted this because I thought that a Latino immigrant who was less afraid of being deported would be more willing to engage in proactive strategies. Having seen the data, I propose a new hypothesis to explain this pattern: Latino immigrants with higher anxiety are more likely to engage in proactive strategies because they have a stronger incentive to create a safer environment for themselves in the US.

Finally, assessing the relationship between income level and Latino immigrant engagement in proactive strategies, I find the largest coefficients in the model. Latino immigrants with medium household incomes score 0.50 points higher than those with low household incomes, and Latino immigrants with high household incomes score 0.80 points (almost a full strategy) higher than those with low household incomes. Both differences are highly statistically significant, with p-values far below the 0.05 significance level (p < 0.001). This relationship is visualized below.

plot_model(reg1, type = "pred", 
           terms = c("income_level"), ci.lvl = .95, 
           title="Proactive Engagement Increases as Total Household Income Increases", axis.title=c("Income Level", "Predicted Level of Proactive Engagement"), colors=c("skyblue4")) + geom_line(color="skyblue4", linetype=3, linewidth=.4)  +
   #ylim(0,4) +
  theme_classic() +
  theme(axis.text.x = element_text(size=10, angle=0, hjust=.5),
            axis.ticks = element_blank())

As visualized in the graph, there is a clear upward gradient from low to high income, meaning that as income increases, so does the level of proactive activity that Latino immigrants engage in. These data are not consistent with my original hypothesis that proactive engagement would increase from low to medium income but then decline from medium to high income.
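Predicted values like those in the plots are linear combinations of the coefficients. As an illustration, the sketch below computes predictions for one hypothetical respondent profile (a man who identifies as a Democrat, has lived in the US for 20 years, and has a personal anxiety score of 2) at each income level, using the estimates printed above. `plot_model()` fixes the non-plotted covariates at its own chosen values, so its predicted levels will generally differ from this profile's, but the gaps between income groups are the same because the model has no interactions.

```r
# Estimates copied from summary(reg1)
b <- c(intercept = -0.080397, female = -0.146112, ind_other = 0.076583,
       democrat = 0.544183, years = 0.002101, anxiety = 0.330485,
       medium = 0.503591, high = 0.801830)

# Hypothetical profile: male (baseline), Democrat, 20 years in the US, anxiety = 2
base <- b[["intercept"]] + b[["democrat"]] + b[["years"]] * 20 + b[["anxiety"]] * 2

predicted <- c(Low    = base,                 # low income is the baseline
               Medium = base + b[["medium"]],
               High   = base + b[["high"]])
print(round(predicted, 2))
```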

My model provides strong evidence that, while gender and time spent in the US are not associated with how much Latino immigrants engage in proactive strategies, partisanship, personal anxiety about deportation, and income level are. Although these factors appear to shape how much Latino immigrants engage in proactive strategies, the immigrants surveyed engage in proactive strategies at relatively low levels overall, with an average of 1.2 strategies. This result is quite interesting and raises another question: why are Latino immigrants’ levels of engagement in proactive strategies in response to immigration enforcement so low?