Objective: This project will give the student the opportunity to apply statistical modeling techniques to real-world public opinion data. Each student will estimate and, more importantly, interpret a model they proposed in deliverable 1 using the Latino Immigrant National Attitude Survey. The first deliverable was worth 100 points; this second component will be worth 500 points. This RMD file is critical as it contains recodes of a number of independent variables that will be used by some of you. I will update the recoding section of it to accommodate some of the proposed independent variables, but I will not recode all of the proposed independent variables.
You are to use this RMD file to produce a final HTML that will be submitted on Canvas by Wednesday, December 10 at 11:59 PM. However, there are a series of extra credit incentives in place to discourage turning it in at the last minute. They are:
Should you submit the HTML for part 2 on Canvas by Friday, Dec. 5, 11:59 PM, you will receive an 8% bonus to your grade. Eight percent of 500 is 40 points.
Should you submit the HTML for part 2 on Canvas by Sunday, Dec. 7, 11:59 PM, you will receive a 5% bonus to your grade. Five percent of 500 is 25 points.
Should you submit the HTML for part 2 on Canvas by Tuesday, Dec. 9, 11:59 PM, you will receive a 3% bonus to your grade. Three percent of 500 is 15 points.
Should you submit the HTML for part 2 on Canvas by Thursday, Dec. 11, 11:59 PM, you will receive a 0% bonus to your grade. This will be the final submission time.
On Canvas, there will be 4 portals for submission, one each for these options.
The following chunk of code will access the LINAS data.
# read_csv() comes from the readr package (also loaded by the tidyverse)
library(readr)
linas.1="https://raw.githubusercontent.com/mightyjoemoon/LINAS2025/main/linas_may2025_weighted_csv.csv"
linas.1<-read_csv(url(linas.1))
head(linas.1)
## # A tibble: 6 × 278
## record uuid date s2 s3 s4 s5r1 s5r2 s5r3 s5r4 s5r5 s5r6 s5r7
## <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 88 qqsg… 4/22… 2 1 2 0 1 0 0 0 0 0
## 2 244 xz2n… 4/24… 4 1 2 0 1 0 0 0 0 0
## 3 260 5qxf… 4/24… 5 1 2 0 1 0 0 0 0 0
## 4 285 cgbt… 4/24… 5 2 2 0 1 0 0 0 0 0
## 5 382 sgrg… 4/25… 3 1 2 0 1 0 0 0 0 0
## 6 615 zxz4… 4/30… 5 2 2 0 1 0 0 0 0 0
## # ℹ 265 more variables: s6 <dbl>, s6r22oe <chr>, origcat <dbl>, state <dbl>,
## # akc <lgl>, alc <dbl>, arc <dbl>, azc <dbl>, cac <dbl>, coc <dbl>,
## # ctc <dbl>, dec <lgl>, flc <dbl>, gac <dbl>, hic <dbl>, iac <dbl>,
## # idc <dbl>, ilc <dbl>, inc <dbl>, ksc <dbl>, kyc <dbl>, lac <dbl>,
## # mac <dbl>, mec <dbl>, mdc <dbl>, mic <dbl>, mnc <dbl>, moc <dbl>,
## # msc <dbl>, mtc <dbl>, ncc <dbl>, ndc <dbl>, nec <dbl>, nhc <dbl>,
## # nmc <dbl>, nvc <dbl>, njc <dbl>, nyc <dbl>, ohc <dbl>, okc <dbl>, …
The following chunks of code will produce the three dependent variables you selected from. Do not alter this code as it may alter the meaning of the scale (or corrupt it). In deliverable 1, you chose one of these dependent variables for your analysis.
The proactive response scale is based on questions q8r1 through q8r9. If you are using this measure as the dependent measure, it is your responsibility to assess what the variable is measuring. The name of this variable is proactive_scale.
linas.1$proactive1 <- ifelse(linas.1$q8r1==1, 1, 0)
linas.1$proactive2 <- ifelse(linas.1$q8r2==1, 1, 0)
linas.1$proactive3 <- ifelse(linas.1$q8r3==1, 1, 0)
linas.1$proactive4 <- ifelse(linas.1$q8r4==1, 1, 0)
linas.1$proactive5 <- ifelse(linas.1$q8r5==1, 1, 0)
linas.1$proactive6 <- ifelse(linas.1$q8r6==1, 1, 0)
linas.1$proactive7 <- ifelse(linas.1$q8r7==1, 1, 0)
linas.1$proactive8 <- ifelse(linas.1$q8r8==1, 1, 0)
linas.1$proactive9 <- ifelse(linas.1$q8r9==1, 1, 0)
#linas.1$proactive10 <- ifelse(linas.1$q8r10==1, 1, 0)
#linas.1$proactive11 <- ifelse(linas.1$q8r11==1, 1, 0)
#linas.1$proactive12 <- ifelse(linas.1$q8r12==1, 1, 0)
linas.1$proactive_scale <- (linas.1$proactive1 + linas.1$proactive2 + linas.1$proactive3 + linas.1$proactive4 + linas.1$proactive5 + linas.1$proactive6 + linas.1$proactive7 + linas.1$proactive8 + linas.1$proactive9)
table(linas.1$proactive_scale)
##
## 0 1 2 3 4 5 6 7 9
## 413 275 146 90 41 20 10 3 2
summary(linas.1$proactive_scale)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 0.0 1.0 1.2 2.0 9.0
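The additive scale above can be checked on a small made-up data frame. The sketch below is not the course code: `toy` and its three items are hypothetical stand-ins for q8r1 through q8r9, and `rowSums()` is swapped in for the explicit sum.

```r
# Hypothetical toy responses: 1 = "yes, did this"; any other code = not selected.
toy <- data.frame(
  q8r1 = c(1, 0, 1, 2),
  q8r2 = c(1, 0, 0, 1),
  q8r3 = c(1, 0, 0, 0)
)

# Recode each item to a 0/1 indicator, as the ifelse() calls above do.
indicators <- as.data.frame(lapply(toy, function(x) ifelse(x == 1, 1, 0)))

# Sum across items to get the number of strategies per respondent.
toy$proactive_scale <- rowSums(indicators)
toy$proactive_scale
```

As with the `+` version, a respondent who is missing any one item ends up with a missing scale score, because `rowSums()` does not drop `NA` values by default.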
Each student is required to analyze how gender and party affiliation relate to the dependent variable. In the next two chunks, you will see code producing these two variables.
Gender is a factor-level variable recorded as “Male” and “Female”. For purposes of statistical analysis, “Male” is the baseline category. It is the student’s responsibility to understand what this means. This variable is called gender.
linas.1$gender <- factor(linas.1$s3,
levels=c("1", "2"),
labels=c("Male", "Female"))
table(linas.1$gender)
##
## Male Female
## 485 514
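What "baseline category" means can be seen in a small sketch. The data frame `toy` and the outcome `y` are hypothetical and only illustrate the mechanics; they are not LINAS data.

```r
# Hypothetical toy data; s3 mimics the 1/2 coding used above.
toy <- data.frame(
  s3 = c(1, 2, 2, 1, 2, 1),
  y  = c(2, 1, 0, 3, 1, 2)
)
toy$gender <- factor(toy$s3, levels = c("1", "2"), labels = c("Male", "Female"))

# The first level of the factor is the baseline (reference) category.
levels(toy$gender)[1]

# lm() builds a 0/1 dummy for every non-baseline level, here "genderFemale".
fit <- lm(y ~ gender, data = toy)
coef(fit)

# The intercept equals the baseline (male) mean; the dummy coefficient equals
# the female mean minus the male mean.
male_mean   <- mean(toy$y[toy$gender == "Male"])
female_mean <- mean(toy$y[toy$gender == "Female"])
c(intercept = male_mean, genderFemale = female_mean - male_mean)
```

In a model with more predictors, the same logic holds after adjusting for the other variables: the coefficient on the non-baseline level is a difference relative to the baseline group.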
Party affiliation is a three-level factor variable recorded as “Republican” for Republicans, “Democrat” for Democrats, and “Ind./Other” for Independent and other identifiers. This factor variable treats partisan “leaners” as partisans. This will be explained in class. The code is a bit lengthy so don’t alter it. To derive the 3-level factor, I first created a variable to identify the leaners. From this I created the variable for party identification; this variable is called pidthree.
##Coding for party: multi levels
linas.1$pid[linas.1$q65==1 & linas.1$q66==1] <- 1
linas.1$pid[linas.1$q65==1 & linas.1$q66==2] <- 2
linas.1$pid[linas.1$q65==3 & linas.1$q67==1] <- 3
linas.1$pid[linas.1$q65==3 & linas.1$q67==3] <- 4
linas.1$pid[linas.1$q65==3 & linas.1$q67==2] <- 5
linas.1$pid[linas.1$q65==2 & linas.1$q66==2] <- 6
linas.1$pid[linas.1$q65==2 & linas.1$q66==1] <- 7
linas.1$pid[linas.1$q65==3 & linas.1$q67==4] <- 8 #Independent leans other
linas.1$pid[linas.1$q65==4 & linas.1$q67==4] <- 9 #Other leans other
linas.1$pid[linas.1$q65==4 & linas.1$q67==1] <- 3 #Other leans Rep
linas.1$pid[linas.1$q65==4 & linas.1$q67==2] <- 5 #Other leans Dem
linas.1$pid[linas.1$q65==4 & linas.1$q67==3] <- 12 #Other leans Independent
## Note that the code below will exclude: Independents who lean "other"; "Other" identifiers who lean "other"; and "Other that leans Independent"
linas.1$pidseven <- factor(linas.1$pid,
levels=c(1,2,3,4,5,6,7),
labels=c("SR", "R", "LR", "I", "LD", "D", "SD"))
## Coding for party: 3 levels. Note that leaners are treated as partisans. Republicans are baseline category
linas.1$pidthree<- factor(linas.1$pid,
levels=c(1,2,3,4,5,6,7, 8, 9, 12),
labels=c("Republican", "Republican", "Republican", "Ind./Other",
"Democrat", "Democrat", "Democrat", "Ind./Other",
"Ind./Other", "Ind./Other"))
table(linas.1$pidthree)
##
## Republican Ind./Other Democrat
## 236 293 471
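The collapsing step relies on `factor()` merging levels that share a label. A minimal sketch on a hypothetical vector of `pid` codes (not the survey data) shows the behavior:

```r
# Hypothetical pid codes, mimicking the values built above.
pid <- c(1, 3, 4, 5, 7, 8, 12, 2)

pidthree <- factor(pid,
                   levels = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 12),
                   labels = c("Republican", "Republican", "Republican", "Ind./Other",
                              "Democrat", "Democrat", "Democrat", "Ind./Other",
                              "Ind./Other", "Ind./Other"))

# Repeated labels are merged into a single level. The level order follows the
# first appearance of each label, so "Republican" becomes the baseline.
levels(pidthree)
table(pidthree)
```

This is why the table above lists Republican first and why the regression output later reports coefficients for Ind./Other and Democrat only.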
Apart from gender and party affiliation, each student selected between 2 and 4 additional independent variables. I have precoded many, but not all, of these items. The following chunks of code will produce several variables, some of which you may have selected.
“What year did you first arrive to live in the United States?” is how we measure time in the United States. This variable is based on q1 which is coded 1=2025, 2=2024, 101=1925. If we subtract 1 from this variable, we have an approximation of the number of years spent in the US. The name of this variable is timefrom2025.
#table(linas.1$q1)
linas.1$timefrom2025 <- linas.1$q1-1
table(linas.1$timefrom2025)
##
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## 7 35 43 33 22 19 19 25 17 16 17 13 19 17 14 45 12 12 18 13 17 14 18 24 29 58
## 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51
## 24 22 10 11 24 20 9 17 14 28 14 17 7 18 17 14 7 8 8 16 5 7 11 9 10 9
## 52 53 54 55 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 75 76 78 88
## 4 7 9 4 1 4 5 4 2 2 3 4 6 1 1 1 2 1 1 1 1 1 1 1 1
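The arithmetic behind this recode can be sketched with a few hypothetical q1 codes. The `arrival_year` line is an inference from the coding scheme described above (1=2025, 101=1925), not a variable in the data.

```r
# q1 codes as described above: 1 = 2025, 2 = 2024, ..., 101 = 1925.
q1 <- c(1, 2, 26, 101)

arrival_year <- 2026 - q1   # implied by the coding scheme
timefrom2025 <- q1 - 1      # identical to 2025 - arrival_year

data.frame(q1, arrival_year, timefrom2025)
```

The result is presumably called an approximation because only the arrival year is recorded, so the true time in the US can be off by up to a year.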
Anxiety about deportation is based on questions q46 and q47. I wrote these questions and ideally, they are meant to be used in conjunction with one another. Some students are using one or both of them. The variable personal_anxiety records individual anxiety; the variable ff_anxiety records anxiety for family or friends. High scores reflect greater anxiety.
linas.1$personal_anxiety <- 5-linas.1$q46
linas.1$ff_anxiety <- 5-linas.1$q47
table(linas.1$personal_anxiety)
##
## 1 2 3 4
## 348 298 234 120
table(linas.1$ff_anxiety)
##
## 1 2 3 4
## 230 243 340 187
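The `5 - x` step is a reverse-coding trick for a four-point item. In the sketch below, the vector `q46` is hypothetical, and the assumption that the raw item runs from 1 (most anxious) to 4 (least anxious) is inferred from the note that high scores reflect greater anxiety after the recode.

```r
# Hypothetical responses on a 1-4 item where 1 is assumed to be the most
# anxious answer and 4 the least anxious.
q46 <- c(1, 2, 3, 4, NA)

# Subtracting from (max + 1) flips the scale: 1 <-> 4 and 2 <-> 3.
personal_anxiety <- 5 - q46
personal_anxiety
```

Missing responses stay missing, so respondents without an answer drop out of any model that uses the recoded variable.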
table(linas.1$q59)
##
## 1 2 3 4 5 6 7 8 9 10 11 12 13
## 132 103 83 88 107 56 79 61 48 91 54 16 82
linas.1$income_level[linas.1$q59 <= 4] <- "A. Low"
linas.1$income_level[linas.1$q59 >=5 & linas.1$q59 <=9] <- "B. Medium"
linas.1$income_level[linas.1$q59 >9 & linas.1$q59 <=12] <- "C. High"
table(linas.1$income_level)
##
## A. Low B. Medium C. High
## 406 351 161
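The three assignments above bin q59 into brackets and leave code 13 unassigned, which is why the 82 respondents with that code do not appear in the income table. A compact alternative, shown here on hypothetical codes, is `cut()`; note that it returns a factor, whereas the chunk above creates a character variable.

```r
# Hypothetical q59 codes; 13 is not assigned to any bracket in the code above.
q59 <- c(1, 4, 5, 9, 10, 12, 13, NA)

# Right-closed intervals (0,4], (4,9], (9,12] reproduce the three assignments;
# values outside the breaks, such as 13, become NA.
income_level <- cut(q59,
                    breaks = c(0, 4, 9, 12),
                    labels = c("A. Low", "B. Medium", "C. High"))
table(income_level, useNA = "ifany")
```

The letter prefixes keep the categories in low-to-high order, which makes "A. Low" the baseline in a regression.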
Here is where your story begins. You will describe your data and then estimate and interpret a linear regression model.
What is your research question and what are the main features of your dependent variable? You should follow my example but use your own data and language. Do not cut and paste what I write; this will lead you down a path you don’t want to go down. This section is worth 100 points.
What is the research question you are addressing?
What factors alter the extent to which Latino immigrants engage in proactive strategies in response to Trump’s immigration enforcement?
Why should anyone care about what it is you’re doing?
By studying different factors that influence the extent to which Latino immigrants engage in proactive strategies in response to Trump’s immigration enforcement, we gain valuable insight into how a prevalent marginalized group in the US navigates state power and oppression. Additionally, this study shifts focus from portraying Latino immigrants simply as victims of immigration enforcement to portraying them as important political actors.
What is your dependent variable measuring and what does the distribution of the variable look like? Below is shell code that produces a barplot using a variable called “endorse_narrative.” You will plot your dependent variable obviously.
My dependent variable is the number of proactive strategies used by Latino immigrants. This variable is based on survey question 8: “Since the new Trump Administration began in January 2025, have you done any of the following?” The variable is coded such that 0 denotes respondents who have not engaged in any proactive strategies and 9 denotes those who have engaged in all 9 proactive strategies listed under the survey question. Below is a bar plot representing this variable.
proactive <- ggplot(linas.1, aes(x = proactive_scale, y = after_stat(count/sum(count)))) +
geom_bar(fill = "lightblue4") +
scale_y_continuous(labels=percent) +
scale_x_continuous(breaks = seq(0, 9, 1)) +
labs(title = "More than 40% of Respondents Don't Participate in Any Forms of Proactive\nStrategies",
x = "Number of Proactive Strategies that Respondents Engage in",
y = "Percentage of Sample") +
theme_classic()
proactive
What are the main features of your plot?
This plot shows the distribution of the dependent variable. It reveals that roughly 41% of respondents engage in no proactive strategies, while around 59% engage in some form of proactive response to Trump’s immigration enforcement. The large majority of this 59% have engaged in 1, 2, or 3 different strategies. This gap between those who do and don’t proactively respond to immigration enforcement raises the question: who are the Latino immigrants who engage in proactive strategies?
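The percentages discussed here can be recovered directly from the frequency table printed earlier. The sketch below copies those counts by hand; the object names are illustrative.

```r
# Counts copied from the table(linas.1$proactive_scale) output above.
counts <- c(`0` = 413, `1` = 275, `2` = 146, `3` = 90, `4` = 41,
            `5` = 20, `6` = 10, `7` = 3, `9` = 2)

n          <- sum(counts)                     # respondents with a score
share_none <- counts[["0"]] / n               # no strategies
share_any  <- 1 - share_none                  # at least one strategy
share_1to3 <- sum(counts[c("1", "2", "3")]) / (n - counts[["0"]])

round(c(none = share_none, any = share_any, one_to_three = share_1to3), 3)

# Mean of the scale, recovered from the counts.
scale_mean <- sum(as.numeric(names(counts)) * counts) / n
scale_mean
```

About 41% report no strategies, about 59% report at least one, and roughly 87% of that second group report between one and three. The mean of 1.2 matches the `summary()` output.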
In this section you will assess the relationship between your dependent variable and your independent variables. This section is worth 500 points.
What are your independent variables? Use natural language and not the literal name of the variable you are interpreting (i.e. if you’re using the variable “endorse_narrative”, do not use the language “endorse_narrative” since no one knows what this means; use substantive language.)
The five independent variables that I predict will be related to proactive response by Latinos to Trump’s immigration enforcement are gender, party affiliation, the time that immigrants have been in the US, anxiety about deportation, and income level. Gender is coded as binary, denoting males (n = 485) and females (n = 514). Party affiliation is a three-level factor variable denoting Republicans (n = 236), Democrats (n = 471), and Independents/other (n = 293). The time immigrants have been in the US is based on a question coded such that 1 denotes that an immigrant arrived in 2025, 2 denotes arrival in 2024, and so on. Subtracting one from this variable yields the approximate number of years that an immigrant has spent in the US. Anxiety about deportation is reverse-coded so that higher scores reflect greater anxiety: 1 denotes “I don’t worry at all” (n = 348), 2 denotes “I don’t worry too much” (n = 298), 3 denotes “I worry a lot” (n = 234), and 4 denotes “I worry a great deal/all the time” (n = 120). Income level is a three-level variable where “Low” denotes a total household income of below $50,000 (n = 406), “Medium” denotes a total income between $50,000 and $100,000 (n = 351), and “High” denotes a total income of above $100,000 (n = 161). As expressed in my first deliverable, I predict: 1) female immigrants are more likely to proactively respond, 2) Republicans are less likely to proactively respond, 3) immigrants living in the US longer are more likely to proactively respond, 4) immigrants with more anxiety about deportation are less likely to proactively respond, and 5) immigrants with medium income are most likely to proactively respond and immigrants with high income are least likely to respond, with low-income immigrants somewhere in between.
Here is where you will estimate the linear regression model. The code below is based on my worked example; yours will be based on your deliverable 1 proposal.
reg1 <- lm(proactive_scale ~ gender + pidthree + timefrom2025 + personal_anxiety + income_level, data=linas.1, weights=weight)
summary(reg1)
##
## Call:
## lm(formula = proactive_scale ~ gender + pidthree + timefrom2025 +
## personal_anxiety + income_level, data = linas.1, weights = weight)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -2.4008 -0.9005 -0.2347 0.6257 7.8667
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.080397 0.163513 -0.492 0.623
## genderFemale -0.146112 0.089722 -1.628 0.104
## pidthreeInd./Other 0.076583 0.128725 0.595 0.552
## pidthreeDemocrat 0.544183 0.116563 4.669 0.000003489686727 ***
## timefrom2025 0.002101 0.002728 0.770 0.442
## personal_anxiety 0.330485 0.045516 7.261 0.000000000000826 ***
## income_levelB. Medium 0.503591 0.098561 5.109 0.000000393837642 ***
## income_levelC. High 0.801830 0.135439 5.920 0.000000004553788 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.342 on 909 degrees of freedom
## (83 observations deleted due to missingness)
## Multiple R-squared: 0.1353, Adjusted R-squared: 0.1286
## F-statistic: 20.31 on 7 and 909 DF, p-value: < 0.00000000000000022
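A few of the summary numbers above can be cross-checked by hand. The sketch below copies values from the output; the implied full-sample size rests on the assumption that the model started from all survey rows.

```r
# Values copied from the summary(reg1) output above.
r2        <- 0.1353
df_resid  <- 909
n_coef    <- 8      # intercept + 7 slopes
n_dropped <- 83

n_used <- df_resid + n_coef     # observations actually used by the model
n_used + n_dropped              # implied size of the full sample

# Adjusted R-squared penalizes R-squared for the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n_used - 1) / df_resid
round(adj_r2, 4)
```

The model uses 917 respondents, and the hand-computed adjusted R-squared agrees with the reported 0.1286. Most of the 83 dropped cases are likely respondents with no assigned income level, since that variable is missing for 82 people.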
I estimated a linear regression model and the results are displayed above. Based on these results, there is no statistically significant gender gap in the proactive strategies used in response to immigration enforcement. Females score about 0.15 points lower than males, a small difference on the 0-to-9 proactive strategies scale. Moreover, the p-value of 0.104 is above the 0.05 significance level, so this difference cannot be distinguished from zero. This is easier to see visually.
plot_model(reg1, type = "pred",
terms = c("gender"), ci.lvl = .95,
title="There are no Gender Differences in Proactive Strategies Used by Latino\nImmigrants", axis.title=c("Gender", "Predicted Level of Proactive Engagement"), colors=c("skyblue4")) + geom_line(color="skyblue4", linetype=3, linewidth=.4) +
#ylim(0,4) +
theme_classic() +
theme(axis.text.x = element_text(size=10, angle=0, hjust=.5),
axis.ticks = element_blank())
As seen in the graph, there is very little difference between males and
females in engagement in proactive strategies. This is demonstrated by
largely overlapping confidence intervals and a very slight slope. These
data are not consistent with my prediction that women would participate
more in proactive strategies in response to immigration enforcement.
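The test statistic, p-value, and a 95% confidence interval for the gender coefficient can be rebuilt from the regression table. This is a sketch using the printed (rounded) estimate and standard error, so the last digit may differ slightly from what R computes internally.

```r
# Estimate, standard error, and residual df copied from summary(reg1) above.
est <- -0.146112
se  <-  0.089722
df  <-  909

t_stat  <- est / se
p_value <- 2 * pt(-abs(t_stat), df)              # two-sided p-value
ci      <- est + c(-1, 1) * qt(0.975, df) * se   # 95% confidence interval

round(c(t = t_stat, p = p_value, lower = ci[1], upper = ci[2]), 3)
```

The interval runs from a negative value to a slightly positive one, so it includes zero. That is the same conclusion the p-value of 0.104 gives.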
Next, analyzing partisanship, I find a difference between political parties in the extent to which Latino immigrants engage in proactive strategies. The table shows that Democrats score around 0.54 points higher than Republicans, which is substantial on the proactive scale and statistically significant (p < .001). Independent and other identifiers score 0.08 points higher than Republicans, a difference that is not statistically significant: its p-value of .55 is far above the .05 significance level. These data are better explained with a regression plot.
plot_model(reg1, type = "pred",
terms = c("pidthree"), ci.lvl = .95,
title="Democrats Engage in Proactive Strategies More than Independents and\nRepublicans", axis.title=c("Party Identification", "Predicted Level of Proactive Engagement"), colors=c("skyblue4")) + geom_line(color="skyblue4", linetype=3, linewidth=.4) +
#ylim(0,4) +
theme_classic() +
theme(axis.text.x = element_text(size=10, angle=0, hjust=.5),
axis.ticks = element_blank())
The graph shows Republicans and Independents engaging in proactive
strategies at similarly low levels, while Democrats engage in proactive
strategies at higher levels than both other groups. This is supported by
the increase between Independents and Democrats and the lack of overlap
between the confidence intervals. These data are consistent with my
hypothesis that Democrats would engage in more proactive strategies than
Republicans.
Next, I analyze the relationship between time spent in the US and engagement in proactive strategies. In the regression output, the coefficient of 0.0021 per year is very small on the 0-to-9 scale and suggests that Latino immigrants do not engage in more proactive strategies as time in the US increases. Additionally, the p-value of .44 is far above the .05 significance level, so the coefficient is not statistically significant. This relationship is easier to see in the following regression plot.
plot_model(reg1, type = "pred",
terms = c("timefrom2025"), ci.lvl = .95,
title="Proactive Behavior by Latino Immigrants Does not Change as a Function of Time\nSpent in the US", axis.title=c("Time Spent in US (Years)", "Predicted Level of Proactive Engagement"), colors=c("skyblue4")) + geom_line(color="skyblue4", linetype=3, linewidth=.4) +
#ylim(0,4) +
theme_classic() +
theme(axis.text.x = element_text(size=10, angle=0, hjust=.5),
axis.ticks = element_blank())
As can be seen in the plot, the line is essentially flat (a change of
only around .16 in the number of proactive strategies used over 75
years, given the coefficient of 0.0021 per year), suggesting no
meaningful relationship between time spent in the US and proactive
engagement. This finding is not consistent with my hypothesis that more
time spent would result in more proactive engagement in response to
Trump’s immigration enforcement.
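In a linear model, the predicted change over a span of years is simply the slope multiplied by the span. The sketch below applies that to the printed coefficient; the chosen spans are illustrative.

```r
# Slope for timefrom2025, copied from summary(reg1) above.
b_time <- 0.002101

# Predicted change in the number of strategies over several spans of years.
years <- c(10, 25, 50, 75)
data.frame(years, predicted_change = b_time * years)
```

Even over 75 years the predicted change is about 0.16 strategies, which is why the fitted line looks flat.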
Next, observing the relationship between personal anxiety about deportation and the extent to which Latino immigrants engage in proactive strategies, we find that for each one-point increase in anxiety, proactive behavior increases by 0.33 points. This is a strong and highly statistically significant relationship, as evidenced by a p-value well below .001. This relationship is better understood with a regression plot.
plot_model(reg1, type = "pred",
terms = c("personal_anxiety"), ci.lvl = .95,
title="There is a Strong Correlation Between Personal Anxiety About Deportation and\nProactive Engagement", axis.title=c("Level of Personal Anxiety (1 is Low, 4 is High)", "Predicted Level of Proactive Engagement"), colors=c("skyblue4")) + geom_line(color="skyblue4", linetype=3, linewidth=.4) +
#ylim(0,4) +
theme_classic() +
theme(axis.text.x = element_text(size=10, angle=0, hjust=.5),
axis.ticks = element_blank())
As visualized, there is a strong positive relationship between personal
anxiety about deportation and engagement in proactive strategies in
response to immigration enforcement. This is one of the more surprising
results in my analysis, as I hypothesized that lower anxiety levels about
deportation would result in higher proactive behavior. I predicted this
because I thought that if a Latino immigrant was less afraid of getting
deported, they would be more likely to engage in proactive strategies.
Now seeing the data, I propose a new hypothesis as a possible
explanation for this trend: Latino immigrants with higher anxiety are
more likely to engage in proactive strategies because they have more of
an incentive to create a safer environment for themselves in the US.
Finally, assessing the relationship between income level and Latino immigrant engagement in proactive strategies, we find the largest group difference in the model. Latino immigrants with medium household incomes score .50 points higher than those with low household incomes. Latino immigrants with high household incomes score .80 points (almost a full point) higher than those with low household incomes. These results are highly statistically significant: both p-values are far below the 0.05 significance level. This relationship is visualized below.
plot_model(reg1, type = "pred",
terms = c("income_level"), ci.lvl = .95,
title="Proactive Engagement Increases as Total Household Income Increases", axis.title=c("Income Level", "Predicted Level of Proactive Engagement"), colors=c("skyblue4")) + geom_line(color="skyblue4", linetype=3, linewidth=.4) +
#ylim(0,4) +
theme_classic() +
theme(axis.text.x = element_text(size=10, angle=0, hjust=.5),
axis.ticks = element_blank())
As visualized in the graph, there is a strong upward gradient from low
to high income, meaning that as income increases, so does the level of
proactive activity that Latino immigrants engage in. These data are not
consistent with my original hypothesis that proactive engagement would
increase from low to medium but then descend from medium to high.
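Predicted scores for each income group can be assembled by hand from the coefficient table. The profile used below (male, Republican, 20 years in the US, anxiety score of 2) is hypothetical and chosen only for illustration.

```r
# Coefficients copied from summary(reg1) above.
b <- c(intercept = -0.080397, female = -0.146112, ind_other = 0.076583,
       democrat = 0.544183, time = 0.002101, anxiety = 0.330485,
       inc_medium = 0.503591, inc_high = 0.801830)

# Hypothetical profile: male, Republican, 20 years in the US, anxiety = 2.
# Baseline categories contribute nothing beyond the intercept.
base <- b[["intercept"]] + b[["time"]] * 20 + b[["anxiety"]] * 2

pred <- c(Low    = base,
          Medium = base + b[["inc_medium"]],
          High   = base + b[["inc_high"]])
round(pred, 2)

# Gap between high and medium income, which the table does not report directly.
round(b[["inc_high"]] - b[["inc_medium"]], 3)
```

The predicted levels depend on the profile chosen, but the gaps between income groups do not: medium is about 0.50 above low, high is about 0.80 above low, and high is about 0.30 above medium.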
My model provides strong evidence that partisanship, personal anxiety about deportation, and income level are related to how much Latino immigrants engage in proactive strategies, while gender and time spent in the US show no statistically significant relationship. Although these factors do seem to alter how much Latino immigrants engage in proactive strategies, the immigrants surveyed generally engage in proactive strategies at relatively low levels, with an average of 1.2 strategies per respondent. This result is quite interesting and raises another question: why are levels of engagement in proactive strategies in response to immigration enforcement so low among Latino immigrants?