Objective: This project will give the student the opportunity to apply statistical modeling techniques to real-world public opinion data. Each student will estimate and, more importantly, interpret a model they proposed in deliverable 1 using the Latino Immigrant National Attitude Survey. The first deliverable was worth 100 points; this second component will be worth 500 points. This RMD file is critical as it contains recodes of a number of independent variables that will be used by some of you. I will update the recoding section of it to accommodate some of the proposed independent variables, but I will not recode all of the proposed independent variables.
You are to use this RMD file to produce a final HTML that will be submitted on Canvas by Wednesday, December 10 at 11:59 PM. However, there is a series of extra-credit incentives in place to discourage turning it in at the last minute. They are:
Should you submit the HTML for part 2 on Canvas by Friday, Dec. 5, 11:59 PM, you will receive an 8% bonus to your grade. Eight percent of 500 is 40 points.
Should you submit the HTML for part 2 on Canvas by Sunday, Dec. 7, 11:59 PM, you will receive a 5% bonus to your grade. Five percent of 500 is 25 points.
Should you submit the HTML for part 2 on Canvas by Tuesday, Dec. 9, 11:59 PM, you will receive a 3% bonus to your grade. Three percent of 500 is 15 points.
Should you submit the HTML for part 2 on Canvas by Thursday, Dec. 11, 11:59 PM, you will receive a 0% bonus to your grade. This will be the final submission time.
On Canvas, there will be 4 portals for submission, one each for these options.
The following chunk of code will access the LINAS data.
library(readr) # read_csv() comes from the readr package
linas.1="https://raw.githubusercontent.com/mightyjoemoon/LINAS2025/main/linas_may2025_weighted_csv.csv"
linas.1<-read_csv(url(linas.1))
#summary(linas.1)
Each student is required to analyze the relationship between gender and party affiliation. In the next two chunks, you will see code producing these two variables.
Gender is a factor variable with the levels “Male” and “Female”. For purposes of statistical analysis, “Male” is the baseline category. It is the student’s responsibility to understand what this means. This variable is called gender.
linas.1$gender <- factor(linas.1$s3,
levels=c("1", "2"),
labels=c("Male", "Female"))
table(linas.1$gender)
##
## Male Female
## 485 514
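One way to see what "baseline category" means is to look at the indicator column R builds from a factor. The sketch below uses a small made-up vector (`s3_toy` is hypothetical, not the LINAS responses) and assumes R's default treatment coding.

```r
# Hypothetical toy responses coded like s3 (1 = Male, 2 = Female); not LINAS data
s3_toy <- c(1, 2, 2, 1, 2)
gender_toy <- factor(s3_toy, levels = c(1, 2), labels = c("Male", "Female"))

# The first level of a factor is its baseline (reference) category
print(levels(gender_toy)[1])

# Under default treatment coding, the factor enters a model as one
# indicator column: 1 for "Female", 0 for the baseline "Male"
design <- model.matrix(~ gender_toy)
print(design)
```

Because "Male" is absorbed into the intercept, a coefficient on the "Female" indicator is read as the difference between females and males, holding the other predictors fixed.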
Party affiliation is a three-level factor variable recorded as “Republican” for Republicans, “Democrat” for Democrats, and “Ind./Other” for Independent and other identifiers. This factor variable treats partisan “leaners” as partisans. This will be explained in class. The code is a bit lengthy so don’t alter it. To derive the 3-level factor, I first created a variable to identify the leaners. From this I created the variable for party identification; this variable is called pidthree.
##Coding for party: multi levels
linas.1$pid[linas.1$q65==1 & linas.1$q66==1] <- 1
linas.1$pid[linas.1$q65==1 & linas.1$q66==2] <- 2
linas.1$pid[linas.1$q65==3 & linas.1$q67==1] <- 3
linas.1$pid[linas.1$q65==3 & linas.1$q67==3] <- 4
linas.1$pid[linas.1$q65==3 & linas.1$q67==2] <- 5
linas.1$pid[linas.1$q65==2 & linas.1$q66==2] <- 6
linas.1$pid[linas.1$q65==2 & linas.1$q66==1] <- 7
linas.1$pid[linas.1$q65==3 & linas.1$q67==4] <- 8 #Independent leans other
linas.1$pid[linas.1$q65==4 & linas.1$q67==4] <- 9 #Other leans other
linas.1$pid[linas.1$q65==4 & linas.1$q67==1] <- 3 #Other leans Rep
linas.1$pid[linas.1$q65==4 & linas.1$q67==2] <- 5 #Other leans Dem
linas.1$pid[linas.1$q65==4 & linas.1$q67==3] <- 12 #Other leans Independent
## Note that the code below will exclude: Independents who lean "other"; "Other" identifiers who lean "other"; and "Other that leans Independent"
linas.1$pidseven <- factor(linas.1$pid,
levels=c(1,2,3,4,5,6,7),
labels=c("SR", "R", "LR", "I", "LD", "D", "SD"))
## Coding for party: 3 levels. Note that leaners are treated as partisans. Republicans are baseline category
linas.1$pidthree<- factor(linas.1$pid,
levels=c(1,2,3,4,5,6,7, 8, 9, 12),
labels=c("Republican", "Republican", "Republican", "Ind./Other",
"Democrat", "Democrat", "Democrat", "Ind./Other",
"Ind./Other", "Ind./Other"))
table(linas.1$pidthree)
##
## Republican Ind./Other Democrat
## 236 293 471
The criminality narrative questions are based on q12 and q13. These items were split-sampled so we cannot summate them into a scale. We can consider detention and deportation separately but to do so results in loss of half the data. I’ve created a variable called endorse_narrative which pools these responses. Higher scores reflect higher endorsement of the criminality narrative.
linas.1$detain_criminal <- linas.1$q12
linas.1$deport_criminal <- linas.1$q13
#Rescale criminality such that high scores=endorsement
linas.1$endorse_narrative <- (6-linas.1$criminality)
table(linas.1$endorse_narrative)
##
## 1 2 3 4 5
## 253 230 221 185 111
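The `6 - x` step reverses a 1-5 scale so that the highest original code becomes the lowest and vice versa. The sketch below assumes, as the rescale comment implies, that the pooled item is coded 1-5 with low values meaning agreement; `criminality_toy` is a hypothetical stand-in.

```r
# Hypothetical 1-5 responses where 1 = strongly agree; not LINAS data
criminality_toy <- c(1, 2, 3, 4, 5, NA)

# Subtracting from (max + min) = 6 flips the scale: 1 <-> 5, 2 <-> 4, 3 stays 3
endorse_toy <- 6 - criminality_toy
print(endorse_toy)
```

Missing values remain missing after the transformation, so no respondents are silently recoded.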
This chunk of code produces latino_identity which is based on q44. High scores reflect greater identity.
## Coding for Latino identity
#Leaving scale as is
linas.1$latino_identity<-linas.1$q44
table(linas.1$latino_identity)
##
## 1 2 3 4 5
## 28 24 152 387 409
Question q42 measures beliefs about immigrant discrimination. This variable is named discrim. High scores reflect beliefs that discrimination levels are very low (i.e. denial of discrimination).
#Denial of discrimination
#[q42]: How much discrimination is there in the United States today against immigrants?
#Values: 1-5
#1 A lot
#2 Some
#3 Not much
#4 None
#5 Don't know
linas.1$discrim <- linas.1$q42
linas.1$discrim[linas.1$q42==1] <- 1 #A lot
linas.1$discrim[linas.1$q42==2] <- 2 #Some
linas.1$discrim[linas.1$q42==5] <- 3 #DK
linas.1$discrim[linas.1$q42==3] <- 4 #Not much
linas.1$discrim[linas.1$q42==4] <- 5 #None
table(linas.1$discrim)
##
## 1 2 3 4 5
## 493 326 53 94 34
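The recode moves "Don't know" (original code 5) to the midpoint of the scale. As an illustration only, the same mapping can be written as a lookup vector; `q42_toy` and `recode_map` are hypothetical names, and this is an alternative sketch rather than the code used above.

```r
# Hypothetical q42 responses (1 = A lot ... 4 = None, 5 = Don't know); not LINAS data
q42_toy <- c(1, 2, 3, 4, 5, NA)

# Position = original q42 code, value = new discrim score
# A lot -> 1, Some -> 2, Not much -> 4, None -> 5, Don't know -> 3
recode_map <- c(1, 2, 4, 5, 3)

# Indexing the map by the original codes applies the recode in one step
discrim_toy <- recode_map[q42_toy]
print(discrim_toy)
```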
Here is where your story begins. You will describe your data and then estimate and interpret a linear regression model.
What is your research question and what are the main features of your dependent variable? You should follow my example but use your own data and language. Do not cut and paste what I write; this will lead you down a path you don’t want to go down. This section is worth 100 points.
What is the research question you are addressing?
Who are the Latino immigrants that endorse the criminality narrative?
Why should anyone care about what it is you’re doing?
The immigrant criminality narrative is prevalent in political rhetoric. The narrative associates immigrants with violent criminality and is closely connected to Latino immigrants. It thus becomes natural to ask who the Latino immigrants are that endorse this narrative, a narrative that seemingly would adversely affect Latino communities.
What is your dependent variable measuring and what does the distribution of the variable look like? Below is shell code that produces a barplot using a variable called “endorse_narrative.” You will, of course, plot your own dependent variable.
My dependent variable is the immigrant criminality narrative measure. This item is based on the combined survey questions “Most Latino immigrants who are in immigration detention facilities/who are deported have probably committed serious crimes in the United States.” The variable is coded such that a “1” denotes strongly disagree with the statement, and a “5” denotes strongly agree with the statement. In other words, higher scores denote greater endorsement of the criminality narrative. Below is a bar plot of the distribution of responses on this measure.
ggplot(linas.1, aes(x=endorse_narrative, y = after_stat(count/sum(count)))) +
geom_bar(fill = "lightskyblue4") + scale_y_continuous(labels = percent) +
labs(title="About 30 percent of respondents agree or strongly agree that most detained and \ndeported immigrants have committed serious crimes",
y="Percent of sample",
x="Level-of-endorsement of the criminality narrative (1=strongly disagree/5=strongly agree)") +
theme_classic() +
theme(axis.text.x = element_text(size=7, angle=0, hjust=.5),
axis.ticks = element_blank(),
axis.text.y = element_text(size=8),
plot.title = element_text(size=9),
axis.title.y=element_text(size=8),
axis.title.x=element_text(size=8))
What are the main features of your plot?
This is a barplot of the dependent variable. It reveals two important results. First, most Latino immigrants do not endorse the immigrant criminality narrative: about 48% disagree or disagree strongly with the claim that most detainees/deportees are serious criminals. Second, nearly 30% of Latino immigrants agree or agree strongly with the claim. This makes the research question of who the Latino immigrants are that endorse the narrative a natural question to ask.
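The two percentages quoted in this description can be checked directly from the frequency table of endorse_narrative printed earlier. The sketch below hard-codes those five unweighted counts rather than reading the data.

```r
# Counts copied from the table(linas.1$endorse_narrative) output (unweighted)
counts <- c("1" = 253, "2" = 230, "3" = 221, "4" = 185, "5" = 111)
shares <- counts / sum(counts)

# Share who disagree or strongly disagree (scores 1 and 2)
disagree <- sum(shares[c("1", "2")])

# Share who agree or strongly agree (scores 4 and 5)
agree <- sum(shares[c("4", "5")])

print(round(100 * c(disagree = disagree, agree = agree), 1))
```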
In this section you will assess the relationship between your dependent variable and your independent variables. This section is worth 500 points.
What are your independent variables? Use natural language and not the literal name of the variable you are interpreting (i.e. if you’re using the variable “endorse_narrative”, do not use the language “endorse_narrative” since no one knows what this means; use substantive language.)
I hypothesize that four variables will be related to beliefs about the criminality narrative: gender, party affiliation, beliefs about discrimination, and strength of Latino identity. Gender is coded as binary denoting males (\(n=485\)) and females (\(n=514\)). Party affiliation is a three-level factor variable denoting Republicans (\(n=236\)), Democrats (\(n=471\)), and Independents/other (\(n=293\)). Strength of Latino identity records how strongly one agrees or disagrees with the statement “The fact that I am Latino is an important part of my identity.” Higher scores on this measure represent Latinos who are strong identifiers and lower scores represent Latinos who are not strong identifiers. Beliefs about discrimination records Latino immigrants’ beliefs about the amount of discrimination there is against immigrants. High scores on this variable denote respondents who say there is very little discrimination; in other words, these are the respondents who deny the existence of discrimination. As outlined in my first deliverable, I expect: 1) women to endorse the narrative at lower rates than men; 2) Republicans to endorse the narrative at significantly higher rates than other partisan groups; 3) high Latino identity to be associated with low rates of endorsement; and 4) those who deny the existence of discrimination to be more likely to endorse the narrative.
Here is where you will estimate the linear regression model. The code below is based on my worked example; yours will be based on your deliverable 1 proposal.
reg1 <- lm(endorse_narrative ~ gender + pidthree + discrim + latino_identity, data=linas.1, weights=weight)
summary(reg1)
##
## Call:
## lm(formula = endorse_narrative ~ gender + pidthree + discrim +
## latino_identity, data = linas.1, weights = weight)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -2.7949 -1.0682 -0.1874 0.8463 3.4047
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.84650 0.22592 17.026 < 0.0000000000000002 ***
## genderFemale -0.12538 0.07888 -1.589 0.112
## pidthreeInd./Other -0.70858 0.11069 -6.402 0.000000000236742 ***
## pidthreeDemocrat -0.81828 0.10471 -7.815 0.000000000000014 ***
## discrim 0.15984 0.03794 4.213 0.000027471079774 ***
## latino_identity -0.19585 0.04265 -4.592 0.000004945602369 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.24 on 993 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.1334, Adjusted R-squared: 0.129
## F-statistic: 30.57 on 5 and 993 DF, p-value: < 0.00000000000000022
I estimated a linear regression model and the results are displayed in the table above. Based on these results, I find no evidence of a gender gap in endorsement of the criminality narrative. Males score about .13 points higher than females, but on a 5-point scale, this difference is tiny. Further, the estimated coefficient is not statistically distinguishable from 0 (p = .112). This is easier to see visually.
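A quick way to check the claim about the gender coefficient is to rebuild its t statistic and 95% confidence interval from the numbers in the regression table. This is a sketch using only the reported estimate, standard error, and residual degrees of freedom.

```r
# Values copied from the genderFemale row of the regression output
est <- -0.12538
se  <- 0.07888
df  <- 993  # residual degrees of freedom

# t statistic: estimate divided by its standard error
t_stat <- est / se

# 95% confidence interval: estimate +/- critical t value times the standard error
t_crit <- qt(0.975, df)
ci <- est + c(-1, 1) * t_crit * se

print(round(t_stat, 3))
print(round(ci, 3))
```

The interval runs from roughly -0.28 to 0.03, so it contains 0, which is what "not distinguishable from 0" means at the 5% level.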
plot_model(reg1, type = "pred",
terms = c("gender"), ci.lvl = .95,
title="There are no gender differences in the endorsement of the criminality narrative", axis.title=c("Gender", "Predicted level-of-endorsement"), colors=c("skyblue4")) + geom_line(color="skyblue4", linetype=3, linewidth=.4) +
#ylim(0,4) +
theme_classic() +
theme(axis.text.x = element_text(size=10, angle=0, hjust=.5),
axis.ticks = element_blank())
As shown in the plot given above, the predicted levels of endorsement for males and females are nearly identical, with highly overlapping confidence intervals. There is no evidence of a gender gap. The absence of a relationship is not consistent with my research expectations for gender.
Turning next to partisanship, it is clear from the regression results that there are substantial differences due to party. We see that Democratic identifiers score about .82 points lower than Republicans on endorsement of the narrative; on a 5-point scale, this is nearly a 1 full point difference. Similarly, we see that Independent/Other identifiers score about .71 points lower than Republicans. This result suggests that this group endorses the narrative at significantly lower levels than Republicans. This is easily seen in the regression plot shown below.
plot_model(reg1, type = "pred",
terms = c("pidthree"), ci.lvl = .95,
title="Republicans endorse the criminality narrative at rates significantly \nhigher than Democrats or Independent/Other identifiers", axis.title=c("Party affiliation", "Predicted level-of-endorsement"), colors=c("skyblue4")) + geom_line(color="skyblue4", linetype=3, linewidth=.4) +
#ylim(0,4) +
theme_classic() +
theme(axis.text.x = element_text(size=10, angle=0, hjust=.5),
axis.ticks = element_blank())
This figure clearly shows a large partisan gap between Republicans and all non-Republican identifiers. The figure also shows there are no significant differences between Independent/Other identifiers and Democrats.
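The regression table compares each party group only to Republicans, so a comparison between Democrats and Independent/Other identifiers needs a different baseline. One common approach is to re-level the factor and refit. The sketch below does this on simulated data (`toy`, its group means, and `party_dem_base` are all made up for illustration), so its numbers say nothing about LINAS.

```r
set.seed(1)
n <- 300

# Simulated party factor with Republican as the baseline, as in pidthree
party <- factor(sample(c("Republican", "Ind./Other", "Democrat"), n, replace = TRUE),
                levels = c("Republican", "Ind./Other", "Democrat"))

# Simulated outcome with made-up group differences plus noise
y <- 3.5 - 0.7 * (party == "Ind./Other") - 0.8 * (party == "Democrat") + rnorm(n)
toy <- data.frame(y = y, party = party)

# Make Democrat the baseline so each coefficient is a contrast against Democrats
toy$party_dem_base <- relevel(toy$party, ref = "Democrat")
fit <- lm(y ~ party_dem_base, data = toy)

# The Ind./Other row now tests Ind./Other vs. Democrat directly
print(summary(fit)$coefficients)
```

Applied to the real model, the same idea would mean re-leveling pidthree and re-estimating; the fitted values would not change, only which contrasts appear in the table.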
Next, I consider the relationship between denial of discrimination and endorsement of the criminality narrative. I expected that those who believe there is no discrimination against immigrants (discrimination deniers) will endorse the narrative at rates higher than those who believe there is a great deal of discrimination against immigrants. The slope coefficient of .16 is significantly different from 0 and suggests that those who do not see discrimination as a problem endorse the criminality narrative more highly relative to those who see discrimination as being a problem for immigrants. This relationship is shown in the plot below.
plot_model(reg1, type = "pred",
terms = c("discrim"), ci.lvl = .95,
title="High discrimination deniers endorse the criminality narrative at rates significantly \nhigher than low discrimination deniers", axis.title=c("Denial of discrimination (1=low denial; 5=high denial)", "Predicted level-of-endorsement"), colors=c("skyblue4")) + geom_line(color="skyblue4", linetype=3, linewidth=.4) +
#ylim(0,4) +
theme_classic() +
theme(axis.text.x = element_text(size=10, angle=0, hjust=.5),
axis.ticks = element_blank())
As is clear from the figure, there is a strong association between discrimination denial and endorsement of the criminality narrative. Moving from the lowest to the highest level of denial raises predicted endorsement by about .64 points (4 × .16) on the 5-point scale, a result consistent with research expectations.
Finally, consider the role of Latino identity. I predicted that high Latino identifiers will endorse the criminality narrative at significantly lower levels than low Latino identifiers. The regression results are consistent with this expectation. We see that for a one-point increase on the Latino identity measure, endorsement levels decrease by about .20 points. This relationship is visually depicted below.
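Because the model is linear, the predicted gap between the lowest and highest deniers is just the slope multiplied by the range of the scale. A short check using the reported discrim coefficient:

```r
# Slope for discrim copied from the regression output
b_discrim <- 0.15984

# The denial scale runs from 1 (low denial) to 5 (high denial)
scale_range <- 5 - 1

# Predicted difference in endorsement between the two ends of the scale
gap_discrim <- b_discrim * scale_range
print(round(gap_discrim, 2))
```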
plot_model(reg1, type = "pred",
terms = c("latino_identity"), ci.lvl = .95,
title="High Latino identifiers endorse the criminality narrative at rates significantly \nlower than low Latino identifiers", axis.title=c("Strength-of-identity (1=low identity; 5=high identity)", "Predicted level-of-endorsement"), colors=c("skyblue4")) + geom_line(color="skyblue4", linetype=3, linewidth=.4) +
#ylim(0,4) +
theme_classic() +
theme(axis.text.x = element_text(size=10, angle=0, hjust=.5),
axis.ticks = element_blank())
The downward-sloping regression line shows that the gap between the highest and lowest identifiers is about .8 points (4 × .20), or almost one full point on a 5-point scale. This gives evidence consistent with my research expectations.
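The coefficients can also be combined by hand to get predicted endorsement for a specific profile, which is what the prediction plots do behind the scenes. The sketch below copies the estimates from the regression table; the helper `predict_endorse()` and the two example profiles are hypothetical choices for illustration.

```r
# Coefficients copied from the regression output
b <- c(intercept = 3.84650, female = -0.12538, ind_other = -0.70858,
       democrat = -0.81828, discrim = 0.15984, latino_identity = -0.19585)

# Hypothetical helper: linear prediction for one respondent profile
predict_endorse <- function(female, ind_other, democrat, discrim, latino_identity) {
  unname(b["intercept"] + b["female"] * female + b["ind_other"] * ind_other +
           b["democrat"] * democrat + b["discrim"] * discrim +
           b["latino_identity"] * latino_identity)
}

# Gap between the lowest (1) and highest (5) Latino identity scores
identity_gap <- unname(b["latino_identity"]) * (5 - 1)

# Male Republican vs. female Democrat, both with discrim = 1 and latino_identity = 5
rep_male   <- predict_endorse(0, 0, 0, discrim = 1, latino_identity = 5)
dem_female <- predict_endorse(1, 0, 1, discrim = 1, latino_identity = 5)

print(round(c(identity_gap = identity_gap, rep_male = rep_male, dem_female = dem_female), 2))
```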
To conclude, my model shows strong evidence that partisanship, Latino identity, and denial of discrimination seem to impact judgment about the criminality narrative. Interestingly, gender is unrelated to endorsement.