Objective: This project will give the student the opportunity to….
linas.1="https://raw.githubusercontent.com/mightyjoemoon/LINAS2025/main/linas_may2025_weighted_csv.csv"
linas.1<-read_csv(url(linas.1))
#summary(linas.1)
##
## 2 3 4 5 6
## 162 247 235 176 180
This chunk codes country of origin three ways: 1) a raw factor by country (21 levels); 2) a 7-level factor coded as South America, Central America (excluding the Northern Triangle), Northern Triangle, Cuba, Dominican Republic, Mexico, and Other; and 3) a binary variable coded as Mexico versus not Mexico.
The precoded variable “educat” is a good way to account for educational differences.
I am creating a factor variable for s10 and giving it value labels.
What year did you first arrive to live in the United States?
q1 is coded 1=2025, 2=2024, ..., 101=1925. If we subtract 1 from this variable, we have an approximation of the number of years spent in the US.
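The recode described above can be sketched on a few made-up values. This is only an illustration, not the chunk used in this project: the vector `q1_example` is hypothetical, and the assumption that the later variable `timefrom2025` was built this way is mine.

``` r
# Hypothetical q1 codes: 1 = arrived in 2025, 2 = 2024, ..., 101 = 1925
q1_example <- c(1, 2, 11, 101)

# Subtracting 1 gives the approximate number of years in the US as of 2025
timefrom2025_example <- q1_example - 1

# Equivalent view: recover the arrival year, then subtract it from 2025
arrival_year <- 2026 - q1_example
data.frame(q1 = q1_example, arrival_year, years_in_us = timefrom2025_example)
```

Both routes give the same number of years, which is a quick way to check that the coding has been read correctly.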
I am reverse scoring this to create a new variable called “NoFamilyHere” (1 if no family; 0 if family).
Code party as a 4-level factor variable and then create the traditional 7-point scale.
4-point
7-point:
I can create a summative scale.
I’ve created a summative scale of these items, rescaled it to start at 0, and then reversed it so that high scores indicate support for these policies. The scale thus ranges from 0 (total opposition) to 12 (full support).
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.000 1.000 1.147 2.000 4.000
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.000 0.000 1.014 2.000 4.000
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.000 1.000 1.463 3.000 4.000
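A minimal sketch of how a 0 to 12 summative scale of this kind can be built from three five-category items. The toy data frame `raw` and its raw coding (1 = strongly agree through 5 = strongly disagree) are assumptions made for illustration; the actual item coding in the survey may differ.

``` r
# Hypothetical raw responses, ASSUMED coded 1 = strongly agree ... 5 = strongly disagree
raw <- data.frame(
  birthright = c(1, 5, 3, 4),
  alienenemy = c(1, 5, 2, 5),
  registry   = c(1, 5, 3, 2)
)

# Reverse-score and shift so that 0 = strongly disagree and 4 = strongly agree
rescaled <- 5 - raw

# Sum the three 0-4 items into a 0-12 scale (NA if any item is missing)
support_example <- rowSums(rescaled)
support_example
```

Reversing each item before summing gives the same result as summing first and then rescaling, and it leaves 0 to 4 versions of the individual items available for separate summaries.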
These are questions q46 and q47.
These items were split-sampled, so we cannot sum them. We can consider detention and deportation separately. In the data set I’ve created a variable called “criminality” which pools these responses. The difference in responses is minimal.
7-point scale
Scaled measure of health outcomes.
##
## 1 2 3 4 5 6 7 8 9 10 11 12 13
## 132 103 83 88 107 56 79 61 48 91 54 16 82
##
## 1 2 3 4 5 6 7 8 9 10 11 12
## 132 103 83 88 107 56 79 61 48 91 54 16
table(linas.1$income)
## Project 2 Analysis
In the remaining part of this Rmd file, you will tell your social science story by considering variation in your selected dependent variable as a function of your independent variables. We will start by considering the univariate statistics of your items. In real social science research, the first thing a researcher needs to do is know their data inside and out. The last thing you want is to be taken by surprise. So over a series of tasks, we will build up a story consisting of statistical analysis. So let's get going.
### Task 1: Theoretical motivation
What is your dependent variable measuring? How was the dependent variable constructed? What motivates you to use this dependent variable in your study? Please provide a thoughtful answer to this question below. This is worth 100 points.
#### Put answer for Task 1 here below
I am interested in how two independent variables relate to immigrants' views of draconian immigration policies. I find this dependent variable particularly interesting because we would expect immigrants to be strongly resistant to anti-immigration policies, yet there is a surprising amount of variation, which I would like to explain. The first variable I will investigate is income. I want to know whether immigrants who are more economically entrenched and have more resources are more likely to support policies that would ostensibly make things marginally more difficult for other immigrants. The second variable is time spent in the country and how it affects views on anti-immigration policy. My hypothesis is similar to the first: the longer you have been in the country, the more entrenched you become and the more likely you are to favor anti-immigration policies.
While my hypothesis states that support for draconian policies is positively correlated with both variables, any evidence on the correlation (or lack thereof) would be informative. Seeing no correlation would suggest that time and wealth do not transform people's opinions on immigration. Seeing a negative correlation would suggest that as people become more entrenched they actually become more empathetic to other immigrants, or are simply more conscious of United States politics in general.
Also, as part of the embedded analysis, we are comparing support for draconian policies across party preference. We would assume party is the strongest predictor of support for anti-immigration policy because Republicans generally endorse these policies while Democrats generally oppose them.
### Task 2: Describe the dependent variable
In the chunks below, include code that can be used to summarize and describe your dependent variable. This could (and should) include code to compute univariate statistics as well as any visualizations you might provide (barplots, boxplots, histograms, etc). In doing research, the first thing you need to know are the main features of your variables. In this case, we want to understand the main features of the dependent variable for your analysis. After you compile the descriptive statistics and plots, provide a summary of the measure written as if you were including this write-up for an academic paper. This task in total is worth 200 points.
``` r
summary(linas.1$support_draconian)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.000 3.000 3.593 6.000 12.000
mean(linas.1$support_draconian)
## [1] 3.592593
sd(linas.1$support_draconian)
## [1] 3.580151
summary(linas.1$birthright2)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.000 0.000 1.003 2.000 4.000
summary(linas.1$alienenemy2)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.000 1.000 1.126 2.000 4.000
summary(linas.1$registry2)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.000 1.000 1.463 3.000 4.000
plot2<- ggplot(linas.1)+
scale_x_continuous(breaks=0:12) +
geom_bar(aes(x = support_draconian),fill="dodgerblue")+
labs(title="#1 Immigrant Support of Anti-Immigration Policy \n 12 point scale", y="# Of Responses", x="Level Of Agreement") +
theme_classic()
plot2
plot1<- ggplot(linas.1)+
scale_x_continuous(breaks=0:12) +
geom_bar(aes(x = birthright2),fill="dodgerblue")+
labs(title="#2 Immigrant Support of Removal of Birthright Citizenship Policy \n 5 point scale", y="# Of Responses", x="Level Of Agreement") +
theme_classic()
plot1
plot3<- ggplot(linas.1)+
scale_x_continuous(breaks=0:12) +
geom_bar(aes(x = alienenemy2),fill="dodgerblue")+
labs(title=" #3 Immigrant Support of Use of Alien Enemies Act Policy \n 5 point scale", y="# Of Responses", x="Level Of Agreement") +
theme_classic()
plot3
plot4<- ggplot(linas.1)+
scale_x_continuous(breaks=0:12) +
geom_bar(aes(x = registry2),fill="dodgerblue")+
labs(title="#4 Immigrant Support of National Immigration Registry \n 5 point scale", y="# Of Responses", x="Level Of Agreement") +
theme_classic()
plot4
```
The data are measured on two Likert-type scales, with zero meaning strongly disagree. Graphs 2-4 show immigrants’ views on removal of birthright citizenship, use of the Alien Enemies Act against immigrants, and support for a national registry for immigrants, each on a 0-4 scale. Graph 1 is a summative bar graph combining all three items into a 0-12 measure I will call support for draconian policies. Examining the data, we find a strong right skew for all four variables. The means for graphs 2-4 range from about 1.0 to 1.5, suggesting that on average immigrants slightly disagree with those policies. It is important to note that the most common response in every graph is strong disagreement with the draconian policies, which produces the right skew. This tells us that the majority of immigrants either strongly disagree or slightly disagree with draconian immigration policies. The remaining bins are divided relatively evenly.
The notable exception to this relatively even distribution in graphs 2-4 is bin 2 in support for the National Immigration Registry. Not only does bin 2 have the highest peak outside of strong disagreement in any of the graphs, but the registry item also has the highest mean agreement of all the policies (1.46). This suggests two things. The first is that a National Immigration Registry is the relatively most popular draconian policy among immigrants. The second is that immigrants may be least informed about what a national immigration registry would entail, leading to the highest number of “neither agree nor disagree” responses.
On closer examination, Graph 1 reveals a peculiar detail. While most immigrants view draconian policies unfavorably, the top 25% of responses are actually neutral to favorable (the third quartile is 6, the midpoint of the scale). So a substantial proportion of immigrants are ambivalent about or supportive of these policies. This goes against my initial expectations, as I would have expected unfavorable views to hover around 85%-95%.
Are there significant differences in your dependent variable by gender of the respondent? This is the task I am asking you to do here. There are multiple ways one can assess this, but we’ll consider a linear regression approach. Estimate a bivariate regression model treating the dependent variable as a function of the variable gender. After this, plot the regression function. Include the code in the chunk below and then follow this up with a professional-grade interpretation of the results. This task is worth 100 points.
#Insert code in this chunk
task3 <- lm(support_draconian ~ gender, weights=weight, data=linas.1)
summary(task3)
##
## Call:
## lm(formula = support_draconian ~ gender, data = linas.1, weights = weight)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -5.3343 -3.0948 -0.7744 2.5673 9.8173
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.7294 0.1626 22.94 <0.0000000000000002 ***
## genderFemale -0.4310 0.2330 -1.85 0.0646 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.521 on 915 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.003727, Adjusted R-squared: 0.002638
## F-statistic: 3.423 on 1 and 915 DF, p-value: 0.06462
task3modelplot <- plot_model( task3, type="pred", terms=c("gender"), ci.lvl=.95,
title="Average Support for Draconian Policies by Gender", axis.title=c("Gender","Support"), colors=c("dodgerblue"))+theme_bw()
mean(linas.1$support_draconian)
## [1] 3.592593
task3modelplot
Using a bivariate regression model, we find only a marginally significant relationship between gender and support for draconian immigration policy: the gender coefficient is -0.43 with p = 0.065, which misses the conventional 0.05 threshold, and gender explains less than 1% of the variance (R-squared = 0.004). On average, males score 3.7 out of 12 on support for draconian immigration policy and females score 3.3 out of 12. While further research is needed to verify whether this gap is real and why it exists, I have several hypotheses. The gap may be attributable to differences in stress caused by draconian immigration policy. My peers are analyzing this question, but it is not currently in my domain of expertise. It may also be due to differences in immigrant family structures. Women may play a more important role in maintaining and developing familial relations, and the added worry over more vulnerable friends and family members could contribute to higher levels of disagreement with draconian policy. There could also be a substantial proportion of men who come to the United States without family, which would heavily alter their social circles and the perceived impact of these policies.
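The group means follow directly from the coefficients printed in the `summary(task3)` output. The sketch below only redoes that arithmetic with the reported numbers; it does not re-estimate the model, and the object names are made up for illustration.

``` r
# Coefficients copied from the summary(task3) output above
intercept     <- 3.7294   # predicted support for the reference category (male)
gender_female <- -0.4310  # difference for female respondents

pred_male   <- intercept
pred_female <- intercept + gender_female

# 95% interval for the gap, using the reported standard error and 915 df
se_gap <- 0.2330
ci_gap <- gender_female + c(-1, 1) * qt(0.975, df = 915) * se_gap

round(c(male = pred_male, female = pred_female), 2)
round(ci_gap, 2)
```

Because the reported t value (-1.85) is smaller in magnitude than the two-sided 5% critical value, the interval for the gap includes zero, which is the same information the p-value of 0.065 conveys.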
For this analysis, estimate and plot a linear regression model treating the dependent variable as a function of: gender, party (using the four-level version [i.e. “pidfour”]), and immigration status. Create a regression plot for the party affiliation and immigration status variables and provide a thorough interpretation of the results. What do we learn about the relationship between party and your dependent measure? Between immigration status and your dependent measure? This task is worth 200 points in total.
#Insert code in this chunk
task4.1 <- lm(support_draconian ~ gender + pidfour + status , weights=weight, data=linas.1)
task4.11 <- lm(support_draconian ~ gender + pidseven + status , weights=weight, data=linas.1)
summary(task4.1)
##
## Call:
## lm(formula = support_draconian ~ gender + pidfour + status, data = linas.1,
## weights = weight)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -9.4106 -2.4258 -0.5116 2.1693 10.4976
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.42730 0.28005 22.951 < 0.0000000000000002 ***
## genderFemale -0.34703 0.21467 -1.617 0.1063
## pidfourDem -3.90009 0.29860 -13.061 < 0.0000000000000002 ***
## pidfourInd -3.27100 0.30377 -10.768 < 0.0000000000000002 ***
## pidfourOth -2.48116 0.44107 -5.625 0.0000000247 ***
## statusLPR 0.15197 0.25669 0.592 0.5540
## statusVisa 0.96391 0.49242 1.957 0.0506 .
## statusTemp -0.28854 0.38236 -0.755 0.4507
## statusNOTA -0.09979 0.41870 -0.238 0.8117
## statusPNTS -0.54586 0.57807 -0.944 0.3453
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.205 on 907 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.1815, Adjusted R-squared: 0.1734
## F-statistic: 22.34 on 9 and 907 DF, p-value: < 0.00000000000000022
summary(task4.11)
##
## Call:
## lm(formula = support_draconian ~ gender + pidseven + status,
## data = linas.1, weights = weight)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -8.4103 -2.2119 -0.6945 1.9102 10.8227
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.32741 0.34292 21.368 < 0.0000000000000002 ***
## genderFemale -0.45979 0.21947 -2.095 0.036477 *
## pidsevenR -2.00730 0.47348 -4.239 0.0000249 ***
## pidsevenLR -2.30282 0.59655 -3.860 0.000122 ***
## pidsevenI -3.67310 0.40070 -9.167 < 0.0000000000000002 ***
## pidsevenLD -5.42291 0.43457 -12.479 < 0.0000000000000002 ***
## pidsevenD -4.67545 0.39689 -11.780 < 0.0000000000000002 ***
## pidsevenSD -4.85195 0.40116 -12.095 < 0.0000000000000002 ***
## statusLPR 0.14105 0.26065 0.541 0.588544
## statusVisa 1.13698 0.50885 2.234 0.025724 *
## statusTemp -0.19763 0.39942 -0.495 0.620885
## statusNOTA -0.13503 0.44239 -0.305 0.760274
## statusPNTS -0.08404 0.68573 -0.123 0.902491
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.137 on 823 degrees of freedom
## (82 observations deleted due to missingness)
## Multiple R-squared: 0.2432, Adjusted R-squared: 0.2322
## F-statistic: 22.04 on 12 and 823 DF, p-value: < 0.00000000000000022
#Insert code in this chunk
# geom_line() is a ggplot2 layer, not a plot_model() argument, so it is dropped from these calls
task4.1modelplot <- plot_model( task4.1, type="pred", terms=c("pidfour"), ci.lvl=.95,
title="#3 Correlation of Support for Draconian Policies by Political Party", axis.title=c("Party ID","Support"), colors=c("dodgerblue"))+theme_bw()
task4.5modelplot <- plot_model( task4.11, type="pred", terms=c("pidseven"), ci.lvl=.95,
title="#2 Correlation of Support for Draconian Policies and Political Party", axis.title=c("Party ID","Support"), colors=c("dodgerblue"))+theme_bw()
task4.2modelplot <- plot_model( task4.1, type="pred", terms=c("status"), ci.lvl=.95,
title="#4 Correlation of Support for Draconian Policies and Immigration Status", axis.title=c("Immigration Status","Support"), colors=c("dodgerblue"))+theme_bw()
task4.4modelplot <- plot_model( task4.1, type="pred", terms=c("gender"), ci.lvl=.95,
title="#1 Correlation of Support for Draconian Policies and Gender", axis.title=c("Gender ID","Support"), colors=c("dodgerblue"))+theme_bw()
task4.4modelplot
task4.5modelplot
task4.1modelplot
task4.2modelplot
In the new set of graphs we examine the relationship between the summative draconian policy variable and gender, party identification, and immigration status. The gender gap points in the same direction as in Task 3, with males rating approval of draconian immigration policies roughly 0.35 to 0.46 points higher than females, but its significance depends on the specification: it is not significant in the four-level party model (p = 0.106) and is significant at the 0.05 level in the seven-point party model (p = 0.036). For further hypotheses on why a gap might exist, see Task 3.
Party identification is the best predictor of any variable we’ve examined so far, confirming my expectation from Task 1. Republicans score 3.9 points higher than Democrats on support for draconian policies, 3.3 points higher than independents, and 2.5 points higher than others. This essentially divides our respondents into two groups: Republicans, whose predicted support sits just above the scale midpoint (the intercept is 6.4 for the reference group), and every other political affiliation, which falls well below it. This makes intuitive sense. Republicans have openly run on implementing the immigration policies listed above, so it is no surprise that Republican immigrants favor their party’s platform. Conversely, respondents whose political identification is not tied to a party that explicitly endorses these policies are likely to disagree with them.
It is nevertheless informative to know that immigrants largely fall along party lines on anti-immigration stances, despite being at risk of adverse effects from draconian policy. Further analysis could examine whether Republican immigrants have access to practical strategies for mitigating the effects of anti-immigrant policies, or whether they are as much at risk as their counterparts.
It would also be helpful to find out what political affiliations the “other” category contains. I suspect this group is largely apolitical. If that were true, it would suggest that distancing yourself from political identification goes along with more support for draconian policy than identifying as an independent, which would imply a higher baseline level of endorsement.
It is worth noting that when we examine graph 2 (the seven-point party scale) we find a large gap between strong Republicans, the omitted reference category, and the other Republican categories: Republicans score 2.0 points lower and leaning Republicans 2.3 points lower. We do not see a comparable shift among Democrats, whose three categories all fall within about 0.75 points of each other. This could suggest the formation of separate ideological groups among immigrant Republicans, while Democrats appear to be much more homogeneous.
Immigration status is by far the most interesting variable, as no status category differs from the reference category at the 0.05 level in the four-level party model; only visa holders come close (0.96, p = 0.051). We would have expected statuses like legal permanent resident and naturalized citizen to score higher than statuses like temporary visa, prefer not to say, or none of the above. This is because we would expect LPRs and naturalized citizens to feel they have stronger legal protections and therefore less risk of being targeted by anti-immigration policies. Compare this with temporary visas, which are easier to overstay or can be revoked outright, or with the PNTS and NOTA categories, which we expect may include undocumented immigrants. More research is needed to find out why this expectation does not hold.
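Each status coefficient in the output above is a contrast against the omitted reference category, so a column of individual p-values does not by itself test whether status matters as a whole. One common way to ask that question is a joint F-test comparing nested models. The sketch below shows the mechanics on simulated stand-in data; the data frame `toy`, its variable names, and its values are all invented for illustration and say nothing about the survey results.

``` r
set.seed(42)
n <- 300

# Simulated stand-in data; NOT the LINAS survey
toy <- data.frame(
  party  = factor(sample(c("Rep", "Dem", "Ind"), n, replace = TRUE),
                  levels = c("Rep", "Dem", "Ind")),
  status = factor(sample(c("Nat", "LPR", "Visa"), n, replace = TRUE),
                  levels = c("Nat", "LPR", "Visa"))
)

# The outcome depends on party only, so status should add little
toy$support <- 6 - 3 * (toy$party != "Rep") + rnorm(n, sd = 3)

reduced <- lm(support ~ party, data = toy)
full    <- lm(support ~ party + status, data = toy)

# Joint F-test: do the status dummies improve the fit as a group?
joint_test <- anova(reduced, full)
joint_test
```

With the real data, the same comparison would use the weighted models with and without `status`; that step is left as a suggestion rather than a reported result.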
You previously were asked to “find” two additional predictor variables, \(x_k\), to help explain variation in your dependent variable. For this task, I want you to consider a linear regression model that includes the demographic factors from Task 3 as well as the two additional \(x\) variables. Estimate the regression model and plot the regression results for the two \(x\) variables. Provide a thorough and professional-grade interpretation of the model, specifically focusing on the two new variables. How do you expect these variables to be related to the dependent variable (i.e. positively, negatively, you have no clear prediction)? These expectations are essentially your testable hypotheses. Are the results consistent with the hypotheses? Yes? No? This task is worth 200 points.
Include your code in the chunks below.
#Insert code in this chunk
task5.1 <- lm(support_draconian ~ gender + pidfour + status+ timefrom2025 + incomef , weights=weight, data=linas.1)
summary(task5.1)
##
## Call:
## lm(formula = support_draconian ~ gender + pidfour + status +
## timefrom2025 + incomef, data = linas.1, weights = weight)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -9.1635 -2.2888 -0.5656 2.1901 10.6471
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.908496 0.449609 15.366 < 0.0000000000000002 ***
## genderFemale -0.281761 0.212612 -1.325 0.185431
## pidfourDem -3.459972 0.305651 -11.320 < 0.0000000000000002 ***
## pidfourInd -2.880554 0.311062 -9.260 < 0.0000000000000002 ***
## pidfourOth -2.240832 0.447461 -5.008 0.000000663 ***
## statusLPR -0.102540 0.262707 -0.390 0.696392
## statusVisa 0.395637 0.500077 0.791 0.429065
## statusTemp -0.629712 0.398465 -1.580 0.114381
## statusNOTA -0.345142 0.422728 -0.816 0.414453
## statusPNTS -0.844043 0.584239 -1.445 0.148896
## timefrom2025 -0.025385 0.006827 -3.718 0.000213 ***
## incomef20-29k -0.498105 0.408696 -1.219 0.223254
## incomef30-39k 0.334413 0.435897 0.767 0.443175
## incomef40-49k -0.195784 0.434690 -0.450 0.652532
## incomef50-59k -0.597767 0.414989 -1.440 0.150093
## incomef60-69k -0.396648 0.502579 -0.789 0.430189
## incomef70-79k -0.199735 0.455498 -0.438 0.661132
## incomef80-89k 0.459208 0.486237 0.944 0.345214
## incomef90-99k -0.470035 0.543878 -0.864 0.387694
## incomef100-149k -0.099356 0.454588 -0.219 0.827041
## incomef150-199k 2.268207 0.558054 4.064 0.000052370 ***
## incomef200k+ 0.762844 0.880310 0.867 0.386414
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.145 on 895 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.2225, Adjusted R-squared: 0.2042
## F-statistic: 12.2 on 21 and 895 DF, p-value: < 0.00000000000000022
#Insert code in this chunk
table(linas.1$income)
##
## 1 2 3 4 5 6 7 8 9 10 11 12
## 132 103 83 88 107 56 79 61 48 91 54 16
task5.2modelplot <- plot_model( task5.1, type="pred", terms=c("incomef"), ci.lvl=.95,
title="Correlation between Support for Draconian Policies and Income", axis.title=c("Income","Support"), colors=c("dodgerblue"))+theme_bw()
task5.2modelplot
#Insert code in this chunk
task5.1modelplot <- plot_model( task5.1, type="pred", terms=c("timefrom2025"), ci.lvl=.95,
title="Negative Correlation between Support for Draconian Policies and Time in United States", axis.title=c("Years","Support"), colors=c("dodgerblue"))+theme_bw()
task5.1modelplot
The models for Task 5 examine the relationship between support for draconian policy, time spent in the United States, and income. Income was measured in 10,000-dollar increments up to 100k and then in 50k increments up to 200k, with a final 200k+ category. When we examine support for draconian policy and income, we observe an interesting phenomenon. No income group below 150 thousand dollars differs significantly from the reference (lowest) income category. However, among immigrants who make 150-199k, support jumps significantly, by about 2.27 points. We cannot determine from the data whether the trend continues beyond 200k because of the limited number of respondents who make over that amount (16 in the income table). Despite that, the data still suggest that there is some tipping point of income that is strongly associated with an increase in support for draconian policies. I would hypothesize this has less to do with wealth insulating people from the downsides of anti-immigration policy and more to do with a cultural or status shift that accompanies said wealth. This is because the respondents are surveyed from all over the country and come from places with drastically different costs of living, yet we see a rise in the same range. This could signal that the correlation is explained less by the wealth itself and more by an isolating effect of this particular wealth bracket. However, we do not have exact incomes, only bins, so we can see the uptick in the 150-199k bin but cannot narrow down the exact point at which the effect begins.
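How much weight each income coefficient can bear depends on how many respondents sit in each bin. The sketch below tabulates shares from the counts printed in the `table(linas.1$income)` output. The bin labels are an assumption: the last eleven are taken from the `incomef` coefficient names, and the first ("<20k") is inferred as the omitted reference category.

``` r
# Counts copied from the table(linas.1$income) output above (bins 1-12)
income_counts <- c(132, 103, 83, 88, 107, 56, 79, 61, 48, 91, 54, 16)

# ASSUMED labels: "<20k" is inferred; the rest match the incomef coefficients
names(income_counts) <- c("<20k", "20-29k", "30-39k", "40-49k", "50-59k",
                          "60-69k", "70-79k", "80-89k", "90-99k",
                          "100-149k", "150-199k", "200k+")

# Percentage of respondents in each bin
share <- round(100 * income_counts / sum(income_counts), 1)
cbind(n = income_counts, percent = share)
```

The two highest bins are the smallest, which is why estimates for them come with wider standard errors than those for the lower bins.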
The relationship between time spent in the U.S. and support for draconian immigration policy conforms much better to a linear model. The relationship is negative: each additional year in the country is associated with a drop of about 0.025 points on the 0-12 scale (p < 0.001), or roughly a quarter point per decade. This runs against my hypothesis from Task 1, which predicted that support would rise with time in the country, and it provides a fascinating insight: immigrants who have spent more time in the U.S. are more opposed to anti-immigration policy. One hypothesis for why is that as immigrants spend more time in the United States they become more socially and politically connected. These connections may increase their disapproval of draconian policies, as they have more negative experiences with enforcement, have a better sense of normal legal procedure compared to the new policy, and have had more time to build a life in America that is being put in jeopardy. However, to get more detailed answers about the cause of this relationship, I suspect you would need a more qualitative approach than a survey. Surveys like this are unlikely to capture the full range of the subjects' thoughts or to touch on the many confounding variables that I have a hard time considering as a non-immigrant.
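To put the size of the time coefficient in perspective, the sketch below multiplies the slope reported in the `summary(task5.1)` output by a few illustrative lengths of stay. The chosen year values are arbitrary examples, and the calculation assumes the effect is linear, as the model does.

``` r
# Slope copied from the summary(task5.1) output above
b_time <- -0.025385   # change in support (0-12 scale) per additional year

# Illustrative lengths of stay, in years
years <- c(1, 10, 20, 40)
implied_change <- b_time * years

# Express the change in points and as a share of the 12-point range
data.frame(years,
           change_points  = round(implied_change, 2),
           share_of_scale = round(implied_change / 12, 3))
```

Under these assumptions, it takes about four decades in the country for predicted support to fall by roughly one point on the scale.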