As you may recall, for my last homework I was interested in knowing about the effect of immigration upon expectations of student success. Now I am interested in knowing about the effect of immigration (by proxy) on college admittance. In other words, “Are immigrant students more or less likely to be admitted to college than non-immigrant students?”
For this question, I hypothesize that immigrant students will be more likely to attend college than non-immigrant students, because immigrant students have higher expectations of attending college.
To test this hypothesis, I will need to construct a number of binary variables from the ELS:2002 dataset:
- College Admittance (Dependent)
- Immigration Proxy (Independent)
- Parent’s Expectations of the Student (Independent)
- Socio-Economic Status [includes Parent’s education] (Independent)
```r
library(dplyr)
# Variable will show "1" if the student has ever attended an institution of higher education by Follow-up 2 (F2), "0" if they have not, and NA for missing or incomplete values.
els<-els %>% mutate(attended_college_f2=case_when(f2b07==1 ~ 1,
                                                   f2b07==0 ~ 0,
                                                   f2b07 %in% c(-3,-4,-8,-9) ~ NA_real_))
```
```r
# This variable equals "1" if the student was born in either Puerto Rico or outside the US AND does not speak English at home.
els<-els %>% mutate(immigrant_proxy=case_when(bygnstat==1 & byhomlng %in% c(2:6) ~ 1,
                                               bygnstat %in% c(2:3) | byhomlng==1 ~ 0,
                                               bygnstat %in% c(-8,-4,-9) | byhomlng %in% c(-4,-8,-9) ~ NA_real_))
# Labelled factor version of the same variable
els<-els %>% mutate(immigrant_proxy2=case_when(immigrant_proxy==1 ~ "immigrant",
                                                immigrant_proxy==0 ~ "non-immigrant"))
els$immigrant_proxy2<-as.factor(els$immigrant_proxy2)
```
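Because `case_when()` gives each row the value of the first clause that is TRUE, and returns `NA` for rows that match no clause, it can help to check how the proxy treats edge cases. The sketch below is a hypothetical base-R restatement of the same rule (`immigrant_proxy_rule()` is my own illustrative helper, not part of dplyr or the ELS:2002 files), run on made-up code combinations:

```r
# Hypothetical helper: a base-R restatement of the immigrant_proxy rule,
# used only to check how it classifies edge cases (not part of the analysis)
immigrant_proxy_rule <- function(bygnstat, byhomlng) {
  # Clause 1: bygnstat == 1 (born in Puerto Rico or abroad) AND a non-English home language code
  is_immigrant <- bygnstat == 1 & byhomlng %in% 2:6
  # Clause 2: bygnstat of 2 or 3, OR English spoken at home
  is_non_immigrant <- bygnstat %in% 2:3 | byhomlng == 1
  # Rows matching neither clause stay NA, just as with case_when()
  out <- rep(NA_real_, length(bygnstat))
  out[which(is_non_immigrant)] <- 0
  out[which(is_immigrant)] <- 1
  out
}

# Made-up codes: abroad + non-English, abroad + English, abroad + missing language,
# missing birthplace + English, bygnstat 2 + missing language
immigrant_proxy_rule(bygnstat = c(1, 1, 1, -9, 2),
                     byhomlng = c(3, 1, -9, 1, -9))
# [1]  1  0 NA  0  0
```

The first two clauses can never both be TRUE, so their order does not matter, and the explicit `NA_real_` clause is redundant because unmatched rows are `NA` anyway. Note also that a student with a missing birthplace code who speaks English at home is classified as non-immigrant rather than `NA`.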
```r
# If a student's parent expects them to graduate from college or go further, the value is "1". If the parent expects them to stop short of a college degree (including not finishing high school), the value is "0". Missing values or skipped questions are "NA".
els<-els %>% mutate(parents_college_exp=case_when(byparasp %in% c(5:7) ~ 1,
                                                   byparasp %in% c(1:4) ~ 0,
                                                   byparasp %in% c(-4) ~ NA_real_))
els<-els %>% mutate(parents_college_exp2=case_when(
  parents_college_exp==1 ~ "Parents_w_College",
  parents_college_exp==0 ~ "Parents_No_College"))
els$parents_college_exp2<-as.factor(els$parents_college_exp2)
```
```r
# Measures if a student's family has above-median SES. "1" if the student's family is in the top two quartiles, "0" if they are in the bottom two quartiles, "NA" if no response or missing answer.
els<-els %>% mutate(family_high_ses=case_when(byses1qu %in% c(3,4) ~ 1,
                                               byses1qu %in% c(1,2) ~ 0,
                                               byses1qu %in% c(-4,-8) ~ NA_real_))
els<-els %>% mutate(family_high_ses2=case_when(family_high_ses==1 ~ "High_SES",
                                                family_high_ses==0 ~ "Low_SES"))
els$family_high_ses2<-as.factor(els$family_high_ses2)
```
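The `*2` versions above are built with `as.factor()`, which sorts levels alphabetically. That is harmless for labelling, but whenever one of these factors enters a model formula or a prediction grid, R treats the first level as the reference category, which here is not always the "0" category of the matching 0/1 variable. A toy illustration using only base R (the vectors are made up):

```r
# Toy labels only; as.factor() sorts the levels alphabetically
ses <- as.factor(c("Low_SES", "High_SES", "Low_SES"))
levels(ses)
# [1] "High_SES" "Low_SES"

# In a model matrix the first level is the reference, so the dummy marks Low_SES
colnames(model.matrix(~ ses))
# [1] "(Intercept)" "sesLow_SES"

# Setting the levels explicitly makes Low_SES the reference instead
ses_ordered <- factor(c("Low_SES", "High_SES", "Low_SES"),
                      levels = c("Low_SES", "High_SES"))
colnames(model.matrix(~ ses_ordered))
# the dummy column is now ses_orderedHigh_SES
```

If the labelled factors are ever used in a model, `factor(..., levels = ...)` or `relevel()` sets the reference category explicitly.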
Now that our variables have been constructed, we can create a survey design and fit a survey-weighted logistic model of the odds that students in this dataset attended college:
```r
library(survey)
# Keep only students with a non-missing base-year student weight
des<-svydesign(ids=~stu_id, strata=~strat_id, weights = ~bystuwt,data=els[!is.na(els$bystuwt),])
fit.logit<-svyglm(attended_college_f2~immigrant_proxy+parents_college_exp+family_high_ses,design= des, family=binomial)
```
```
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!
```
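The warning above comes from the binomial family's initialization step, which expects whole-number counts of successes; with non-integer survey weights that check trips even though the estimates are fine. The `survey` package documentation recommends `family = quasibinomial()` with `svyglm()` to get the same point estimates without the warning. As a minimal sketch of that equivalence, the block below uses base R's `glm()` on simulated data with non-integer weights as a stand-in (the data are invented and are not ELS:2002):

```r
# Simulated stand-in data (not ELS:2002): a 0/1 outcome, one 0/1 predictor,
# and non-integer weights playing the role of bystuwt
set.seed(2002)
n <- 500
x <- rbinom(n, 1, 0.5)
y <- rbinom(n, 1, plogis(-0.5 + 1.2 * x))
w <- runif(n, 0.5, 3)

# family = binomial raises the "non-integer #successes" warning; record and muffle it
warned <- FALSE
fit_binom <- withCallingHandlers(
  glm(y ~ x, family = binomial, weights = w),
  warning = function(cond) {
    warned <<- TRUE
    invokeRestart("muffleWarning")
  }
)

# family = quasibinomial uses the same link, variance and starting values,
# but skips the integer check, so the point estimates are the same
fit_quasi <- glm(y ~ x, family = quasibinomial, weights = w)

warned
all.equal(coef(fit_binom), coef(fit_quasi))
```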
```r
library(stargazer)
stargazer(fit.logit,type = "html",align=TRUE,ci=TRUE)
```

| | Dependent variable: |
|---|---|
| | attended_college_f2 |
| immigrant_proxy | 0.016 |
| | (-0.184, 0.215) |
| parents_college_exp | 1.338*** |
| | (1.201, 1.475) |
| family_high_ses | 1.227*** |
| | (1.121, 1.333) |
| Constant | -0.691*** |
| | (-0.818, -0.564) |
| Observations | 12,915 |
| Log Likelihood | -6,862.033 |
| Akaike Inf. Crit. | 13,732.070 |
| Note: | \*p<0.1; \*\*p<0.05; \*\*\*p<0.01 |
Unfortunately, as in the last homework, the immigration proxy I constructed is not significant: its coefficient is 0.016, and its 95% confidence interval (-0.184, 0.215) comfortably includes zero. This suggests either that immigration has no detectable effect on college admittance in this model or that my proxy is flawed. I think it’s much more likely that the proxy is the problem.
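One way to see how far the proxy is from significance is to back out an approximate standard error, z statistic, and p-value from the reported confidence interval. This is a sketch assuming the interval is a symmetric normal-approximation (Wald) interval, as the near-equal distances on either side of 0.016 suggest; the numbers are copied from the table:

```r
# immigrant_proxy coefficient and 95% CI, copied from the stargazer table
b  <- 0.016
ci <- c(-0.184, 0.215)

# Assuming a Wald interval b +/- 1.96 * SE, the SE is the CI width / (2 * 1.96)
se <- diff(ci) / (2 * qnorm(0.975))
z  <- b / se
p  <- 2 * pnorm(-abs(z))   # two-sided p-value

round(c(se = se, z = z, p = p), 3)
```

With these inputs the z statistic is roughly 0.16, far below the 1.96 needed for significance at the 5% level, and the p-value is above 0.8.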
Now that the logistic model has been fit, we can exponentiate its coefficients and confidence limits to get odds ratios with 95% confidence intervals:
```r
exp(cbind(OR=coef(fit.logit),confint(fit.logit)))
```
```
##                            OR     2.5 %    97.5 %
## (Intercept)         0.5009378 0.4411507 0.5688276
## immigrant_proxy     1.0158938 0.8321730 1.2401751
## parents_college_exp 3.8102067 3.3222736 4.3698011
## family_high_ses     3.4124303 3.0693050 3.7939145
```
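As we can see, holding the other variables constant, the odds of attending college are about 3.8 times higher for students whose parents expect them to graduate from college, and about 3.4 times higher for students from above-median-SES families. These are odds ratios, not ratios of probabilities, so they do not mean such students are nearly four or three times as likely to attend. Naturally, this is a very simplistic model and shouldn’t be taken as gospel.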
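To see why an odds ratio of about 3.8 does not translate into "3.8 times as likely", the probabilities can be recovered from the rounded coefficients with the inverse logit, `plogis()`. The sketch compares the reference student (all three indicators equal to 0) with the same student whose parents expect college graduation:

```r
# Rounded intercept and parents_college_exp coefficient from the stargazer table
b0 <- -0.691
b_parents <- 1.338

odds <- function(p) p / (1 - p)

p_without <- plogis(b0)              # all three 0/1 predictors equal to 0
p_with    <- plogis(b0 + b_parents)  # same student, parents expect college

c(p_without = p_without, p_with = p_with,
  odds_ratio = odds(p_with) / odds(p_without),
  probability_ratio = p_with / p_without)
```

Here the odds ratio is exp(1.338), about 3.8, while the predicted probability rises from about 0.33 to about 0.66, roughly doubling. Because attending college is common in this sample, the odds ratio overstates the ratio of probabilities.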
Here, we use the logistic model to construct “interesting” cases and predictions. Hypothetical students have been created with every combination of “immigrant proxy”, “parents college expectations” and “family SES”, each coded 0/1 exactly as in the model. What is interesting is that the variable with the largest effect is “parents college expectations”, with family SES close behind: a typical non-immigrant student from a below-median-SES family whose parents do not expect them to graduate from college has only a 33% predicted chance of having attended college by the F2 follow-up. Parental college expectations alone raise that to 66%, above-median family SES alone raises it to 63%, and the two together raise it to 87%. Being an immigrant (by proxy) changes each prediction by less than half a percentage point:
```r
# The model was fit on the numeric 0/1 variables, so newdata uses the same coding:
# 1 = immigrant, parents expect a college degree, above-median family SES
dat<-expand.grid(immigrant_proxy=c(0,1),parents_college_exp=c(0,1),family_high_ses=c(0,1))
fit<-predict(fit.logit,newdata = dat,type="response")
dat$fitted.prob.logit<-round(fit,3)
head(dat,n=20)
```
```
##   immigrant_proxy parents_college_exp family_high_ses fitted.prob.logit
## 1               0                   0               0             0.334
## 2               1                   0               0             0.337
## 3               0                   1               0             0.656
## 4               1                   1               0             0.660
## 5               0                   0               1             0.631
## 6               1                   0               1             0.635
## 7               0                   1               1             0.867
## 8               1                   1               1             0.869
```
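As a check on the table, the predicted probabilities can be rebuilt by hand from the rounded coefficients: compute the linear predictor (log-odds) for each 0/1 combination and apply the inverse logit. This sketch uses only the coefficients printed in the stargazer table, so it should agree with `predict()` to about the third decimal:

```r
# Rounded coefficients copied from the stargazer table
b <- c(intercept = -0.691, immigrant_proxy = 0.016,
       parents_college_exp = 1.338, family_high_ses = 1.227)

# All eight 0/1 combinations, in the same row order as the prediction table
grid <- expand.grid(immigrant_proxy = c(0, 1),
                    parents_college_exp = c(0, 1),
                    family_high_ses = c(0, 1))

# Linear predictor (log-odds), then the inverse logit gives a probability
slopes <- b[c("immigrant_proxy", "parents_college_exp", "family_high_ses")]
eta <- b["intercept"] + as.matrix(grid) %*% slopes
grid$prob <- round(plogis(as.vector(eta)), 3)
grid
```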
Even with this simplistic model, then, we can see the importance of parental college expectations for college attendance. Although this is not the variable that I set out to explore, it is interesting nonetheless.