I am interested in the effect of immigration status on students’ expectations of educational success. Prior research suggests that immigrant adolescents, with the exception of undocumented immigrants, generally expect to reach high levels of educational attainment after high school. It is probably impossible to estimate the proportion of undocumented students in the ELS dataset, which makes the actual educational expectations of migrant students in this dataset difficult to predict.
My hypothesis is that the proportion of undocumented students in the ELS dataset is likely very low, and consequently migrant students will have higher mean educational expectations than non-migrant students.
To test this, we will need to construct a number of binary variables, and understand some serious caveats which go along with these new variables.
I propose to create the following binary variables:
- Educational Expectations of the Student (Dependent)
- Immigration Proxy (Independent)
- Parent’s Educational Level (Independent)
- Parent’s Expectations of the Student (Independent)
Because parental expectations and experiences likely have a strong effect upon the student’s expectations, I expect that both the parent’s educational level (college graduate or above) and parent’s expectations (typically quite high) of the student should have a positive effect upon the student’s educational self-expectation independent of the student’s immigration status.
First, I created a variable called “college expectations”, which is “1” if the student expects to graduate from college and “0” if they do not. The code below also assigns “0” to the response code -1. Missing data are “NA”.
library(dplyr) # provides %>%, mutate() and case_when()
els<-els %>% mutate(college_expectations=case_when(.$bystexp %in% c(5:7) ~ 1,
.$bystexp %in% c(1:4,-1) ~ 0,
.$bystexp %in% c(-8,-4) ~ NA_real_))
Then I created a new “immigration proxy” variable. This variable equals “1” if the student was born in Puerto Rico or outside the US AND does not speak English at home. If the student was born in the US OR speaks English at home, it equals “0”. Missing values or skipped questions are “NA”.
Obviously this is an imperfect proxy, since students from Puerto Rico are citizens of the United States. We are assuming, at least for the sake of this homework, that someone who was born in Puerto Rico or outside the United States and who speaks a non-English language at home is likely an immigrant.
els<-els %>% mutate(immigrant_proxy=case_when(.$bygnstat== 1 & .$byhomlng %in% c(2:6) ~ 1,
.$bygnstat %in% c(2:3) | .$byhomlng==1 ~ 0,
.$bygnstat %in% c(-8,-4,-9) | .$byhomlng %in% c(-4,-8,-9) ~ NA_real_))
els<-els %>% mutate(immigrant_proxy2=case_when(.$immigrant_proxy==1 ~ "immigrant",
.$immigrant_proxy==0 ~ "non-immigrant"))
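Because case_when() returns the value of the first condition that matches, the order of the three conditions matters when one of the two source variables is missing. The sketch below applies the same conditions to a small, hypothetical data frame (the rows are invented for illustration and are not ELS records) to show which cases end up as “1”, “0” or “NA”.

```r
library(dplyr)

# Hypothetical toy rows, not ELS data. Codes follow the recode above:
# bygnstat 1 = born in Puerto Rico or outside the US, byhomlng 1 = English at home,
# negative values = missing or skipped.
toy <- data.frame(
  bygnstat = c(1, 1, 2, -9, 1),
  byhomlng = c(3, 1, 4, 1, -9)
)

toy <- toy %>%
  mutate(immigrant_proxy = case_when(
    bygnstat == 1 & byhomlng %in% 2:6 ~ 1,
    bygnstat %in% 2:3 | byhomlng == 1 ~ 0,
    bygnstat %in% c(-8, -4, -9) | byhomlng %in% c(-4, -8, -9) ~ NA_real_
  ))

toy
```

In this toy example the fourth row has a missing birthplace but is still coded “0”, because the English-at-home condition matches before the missing-value condition is reached. The fifth row falls through to “NA”.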
Here I created a new variable to measure the parents’ highest level of education. If at least one of the student’s parents is a college graduate or higher, the value is “1”. If neither parent has completed college, the value is “0”. Missing values or skipped questions are “NA”.
els<-els %>% mutate(parents_college=case_when(.$bypared %in% c(6:8) ~ 1,
.$bypared %in% c(1:5) ~ 0,
.$bypared %in% c(-4,-8,-9) ~ NA_real_))
I also created a new variable to measure the parent’s expectations of their child’s future level of education. If a student’s parent expects them to graduate from college or go further, the value is “1”. If the parent expects their child to stop short of a college degree, the value is “0”. Missing values or skipped questions are “NA”.
els<-els %>% mutate(parents_college_exp=case_when(.$byparasp %in% c(5:7) ~ 1,
.$byparasp %in% c(1:4) ~ 0,
.$byparasp %in% c(-4) ~ NA_real_))
els2<-els %>% select(stu_id,bystuwt,strat_id,immigrant_proxy,immigrant_proxy2,college_expectations,parents_college,parents_college_exp)
els3<-els2[complete.cases(els2),]
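The last two lines keep only the analysis variables and then drop every row that has a missing value in any of them (listwise deletion). A minimal base R sketch on a hypothetical three-column data frame shows the effect of complete.cases():

```r
# Hypothetical toy data, not ELS records.
toy <- data.frame(
  stu_id = 1:4,
  college_expectations = c(1, NA, 0, 1),
  parents_college = c(0, 1, NA, 1)
)

# Keep only rows with no NA in any column.
kept <- toy[complete.cases(toy), ]

kept$stu_id                # ids of the rows that survive
nrow(toy) - nrow(kept)     # number of rows dropped
```

Rows 2 and 3 each have one missing value, so only students 1 and 4 remain. The same rule means a student in els2 is dropped if any single recoded variable is “NA”.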
In the following tables, I provide brief, unweighted descriptive statistics for college expectations, parents’ education and parents’ college expectations by “immigration proxy”. Parents of non-immigrant students seem slightly more likely to have graduated from college (42% vs. 37%), while parents of immigrant students seem slightly more likely to expect their children to graduate from college (91% vs. 88%). Student expectations barely differ between the two groups (73% vs. 74%), so this simple analysis appears inconclusive.
els3 %>% group_by(immigrant_proxy2) %>% summarise(college_expectations=mean(college_expectations,na.rm = T),parents_college=mean(parents_college,na.rm = T),parents_college_exp=mean(parents_college_exp,na.rm=T),n())
“0” if the student believes they will not graduate from college, “1” if they believe that they will.
table(els3$college_expectations,els3$immigrant_proxy2)
##
## immigrant non-immigrant
## 0 276 3535
## 1 749 10170
prop.table(table(els3$college_expectations,els3$immigrant_proxy2),margin=2)
##
## immigrant non-immigrant
## 0 0.2692683 0.2579351
## 1 0.7307317 0.7420649
“0” if neither parent has graduated from college, “1” if at least one parent has graduated from college.
table(els3$parents_college,els3$immigrant_proxy2)
##
## immigrant non-immigrant
## 0 643 7997
## 1 382 5708
prop.table(table(els3$parents_college,els3$immigrant_proxy2),margin=2)
##
## immigrant non-immigrant
## 0 0.6273171 0.5835097
## 1 0.3726829 0.4164903
“0” if the parent believes their child will not graduate from college, “1” if they believe that they will.
table(els3$parents_college_exp,els3$immigrant_proxy2)
##
## immigrant non-immigrant
## 0 89 1708
## 1 936 11997
prop.table(table(els3$parents_college_exp,els3$immigrant_proxy2),margin = 2)
##
## immigrant non-immigrant
## 0 0.08682927 0.12462605
## 1 0.91317073 0.87537395
Because this simple analysis is inconclusive, I have created a survey design that accounts for the sampling weights and strata, which will hopefully shed more light upon these results:
des<-svydesign(ids=~1,strata =~strat_id,weights=~bystuwt,data=els3[is.na(els3$bystuwt)==F,])
Now, we can re-run some of the descriptive statistics using statistical weights provided by the ELS:2002 dataset. In this example, we will re-run the prop.table for college expectations vs. immigration proxy:
prop.table(wtd.table(els3$college_expectations,els3$immigrant_proxy2,weights=els3$bystuwt),margin=2)
## immigrant non-immigrant
## 0 0.2747658 0.2804260
## 1 0.7252342 0.7195740
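As you can see, the figures are visibly different. With weights, the share of students expecting to graduate from college falls from 73.1% to 72.5% among immigrants and from 74.2% to 72.0% among non-immigrants, which reverses the small unweighted gap between the two groups.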
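To make the role of the weights concrete, here is a minimal base R sketch of a weighted column proportion. It uses a hypothetical toy data frame and xtabs() rather than wtd.table(), so it illustrates the idea only and is not a description of how wtd.table() is implemented: each cell sums the weights of its respondents instead of counting them.

```r
# Hypothetical toy data, not ELS records; w plays the role of bystuwt.
toy <- data.frame(
  expects = c(1, 1, 0, 1, 0, 1),
  group   = c("immigrant", "immigrant", "immigrant",
              "non-immigrant", "non-immigrant", "non-immigrant"),
  w       = c(10, 10, 40, 30, 10, 10)
)

# Unweighted: every respondent counts once.
unweighted <- prop.table(table(toy$expects, toy$group), margin = 2)

# Weighted: every respondent counts w times.
weighted <- prop.table(xtabs(w ~ expects + group, data = toy), margin = 2)

unweighted
weighted
```

In the toy immigrant column, two of three respondents expect to graduate, but the one who does not carries most of the weight, so the weighted share drops from 2/3 to 1/3.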
Without survey design:
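tab<-table(els3$college_expectations,els3$immigrant_proxy2)
n<-colSums(tab) # group sizes: the proportions are column-wise, so each column has its own n
p<-prop.table(tab,margin=2)
se<-sqrt(sweep(p*(1-p),2,n,"/")) # sqrt(p*(1-p)/n), with n inside the square root
data.frame(proportion=p,se=se)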
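For reference, the textbook standard error of a proportion under simple random sampling is sqrt(p(1-p)/n), where n is the size of the group the proportion is computed in. The following sketch applies that formula to the unweighted counts reported in the college expectations table above. It ignores weights and strata, so it is only a baseline for comparison with the survey design.

```r
# Unweighted counts copied from the college expectations table above.
counts <- matrix(
  c(276, 749, 3535, 10170), nrow = 2,
  dimnames = list(expects = c("0", "1"),
                  group = c("immigrant", "non-immigrant"))
)

n  <- colSums(counts)          # group sizes: 1025 and 13705
p  <- counts["1", ] / n        # share expecting to graduate from college
se <- sqrt(p * (1 - p) / n)    # simple-random-sampling standard error

round(p, 4)
round(se, 4)
```

The smaller immigrant group has a noticeably larger standard error (about 0.014) than the non-immigrant group (about 0.004), simply because it contains far fewer students.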
With survey design:
sv.table<-svyby(formula=~college_expectations,by=~immigrant_proxy2,design=des,FUN=svymean,na.rm=T)
sv.table
As you can see, with the survey design the standard errors are larger, and likely more accurate, because they account for the weights and the stratification.
Now we will perform a regression analysis on these variables, using no weights or designs, using weights only and finally using the full survey design.
First, with no weights or design:
fit1<-lm(college_expectations~immigrant_proxy+parents_college+parents_college_exp,data=els3)
summary(fit1)
##
## Call:
## lm(formula = college_expectations ~ immigrant_proxy + parents_college +
## parents_college_exp, data = els3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.8678 -0.3101 0.1322 0.2547 0.7123
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.310145 0.009656 32.119 <2e-16 ***
## immigrant_proxy -0.022414 0.013148 -1.705 0.0883 .
## parents_college 0.122489 0.006922 17.696 <2e-16 ***
## parents_college_exp 0.435133 0.010417 41.772 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4057 on 14726 degrees of freedom
## Multiple R-squared: 0.1421, Adjusted R-squared: 0.1419
## F-statistic: 812.9 on 3 and 14726 DF, p-value: < 2.2e-16
It appears that the immigrant proxy has a weakly negative association with a student’s college expectations (a coefficient of -0.022), but with p=0.088 it is significant only at the 10% level and not at the conventional 5% level.
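Because the outcome is binary, this lm() fit is a linear probability model, so each fitted value can be read as a predicted probability of expecting to graduate from college. As a rough sketch, the coefficients printed above can be combined by hand for every profile of the three binary predictors (the object names below are illustrative only):

```r
# Coefficients copied from the summary(fit1) output above.
coefs <- c(intercept = 0.310145, immigrant_proxy = -0.022414,
           parents_college = 0.122489, parents_college_exp = 0.435133)

# All eight combinations of the three binary predictors.
profiles <- expand.grid(immigrant_proxy = 0:1,
                        parents_college = 0:1,
                        parents_college_exp = 0:1)

# Fitted value = intercept + sum of the coefficients that are switched on.
profiles$predicted <- drop(cbind(1, as.matrix(profiles)) %*% coefs)

profiles[order(profiles$predicted), ]
```

The highest predicted value, for a non-immigrant student whose parent graduated from college and expects the same of the child, is about 0.868, which matches the largest negative residual (-0.8678) in the output. Parental expectations move the prediction by far the most. The immigration proxy shifts it by about two percentage points.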
Now, we run the same analysis using weights provided by the ELS:2002 dataset:
fit2<-lm(college_expectations~immigrant_proxy+parents_college+parents_college_exp,data=els3,weights=bystuwt)
summary(fit2)
##
## Call:
## lm(formula = college_expectations ~ immigrant_proxy + parents_college +
## parents_college_exp, data = els3, weights = bystuwt)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -24.991 -3.129 2.090 3.690 19.639
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.301486 0.009386 32.120 <2e-16 ***
## immigrant_proxy -0.001719 0.014959 -0.115 0.909
## parents_college 0.121359 0.007193 16.873 <2e-16 ***
## parents_college_exp 0.431280 0.010168 42.414 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.243 on 14726 degrees of freedom
## Multiple R-squared: 0.1412, Adjusted R-squared: 0.141
## F-statistic: 806.9 on 3 and 14726 DF, p-value: < 2.2e-16
Once the weights are added, the immigration proxy coefficient shrinks to -0.002 (p=0.909), so even its marginal significance disappears.
To further improve the model, we run a regression analysis using the full survey design:
fit3<-svyglm(college_expectations~immigrant_proxy+parents_college+parents_college_exp,des,family=gaussian)
summary(fit3)
##
## Call:
## svyglm(formula = college_expectations ~ immigrant_proxy + parents_college +
## parents_college_exp, des, family = gaussian)
##
## Survey design:
## svydesign(ids = ~1, strata = ~strat_id, weights = ~bystuwt, data = els3[is.na(els3$bystuwt) ==
## F, ])
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.301486 0.012568 23.989 <2e-16 ***
## immigrant_proxy -0.001719 0.018233 -0.094 0.925
## parents_college 0.121359 0.008330 14.568 <2e-16 ***
## parents_college_exp 0.431280 0.013656 31.581 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.1731935)
##
## Number of Fisher Scoring iterations: 2
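The immigration proxy remains insignificant. The point estimates are identical to those of the weighted model, but the standard errors have increased (for the immigration proxy, from 0.015 to 0.018) and the t-values have shrunk considerably.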
These three regression results can be seen arrayed side-by-side in the table below:
stargazer(fit1,fit2,fit3,style="demography",type="html",
column.labels=c("No Design","Weights","Survey Design"),
title="Regression Models for College Expectations by Immigration Status, Parent's Level of Education and Parent's College Expectations - ELS:2002",keep.stat="n",model.names=F,align=T,ci=T)
Dependent variable: college_expectations (95% confidence intervals in parentheses)

| | No Design (Model 1) | Weights (Model 2) | Survey Design (Model 3) |
|---|---|---|---|
| immigrant_proxy | -0.022 | -0.002 | -0.002 |
| | (-0.048, 0.003) | (-0.031, 0.028) | (-0.037, 0.034) |
| parents_college | 0.122*** | 0.121*** | 0.121*** |
| | (0.109, 0.136) | (0.107, 0.135) | (0.105, 0.138) |
| parents_college_exp | 0.435*** | 0.431*** | 0.431*** |
| | (0.415, 0.456) | (0.411, 0.451) | (0.405, 0.458) |
| Constant | 0.310*** | 0.301*** | 0.301*** |
| | (0.291, 0.329) | (0.283, 0.320) | (0.277, 0.326) |
| N | 14,730 | 14,730 | 14,730 |

Significance: * p < .05; ** p < .01; *** p < .001
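The intervals in the table can be reproduced from the estimates and standard errors in the three regression outputs. The sketch below assumes a normal approximation (estimate plus or minus about 1.96 standard errors), which matches the reported immigrant_proxy intervals to three decimals. The object names are illustrative only.

```r
# immigrant_proxy estimates and standard errors copied from the outputs above.
est <- c(model1 = -0.022414, model2 = -0.001719, model3 = -0.001719)
se  <- c(model1 = 0.013148, model2 = 0.014959, model3 = 0.018233)

z  <- qnorm(0.975)                       # about 1.96 for a 95% interval
ci <- cbind(lower = est - z * se,
            upper = est + z * se)

round(ci, 3)
```

All three intervals contain zero, which is the same conclusion as the p-values: none of the models finds a difference in college expectations by the immigration proxy at the 5% level.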