This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.
Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Ctrl+Shift+Enter.
plot(cars)
library(tidyverse)
Registered S3 methods overwritten by 'dbplyr':
method from
print.tbl_lazy
print.tbl_sql
-- Attaching packages --------------------------------------- tidyverse 1.3.0 --
v ggplot2 3.3.2 v purrr 0.3.4
v tibble 3.0.3 v dplyr 1.0.2
v tidyr 1.1.1 v stringr 1.4.0
v readr 1.3.1 v forcats 0.5.0
-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
library(Hmisc)
Loading required package: lattice
Loading required package: survival
Loading required package: Formula
Registered S3 method overwritten by 'htmlwidgets':
method from
print.htmlwidget tools:rstudio
Registered S3 method overwritten by 'data.table':
method from
print.data.table
Attaching package: 㤼㸱Hmisc㤼㸲
The following objects are masked from 㤼㸱package:dplyr㤼㸲:
src, summarize
The following objects are masked from 㤼㸱package:base㤼㸲:
format.pval, units
library(lme4)
Loading required package: Matrix
Attaching package: 㤼㸱Matrix㤼㸲
The following objects are masked from 㤼㸱package:tidyr㤼㸲:
expand, pack, unpack
scotland <- read_dta("EDUS 651/scotland.dta")
Hmisc::describe(scotland)
scotland
11 Variables 2310 Observations
---------------------------------------------------------------------
neighid Format:%12.0g
n missing distinct Info Mean Gmd .05
2310 0 524 1 495.3 305.5 61.0
.10 .25 .50 .75 .90 .95
143.9 240.2 530.0 707.0 808.0 861.0
lowest : 26 27 29 30 31, highest: 1092 1095 1096 1097 1098
---------------------------------------------------------------------
schid Format:%12.0g
n missing distinct Info Mean Gmd .05
2310 0 17 0.995 10.01 7.174 0
.10 .25 .50 .75 .90 .95
2 5 9 16 19 20
lowest : 0 1 2 3 5, highest: 16 17 18 19 20
Value 0 1 2 3 5 6 7 8 9
Frequency 146 22 146 159 155 101 286 112 136
Proportion 0.063 0.010 0.063 0.069 0.067 0.044 0.124 0.048 0.059
Value 10 13 15 16 17 18 19 20
Frequency 133 92 190 111 154 91 102 174
Proportion 0.058 0.040 0.082 0.048 0.067 0.039 0.044 0.075
---------------------------------------------------------------------
attain Format:%12.0g
n missing distinct Info Mean Gmd .05
2310 0 14 0.987 0.0934 1.124 -1.3276
.10 .25 .50 .75 .90 .95
-1.3276 -0.5812 0.1582 0.7350 1.5177 1.5177
lowest : -1.3276 -0.5812 -0.3600 -0.1325 0.0293
highest: 0.7350 0.9127 1.1405 1.5177 2.4151
---------------------------------------------------------------------
p7vrq Format:%12.0g
n missing distinct Info Mean Gmd .05
2310 0 68 0.999 0.5058 11.95 -17.028
.10 .25 .50 .75 .90 .95
-13.028 -7.028 -0.028 7.972 13.972 16.972
lowest : -27.028 -26.028 -25.028 -24.028 -23.028
highest: 35.972 36.972 40.972 41.972 42.972
---------------------------------------------------------------------
p7read Format:%12.0g
n missing distinct Info Mean Gmd .05
2310 0 61 1 -0.04435 15.8 -23.866
.10 .25 .50 .75 .90 .95
-17.866 -9.866 -0.866 9.134 19.134 24.134
lowest : -31.866 -30.866 -29.866 -28.866 -27.866
highest: 24.134 25.134 26.134 27.134 28.134
---------------------------------------------------------------------
dadocc Format:%12.0g
n missing distinct Info Mean Gmd
2310 0 7 0.933 -0.4642 12.33
lowest : -23.454 -11.494 -9.094 -3.454 2.316
highest: -9.094 -3.454 2.316 16.196 29.226
Value -23.454 -11.494 -9.094 -3.454 2.316 16.196 29.226
Frequency 91 285 303 884 242 397 108
Proportion 0.039 0.123 0.131 0.383 0.105 0.172 0.047
---------------------------------------------------------------------
dadunemp Format:%12.0g
n missing distinct Info Sum Mean Gmd
2310 0 2 0.292 252 0.1091 0.1945
---------------------------------------------------------------------
daded Format:%12.0g
n missing distinct Info Sum Mean Gmd
2310 0 2 0.507 497 0.2152 0.3379
---------------------------------------------------------------------
momed Format:%12.0g
n missing distinct Info Sum Mean Gmd
2310 0 2 0.56 574 0.2485 0.3736
---------------------------------------------------------------------
male Format:%12.0g
n missing distinct Info Sum Mean Gmd
2310 0 2 0.749 1109 0.4801 0.4994
---------------------------------------------------------------------
deprive Format:%12.0g
n missing distinct Info Mean Gmd .05
2310 0 458 1 0.02167 0.6664 -0.8250
.10 .25 .50 .75 .90 .95
-0.6930 -0.3970 -0.0620 0.2957 0.8410 1.1400
lowest : -1.082 -1.048 -1.030 -0.983 -0.975
highest: 2.330 2.419 2.438 2.498 2.959
---------------------------------------------------------------------
scotland.clean <- scotland %>%
mutate(.,
schid = as_factor(schid),
neighid = as_factor(neighid),
momed = as_factor(momed),
daded = as_factor(daded))
glimpse(scotland.clean)
Rows: 2,310
Columns: 11
$ neighid <fct> 26, 26, 26, 26, 26, 27, 29, 29, 29, 29, 29, 29,...
$ schid <fct> 20, 20, 20, 20, 20, 20, 18, 20, 20, 20, 20, 20,...
$ attain <dbl> 1.5177, -1.3276, 0.5610, 1.5177, -1.3276, -0.13...
$ p7vrq <dbl> 17.972, -10.028, 2.972, 1.972, -1.028, 3.972, 8...
$ p7read <dbl> 17.134, -27.866, 6.134, 11.134, -0.866, -0.866,...
$ dadocc <dbl> 16.196, -3.454, 2.316, -9.094, -3.454, -3.454, ...
$ dadunemp <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
$ daded <fct> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1,...
$ momed <fct> 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
$ male <dbl> 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1,...
$ deprive <dbl> -0.551, -0.551, -0.551, -0.551, -0.551, 0.147, ...
model.null.schid <- lmer(p7read ~ (1|schid), REML = FALSE, data = scotland.clean)
summary(model.null.schid)
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: p7read ~ (1 | schid)
Data: scotland.clean
AIC BIC logLik deviance df.resid
18558.8 18576.0 -9276.4 18552.8 2307
Scaled residuals:
Min 1Q Median 3Q Max
-2.7611 -0.6895 -0.0027 0.6854 2.5664
Random effects:
Groups Name Variance Std.Dev.
schid (Intercept) 18.75 4.33
Residual 176.64 13.29
Number of obs: 2310, groups: schid, 17
Fixed effects:
Estimate Std. Error t value
(Intercept) -0.2912 1.0951 -0.266
ICC.null.schid <- 18.75/(18.75 + 176.64)
From this null model and ICC we can see that 9% of the variablility is between schools.
model.null.neighid <- lmer(p7read ~ (1|neighid), REML = FALSE, data = scotland.clean)
summary(model.null.neighid)
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: p7read ~ (1 | neighid)
Data: scotland.clean
AIC BIC logLik deviance df.resid
18606.3 18623.5 -9300.1 18600.3 2307
Scaled residuals:
Min 1Q Median 3Q Max
-2.65885 -0.67581 -0.01035 0.67042 2.51256
Random effects:
Groups Name Variance Std.Dev.
neighid (Intercept) 32.89 5.735
Residual 160.37 12.664
Number of obs: 2310, groups: neighid, 524
Fixed effects:
Estimate Std. Error t value
(Intercept) -0.2407 0.3816 -0.631
ICC.null.neighid <- 32.89/( 32.89 + 160.37)
From this null model and ICC we can see that 17% of the variablility is between neighborhoods.
model.null.crossed <- lmer(p7read ~ (1|schid) + (1|neighid), REML = FALSE, data = scotland.clean)
summary(model.null.crossed)
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: p7read ~ (1 | schid) + (1 | neighid)
Data: scotland.clean
AIC BIC logLik deviance df.resid
18524.9 18547.8 -9258.4 18516.9 2306
Scaled residuals:
Min 1Q Median 3Q Max
-2.61239 -0.66099 0.01158 0.66532 2.55698
Random effects:
Groups Name Variance Std.Dev.
neighid (Intercept) 17.21 4.148
schid (Intercept) 16.59 4.074
Residual 160.36 12.663
Number of obs: 2310, groups: neighid, 524; schid, 17
Fixed effects:
Estimate Std. Error t value
(Intercept) -0.3701 1.0522 -0.352
ICC.schid.crossed <- 16.59/(17.21 + 16.59 + 160.36)
ICC.schid.crossed
[1] 0.08544499
ICC.neighid.crossed <- 17.21/(17.21 + 16.59 + 160.36)
ICC.neighid.crossed
[1] 0.08863824
When we account for nesting within neighborhoods and schools, 9% of the variability of reading scores is bewteen schools and 9% of the variability in reading scores is between neighborhoods. We should run crossed models.
model.1 <- lmer(p7read ~ momed + daded + (1|schid) + (1|neighid), REML = FALSE, data = scotland.clean)
summary(model.1)
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: p7read ~ momed + daded + (1 | schid) + (1 | neighid)
Data: scotland.clean
AIC BIC logLik deviance df.resid
18424.4 18458.9 -9206.2 18412.4 2304
Scaled residuals:
Min 1Q Median 3Q Max
-2.99961 -0.66409 0.01792 0.66336 2.63998
Random effects:
Groups Name Variance Std.Dev.
neighid (Intercept) 13.24 3.639
schid (Intercept) 13.06 3.614
Residual 155.66 12.476
Number of obs: 2310, groups: neighid, 524; schid, 17
Fixed effects:
Estimate Std. Error t value
(Intercept) -2.1132 0.9580 -2.206
momed1 2.4043 0.7069 3.401
daded1 5.3704 0.7504 7.157
Correlation of Fixed Effects:
(Intr) momed1
momed1 -0.110
daded1 -0.083 -0.460
Both of the variables are significant and the AIC and BIC have gone down showing this model with individual level predictors fit better. For both mom and dad’s if they have more education it predicts an increase in students reading scores.
model.2 <- lmer(p7read ~ momed + daded + dadocc + (1|schid) + (1|neighid), REML = FALSE, data = scotland.clean)
summary(model.2)
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula:
p7read ~ momed + daded + dadocc + (1 | schid) + (1 | neighid)
Data: scotland.clean
AIC BIC logLik deviance df.resid
18310.2 18350.5 -9148.1 18296.2 2303
Scaled residuals:
Min 1Q Median 3Q Max
-3.2651 -0.6765 0.0008 0.6506 2.6851
Random effects:
Groups Name Variance Std.Dev.
neighid (Intercept) 7.921 2.814
schid (Intercept) 9.900 3.146
Residual 151.718 12.317
Number of obs: 2310, groups: neighid, 524; schid, 17
Fixed effects:
Estimate Std. Error t value
(Intercept) -1.38112 0.84752 -1.630
momed1 1.83651 0.69334 2.649
daded1 3.47317 0.75559 4.597
dadocc 0.26808 0.02425 11.057
Correlation of Fixed Effects:
(Intr) momed1 daded1
momed1 -0.127
daded1 -0.107 -0.428
dadocc 0.078 -0.079 -0.242
After adding the measure of dad’s occupancy both AIC and BIC have decreased showing a greater model fit. Dad’s occupancy is significant and shows as prestige increases it predicts a slight increase in reading scores.
model.3 <- lmer(p7read ~ momed + daded + dadocc + deprive + (1|schid) + (1|neighid), REML = FALSE, data = scotland.clean)
summary(model.3)
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula:
p7read ~ momed + daded + dadocc + deprive + (1 | schid) + (1 |
neighid)
Data: scotland.clean
AIC BIC logLik deviance df.resid
18253.9 18299.9 -9119.0 18237.9 2302
Scaled residuals:
Min 1Q Median 3Q Max
-3.3799 -0.6834 0.0091 0.6557 3.1614
Random effects:
Groups Name Variance Std.Dev.
neighid (Intercept) 3.716 1.928
schid (Intercept) 6.381 2.526
Residual 151.646 12.314
Number of obs: 2310, groups: neighid, 524; schid, 17
Fixed effects:
Estimate Std. Error t value
(Intercept) -1.15892 0.70581 -1.642
momed1 1.66801 0.68663 2.429
daded1 3.31275 0.74800 4.429
dadocc 0.23752 0.02435 9.752
deprive -3.84492 0.48529 -7.923
Correlation of Fixed Effects:
(Intr) momed1 daded1 dadocc
momed1 -0.151
daded1 -0.128 -0.427
dadocc 0.085 -0.075 -0.232
deprive -0.033 0.032 0.039 0.197
Both AIC and BIC have decreased showing this model has better fit. Deprive is significant along with momed, daded, dadocc. An increase in the deprive factors in a neighborhood predicts a decrease in reading scoress. This model shows the best fit.
model.4 <- lmer(p7read ~ momed + daded + dadocc + deprive + momed:daded + (1|schid) + (1|neighid), REML = FALSE, data = scotland.clean)
summary(model.4)
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula:
p7read ~ momed + daded + dadocc + deprive + momed:daded + (1 |
schid) + (1 | neighid)
Data: scotland.clean
AIC BIC logLik deviance df.resid
18247.3 18299.0 -9114.6 18229.3 2301
Scaled residuals:
Min 1Q Median 3Q Max
-3.3143 -0.6715 0.0026 0.6539 3.1707
Random effects:
Groups Name Variance Std.Dev.
neighid (Intercept) 3.513 1.874
schid (Intercept) 6.506 2.551
Residual 151.222 12.297
Number of obs: 2310, groups: neighid, 524; schid, 17
Fixed effects:
Estimate Std. Error t value
(Intercept) -1.37684 0.71429 -1.928
momed1 3.12965 0.84626 3.698
daded1 5.34180 1.01514 5.262
dadocc 0.23530 0.02432 9.676
deprive -3.75830 0.48407 -7.764
momed1:daded1 -4.24203 1.43861 -2.949
Correlation of Fixed Effects:
(Intr) momed1 daded1 dadocc depriv
momed1 -0.181
daded1 -0.163 0.143
dadocc 0.087 -0.079 -0.192
deprive -0.039 0.059 0.067 0.195
momed1:ddd1 0.103 -0.587 -0.678 0.032 -0.057
The interaction between mom’s education and dad’s education is significant. If the mom has a low level of education then the dad’s level of education has a greater impact on student’s reading scores. The interaction is answering a research question about the impact of mom’s education on the impact dad’s education makes on student’s reading scores.