Introduction

In this homework, we build multi-level models using 1) Complete-pooling 2) No-pooling 3) random intercept 4) random slope methods.

Data

The data of this homework is the classroom dataset from DataCamp, Data source: https://www.datacamp.com/courses/hierarchical-and-mixed-effects-models

file: classroom.csv

Results

library(nlme)
library(dplyr)
library(magrittr)
library(tidyr)
library(haven)
library(lmerTest)
library(ggplot2)
library(texreg)

hw8 <- read.csv("classroom.csv")
hw8 <- na.omit(hw8)

head(hw8)
##   sex minority mathkind mathgain   ses yearstea mathknow housepov mathprep
## 4   0        1      449       83 -0.38        2    -0.11    0.082     3.25
## 5   0        1      425       53 -0.03        2    -0.11    0.082     3.25
## 6   1        1      450       65  0.76        2    -0.11    0.082     3.25
## 7   0        1      452       51 -0.03        2    -0.11    0.082     3.25
## 8   0        1      443       66  0.20        2    -0.11    0.082     3.25
## 9   1        1      422       88  0.64        2    -0.11    0.082     3.25
##   classid schoolid childid
## 4     217        1       4
## 5     217        1       5
## 6     217        1       6
## 7     217        1       7
## 8     217        1       8
## 9     217        1       9

Here is a summary of variables of interest:

  1. mathgain: Math score gain and this is the dependent variable
  2. mathknow: Teacher’s math knowledge score
  3. mathprep: Teacher’s math teaching experience in terms of years
  4. schoolid: ID uniquely identified each school in the dataset. I will use it in the grouping when I test for variations within group in random intercept and random slope models.
##### Complete Pooling
complete_pooling <- lm(mathgain ~ mathprep, data = hw8)

summary(complete_pooling)
## 
## Call:
## lm(formula = mathgain ~ mathprep, data = hw8)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -142.167  -22.144   -1.167   19.856  193.658 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   50.678      2.954  17.154  < 2e-16 ***
## mathprep       2.733      1.053   2.596  0.00956 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 34.61 on 1079 degrees of freedom
## Multiple R-squared:  0.006207,   Adjusted R-squared:  0.005286 
## F-statistic: 6.739 on 1 and 1079 DF,  p-value: 0.009558

From table above, intercept(the average mathgain score when mathprep equals zero for all school) and explanatory variable, mathprep are significant according to the pvalue of .00956. It also shows that one unit increase in mathprep is causes a 2.733 increase in mathgain. However, complete pooling method assumes every school has the same mean mathgain but school environment may be an important determiant of the total gain of math score gain among students because better schools could have larger budget to hire better math teachers.

##### No pooling
dcoef <- hw8 %>% group_by(schoolid) %>% do(mod = lm(mathgain ~ mathprep, data = .))

intercept  <- dcoef %>% do(data.frame(intc = coef(.$mod)[1]))
slope <- dcoef %>% do(data.frame(mathc = coef(.$mod)[2]))


### plotting intercept and slope

g1 <- ggplot(intercept, aes(x = intc)) + geom_histogram() + ggtitle("Variations of Intercept Between Schools")
g2 <- ggplot(slope, aes(x = mathc)) + geom_histogram() + ggtitle("Variations of Slope Between Schools")

plot(g1)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

plot(g2)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 23 rows containing non-finite values (stat_bin).

From the two histograms above, we know there is a great deal of between-school variation in both the intercept and the slope parameters(the co-efficient of mathprep). The complete-pooling model results of 50.678 and 2.733 are countervailing forces to the no pooling model.

### Random Intercept
m1_lme <- lme(mathgain ~ mathprep, data = hw8, random = ~1|schoolid, method = "ML")
summary(m1_lme)
## Linear mixed-effects model fit by maximum likelihood
##  Data: hw8 
##        AIC      BIC    logLik
##   10708.79 10728.73 -5350.396
## 
## Random effects:
##  Formula: ~1 | schoolid
##         (Intercept) Residual
## StdDev:    10.15375 33.08727
## 
## Fixed effects: mathgain ~ mathprep 
##                Value Std.Error  DF   t-value p-value
## (Intercept) 51.77514  3.314535 975 15.620633   0.000
## mathprep     2.33704  1.142000 975  2.046442   0.041
##  Correlation: 
##          (Intr)
## mathprep -0.898
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -4.23414577 -0.61621317 -0.03074758  0.54052208  5.62686966 
## 
## Number of Observations: 1081
## Number of Groups: 105
### Random Slope
m2_lme <- lme(mathgain ~ mathprep, data = hw8, random = ~mathprep|schoolid, method = "ML")
summary(m2_lme)
## Linear mixed-effects model fit by maximum likelihood
##  Data: hw8 
##        AIC      BIC    logLik
##   10709.05 10738.97 -5348.526
## 
## Random effects:
##  Formula: ~mathprep | schoolid
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev   Corr  
## (Intercept) 19.95448 (Intr)
## mathprep     5.98277 -0.862
## Residual    32.67916       
## 
## Fixed effects: mathgain ~ mathprep 
##                Value Std.Error  DF  t-value p-value
## (Intercept) 51.94118  4.076799 975 12.74068  0.0000
## mathprep     2.32488  1.459865 975  1.59253  0.1116
##  Correlation: 
##          (Intr)
## mathprep -0.93 
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -4.28811412 -0.61822905 -0.03040962  0.53475762  5.66997206 
## 
## Number of Observations: 1081
## Number of Groups: 105
### Model Selection
AIC(complete_pooling, m1_lme, m2_lme)
##                  df      AIC
## complete_pooling  3 10734.35
## m1_lme            4 10708.79
## m2_lme            6 10709.05
### Adding school level covariate mathknow
m3_lme <- lme(mathgain ~ mathprep + mathknow, data = hw8, random = ~1|schoolid, method = "ML")
summary(m3_lme)
## Linear mixed-effects model fit by maximum likelihood
##  Data: hw8 
##        AIC      BIC   logLik
##   10705.76 10730.69 -5347.88
## 
## Random effects:
##  Formula: ~1 | schoolid
##         (Intercept) Residual
## StdDev:    10.89427 32.89363
## 
## Fixed effects: mathgain ~ mathprep + mathknow 
##                Value Std.Error  DF   t-value p-value
## (Intercept) 51.83588  3.347463 974 15.485124  0.0000
## mathprep     2.30558  1.147323 974  2.009531  0.0448
## mathknow     2.69722  1.172456 974  2.300488  0.0216
##  Correlation: 
##          (Intr) mthprp
## mathprep -0.892       
## mathknow -0.006  0.005
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -4.28444049 -0.61411294 -0.01586956  0.55005125  5.55520977 
## 
## Number of Observations: 1081
## Number of Groups: 105
### Cross-level interaction

m4_lme <- lme(mathgain ~ mathprep * mathknow, data = hw8, random = ~1|schoolid, method = "ML")
summary(m4_lme)
## Linear mixed-effects model fit by maximum likelihood
##  Data: hw8 
##        AIC      BIC    logLik
##   10707.57 10737.48 -5347.784
## 
## Random effects:
##  Formula: ~1 | schoolid
##         (Intercept) Residual
## StdDev:    10.94884 32.88221
## 
## Fixed effects: mathgain ~ mathprep * mathknow 
##                      Value Std.Error  DF   t-value p-value
## (Intercept)       51.85050  3.351577 973 15.470480  0.0000
## mathprep           2.30223  1.148268 973  2.004962  0.0452
## mathknow           1.33113  3.346915 973  0.397719  0.6909
## mathprep:mathknow  0.53194  1.210203 973  0.439548  0.6604
##  Correlation: 
##                   (Intr) mthprp mthknw
## mathprep          -0.892              
## mathknow          -0.006  0.002       
## mathprep:mathknow  0.005  0.000 -0.936
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -4.28795291 -0.61458219 -0.01268577  0.55773494  5.54323790 
## 
## Number of Observations: 1081
## Number of Groups: 105
### Best Model

AIC(complete_pooling, m1_lme, m2_lme, m3_lme, m4_lme)
##                  df      AIC
## complete_pooling  3 10734.35
## m1_lme            4 10708.79
## m2_lme            6 10709.05
## m3_lme            5 10705.76
## m4_lme            6 10707.57
### Centering mathknow

hw8 <- hw8 %>% mutate(cmathknow = mathknow - mean(mathknow))

# refit

m4a_lme <- lme(mathgain ~ mathprep + cmathknow, data = hw8, random = ~1|schoolid, method = "ML")
summary(m4a_lme)
## Linear mixed-effects model fit by maximum likelihood
##  Data: hw8 
##        AIC      BIC   logLik
##   10705.76 10730.69 -5347.88
## 
## Random effects:
##  Formula: ~1 | schoolid
##         (Intercept) Residual
## StdDev:    10.89427 32.89363
## 
## Fixed effects: mathgain ~ mathprep + cmathknow 
##                Value Std.Error  DF   t-value p-value
## (Intercept) 51.92004  3.347451 974 15.510321  0.0000
## mathprep     2.30558  1.147323 974  2.009531  0.0448
## cmathknow    2.69722  1.172456 974  2.300488  0.0216
##  Correlation: 
##           (Intr) mthprp
## mathprep  -0.892       
## cmathknow  0.005  0.005
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -4.28444049 -0.61411294 -0.01586956  0.55005125  5.55520977 
## 
## Number of Observations: 1081
## Number of Groups: 105

The 1st AIC table shows the best model is randome intercept model as the selection criteria is the lower the AIC, the better the model.

Based on randome intercept model, we added additional explanatory variable teacher’s math knowledge. However, the 2nd AIC table indicates that additional explanatory variable does not yield better model. It could be that teacher’s knowledge is correlated with teachers’ years of math preparation and therefore additional explanatory variable did not cover too much additional variation while additional parameter caused AIC score to increase.

htmlreg(list(m4_lme, m4a_lme), doctype = FALSE)
Statistical models
Model 1 Model 2
(Intercept) 51.85*** 51.92***
(3.35) (3.35)
mathprep 2.30* 2.31*
(1.15) (1.15)
mathknow 1.33
(3.35)
mathprep:mathknow 0.53
(1.21)
cmathknow 2.70*
(1.17)
AIC 10707.57 10705.76
BIC 10737.48 10730.69
Log Likelihood -5347.78 -5347.88
Num. obs. 1081 1081
Num. groups 105 105
p < 0.001, p < 0.01, p < 0.05

I centered the mathknow by substracting its mean and by doing this the intercept is just the mean of the response when mathprep = 0 and mathknow = to it’s mean. So when 0 is out of the range of data, that value is meaningless.But when I center the mathknow so that a value within the dataset becomes 0, the intercept becomes the mean of Y (mathgain) which is reflected as 51.92004. at the value you centered on, in this case the average mean of mathknow. Looking at the above comparison between model 1 and model 2, We can see that a one unit increase in math knowledge causes a 1.33 increase in mathgain for model 1 and that the coefficient of amthknowledge or the interaction between mathprp and mathknowlege which not significant. However for model 2, which uses a centered mathknow and it’s coefficient shows a 2.70 increase in mathgain per unit increase in the new varaible, which is significant.

### Random Slope
m0_lme <- lme(mathgain ~ 1, random = ~ 1|schoolid, data = hw8, method = "ML")
summary(m0_lme)
## Linear mixed-effects model fit by maximum likelihood
##  Data: hw8 
##        AIC      BIC    logLik
##   10710.94 10725.89 -5352.468
## 
## Random effects:
##  Formula: ~1 | schoolid
##         (Intercept) Residual
## StdDev:    10.47735 33.10434
## 
## Fixed effects: mathgain ~ 1 
##                Value Std.Error  DF  t-value p-value
## (Intercept) 57.85794  1.484513 976 38.97436       0
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -4.21925890 -0.59881686 -0.02051724  0.54634792  5.65376999 
## 
## Number of Observations: 1081
## Number of Groups: 105
### Confidence Interval
intervals(m0_lme)
## Approximate 95% confidence intervals
## 
##  Fixed effects:
##                lower     est.   upper
## (Intercept) 54.94608 57.85794 60.7698
## attr(,"label")
## [1] "Fixed effects:"
## 
##  Random Effects:
##   Level: schoolid 
##                    lower     est.    upper
## sd((Intercept)) 7.916763 10.47735 13.86613
## 
##  Within-group standard error:
##    lower     est.    upper 
## 31.67205 33.10434 34.60140

Conclusion

In this exercise, I constructed complete-pooling, no-pooling, random intercept and random slope models based on the classroom dataset from DataCamp. I used mathgain as dependent variable and regressed it on mathprep in the complete-pooling method which assumes every school has the same mean value of mathgain score. Although the explanatory variable is significant, the complete-pooling method ignores the variations between schools and better schools tend to hire better teachers.

In no-pooling method, we can clearly see the variations of intercept and slope terms between schools from the histogram. In the random intercept and random slope methods, I allow intercept and slope term to be random variables based on school ID and use AIC to pick the best models among complete-pooling. Random intercept turns out to be the best model and it proves that model performance does improve when I allow each school to have different mean math gain score. In other words this model shows us a more accurate picture of how much the variation of mathgain can be explained by mathprep, which is reflected by the 2.337 increase in mathgain per increased unit of mathprep.

Lastly, I included mathknow term and refit the model. The model performance does not improve based on AIC and the reason might be teacher’s knowledge is positively correlated with teacher’s number of years of math preparation. Therefore, the additional variations captured by adding additional variable do not offset the penalty term in AIC by adding additional explanatory variable of mathknow. This could be because the explainitory variable mathknowlegde tries to explain the same variation of mathgain that is already captured by the othe explainitory varibale of mathprep. We centered the teacher’s mathknow varibale to avaoid a situation where a zero value of math knowlege produces a meaningless mathgain. For example if mathknow is zero, that would mean that the teacher has no experience in math, therefor the average mathgain score of 51.85 coming out of a teacher with no knowledge is highly unlikely in reality.