In this homework, we build multi-level models using 1) Complete-pooling 2) No-pooling 3) random intercept 4) random slope methods.
The data of this homework is the classroom dataset from DataCamp, Data source: https://www.datacamp.com/courses/hierarchical-and-mixed-effects-models
file: classroom.csv
library(nlme)
library(dplyr)
library(magrittr)
library(tidyr)
library(haven)
library(lmerTest)
library(ggplot2)
library(texreg)
hw8 <- read.csv("classroom.csv")
hw8 <- na.omit(hw8)
head(hw8)
## sex minority mathkind mathgain ses yearstea mathknow housepov mathprep
## 4 0 1 449 83 -0.38 2 -0.11 0.082 3.25
## 5 0 1 425 53 -0.03 2 -0.11 0.082 3.25
## 6 1 1 450 65 0.76 2 -0.11 0.082 3.25
## 7 0 1 452 51 -0.03 2 -0.11 0.082 3.25
## 8 0 1 443 66 0.20 2 -0.11 0.082 3.25
## 9 1 1 422 88 0.64 2 -0.11 0.082 3.25
## classid schoolid childid
## 4 217 1 4
## 5 217 1 5
## 6 217 1 6
## 7 217 1 7
## 8 217 1 8
## 9 217 1 9
Here is a summary of variables of interest:
##### Complete Pooling
complete_pooling <- lm(mathgain ~ mathprep, data = hw8)
summary(complete_pooling)
##
## Call:
## lm(formula = mathgain ~ mathprep, data = hw8)
##
## Residuals:
## Min 1Q Median 3Q Max
## -142.167 -22.144 -1.167 19.856 193.658
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 50.678 2.954 17.154 < 2e-16 ***
## mathprep 2.733 1.053 2.596 0.00956 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 34.61 on 1079 degrees of freedom
## Multiple R-squared: 0.006207, Adjusted R-squared: 0.005286
## F-statistic: 6.739 on 1 and 1079 DF, p-value: 0.009558
From table above, intercept(the average mathgain score when mathprep equals zero for all school) and explanatory variable, mathprep are significant according to the pvalue of .00956. It also shows that one unit increase in mathprep is causes a 2.733 increase in mathgain. However, complete pooling method assumes every school has the same mean mathgain but school environment may be an important determiant of the total gain of math score gain among students because better schools could have larger budget to hire better math teachers.
##### No pooling
dcoef <- hw8 %>% group_by(schoolid) %>% do(mod = lm(mathgain ~ mathprep, data = .))
intercept <- dcoef %>% do(data.frame(intc = coef(.$mod)[1]))
slope <- dcoef %>% do(data.frame(mathc = coef(.$mod)[2]))
### plotting intercept and slope
g1 <- ggplot(intercept, aes(x = intc)) + geom_histogram() + ggtitle("Variations of Intercept Between Schools")
g2 <- ggplot(slope, aes(x = mathc)) + geom_histogram() + ggtitle("Variations of Slope Between Schools")
plot(g1)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
plot(g2)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 23 rows containing non-finite values (stat_bin).
From the two histograms above, we know there is a great deal of between-school variation in both the intercept and the slope parameters(the co-efficient of mathprep). The complete-pooling model results of 50.678 and 2.733 are countervailing forces to the no pooling model.
### Random Intercept
m1_lme <- lme(mathgain ~ mathprep, data = hw8, random = ~1|schoolid, method = "ML")
summary(m1_lme)
## Linear mixed-effects model fit by maximum likelihood
## Data: hw8
## AIC BIC logLik
## 10708.79 10728.73 -5350.396
##
## Random effects:
## Formula: ~1 | schoolid
## (Intercept) Residual
## StdDev: 10.15375 33.08727
##
## Fixed effects: mathgain ~ mathprep
## Value Std.Error DF t-value p-value
## (Intercept) 51.77514 3.314535 975 15.620633 0.000
## mathprep 2.33704 1.142000 975 2.046442 0.041
## Correlation:
## (Intr)
## mathprep -0.898
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -4.23414577 -0.61621317 -0.03074758 0.54052208 5.62686966
##
## Number of Observations: 1081
## Number of Groups: 105
### Random Slope
m2_lme <- lme(mathgain ~ mathprep, data = hw8, random = ~mathprep|schoolid, method = "ML")
summary(m2_lme)
## Linear mixed-effects model fit by maximum likelihood
## Data: hw8
## AIC BIC logLik
## 10709.05 10738.97 -5348.526
##
## Random effects:
## Formula: ~mathprep | schoolid
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 19.95448 (Intr)
## mathprep 5.98277 -0.862
## Residual 32.67916
##
## Fixed effects: mathgain ~ mathprep
## Value Std.Error DF t-value p-value
## (Intercept) 51.94118 4.076799 975 12.74068 0.0000
## mathprep 2.32488 1.459865 975 1.59253 0.1116
## Correlation:
## (Intr)
## mathprep -0.93
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -4.28811412 -0.61822905 -0.03040962 0.53475762 5.66997206
##
## Number of Observations: 1081
## Number of Groups: 105
### Model Selection
AIC(complete_pooling, m1_lme, m2_lme)
## df AIC
## complete_pooling 3 10734.35
## m1_lme 4 10708.79
## m2_lme 6 10709.05
### Adding school level covariate mathknow
m3_lme <- lme(mathgain ~ mathprep + mathknow, data = hw8, random = ~1|schoolid, method = "ML")
summary(m3_lme)
## Linear mixed-effects model fit by maximum likelihood
## Data: hw8
## AIC BIC logLik
## 10705.76 10730.69 -5347.88
##
## Random effects:
## Formula: ~1 | schoolid
## (Intercept) Residual
## StdDev: 10.89427 32.89363
##
## Fixed effects: mathgain ~ mathprep + mathknow
## Value Std.Error DF t-value p-value
## (Intercept) 51.83588 3.347463 974 15.485124 0.0000
## mathprep 2.30558 1.147323 974 2.009531 0.0448
## mathknow 2.69722 1.172456 974 2.300488 0.0216
## Correlation:
## (Intr) mthprp
## mathprep -0.892
## mathknow -0.006 0.005
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -4.28444049 -0.61411294 -0.01586956 0.55005125 5.55520977
##
## Number of Observations: 1081
## Number of Groups: 105
### Cross-level interaction
m4_lme <- lme(mathgain ~ mathprep * mathknow, data = hw8, random = ~1|schoolid, method = "ML")
summary(m4_lme)
## Linear mixed-effects model fit by maximum likelihood
## Data: hw8
## AIC BIC logLik
## 10707.57 10737.48 -5347.784
##
## Random effects:
## Formula: ~1 | schoolid
## (Intercept) Residual
## StdDev: 10.94884 32.88221
##
## Fixed effects: mathgain ~ mathprep * mathknow
## Value Std.Error DF t-value p-value
## (Intercept) 51.85050 3.351577 973 15.470480 0.0000
## mathprep 2.30223 1.148268 973 2.004962 0.0452
## mathknow 1.33113 3.346915 973 0.397719 0.6909
## mathprep:mathknow 0.53194 1.210203 973 0.439548 0.6604
## Correlation:
## (Intr) mthprp mthknw
## mathprep -0.892
## mathknow -0.006 0.002
## mathprep:mathknow 0.005 0.000 -0.936
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -4.28795291 -0.61458219 -0.01268577 0.55773494 5.54323790
##
## Number of Observations: 1081
## Number of Groups: 105
### Best Model
AIC(complete_pooling, m1_lme, m2_lme, m3_lme, m4_lme)
## df AIC
## complete_pooling 3 10734.35
## m1_lme 4 10708.79
## m2_lme 6 10709.05
## m3_lme 5 10705.76
## m4_lme 6 10707.57
### Centering mathknow
hw8 <- hw8 %>% mutate(cmathknow = mathknow - mean(mathknow))
# refit
m4a_lme <- lme(mathgain ~ mathprep + cmathknow, data = hw8, random = ~1|schoolid, method = "ML")
summary(m4a_lme)
## Linear mixed-effects model fit by maximum likelihood
## Data: hw8
## AIC BIC logLik
## 10705.76 10730.69 -5347.88
##
## Random effects:
## Formula: ~1 | schoolid
## (Intercept) Residual
## StdDev: 10.89427 32.89363
##
## Fixed effects: mathgain ~ mathprep + cmathknow
## Value Std.Error DF t-value p-value
## (Intercept) 51.92004 3.347451 974 15.510321 0.0000
## mathprep 2.30558 1.147323 974 2.009531 0.0448
## cmathknow 2.69722 1.172456 974 2.300488 0.0216
## Correlation:
## (Intr) mthprp
## mathprep -0.892
## cmathknow 0.005 0.005
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -4.28444049 -0.61411294 -0.01586956 0.55005125 5.55520977
##
## Number of Observations: 1081
## Number of Groups: 105
The 1st AIC table shows the best model is randome intercept model as the selection criteria is the lower the AIC, the better the model.
Based on randome intercept model, we added additional explanatory variable teacher’s math knowledge. However, the 2nd AIC table indicates that additional explanatory variable does not yield better model. It could be that teacher’s knowledge is correlated with teachers’ years of math preparation and therefore additional explanatory variable did not cover too much additional variation while additional parameter caused AIC score to increase.
htmlreg(list(m4_lme, m4a_lme), doctype = FALSE)
| Model 1 | Model 2 | ||
|---|---|---|---|
| (Intercept) | 51.85*** | 51.92*** | |
| (3.35) | (3.35) | ||
| mathprep | 2.30* | 2.31* | |
| (1.15) | (1.15) | ||
| mathknow | 1.33 | ||
| (3.35) | |||
| mathprep:mathknow | 0.53 | ||
| (1.21) | |||
| cmathknow | 2.70* | ||
| (1.17) | |||
| AIC | 10707.57 | 10705.76 | |
| BIC | 10737.48 | 10730.69 | |
| Log Likelihood | -5347.78 | -5347.88 | |
| Num. obs. | 1081 | 1081 | |
| Num. groups | 105 | 105 | |
| p < 0.001, p < 0.01, p < 0.05 | |||
I centered the mathknow by substracting its mean and by doing this the intercept is just the mean of the response when mathprep = 0 and mathknow = to it’s mean. So when 0 is out of the range of data, that value is meaningless.But when I center the mathknow so that a value within the dataset becomes 0, the intercept becomes the mean of Y (mathgain) which is reflected as 51.92004. at the value you centered on, in this case the average mean of mathknow. Looking at the above comparison between model 1 and model 2, We can see that a one unit increase in math knowledge causes a 1.33 increase in mathgain for model 1 and that the coefficient of amthknowledge or the interaction between mathprp and mathknowlege which not significant. However for model 2, which uses a centered mathknow and it’s coefficient shows a 2.70 increase in mathgain per unit increase in the new varaible, which is significant.
### Random Slope
m0_lme <- lme(mathgain ~ 1, random = ~ 1|schoolid, data = hw8, method = "ML")
summary(m0_lme)
## Linear mixed-effects model fit by maximum likelihood
## Data: hw8
## AIC BIC logLik
## 10710.94 10725.89 -5352.468
##
## Random effects:
## Formula: ~1 | schoolid
## (Intercept) Residual
## StdDev: 10.47735 33.10434
##
## Fixed effects: mathgain ~ 1
## Value Std.Error DF t-value p-value
## (Intercept) 57.85794 1.484513 976 38.97436 0
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -4.21925890 -0.59881686 -0.02051724 0.54634792 5.65376999
##
## Number of Observations: 1081
## Number of Groups: 105
### Confidence Interval
intervals(m0_lme)
## Approximate 95% confidence intervals
##
## Fixed effects:
## lower est. upper
## (Intercept) 54.94608 57.85794 60.7698
## attr(,"label")
## [1] "Fixed effects:"
##
## Random Effects:
## Level: schoolid
## lower est. upper
## sd((Intercept)) 7.916763 10.47735 13.86613
##
## Within-group standard error:
## lower est. upper
## 31.67205 33.10434 34.60140
In this exercise, I constructed complete-pooling, no-pooling, random intercept and random slope models based on the classroom dataset from DataCamp. I used mathgain as dependent variable and regressed it on mathprep in the complete-pooling method which assumes every school has the same mean value of mathgain score. Although the explanatory variable is significant, the complete-pooling method ignores the variations between schools and better schools tend to hire better teachers.
In no-pooling method, we can clearly see the variations of intercept and slope terms between schools from the histogram. In the random intercept and random slope methods, I allow intercept and slope term to be random variables based on school ID and use AIC to pick the best models among complete-pooling. Random intercept turns out to be the best model and it proves that model performance does improve when I allow each school to have different mean math gain score. In other words this model shows us a more accurate picture of how much the variation of mathgain can be explained by mathprep, which is reflected by the 2.337 increase in mathgain per increased unit of mathprep.
Lastly, I included mathknow term and refit the model. The model performance does not improve based on AIC and the reason might be teacher’s knowledge is positively correlated with teacher’s number of years of math preparation. Therefore, the additional variations captured by adding additional variable do not offset the penalty term in AIC by adding additional explanatory variable of mathknow. This could be because the explainitory variable mathknowlegde tries to explain the same variation of mathgain that is already captured by the othe explainitory varibale of mathprep. We centered the teacher’s mathknow varibale to avaoid a situation where a zero value of math knowlege produces a meaningless mathgain. For example if mathknow is zero, that would mean that the teacher has no experience in math, therefor the average mathgain score of 51.85 coming out of a teacher with no knowledge is highly unlikely in reality.