1 Introduction

These data are a subset of the NELS-88 data (National Education Longitudinal Study of 1988). The data set contains students’ scores on a math test and 14 other variables.

2 Data management

In this section, we load the data set, select the variables of interest, and examine the first six rows of the resulting data frame.

# load the data from the package
data(school23, package="influence.ME")
# save a copy with only selected variables
dta <- school23[, c("school.ID", "SES", "mean.SES", "math")]
# show first 6 lines
head(dta)
  school.ID   SES mean.SES math
1      6053  0.85 0.699773   50
2      6053  0.43 0.699773   43
3      6053 -0.59 0.699773   50
4      6053  1.02 0.699773   49
5      6053  0.84 0.699773   62
6      6053  1.32 0.699773   43
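Before modeling, it can help to confirm the two-level structure of the data: how many students each school contributes, and whether mean.SES really takes a single value per school. The following is a minimal sketch on a made-up toy data frame (the `toy` object and all its values are hypothetical, not taken from school23); the same calls could be applied to dta.

```r
# Toy stand-in for dta: the values below are invented for illustration only
toy <- data.frame(
  school.ID = c(1, 1, 1, 2, 2),
  SES       = c(0.2, 0.5, 0.8, -0.4, 0.0),
  mean.SES  = c(0.5, 0.5, 0.5, -0.2, -0.2)
)

# Number of students per school
n_per_school <- table(toy$school.ID)

# mean.SES is a school-level variable, so it should take one value per school
n_distinct <- tapply(toy$mean.SES, toy$school.ID,
                     function(x) length(unique(x)))

# Compare the supplied school mean with the sample mean of SES in each school
sample_mean <- tapply(toy$SES, toy$school.ID, mean)
supplied    <- tapply(toy$mean.SES, toy$school.ID, function(x) x[1])
cbind(n = as.vector(n_per_school), supplied, sample_mean)
```

In real data the supplied school mean need not equal the sample mean of the students included, so the last comparison is a check rather than an identity.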

3 Visualization

We draw a scatter diagram of the math scores against mean school SES and add the regression line.

# ggplot2 provides ggplot(), aes() and the layers used below
library(ggplot2)
ggplot(dta, aes(mean.SES, math))+
  geom_point(alpha=.5)+
  stat_smooth(method='lm', formula=y~x, se=TRUE)+
  labs(x="Mean school SES",
       y="Math score")+
  theme_minimal()

We draw a scatter diagram of the math scores against individual SES and add a separate regression line for each school.

ggplot(dta, aes(SES, math, group=school.ID))+
  geom_point(alpha=.5)+
  # linewidth replaces size for lines in ggplot2 3.4.0 and later
  stat_smooth(method='lm', formula=y~x, se=FALSE,
              col=1, linewidth=rel(.5))+
  labs(x="SES",
       y="Math score")+
  theme_minimal()

4 Models

4.1 Problem

  • Interpret each of the following models

4.1.1 Model 0

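- Math scores vary with socioeconomic status: students in schools with a higher mean SES tend to have higher math scores. A one-unit increase in school mean SES is associated with an increase of about 8.08 points in math score. This model ignores the clustering of students within schools.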

m0 <- lm(math ~ mean.SES, data=dta)
summary(m0)

Call:
lm(formula = math ~ mean.SES, data = dta)

Residuals:
    Min      1Q  Median      3Q     Max 
-23.174  -7.384   0.165   7.577  23.005 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   51.733      0.416     124   <2e-16
mean.SES       8.076      0.671      12   <2e-16

Residual standard error: 9.47 on 517 degrees of freedom
Multiple R-squared:  0.219, Adjusted R-squared:  0.218 
F-statistic:  145 on 1 and 517 DF,  p-value: <2e-16
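To make the coefficients concrete, the short sketch below plugs them into the fitted line by hand. The coefficients are copied from the summary(m0) output and the school mean SES from the head(dta) output; the variable names are illustrative.

```r
# Coefficients copied from the summary(m0) output above
b0 <- 51.733   # intercept: expected math score when mean.SES = 0
b1 <- 8.076    # slope: change in expected math score per unit of mean.SES

# School mean SES of school 6053, taken from the head(dta) output
mean_ses_6053 <- 0.699773

# Predicted math score for a student in that school
pred_6053 <- b0 + b1 * mean_ses_6053
round(pred_6053, 2)
```

With the fitted object available, predict(m0, newdata = data.frame(mean.SES = 0.699773)) should give the same value up to the rounding of the printed coefficients.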

4.1.2 Model 1

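- Model 1 fits a separate regression of math score on school-centered SES for each school. Within most schools the slope is not significantly different from zero, although in a few of the schools shown (6327, 7930, 24371, 24725 and 47583) it is significant at the 0.05 level. The estimated intercepts and slopes differ from school to school.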

m1 <- nlme::lmList(math ~ I(SES - mean.SES) | school.ID, data=dta)
summary(m1)
Call:
  Model: math ~ I(SES - mean.SES) | school.ID 
   Data: dta 

Coefficients:
   (Intercept) 
      Estimate Std. Error t value     Pr(>|t|)
6053   56.3182    1.29799 43.3889 5.19617e-167
6327   57.3750    3.04405 18.8482  1.60537e-59
6467   56.6000    3.85045 14.6996  1.47370e-40
7194   48.4583    1.75748 27.5726 1.73816e-100
7472   45.7391    1.79528 25.4774  9.01320e-91
7474   53.9412    2.08820 25.8314  2.00944e-92
7801   50.0909    1.83563 27.2881  3.54682e-99
7829   42.1500    1.92523 21.8935  6.82718e-74
7930   53.2500    1.75748 30.2990 7.23805e-113
24371  48.3500    1.92523 25.1139  4.51450e-89
24725  43.5455    1.83563 23.7223  1.55935e-82
25456  49.8636    1.83563 27.1643  1.32091e-98
25642  46.4000    1.92523 24.1011  2.56608e-84
26537  56.3750    2.15247 26.1909  4.26485e-94
46417  55.6957    1.79528 31.0233 4.24448e-116
47583  51.0500    1.92523 26.5164  1.31353e-95
54344  40.9474    1.97524 20.7303  2.17688e-68
62821  62.8209    1.05186 59.7234 1.94763e-222
 [ reached getOption("max.print") -- omitted 5 rows ]
   I(SES - mean.SES) 
        Estimate Std. Error    t value   Pr(>|t|)
6053   0.1515565    2.22314  0.0681722 0.94567734
6327  17.8574119    6.10499  2.9250526 0.00360947
6467   7.7373472    5.23465  1.4781028 0.14004584
7194   2.1019533    4.89361  0.4295305 0.66773279
7472   3.7020715    3.63722  1.0178307 0.30927876
7474   5.6797646    3.51626  1.6152861 0.10691555
7801  -1.2092983    3.49518 -0.3459905 0.72950370
7829  -1.0588653    3.11374 -0.3400626 0.73396038
7930   5.4743511    2.19085  2.4987335 0.01280196
24371  6.2254948    2.12279  2.9327005 0.00352332
24725  6.8469661    2.51820  2.7189945 0.00678882
25456  4.7741287    3.47844  1.3724921 0.17056074
25642  0.0793474    2.69810  0.0294087 0.97655107
26537  2.4715001    4.91554  0.5027936 0.61534339
46417  5.0081506    3.10761  1.6115760 0.10772127
47583  7.8025194    2.92725  2.6654820 0.00795080
54344  3.5927497    3.06511  1.1721445 0.24172909
62821 -1.2538648    2.31096 -0.5425741 0.58767859
 [ reached getOption("max.print") -- omitted 5 rows ]

Residual standard error: 8.60987 on 473 degrees of freedom
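The per-school fits can be pictured as one ordinary regression per school. The sketch below reproduces that idea with base R on simulated data (the `sim` data frame and its parameters are invented for illustration, not the school23 data). It also shows where a pooled residual degrees of freedom comes from: the 473 reported above equals 519 observations minus two coefficients for each of the 23 schools, which is consistent with a residual variance pooled across schools.

```r
set.seed(1)
# Simulated stand-in for dta (3 schools, 20 students each); not the real data
sim <- data.frame(school.ID = rep(c("A", "B", "C"), each = 20))
sim$mean.SES <- rep(c(-0.5, 0, 0.5), each = 20)
sim$SES  <- sim$mean.SES + rnorm(60, sd = 0.6)
sim$math <- 50 + 4 * (sim$SES - sim$mean.SES) + rnorm(60, sd = 8)

# One ordinary regression per school
fits <- lapply(split(sim, sim$school.ID),
               function(d) lm(math ~ I(SES - mean.SES), data = d))

# Intercept and slope for each school, one row per school
coefs <- t(sapply(fits, coef))

# Pooled residual degrees of freedom: n minus two coefficients per school
df_pooled <- nrow(sim) - 2 * length(fits)

# Pooled residual standard error across the separate fits
rss <- sum(sapply(fits, function(f) sum(resid(f)^2)))
sigma_pooled <- sqrt(rss / df_pooled)
```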

4.1.3 Model 2

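- For each one-unit increase in a school's mean SES relative to the grand mean across all schools, the expected math score increases by 7.16 points. This slope is assumed to be the same for every school, while the school intercepts vary randomly.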

dta$gm <- mean(dta$mean.SES)
m2 <- lme4::lmer(math ~ I(mean.SES-gm) + (1 | school.ID), data=dta)
print(summary(m2), corr=FALSE)
Linear mixed model fit by REML ['lmerMod']
Formula: math ~ I(mean.SES - gm) + (1 | school.ID)
   Data: dta

REML criterion at convergence: 3779.1

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.6463 -0.7322 -0.0112  0.7252  2.6772 

Random effects:
 Groups    Name        Variance Std.Dev.
 school.ID (Intercept)  9.89    3.15    
 Residual              81.36    9.02    
Number of obs: 519, groups:  school.ID, 23

Fixed effects:
                 Estimate Std. Error t value
(Intercept)        51.480      0.795   64.75
I(mean.SES - gm)    7.163      1.397    5.13
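The random-effects table can be summarized as an intraclass correlation: the share of the variance left after accounting for school mean SES that lies between schools. The sketch below computes it by hand from the printed variance components; the variable names are illustrative.

```r
# Variance components copied from the summary(m2) output above
var_school   <- 9.89    # between-school (random intercept) variance
var_residual <- 81.36   # within-school residual variance

# Share of the remaining variance that lies between schools
icc_m2 <- var_school / (var_school + var_residual)
round(icc_m2, 3)
```

So roughly 11% of the residual variation in math scores is between schools once school mean SES is in the model. With the fitted object available, the same variances can be read from the vcov column of as.data.frame(lme4::VarCorr(m2)).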

4.1.4 Model 3

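- For each one-unit increase in a student's SES relative to the school mean SES, the expected math score increases by 3.88 points. This within-school slope is assumed to be the same for every school, while the school intercepts vary randomly.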

m3 <- lme4::lmer(math ~ I(SES - mean.SES) + (1 | school.ID), data=dta)
print(summary(m3), corr=FALSE)
Linear mixed model fit by REML ['lmerMod']
Formula: math ~ I(SES - mean.SES) + (1 | school.ID)
   Data: dta

REML criterion at convergence: 3758.7

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.6666 -0.7611 -0.0381  0.7621  2.6289 

Random effects:
 Groups    Name        Variance Std.Dev.
 school.ID (Intercept) 26.3     5.13    
 Residual              75.2     8.67    
Number of obs: 519, groups:  school.ID, 23

Fixed effects:
                  Estimate Std. Error t value
(Intercept)          50.76       1.15   44.16
I(SES - mean.SES)     3.88       0.61    6.37
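Written out, Model 3 is a random-intercept model with a school-centered predictor. The notation below is an assumption (it follows common multilevel conventions and does not appear in the original output): i indexes students, j indexes schools.

```latex
\begin{aligned}
\text{math}_{ij} &= \gamma_{00}
  + \gamma_{10}\,\bigl(\text{SES}_{ij} - \overline{\text{SES}}_{j}\bigr)
  + u_{0j} + e_{ij}, \\
u_{0j} &\sim N(0, \tau_{00}), \qquad e_{ij} \sim N(0, \sigma^{2}).
\end{aligned}
```

Matching the symbols to the output above: the fixed effects are estimated as \gamma_{00} = 50.76 and \gamma_{10} = 3.88, and the variance components as \tau_{00} = 26.3 (school.ID intercept) and \sigma^{2} = 75.2 (residual).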

4.1.5 Problem

  • Interpret the following plot.

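  • Math scores vary with SES. Because Model 3 has a common slope and random intercepts, the fitted lines are parallel: within every school, a one-unit difference in SES relative to the school mean corresponds to the same 3.88-point difference in fitted math score. Schools differ in their intercepts, so students with the same SES are expected to score differently depending on the school they attend.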

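# fortify.merMod() is exported by lme4; %>% needs magrittr (or dplyr) attached
library(magrittr)
lme4::fortify.merMod(m3) %>%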
 ggplot() +
  aes(SES, .fitted, group=school.ID)+
  geom_point(aes(SES, math), alpha=.5)+
  # linewidth replaces size for lines in ggplot2 3.4.0 and later
  stat_smooth(method='lm', formula=y~x, se=FALSE,
              col=1, linewidth=rel(.5))+
  labs(x="SES",
       y="Math score")+
  theme_minimal()
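Why the fitted lines in this plot come out parallel can be seen by computing fitted values by hand. The sketch below uses the fixed effects from the summary(m3) output together with two hypothetical schools; the intercept deviations `u` and the school means are invented for illustration and are not estimates from m3.

```r
# Fixed effects copied from the summary(m3) output
g00 <- 50.76   # overall intercept
g10 <- 3.88    # common within-school slope

# Two hypothetical schools: u and mean.SES below are invented values
schools <- data.frame(school   = c("low", "high"),
                      u        = c(-5, 5),
                      mean.SES = c(-0.5, 0.5))

# Fitted math score for a student with a given SES in a given school
fitted_math <- function(ses, u, mean_ses) g00 + u + g10 * (ses - mean_ses)

# Fitted values at SES = 0 and SES = 1 in each school
at0 <- fitted_math(0, schools$u, schools$mean.SES)
at1 <- fitted_math(1, schools$u, schools$mean.SES)

# The change per unit of SES is the same in both schools (parallel lines)
at1 - at0
```

The two schools differ only in the height of their lines, which depends on the random intercept and on the school mean used for centering.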


5 References

Kreft, I., & De Leeuw, J. (1998). Introducing Multilevel Modeling. Sage Publications.