1 Data management

In this section, we load the school23 data set from the influence.ME package, open its help page, and examine the first six rows of the data frame.

# load the data from the package
data(school23, package="influence.ME")
# invoke help document
?school23
# view first 6 lines
head(school23)
  school.ID   SES mean.SES         homework                parented ratio
1      6053  0.85 0.699773 Less than 1 hour        College graduate    18
2      6053  0.43 0.699773 Less than 1 hour GT H.S. & LT 4yr degree    18
3      6053 -0.59 0.699773          2 hours GT H.S. & LT 4yr degree    18
4      6053  1.02 0.699773 Less than 1 hour      M.A. or equivalent    18
5      6053  0.84 0.699773 Less than 1 hour      M.A. or equivalent    18
  perc.minor math    sex  race                       school.type structure
1     11-20%   50 Female White Private, no religious affiliation         3
2     11-20%   43 Female White Private, no religious affiliation         3
3     11-20%   50 Female Asian Private, no religious affiliation         3
4     11-20%   49 Female White Private, no religious affiliation         3
5     11-20%   62   Male White Private, no religious affiliation         3
  school.size urban        region
1     400-599 Urban North Central
2     400-599 Urban North Central
3     400-599 Urban North Central
4     400-599 Urban North Central
5     400-599 Urban North Central
 [ reached 'max' / getOption("max.print") -- omitted 1 rows ]
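
Before computing summaries, it can help to confirm how students are nested within schools. The following is a minimal sketch on a small made-up data frame (`toy` and all of its values are hypothetical, used only so the code runs without the package); the same calls should work on school23 with `school.ID` as the grouping variable.

```r
# Hypothetical toy data standing in for school23 (values are made up)
toy <- data.frame(
  school.ID = c(1, 1, 1, 2, 2, 3),
  SES       = c(0.8, 0.4, -0.6, 1.0, 0.8, -0.2),
  math      = c(50, 43, 50, 49, 62, 45)
)
# inspect variable types and dimensions
str(toy)
# count students per school
n_per_school <- table(toy$school.ID)
n_per_school
# count distinct schools
n_schools <- length(unique(toy$school.ID))
n_schools
```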

2 Summary statistics

We first compute the correlation coefficient between SES and math scores using all students in the data set. We then compute the mean SES and the mean math score for each school, and calculate the correlation coefficient between these school means.

with(school23, cor(SES, math))
[1] 0.490656
# compute the means by school
mSES <- with(school23, tapply(SES, school.ID, mean))
mMath <- with(school23, tapply(math, school.ID, mean))
cor(mSES, mMath)
[1] 0.718315
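
The correlation between school means (0.72) is larger than the student-level correlation (0.49); one likely reason is that averaging removes within-school variation. As a minimal sketch on made-up data (the data frame `toy` and its values are hypothetical, not taken from school23), `aggregate()` yields the same school means as `tapply()` but keeps them together in one data frame:

```r
# Hypothetical toy data standing in for school23 (values are made up)
toy <- data.frame(
  school.ID = c(1, 1, 1, 2, 2, 3, 3),
  SES       = c(0.8, 0.4, -0.6, 1.0, 0.8, -0.2, -0.4),
  math      = c(50, 43, 50, 49, 62, 45, 41)
)
# school means via tapply, as in the text
mSES_toy  <- with(toy, tapply(SES, school.ID, mean))
mMath_toy <- with(toy, tapply(math, school.ID, mean))
# the same means via aggregate, returned as one data frame
school_means <- aggregate(cbind(SES, math) ~ school.ID, data = toy, FUN = mean)
# student-level and school-level correlations side by side
r_student <- with(toy, cor(SES, math))
r_school  <- with(school_means, cor(SES, math))
c(student = r_student, school = r_school)
```

Keeping the means in a data frame makes it easier to pass them on to `lm()` or `plot()` with a `data` argument.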

3 Visualization

We draw a scatter diagram of math scores against SES and add the student-level regression line. Next we superimpose the school means of SES and math on the scatter plot (in cyan) and add the regression line based on the school means.

with(school23, plot(math ~ SES, 
                bty = 'n', 
                cex = 0.5,
                xlab = 'SES', 
                ylab = 'Math score'))
grid()
with(school23, abline(lm(math ~ SES)))
points(mSES, mMath, pch=16, col=5)
abline(lm(mMath ~ mSES), col=5)
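
The two lines in the plot come from two separate fits, one on students and one on school means. To compare them numerically, one could extract the slope from each fit. The sketch below does this on made-up data (`toy`, `fit_student`, and `fit_school` are hypothetical names, and the values are not from school23):

```r
# Hypothetical toy data standing in for school23 (values are made up)
toy <- data.frame(
  school.ID = c(1, 1, 1, 2, 2, 3, 3),
  SES       = c(0.8, 0.4, -0.6, 1.0, 0.8, -0.2, -0.4),
  math      = c(50, 43, 50, 49, 62, 45, 41)
)
# student-level regression of math on SES
fit_student <- lm(math ~ SES, data = toy)
# school-level regression on the school means
school_means <- aggregate(cbind(SES, math) ~ school.ID, data = toy, FUN = mean)
fit_school <- lm(math ~ SES, data = school_means)
# collect the two slopes for comparison
slopes <- c(student = unname(coef(fit_student)["SES"]),
            school  = unname(coef(fit_school)["SES"]))
slopes
```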

4 References

Kreft, I., & De Leeuw, J. (1998). Introducing Multilevel Modeling. Sage Publications.