1 Problem statement

For the IQ and language example, plot 131 school-level regression lines of language score against verbal IQ (similar to the figure on the page). Compute the mean and standard deviation of the intercept and slope estimates. What is the correlation coefficient between the two parameter estimates?

2 Introduction

The data set is concerned with grade 8 pupils (age about 11 years) in elementary schools in the Netherlands. The number of pupils is 2,287, and the number of schools is 131. Class sizes ranges from 4 to 35. One question of interest is how the language test score depends on the pupil’s intelligence and his/her family’s socio-economic status.

Source: Snijders, T., & Bosker, R. (2002). Multilevel Analysis.

Column 1: School ID Column 2: Pupil ID Column 3: Verbal IQ Column 4: Language achievement score Column 5: The number of 8th graders in the classroom Column 6: Socioeconomic status score

3 Data management

# input data
dta <- read.table("C:/Users/Ching-Fang Wu/Documents/lmm/verbalIQ.txt", h=T)
# inspect data structure
str(dta)
'data.frame':   2287 obs. of  6 variables:
 $ school  : int  1 1 1 1 1 1 1 1 1 1 ...
 $ pupil   : int  17001 17002 17003 17004 17005 17006 17007 17008 17009 17010 ...
 $ viq     : num  15 14.5 9.5 11 8 9.5 9.5 13 9.5 11 ...
 $ language: int  46 45 33 46 20 30 30 57 36 36 ...
 $ csize   : int  29 29 29 29 29 29 29 29 29 29 ...
 $ ses     : int  23 10 15 23 10 10 23 10 13 15 ...
# examine first 6 lines
knitr::kable(head(dta))
school pupil viq language csize ses
1 17001 15.0 46 29 23
1 17002 14.5 45 29 10
1 17003 9.5 33 29 15
1 17004 11.0 46 29 23
1 17005 8.0 20 29 10
1 17006 9.5 30 29 10

4 Visualization

#plot 131 school-level regression lines of language score against verbal IQ
library(tidyverse)
ggplot(data=dta, aes(x=viq, y=language, group = school)) +
  geom_point() + 
  geom_smooth(method="lm", 
              formula= y ~ x,
              se=F, 
              color="gray", 
              linetype=2, 
              size=rel(.5)) +
    labs(x="Verbal IQ", 
       y="language") +
  theme_bw()

5 Analysis

#Compute the mean and standard deviation of the intercept and slope estimates.
mlang <- with(dta, tapply(language, school, mean))
mviq <- with(dta, tapply(viq, school, mean))
model=lm(formula = mlang ~ mviq, data = dta)
summary(model)

Call:
lm(formula = mlang ~ mviq, data = dta)

Residuals:
   Min     1Q Median     3Q    Max 
-8.416 -1.751  0.046  2.218  6.889 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -7.269      3.431   -2.12    0.036
mviq           4.049      0.292   13.87   <2e-16

Residual standard error: 3.35 on 129 degrees of freedom
Multiple R-squared:  0.598, Adjusted R-squared:  0.595 
F-statistic:  192 on 1 and 129 DF,  p-value: <2e-16
#Correlation coefficient between language score and verbal IQ
with(dta, cor(viq, language, method = "pearson"))
[1] 0.60982
#Correlation coefficient visualization
library(tidyverse)
library(ggplot2)
library(nlme)
library(dplyr)
#install.packages("gapminder")
library(gapminder)

ggplot(data=dta, aes(x=viq, y=language)) +
  geom_point() + 
  geom_smooth(method="lm", 
              formula= y ~ x,
              se=F, 
              level = 0.95,
              color="red", 
              linetype=2, 
              size=rel(.5)) +
    labs(x="Verbal IQ", y="language") +
  theme_bw()

6 Results

7 Conslusions

8 The End