For the IQ and language example, plot 131 school-level regression lines of language score against verbal IQ (similar to the figure on the page). Compute the mean and standard deviation of the intercept and slope estimates. What is the correlation coefficient between the two parameter estimates?
The data set is concerned with grade 8 pupils (age about 11 years) in elementary schools in the Netherlands. The number of pupils is 2,287, and the number of schools is 131. Class sizes ranges from 4 to 35. One question of interest is how the language test score depends on the pupil’s intelligence and his/her family’s socio-economic status.
Source: Snijders, T., & Bosker, R. (2002). Multilevel Analysis.
Column 1: School ID Column 2: Pupil ID Column 3: Verbal IQ Column 4: Language achievement score Column 5: The number of 8th graders in the classroom Column 6: Socioeconomic status score
# input data
dta <- read.table("C:/Users/Ching-Fang Wu/Documents/lmm/verbalIQ.txt", h=T)
# inspect data structure
str(dta)
'data.frame': 2287 obs. of 6 variables:
$ school : int 1 1 1 1 1 1 1 1 1 1 ...
$ pupil : int 17001 17002 17003 17004 17005 17006 17007 17008 17009 17010 ...
$ viq : num 15 14.5 9.5 11 8 9.5 9.5 13 9.5 11 ...
$ language: int 46 45 33 46 20 30 30 57 36 36 ...
$ csize : int 29 29 29 29 29 29 29 29 29 29 ...
$ ses : int 23 10 15 23 10 10 23 10 13 15 ...
# examine first 6 lines
knitr::kable(head(dta))
school | pupil | viq | language | csize | ses |
---|---|---|---|---|---|
1 | 17001 | 15.0 | 46 | 29 | 23 |
1 | 17002 | 14.5 | 45 | 29 | 10 |
1 | 17003 | 9.5 | 33 | 29 | 15 |
1 | 17004 | 11.0 | 46 | 29 | 23 |
1 | 17005 | 8.0 | 20 | 29 | 10 |
1 | 17006 | 9.5 | 30 | 29 | 10 |
#plot 131 school-level regression lines of language score against verbal IQ
library(tidyverse)
ggplot(data=dta, aes(x=viq, y=language, group = school)) +
geom_point() +
geom_smooth(method="lm",
formula= y ~ x,
se=F,
color="gray",
linetype=2,
size=rel(.5)) +
labs(x="Verbal IQ",
y="language") +
theme_bw()
#Compute the mean and standard deviation of the intercept and slope estimates.
mlang <- with(dta, tapply(language, school, mean))
mviq <- with(dta, tapply(viq, school, mean))
model=lm(formula = mlang ~ mviq, data = dta)
summary(model)
Call:
lm(formula = mlang ~ mviq, data = dta)
Residuals:
Min 1Q Median 3Q Max
-8.416 -1.751 0.046 2.218 6.889
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -7.269 3.431 -2.12 0.036
mviq 4.049 0.292 13.87 <2e-16
Residual standard error: 3.35 on 129 degrees of freedom
Multiple R-squared: 0.598, Adjusted R-squared: 0.595
F-statistic: 192 on 1 and 129 DF, p-value: <2e-16
#Correlation coefficient between language score and verbal IQ
with(dta, cor(viq, language, method = "pearson"))
[1] 0.60982
#Correlation coefficient visualization
library(tidyverse)
library(ggplot2)
library(nlme)
library(dplyr)
#install.packages("gapminder")
library(gapminder)
ggplot(data=dta, aes(x=viq, y=language)) +
geom_point() +
geom_smooth(method="lm",
formula= y ~ x,
se=F,
level = 0.95,
color="red",
linetype=2,
size=rel(.5)) +
labs(x="Verbal IQ", y="language") +
theme_bw()