Homework4 For the IQ and language example, plot 131 school-level regression lines of language score against verbal IQ (similar to the figure on the page). Compute the mean and standard deviation of the intercept and slope estimates. What is the correlation coefficient between the two parameter estimates?
The data set is concerned with grade 8 pupils (age about 11 years) in elementary schools in the Netherlands. The number of pupils is 2,287, and the number of schools is 131. Class sizes ranges from 4 to 35. One question of interest is how the language test score depends on the pupil’s intelligence and his/her family’s socio-economic status.
Source: Snijders, T., & Bosker, R. (2002). Multilevel Analysis.
Data:verbalIQ.txt
Column 1: School ID Column 2: Pupil ID Column 3: Verbal IQ Column 4: Language achievement score Column 5: The number of 8th graders in the classroom Column 6: Socioeconomic status score
# input data
dta <- read.table("C:/Users/Ching-Fang Wu/Documents/data/verbalIQ.txt",header=T)
# inspect data structure
str(dta)
## 'data.frame': 2287 obs. of 6 variables:
## $ school : int 1 1 1 1 1 1 1 1 1 1 ...
## $ pupil : int 17001 17002 17003 17004 17005 17006 17007 17008 17009 17010 ...
## $ viq : num 15 14.5 9.5 11 8 9.5 9.5 13 9.5 11 ...
## $ language: int 46 45 33 46 20 30 30 57 36 36 ...
## $ csize : int 29 29 29 29 29 29 29 29 29 29 ...
## $ ses : int 23 10 15 23 10 10 23 10 13 15 ...
head(dta)
## school pupil viq language csize ses
## 1 1 17001 15.0 46 29 23
## 2 1 17002 14.5 45 29 10
## 3 1 17003 9.5 33 29 15
## 4 1 17004 11.0 46 29 23
## 5 1 17005 8.0 20 29 10
## 6 1 17006 9.5 30 29 10
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## √ ggplot2 3.3.2 √ purrr 0.3.4
## √ tibble 3.0.4 √ dplyr 1.0.2
## √ tidyr 1.1.2 √ stringr 1.4.0
## √ readr 1.3.1 √ forcats 0.5.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
ggplot(data=dta, aes(x=viq, y=language, group = school)) +
geom_point() +
geom_smooth(method="lm",
formula= y ~ x,
se=F,
color="gray",
linetype=2,
size=rel(.5)) +
labs(x="Verbal IQ",
y="language") +
theme_bw()
mlang <- with(dta, tapply(language, school, mean))
mviq <- with(dta, tapply(viq, school, mean))
model=lm(formula = mlang ~ mviq, data = dta)
summary(model)
##
## Call:
## lm(formula = mlang ~ mviq, data = dta)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.4163 -1.7507 0.0461 2.2184 6.8885
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7.269 3.431 -2.119 0.036 *
## mviq 4.049 0.292 13.866 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.346 on 129 degrees of freedom
## Multiple R-squared: 0.5985, Adjusted R-squared: 0.5953
## F-statistic: 192.3 on 1 and 129 DF, p-value: < 2.2e-16
with(dta, cor(viq, language, method = "pearson"))
## [1] 0.6098195
library(tidyverse)
library(ggplot2)
library(nlme)
##
## Attaching package: 'nlme'
## The following object is masked from 'package:dplyr':
##
## collapse
library(dplyr)
library(gapminder)
ggplot(data=dta, aes(x=viq, y=language)) +
geom_point() +
geom_smooth(method="lm",
formula= y ~ x,
se=F,
level = 0.95,
color="red",
linetype=2,
size=rel(.5)) +
labs(x="Verbal IQ", y="language") +
theme_bw()
# The end