Homework4 For the IQ and language example, plot 131 school-level regression lines of language score against verbal IQ (similar to the figure on the page). Compute the mean and standard deviation of the intercept and slope estimates. What is the correlation coefficient between the two parameter estimates?

1 IQ and language example

The data set is concerned with grade 8 pupils (age about 11 years) in elementary schools in the Netherlands. The number of pupils is 2,287, and the number of schools is 131. Class sizes ranges from 4 to 35. One question of interest is how the language test score depends on the pupil’s intelligence and his/her family’s socio-economic status.

Source: Snijders, T., & Bosker, R. (2002). Multilevel Analysis.

Data:verbalIQ.txt

Column 1: School ID Column 2: Pupil ID Column 3: Verbal IQ Column 4: Language achievement score Column 5: The number of 8th graders in the classroom Column 6: Socioeconomic status score

2 Data management

# input data
dta <- read.table("C:/Users/Ching-Fang Wu/Documents/data/verbalIQ.txt",header=T) 
# inspect data structure
str(dta)
## 'data.frame':    2287 obs. of  6 variables:
##  $ school  : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ pupil   : int  17001 17002 17003 17004 17005 17006 17007 17008 17009 17010 ...
##  $ viq     : num  15 14.5 9.5 11 8 9.5 9.5 13 9.5 11 ...
##  $ language: int  46 45 33 46 20 30 30 57 36 36 ...
##  $ csize   : int  29 29 29 29 29 29 29 29 29 29 ...
##  $ ses     : int  23 10 15 23 10 10 23 10 13 15 ...
head(dta)
##   school pupil  viq language csize ses
## 1      1 17001 15.0       46    29  23
## 2      1 17002 14.5       45    29  10
## 3      1 17003  9.5       33    29  15
## 4      1 17004 11.0       46    29  23
## 5      1 17005  8.0       20    29  10
## 6      1 17006  9.5       30    29  10

3

library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## √ ggplot2 3.3.2     √ purrr   0.3.4
## √ tibble  3.0.4     √ dplyr   1.0.2
## √ tidyr   1.1.2     √ stringr 1.4.0
## √ readr   1.3.1     √ forcats 0.5.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
ggplot(data=dta, aes(x=viq, y=language, group = school)) +
  geom_point() + 
  geom_smooth(method="lm", 
              formula= y ~ x,
              se=F, 
              color="gray", 
              linetype=2, 
              size=rel(.5)) +
    labs(x="Verbal IQ", 
       y="language") +
  theme_bw()

4 Compute the mean and standard deviation

mlang <- with(dta, tapply(language, school, mean))
mviq <- with(dta, tapply(viq, school, mean))
model=lm(formula = mlang ~ mviq, data = dta)
summary(model)
## 
## Call:
## lm(formula = mlang ~ mviq, data = dta)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.4163 -1.7507  0.0461  2.2184  6.8885 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -7.269      3.431  -2.119    0.036 *  
## mviq           4.049      0.292  13.866   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.346 on 129 degrees of freedom
## Multiple R-squared:  0.5985, Adjusted R-squared:  0.5953 
## F-statistic: 192.3 on 1 and 129 DF,  p-value: < 2.2e-16

5 Correlation coefficient

with(dta, cor(viq, language, method = "pearson"))
## [1] 0.6098195

6 Correlation coefficient visualization

library(tidyverse)
library(ggplot2)
library(nlme)
## 
## Attaching package: 'nlme'
## The following object is masked from 'package:dplyr':
## 
##     collapse
library(dplyr)
library(gapminder)
ggplot(data=dta, aes(x=viq, y=language)) +
  geom_point() + 
  geom_smooth(method="lm", 
              formula= y ~ x,
              se=F, 
              level = 0.95,
              color="red", 
              linetype=2, 
              size=rel(.5)) +
    labs(x="Verbal IQ", y="language") +
  theme_bw()

# The end