Homework4 For the IQ and language example, plot 131 school-level regression lines of language score against verbal IQ (similar to the figure on the page). Compute the mean and standard deviation of the intercept and slope estimates. What is the correlation coefficient between the two parameter estimates?
The data set is concerned with grade 8 pupils (age about 11 years) in elementary schools in the Netherlands. The number of pupils is 2,287, and the number of schools is 131. Class sizes ranges from 4 to 35. One question of interest is how the language test score depends on the pupil’s intelligence and his/her family’s socio-economic status.
Source: Snijders, T., & Bosker, R. (2002). Multilevel Analysis.
Data:verbalIQ.txt
Column 1: School ID Column 2: Pupil ID Column 3: Verbal IQ Column 4: Language achievement score Column 5: The number of 8th graders in the classroom Column 6: Socioeconomic status score
# input data
dta <- read.table("C:/Users/Ching-Fang Wu/Documents/data/verbalIQ.txt",header=T) # inspect data structure
str(dta)## 'data.frame':    2287 obs. of  6 variables:
##  $ school  : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ pupil   : int  17001 17002 17003 17004 17005 17006 17007 17008 17009 17010 ...
##  $ viq     : num  15 14.5 9.5 11 8 9.5 9.5 13 9.5 11 ...
##  $ language: int  46 45 33 46 20 30 30 57 36 36 ...
##  $ csize   : int  29 29 29 29 29 29 29 29 29 29 ...
##  $ ses     : int  23 10 15 23 10 10 23 10 13 15 ...head(dta)##   school pupil  viq language csize ses
## 1      1 17001 15.0       46    29  23
## 2      1 17002 14.5       45    29  10
## 3      1 17003  9.5       33    29  15
## 4      1 17004 11.0       46    29  23
## 5      1 17005  8.0       20    29  10
## 6      1 17006  9.5       30    29  10library(tidyverse)## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --## √ ggplot2 3.3.2     √ purrr   0.3.4
## √ tibble  3.0.4     √ dplyr   1.0.2
## √ tidyr   1.1.2     √ stringr 1.4.0
## √ readr   1.3.1     √ forcats 0.5.0## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()ggplot(data=dta, aes(x=viq, y=language, group = school)) +
  geom_point() + 
  geom_smooth(method="lm", 
              formula= y ~ x,
              se=F, 
              color="gray", 
              linetype=2, 
              size=rel(.5)) +
    labs(x="Verbal IQ", 
       y="language") +
  theme_bw()mlang <- with(dta, tapply(language, school, mean))
mviq <- with(dta, tapply(viq, school, mean))
model=lm(formula = mlang ~ mviq, data = dta)
summary(model)## 
## Call:
## lm(formula = mlang ~ mviq, data = dta)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.4163 -1.7507  0.0461  2.2184  6.8885 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -7.269      3.431  -2.119    0.036 *  
## mviq           4.049      0.292  13.866   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.346 on 129 degrees of freedom
## Multiple R-squared:  0.5985, Adjusted R-squared:  0.5953 
## F-statistic: 192.3 on 1 and 129 DF,  p-value: < 2.2e-16with(dta, cor(viq, language, method = "pearson"))## [1] 0.6098195library(tidyverse)
library(ggplot2)
library(nlme)## 
## Attaching package: 'nlme'## The following object is masked from 'package:dplyr':
## 
##     collapselibrary(dplyr)
library(gapminder)ggplot(data=dta, aes(x=viq, y=language)) +
  geom_point() + 
  geom_smooth(method="lm", 
              formula= y ~ x,
              se=F, 
              level = 0.95,
              color="red", 
              linetype=2, 
              size=rel(.5)) +
    labs(x="Verbal IQ", y="language") +
  theme_bw() # The end