In-class exercise 4: First download the data file from the language and math example into a data folder. In RStudio, open the R script and compile an HTML report directly, or use the command

> knitr::spin("foo.R", knit=FALSE)

to convert it to an R Markdown (Rmd) file first.

1 Introduction

Ecological correlations are computed from group rates or averages rather than from individual observations. They tend to overstate the strength of the association at the individual level.
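The claim above can be illustrated with a small simulation. This is only a sketch on made-up data, not the langMath file: the number of schools, the pupils per school, and the noise levels are all assumptions chosen for illustration. Each pupil's two scores share a school effect plus independent pupil-level noise, so averaging within schools removes most of the noise and the correlation of the averages comes out higher.

```r
# Simulated illustration (hypothetical data, NOT the langMath file)
set.seed(1)
n_school <- 50                      # assumed number of schools
n_pupil  <- 20                      # assumed pupils per school
school   <- rep(seq_len(n_school), each = n_pupil)

# A shared school effect drives both scores; pupil-level noise is larger
u <- rnorm(n_school)
x <- u[school] + rnorm(n_school * n_pupil, sd = 2)
y <- u[school] + rnorm(n_school * n_pupil, sd = 2)

# Correlation computed on individual pupils
r_pupil <- cor(x, y)

# Ecological correlation: computed on school averages
x_bar <- tapply(x, school, mean)
y_bar <- tapply(y, school, mean)
r_school <- cor(x_bar, y_bar)

round(c(pupil = r_pupil, school = r_school), 2)
```

In this setup the school-level correlation should come out well above the pupil-level one, because the averages retain the shared school effect while most of the individual variation cancels.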

The data set consists of grade 8 pupils (age about 11 years) in elementary schools in the Netherlands. There are 2,287 pupils in 131 schools, and class sizes range from 4 to 35. The question of interest is the correlation between scores on an arithmetic test and a language test.

Source: Snijders, T., & Bosker, R. (1999). Multilevel Analysis.

Data:langMath
R:langMath W2

Column 1: School ID
Column 2: Pupil ID
Column 4: Language test score
Column 5: Arithmetic test score

2 Data management

# data management and graphics package
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## √ ggplot2 3.3.2     √ purrr   0.3.4
## √ tibble  3.0.4     √ dplyr   1.0.2
## √ tidyr   1.1.2     √ stringr 1.4.0
## √ readr   1.3.1     √ forcats 0.5.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
# input data
dta <- read.csv("C:/Users/Ching-Fang Wu/Documents/data/langMath.csv", header=TRUE)
# compute averages by school
dta_a <- dta %>%
        group_by(School) %>% # compute within each school
        summarize(ave_lang = mean(Lang, na.rm=TRUE),
                  ave_arith = mean(Arith, na.rm=TRUE))
## `summarise()` ungrouping output (override with `.groups` argument)
# superimpose two plots: pupil-level scores (light blue) and school averages (dark blue)
ggplot(data=dta, aes(x=Arith, y=Lang)) +
 geom_point(color="skyblue") +
 stat_smooth(method="lm", formula=y ~ x, se=FALSE, color="skyblue") +
 geom_point(data=dta_a, aes(ave_arith, ave_lang), color="steelblue") +
 stat_smooth(data=dta_a, aes(ave_arith, ave_lang),
             method="lm", formula=y ~ x, se=FALSE, color="steelblue") +
 labs(x="Arithmetic score",
      y="Language score") +
 theme_bw()
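The plot compares the two levels visually; the corresponding correlations can also be computed directly. The following is a minimal base R sketch, not part of the original script. The helper name `compare_cor` and the toy data frame are hypothetical, and the column names School, Lang and Arith are assumed to match those used in the code above. Note that the formula interface of `aggregate()` drops rows with any missing score, which differs slightly from the per-column `na.rm=TRUE` used earlier.

```r
# Hypothetical helper: pupil-level vs school-level (ecological) correlation.
# Assumes a data frame with columns School, Lang and Arith.
compare_cor <- function(d) {
  # school averages of both scores (rows with missing scores are dropped)
  means <- aggregate(cbind(Lang, Arith) ~ School, data = d, FUN = mean)
  c(pupil  = cor(d$Lang, d$Arith, use = "complete.obs"),
    school = cor(means$Lang, means$Arith))
}

# Toy data (made up) standing in for langMath.csv
toy <- data.frame(
  School = rep(1:3, each = 3),
  Lang   = c(30, 34, 32, 40, 44, 42, 50, 54, 52),
  Arith  = c(12, 10, 11, 17, 15, 16, 22, 20, 21)
)
res <- compare_cor(toy)
round(res, 3)
```

With the real data loaded, the same helper could be called as `compare_cor(dta)`; the school-level value would be the ecological correlation.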

#THE END