In-class exercise 4: First download the data file from the language and math example to a data folder. In RStudio, open the R script and compile an HTML report directly, or use the command > knitr::spin("foo.R", knit=FALSE)
to convert it to an R Markdown (.Rmd) file first.
Ecological correlations are computed from group-level rates or averages rather than from individual observations. They tend to overstate the strength of the association at the individual level.
The data set consists of grade 8 pupils (age about 11 years) in elementary schools in the Netherlands. There are 2,287 pupils in 131 schools, with class sizes ranging from 4 to 35. The question of interest is the correlation between scores on an arithmetic test and scores on a language test.
Source: Snijders, T., & Bosker, R. (1999). Multilevel Analysis.
Data:langMath
R:langMath W2
Column 1: School ID
Column 2: Pupil ID
Column 4: Language test score
Column 5: Arithmetic test score
# data management and graphics package
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## √ ggplot2 3.3.2 √ purrr 0.3.4
## √ tibble 3.0.4 √ dplyr 1.0.2
## √ tidyr 1.1.2 √ stringr 1.4.0
## √ readr 1.3.1 √ forcats 0.5.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
# input data
dta <- read.csv("C:/Users/Ching-Fang Wu/Documents/data/langMath.csv", header = TRUE)
# compute averages by school
dta_a <- dta %>%
  group_by(School) %>%  # compute within each school
  summarize(ave_lang = mean(Lang, na.rm = TRUE),
            ave_arith = mean(Arith, na.rm = TRUE))
## `summarise()` ungrouping output (override with `.groups` argument)
# superimpose two plots
ggplot(data = dta, aes(x = Arith, y = Lang)) +
  # pupil-level scores and regression line
  geom_point(color = "skyblue") +
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "skyblue") +
  # school-level averages and regression line
  geom_point(data = dta_a, aes(ave_arith, ave_lang), color = "steelblue") +
  stat_smooth(data = dta_a, aes(ave_arith, ave_lang),
              method = "lm", formula = y ~ x, se = FALSE, color = "steelblue") +
  labs(x = "Arithmetic score",
       y = "Language score") +
  theme_bw()
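The plot compares the two regression lines visually, but the question of interest is the correlation itself. A minimal sketch of how the pupil-level and school-level (ecological) correlations could be put side by side is given here. The helper name compare_cor is hypothetical and not part of the original script; it assumes a data frame with the columns School, Lang, and Arith used above, and it relies on base R only.

```r
# Hypothetical helper (not in the original script): returns the
# pupil-level and school-level correlations between Arith and Lang.
# Assumes columns named School, Lang, and Arith.
compare_cor <- function(d) {
  # correlation across individual pupils with both scores observed
  r_pupil <- cor(d$Arith, d$Lang, use = "complete.obs")

  # school averages, the same aggregation used to build dta_a
  m_lang  <- as.vector(tapply(d$Lang,  d$School, mean, na.rm = TRUE))
  m_arith <- as.vector(tapply(d$Arith, d$School, mean, na.rm = TRUE))

  # ecological correlation across school averages
  r_school <- cor(m_arith, m_lang, use = "complete.obs")

  c(pupil = r_pupil, school = r_school)
}
```

Calling compare_cor(dta) after the data have been read in would return both values in one named vector. If the ecological correlation overstates the association, the school value should come out larger than the pupil value.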
#THE END