grammar of graphics: exercises 2 and 4
exercise 2
The data set concerns grade 8 pupils (aged about 11 years) in elementary schools in the Netherlands. After deleting pupils with missing values, the number of pupils is 2,287 and the number of schools is 131. Class size ranges from 4 to 35. The response variables are the scores on a language test and on an arithmetic test. The research interest is in how the two test scores depend on the pupil's intelligence (verbal IQ) and on the number of pupils in a school class. Class size is categorized into small, medium, and large, with roughly equal numbers of observations in each category. Verbal IQ is categorized into low, middle, and high, likewise with roughly equal numbers of observations in each category. Reproduce the plot below.
Source: Snijders, T. & Bosker, R. (2002). Multilevel Analysis.
Column 1: School ID
Column 2: Pupil ID
Column 3: Verbal IQ score
Column 4: The number of pupils in a class
Column 5: Language test score
Column 6: Arithmetic test score
…
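The categorization described above (three groups with roughly equal numbers of observations) amounts to cutting a variable at its 1/3 and 2/3 sample quantiles. A minimal sketch on simulated class sizes, where `size`, `cuts`, `bin`, and `level` are illustrative names and the values are made up rather than taken from the study:

```r
set.seed(1)                                  # reproducible toy data
size <- sample(4:35, 300, replace = TRUE)    # hypothetical class sizes, not the real data

# Cut points: -Inf, the 1/3 and 2/3 sample quantiles, +Inf
cuts <- c(-Inf, quantile(size, probs = c(1/3, 2/3)), Inf)

# findInterval() returns 1, 2 or 3 according to the interval each value falls in
bin <- findInterval(size, cuts)
level <- factor(bin, labels = c("Small", "Medium", "Large"))

table(level)                                 # group counts, only roughly equal when values tie
```

Because `findInterval()` uses left-closed intervals, a value equal to a cut point falls in the higher group, so with many tied values the three groups are only approximately equal in size.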
load data and check data structure
dta <- read.table("C:/Users/USER/Desktop/R_data management/0420/langMathDutch.txt", header=TRUE)
str(dta)
## 'data.frame': 2287 obs. of 6 variables:
## $ school: int 1 1 1 1 1 1 1 1 1 1 ...
## $ pupil : int 17001 17002 17003 17004 17005 17006 17007 17008 17009 17010 ...
## $ IQV : num 15 14.5 9.5 11 8 9.5 9.5 13 9.5 11 ...
## $ size : int 29 29 29 29 29 29 29 29 29 29 ...
## $ lang : int 46 45 33 46 20 30 30 57 36 36 ...
## $ arith : int 24 19 24 26 9 13 13 30 23 22 ...
head(dta)
## school pupil IQV size lang arith
## 1 1 17001 15.0 29 46 24
## 2 1 17002 14.5 29 45 19
## 3 1 17003 9.5 29 33 24
## 4 1 17004 11.0 29 46 26
## 5 1 17005 8.0 29 20 9
## 6 1 17006 9.5 29 30 13
label variable by quantile
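# class size: cut at the 1/3 and 2/3 quantiles into three roughly equal groups
dta$level <- with(dta, factor(findInterval(size, c(-Inf,
quantile(size, probs=c(1/3, 2/3)), Inf)),
labels=c("Small","Medium","Large")))
# verbal IQ: same cut points, labelled from low to high
dta$IQ <- with(dta, factor(findInterval(IQV, c(-Inf,
quantile(IQV, probs=c(1/3, 2/3)), Inf)),
labels=c("Low","Middle","High")))
ggplot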
library(ggplot2)
p0<-ggplot(data=dta, aes(lang, arith))+
labs(x="Language score", y="Arithmetic score")+
geom_point(shape=18, size=2)+ # diamond-shaped points
# add a regression line with its SE band; adjust line width and colours
stat_smooth(data=dta, formula=(y~x), method="lm", se=TRUE, fill="#4D4D4D", colour="darkblue", size=0.5, alpha = 0.2)+
facet_wrap(.~level+IQ, labeller=labeller(.multi_line=FALSE))+ # put both facet labels on one line
scale_x_continuous(breaks=seq(10, 50, 10))+ # set the x-axis tick positions
scale_y_continuous(breaks=seq(5, 30, 5)) # set the y-axis tick positions
p0
exercise 4
A sample of 158 children with autism spectrum disorder was recruited. Social development was assessed using the Vineland Adaptive Behavior Interview survey form, a parent-reported measure of socialization. It is a combined score that includes assessments of interpersonal relationships, play/leisure time activities, and coping skills. Initial language development was assessed using the Sequenced Inventory of Communication Development (SICD) scale. These assessments were repeated when the children were 3, 5, 9, and 13 years of age.
Data: autism{WWGbook}
Column 1: Age (in years)
Column 2: Vineland Socialization Age Equivalent score
Column 3: Sequenced Inventory of Communication Development Expressive Group (1 = Low, 2 = Medium, 3 = High)
Column 4: Child ID
…
…
load data and check data set
## age vsae sicdegp childid
## 1 2 6 3 1
## 2 3 7 3 1
## 3 5 18 3 1
## 4 9 25 3 1
## 5 13 27 3 1
## 6 2 17 3 3
## 'data.frame': 612 obs. of 4 variables:
## $ age : int 2 3 5 9 13 2 3 5 9 13 ...
## $ vsae : int 6 7 18 25 27 17 18 12 18 24 ...
## $ sicdegp: int 3 3 3 3 3 3 3 3 3 3 ...
## $ childid: int 1 1 1 1 1 3 3 3 3 3 ...
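The commands that produced the output above are not shown. A minimal sketch of how it could be obtained, assuming the WWGbook package named in the data description is installed and that the data frame is stored as `dta`, the name used in the code that follows. The `requireNamespace()` guard is an addition for illustration and is not part of the original analysis:

```r
# Assumption: the WWGbook package is installed; the original text does not show this step.
if (requireNamespace("WWGbook", quietly = TRUE)) {
  data(autism, package = "WWGbook")   # loads a data frame named `autism`
  dta <- autism                       # `dta` is the name used in the later code
  print(head(dta))                    # first six rows
  str(dta)                            # variable types and dimensions
} else {
  message("Package 'WWGbook' is not installed, so the data were not loaded.")
}
```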
data manipulation
library(plyr) # load plyr before dplyr so that dplyr's summarise() is not masked
library(dplyr)
dta$sic <- factor(dta$sicdegp, levels = c(1,2,3), labels = c("L", "M", "H")) # label the SICD groups
m <- dta %>% summarise(m=mean(age), std=sd(age), me=median(age)) # mean, SD and median of age
dta$center <- dta$age - m$m # center age at its mean
plot 1
library(ggplot2)
p0<-ggplot(data=dta, aes(x=center, y=vsae))+labs(x="Age (in years, centered)", y="VSAE score")
p0+geom_line(aes(group = childid))+ # one trajectory per child
facet_grid(. ~sic)+ # one panel per SICD group
geom_smooth(formula=(y~x), method="lm")+ # linear trend within each panel
scale_x_continuous(limits=c(-4, 7.3), breaks=seq(-2.5, 5.0, 2.5))+
geom_point(size=rel(2), alpha=.5)+ # make the points semi-transparent
theme_bw()
plot 2
dta1<-na.omit(dta)
p<-dta1 %>% mutate(age2=age-2)%>%group_by(sic, age2)%>%
dplyr::summarise(n=n(), m=mean(vsae),se=sd(vsae)/sqrt(n))%>%
ggplot()+
aes(age2, m, group=sic, shape=sic)+
geom_errorbar(aes(ymin=m-se, ymax=m+se), width=.2, size=.3)+
geom_line(aes(linetype=sic), show.legend=T)+
geom_point(size=rel(2), show.legend=T)+
# the legend combines shape and line type, so the title "Group" must be set
# in both scale_shape_manual() and scale_linetype_discrete()
scale_shape_manual(values=c(1, 2, 16), name="Group")+
scale_linetype_discrete(name="Group")+
labs(x="Age (in years - 2)", y="VSAE score")+
theme_bw()+
# legend box: white background with a solid black border, placed inside the panel
theme(legend.background = element_rect(fill="white", size=0.5, linetype="solid",
colour ="black"), legend.position=c(0.08, 0.85),
legend.key = element_rect(fill = "white", colour = "darkgray"))+ # grey-bordered box around each legend key
theme(axis.text.y = element_text(colour="Black"), axis.text.x=element_text(colour="Black"))
p
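The summary step behind plot 2 (group size, mean, and standard error of the mean for each group-by-age cell) can be cross-checked without dplyr. The sketch below uses base R on a small made-up data frame. `toy` and `cell` are hypothetical names, and the values are not from the autism data:

```r
# Toy data: two groups, three ages, two children per cell (values are made up)
toy <- data.frame(
  sic  = rep(c("L", "H"), each = 6),
  age2 = rep(c(0, 1, 3), times = 4),
  vsae = c(5, 8, 12, 7, 10, 14, 9, 15, 25, 11, 17, 31)
)

# One row per group-by-age cell: count, mean and standard error of the mean
cell <- aggregate(
  vsae ~ sic + age2, data = toy,
  FUN = function(v) c(n = length(v), m = mean(v), se = sd(v) / sqrt(length(v)))
)
cell <- do.call(data.frame, cell)   # flatten the matrix column into vsae.n, vsae.m, vsae.se
cell
```

The columns `vsae.m` and `vsae.se` play the roles of `m` and `se` in the plotting code, so `vsae.m - vsae.se` and `vsae.m + vsae.se` give the ends of the error bars.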