1

Fifty male and fifty female students fill out the same questionnaire at weekly intervals, starting five weeks before an important examination, to measure state anxiety. The research questions are: 1. whether there are gender differences in state anxiety; 2. whether there are individual differences in state anxiety.
Reproduce the two plots below.

Source: Von Eye, A., & Schuster C. (1998). Regression Analysis for Social Sciences. San Diego: Academic Press.

Column 1: Anxiety score 5 weeks before exam for females
Column 2: Anxiety score 4 weeks before exam for females
Column 3: Anxiety score 3 weeks before exam for females
Column 4: Anxiety score 2 weeks before exam for females
Column 5: Anxiety score 1 week before exam for females
Column 6: Anxiety score 5 weeks before exam for males
Column 7: Anxiety score 4 weeks before exam for males
Column 8: Anxiety score 3 weeks before exam for males
Column 9: Anxiety score 2 weeks before exam for males
Column 10: Anxiety score 1 week before exam for males

library(ggplot2)
dir<-'http://titan.ccunix.ccu.edu.tw/~psycfs/dataM/Data/stateAnxiety.txt'
dta<-as.matrix(read.table(dir,header=TRUE))
# Wide to long: as.numeric() flattens the 50 x 10 matrix column by column
dtaa<-data.frame(anxiety=as.numeric(dta),
  sex=rep(c("Female","Male"),each=250),
  week=rep(-5:-1,times=2,each=50),
  id=c(rep(1:50,times=5),rep(51:100,times=5)))
# p1: one line per student; p2: points with a linear fit per sex
p1<-ggplot(data=dtaa,aes(x=week,y=anxiety,colour=sex,group=id))+geom_line()
p2<-ggplot(data=dtaa,aes(x=week,y=anxiety,colour=sex))+geom_point()+stat_smooth(method = "lm")
gridExtra::grid.arrange(p1, p2, ncol = 2)
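
The reshaping step above relies on R storing matrices column by column: as.numeric() stacks the columns one after another, so the label vectors built with rep() have to follow the same order. A minimal sketch on a made-up layout (the `wide` matrix and its values are hypothetical, not the anxiety data) shows how the pieces line up:

```r
# Toy wide matrix: 2 subjects per group, 2 weeks per group (hypothetical values).
# Columns: female week -2, female week -1, male week -2, male week -1.
wide <- matrix(c(11, 12,    # female, week -2
                 13, 14,    # female, week -1
                 21, 22,    # male,   week -2
                 23, 24),   # male,   week -1
               nrow = 2)

n     <- nrow(wide)         # subjects per group
weeks <- c(-2, -1)

long <- data.frame(
  anxiety = as.numeric(wide),                           # columns stacked top to bottom
  sex     = rep(c("Female", "Male"), each = n * length(weeks)),
  week    = rep(weeks, times = 2, each = n),            # "each" is applied before "times"
  id      = c(rep(1:n, times = length(weeks)),          # females get ids 1..n
              rep((n + 1):(2 * n), times = length(weeks)))  # males get ids n+1..2n
)
print(long)
```

Printing the small table makes it easy to confirm by eye that every score sits next to the right sex, week, and id before scaling the same pattern up to 50 subjects and 5 weeks.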

2

The data set is concerned with grade 8 pupils (age about 11 years) in elementary schools in the Netherlands. After deleting pupils with missing values, the number of pupils is 2,287 and the number of schools is 131. Class size ranges from 4 to 35. The response variables are the scores on a language test and on an arithmetic test. The research interest is in how the two test scores depend on the pupil’s intelligence (verbal IQ) and on the number of pupils in a school class.
The class size is categorized into small, medium, and large, with roughly equal numbers of observations in each category. The verbal IQ is categorized into low, middle, and high, with roughly equal numbers of observations in each category. Reproduce the plot below.

Source: Snijders, T. & Bosker, R. (2002). Multilevel Analysis.

Column 1: School ID
Column 2: Pupil ID
Column 3: Verbal IQ score
Column 4: The number of pupils in a class
Column 5: Language test score
Column 6: Arithmetic test score

dir<-'http://titan.ccunix.ccu.edu.tw/~psycfs/dataM/Data/langMathDutch.txt'
dta<-read.table(dir,header=TRUE)
n<-nrow(dta)
# Tertile cut points: values at the 1/3 and 2/3 positions of the sorted data
c1<-sort(dta$size)[floor(n/3)]
c2<-sort(dta$size)[floor(2*n/3)]
dta$size<-cut(dta$size,breaks=c(-Inf,c1,c2,Inf),labels=c("small", "medium","large"))
c3<-sort(dta$IQ)[floor(n/3)]
c4<-sort(dta$IQ)[floor(2*n/3)]
dta$IQ<-cut(dta$IQ,breaks=c(-Inf,c3,c4,Inf),labels=c("low", "middle","high"))
dta$size_IQ<-as.factor(paste(dta$size,dta$IQ,sep=","))

# One panel per size-by-IQ combination, each with a linear fit and its confidence band
ggplot(data = dta, aes(x =lang, y = arith))+
  geom_point(pch=18) +
  stat_smooth(method = "lm", se = TRUE) +
  facet_wrap(size~IQ)+
  labs(x = "Language score", y = "Arithmetic score")
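
The three-way split can also be written with quantile(), which makes the cut points explicit. The following is only a sketch on simulated scores (the vector `x` is made up and only base R is used), not the code that produced the plot:

```r
set.seed(1)
x <- rnorm(300, mean = 100, sd = 15)   # hypothetical scores

# Cut points at the 1/3 and 2/3 quantiles
cuts <- quantile(x, probs = c(1/3, 2/3))

# Same break layout as above: (-Inf, c1], (c1, c2], (c2, Inf)
grp <- cut(x, breaks = c(-Inf, cuts, Inf),
           labels = c("low", "middle", "high"))

table(grp)   # group sizes should be close to equal
```

With heavily tied values, such as class sizes shared by every pupil in a class, the three groups can be noticeably less balanced than in this continuous example.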

3

The data come from a sample of pupils whose mathematics attainment was tested at the end of Year 2 (age seven). The coverage of the mathematics curriculum each pupil received during Year 2 was measured, as was the mathematics attainment at the beginning of the year. There are 39 pupils in the study.
Draw a graph of the mean test scores of Year 1 and Year 2 for the data set. Include the 95% confidence intervals for the means.
Source: Plewis, I. (1997). Statistics in Education.
Column 1: Mathematics attainment, end of Year 2
Column 2: Mathematics attainment, end of Year 1
Column 3: Curriculum coverage

dir<-'http://titan.ccunix.ccu.edu.tw/~psycfs/dataM/Data/mathAttainment.txt'
dta<-as.matrix(read.table(dir,header=TRUE))
# Stack the Year 2 and Year 1 scores (columns 1 and 2) into long format
dtaa<-data.frame(score=c(dta[,1:2]),
  year=rep(c("Year2","Year1"),each=nrow(dta)),
  id=rep(1:nrow(dta),2))
ggplot(data=dtaa,aes(x=year,y=score))+
  # fun replaces the deprecated fun.y argument (ggplot2 >= 3.3.0)
  stat_summary(fun = mean, geom = "point", size =4) +
  # mean +/- 1.96 standard errors: a normal-approximation 95% interval
  stat_summary(fun.data = mean_se , fun.args=list(mult=1.96), geom = "pointrange", size = 1)+
  labs(y= "Mean math score", x= "Year")
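
The interval drawn by mean_se with mult=1.96 is the mean plus or minus 1.96 standard errors, which is a normal approximation. As a rough check, the sketch below uses made-up scores (the vector `score` is hypothetical) to compute that interval by hand, together with the t-based interval for comparison:

```r
set.seed(2)
score <- rnorm(39, mean = 25, sd = 5)   # hypothetical scores, n = 39 as in the study

n  <- length(score)
m  <- mean(score)
se <- sd(score) / sqrt(n)               # standard error of the mean

ci_normal <- m + c(-1, 1) * 1.96 * se               # normal approximation
ci_t      <- m + c(-1, 1) * qt(0.975, n - 1) * se   # t distribution, n - 1 df

rbind(ci_normal, ci_t)
```

For a sample of this size the t-based interval is slightly wider, because the 0.975 quantile of the t distribution with 38 degrees of freedom exceeds 1.96.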

5

Use the built-in data set USPersonalExpenditure in R for this problem. This data set consists of United States personal expenditures (in billions of dollars) in the following categories: food and tobacco, household operation, medical and health, personal care, and private education, for the years 1940, 1945, 1950, 1955 and 1960. Plot the US personal expenditure data in the style of the third plot in the “Time Use” case study. You might want to transform the dollar amounts to log base 10 units first.

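library(ggplot2)
# Long format: one row per category-by-year cell
dta<-reshape2::melt(USPersonalExpenditure)
colnames(dta)<-c("item","year","PE")
# Billions of dollars -> dollars -> log base 10
dta$PE<-log10(dta$PE*10^9)
dta$year<-as.factor(dta$year)
# Segments run from the baseline at 10 (10 billion dollars) to each value
ggplot(data=dta,aes(x=PE,y=year))+
  geom_point()+
  facet_grid(~item)+
  geom_segment(aes(x=10,xend=PE,y=year, yend = year))+
  geom_vline(xintercept = 10, colour = "grey80")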
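
For readers without reshape2, as.data.frame(as.table(...)) gives a comparable long layout in base R. The sketch below assumes the same column names as above and adds a separate `logPE` column (a name chosen here for illustration) so the raw amounts stay available:

```r
# Built-in matrix: rows are categories, columns are years
long <- as.data.frame(as.table(USPersonalExpenditure),
                      stringsAsFactors = FALSE)
names(long) <- c("item", "year", "PE")

# Billions of dollars -> dollars -> log base 10
long$logPE <- log10(long$PE * 1e9)

# 10 on the log scale corresponds to 10 billion dollars
head(long[order(long$logPE), ], 3)
```

Sorting by `logPE` shows which category-year cells fall furthest below the baseline of 10 used in the plot.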

6

The data below give a cross-classification of 205 married persons by height.
Plot the data table.
Source: Yule, G. (1900). On the association of attributes in statistics: with illustrations from the material of the Childhood Society. Philos. Trans. Roy. Soc. Ser. A, 194, 257-319.

library(ggplot2)
# Husband (rows) by wife (columns) counts, filled column by column; the counts total 205
dta<-matrix(c(18,20,12,28,51,25,14,28,9),3,3)
colnames(dta)<-c("tall","medium","short")
rownames(dta)<-c("tall","medium","short")
# Long format: one row per husband-by-wife cell
dta.long<-reshape2::melt(dta)
colnames(dta.long)<-c('husband','wife','count')
ggplot(data=dta.long,aes(x=husband,y=count,fill=wife))+
  geom_bar(stat = "identity", position = "dodge", width = .75) +
  scale_fill_manual(values = c("gray70", "gray60", "gray50")) +
  theme_bw()

8

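library(ggplot2)
dir<-'http://titan.ccunix.ccu.edu.tw/~psycfs/dataM/Data/hs0.txt'
dta<-read.table(dir,header=TRUE)
# One tile per math score, stacked in a single column and coloured by score
ggplot(data=dta,aes(x=0,y=math,fill=math)) +
  geom_tile() +
  # The only break lies outside the limits, so no x-axis ticks are drawn
  scale_x_continuous(limits=c(-1,1),breaks=2)+
  # Diverging palette centred on the mean math score
  scale_fill_gradient2(low = 'blue4', mid = 'white', high = 'green', midpoint = mean(dta$math))+
  labs(x ="",y = "Math score") +
  theme_minimal()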

9

In a national study of 15- and 16-year-old adolescents, the event of interest is ever having had sexual intercourse.
Turn the data table into the following plot.
Source: Morgan, S.P., & Teachman, J.D. (1988). Journal of Marriage & Family, 50, 929-936.

library(ggplot2)
library(data.table)
# Counts by gender (rows) and response (columns); rows 1-2 are white, rows 3-4 are black
dta<-matrix(c(43,26,29,22,134,149,23,36),4,dimnames=list(c("Male","Female","Male","Female"),c("Yes","No")))
dtaa<-cbind(reshape2::melt(dta),as.data.frame(rep(c("white","black"),each=2,times=2)))
names(dtaa)<-c('Gender','Intercourse','Count','Race')
# Set the level order explicitly: levels() returns NULL for character columns (the default in R >= 4.0)
dtaa$Gender<-factor(dtaa$Gender,levels=c("Female","Male"))
dtaa$Race<-factor(dtaa$Race,levels=c("white","black"))
dtaa<-data.table(dtaa)
# Percentage of each response within every gender-by-race group
dtaa[, total := sum(Count), by=list(Gender,Race)]
dtaa[, percent := Count/total*100]
ggplot(aes(x=Race,y=percent,fill=Gender),data=dtaa[dtaa$Intercourse=="Yes",])+
  geom_bar(position = "dodge",stat="identity")+
  scale_fill_manual(values = c("black","gray")) +
  coord_flip() +
  theme_bw()
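
The two data.table steps compute, within each gender-by-race group, the share of respondents in each response category. The same numbers can be cross-checked in base R with prop.table(). In this sketch the counts are those given above, while the combined row labels are added here purely for illustration:

```r
counts <- matrix(c(43, 26, 29, 22, 134, 149, 23, 36), nrow = 4,
                 dimnames = list(c("white Male", "white Female",
                                   "black Male", "black Female"),
                                 c("Yes", "No")))

# Row-wise proportions: each row (one gender-by-race group) sums to 100
pct <- prop.table(counts, margin = 1) * 100

round(pct[, "Yes"], 1)
```

The "Yes" column of `pct` should match the `percent` values plotted in the bar chart.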

HW7_ggplot2

U_76031019
