DataM: Inclass Exercise 0413: Trellis 2-4
In-class exercise 2.
Create a new student-teacher ratio variable from the enrltot and teachers variables in the data set Caschool{Ecdat}, then generate the following plot, in which the reading scores (readscr) of districts whose grade span (grspan) equals “KK-08” are split into three levels: lower-third, middle-third, and upper-third:
Target output
[Solution and Answer]
Load in the data set
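The loading code itself is not shown here. A minimal sketch, assuming the Ecdat package is installed, that should give a str() listing like the one that follows:

```r
# Assumption: the Ecdat package is installed; it ships the Caschool data set.
data(Caschool, package = "Ecdat")

# Compact overview: one line per variable with its type and first values.
str(Caschool)
```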
'data.frame': 420 obs. of 17 variables:
$ distcod : int 75119 61499 61549 61457 61523 62042 68536 63834 62331 67306 ...
$ county : Factor w/ 45 levels "Alameda","Butte",..: 1 2 2 2 2 6 29 11 6 25 ...
$ district: Factor w/ 409 levels "Ackerman Elementary",..: 362 214 367 132 270 53 152 383 263 94 ...
$ grspan : Factor w/ 2 levels "KK-06","KK-08": 2 2 2 2 2 2 2 2 2 1 ...
$ enrltot : int 195 240 1550 243 1335 137 195 888 379 2247 ...
$ teachers: num 10.9 11.1 82.9 14 71.5 ...
$ calwpct : num 0.51 15.42 55.03 36.48 33.11 ...
$ mealpct : num 2.04 47.92 76.32 77.05 78.43 ...
$ computer: int 67 101 169 85 171 25 28 66 35 0 ...
$ testscr : num 691 661 644 648 641 ...
$ compstu : num 0.344 0.421 0.109 0.35 0.128 ...
$ expnstu : num 6385 5099 5502 7102 5236 ...
$ str : num 17.9 21.5 18.7 17.4 18.7 ...
$ avginc : num 22.69 9.82 8.98 8.98 9.08 ...
$ elpct : num 0 4.58 30 0 13.86 ...
$ readscr : num 692 660 636 652 642 ...
$ mathscr : num 690 662 651 644 640 ...
Use help(Caschool) to find the meaning of each variable in Caschool:
- distcod: district code
- county: county
- district: district
- grspan: grade span of district
- enrltot: total enrollment
- teachers: number of teachers
- calwpct: percent qualifying for CalWorks
- mealpct: percent qualifying for reduced-price lunch
- computer: number of computers
- testscr: average test score (read.scr+math.scr)/2
- compstu: computers per student
- expnstu: expenditure per student
- str: student teacher ratio
- avginc: district average income
- elpct: percent of English learners
- readscr: average reading score
- mathscr: average math score
Create a new variable ratio_st: the ratio of the number of students (enrltot) to the number of teachers (teachers)
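library(dplyr)  # provides %>%, filter(), select(), and mutate()
# Keep the KK-08 districts, then compute the student-teacher ratio.
dta2 <- Caschool %>% dplyr::filter(grspan == 'KK-08') %>%
  dplyr::select(enrltot, teachers, readscr) %>%
  mutate(ratio_st = enrltot / teachers)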
summary(dta2$ratio_st)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  14.00   18.67   19.78   19.71   20.89   25.80
Group data into ‘L’, ‘M’, and ‘H’ by reading scores (readscr)
# Find the 33rd and the 67th percentiles of readscr first.
percentile_readscr <- quantile(dta2$readscr, probs = seq(0, 1, .01))
# Group: element k + 1 of percentile_readscr is the k-th percentile (element 1 is 0%).
dta2 <- dta2 %>% mutate(readscr_group = cut(readscr, ordered_result = TRUE,
  breaks = c(min(readscr) - 1, percentile_readscr[33 + 1], percentile_readscr[67 + 1], max(readscr) + 1),
  labels = c('L', 'M', 'H')))
summary(dta2$readscr_group)
  L   M   H
119 121 119
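The plotting step for this exercise is not shown, and the target figure is not available here. The following is only a sketch of one way the grouped data could be drawn with lattice, assuming the Ecdat and lattice packages are installed. The names kk08, cuts, and p are illustrative, and the scatterplot-per-level layout is an assumption about what the target plot looks like. It cuts at exact thirds (1/3 and 2/3) rather than at the 33rd and 67th percentiles, so the group counts may differ slightly from those above.

```r
library(lattice)

# Assumption: the Ecdat package is installed; it ships the Caschool data set.
data(Caschool, package = "Ecdat")

# Keep the KK-08 districts and compute the student-teacher ratio.
kk08 <- subset(Caschool, grspan == "KK-08",
               select = c(enrltot, teachers, readscr))
kk08$ratio_st <- kk08$enrltot / kk08$teachers

# Tertile cut points of readscr; include.lowest keeps the minimum in "L".
cuts <- quantile(kk08$readscr, probs = c(0, 1/3, 2/3, 1))
kk08$readscr_group <- cut(kk08$readscr, breaks = cuts,
                          labels = c("L", "M", "H"),
                          include.lowest = TRUE, ordered_result = TRUE)

# One panel per reading-score level: points, regression line, and grid.
p <- xyplot(readscr ~ ratio_st | readscr_group, data = kk08,
            type = c("p", "r", "g"), layout = c(3, 1),
            xlab = "Student-teacher ratio",
            ylab = "Average reading score")
print(p)
```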
In-class exercise 3.
The data set concerns student evaluation of instructor’s beauty and teaching quality for several courses at the University of Texas. The teaching evaluations were done at the end of the semester, and the beauty judgments were made later, by six students who had not attended the classes and were not aware of the course evaluations.
Source: Hamermesh, D.S., & Parker, A.M. (2005). Beauty in the classroom: instructor’s pulchritude and putative pedagogical productivity. Economics of Education Review, 24, 369-376. Reported in Gelman, A., & Hill, J. (2006). Data analysis using regression and hierarchical/multilevel models. p. 51.
- Column 1: Course evaluation score
- Column 2: Beauty score
- Column 3: Gender of professor, 1 = Female, 0 = Male
- Column 4: Professor age in years
- Column 5: Minority status of professor, 1 = Minority, 0 = Others
- Column 6: Tenure status of professor, 1 = Tenured, 0 = No
- Column 7: Course ID
Target output
Use the lattice package to produce the plot above. Hint: Use ‘reorder’ after obtaining regression coefficients to rearrange conditioning panels.
[Solution and Answer]
Load in the data set
'data.frame': 463 obs. of 7 variables:
$ eval : num 4.3 4.5 3.7 4.3 4.4 4.2 4 3.4 4.5 3.9 ...
$ beauty : num 0.202 -0.826 -0.66 -0.766 1.421 ...
$ sex : int 1 0 0 1 1 0 1 1 1 0 ...
$ age : int 36 59 51 40 31 62 33 51 33 47 ...
$ minority: int 1 0 0 0 0 0 0 0 0 0 ...
$ tenure : int 0 1 1 1 0 1 0 1 0 0 ...
$ courseID: int 3 0 4 2 0 0 4 0 0 4 ...
Plot
- x: beauty{dta3}, y: eval{dta3}. Convert courseID{dta3} to a factor and use it as the conditioning variable.
- Set the physical aspect ratio of the panels.
- Set the layout to six columns and six rows.
- Plot points, regression lines, and grid lines together.
- Label the axes.
- Sort the panels by the regression coefficient (slope) obtained when eval{dta3} is regressed on beauty{dta3} within each course.
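The panel-ordering step can also be handled with reorder, as the exercise hint suggests. Since the data file for dta3 is not included here, the following is only a sketch on simulated data: the data frame toy, its three courses, and the chosen slopes are all made up for illustration. It fits one regression per course, then reorders the factor levels by slope so that lattice draws the panels in that order.

```r
library(lattice)

# Simulated stand-in for dta3 (hypothetical data, not the real evaluations).
set.seed(1)
toy <- data.frame(course = rep(c("a", "b", "c"), each = 20),
                  beauty = rnorm(60))
true_slope <- c(a = 0.5, b = -0.3, c = 0.1)
toy$eval <- 4 + unname(true_slope[toy$course]) * toy$beauty +
  rnorm(60, sd = 0.05)

# Slope of eval on beauty, fitted separately within each course.
slopes <- sapply(split(toy, toy$course),
                 function(d) coef(lm(eval ~ beauty, data = d))[["beauty"]])

# reorder() sorts the factor levels by the (ascending) per-course slope.
toy$course_ord <- reorder(factor(toy$course), slopes[toy$course])
levels(toy$course_ord)

# Panels now follow the slope order instead of alphabetical order.
p <- xyplot(eval ~ beauty | course_ord, data = toy,
            type = c("p", "r", "g"), layout = c(3, 1))
print(p)
```

Passing a function to index.cond gets the same effect without a separate fitting step; lattice sorts the levels so that the returned values are in ascending order.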
xyplot(eval ~ beauty | factor(courseID), data = dta3, #1
aspect = 0.7, #2
layout = c(6, 6), #3
type = c('p', 'r', 'g'), #4
xlab = 'Beauty judgement score', #5
ylab = 'Average course evaluation score', #5
index.cond = function(x, y) coefficients(lm(y ~ x))[2], #6
col = blues9[8]
)
In-class exercise 4.
A sample of 40 psychology students at a large southwestern university took four subtests (Vocabulary, Similarities, Block Design, and Picture Completion) of the Wechsler (1981) Adult Intelligence Scale-Revised. The researchers also used Magnetic Resonance Imaging (MRI) to determine the brain size of the subjects.
Source: Willerman, L., Schultz, R., Rutledge, J.N., & Bigler, E. (1991), In Vivo Brain Size and Intelligence, Intelligence, 15, 223-228.
- Column 1: Subject ID
- Column 2: Gender ID
- Column 3: Full scale IQ
- Column 4: Verbal IQ
- Column 5: Performance IQ
- Column 6: Body weight in pounds
- Column 7: Height in inches
- Column 8: Total pixel counts from 18 MRI scans
Use appropriate lattice graphics to answer the following questions.
- Are there gender differences in the three IQ scores?
- Is the relationship between height and weight gender dependent?
- Is the relationship between IQ and brain size (as measured by MRICount) gender dependent?
[Solution and Answer]
Load the data set
'data.frame': 40 obs. of 8 variables:
$ Sbj : int 1 2 3 4 5 6 7 8 9 10 ...
$ Gender : Factor w/ 2 levels "Female","Male": 1 2 2 2 1 1 1 1 2 2 ...
$ FSIQ : int 133 140 139 133 137 99 138 92 89 133 ...
$ VIQ : int 132 150 123 129 132 90 136 90 93 114 ...
$ PIQ : int 124 124 150 128 134 110 131 98 84 147 ...
$ Weight : int 118 NA 143 172 147 146 138 175 134 172 ...
$ Height : num 64.5 72.5 73.3 68.8 65 69 64.5 66 66.3 68.8 ...
$ MRICount: int 816932 1001121 1038437 965353 951545 928799 991305 854258 904858 955466 ...
Q1: Are there gender differences in the three IQ scores?
Draw boxplots of the three IQ scores by gender to see whether there is a difference.
- FSIQ
The difference in FSIQ between females and males in our sample is not obvious.
- VIQ
There seems to be a slight difference in VIQ between females and males.
- PIQ
The difference in PIQ between females and males in our sample is not obvious.
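The boxplot code is not shown above. As a hedged sketch, the three comparisons could be drawn in one lattice call after stacking the scores into long format. The data file is not available here, so the example uses only the first ten subjects as they appear in the str() listing; the names iq10, iq_long, and p are illustrative, and the resulting picture will not match the full-sample boxplots.

```r
library(lattice)

# First ten subjects only, copied from the str() listing (not the full data).
iq10 <- data.frame(
  Gender = factor(c("Female", "Male", "Male", "Male", "Female",
                    "Female", "Female", "Female", "Male", "Male")),
  FSIQ = c(133, 140, 139, 133, 137, 99, 138, 92, 89, 133),
  VIQ  = c(132, 150, 123, 129, 132, 90, 136, 90, 93, 114),
  PIQ  = c(124, 124, 150, 128, 134, 110, 131, 98, 84, 147)
)

# Stack the three scores so that the test type can be a conditioning variable.
iq_long <- data.frame(
  Gender = rep(iq10$Gender, times = 3),
  test   = rep(c("FSIQ", "VIQ", "PIQ"), each = nrow(iq10)),
  score  = c(iq10$FSIQ, iq10$VIQ, iq10$PIQ)
)

# One panel per IQ score, one box per gender.
p <- bwplot(score ~ Gender | test, data = iq_long,
            layout = c(3, 1), ylab = "IQ score")
print(p)
```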
Conduct independent two-sample t-tests (Welch) to determine whether there is a gender difference in each of the three IQ scores.
- FSIQ
Welch Two Sample t-test
data: FSIQ by Gender
t = -0.40267, df = 37.892, p-value = 0.6895
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-18.68639 12.48639
sample estimates:
mean in group Female mean in group Male
111.9 115.0
Since \(p>\alpha=0.05\), we retain the null hypothesis.
- VIQ
Welch Two Sample t-test
data: VIQ by Gender
t = -0.77262, df = 36.973, p-value = 0.4447
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-21.010922 9.410922
sample estimates:
mean in group Female mean in group Male
109.45 115.25
Since \(p>\alpha=0.05\), we retain the null hypothesis.
- PIQ
Welch Two Sample t-test
data: PIQ by Gender
t = -0.1598, df = 37.815, p-value = 0.8739
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-15.72079 13.42079
sample estimates:
mean in group Female mean in group Male
110.45 111.60
Since \(p>\alpha=0.05\), we retain the null hypothesis.
[Conclusion] None of the three t-tests finds a statistically significant gender difference at \(\alpha=0.05\), whether in full scale IQ, verbal IQ, or performance IQ. The sample therefore gives no evidence of a gender difference in these scores.
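The calls that produced the three Welch tests are not shown. A minimal sketch of how they could be run in one pass is given below, again using only the first ten subjects from the str() listing, so the p-values it prints will differ from the full-sample results reported above. The names iq10 and p_values are illustrative.

```r
# First ten subjects only, copied from the str() listing (not the full data).
iq10 <- data.frame(
  Gender = factor(c("Female", "Male", "Male", "Male", "Female",
                    "Female", "Female", "Female", "Male", "Male")),
  FSIQ = c(133, 140, 139, 133, 137, 99, 138, 92, 89, 133),
  VIQ  = c(132, 150, 123, 129, 132, 90, 136, 90, 93, 114),
  PIQ  = c(124, 124, 150, 128, 134, 110, 131, 98, 84, 147)
)

# t.test() defaults to the Welch test (var.equal = FALSE).
# reformulate() builds the formula <score> ~ Gender for each score name.
p_values <- sapply(c("FSIQ", "VIQ", "PIQ"), function(v) {
  t.test(reformulate("Gender", response = v), data = iq10)$p.value
})
p_values
```

Because three tests are run on the same subjects, a correction such as p.adjust(p_values, method = "holm") may be worth considering before interpreting any single p-value.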