DataM: Inclass Exercise 0413: Trellis 2-4

library(dplyr)
library(lattice)

In-class exercise 2.

Create a new student-teacher ratio variable from the enrltot and teachers variables in the data set Caschool{Ecdat}. Then generate the following plot, in which the districts whose grade span assignment grspan equals “KK-08” are split by reading score (readscr) into three levels: lower third, middle third, and upper third:

Target output

[Solution and Answer]

Load in the data set

data(Caschool, package = 'Ecdat')
head(Caschool)

str(Caschool)

'data.frame':   420 obs. of  17 variables:
 $ distcod : int  75119 61499 61549 61457 61523 62042 68536 63834 62331 67306 ...
 $ county  : Factor w/ 45 levels "Alameda","Butte",..: 1 2 2 2 2 6 29 11 6 25 ...
 $ district: Factor w/ 409 levels "Ackerman Elementary",..: 362 214 367 132 270 53 152 383 263 94 ...
 $ grspan  : Factor w/ 2 levels "KK-06","KK-08": 2 2 2 2 2 2 2 2 2 1 ...
 $ enrltot : int  195 240 1550 243 1335 137 195 888 379 2247 ...
 $ teachers: num  10.9 11.1 82.9 14 71.5 ...
 $ calwpct : num  0.51 15.42 55.03 36.48 33.11 ...
 $ mealpct : num  2.04 47.92 76.32 77.05 78.43 ...
 $ computer: int  67 101 169 85 171 25 28 66 35 0 ...
 $ testscr : num  691 661 644 648 641 ...
 $ compstu : num  0.344 0.421 0.109 0.35 0.128 ...
 $ expnstu : num  6385 5099 5502 7102 5236 ...
 $ str     : num  17.9 21.5 18.7 17.4 18.7 ...
 $ avginc  : num  22.69 9.82 8.98 8.98 9.08 ...
 $ elpct   : num  0 4.58 30 0 13.86 ...
 $ readscr : num  692 660 636 652 642 ...
 $ mathscr : num  690 662 651 644 640 ...

Use `help(Caschool, package = 'Ecdat')` to find the meaning of each variable in `Caschool`

distcod: district code
county: county
district: district
grspan: grade span of district
enrltot: total enrollment
teachers: number of teachers
calwpct: percent qualifying for CalWorks
mealpct: percent qualifying for reduced-price lunch
computer: number of computers
testscr: average test score, (readscr + mathscr)/2
compstu: computers per student
expnstu: expenditure per student
str: student-teacher ratio
avginc: district average income
elpct: percent of English learners
readscr: average reading score
mathscr: average math score

Create a new variable `ratio_st`: the number of students (`enrltot`) divided by the number of teachers (`teachers`)

dta2 <- Caschool %>% dplyr::filter(grspan == 'KK-08') %>%
  dplyr::select(enrltot, teachers, readscr) %>%
  mutate(ratio_st = enrltot / teachers)
summary(dta2$ratio_st)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  14.00   18.67   19.78   19.71   20.89   25.80

Group data into ‘L’, ‘M’, and ‘H’ by reading scores (`readscr`)

# Find the 33rd and the 67th percentiles of readscr first.
percentile_readscr <- quantile(dta2$readscr, probs = seq(0, 1, .01))

# Group
dta2 <- dta2 %>%
  mutate(readscr_group = cut(readscr, ordered_result = TRUE,
                             breaks = c(min(readscr) - 1, percentile_readscr[33 + 1],
                                        percentile_readscr[67 + 1], max(readscr) + 1),
                             labels = c('L', 'M', 'H')))
summary(dta2$readscr_group)

  L   M   H 
119 121 119
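
The grouping step can also be written without building all 101 percentiles. The following is a minimal sketch under stated assumptions, not the method used above: `scores` is a made-up vector standing in for `dta2$readscr`, and the cut points are the exact thirds (probabilities 1/3 and 2/3) rather than the 33rd and 67th percentiles, so group counts on the real data may differ slightly from those shown above.

```r
# Hypothetical reading scores, invented for illustration only
scores <- c(620, 635, 641, 648, 652, 660, 667, 675, 692)

# Cut points: minimum, the two tercile boundaries, and maximum
breaks <- quantile(scores, probs = c(0, 1/3, 2/3, 1))

# include.lowest = TRUE keeps the minimum value inside the first bin,
# so the breaks do not need to be padded with -1 / +1
score_group <- cut(scores, breaks = breaks, labels = c('L', 'M', 'H'),
                   include.lowest = TRUE, ordered_result = TRUE)
table(score_group)
```

Using `include.lowest = TRUE` is a design choice that replaces the `min(readscr) - 1` and `max(readscr) + 1` padding; either approach keeps every observation in a group.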

Plot

# Plot reading score against student-teacher ratio, one panel per reading-score group.
xyplot(readscr ~ ratio_st | readscr_group, data=dta2,  
       layout=c(3, 1),            # three columns and one row
       type = c('p', 'g', 'r'),   # plot points, regression lines, and grid
       xlab = 'Student-Teacher ratio', # name the axes
       ylab = 'Reading score')

In-class exercise 3.

The data set concerns student evaluations of instructors’ beauty and teaching quality for several courses at the University of Texas. The teaching evaluations were done at the end of the semester, and the beauty judgments were made later, by six students who had not attended the classes and were not aware of the course evaluations.

Source: Hamermesh, D.S., & Parker, A.M. (2005). Beauty in the classroom: instructors’ pulchritude and putative pedagogical productivity. Economics of Education Review, 24, 369-376. Reported in Gelman, A., & Hill, J. (2006). Data analysis using regression and hierarchical/multilevel models. p. 51.

Column 1: Course evaluation score
Column 2: Beauty score
Column 3: Gender of professor, 1 = Female, 0 = Male
Column 4: Professor age in years
Column 5: Minority status of professor, 1 = Minority, 0 = Others
Column 6: Tenure status of professor, 1 = Tenured, 0 = No
Column 7: Course ID

Target output

Use the lattice package to produce the plot above. Hint: Use ‘reorder’ after obtaining regression coefficients to rearrange conditioning panels.

[Solution and Answer]

Load in the data set

dta3 <- read.table('../data/data_inclass0413_trellis_3.txt', header = TRUE)
head(dta3)

str(dta3)

'data.frame':   463 obs. of  7 variables:
 $ eval    : num  4.3 4.5 3.7 4.3 4.4 4.2 4 3.4 4.5 3.9 ...
 $ beauty  : num  0.202 -0.826 -0.66 -0.766 1.421 ...
 $ sex     : int  1 0 0 1 1 0 1 1 1 0 ...
 $ age     : int  36 59 51 40 31 62 33 51 33 47 ...
 $ minority: int  1 0 0 0 0 0 0 0 0 0 ...
 $ tenure  : int  0 1 1 1 0 1 0 1 0 0 ...
 $ courseID: int  3 0 4 2 0 0 4 0 0 4 ...

Plot

1. Plot eval{dta3} (y) against beauty{dta3} (x). Convert courseID{dta3} to a factor and use it as the conditioning variable.
2. Set the physical aspect ratio of the panels.
3. Set the layout to six columns and six rows.
4. Plot points, regression lines, and grids together.
5. Name the axes.
6. Order the panels by the slope of the regression of eval{dta3} (y) on beauty{dta3} (x).

xyplot(eval ~ beauty | factor(courseID), data = dta3, #1
       aspect = 0.7,                                  #2
       layout = c(6, 6),                              #3
       type = c('p', 'r', 'g'),                       #4
       xlab = 'Beauty judgement score',               #5
       ylab = 'Average course evaluation score',      #5
       index.cond = function(x, y) coefficients(lm(y ~ x))[2], #6
       col = blues9[8]
)
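
The `reorder` hint can also be followed explicitly: compute one slope per course, then build a conditioning factor whose levels are sorted by that slope. The sketch below is one possible way to do this and is not the solution used above. It runs on a small made-up data frame (`toy`, with invented values) instead of `dta3`, and the names `slopes` and `course_f` are introduced here for illustration.

```r
library(lattice)

# Hypothetical toy data standing in for dta3 (three courses, invented values)
toy <- data.frame(
  courseID = rep(c(1, 2, 3), each = 4),
  beauty   = rep(c(-1, 0, 1, 2), times = 3),
  eval     = c(4.0, 4.1, 4.2, 4.3,   # course 1: slope  0.1
               3.0, 3.5, 4.0, 4.5,   # course 2: slope  0.5
               4.4, 4.2, 4.0, 3.8)   # course 3: slope -0.2
)

# Slope of eval on beauty, fitted separately within each course
slopes <- sapply(split(toy, toy$courseID),
                 function(d) coef(lm(eval ~ beauty, data = d))[['beauty']])

# reorder() sorts the factor levels by increasing slope
toy$course_f <- reorder(factor(toy$courseID),
                        slopes[as.character(toy$courseID)])
levels(toy$course_f)

# Condition on the reordered factor; panels follow the new level order
p <- xyplot(eval ~ beauty | course_f, data = toy, type = c('p', 'r', 'g'))
print(p)
```

Storing the slopes in a named vector also makes it easy to inspect which courses have the steepest or flattest relationship before plotting.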

In-class exercise 4.

A sample of 40 psychology students at a large southwestern university took four subtests (Vocabulary, Similarities, Block Design, and Picture Completion) of the Wechsler (1981) Adult Intelligence Scale-Revised. The researchers also used Magnetic Resonance Imaging (MRI) to determine the brain size of the subjects.

Source: Willerman, L., Schultz, R., Rutledge, J.N., & Bigler, E. (1991), In Vivo Brain Size and Intelligence, Intelligence, 15, 223-228.

Column 1: Subject ID
Column 2: Gender ID
Column 3: Full scale IQ
Column 4: Verbal IQ
Column 5: Performance IQ
Column 6: Body weight in pounds
Column 7: Height in inches
Column 8: Total pixel counts from 18 MRI scans

Use appropriate lattice graphics to answer the following questions.

Are there gender differences in the three IQ scores?
Is the relationship between height and weight gender dependent?
Is the relationship between IQ and brain size (as measured by MRICount) gender dependent?

[Solution and Answer]

Load the data set

dta4 <- read.table('../data/data_inclass0413_trellis_4.txt', header = TRUE)
head(dta4)

str(dta4)

'data.frame':   40 obs. of  8 variables:
 $ Sbj     : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Gender  : Factor w/ 2 levels "Female","Male": 1 2 2 2 1 1 1 1 2 2 ...
 $ FSIQ    : int  133 140 139 133 137 99 138 92 89 133 ...
 $ VIQ     : int  132 150 123 129 132 90 136 90 93 114 ...
 $ PIQ     : int  124 124 150 128 134 110 131 98 84 147 ...
 $ Weight  : int  118 NA 143 172 147 146 138 175 134 172 ...
 $ Height  : num  64.5 72.5 73.3 68.8 65 69 64.5 66 66.3 68.8 ...
 $ MRICount: int  816932 1001121 1038437 965353 951545 928799 991305 854258 904858 955466 ...

Q1: Are there gender differences in the three IQ scores?

Draw boxplots of the three IQ scores by gender to see whether there is a difference.

FSIQ

bwplot(FSIQ ~ Gender, data = dta4,
       xlab='Gender', ylab = 'Full scale IQ')

In our sample, the FSIQ distributions of females and males do not differ noticeably.

VIQ

bwplot(VIQ ~ Gender, data = dta4,
       xlab='Gender', ylab = 'Verbal IQ')

There appears to be a slight difference in VIQ between females and males.

PIQ

bwplot(PIQ ~ Gender, data = dta4,
       xlab='Gender', ylab = 'Performance IQ')

In our sample, the PIQ distributions of females and males do not differ noticeably.
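
The three boxplots could also be drawn in a single lattice display by stacking the IQ scores into long format and conditioning on the scale. The sketch below is only an illustration: it uses the first ten rows visible in the `str(dta4)` output above rather than the full file, and `toy`, `long`, `iq_scale`, and `score` are names introduced here.

```r
library(lattice)

# First ten rows of dta4, copied from the str() output above
toy <- data.frame(
  Gender = factor(c('Female', 'Male', 'Male', 'Male', 'Female',
                    'Female', 'Female', 'Female', 'Male', 'Male')),
  FSIQ = c(133, 140, 139, 133, 137, 99, 138, 92, 89, 133),
  VIQ  = c(132, 150, 123, 129, 132, 90, 136, 90, 93, 114),
  PIQ  = c(124, 124, 150, 128, 134, 110, 131, 98, 84, 147)
)

# Stack the three IQ columns: one row per subject and IQ scale
iq_names <- c('FSIQ', 'VIQ', 'PIQ')
long <- data.frame(
  Gender   = rep(toy$Gender, times = length(iq_names)),
  iq_scale = factor(rep(iq_names, each = nrow(toy)), levels = iq_names),
  score    = unlist(toy[iq_names], use.names = FALSE)
)

# One panel per IQ scale, boxes split by gender, shared y-axis
p <- bwplot(score ~ Gender | iq_scale, data = long, layout = c(3, 1),
            xlab = 'Gender', ylab = 'IQ score')
print(p)
```

Because the panels share one y-axis, the three scales can be compared directly, which separate `bwplot` calls do not guarantee.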

Conduct independent two-sample t-tests (Welch's version, the default of `t.test`) to determine whether there is a gender difference in each of the three IQ scores.

FSIQ

t.test(FSIQ ~ Gender, data = dta4)


    Welch Two Sample t-test

data:  FSIQ by Gender
t = -0.40267, df = 37.892, p-value = 0.6895
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -18.68639  12.48639
sample estimates:
mean in group Female   mean in group Male 
               111.9                115.0

Since \(p = 0.6895 > \alpha = 0.05\), we fail to reject the null hypothesis of equal mean FSIQ.

VIQ

t.test(VIQ ~ Gender, data = dta4)


    Welch Two Sample t-test

data:  VIQ by Gender
t = -0.77262, df = 36.973, p-value = 0.4447
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -21.010922   9.410922
sample estimates:
mean in group Female   mean in group Male 
              109.45               115.25

Since \(p = 0.4447 > \alpha = 0.05\), we fail to reject the null hypothesis of equal mean VIQ.

PIQ

t.test(PIQ ~ Gender, data = dta4)


    Welch Two Sample t-test

data:  PIQ by Gender
t = -0.1598, df = 37.815, p-value = 0.8739
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -15.72079  13.42079
sample estimates:
mean in group Female   mean in group Male 
              110.45               111.60

Since \(p = 0.8739 > \alpha = 0.05\), we fail to reject the null hypothesis of equal mean PIQ.

[Conclusion] None of the three Welch t-tests rejects the null hypothesis at \(\alpha = 0.05\), so this sample shows no significant gender difference in full scale IQ, verbal IQ, or performance IQ.
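
To make the reported `t`, `df`, and `p-value` less opaque, the Welch statistic can be recomputed by hand and compared with `t.test`. The sketch below is illustrative only: it uses the FSIQ values of the first ten subjects visible in the `str(dta4)` output, so its numbers are not the full-sample results reported above. The names `female`, `male`, and the `*_manual` variables are introduced here.

```r
# FSIQ of the first ten subjects, split by Gender (from the str() output)
female <- c(133, 137, 99, 138, 92)
male   <- c(140, 139, 133, 89, 133)

# Squared standard error contributed by each group
v_f <- var(female) / length(female)
v_m <- var(male) / length(male)

# Welch t statistic: difference in means over the unpooled standard error
t_manual <- (mean(female) - mean(male)) / sqrt(v_f + v_m)

# Welch-Satterthwaite approximation to the degrees of freedom
df_manual <- (v_f + v_m)^2 /
  (v_f^2 / (length(female) - 1) + v_m^2 / (length(male) - 1))

# Two-sided p-value from the t distribution
p_manual <- 2 * pt(-abs(t_manual), df = df_manual)

# t.test() defaults to var.equal = FALSE, i.e. the Welch test
res <- t.test(female, male)
all.equal(unname(res$statistic), t_manual)
all.equal(unname(res$parameter), df_manual)
all.equal(res$p.value, p_manual)
```

The non-integer degrees of freedom in the output above (for example 37.892) come from this approximation, which does not assume equal variances in the two groups.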
