Exercise 2

Create a new student-teacher ratio variable from the enrltot and teachers variables in the data set Caschool{Ecdat} to generate the following plot in which reading scores (readscr) for grade span assignment grspan equals “KK-08” in the data set are split into three levels: lower-third, middle-third, and upper-third:

讀取資料

dta2 <- Ecdat::Caschool
str(dta2)
## 'data.frame':    420 obs. of  17 variables:
##  $ distcod : int  75119 61499 61549 61457 61523 62042 68536 63834 62331 67306 ...
##  $ county  : Factor w/ 45 levels "Alameda","Butte",..: 1 2 2 2 2 6 29 11 6 25 ...
##  $ district: Factor w/ 409 levels "Ackerman Elementary",..: 362 214 367 132 270 53 152 383 263 94 ...
##  $ grspan  : Factor w/ 2 levels "KK-06","KK-08": 2 2 2 2 2 2 2 2 2 1 ...
##  $ enrltot : int  195 240 1550 243 1335 137 195 888 379 2247 ...
##  $ teachers: num  10.9 11.1 82.9 14 71.5 ...
##  $ calwpct : num  0.51 15.42 55.03 36.48 33.11 ...
##  $ mealpct : num  2.04 47.92 76.32 77.05 78.43 ...
##  $ computer: int  67 101 169 85 171 25 28 66 35 0 ...
##  $ testscr : num  691 661 644 648 641 ...
##  $ compstu : num  0.344 0.421 0.109 0.35 0.128 ...
##  $ expnstu : num  6385 5099 5502 7102 5236 ...
##  $ str     : num  17.9 21.5 18.7 17.4 18.7 ...
##  $ avginc  : num  22.69 9.82 8.98 8.98 9.08 ...
##  $ elpct   : num  0 4.58 30 0 13.86 ...
##  $ readscr : num  692 660 636 652 642 ...
##  $ mathscr : num  690 662 651 644 640 ...

新增 Student-Teacher Ratio 欄位

dta2$ratio <- with(dta2, enrltot/teachers)
str(dta2)
## 'data.frame':    420 obs. of  18 variables:
##  $ distcod : int  75119 61499 61549 61457 61523 62042 68536 63834 62331 67306 ...
##  $ county  : Factor w/ 45 levels "Alameda","Butte",..: 1 2 2 2 2 6 29 11 6 25 ...
##  $ district: Factor w/ 409 levels "Ackerman Elementary",..: 362 214 367 132 270 53 152 383 263 94 ...
##  $ grspan  : Factor w/ 2 levels "KK-06","KK-08": 2 2 2 2 2 2 2 2 2 1 ...
##  $ enrltot : int  195 240 1550 243 1335 137 195 888 379 2247 ...
##  $ teachers: num  10.9 11.1 82.9 14 71.5 ...
##  $ calwpct : num  0.51 15.42 55.03 36.48 33.11 ...
##  $ mealpct : num  2.04 47.92 76.32 77.05 78.43 ...
##  $ computer: int  67 101 169 85 171 25 28 66 35 0 ...
##  $ testscr : num  691 661 644 648 641 ...
##  $ compstu : num  0.344 0.421 0.109 0.35 0.128 ...
##  $ expnstu : num  6385 5099 5502 7102 5236 ...
##  $ str     : num  17.9 21.5 18.7 17.4 18.7 ...
##  $ avginc  : num  22.69 9.82 8.98 8.98 9.08 ...
##  $ elpct   : num  0 4.58 30 0 13.86 ...
##  $ readscr : num  692 660 636 652 642 ...
##  $ mathscr : num  690 662 651 644 640 ...
##  $ ratio   : num  17.9 21.5 18.7 17.4 18.7 ...

分出 grspan=="KK-08" 的資料

dta2_kk08 <- split(dta2, dta2$grspan)$`KK-08`
str(dta2_kk08)
## 'data.frame':    359 obs. of  18 variables:
##  $ distcod : int  75119 61499 61549 61457 61523 62042 68536 63834 62331 65722 ...
##  $ county  : Factor w/ 45 levels "Alameda","Butte",..: 1 2 2 2 2 6 29 11 6 19 ...
##  $ district: Factor w/ 409 levels "Ackerman Elementary",..: 362 214 367 132 270 53 152 383 263 193 ...
##  $ grspan  : Factor w/ 2 levels "KK-06","KK-08": 2 2 2 2 2 2 2 2 2 2 ...
##  $ enrltot : int  195 240 1550 243 1335 137 195 888 379 446 ...
##  $ teachers: num  10.9 11.1 82.9 14 71.5 ...
##  $ calwpct : num  0.51 15.42 55.03 36.48 33.11 ...
##  $ mealpct : num  2.04 47.92 76.32 77.05 78.43 ...
##  $ computer: int  67 101 169 85 171 25 28 66 35 86 ...
##  $ testscr : num  691 661 644 648 641 ...
##  $ compstu : num  0.344 0.421 0.109 0.35 0.128 ...
##  $ expnstu : num  6385 5099 5502 7102 5236 ...
##  $ str     : num  17.9 21.5 18.7 17.4 18.7 ...
##  $ avginc  : num  22.69 9.82 8.98 8.98 9.08 ...
##  $ elpct   : num  0 4.58 30 0 13.86 ...
##  $ readscr : num  692 660 636 652 642 ...
##  $ mathscr : num  690 662 651 644 640 ...
##  $ ratio   : num  17.9 21.5 18.7 17.4 18.7 ...

創造低、中、高三個 level

quantile(dta2_kk08$readscr, probs=seq(from=0, to=1, by=.1))
##     0%    10%    20%    30%    40%    50%    60%    70%    80%    90%   100% 
## 604.50 628.48 636.66 642.64 650.16 654.90 659.78 663.96 669.56 679.64 704.00

由百分位閱讀成績的低中高分群與閱讀分數連結數級距,可以將低、中、高三分群的界線值訂在 0 、 644 、 660 和 750。

dta2_kk08$rscrf <- with(dta2_kk08, cut(readscr, ordered=T, breaks=c(0, 644, 660, 750), labels=c("L", "M", "H")))
with(dta2_kk08, table(rscrf))
## rscrf
##   L   M   H 
## 115 103 141

畫圖

library(lattice)
xyplot(readscr  ~ ratio | rscrf, 
       data= dta2_kk08,  
       xlab="Student-Teacher ratio", 
       ylab="Reading score",
       type=c('p', 'g', 'r'),
       )

Exercise 3

The data set concerns student evaluation of instructor’s beauty and teaching quality for several courses at the University of Texas. The teaching evaluatons were done at the end of the semester, and the beauty judgments were made later, by six students who had not attended the classes and were not aware of the course evaluations.

Source: Hamermesh, D.S., & Parker, A.M. (2005). Beauty in the classroom: instructor’s pulchritude and putative pedagogical productivity.a Economics and Education Review, 24, 369-376. Reported in Gelman, A., & Hill, J. (2006). Data analysis using regression and hierarchical/multilevel models. p. 51.

Use the lattice package to produce the plot above. Hint: Use ‘reorder’ after obtaining regression coefficients to rearrange conditioning panels.

讀取資料

dta3 <- read.table("C:/Users/TheorEco Lab/Desktop/108-2/DataManagement/0413/beautyCourseEval.txt", header = T)
str(dta3)
## 'data.frame':    463 obs. of  7 variables:
##  $ eval    : num  4.3 4.5 3.7 4.3 4.4 4.2 4 3.4 4.5 3.9 ...
##  $ beauty  : num  0.202 -0.826 -0.66 -0.766 1.421 ...
##  $ sex     : int  1 0 0 1 1 0 1 1 1 0 ...
##  $ age     : int  36 59 51 40 31 62 33 51 33 47 ...
##  $ minority: int  1 0 0 0 0 0 0 0 0 0 ...
##  $ tenure  : int  0 1 1 1 0 1 0 1 0 0 ...
##  $ courseID: int  3 0 4 2 0 0 4 0 0 4 ...

Column 1: Course evaluation score Column 2: Beauty score Column 3: Gender of professor, 1 = Female, 0 = Male Column 4: Pofessor age in years Column 5: Minority status of professor, 1 = Minority, 0 = Others Column 6: Tenure status of professor, 1 = Tenured, 0 = No Column 7: Course ID

作圖要求須以 courseID 分類 (condition),需轉為類別變數。

dta3$courseID <- as.character(dta3$courseID)

依 coefficient 排序

library(nlme)
lm <- lmList(eval~beauty| courseID, data = dta3)
lm1 <- data.frame(courseID = rownames(coef(lm)), coef(lm))
lm2 <- lm1[with(lm1,order(beauty, decreasing=F)),]
lm_order <- list(lm2$courseID)
  • index.cond: Whereas perm.cond permutes the dimensions of the multidimensional array of panels, index.cond can be used to subset (or reorder) margins of that array. index.cond can be a list or a function.
xyplot(eval  ~ beauty | courseID, 
       data= dta3,  
       index.cond = lm_order,
       layout = c(6,6),
       xlab="Beauty judgement score", 
       ylab="Average course evaluation score",
       type=c('p', 'g', 'r'),
       )

Exercise 4

A sample of 40 psychology students at a large southwestern university took four subtests (Vocabulary, Similarities, Block Design, and Picture Completion) of the Wechsler (1981) Adult Intelligence Scale-Revised. The researchers also used Magnetic Resonance Imaging (MRI) to determine the brain size of the subjects.

Source: Willerman, L., Schultz, R., Rutledge, J.N., & Bigler, E. (1991), In Vivo Brain Size and Intelligence, Intelligence, 15, 223-228.

Use appropriate lattice graphics to answer the following questions.

  1. Are there gender differences in the three IQ scores?

  2. Is the relationship between height and weight gender dependent?

  3. Is the relationship between IQ and brainsize (as measured by MRIcount) gender dependent?

讀取資料

dta4 <- read.table("C:/Users/TheorEco Lab/Desktop/108-2/DataManagement/0413/brainsize.txt", header = T)
str(dta4)
## 'data.frame':    40 obs. of  8 variables:
##  $ Sbj     : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Gender  : Factor w/ 2 levels "Female","Male": 1 2 2 2 1 1 1 1 2 2 ...
##  $ FSIQ    : int  133 140 139 133 137 99 138 92 89 133 ...
##  $ VIQ     : int  132 150 123 129 132 90 136 90 93 114 ...
##  $ PIQ     : int  124 124 150 128 134 110 131 98 84 147 ...
##  $ Weight  : int  118 NA 143 172 147 146 138 175 134 172 ...
##  $ Height  : num  64.5 72.5 73.3 68.8 65 69 64.5 66 66.3 68.8 ...
##  $ MRICount: int  816932 1001121 1038437 965353 951545 928799 991305 854258 904858 955466 ...

Column 1: Subject ID

Column 2: Gender ID

Column 3: Full scale IQ

Column 4: Verbal IQ

Column 5: Performance IQ

Column 6: Body weight in pounds

Column 7: Height in inches

Column 8: Totol pixel counts from 18 MRI scans

Q1: 三種不同的 IQ 分數在性別間是否有差異 ?

library(reshape2)
dta4_1 <- melt(dta4[,c(2:5)])
## Using Gender as id variables
dta4_1$Gender <- as.character(dta4_1$Gender)
head(dta4_1)
##   Gender variable value
## 1 Female     FSIQ   133
## 2   Male     FSIQ   140
## 3   Male     FSIQ   139
## 4   Male     FSIQ   133
## 5 Female     FSIQ   137
## 6 Female     FSIQ    99
# install.packages("devtools")
# devtools::install_github("https://github.com/m-jahn/R-tools")
library(Rtools)
bwplot(value ~ Gender | variable, 
       data= dta4_1,
       ylab="IQ Scores",
       type=c('p', 'g', 'r'),
       panel = function(x, y, ...) {
               panel.bwplot(x,y, ...)
               panel.pvalue(x, y,
               std = NULL,
               symbol = TRUE, pvalue = TRUE, 
               pval_digits = 1, cex = 0.6, offset = -4, 
               fixed_pos = NULL, verbose = TRUE
               )
       }
       )

## p-value for comparison of  1 
##  with  Female Male  =  1 0.689452 
## p-value for comparison of  1 
##  with  Female Male  =  1 0.4446607 
## p-value for comparison of  1 
##  with  Female Male  =  1 0.8738883

由上面盒鬚圖可看到,三種 IQ 分數在性別間的平均數略有不同,但無統計顯著差異。

這裡使用 m-jahn/R-tools via 常用在 bio-informatics 的小工具在圖中加上 p-value 和統計顯著記號。

Q2: 身高與體重之間的關係是否在性別間是獨立的?

xyplot(Weight ~ Height, 
       groups = Gender, 
       data = dta4, 
       xlab = "Height", 
       ylab = "Weight", 
       type = c("p", "g", "r"), 
       auto.key = list(points = TRUE, lines = TRUE, column = 2),
       par.settings=standard.theme(color=FALSE),
       )

根據上圖結果顯示,男女的身高體重關係並不共享同樣的迴歸線模型。

Q3: IQ 分數與由 MRI 數值所表示的大腦大小之間的關係在性別間是否為獨立 ?

dta4_3 <- melt(dta4[,c(2:5, 8)], id=c("Gender", "MRICount")) 
head(dta4_3)
##   Gender MRICount variable value
## 1 Female   816932     FSIQ   133
## 2   Male  1001121     FSIQ   140
## 3   Male  1038437     FSIQ   139
## 4   Male   965353     FSIQ   133
## 5 Female   951545     FSIQ   137
## 6 Female   928799     FSIQ    99
xyplot(MRICount ~ value | variable,
       groups = Gender,
       data= dta4_3,
       xlab = "IQ scores",
       ylab="Totol pixel counts from 18 MRI scans",
       type=c('p', 'g', 'r'),
       auto.key = list(points = TRUE, lines = TRUE, column = 2),
       )

根據上圖結果顯示,各項 IQ 分數與大腦大小之間的關係在性別間並無線性回歸關係,互相獨立。