Create a new student-teacher ratio variable from the enrltot and teachers variables in the data set Caschool{Ecdat} to generate the following plot in which reading scores (readscr) for grade span assignment grspan equals “KK-08” in the data set are split into three levels: lower-third, middle-third, and upper-third:
dta2 <- Ecdat::Caschool
str(dta2)
## 'data.frame': 420 obs. of 17 variables:
## $ distcod : int 75119 61499 61549 61457 61523 62042 68536 63834 62331 67306 ...
## $ county : Factor w/ 45 levels "Alameda","Butte",..: 1 2 2 2 2 6 29 11 6 25 ...
## $ district: Factor w/ 409 levels "Ackerman Elementary",..: 362 214 367 132 270 53 152 383 263 94 ...
## $ grspan : Factor w/ 2 levels "KK-06","KK-08": 2 2 2 2 2 2 2 2 2 1 ...
## $ enrltot : int 195 240 1550 243 1335 137 195 888 379 2247 ...
## $ teachers: num 10.9 11.1 82.9 14 71.5 ...
## $ calwpct : num 0.51 15.42 55.03 36.48 33.11 ...
## $ mealpct : num 2.04 47.92 76.32 77.05 78.43 ...
## $ computer: int 67 101 169 85 171 25 28 66 35 0 ...
## $ testscr : num 691 661 644 648 641 ...
## $ compstu : num 0.344 0.421 0.109 0.35 0.128 ...
## $ expnstu : num 6385 5099 5502 7102 5236 ...
## $ str : num 17.9 21.5 18.7 17.4 18.7 ...
## $ avginc : num 22.69 9.82 8.98 8.98 9.08 ...
## $ elpct : num 0 4.58 30 0 13.86 ...
## $ readscr : num 692 660 636 652 642 ...
## $ mathscr : num 690 662 651 644 640 ...
dta2$ratio <- with(dta2, enrltot/teachers)
str(dta2)
## 'data.frame': 420 obs. of 18 variables:
## $ distcod : int 75119 61499 61549 61457 61523 62042 68536 63834 62331 67306 ...
## $ county : Factor w/ 45 levels "Alameda","Butte",..: 1 2 2 2 2 6 29 11 6 25 ...
## $ district: Factor w/ 409 levels "Ackerman Elementary",..: 362 214 367 132 270 53 152 383 263 94 ...
## $ grspan : Factor w/ 2 levels "KK-06","KK-08": 2 2 2 2 2 2 2 2 2 1 ...
## $ enrltot : int 195 240 1550 243 1335 137 195 888 379 2247 ...
## $ teachers: num 10.9 11.1 82.9 14 71.5 ...
## $ calwpct : num 0.51 15.42 55.03 36.48 33.11 ...
## $ mealpct : num 2.04 47.92 76.32 77.05 78.43 ...
## $ computer: int 67 101 169 85 171 25 28 66 35 0 ...
## $ testscr : num 691 661 644 648 641 ...
## $ compstu : num 0.344 0.421 0.109 0.35 0.128 ...
## $ expnstu : num 6385 5099 5502 7102 5236 ...
## $ str : num 17.9 21.5 18.7 17.4 18.7 ...
## $ avginc : num 22.69 9.82 8.98 8.98 9.08 ...
## $ elpct : num 0 4.58 30 0 13.86 ...
## $ readscr : num 692 660 636 652 642 ...
## $ mathscr : num 690 662 651 644 640 ...
## $ ratio : num 17.9 21.5 18.7 17.4 18.7 ...
grspan=="KK-08" 的資料dta2_kk08 <- split(dta2, dta2$grspan)$`KK-08`
str(dta2_kk08)
## 'data.frame': 359 obs. of 18 variables:
## $ distcod : int 75119 61499 61549 61457 61523 62042 68536 63834 62331 65722 ...
## $ county : Factor w/ 45 levels "Alameda","Butte",..: 1 2 2 2 2 6 29 11 6 19 ...
## $ district: Factor w/ 409 levels "Ackerman Elementary",..: 362 214 367 132 270 53 152 383 263 193 ...
## $ grspan : Factor w/ 2 levels "KK-06","KK-08": 2 2 2 2 2 2 2 2 2 2 ...
## $ enrltot : int 195 240 1550 243 1335 137 195 888 379 446 ...
## $ teachers: num 10.9 11.1 82.9 14 71.5 ...
## $ calwpct : num 0.51 15.42 55.03 36.48 33.11 ...
## $ mealpct : num 2.04 47.92 76.32 77.05 78.43 ...
## $ computer: int 67 101 169 85 171 25 28 66 35 86 ...
## $ testscr : num 691 661 644 648 641 ...
## $ compstu : num 0.344 0.421 0.109 0.35 0.128 ...
## $ expnstu : num 6385 5099 5502 7102 5236 ...
## $ str : num 17.9 21.5 18.7 17.4 18.7 ...
## $ avginc : num 22.69 9.82 8.98 8.98 9.08 ...
## $ elpct : num 0 4.58 30 0 13.86 ...
## $ readscr : num 692 660 636 652 642 ...
## $ mathscr : num 690 662 651 644 640 ...
## $ ratio : num 17.9 21.5 18.7 17.4 18.7 ...
quantile(dta2_kk08$readscr, probs=seq(from=0, to=1, by=.1))
## 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
## 604.50 628.48 636.66 642.64 650.16 654.90 659.78 663.96 669.56 679.64 704.00
由百分位閱讀成績的低中高分群與閱讀分數連結數級距,可以將低、中、高三分群的界線值訂在 0 、 644 、 660 和 750。
dta2_kk08$rscrf <- with(dta2_kk08, cut(readscr, ordered=T, breaks=c(0, 644, 660, 750), labels=c("L", "M", "H")))
with(dta2_kk08, table(rscrf))
## rscrf
## L M H
## 115 103 141
library(lattice)
xyplot(readscr ~ ratio | rscrf,
data= dta2_kk08,
xlab="Student-Teacher ratio",
ylab="Reading score",
type=c('p', 'g', 'r'),
)
The data set concerns student evaluation of instructor’s beauty and teaching quality for several courses at the University of Texas. The teaching evaluatons were done at the end of the semester, and the beauty judgments were made later, by six students who had not attended the classes and were not aware of the course evaluations.
Source: Hamermesh, D.S., & Parker, A.M. (2005). Beauty in the classroom: instructor’s pulchritude and putative pedagogical productivity.a Economics and Education Review, 24, 369-376. Reported in Gelman, A., & Hill, J. (2006). Data analysis using regression and hierarchical/multilevel models. p. 51.
Use the lattice package to produce the plot above. Hint: Use ‘reorder’ after obtaining regression coefficients to rearrange conditioning panels.
dta3 <- read.table("C:/Users/TheorEco Lab/Desktop/108-2/DataManagement/0413/beautyCourseEval.txt", header = T)
str(dta3)
## 'data.frame': 463 obs. of 7 variables:
## $ eval : num 4.3 4.5 3.7 4.3 4.4 4.2 4 3.4 4.5 3.9 ...
## $ beauty : num 0.202 -0.826 -0.66 -0.766 1.421 ...
## $ sex : int 1 0 0 1 1 0 1 1 1 0 ...
## $ age : int 36 59 51 40 31 62 33 51 33 47 ...
## $ minority: int 1 0 0 0 0 0 0 0 0 0 ...
## $ tenure : int 0 1 1 1 0 1 0 1 0 0 ...
## $ courseID: int 3 0 4 2 0 0 4 0 0 4 ...
Column 1: Course evaluation score Column 2: Beauty score Column 3: Gender of professor, 1 = Female, 0 = Male Column 4: Pofessor age in years Column 5: Minority status of professor, 1 = Minority, 0 = Others Column 6: Tenure status of professor, 1 = Tenured, 0 = No Column 7: Course ID
作圖要求須以 courseID 分類 (condition),需轉為類別變數。
dta3$courseID <- as.character(dta3$courseID)
library(nlme)
lm <- lmList(eval~beauty| courseID, data = dta3)
lm1 <- data.frame(courseID = rownames(coef(lm)), coef(lm))
lm2 <- lm1[with(lm1,order(beauty, decreasing=F)),]
lm_order <- list(lm2$courseID)
perm.cond permutes the dimensions of the multidimensional array of panels, index.cond can be used to subset (or reorder) margins of that array. index.cond can be a list or a function.xyplot(eval ~ beauty | courseID,
data= dta3,
index.cond = lm_order,
layout = c(6,6),
xlab="Beauty judgement score",
ylab="Average course evaluation score",
type=c('p', 'g', 'r'),
)
A sample of 40 psychology students at a large southwestern university took four subtests (Vocabulary, Similarities, Block Design, and Picture Completion) of the Wechsler (1981) Adult Intelligence Scale-Revised. The researchers also used Magnetic Resonance Imaging (MRI) to determine the brain size of the subjects.
Source: Willerman, L., Schultz, R., Rutledge, J.N., & Bigler, E. (1991), In Vivo Brain Size and Intelligence, Intelligence, 15, 223-228.
Use appropriate lattice graphics to answer the following questions.
Are there gender differences in the three IQ scores?
Is the relationship between height and weight gender dependent?
Is the relationship between IQ and brainsize (as measured by MRIcount) gender dependent?
dta4 <- read.table("C:/Users/TheorEco Lab/Desktop/108-2/DataManagement/0413/brainsize.txt", header = T)
str(dta4)
## 'data.frame': 40 obs. of 8 variables:
## $ Sbj : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Gender : Factor w/ 2 levels "Female","Male": 1 2 2 2 1 1 1 1 2 2 ...
## $ FSIQ : int 133 140 139 133 137 99 138 92 89 133 ...
## $ VIQ : int 132 150 123 129 132 90 136 90 93 114 ...
## $ PIQ : int 124 124 150 128 134 110 131 98 84 147 ...
## $ Weight : int 118 NA 143 172 147 146 138 175 134 172 ...
## $ Height : num 64.5 72.5 73.3 68.8 65 69 64.5 66 66.3 68.8 ...
## $ MRICount: int 816932 1001121 1038437 965353 951545 928799 991305 854258 904858 955466 ...
Column 1: Subject ID
Column 2: Gender ID
Column 3: Full scale IQ
Column 4: Verbal IQ
Column 5: Performance IQ
Column 6: Body weight in pounds
Column 7: Height in inches
Column 8: Totol pixel counts from 18 MRI scans
library(reshape2)
dta4_1 <- melt(dta4[,c(2:5)])
## Using Gender as id variables
dta4_1$Gender <- as.character(dta4_1$Gender)
head(dta4_1)
## Gender variable value
## 1 Female FSIQ 133
## 2 Male FSIQ 140
## 3 Male FSIQ 139
## 4 Male FSIQ 133
## 5 Female FSIQ 137
## 6 Female FSIQ 99
# install.packages("devtools")
# devtools::install_github("https://github.com/m-jahn/R-tools")
library(Rtools)
bwplot(value ~ Gender | variable,
data= dta4_1,
ylab="IQ Scores",
type=c('p', 'g', 'r'),
panel = function(x, y, ...) {
panel.bwplot(x,y, ...)
panel.pvalue(x, y,
std = NULL,
symbol = TRUE, pvalue = TRUE,
pval_digits = 1, cex = 0.6, offset = -4,
fixed_pos = NULL, verbose = TRUE
)
}
)
## p-value for comparison of 1
## with Female Male = 1 0.689452
## p-value for comparison of 1
## with Female Male = 1 0.4446607
## p-value for comparison of 1
## with Female Male = 1 0.8738883
由上面盒鬚圖可看到,三種 IQ 分數在性別間的平均數略有不同,但無統計顯著差異。
這裡使用 m-jahn/R-tools via 常用在 bio-informatics 的小工具在圖中加上 p-value 和統計顯著記號。
xyplot(Weight ~ Height,
groups = Gender,
data = dta4,
xlab = "Height",
ylab = "Weight",
type = c("p", "g", "r"),
auto.key = list(points = TRUE, lines = TRUE, column = 2),
par.settings=standard.theme(color=FALSE),
)
根據上圖結果顯示,男女的身高體重關係並不共享同樣的迴歸線模型。
dta4_3 <- melt(dta4[,c(2:5, 8)], id=c("Gender", "MRICount"))
head(dta4_3)
## Gender MRICount variable value
## 1 Female 816932 FSIQ 133
## 2 Male 1001121 FSIQ 140
## 3 Male 1038437 FSIQ 139
## 4 Male 965353 FSIQ 133
## 5 Female 951545 FSIQ 137
## 6 Female 928799 FSIQ 99
xyplot(MRICount ~ value | variable,
groups = Gender,
data= dta4_3,
xlab = "IQ scores",
ylab="Totol pixel counts from 18 MRI scans",
type=c('p', 'g', 'r'),
auto.key = list(points = TRUE, lines = TRUE, column = 2),
)
根據上圖結果顯示,各項 IQ 分數與大腦大小之間的關係在性別間並無線性回歸關係,互相獨立。