1 In-class 1

A sample of pupils whose mathematics attainment is tested at the end of Year 2 (seven years old). The coverage of mathematics curriculum each pupil received during Year 2 is measured; so is the mathematics attainment at the beginning of the year. There are 39 pupils in the study.

Column 1: Mathematics attainment, end of Year 2
Column 2: Mathematics attainment, end of Year 1
Column 3: Curriculum coverage

1.1 Input data

# first R session using math attainment data set

# read in a plain text file with variable names and assign a name to it
dta <- read.table("math_attainment.txt", header = T)

1.2 Checking data

# structure of data
str(dta)

## 'data.frame':    39 obs. of  3 variables:
##  $ math2: int  28 56 51 13 39 41 30 13 17 32 ...
##  $ math1: int  18 22 44 8 20 12 16 5 9 18 ...
##  $ cc   : num  328 406 387 167 328 ...

# first 6 rows
head(dta)

##   math2 math1     cc
## 1    28    18 328.20
## 2    56    22 406.03
## 3    51    44 386.94
## 4    13     8 166.91
## 5    39    20 328.20
## 6    41    12 328.20

1.3 Descriptive statistics

# variable mean
colMeans(dta)

##     math2     math1        cc 
##  28.76923  15.35897 188.83667

Mean math score of second year is higher than first year.

# variable sd
apply(dta, 2, sd)

##     math2     math1        cc 
## 10.720029  7.744224 84.842513

# correlation matrix
cor(dta)

##           math2     math1        cc
## math2 1.0000000 0.7443604 0.6570098
## math1 0.7443604 1.0000000 0.5956771
## cc    0.6570098 0.5956771 1.0000000

1.4 Regression analysis

# regress math2 by math1
dta.lm <- lm(math2 ~ math1, data=dta)

# show results
summary(dta.lm)

## 
## Call:
## lm(formula = math2 ~ math1, data = dta)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -10.430  -5.521  -0.369   4.253  20.388 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   12.944      2.607   4.965 1.57e-05 ***
## math1          1.030      0.152   6.780 5.57e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.255 on 37 degrees of freedom
## Multiple R-squared:  0.5541, Adjusted R-squared:  0.542 
## F-statistic: 45.97 on 1 and 37 DF,  p-value: 5.571e-08

# show anova table
anova(dta.lm)

## Analysis of Variance Table
## 
## Response: math2
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## math1      1 2419.6 2419.59  45.973 5.571e-08 ***
## Residuals 37 1947.3   52.63                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

1.5 Plot data

# specify square plot region
par(pty="s")

# scatter plot of math2 by math1
plot(math2 ~ math1, data=dta, xlim=c(0, 60), ylim=c(0, 60),
     xlab="Math score at Year 1", ylab="Math score at Year 2")

# add grid lines
grid()

# add regression line
abline(dta.lm, lty=2)

# add plot title
title("Mathematics Attainment")

The math score of first year may be a proper index to predict the math score of second year.

1.6 Diagnostics

# specify maximum plot region
par(pty="m")

# plot centering residuals value ~ fitted value
# ??? resid(dta.lm) versus scale(resid(dta.lm)) ???
plot(scale(resid(dta.lm)) ~ fitted(dta.lm), 
     ylim=c(-3.5, 3.5), type="n",
     xlab="Fitted values", ylab="Standardized residuals") 

# add id label  
text(fitted(dta.lm), scale(resid(dta.lm)), labels=rownames(dta), cex=0.5)  

# add auxiliary line
grid()

# add a horizontal red dash line
abline(h=0, lty=2, col="red")

1.7 Normality check

qqnorm(scale(resid(dta.lm)))

qqline(scale(resid(dta.lm)))

grid()

2 In-class 2

Question: The notation, women{datasets}, indicates that a data object by the name women is in the datasets package. This package is preloaded when R is invoked. Explain the difference between c(women) and c(as.matrix(women)) using the women{datasets}.

c() is a generic function which combines values into a vector or list.

The default method combines its arguments to form a vector. All arguments are coerced to a common type which is the type of the returned value, and all attributes except names are removed.

2.1 c(women)

women

##    height weight
## 1      58    115
## 2      59    117
## 3      60    120
## 4      61    123
## 5      62    126
## 6      63    129
## 7      64    132
## 8      65    135
## 9      66    139
## 10     67    142
## 11     68    146
## 12     69    150
## 13     70    154
## 14     71    159
## 15     72    164

# women is a list type object  
typeof(women)

## [1] "list"

# therefore c(women) is a list class
class(c(women))

## [1] "list"

# c(women) is a list with 2 vector components
c(women)

## $height
##  [1] 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
## 
## $weight
##  [1] 115 117 120 123 126 129 132 135 139 142 146 150 154 159 164

2.2 c(as.matrix(women))

# as.matrix() attempts to turn its argument into a matrix.
# list transform to double type array/matrix
as.matrix(women)

##       height weight
##  [1,]     58    115
##  [2,]     59    117
##  [3,]     60    120
##  [4,]     61    123
##  [5,]     62    126
##  [6,]     63    129
##  [7,]     64    132
##  [8,]     65    135
##  [9,]     66    139
## [10,]     67    142
## [11,]     68    146
## [12,]     69    150
## [13,]     70    154
## [14,]     71    159
## [15,]     72    164

# c(as.matrix(women)) is a double numeric vector 
c(as.matrix(women))

##  [1]  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72 115 117 120 123
## [20] 126 129 132 135 139 142 146 150 154 159 164

typeof(c(as.matrix(women)))

## [1] "double"

class(c(as.matrix(women)))

## [1] "numeric"

3 In-class 3

Use help to examine the coding scheme for the mother’s race variable in the birthwt{MASS} dataset. The MASS comes with the base R installation but is not automatically loaded when R is invoked.

The birthwt data frame has 189 rows and 10 columns. The data were collected at Baystate Medical Center, Springfield, Mass during 1986.

race: mother’s race (1 = white, 2 = black, 3 = other).

3.1 3-1

1.How many black mothers are there in this data frame?

library(MASS)

head(birthwt)

##    low age lwt race smoke ptl ht ui ftv  bwt
## 85   0  19 182    2     0   0  0  1   0 2523
## 86   0  33 155    3     0   0  0  0   3 2551
## 87   0  20 105    1     1   0  0  0   1 2557
## 88   0  21 108    1     1   0  0  1   2 2594
## 89   0  18 107    1     1   0  0  1   0 2600
## 91   0  21 124    3     0   0  0  0   0 2622

ftable(birthwt$race)

##   1  2  3
##          
##  96 26 67

Answer: 96 white mothers, 26 black mothers, 67 other race mother

3.2 3-2

2.What does the following R command do?
\[ c("White", "Black", "Other")[birthwt$race] \]

將race這個欄位裡原本的數字改成文字呈現 1=White, 2=Black, 3=Other

c("White", "Black", "Other")[birthwt$race]

##   [1] "Black" "Other" "White" "White" "White" "Other" "White" "Other" "White"
##  [10] "White" "Other" "Other" "Other" "Other" "White" "White" "Black" "White"
##  [19] "Other" "White" "Other" "White" "White" "Other" "Other" "White" "White"
##  [28] "White" "Black" "Black" "Black" "White" "Black" "White" "Black" "White"
##  [37] "White" "White" "White" "White" "Black" "White" "Black" "White" "White"
##  [46] "White" "White" "Other" "White" "Other" "White" "Other" "White" "White"
##  [55] "Other" "Other" "Other" "Other" "Other" "Other" "Other" "Other" "Other"
##  [64] "White" "Other" "Other" "Other" "Other" "White" "Black" "White" "Other"
##  [73] "Other" "Black" "White" "Black" "White" "White" "Black" "White" "White"
##  [82] "White" "Other" "Other" "Other" "Other" "Other" "White" "White" "White"
##  [91] "White" "Other" "White" "White" "White" "White" "White" "White" "White"
## [100] "White" "White" "White" "Other" "White" "Other" "Black" "White" "White"
## [109] "White" "Black" "White" "Other" "White" "White" "White" "Other" "White"
## [118] "Other" "White" "Other" "White" "Other" "White" "White" "White" "White"
## [127] "White" "White" "White" "White" "Other" "White" "Black" "Other" "Other"
## [136] "Other" "Other" "Black" "Other" "White" "White" "White" "Other" "Other"
## [145] "White" "White" "Black" "White" "Other" "Other" "Other" "White" "White"
## [154] "White" "White" "Other" "Black" "White" "Black" "Other" "White" "Other"
## [163] "Other" "Other" "Black" "White" "Other" "Other" "White" "White" "Black"
## [172] "Black" "Black" "Other" "Other" "White" "White" "White" "White" "Black"
## [181] "Other" "Other" "White" "Other" "White" "Other" "Other" "Black" "White"

4 In-class 4

Question: Regarding UCBAdmissions{datasets} data object, what does the output of each of the R statements mean, respectively?
* UCBAdmissions[,1,]
* UCBAdmissions[,1,1]
* UCBAdmissions[1,1,]

UCBAdmissions {datasets} Student Admissions at UC Berkeley

Aggregate data on applicants to graduate school at Berkeley for the six largest departments in 1973 classified by admission and sex.
A 3-dimensional array resulting from cross-tabulating 4526 observations on 3 variables. The variables and their levels are as follows:

No	Name	Levels
1	Admit	Admitted, Rejected
2	Gender	Male, Female
3	Dept	A, B, C, D, E, F

UCBAdmissions

## , , Dept = A
## 
##           Gender
## Admit      Male Female
##   Admitted  512     89
##   Rejected  313     19
## 
## , , Dept = B
## 
##           Gender
## Admit      Male Female
##   Admitted  353     17
##   Rejected  207      8
## 
## , , Dept = C
## 
##           Gender
## Admit      Male Female
##   Admitted  120    202
##   Rejected  205    391
## 
## , , Dept = D
## 
##           Gender
## Admit      Male Female
##   Admitted  138    131
##   Rejected  279    244
## 
## , , Dept = E
## 
##           Gender
## Admit      Male Female
##   Admitted   53     94
##   Rejected  138    299
## 
## , , Dept = F
## 
##           Gender
## Admit      Male Female
##   Admitted   22     24
##   Rejected  351    317

KEY: UCBAdmissions[Admitlevel(1=Admitted, 2=Rejected), Gender(1=Male, 2=Female), Department (1=A, 2=B, 3=C, 4=D, 5=E)]

4.1 UCBAdmissions[,1,]

UCBAdmissions[,1,]

##           Dept
## Admit        A   B   C   D   E   F
##   Admitted 512 353 120 138  53  22
##   Rejected 313 207 205 279 138 351

The admission status for each department in male students.

4.2 UCBAdmissions[,1,1]

UCBAdmissions[,1,1]

## Admitted Rejected 
##      512      313

The admission status for “A” department in male students.

4.3 UCBAdmissions[1,1,]

UCBAdmissions[1,1,]

##   A   B   C   D   E   F 
## 512 353 120 138  53  22

The admitted number for each department admitted in male students.

5 In-class 5

What happens when the following command is entered?

#No run
help(ls("package:MASS")[92])

Age of Menarche in Warsaw

Description: Proportions of female children at various ages during adolescence who have reached menarche.

Use the observation to find out how many items there are in the package MASS.

ls("package:MASS")

##   [1] "abbey"             "accdeaths"         "addterm"          
##   [4] "Aids2"             "Animals"           "anorexia"         
##   [7] "area"              "as.fractions"      "bacteria"         
##  [10] "bandwidth.nrd"     "bcv"               "beav1"            
##  [13] "beav2"             "biopsy"            "birthwt"          
##  [16] "Boston"            "boxcox"            "cabbages"         
##  [19] "caith"             "Cars93"            "cats"             
##  [22] "cement"            "chem"              "con2tr"           
##  [25] "contr.sdif"        "coop"              "corresp"          
##  [28] "cov.mcd"           "cov.mve"           "cov.rob"          
##  [31] "cov.trob"          "cpus"              "crabs"            
##  [34] "Cushings"          "DDT"               "deaths"           
##  [37] "denumerate"        "dose.p"            "drivers"          
##  [40] "dropterm"          "eagles"            "enlist"           
##  [43] "epil"              "eqscplot"          "farms"            
##  [46] "fbeta"             "fgl"               "fitdistr"         
##  [49] "forbes"            "fractions"         "frequency.polygon"
##  [52] "GAGurine"          "galaxies"          "gamma.dispersion" 
##  [55] "gamma.shape"       "gehan"             "genotype"         
##  [58] "geyser"            "gilgais"           "ginv"             
##  [61] "glm.convert"       "glm.nb"            "glmmPQL"          
##  [64] "hills"             "hist.FD"           "hist.scott"       
##  [67] "housing"           "huber"             "hubers"           
##  [70] "immer"             "Insurance"         "is.fractions"     
##  [73] "isoMDS"            "kde2d"             "lda"              
##  [76] "ldahist"           "leuk"              "lm.gls"           
##  [79] "lm.ridge"          "lmsreg"            "lmwork"           
##  [82] "loglm"             "loglm1"            "logtrans"         
##  [85] "lqs"               "lqs.formula"       "ltsreg"           
##  [88] "mammals"           "mca"               "mcycle"           
##  [91] "Melanoma"          "menarche"          "michelson"        
##  [94] "minn38"            "motors"            "muscle"           
##  [97] "mvrnorm"           "nclass.freq"       "neg.bin"          
## [100] "negative.binomial" "negexp.SSival"     "newcomb"          
## [103] "nlschools"         "npk"               "npr1"             
## [106] "Null"              "oats"              "OME"              
## [109] "painters"          "parcoord"          "petrol"           
## [112] "phones"            "Pima.te"           "Pima.tr"          
## [115] "Pima.tr2"          "polr"              "psi.bisquare"     
## [118] "psi.hampel"        "psi.huber"         "qda"              
## [121] "quine"             "Rabbit"            "rational"         
## [124] "renumerate"        "rlm"               "rms.curv"         
## [127] "rnegbin"           "road"              "rotifer"          
## [130] "Rubber"            "sammon"            "select"           
## [133] "Shepard"           "ships"             "shoes"            
## [136] "shrimp"            "shuttle"           "Sitka"            
## [139] "Sitka89"           "Skye"              "snails"           
## [142] "SP500"             "stdres"            "steam"            
## [145] "stepAIC"           "stormer"           "studres"          
## [148] "survey"            "synth.te"          "synth.tr"         
## [151] "theta.md"          "theta.ml"          "theta.mm"         
## [154] "topo"              "Traffic"           "truehist"         
## [157] "ucv"               "UScereal"          "UScrime"          
## [160] "VA"                "waders"            "whiteside"        
## [163] "width.SJ"          "write.matrix"      "wtloss"

20211018 In-class exercises

Chang-Chun Chen

2021-10-20