The datasets used in the book are available in the package Sleuth2. Each dataset is named as follows: caseXXYY for the YYth study in chapter XX. Example: case0101

The package also contains the dataset used in the exercises. These are named exXXYY for the YYth excerise in chapter XX. Example: ex0116

Load the package and view its content.

library(Sleuth2)
ls("package:Sleuth2")

Chapter 1 – Drawing Statistical Conclusions

The Motivation and Creativity case study - Randomized Experiment

The data for Case Study 1 was collected by psychologist Teresa Amabile in an experiment concerning the effects of intrinsic and extrinsic motivation on creativity. A total of 47 subjects with considerable experience in creative writing were randomly assigned to one of two treatment groups: 23 were placed into the “extrinsic” treatment group and 24 were placed into the “intrinsic” treatment group. Each group was given a questionnaire asking to rank a list of seven reasons for writing. The intrinsic group was given intrinsic reasons fro writing such as “You feel relaxed when you write” while the extrinsic was given extrinsic reasons for writing such as “You enjoy public recognition of your work”.

After completing the questionnaire, all subjects were asked to write a poem in the Haiku style about “laughter”. All poems were submitted to 12 poets who evaluated them on a 40 point scale. The Score variable holds the average ratings given by the 12 judges.

data(case0101)
head(case0101)
##   Score Treatment
## 1   5.0 Extrinsic
## 2   5.4 Extrinsic
## 3   6.1 Extrinsic
## 4  10.9 Extrinsic
## 5  11.8 Extrinsic
## 6  12.0 Extrinsic
summary(case0101)
##      Score           Treatment 
##  Min.   : 5.00   Extrinsic:23  
##  1st Qu.:14.90   Intrinsic:24  
##  Median :18.70                 
##  Mean   :17.86                 
##  3rd Qu.:21.25                 
##  Max.   :29.70

case0101$Score class is numeric of type double

class(case0101$Score)
## [1] "numeric"
typeof(case0101$Score)
## [1] "double"

and case0101$Treatment class is factor (underlying type is integer)

class(case0101$Treatment)
## [1] "factor"
typeof(case0101$Treatment)
## [1] "integer"
levels(case0101$Treatment)
## [1] "Extrinsic" "Intrinsic"

Split the Score into two groups according to the Treatment factor. The function split returns a list with columns named after the levels of the factor.

g = split(case0101$Score, case0101$Treatment)
summary(g$Extrinsic)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    5.00   12.15   17.20   15.74   18.95   24.00
sd(g$Extrinsic)
## [1] 5.252596
summary(g$Intrinsic)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   12.00   17.43   20.40   19.88   22.30   29.70
sd(g$Intrinsic)
## [1] 4.439513

Note: the standard deviation of a set \(X = \{x_1,\ldots,x_n \}\) of numbers with mean \(\mu = (\sum_i x_i)/n\) is:

\[\sqrt{\frac{\sum_i(x_i - \mu)^2}{n-1}}\]

x = g$Intrinsic
sqrt(sum((x-mean(x))^2/(length(x)-1)))
## [1] 4.439513

Histograms of the Score conditioned on the Treatment

Requires the lattice library

require(lattice)

Side by side (the default)

histogram(~Score | Treatment, data = case0101)

or on top of one another

histogram(~Score | Treatment, data = case0101, layout = c(1,2))

or, using ggplot()

library(ggplot2)
ggplot(case0101, aes(x=Score)) +
    geom_histogram(fill="white", colour="black", binwidth=4) +
    facet_grid(Treatment ~ .)

Stem-and-leaf plot of all scores

Use the aplpack package; note that the base stem function is not working properly and, in any case, does not do back-to-back plots

library(aplpack)
## Loading required package: tcltk
# plot a stem-and-leaf diagram for all the scores
stem.leaf(case0101$Score, unit=.1, trim.outliers=F)
## 1 | 2: represents 1.2
##  leaf unit: 0.1
##             n: 47
##    2     5 | 04
##    3     6 | 0
##          7 | 
##          8 | 
##          9 | 
##    4    10 | 8
##    5    11 | 8
##   10    12 | 00038
##   11    13 | 6
##   12    14 | 8
##   13    15 | 0
##   15    16 | 67
##   21    17 | 222355
##   (4)   18 | 2577
##   22    19 | 12257
##   17    20 | 2567
##   13    21 | 226
##   10    22 | 1126
##    6    23 | 1
##    5    24 | 002
##         25 | 
##    2    26 | 7
##         27 | 
##         28 | 
##    1    29 | 7
# and back-to-back diagrams as in display 1.10
# first split the scores according to the factor Treament:
scores = split(case0101, case0101$Treatment)
# and plot diagram
stem.leaf.backback(scores$Extrinsic$Score, scores$Intrinsic$Score, unit=.1, trim.outliers=F)
## ____________________________
##   1 | 2: represents 1.2, leaf unit: 0.1 
## scores$Extrinsic$Score
##                  scores$Intrinsic$Score
## ____________________________
##    2     40|  5 |           
##    3      0|  6 |           
##            |  7 |           
##            |  8 |           
##            |  9 |           
##    4      8| 10 |           
##    5      8| 11 |           
##    7     30| 12 |008    3   
##            | 13 |6      4   
##    8      8| 14 |           
##    9      0| 15 |           
##   10      7| 16 |6      5   
##   (4)  5322| 17 |25     7   
##    9    775| 18 |2      8   
##    6     52| 19 |127   11   
##    4      7| 20 |256   (3)  
##    3      2| 21 |26    10   
##    2      1| 22 |126    8   
##            | 23 |1      5   
##    1      0| 24 |02     4   
##            | 25 |           
##            | 26 |7      2   
##            | 27 |           
##            | 28 |           
##            | 29 |7      1   
##            | 30 |           
## ____________________________
## n:       23      24     
## ____________________________

We perform a two sample t-test to determine whether the difference in mean value between the two groups is significant. Note that this difference is 4.1442028.

t.test(g$Extrinsic, g$Intrinsic)
## 
##  Welch Two Sample t-test
## 
## data:  g$Extrinsic and g$Intrinsic
## t = -2.9153, df = 43.108, p-value = 0.005618
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -7.010803 -1.277603
## sample estimates:
## mean of x mean of y 
##  15.73913  19.88333

We see that a 95% confidence interval does not contain 0 with a p-value of 0.006. We conclude that the difference is significant. Since this is a randomized experiment we also conclude that there is a causal relationship between intrinsic motivation and creativity. However, since the participants in the study were volunteered we cannot extend this causal relationship to any population outside of the study participants. Indeed, the participants cannot be thought of representative of any broader population.

If the questionnaire had no effect on creativity, the subjects would have received the same score regardless of the group to which they were assigned. Thus we can reaasign the subjects to the groups using a permutation of the original group assignment and compute the resulting difference in the mean scores between the two new groups. Conceptually, we couldcompute these differences in mean scores bwteen the two groups for every possible assignment of the subjects to the two groups. In practice, this is not possible due to the very large number of permutations. But the following code draws a histogram of the mean differences for 1000 permutations as an estimate of the randomization distribution of the mean differences.

require(gtools)
## Loading required package: gtools
set.seed(1234)
myfun = function() {
  g = split(case0101$Score, permute(case0101$Treatment));
  mean(g$Extrinsic) - mean(g$Intrinsic)
}
nullhyp = replicate(1000, myfun())
hist(nullhyp, freq=F, breaks=40, xlab = "Estimate of the randomization distribution (1000 permutations)")

qplot(nullhyp, geom="histogram", binwidth=.2, fill = I("blue"), colour = I("red"),
      alpha = I(.2), xlab = "Estimate of the randomization distribution (1000 permutations)")

Assuming the null hypothesis that group assignment has no effect on scores we see that a mean difference of 4.1442028 is very extreme. One is led to believe that the nll hypothesis is wrong and that the group assignment, and not the randomization, caused this large difference. The probability that the randomization alone leads to a mean difference as extreme or more extreme than the one observed is call the p-value. In our sample of 1000 regroupings only 1 produced a mean difference as large or greater so the p-value in this case is 0.001. This is a one sided p-value. For a two sided p-value we compute how many mean differences are greater than or equal to 4.14 in absolute value. We get 2 yielding a two-sided p-value of 0.002.

The Sex Discrimination in Employment case study - An Observational Study

data(case0102)
head(case0102)
##   Salary    Sex
## 1   3900 Female
## 2   4020 Female
## 3   4290 Female
## 4   4380 Female
## 5   4380 Female
## 6   4380 Female
summary(case0102)
##      Salary         Sex    
##  Min.   :3900   Female:61  
##  1st Qu.:4980   Male  :32  
##  Median :5400              
##  Mean   :5420              
##  3rd Qu.:6000              
##  Max.   :8100

Histograms for Salary conditioned on Sex

histogram(~Salary | Sex, data = case0102, layout = c(1,2))

Or, using ggplot()

ggplot(case0102, aes(x=Salary)) +
  geom_histogram(binwidth=500, fill="white", colour="black") +
  facet_grid(Sex ~ .)

Histograms and Boxplots combined on a single graphic

A very crude approach

hist(case0102$Salary, main='Male and Female Starting Salaries', xlab='Starting Salary ($US)')
boxplot(case0102$Salary, horizontal=TRUE, at=-.5, add=TRUE, axes=FALSE)

A more sophisticated approach using the package sfsmisc

library(sfsmisc)
hist(case0102$Salary, freq=F)

histBxp(case0102$Salary,main='Male and Female Starting Salaries',xlab='Starting Salary ($US)')
## Warning in hist.default(x, probability = probability, include.lowest =
## include.lowest, : argument 'probability' is not made use of

In the case of this observational study, we want to estimate the diference in average starting salary given to males vs. females. Unlike in the preceding randomized study, we cannot view the sex of individuals as randomly assigned. In order to obtain an interpretation that supports a statistical analysis, we consider a fictitious probability model by assuming that the employer assigned the set of starting salaries to the employees at random. We may then ask whether to observed salary assignment is a likely outcome. The collction of differences in averages from all possible assignments of starting salaries to individuals makes up the permutation distribution of the test statistic for this model. Again, it is impossible to compute this distribution in practice but, as we shall see later, there are shortcuts available to approximate it.

Exercise 1.16

data(ex0116)
summary(ex0116)
##     Planet              Order          Distance     
##  Length:10          Min.   : 1.00   Min.   :  3.87  
##  Class :character   1st Qu.: 3.25   1st Qu.: 11.31  
##  Mode  :character   Median : 5.50   Median : 40.52  
##                     Mean   : 5.50   Mean   :110.07  
##                     3rd Qu.: 7.75   3rd Qu.:167.87  
##                     Max.   :10.00   Max.   :395.00
# scatter plot of distance versus order from the sun
plot(ex0116$Order, ex0116$Distance, xlab="Order", ylab="Distance")

# same but plot the logarithm of the distances
plot(ex0116$Order, log(ex0116$Distance), xlab="Order", ylab="log(Distance)")

# summary for the distances
summary(ex0116$Distance)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    3.87   11.31   40.51  110.10  167.90  395.00
sd(ex0116$Distance)
## [1] 139.5745
# summary for the log of the distances
summary(log(ex0116$Distance))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.353   2.408   3.660   3.718   5.083   5.979
sd(log(ex0116$Distance))
## [1] 1.630669

Chapter 2 – Inference Using t-Distributions

The Bumpus Data on Natural Selection case study

data(case0201)
summary(case0201)
##     Humerus           Status  
##  Min.   :659.0   Perished:24  
##  1st Qu.:724.5   Survived:35  
##  Median :736.0                
##  Mean   :733.9                
##  3rd Qu.:747.0                
##  Max.   :780.0

Note: the humerus lengths are reported as integers. To obtain the lengths in inches (as in the book) divide by 1000.

Split the Humerus (lengths) into two groups according to the Status factor. The function split returns a list with columns named after the levels of the factor

g = split(case0201$Humerus,case0201$Status)
summary(g$Perished)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   659.0   718.2   733.5   727.9   743.2   765.0
length(g$Perished)
## [1] 24
sd(g$Perished)
## [1] 23.54259
summary(g$Survived)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   687.0   728.0   736.0   738.0   751.5   780.0
length(g$Survived)
## [1] 35
sd(g$Survived)
## [1] 19.83906

Two-sample t-test for the humerus length of the surviving versus the dead sparrows.

The null hypothesis is that the difference of mean lengths is zero while the alternative hypothesis is that it is not.
Note that we must set the value of the var.equal parameter to TRUE to perform a standard two-sample t-test, i.e. assuming that teh variances are equal and using the pooled variance as an estimate of the variance. When var.equal is set to FALSE (the default) t.test() performs a Welch two-sample t-test, i.e. it uses the Welch approximation to the degrees of freedom.

t.test(g$Survived, g$Perished, var.equal=T)
## 
##  Two Sample t-test
## 
## data:  g$Survived and g$Perished
## t = 1.777, df = 57, p-value = 0.0809
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.279386 21.446053
## sample estimates:
## mean of x mean of y 
##  738.0000  727.9167

The result of the test is inconclusive, namely the null hypothesis is not rejected, with a (two-sided) p-value of 0.08 and a 95% confidence interval from -1.3 to 21.4 (recall the normalizing factor of 1000 when comapring with book’s results). In particular, the 95% confidence interval contains 0!

The Anatomical Abnormalities Associated with Schizophrenia case study

data(case0202)
summary(case0202)
##     Unaffect        Affected   
##  Min.   :1.250   Min.   :1.02  
##  1st Qu.:1.600   1st Qu.:1.31  
##  Median :1.770   Median :1.59  
##  Mean   :1.759   Mean   :1.56  
##  3rd Qu.:1.935   3rd Qu.:1.78  
##  Max.   :2.080   Max.   :2.02

Paired t-test

The null hypothesis is that the mean of the differences in left hippocampus volumes between schizophrenic individuals and their nonschizophrenic twins is zero and the alternative is that it is not (two-sided test).

t.test(case0202$Unaffect, case0202$Affected, paired=T)
## 
##  Paired t-test
## 
## data:  case0202$Unaffect and case0202$Affected
## t = 3.2289, df = 14, p-value = 0.006062
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.06670409 0.33062922
## sample estimates:
## mean of the differences 
##               0.1986667

The results of the test allow us to reject the null hypothesis (at the 5% confidence level). The two-sided p-value is: 0.006. The mean volume is estimated to be 0.20 cubic centimeters smaller for the schizophrenic twins. The 95% conficence interval is from 0.07 to 0.33 cubic centimeters.

The Standard Error of an Average in Random Sampling

The standard error of any statistic is an estimate of the standard deviation in its sampling distribution. Associated with every standard error is a measure of the amount of information used to estimate its variability, called its degrees of freedom. The formula for the standard deviation of the average in a sample is \(\sigma/\sqrt{n}\) where \(\sigma\) is the standard deviation of the population from which the sample is drawn and \(n\) is the sample size. This result is an immediate consequence of the Central Limit Theorem. So letting \(s\) denote the sample standard deviation, used as an estimate of \(\sigma\), we have that the standard error of the sample everage is \(s/\sqrt{n}\) with degrees of freedom equal to \(n-1\):

\[ \mathrm{SE}(\bar{Y}) = \frac{s}{\sqrt{n}}, \quad \mathrm{d.f.} = n-1\]

In the schizophrenia study, the average difference in volume is 0.199 cubic centimeters, as reported in the t-test. The sample standard deviation is 0.238 cubic centimeters:

sd(case0202$Unaffect - case0202$Affected)
## [1] 0.2382935

The standard error of the average is therefore: 0.063 cubic centimeters with 14 degrees of freedom.

sd(case0202$Unaffect - case0202$Affected)/sqrt(14)
## [1] 0.06368663