The datasets used in the book are available in the package Sleuth2. Each dataset is named caseXXYY for the YYth case study in chapter XX. Example: case0101
The package also contains the datasets used in the exercises. These are named exXXYY for the YYth exercise in chapter XX. Example: ex0116
Load the package and view its contents.
library(Sleuth2)
ls("package:Sleuth2")
The data for Case Study 1 were collected by psychologist Teresa Amabile in an experiment concerning the effects of intrinsic and extrinsic motivation on creativity. A total of 47 subjects with considerable experience in creative writing were randomly assigned to one of two treatment groups: 23 were placed into the “extrinsic” treatment group and 24 were placed into the “intrinsic” treatment group. Each group was given a questionnaire asking them to rank a list of seven reasons for writing. The intrinsic group was given intrinsic reasons for writing, such as “You feel relaxed when you write”, while the extrinsic group was given extrinsic reasons for writing, such as “You enjoy public recognition of your work”.
After completing the questionnaire, all subjects were asked to write a poem in the Haiku style about “laughter”. All poems were submitted to 12 poets, who evaluated them on a 40-point scale. The Score variable holds the average rating given by the 12 judges.
data(case0101)
head(case0101)
## Score Treatment
## 1 5.0 Extrinsic
## 2 5.4 Extrinsic
## 3 6.1 Extrinsic
## 4 10.9 Extrinsic
## 5 11.8 Extrinsic
## 6 12.0 Extrinsic
summary(case0101)
## Score Treatment
## Min. : 5.00 Extrinsic:23
## 1st Qu.:14.90 Intrinsic:24
## Median :18.70
## Mean :17.86
## 3rd Qu.:21.25
## Max. :29.70
The class of case0101$Score is numeric, with underlying type double:
class(case0101$Score)
## [1] "numeric"
typeof(case0101$Score)
## [1] "double"
and the class of case0101$Treatment is factor (with underlying type integer):
class(case0101$Treatment)
## [1] "factor"
typeof(case0101$Treatment)
## [1] "integer"
levels(case0101$Treatment)
## [1] "Extrinsic" "Intrinsic"
Split the Score values into two groups according to the Treatment factor. The function split returns a list with components named after the levels of the factor.
g = split(case0101$Score, case0101$Treatment)
summary(g$Extrinsic)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5.00 12.15 17.20 15.74 18.95 24.00
sd(g$Extrinsic)
## [1] 5.252596
summary(g$Intrinsic)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 12.00 17.43 20.40 19.88 22.30 29.70
sd(g$Intrinsic)
## [1] 4.439513
Note: the (sample) standard deviation of a set \(X = \{x_1,\ldots,x_n \}\) of numbers with mean \(\mu = (\sum_i x_i)/n\) is:
\[\sqrt{\frac{\sum_i(x_i - \mu)^2}{n-1}}\]
x = g$Intrinsic
sqrt(sum((x - mean(x))^2) / (length(x) - 1))
## [1] 4.439513
Requires the lattice package
require(lattice)
Side by side (the default)
histogram(~Score | Treatment, data = case0101)
or on top of one another
histogram(~Score | Treatment, data = case0101, layout = c(1,2))
or, using ggplot()
library(ggplot2)
ggplot(case0101, aes(x=Score)) +
geom_histogram(fill="white", colour="black", binwidth=4) +
facet_grid(Treatment ~ .)
Use the aplpack package; note that the base R stem function does not work properly here and, in any case, does not produce back-to-back plots
library(aplpack)
## Loading required package: tcltk
# plot a stem-and-leaf diagram for all the scores
stem.leaf(case0101$Score, unit=.1, trim.outliers=F)
## 1 | 2: represents 1.2
## leaf unit: 0.1
## n: 47
## 2 5 | 04
## 3 6 | 0
## 7 |
## 8 |
## 9 |
## 4 10 | 8
## 5 11 | 8
## 10 12 | 00038
## 11 13 | 6
## 12 14 | 8
## 13 15 | 0
## 15 16 | 67
## 21 17 | 222355
## (4) 18 | 2577
## 22 19 | 12257
## 17 20 | 2567
## 13 21 | 226
## 10 22 | 1126
## 6 23 | 1
## 5 24 | 002
## 25 |
## 2 26 | 7
## 27 |
## 28 |
## 1 29 | 7
# and back-to-back diagrams as in Display 1.10
# first split the data frame according to the factor Treatment:
scores = split(case0101, case0101$Treatment)
# and plot the diagram
stem.leaf.backback(scores$Extrinsic$Score, scores$Intrinsic$Score, unit=.1, trim.outliers=F)
## ____________________________
## 1 | 2: represents 1.2, leaf unit: 0.1
## scores$Extrinsic$Score
## scores$Intrinsic$Score
## ____________________________
## 2 40| 5 |
## 3 0| 6 |
## | 7 |
## | 8 |
## | 9 |
## 4 8| 10 |
## 5 8| 11 |
## 7 30| 12 |008 3
## | 13 |6 4
## 8 8| 14 |
## 9 0| 15 |
## 10 7| 16 |6 5
## (4) 5322| 17 |25 7
## 9 775| 18 |2 8
## 6 52| 19 |127 11
## 4 7| 20 |256 (3)
## 3 2| 21 |26 10
## 2 1| 22 |126 8
## | 23 |1 5
## 1 0| 24 |02 4
## | 25 |
## | 26 |7 2
## | 27 |
## | 28 |
## | 29 |7 1
## | 30 |
## ____________________________
## n: 23 24
## ____________________________
We perform a two-sample t-test to determine whether the difference in mean scores between the two groups is significant. Note that this difference is 4.1442028.
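Indeed, this difference can be computed directly from the group means obtained earlier:
mean(g$Intrinsic) - mean(g$Extrinsic)
## [1] 4.144203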
t.test(g$Extrinsic, g$Intrinsic)
##
## Welch Two Sample t-test
##
## data: g$Extrinsic and g$Intrinsic
## t = -2.9153, df = 43.108, p-value = 0.005618
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -7.010803 -1.277603
## sample estimates:
## mean of x mean of y
## 15.73913 19.88333
We see that the 95% confidence interval does not contain 0, with a p-value of 0.006. We conclude that the difference is significant. Since this is a randomized experiment, we also conclude that there is a causal relationship between intrinsic motivation and creativity. However, since the participants in the study were volunteers, we cannot extend this causal relationship to any population outside of the study participants. Indeed, the participants cannot be thought of as representative of any broader population.
If the questionnaire had no effect on creativity, the subjects would have received the same scores regardless of the group to which they were assigned. Thus we can reassign the subjects to the groups using a permutation of the original group assignment and compute the resulting difference in mean scores between the two new groups. Conceptually, we could compute these differences in mean scores between the two groups for every possible assignment of the subjects to the two groups. In practice, this is not possible due to the very large number of permutations. But the following code draws a histogram of the mean differences for 1000 permutations as an estimate of the randomization distribution of the mean differences.
require(gtools)
## Loading required package: gtools
set.seed(1234)
myfun = function() {
g = split(case0101$Score, permute(case0101$Treatment));
mean(g$Extrinsic) - mean(g$Intrinsic)
}
nullhyp = replicate(1000, myfun())
hist(nullhyp, freq=F, breaks=40, xlab = "Estimate of the randomization distribution (1000 permutations)")
qplot(nullhyp, geom="histogram", binwidth=.2, fill = I("blue"), colour = I("red"),
alpha = I(.2), xlab = "Estimate of the randomization distribution (1000 permutations)")
Assuming the null hypothesis that group assignment has no effect on scores, we see that a mean difference of 4.1442028 is very extreme. One is led to believe that the null hypothesis is wrong and that the group assignment, and not the randomization, caused this large difference. The probability that the randomization alone leads to a mean difference as extreme or more extreme than the one observed is called the p-value. In our sample of 1000 regroupings, only 1 produced a mean difference as large or greater, so the p-value in this case is 0.001. This is a one-sided p-value. For a two-sided p-value we count how many mean differences are greater than or equal to 4.14 in absolute value. We get 2, yielding a two-sided p-value of 0.002.
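A sketch of how these p-values can be computed from nullhyp (obs is a name introduced here; note that myfun() computes Extrinsic minus Intrinsic, so the observed difference is negative on this scale). For the run above, the two quantities evaluate to the 0.001 and 0.002 just quoted; the exact counts depend on the seed.
obs = mean(g$Extrinsic) - mean(g$Intrinsic)   # observed difference, -4.144203
# one-sided: permuted differences at least as extreme in the observed (negative) direction
sum(nullhyp <= obs) / length(nullhyp)
# two-sided: permuted differences at least as extreme in absolute value
sum(abs(nullhyp) >= abs(obs)) / length(nullhyp)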
data(case0102)
head(case0102)
## Salary Sex
## 1 3900 Female
## 2 4020 Female
## 3 4290 Female
## 4 4380 Female
## 5 4380 Female
## 6 4380 Female
summary(case0102)
## Salary Sex
## Min. :3900 Female:61
## 1st Qu.:4980 Male :32
## Median :5400
## Mean :5420
## 3rd Qu.:6000
## Max. :8100
histogram(~Salary | Sex, data = case0102, layout = c(1,2))
Or, using ggplot()
ggplot(case0102, aes(x=Salary)) +
geom_histogram(binwidth=500, fill="white", colour="black") +
facet_grid(Sex ~ .)
A very crude approach
hist(case0102$Salary, main='Male and Female Starting Salaries', xlab='Starting Salary ($US)')
boxplot(case0102$Salary, horizontal=TRUE, at=-.5, add=TRUE, axes=FALSE)
A more sophisticated approach using the package sfsmisc
library(sfsmisc)
hist(case0102$Salary, freq=F)
histBxp(case0102$Salary,main='Male and Female Starting Salaries',xlab='Starting Salary ($US)')
## Warning in hist.default(x, probability = probability, include.lowest =
## include.lowest, : argument 'probability' is not made use of
In the case of this observational study, we want to estimate the difference in average starting salary given to males vs. females. Unlike in the preceding randomized study, we cannot view the sex of individuals as randomly assigned. In order to obtain an interpretation that supports a statistical analysis, we consider a fictitious probability model by assuming that the employer assigned the set of starting salaries to the employees at random. We may then ask whether the observed salary assignment is a likely outcome. The collection of differences in averages from all possible assignments of starting salaries to individuals makes up the permutation distribution of the test statistic for this model. Again, it is impossible to compute this distribution in practice but, as we shall see later, there are shortcuts available to approximate it.
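As a sketch of this idea (not a computation from the book), the same permutation machinery used earlier can approximate this distribution for the salary data; salfun and salnull are names introduced here:
set.seed(1234)
salfun = function() {
  # reassign the salaries at random and compute the difference in group means
  s = split(case0102$Salary, permute(case0102$Sex))
  mean(s$Male) - mean(s$Female)
}
salnull = replicate(1000, salfun())
hist(salnull, freq=F, breaks=40, xlab = "Permutation distribution of the difference in mean salaries (1000 permutations)")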
data(ex0116)
summary(ex0116)
## Planet Order Distance
## Length:10 Min. : 1.00 Min. : 3.87
## Class :character 1st Qu.: 3.25 1st Qu.: 11.31
## Mode :character Median : 5.50 Median : 40.52
## Mean : 5.50 Mean :110.07
## 3rd Qu.: 7.75 3rd Qu.:167.87
## Max. :10.00 Max. :395.00
# scatter plot of distance versus order from the sun
plot(ex0116$Order, ex0116$Distance, xlab="Order", ylab="Distance")
# same but plot the logarithm of the distances
plot(ex0116$Order, log(ex0116$Distance), xlab="Order", ylab="log(Distance)")
# summary for the distances
summary(ex0116$Distance)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.87 11.31 40.51 110.10 167.90 395.00
sd(ex0116$Distance)
## [1] 139.5745
# summary for the log of the distances
summary(log(ex0116$Distance))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.353 2.408 3.660 3.718 5.083 5.979
sd(log(ex0116$Distance))
## [1] 1.630669
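As an illustration beyond what the exercise asks, the near-linearity of the log-scale plot can be summarized with a least-squares line (fit is a name introduced here); an approximately constant slope means each planet's distance is roughly a constant multiple of the previous one's:
plot(ex0116$Order, log(ex0116$Distance), xlab="Order", ylab="log(Distance)")
fit = lm(log(Distance) ~ Order, data = ex0116)
abline(fit)   # overlay the fitted line on the log-scale scatter plot
coef(fit)     # exp(slope) estimates the common ratio between successive distances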
data(case0201)
summary(case0201)
## Humerus Status
## Min. :659.0 Perished:24
## 1st Qu.:724.5 Survived:35
## Median :736.0
## Mean :733.9
## 3rd Qu.:747.0
## Max. :780.0
Note: the humerus lengths are reported as integers. To obtain the lengths in inches (as in the book), divide by 1000.
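For example (humerus_in is a name introduced here):
humerus_in = case0201$Humerus / 1000   # lengths in inches, as in the book
range(humerus_in)                      # from the summary above: 0.659 to 0.780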
Split the Humerus lengths into two groups according to the Status factor. The function split returns a list with components named after the levels of the factor.
g = split(case0201$Humerus,case0201$Status)
summary(g$Perished)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 659.0 718.2 733.5 727.9 743.2 765.0
length(g$Perished)
## [1] 24
sd(g$Perished)
## [1] 23.54259
summary(g$Survived)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 687.0 728.0 736.0 738.0 751.5 780.0
length(g$Survived)
## [1] 35
sd(g$Survived)
## [1] 19.83906
The null hypothesis is that the difference in mean lengths is zero, while the alternative hypothesis is that it is not.
Note that we must set the var.equal parameter to TRUE to perform a standard two-sample t-test, i.e. one that assumes the variances are equal and uses the pooled variance as an estimate of the common variance. When var.equal is set to FALSE (the default), t.test() performs a Welch two-sample t-test, i.e. it uses the Welch approximation to the degrees of freedom.
t.test(g$Survived, g$Perished, var.equal=T)
##
## Two Sample t-test
##
## data: g$Survived and g$Perished
## t = 1.777, df = 57, p-value = 0.0809
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.279386 21.446053
## sample estimates:
## mean of x mean of y
## 738.0000 727.9167
The result of the test is inconclusive, namely the null hypothesis is not rejected, with a (two-sided) p-value of 0.08 and a 95% confidence interval from -1.3 to 21.4 (recall the scaling factor of 1000 when comparing with the book’s results). In particular, the 95% confidence interval contains 0!
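As a sketch of what var.equal=TRUE computes under the hood (n1, n2, sp2 and se are names introduced here), the pooled-variance t statistic can be reproduced by hand:
n1 = length(g$Survived)   # 35
n2 = length(g$Perished)   # 24
# pooled estimate of the common variance: weighted average of the two sample variances
sp2 = ((n1 - 1)*var(g$Survived) + (n2 - 1)*var(g$Perished)) / (n1 + n2 - 2)
# standard error of the difference in means, then the t statistic
se = sqrt(sp2 * (1/n1 + 1/n2))
(mean(g$Survived) - mean(g$Perished)) / se   # matches t = 1.777 reported above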
data(case0202)
summary(case0202)
## Unaffect Affected
## Min. :1.250 Min. :1.02
## 1st Qu.:1.600 1st Qu.:1.31
## Median :1.770 Median :1.59
## Mean :1.759 Mean :1.56
## 3rd Qu.:1.935 3rd Qu.:1.78
## Max. :2.080 Max. :2.02
The null hypothesis is that the mean of the differences in left hippocampus volumes between schizophrenic individuals and their nonschizophrenic twins is zero and the alternative is that it is not (two-sided test).
t.test(case0202$Unaffect, case0202$Affected, paired=T)
##
## Paired t-test
##
## data: case0202$Unaffect and case0202$Affected
## t = 3.2289, df = 14, p-value = 0.006062
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.06670409 0.33062922
## sample estimates:
## mean of the differences
## 0.1986667
The results of the test allow us to reject the null hypothesis (at the 5% significance level). The two-sided p-value is 0.006. The mean volume is estimated to be 0.20 cubic centimeters smaller for the schizophrenic twins. The 95% confidence interval is from 0.07 to 0.33 cubic centimeters.
The standard error of any statistic is an estimate of the standard deviation of its sampling distribution. Associated with every standard error is a measure of the amount of information used to estimate its variability, called its degrees of freedom. The standard deviation of the average of a sample is \(\sigma/\sqrt{n}\), where \(\sigma\) is the standard deviation of the population from which the sample is drawn and \(n\) is the sample size; this follows from the basic properties of the variance of a sum of independent observations. So, letting \(s\) denote the sample standard deviation, used as an estimate of \(\sigma\), the standard error of the sample average is \(s/\sqrt{n}\) with \(n-1\) degrees of freedom:
\[ \mathrm{SE}(\bar{Y}) = \frac{s}{\sqrt{n}}, \quad \mathrm{d.f.} = n-1\]
In the schizophrenia study, the average difference in volume is 0.199 cubic centimeters, as reported in the paired t-test. The sample standard deviation of the differences is 0.238 cubic centimeters:
sd(case0202$Unaffect - case0202$Affected)
## [1] 0.2382935
The standard error of the average is therefore 0.0615 cubic centimeters, with 14 degrees of freedom (n = 15 twin pairs, so we divide by \(\sqrt{15}\), not \(\sqrt{14}\)):
sd(case0202$Unaffect - case0202$Affected)/sqrt(15)
## [1] 0.06152712
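As a check, dividing the average difference by this standard error reproduces the t statistic from the paired t-test above (d is a name introduced here):
d = case0202$Unaffect - case0202$Affected
mean(d) / (sd(d)/sqrt(length(d)))   # matches t = 3.2289 from t.test() above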