Potential Errata for Statistics for the Terrified (5th ed.)

These are items that I thought might be errata for Statistics for the Terrified (5th ed.). I am not sure that they are necessarily incorrect, but they struck me as not being the way I would present them if I were teaching from this book.

Chapter 3

p. 23 “My Dear Aunt Sally fi Multiply, Divide, Add, Subtract” \(\rightarrow\) “My Dear Aunt Sally for [?] Multiply, Divide, Add, Subtract”
p. 24 “a simple \(\sum^{i=1} x\)” \(\rightarrow\) “a simple \(\sum x\)”?

Chapter 4

p. 41 Problem 6 — the answer does not match the question.

scores <- c(4, 3, 10, 3, 3, 2, 9, 3, 8, 3,
            2, 3, 1, 5, 4, 0, 1, 4, 0, 3,
            2, 4, 3, 1, 4, 2, 8, 2, 1, 2,
            1, 5, 2, 9, 3, 6, 4, 4, 3, 2,
            1, 4, 1, 3, 3, 2, 2, 2, 8, 3,
            9, 4, 9, 3, 3, 10, 1, 3, 5, 3,
            2, 2, 4, 3, 3, 6, 6, 4, 1, 2,
            6, 2, 3, 7, 4, 4, 4, 4, 2, 4)
table(scores)

## scores
##  0  1  2  3  4  5  6  7  8  9 10 
##  2  9 16 20 16  3  4  1  3  4  2

Chapter 5

p. 49 This may not be a simple erratum. It may be me misunderstanding what the author is trying to convey, but the way \(\chi\) is defined, \(\chi = (X - \overline{X})\), it seems like \(\chi\) refers to differences at the level of individual scores. In the table at the bottom of p. 49, it says \(\sum(X-\overline{X}) = \chi = 0\). That makes it seem like \(\chi\) should be defined as \(\chi = \sum (X - \overline{X})\) or that the table should say \(\sum(X-\overline{X}) = \sum\chi = 0\).
p. 50 In either case, the formula in the table at the top of the page is not consistent with p. 49. I think it should say \(\sum(X-\overline{X}) = \sum\chi = 0\), but to be consistent, it should at least say \(\sum(X-\overline{X}) = \chi = 0\).
p. 51 “Note that \(\sum X^2\) is not the same as \((X)^2\).” \(\rightarrow\) “Note that \(\sum X^2\) is not the same as \((\sum X)^2\).” Also, in Rule 1, “as in \((X)^2\)” \(\rightarrow\) “as in \((\sum X)^2\).” These are not technically incorrect, but they do not illustrate the points well: in the first, the two expressions are obviously not the same, and in the second, there is nothing to do inside the parentheses. Neither is consistent with the context provided by the page.
p. 52–53 Math Anxiety Scores computations (h/t to Tamika Simmons for identifying this one) —

(mas  <- c(11, 9, 8, 8, 6))

## [1] 11  9  8  8  6

mas^2

## [1] 121  81  64  64  36

sum(mas^2)

## [1] 366

sum(mas)

## [1] 42

sum(mas)^2

## [1] 1764

length(mas)

## [1] 5

(sd.2  <- (sum(mas^2) - sum(mas)^2/length(mas)) / (length(mas) - 1))

## [1] 3.3

The book gives values of \(\sum X^2 = 351\) when it should be \(\sum X^2 = 366\), \((\sum X)^2 = 1681\) when it should be \((\sum X)^2 = 1764\), and \(SD_x^2 = 3.7\) when it should be \(SD_x^2 = 3.3\).
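The p. 49 distinction between an individual deviation score and the sum of the deviations can be checked with the same five math anxiety scores. This is only a sketch: the name `dev` is mine, and it stands in for the book's \(\chi\) under my reading of p. 49.

```r
# Math anxiety scores from pp. 52-53, repeated so this block runs on its own
mas <- c(11, 9, 8, 8, 6)

# One deviation score per person (my stand-in for the book's chi)
dev <- mas - mean(mas)
dev

# The deviations sum to zero (up to floating-point rounding)
sum(dev)

# Definitional formula for the variance; it should agree with the
# computational formula and with the built-in var()
sum(dev^2) / (length(mas) - 1)
var(mas)
```

Both variance lines should come out at 3.3, the value from the computational formula.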

Chapter 8

p. 84 In step 2, \(SD_Y\) is listed as 6.1, but in the table it is listed as 4.9. In step 4, \(\overline{Y}\) is listed as 4.9, but in the table it is listed as 79.8.
p. 89 Problem 1b looks like it is incorrect (h/t to Mark Olivieri)

iq.eq  <- data.frame(Student = 1:10,
                     IQ = c(140, 130, 120, 119, 115, 114, 114, 113, 112, 111),
                     EQ = c(14, 20, 29, 6, 20, 27, 29, 30, 35, 40),
                     Exam.Scores = c(42, 44, 35, 30, 23, 27, 25, 20, 16, 12))
iq.eq

##    Student  IQ EQ Exam.Scores
## 1        1 140 14          42
## 2        2 130 20          44
## 3        3 120 29          35
## 4        4 119  6          30
## 5        5 115 20          23
## 6        6 114 27          27
## 7        7 114 29          25
## 8        8 113 30          20
## 9        9 112 35          16
## 10      10 111 40          12

round(cor(iq.eq[,-1]), 2)  # The raw correlations

##                IQ    EQ Exam.Scores
## IQ           1.00 -0.61        0.88
## EQ          -0.61  1.00       -0.66
## Exam.Scores  0.88 -0.66        1.00

round(cor(iq.eq[,-1])^2, 2)  # The coefficients of determination

##               IQ   EQ Exam.Scores
## IQ          1.00 0.37        0.77
## EQ          0.37 1.00        0.44
## Exam.Scores 0.77 0.44        1.00

The correct answer for 1b is 0.88, whereas the book says \(r_{xy} = .85\). The correct coefficient of determination is 0.77.
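As a cross-check on `cor()`, the same value can be reached from the definitional formula. This sketch assumes Problem 1b concerns IQ and exam scores, since that is the pair whose correlation rounds to 0.88. The vectors are repeated so the block runs by itself, and the names `sp`, `ss.iq`, `ss.ex`, and `r.hand` are mine.

```r
iq   <- c(140, 130, 120, 119, 115, 114, 114, 113, 112, 111)
exam <- c(42, 44, 35, 30, 23, 27, 25, 20, 16, 12)

# Sum of cross-products of the deviations, divided by the square root
# of the product of the two sums of squares
sp    <- sum((iq - mean(iq)) * (exam - mean(exam)))
ss.iq <- sum((iq - mean(iq))^2)
ss.ex <- sum((exam - mean(exam))^2)
(r.hand <- sp / sqrt(ss.iq * ss.ex))

# Should match the built-in function
all.equal(r.hand, cor(iq, exam))

# Coefficient of determination
r.hand^2
```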

Chapter 10

p. 106 In describing the possibility that there really is a difference between the population means, it uses \(\overline{X}_{Females} \ne \overline{X}_{Males}\). It should be \(\mu_\text{Females} \ne \mu_\text{Males}\). There are two changes there: from \(\overline{X}\) to \(\mu\) and from italic subscript to Roman. This usage probably appears throughout the book, and you should check with the Pearson copy editor, but the one at APA told me that an italic subscript is a variable, like \(x_i\) or even \(x_{gender}\), and a Roman subscript is a value assigned to the index, like \(x_1\) or \(x_\text{Male}\).
p. 106 In describing the possibility that the null hypothesis is true, the book says “…no appreciable difference….” It really should leave out the word “appreciable.” The means are either equal or they are not, and if they are not, then there is a difference.

Chapter 11

p. 123–24 Problem 3 — Actually, these are fine. I am just going to leave them here for now.

g1 <- c(22, 16, 17, 18)
g2 <- c(6, 10, 13, 13, 8, 4)
g3 <- c(8, 6, 4, 5, 2)

A. Compare Groups 2 & 3

(x.bar.A  <- mean(g2))

## [1] 9

(x.bar.B  <- mean(g3))

## [1] 5

(sd.A  <- sd(g2))^2

## [1] 13.6

(sd.B  <- sd(g3))^2

## [1] 5

(n.A  <- length(g2))

## [1] 6

(n.B  <- length(g3))

## [1] 5

x.bar.A - x.bar.B

## [1] 4

(n.A - 1) * sd.A^2

## [1] 68

(n.B - 1) * sd.B^2

## [1] 20

n.A + n.B - 2

## [1] 9

1/n.A

## [1] 0.1667

1/n.B

## [1] 0.2

(t.obt <- (x.bar.A - x.bar.B)/sqrt((((n.A - 1) * sd.A^2 + (n.B - 1) * sd.B^2)/(n.A + n.B - 2)) * (1/n.A + 1/n.B)))

## [1] 2.113

(t.crit <- qt(c(.025, .975), n.A + n.B - 2, lower.tail=TRUE))

## [1] -2.262  2.262

(p.val <- 2 * pt(t.obt, n.A + n.B - 2, lower.tail=FALSE))

## [1] 0.06381

t.test(g2, g3, var.equal=TRUE)

## 
##  Two Sample t-test
## 
## data:  g2 and g3
## t = 2.112, df = 9, p-value = 0.06381
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.2833  8.2833
## sample estimates:
## mean of x mean of y 
##         9         5

B. Compare Groups 1 & 2

(x.bar.A  <- mean(g1))

## [1] 18.25

(x.bar.B  <- mean(g2))

## [1] 9

(sd.A  <- sd(g1))^2

## [1] 6.917

(sd.B  <- sd(g2))^2

## [1] 13.6

(n.A  <- length(g1))

## [1] 4

(n.B  <- length(g2))

## [1] 6

x.bar.A - x.bar.B

## [1] 9.25

(n.A - 1) * sd.A^2

## [1] 20.75

(n.B - 1) * sd.B^2

## [1] 68

n.A + n.B - 2

## [1] 8

1/n.A

## [1] 0.25

1/n.B

## [1] 0.1667

(t.obt <- (x.bar.A - x.bar.B)/sqrt((((n.A - 1) * sd.A^2 + (n.B - 1) * sd.B^2)/(n.A + n.B - 2)) * (1/n.A + 1/n.B)))

## [1] 4.302

(t.crit <- qt(c(.025, .975), n.A + n.B - 2, lower.tail=TRUE))

## [1] -2.306  2.306

(p.val <- 2 * pt(t.obt, n.A + n.B - 2, lower.tail=FALSE))

## [1] 0.002607

t.test(g1, g2, var.equal=TRUE)

## 
##  Two Sample t-test
## 
## data:  g1 and g2
## t = 4.302, df = 8, p-value = 0.002607
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   4.292 14.208
## sample estimates:
## mean of x mean of y 
##     18.25      9.00

C. Compare Groups 1 & 3

(x.bar.A  <- mean(g1))

## [1] 18.25

(x.bar.B  <- mean(g3))

## [1] 5

(sd.A  <- sd(g1))^2

## [1] 6.917

(sd.B  <- sd(g3))^2

## [1] 5

(n.A  <- length(g1))

## [1] 4

(n.B  <- length(g3))

## [1] 5

x.bar.A - x.bar.B

## [1] 13.25

(n.A - 1) * sd.A^2

## [1] 20.75

(n.B - 1) * sd.B^2

## [1] 20

n.A + n.B - 2

## [1] 7

1/n.A

## [1] 0.25

1/n.B

## [1] 0.2

(t.obt <- (x.bar.A - x.bar.B)/sqrt((((n.A - 1) * sd.A^2 + (n.B - 1) * sd.B^2)/(n.A + n.B - 2)) * (1/n.A + 1/n.B)))

## [1] 8.186

(t.crit <- qt(c(.975), n.A + n.B - 2, lower.tail=TRUE))  # two-tailed cutoff; a one-tailed test at alpha = .05 would use qt(.95, ...)

## [1] 2.365

(p.val <- pt(t.obt, n.A + n.B - 2, lower.tail=FALSE))

## [1] 3.933e-05

t.test(g1, g3, var.equal=TRUE, alternative="greater")

## 
##  Two Sample t-test
## 
## data:  g1 and g3
## t = 8.186, df = 7, p-value = 3.933e-05
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  10.18   Inf
## sample estimates:
## mean of x mean of y 
##     18.25      5.00
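The three hand computations above repeat the same pooled-variance formula, so they can be folded into one small function. This is a sketch: `pooled.t` is a hypothetical helper of my own, not something from the book or from base R, and the group vectors are repeated so the block runs on its own.

```r
g1 <- c(22, 16, 17, 18)
g2 <- c(6, 10, 13, 13, 8, 4)
g3 <- c(8, 6, 4, 5, 2)

# Hypothetical helper: pooled-variance t statistic for two independent groups
pooled.t <- function(a, b) {
  n.a <- length(a)
  n.b <- length(b)
  df  <- n.a + n.b - 2
  # Pooled variance: the two sample variances weighted by their df
  var.pooled <- ((n.a - 1) * var(a) + (n.b - 1) * var(b)) / df
  t.obt <- (mean(a) - mean(b)) / sqrt(var.pooled * (1/n.a + 1/n.b))
  c(t.obt = t.obt, df = df)
}

pooled.t(g2, g3)  # Part A
pooled.t(g1, g2)  # Part B
pooled.t(g1, g3)  # Part C
```

Each call should reproduce the `t` and `df` reported by the matching `t.test(..., var.equal=TRUE)` call.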

p. 124, problem 4. The researcher expects the teens’ hearing to be better before the concert than after. Higher scores indicate worse hearing, so lower scores indicate better hearing. That means that her hypothesis is that \(H_A: \mu_\text{before} < \mu_\text{after}\) and \(H_0: \mu_\text{before} \ge \mu_\text{after}\).

d.df  <- data.frame(pre=c(12, 2, 6, 13, 10, 10, 5, 2, 7, 9, 10, 14),
                    post=c(18, 3, 5, 10, 15, 15, 6, 9, 7, 9, 11, 13))
library(psych)  # describe() comes from the psych package
describe(d.df)

##      vars  n  mean   sd median trimmed  mad min max range  skew kurtosis
## pre     1 12  8.33 3.98    9.5     8.4 4.45   2  14    12 -0.28    -1.32
## post    2 12 10.08 4.52    9.5    10.0 5.19   3  18    15  0.16    -1.26
##        se
## pre  1.15
## post 1.31

So the means are in the correct direction to support the alternative hypothesis, \(\overline{X}_\text{before} < \overline{X}_\text{after}\). That means that if \(|t_\text{obt}| > t_\text{crit}\), then the difference is significant.

t.test(d.df$pre, d.df$post, alternative="less", paired=TRUE)

## 
##  Paired t-test
## 
## data:  d.df$pre and d.df$post
## t = -1.898, df = 11, p-value = 0.04214
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##      -Inf -0.09391
## sample estimates:
## mean of the differences 
##                   -1.75

qt(.05, nrow(d.df)-1)

## [1] -1.796

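Because \(t_\text{obt} = -1.898\) falls below \(t_\text{crit} = -1.796\) (\(p = .042\)), the one-tailed test rejects \(\text{H}_0\) at \(\alpha = .05\). It is not clear why the book says do not reject \(\text{H}_0\).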
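The same \(t\) can be reproduced by hand from the difference scores, which makes the comparison with the critical value explicit. This is a sketch with names of my own (`d`, `n`); the data are repeated so the block runs by itself.

```r
pre  <- c(12, 2, 6, 13, 10, 10, 5, 2, 7, 9, 10, 14)
post <- c(18, 3, 5, 10, 15, 15, 6, 9, 7, 9, 11, 13)

d <- pre - post   # difference scores, one per teen
n <- length(d)

# Paired t: mean difference divided by its standard error
(t.obt  <- mean(d) / (sd(d) / sqrt(n)))
(t.crit <- qt(.05, n - 1))   # lower-tail critical value, alpha = .05

# TRUE means the observed t falls in the rejection region
t.obt < t.crit
```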

Chapter 12

p. 127, computation of the probability of a Type I error. It says: \[p = 1 - (1-\alpha)^c = 1 - (1- 0.05)^3 = 1 - 0.93^3 = 0.14.\] It should say: \[p = 1 - (1-\alpha)^c = 1 - (1- 0.05)^3 = 1 - 0.95^3 = 1 - 0.86 = 0.14.\]
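The corrected arithmetic is easy to confirm in R. This is a minimal sketch; `alpha` and `k` are my names, with `k` standing in for the exponent \(c\) (the number of comparisons). The formula itself treats the comparisons as independent.

```r
alpha <- 0.05   # per-comparison Type I error rate
k     <- 3      # number of comparisons (the exponent c in the formula)

(1 - alpha)^k        # probability of no Type I error across the k tests
1 - (1 - alpha)^k    # probability of at least one Type I error
```

The first line is 0.857375 and the second is 0.142625, which round to the 0.86 and 0.14 in the corrected equation.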
For problem 2, \(MS_W = 533.2\).

MS.W  <- 533.2
x.bar.b <- 12
x.bar.c <- 17
x.bar.p <- 58.3
n.b <- 10
n.c <- 4
n.p <- 4
(C.obt.cb <- (x.bar.c - x.bar.b)/sqrt(MS.W*(1/n.c + 1/n.b)))

## [1] 0.366

(C.obt.pb <- (x.bar.p - x.bar.b)/sqrt(MS.W*(1/n.p + 1/n.b)))

## [1] 3.389

(C.obt.pc <- (x.bar.p - x.bar.c)/sqrt(MS.W*(1/n.p + 1/n.c)))

## [1] 2.529

(F.crit <- qf(.05, 2, 15, lower.tail=FALSE))

## [1] 3.682

(C.crit <- sqrt((3-1) * F.crit))

## [1] 2.714

abs(C.obt.cb) > C.crit

## [1] FALSE

abs(C.obt.pb) > C.crit

## [1] TRUE

abs(C.obt.pc) > C.crit

## [1] FALSE

The only significant post-hoc difference is between bus drivers and presidents.

Chapter 13

p. 139 I don’t know if this is really an erratum or just a disagreement. A chi-square test does use measurements. They just happen to be measurements on a nominal scale. Maybe this is terrifying, but I like to teach my students that measurement is a procedure for assigning a number or label (using one of the scales from Chapter 4) to some aspect of the universe. I think that saying that non-parametric tests do not use measurement will give students something to unlearn in future statistics courses.
p. 139 I am much more confident that this is a real erratum. The text describes a situation in which 100 students are selected at random and asked if they ever used the college counseling center. Either the researcher knows what year the students are in college or she asks that as well. In any case, this should yield a \(2 \times 4\) table in which the sum of the rows and columns is 100 instead of a \(1 \times 4\) table. The table shown seems to correspond best with a survey in which users of the counseling center were asked what year in college they were in.
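To make the shape of the \(2 \times 4\) table concrete, here is a sketch of what the described survey would produce. The counts are invented placeholders chosen only so that they total 100; they are not from the book. The class-year labels and the object name `counseling` are also my assumptions.

```r
# HYPOTHETICAL counts, invented only to show the shape of the table;
# they are not from the book. The class-year labels are assumed too.
counseling <- matrix(c( 5,  7, 10,  8,
                       20, 18, 15, 17),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(Used.Center = c("Yes", "No"),
                                     Year = c("Freshman", "Sophomore",
                                              "Junior", "Senior")))

addmargins(counseling)   # adds row and column totals; grand total is 100

dim(counseling)          # 2 rows by 4 columns
```

The book's \(1 \times 4\) table would correspond to only the "Yes" row of a table like this one.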