This document contains my R code for all of the computational examples (with chapter and page numbers) from the book “The Manga Guide to Statistics”, published by No Starch Press. It was made with R Markdown, which enables easy authoring of dynamic reports that mix R code with the results of its execution. All results are rounded just as in the book.

I would like to thank James Church for his version of the R code (www.nostarch.com/download/MangaGuideToStatistics.R), which motivated me to write my own variant. He also kindly permitted me to use his code as a starting point. Take a look at his version if you want base R code only, without additional packages.

In my code I used several popular R packages: magrittr, dplyr, and ggplot2.

Make sure that these libraries are installed (for example with the install.packages function, as in the commented line below, or via the RStudio IDE) and loaded:
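# install.packages(c("magrittr", "dplyr", "ggplot2"))  # uncomment and run once if needed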

library(magrittr)
library(dplyr)
library(ggplot2)

Chapter 1: Determining Data Types

There are no calculations in this chapter.

Chapter 2: Understanding Numerical Data

Page 33: The table with the prices of the top 50 ramen shops can be represented as the following vector:

ramen.prices <- c(700, 850, 600, 650, 980, 750, 500, 890, 880, 700,
                  890, 720, 680, 650, 790, 670, 680, 900, 880, 720,
                  850, 700, 780, 850, 750, 780, 590, 650, 580, 750,
                  800, 550, 750, 700, 600, 800, 800, 880, 790, 790,
                  780, 600, 670, 680, 650, 890, 930, 650, 777, 700)

Page 38: Frequency table data can be calculated with the built-in hist function (it’s not just for plotting histograms). Let’s write a function that makes a table like the one on page 38:

DistributionTable <- function(hist.data){
  # lower and upper boundaries of the class intervals
  interval.lower <- hist.data$breaks[-length(hist.data$breaks)]
  interval.upper <- hist.data$breaks[-1]
  hist.table <- data.frame(
    Interval = paste(interval.lower, "-", interval.upper),
    Midpoint = hist.data$mids,
    Frequency = hist.data$counts,
    Rel.Frequency = hist.data$counts / sum(hist.data$counts)
  )
  return(hist.table)
}

ramen.hist <- hist(ramen.prices, breaks = 5, right = FALSE, plot = FALSE)

DistributionTable(ramen.hist)
    Interval Midpoint Frequency Rel.Frequency
1  500 - 600      550         4          0.08
2  600 - 700      650        13          0.26
3  700 - 800      750        18          0.36
4  800 - 900      850        12          0.24
5 900 - 1000      950         3          0.06

Page 39: Histogram of ramen prices, drawn with the ggplot function from the ggplot2 package. The resulting plot was tweaked to look like the histogram from the book.

ggplot(NULL, aes(x = ramen.prices)) + theme_bw() +
  geom_histogram(binwidth = 100, colour = "black", fill = "gray") +
  ggtitle("Histogram of 50 best ramen shops prices") +
  xlab("Ramen prices") + ylab("Frequency") +
  coord_cartesian(xlim = c(490, 1010), ylim = c(0, 20)) +
  scale_x_continuous(breaks = seq(550, 950, 100))

Page 41: Data frame containing scores of three bowling teams:

bowling <- data.frame(
  team.A = c(86, 73, 124, 111, 90, 38),
  team.B = c(84, 71, 103, 85, 90, 89),
  team.C = c(229, 77, 59, 95, 70, 88)
)

bowling
  team.A team.B team.C
1     86     84    229
2     73     71     77
3    124    103     59
4    111     85     95
5     90     90     70
6     38     89     88

Page 42: Average scores can be calculated using the built-in mean function, and the summarise_each function from the dplyr package can apply a function to all columns at once:

bowling %>% summarise_each(funs(mean))
  team.A team.B team.C
1     87     87    103

Page 43: There is no built-in function for the geometric mean, but it’s simple to write one:

GeometricMean <- function(values) {
  gmean <- prod(values) ^ (1 / length(values))
  return(gmean)
}

bowling %>% summarise_each(funs(GeometricMean)) %>% round(3)
  team.A team.B team.C
1 81.614 86.478 92.062
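For long vectors, prod can overflow or underflow; an equivalent log-based formulation (a sketch of my own, mathematically identical) avoids this:

GeometricMeanLog <- function(values) {
  # exp of the mean of the logs equals the n-th root of the product
  gmean <- exp(mean(log(values)))
  return(gmean)
}

bowling %>% summarise_each(funs(GeometricMeanLog)) %>% round(3)
  team.A team.B team.C
1 81.614 86.478 92.062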

Page 43: Harmonic mean of bowling team scores:

HarmonicMean <- function(values) {
  hmean <- 1 / (sum(1 / values) / length(values))
  return(hmean)
}

bowling %>% summarise_each(funs(HarmonicMean)) %>% round(3)
  team.A team.B team.C
1 75.163 85.948 85.132

Page 45: Median scores for the bowling teams (there is a built-in median function for that):

bowling %>% summarise_each(funs(median))
  team.A team.B team.C
1     88     87   82.5

Page 50: Standard Deviation (\(\sigma\)). There is a built-in function sd for computing the sample standard deviation \(s = \sqrt{\frac{1}{N-1} \sum_{i=1}^N (x_i-\overline{x})^2}\) (see page 52). But we can write a custom function that calculates the standard deviation of the entire population, \(\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^N (x_i-\overline{x})^2}\):

StdDev <- function(values) {
    m <- mean(values)
    stddev <- sqrt(sum((values - m) ^ 2) / length(values))
    return(stddev)
}

bowling %>% summarise_each(funs(StdDev)) %>% round(1)
  team.A team.B team.C
1   27.5    9.5   57.5
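As a quick sanity check (not in the book), StdDev and the built-in sd differ only by the factor \(\sqrt{(N-1)/N}\):

n <- nrow(bowling)
all.equal(StdDev(bowling$team.A), sd(bowling$team.A) * sqrt((n - 1) / n))
[1] TRUE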

Page 55: Sturges’ Rule (using the built-in nclass.Sturges function) and the class width of a histogram:

cat("Using Sturges' Rule on ramen.prices vector:\n")
nclass.Sturges(ramen.prices) %>% paste("Number of classes:", ., "\n") %>% cat()

BreakRange <- function(values) {
    return(ceiling((max(values) - min(values)) / nclass.Sturges(values)))
}

BreakRange(ramen.prices) %>% paste("Range of class:", ., "\n") %>% cat()
Using Sturges' Rule on ramen.prices vector:
Number of classes: 7 
Range of class: 69 
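Sturges’ Rule computes \(\lceil \log_2 N + 1 \rceil\) classes for \(N\) observations, which we can verify directly:

ceiling(log2(length(ramen.prices)) + 1)
[1] 7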

Page 56: Frequency table of the 50 best ramen shops (Sturges’ Rule):

# vector of breakpoints
ramen.breaks <- seq(min(ramen.prices), by = BreakRange(ramen.prices),
                    length.out = nclass.Sturges(ramen.prices) + 1)
ramen.breaks
[1] 500 569 638 707 776 845 914 983
ramen.histS <- hist(ramen.prices, breaks = ramen.breaks, right = FALSE, plot = FALSE)
DistributionTable(ramen.histS)
   Interval Midpoint Frequency Rel.Frequency
1 500 - 569    534.5         2          0.04
2 569 - 638    603.5         5          0.10
3 638 - 707    672.5        15          0.30
4 707 - 776    741.5         6          0.12
5 776 - 845    810.5        10          0.20
6 845 - 914    879.5        10          0.20
7 914 - 983    948.5         2          0.04

Page 57: High school girls’ 100m track race results:

race.times <- c(16.3, 22.4, 18.5, 18.7, 20.1)
mean(race.times) %>% paste("Average:", ., "\n") %>% cat()
median(race.times) %>% paste("Median:", ., "\n") %>% cat()
StdDev(race.times) %>% round(2) %>% paste("Standard Deviation:", ., "\n") %>% cat()
Average: 19.2 
Median: 18.7 
Standard Deviation: 2.01 

Chapter 3: Understanding Categorical Data

Page 61: Results of the “Do you like or dislike the new uniform design?” survey:

school.uniform.survey <- factor(
  c("LIKE", "NEITHER", "LIKE", "NEITHER", "DISLIKE", "LIKE", "LIKE", "LIKE", "LIKE", "LIKE",
    "LIKE", "LIKE", "NEITHER", "LIKE", "LIKE", "NEITHER", "LIKE", "LIKE", "LIKE", "LIKE",
    "LIKE", "LIKE", "DISLIKE", "NEITHER", "LIKE", "LIKE", "DISLIKE", "LIKE", "LIKE", "LIKE",
    "NEITHER", "NEITHER", "LIKE", "DISLIKE", "LIKE", "LIKE", "LIKE", "LIKE", "NEITHER", "LIKE"))

Page 62: A frequency table can be obtained with the table function, and sum totals can be added with the addmargins function:

school.uniform.table <- table(school.uniform.survey)

school.uniform.table %>% addmargins()
DISLIKE    LIKE NEITHER     Sum
      4      28       8      40

Percent Table:

school.uniform.ptable <- prop.table(school.uniform.table) * 100

school.uniform.ptable %>% addmargins()
DISLIKE    LIKE NEITHER     Sum
     10      70      20     100

Let’s write a function to make a table like the one on page 62:

CrossTab <- function(fact){
  tab <- addmargins(table(fact))
  ptab <- addmargins(prop.table(table(fact)))
  ctab <- data.frame(Response = unlist(unname(dimnames(tab))),
                     Frequency = as.vector(tab),
                     Procent = as.vector(ptab) * 100)
  return(ctab)
}

CrossTab(school.uniform.survey)
  Response Frequency Procent
1  DISLIKE         4      10
2     LIKE        28      70
3  NEITHER         8      20
4      Sum        40     100

Graphical representation of a categorical variable:

ggplot(NULL, aes(x = school.uniform.survey)) + theme_bw() +
  geom_bar(colour = "black", fill = "gray") +
  ggtitle("Results of school uniform survey") +
  xlab("Answer") + ylab("Count") +
  coord_cartesian(ylim = c(0, 30))

Page 64: Do you expect party A to win or lose against party B?

party.victory <- factor(c("Lose", "Lose", "Lose", "I don't know", "Win", "Lose", "Win", "I don't know", "Lose", "Lose"))

CrossTab(party.victory)
      Response Frequency Procent
1 I don't know         2      20
2         Lose         6      60
3          Win         2      20
4          Sum        10     100

Chapter 4: Standard Score and Deviation Score

Pages 67-69: Test scores for four school subjects:

test.scores <- data.frame(
  Student = c("Rui", "Yumi", LETTERS[1:16]),
  English = c(90, 81, 73, 97, 85, 60, 74, 64, 72, 67, 87, 78, 85, 96, 77, 100, 92, 86),
  Japanese = c(71, 90, 79, 70, 67, 66, 60, 83, 57, 85, 93, 89, 78, 74, 65, 78, 53, 80),
  History = c(73, 61, 14, 41, 49, 87, 69, 65, 36, 7, 53, 100, 57, 45, 56, 34, 37, 70),
  Biology = c(59, 73, 47, 38, 63, 56, 15, 53, 80, 50, 41, 62, 44, 26, 91, 35, 53, 68)
)

test.scores
   Student English Japanese History Biology
1      Rui      90       71      73      59
2     Yumi      81       90      61      73
3        A      73       79      14      47
4        B      97       70      41      38
5        C      85       67      49      63
6        D      60       66      87      56
7        E      74       60      69      15
8        F      64       83      65      53
9        G      72       57      36      80
10       H      67       85       7      50
11       I      87       93      53      41
12       J      78       89     100      62
13       K      85       78      57      44
14       L      96       74      45      26
15       M      77       65      56      91
16       N     100       78      34      35
17       O      92       53      37      53
18       P      86       80      70      68

Page 68: Average test scores by subject can be calculated using the summarise_each function, dropping the first column with student names via the -Student parameter:

# number of significant digits to print
options(digits = 3)
test.scores %>% summarise_each(funs(mean), -Student)
  English Japanese History Biology
1    81.3     74.3      53      53

Page 72: Normalization of a vector can be done using the built-in scale function, but by default it uses the sample standard deviation (with the \(n-1\) denominator), so we must pass our population StdDev via the scale parameter:

Normalize <- function(values) {
  norm.v <- scale(values, scale = StdDev(values))
  return(norm.v)
}

options(digits = 2)
# Normalize each column except Student
test.scores %>% mutate_each(funs(Normalize), -Student)
   Student English Japanese History Biology
1      Rui    0.77    -0.30    0.88    0.33
2     Yumi   -0.03     1.39    0.35    1.09
3        A   -0.74     0.41   -1.71   -0.33
4        B    1.39    -0.39   -0.53   -0.82
5        C    0.33    -0.65   -0.18    0.55
6        D   -1.90    -0.74    1.49    0.16
7        E   -0.65    -1.27    0.70   -2.08
8        F   -1.54     0.77    0.53    0.00
9        G   -0.83    -1.54   -0.75    1.48
10       H   -1.27     0.95   -2.02   -0.16
11       I    0.50     1.66    0.00   -0.66
12       J   -0.30     1.30    2.07    0.49
13       K    0.33     0.33    0.18   -0.49
14       L    1.30    -0.03   -0.35   -1.48
15       M   -0.39    -0.83    0.13    2.08
16       N    1.66     0.33   -0.84   -0.98
17       O    0.95    -1.90   -0.70    0.00
18       P    0.41     0.50    0.75    0.82

Page 74: Deviation score:

DeviationScore <- function(values){
  return(Normalize(values) * 10 + 50)
}

options(digits = 3)
test.scores %>% mutate_each(funs(DeviationScore), -Student)
   Student English Japanese History Biology
1      Rui    57.7     47.0    58.8    53.3
2     Yumi    49.7     63.9    53.5    60.9
3        A    42.6     54.1    32.9    46.7
4        B    63.9     46.1    44.7    41.8
5        C    53.3     43.5    48.2    55.5
6        D    31.0     42.6    65.0    51.6
7        E    43.5     37.3    57.0    29.2
8        F    34.6     57.7    55.3    50.0
9        G    41.7     34.6    42.5    64.8
10       H    37.3     59.5    29.8    48.4
11       I    55.0     66.6    50.0    43.4
12       J    47.0     63.0    70.7    54.9
13       K    53.3     53.3    51.8    45.1
14       L    63.0     49.7    46.5    35.2
15       M    46.1     41.7    51.3    70.8
16       N    66.6     53.3    41.6    40.2
17       O    59.5     31.0    43.0    50.0
18       P    54.1     55.0    57.5    58.2
# reset to the default value
options(digits = 7)
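By construction, deviation scores have a mean of 50 and a population standard deviation of 10; a quick check (not in the book):

test.scores$English %>% DeviationScore() %>% mean()
[1] 50
test.scores$English %>% DeviationScore() %>% StdDev()
[1] 10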

Page 78: Mean and standard deviation of the standard scores of the girls’ 100m track race times:

mean(Normalize(race.times)) %>% round(4) %>% paste("Mean:", ., "\n") %>% cat()
Mean: 0 
StdDev(Normalize(race.times)) %>% paste("Standard Deviation:", ., "\n") %>% cat()
Standard Deviation: 1 

Chapter 5: Let’s Obtain the Probability

Page 93: What is the area under the standard normal curve from \(0\) to \(1.96\)? The table in the book covers the range from \(0\) to \(x\), but the pnorm function in R uses the range from \(-\infty\) to \(x\), so we subtract pnorm(0) from pnorm(1.96) to get the correct result:

(pnorm(1.96) - pnorm(0)) %>% round(4)
[1] 0.475

Page 97: What is the area under the standard normal curve from \(1.8\) to \(\infty\)? To get the area from \(x\) to \(\infty\), we can use the lower.tail = FALSE parameter.

pnorm(1.8, lower.tail = FALSE) %>% round(4)
[1] 0.0359
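Equivalently, we could take the complement of the lower tail:

(1 - pnorm(1.8)) %>% round(4)
[1] 0.0359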

Page 104: What is the \(\chi^2\) value when the probability is \(0.05\) and there is \(1\) degree of freedom? The function qchisq by default returns the \(q\) for which \(P(X < q)=p\), while the table in the book assumes \(P(X \geq q)=p\). To adjust, add the lower.tail = FALSE parameter:

qchisq(0.05, df = 1, lower.tail = FALSE) %>% round(4)
[1] 3.8415
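We can confirm this with pchisq, the inverse of qchisq:

pchisq(3.8415, df = 1, lower.tail = FALSE) %>% round(2)
[1] 0.05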

Page 108: What is the area under the standard normal curve from \(-\infty\) to \(-0.29\)?

pnorm(-0.29) %>% round(4)
[1] 0.3859

Page 108: What is the \(\chi^2\) value when the probability is \(0.05\) and \(2\) degrees of freedom?

qchisq(0.05, df = 2, lower.tail = FALSE) %>% round(4)
[1] 5.9915

Chapter 6: Let’s Look at the Relationship Between Two Variables

Page 116: Monthly expenditures on makeup and clothes:

street.survey <- data.frame(
  Respondent = c(paste("Ms.", LETTERS[1:10])),
  Makeup = c(3000,  5000, 12000,  2000,  7000, 15000,  5000,  6000,  8000, 10000),
  Clothes = c(7000,  8000, 25000,  5000, 12000, 30000, 10000, 15000, 20000, 18000)
)

street.survey
   Respondent Makeup Clothes
1       Ms. A   3000    7000
2       Ms. B   5000    8000
3       Ms. C  12000   25000
4       Ms. D   2000    5000
5       Ms. E   7000   12000
6       Ms. F  15000   30000
7       Ms. G   5000   10000
8       Ms. H   6000   15000
9       Ms. I   8000   20000
10      Ms. J  10000   18000

Scatterplot of survey results using ggplot2 package:

ggplot(street.survey, aes(Makeup, Clothes)) + geom_point(size=3) + theme_bw() +
  ggtitle("Scatter chart of monthly expenditures on makeup and clothes") +
  xlab("Amount spent on makeup (¥)") + ylab("Amount spent on clothes (¥)") +
  coord_cartesian(xlim = c(0, 30000), ylim = c(0, 32000))

Page 118: Correlation coefficient between expenditures on makeup and expenditures on clothes:

street.survey %$% cor(Makeup, Clothes) %>% round(4)
[1] 0.968
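For reference, cor here computes the book’s formula: the sum of the products of deviations from the means, divided by the square root of the product of the sums of squared deviations. Written out (a sketch with helper variables of my own naming):

# deviations of each variable from its mean
makeup.dev <- street.survey$Makeup - mean(street.survey$Makeup)
clothes.dev <- street.survey$Clothes - mean(street.survey$Clothes)
(sum(makeup.dev * clothes.dev) /
   sqrt(sum(makeup.dev^2) * sum(clothes.dev^2))) %>% round(4)
[1] 0.968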

Page 121: Age and favorite fashion brand:

brand.survey <- data.frame(
  Respondent = c(paste("Ms.", LETTERS[1:15])),
  Age = c(27, 33, 16, 29, 32, 23, 25, 28, 22, 18, 26, 26, 15, 29, 26),
  Brand = as.factor(c("Theremes", "Channelior", "Bureperry", "Bureperry", "Channelior",
                      "Theremes", "Channelior", "Theremes", "Bureperry", "Bureperry",
                      "Channelior", "Theremes", "Bureperry", "Channelior", "Bureperry"))
)

brand.survey
   Respondent Age      Brand
1       Ms. A  27   Theremes
2       Ms. B  33 Channelior
3       Ms. C  16  Bureperry
4       Ms. D  29  Bureperry
5       Ms. E  32 Channelior
6       Ms. F  23   Theremes
7       Ms. G  25 Channelior
8       Ms. H  28   Theremes
9       Ms. I  22  Bureperry
10      Ms. J  18  Bureperry
11      Ms. K  26 Channelior
12      Ms. L  26   Theremes
13      Ms. M  15  Bureperry
14      Ms. N  29 Channelior
15      Ms. O  26  Bureperry

Page 122: Scatterplot of the girls’ “Favorite fashion brand and age” survey:

ggplot(brand.survey, aes(Brand, Age)) + geom_point(size=3) +
  theme_bw() + ggtitle("Scatter chart of favorite fashion brand and age")

Pages 123-124: What is the correlation ratio between a girl’s age and her favorite fashion brand? There is no built-in function for the correlation ratio, so we must do all the calculations ourselves. Here I am using the popular dplyr package, which is very handy for various data manipulations:

# Calculating intraclass variance
intraclass.var <- brand.survey %>% group_by(Brand) %>% 
                  summarise(S = sum((Age - mean(Age))^2)) %>%
                  summarise(sum(S)) %>% as.numeric()

# Average age for all data
age.mean <- brand.survey %>% summarise(mean(Age)) %>% as.numeric()

# Calculating interclass variance
interclass.var <- brand.survey %>% group_by(Brand) %>%
                  summarise(S = n()*(mean(Age) - age.mean)^2) %>%
                  summarise(sum(S)) %>% as.numeric()

cor.ratio <- interclass.var / (intraclass.var + interclass.var)
intraclass.var %>% round(4) %>% paste("Intraclass variance:", ., "\n") %>% cat()
interclass.var %>% round(4) %>% paste("Interclass variance:", ., "\n") %>% cat()
cor.ratio %>% round(4) %>% paste("Correlation Ratio:", ., "\n") %>% cat()
Intraclass variance: 224 
Interclass variance: 180 
Correlation Ratio: 0.4455 
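The same calculation can be wrapped into a reusable function built on base R’s tapply (a sketch of my own, not from the book):

CorrelationRatio <- function(y, g) {
  # intraclass: sum of squared deviations within each group
  intra <- sum(tapply(y, g, function(v) sum((v - mean(v))^2)))
  # interclass: squared deviations of group means from the overall mean
  inter <- sum(tapply(y, g, function(v) length(v) * (mean(v) - mean(y))^2))
  return(inter / (intra + inter))
}

brand.survey %$% CorrelationRatio(Age, Brand) %>% round(4)
[1] 0.4455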

Page 128: Cross tabulation of sex and the desired way of being asked out:

asked.out.table <- matrix(c(34, 61, 53, 38, 40, 74), byrow = TRUE, nrow = 2)
dimnames(asked.out.table) <- list(Sex = c("Female", "Male"),
  DesiredWay = c("Phone", "E-mail", "Face to face"))

asked.out.table %>% addmargins()
       Phone E-mail Face to face Sum
Female    34     61           53 148
Male      38     40           74 152
Sum       72    101          127 300

Page 132: Calculate Pearson’s \(\chi^2_0\) test statistic using built-in chisq.test function:

asked.out.chisq <- chisq.test(asked.out.table)$statistic
asked.out.chisq %>% round(4)
X-squared 
   8.0091 
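chisq.test also returns the expected frequencies against which \(\chi^2_0\) is measured; it can be instructive to compare them with the observed table:

chisq.test(asked.out.table)$expected %>% round(2)
       Phone E-mail Face to face
Female 35.52  49.83        62.65
Male   36.48  51.17        64.35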

Page 132: Calculate Cramer’s coefficient:

Cramers <- function(ctable) {
    chi2 <- as.numeric(chisq.test(ctable)$statistic)
    s <- sum(ctable)       # total number of observations
    d <- min(dim(ctable))  # the smaller of the two table dimensions
    return(sqrt(chi2 / (s * (d - 1))))
}

Cramers(asked.out.table) %>% round(4)
[1] 0.1634

Page 138: What is Cramer’s coefficient of the table of ordered food and tea and coffee preferences?

drink.table <- matrix(c(43, 33, 51, 53, 29, 41), byrow = TRUE, nrow = 3)
dimnames(drink.table) <- list(FoodType = c("Japanese", "European", "Chinese"),
  Preference = c("Coffee", "Tea"))

drink.table %>% addmargins()
         Coffee Tea Sum
Japanese     43  33  76
European     51  53 104
Chinese      29  41  70
Sum         123 127 250

Page 141: Calculate Pearson’s \(\chi^2_0\) test statistic:

chisq.test(drink.table)$statistic %>% round(4)
X-squared 
   3.3483 

Page 141: Calculate Cramer’s coefficient:

Cramers(drink.table) %>% round(4)
[1] 0.1157

Chapter 7: Let’s Explore the Hypothesis Tests

Let’s reproduce the Pearson’s chi-square statistic (\(\chi^2_0\)) and chi-square distribution experiment described on pages 152-153.

Page 153: Contingency table of sex and desired way of being asked out for “all high school students residing in Japan”:

asked.out.table.all <- as.table(matrix(c(400, 1600, 2000, 600, 2400, 3000),
                                   byrow = TRUE, nrow = 2))
dimnames(asked.out.table.all) <- list(Sex = c("Female", "Male"),
  DesiredWay = c("Phone", "E-mail", "Face to face"))

asked.out.table.all %>% addmargins()
       Phone E-mail Face to face   Sum
Female   400   1600         2000  4000
Male     600   2400         3000  6000
Sum     1000   4000         5000 10000

We’ll make a data frame with frequencies of all combinations of the levels of two factors:

asked.out.freq <- as.data.frame(asked.out.table.all)

asked.out.freq
     Sex   DesiredWay Freq
1 Female        Phone  400
2   Male        Phone  600
3 Female       E-mail 1600
4   Male       E-mail 2400
5 Female Face to face 2000
6   Male Face to face 3000

To make a raw data table (like table 7-1 on page 153) from this frequency table, we repeat each of its rows Freq times:

FreqToCases <- function(row) {
  # expand one row of the frequency table into Freq identical cases
  n <- row$Freq
  df <- data.frame(
    Sex = rep(row$Sex, n),
    DesiredWay = rep(row$DesiredWay, n)
  )
  return(df)
}

# Call FreqToCases function for every row of asked.out.freq and
# glue all results in a single data frame
asked.out.data <- asked.out.freq %>% rowwise() %>% do(FreqToCases(.))
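As an aside, the same expansion can be done in one line of base R by repeating row indices (an equivalent alternative, not in the original):

asked.out.data2 <- asked.out.freq[rep(seq_len(nrow(asked.out.freq)),
                                      asked.out.freq$Freq), c("Sex", "DesiredWay")]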

Then we will draw 10000 different random samples of 300 high school students and calculate the \(\chi^2_0\) statistic for each one:
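Note that sampling is random, so the exact numbers below will differ from run to run; to reproduce them, fix the seed first (the original document doesn’t, and 42 is an arbitrary choice):

set.seed(42)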

chisq.data <- data.frame(Sample = 1:10000)

SampleChiSq <- function(n) {
  # random sample of n students
  asked.out.sample <- sample_n(asked.out.data, n)
  # contingency table for this sample
  asked.out.sample.table <- table(asked.out.sample)
  # return Chi-squared statistic for this sample
  return(chisq.test(asked.out.sample.table)$statistic)
}

# Vectorize a function to be able to make all random samples at once
SampleChiSq %<>% Vectorize()

# Calculating 10000 different Chi-squared statistics
chisq.data$ChiSq <- SampleChiSq(rep(300,10000))

# View first 10 rows of chisq.data
chisq.data %>% head(10) %>% round(5)
   Sample   ChiSq
1       1 0.70046
2       2 2.46699
3       3 4.70145
4       4 0.83618
5       5 2.85338
6       6 0.77407
7       7 2.19792
8       8 0.04066
9       9 0.52391
10     10 0.90191

Page 154: Histogram of 10000 \(\chi^2_0\) statistics from 10000 random samples:

# plot title with mathematical annotation
title <- expression(paste("Histogram of ", chi[0]^2, " statistics from 10000 samples"))

# plot using ggplot2 package
chi.plot1 <- ggplot(chisq.data, aes(x = ChiSq)) + theme_bw() +
  geom_histogram(aes(y = ..density..), binwidth = 0.5, colour = "black", fill = "gray") +
  xlab(expression(paste(chi[0]^2, " value"))) + ylab("Density of distribution") +
  ggtitle(title) + coord_cartesian(xlim = c(0, 15), ylim = c(0, 0.52))

chi.plot1

Page 158: Critical value \(\chi^2_{crit}\) for \(0.05\) significance level:

chi.crit <- qchisq(0.05, df = 2, lower.tail = FALSE)

chi.crit
[1] 5.991465

Page 159: We can add a graph of \(\chi^2\)-distribution to the histogram and show the critical region:

# title with mathematical annotation
title <- expression(atop(paste("Histogram of ", chi[0]^2, " statistics from 10000 samples and "),
         paste("a graph of ", chi^2, " distribution with 2 degrees of freedom")))

# add some elements to chi.plot1
chi.plot2 <- chi.plot1 + ggtitle(title) +
  stat_function(fun = dchisq, args = list(df=2), colour = "blue") +
  geom_vline(aes(xintercept = chi.crit), colour = "red", linetype = "dashed") +
  annotate("text", label = "{chi[crit]^2}(alpha == 0.05)", parse = TRUE, x = chi.crit + 1.9,
           y = 0.07, color = "red", size = 6)

chi.plot2

The histogram looks very similar to the \(\chi^2\) distribution, because Cramer’s coefficient for the population “all high school students residing in Japan” is equal to zero:

Cramers(asked.out.table.all) %>% cat()
0

Let’s see how many samples have a \(\chi^2_0\) statistic equal to or greater than \(\chi^2_{crit}\):

chisq.data %>% filter(ChiSq >= chi.crit) %>% nrow()
[1] 450

And that’s 4.5% (nearly 5%) of all 10000 samples, as expected.
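That proportion can be computed directly as the mean of a logical vector:

mean(chisq.data$ChiSq >= chi.crit)
[1] 0.045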

Page 177: P-value:

chisq.test(asked.out.table)$p.value %>% round(4)
[1] 0.0182

Or you can get the full summary:

chisq.test(asked.out.table)

    Pearson's Chi-squared test

data:  asked.out.table
X-squared = 8.0091, df = 2, p-value = 0.01823
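The reported p-value is just the upper-tail area of the \(\chi^2\) distribution with 2 degrees of freedom beyond the observed statistic:

pchisq(asked.out.chisq, df = 2, lower.tail = FALSE) %>% round(4)
X-squared 
   0.0182 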

Page 188: \(\chi^2\) test of independence of the table of ordered food and tea and coffee preferences:

chisq.test(drink.table)

    Pearson's Chi-squared test

data:  drink.table
X-squared = 3.3483, df = 2, p-value = 0.1875
qchisq(0.01, df = 2, lower.tail = FALSE) %>% round(4) %>%
  paste("Chi-squared critical value:", ., "\n") %>% cat()
Chi-squared critical value: 9.2103 

Since \(\chi^2_0 = 3.3483\) is smaller than the critical value \(9.2103\), the null hypothesis cannot be rejected at the \(0.01\) significance level.