---
title: "Chapter 4: Test of Independence of Variables"
subtitle: "Statistics for Data Science"
author: "Pai"
date: "2026"
format:
  html:
    toc: true
    toc-depth: 3
    toc-title: "Chapter Contents"
    theme: cosmo
    highlight-style: github
    code-fold: false
    code-tools: true
    number-sections: true
    fig-width: 8
    fig-height: 5
  pdf:
    toc: true
    number-sections: true
    geometry: margin=1in
    fontsize: 12pt
execute:
  echo: true
  warning: false
  message: false
---
```{r setup, include=FALSE}
library(tidyverse)
library(knitr)
library(kableExtra)
library(patchwork)
library(vcd) # association measures, mosaic plots
library(corrplot) # correlation matrix visualization
library(ggcorrplot) # ggplot2-based correlation plots
library(rstatix) # tidy correlation tests
library(ppcor) # partial correlation
```
---
# Chapter Overview
One of the most fundamental questions in data analysis is whether two variables are **related** to each other, or whether they vary **independently**. Does a patient's smoking status relate to their disease outcome? Is a student's program of study associated with their level of satisfaction? Do income and education move together? Tests of independence answer these questions formally, across all combinations of variable types.
This chapter covers:
- **The Concept of Independence** — theoretical foundations linking back to probability
- **Chi-Square Test of Independence** — testing association between two categorical variables
- **Fisher's Exact Test** — an exact alternative for small or sparse contingency tables
- **Cramér's V and Effect Size** — quantifying the strength of categorical association
- **Correlation Tests** — Pearson and Spearman for continuous and ordinal variables
- **Point-Biserial and Phi Coefficients** — association involving binary variables
- **Partial Correlation** — measuring association while controlling for a third variable
::: {.callout-note}
## Learning Objectives
By the end of this chapter, you will be able to:
1. Distinguish statistical independence from dependence for different variable types.
2. Conduct and interpret chi-square and Fisher's exact tests for categorical data.
3. Compute and interpret Cramér's V as a measure of association strength.
4. Test the significance of Pearson and Spearman correlations.
5. Select the appropriate association measure based on variable types.
6. Compute partial correlations to control for confounding variables.
:::
---
# The Concept of Independence
## Introduction
In Chapter 2, we defined statistical independence in probability terms: two events $A$ and $B$ are independent if $P(A \cap B) = P(A) \cdot P(B)$. In data analysis, we extend this idea to variables: two variables are **statistically independent** if knowing the value of one provides no information about the value of the other. When they are not independent, we say they are **associated** or **dependent**.
Testing for independence is not the same as proving causation. Association tells us that two variables move together — it says nothing about which causes which, or whether both are driven by a third variable. This distinction is critical in data science, where spurious correlations are common in large datasets.
## Theory
### Independence for Categorical Variables
Two categorical variables $X$ and $Y$ are independent if, for all categories $i$ and $j$:
$$P(X = i \cap Y = j) = P(X = i) \cdot P(Y = j)$$
In a contingency table, this means every **observed cell frequency** should equal the corresponding **expected cell frequency** under independence:
$$E_{ij} = \frac{(\text{Row } i \text{ total}) \times (\text{Column } j \text{ total})}{\text{Grand total}} = \frac{R_i \cdot C_j}{N}$$
Any systematic departure of observed frequencies from expected frequencies is evidence of association.
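In R, the whole matrix of expected frequencies is a one-line outer product; a minimal sketch with hypothetical counts:
```{r expected-freq-sketch}
# Expected frequencies under independence: E_ij = (row i total)(col j total)/N
obs <- matrix(c(30, 10, 20, 40), nrow = 2)    # hypothetical observed counts
E <- outer(rowSums(obs), colSums(obs)) / sum(obs)
E   # compare cell-by-cell against obs
```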
### Independence for Continuous Variables
Two continuous variables $X$ and $Y$ are independent if their **joint distribution** equals the product of their marginal distributions:
$$f(x, y) = f_X(x) \cdot f_Y(y)$$
In practice, we test a weaker condition: **linear independence** (zero correlation). Note that zero correlation does not guarantee full independence — two variables can be nonlinearly related yet have zero Pearson correlation. This is why visualization always accompanies formal tests.
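A minimal sketch of this caveat — a deterministic (hence fully dependent) relationship whose Pearson correlation is essentially zero:
```{r zero-cor-sketch}
# y is a deterministic function of x, yet there is no *linear* association
x <- seq(-3, 3, length.out = 101)
y <- x^2                  # symmetric, non-monotone relationship
cor(x, y)                 # ~0 despite perfect dependence
```
A scatterplot of `x` against `y` reveals the parabola immediately, which is exactly why we plot first.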
### A Taxonomy of Association Measures
The correct test and measure of association depend on the **types of variables** being compared:
| Variable 1 | Variable 2 | Test | Effect Size |
|-----------|-----------|------|------------|
| Categorical | Categorical | Chi-square / Fisher's | Cramér's V |
| Continuous | Continuous | Pearson / Spearman correlation test | $r$ / $\rho$ |
| Binary | Continuous | Point-biserial correlation test | $r_{pb}$ |
| Binary | Binary | Chi-square / Fisher's | Phi ($\phi$) |
| Continuous (controlling for $Z$) | Continuous | Partial correlation | Partial $r$ |
: Association measures by variable type {.striped}
## Example: Identifying the Correct Test
**Example 4.1.** A researcher collects data on university students. Identify the appropriate test for each pair of variables:
| Variable 1 | Variable 2 | Appropriate Test |
|-----------|-----------|----------------|
| Faculty (Science, Arts, Business) | Pass/Fail status | Chi-square test |
| Age (years) | GPA (0–4 scale) | Pearson or Spearman correlation |
| Gender (Male/Female) | Exam score (continuous) | Point-biserial correlation |
| Smoking (Yes/No) | Disease (Yes/No) | Chi-square or Fisher's exact |
| Income | Happiness (controlling for age) | Partial correlation |
: Example 4.1 — Test selection by variable type {.striped}
## R Example: Exploring Variable Relationships
```{r independence-intro}
# --- Use the built-in mtcars and titanic-style datasets ---
# Convert relevant mtcars variables to categorical
data(mtcars)
mtcars_cat <- mtcars |>
mutate(
transmission = factor(am, labels = c("Automatic", "Manual")),
cylinders = factor(cyl),
efficiency = factor(ifelse(mpg >= median(mpg), "High", "Low"),
levels = c("Low", "High"))
)
# Quick overview of variable types
cat("=== Variable Type Overview ===\n")
cat("Categorical × Categorical: transmission vs. efficiency\n")
cat("Continuous × Continuous: mpg vs. hp\n")
cat("Binary × Continuous: transmission vs. mpg\n\n")
# Contingency table: transmission × efficiency
tab <- table(mtcars_cat$transmission, mtcars_cat$efficiency)
kable(addmargins(tab),
caption = "Contingency Table: Transmission × Fuel Efficiency") |>
kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE)
```
```{r independence-viz}
# --- Visualize potential associations ---
p1 <- ggplot(mtcars_cat,
aes(x = transmission, fill = efficiency)) +
geom_bar(position = "fill", color = "white") +
scale_fill_manual(values = c("Low" = "tomato", "High" = "steelblue")) +
scale_y_continuous(labels = scales::percent) +
labs(title = "Transmission vs. Fuel Efficiency",
subtitle = "Stacked proportional bar — unequal proportions suggest association",
x = "Transmission Type",
y = "Proportion",
fill = "Efficiency") +
theme_minimal(base_size = 12)
p2 <- ggplot(mtcars, aes(x = hp, y = mpg)) +
geom_point(color = "steelblue", size = 2.5, alpha = 0.7) +
geom_smooth(method = "lm", color = "tomato",
se = TRUE, linewidth = 1) +
labs(title = "Horsepower vs. Fuel Efficiency",
subtitle = "Scatterplot with linear fit — downward trend suggests negative correlation",
x = "Horsepower (hp)",
y = "Miles per Gallon (mpg)") +
theme_minimal(base_size = 12)
p1 + p2
```
**Code explanation:**
- `addmargins(table())` adds row and column totals to the contingency table — essential for computing expected frequencies by hand.
- `geom_bar(position = "fill")` creates proportional stacked bars, making differences in conditional distributions visually obvious.
- Visual inspection always precedes formal testing — if proportions look identical across groups, a significant test result warrants careful scrutiny.
## Exercises
::: {.callout-tip}
## Exercise 4.1
For each of the following research questions, identify: (a) the variable types, (b) the appropriate test of independence, and (c) what a significant result would mean substantively.
1. Do male and female students differ in their choice of academic major?
2. Is there a relationship between study hours per week and final exam score?
3. Does coffee consumption (cups/day) relate to reported anxiety level (1–10 scale)?
4. Is being a first-generation student associated with dropout status (yes/no)?
:::
---
# Chi-Square Test of Independence
## Introduction
The **chi-square test of independence** is the standard procedure for testing whether two categorical variables are associated. It compares the **observed** frequencies in a contingency table to the **expected** frequencies that would arise if the variables were truly independent. It is among the most widely used tests in social science, medical research, and data science — applicable whenever we want to know if two categorical groupings are related.
## Theory
### The Test Statistic
Given a contingency table with $r$ rows and $c$ columns, the chi-square test statistic is:
$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
where $O_{ij}$ is the observed frequency and $E_{ij} = R_i C_j / N$ is the expected frequency in cell $(i,j)$.
Under $H_0$ (independence), $\chi^2 \sim \chi^2((r-1)(c-1))$ approximately.
**Hypotheses:**
$$H_0: \text{The two variables are independent}$$
$$H_1: \text{The two variables are associated}$$
### Assumptions
1. Observations are **independent** (each subject appears in exactly one cell).
2. The data are **counts** (not percentages or proportions).
3. **Expected cell frequencies** are sufficiently large:
- At least 80% of cells have $E_{ij} \geq 5$
- No cell has $E_{ij} < 1$
When these conditions are violated — typically with small samples or rare categories — use **Fisher's Exact Test** (Section 3).
### Yates' Continuity Correction
For 2×2 tables, a continuity correction is sometimes applied to account for the fact that we are approximating a discrete distribution with a continuous one:
$$\chi^2_{\text{Yates}} = \sum \frac{(|O_{ij} - E_{ij}| - 0.5)^2}{E_{ij}}$$
Yates' correction is conservative (reduces test power) and is debated in the literature. Many statisticians prefer Fisher's Exact Test for 2×2 tables regardless of sample size.
### Standardized Residuals
A significant chi-square test tells us *that* association exists but not *where* it comes from. **Standardized residuals** identify which cells contribute most:
$$r_{ij} = \frac{O_{ij} - E_{ij}}{\sqrt{E_{ij}}}$$
Cells with $|r_{ij}| > 2$ are typically considered notably discrepant from independence. Note that R's `chisq.test()` stores these Pearson residuals in `$residuals`; its `$stdres` component holds **adjusted** standardized residuals, $(O_{ij} - E_{ij})/\sqrt{E_{ij}(1 - R_i/N)(1 - C_j/N)}$, which are somewhat larger in magnitude but lead to the same qualitative conclusions.
## Example: Chi-Square Test
**Example 4.2.** A researcher investigates whether exercise frequency is associated with self-reported health status among 200 adults.
| | Poor Health | Good Health | Total |
|--|------------|------------|-------|
| Rarely exercises | 40 | 25 | 65 |
| Sometimes exercises | 30 | 45 | 75 |
| Regularly exercises | 15 | 45 | 60 |
| **Total** | **85** | **115** | **200** |
**Expected frequencies under $H_0$:**
$$E_{11} = \frac{65 \times 85}{200} = 27.6, \quad E_{12} = \frac{65 \times 115}{200} = 37.4$$
$$E_{21} = \frac{75 \times 85}{200} = 31.9, \quad E_{22} = \frac{75 \times 115}{200} = 43.1$$
$$E_{31} = \frac{60 \times 85}{200} = 25.5, \quad E_{32} = \frac{60 \times 115}{200} = 34.5$$
**Test statistic:**
$$\chi^2 = \frac{(40-27.6)^2}{27.6} + \frac{(25-37.4)^2}{37.4} + \cdots = 5.54 + 4.10 + 0.11 + 0.08 + 4.32 + 3.20 = 17.35$$
(terms computed from the unrounded expected frequencies)
**df** $= (3-1)(2-1) = 2$, **p-value** $= P(\chi^2_2 > 17.35) < 0.001$
**Decision:** Reject $H_0$ — exercise frequency and health status are significantly associated ($\chi^2(2) = 17.35$, $p < 0.001$).
**Standardized residuals** reveal the pattern: "Rarely exercises / Poor Health" is notably higher than expected ($r = +2.35$), while "Regularly exercises / Poor Health" is notably lower ($r = -2.08$) — people who exercise regularly are less likely to report poor health.
## R Example: Chi-Square Test of Independence
```{r chisq}
# --- Build the exercise × health contingency table ---
exercise_health <- matrix(
c(40, 25, 30, 45, 15, 45),
nrow = 3, byrow = TRUE,
dimnames = list(
Exercise = c("Rarely", "Sometimes", "Regularly"),
Health = c("Poor", "Good")
)
)
# Display observed table with margins
kable(addmargins(exercise_health),
caption = "Observed Frequencies: Exercise × Health Status") |>
kable_styling(bootstrap_options = c("striped","hover"),
full_width = FALSE)
```
```{r chisq-test}
# --- Chi-square test ---
chi_result <- chisq.test(exercise_health, correct = FALSE)
print(chi_result)
# --- Expected frequencies ---
kable(round(chi_result$expected, 2),
caption = "Expected Frequencies Under H₀ (Independence)") |>
kable_styling(bootstrap_options = c("striped","hover"),
full_width = FALSE)
# --- Standardized residuals ---
kable(round(chi_result$stdres, 3),
caption = "Standardized Residuals (|r| > 2 = notable departure)") |>
kable_styling(bootstrap_options = c("striped","hover"),
full_width = FALSE) |>
column_spec(2, color = ifelse(abs(chi_result$stdres[, 1]) > 2,
                              "tomato", "black")) |>
column_spec(3, color = ifelse(abs(chi_result$stdres[, 2]) > 2,
                              "tomato", "black"))
```
```{r chisq-plot}
# --- Mosaic plot and association plot ---
# Convert to data frame for ggplot
eh_df <- as.table(exercise_health) |>
as.data.frame(responseName = "Count")
# Proportional stacked bar
ggplot(eh_df, aes(x = Exercise, y = Count, fill = Health)) +
geom_bar(stat = "identity", position = "fill", color = "white") +
geom_hline(yintercept = 85/200, linetype = "dashed",
color = "gray30", linewidth = 0.8) +
annotate("text", x = 0.6, y = 85/200 + 0.03,
label = "Overall poor health rate (42.5%)",
color = "gray30", size = 3.5, hjust = 0) +
scale_fill_manual(values = c("Poor" = "tomato", "Good" = "steelblue")) +
scale_y_continuous(labels = scales::percent) +
labs(title = "Exercise Frequency vs. Health Status",
subtitle = paste0("Chi-square test: χ²(2) = ",
round(chi_result$statistic, 2),
", p < 0.001"),
x = "Exercise Frequency",
y = "Proportion",
fill = "Health Status") +
theme_minimal(base_size = 13)
```
**Code explanation:**
- `chisq.test(table, correct = FALSE)` runs the chi-square test without Yates' correction. Set `correct = TRUE` for Yates' correction on 2×2 tables.
- The result object stores `$expected` (expected frequencies), `$residuals` (Pearson residuals, matching the formula in the theory section), `$stdres` (adjusted standardized residuals, displayed above), and `$p.value`.
- The dashed horizontal line marks the overall "poor health" rate — bars above it indicate excess poor health relative to the population average, making deviations from independence visually clear.
**Try modifying:** Check `all(chi_result$expected >= 5)` to verify the minimum expected frequency assumption is met.
## Exercises
::: {.callout-tip}
## Exercise 4.2
Use the `HairEyeColor` dataset built into R (hair color × eye color × sex for 592 statistics students).
(a) Collapse across sex: `margin.table(HairEyeColor, c(1,2))` to get a Hair × Eye contingency table.
(b) Run a chi-square test. Are hair and eye color independent?
(c) Compute and display standardized residuals. Which hair-eye combinations are most over- or under-represented?
(d) Produce a proportional stacked bar chart and interpret the pattern.
:::
::: {.callout-tip}
## Exercise 4.3
A clinical study records treatment type (Drug A, Drug B, Placebo) and recovery outcome (Recovered, Not Recovered) for 150 patients.
(a) Create the contingency table in R with `matrix()`.
(b) Verify the minimum expected frequency assumption. If violated, what alternative test should you use?
(c) Run the chi-square test and report results in APA format: $\chi^2(df) = X.XX$, $p = .XXX$.
:::
---
# Fisher's Exact Test
## Introduction
The chi-square test relies on a large-sample approximation — when expected cell frequencies are small (below 5), the $\chi^2$ approximation becomes unreliable and p-values are inaccurate. **Fisher's Exact Test** solves this problem by computing the **exact** probability of observing the table, or one more extreme, under $H_0$ of independence. It was originally designed for 2×2 tables but extends to larger tables. In data science and biomedical research, small samples and rare events are common — Fisher's test is an essential tool.
## Theory
### The Hypergeometric Distribution
Fisher's test is based on the **hypergeometric distribution** — the exact probability of a specific 2×2 table with fixed marginal totals $(R_1, R_2, C_1, C_2)$:
$$P\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = \frac{\binom{R_1}{a}\binom{R_2}{c}}{\binom{N}{C_1}} = \frac{(R_1)!(R_2)!(C_1)!(C_2)!}{N! \cdot a! \cdot b! \cdot c! \cdot d!}$$
The p-value is the sum of probabilities for all tables as extreme as, or more extreme than, the observed table — while keeping marginal totals fixed.
### Two-Sided Fisher's Test
For two-sided tests, "more extreme" is defined as tables with probability **less than or equal to** the probability of the observed table. R's `fisher.test()` implements this by default.
### When to Use Fisher's Exact Test
| Situation | Recommendation |
|-----------|---------------|
| Any cell has $E_{ij} < 5$ | Use Fisher's |
| Total $N < 20$ | Use Fisher's |
| 2×2 table, any sample size | Fisher's is always valid (some prefer it over chi-square) |
| Large $r \times c$ table, large $N$ | Chi-square is adequate |
: Fisher's Exact Test usage guidelines {.striped}
## Example: Fisher's Exact Test
**Example 4.3.** A rare disease study examines whether a genetic marker is associated with disease status. Only 14 subjects are available.
| | Disease | No Disease | Total |
|--|---------|-----------|-------|
| Marker Present | 5 | 2 | 7 |
| Marker Absent | 1 | 6 | 7 |
| **Total** | **6** | **8** | **14** |
**Expected frequencies:** $E_{11} = 6 \times 7/14 = 3.0$ — below 5, so chi-square is inappropriate.
**Fisher's exact p-value (two-sided):** $p = 0.103$
**Decision:** Fail to reject $H_0$ at $\alpha = 0.05$. Despite the apparent pattern, there is insufficient evidence to conclude the marker is associated with disease — likely due to the very small sample size.
**Interpretation note:** This example illustrates an important limitation. The study is severely **underpowered** — a larger study might well find significance. Fisher's test is honest about uncertainty; it does not manufacture significance from sparse data.
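Before reaching for `fisher.test()`, the p-value can be verified directly from the hypergeometric formula — a sketch enumerating every table compatible with the fixed margins of Example 4.3:
```{r fisher-by-hand}
# All tables with margins R1 = R2 = 7, C1 = 6, N = 14, indexed by a = 0..6
probs <- dhyper(0:6, m = 7, n = 7, k = 6)   # m, n = row totals; k = column 1 total
p_obs <- probs[5 + 1]                        # observed table has a = 5
sum(probs[probs <= p_obs + 1e-10])           # two-sided p ≈ 0.103
```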
## R Example: Fisher's Exact Test
```{r fisher}
# --- Genetic marker × disease status ---
marker_disease <- matrix(
c(5, 2, 1, 6),
nrow = 2, byrow = TRUE,
dimnames = list(
Marker = c("Present", "Absent"),
Disease = c("Disease", "No Disease")
)
)
kable(addmargins(marker_disease),
caption = "Observed: Genetic Marker × Disease Status") |>
kable_styling(bootstrap_options = c("striped","hover"),
full_width = FALSE)
# --- Check expected frequencies ---
chi_check <- chisq.test(marker_disease, correct = FALSE)
cat("Expected frequencies:\n")
print(round(chi_check$expected, 2))
cat("\nMinimum expected frequency:",
round(min(chi_check$expected), 2),
"— chi-square assumption violated!\n\n")
# --- Fisher's Exact Test ---
fisher_result <- fisher.test(marker_disease, alternative = "two.sided")
print(fisher_result)
```
```{r fisher-compare}
# --- Compare Fisher's vs Chi-square on same data ---
comparison <- data.frame(
Test = c("Chi-Square (no correction)",
"Chi-Square (Yates' correction)",
"Fisher's Exact Test"),
Statistic = c(round(chisq.test(marker_disease,
correct=FALSE)$statistic, 3),
round(chisq.test(marker_disease,
correct=TRUE)$statistic, 3),
"—"),
p_value = c(round(chisq.test(marker_disease,
correct=FALSE)$p.value, 4),
round(chisq.test(marker_disease,
correct=TRUE)$p.value, 4),
round(fisher_result$p.value, 4)),
Appropriate = c("No (E < 5)", "No (E < 5)", "Yes ✓")
)
kable(comparison,
caption = "Test Comparison: Small Sample Scenario",
col.names = c("Test", "Statistic", "p-value", "Appropriate?")) |>
kable_styling(bootstrap_options = c("striped","hover"),
full_width = FALSE) |>
column_spec(4, bold = TRUE,
color = c("tomato","tomato","darkgreen"))
```
```{r fisher-plot}
# --- Visualize with balloon plot ---
marker_df <- as.table(marker_disease) |>
as.data.frame(responseName = "Count")
ggplot(marker_df, aes(x = Disease, y = Marker)) +
geom_point(aes(size = Count, color = Count),
shape = 16, alpha = 0.8) +
geom_text(aes(label = Count), color = "white",
fontface = "bold", size = 5) +
scale_size_continuous(range = c(8, 25)) +
scale_color_gradient(low = "steelblue", high = "tomato") +
labs(title = "Balloon Plot: Marker × Disease",
subtitle = paste0("Fisher's Exact Test: p = ",
round(fisher_result$p.value, 3)),
x = "Disease Status",
y = "Marker Status") +
theme_minimal(base_size = 13) +
theme(legend.position = "none")
```
**Code explanation:**
- `fisher.test(table, alternative)` computes the exact p-value. For 2×2 tables it also returns an **odds ratio** with a confidence interval — a useful effect measure.
- The comparison table illustrates how chi-square p-values are unreliable with small expected frequencies — they differ substantially from Fisher's exact result.
- The balloon plot is an intuitive alternative to a contingency table for small tables: bubble area represents cell count, making the pattern immediately visible.
## Exercises
::: {.callout-tip}
## Exercise 4.4
A new teaching intervention is tested in two small classrooms. Results:
| | Passed | Failed |
|--|--------|--------|
| Intervention | 8 | 2 |
| Control | 4 | 6 |
(a) Check whether chi-square assumptions are met.
(b) Run Fisher's Exact Test. What is the p-value and odds ratio?
(c) Interpret the odds ratio in plain language.
(d) What would a larger study need to show to confirm this result?
:::
---
# Cramér's V and Effect Size for Categorical Association
## Introduction
A significant chi-square test only tells us that an association **exists** — it does not say how **strong** it is. With large samples, even trivially weak associations become statistically significant. **Effect size measures** for categorical variables answer the practical question: *How strongly are these two variables related?* The most versatile measure is **Cramér's V**, which standardizes chi-square to a 0–1 scale regardless of table dimensions.
## Theory
### Cramér's V
Cramér's V is derived from the chi-square statistic, normalized to fall between 0 and 1:
$$V = \sqrt{\frac{\chi^2 / N}{\min(r-1, c-1)}}$$
where $N$ is the total sample size, $r$ is the number of rows, and $c$ is the number of columns.
- $V = 0$: Perfect independence (no association)
- $V = 1$: Perfect association
**Interpretation benchmarks** (Cohen, 1988), adjusted for table size:
| Table Size | Small | Medium | Large |
|-----------|-------|--------|-------|
| 2×2 | 0.10 | 0.30 | 0.50 |
| 3×3 | 0.07 | 0.21 | 0.35 |
| 4×4 | 0.06 | 0.17 | 0.29 |
: Cramér's V interpretation by table dimensions {.striped}
### The Phi Coefficient
For 2×2 tables specifically, the **phi coefficient** $\phi$ is a simpler equivalent:
$$\phi = \sqrt{\frac{\chi^2}{N}} = \frac{ad - bc}{\sqrt{(a+b)(c+d)(a+c)(b+d)}}$$
$\phi$ ranges from $-1$ to $+1$ for 2×2 tables (with sign indicating direction); its absolute value equals Cramér's V when $\min(r-1, c-1) = 1$.
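The two expressions for $\phi$ agree, as a quick sketch on a hypothetical 2×2 table confirms:
```{r phi-two-forms}
tab <- matrix(c(20, 10, 5, 25), nrow = 2, byrow = TRUE)   # rows: (a, b), (c, d)
num <- tab[1, 1] * tab[2, 2] - tab[1, 2] * tab[2, 1]      # ad - bc
den <- sqrt(prod(rowSums(tab)) * prod(colSums(tab)))      # (a+b)(c+d)(a+c)(b+d)
c(phi_abcd = num / den,
  phi_chisq = unname(sqrt(chisq.test(tab, correct = FALSE)$statistic / sum(tab))))
```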
### Goodman and Kruskal's Lambda
An alternative measure of association is **Lambda** ($\lambda$), which measures the **proportional reduction in prediction error** — how much better we can predict one variable knowing the other:
$$\lambda = \frac{\sum_j \max_i f_{ij} - \max_i R_i}{N - \max_i R_i}$$
As written, this version predicts the **row** variable from the **column** variable: $\lambda = 0$ means knowing the column variable does not improve prediction of the row variable; $\lambda = 1$ means perfect prediction. Unlike Cramér's V, lambda is asymmetric — swapping the roles of rows and columns generally gives a different value.
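Base R has no built-in lambda, but it is a short function; a sketch (the function name is ours):
```{r lambda-sketch}
# Goodman-Kruskal lambda: predicting the row variable from the column variable
lambda_row_given_col <- function(tab) {
  (sum(apply(tab, 2, max)) - max(rowSums(tab))) / (sum(tab) - max(rowSums(tab)))
}
lambda_row_given_col(matrix(c(30, 10, 10, 30), nrow = 2))  # hypothetical table: 0.5
```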
## Example: Cramér's V
**Example 4.4.** Returning to the exercise × health study (Example 4.2): $\chi^2 = 17.35$, $N = 200$, table is 3×2.
$$V = \sqrt{\frac{17.35/200}{\min(3-1, 2-1)}} = \sqrt{\frac{0.0868}{1}} = \sqrt{0.0868} \approx 0.295$$
For a table with $\min(r-1, c-1) = 1$, benchmarks are 0.10/0.30/0.50 (small/medium/large). $V = 0.295$ is close to a **medium effect** — the association between exercise and health is statistically significant and of moderate practical importance.
## R Example: Cramér's V and Association Measures
```{r cramers-v}
# --- Cramér's V for exercise × health ---
exercise_health <- matrix(
c(40, 25, 30, 45, 15, 45),
nrow = 3, byrow = TRUE,
dimnames = list(
Exercise = c("Rarely", "Sometimes", "Regularly"),
Health = c("Poor", "Good")
)
)
chi_result <- chisq.test(exercise_health, correct = FALSE)
N <- sum(exercise_health)
min_dim <- min(nrow(exercise_health)-1, ncol(exercise_health)-1)
cramers_v <- sqrt(chi_result$statistic / (N * min_dim))
cat("Chi-square statistic:", round(chi_result$statistic, 3), "\n")
cat("N:", N, "\n")
cat("min(r-1, c-1):", min_dim, "\n")
cat("Cramér's V:", round(cramers_v, 4), "\n")
cat("Interpretation: Medium effect (benchmark ~0.30 for this table size)\n\n")
# --- Using vcd package for association measures ---
assoc_stats <- vcd::assocstats(exercise_health)
print(assoc_stats)
```
```{r cramers-heatmap}
# --- Visualize standardized residuals as heatmap ---
stdres <- chi_result$stdres
stdres_df <- as.data.frame(as.table(stdres)) |>
  rename(Residual = Freq)   # columns: Exercise, Health, Residual
ggplot(stdres_df, aes(x = Health, y = Exercise, fill = Residual)) +
geom_tile(color = "white", linewidth = 1.5) +
geom_text(aes(label = round(Residual, 2)),
size = 5, fontface = "bold",
color = ifelse(abs(stdres_df$Residual) > 2,
"white", "black")) +
scale_fill_gradient2(low = "steelblue", mid = "white",
high = "tomato", midpoint = 0,
limits = c(-3, 3)) +
labs(title = "Standardized Residuals Heatmap",
subtitle = paste0("Cramér's V = ", round(cramers_v, 3),
" (medium effect) | Red = more than expected, Blue = less"),
x = "Health Status",
y = "Exercise Frequency",
fill = "Std. Residual") +
theme_minimal(base_size = 13)
```
**Code explanation:**
- Cramér's V is computed manually here to show the formula clearly; `vcd::assocstats()` computes it (and phi, contingency coefficient) in one call.
- The heatmap of standardized residuals is more informative than a standard mosaic plot for larger tables — color immediately draws the eye to cells with the strongest departures from independence.
- `scale_fill_gradient2()` creates a diverging color scale centered at zero, perfect for signed residuals.
## Exercises
::: {.callout-tip}
## Exercise 4.5
Using the `HairEyeColor` chi-square result from Exercise 4.2:
(a) Compute Cramér's V manually from the $\chi^2$ statistic.
(b) Use `vcd::assocstats()` to verify.
(c) Interpret the effect size using the appropriate benchmark for the table dimensions.
(d) Produce a standardized residuals heatmap and describe which hair-eye combinations drive the association.
:::
---
# Correlation Tests
## Introduction
When both variables are continuous (or at least ordinal with many levels), **correlation** quantifies the strength and direction of their linear relationship. But a sample correlation $r$ could arise by chance even when the true population correlation $\rho = 0$. A **correlation test** determines whether the observed correlation is statistically distinguishable from zero. In this section we cover the two most important correlation measures: **Pearson's $r$** for linear relationships between normally distributed variables, and **Spearman's $\rho$** for monotone relationships with no normality requirement.
## Theory
### Pearson Correlation Test
**Hypotheses:** $H_0: \rho = 0$ vs. $H_1: \rho \neq 0$ (or one-sided)
Given sample correlation $r$ from $n$ pairs, the test statistic is:
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t(n-2) \text{ under } H_0$$
**Assumptions:**
1. Both variables are continuous.
2. The relationship is **linear**.
3. Both variables are approximately **normally distributed** (bivariate normality).
4. Observations are independent.
5. No extreme outliers (Pearson's $r$ is sensitive to outliers).
**Confidence interval for $\rho$:** Use Fisher's z-transformation:
$$z = \frac{1}{2}\ln\!\left(\frac{1+r}{1-r}\right), \quad \text{SE}(z) = \frac{1}{\sqrt{n-3}}$$
Then transform back to the correlation scale.
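As a sketch, the interval computed by hand for $r = 0.61$, $n = 40$ (the values of Example 4.5 below):
```{r fisher-z-ci}
r <- 0.61; n <- 40
z  <- atanh(r)                           # Fisher z: 0.5 * log((1 + r)/(1 - r))
se <- 1 / sqrt(n - 3)
tanh(z + c(-1, 1) * qnorm(0.975) * se)   # 95% CI on the correlation scale
```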
### Spearman Rank Correlation Test
**Spearman's $\rho$** replaces raw values with **ranks** and computes Pearson's $r$ on the ranks. This makes it robust to:
- Non-normality
- Outliers
- Monotone but non-linear relationships
- Ordinal variables
$$\rho_s = 1 - \frac{6\sum d_i^2}{n(n^2-1)}$$
where $d_i$ is the difference in ranks for pair $i$; this closed form is exact only when there are no tied ranks. For large samples, a t-approximation is used for hypothesis testing.
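Equivalently, Spearman's $\rho$ is just Pearson's $r$ computed on ranks, as a quick check shows:
```{r spearman-ranks}
# rank() assigns average ranks to ties, matching Spearman's tie handling
with(mtcars, c(pearson_on_ranks = cor(rank(hp), rank(mpg)),
               spearman         = cor(hp, mpg, method = "spearman")))
```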
### Interpreting Correlation
| $|r|$ or $|\rho|$ | Strength |
|-------------------|----------|
| 0.00 – 0.19 | Negligible |
| 0.20 – 0.39 | Weak |
| 0.40 – 0.59 | Moderate |
| 0.60 – 0.79 | Strong |
| 0.80 – 1.00 | Very strong |
: Correlation strength benchmarks {.striped}
::: {.callout-warning}
## Critical Reminder: Correlation ≠ Causation
A significant correlation between $X$ and $Y$ does not imply that $X$ causes $Y$, nor that $Y$ causes $X$. Both could be driven by a third variable $Z$ (confounding), or the association could be spurious. Always consider alternative explanations and visualize the scatterplot alongside the correlation coefficient.
:::
## Example: Pearson and Spearman Correlation Tests
**Example 4.5.** A researcher examines the relationship between hours of sleep per night ($X$) and cognitive performance score ($Y$) in 40 participants.
Results: $r = 0.61$, $t_{38} = 4.75$, $p < 0.001$, 95% CI: $(0.37, 0.77)$.
**Interpretation:** There is a statistically significant, strong positive linear correlation between sleep duration and cognitive performance ($r = 0.61$, $p < 0.001$). The 95% CI $(0.37, 0.77)$ excludes zero, confirming the association. However, this is an observational study — we cannot conclude that more sleep **causes** better performance.
## R Example: Pearson and Spearman Correlation Tests
```{r correlation}
# --- Use mtcars: test correlation between hp and mpg ---
data(mtcars)
# Pearson correlation test
pearson_result <- cor.test(mtcars$hp, mtcars$mpg,
method = "pearson",
alternative = "two.sided")
# Spearman correlation test
spearman_result <- cor.test(mtcars$hp, mtcars$mpg,
method = "spearman",
alternative = "two.sided",
exact = FALSE)
# Summary table
cor_summary <- data.frame(
Method = c("Pearson", "Spearman"),
r_or_rho = round(c(pearson_result$estimate,
spearman_result$estimate), 4),
Statistic = round(c(pearson_result$statistic,
spearman_result$statistic), 3),
df = c(pearson_result$parameter, "—"),
p_value = round(c(pearson_result$p.value,
spearman_result$p.value), 6),
CI_lower = c(round(pearson_result$conf.int[1], 3), "—"),
CI_upper = c(round(pearson_result$conf.int[2], 3), "—")
)
kable(cor_summary,
caption = "Correlation Test Results: Horsepower vs. MPG",
col.names = c("Method", "r / ρ", "t / S",
"df", "p-value", "95% CI Lower", "95% CI Upper")) |>
kable_styling(bootstrap_options = c("striped","hover"),
full_width = FALSE)
```
```{r correlation-plot}
# --- Scatterplot with correlation annotation ---
r_val <- round(pearson_result$estimate, 3)
p_val <- round(pearson_result$p.value, 4)
p1 <- ggplot(mtcars, aes(x = hp, y = mpg)) +
geom_point(color = "steelblue", size = 2.5, alpha = 0.8) +
geom_smooth(method = "lm", color = "tomato",
se = TRUE, linewidth = 1.1) +
annotate("text", x = 280, y = 30,
label = paste0("r = ", r_val,
"\np = ", p_val),
color = "tomato", fontface = "bold",
size = 4, hjust = 1) +
labs(title = "Horsepower vs. Fuel Efficiency",
subtitle = "Pearson correlation with 95% confidence band",
x = "Horsepower (hp)",
y = "Miles per Gallon (mpg)") +
theme_minimal(base_size = 12)
# --- Full correlation matrix for mtcars ---
cont_vars <- mtcars[, c("mpg","cyl","disp","hp","wt","qsec")]
cor_mat <- cor(cont_vars, method = "pearson")
p2 <- ggcorrplot(cor_mat,
method = "circle",
type = "upper",
lab = TRUE,
lab_size = 3.5,
colors = c("tomato","white","steelblue"),
title = "Correlation Matrix — mtcars",
ggtheme = theme_minimal(base_size = 11))
p1 + p2
```
**Code explanation:**
- `cor.test(x, y, method, alternative)` tests both Pearson (`"pearson"`) and Spearman (`"spearman"`) correlations. Pearson returns a CI; Spearman does not (due to the rank-based nature).
- `exact = FALSE` in Spearman replaces the exact permutation p-value with the asymptotic t-approximation — necessary when ties are present in the data.
- `ggcorrplot()` produces a publication-ready correlation matrix. `type = "upper"` shows only the upper triangle to avoid redundancy.
## Exercises
::: {.callout-tip}
## Exercise 4.6
Using the `iris` dataset, test the correlation between `Sepal.Length` and `Petal.Length`.
(a) Produce a scatterplot. Does the relationship appear linear?
(b) Check normality of both variables (Shapiro-Wilk). Which correlation method is more appropriate?
(c) Run both Pearson and Spearman tests. Compare and explain any differences.
(d) Compute the full 4×4 correlation matrix for all numeric variables and visualize with `ggcorrplot()`.
:::
::: {.callout-tip}
## Exercise 4.7
The `anscombe` dataset contains four pairs of variables with nearly identical correlations but very different scatterplot shapes.
(a) Compute Pearson's $r$ for all four pairs (x1/y1, x2/y2, x3/y3, x4/y4).
(b) Plot all four scatterplots in a 2×2 grid.
(c) For which pairs is Pearson's $r$ a misleading summary? What would be more appropriate?
:::
---
# Point-Biserial and Phi Coefficients
## Introduction
Not all association problems involve two continuous or two categorical variables. In data science, binary variables are ubiquitous — pass/fail, churn/retain, click/no-click, disease/healthy. When one or both variables in a pair are binary, specialized correlation measures maintain interpretability while remaining mathematically consistent with the general correlation framework. The **point-biserial correlation** handles the binary-continuous case; the **phi coefficient** handles the binary-binary case.
## Theory
### Point-Biserial Correlation ($r_{pb}$)
The **point-biserial correlation** measures the association between a **naturally binary variable** (coded 0/1) and a **continuous variable**. It is mathematically equivalent to Pearson's $r$ computed between the 0/1 binary variable and the continuous variable:
$$r_{pb} = \frac{\bar{Y}_1 - \bar{Y}_0}{s_Y}\sqrt{\frac{n_1 n_0}{n^2}}$$
where $\bar{Y}_1$ and $\bar{Y}_0$ are the means of the continuous variable in the two groups, $s_Y$ is the overall standard deviation computed with the $n$ (not $n-1$) denominator, and $n_1$, $n_0$ are the group sizes.
**Significance testing:** Because $r_{pb}$ is algebraically equivalent to Pearson's $r$, the same t-test applies:
$$t = \frac{r_{pb}\sqrt{n-2}}{\sqrt{1-r_{pb}^2}} \sim t(n-2)$$
This is equivalent to running an **independent samples t-test** comparing the two groups — the same information, expressed differently.
**Assumptions:** The continuous variable should be approximately normally distributed within each group (same as the independent samples t-test).
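A sketch confirming that the $r_{pb}$ formula reproduces Pearson's $r$ on a 0/1 coding, using the `mtcars` pair analyzed later in this section:
```{r rpb-identity}
y <- mtcars$mpg; g <- mtcars$am              # g is already coded 0/1
n <- length(y); n1 <- sum(g == 1); n0 <- sum(g == 0)
s_n <- sqrt(mean((y - mean(y))^2))           # overall SD with the n denominator
r_pb <- (mean(y[g == 1]) - mean(y[g == 0])) / s_n * sqrt(n1 * n0 / n^2)
c(formula = r_pb, pearson = cor(g, y))       # identical values
```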
### Phi Coefficient ($\phi$)
The **phi coefficient** measures association between **two binary variables**. It is Pearson's $r$ applied to two 0/1 variables:
$$\phi = \frac{ad - bc}{\sqrt{(a+b)(c+d)(a+c)(b+d)}}$$
where $a, b, c, d$ are the four cells of a 2×2 contingency table. $\phi$ ranges from $-1$ to $+1$, and its absolute value equals Cramér's V for 2×2 tables.
**Interpretation benchmarks** (same as Pearson's $r$):
- $|\phi| < 0.10$: Negligible
- $0.10 \leq |\phi| < 0.30$: Weak
- $0.30 \leq |\phi| < 0.50$: Moderate
- $|\phi| \geq 0.50$: Strong
### Biserial vs. Point-Biserial
A subtle distinction: the **point-biserial** is used when the binary variable is *truly* dichotomous (e.g., sex, treatment assignment). The **biserial correlation** is used when the binary variable is an artificially dichotomized continuous variable (e.g., pass/fail from a continuous score). The biserial correlation is larger in magnitude and corrects for the information lost by dichotomization.
## Example: Point-Biserial Correlation
**Example 4.6.** A study records whether students passed a preparatory course (0 = No, 1 = Yes) and their final exam score. Results: $r_{pb} = 0.48$, $t_{58} = 4.19$, $p < 0.001$.
**Interpretation:** Students who passed the preparatory course scored significantly higher on the final exam ($r_{pb} = 0.48$, $p < 0.001$). This is a moderate-to-strong effect, indicating the preparatory course is associated with better exam performance. Note that this is equivalent to a two-sample t-test comparing exam scores between the two groups.
## R Example: Point-Biserial and Phi Coefficients
```{r point-biserial}
# --- Point-biserial: transmission (binary) vs. mpg (continuous) ---
# am: 0 = automatic, 1 = manual
data(mtcars)
# Method 1: cor.test (Pearson on binary × continuous)
pb_result <- cor.test(mtcars$am, mtcars$mpg,
method = "pearson")
cat("=== Point-Biserial Correlation ===\n")
cat("r_pb:", round(pb_result$estimate, 4), "\n")
cat("t: ", round(pb_result$statistic, 3), "\n")
cat("df: ", pb_result$parameter, "\n")
cat("p: ", round(pb_result$p.value, 4), "\n")
cat("95% CI: [", round(pb_result$conf.int[1],3),
",", round(pb_result$conf.int[2],3), "]\n\n")
# Equivalence to the pooled independent samples t-test
t_equiv <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)
cat("Equivalent pooled t-test p-value:",
    round(t_equiv$p.value, 4), "(identical to the correlation test)\n")
```
```{r phi-coeff}
# --- Phi coefficient: am (auto/manual) × efficiency (high/low) ---
mtcars_bin <- mtcars |>
mutate(
transmission = factor(am, labels = c("Auto","Manual")),
efficiency = factor(ifelse(mpg >= median(mpg),
"High","Low"),
levels = c("Low","High"))
)
tab_2x2 <- table(mtcars_bin$transmission,
mtcars_bin$efficiency)
# Phi coefficient
chi_2x2 <- chisq.test(tab_2x2, correct = FALSE)
phi_val <- sqrt(chi_2x2$statistic / sum(tab_2x2))
cat("=== Phi Coefficient ===\n")
kable(addmargins(tab_2x2),
caption = "Transmission × Efficiency (2×2 Table)") |>
kable_styling(bootstrap_options = c("striped","hover"),
full_width = FALSE)
cat("Phi coefficient:", round(phi_val, 4), "\n")
cat("Chi-square p-value:", round(chi_2x2$p.value, 4), "\n")
cat("Interpretation: Moderate-to-strong association\n")
```
```{r pb-plot}
# --- Visualize point-biserial ---
p1 <- ggplot(mtcars_bin, aes(x = transmission,
y = mpg, fill = transmission)) +
geom_violin(alpha = 0.5, trim = FALSE) +
geom_boxplot(width = 0.12, fill = "white") +
scale_fill_manual(values = c("Auto" = "steelblue",
"Manual" = "tomato")) +
annotate("text", x = 1.5, y = 35,
label = paste0("r_pb = ",
round(pb_result$estimate, 3),
"\np = ", round(pb_result$p.value, 4)),
size = 4, fontface = "bold", color = "gray20") +
labs(title = "A. Point-Biserial: Transmission vs. MPG",
x = "Transmission", y = "MPG") +
theme_minimal(base_size = 12) +
theme(legend.position = "none")
# --- Visualize phi ---
p2 <- as.data.frame(tab_2x2) |>
rename(Transmission = Var1, Efficiency = Var2) |>
ggplot(aes(x = Efficiency, y = Transmission,
fill = Freq)) +
geom_tile(color = "white", linewidth = 1.5) +
geom_text(aes(label = Freq), size = 6, fontface = "bold") +
scale_fill_gradient(low = "white", high = "steelblue") +
annotate("text", x = 1.5, y = 2.45,
label = paste0("φ = ", round(phi_val, 3)),
size = 4.5, fontface = "bold") +
labs(title = "B. Phi Coefficient: Transmission × Efficiency",
x = "Efficiency", y = "Transmission") +
theme_minimal(base_size = 12) +
theme(legend.position = "none")
p1 + p2
```
**Code explanation:**
- `cor.test(binary, continuous, method = "pearson")` correctly computes the point-biserial correlation — no special function is needed since it is mathematically Pearson's $r$.
- Phi is computed here as $\sqrt{\chi^2/N}$ — this yields $|\phi|$, identical to Cramér's V for 2×2 tables; the sign, if needed, comes from the $ad - bc$ formula.
- The equivalence between point-biserial and the independent samples t-test is shown explicitly — the pooled (equal-variance) t-test yields exactly the same t-statistic and p-value as the correlation test.
## Exercises
::: {.callout-tip}
## Exercise 4.8
Using the `mtcars` dataset:
(a) Compute the point-biserial correlation between `vs` (engine shape: 0 = V-shaped, 1 = straight) and `qsec` (quarter-mile time).
(b) Verify this equals the Pearson correlation between the two variables.
(c) Is there a significant association? Report in full.
(d) Compute the phi coefficient between `vs` and `am` (both binary). What is the strength of this association?
:::
---
# Partial Correlation
## Introduction
Simple bivariate correlation measures the relationship between two variables, but in real datasets variables are rarely isolated — many are influenced by common underlying factors. **Partial correlation** measures the association between two variables $X$ and $Y$ **after removing the influence of one or more control variables $Z$**. It answers: *Is the relationship between $X$ and $Y$ real, or is it merely a reflection of both being related to $Z$?* This is a crucial step in distinguishing genuine associations from spurious ones, and a natural bridge toward multiple regression (Chapter 10).
## Theory
### First-Order Partial Correlation
The **first-order partial correlation** between $X$ and $Y$ controlling for $Z$ is:
$$r_{XY \cdot Z} = \frac{r_{XY} - r_{XZ} \cdot r_{YZ}}{\sqrt{(1-r_{XZ}^2)(1-r_{YZ}^2)}}$$
where $r_{XY}$, $r_{XZ}$, $r_{YZ}$ are the zero-order (simple) Pearson correlations.
**Interpretation:** $r_{XY \cdot Z}$ is the correlation between the **residuals** of regressing $X$ on $Z$ and the **residuals** of regressing $Y$ on $Z$ — it is the "pure" relationship between $X$ and $Y$ once the linear effect of $Z$ is removed from both.
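Both definitions give the same number, as a quick sketch with the `mtcars` variables used later in this section confirms:
```{r partial-residuals-check}
# Formula vs. residual definition of the first-order partial correlation
r_xy <- cor(mtcars$mpg, mtcars$hp)
r_xz <- cor(mtcars$mpg, mtcars$wt)
r_yz <- cor(mtcars$hp,  mtcars$wt)
c(formula   = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)),
  residuals = cor(resid(lm(mpg ~ wt, data = mtcars)),
                  resid(lm(hp ~ wt, data = mtcars))))
```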
### Significance Testing
The partial correlation is tested using a t-distribution with $n - 2 - k$ degrees of freedom, where $k$ is the number of control variables:
$$t = \frac{r_{XY \cdot Z}\sqrt{n-2-k}}{\sqrt{1-r_{XY \cdot Z}^2}} \sim t(n-2-k)$$
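A sketch of this test, plugging in rounded values from Example 4.7 below ($n = 32$ cars, one control variable):
```{r partial-t-sketch}
r_p <- -0.55; n <- 32; k <- 1                # rounded partial r from Example 4.7
t_stat <- r_p * sqrt(n - 2 - k) / sqrt(1 - r_p^2)
c(t = t_stat, p_two_sided = 2 * pt(-abs(t_stat), df = n - 2 - k))
```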
### Partial vs. Semi-Partial Correlation
A related concept is the **semi-partial (part) correlation** $r_{X(Y \cdot Z)}$: the correlation between $X$ and the part of $Y$ that is **unique** (not shared with $Z$). It measures the unique contribution of $Y$ to $X$, which is central to multiple regression (it is the square root of the $\Delta R^2$ when $Y$ is added to a regression already containing $Z$).
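In R, the semi-partial is simply a correlation with one residualized variable; a sketch with the same trio of variables:
```{r semipartial-sketch}
# Semi-partial: mpg with the part of hp not shared with wt
cor(mtcars$mpg, resid(lm(hp ~ wt, data = mtcars)))
```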
### Spurious Correlation
A **spurious correlation** occurs when $X$ and $Y$ are correlated only because both are caused by a third variable $Z$. After partialling out $Z$, the partial correlation drops to (near) zero. Classic example: ice cream sales and drowning rates are positively correlated — both are caused by hot weather (the confounder). The partial correlation controlling for temperature would be near zero.
## Example: Partial Correlation
**Example 4.7.** In the `mtcars` dataset, `mpg` and `hp` are strongly negatively correlated ($r = -0.78$). But both are also related to car weight (`wt`): heavier cars tend to have more horsepower and worse fuel economy. Is the `mpg`-`hp` correlation "real" or partly a reflection of their shared relationship with `wt`?
- $r_{\text{mpg,hp}} = -0.78$
- $r_{\text{mpg,wt}} = -0.87$
- $r_{\text{hp,wt}} = +0.66$
**Partial correlation controlling for `wt`:**
$$r_{\text{mpg,hp} \cdot \text{wt}} = \frac{-0.78 - (-0.87)(0.66)}{\sqrt{(1-0.87^2)(1-0.66^2)}} = \frac{-0.78 + 0.574}{\sqrt{0.243 \times 0.564}} = \frac{-0.206}{0.370} \approx -0.557$$
The partial correlation ($-0.56$ with these rounded inputs; $-0.55$ from the full-precision correlations) is notably weaker than the zero-order correlation ($-0.78$), suggesting that part of the `mpg`-`hp` relationship is mediated through vehicle weight. However, a meaningful negative relationship remains even after controlling for weight.
## R Example: Partial Correlation
```{r partial-corr}
# --- Partial correlations in mtcars ---
data(mtcars)
# Zero-order correlations (for context)
cat("=== Zero-Order Correlations ===\n")
cor_mat <- cor(mtcars[, c("mpg","hp","wt","disp")])
round(cor_mat, 3)
```
```{r partial-corr2}
# --- Partial correlation: mpg ~ hp controlling for wt ---
library(ppcor)
# First-order partial correlation (controlling for one variable)
partial_result <- pcor.test(mtcars$mpg, mtcars$hp, mtcars$wt)
cat("=== Partial Correlation: mpg & hp | wt ===\n")
cat("Zero-order r(mpg, hp): ",
round(cor(mtcars$mpg, mtcars$hp), 4), "\n")
cat("Partial r(mpg, hp | wt): ",
round(partial_result$estimate, 4), "\n")
cat("t-statistic: ",
round(partial_result$statistic, 3), "\n")
cat("df: ", partial_result$n - 2 - 1, "\n")
cat("p-value: ",
round(partial_result$p.value, 4), "\n\n")
# Partial correlation matrix: each pair controlling for ALL remaining variables
pcor_matrix <- pcor(mtcars[, c("mpg","hp","disp","qsec")],
                    method = "pearson")
cat("=== Partial Correlation Matrix (each pair | all other variables) ===\n")
round(pcor_matrix$estimate, 3)
```
```{r partial-plot}
# --- Compare zero-order vs partial correlation visually ---
# Compute residuals of mpg and hp after removing wt's influence
resid_mpg <- residuals(lm(mpg ~ wt, data = mtcars))
resid_hp <- residuals(lm(hp ~ wt, data = mtcars))
p1 <- ggplot(mtcars, aes(x = hp, y = mpg)) +
geom_point(color = "steelblue", size = 2.5, alpha = 0.8) +
geom_smooth(method = "lm", color = "tomato",
se = FALSE, linewidth = 1) +
annotate("text", x = 280, y = 31,
label = paste0("r = ",
round(cor(mtcars$hp, mtcars$mpg), 3)),
color = "tomato", fontface = "bold", size = 4) +
labs(title = "A. Zero-Order Correlation",
subtitle = "mpg vs. hp (no control)",
x = "Horsepower (hp)", y = "MPG") +
theme_minimal(base_size = 12)
p2 <- ggplot(data.frame(res_hp = resid_hp,
res_mpg = resid_mpg),
aes(x = res_hp, y = res_mpg)) +
geom_point(color = "seagreen", size = 2.5, alpha = 0.8) +
geom_smooth(method = "lm", color = "tomato",
se = FALSE, linewidth = 1) +
annotate("text", x = 100, y = 4,
label = paste0("r = ",
round(cor(resid_hp, resid_mpg), 3)),
color = "tomato", fontface = "bold", size = 4) +
labs(title = "B. Partial Correlation",
subtitle = "mpg vs. hp (after removing wt influence)",
x = "hp Residual (after wt)",
y = "mpg Residual (after wt)") +
theme_minimal(base_size = 12)
p1 + p2
```
**Code explanation:**
- `pcor.test(x, y, z)` from `ppcor` computes the first-order partial correlation between `x` and `y` controlling for `z`, with a significance test.
- `pcor(data_frame)` computes the full partial correlation matrix (each pair controlled for all others) — analogous to `cor()` for zero-order correlations.
- Panels A and B illustrate the concept geometrically: Panel B shows the scatterplot of **residuals** — the pure association between `mpg` and `hp` after both have been "de-weighted" by `wt`.
**Try modifying:** Control for both `wt` and `disp` simultaneously using `pcor.test(mtcars$mpg, mtcars$hp, mtcars[, c("wt", "disp")])`. Does the partial correlation weaken further?
## Exercises
::: {.callout-tip}
## Exercise 4.9
Using the `iris` dataset, `Sepal.Length` and `Petal.Length` are strongly correlated ($r \approx 0.87$). Species is a potential confounder.
(a) Compute the zero-order Pearson correlation between `Sepal.Length` and `Petal.Length`.
(b) Encode `Species` as a numeric variable (1, 2, 3) and compute the partial correlation controlling for it using `pcor.test()`.
(c) Does the partial correlation differ substantially from the zero-order correlation? What does this suggest?
(d) Visualize both zero-order and partial relationships as scatterplots (color points by species in panel A).
:::
::: {.callout-tip}
## Exercise 4.10 (Challenge)
Using `mtcars`, construct a full analysis comparing zero-order and partial correlations for the variable pair (`wt`, `mpg`), controlling for `cyl` (number of cylinders).
(a) Compute zero-order $r$(wt, mpg) and partial $r$(wt, mpg | cyl).
(b) Do the correlations differ? What does this tell us about the role of engine size?
(c) Produce a 2-panel residual scatterplot similar to the chapter example.
(d) Write a 150-word research summary reporting both correlations in APA format.
:::
---
# Chapter Lab Activity: Exploring Independence with the `Titanic` and `mtcars` Datasets
## Objectives
In this lab you will apply the full spectrum of independence tests to real data — choosing the correct test based on variable types, checking assumptions, reporting effect sizes, and visualizing associations. You will work with the built-in `Titanic` dataset (categorical variables) and `mtcars` (mixed variable types).
## The Datasets
```{r lab-intro}
# --- Titanic dataset (built-in) ---
titanic_df <- as.data.frame(Titanic)
# Expand aggregated counts into one row per passenger (drop Freq, column 5)
titanic_expanded <- titanic_df[rep(seq_len(nrow(titanic_df)),
                                   titanic_df$Freq), -5]
rownames(titanic_expanded) <- NULL
kable(head(titanic_expanded, 10),
caption = "Titanic Dataset (first 10 rows)") |>
kable_styling(bootstrap_options = c("striped","hover"),
font_size = 11)
cat("Total passengers:", nrow(titanic_expanded), "\n")
cat("Variables:", names(titanic_expanded), "\n")
```
## Lab Task 1: Survival × Class — Chi-Square Test
```{r lab-task1}
# Contingency table: Class × Survived
class_surv <- table(titanic_expanded$Class,
titanic_expanded$Survived)
kable(addmargins(class_surv),
caption = "Passenger Class × Survival Status") |>
kable_styling(bootstrap_options = c("striped","hover"),
full_width = FALSE)
# Chi-square test
chi_titanic <- chisq.test(class_surv, correct = FALSE)
cat("\nChi-Square Test Results:\n")
cat("χ²(", chi_titanic$parameter, ") =",
round(chi_titanic$statistic, 2), "\n")
cat("p-value:", round(chi_titanic$p.value, 6), "\n")
cat("All expected ≥ 5:", all(chi_titanic$expected >= 5), "\n")
# Cramér's V
N_t <- sum(class_surv)
min_dim <- min(nrow(class_surv)-1, ncol(class_surv)-1)
v_t <- sqrt(chi_titanic$statistic / (N_t * min_dim))
cat("Cramér's V:", round(v_t, 4), "\n")
```
## Lab Task 2: Survival × Sex — Fisher's Exact Test
```{r lab-task2}
# Sex × Survived (2×2 table)
sex_surv <- table(titanic_expanded$Sex,
titanic_expanded$Survived)
kable(addmargins(sex_surv),
caption = "Sex × Survival Status") |>
kable_styling(bootstrap_options = c("striped","hover"),
full_width = FALSE)
# Both chi-square and Fisher's for comparison
chi_sex <- chisq.test(sex_surv, correct = FALSE)
fisher_sex <- fisher.test(sex_surv)
phi_sex <- sqrt(chi_sex$statistic / sum(sex_surv))
cat("Chi-Square: χ²(1) =", round(chi_sex$statistic, 2),
", p =", round(chi_sex$p.value, 6), "\n")
cat("Fisher's Exact: p =", round(fisher_sex$p.value, 6), "\n")
cat("Odds Ratio:", round(fisher_sex$estimate, 3),
" 95% CI: [",
round(fisher_sex$conf.int[1], 3), ",",
round(fisher_sex$conf.int[2], 3), "]\n")
cat("Phi coefficient:", round(phi_sex, 4), "\n")
```
## Lab Task 3: Correlation Analysis — mtcars
```{r lab-task3}
# Full correlation matrix with significance tests
data(mtcars)
cont_vars <- mtcars[, c("mpg","hp","wt","disp","qsec")]
# Correlation matrix
cor_mat <- cor(cont_vars)
p_mat <- cor_pmat(cont_vars)   # p-value matrix (ggcorrplot helper)
ggcorrplot(cor_mat,
           method = "circle",
           type = "upper",
           lab = TRUE,
           p.mat = p_mat,          # needed for insig = "blank" to take effect
           sig.level = 0.05,
           lab_size = 3.5,
           insig = "blank",
           colors = c("tomato","white","steelblue"),
           title = "Correlation Matrix: mtcars\n(blanked = not significant at α=0.05)",
           ggtheme = theme_minimal(base_size = 11))
```
## Lab Task 4: Partial Correlation — Controlling for Weight
```{r lab-task4}
# Compare zero-order vs partial correlations
pairs_to_test <- list(
c("mpg", "hp"),
c("mpg", "disp"),
c("hp", "disp")
)
partial_summary <- map_dfr(pairs_to_test, function(pair) {
x <- mtcars[[pair[1]]]
y <- mtcars[[pair[2]]]
z <- mtcars$wt
r0 <- cor(x, y)
res <- pcor.test(x, y, z)
data.frame(
Pair = paste(pair[1], "~", pair[2]),
Zero_order_r = round(r0, 4),
Partial_r_wt = round(res$estimate, 4),
p_value = round(res$p.value, 4),
Change = round(res$estimate - r0, 4)
)
})
kable(partial_summary,
caption = "Zero-Order vs. Partial Correlations (controlling for wt)",
col.names = c("Variable Pair", "Zero-Order r",
"Partial r | wt", "p-value",
"Change in r")) |>
kable_styling(bootstrap_options = c("striped","hover")) |>
column_spec(5, bold = TRUE,
color = ifelse(abs(partial_summary$Change) > 0.1,
"tomato", "gray40"))
```
## Lab Task 5: Comprehensive Visualization
```{r lab-task5, fig.height=8}
# Panel A: Survival rates by class (Titanic)
p1 <- as.data.frame(class_surv) |>
rename(Class = Var1, Survived = Var2) |>
group_by(Class) |>
mutate(pct = Freq / sum(Freq)) |>
ggplot(aes(x = Class, y = pct, fill = Survived)) +
geom_col(color = "white") +
scale_fill_manual(values = c("No" = "tomato",
"Yes" = "steelblue")) +
scale_y_continuous(labels = scales::percent) +
labs(title = "A. Survival Rate by Passenger Class",
subtitle = paste0("Cramér's V = ", round(v_t, 3)),
x = "Class", y = "Proportion") +
theme_minimal(base_size = 11)
# Panel B: Survival by sex
p2 <- as.data.frame(sex_surv) |>
rename(Sex = Var1, Survived = Var2) |>
group_by(Sex) |>
mutate(pct = Freq / sum(Freq)) |>
ggplot(aes(x = Sex, y = pct, fill = Survived)) +
geom_col(color = "white") +
scale_fill_manual(values = c("No" = "tomato",
"Yes" = "steelblue")) +
scale_y_continuous(labels = scales::percent) +
labs(title = "B. Survival Rate by Sex",
subtitle = paste0("φ = ", round(phi_sex, 3),
" | OR = ",
round(fisher_sex$estimate, 2)),
x = "Sex", y = "Proportion") +
theme_minimal(base_size = 11)
# Panel C: Standardized residuals heatmap (Class × Survived)
stdres_df <- as.data.frame(chi_titanic$stdres) |>
rename(Class = Var1, Survived = Var2, Residual = Freq)
p3 <- ggplot(stdres_df,
aes(x = Survived, y = Class, fill = Residual)) +
geom_tile(color = "white", linewidth = 1.2) +
geom_text(aes(label = round(Residual, 2)),
fontface = "bold", size = 4.5,
color = ifelse(abs(stdres_df$Residual) > 2,
"white", "black")) +
scale_fill_gradient2(low = "steelblue", mid = "white",
high = "tomato", midpoint = 0) +
labs(title = "C. Standardized Residuals",
subtitle = "Class × Survived",
x = "Survived", y = "Class") +
theme_minimal(base_size = 11)
# Panel D: Partial vs zero-order (hp vs mpg)
resid_mpg <- residuals(lm(mpg ~ wt, data = mtcars))
resid_hp <- residuals(lm(hp ~ wt, data = mtcars))
p4 <- ggplot(data.frame(res_hp = resid_hp,
res_mpg = resid_mpg),
aes(x = res_hp, y = res_mpg)) +
geom_point(color = "seagreen", size = 2.5, alpha = 0.8) +
geom_smooth(method = "lm", color = "tomato",
se = FALSE, linewidth = 1) +
labs(title = "D. Partial Correlation: mpg & hp | wt",
subtitle = paste0("Partial r = ",
round(pcor.test(mtcars$mpg,
mtcars$hp,
mtcars$wt)$estimate,3)),
x = "hp Residual", y = "mpg Residual") +
theme_minimal(base_size = 11)
(p1 + p2) / (p3 + p4)
```
## Lab Discussion Questions
Answer the following in writing (100–150 words each):
1. **Test Selection:** In Lab Task 2, both chi-square and Fisher's exact test give nearly identical p-values. Why? Does the very large sample size ($N = 2201$) affect which test is preferable?
2. **Effect Size vs. Significance:** The chi-square test for Class × Survival is highly significant ($p < 0.001$). Interpret Cramér's V = `r round(v_t, 3)`. Is this a practically important association? How would you communicate this to a non-statistician?
3. **Odds Ratio Interpretation:** Fisher's test for Sex × Survival gives an odds ratio. Interpret this in plain language: what does the odds ratio tell us about the relationship between being female and surviving?
4. **Partial Correlation:** After controlling for weight (`wt`), the correlation between `mpg` and `hp` weakens. Does this mean weight fully "explains" the mpg-hp relationship? What other variables might also act as confounders?
5. **Spurious Correlation:** Can you identify a potential spurious correlation in the `mtcars` dataset? Propose which third variable might be the confounder and describe how you would test this using partial correlation.
---
# Chapter Summary
This chapter built a comprehensive toolkit for testing independence and measuring association across all combinations of variable types:
- **The concept of independence** — two variables are independent if knowing one provides no information about the other; this must be tested formally, not assumed.
- **Chi-square test** — the standard test for two categorical variables; based on comparing observed to expected frequencies in a contingency table.
- **Fisher's Exact Test** — the correct alternative when expected cell frequencies are small; computes exact probabilities rather than relying on asymptotic approximations.
- **Cramér's V** — standardizes chi-square to a 0–1 scale, enabling interpretation of association strength independently of sample size and table dimensions.
- **Pearson and Spearman correlation tests** — test whether linear (Pearson) or monotone (Spearman) relationships between continuous or ordinal variables are significantly different from zero.
- **Point-biserial and phi coefficients** — extend the correlation framework to binary variables, maintaining mathematical consistency with Pearson's $r$.
- **Partial correlation** — isolates the "pure" relationship between two variables after removing the influence of one or more control variables, helping distinguish real associations from spurious ones.
::: {.callout-important}
## Key Formulas to Know
**Chi-Square Statistic:**
$$\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \quad E_{ij} = \frac{R_i C_j}{N}$$
**Cramér's V:**
$$V = \sqrt{\frac{\chi^2/N}{\min(r-1,\, c-1)}}$$
**Pearson Correlation t-test:**
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t(n-2)$$
**First-Order Partial Correlation:**
$$r_{XY \cdot Z} = \frac{r_{XY} - r_{XZ} \cdot r_{YZ}}{\sqrt{(1-r_{XZ}^2)(1-r_{YZ}^2)}}$$
**Phi Coefficient (2×2):**
$$\phi = \frac{ad - bc}{\sqrt{(a+b)(c+d)(a+c)(b+d)}}$$
:::
---
*End of Chapter 4. Proceed to Chapter 5: Data Sampling Techniques.*