1 Preface

This textbook-style R Markdown note introduces Bayesian statistics from the beginning. It is written for students who have basic knowledge of descriptive statistics and probability but are new to Bayesian thinking.

The aim is to teach Bayesian statistics using three connected layers:

  1. Conceptual understanding: What does Bayesian thinking mean?
  2. Mathematical notation: How do we write Bayesian models formally?
  3. R implementation: How do we simulate, visualize, estimate, and interpret Bayesian models?

This note is suitable for first-year master’s students in statistics, data science, epidemiology, business analytics, and applied research methods.

2 Learning Outcomes

After completing this note, students should be able to:

3 Chapter 1: Why Bayesian Statistics?

3.1 1.1 Motivation

Statistics is about learning from data. In many real-world problems, we do not know the true value of a parameter. For example:

  • What proportion of voters support a political party?
  • What is the probability that a patient has a disease?
  • What is the average effect of a new teaching method?
  • What is the future number of disease cases?
  • What is the probability that a machine will fail?

In classical or frequentist statistics, unknown parameters are usually treated as fixed but unknown quantities. In Bayesian statistics, unknown parameters are treated as uncertain quantities, and this uncertainty is described using probability distributions.

3.2 1.2 Frequentist and Bayesian Views

Let \(\theta\) be an unknown parameter.

In the frequentist approach:

\[ \theta = \text{fixed but unknown} \]

In the Bayesian approach:

\[ \theta \sim p(\theta) \]

This means \(\theta\) has a probability distribution that represents our uncertainty.

3.3 1.3 A Simple Everyday Example

Suppose you want to know whether it will rain today.

Before looking outside, you may think:

\[ P(\text{Rain}) = 0.20 \]

Then you see dark clouds. Your belief changes:

\[ P(\text{Rain} \mid \text{Dark clouds}) > 0.20 \]

This is Bayesian thinking: update your belief when new evidence arrives.

3.4 1.4 Bayesian Learning

Bayesian learning can be summarized as:

\[ \text{Posterior belief} = \text{Prior belief} + \text{Information from data} \]

More formally:

\[ \text{Posterior} \propto \text{Likelihood} \times \text{Prior} \]

4 Chapter 2: Probability Refresher

4.1 2.1 What is Probability?

Probability measures uncertainty. If \(A\) is an event, then:

\[ 0 \leq P(A) \leq 1 \]

where:

  • \(P(A)=0\) means impossible.
  • \(P(A)=1\) means certain.

4.2 2.2 Sample Space

The sample space is the set of all possible outcomes.

Example: rolling a fair die.

\[ S = \{1,2,3,4,5,6\} \]

The probability of rolling a six is:

\[ P(6)=\frac{1}{6} \]

4.3 2.3 Complement Rule

If \(A^c\) is the complement of event \(A\), then:

\[ P(A^c)=1-P(A) \]

Example:

If \(P(\text{Rain})=0.30\), then:

\[ P(\text{No rain})=1-0.30=0.70 \]

4.4 2.4 Addition Rule

For two events \(A\) and \(B\):

\[ P(A \cup B)=P(A)+P(B)-P(A\cap B) \]

If \(A\) and \(B\) are mutually exclusive:

\[ P(A \cup B)=P(A)+P(B) \]

4.5 2.5 Conditional Probability

Conditional probability is the probability of one event given another event has occurred.

\[ P(A \mid B)=\frac{P(A\cap B)}{P(B)}, \quad P(B)>0 \]

Example:

\[ P(\text{Pass} \mid \text{Attend class}) \]

means probability of passing given the student attended class.

4.6 2.6 R Example: Conditional Probability

We create a small dataset of students.

students <- data.frame(
  Student_ID = 1:20,
  Attendance = c("High", "High", "Low", "High", "Low", "High", "High", "Low", "High", "Low",
                 "High", "Low", "High", "High", "Low", "High", "Low", "High", "High", "Low"),
  Result = c("Pass", "Pass", "Fail", "Pass", "Fail", "Pass", "Pass", "Fail", "Pass", "Fail",
             "Pass", "Fail", "Pass", "Pass", "Fail", "Pass", "Fail", "Pass", "Pass", "Fail")
)

kable(students)
Student_ID Attendance Result
1 High Pass
2 High Pass
3 Low Fail
4 High Pass
5 Low Fail
6 High Pass
7 High Pass
8 Low Fail
9 High Pass
10 Low Fail
11 High Pass
12 Low Fail
13 High Pass
14 High Pass
15 Low Fail
16 High Pass
17 Low Fail
18 High Pass
19 High Pass
20 Low Fail

Now calculate:

\[ P(\text{Pass} \mid \text{High Attendance}) \]

high_attendance <- students %>% filter(Attendance == "High")
prob_pass_given_high <- mean(high_attendance$Result == "Pass")
prob_pass_given_high
## [1] 1

Interpretation: Among students with high attendance, the estimated probability of passing is 1.

5 Chapter 3: Bayes’ Theorem

5.1 3.1 Bayes’ Theorem

Bayes’ theorem is the foundation of Bayesian statistics.

\[ P(A \mid B)=\frac{P(B \mid A)P(A)}{P(B)} \]

where:

  • \(P(A)\) is the prior probability.
  • \(P(B \mid A)\) is the likelihood.
  • \(P(B)\) is the evidence or marginal probability.
  • \(P(A \mid B)\) is the posterior probability.

5.2 3.2 Bayesian Formula for Parameters

In statistics, we usually write Bayes’ theorem as:

\[ p(\theta \mid y)=\frac{p(y \mid \theta)p(\theta)}{p(y)} \]

where:

  • \(\theta\) is the unknown parameter.
  • \(y\) is the observed data.
  • \(p(\theta)\) is the prior.
  • \(p(y \mid \theta)\) is the likelihood.
  • \(p(\theta \mid y)\) is the posterior.

Since \(p(y)\) is often a normalizing constant, we commonly write:

\[ p(\theta \mid y) \propto p(y \mid \theta)p(\theta) \]

5.3 3.3 Medical Testing Example

Suppose a disease is rare.

  • Disease prevalence: \(P(D)=0.01\)
  • Test sensitivity: \(P(+ \mid D)=0.95\)
  • False positive rate: \(P(+ \mid D^c)=0.05\)

We want:

\[ P(D \mid +) \]

Using Bayes’ theorem:

\[ P(D \mid +)=\frac{P(+ \mid D)P(D)}{P(+ \mid D)P(D)+P(+ \mid D^c)P(D^c)} \]

5.4 3.4 R Calculation

p_disease <- 0.01
sensitivity <- 0.95
false_positive <- 0.05
p_no_disease <- 1 - p_disease

posterior_disease <- (sensitivity * p_disease) /
  (sensitivity * p_disease + false_positive * p_no_disease)

posterior_disease
## [1] 0.1610169

The probability of actually having the disease after a positive test is approximately:

round(posterior_disease * 100, 2)
## [1] 16.1

percent.

This is much lower than 95% because the disease is rare.

5.5 3.5 Simulation of Medical Testing

set.seed(123)
n <- 100000

disease <- rbinom(n, size = 1, prob = p_disease)

test_positive <- ifelse(
  disease == 1,
  rbinom(n, size = 1, prob = sensitivity),
  rbinom(n, size = 1, prob = false_positive)
)

medical_df <- data.frame(
  disease = factor(disease, labels = c("No disease", "Disease")),
  test = factor(test_positive, labels = c("Negative", "Positive"))
)

posterior_simulated <- mean(medical_df$disease == "Disease" & medical_df$test == "Positive") /
  mean(medical_df$test == "Positive")

posterior_simulated
## [1] 0.1619473

5.6 3.6 Graph: Test Results by Disease Status

ggplot(medical_df, aes(x = test, fill = disease)) +
  geom_bar(position = "fill") +
  labs(
    title = "Medical Test Result by Disease Status",
    x = "Test Result",
    y = "Proportion",
    fill = "True Status"
  ) +
  scale_y_continuous(labels = percent_format()) +
  theme_minimal()

6 Chapter 4: Prior Distributions

6.1 4.1 What is a Prior?

A prior distribution represents what we believe about a parameter before observing the data.

If \(\theta\) is a parameter, the prior is:

\[ p(\theta) \]

Examples:

\[ \theta \sim Uniform(0,1) \]

\[ \theta \sim Beta(\alpha,\beta) \]

\[ \theta \sim Normal(\mu,\sigma^2) \]

6.2 4.2 Uniform Prior

If we know very little about a probability parameter \(p\), we may use:

\[ p \sim Uniform(0,1) \]

This gives equal prior weight to all values between 0 and 1.

p_grid <- seq(0, 1, length.out = 1000)
uniform_df <- data.frame(p = p_grid, density = dunif(p_grid, 0, 1))

ggplot(uniform_df, aes(x = p, y = density)) +
  geom_line(linewidth = 1) +
  labs(title = "Uniform Prior Distribution", x = "p", y = "Density") +
  theme_minimal()

6.3 4.3 Beta Distribution

The Beta distribution is very important in Bayesian statistics because it is used as a prior for probabilities.

\[ p \sim Beta(\alpha,\beta) \]

The density is:

\[ f(p)=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}p^{\alpha-1}(1-p)^{\beta-1}, \quad 0<p<1 \]

where \(\alpha>0\) and \(\beta>0\).

6.4 4.4 Shapes of Beta Priors

beta_df <- data.frame(p = rep(p_grid, 4),
                      density = c(dbeta(p_grid, 1, 1),
                                  dbeta(p_grid, 2, 2),
                                  dbeta(p_grid, 8, 2),
                                  dbeta(p_grid, 2, 8)),
                      Prior = rep(c("Beta(1,1): Uniform",
                                    "Beta(2,2): Balanced",
                                    "Beta(8,2): High values likely",
                                    "Beta(2,8): Low values likely"), each = length(p_grid)))

ggplot(beta_df, aes(x = p, y = density, linetype = Prior)) +
  geom_line(linewidth = 1) +
  labs(title = "Different Beta Prior Distributions", x = "Probability p", y = "Density") +
  theme_minimal()

6.5 4.5 Interpretation of Beta Parameters

For \(p \sim Beta(\alpha,\beta)\):

\[ E(p)=\frac{\alpha}{\alpha+\beta} \]

\[ Var(p)=\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)} \]

beta_summary <- data.frame(
  Prior = c("Beta(1,1)", "Beta(2,2)", "Beta(8,2)", "Beta(2,8)", "Beta(20,20)"),
  alpha = c(1, 2, 8, 2, 20),
  beta = c(1, 2, 2, 8, 20)
)

beta_summary <- beta_summary %>%
  mutate(
    Mean = alpha / (alpha + beta),
    Variance = alpha * beta / ((alpha + beta)^2 * (alpha + beta + 1))
  )

kable(beta_summary, digits = 4)
Prior alpha beta Mean Variance
Beta(1,1) 1 1 0.5 0.0833
Beta(2,2) 2 2 0.5 0.0500
Beta(8,2) 8 2 0.8 0.0145
Beta(2,8) 2 8 0.2 0.0145
Beta(20,20) 20 20 0.5 0.0061

7 Chapter 5: Likelihood

7.1 5.1 What is Likelihood?

The likelihood measures how well different parameter values explain the observed data.

Suppose:

\[ X \sim Binomial(n,p) \]

The probability mass function is:

\[ P(X=x \mid p)=\binom{n}{x}p^x(1-p)^{n-x} \]

As a likelihood function of \(p\), we write:

\[ L(p) \propto p^x(1-p)^{n-x} \]

7.2 5.2 Coin Toss Example

Suppose we toss a coin 10 times and observe 7 heads.

\[ n=10, \quad x=7 \]

The likelihood is:

\[ L(p) \propto p^7(1-p)^3 \]

x <- 7
n <- 10
p_grid <- seq(0.001, 0.999, length.out = 1000)
likelihood <- dbinom(x, size = n, prob = p_grid)
likelihood_df <- data.frame(p = p_grid, likelihood = likelihood)

ggplot(likelihood_df, aes(x = p, y = likelihood)) +
  geom_line(linewidth = 1) +
  labs(title = "Likelihood Curve for 7 Heads in 10 Tosses", x = "p", y = "Likelihood") +
  theme_minimal()

The likelihood is highest near:

\[ \hat{p}=\frac{x}{n}=\frac{7}{10}=0.70 \]

8 Chapter 6: Posterior Distribution

8.1 6.1 Prior Ă— Likelihood = Posterior

The most important Bayesian relationship is:

\[ p(p \mid x) \propto p(x \mid p)p(p) \]

For the Beta-Binomial model:

Prior:

\[ p \sim Beta(\alpha,\beta) \]

Likelihood:

\[ X \sim Binomial(n,p) \]

Posterior:

\[ p \mid x \sim Beta(\alpha+x,\beta+n-x) \]

8.2 6.2 Example: Beta-Binomial Posterior

Suppose:

\[ p \sim Beta(2,2) \]

and we observe:

\[ x=7, \quad n=10 \]

Then:

\[ p \mid x \sim Beta(2+7,2+10-7)=Beta(9,5) \]

alpha_prior <- 2
beta_prior <- 2
x <- 7
n <- 10

alpha_post <- alpha_prior + x
beta_post <- beta_prior + n - x

posterior_df <- data.frame(
  p = rep(p_grid, 2),
  density = c(dbeta(p_grid, alpha_prior, beta_prior),
              dbeta(p_grid, alpha_post, beta_post)),
  Distribution = rep(c("Prior: Beta(2,2)", "Posterior: Beta(9,5)"), each = length(p_grid))
)

ggplot(posterior_df, aes(x = p, y = density, linetype = Distribution)) +
  geom_line(linewidth = 1) +
  labs(title = "Prior and Posterior Distribution", x = "p", y = "Density") +
  theme_minimal()

8.3 6.3 Prior, Likelihood, and Posterior Together

For visualization, we rescale likelihood to compare shapes.

prior_density <- dbeta(p_grid, alpha_prior, beta_prior)
likelihood_scaled <- likelihood / max(likelihood) * max(prior_density)
posterior_density <- dbeta(p_grid, alpha_post, beta_post)

plp_df <- data.frame(
  p = rep(p_grid, 3),
  value = c(prior_density, likelihood_scaled, posterior_density),
  Component = rep(c("Prior", "Likelihood (scaled)", "Posterior"), each = length(p_grid))
)

ggplot(plp_df, aes(x = p, y = value, linetype = Component)) +
  geom_line(linewidth = 1) +
  labs(title = "Prior, Likelihood and Posterior", x = "p", y = "Value") +
  theme_minimal()

9 Chapter 7: Bayesian Estimation

9.1 7.1 Posterior Mean

For:

\[ p \mid x \sim Beta(\alpha+x,\beta+n-x) \]

The posterior mean is:

\[ E(p \mid x)=\frac{\alpha+x}{\alpha+\beta+n} \]

9.2 7.2 Posterior Variance

\[ Var(p \mid x)=\frac{(\alpha+x)(\beta+n-x)}{(\alpha+\beta+n)^2(\alpha+\beta+n+1)} \]

9.3 7.3 Posterior Mode

If \(\alpha+x>1\) and \(\beta+n-x>1\), the posterior mode is:

\[ \frac{\alpha+x-1}{\alpha+\beta+n-2} \]

9.4 7.4 R Posterior Summary

posterior_mean <- alpha_post / (alpha_post + beta_post)
posterior_variance <- alpha_post * beta_post / ((alpha_post + beta_post)^2 * (alpha_post + beta_post + 1))
posterior_sd <- sqrt(posterior_variance)
posterior_median <- qbeta(0.5, alpha_post, beta_post)
posterior_mode <- (alpha_post - 1) / (alpha_post + beta_post - 2)
credible_interval <- qbeta(c(0.025, 0.975), alpha_post, beta_post)

summary_table <- data.frame(
  Statistic = c("Posterior Mean", "Posterior Median", "Posterior Mode", "Posterior SD", "2.5%", "97.5%"),
  Value = c(posterior_mean, posterior_median, posterior_mode, posterior_sd, credible_interval[1], credible_interval[2])
)

kable(summary_table, digits = 4)
Statistic Value
Posterior Mean 0.6429
Posterior Median 0.6498
Posterior Mode 0.6667
Posterior SD 0.1237
2.5% 0.3857
97.5% 0.8614

10 Chapter 8: Credible Intervals

10.1 8.1 What is a Credible Interval?

A 95% credible interval is an interval \([a,b]\) such that:

\[ P(a < p < b \mid x)=0.95 \]

This means that, given the data and prior, there is 95% posterior probability that \(p\) lies between \(a\) and \(b\).

10.2 8.2 R Calculation

ci_lower <- credible_interval[1]
ci_upper <- credible_interval[2]

credible_df <- data.frame(p = p_grid, density = dbeta(p_grid, alpha_post, beta_post))
credible_df$inside <- credible_df$p >= ci_lower & credible_df$p <= ci_upper

ggplot(credible_df, aes(x = p, y = density)) +
  geom_line(linewidth = 1) +
  geom_area(data = subset(credible_df, inside), aes(x = p, y = density), alpha = 0.4) +
  geom_vline(xintercept = c(ci_lower, ci_upper), linetype = "dashed") +
  labs(title = "95% Bayesian Credible Interval", x = "p", y = "Posterior Density") +
  theme_minimal()

10.3 8.3 Credible Interval vs Confidence Interval

A Bayesian credible interval allows a direct probability statement about the parameter:

\[ P(a<p<b \mid x)=0.95 \]

A frequentist confidence interval does not say there is 95% probability that the fixed parameter lies in a specific observed interval. Instead, it describes the long-run performance of the procedure.

11 Chapter 9: Posterior Simulation

11.1 9.1 Why Simulate?

Many Bayesian summaries can be calculated by drawing random samples from the posterior distribution.

If:

\[ p \mid x \sim Beta(9,5) \]

we can simulate:

\[ p^{(1)},p^{(2)},...,p^{(S)} \sim Beta(9,5) \]

11.2 9.2 R Posterior Simulation

set.seed(123)
S <- 10000
posterior_samples <- rbeta(S, alpha_post, beta_post)

simulation_summary <- data.frame(
  Statistic = c("Mean", "Median", "SD", "2.5%", "97.5%"),
  Value = c(mean(posterior_samples), median(posterior_samples), sd(posterior_samples),
            quantile(posterior_samples, 0.025), quantile(posterior_samples, 0.975))
)

kable(simulation_summary, digits = 4)
Statistic Value
Mean 0.6429
Median 0.6505
SD 0.1244
2.5% 0.3869
97.5% 0.8649

11.3 9.3 Histogram of Posterior Samples

posterior_samples_df <- data.frame(p = posterior_samples)

ggplot(posterior_samples_df, aes(x = p)) +
  geom_histogram(bins = 40, alpha = 0.7) +
  geom_vline(xintercept = mean(posterior_samples), linetype = "dashed") +
  labs(title = "Posterior Samples from Beta(9,5)", x = "p", y = "Frequency") +
  theme_minimal()

12 Chapter 10: Posterior Predictive Distribution

12.1 10.1 Prediction in Bayesian Statistics

Bayesian statistics naturally supports prediction. The posterior predictive distribution is:

\[ p(y_{new}\mid y)=\int p(y_{new}\mid \theta)p(\theta\mid y)d\theta \]

This combines parameter uncertainty and future observation uncertainty.

12.2 10.2 Predicting the Next Coin Toss

If \(p\) is the probability of heads, then:

\[ Y_{new}\mid p \sim Bernoulli(p) \]

We simulate:

  1. Draw \(p\) from posterior.
  2. Draw future outcome from Bernoulli\((p)\).
set.seed(123)
future_heads <- rbinom(S, size = 1, prob = posterior_samples)
mean_future_heads <- mean(future_heads)
mean_future_heads
## [1] 0.647

The posterior predictive probability of heads in the next toss is approximately 0.647.

12.3 10.3 Predicting Future Number of Heads

Suppose we want to predict the number of heads in 20 future tosses.

set.seed(123)
future_20_heads <- rbinom(S, size = 20, prob = posterior_samples)
future_df <- data.frame(heads = future_20_heads)

ggplot(future_df, aes(x = heads)) +
  geom_bar(alpha = 0.7) +
  labs(title = "Posterior Predictive Distribution: Heads in 20 Future Tosses", x = "Number of Heads", y = "Frequency") +
  theme_minimal()

future_summary <- data.frame(
  Statistic = c("Mean predicted heads", "Median predicted heads", "2.5%", "97.5%"),
  Value = c(mean(future_20_heads), median(future_20_heads),
            quantile(future_20_heads, 0.025), quantile(future_20_heads, 0.975))
)

kable(future_summary, digits = 2)
Statistic Value
Mean predicted heads 12.87
Median predicted heads 13.00
2.5% 6.00
97.5% 19.00

13 Chapter 11: Prior Sensitivity Analysis

13.1 11.1 Why Prior Sensitivity Matters

Bayesian results depend on both the prior and the data. It is good practice to check whether conclusions change under different reasonable priors.

13.2 11.2 Comparing Different Priors

Suppose the observed data remain:

\[ x=7, \quad n=10 \]

We compare four priors:

\[ Beta(1,1), Beta(2,2), Beta(10,10), Beta(20,5) \]

priors <- data.frame(
  Prior = c("Beta(1,1)", "Beta(2,2)", "Beta(10,10)", "Beta(20,5)"),
  alpha = c(1, 2, 10, 20),
  beta = c(1, 2, 10, 5)
)

sensitivity_results <- priors %>%
  mutate(
    alpha_post = alpha + x,
    beta_post = beta + n - x,
    posterior_mean = alpha_post / (alpha_post + beta_post),
    lower_95 = qbeta(0.025, alpha_post, beta_post),
    upper_95 = qbeta(0.975, alpha_post, beta_post)
  )

kable(sensitivity_results, digits = 4)
Prior alpha beta alpha_post beta_post posterior_mean lower_95 upper_95
Beta(1,1) 1 1 8 4 0.6667 0.3903 0.8907
Beta(2,2) 2 2 9 5 0.6429 0.3857 0.8614
Beta(10,10) 10 10 17 13 0.5667 0.3894 0.7355
Beta(20,5) 20 5 27 8 0.7714 0.6210 0.8925

13.3 11.3 Graph of Posterior Under Different Priors

sens_plot_df <- data.frame()

for (i in 1:nrow(priors)) {
  temp <- data.frame(
    p = p_grid,
    density = dbeta(p_grid, priors$alpha[i] + x, priors$beta[i] + n - x),
    Prior = priors$Prior[i]
  )
  sens_plot_df <- rbind(sens_plot_df, temp)
}

ggplot(sens_plot_df, aes(x = p, y = density, linetype = Prior)) +
  geom_line(linewidth = 1) +
  labs(title = "Posterior Distributions Under Different Priors", x = "p", y = "Density") +
  theme_minimal()

14 Chapter 12: Bayesian Estimation of a Normal Mean

14.1 12.1 Problem Setting

Suppose we observe continuous data:

\[ y_1,y_2,...,y_n \]

Assume:

\[ y_i \sim Normal(\mu,\sigma^2) \]

For simplicity, assume \(\sigma^2\) is known.

We place a Normal prior on \(\mu\):

\[ \mu \sim Normal(\mu_0,\tau_0^2) \]

14.2 12.2 Posterior Distribution

The posterior is also Normal:

\[ \mu \mid y \sim Normal(\mu_n,\tau_n^2) \]

where:

\[ \tau_n^2=\left(\frac{n}{\sigma^2}+\frac{1}{\tau_0^2}\right)^{-1} \]

\[ \mu_n=\tau_n^2\left(\frac{n\bar{y}}{\sigma^2}+\frac{\mu_0}{\tau_0^2}\right) \]

14.3 12.3 Simulated Dataset

set.seed(123)
n_normal <- 50
true_mu <- 70
sigma_known <- 10

y <- rnorm(n_normal, mean = true_mu, sd = sigma_known)
normal_df <- data.frame(score = y)

kable(head(normal_df, 10), digits = 2)
score
64.40
67.70
85.59
70.71
71.29
87.15
74.61
57.35
63.13
65.54

14.4 12.4 Data Visualization

ggplot(normal_df, aes(x = score)) +
  geom_histogram(bins = 15, alpha = 0.7) +
  geom_vline(xintercept = mean(y), linetype = "dashed") +
  labs(title = "Simulated Student Scores", x = "Score", y = "Frequency") +
  theme_minimal()

14.5 12.5 Bayesian Calculation in R

mu0 <- 65
tau0 <- 15

sample_mean <- mean(y)
posterior_variance_mu <- 1 / (n_normal / sigma_known^2 + 1 / tau0^2)
posterior_mean_mu <- posterior_variance_mu * (n_normal * sample_mean / sigma_known^2 + mu0 / tau0^2)
posterior_sd_mu <- sqrt(posterior_variance_mu)

normal_posterior_summary <- data.frame(
  Statistic = c("Sample mean", "Prior mean", "Posterior mean", "Posterior SD", "95% lower", "95% upper"),
  Value = c(sample_mean, mu0, posterior_mean_mu, posterior_sd_mu,
            posterior_mean_mu - 1.96 * posterior_sd_mu,
            posterior_mean_mu + 1.96 * posterior_sd_mu)
)

kable(normal_posterior_summary, digits = 3)
Statistic Value
Sample mean 70.344
Prior mean 65.000
Posterior mean 70.297
Posterior SD 1.408
95% lower 67.537
95% upper 73.057

14.6 12.6 Prior and Posterior Graph

mu_grid <- seq(55, 85, length.out = 1000)
normal_pp_df <- data.frame(
  mu = rep(mu_grid, 2),
  density = c(dnorm(mu_grid, mean = mu0, sd = tau0),
              dnorm(mu_grid, mean = posterior_mean_mu, sd = posterior_sd_mu)),
  Distribution = rep(c("Prior", "Posterior"), each = length(mu_grid))
)

ggplot(normal_pp_df, aes(x = mu, y = density, linetype = Distribution)) +
  geom_line(linewidth = 1) +
  labs(title = "Prior and Posterior for Normal Mean", x = expression(mu), y = "Density") +
  theme_minimal()

15 Chapter 13: Bayesian Linear Regression

15.1 13.1 Regression Model

In linear regression, we model a continuous outcome \(Y\) using predictor \(X\):

\[ Y_i=\beta_0+\beta_1X_i+\epsilon_i \]

where:

\[ \epsilon_i \sim Normal(0,\sigma^2) \]

In Bayesian regression, we assign priors:

\[ \beta_0 \sim Normal(0,100^2) \]

\[ \beta_1 \sim Normal(0,100^2) \]

\[ \sigma \sim HalfNormal(0,10) \]

15.2 13.2 Simulated Regression Dataset

set.seed(123)
n_reg <- 150
study_hours <- runif(n_reg, min = 0, max = 10)
exam_score <- 50 + 4 * study_hours + rnorm(n_reg, mean = 0, sd = 8)

reg_df <- data.frame(
  study_hours = study_hours,
  exam_score = exam_score
)

kable(head(reg_df, 10), digits = 2)
study_hours exam_score
2.88 69.71
7.88 79.25
4.09 56.59
8.83 86.77
9.40 86.51
0.46 51.87
5.28 74.21
8.92 82.73
5.51 77.21
4.57 66.50

15.3 13.3 Scatterplot

ggplot(reg_df, aes(x = study_hours, y = exam_score)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm", se = TRUE) +
  labs(title = "Relationship Between Study Hours and Exam Score", x = "Study Hours", y = "Exam Score") +
  theme_minimal()

15.4 13.4 Approximate Bayesian Regression by Simulation

For beginner-level teaching, we use a normal approximation around the classical regression estimates. This is not full MCMC, but it gives intuition about posterior uncertainty.

lm_fit <- lm(exam_score ~ study_hours, data = reg_df)
summary(lm_fit)
## 
## Call:
## lm(formula = exam_score ~ study_hours, data = reg_df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -16.2890  -4.8644  -0.9938   4.7232  25.7358 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  49.6221     1.2653   39.22   <2e-16 ***
## study_hours   4.0644     0.2183   18.62   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.65 on 148 degrees of freedom
## Multiple R-squared:  0.7009, Adjusted R-squared:  0.6989 
## F-statistic: 346.8 on 1 and 148 DF,  p-value: < 2.2e-16
set.seed(123)
coef_est <- coef(lm_fit)
vcov_est <- vcov(lm_fit)

# Simulate approximate posterior samples using multivariate normal logic
# Cholesky decomposition without extra packages
S <- 5000
z <- matrix(rnorm(S * 2), ncol = 2)
L <- chol(vcov_est)
beta_samples <- sweep(z %*% L, 2, coef_est, "+")

beta_df <- data.frame(
  beta0 = beta_samples[, 1],
  beta1 = beta_samples[, 2]
)

beta_summary <- data.frame(
  Parameter = c("Intercept", "Slope"),
  Mean = c(mean(beta_df$beta0), mean(beta_df$beta1)),
  Lower_95 = c(quantile(beta_df$beta0, 0.025), quantile(beta_df$beta1, 0.025)),
  Upper_95 = c(quantile(beta_df$beta0, 0.975), quantile(beta_df$beta1, 0.975))
)

kable(beta_summary, digits = 3)
Parameter Mean Lower_95 Upper_95
Intercept 49.621 47.113 52.096
Slope 4.064 3.637 4.492

15.5 13.5 Posterior Distribution of Slope

ggplot(beta_df, aes(x = beta1)) +
  geom_histogram(bins = 40, alpha = 0.7) +
  geom_vline(xintercept = mean(beta_df$beta1), linetype = "dashed") +
  labs(title = "Approximate Posterior Distribution of Regression Slope", x = expression(beta[1]), y = "Frequency") +
  theme_minimal()

15.6 13.6 Posterior Prediction Lines

set.seed(123)
selected <- sample(1:S, 100)
x_seq <- seq(0, 10, length.out = 100)
line_df <- data.frame()

for (i in selected) {
  temp <- data.frame(
    study_hours = x_seq,
    exam_score = beta_df$beta0[i] + beta_df$beta1[i] * x_seq,
    draw = i
  )
  line_df <- rbind(line_df, temp)
}

ggplot(reg_df, aes(x = study_hours, y = exam_score)) +
  geom_point(alpha = 0.5) +
  geom_line(data = line_df, aes(x = study_hours, y = exam_score, group = draw), alpha = 0.15) +
  labs(title = "Posterior Regression Lines", x = "Study Hours", y = "Exam Score") +
  theme_minimal()

16 Chapter 14: Introduction to MCMC

16.1 14.1 Why Do We Need MCMC?

In simple Bayesian models, we can calculate the posterior directly. But in realistic models, the posterior may be too complicated.

Bayesian posterior:

\[ p(\theta\mid y)=\frac{p(y\mid \theta)p(\theta)}{p(y)} \]

The denominator is:

\[ p(y)=\int p(y\mid \theta)p(\theta)d\theta \]

This integral can be difficult or impossible to solve analytically.

MCMC methods allow us to generate samples from the posterior distribution without solving the integral directly.

16.2 14.2 Basic Idea of MCMC

MCMC creates a chain:

\[ \theta^{(1)},\theta^{(2)},... ,\theta^{(S)} \]

After convergence, these values behave like samples from the posterior distribution.

16.3 14.3 Simple Metropolis Algorithm Example

We estimate the posterior of \(p\) for a coin toss model using a random walk proposal.

log_posterior <- function(p, x, n, alpha, beta) {
  if (p <= 0 || p >= 1) return(-Inf)
  dbinom(x, size = n, prob = p, log = TRUE) + dbeta(p, alpha, beta, log = TRUE)
}

metropolis_coin <- function(iter = 10000, start = 0.5, proposal_sd = 0.05,
                            x = 7, n = 10, alpha = 2, beta = 2) {
  chain <- numeric(iter)
  chain[1] <- start
  accepted <- 0
  
  for (i in 2:iter) {
    current <- chain[i - 1]
    proposal <- rnorm(1, mean = current, sd = proposal_sd)
    
    log_ratio <- log_posterior(proposal, x, n, alpha, beta) -
      log_posterior(current, x, n, alpha, beta)
    
    if (log(runif(1)) < log_ratio) {
      chain[i] <- proposal
      accepted <- accepted + 1
    } else {
      chain[i] <- current
    }
  }
  
  list(chain = chain, acceptance_rate = accepted / (iter - 1))
}

set.seed(123)
mcmc_result <- metropolis_coin()
mcmc_result$acceptance_rate
## [1] 0.8736874

16.4 14.4 Trace Plot

mcmc_df <- data.frame(iteration = 1:length(mcmc_result$chain), p = mcmc_result$chain)

ggplot(mcmc_df, aes(x = iteration, y = p)) +
  geom_line(alpha = 0.8) +
  labs(title = "MCMC Trace Plot", x = "Iteration", y = "p") +
  theme_minimal()

16.5 14.5 MCMC Posterior Histogram

burnin <- 1000
mcmc_samples <- mcmc_result$chain[-(1:burnin)]

mcmc_samples_df <- data.frame(p = mcmc_samples)

ggplot(mcmc_samples_df, aes(x = p)) +
  geom_histogram(bins = 40, alpha = 0.7) +
  stat_function(fun = dbeta, args = list(shape1 = alpha_post, shape2 = beta_post), linewidth = 1) +
  labs(title = "MCMC Samples Compared with Exact Posterior", x = "p", y = "Frequency / Density") +
  theme_minimal()

17 Chapter 15: Bayesian Workflow

17.1 15.1 Full Bayesian Workflow

A standard Bayesian analysis usually follows these steps:

  1. Define the research question.
  2. Choose a probability model for the data.
  3. Choose prior distributions.
  4. Compute or simulate the posterior.
  5. Summarize posterior results.
  6. Check model fit.
  7. Conduct prior sensitivity analysis.
  8. Make predictions.
  9. Communicate uncertainty clearly.

17.2 15.2 Reporting Bayesian Results

A Bayesian result should report:

  • Prior distribution.
  • Likelihood/model.
  • Posterior mean or median.
  • 95% credible interval.
  • Posterior probability of meaningful hypotheses.
  • Sensitivity analysis.
  • Predictive checks where possible.

Example statement:

“Using a Beta(2,2) prior and observing 7 successes out of 10 trials, the posterior distribution was Beta(9,5). The posterior mean was 0.643, and the 95% credible interval was approximately [0.39, 0.86].”

18 Chapter 16: Complete Applied Case Study

18.1 16.1 Case Study: Student Success Probability

Suppose a lecturer wants to estimate the probability that students pass a statistics course after using a new interactive teaching method.

Observed data:

  • Number of students: \(n=80\)
  • Number passed: \(x=62\)

We estimate the pass probability \(p\).

18.2 16.2 Model

Likelihood:

\[ X \sim Binomial(n,p) \]

Prior:

\[ p \sim Beta(5,5) \]

Posterior:

\[ p \mid x \sim Beta(5+62,5+80-62)=Beta(67,23) \]

18.3 16.3 R Analysis

n_course <- 80
x_pass <- 62
alpha_course <- 5
beta_course <- 5

alpha_course_post <- alpha_course + x_pass
beta_course_post <- beta_course + n_course - x_pass

course_summary <- data.frame(
  Quantity = c("Observed pass rate", "Posterior mean", "Posterior median", "95% lower", "95% upper"),
  Value = c(x_pass / n_course,
            alpha_course_post / (alpha_course_post + beta_course_post),
            qbeta(0.5, alpha_course_post, beta_course_post),
            qbeta(0.025, alpha_course_post, beta_course_post),
            qbeta(0.975, alpha_course_post, beta_course_post))
)

kable(course_summary, digits = 4)
Quantity Value
Observed pass rate 0.7750
Posterior mean 0.7444
Posterior median 0.7463
95% lower 0.6500
95% upper 0.8286

18.4 16.4 Probability That Pass Rate Exceeds 70%

prob_above_70 <- 1 - pbeta(0.70, alpha_course_post, beta_course_post)
prob_above_70
## [1] 0.8341849

Interpretation: The posterior probability that the true pass probability exceeds 70% is 0.834.

18.5 16.5 Graph

course_df <- data.frame(
  p = p_grid,
  density = dbeta(p_grid, alpha_course_post, beta_course_post)
)

ggplot(course_df, aes(x = p, y = density)) +
  geom_line(linewidth = 1) +
  geom_vline(xintercept = 0.70, linetype = "dashed") +
  geom_area(data = subset(course_df, p > 0.70), aes(x = p, y = density), alpha = 0.4) +
  labs(title = "Posterior Probability That Pass Rate Exceeds 70%", x = "Pass Probability", y = "Posterior Density") +
  theme_minimal()

19 Chapter 17: Common Bayesian Terms

Term Meaning
Prior Belief before observing data
Likelihood Information from data
Posterior Updated belief after observing data
Evidence Marginal probability of data
Credible interval Bayesian uncertainty interval
Posterior mean Average of posterior distribution
Posterior predictive Distribution for future data
MCMC Simulation method for posterior sampling
Conjugate prior Prior producing posterior in same family

20 Chapter 18: Exercises

20.1 Exercise 1

A coin is tossed 20 times and gives 14 heads. Use a \(Beta(1,1)\) prior.

  1. Write the likelihood.
  2. Derive the posterior distribution.
  3. Compute posterior mean.
  4. Compute 95% credible interval using R.
  5. Plot the posterior distribution.

20.2 Exercise 2

Repeat Exercise 1 using \(Beta(10,10)\). Compare the posterior with Exercise 1.

20.3 Exercise 3

A disease affects 2% of a population. A test has sensitivity 90% and false positive rate 8%.

Calculate:

\[ P(Disease \mid Positive) \]

using R.

20.4 Exercise 4

Simulate a dataset of 100 students where the true pass probability is 0.75. Use a \(Beta(2,2)\) prior and estimate the posterior distribution.

20.5 Exercise 5

Generate a regression dataset where:

\[ Y=10+3X+\epsilon \]

where \(\epsilon \sim Normal(0,4^2)\). Fit a linear regression and simulate approximate posterior samples for the slope.

21 Chapter 19: Solutions to Selected Exercises

21.1 Solution to Exercise 1

x_ex1 <- 14
n_ex1 <- 20
alpha_ex1 <- 1
beta_ex1 <- 1

alpha_ex1_post <- alpha_ex1 + x_ex1
beta_ex1_post <- beta_ex1 + n_ex1 - x_ex1

ex1_summary <- data.frame(
  Statistic = c("Posterior distribution", "Posterior mean", "2.5%", "97.5%"),
  Value = c(paste0("Beta(", alpha_ex1_post, ",", beta_ex1_post, ")"),
            round(alpha_ex1_post / (alpha_ex1_post + beta_ex1_post), 4),
            round(qbeta(0.025, alpha_ex1_post, beta_ex1_post), 4),
            round(qbeta(0.975, alpha_ex1_post, beta_ex1_post), 4))
)

kable(ex1_summary)
Statistic Value
Posterior distribution Beta(15,7)
Posterior mean 0.6818
2.5% 0.4782
97.5% 0.8541
ex1_df <- data.frame(
  p = p_grid,
  density = dbeta(p_grid, alpha_ex1_post, beta_ex1_post)
)

ggplot(ex1_df, aes(x = p, y = density)) +
  geom_line(linewidth = 1) +
  labs(title = "Exercise 1 Posterior Distribution", x = "p", y = "Density") +
  theme_minimal()

22 Appendix A: Formula Sheet

22.1 Bayes’ Theorem

\[ P(A \mid B)=\frac{P(B \mid A)P(A)}{P(B)} \]

22.2 Bayesian Parameter Form

\[ p(\theta \mid y)=\frac{p(y \mid \theta)p(\theta)}{p(y)} \]

22.3 Proportional Posterior

\[ p(\theta \mid y) \propto p(y \mid \theta)p(\theta) \]

22.4 Binomial Likelihood

\[ P(X=x \mid p)=\binom{n}{x}p^x(1-p)^{n-x} \]

22.5 Beta Prior

\[ p \sim Beta(\alpha,\beta) \]

22.6 Beta-Binomial Posterior

\[ p \mid x \sim Beta(\alpha+x,\beta+n-x) \]

22.7 Beta Mean

\[ E(p)=\frac{\alpha}{\alpha+\beta} \]

22.8 Beta Variance

\[ Var(p)=\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)} \]

22.9 Posterior Predictive Distribution

\[ p(y_{new}\mid y)=\int p(y_{new}\mid \theta)p(\theta\mid y)d\theta \]

24 Appendix C: Final Teaching Message

Bayesian statistics is not only a collection of formulas. It is a way of learning from evidence. The central idea is simple:

\[ \text{New belief} = \text{Old belief} + \text{Data evidence} \]

In mathematical form:

\[ \text{Posterior} \propto \text{Likelihood} \times \text{Prior} \]

Once this idea is clear, students can gradually move toward advanced topics such as hierarchical models, MCMC, Bayesian regression, Bayesian disease modelling, Bayesian machine learning, and decision theory.