To determine whether it’s a good idea to play the game, we have to calculate the expected net gain per game. If it is less than 0, we’ll assume that playing this game is a bad idea.
Expected net gain \(E[X]\) is given by:
\[ E[X] = \sum_{k=0}^{6} P(k) \times (\text{payoff}_k - 1) \]
Where:
\[ P(k) = \binom{6}{k} \left( \frac{1}{6} \right)^k \left( \frac{5}{6} \right)^{6 - k} \]
# Range of possible sixes
# and probabilities of rolling k sixes
k <- 0:6
probabilities <- dbinom(k, size = 6, prob = 1/6)
payoffs <- c(0, 1, 1.5, 2, 2.5, 3, 3.5)
# Subtracting the cost to play
net_gains <- payoffs - 1
expected_net_gain <- sum(probabilities * net_gains)
cat("Expected net gain per game:", expected_net_gain, "CHF\n")
## Expected net gain per game: -0.167449 CHF
Answer: I will NOT be playing this dice game, especially on a PhD student’s stipend.
#2A:Metropolis Algorithm Function
Determining the bias of a coin flipped 20 times with 14 heads. No prior information about the coin, so beta distribution with a=1, b=1.
# Metropolis algorithm function
metropolis_algo <- function(theta_init, n_iter, burn_in, n, z, a, b, proposal_sd) {
theta_current <- theta_init
samples <- numeric(n_iter)
for (i in 1:n_iter) {
theta_proposed <- rnorm(1, mean = theta_current, sd = proposal_sd)
# Reflect proposed theta if outside [0, 1]
if (theta_proposed < 0) theta_proposed <- abs(theta_proposed)
if (theta_proposed > 1) theta_proposed <- 2 - theta_proposed
if (theta_proposed < 0) theta_proposed <- runif(1, 0, 1)
# Acceptance ratio
log_p_current <- dbeta(theta_current, a + z, b + n - z, log = TRUE)
log_p_proposed <- dbeta(theta_proposed, a + z, b + n - z, log = TRUE)
log_accept_ratio <- log_p_proposed - log_p_current
# Accept or reject
if (log(runif(1)) < log_accept_ratio) {
theta_current <- theta_proposed
}
samples[i] <- theta_current
}
samples[(burn_in + 1):n_iter]
}
# Parameters
n_iter <- 5000
burn_in <- 2500
n <- 20
z <- 14
a <- 1
b <- 1
proposal_sd <- 0.2
# Chains
set.seed(123)
chain1 <- metropolis_algo(0.1, n_iter, burn_in, n, z, a, b, proposal_sd)
chain2 <- metropolis_algo(0.9, n_iter, burn_in, n, z, a, b, proposal_sd)
chain3 <- metropolis_algo(0.5, n_iter, burn_in, n, z, a, b, proposal_sd)
#2B: Plotting chains (convergence assessment)
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.4.2
iterations <- 1:length(chain1)
data_chains <- data.frame(
Iteration = rep(iterations, 3),
Theta = c(chain1, chain2, chain3),
Chain = factor(rep(1:3, each = length(chain1)))
)
# Trace plots
ggplot(data_chains, aes(x = Iteration, y = Theta, color = Chain)) +
geom_line(alpha = 0.7) +
labs(title = "Trace Plots of the Three Chains",
x = "Iteration",
y = expression(theta),
color = "Chain") +
theme_minimal()
library(ggplot2)
iterations <- 1:length(chain1)
data_chains <- data.frame(
Iteration = rep(iterations, 3),
Theta = c(chain1, chain2, chain3),
Chain = factor(rep(1:3, each = length(chain1)))
)
# Trace plots
ggplot(data_chains, aes(x = Iteration, y = Theta, color = Chain)) +
geom_line(alpha = 0.7) +
labs(title = "Trace Plots of the Three Chains",
x = "Iteration",
y = expression(theta),
color = "Chain") +
theme_minimal()
Intepretation: Upon visual inspection, chains appear to have
converged!
# Density plots
ggplot(data_chains, aes(x = Theta, fill = Chain)) +
geom_density(alpha = 0.5) +
labs(title = "Density Plots of the Three Chains",
x = expression(theta),
y = "Density",
fill = "Chain") +
theme_minimal()
#2C: Concatenating chains and assessing fairness
# Concatenate chains
combined_samples <- c(chain1, chain2, chain3)
# Summary statistics
mean_theta <- mean(combined_samples)
credible_interval <- quantile(combined_samples, probs = c(0.025, 0.975))
prob_theta_less_0.5 <- mean(combined_samples < 0.5)
cat("Mean of theta:", mean_theta, "\n")
## Mean of theta: 0.6779609
cat("95% Credible Interval:", credible_interval, "\n")
## 95% Credible Interval: 0.4600251 0.8497673
cat("Probability that theta < 0.5:", prob_theta_less_0.5, "\n")
## Probability that theta < 0.5: 0.05026667
# Posterior distribution plot
ggplot(data.frame(Theta = combined_samples), aes(x = Theta)) +
geom_density(fill = "skyblue", alpha = 0.5) +
geom_vline(xintercept = 0.5, linetype = "dashed", color = "red") +
labs(title = "Posterior Distribution of Theta",
x = expression(theta),
y = "Density") +
theme_minimal()
The posterior distribution suggests the coin is biased towards heads. The 95% credible interval does not include 0.5, and the probability that 𝜃< 0.5
θ<0.5 is very low.
##2D: Modeling with JAGS
library(rjags)
## Loading required package: coda
## Linked to JAGS 4.3.1
## Loaded modules: basemod,bugs
library(coda) # Not familiar with this package but it was recommended on stack overflow for MCMC diagnostics and manipulation, n
# Define JAGS model as a string
model_string <- "
model {
# Likelihood
z ~ dbin(theta, n)
# Prior
theta ~ dbeta(a, b)
}
"
# Data to pass to JAGS
data_jags <- list(
z = z,
n = n,
a = a,
b = b
)
inits <- function() {
list(theta = runif(1))
}
# Run JAGS model
jags_model <- jags.model(
textConnection(model_string),
data = data_jags,
inits = inits,
n.chains = 3,
n.adapt = 1000
)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 1
## Unobserved stochastic nodes: 1
## Total graph size: 5
##
## Initializing model
# Burn-in
update(jags_model, n.iter = 2500)
# Amass samples from the posterior
samples_jags <- coda.samples(
model = jags_model,
variable.names = "theta",
n.iter = 2500
)
samples_matrix <- as.matrix(samples_jags)
theta_samples_jags <- samples_matrix[, "theta"]
# Compute statistics from JAGS samples
mean_theta_jags <- mean(theta_samples_jags)
credible_interval_jags <- quantile(theta_samples_jags, probs = c(0.025, 0.975))
prob_theta_less_0.5_jags <- mean(theta_samples_jags < 0.5)
# Output results
cat("Comparison of Results:\n")
## Comparison of Results:
cat("Metropolis - Mean of theta:", mean_theta, "\n")
## Metropolis - Mean of theta: 0.6779609
cat("JAGS - Mean of theta:", mean_theta_jags, "\n\n")
## JAGS - Mean of theta: 0.6806679
cat("Metropolis - 95% Credible Interval:", credible_interval, "\n")
## Metropolis - 95% Credible Interval: 0.4600251 0.8497673
cat("JAGS - 95% Credible Interval:", credible_interval_jags, "\n\n")
## JAGS - 95% Credible Interval: 0.4753564 0.8537872
cat("Metropolis - Probability theta < 0.5:", prob_theta_less_0.5, "\n")
## Metropolis - Probability theta < 0.5: 0.05026667
cat("JAGS - Probability theta < 0.5:", prob_theta_less_0.5_jags, "\n")
## JAGS - Probability theta < 0.5: 0.04
# Combine samples for plotting
df_metropolis <- data.frame(Theta = combined_samples, Method = "Metropolis")
df_jags <- data.frame(Theta = theta_samples_jags, Method = "JAGS")
df_combined <- rbind(df_metropolis, df_jags)
ggplot(df_combined, aes(x = Theta, fill = Method)) +
geom_density(alpha = 0.5) +
geom_vline(xintercept = 0.5, linetype = "dashed", color = "red") +
labs(
title = "Comparison of Posterior Distributions",
x = expression(theta),
y = "Density",
fill = "Method"
) +
theme_minimal()
We’ll use JAGS to estimate both the mean (mu) and standard deviation (sigma) of IQ test scores, assuming the data are normally distributed. ### a. Using the Inferred Mean to Test Whether the Expected IQ is Greater Than 100
library(runjags)
library(coda)
library(ggplot2)
# HDIofMCMC function, also taken from stackoverflow. Did this before remembering we had this in our class materials.
HDIofMCMC <- function(sampleVec, credMass = 0.95) {
sortedPts <- sort(sampleVec)
ciIdxInc <- floor(credMass * length(sortedPts))
nCIs <- length(sortedPts) - ciIdxInc
ciWidth <- rep(0, nCIs)
for (i in 1:nCIs) {
ciWidth[i] <- sortedPts[i + ciIdxInc] - sortedPts[i]
}
HDImin <- sortedPts[which.min(ciWidth)]
HDImax <- sortedPts[which.min(ciWidth) + ciIdxInc]
HDIlim <- c(HDImin, HDImax)
return(HDIlim)
}
model_string <- "
model {
for (i in 1:Ntotal) {
y[i] ~ dnorm(mu, tau)
}
mu ~ dnorm(0, 1.0E-6)
tau <- pow(sigma, -2)
sigma ~ dunif(1.0E-3, 1.0E+3)
}
"
myDataFrame <- read.csv(file = "TwoGroupIQ.csv")
IQscore <- myDataFrame$Score[myDataFrame$Group == "Smart Drug"]
# Histogram of the data
hist(IQscore, main = "Histogram of IQ Scores (Smart Drug Group)", xlab = "IQ Score")
y <- IQscore
Ntotal <- length(y)
data_jags <- list(y = y, Ntotal = Ntotal)
# Initialize chains
inits1 <- list(mu = 100, sigma = 30)
inits2 <- list(mu = 120, sigma = 20)
inits3 <- list(mu = 80, sigma = 40)
inits_list <- list(inits1, inits2, inits3)
# Monitoring
params <- c("mu", "sigma")
# Running
jags_model <- jags.model(
textConnection(model_string),
data = data_jags,
inits = inits_list,
n.chains = 3,
n.adapt = 1000
)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 63
## Unobserved stochastic nodes: 2
## Total graph size: 73
##
## Initializing model
# Burn-in
update(jags_model, n.iter = 4000)
# Sample from the posterior
samples <- coda.samples(
model = jags_model,
variable.names = params,
n.iter = 1000,
thin = 4
)
plot(samples)
# Combine chains into one matrix
chains <- as.mcmc(do.call(rbind, samples))
# Extracting mu and sigma samples
mu_samples <- chains[, "mu"]
sigma_samples <- chains[, "sigma"]
# Plot the posterior distribution of mu
plot(density(mu_samples), main = "Posterior Distribution of mu", xlab = expression(mu))
# Add vertical lines for median and HDI
mu_median <- median(mu_samples)
mu_hdi <- HDIofMCMC(mu_samples, credMass = 0.95)
abline(v = mu_median, col = "red", lwd = 2)
abline(v = mu_hdi[1], lty = 2, col = "blue", lwd = 2)
abline(v = mu_hdi[2], lty = 2, col = "blue", lwd = 2)
# Add a vertical line at mu = 100
abline(v = 100, col = "grey", lwd = 2)
Conclusion:
# Display the HDI and probability
cat("95% HDI for mu:", mu_hdi, "\n")
## 95% HDI for mu: 101.8406 115.171
prob_mu_greater_100 <- mean(mu_samples > 100)
cat("Probability that mu > 100:", prob_mu_greater_100, "\n")
## Probability that mu > 100: 0.9866667
Interpretation: Since the 95% HDI for mu is entirely above 100 and the probability that mu > 100 is high (close to 1), we can conclude that the expected IQ score is greater than 100.
# Calculate Cohen's d
d_samples <- (mu_samples - 100) / sigma_samples
# Plot the posterior distribution of d
plot(density(d_samples), main = "Posterior Distribution of Cohen's d", xlab = "Cohen's d")
# Add vertical lines for median and HDI
d_median <- median(d_samples)
d_hdi <- HDIofMCMC(d_samples, credMass = 0.95)
abline(v = d_median, col = "red", lwd = 2)
abline(v = d_hdi[1], lty = 2, col = "blue", lwd = 2)
abline(v = d_hdi[2], lty = 2, col = "blue", lwd = 2)
# Vertical line at d = 0 for good measure
abline(v = 0, col = "grey", lwd = 2)
# Display the HDI and probability
cat("95% HDI for Cohen's d:", d_hdi, "\n")
## 95% HDI for Cohen's d: 0.07237863 0.5858195
prob_d_greater_0 <- mean(d_samples > 0)
cat("Probability that d > 0:", prob_d_greater_0, "\n")
## Probability that d > 0: 0.9866667
Interpretation: The 95% HDI for Cohen’s d is entirely above 0, and the probability that d > 0 is high (close to 1), indicating that the effect size is significantly larger than 0.
Small effect size: d = 0.2 Medium effect size: d = 0.5 Large effect size: d = 0.8
cat("Median of Cohen's d:", d_median, "\n")
## Median of Cohen's d: 0.3024798
Conclusion: With a median d value of approximately 0.3, the effect size is considered small to medium according to Cohen’s standards.
##Takeaways
#Question 4 - Model comparison:
load("~/BayesWorkshop/Homework/dataFit.RData")
# Plot the data for preliminary consideration (spoiler alert, it is obviously quadratic)
plot(x, y, main = "Scatter Plot of y vs. x", xlab = "x", ylab = "y", pch = 19)
# Prepare for JAGS
data_jags <- list(
x = x,
y = y,
N = length(y)
)
# Initial values for the chains
# Initializing values for each model separately
inits_null <- function() {
list(
a = rnorm(1, 0, 10),
sigma = runif(1, 0, 100)
)
}
inits_linear <- function() {
list(
a = rnorm(1, 0, 10),
b = rnorm(1, 0, 10),
sigma = runif(1, 0, 100)
)
}
inits_quadratic <- function() {
list(
a = rnorm(1, 0, 10),
b = rnorm(1, 0, 10),
c = rnorm(1, 0, 10),
sigma = runif(1, 0, 100)
)
}
# Parameters to monitor for JAGS
params_linear <- c("a", "b", "sigma")
params_quadratic <- c("a", "b", "c", "sigma")
params_null <- c("a", "sigma")
String format model definitions:
model_linear <- "
model {
for (i in 1:N) {
y[i] ~ dnorm(mu[i], tau)
mu[i] <- a + b * x[i]
}
a ~ dnorm(0, 1.0E-6)
b ~ dnorm(0, 1.0E-6)
tau <- pow(sigma, -2)
sigma ~ dunif(0, 100)
}
"
model_quadratic <- "
model {
for (i in 1:N) {
y[i] ~ dnorm(mu[i], tau)
mu[i] <- a + b * x[i] + c * pow(x[i], 2)
}
a ~ dnorm(0, 1.0E-6)
b ~ dnorm(0, 1.0E-6)
c ~ dnorm(0, 1.0E-6)
tau <- pow(sigma, -2)
sigma ~ dunif(0, 100)
}
"
model_null <- "
model {
for (i in 1:N) {
y[i] ~ dnorm(a, tau)
}
a ~ dnorm(0, 1.0E-6)
tau <- pow(sigma, -2)
sigma ~ dunif(0, 100)
}
"
library(rjags)
library(coda)
library(loo)
## Warning: package 'loo' was built under R version 4.4.2
## This is loo version 2.8.0
## - Online documentation and vignettes at mc-stan.org/loo
## - As of v2.0.0 loo defaults to 1 core but we recommend using as many as possible. Use the 'cores' argument or set options(mc.cores = NUM_CORES) for an entire session.
## - Windows 10 users: loo may be very slow if 'mc.cores' is set in your .Rprofile file (see https://github.com/stan-dev/loo/issues/94).
library(ggplot2)
##Linear Model:
# Initialize
jags_linear <- jags.model(
textConnection(model_linear),
data = data_jags,
inits = inits_linear,
n.chains = 3,
n.adapt = 1000
)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 50
## Unobserved stochastic nodes: 3
## Total graph size: 210
##
## Initializing model
# Burn-in
update(jags_linear, n.iter = 5000)
# Sample posterior
samples_linear <- coda.samples(
model = jags_linear,
variable.names = params_linear,
n.iter = 10000
)
##Quadratic:
# Initialize the model
jags_quadratic <- jags.model(
textConnection(model_quadratic),
data = data_jags,
inits = inits_quadratic,
n.chains = 3,
n.adapt = 1000
)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 50
## Unobserved stochastic nodes: 4
## Total graph size: 311
##
## Initializing model
# Burn-in
update(jags_quadratic, n.iter = 5000)
# Sample from the posterior
samples_quadratic <- coda.samples(
model = jags_quadratic,
variable.names = params_quadratic,
n.iter = 10000
)
# Initialize the model
jags_null <- jags.model(
textConnection(model_null),
data = data_jags,
inits = inits_null,
n.chains = 3,
n.adapt = 1000
)
## Warning in jags.model(textConnection(model_null), data = data_jags, inits =
## inits_null, : Unused variable "x" in data
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 50
## Unobserved stochastic nodes: 2
## Total graph size: 59
##
## Initializing model
# Burn-in
update(jags_null, n.iter = 5000)
# Sample from the posterior
samples_null <- coda.samples(
model = jags_null,
variable.names = params_null,
n.iter = 10000
)
##Compute posterior Predictive means:
# Extract posterior samples
samples_linear_mat <- as.matrix(samples_linear)
# Compute posterior means
x_seq <- seq(min(x), max(x), length.out = 100)
mu_linear <- apply(samples_linear_mat, 1, function(s) {
s["a"] + s["b"] * x_seq
})
mu_linear_mean <- apply(mu_linear, 1, mean)
# Extract posterior samples
samples_quadratic_mat <- as.matrix(samples_quadratic)
# Compute posterior predictive means
mu_quadratic <- apply(samples_quadratic_mat, 1, function(s) {
s["a"] + s["b"] * x_seq + s["c"] * x_seq^2
})
mu_quadratic_mean <- apply(mu_quadratic, 1, mean)
# Extract posterior samples
samples_null_mat <- as.matrix(samples_null)
# Compute posterior predictive mean
a_null_mean <- mean(samples_null_mat[, "a"])
mu_null_mean <- rep(a_null_mean, length(x_seq))
##Plotting data and predictions:
plot(x, y, main = "Data with Model Predictions", xlab = "x", ylab = "y", pch = 19)
# Add predictions to plot
lines(x_seq, mu_linear_mean, col = "blue", lwd = 2, lty = 2)
lines(x_seq, mu_quadratic_mean, col = "red", lwd = 2)
lines(x_seq, mu_null_mean, col = "green", lwd = 2, lty = 3)
legend("topleft", legend = c("Linear Model", "Quadratic Model", "Null Model"),
col = c("blue", "red", "green"), lty = c(2, 1, 3), lwd = 2)
The quadratic model seems to fit the data better than the linear and null models (which was apparent from the scatterplot)
##Model evaluation:
###Deviance Information criterion:
# Function to extract DIC from JAGS model, taken from stack overflow
extract_dic <- function(jags_model) {
dic.samples(jags_model, n.iter = 10000, type = "pD")
}
# DIC for Linear Model
dic_linear <- extract_dic(jags_linear)
dic_linear_value <- sum(dic_linear$deviance) + sum(dic_linear$penalty)
# DIC for Quadratic Model
dic_quadratic <- extract_dic(jags_quadratic)
dic_quadratic_value <- sum(dic_quadratic$deviance) + sum(dic_quadratic$penalty)
# DIC for Null Model
dic_null <- extract_dic(jags_null)
dic_null_value <- sum(dic_null$deviance) + sum(dic_null$penalty)
# Display DIC values
dic_values <- data.frame(
Model = c("Null", "Linear", "Quadratic"),
DIC = c(dic_null_value, dic_linear_value, dic_quadratic_value)
)
print(dic_values)
## Model DIC
## 1 Null 442.4947
## 2 Linear 417.0615
## 3 Quadratic 311.5617
###Leave one out cross validation:
# Compute log-likelihoods
compute_log_lik <- function(samples_mat, model_type) {
n_iter <- nrow(samples_mat)
log_lik_matrix <- matrix(NA, nrow = n_iter, ncol = data_jags$N)
if (model_type == "linear") {
for (i in 1:n_iter) {
mu_i <- samples_mat[i, "a"] + samples_mat[i, "b"] * data_jags$x
sigma_i <- samples_mat[i, "sigma"]
log_lik_matrix[i, ] <- dnorm(data_jags$y, mean = mu_i, sd = sigma_i, log = TRUE)
}
} else if (model_type == "quadratic") {
for (i in 1:n_iter) {
mu_i <- samples_mat[i, "a"] + samples_mat[i, "b"] * data_jags$x + samples_mat[i, "c"] * data_jags$x^2
sigma_i <- samples_mat[i, "sigma"]
log_lik_matrix[i, ] <- dnorm(data_jags$y, mean = mu_i, sd = sigma_i, log = TRUE)
}
} else if (model_type == "null") {
for (i in 1:n_iter) {
mu_i <- samples_mat[i, "a"]
sigma_i <- samples_mat[i, "sigma"]
log_lik_matrix[i, ] <- dnorm(data_jags$y, mean = mu_i, sd = sigma_i, log = TRUE)
}
}
return(log_lik_matrix)
}
# Compute log-likelihoods for each model
log_lik_linear <- compute_log_lik(samples_linear_mat, "linear")
log_lik_quadratic <- compute_log_lik(samples_quadratic_mat, "quadratic")
log_lik_null <- compute_log_lik(samples_null_mat, "null")
###Compute Leave One out for each model using log-likelihoods:
# Compute LOO
loo_linear <- loo(log_lik_linear)
loo_quadratic <- loo(log_lik_quadratic)
loo_null <- loo(log_lik_null)
# Display LOO values
loo_values <- data.frame(
Model = c("Null", "Linear", "Quadratic"),
LOOIC = c(loo_null$estimates["looic", "Estimate"],
loo_linear$estimates["looic", "Estimate"],
loo_quadratic$estimates["looic", "Estimate"])
)
print(loo_values)
## Model LOOIC
## 1 Null 442.1603
## 2 Linear 416.9274
## 3 Quadratic 312.2105
Discussion:
From the DIC values:
Quadratic Model has the lowest DIC, indicating the best fit after penalizing for model complexity. The Linear Model has a higher DIC than the Quadratic Model but lower than the Null Model. The Null Model has the highest DIC, indicating the worst fit among the three.
LOO Values:
Quadratic Model again has the lowest LOOIC, suggesting the best predictive performance. The Linear Model has a higher LOOIC than the Quadratic Model but lower than the Null Model. The Null Model has the highest LOOIC.
Conclusion:
Both DIC and LOO metrics agree that the Quadratic Model provides the best balance between model fit and complexity. The Quadratic Model captures the curvature observed in the data, which the Linear and Null models fail to account for. Therefore, the Quadratic Model is the preferred model for explaining the relationship between x and y.