1. We start by generating some data from a logistic regression model. Recall that a logistic regression model is used to model binary responses as a function of a set of covariates and is widely used in practice. Examples include biostatistics, risk modelling, and so on. To generate data from the logistic model, we use the following steps. We will use synthetic sample sizes of N = 10,50,100.
(a) Select the intercept term and the true values of the regression coefficients. We will start by choosing (\(β_0\),\(β_1\),\(β_2\)) = (0.1,1.1,−0.9).
We need to first setup the intercept term \(β_0\) = 0.1 and the regression coefficients \(β_1\) = 1.1 and \(β_2\) = -0.9.
set.seed(123)
# Regression coefficients
beta0 <- 0.1
beta1 <- 1.1
beta2 <- -0.9
These are the true regression coefficients for the logistic regression model. We will use these values to generate the synthetic data.
(b) Generate values of covariates x(n)i with i = 1,2andn = 1,…,N bysampling N independent and identically distributed values of \(x_i^{(n)}\) from a Uniform [−2,2] distribution.
We will generate 𝑁 independent and identically distributed covariates
from a Uniform[−2, 2] distribution. These covariates will be used to
model the binary responses.
For 𝑁= 10, 50, 100 we will generate the covariates \(x_1\) and \(x_2\) from a Uniform[−2, 2] distribution:
# Generate covariates from Uniform[-2, 2]
covariates <- function(N) {
x1 <- runif(N, min = -2, max = 2)
x2 <- runif(N, min = -2, max = 2)
data.frame(x1 = x1, x2 = x2)
}
# Generate for N = 10, 50, 100
covariates_10 <- covariates(10)
covariates_50 <- covariates(50)
covariates_100 <- covariates(100)
# Preview the covariates for N=10
covariates_10
## x1 x2
## 1 -0.8496899 1.8273334
## 2 1.1532205 -0.1866634
## 3 -0.3640923 0.7102825
## 4 1.5320696 0.2905336
## 5 1.7618691 -1.5883013
## 6 -1.8177740 1.5992999
## 7 0.1124220 -1.0156491
## 8 1.5696762 -1.8317619
## 9 0.2057401 -0.6883171
## 10 -0.1735411 1.8180146
(c) Generate binary responses corresponding to each pair of the covariates you have generated in the previous step by sampling \(y_i^{n}\) ∈ {0,1} for i = 1,…,N with probability \[ p(y_i^{(n)} =1|x_1^{(n)}, x_2^{(n)}) = \frac{1}{1+ exp(-(β_0 + β_1 x_1^{(n)} + β_2 x_2^{(n)}))} \]
The probability that the response is 1 is given by the logistic function, given covariates \(x_1\), \(x_2\):
\[ p(y_i^{(n)} =1|x_1^{(n)}, x_2^{(n)}) = \frac{1}{1+ exp(-(β_0 + β_1 x_1^{(n)} + β_2 x_2^{(n)}))} \]
We will use this probability to sample the binary responses.
# Generate logistic function
logistic_function <- function(covariates, beta0, beta1, beta2) {
# Linear predictor
lp <- beta0 + beta1 * covariates$x1 + beta2 * covariates$x2
# Probabilities using the logistic function
prob <- 1 / (1 + exp(-lp))
# Generate binary responses
y <- rbinom(length(prob), 1, prob)
return(y)
}
# Generate binary responses for N = 10, 50, 100
responses_10 <- logistic_function(covariates_10, beta0, beta1, beta2)
responses_50 <- logistic_function(covariates_50, beta0, beta1, beta2)
responses_100 <- logistic_function(covariates_100, beta0, beta1, beta2)
# Proportion of zeros and ones
proportions <- function(y) {
prop_0 <- mean(y == 0)
prop_1 <- mean(y == 1)
return(c(prop_0 = prop_0, prop_1 = prop_1))
}
# Report proportions for N = 10, 50, 100
proportions_10 <- proportions(responses_10)
proportions_50 <- proportions(responses_50)
proportions_100 <-proportions(responses_100)
# Print the results
data.frame(proportions_10, proportions_50, proportions_100)
## proportions_10 proportions_50 proportions_100
## prop_0 0.4 0.46 0.48
## prop_1 0.6 0.54 0.52
| Sample Size (N) | Proportion of Zeros | Proportion of Ones |
|---|---|---|
| 10 | 0.4 | 0.6 |
| 50 | 0.46 | 0.54 |
| 100 | 0.48 | 0.52 |
2. Our next step will be to do Bayesian inference for the parameters of the logistic regression model. If our method works well, we should be able to recover the true values of the parameters underlying the simulation with a fair degree of accuracy. Put standard Gaussian, independent priors on each of the parameters of the logistic regression model and use (weighted) importance sampling to estimate the posterior means of \((β_0,β_1,β_2)\). As your proposal, you may use a spherical multivariate normal, centered at 0, with variance sufficient to capture the variation in the posterior. Report the posterior means together with associated error estimates. In addition, plot histograms by resampling your sampled β values with probability proportional to the corresponding importance weights. Note that this gives an approximation to the posterior distribution of the β’s: you are simply using the empirical measure to approximate the true posterior. Finally, report the effective sample size. Repeat this for the three sample sizes N. How does your result compare with an answer obtained using a standard optimizer?
In this question, we need to perform Bayesian inference for the parameters of logistic regression model using the importance sampling.
We are using a logistic regression model with synthetic data for N= 10,50,100, generated using the true parameters: (\(β_0\),\(β_1\),\(β_2\)) = (0.1,1.1,−0.9)
We have to do Bayesian inference by placing standard Gaussian priors on each of the regression coefficients and using importance sampling to estimate the posterior distribution of these parameters.
In Bayesian inference, we place priors on the parameters and compute the posterior distribution given the observed data. The posterior distribution is proportional to the product of the likelihood and the prior:
\[p(β|y,X) ∝ p(y|X,β) p(β)\] For N = 10, 50, 100, we will use:
Prior Distribution: Standard Gaussian priors for the
parameters, i.e.,\(p(β) = N(0,1)\) for
each \(β_0\),\(β_1\),\(β_2\)
Likelihood: Binary responses generated using logistic model.
Importance sampling helps to approximate the posterior distribution by sampling from a proposal distribution and reweighting the samples according to the ratio of the target and proposal distributions.
Proposal Distribution: We will use a spherical multivariate normal distribution centered at 0 with a variance chosen large enough to capture the variation in the posterior. \[q(β) = N(0, σ^2 I)\]
Here, \(σ\) = 1.5 is used for the solution. This sigma is chosen based on prior experiments to ensure the proposal adequately covers the posterior. The value is chosen so that the proposal distribution will be too concentrated around the 0.In case if it’s too large, the proposal distribution will include regions with very low posterior probability, leading to inefficient importance sampling
Importance Weight: The importance weight for each parameters \(β^{(i)}\) are computed as:
\[w^{(i)} = \frac{p(y|β^{(i)}) p(β^{(i)})} {q(β^{(i)})}\] Where, \(p(y|β^{(i)})\) is the likelihood of the data given the parameters. \(p(β^{(i)})\) is the prior distribution. \(q(β^{(i)})\) is the proposal distribution.
Resampling: After getting the importance weights, we will resample 10,000 times with probabilities proportional to the importance weights to approximate the posterior distribution.
Effective Sample Size: The effective sample size will be calculated as:
\[ESS = \frac{1}{\sum^{N}_{i=1}w_{i}^2}\] This will help to determine how well the importance sampler worked.
Comparison with Optimizer: We will compare the results of the importance sampling with those obtained from a standard optimizer (e.g., glm).
set.seed(123)
# Parameters
beta_0 <- 0.1
beta_1 <- 1.1
beta_2 <- -0.9
# Sample sizes
N_values <- c(10, 50, 100)
# Generate synthetic data for N
generate_data <- function(N) {
x1 <- runif(N, -2, 2)
x2 <- runif(N, -2, 2)
# Logistic function for probabilities
prob <- 1 / (1 + exp(-(beta_0 + beta_1 * x1 + beta_2 * x2)))
# Binary response
y <- rbinom(N, 1, prob)
return(data.frame(x1 = x1, x2 = x2, y = y))
}
# Generate data for each N
data_list <- lapply(N_values, generate_data)
# Number of samples for importance sampling and resampling
num_samples <- 20000
num_resamples <- 10000
# Log-likelihood
log_likelihood <- function(beta, data) {
lp <- beta[1] + beta[2] * data$x1 + beta[3] * data$x2
prob <- 1 / (1 + exp(-lp))
log_lik <- sum(data$y * log(prob) + (1 - data$y) * log(1 - prob))
return(log_lik)
}
# Proposal distribution (spherical multivariate normal)
sigma<-1.5
proposal <- function(N) {
matrix(rnorm(3 * N, mean=0, sd=sigma), ncol = 3) # 3 coefficients
}
# Importance weights for the proposal
importance_weights <- function(beta, data) {
log_post <- log_likelihood(beta, data) - 0.5 * sum(beta^2) # Gaussian prior
return(exp(log_post))
}
# Importance sampling
importance_sampling <- function(data, N) {
# Proposal distribution
proposal_samples <- proposal(N)
# Weights for each sample
weights <- sapply(1:N, function(i) importance_weights(proposal_samples[i, ], data))
# Normalize weights
weights <- weights / sum(weights)
# Resample with probabilities proportional to the weights
resample_indices <- sample(1:N, num_resamples, replace = TRUE, prob = weights)
resampled_beta <- proposal_samples[resample_indices, ]
# Posterior means and ESS
posterior_means <- colSums(proposal_samples * weights)
ESS <- 1 / sum(weights^2)
return(list(posterior_means = posterior_means, ESS = ESS, resampled_beta = resampled_beta))
}
# Importance sampling for each N
results_list <- lapply(data_list, function(data) importance_sampling(data, num_samples))
# Posterior means for each N
posterior_means_list <- lapply(results_list, function(result) result$posterior_means)
posterior_means_list
## [[1]]
## [1] -0.01243316 0.33431097 -1.28033525
##
## [[2]]
## [1] 0.08996474 1.15678651 -0.89219553
##
## [[3]]
## [1] 0.5107353 0.8860322 -0.9950473
# ESS for each N
ESS_list <- lapply(results_list, function(result) result$ESS)
ESS_list
## [[1]]
## [1] 2083.798
##
## [[2]]
## [1] 303.9224
##
## [[3]]
## [1] 127.5431
# Plot histograms for resampled beta values for each N
par(mfrow = c(3, 3))
for (i in 1:length(N_values)) {
resampled_beta <- results_list[[i]]$resampled_beta
hist(resampled_beta[, 1], main = paste("β0 (N =", N_values[i], ")"), xlab = "β0", col = "blue")
hist(resampled_beta[, 2], main = paste("β1 (N =", N_values[i], ")"), xlab = "β1", col = "green")
hist(resampled_beta[, 3], main = paste("β2 (N =", N_values[i], ")"), xlab = "β2", col = "red")
}
# Fit the logistic regression model using glm
glm_results <- lapply(data_list, function(data) {
glm(y ~ x1 + x2, data = data, family = binomial)
})
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
# Coefficients from glm
glm_coefficients <- lapply(glm_results, function(model) coef(model))
glm_coefficients
## [[1]]
## (Intercept) x1 x2
## 28.08128 -17.89334 -82.46897
##
## [[2]]
## (Intercept) x1 x2
## 0.09631188 1.40098693 -1.07596759
##
## [[3]]
## (Intercept) x1 x2
## 0.5792200 0.9963177 -1.0936871
| Sample Size (N) | Proportion of Zeros | Proportion of Ones |
|---|---|---|
| 10 | 0.4 | 0.6 |
| 50 | 0.46 | 0.54 |
| 100 | 0.48 | 0.52 |
Posterior Means: The results depends on the sample size: For small sample, N=10, show higher variablity and larger deviation from the true parameter values.
But for larger samples, N=50 and N=100, the estimates are closer to the true values.
Effective Sample Size (ESS): The ESS measures the effective number of independent samples, considering the weights. A higher ESS means the importance sampling is more efficient (there is less redundancy and the samples are more informative). As expected, ESS decreases as the sample size increases, because with larger datasets, the posterior distribution becomes narrower and more concentrated, leading to less weight variation among the samples.
Histogram The histograms show the distribution of the parameters \(β_0\), \(β_1\), and \(β_2\) after resampling with importance weights. The histograms are roughly centered around the true values (\(β_0\) = 0.1, \(β_1\) = 1.1, \(β_2\) = -0.9). As sample size N increases, the histograms become more sharply peaked around the true parameter values.
Comparison with GLM For N=10, the coefficients are large and far from the true values. Which indicates instability in the model due to the small sample size.
For N=50 and N=100, the coefficients from glm are closer to the true values, but the results from the Bayesian approach (importance sampling) are still more robust and less sensitive to fluctuations in the data.
3. Now, increase the dimensionality of your problem from 3 to 9 by adding six extra simulated covariates. Repeat the experiment above. How does your effective sample size change?
set.seed(123)
# Parameters for the original model
beta_0 <- 0.1
beta_1 <- 1.1
beta_2 <- -0.9
# Generate new beta values (Uniform[-1, 1])
beta_add <- runif(6, -1, 1)
# Combine all betas
beta_all <- c(beta_0, beta_1, beta_2, beta_add)
# Sample sizes for N
N_values <- c(10, 50, 100)
# Generate synthetic data for N
generate_data <- function(N) {
# Generate 3 original covariates (x1, x2, x3)
x1 <- runif(N, -2, 2)
x2 <- runif(N, -2, 2)
# Generate 6 additional covariates (x3 to x8)
x_add <- matrix(runif(N * 6, -1, 1), ncol = 6)
# Combine all covariates
x <- cbind(x1, x2, x_add)
# Logistic function for probabilities
linear_predictor <- beta_all[1] + beta_all[2] * x[, 1] + beta_all[3] * x[, 2] +
sum(beta_all[4:9] * x[, 3:8])
prob <- 1 / (1 + exp(-linear_predictor))
# Binary response
y <- rbinom(N, 1, prob)
return(data.frame(cbind(x, y)))
}
# Generate data for each N
data_list <- lapply(N_values, generate_data)
# Samples for importance sampling
num_samples <- 20000
# Log-likelihood for logistic regression
log_likelihood <- function(beta, data) {
lp <- beta[1] + beta[2] * data$x1 + beta[3] * data$x2 +
sum(beta[4:9] * data[, 4:9])
prob <- 1 / (1 + exp(-lp))
log_lik <- sum(data$y * log(prob) + (1 - data$y) * log(1 - prob))
return(log_lik)
}
# Proposal distribution (spherical multivariate normal)
sigma<-1.5
proposal <- function(N) {
matrix(rnorm(9 * N, mean=0, sd=sigma), ncol = 9) # 9 coefficients
}
# Importance weights for the proposal
importance_weights <- function(beta, data) {
log_post <- log_likelihood(beta, data) - 0.5 * sum(beta^2) # Gaussian prior
return(exp(log_post))
}
# Importance sampling
importance_sampling <- function(data, N) {
# Proposal distribution
proposal_samples <- proposal(N)
# Weights for each sample
weights <- sapply(1:N, function(i) importance_weights(proposal_samples[i, ], data))
# Normalize weights
weights <- weights / sum(weights)
# Posterior means and ESS
posterior_means <- colSums(proposal_samples * weights)
ESS <- 1 / sum(weights^2)
return(list(posterior_means = posterior_means, ESS = ESS))
}
# Importance sampling for each N
results_list <- lapply(data_list, function(data) importance_sampling(data, num_samples))
# Posterior means for each N
posterior_means_list <- lapply(results_list, function(result) result$posterior_means)
posterior_means_list
## [[1]]
## [1] -0.05092112 0.17611158 0.58109803 0.14426567 -0.21245478 -0.27210311
## [7] -0.03040912 0.35349368 0.26417048
##
## [[2]]
## [1] -0.223008171 0.002265476 -0.009681493 0.306622563 0.179231296
## [6] 0.435265943 -0.775823009 0.530284095 0.598825303
##
## [[3]]
## [1] -0.17386241 0.01307058 -0.03465651 0.04254519 0.07430081 -0.10689068
## [7] 0.77853418 -0.48812595 0.59705805
# ESS for each N
ESS_list <- lapply(results_list, function(result) result$ESS)
ESS_list
## [[1]]
## [1] 181.5787
##
## [[2]]
## [1] 321.5239
##
## [[3]]
## [1] 403.0693
# Report the new beta values
beta_add
## [1] -0.4248450 0.5766103 -0.1820462 0.7660348 0.8809346 -0.9088870
Effective Sample Size (ESS):
When the number of parameters increases (from 3 to 9), the posterior distribution becomes more complex. As more covariates are introduced, the proposal distribution needs to cover a larger space, and the likelihood function becomes more sensitive to the interactions between the coefficients.
For N = 10, With fewer data points, the ESS is relatively low ESS = 181.58. This is because with a small sample size and a higher-dimensional parameter space, the weights become highly concentrated on a small region of the parameter space, reducing the effective number of independent samples.
For N = 50 The ESS increases to 321.52, which is still lower compared to larger sample sizes. Increasing the data size slightly helps to reduce the concentration of the weights, allowing more independent samples.
For N = 100 The ESS further increases to 403.07, which shows that with larger sample sizes, the sampling is more effective in covering the parameter space. However, the ESS is still not proportional to the total sample size due to the increased dimensionality and the complexity of the posterior.
4. Now try the experiment using a smarter proposal. For instance, you can center the proposal at the posterior mode that you find with an optimization method. How do your results change? Are you achieving a better sample size?
Here, instead of using a spherical Gaussian proposal centered at zero, a smarter proposal is used. This smarter proposal centers the distribution at the posterior mode obtained through optimization. The posterior mode is computed using a standard optimization method (BFGS) to find the mode of the posterior distribution based on the data and the prior.
Then, the importance sampling procedure is repeated, but the proposal is now a spherical Gaussian centered at the posterior mode rather than at zero.
set.seed(123)
# Parameters
beta_0 <- 0.1
beta_1 <- 1.1
beta_2 <- -0.9
# Sample sizes
N_values <- c(10, 50, 100)
# Generate synthetic data for N
generate_data <- function(N) {
x1 <- runif(N, -2, 2)
x2 <- runif(N, -2, 2)
# Logistic function for probabilities
prob <- 1 / (1 + exp(-(beta_0 + beta_1 * x1 + beta_2 * x2)))
# Binary response
y <- rbinom(N, 1, prob)
return(data.frame(x1 = x1, x2 = x2, y = y))
}
# Generate data for each N
data_list <- lapply(N_values, generate_data)
# Samples for importance sampling
num_samples <- 20000
# Log-likelihood for logistic regression
log_likelihood <- function(beta, data) {
lp <- beta[1] + beta[2] * data$x1 + beta[3] * data$x2
prob <- 1 / (1 + exp(-lp))
log_lik <- sum(data$y * log(prob) + (1 - data$y) * log(1 - prob))
return(log_lik)
}
# Gaussian prior for logistic regression (standard normal prior)
log_prior <- function(beta) {
return(-0.5 * sum(beta^2)) # Gaussian prior with zero mean and unit variance
}
# Combine log-likelihood and log-prior to get log-posterior
log_posterior <- function(beta, data) {
return(log_likelihood(beta, data) + log_prior(beta))
}
# Posterior mode using optimization
find_posterior_mode <- function(data) {
optim_result <- optim(par = c(0, 0, 0), fn = function(beta) -log_posterior(beta, data), method = "BFGS")
return(optim_result$par)
}
# Proposal distribution centered at the posterior mode
proposal <- function(N, mode) {
# Using a spherical Gaussian distribution centered at mode
return(matrix(mode + rnorm(3 * N), ncol = 3)) # 3 coefficients
}
# Importance weights for the proposal
importance_weights <- function(beta, data) {
log_post <- log_posterior(beta, data)
return(exp(log_post)) # Return unnormalized importance weights
}
# Importance sampling
importance_sampling <- function(data, N, mode) {
# Proposal distribution
proposal_samples <- proposal(N, mode)
# Weights for each sample
weights <- sapply(1:N, function(i) importance_weights(proposal_samples[i, ], data))
# Normalize weights
weights <- weights / sum(weights)
# Posterior means and ESS
posterior_means <- colSums(proposal_samples * weights)
ESS <- 1 / sum(weights^2)
return(list(posterior_means = posterior_means, ESS = ESS))
}
# Posterior mode, posterior means, and ESS for each N
results_list <- lapply(1:length(data_list), function(i) {
data <- data_list[[i]]
mode <- find_posterior_mode(data) # Posterior mode using optimization
cat("Posterior mode for N =", N_values[i], ":\n", mode, "\n")
# Print the posterior mode
result <- importance_sampling(data, num_samples, mode) # Importance sampling with smarter proposal
cat("Posterior means for N =", N_values[i], ":\n", result$posterior_means, "\n")
cat("Effective Sample Size (ESS) for N =", N_values[i], ":\n", result$ESS, "\n\n")
return(result)
})
## Posterior mode for N = 10 :
## 0.001255137 0.2417342 -1.35716
## Posterior means for N = 10 :
## -0.03891284 0.2356927 -1.364209
## Effective Sample Size (ESS) for N = 10 :
## 3362.917
##
## Posterior mode for N = 50 :
## 0.09739793 1.154655 -0.8712141
## Posterior means for N = 50 :
## 0.0845441 1.13264 -0.8454312
## Effective Sample Size (ESS) for N = 50 :
## 429.3392
##
## Posterior mode for N = 100 :
## 0.5253428 0.8993995 -1.000474
## Posterior means for N = 100 :
## 0.5352862 0.9234392 -1.022761
## Effective Sample Size (ESS) for N = 100 :
## 212.6174
Posterior mean: By using a smarter proposal centered at the posterior mode, the posterior means are closer to the true values, especially for smaller N. For N=100, the improvement become less noticeable but still positive. In contrast, when using the spherical Gaussian proposal centered at zero, the posterior means were further from the true values.
Effective Sample Size (ESS): From the output it is observed that using a smarter proposal that is centered around the posterior mode results an increase in ESS compared to using a spherical Gaussian centered at zero.The increase is more noticeable for smaller sample sizes (N=10 and 50)
Therefore, the smarter proposal (centered around the posterior mode) significantly improves the performance of the importance sampling method by yielding better posterior mean estimates and larger ESS, especially when compared to the spherical Gaussian proposal centered at zero. This is because the proposal is more aligned with the actual distribution of the posterior, making the importance weights more concentrated and the sampling more efficient.
5. In Bayesian inference problems, parameters are typically dependent in the posterior. How can you set the covariance of the proposal in a smarter way than using a proposal with independent coordinates. Think of possible solutions here. There is no correct answer here.