sigma
)Dear Professor Shestopaloff,
In response to the your suggestion to test with different proposal standard deviations to improve performance, I made the following changes:
1- Overview of the changes: I changed the code to test several values for the scaling parameter (sigma) of the proposal distribution. The objective is to look at the effects of proposal distribution scaling on the Effective Sample Size or ESS. Also, the ESS results were computed and displayed for every combination of proposal scaling and sample size.
2- Introduction of Experimentation with Proposal Scaling (sigma): The code is updated to include multiple values for the proposal distribution’s standard deviation (sigma). This adjustment permit us to test with how the proposal distribution’s scaling affects the performance of the importance sampling process. I used the following sigma values for testing:
0.5 1.0 1.5 2.0 3.0
3- Tracking ESS for various Proposal Scaling: For every combination of sample size and proposal distribution scaling, the Effective Sample Size (ESS) was calculated. One important statistic for assessing the effectiveness of the importance sampling procedure is ESS. Higher values of the ESS indicate better sampling efficiency and show how closely the resampled weights reflect the real posterior distribution.
4- Visualization of Results: The results were visualized using plots that show how ESS changes with respect to the sample size and proposal scaling (sigma).
5- Posterior Mean Calculation: After performing importance sampling, the posterior means of the parameters 𝛽0,𝛽1 and 𝛽2 were computed and stored for analysis. The results for posterior means weree displayed in histograms for each sample size and proposal scaling (sigma).
Proposal Scaling and ESS:
Inefficient sampling results from a poorly scaled proposal distribution, which is predicted to have a low ESS. We can see how changing the spread of the proposal distribution affects the ESS by experimenting with different sigma values. If the proposal is too heavily concentrated around zero, a smaller sigma could lead to low ESS (inefficient sampling); conversely, if a bigger sigma does not introduce excessive variation, it may result in greater exploration of the parameter space and higher ESS.
Proposal Distribution Scaling:
To see how it affected the sampling efficiency, the proposal distribution’s scaling (which was run by sigma) was changed. I think I managed the distance that the sampled betas may be from the origin by varying the standard deviation of the proposal distribution. This investigation helps in determining the ideal scale that finds an agreement between parameter space exploration and sampling effectiveness.
Impact on Posterior Means: Histograms of the resampled betas for every sigma value were produced as part of the analysis. These histograms provide greater clarity on the connection between ESS and proposal scaling by illustrating how the posterior distribution of the parameters changes when various proposal scailing are used.
set.seed(170)
intercept <- 0.1
coef_x1 <- 1.1
coef_x2 <- -0.9
sample_sizes <- c(10, 50, 100)
create_data <- function(N) {
x1 <- runif(N, min = -2, max = 2)
x2 <- runif(N, min = -2, max = 2)
linear_predictor <- intercept + coef_x1 * x1 + coef_x2 * x2
probabilities <- 1 / (1 + exp(-linear_predictor))
response <- rbinom(N, 1, probabilities)
return(data.frame(x1 = x1, x2 = x2, y = response))
}
simulated_data <- lapply(sample_sizes, create_data)
num_samples <- 20000
num_resamples <- 10000
compute_log_likelihood <- function(beta, data) {
linear_predictor <- beta[1] + beta[2] * data$x1 + beta[3] * data$x2
probabilities <- 1 / (1 + exp(-linear_predictor))
log_likelihood_value <- sum(data$y * log(probabilities) + (1 - data$y) * log(1 - probabilities))
return(log_likelihood_value)
}
proposal_distribution <- function(N, sigma = 1.5) {
matrix(rnorm(3 * N, mean = 0, sd = sigma), ncol = 3)
}
compute_importance_weights <- function(beta, data) {
log_posterior <- compute_log_likelihood(beta, data) - 0.5 * sum(beta^2)
return(exp(log_posterior))
}
importance_sampling <- function(data, N, sigma) {
proposal_samples <- proposal_distribution(N, sigma)
weights <- sapply(1:N, function(i) compute_importance_weights(proposal_samples[i, ], data))
normalized_weights <- weights / sum(weights)
resample_indices <- sample(1:N, num_resamples, replace = TRUE, prob = normalized_weights)
resampled_beta <- proposal_samples[resample_indices, ]
posterior_means <- colSums(proposal_samples * normalized_weights)
ESS <- 1 / sum(normalized_weights^2)
return(list(posterior_means = posterior_means, ESS = ESS, resampled_beta = resampled_beta))
}
experiment_with_sigma <- function(data, sample_sizes, sigmas) {
results <- list()
for (sigma in sigmas) {
sampling_results <- lapply(sample_sizes, function(N) importance_sampling(data, N, sigma))
ESS_results <- lapply(sampling_results, function(result) result$ESS)
posterior_means_results <- lapply(sampling_results, function(result) result$posterior_means)
results[[paste("Sigma =", sigma)]] <- list(ESS = ESS_results, posterior_means = posterior_means_results)
}
return(results)
}
sigma_values <- c(0.5, 1, 1.5, 2, 3)
experiment_results <- experiment_with_sigma(simulated_data[[1]], sample_sizes, sigma_values)
par(mfrow = c(3, 3))
for (sigma in sigma_values) {
ESS_result <- experiment_results[[paste("Sigma =", sigma)]]$ESS
posterior_means_result <- experiment_results[[paste("Sigma =", sigma)]]$posterior_means
plot(sample_sizes, unlist(ESS_result), type = "b", col = "blue",
main = paste("ESS for", paste("Sigma =", sigma)), xlab = "Sample Size", ylab = "ESS")
posterior_means_matrix <- do.call(rbind, posterior_means_result)
hist(posterior_means_matrix[, 1], main = paste("β0 (Sigma =", sigma, ")"), xlab = "β0", col = "yellow", breaks = 50)
hist(posterior_means_matrix[, 2], main = paste("β1 (Sigma =", sigma, ")"), xlab = "β1", col = "lightblue", breaks = 50)
hist(posterior_means_matrix[, 3], main = paste("β2 (Sigma =", sigma, ")"), xlab = "β2", col = "green", breaks = 50)
}
glm_models <- lapply(simulated_data, function(data) glm(y ~ x1 + x2, data = data, family = binomial))
glm_coefficients <- lapply(glm_models, function(model) coef(model))
1- Actions Presented:
Testing Various Proposal Standard Deviations (σ): The proposal distribution’s spread is managed by the proposal standard deviation (sigma). Whereas a smaller value of sigma concentrates the distribution around the mean, a bigger value results in a broader spread
2- The Effect of σ on ESS:
Larger σ: The diversity of the samples may be enhanced by a wider proposal distribution that may encompass a larger region of the parameter space. Poorer convergence may result from it, though, as it could increase the important weights’ variance.
Smaller σ: The suggested distribution will be concentrated nearer the origin (i.e., around zero) with a smaller σ, which may result in more precise sampled betas. But if σ is too small, it might not adequately investigate the posterior distribution, which would result in low ESS and not enough coverage.
set.seed(170)
intercept <- 0.1
coef_x1 <- 1.1
coef_x2 <- -0.9
additional_betas <- runif(6, min = -1, max = 1)
names(additional_betas) <- paste("beta_add", 1:6, sep = "_")
print("Generated additional beta values:")
## [1] "Generated additional beta values:"
print(additional_betas)
## beta_add_1 beta_add_2 beta_add_3 beta_add_4 beta_add_5 beta_add_6
## 0.9683449 -0.5339030 0.5368513 -0.8417813 0.1257677 -0.8274445
sample_sizes <- c(10, 50, 100)
create_data <- function(N) {
x1 <- runif(N, min = -2, max = 2)
x2 <- runif(N, min = -2, max = 2)
additional_covariates <- matrix(runif(N * 6, min = -2, max = 2), ncol = 6)
colnames(additional_covariates) <- paste("x_add", 1:6, sep = "_")
linear_predictor <- intercept + coef_x1 * x1 + coef_x2 * x2
for (i in 1:6) {
linear_predictor <- linear_predictor + additional_covariates[, i] * additional_betas[i]
}
probabilities <- 1 / (1 + exp(-linear_predictor))
response <- rbinom(N, 1, probabilities)
data <- data.frame(x1 = x1, x2 = x2, y = response, additional_covariates)
return(data)
}
simulated_data <- lapply(sample_sizes, create_data)
num_samples <- 20000
num_resamples <- 10000
compute_log_likelihood <- function(beta, data) {
linear_predictor <- beta[1] + beta[2] * data$x1 + beta[3] * data$x2
for (i in 1:6) {
linear_predictor <- linear_predictor + data[, paste("x_add", i, sep = "_")] * beta[3 + i]
}
probabilities <- 1 / (1 + exp(-linear_predictor))
log_likelihood_value <- sum(data$y * log(probabilities) + (1 - data$y) * log(1 - probabilities))
return(log_likelihood_value)
}
proposal_distribution <- function(N, sigma = 1.5) {
matrix(rnorm(9 * N, mean = 0, sd = sigma), ncol = 9)
}
compute_importance_weights <- function(beta, data) {
log_likelihood <- compute_log_likelihood(beta, data)
log_prior <- -0.5 * sum(beta^2)
log_posterior <- log_likelihood + log_prior
return(exp(log_posterior))
}
importance_sampling <- function(data, N) {
proposal_samples <- proposal_distribution(N)
weights <- sapply(1:N, function(i) compute_importance_weights(proposal_samples[i, ], data))
if (any(is.na(weights)) | any(is.nan(weights))) {
weights[is.na(weights) | is.nan(weights)] <- 1e-10
}
normalized_weights <- weights / sum(weights)
resample_indices <- sample(1:N, num_resamples, replace = TRUE, prob = normalized_weights)
resampled_beta <- proposal_samples[resample_indices, ]
posterior_means <- colSums(proposal_samples * normalized_weights)
ESS <- 1 / sum(normalized_weights^2)
return(list(posterior_means = posterior_means, ESS = ESS, resampled_beta = resampled_beta))
}
sampling_results <- lapply(simulated_data, function(data) importance_sampling(data, num_samples))
posterior_means_results <- lapply(sampling_results, function(result) result$posterior_means)
ESS_results <- lapply(sampling_results, function(result) result$ESS)
print("Posterior means for each sample size:")
## [1] "Posterior means for each sample size:"
print(posterior_means_results)
## [[1]]
## [1] 0.31279720 0.64257649 -0.63977290 0.10788616 -0.70143446 0.29467828
## [7] 0.01030248 -0.39261481 -0.83282578
##
## [[2]]
## [1] -0.007941247 1.540101108 -0.774561409 1.322760275 -0.241152351
## [6] 0.362213366 -0.611630032 -0.218248803 -0.778815135
##
## [[3]]
## [1] 0.1952008 1.1087517 -1.0680444 0.3095585 -0.3575560 -0.1548563 -0.7879833
## [8] 0.1095985 -1.2033588
print("Effective Sample Size (ESS) for each sample size:")
## [1] "Effective Sample Size (ESS) for each sample size:"
print(ESS_results)
## [[1]]
## [1] 42.81159
##
## [[2]]
## [1] 2.806032
##
## [[3]]
## [1] 1.121614
par(mfrow = c(3, 3))
for (i in 1:length(sample_sizes)) {
resampled_betas <- sampling_results[[i]]$resampled_beta
hist(resampled_betas[, 1], main = paste("β0 (N =", sample_sizes[i], ")"), xlab = "β0", col = "yellow", breaks = 30)
hist(resampled_betas[, 2], main = paste("β1 (N =", sample_sizes[i], ")"), xlab = "β1", col = "lightblue", breaks = 30)
hist(resampled_betas[, 3], main = paste("β2 (N =", sample_sizes[i], ")"), xlab = "β2", col = "green", breaks = 30)
for (j in 1:6) {
hist(resampled_betas[, 3 + j], main = paste("β_add", j, " (N =", sample_sizes[i], ")"), xlab = paste("β_add", j, sep = ""), col = rainbow(6)[j], breaks = 30)
}
}
glm_models <- lapply(simulated_data, function(data) glm(y ~ x1 + x2 + x_add_1 + x_add_2 + x_add_3 + x_add_4 + x_add_5 + x_add_6, data = data, family = binomial))
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
glm_coefficients <- lapply(glm_models, function(model) coef(model))
print("GLM coefficients:")
## [1] "GLM coefficients:"
print(glm_coefficients)
## [[1]]
## (Intercept) x1 x2 x_add_1 x_add_2 x_add_3
## 6.433204 14.798308 -1.828293 -12.216024 -10.451806 7.680328
## x_add_4 x_add_5 x_add_6
## -7.121071 -20.632866 -34.417421
##
## [[2]]
## (Intercept) x1 x2 x_add_1 x_add_2 x_add_3
## -0.3287673 1.9150312 -0.6886903 1.6157019 -0.7749457 0.2304285
## x_add_4 x_add_5 x_add_6
## -0.9472955 -0.0231685 -0.8970437
##
## [[3]]
## (Intercept) x1 x2 x_add_1 x_add_2 x_add_3
## -0.2661212 0.9181552 -1.2082422 0.7910228 -0.4107330 0.3579283
## x_add_4 x_add_5 x_add_6
## -1.1341144 0.4847570 -0.8623633