## 1.1:

Using initial estimates for beta from an initial regression, as shown below, a penalty factor \(w_j\) is calculated for each covariate. The weights are inversely proportional to the magnitude of the initial coefficients, and gamma amplifies this contrast. Multiplying these penalty factors into the lasso penalty for the 20 covariates shrinks their final coefficients; with gamma greater than 1, the penalty on covariates whose initial coefficients are near zero becomes much larger, so those covariates are far more likely to be removed from the final model. The penalties on the other unimportant covariates will be high as well, but since those variables are not needed, this is acceptable.
\[ Y = \beta_0 + \beta_1 A + \sum_{j=1}^p \beta_j X_j + \varepsilon, \]
\[ w_j = \frac{1}{|\hat{\beta}_j|^\gamma}, \quad \gamma > 1 \]
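As a minimal sketch of this weight construction (the data here are hypothetical and simulated; the full simulation used for the reported results appears in the code at the end of this section), the penalty factors stay small for covariates with sizable initial coefficients and become enormous for covariates whose initial coefficients are near zero:

```r
library(glmnet)

set.seed(1)
n <- 400; p <- 20
X <- matrix(rnorm(n * p), n, p)              # hypothetical covariates
Y <- X[, 1] + X[, 3] + rnorm(n)              # hypothetical outcome: only X1 and X3 matter

# Initial lasso fit of Y on X provides the initial beta estimates
init_fit  <- glmnet(X, Y, alpha = 1, lambda = 0.2)
beta_init <- as.vector(coef(init_fit))[-1]   # drop the intercept

# Adaptive weights w_j = 1 / |beta_j|^gamma; gamma > 1 sharpens the contrast
gamma <- 3
w <- 1 / (abs(beta_init)^gamma + 1e-7)       # small constant avoids dividing by zero

round(w[1:4], 1)  # X1 and X3 keep small penalties; near-zero coefficients get huge ones
```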
## 1.2:

```
> # Results
> mean(ate_lasso)
[1] 1.945405
> var(ate_lasso)
[1] 0.1049053
> mean(ate_full)
[1] 1.960559
> var(ate_full)
[1] 0.2497782
```
While the mean ATEs from the two sets of IP weights are similar, the variance from the refitted adaptive lasso method is lower than that from the standard logistic regression. Since the standard logistic regression does not remove any variables, more noise enters the propensity models used to calculate the ATEs. In contrast, the refitted adaptive lasso removed the unimportant variables so that only the more relevant ones remained, leading to less noise.

## 1.3:

```
Number of X2 false positives: 0
Number of X1 false negatives: 0
```
With gamma set relatively high (gamma = 3), covariates with small initial coefficients are penalized heavily and removed easily, which is why X2 is consistently dropped; X1 is retained because its initial coefficient is much larger than X2's, so its penalty stays small.
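A quick numerical illustration, using two hypothetical initial coefficient values (a strong one, as for X1, and a near-zero one, as for X2), shows how raising gamma widens the gap between their penalties:

```r
# Hypothetical initial coefficient magnitudes: a strong covariate and a near-zero one
beta_init <- c(strong = 1.0, weak = 0.05)

for (g in c(1, 3)) {
  w <- 1 / abs(beta_init)^g
  cat("gamma =", g, "-> weights:", round(w, 1), "\n")
}
# gamma = 1 -> weights: 1 20
# gamma = 3 -> weights: 1 8000
```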
## 2.1:

The optimal treatment rule for the logistic regression model is obtained by extending the main-effects formula above with treatment-covariate interaction terms, as in the equation below. For a given X, the probability \(\Pr(Y = 1 \mid A, X)\) is predicted under both A = 1 and A = 0, and the treatment giving the higher predicted probability is chosen. The log-odds difference between A = 1 and A = 0 is calculated, and applying the inverse logit to each arm puts the comparison back on the probability scale, which gives the optimal treatment rule below.
\[ \text{logit} \Pr(Y = 1 \mid A, X) = \beta_0 + \beta_1 A + \beta_2^T X + \beta_3^T (A \cdot X) \]
\[ \text{log-odds difference} = \log \left( \frac{\Pr(Y = 1 \mid A = 1, X)}{1 - \Pr(Y = 1 \mid A = 1, X)} \right) - \log \left( \frac{\Pr(Y = 1 \mid A = 0, X)}{1 - \Pr(Y = 1 \mid A = 0, X)} \right) \] \[ \Delta(X) = \text{logit}^{-1}(\hat{\beta}_0 + \hat{\beta}_1 + \hat{\beta}_2^T X + \hat{\beta}_3^T X) - \text{logit}^{-1}(\hat{\beta}_0 + \hat{\beta}_2^T X) \]
Optimal treatment rule:
\[ d^*(X) = \begin{cases} 1 & \text{if } \text{logit}^{-1}(\beta_0 + \beta_1 + \beta_2^T X + \beta_3^T X) > \text{logit}^{-1}(\beta_0 + \beta_2^T X) \\ 0 & \text{otherwise} \end{cases} \]

## 2.2:

With the data coming from a case-control study, logistic regression still estimates the odds ratios correctly; only the intercept is distorted by the outcome-dependent sampling, so the treatment rule does not change. The comparison of outcomes in Equation 4 below reduces, because the logit is monotone, to \(\beta_1 + \beta_3^T X > 0\), which does not involve the intercept (a short sketch applying this rule appears after Section 2.3).

Equation 4:
\[ \Pr(Y = 1 \mid A = 1, X) > \Pr(Y = 1 \mid A = 0, X) \]

## 2.3:

Remove highly correlated and/or unimportant variables so that fewer covariates enter the rule, leading to a simpler decision rule. The adaptive weights can be made more penalizing (a higher gamma, as shown below, and/or a larger lambda) so that the lasso drops the unimportant variables and the adaptive weights shrink the correlated ones.
\[ w_j = \frac{1}{|\hat{\beta}_j|^\gamma}, \quad \gamma > 0 \]
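To make the rule from Sections 2.1 and 2.2 concrete, the following is a minimal sketch on hypothetical simulated data (the coefficient values and the covariates X1 and X2 are illustrative, not taken from the assignment data): it fits the main-effects-plus-interaction logistic model and applies \(d^*(X)\), and since the comparison reduces to \(\beta_1 + \beta_3^T X > 0\), the intercept never enters the decision.

```r
# Minimal sketch: fit the main-effects + interaction logistic model and apply d*(X).
# Data and coefficient values are hypothetical, for illustration only.
set.seed(2)
n  <- 500
X1 <- rnorm(n); X2 <- rnorm(n)
A  <- rbinom(n, 1, 0.5)
Y  <- rbinom(n, 1, plogis(-0.5 + 0.4 * A + 0.6 * X1 - 1.2 * A * X1))

fit <- glm(Y ~ A * (X1 + X2), family = binomial)   # main effects plus A-by-X interactions

# Predicted Pr(Y = 1 | A, X) under each treatment for every subject
p1 <- predict(fit, newdata = data.frame(A = 1, X1 = X1, X2 = X2), type = "response")
p0 <- predict(fit, newdata = data.frame(A = 0, X1 = X1, X2 = X2), type = "response")

# d*(X): treat when the predicted probability of Y = 1 is higher under A = 1.
# Equivalently beta1 + beta3' X > 0, which does not involve the intercept,
# the only term a case-control design distorts (Section 2.2).
d_opt <- as.numeric(p1 > p0)
table(d_opt)
```

The simulation code used to produce the results in Sections 1.2 and 1.3 follows.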
```r
library(glmnet)
#Initialize variables
n <- 400
p <- 20
R <- 2000
false_positive_X2 <- 0
false_negative_X1 <- 0
ate_lasso <- numeric(R)
ate_full <- numeric(R)
expit <- function(x) 1 / (1 + exp(-x))
#Simulation
for (j in 1:R) {
set.seed(j)
#Generate data
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("X", 1:p)
lin_pred <- -1 + X[, 1] + X[, 2]
A <- rbinom(n, 1, expit(lin_pred))
Y <- A - X[, 1] + X[, 3] + rnorm(n)
#Initial fit to determine initial betas for adaptive lasso
init_fit <- glmnet(X, Y, alpha = 1, lambda = 0.2)
beta_init <- as.vector(coef(init_fit))[-1] # drop the intercept; single fixed lambda, so no s argument needed
#Adaptive weights
gamma <- 3
ittybitty <- .0000001
w <- 1 / (abs(beta_init)^gamma + ittybitty) # avoid divide by 0
# Step 2: Adaptive lasso
cv_fit <- cv.glmnet(X, A, family = "binomial", alpha = 1,
penalty.factor = w,
nfolds = 5, nlambda = 20)
#Refit using the lambda.min from the Cross Validation
fit_lasso <- glmnet(X, A, family = "binomial", alpha = 1,
penalty.factor = w, standardize = TRUE,
lambda = cv_fit$lambda.min)
selected <- which(as.vector(coef(fit_lasso))[-1] != 0)
#Track false positive (x2) and false negative (x1)
if (2 %in% selected) false_positive_X2 <- false_positive_X2 + 1
if (!(1 %in% selected)) false_negative_X1 <- false_negative_X1 + 1
#Refit logistic model with selected vars
if (length(selected) == 0) {
prob_lasso <- rep(mean(A), n) # fallback if no variable selected
} else {
data_lasso <- data.frame(A = A, X_sel = X[, selected, drop = FALSE])
refit <- glm(A ~ ., data = data_lasso, family = binomial)
prob_lasso <- predict(refit, type = "response")
}
#Calculate ATE from refitted adaptive lasso
w_lasso <- A / prob_lasso + (1 - A) / (1 - prob_lasso)
ate_lasso[j] <- mean(w_lasso * (A - mean(A)) * Y) / mean(w_lasso * (A - mean(A)))
#IP weights from standard logistic regression
df_full <- data.frame(A = A, X)
full_fit <- glm(A ~ ., data = df_full, family = binomial)
prob_full <- predict(full_fit, type = "response")
#Calculate ATE (logistic regression)
w_full <- A / prob_full + (1 - A) / (1 - prob_full)
ate_full[j] <- mean(w_full * (A - mean(A)) * Y) / mean(w_full * (A - mean(A)))
}
#Results
#Mean and Variance from the refitted adaptive lasso
mean(ate_lasso)
var(ate_lasso)
#Mean and variance from the standard logistic regression
mean(ate_full)
var(ate_full)
#Proportion of simulations with an X2 false positive
false_positive_X2/R
#Proportion of simulations with an X1 false negative
false_negative_X1/R
```