1. (4 points) Consider the sequentially randomized trial described by the SWIG above. We will see later that identifying \(E[Y(a_0, a_1)]\) requires the independence conditions described below. The exercise is to determine from the SWIG which variables must be conditioned on for each independence to hold.
  1. Find the smallest conditioning set \(Z_0\) such that (i) \(Y(a_0, a_1) \perp A_0 \mid Z_0\), (ii) \(L_1(a_0) \perp A_0 \mid Z_0\), and (iii) \(A_1(a_0) \perp A_0 \mid Z_0\).

  2. Find the smallest conditioning set \(Z_1\) such that \(Y(a_0, a_1) \perp A_1(a_0) \mid Z_1\).

2. (9 points) Consider Table displaying data from a population of 3320 individuals.
  1. Fill in Table to show how the population above would look if we did not stratify on \(W\).
  1. Use table to check whether \(Y(1) \perp A\) and whether \(Y(0) \perp A\).

  2. Use the Table to check whether \(Y(1) \perp A \mid W\) and whether \(Y(0) \perp A \mid W\).

  3. Assuming causal consistency, fill in the Table with what the observed data would be for this population.

  1. Based on Table , do you conclude that the positivity assumption is satisfied?

  2. Using Table , compute \(E[Y(1)]\) and \(E[Y(0)]\).

  3. Using Table , compute the G-computation formula, \(E[E(Y \mid A = 1, W)]\) and \(E[E(Y \mid A = 0, W)]\).

  4. Using Table , compute the ATT \(E[Y(1) - Y(0) \mid A = 1]\).

  5. Using Table , compute the G-computation formula for the ATT, \(E[E(Y \mid A = 1, W) - E(Y \mid A = 0, W) \mid A = 1]\).

  6. Using Table , compute the ATE in the \(W = 0\) subgroup and \(W = 1\) subgroup. Is there effect modification of the risk difference in this problem?
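The G-computation calculations requested in this problem follow one mechanical pattern. The sketch below illustrates that pattern on made-up counts. The data frame toy, its counts, and the helper gcomp are all hypothetical and are not taken from the table in this problem.

```r
# Hypothetical observed-data counts (NOT the table from this problem)
toy <- data.frame(
  W    = c(0, 0, 1, 1),
  A    = c(0, 1, 0, 1),
  n    = c(100, 50, 40, 60),  # individuals in each (W, A) cell
  n_y1 = c(20, 5, 20, 18)     # individuals with Y = 1 in each cell
)

# Positivity check: every (W, A) cell must contain at least one individual
positivity_ok <- all(toy$n > 0)

# E(Y | A = a, W = w) within each cell
toy$Qbar <- toy$n_y1 / toy$n

# Marginal distribution of W
p_w <- tapply(toy$n, toy$W, sum) / sum(toy$n)

# G-computation formula E[E(Y | A = a, W)]:
# average the cell means over the marginal distribution of W
gcomp <- function(a) {
  cell <- toy[toy$A == a, ]
  sum(cell$Qbar * as.numeric(p_w[as.character(cell$W)]))
}

gcomp_1 <- gcomp(1)
gcomp_0 <- gcomp(0)
print(c(gcomp_1 = gcomp_1, gcomp_0 = gcomp_0))
```

For the ATT version, the same cell means would instead be averaged over the distribution of \(W\) among individuals with \(A = 1\).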

3. (7 points) Install the R package survtmle from CRAN and load the RV144 data.
install.packages("survtmle")

This is a simulated data set that mimics the RV144 trial, a randomized trial of a preventive HIV vaccine. We will make some changes to define an outcome suitable for analysis with our current toolkit.

library(survtmle); data(rv144)
## survtmle: Targeted Learning for Survival Analysis
## Version: 1.1.1
# indicator of observed HIV infection
rv144$out <- as.numeric(rv144$ftype > 0)
table(rv144$out, rv144$vax)
##    
##        0    1
##   0 7902 7943
##   1   64   46
  1. Compute the G-computation estimator of vaccine efficacy, VE = \(1 - E[Y(1)] / E[Y(0)]\), where \(Y\) is out and we are estimating the causal effect of the vaccine (vax). Use a main-terms logistic regression model for the outcome regression that adjusts for vax, sex (male), enrollment year (year04, year05), risk category (medRisk, highRisk) and age category (medAge, highAge).
# logistic regression model
fit_vax <- glm(out ~ vax + male + year04 + year05 + medRisk + highRisk + medAge + highAge,
               data = rv144, family = binomial())

# data.frame that sets A = 1 for everyone and keeps each W = W_i
df_a1_Wi <- data.frame(vax = 1, male = rv144$male, year04 = rv144$year04, year05 = rv144$year05,
                       medRisk = rv144$medRisk, highRisk = rv144$highRisk,
                       medAge = rv144$medAge, highAge = rv144$highAge)
# data.frame that sets A = 0 for everyone and keeps each W = W_i
df_a0_Wi <- data.frame(vax = 0, male = rv144$male, year04 = rv144$year04, year05 = rv144$year05,
                       medRisk = rv144$medRisk, highRisk = rv144$highRisk,
                       medAge = rv144$medAge, highAge = rv144$highAge)
# estimate of E(Y | A = 1, W = W_i)
Qbar_a1_Wi_glm <- predict(fit_vax, newdata = df_a1_Wi, type = "response")
# estimate of E(Y | A = 0, W = W_i)
Qbar_a0_Wi_glm <- predict(fit_vax, newdata = df_a0_Wi, type = "response")
# estimate of E[Y(1)]
psi_n_1_gcomp2 <- mean(Qbar_a1_Wi_glm)
psi_n_1_gcomp2
## [1] 0.005760993
# estimate of E[Y(0)]
psi_n_0_gcomp2 <- mean(Qbar_a0_Wi_glm)
psi_n_0_gcomp2 
## [1] 0.008029864
# estimate of ATE
psi_n_1_gcomp2 - psi_n_0_gcomp2
## [1] -0.002268871
# estimate of VE
1-psi_n_1_gcomp2/psi_n_0_gcomp2
## [1] 0.282554
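For reference, the quantities computed in the code above can be written as follows. Here \(\bar{Q}_n(a, w)\) denotes the fitted logistic regression's prediction of \(E(Y \mid A = a, W = w)\). This notation is an assumption chosen to match the variable names Qbar_a1_Wi_glm and Qbar_a0_Wi_glm; the course notes may use different symbols.

\[
\psi_n(a) = \frac{1}{n} \sum_{i=1}^{n} \bar{Q}_n(a, W_i), \qquad \text{VE}_n = 1 - \frac{\psi_n(1)}{\psi_n(0)}.
\]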
  1. Compute the IPTW estimator of vaccine efficacy. Use a main-terms logistic regression model for the propensity score that adjusts for sex (male), enrollment year (year04, year05), risk category (medRisk, highRisk) and age category (medAge, highAge).
# main-terms logistic regression model for the propensity score
ps_fit_vax <- glm(vax ~ male + year04 + year05 + medRisk + highRisk + medAge + highAge,
                  data = rv144, family = binomial())

# define A_i, Y_i for some notational consistency
A_i <- rv144$vax; Y_i <- rv144$out

# estimate of P(A = 1 | W = W_i)
# no need for newdata here; by default R will predict
# using the data that was used to fit the regression
g_n_a1_Wi <- predict(ps_fit_vax, type = "response")
# estimate of P(A = 0 | W = W_i)
g_n_a0_Wi <- 1 - g_n_a1_Wi
# estimate of E[Y(1)]
psi_n_1_iptw2 <- mean((A_i == 1) / g_n_a1_Wi * Y_i)
psi_n_1_iptw2
## [1] 0.005762127
# estimate of E[Y(0)]
psi_n_0_iptw2 <- mean((A_i == 0) / (g_n_a0_Wi) * Y_i)
psi_n_0_iptw2
## [1] 0.008029805
# estimate of ATE
psi_n_1_iptw2 - psi_n_0_iptw2
## [1] -0.002267678
# estimate of VE
1 - psi_n_1_iptw2 / psi_n_0_iptw2
## [1] 0.2824076
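For reference, the IPTW quantities computed in the code above can be written as follows. Here \(g_n(a \mid w)\) denotes the fitted propensity score model's estimate of \(P(A = a \mid W = w)\). This notation is an assumption chosen to match the variable names g_n_a1_Wi and g_n_a0_Wi; the course notes may use different symbols.

\[
\psi_n^{\text{IPTW}}(a) = \frac{1}{n} \sum_{i=1}^{n} \frac{I(A_i = a)}{g_n(a \mid W_i)} Y_i, \qquad \text{VE}_n^{\text{IPTW}} = 1 - \frac{\psi_n^{\text{IPTW}}(1)}{\psi_n^{\text{IPTW}}(0)}.
\]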
  1. Propose four other regression models for estimating the outcome regression. Describe these four models in terms suitable for the methods section of a manuscript. These could be other GLMs or other regression algorithms (e.g., regression trees or random forests).

  2. Propose four other regression models for estimating the propensity score. Describe these four models in terms suitable for the methods section of a manuscript.

  3. Compute the G-computation and IPTW estimators of VE using each of the regression models you proposed above. Together with the two main-terms estimates, you should now have a total of 10 estimates of VE (five G-computation and five IPTW). Create a table displaying the various estimates of VE. Comment on whether and how the estimates of VE vary as a function of the chosen regression.
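One possible way to organize the repeated calculations is to wrap each estimator in a small function and apply it to a list of fitted models. The sketch below is only an illustration of that pattern: it runs on simulated data rather than rv144 so that it is self-contained, and the names sim, ve_gcomp, ve_iptw, and ve_table are hypothetical. The two model specifications shown are placeholders, not recommended answers.

```r
# Simulated stand-in for rv144 (hypothetical data, for illustration only)
set.seed(1)
n <- 5000
sim <- data.frame(male = rbinom(n, 1, 0.6), highRisk = rbinom(n, 1, 0.3))
sim$vax <- rbinom(n, 1, 0.5)
sim$out <- rbinom(n, 1, plogis(-3 - 0.5 * sim$vax + 0.7 * sim$highRisk))

# G-computation estimate of VE from a fitted outcome regression
ve_gcomp <- function(or_fit, data) {
  Qbar_a1 <- predict(or_fit, newdata = transform(data, vax = 1), type = "response")
  Qbar_a0 <- predict(or_fit, newdata = transform(data, vax = 0), type = "response")
  1 - mean(Qbar_a1) / mean(Qbar_a0)
}

# IPTW estimate of VE from a fitted propensity score model
ve_iptw <- function(ps_fit, data) {
  g1 <- predict(ps_fit, newdata = data, type = "response")
  psi1 <- mean((data$vax == 1) / g1 * data$out)
  psi0 <- mean((data$vax == 0) / (1 - g1) * data$out)
  1 - psi1 / psi0
}

# Candidate outcome regressions and propensity score models
or_fits <- list(
  main_terms  = glm(out ~ vax + male + highRisk, data = sim, family = binomial()),
  interaction = glm(out ~ vax * highRisk + male, data = sim, family = binomial())
)
ps_fits <- list(
  main_terms  = glm(vax ~ male + highRisk, data = sim, family = binomial()),
  interaction = glm(vax ~ male * highRisk, data = sim, family = binomial())
)

# One row per model specification; gcomp uses the outcome regression,
# iptw uses the propensity score model with the same label
ve_table <- data.frame(
  model = names(or_fits),
  gcomp = sapply(or_fits, ve_gcomp, data = sim),
  iptw  = sapply(ps_fits, ve_iptw, data = sim),
  row.names = NULL
)
print(ve_table)
```

The same two helper functions would work with any model whose predict method returns probabilities, although the predict arguments may differ for non-GLM algorithms.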