The perfect doctor

Lab specs can be found here

Exercise 1

Question: Recreate Table 1 shown below, inserting values appropriately for the three empty columns: (i) the column labeled \(\delta_i\): each patient’s treatment effect; (ii) the column labeled \(D\): the optimal treatment for this patient; (iii) the column labeled \(Y\): the observed outcome. Calculate the average treatment effect (ATE) and the average treatment effect on the treated (ATT) when comparing the outcome of the ventilator treatment with that of the bedrest treatment, and comment on which type of intervention is more effective on average. Finally, explain under which conditions SUTVA might be violated for treatments of COVID-19 in the scenario described above.

# Write your code here

library(tibble) # tribble()
library(dplyr)  # %>%
library(gt)     # gt()

tribble(
  ~patient, ~`Y(0)`, ~`Y(1)`, ~Age, ~`   delta   `, ~`   D   `, ~`   Y   `,
   1, 10,  1, 29, -9, 0, 10,
   2,  5,  1, 35, -4, 0,  5,
   3,  4,  1, 19, -3, 0,  4,
   4,  6,  5, 45, -1, 0,  6,
   5,  1,  5, 65,  4, 1,  5,
   6,  7,  6, 50, -1, 0,  7,
   7,  8,  7, 77, -1, 0,  8,  # D = 0 (bedrest), so Y = Y(0) = 8
   8, 10,  7, 18, -3, 0, 10,
   9,  2,  8, 85,  6, 1,  8,
  10,  6,  9, 96,  3, 1,  9,
  11,  7, 10, 77,  3, 1, 10) %>%
  gt()
patient  Y(0)  Y(1)  Age  delta  D   Y
      1    10     1   29     -9  0  10
      2     5     1   35     -4  0   5
      3     4     1   19     -3  0   4
      4     6     5   45     -1  0   6
      5     1     5   65      4  1   5
      6     7     6   50     -1  0   7
      7     8     7   77     -1  0   8
      8    10     7   18     -3  0  10
      9     2     8   85      6  1   8
     10     6     9   96      3  1   9
     11     7    10   77      3  1  10
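
The three filled-in columns can also be derived from the potential outcomes rather than typed by hand. The following is a minimal base-R sketch, not the lab's official solution. It assumes that \(Y(0)\) is the bedrest outcome, \(Y(1)\) is the ventilator outcome, and that the "perfect doctor" assigns the ventilator only when \(Y(1) > Y(0)\); that rule is inferred from the \(D\) column of the table. The object name `po` is purely illustrative.

```r
# Potential outcomes copied from Table 1
# (assumption: Y0 = bedrest outcome, Y1 = ventilator outcome)
po <- data.frame(
  patient = 1:11,
  Y0 = c(10, 5, 4, 6, 1, 7, 8, 10, 2, 6, 7),
  Y1 = c(1, 1, 1, 5, 5, 6, 7, 7, 8, 9, 10)
)

# Individual treatment effect: delta_i = Y_i(1) - Y_i(0)
po$delta <- po$Y1 - po$Y0

# Assumed assignment rule: ventilator (D = 1) only if it gives the strictly better outcome
po$D <- as.integer(po$Y1 > po$Y0)

# Switching equation: the observed outcome is the potential outcome of the assigned treatment
po$Y <- ifelse(po$D == 1, po$Y1, po$Y0)

# ATE averages delta over all patients; ATT averages it over the treated only
ATE <- mean(po$delta)
ATT <- mean(po$delta[po$D == 1])

print(po)
cat("ATE:", ATE, "\n")
cat("ATT:", ATT, "\n")
```

Deriving \(Y\) from the switching equation avoids transcription slips in the observed-outcome column.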

Answer: [Replace this with your answer]

Exercise 2

Question: Calculate the simple difference in outcomes (SDO), showing the details of your calculation. Is the SDO a good estimate of the ATE? Finally, check whether the SDO is equal to the sum of the ATT and the selection bias, \(E[Y(0)|D=1] - E[Y(0)|D=0]\).

# Write your code here
library(dplyr)

# Data frame based on the given table
data <- tibble(
  patient = 1:11,
  Y0 = c(10, 5, 4, 6, 1, 7, 8, 10, 2, 6, 7),
  Y1 = c(1, 1, 1, 5, 5, 6, 7, 7, 8, 9, 10),
  Age = c(29, 35, 19, 45, 65, 50, 77, 18, 85, 96, 77),
  D = c(0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1),
  Y = c(10, 5, 4, 6, 5, 7, 8, 10, 8, 9, 10)
)

# Calculate the Simple Difference in Outcomes (SDO)
SDO <- data %>%
  group_by(D) %>%
  summarise(Average_Y = mean(Y), .groups = 'drop') %>%
  summarise(SDO = diff(Average_Y)) %>%
  pull(SDO)

# Calculate the Average Treatment Effect on the Treated (ATT)
ATT <- data %>%
  filter(D == 1) %>%
  summarise(ATT = mean(Y1 - Y0)) %>%
  pull(ATT)
ATT
## [1] 4
# Calculate E[Y0|D=1] and E[Y0|D=0]
E_Y0_t1 <- mean(data$Y0[data$D == 1])
E_Y0_t0 <- mean(data$Y0[data$D == 0])
E_Y0_t1
## [1] 4
E_Y0_t0
## [1] 7.142857
# Selection Bias
selection_bias <- E_Y0_t1 - E_Y0_t0
selection_bias
## [1] -3.142857
# Check the relationship; all.equal() tolerates floating-point rounding, unlike ==
relationship_check <- isTRUE(all.equal(SDO, ATT + selection_bias))

# Output results
cat("SDO: ", SDO, "\n")
## SDO:  0.8571429
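
As a cross-check on the code above, the decomposition can be written out by hand using the values it prints, with \(D\) as the treatment indicator. This is a worked sketch of the arithmetic, not a substitute for the written answer:

\[
\begin{aligned}
\text{SDO} &= E[Y \mid D=1] - E[Y \mid D=0] = 8 - \tfrac{50}{7} = \tfrac{6}{7} \approx 0.857 \\
\text{ATT} &= E[Y(1) - Y(0) \mid D=1] = 4 \\
\text{Selection bias} &= E[Y(0) \mid D=1] - E[Y(0) \mid D=0] = 4 - \tfrac{50}{7} = -\tfrac{22}{7} \approx -3.143 \\
\text{ATT} + \text{Selection bias} &= 4 - \tfrac{22}{7} = \tfrac{6}{7} = \text{SDO}
\end{aligned}
\]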

Answer: [Replace this with your answer]

Exercise 3

Question: Compare the treatment effect for both groups: for those treated with a ventilator and for those treated with bedrest. What explains the difference in the average effect? Now compare all four measures of effects. What are the advantages and disadvantages of each? Is the ATE equal to the mean of the ATU and the ATT? Why or why not?

# Write your code here
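
One possible starting point, offered as a hedged base-R sketch rather than the expected solution: compute the ATU alongside the ATE and ATT, then compare the ATE with both the simple mean of ATT and ATU and the mean weighted by group shares. The vectors are copied from the table in Exercise 1; the names `p_treated`, `weighted_mean`, and `simple_mean` are illustrative.

```r
# Treatment effects and treatment assignment copied from the Exercise 1 table
delta <- c(-9, -4, -3, -1, 4, -1, -1, -3, 6, 3, 3)
D     <- c(0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1)

ATE <- mean(delta)           # all patients
ATT <- mean(delta[D == 1])   # ventilator group
ATU <- mean(delta[D == 0])   # bedrest group

# Share of patients who received the ventilator
p_treated <- mean(D)

# ATE as a share-weighted average of ATT and ATU, versus an unweighted mean
weighted_mean <- p_treated * ATT + (1 - p_treated) * ATU
simple_mean   <- (ATT + ATU) / 2

cat("ATE:", ATE, " ATT:", ATT, " ATU:", ATU, "\n")
cat("Weighted mean equals ATE:", isTRUE(all.equal(ATE, weighted_mean)), "\n")
cat("Simple mean equals ATE:  ", isTRUE(all.equal(ATE, simple_mean)), "\n")
```

The two groups have different sizes (4 treated, 7 untreated), which is why the weighting matters here.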

Answer: [Replace this with your answer]

Using regression to estimate effects

The following exercises demonstrate that regression is a useful tool to estimate average outcomes and treatment effects in the different groups. Notice that in contrast to the role that regressions play in traditional statistics, here standard errors and significance are not of primary concern. Instead, we are interested in using regression to calculate average effects proper.

Exercise 4

Question: Calculate the average outcome conditional on receiving the bedrest treatment, \(\mathbb{E}[Y|D=0]\). Now estimate the following regression, comparing the coefficients \(\alpha\) and \(\delta\) to the statistics you’ve previously calculated. What did you find? How would you explain these findings?

# Write your code here
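
The regression equation itself is not reproduced in this text, so the sketch below assumes it is the bivariate model \(Y_i = \alpha + \delta D_i + \varepsilon_i\). Under that assumption, a minimal base-R version fits the model with `lm()` and places the coefficients next to the group mean and the SDO. The data are copied from Exercise 2; `alpha_hat` and `delta_hat` are illustrative names.

```r
# Observed data copied from the Exercise 2 data frame
D <- c(0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1)
Y <- c(10, 5, 4, 6, 5, 7, 8, 10, 8, 9, 10)

# Average outcome under bedrest and the simple difference in outcomes
mean_bedrest <- mean(Y[D == 0])
SDO <- mean(Y[D == 1]) - mean(Y[D == 0])

# Assumed model: Y = alpha + delta * D + error
fit <- lm(Y ~ D)
alpha_hat <- unname(coef(fit)["(Intercept)"])
delta_hat <- unname(coef(fit)["D"])

cat("E[Y|D=0]:", mean_bedrest, " alpha_hat:", alpha_hat, "\n")
cat("SDO:     ", SDO, " delta_hat:", delta_hat, "\n")
```

With a single binary regressor, OLS reproduces the two group means, so the intercept matches \(\mathbb{E}[Y|D=0]\) and the slope matches the SDO.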

Answer: [Replace this with your answer]

Exercise 5

Question: Now estimate the same regression, but this time controlling for age, again comparing the coefficient \(\delta\) to the statistics you’ve previously calculated. What did you find? How do you explain these results?

# Write your code here
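
A hedged sketch of one way to set this up in base R, assuming the model is \(Y_i = \alpha + \delta D_i + \beta\,\text{Age}_i + \varepsilon_i\) (the equation is not shown in this text, and \(\beta\) is a symbol introduced here for illustration). It prints the coefficient on \(D\) next to the SDO, ATE, and ATT for comparison; the data are copied from the earlier exercises.

```r
# Data copied from the Exercise 2 data frame
Y0  <- c(10, 5, 4, 6, 1, 7, 8, 10, 2, 6, 7)
Y1  <- c(1, 1, 1, 5, 5, 6, 7, 7, 8, 9, 10)
Age <- c(29, 35, 19, 45, 65, 50, 77, 18, 85, 96, 77)
D   <- c(0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1)
Y   <- c(10, 5, 4, 6, 5, 7, 8, 10, 8, 9, 10)

# Benchmarks from the earlier exercises
SDO <- mean(Y[D == 1]) - mean(Y[D == 0])
ATE <- mean(Y1 - Y0)
ATT <- mean((Y1 - Y0)[D == 1])

# Assumed model: Y = alpha + delta * D + beta * Age + error
fit_age <- lm(Y ~ D + Age)
delta_age <- unname(coef(fit_age)["D"])

cat("delta (controlling for age):", delta_age, "\n")
cat("SDO:", SDO, " ATE:", ATE, " ATT:", ATT, "\n")
```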

Answer: [Replace this with your answer]

Exercise 6

Question: Estimate the following three regression models. The first model is the same as the one above. The second equation is the auxiliary regression of \(D\) onto \(X_{age}\). The third equation regresses \(Y\) onto \(\tilde{D}\), which is the residual from the second equation. Compare the coefficient on \(D\) from the first equation to the coefficient on \(\tilde{D}\) in the third equation. What does this tell you about how to interpret multivariate regressions?

# Write your code here
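
The three steps described in the question can be sketched in base R as follows. This assumes the first model is the age-controlled regression of \(Y\) on \(D\) and age from Exercise 5, since the equations themselves are not reproduced in this text. The object names (`fit_full`, `fit_aux`, `D_tilde`, `fit_resid`) are illustrative.

```r
# Data copied from the Exercise 2 data frame
Age <- c(29, 35, 19, 45, 65, 50, 77, 18, 85, 96, 77)
D   <- c(0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1)
Y   <- c(10, 5, 4, 6, 5, 7, 8, 10, 8, 9, 10)

# Model 1 (assumed): Y on D, controlling for age
fit_full <- lm(Y ~ D + Age)

# Model 2: auxiliary regression of D on age
fit_aux <- lm(D ~ Age)

# D_tilde is the part of D that age does not linearly explain
D_tilde <- resid(fit_aux)

# Model 3: Y on the residualized treatment
fit_resid <- lm(Y ~ D_tilde)

coef_full  <- unname(coef(fit_full)["D"])
coef_resid <- unname(coef(fit_resid)["D_tilde"])

cat("Coefficient on D (model 1):      ", coef_full, "\n")
cat("Coefficient on D_tilde (model 3):", coef_resid, "\n")
cat("Equal up to rounding:", isTRUE(all.equal(coef_full, coef_resid)), "\n")
```

The match between the two coefficients is an instance of the Frisch-Waugh-Lovell theorem: the multivariate coefficient on \(D\) uses only the variation in \(D\) that remains after age has been partialled out.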

Answer: [Replace this with your answer]