Lab specs can be found here
Question: Recreate Table 1 shown below, filling in the three empty columns: (i) the column labeled \(\delta_i\): each patient’s treatment effect; (ii) the column labeled \(D\): the optimal treatment for this patient; (iii) the column labeled \(Y\): the observed outcome. Calculate the average treatment effect (ATE) and the average treatment effect on the treated (ATT) when comparing the outcome of the ventilator treatment with that of the bedrest treatment, and comment on which type of intervention is more effective on average. Finally, explain under which conditions SUTVA might be violated for treatments of COVID-19 in the scenario described above.
# Write your code here
library(tibble) # tribble()
library(dplyr)  # %>%
library(gt)     # gt()
# delta = Y(1) - Y(0); D = 1 when delta > 0;
# Y is the observed outcome: Y(1) if D = 1, otherwise Y(0)
tribble(
  ~patient, ~`Y(0)`, ~`Y(1)`, ~Age, ~delta, ~D, ~Y,
  1, 10, 1, 29, -9, 0, 10,
  2, 5, 1, 35, -4, 0, 5,
  3, 4, 1, 19, -3, 0, 4,
  4, 6, 5, 45, -1, 0, 6,
  5, 1, 5, 65, 4, 1, 5,
  6, 7, 6, 50, -1, 0, 7,
  7, 8, 7, 77, -1, 0, 8,
  8, 10, 7, 18, -3, 0, 10,
  9, 2, 8, 85, 6, 1, 8,
  10, 6, 9, 96, 3, 1, 9,
  11, 7, 10, 77, 3, 1, 10) %>%
  gt()

| patient | Y(0) | Y(1) | Age | delta | D | Y |
|---|---|---|---|---|---|---|
| 1 | 10 | 1 | 29 | -9 | 0 | 10 |
| 2 | 5 | 1 | 35 | -4 | 0 | 5 |
| 3 | 4 | 1 | 19 | -3 | 0 | 4 |
| 4 | 6 | 5 | 45 | -1 | 0 | 6 |
| 5 | 1 | 5 | 65 | 4 | 1 | 5 |
| 6 | 7 | 6 | 50 | -1 | 0 | 7 |
| 7 | 8 | 7 | 77 | -1 | 0 | 8 |
| 8 | 10 | 7 | 18 | -3 | 0 | 10 |
| 9 | 2 | 8 | 85 | 6 | 1 | 8 |
| 10 | 6 | 9 | 96 | 3 | 1 | 9 |
| 11 | 7 | 10 | 77 | 3 | 1 | 10 |
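The three filled-in columns can also be derived from the potential outcomes instead of being typed by hand. The following is a minimal sketch in base R, assuming that a larger \(Y\) is the better outcome (which is what the \(D\) column in the table implies) and that each patient receives whichever treatment gives the higher outcome. The object name `po` is hypothetical.

```r
# Potential outcomes copied from Table 1 (Y0 = bedrest, Y1 = ventilator)
po <- data.frame(
  patient = 1:11,
  Y0 = c(10, 5, 4, 6, 1, 7, 8, 10, 2, 6, 7),
  Y1 = c(1, 1, 1, 5, 5, 6, 7, 7, 8, 9, 10)
)

# Individual treatment effect
po$delta <- po$Y1 - po$Y0

# Assumed assignment rule: ventilator (D = 1) only when it improves the outcome
po$D <- as.integer(po$delta > 0)

# Switching equation: we observe Y1 for the treated and Y0 for the untreated
po$Y <- ifelse(po$D == 1, po$Y1, po$Y0)

# ATE averages delta over everyone; ATT averages it over the treated only
ATE <- mean(po$delta)
ATT <- mean(po$delta[po$D == 1])

print(po)
cat("ATE:", ATE, "\nATT:", ATT, "\n")
```

Deriving `Y` from `D` this way guards against transcription slips in the observed-outcome column.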
Answer: [Replace this with your answer]
Question: Calculate the simple difference in outcomes (SDO), showing the details of your calculation. Is the SDO a good estimate of the ATE? Finally, check whether the SDO is equal to the sum of the ATT and the selection bias, \(E[Y(0) \mid D=1] - E[Y(0) \mid D=0]\).
# Write your code here
library(dplyr)
# Data frame based on the given table
data <- tibble(
  patient = 1:11,
  Y0 = c(10, 5, 4, 6, 1, 7, 8, 10, 2, 6, 7),
  Y1 = c(1, 1, 1, 5, 5, 6, 7, 7, 8, 9, 10),
  Age = c(29, 35, 19, 45, 65, 50, 77, 18, 85, 96, 77),
  D = c(0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1),
  Y = c(10, 5, 4, 6, 5, 7, 8, 10, 8, 9, 10)
)
# Calculate the Simple Difference in Outcomes (SDO): E[Y|D=1] - E[Y|D=0]
SDO <- data %>%
  group_by(D) %>%
  summarise(Average_Y = mean(Y), .groups = 'drop') %>%
  summarise(SDO = diff(Average_Y)) %>%
  pull(SDO)
# Calculate the Average Treatment Effect on the Treated (ATT)
ATT <- data %>%
  filter(D == 1) %>%
  summarise(ATT = mean(Y1 - Y0)) %>%
  pull(ATT)
ATT
## [1] 4
# Calculate E[Y0|D=1] and E[Y0|D=0]
E_Y0_t1 <- mean(data$Y0[data$D == 1])
E_Y0_t0 <- mean(data$Y0[data$D == 0])
E_Y0_t1
## [1] 4
E_Y0_t0
## [1] 7.142857
# Selection bias: E[Y0|D=1] - E[Y0|D=0]
selection_bias <- E_Y0_t1 - E_Y0_t0
selection_bias
## [1] -3.142857
# Check the relationship SDO = ATT + selection bias
# (all.equal() tolerates floating-point rounding, unlike ==)
relationship_check <- isTRUE(all.equal(SDO, ATT + selection_bias))
# Output results
cat("SDO: ", SDO, "\n")
## SDO: 0.8571429
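The identity being checked can be written out step by step. With \(D\) denoting the treatment indicator, adding and subtracting the unobserved counterfactual \(E[Y(0) \mid D=1]\) gives:

\[
\begin{aligned}
\text{SDO} &= E[Y \mid D=1] - E[Y \mid D=0] \\
&= E[Y(1) \mid D=1] - E[Y(0) \mid D=0] \\
&= \underbrace{E[Y(1) \mid D=1] - E[Y(0) \mid D=1]}_{\text{ATT}} + \underbrace{E[Y(0) \mid D=1] - E[Y(0) \mid D=0]}_{\text{selection bias}}
\end{aligned}
\]

With the values computed above, this reads \(0.857 \approx 4 + (-3.143)\).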
Answer: [Replace this with your answer]
Question: Compare the treatment effect for both groups: for those treated with a ventilator and for those treated with bedrest. What explains the difference in the average effect? Now compare all four measures of effects. What are the advantages and disadvantages of each? Is the ATE equal to the mean of the ATU and the ATT? Why or why not?
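One way to explore the last part of this question numerically is to compute the average treatment effect on the untreated (ATU) and compare two candidate averages against the ATE. This is a minimal base-R sketch using the values from Table 1. The names `weighted_mean` and `simple_mean` are hypothetical.

```r
# Potential outcomes and treatment assignment from Table 1
Y0 <- c(10, 5, 4, 6, 1, 7, 8, 10, 2, 6, 7)
Y1 <- c(1, 1, 1, 5, 5, 6, 7, 7, 8, 9, 10)
D  <- c(0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1)

delta <- Y1 - Y0

ATE <- mean(delta)           # everyone
ATT <- mean(delta[D == 1])   # ventilator group
ATU <- mean(delta[D == 0])   # bedrest group

# Share of patients who were treated
p_treated <- mean(D)

# Two candidate ways of combining ATT and ATU
weighted_mean <- p_treated * ATT + (1 - p_treated) * ATU
simple_mean   <- (ATT + ATU) / 2

cat("ATE:", ATE, "\n")
cat("Group-size-weighted mean of ATT and ATU:", weighted_mean, "\n")
cat("Unweighted mean of ATT and ATU:", simple_mean, "\n")
```

The two groups have different sizes (4 treated, 7 untreated), which is why the choice of weights matters here.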
Answer: [Replace this with your answer]
The following exercises demonstrate that regression is a useful tool to estimate average outcomes and treatment effects in the different groups. Notice that in contrast to the role that regressions play in traditional statistics, here standard errors and significance are not of primary concern. Instead, we are interested in using regression to calculate average effects proper.
Question: Calculate the expected outcome conditional on getting the bedrest treatment, \(\mathbb{E}[Y|D=0]\). Now estimate the following regression, comparing the coefficients \(\alpha\) and \(\delta\) to the statistics you’ve previously calculated. What did you find? How would you explain these findings?
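The regression is not written out in this text, so the following is only a sketch under a stated assumption: that the intended specification is the bivariate model \(Y_i = \alpha + \delta D_i + \varepsilon_i\). The object names `lab` and `fit` are hypothetical.

```r
# Observed outcomes and treatment assignment from Table 1
lab <- data.frame(
  Y = c(10, 5, 4, 6, 5, 7, 8, 10, 8, 9, 10),
  D = c(0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1)
)

# Conditional mean of the outcome under bedrest, E[Y | D = 0]
mean_Y_bedrest <- mean(lab$Y[lab$D == 0])

# Assumed specification: Y = alpha + delta * D + error
fit <- lm(Y ~ D, data = lab)
alpha_hat <- unname(coef(fit)["(Intercept)"])
delta_hat <- unname(coef(fit)["D"])

cat("E[Y | D = 0]:", mean_Y_bedrest, "\n")
cat("alpha-hat:   ", alpha_hat, "\n")
cat("delta-hat:   ", delta_hat, "\n")
```

With a single binary regressor, OLS simply fits the two group means, so the printed coefficients can be compared directly with the group averages and the SDO from the previous question.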
Answer: [Replace this with your answer]
Question: Now estimate the same regression, but this time controlling for age, again comparing the coefficient \(\delta\) to the statistics you’ve previously calculated. What did you find? How do you explain these results?
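A possible starting point, assuming the age-adjusted specification is \(Y_i = \alpha + \delta D_i + \beta\,Age_i + \varepsilon_i\) with age entering linearly (an assumption, since the equation is not written out here). The names `lab` and `fit_age` are hypothetical.

```r
# Observed outcomes, treatment assignment and age from Table 1
lab <- data.frame(
  Y   = c(10, 5, 4, 6, 5, 7, 8, 10, 8, 9, 10),
  D   = c(0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1),
  Age = c(29, 35, 19, 45, 65, 50, 77, 18, 85, 96, 77)
)

# Assumed specification: Y = alpha + delta * D + beta * Age + error
fit_age <- lm(Y ~ D + Age, data = lab)
delta_age <- unname(coef(fit_age)["D"])

# Benchmarks to compare against: SDO from the observed data
SDO <- mean(lab$Y[lab$D == 1]) - mean(lab$Y[lab$D == 0])

print(coef(fit_age))
cat("delta-hat controlling for age:", delta_age, "\n")
cat("SDO (no controls):            ", SDO, "\n")
```

Setting `delta_age` next to the SDO, ATE and ATT makes it easier to see how much of the raw difference is associated with age.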
Answer: [Replace this with your answer]
Question: Estimate the following three regression models. The first model is the same as the one above. The second equation is the auxiliary regression of \(D\) onto \(X_{age}\). The third equation regresses \(Y\) onto \(\tilde{D}\), the residual from the second equation. Compare the coefficient on \(D\) from the first equation to the coefficient on \(\tilde{D}\) in the third equation. What does this tell you about how to interpret multivariate regressions?
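The three steps can be sketched as follows, assuming that "the one above" refers to the regression of \(Y\) on \(D\) and age, and that all models are linear with an intercept. The names `m1`, `m2`, `m3` and `D_tilde` are hypothetical.

```r
# Observed outcomes, treatment assignment and age from Table 1
lab <- data.frame(
  Y   = c(10, 5, 4, 6, 5, 7, 8, 10, 8, 9, 10),
  D   = c(0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1),
  Age = c(29, 35, 19, 45, 65, 50, 77, 18, 85, 96, 77)
)

# (1) Multivariate regression of Y on D and Age
m1 <- lm(Y ~ D + Age, data = lab)

# (2) Auxiliary regression of D on Age
m2 <- lm(D ~ Age, data = lab)

# Residualized treatment: the part of D that Age does not linearly predict
lab$D_tilde <- resid(m2)

# (3) Regression of Y on the residualized treatment
m3 <- lm(Y ~ D_tilde, data = lab)

coef_multivariate <- unname(coef(m1)["D"])
coef_residualized <- unname(coef(m3)["D_tilde"])

cat("Coefficient on D in (1):      ", coef_multivariate, "\n")
cat("Coefficient on D_tilde in (3):", coef_residualized, "\n")
```

This is the partialling-out logic usually attributed to the Frisch-Waugh-Lovell theorem: only the variation in \(D\) left over after removing age is used to estimate its coefficient.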
Answer: [Replace this with your answer]