Lab specs can be found here
Question: Recreate table 1 shown below,
inserting values appropriately for the three empty colums: (i) The
column labeled \(\delta_i\): please
enter each patient’s treatment effect (ii) The column labeled \(D\): the optimal treatment for this patient
(iii) The column labeled \(Y\): the
observed outcomes. Calculate the average treatment effect (ATE) and the
average treatment effect for the treated (ATT) when comparing the
outcome of the ventilators treatment with that of the
bedrest treatment and comment as to which type of
intervention is more effective on average. Finally, explain under which
conditions might SUTVA be violated
for treatments of covid-19 in the scenario described above.
# Write your code here
tribble(
~patient, ~`Y(0)`, ~`Y(1)`, ~Age, ~` delta `, ~` D `, ~` Y `,
1, 10, 1, 29, -9, 0, 10,
2, 5, 1, 35, -4, 0, 5,
3, 4, 1, 19, -3, 0, 4,
4, 6, 5, 45, -1, 0, 6,
5, 1, 5, 65, 4, 1, 5,
6, 7, 6, 50,-1, 0, 7,
7, 8, 7, 77, -1, 0,7,
8, 10, 7, 18, -3, 0, 10,
9, 2, 8, 85, 6, 1, 8,
10, 6, 9, 96, 3, 1, 9,
11, 7, 10, 77, 3, 1,10) %>%
gt()| patient | Y(0) | Y(1) | Age | delta | D | Y |
|---|---|---|---|---|---|---|
| 1 | 10 | 1 | 29 | -9 | 0 | 10 |
| 2 | 5 | 1 | 35 | -4 | 0 | 5 |
| 3 | 4 | 1 | 19 | -3 | 0 | 4 |
| 4 | 6 | 5 | 45 | -1 | 0 | 6 |
| 5 | 1 | 5 | 65 | 4 | 1 | 5 |
| 6 | 7 | 6 | 50 | -1 | 0 | 7 |
| 7 | 8 | 7 | 77 | -1 | 0 | 7 |
| 8 | 10 | 7 | 18 | -3 | 0 | 10 |
| 9 | 2 | 8 | 85 | 6 | 1 | 8 |
| 10 | 6 | 9 | 96 | 3 | 1 | 9 |
| 11 | 7 | 10 | 77 | 3 | 1 | 10 |
##prepare data
data <- data.frame(
patient = 1:11,
Y0 = c(10, 5, 4, 6, 1, 7, 8, 10, 2, 6, 7),
Y1 = c(1, 1, 1, 5, 5, 6, 7, 7, 8, 9, 10),
Age = c(29, 35, 19, 45, 65, 50, 77, 18, 85, 96, 77),
delta=c (-9, -4, -3, -1, 4, -1, -1, -3, 6, 3, 3),
D = c(0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1),
Y = c(10, 5, 4, 6, 5, 7, 8, 10, 8, 9, 10)
)
##ATE
ate <- mean(data$delta)
ate## [1] -0.5454545
## [1] 4
## patient Y0 Y1 Age delta D Y
## 1 1 10 1 29 -9 0 10
## 2 2 5 1 35 -4 0 5
## 3 3 4 1 19 -3 0 4
## 4 4 6 5 45 -1 0 6
## 5 5 1 5 65 4 1 5
## 6 6 7 6 50 -1 0 7
## 7 7 8 7 77 -1 0 8
## 8 8 10 7 18 -3 0 10
## 9 9 2 8 85 6 1 8
## 10 10 6 9 96 3 1 9
## 11 11 7 10 77 3 1 10
## Average Treatment Effect (ATE): -0.5454545
## Average Treatment Effect for the Treated (ATT): 4
Answer: [Average Treatment Effect (ATE): -0.5454545 Average Treatment Effect for the Treated (ATT): 4 A positive ATT (4) suggests that ventilators are beneficial treatment, a negative ATE suggests that when considering the entire population, the ventilators treatment might have negative inflence. ‘explain under which conditions might [SUTVA] be violated’ when people received treatment, and cut transmission of virus, the decrease the chance of virus exposure.]
Question: Calculate the simple difference in
outcomes (SDO), showing the details of your calculation. Is
the SDO a good estimation for the ATE? Finally, check
whether the SDO is equal to the sum of the ATT
and the selection bias, \(E[Y(0)|T=1] -
E[Y(0)|T=0]\).
# Write your code here
library(dplyr)
# Calculate the Simple Difference in Outcomes (SDO)
treat_mean <- mean(data$Y[data$D==1])
treat_mean## [1] 8
## [1] 7.142857
## [1] 0.8571429
## [1] -0.5454545
##selection bias
T1 <- mean(data$Y0[data$D == 1])
T0 <- mean(data$Y0[data$D == 0])
# Selection Bias
selection_bias <- T1 - T0
selection_bias## [1] -3.142857
## [1] 4
# Calculate selection bias
E_Y0_T1 <- mean(data$Y0[data$D == 1])
E_Y0_T0 <- mean(data$Y0[data$D == 0])
selection_bias <- E_Y0_T1 - E_Y0_T0
# Verify that SDO = ATT + Selection Bias
SDO_check <- ATT + selection_bias
SDO_check## [1] 0.8571429
Answer: [SDO=0.8571429 ATE=-0.5454545 ATT=4
selection bias=-3.142857 No, SDO is not a good estimator for ATE in this
context. The SDO (0.8571) and ATE (-0.5454) are significantly different,
indicating that SDO does not capture the true average treatment effect
across the entire population. SDO is equal to the sum of the
ATT and the selection bias]
Question: Compare the treatment effect for both groups: for those treated with a ventilator and for those treated with bedrest. What explains the difference in the average effect? Now compare all four measures of effects. What are the advantages and disadvantages of each? Is the ATE equal to the mean of the ATU and the ATT? Why or why not?
# Write your code here
ATE <- mean(data$delta)
# Calculate Average Treatment Effect for the Treated (ATT)
ATT <- mean(data$delta[data$D == 1])
# Calculate Average Treatment Effect for the Untreated (ATU)
ATU <- mean(data$delta[data$D == 0])
##SDO
sdo <- mean(data$Y1[data$D == 1]) - mean(data$Y0[data$D == 0])
sdo## [1] 0.8571429
## ATE: -0.5454545
## ATT: 4
## ATU : -3.142857
## SDO: 0.8571429
Answer: [ATE: -0.5454545 ATT: 4 ATU : -3.142857 SDO: 0.8571429 ATE is not equal to mean of ATT and ATU differences of treatment effects could be due to the diffent demographic characteristics of included groups. SDO did not consider confounders effect.]
The following exercises demonstrate that regression is a useful tool to estimate average outcomes and treatment effects in the different groups. Notice that in contrast to the role that regressions play in traditional statistics, here standard errors and significance are not of primary concern. Instead, we are interested in using regression to calculate average effects proper.
Question: Calculate the outcome, conditional on
getting the bedrest treatment \(\mathbb{E}[Y|D=0]\). Now estimate the
following regression, comparing the coefficients \(\alpha\) and \(\delta\) to the statistics you’ve
previously calculated. What did you find? How would you explain these
finding?
# Write your code here
##calculate expected outcome {E}[Y|D=0]
E_Y0 <- mean(data$Y[data$D==0])
cat("Expected outcome for bedrest treatment E[Y|D=0]:", E_Y0,"\n")## Expected outcome for bedrest treatment E[Y|D=0]: 7.142857
##
## Call:
## lm(formula = Y ~ D, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.143 -1.643 0.000 1.500 2.857
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.1429 0.8624 8.282 1.68e-05 ***
## D 0.8571 1.4302 0.599 0.564
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.282 on 9 degrees of freedom
## Multiple R-squared: 0.03838, Adjusted R-squared: -0.06847
## F-statistic: 0.3592 on 1 and 9 DF, p-value: 0.5637
Answer: [1. Expected outcome for bedrest treatment E[Y|D=0]: 7.142857 2. Treatment effect (delta) from regression: 0.8571429 3. The intercept from the regression should match the calculated E[Y|D=0] if the model is correctly specified. 4. The coefficient of D (delta) represents the average treatment effect (ATE) assuming linear effects and no confounding. 5. The treatment effect (delta) in the regression should be scrutinized for assumptions of linearity and equal variance across treatment groups.]
Question: Now estimate the same regression, but this time, controlling for age, again comparing, the coefficient \(\delta\) to the statistics you’ve previously calculated. What did you find? How do you explain these results?
# Write your code here
##toncrol age
regression2 <- lm(Y ~ D + Age, data = data)
summary(regression2)##
## Call:
## lm(formula = Y ~ D + Age, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.7390 -1.6630 -0.0858 1.3839 3.2811
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.35543 1.90736 3.332 0.0104 *
## D 0.01419 2.33997 0.006 0.9953
## Age 0.02019 0.04309 0.469 0.6519
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.388 on 8 degrees of freedom
## Multiple R-squared: 0.06407, Adjusted R-squared: -0.1699
## F-statistic: 0.2738 on 2 and 8 DF, p-value: 0.7673
Answer: [1. treatment effect(regression model controlling age) is 0.01419 when control age, number of D shows treatment effect difference across patients, age is a confounder ]
Question: Estimate the following three regression models. The first model is the same as the one above. The second equation is the auxiliary regression of \(D\) onto \(X_{age}\). The third equation regresses \(Y\) onto \(\tilde{D}\) which is the residual from the second equation. Compare the coefficient on \(D\) from the first equation to the coefficient on \(\tilde{D}\) in the third equation. What does this tell you about how to interpret multivariate regressions?
##
## Call:
## lm(formula = Y ~ D + Age, data = data)
##
## Coefficients:
## (Intercept) D Age
## 6.35543 0.01419 0.02019
##
## Call:
## lm(formula = D ~ Age, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.68663 -0.16289 0.04442 0.17433 0.48323
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.403323 0.236116 -1.708 0.12178
## Age 0.014155 0.003925 3.606 0.00569 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3401 on 9 degrees of freedom
## Multiple R-squared: 0.591, Adjusted R-squared: 0.5455
## F-statistic: 13 on 1 and 9 DF, p-value: 0.005694
residuals3 <- residuals(regression3)
##third equation regresses $Y$ onto $\tilde{D}$ which is the residual from the second equation
data1 <- mutate(data,D_t= residuals3)
regerssion4 <- lm(Y~D_t, data=data1)
summary(regerssion4)##
## Call:
## lm(formula = Y ~ D_t, data = data1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4565 -1.9522 0.5426 2.0429 2.5456
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.45455 0.70156 10.626 2.15e-06 ***
## D_t 0.01419 2.28040 0.006 0.995
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.327 on 9 degrees of freedom
## Multiple R-squared: 4.302e-06, Adjusted R-squared: -0.1111
## F-statistic: 3.872e-05 on 1 and 9 DF, p-value: 0.9952
Answer: [coefficient of regression 2(model1) only adjust for age, the regression 4(model 3) isoloates the component of D unrerlated with age. when compre coefficient of model 1 and model 3, it shows importance of confounders when estimating treatment effects. The disparity between these coefficients illustrates the influence of omitted variable bias when Age is not considered]