| Statistic | Mean | St. Dev. | Min | Pctl(25) | Median | Pctl(75) | Max |
|---|---|---|---|---|---|---|---|
| X | 500.500 | 288.819 | 1 | 250.8 | 500.5 | 750.2 | 1,000 |
| PublicHousing | 29.370 | 8.801 | 0.000 | 23.669 | 29.827 | 35.221 | 60.000 |
| HealthStatus | 10.414 | 2.804 | 1.000 | 8.651 | 10.506 | 12.258 | 20.000 |
| Supply | 5.036 | 1.584 | 0.300 | 3.929 | 5.063 | 6.101 | 10.000 |
| WaitingTime | 25.019 | 6.139 | 5.000 | 20.922 | 25.128 | 29.400 | 46.000 |
| HealthBehavior | 12.527 | 4.078 | 0.000 | 9.788 | 12.515 | 15.318 | 25.000 |
| Stamp | 197.278 | 27.484 | 120.000 | 178.719 | 195.641 | 215.845 | 300.000 |
| ParentsHealthStatus | 9.891 | 3.280 | 1.000 | 7.733 | 9.797 | 12.111 | 20.000 |
| Age | 44.531 | 6.003 | 20.000 | 40.559 | 44.536 | 48.317 | 63.000 |
| Race | 2.500 | 1.119 | 1 | 1.8 | 2.5 | 3.2 | 4 |
| Education | 2.500 | 1.119 | 1 | 1.8 | 2.5 | 3.2 | 4 |
| MaritalStatus | 2.500 | 1.119 | 1 | 1.8 | 2.5 | 3.2 | 4 |
| Race_4 | 0.250 | 0.433 | 0 | 0 | 0 | 0.2 | 1 |
| Race_3 | 0.250 | 0.433 | 0 | 0 | 0 | 0.2 | 1 |
| Race_1 | 0.250 | 0.433 | 0 | 0 | 0 | 0.2 | 1 |
| Race_2 | 0.250 | 0.433 | 0 | 0 | 0 | 0.2 | 1 |
| MaritalStatus_3 | 0.250 | 0.433 | 0 | 0 | 0 | 0.2 | 1 |
| MaritalStatus_2 | 0.250 | 0.433 | 0 | 0 | 0 | 0.2 | 1 |
| MaritalStatus_4 | 0.250 | 0.433 | 0 | 0 | 0 | 0.2 | 1 |
| MaritalStatus_1 | 0.250 | 0.433 | 0 | 0 | 0 | 0.2 | 1 |
The mean of the dependent variable (HealthStatus) is 10.414, and the standard deviation is 2.804.
The mean of the policy variable (PublicHousing) is 29.370, and the standard deviation is 8.801.
| Dependent variable: | HealthStatus |
|---|---|
| Constant | -0.89*** |
| | (0.19) |
| PublicHousing | 0.30*** |
| | (0.002) |
| Race_1 | 0.17*** |
| | (0.06) |
| Race_2 | 0.14** |
| | (0.06) |
| Race_3 | 0.11* |
| | (0.06) |
| Race_4 | |
| Education | -0.04** |
| | (0.02) |
| Age | 0.05*** |
| | (0.004) |
| MaritalStatus_1 | -0.01 |
| | (0.06) |
| MaritalStatus_2 | 0.04 |
| | (0.06) |
| MaritalStatus_3 | -0.09 |
| | (0.06) |
| MaritalStatus_4 | |
| Observations | 1,000 |
| R2 | 0.94 |
| Adjusted R2 | 0.94 |
| Residual Std. Error | 0.67 (df = 990) |
| F Statistic | 1,830.58*** (df = 9; 990) |
| Note: | *p<0.1; **p<0.05; ***p<0.01 |
The estimated coefficient on PublicHousing in this model is 0.30, which is statistically significant at the 0.01 level. This means that an additional month of public housing assistance is associated with a 0.30-point increase in health status.
Correlated with the policy variable: The instrumental variable should have a strong correlation with the policy variable (in this case, PublicHousing). This ensures that the instrumental variable can help explain the variation in the policy variable and contribute to the identification of the causal effect of the policy variable on the dependent variable.
Correlated with the dependent variable only through the policy variable: The instrumental variable should be related to the dependent variable (in this case, HealthStatus) only through the policy variable. This means that the relationship between the instrumental variable and the dependent variable should be fully explained by the policy variable, with no other direct pathways.
Not correlated with the omitted variable: The instrumental variable should not be correlated with any omitted variables that might be affecting both the policy variable and the dependent variable. This ensures that the instrumental variable captures only the variation in the policy variable and helps address issues arising from the presence of omitted variables.
Criterion 1 (the correlation between the instrument and the policy variable) can be tested empirically using the correlation matrix.
Criterion 2 (the IV is correlated with the dependent variable only through the policy variable) cannot be fully tested empirically using the correlation matrix; it requires theoretical reasoning and background knowledge.
Criterion 3 (the IV should not be correlated with the omitted variable) generally cannot be tested empirically, because omitted variables are unobserved. In this case, however, the “omitted” variable (HealthBehavior) is included in the correlation matrix, so this criterion can be checked directly here.
In conclusion, criterion 1 can be tested empirically using the correlation matrix, criterion 2 requires theoretical reasoning and background knowledge, and criterion 3 is generally not testable empirically but can be checked in this exercise because the omitted variable appears in the matrix.
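As a minimal sketch of how criteria 1 and 3 could be read off a correlation matrix, the R snippet below builds a small simulated data set and extracts the two relevant correlations. The data-generating coefficients are invented for illustration and are not the assignment data; only the variable names are borrowed from the tables above.

```r
# Simulated data for illustration only; every coefficient below is an assumption.
set.seed(42)
n <- 5000
HealthBehavior <- rnorm(n)   # plays the role of the "omitted" variable
WaitingTime    <- rnorm(n)   # candidate instrument, drawn independently of HealthBehavior
PublicHousing  <- 0.8 * WaitingTime + 0.5 * HealthBehavior + rnorm(n)
HealthStatus   <- 0.3 * PublicHousing + 0.7 * HealthBehavior + rnorm(n)

sim <- data.frame(HealthStatus, PublicHousing, WaitingTime, HealthBehavior)
corr_matrix <- round(cor(sim), 3)
print(corr_matrix)

# Criterion 1: correlation between the instrument and the policy variable
relevance <- corr_matrix["WaitingTime", "PublicHousing"]
# Criterion 3: correlation between the instrument and the (here observed) omitted variable
exogeneity <- corr_matrix["WaitingTime", "HealthBehavior"]
print(c(relevance = relevance, exogeneity = exogeneity))
```

Criterion 2 has no counterpart in this snippet: the instrument's correlation with HealthStatus can be printed, but whether it runs only through PublicHousing still has to be argued.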
Supply:
Supply has a positive correlation with PublicHousing (r = 0.106), meeting the first criterion.
Supply has a weak correlation with HealthStatus (r = 0.065) and a positive correlation with PublicHousing (r = 0.106). Theoretically, we can argue that this correlation exists because an increase in supply of public housing leads to a higher number of people having public housing (policy variable), thus improving their health status.
Supply has a very weak correlation with HealthBehavior (r = 0.0136), which suggests that it is not correlated with the omitted variable.
WaitingTime:
WaitingTime has a positive correlation with PublicHousing (r = 0.550), meeting the first criterion.
WaitingTime has a moderate correlation with HealthStatus (r = 0.346) and a positive correlation with PublicHousing (r = 0.550). We can theorize that longer waiting times for public housing may result in individuals experiencing poorer living conditions while waiting, so that waiting time affects health status through the policy variable.
WaitingTime has a very weak negative correlation with HealthBehavior (r = -0.023), which suggests that it is not correlated with the omitted variable.
Stamp:
Stamp has a weak negative correlation with PublicHousing (r = -0.011), which does not meet the first criterion.
Stamp has a very weak negative correlation with HealthStatus (r = -0.006) and a weak negative correlation with PublicHousing (r = -0.011). The correlation between Stamp and HealthStatus cannot be explained through the policy variable.
Stamp has a very weak correlation with HealthBehavior (r = 0.027), which suggests that it is not correlated with the omitted variable.
ParentsHealthStatus:
ParentsHealthStatus has a weak positive correlation with PublicHousing (r = 0.071), meeting the first criterion.
ParentsHealthStatus has a weak positive correlation with HealthStatus (r = 0.083) and a weak positive correlation with PublicHousing (r = 0.071). We can theorize that parents’ health status might impact the need for public housing, and therefore, indirectly affect their children’s health status through the policy variable.
ParentsHealthStatus has a weak positive correlation with HealthBehavior (r = 0.103), which suggests that it is correlated with the omitted variable.
Supply and WaitingTime appear to be valid instruments based on this assessment.
| Dependent variable: PublicHousing | (1) | (2) |
|---|---|---|
| Constant | 20.861*** | 3.009 |
| | (2.459) | (2.154) |
| Supply | 0.604*** | |
| | (0.175) | |
| WaitingTime | | 0.795*** |
| | | (0.038) |
| Race_1 | 1.015 | 1.753*** |
| | (0.784) | (0.656) |
| Race_2 | -0.232 | 0.611 |
| | (0.782) | (0.654) |
| Race_3 | 0.954 | 1.297** |
| | (0.785) | (0.656) |
| Race_4 | | |
| Education | -0.237 | -0.247 |
| | (0.248) | (0.207) |
| Age | 0.119** | 0.132*** |
| | (0.046) | (0.039) |
| MaritalStatus_1 | -0.057 | -0.110 |
| | (0.784) | (0.655) |
| MaritalStatus_2 | 0.306 | 0.439 |
| | (0.785) | (0.655) |
| MaritalStatus_3 | 1.052 | 0.830 |
| | (0.786) | (0.657) |
| MaritalStatus_4 | | |
| Observations | 1,000 | 1,000 |
| R2 | 0.025 | 0.319 |
| Adjusted R2 | 0.016 | 0.313 |
| Residual Std. Error (df = 990) | 8.731 | 7.296 |
| F Statistic (df = 9; 990) | 2.802*** | 51.526*** |
| Note: | *p<0.1; **p<0.05; ***p<0.01 | |
I would choose WaitingTime as the instrumental variable.
WaitingTime has a stronger first-stage relationship with the policy variable PublicHousing (coefficient = 0.795, p < 0.01) than Supply does (coefficient = 0.604, p < 0.01). The adjusted R2 for the model with WaitingTime (0.313) is much higher than for the model with Supply (0.016), indicating a better model fit. Additionally, the overall F-statistic for the Supply model (2.802) is below 10, likely indicating a weak instrument, whereas the F-statistic for the WaitingTime model is 51.526. Thus, WaitingTime is the more suitable instrumental variable.
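The rule of thumb of 10 is usually applied to the F-statistic on the excluded instrument rather than to the overall model F. With a single instrument, that F-statistic equals the squared t-statistic of the instrument's coefficient. The sketch below is a rough check using only the rounded coefficients and standard errors from the first-stage table, so the results are approximate.

```r
# Coefficients and standard errors as reported (rounded) in the first-stage table
coef_supply  <- 0.604
se_supply    <- 0.175
coef_waiting <- 0.795
se_waiting   <- 0.038

# With one excluded instrument, the instrument F-statistic is the squared t-statistic
f_supply  <- (coef_supply / se_supply)^2
f_waiting <- (coef_waiting / se_waiting)^2

print(round(c(Supply = f_supply, WaitingTime = f_waiting), 1))
```

On this measure Supply comes out at roughly 11.9, only slightly above 10, while WaitingTime comes out at roughly 437.7, so the choice of WaitingTime is unchanged.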
| Dependent variable: HealthStatus | (1) | (2) |
|---|---|---|
| Constant | -0.885*** | 1.532* |
| | (0.185) | (0.783) |
| PublicHousing | 0.304*** | |
| | (0.002) | |
| phat_Waiting | | 0.203*** |
| | | (0.017) |
| Race_1 | 0.170*** | 0.272 |
| | (0.060) | (0.231) |
| Race_2 | 0.142** | 0.123 |
| | (0.060) | (0.230) |
| Race_3 | 0.106* | 0.208 |
| | (0.060) | (0.231) |
| Race_4 | | |
| Education | -0.039** | -0.060 |
| | (0.019) | (0.073) |
| Age | 0.053*** | 0.065*** |
| | (0.004) | (0.014) |
| MaritalStatus_1 | -0.014 | -0.025 |
| | (0.060) | (0.231) |
| MaritalStatus_2 | 0.044 | 0.089 |
| | (0.060) | (0.231) |
| MaritalStatus_3 | -0.085 | 0.022 |
| | (0.060) | (0.232) |
| MaritalStatus_4 | | |
| Observations | 1,000 | 1,000 |
| R2 | 0.943 | 0.168 |
| Adjusted R2 | 0.943 | 0.160 |
| Residual Std. Error (df = 990) | 0.671 | 2.569 |
| F Statistic (df = 9; 990) | 1,830.584*** | 22.145*** |
| Note: | *p<0.1; **p<0.05; ***p<0.01 | |
The effect of public housing on health status is positive after using the instrumental variable, with a coefficient of 0.203. It is statistically significant at p<0.01.
For each additional month spent in public housing, an individual’s health status increases by 0.203 units.
In the naive model, the coefficient for PublicHousing is 0.304, and it’s statistically significant at p<0.01. The IV model, however, estimates a smaller effect of 0.203, which is also statistically significant at p<0.01. This means that we were overestimating the effect of public housing on health status in the naive model. The IV model provides a more accurate estimate because it addresses the correlation between PublicHousing and the error term (endogeneity).
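The gap between a naive estimate and an IV estimate can be reproduced in a small simulation. The sketch below is not the assignment data: the variable names, the true effect of 0.2, and all other coefficients are assumptions chosen so that an omitted variable pushes the naive estimate upward, as in the comparison above.

```r
# Simulated data for illustration only; all coefficients are assumptions.
set.seed(123)
n <- 10000
omitted    <- rnorm(n)                                  # unobserved confounder
instrument <- rnorm(n)                                  # independent of the confounder
policy     <- 1.0 * instrument + 1.0 * omitted + rnorm(n)
outcome    <- 0.2 * policy + 1.0 * omitted + rnorm(n)   # true effect = 0.2

# Naive OLS: biased upward because `omitted` is left in the error term
naive <- lm(outcome ~ policy)

# Stage 1: predict the policy variable from the instrument
stage1 <- lm(policy ~ instrument)
policy_hat <- fitted(stage1)

# Stage 2: regress the outcome on the predicted policy variable
stage2 <- lm(outcome ~ policy_hat)

estimates <- c(naive = unname(coef(naive)["policy"]),
               iv    = unname(coef(stage2)["policy_hat"]))
print(round(estimates, 3))
```

One caveat of running the second stage by hand with `lm`: the coefficient is the two-stage least squares estimate, but the reported standard errors are not the correct 2SLS standard errors, because they are computed from residuals based on the predicted rather than the actual policy variable.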
Imagine that you are explaining your results to a journalist. You want to provide a concrete example of the effect of public housing assistance. (5 points)
| Statistic | Value |
|---|---|
| Min. | 0.00000 |
| 1st Qu. | 23.66876 |
| Median | 29.82691 |
| Mean | 29.37035 |
| 3rd Qu. | 35.22123 |
| Max. | 60.00000 |
The average individual spends 29.37 months in public housing, while an individual at the third quartile (75th percentile) spends 35.22 months.
# Coefficients from the IV model
coefficients <- second_stage_model$coefficients
# PublicHousing values
public_housing_avg <- 29.37
public_housing_top_25 <- 35.22
# Demographics (white and single are defined but not used in the predictions below)
white <- 0
diploma <- 2
age_35 <- 35
single <- 0
# Health status for average individual
health_status_avg <- coefficients["(Intercept)"] + coefficients["phat_Waiting"] * public_housing_avg + coefficients["Education"] * diploma + coefficients["Age"] * age_35
# Health status for top-25 percent individual
health_status_top_25 <- coefficients["(Intercept)"] + coefficients["phat_Waiting"] * public_housing_top_25 + coefficients["Education"] * diploma + coefficients["Age"] * age_35
# Difference in health status
diff_health_status <- health_status_top_25 - health_status_avg
# Savings
savings <- diff_health_status * 1200
savings
## (Intercept)
##    1428.045
Savings = $1,428.05
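Because both individuals share the same education and age, the intercept and control terms cancel in the difference, so the savings figure reduces to the IV coefficient times the difference in months times 1200. The sketch below redoes that arithmetic with the rounded coefficient from the table (0.203); the variable names are illustrative, and the result differs slightly from the value above because the model object holds the unrounded coefficient.

```r
# Values reported above; the IV coefficient is rounded to three decimals
iv_coef           <- 0.203
months_average    <- 29.37
months_third_qu   <- 35.22
dollars_per_point <- 1200   # multiplier used in the original calculation

extra_months  <- months_third_qu - months_average   # 5.85 additional months
health_gain   <- iv_coef * extra_months             # difference in predicted health status
savings_check <- health_gain * dollars_per_point

print(round(savings_check, 2))
```

With the rounded coefficient this gives about 1425.06, within a few dollars of the 1428.045 obtained from the fitted model.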
Will the omitted variable be correlated with the residual of stage 1? Will it be correlated with the predicted values of X1? (5 points)
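No correlation with the predicted values: the stage 1 predictions of X1 are built only from the instrument (and the included controls), and a valid instrument is uncorrelated with the omitted variable.
Strong correlation with the residuals: the stage 1 residual contains the variation in X1 that the instrument does not explain, including the part driven by the omitted variable.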
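Both answers can be illustrated with a short simulation. The sketch below assumes a valid instrument and made-up coefficients (none of these numbers come from the assignment data), then correlates the omitted variable with the stage 1 fitted values and residuals.

```r
# Simulated data for illustration only; all coefficients are assumptions.
set.seed(2024)
n <- 10000
omitted    <- rnorm(n)   # variable left out of the model
instrument <- rnorm(n)   # valid instrument: independent of `omitted`
x1 <- 1.0 * instrument + 1.0 * omitted + rnorm(n)

# Stage 1: regress the endogenous variable on the instrument
stage1 <- lm(x1 ~ instrument)

cor_fitted   <- cor(omitted, fitted(stage1))   # predicted values of X1
cor_residual <- cor(omitted, resid(stage1))    # stage 1 residuals

print(round(c(fitted = cor_fitted, residual = cor_residual), 3))
```

In this setup the correlation with the fitted values is close to zero, while the correlation with the residuals is large, because the residual absorbs the omitted variable's contribution to X1.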