## Loading required package: pacman
## Warning: package 'pacman' was built under R version 3.6.3
Below I load the four dta files, assuming I’ll be merging them at some stage or will at least need portions of each at some point. Using the view command, the descriptions for the various items of data are very useful. After a skim of the research linked to this dataset, it appears that the hazardous nature of the sites are indicated by the “Mile” dataset’s ‘hrs_82’.
Sitecov is 1982 HRS sites w/ nonmissing house price data, 306 observations of which that have 1982 HRS above 28.5. Mile seems to be a similar dataset but drops 4 observations.
setwd("C:/Users/phili/Desktop/Econometrics III/coremetrics3/disc_sec/coremetrics3_discsec/Problem Sets")
sitecov <- read_dta("sitecovariates.dta")
allsites <- read_dta("allsites.dta")
allcov <- read_dta("allcovariates.dta")
mile <- read_dta("2miledata.dta")
#Found glimpse was a bit too crude to look at.
#head(sitecov)
#head(allsites)
#head(allcov)
#head(mile)
‘hrs_82’ and it’s visualized relationship with various measures available to us is of immediate interest. Introducing these measures into a scatterplot their respective median prices in both 1990(mdvalhs9) and 2000(mdvalhs0), with both looking to have quite a weak relationship.
HRS_HP90 <- ggplot(data = mile, aes(x=hrs_82, y=mdvalhs9)) +
geom_point(color="blue", alpha=0.5)
HRS_HP00 <- ggplot(data = mile, aes(x=hrs_82, y=mdvalhs0)) +
geom_point(color="blue", alpha=0.5)
HRS_HP90
HRS_HP00
#re78order <- sort(nsw$re78)
#df <- data.frame(x = re75order, y =re78order)
#ggplot(data = df, size = large) +
# geom_line(aes(x=1:nrow(df), y =x), color = "purple", size = 0.5) +
# geom_line(aes(x=1:nrow(df), y =y), color = "red", size = 0.5)+
# ylab("Earnings") +
# theme_bw()
It does not appear like HRS is correlated strongly with other Mile variables.
mile_Mat <- as_tibble(mile[,-43])
res <- cor(mile_Mat)
#install.packages("corrplot")
library(corrplot)
## Warning: package 'corrplot' was built under R version 3.6.3
## corrplot 0.84 loaded
corrplot(res, type = "upper", order = "hclust",
tl.col = "black", tl.srt = 90, tl.cex=0.6)
Houses listed on the National Priorities List prior to 2000 are associated with a 2 per cent lower value, on average, for the given sample. In terms of significance, the finding is highlight significant, suggesting this dummy variable explains little to none of the variation in our dependent variable, median house prices across tracts. The R-squared of 0.00 further highlights this point. Given that only 2.3 per cent of the overall sample is actually listed on this specific document, we may be comparing against too wide a range of the sample whereas a tighter sub-sample closer to some key threshhold region would be more appropriate.
logmedHval <- lm_robust(data=allsites, lnmdvalhs0 ~ npl2000)
export_summs(logmedHval)
Model 1 | |
(Intercept) | 11.72 *** |
(0.00) | |
npl2000 | -0.02 |
(0.02) | |
nobs | 42974 |
r.squared | 0.00 |
adj.r.squared | 0.00 |
statistic | 1.53 |
p.value | 0.22 |
df.residual | 42972.00 |
N | 42974.00 |
se_type | HC2.00 |
*** p < 0.001; ** p < 0.01; * p < 0.05. |
summary(allsites$npl2000)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00000 0.00000 0.00000 0.02292 0.00000 1.00000
Given that only 3 per cent of the overall population of tracts include NPL listings, that places 985 sites on the list. Clustering by state changes the standard error quite dramatically, which is understandable since state plays an important role in the determination of effect. Ongoing state-specific shocks would be limited to specific regions, hence it is understandable that error terms within states would be correlated with one another, being subject to such shocks. As a result, when allowing the model to account for this interaction, the state error jumps up dramatically from 0.02 to 0.05.
library(lfe)
## Warning: package 'lfe' was built under R version 3.6.3
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
##
## expand, pack, unpack
#Fit a linear model with multiple group fixed effects
FE_logmed <- felm(data=allsites, lnmdvalhs0 ~ npl2000 | 0 | 0 | statefips)
export_summs(FE_logmed)
Model 1 | |
(Intercept) | 11.72 *** |
(0.09) | |
npl2000 | -0.02 |
(0.05) | |
nobs | 42974 |
r.squared | 0.00 |
adj.r.squared | 0.00 |
sigma | 0.63 |
statistic | 1.11 |
p.value | 0.29 |
df | 42972.00 |
df.residual | 42972.00 |
logLik | -40804.46 |
AIC | 81614.92 |
BIC | 81640.93 |
*** p < 0.001; ** p < 0.01; * p < 0.05. |
Using state fixed effects at this points results in a far lower point estimate of 0.00.
#Standard OLS with housing value control using Clustered Robust Standard Errors
lm_logm_H1 <- felm(data=allsites, lnmdvalhs0 ~ npl2000 + lnmeanhs8 | 0 | 0 | statefips )
#lm_logm_H2 <- lm_robust(data=allsites, lnmdvalhs0 ~ npl2000 + lnmeanhs8 + tothsun8)
#lm_logm_H3 <- lm_robust(data=allsites, lnmdvalhs0 ~ npl2000 + lnmeanhs8 + tothsun8 + occupied80)
#lm_logm_H4 <- lm_robust(data=allsites, lnmdvalhs0 ~ npl2000 + lnmeanhs8 + tothsun8 + occupied80 + ownocc8)
export_summs(lm_logm_H1, model.names=c("LM with cluster rob. SE"))
LM with cluster rob. SE | |
(Intercept) | 2.40 *** |
(0.44) | |
npl2000 | 0.04 |
(0.02) | |
lnmeanhs8 | 0.86 *** |
(0.04) | |
nobs | 42974 |
r.squared | 0.58 |
adj.r.squared | 0.58 |
sigma | 0.41 |
statistic | 29568.13 |
p.value | 0.00 |
df | 42971.00 |
df.residual | 42971.00 |
logLik | -22208.05 |
AIC | 44424.11 |
BIC | 44458.78 |
*** p < 0.001; ** p < 0.01; * p < 0.05. |
#Fixed Effects with housing value control using Clustered Robust Standard Errors
FE_logm_H1 <- felm(data=allsites, lnmdvalhs0 ~ npl2000 + lnmeanhs8 | statefips | 0 | statefips )
export_summs(FE_logm_H1, model.names=c("Fixed Eff with cluster rob. SE"))
Fixed Eff with cluster rob. SE | |
npl2000 | -0.00 |
(0.02) | |
lnmeanhs8 | 0.81 *** |
(0.03) | |
nobs | 42974 |
r.squared | 0.70 |
adj.r.squared | 0.70 |
sigma | 0.34 |
statistic | 1906.11 |
p.value | 0.00 |
df | 42921.00 |
df.residual | 50.00 |
logLik | -15090.70 |
AIC | 30289.41 |
BIC | 30757.50 |
*** p < 0.001; ** p < 0.01; * p < 0.05. |
Applying fixed effects at this point results in the estimated coefficient falling to 0.04 and becoming insignificiant.
#Standard OLS with controls for Economic Conditions using clustered robust standard errors
lm_logm_E1 <- felm(data=allsites, lnmdvalhs0 ~ npl2000 + lnmeanhs8 | 0 | 0 | statefips)
lm_logm_E2 <- felm(data=allsites, lnmdvalhs0 ~ npl2000 + lnmeanhs8 + avhhin8 | 0 | 0 | statefips)
lm_logm_E3 <- felm(data=allsites, lnmdvalhs0 ~ npl2000 + lnmeanhs8 + avhhin8 + povrat8 | 0 | 0 | statefips)
lm_logm_E4 <- felm(data=allsites, lnmdvalhs0 ~ npl2000 + lnmeanhs8 + avhhin8 + povrat8 + unemprt8 | 0 | 0 | statefips)
lm_logm_E5 <- felm(data=allsites, lnmdvalhs0 ~ npl2000 + lnmeanhs8 + avhhin8 + povrat8 + unemprt8 + welfare8 | 0 | 0 | statefips)
export_summs(lm_logm_E1, lm_logm_E2, lm_logm_E3, lm_logm_E4, lm_logm_E5, model.names=c("Basic", "+Avg Inc", "+ Poverty Rate", "+Unemployment", "+Welfare"))
Basic | +Avg Inc | + Poverty Rate | +Unemployment | +Welfare | |
(Intercept) | 2.40 *** | 3.40 *** | 3.03 *** | 3.05 *** | 3.02 *** |
(0.44) | (0.66) | (0.68) | (0.77) | (0.71) | |
npl2000 | 0.04 | 0.05 | 0.05 * | 0.06 * | 0.06 ** |
(0.02) | (0.03) | (0.02) | (0.02) | (0.02) | |
lnmeanhs8 | 0.86 *** | 0.74 *** | 0.76 *** | 0.76 *** | 0.77 *** |
(0.04) | (0.07) | (0.07) | (0.07) | (0.07) | |
avhhin8 | 0.00 *** | 0.00 *** | 0.00 *** | 0.00 *** | |
(0.00) | (0.00) | (0.00) | (0.00) | ||
povrat8 | 0.47 * | 0.48 | -0.13 | ||
(0.21) | (0.31) | (0.34) | |||
unemprt8 | -0.08 | -0.84 | |||
(0.82) | (0.76) | ||||
welfare8 | 1.31 ** | ||||
(0.49) | |||||
nobs | 42974 | 42974 | 42974 | 42974 | 42974 |
r.squared | 0.58 | 0.60 | 0.60 | 0.60 | 0.61 |
adj.r.squared | 0.58 | 0.60 | 0.60 | 0.60 | 0.61 |
sigma | 0.41 | 0.40 | 0.39 | 0.39 | 0.39 |
statistic | 29568.13 | 21292.25 | 16203.28 | 12963.25 | 11122.40 |
p.value | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
df | 42971.00 | 42970.00 | 42969.00 | 42968.00 | 42967.00 |
df.residual | 42971.00 | 42970.00 | 42969.00 | 42968.00 | 42967.00 |
logLik | -22208.05 | -21232.65 | -21044.87 | -21043.94 | -20664.61 |
AIC | 44424.11 | 42475.30 | 42101.74 | 42101.88 | 41345.23 |
BIC | 44458.78 | 42518.64 | 42153.75 | 42162.56 | 41414.58 |
*** p < 0.001; ** p < 0.01; * p < 0.05. |
#Fixed Effects with controls for Economic Conditions using clustered robust standard errors
FE_logm_E1 <- felm(data=allsites, lnmdvalhs0 ~ npl2000 + lnmeanhs8 | statefips | 0 | statefips)
FE_logm_E2 <- felm(data=allsites, lnmdvalhs0 ~ npl2000 + lnmeanhs8 + avhhin8 | statefips | 0 | statefips)
FE_logm_E3 <- felm(data=allsites, lnmdvalhs0 ~ npl2000 + lnmeanhs8 + avhhin8 + povrat8 | statefips | 0 | statefips)
FE_logm_E4 <- felm(data=allsites, lnmdvalhs0 ~ npl2000 + lnmeanhs8 + avhhin8 + povrat8 + unemprt8 | statefips | 0 | statefips)
FE_logm_E5 <- felm(data=allsites, lnmdvalhs0 ~ npl2000 + lnmeanhs8 + avhhin8 + povrat8 + unemprt8 + welfare8 | statefips | 0 | statefips)
export_summs(FE_logm_E1, FE_logm_E2, FE_logm_E3, FE_logm_E4, FE_logm_E5, model.names=c("Basic", "+Avg Inc", "+ Poverty Rate", "+Unemployment", "+Welfare"))
Basic | +Avg Inc | + Poverty Rate | +Unemployment | +Welfare | |
npl2000 | -0.00 | 0.01 | 0.01 | 0.02 | 0.02 |
(0.02) | (0.03) | (0.03) | (0.03) | (0.02) | |
lnmeanhs8 | 0.81 *** | 0.64 *** | 0.67 *** | 0.63 *** | 0.63 *** |
(0.03) | (0.05) | (0.05) | (0.05) | (0.06) | |
avhhin8 | 0.00 *** | 0.00 *** | 0.00 *** | 0.00 *** | |
(0.00) | (0.00) | (0.00) | (0.00) | ||
povrat8 | 0.35 | 0.73 *** | 0.79 *** | ||
(0.19) | (0.20) | (0.11) | |||
unemprt8 | -1.91 *** | -1.85 *** | |||
(0.16) | (0.26) | ||||
welfare8 | -0.13 | ||||
(0.38) | |||||
nobs | 42974 | 42974 | 42974 | 42974 | 42974 |
r.squared | 0.70 | 0.72 | 0.72 | 0.73 | 0.73 |
adj.r.squared | 0.70 | 0.72 | 0.72 | 0.73 | 0.73 |
sigma | 0.34 | 0.33 | 0.33 | 0.32 | 0.32 |
statistic | 1906.11 | 2105.30 | 2084.70 | 2123.85 | 2086.46 |
p.value | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
df | 42921.00 | 42920.00 | 42919.00 | 42918.00 | 42917.00 |
df.residual | 50.00 | 50.00 | 50.00 | 50.00 | 50.00 |
logLik | -15090.70 | -13283.15 | -13145.11 | -12567.00 | -12562.56 |
AIC | 30289.41 | 26676.30 | 26402.21 | 25248.00 | 25241.13 |
BIC | 30757.50 | 27153.06 | 26887.64 | 25742.10 | 25743.89 |
*** p < 0.001; ** p < 0.01; * p < 0.05. |
Without fixed effects, and using housing characteristics, demographics and controls for economic status, the regression reports a estimated average increase of 6 per cent in the median value of housing in 2000 as a result of being on the NPL list. This effect is measured as being highly significant too, Including fixed effects drops the value down to an insignificant 0.02.
Additional controls are availalbe for whether head of household is female and further education related measures, but concerned about including too many controls arbitrarily as this could simply result in an irrelevant variable problem where the standard error becomes too large to infere significance. With controls and fixed effects, the final final regression suggests a shift to the NPL list results in an average treatment effect of 8 per cent on median property values in 2000.
#Standard OLS with controls for Economic Conditions & Demographics using wClustered Robuster Standard Errors
lm_logm_E5 <- felm(data=allsites, lnmdvalhs0 ~ npl2000 + lnmeanhs8 + avhhin8 + povrat8 + unemprt8 + welfare8)
lm_logm_ED2 <- felm(data=allsites, lnmdvalhs0 ~ npl2000 + lnmeanhs8 + avhhin8 + povrat8 + unemprt8 + welfare8 + pop_den8 | 0| 0 | statefips)
lm_logm_ED3 <- felm(data=allsites, lnmdvalhs0 ~ npl2000 + lnmeanhs8 + avhhin8 + povrat8 + unemprt8 + welfare8 + pop_den8 + shrblk8 | 0| 0 | statefips)
lm_logm_ED4 <- felm(data=allsites, lnmdvalhs0 ~ npl2000 + lnmeanhs8 + avhhin8 + povrat8 + unemprt8 + welfare8 + pop_den8 + shrblk8 + shrhsp8 | 0| 0 | statefips)
lm_logm_ED5 <- felm(data=allsites, lnmdvalhs0 ~ npl2000 + lnmeanhs8 + avhhin8 + povrat8 + unemprt8 + welfare8 + pop_den8 + shrblk8 + shrhsp8 + child8 | 0| 0 | statefips)
#lm_logm_ED6 <- lm_robust(data=allsites, lnmdvalhs0 ~ npl2000 + lnmeanhs8 + avhhin8 + povrat8 + unemprt8 + welfare8 + pop_den8 + shrblk8 + shrhsp8 + child8)
lm_logm_ED7 <- felm(data=allsites, lnmdvalhs0 ~ npl2000 + lnmeanhs8 + avhhin8 + povrat8 + unemprt8 + welfare8 + pop_den8 + shrblk8 + shrhsp8 + child8 + ba_or_better8 | 0| 0 | statefips)
export_summs(lm_logm_E5, lm_logm_ED2, lm_logm_ED3, lm_logm_ED4, lm_logm_ED5, lm_logm_ED7, model.names=c("Economic BK", "+Pop Density", "+% Black", "+% Hispanic", "+Under 18", "+% With BA"))
Economic BK | +Pop Density | +% Black | +% Hispanic | +Under 18 | +% With BA | |
(Intercept) | 3.02 *** | 3.33 *** | 3.42 *** | 3.40 *** | 3.72 *** | 4.01 *** |
(0.05) | (0.77) | (0.74) | (0.76) | (0.68) | (0.72) | |
npl2000 | 0.06 *** | 0.12 *** | 0.11 *** | 0.11 *** | 0.12 *** | 0.13 *** |
(0.01) | (0.02) | (0.02) | (0.02) | (0.02) | (0.02) | |
lnmeanhs8 | 0.77 *** | 0.73 *** | 0.72 *** | 0.72 *** | 0.71 *** | 0.68 *** |
(0.00) | (0.07) | (0.07) | (0.07) | (0.07) | (0.07) | |
avhhin8 | 0.00 *** | 0.00 *** | 0.00 *** | 0.00 *** | 0.00 *** | 0.00 *** |
(0.00) | (0.00) | (0.00) | (0.00) | (0.00) | (0.00) | |
povrat8 | -0.13 *** | -0.20 | -0.06 | -0.02 | -0.16 | -0.35 |
(0.03) | (0.30) | (0.31) | (0.21) | (0.24) | (0.24) | |
unemprt8 | -0.84 *** | -0.70 | -0.67 | -0.69 | -0.58 | -0.44 |
(0.06) | (0.66) | (0.65) | (0.60) | (0.59) | (0.57) | |
welfare8 | 1.31 *** | 0.67 | 0.96 * | 1.00 * | 1.22 ** | 1.38 ** |
(0.05) | (0.39) | (0.38) | (0.44) | (0.47) | (0.49) | |
pop_den8 | 0.00 *** | 0.00 *** | 0.00 *** | 0.00 *** | 0.00 *** | |
(0.00) | (0.00) | (0.00) | (0.00) | (0.00) | ||
shrblk8 | -0.23 *** | -0.25 ** | -0.20 * | -0.21 * | ||
(0.06) | (0.09) | (0.09) | (0.09) | |||
shrhsp8 | -0.08 | 0.03 | 0.06 | |||
(0.22) | (0.21) | (0.22) | ||||
child8 | -0.74 *** | -0.61 ** | ||||
(0.19) | (0.19) | |||||
ba_or_better8 | 0.38 ** | |||||
(0.14) | ||||||
nobs | 42974 | 42974 | 42974 | 42974 | 42974 | 42974 |
r.squared | 0.61 | 0.65 | 0.66 | 0.66 | 0.66 | 0.67 |
adj.r.squared | 0.61 | 0.65 | 0.66 | 0.66 | 0.66 | 0.67 |
sigma | 0.39 | 0.37 | 0.37 | 0.37 | 0.36 | 0.36 |
statistic | 11122.40 | 11522.38 | 10273.83 | 9141.66 | 8448.16 | 7760.40 |
p.value | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
df | 42967.00 | 42966.00 | 42965.00 | 42964.00 | 42963.00 | 42962.00 |
df.residual | 42967.00 | 42966.00 | 42965.00 | 42964.00 | 42963.00 | 42962.00 |
logLik | -20664.61 | -18097.02 | -17831.71 | -17816.91 | -17441.29 | -17292.64 |
AIC | 41345.23 | 36212.04 | 35683.42 | 35655.83 | 34906.59 | 34611.28 |
BIC | 41414.58 | 36290.06 | 35770.10 | 35751.18 | 35010.61 | 34723.97 |
*** p < 0.001; ** p < 0.01; * p < 0.05. |
#Fixed Effects with controls for Economic Conditions & Demographics using Clustered Robuster Standard Errors
FE_logm_E5 <- felm(data=allsites, lnmdvalhs0 ~ npl2000 + lnmeanhs8 + avhhin8 + povrat8 + unemprt8 + welfare8 | statefips | 0 | statefips)
FE_logm_ED2 <- felm(data=allsites, lnmdvalhs0 ~ npl2000 + lnmeanhs8 + avhhin8 + povrat8 + unemprt8 + welfare8 + pop_den8 | statefips | 0 | statefips)
FE_logm_ED3 <- felm(data=allsites, lnmdvalhs0 ~ npl2000 + lnmeanhs8 + avhhin8 + povrat8 + unemprt8 + welfare8 + pop_den8 + shrblk8 | statefips | 0 | statefips)
FE_logm_ED4 <- felm(data=allsites, lnmdvalhs0 ~ npl2000 + lnmeanhs8 + avhhin8 + povrat8 + unemprt8 + welfare8 + pop_den8 + shrblk8 + shrhsp8 | statefips | 0 | statefips)
FE_logm_ED5 <- felm(data=allsites, lnmdvalhs0 ~ npl2000 + lnmeanhs8 + avhhin8 + povrat8 + unemprt8 + welfare8 + pop_den8 + shrblk8 + shrhsp8 + child8 | statefips | 0 | statefips)
#FE_logm_ED6 <- felm(data=allsites, lnmdvalhs0 ~ npl2000 + lnmeanhs8 + avhhin8 + povrat8 + unemprt8 + welfare8 + pop_den8 + shrblk8 + shrhsp8 + child8 | statefips | 0 | statefips)
FE_logm_ED7 <- felm(data=allsites, lnmdvalhs0 ~ npl2000 + lnmeanhs8 + avhhin8 + povrat8 + unemprt8 + welfare8 + pop_den8 + shrblk8 + shrhsp8 + child8 + ba_or_better8 | statefips | 0 | statefips)
export_summs(FE_logm_E5, FE_logm_ED2, FE_logm_ED3, FE_logm_ED4, FE_logm_ED5, FE_logm_ED7, model.names=c("Economic BK", "+Pop Density", "+% Black", "+% Hispanic", "+Under 18", "+% With BA"))
Economic BK | +Pop Density | +% Black | +% Hispanic | +Under 18 | +% With BA | |
npl2000 | 0.02 | 0.06 ** | 0.06 ** | 0.06 ** | 0.07 ** | 0.08 ** |
(0.02) | (0.02) | (0.02) | (0.02) | (0.02) | (0.02) | |
lnmeanhs8 | 0.63 *** | 0.60 *** | 0.60 *** | 0.60 *** | 0.59 *** | 0.54 *** |
(0.06) | (0.04) | (0.04) | (0.04) | (0.04) | (0.04) | |
avhhin8 | 0.00 *** | 0.00 *** | 0.00 *** | 0.00 *** | 0.00 *** | 0.00 *** |
(0.00) | (0.00) | (0.00) | (0.00) | (0.00) | (0.00) | |
povrat8 | 0.79 *** | 0.63 *** | 0.67 *** | 0.67 *** | 0.54 *** | 0.32 ** |
(0.11) | (0.08) | (0.09) | (0.09) | (0.10) | (0.10) | |
unemprt8 | -1.85 *** | -1.64 *** | -1.60 *** | -1.60 *** | -1.49 *** | -1.36 *** |
(0.26) | (0.20) | (0.20) | (0.19) | (0.18) | (0.21) | |
welfare8 | -0.13 | -0.42 | -0.25 | -0.25 | -0.05 | 0.13 |
(0.38) | (0.27) | (0.25) | (0.23) | (0.24) | (0.23) | |
pop_den8 | 0.00 *** | 0.00 *** | 0.00 *** | 0.00 *** | 0.00 *** | |
(0.00) | (0.00) | (0.00) | (0.00) | (0.00) | ||
shrblk8 | -0.11 * | -0.11 * | -0.07 | -0.09 | ||
(0.05) | (0.05) | (0.06) | (0.06) | |||
shrhsp8 | -0.00 | 0.09 | 0.11 | |||
(0.11) | (0.11) | (0.10) | ||||
child8 | -0.53 *** | -0.34 ** | ||||
(0.13) | (0.12) | |||||
ba_or_better8 | 0.51 *** | |||||
(0.14) | ||||||
nobs | 42974 | 42974 | 42974 | 42974 | 42974 | 42974 |
r.squared | 0.73 | 0.75 | 0.75 | 0.75 | 0.75 | 0.75 |
adj.r.squared | 0.73 | 0.75 | 0.75 | 0.75 | 0.75 | 0.75 |
sigma | 0.32 | 0.31 | 0.31 | 0.31 | 0.31 | 0.31 |
statistic | 2086.46 | 2221.20 | 2192.82 | 2155.61 | 2151.57 | 2159.91 |
p.value | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
df | 42917.00 | 42916.00 | 42915.00 | 42914.00 | 42913.00 | 42912.00 |
df.residual | 50.00 | 50.00 | 50.00 | 50.00 | 50.00 | 50.00 |
logLik | -12562.56 | -11287.21 | -11214.03 | -11214.00 | -10973.29 | -10643.12 |
AIC | 25241.13 | 22692.41 | 22548.05 | 22550.00 | 22070.58 | 21412.25 |
BIC | 25743.89 | 23203.85 | 23068.15 | 23078.77 | 22608.02 | 21958.35 |
*** p < 0.001; ** p < 0.01; * p < 0.05. |
This would require that either the assignment of tracts to the NPL list was randomly assigned, which would likely serve to make the point of the list in the first place meaningless, or, more appropriately for this case, we have implemented the Conditional Independence Assumption (CIA).
Under CIA, supposing we have controlled for all the relevant variables that would otherwise contribte towards omitted variable bias, then we can interpret the coefficient of the NPL indicator as an unbiased estimate of the true model’s effect of NPL listings on local property values.
However, if any relevant observable or underobserable variable is omitted, then CIA cannot hold and the estimate suffers from endogeneity (a violation of the third assumption of exogeneity of our regressors, key in the application of ordinary least squares regression).
As displayed in the code below, the initial comparison of treated and control groups in the AllCov dataset displays an imbalance with NPL tracts represented by a greater percentage of occupied units, a smaller share of black and hispanic communities, more children and less degrees.
Following 1.7, I repeated the same exercise using the ‘SiteCovariates’ dataset using three definitions of treated and control groups.
Conclusion: As we converge upon the threshhold, the difference between our treated and control group narrows dramatically. This is a very intuitive result, given that sites right upon the cusp of being added to the NPL list are very likely to be next to identical to those who barely made the cut. Especially so, given the high demand for getting on these lists and the limited resources the EPA had available, this would leave the threshold region highly populated as a result. I would posit that the final subsample using observations neighboring the threshold of 28.5 is a relatively more balanced set, given the lack of significant coefficients.
#OLS using controls for Economic Conditions & Demographics with White Standard Errors
#1.6 requires check for balanced data between NPL=1 and NPL=0 for AllCov Dataset
lm_NPL_allcov <- lm_robust(data=allcov, npl2000 ~ meanhs8 + avhhin8 + povrat8 + unemprt8 + welfare8 + pop_den8 + shrblk8 + shrhsp8 + child8 + ba_or_better8)
#1.7, first bulletpoint requests a repeat of this for SiteCov Dataset
lm_NPL_sitecov <- lm_robust(data=sitecov, npl2000 ~ meanhs8 + avhhin8 + povrat8 + unemprt8 + welfare8 + pop_den8 + shrblk8 + shrhsp8 + child8 + ba_or_better8)
#Creating dummy variable for hrs_82, dummy=1 if hrs_80>=28.5, I repeat this same balance check on the binded dummies.
HRS_High <- ifelse(sitecov$hrs_82>=28.5,1,0)
sitecov_D <- cbind(sitecov,HRS_High)
#Regression using new dummy for critical value of HRS
lm_HRS_sitecov <- lm_robust(data=sitecov_D, HRS_High ~ meanhs8 + avhhin8 + povrat8 + unemprt8 + welfare8 + pop_den8 + shrblk8 + shrhsp8 + child8 + ba_or_better8)
#Filtering out high and low values for HRS to get thinner margin analysis;
sitecov_DD <- filter(sitecov, hrs_82>=16.5 & hrs_82<=40.5)
HRS_High_DD <- ifelse(sitecov_DD$hrs_82>=28.5,1,0)
sitecov_DD_R <- cbind(sitecov_DD,HRS_High_DD)
#Regression using new dummy for more concentrated critical value of HRS
lm_HRS_sitecov_DD <- lm_robust(data=sitecov_DD_R, HRS_High_DD ~ meanhs8 + avhhin8 + povrat8 + unemprt8 + welfare8 + pop_den8 + shrblk8 + shrhsp8 + child8 + ba_or_better8)
export_summs(lm_NPL_allcov, lm_NPL_sitecov, lm_HRS_sitecov, lm_HRS_sitecov_DD, model.names=c("AllCov NPL", "SiteCov NPL", "SiteCov HRS", "Reduced SiteCov HRS"))
AllCov NPL | SiteCov NPL | SiteCov HRS | Reduced SiteCov HRS | |
(Intercept) | 0.03 *** | 0.69 *** | 0.51 ** | 0.42 |
(0.00) | (0.16) | (0.17) | (0.27) | |
meanhs8 | -0.00 | -0.00 | 0.00 | 0.00 |
(0.00) | (0.00) | (0.00) | (0.00) | |
avhhin8 | -0.00 ** | -0.00 | -0.00 | -0.00 |
(0.00) | (0.00) | (0.00) | (0.00) | |
povrat8 | -0.01 | 0.15 | 0.34 | 0.49 |
(0.01) | (0.58) | (0.60) | (0.92) | |
unemprt8 | 0.04 | -1.17 | -1.17 | 0.18 |
(0.02) | (0.63) | (0.62) | (1.06) | |
welfare8 | 0.01 | 0.40 | 0.32 | -0.60 |
(0.02) | (0.60) | (0.59) | (0.85) | |
pop_den8 | -0.00 *** | -0.00 ** | -0.00 * | -0.00 |
(0.00) | (0.00) | (0.00) | (0.00) | |
shrblk8 | -0.01 *** | -0.25 | -0.30 | 0.02 |
(0.00) | (0.17) | (0.17) | (0.27) | |
shrhsp8 | -0.02 *** | 0.19 | 0.12 | 0.06 |
(0.00) | (0.18) | (0.20) | (0.52) | |
child8 | 0.03 *** | 0.27 | 0.55 | 0.60 |
(0.01) | (0.41) | (0.41) | (0.58) | |
ba_or_better8 | -0.04 *** | 1.21 ** | 1.16 ** | 1.00 |
(0.01) | (0.39) | (0.40) | (0.60) | |
nobs | 48245 | 487 | 487 | 227 |
r.squared | 0.01 | 0.07 | 0.07 | 0.04 |
adj.r.squared | 0.01 | 0.05 | 0.05 | -0.01 |
statistic | 56.40 | 4.36 | 4.45 | 1.16 |
p.value | 0.00 | 0.00 | 0.00 | 0.32 |
df.residual | 48234.00 | 476.00 | 476.00 | 216.00 |
N | 48245.00 | 487.00 | 487.00 | 227.00 |
se_type | HC2.00 | HC2.00 | HC2.00 | HC2.00 |
*** p < 0.001; ** p < 0.01; * p < 0.05. |
We assume that we can extract part of the good variation in \(\text{D}_{i}\) (using our IV, \(\text{Z}_{i}\)) and then use this good part of \(\text{D}_{i}\) to estimate the effect of \(\text{D}_{i}\) on \(\text{Y}_{i}\). This allows us to discard the bad variation in \(\text{D}_{i}\). The main concern is that \(\text{D}_{i}\) covaries with the disturbance term, hence by finding a variable uncorrelated with both the dependent variable and disturbances, we can project a fitted series of observations that stands in for \(\text{D}_{i}\) and satisfies our assumption of exogeneity for the covariate of interest.
On a basic level, this would require that the covariance between HRS_80 and our dependent variable, median house value, is zero. Additionally, HRS_82 would need to be able to generate a good fit for NPL during the first stage regression while remaining uncorrelated with the disturbance term of the model. Should these three conditions hold, we’d be left with a reasonable instrumental variable.
A cutoff of 28.5 on HRS_82 suggests a sharp regression discontinuity. It could simply be the case however that the relationship we are trying to evaluate is non-linear and a non-linearity is being mistaken as a sign of regression discontinuity. While this does not mean we assume away the non-linear case, it does imply a more careful approach is required to validate the assumption of a our study being a case of discontinuity.
We therefore assume that there is a neighborhood around the discontinuity where we have that;
\(E[Y_i|x_0-\delta<x_i<x_0] %~~% E[Y_{0i}|x_i=x_0]\)
\(E[Y_i|x_0<x_i<x_0+\delta] %~~% E[Y_{1i}|x_i=x_0]\)
This ensures that linear or non-linear, some discontinuous change in the treatment effect occurs upon this sharp threshold. We are therefore assuming that there is differing slopes and differing intercepts between the pre- and post- threshold regressions using a set of linear appromixations to estimate the sudden change in treatment effects.
\(E[Y_{0i} | X_{i} = x]\) and \(E[Y_{1i} | X_{i} = x]\) are linear in \(x\),\(E[Y_{0i}|X_{i}] = \alpha_0 + \beta_0 X_{i}\) and \(E[Y_{1i} | {X}_{i}] = \alpha_1 + \beta_1 X_{i}\)
Now treatment effects are assumed to vary with \(X_{i}\). This imples that \(E[Y_{1i}-Y_{0i}|X_{i}] = \alpha_1 - \alpha_0 + (\beta_1 - \beta_0)X_i\)
Given treatment, \(D_i\),
\[ \begin{align} E[Y_i|X_i,D_i] = D_{i} E[Y_{1i}|X_{i}] + (1- D_{i}) E[Y_{0i} | X_{i}] \end{align} \]
\[ \begin{align} = \alpha_0 + \beta_0 X_i + (\alpha_1 - \alpha_0)D_i + (\beta_1 - \beta_0)D_i X_i \end{align} \]
\[ \begin{align} =E[Y_{i}|X_{i},D_{i}]= \widetilde{\alpha} + \beta_0 \widetilde{X_{i}} + \tau D_{i} + \widetilde{\beta}D_{i}\widetilde{X_{i}} \end{align} \]
Additionally, it assumes that the discontinuity is triggered sharply in the first place, whereas it may be a specific region making the problem a Fuzzy RD. While considering multiple critical thresholds may also be another point to consider, given the design of this list, this consideration seems far less relevant.
In this case we’d be assuming ;
\[ \begin{align} 0 < \lim_{x\downarrow c} Pr(D_{i} = 1 | X_{i} = x) - \lim_{x\uparrow c} Pr(D_{i} = 1 | X_{i} = x) < 1 \end{align} \]
The first fact is more so a product of the pressure the EPA was under in terms of resources, but does help identify whyand where this cutoff was implemented. It also signals there is likely no distinguising difference between values slightly below and above 28.5 which would suggests an analysis upon the margin of the threshold would be valid.
The second fact suggests very little risk of selection bias, given that HRS was not an observed value when agents were making decisions. This allows it to function well as an IV, concerning a lack of covariance with the error term. It also is unlikely to have influenced median house values in 2000 for the reason that it was not an observable factor to price into the market. This suggests the covariance of our IV between the disturbance and the dependent variable is possibly zero. This makes HRS very enticing as a potential IV!
Lastly, the EPA documentation emphasizes potential measurement error given that HRS are an imperfect scoring measure for how hazardous any particular site was. This is also of use as it further blends observations upon the threshold together into a balanced set, as reflected by the fourth stage of analysis in question 1.7. This enhances the validity of an RD analysis. Specifically, a fuzzy RD analysis since it seemed to be up to luck who gets on either side of the threshhold, given the inaccuracy of HRS.
IV regression suggests a 31 per cent average increase in median property values.Further below I apply this same IV process manually, using the 2SLS process.
p_load(magrittr)
#mile %<>% mutate(lnmdvalhs0=log(mdvalhs0))
lnmdvalhs0 = log(mile$mdvalhs0)
mile_ln = cbind(mile, lnmdvalhs0)
IVreg <- iv_robust(lnmdvalhs0 ~ npl2000 | hrs_82 ,data=mile_ln, clusters=statefips)
#FULL SET OF CONTROLS VERSION
#IVreg <- iv_robust(lnmdvalhs0 ~ npl2000 + lnmeanhs8_nbr + tothsun8_nbr + occupied80_nbr + ownocc8_nbr + avhhin8_nbr + povrat8_nbr + unemprt8_nbr + welfare8_nbr + pop_den8_nbr + shrblk8_nbr + shrhsp8_nbr + child8_nbr +old8_nbr + ba_or_better8_nbr | hrs_82 + lnmeanhs8_nbr + tothsun8_nbr + occupied80_nbr + ownocc8_nbr + avhhin8_nbr + povrat8_nbr + unemprt8_nbr + welfare8_nbr + pop_den8_nbr + shrblk8_nbr + shrhsp8_nbr + child8_nbr +old8_nbr + ba_or_better8_nbr ,data=mile, clusters=statefips)
#Version with Fixed Effects
#IVreg <- iv_robust(lnmdvalhs0 ~ npl2000 + lnmeanhs8_nbr + tothsun8_nbr + occupied80_nbr + ownocc8_nbr + avhhin8_nbr + povrat8_nbr + unemprt8_nbr + welfare8_nbr + pop_den8_nbr + shrblk8_nbr + shrhsp8_nbr + child8_nbr +old8_nbr + ba_or_better8_nbr | hrs_82 + lnmeanhs8_nbr + tothsun8_nbr + occupied80_nbr + ownocc8_nbr + avhhin8_nbr + povrat8_nbr + unemprt8_nbr + welfare8_nbr + pop_den8_nbr + shrblk8_nbr + shrhsp8_nbr + child8_nbr +old8_nbr + ba_or_better8_nbr ,data=mile, clusters=statefips, fixed_effects=statefips)
export_summs(IVreg)
Model 1 | |
(Intercept) | 11.43 *** |
(0.11) | |
npl2000 | 0.31 ** |
(0.09) | |
nobs | 483 |
r.squared | 0.01 |
adj.r.squared | 0.01 |
df.residual | 481.00 |
N | 483.00 |
se_type | CR2.00 |
statistic | 13.17 |
p.value | 0.00 |
statistic.weakinst | |
p.value.weakinst | |
statistic.endogeneity | |
p.value.endogeneity | |
statistic.overid | |
p.value.overid | |
*** p < 0.001; ** p < 0.01; * p < 0.05. |
Using the three steps of the process, where;
Estimate the effect of the instrument \(Z_i\) on our endogenous variable \(D_i\) and (predetermined) covariates \(X_i\). Save \(\hat{D}_i\).
The reduced form regresses the outcome \(Y_i\) (LHS of the second stage) on our instrument \(Z_i\) and covariates \(X_i\) (RHS of the first stage).
Estimate model we wanted—but only using the variation in \(D_i\) that correlates with \(Z_i\), i.e. \(\hat{D}_i\).
I find the same final estimated coefficient as the IV_robust approach initally generated, confirming its validity from a coding standpoint. Further below I capture the ratio of the coefficients of \(Z_i\) to demonstrate success!
#First Stage (FULL CONTROLS)
#Reg111_FS <- lm_robust(npl2000 ~ hrs_82 + lnmeanhs8_nbr + tothsun8_nbr + occupied80_nbr + ownocc8_nbr + avhhin8_nbr + povrat8_nbr + unemprt8_nbr + welfare8_nbr + pop_den8_nbr + shrblk8_nbr + shrhsp8_nbr + child8_nbr +old8_nbr + ba_or_better8_nbr, clusters=statefips, data=mile_ln)
Reg111_FS <- lm_robust(npl2000 ~ hrs_82, clusters=statefips, data=mile_ln)
export_summs(Reg111_FS)
Model 1 | |
(Intercept) | 0.00 |
(0.04) | |
hrs_82 | 0.02 *** |
(0.00) | |
nobs | 483 |
r.squared | 0.56 |
adj.r.squared | 0.56 |
statistic | 460.21 |
p.value | 0.00 |
df.residual | 14.83 |
N | 483.00 |
se_type | CR2.00 |
*** p < 0.001; ** p < 0.01; * p < 0.05. |
#Reduced Form
Reg111_RF <- lm_robust(lnmdvalhs0 ~ hrs_82, clusters=statefips, data=mile_ln)
export_summs(Reg111_RF)
Model 1 | |
(Intercept) | 11.43 *** |
(0.11) | |
hrs_82 | 0.01 ** |
(0.00) | |
nobs | 483 |
r.squared | 0.05 |
adj.r.squared | 0.04 |
statistic | 12.70 |
p.value | 0.00 |
df.residual | 14.83 |
N | 483.00 |
se_type | CR2.00 |
*** p < 0.001; ** p < 0.01; * p < 0.05. |
#Second Stage
phat <- Reg111_FS$fitted.values
mile_ln <- cbind(mile_ln, phat)
#mile_ln <- mile_ln[,-53]
Reg111_SS <- lm_robust(lnmdvalhs0 ~ phat, clusters=statefips, data=mile_ln)
export_summs(Reg111_SS)
Model 1 | |
(Intercept) | 11.43 *** |
(0.11) | |
phat | 0.31 ** |
(0.09) | |
nobs | 483 |
r.squared | 0.05 |
adj.r.squared | 0.04 |
statistic | 12.70 |
p.value | 0.00 |
df.residual | 14.83 |
N | 483.00 |
se_type | CR2.00 |
*** p < 0.001; ** p < 0.01; * p < 0.05. |
Success! This matches the 31 per cent rate the IV_robust and manual process of 2SLS generated.
#Taking the ratio of the two estimated coefficients from the first stage and reduced form,
est_coeff <- Reg111_RF[["coefficients"]][["hrs_82"]]/Reg111_FS[["coefficients"]][["hrs_82"]]
est_coeff
## [1] 0.3091575
Applying fixed effects to the previous code, my treatment effect is now reduced by 50 per cent and is insignificant. Adding additional controls would be very important. Treatment was not randomly assigned so we must instead rely upon CIA. In this case, suggesting that CIA is satisfied with no control variables (suggesting X in our standard CIA write-up is an empty set) would likely not hold up in a rigorous acamdemic evaluation. Instead, I would include the use of housing characteristics, economic conditions and demographics more accurately trying to capture the treatment effect through the minimization of selection bias.
IVreg_FE <- iv_robust(lnmdvalhs0 ~ npl2000 | hrs_82 ,data=mile_ln, clusters=statefips, fixed_effects=statefips)
#IVreg_FE <- iv_robust(lnmdvalhs0 ~ npl2000 + lnmeanhs8_nbr + tothsun8_nbr + occupied80_nbr + ownocc8_nbr + avhhin8_nbr + povrat8_nbr + unemprt8_nbr + welfare8_nbr + pop_den8_nbr + shrblk8_nbr + shrhsp8_nbr + child8_nbr +old8_nbr + ba_or_better8_nbr | hrs_82 + lnmeanhs8_nbr + tothsun8_nbr + occupied80_nbr + ownocc8_nbr + avhhin8_nbr + povrat8_nbr + unemprt8_nbr + welfare8_nbr + pop_den8_nbr + shrblk8_nbr + shrhsp8_nbr + child8_nbr +old8_nbr + ba_or_better8_nbr ,data=mile, clusters=statefips, fixed_effects=statefips)
export_summs(IVreg_FE)
Model 1 | |
npl2000 | 0.16 |
(0.08) | |
nobs | 483 |
r.squared | 0.40 |
adj.r.squared | 0.34 |
df.residual | 442.00 |
N | 483.00 |
se_type | CR2.00 |
statistic | |
p.value | |
statistic.weakinst | |
p.value.weakinst | |
statistic.endogeneity | |
p.value.endogeneity | |
statistic.overid | |
p.value.overid | |
*** p < 0.001; ** p < 0.01; * p < 0.05. |
Applying 2SLS instead of the direct IV regression above, I find an insignificant result of 0.16 per cent again. The first 2SLS without Fixed Effects found the following values for the estimated effect of NPL listings.
Reduced Form Est 0.0062383 First-Stage Esti 0.0201783 Second-Stage Est 0.3091575
The second 2SLS with fixed effects left the estimates coefficient for the first stage of the process largely unchanged, it appears that the reduction in the reduced form estimate by approximately a half directly led to the second-stage estimate for the coefficient of interest being reduced by a half as well. This would suggest that while on average there may be no relationship between HS_82 and median house values in 2000, there may have been some specific states that inferred some specific relationship between the two items.
Additionally, it could simply be the case that including fixed effects introduced enough unique variation in the data that the spuriously small correlation our IV shares with our dependent variable drifted slightly out of frame. If this is the case, this would suggest our results are extremely sensitive when applying fixed effects to a sample of 487 observations.
Reduced Form Est 0.0032926 First-Stage Esti 0.0199812 Second-Stage Est 0.1647854
#First Stage
Reg111_FS_FE <- lm_robust(npl2000 ~ hrs_82, clusters=statefips, fixed_effects = statefips, data=mile)
export_summs(Reg111_FS_FE)
Model 1 | |
hrs_82 | 0.02 *** |
(0.00) | |
nobs | 483 |
r.squared | 0.61 |
adj.r.squared | 0.58 |
statistic | |
p.value | |
df.residual | 10.14 |
N | 483.00 |
se_type | CR2.00 |
*** p < 0.001; ** p < 0.01; * p < 0.05. |
#Reduced Form
Reg111_RF_FE <- lm_robust(lnmdvalhs0 ~ hrs_82, clusters=statefips, fixed_effects = statefips, data=mile_ln)
export_summs(Reg111_RF_FE)
Model 1 | |
hrs_82 | 0.00 |
(0.00) | |
nobs | 483 |
r.squared | 0.40 |
adj.r.squared | 0.35 |
statistic | |
p.value | |
df.residual | 10.14 |
N | 483.00 |
se_type | CR2.00 |
*** p < 0.001; ** p < 0.01; * p < 0.05. |
#Second Stage
phat_FE <- Reg111_FS_FE$fitted.values
mile_ln <- cbind(mile_ln, phat_FE)
#mile_ln <- mile_ln[,-53]
Reg111_SS_FE <- lm_robust(lnmdvalhs0 ~ phat_FE, clusters=statefips, fixed_effects = statefips, data=mile_ln)
export_summs(Reg111_SS_FE)
Model 1 | |
phat_FE | 0.16 |
(0.07) | |
nobs | 483 |
r.squared | 0.40 |
adj.r.squared | 0.35 |
statistic | |
p.value | |
df.residual | 10.14 |
N | 483.00 |
se_type | CR2.00 |
*** p < 0.001; ** p < 0.01; * p < 0.05. |
Below is the calculate of these coefficient ratios for the second-stage coefficients.
#Taking the ratio of the two estimated coefficients from the first stage and reduced form,
est_coeff <- Reg111_RF[["coefficients"]][["hrs_82"]]/Reg111_FS[["coefficients"]][["hrs_82"]]
Reg111_RF[["coefficients"]][["hrs_82"]]
## [1] 0.006238263
Reg111_FS[["coefficients"]][["hrs_82"]]
## [1] 0.02017827
est_coeff
## [1] 0.3091575
Reg111_RF_FE[["coefficients"]][["hrs_82"]]
## [1] 0.003292613
Reg111_FS_FE[["coefficients"]][["hrs_82"]]
## [1] 0.01998122
#Taking the ratio of the two estimated coefficients from the first stage and reduced form,
est_coeff2 <- Reg111_RF_FE[["coefficients"]][["hrs_82"]]/Reg111_FS_FE[["coefficients"]][["hrs_82"]]
est_coeff2
## [1] 0.1647854
Changing to use dummies for HRS instead with no fixed effects, I use the following code for the transformation.
#Similarly to 1.7, I split my data with dummy = 1 including 28.5
HRS_High_M <- ifelse(mile_ln$hrs_82>=28.5,1,0)
mile_ln_D <- cbind(mile_ln,HRS_High_M)
The results change slightly, increasing from 0.31 to 0.22 though less significant now. The IVrobust and 2SLS methods generate matching values, suggesting correct implementation.
IVreg_D <- iv_robust(lnmdvalhs0 ~ npl2000 | HRS_High_M ,data=mile_ln_D, clusters=statefips)
export_summs(IVreg_D)
Model 1 | |
(Intercept) | 11.49 *** |
(0.10) | |
npl2000 | 0.22 * |
(0.08) | |
nobs | 483 |
r.squared | 0.03 |
adj.r.squared | 0.02 |
df.residual | 481.00 |
N | 483.00 |
se_type | CR2.00 |
statistic | 8.14 |
p.value | 0.01 |
statistic.weakinst | |
p.value.weakinst | |
statistic.endogeneity | |
p.value.endogeneity | |
statistic.overid | |
p.value.overid | |
*** p < 0.001; ** p < 0.01; * p < 0.05. |
#First Stage
Reg111_FS_D <- lm_robust(npl2000 ~ HRS_High_M, clusters=statefips, data=mile_ln_D)
export_summs(Reg111_FS_D)
Model 1 | |
(Intercept) | 0.16 *** |
(0.03) | |
HRS_High_M | 0.83 *** |
(0.03) | |
nobs | 483 |
r.squared | 0.74 |
adj.r.squared | 0.74 |
statistic | 598.69 |
p.value | 0.00 |
df.residual | 15.56 |
N | 483.00 |
se_type | CR2.00 |
*** p < 0.001; ** p < 0.01; * p < 0.05. |
#Reduced Form
Reg111_RF_D <- lm_robust(lnmdvalhs0 ~ HRS_High_M, clusters=statefips, data=mile_ln_D)
export_summs(Reg111_RF_D)
Model 1 | |
(Intercept) | 11.53 *** |
(0.09) | |
HRS_High_M | 0.18 * |
(0.06) | |
nobs | 483 |
r.squared | 0.03 |
adj.r.squared | 0.03 |
statistic | 7.76 |
p.value | 0.01 |
df.residual | 15.56 |
N | 483.00 |
se_type | CR2.00 |
*** p < 0.001; ** p < 0.01; * p < 0.05. |
#Second Stage
phat_D <- Reg111_FS_D$fitted.values
mile_ln_D <- cbind(mile_ln_D, phat_D)
#mile_ln_D <- mile_ln_D[,-57]
Reg111_SS_D <- lm_robust(lnmdvalhs0 ~ phat_D, clusters=statefips, data=mile_ln_D)
export_summs(Reg111_SS_D)
Model 1 | |
(Intercept) | 11.49 *** |
(0.10) | |
phat_D | 0.22 * |
(0.08) | |
nobs | 483 |
r.squared | 0.03 |
adj.r.squared | 0.03 |
statistic | 7.76 |
p.value | 0.01 |
df.residual | 15.59 |
N | 483.00 |
se_type | CR2.00 |
*** p < 0.001; ** p < 0.01; * p < 0.05. |
Firstly, for this sharp RD the running variable must be separated into an assortment of bins. The min value of HRS is 0.00 and the max value is 74.16 in the mile dataset (483 obs), the larger samples are unavailable due to reasons including missing house price data and no HRS information being available. For the sake of distinct bins, I set a max of 74.1 and split my bins into thirteen bins, each representing a 5.7 interval of HRS scores from 0 to 74.1. To pinpoint each average to its respective bin, I run the midpoint of HRS bins across the x-axis, resulting in a balanced visualization of the potential discontinuities. The red line is held fixed at HRS 28.5 to highlight our crossing point wherein no overlap occurs.
#Data: mile_ln, Dependent: lnmdvalhs0, Regressor: npl2000 RunningVariable: hrs_82
# Bins:
split <- 5.7
bin <- seq(0, 74.1, split)
midpoint <- seq(2.85, 71.25, split)
#2. Place each observation into a block via its fitted p-score.
bin_HRS = cut(mile_ln$hrs_82, breaks = bin, labels = FALSE)
mile_ln_bin <- cbind(mile_ln, bin_HRS)
avg_Y <- aggregate(mile_ln_bin$lnmdvalhs0_nbr, list(mile_ln_bin$bin_HRS), mean)
avg_Y_bin <- cbind(avg_Y,midpoint)
# The outcome variable vs. the running variable
Y_RD <- ggplot(data=avg_Y_bin, aes(y= x, x=midpoint)) +
geom_point(color="blue", alpha=0.5)+
geom_hline(yintercept = 10, size = 0.1) +
geom_vline(xintercept = 28.5, color = "red", size = 1)+
xlab("Hazardous Ranking System Score") + ylab("Average Outcome")
#+
#scale_y_continuous(TeX("$Y_k$")) +
#scale_x_continuous(TeX("$HRS_82$"))
Y_RD
avg_X <- aggregate(mile_ln_bin$npl2000, list(mile_ln_bin$bin_HRS), mean)
avg_X_bin <- cbind(avg_X,midpoint)
# The treatment variable vs. the running variable
X_RD <- ggplot(data=avg_X_bin, aes(y= x, x=midpoint)) +
geom_point(color="orange", alpha=0.8)+
geom_hline(yintercept = 0, size = 0.1) +
geom_vline(xintercept = 28.5, color = "red", size = 1)+
xlab("Hazardous Ranking System Score") + ylab("Average Treatment")
X_RD
# Covariates vs. the running variable
#(i)Example of Housing Characteristics: Occupancy Rate, occupied80_nbr
avg_H <- aggregate(mile_ln_bin$occupied80_nbr, list(mile_ln_bin$bin_HRS), mean)
avg_H_bin <- cbind(avg_H,midpoint)
H_RD <- ggplot(data=avg_H_bin, aes(y= x, x=midpoint)) +
geom_point(color="purple", alpha=0.8)+
geom_hline(yintercept = 0, size = 0.1) +
geom_vline(xintercept = 28.5, color = "red", size = 1)+
xlab("Hazardous Ranking System Score") + ylab("Average Occupancy Rate")
H_RD
#(ii)Example of Economic Conditions: Average Income, avhhin8_nbr
avg_E <- aggregate(mile_ln_bin$avhhin8_nbr, list(mile_ln_bin$bin_HRS), mean)
avg_E_bin <- cbind(avg_E,midpoint)
E_RD <- ggplot(data=avg_E_bin, aes(y= x, x=midpoint)) +
geom_point(color="purple", alpha=0.8)+
geom_hline(yintercept = 0, size = 0.1) +
geom_vline(xintercept = 28.5, color = "red", size = 1)+
xlab("Hazardous Ranking System Score") + ylab("Average Income")
E_RD
#(iii)Example of Demographics: Percentage of Population, Black, shrblk8_nbr
avg_D <- aggregate(mile_ln_bin$shrblk8_nbr, list(mile_ln_bin$bin_HRS), mean)
avg_D_bin <- cbind(avg_D,midpoint)
D_RD <- ggplot(data=avg_D_bin, aes(y= x, x=midpoint)) +
geom_point(color="purple", alpha=0.8)+
geom_hline(yintercept = 0, size = 0.1) +
geom_vline(xintercept = 28.5, color = "red", size = 1)+
xlab("Hazardous Ranking System Score") + ylab("Average Share of Black Pop")
D_RD
# Bin counts vs. the running variable
#Counting Observations per Bin
obs_df0 <- mile_ln_bin %>%
group_by(bin_HRS) %>%
summarise(number = n())
#Dropping the 2 NA's above 74.1 score (max was 74.16)
obs_df1 <- obs_df0[-14,]
avg_N_bin <- cbind(obs_df1,midpoint)
N_RD <- ggplot(data=avg_N_bin, aes(y= number, x=midpoint)) +
geom_point(color="blue", alpha=0.8)+
geom_hline(yintercept = 0, size = 0.1) +
geom_vline(xintercept = 28.5, color = "red", size = 1)+
xlab("Hazardous Ranking System Score") + ylab("Average Number of Tracts (Bin Counts)")
N_RD
While there is a smoothness to the covariates and distinct discontinuous jump with respect to the treatment variable, there is no discontinuous jump visible when taking the simple average of outcomes across the thirteen bins prepared. This seems to be supportive of the conclusions the models have reached in terms of the effect being insignificant when controlling for the appropriate factors and including fixed effects. This further supports the possibility that Superfund cleanup effects are economically small and statistically insignificant with respect to residential property values.
Switching back to the a restricted subset of the data as in 1.13, I use the code displayed below. Mirroring 1.13, I leave out fixed effects initially and stick to the same set of controls as before.
#Similarly to 1.7, I filter then split my data with dummy = 1 including 28.5
mile_ln_DD <- filter(mile_ln, hrs_82>=16.5 & hrs_82<=40.5)
HRS_Thresh_DD <- ifelse(mile_ln_DD$hrs_82>=28.5,1,0)
mile_ln_DD_R <- cbind(mile_ln_DD,HRS_Thresh_DD)
The results change from previous findings of signficance in the treatment effect. I estimate a 6 per cent increase in housing value but the result is measured as highly insignificant. Below code is displayed for both the IV regression and 2SLS process to highlight the matching results. In the next portion of code, I reintroduce fixed effects and a set of controls for housing characteristics, economic status and demographics to emphasize the importance of controlling for omitted variable bias.
IVreg_DD <- iv_robust(lnmdvalhs0 ~ npl2000 | HRS_Thresh_DD ,data=mile_ln_DD_R, clusters=statefips)
export_summs(IVreg_DD)
Model 1 | |
(Intercept) | 11.56 *** |
(0.09) | |
npl2000 | 0.06 |
(0.09) | |
nobs | 226 |
r.squared | -0.00 |
adj.r.squared | -0.01 |
df.residual | 224.00 |
N | 226.00 |
se_type | CR2.00 |
statistic | 0.40 |
p.value | 0.53 |
statistic.weakinst | |
p.value.weakinst | |
statistic.endogeneity | |
p.value.endogeneity | |
statistic.overid | |
p.value.overid | |
*** p < 0.001; ** p < 0.01; * p < 0.05. |
#First Stage
Reg111_FS_DD <- lm_robust(npl2000 ~ HRS_Thresh_DD, clusters=statefips, data=mile_ln_DD_R)
export_summs(Reg111_FS_DD)
Model 1 | |
(Intercept) | 0.27 *** |
(0.04) | |
HRS_Thresh_DD | 0.72 *** |
(0.05) | |
nobs | 226 |
r.squared | 0.59 |
adj.r.squared | 0.58 |
statistic | 242.14 |
p.value | 0.00 |
df.residual | 14.98 |
N | 226.00 |
se_type | CR2.00 |
*** p < 0.001; ** p < 0.01; * p < 0.05. |
#Reduced Form
Reg111_RF_DD <- lm_robust(lnmdvalhs0 ~ HRS_Thresh_DD, clusters=statefips, data=mile_ln_DD_R)
export_summs(Reg111_RF_DD)
Model 1 | |
(Intercept) | 11.57 *** |
(0.08) | |
HRS_Thresh_DD | 0.04 |
(0.06) | |
nobs | 226 |
r.squared | 0.00 |
adj.r.squared | -0.00 |
statistic | 0.39 |
p.value | 0.54 |
df.residual | 14.98 |
N | 226.00 |
se_type | CR2.00 |
*** p < 0.001; ** p < 0.01; * p < 0.05. |
#Second Stage
phat_DD <- Reg111_FS_DD$fitted.values
mile_ln_DD_R <- cbind(mile_ln_DD_R, phat_DD)
#mile_ln <- mile_ln[,-53]
Reg111_SS_DD <- lm_robust(lnmdvalhs0 ~ phat_DD, clusters=statefips, data=mile_ln_DD_R)
export_summs(Reg111_SS_DD)
Model 1 | |
(Intercept) | 11.56 *** |
(0.09) | |
phat_DD | 0.06 |
(0.09) | |
nobs | 226 |
r.squared | 0.00 |
adj.r.squared | -0.00 |
statistic | 0.39 |
p.value | 0.54 |
df.residual | 15.10 |
N | 226.00 |
se_type | CR2.00 |
*** p < 0.001; ** p < 0.01; * p < 0.05. |
Regarding choice controls, there are a few changes I would suggest making. Firstly, I would elect to keep fixed effects involved. As Greenstone & Gallagher highlight in their paper, the majority of cases of HRS scores above 28.5 are clustered around the northeastern states of the US. The unbalanced distribution of sites across states introduced concerns to them about the presence of localized market shocks, hence by including fixed effects such constant shocks within groups are dismissed. As displayed in the initial portion of the code below, running a simple regression without fixed effects leads to a completely sparse model in terms of explanatory power. Including fixed effects results in the R-squared term jumping from approximately zero to 40 percent.
I must also highlight the fact that I included housing characteristics in the regression throughout the analysis as is the norm for hedonic housing approaches, although it was not specifically requested by the problem set. From my previous work experience, I had an opportunity to review the Irish housing price index and finalized a working paper for the Central Statistics Office (CSO) on the matter. I believe housing characteristics are a vital part of this hedonic regression, when trying to identify the effect of environmental quality on housing values. It allows a far clearer comparison of like-for-like across various properties.
I re-run estimates upon the given threshhold with fixed effects active and additional housing characteristic controls implemented in a manner that avoids perfect multicollinearity (number of units, occpuancy of units, number of bedrooms, percentage of detached & attached homes, housing age, percentage without full kitchens, no heating, and those without a full bathroom). Ultimately the coefficient remains in the single digit region, 7 per cent, but now the standard error has climbed dramatically, furthering the insignificance of the result. This is likely a result of some non-perfect multicollineary, wherein the correlation between our fitted value and the now larger set of controls has contributed towards an increased standard error. This is a common issue of loading in too many variables into a regression, as irrelevant controls not only poorly explain the variation in \(Y_i\) but also introduce multicollinearity upon our coefficient of interest, effectively concealing the significance of our estimated coefficient.
Regardless, even when using far lower counts of coefficients but including fixed effects, the previous work done has demonstrated that the effect of being placed on the NPL list is consistently insignificant.
IVreg_DD_FE_0 <- iv_robust(lnmdvalhs0 ~ npl2000 + lnmeanhs8_nbr + avhhin8_nbr + povrat8_nbr + unemprt8_nbr + welfare8_nbr + pop_den8_nbr + shrblk8_nbr + shrhsp8_nbr + child8_nbr + ba_or_better8_nbr | HRS_Thresh_DD + lnmeanhs8_nbr + avhhin8_nbr + povrat8_nbr + unemprt8_nbr + welfare8_nbr + pop_den8_nbr + shrblk8_nbr + shrhsp8_nbr + child8_nbr+ ba_or_better8_nbr ,data=mile_ln_DD_R, clusters=statefips, fixed_effects = statefips)
# + Controls for Number of Units
IVreg_DD_FE_1 <- iv_robust(lnmdvalhs0 ~ npl2000 + lnmeanhs8_nbr + tothsun8_nbr + occupied80_nbr + ownocc8_nbr + avhhin8_nbr + povrat8_nbr + unemprt8_nbr + welfare8_nbr + pop_den8_nbr + shrblk8_nbr + shrhsp8_nbr + child8_nbr + ba_or_better8_nbr | HRS_Thresh_DD + lnmeanhs8_nbr + tothsun8_nbr + occupied80_nbr + ownocc8_nbr + avhhin8_nbr + povrat8_nbr + unemprt8_nbr + welfare8_nbr + pop_den8_nbr + shrblk8_nbr + shrhsp8_nbr + child8_nbr + ba_or_better8_nbr ,data=mile_ln_DD_R, clusters=statefips, fixed_effects = statefips)
#+ Controls for Attached/Detached, No Full Kitchen, Fire Stove Heating
IVreg_DD_FE_2 <- iv_robust(lnmdvalhs0 ~ npl2000 + lnmeanhs8_nbr + tothsun8_nbr + occupied80_nbr + ownocc8_nbr + avhhin8_nbr + povrat8_nbr + unemprt8_nbr + welfare8_nbr + pop_den8_nbr + shrblk8_nbr + shrhsp8_nbr + child8_nbr + ba_or_better8_nbr + firestoveheat80_nbr + nofullkitchen80_nbr + detach80occ_nbr + attach80occ_nbr | HRS_Thresh_DD + lnmeanhs8_nbr + tothsun8_nbr + occupied80_nbr + ownocc8_nbr + avhhin8_nbr + povrat8_nbr + unemprt8_nbr + welfare8_nbr + pop_den8_nbr + shrblk8_nbr + shrhsp8_nbr + child8_nbr + ba_or_better8_nbr + firestoveheat80_nbr + nofullkitchen80_nbr + detach80occ_nbr + attach80occ_nbr,data=mile_ln_DD_R, clusters=statefips, fixed_effects = statefips)
# + Controls for Number of Bedrooms
IVreg_DD_FE_3 <- iv_robust(lnmdvalhs0 ~ npl2000 + lnmeanhs8_nbr + tothsun8_nbr + occupied80_nbr + ownocc8_nbr + avhhin8_nbr + povrat8_nbr + unemprt8_nbr + welfare8_nbr + pop_den8_nbr + shrblk8_nbr + shrhsp8_nbr + child8_nbr + ba_or_better8_nbr + firestoveheat80_nbr + nofullkitchen80_nbr+bedrms1_80occ_nbr +bedrms2_80occ_nbr + bedrms3_80occ_nbr + bedrms4_80occ_nbr + bedrms5_80occ_nbr + detach80occ_nbr + attach80occ_nbr | HRS_Thresh_DD + lnmeanhs8_nbr + tothsun8_nbr + occupied80_nbr + ownocc8_nbr + avhhin8_nbr + povrat8_nbr + unemprt8_nbr + welfare8_nbr + pop_den8_nbr + shrblk8_nbr + shrhsp8_nbr + child8_nbr + ba_or_better8_nbr + firestoveheat80_nbr + nofullkitchen80_nbr + bedrms1_80occ_nbr+ bedrms2_80occ_nbr + bedrms3_80occ_nbr + bedrms4_80occ_nbr + bedrms5_80occ_nbr + detach80occ_nbr + attach80occ_nbr,data=mile_ln_DD_R, clusters=statefips, fixed_effects = statefips)
# + Controls for Year Built
IVreg_DD_FE_4 <- iv_robust(lnmdvalhs0 ~ npl2000 + lnmeanhs8_nbr + tothsun8_nbr + occupied80_nbr + ownocc8_nbr + avhhin8_nbr + povrat8_nbr + unemprt8_nbr + welfare8_nbr + pop_den8_nbr + shrblk8_nbr + shrhsp8_nbr + child8_nbr + ba_or_better8_nbr + firestoveheat80_nbr + nofullkitchen80_nbr+bedrms1_80occ_nbr +bedrms2_80occ_nbr + bedrms3_80occ_nbr + bedrms4_80occ_nbr + bedrms5_80occ_nbr + detach80occ_nbr + attach80occ_nbr + blt2_5yrs80occ_nbr + blt6_10yrs80occ_nbr + blt10_20yrs80occ_nbr + blt20_30yrs80occ_nbr + blt30_40yrs80occ_nbr + blt40_yrs80occ_nbr | HRS_Thresh_DD + lnmeanhs8_nbr + tothsun8_nbr + occupied80_nbr + ownocc8_nbr + avhhin8_nbr + povrat8_nbr + unemprt8_nbr + welfare8_nbr + pop_den8_nbr + shrblk8_nbr + shrhsp8_nbr + child8_nbr + ba_or_better8_nbr + firestoveheat80_nbr + nofullkitchen80_nbr + bedrms1_80occ_nbr+ bedrms2_80occ_nbr + bedrms3_80occ_nbr + bedrms4_80occ_nbr + bedrms5_80occ_nbr + detach80occ_nbr + attach80occ_nbr + blt2_5yrs80occ_nbr + blt6_10yrs80occ_nbr + blt10_20yrs80occ_nbr + blt20_30yrs80occ_nbr + blt30_40yrs80occ_nbr + blt40_yrs80occ_nbr,data=mile_ln_DD_R, clusters=statefips, fixed_effects = statefips)
#IVreg_DD_1 <- iv_robust(lnmdvalhs0 ~ npl2000 | HRS_Thresh_DD ,data=mile_ln_DD_R, clusters=statefips)
#IVreg_DD_2 <- iv_robust(lnmdvalhs0 ~ npl2000 | HRS_Thresh_DD ,data=mile_ln_DD_R, clusters=statefips, fixed_effects = statefips)
#export_summs(IVreg_DD_1,IVreg_DD_2)
export_summs(IVreg_DD, IVreg_DD_FE_0, IVreg_DD_FE_1, IVreg_DD_FE_2, IVreg_DD_FE_3, IVreg_DD_FE_4, model.names = c("NO FE of Housing","+Fixed Effects & Econ/Demo", "+Number of Units", "+Features", "+Number of Bedrooms", "+Years Built"))
NO FE of Housing | +Fixed Effects & Econ/Demo | +Number of Units | +Features | +Number of Bedrooms | +Years Built | |
(Intercept) | 11.56 *** | |||||
(0.09) | ||||||
npl2000 | 0.06 | 0.09 | 0.08 | 0.08 | 0.08 | 0.07 |
(0.09) | (0.06) | (0.06) | (0.06) | (0.06) | (0.06) | |
lnmeanhs8_nbr | 0.40 | 0.37 * | 0.32 * | 0.33 | 0.30 | |
(0.16) | (0.14) | (0.12) | (0.14) | (0.13) | ||
avhhin8_nbr | 0.00 * | 0.00 * | 0.00 ** | 0.00 ** | 0.00 *** | |
(0.00) | (0.00) | (0.00) | (0.00) | (0.00) | ||
povrat8_nbr | 2.07 | 1.37 | 0.72 | 0.62 | 0.87 | |
(0.96) | (1.05) | (1.22) | (1.10) | (1.06) | ||
unemprt8_nbr | -2.55 | -2.99 | -3.05 | -2.91 | -2.97 | |
(1.63) | (1.58) | (1.68) | (1.62) | (1.39) | ||
welfare8_nbr | -2.54 | -2.12 | -1.80 | -1.71 | -1.47 | |
(1.29) | (1.20) | (1.31) | (1.29) | (1.19) | ||
pop_den8_nbr | -0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
(0.00) | (0.00) | (0.00) | (0.00) | (0.00) | ||
shrblk8_nbr | -0.36 | -0.15 | -0.04 | -0.09 | 0.02 | |
(0.23) | (0.30) | (0.29) | (0.29) | (0.26) | ||
shrhsp8_nbr | 0.50 | 0.25 | 0.37 | 0.34 | 0.49 | |
(0.64) | (0.63) | (0.71) | (0.73) | (0.72) | ||
child8_nbr | 1.73 | 1.25 | 0.92 | 1.43 | 0.69 | |
(0.77) | (0.62) | (0.46) | (0.77) | (0.75) | ||
ba_or_better8_nbr | -0.20 | -0.32 | -0.13 | 0.14 | 0.08 | |
(0.42) | (0.34) | (0.32) | (0.29) | (0.36) | ||
tothsun8_nbr | 0.00 | -0.00 | -0.00 | -0.00 | ||
(0.00) | (0.00) | (0.00) | (0.00) | |||
occupied80_nbr | -0.09 | 0.08 | 0.07 | 0.92 | ||
(1.15) | (1.25) | (1.33) | (1.08) | |||
ownocc8_nbr | -0.00 ** | -0.00 | -0.00 | -0.00 | ||
(0.00) | (0.00) | (0.00) | (0.00) | |||
firestoveheat80_nbr | -0.24 | -0.30 | -0.04 | |||
(0.43) | (0.41) | (0.37) | ||||
nofullkitchen80_nbr | 1.37 | 1.39 | 0.92 | |||
(1.25) | (1.19) | (1.08) | ||||
detach80occ_nbr | -0.78 * | -0.57 | -0.29 | |||
(0.33) | (0.37) | (0.44) | ||||
attach80occ_nbr | -1.50 ** | -1.31 * | -1.20 | |||
(0.37) | (0.43) | (0.53) | ||||
bedrms1_80occ_nbr | 5.71 | 9.92 | ||||
(8.13) | (8.66) | |||||
bedrms2_80occ_nbr | 5.22 | 9.56 | ||||
(7.93) | (8.49) | |||||
bedrms3_80occ_nbr | 5.03 | 9.57 | ||||
(7.89) | (8.56) | |||||
bedrms4_80occ_nbr | 4.50 | 8.88 | ||||
(7.87) | (8.41) | |||||
bedrms5_80occ_nbr | 5.30 | 9.22 | ||||
(7.88) | (8.31) | |||||
blt2_5yrs80occ_nbr | -2.00 | |||||
(1.93) | ||||||
blt6_10yrs80occ_nbr | -0.80 | |||||
(1.50) | ||||||
blt10_20yrs80occ_nbr | -1.89 | |||||
(1.63) | ||||||
blt20_30yrs80occ_nbr | -1.90 | |||||
(1.42) | ||||||
blt30_40yrs80occ_nbr | -1.69 | |||||
(2.10) | ||||||
blt40_yrs80occ_nbr | -1.45 | |||||
(1.48) | ||||||
nobs | 226 | 226 | 226 | 226 | 226 | 226 |
r.squared | -0.00 | 0.77 | 0.79 | 0.81 | 0.81 | 0.82 |
adj.r.squared | -0.01 | 0.71 | 0.73 | 0.75 | 0.74 | 0.75 |
df.residual | 224.00 | 178.00 | 175.00 | 171.00 | 166.00 | 160.00 |
N | 226.00 | 226.00 | 226.00 | 226.00 | 226.00 | 226.00 |
se_type | CR2.00 | CR2.00 | CR2.00 | CR2.00 | CR2.00 | CR2.00 |
statistic | 0.40 | |||||
p.value | 0.53 | |||||
statistic.weakinst | ||||||
p.value.weakinst | ||||||
statistic.endogeneity | ||||||
p.value.endogeneity | ||||||
statistic.overid | ||||||
p.value.overid | ||||||
*** p < 0.001; ** p < 0.01; * p < 0.05. |
#First Stage
Reg111_FS_DD_FE <- lm_robust(npl2000 ~ HRS_Thresh_DD + lnmeanhs8_nbr + tothsun8_nbr + occupied80_nbr + ownocc8_nbr + avhhin8_nbr + povrat8_nbr + unemprt8_nbr + welfare8_nbr + pop_den8_nbr + shrblk8_nbr + shrhsp8_nbr + child8_nbr + ba_or_better8_nbr + firestoveheat80_nbr + nofullkitchen80_nbr+bedrms1_80occ_nbr +bedrms2_80occ_nbr + bedrms3_80occ_nbr + bedrms4_80occ_nbr + bedrms5_80occ_nbr + detach80occ_nbr + attach80occ_nbr + blt2_5yrs80occ_nbr + blt6_10yrs80occ_nbr + blt10_20yrs80occ_nbr + blt20_30yrs80occ_nbr + blt30_40yrs80occ_nbr + blt40_yrs80occ_nbr, clusters=statefips, data=mile_ln_DD_R, fixed_effects = statefips)
export_summs(Reg111_FS_DD_FE)
Model 1 | |
HRS_Thresh_DD | 0.69 *** |
(0.06) | |
lnmeanhs8_nbr | 0.02 |
(0.14) | |
tothsun8_nbr | 0.00 |
(0.00) | |
occupied80_nbr | -0.94 |
(0.96) | |
ownocc8_nbr | 0.00 |
(0.00) | |
avhhin8_nbr | 0.00 |
(0.00) | |
povrat8_nbr | -0.14 |
(1.06) | |
unemprt8_nbr | 1.63 |
(0.86) | |
welfare8_nbr | 1.57 |
(1.83) | |
pop_den8_nbr | -0.00 |
(0.00) | |
shrblk8_nbr | 0.22 |
(0.35) | |
shrhsp8_nbr | 0.84 |
(0.51) | |
child8_nbr | -1.05 |
(1.11) | |
ba_or_better8_nbr | 0.44 |
(0.63) | |
firestoveheat80_nbr | 0.23 |
(0.72) | |
nofullkitchen80_nbr | -2.21 |
(2.41) | |
bedrms1_80occ_nbr | 16.13 |
(11.73) | |
bedrms2_80occ_nbr | 17.47 |
(11.61) | |
bedrms3_80occ_nbr | 17.27 |
(11.50) | |
bedrms4_80occ_nbr | 16.17 |
(11.42) | |
bedrms5_80occ_nbr | 16.05 |
(11.31) | |
detach80occ_nbr | 0.57 |
(0.55) | |
attach80occ_nbr | 0.16 |
(0.62) | |
blt2_5yrs80occ_nbr | 3.89 |
(3.11) | |
blt6_10yrs80occ_nbr | 2.38 |
(2.56) | |
blt10_20yrs80occ_nbr | 3.20 |
(2.58) | |
blt20_30yrs80occ_nbr | 2.21 |
(2.27) | |
blt30_40yrs80occ_nbr | 2.44 |
(2.58) | |
blt40_yrs80occ_nbr | 2.73 |
(2.50) | |
nobs | 226 |
r.squared | 0.68 |
adj.r.squared | 0.55 |
statistic | |
p.value | |
df.residual | 12.91 |
N | 226.00 |
se_type | CR2.00 |
*** p < 0.001; ** p < 0.01; * p < 0.05. |
#Reduced Form
Reg111_RF_DD_FE <- lm_robust(lnmdvalhs0 ~ HRS_Thresh_DD + lnmeanhs8_nbr + tothsun8_nbr + occupied80_nbr + ownocc8_nbr + avhhin8_nbr + povrat8_nbr + unemprt8_nbr + welfare8_nbr + pop_den8_nbr + shrblk8_nbr + shrhsp8_nbr + child8_nbr + ba_or_better8_nbr + firestoveheat80_nbr + nofullkitchen80_nbr + bedrms1_80occ_nbr+ bedrms2_80occ_nbr + bedrms3_80occ_nbr + bedrms4_80occ_nbr + bedrms5_80occ_nbr + detach80occ_nbr + attach80occ_nbr + blt2_5yrs80occ_nbr + blt6_10yrs80occ_nbr + blt10_20yrs80occ_nbr + blt20_30yrs80occ_nbr + blt30_40yrs80occ_nbr + blt40_yrs80occ_nbr, clusters=statefips, data=mile_ln_DD_R, fixed_effects = statefips)
export_summs(Reg111_RF_DD_FE)
Model 1 | |
HRS_Thresh_DD | 0.05 |
(0.04) | |
lnmeanhs8_nbr | 0.30 |
(0.12) | |
tothsun8_nbr | -0.00 |
(0.00) | |
occupied80_nbr | 0.86 |
(1.10) | |
ownocc8_nbr | -0.00 |
(0.00) | |
avhhin8_nbr | 0.00 *** |
(0.00) | |
povrat8_nbr | 0.86 |
(1.06) | |
unemprt8_nbr | -2.85 |
(1.32) | |
welfare8_nbr | -1.36 |
(1.16) | |
pop_den8_nbr | 0.00 |
(0.00) | |
shrblk8_nbr | 0.04 |
(0.24) | |
shrhsp8_nbr | 0.55 |
(0.71) | |
child8_nbr | 0.61 |
(0.75) | |
ba_or_better8_nbr | 0.11 |
(0.38) | |
firestoveheat80_nbr | -0.02 |
(0.37) | |
nofullkitchen80_nbr | 0.77 |
(1.04) | |
bedrms1_80occ_nbr | 11.05 |
(8.89) | |
bedrms2_80occ_nbr | 10.79 |
(8.76) | |
bedrms3_80occ_nbr | 10.78 |
(8.82) | |
bedrms4_80occ_nbr | 10.02 |
(8.68) | |
bedrms5_80occ_nbr | 10.35 |
(8.62) | |
detach80occ_nbr | -0.25 |
(0.44) | |
attach80occ_nbr | -1.19 |
(0.54) | |
blt2_5yrs80occ_nbr | -1.72 |
(1.79) | |
blt6_10yrs80occ_nbr | -0.64 |
(1.41) | |
blt10_20yrs80occ_nbr | -1.67 |
(1.53) | |
blt20_30yrs80occ_nbr | -1.75 |
(1.33) | |
blt30_40yrs80occ_nbr | -1.52 |
(1.96) | |
blt40_yrs80occ_nbr | -1.26 |
(1.38) | |
nobs | 226 |
r.squared | 0.83 |
adj.r.squared | 0.76 |
statistic | |
p.value | |
df.residual | 12.91 |
N | 226.00 |
se_type | CR2.00 |
*** p < 0.001; ** p < 0.01; * p < 0.05. |
#Second Stage
phat_DD_FE <- Reg111_FS_DD_FE$fitted.values
mile_ln_DD_R <- cbind(mile_ln_DD_R, phat_DD_FE)
#mile_ln <- mile_ln[,-53]
Reg111_SS_DD_FE <- lm_robust(lnmdvalhs0 ~ phat_DD + lnmeanhs8_nbr + tothsun8_nbr + occupied80_nbr + ownocc8_nbr + avhhin8_nbr + povrat8_nbr + unemprt8_nbr + welfare8_nbr + pop_den8_nbr + shrblk8_nbr + shrhsp8_nbr + child8_nbr + ba_or_better8_nbr+ firestoveheat80_nbr + nofullkitchen80_nbr + bedrms1_80occ_nbr+ bedrms2_80occ_nbr + bedrms3_80occ_nbr + bedrms4_80occ_nbr + bedrms5_80occ_nbr + detach80occ_nbr + attach80occ_nbr + blt2_5yrs80occ_nbr + blt6_10yrs80occ_nbr + blt10_20yrs80occ_nbr + blt20_30yrs80occ_nbr + blt30_40yrs80occ_nbr + blt40_yrs80occ_nbr, clusters=statefips, data=mile_ln_DD_R, fixed_effects = statefips)
export_summs(Reg111_SS_DD_FE)
Model 1 | |
phat_DD | 0.07 |
(0.05) | |
lnmeanhs8_nbr | 0.30 |
(0.12) | |
tothsun8_nbr | -0.00 |
(0.00) | |
occupied80_nbr | 0.86 |
(1.10) | |
ownocc8_nbr | -0.00 |
(0.00) | |
avhhin8_nbr | 0.00 *** |
(0.00) | |
povrat8_nbr | 0.86 |
(1.06) | |
unemprt8_nbr | -2.85 |
(1.32) | |
welfare8_nbr | -1.36 |
(1.16) | |
pop_den8_nbr | 0.00 |
(0.00) | |
shrblk8_nbr | 0.04 |
(0.24) | |
shrhsp8_nbr | 0.55 |
(0.71) | |
child8_nbr | 0.61 |
(0.75) | |
ba_or_better8_nbr | 0.11 |
(0.38) | |
firestoveheat80_nbr | -0.02 |
(0.37) | |
nofullkitchen80_nbr | 0.77 |
(1.04) | |
bedrms1_80occ_nbr | 11.05 |
(8.89) | |
bedrms2_80occ_nbr | 10.79 |
(8.76) | |
bedrms3_80occ_nbr | 10.78 |
(8.82) | |
bedrms4_80occ_nbr | 10.02 |
(8.68) | |
bedrms5_80occ_nbr | 10.35 |
(8.62) | |
detach80occ_nbr | -0.25 |
(0.44) | |
attach80occ_nbr | -1.19 |
(0.54) | |
blt2_5yrs80occ_nbr | -1.72 |
(1.79) | |
blt6_10yrs80occ_nbr | -0.64 |
(1.41) | |
blt10_20yrs80occ_nbr | -1.67 |
(1.53) | |
blt20_30yrs80occ_nbr | -1.75 |
(1.33) | |
blt30_40yrs80occ_nbr | -1.52 |
(1.96) | |
blt40_yrs80occ_nbr | -1.26 |
(1.38) | |
nobs | 226 |
r.squared | 0.83 |
adj.r.squared | 0.76 |
statistic | |
p.value | |
df.residual | 12.91 |
N | 226.00 |
se_type | CR2.00 |
*** p < 0.001; ** p < 0.01; * p < 0.05. |
If I were to enhance my control selection, I would aim to only control for variables that significantly explain the variation in both \(Y_i\) and \(D_i\), as the product of these two coefficients would be a solid proxy for the selection bias introduced into the model. In this specific setting however, upon the threshold region itself, the treated and control groups as extremely similar. Regressing controls on the dependent variable, \(Y_i\), and our estimated treatment term (\(\hat{D}_i\)), no significant coefficients arise. It appears the controls are very balanced in that sense.
I tried incorporating a model selection tool from the leaps package to come up with an alternative combination of regressors that treats for potentially too much multicollinearity. The paper describes them including every single variable they had available to them and this could infact drown out a significant effect entirely through the inclusion of a multitude of irrelevant controls that simply drive up the standard error on our estimate of interest.
p_load(leaps)
glimpse(mile_ln_DD_R$lnmdvalhs0)
glimpse(mile_ln_DD_R$phat_DD_FE)
#At this point in the code below, I am informed by an error code that y and x different lengths even though the glimpse above reveals them both to be columns in the dataframe, both of 226 rows in length. Sad!
regsubsetsObj <- regsubsets(data=mile_ln_DD_R, x=phat_DD_FE ,y=lnmdvalhs0, nbest = 12, really.big = T)
plot(regsubsetsObj, scale = "adjr2") # regsubsets plot based on R-sq
Beyond desiring a more component model selection process, I feel at this stage I am ill equiped to apply a reliable procedure. I would apply some mean square error approach over the entire set of possible permutations of the model and find a subset of models which provide the lowest mean-square error measures. These would be appropriately balancing between minor degrees of selection bias and minor degrees of multicollinearity in a bid to best optimize identification of the treatment effect.
I believe estimates generated in this most recent permuation of the model is most appropriate. It captures the fixed effects which drive a large portion of the explanatory power of the model. It also accounts for a sufficient albeit possibly too high number of controls, but even at the minimal level of controls with fixed effects implemented, our model initial showed no signs of significant effect too. Given that, though the standard error is somewhat bloated by excessive multicollineary, even with these controls absent it appears there is no significant causal relationship to identify in this case.