Question 1:

Figure 1 suggests that the general pattern across the counties of the Chicago PMSA is that foreclosures increase with distance from the lake and the city center. In Cook County, foreclosures appear concentrated on the South and Southwest sides, a pattern that is even more pronounced (a potential contagion effect?) in surrounding counties such as Kane, Will and Kendall, which have high concentrations of foreclosures. Grundy and DeKalb have the lowest concentrations of foreclosures, which could be explained by a higher presence of suburban-type neighborhoods.

Figure 1. Foreclosure Rates across Chicago PMSA Census Tracts

Question 2:

Figure 2.1 suggests that the square root of the foreclosure rate is the variable closest to a normal distribution and the one to consider for the OLS regression model. However, another option is the square root of the number of foreclosures, since it would look much less skewed if we removed some of the highest values, which are most likely outliers. As an exercise, Figure 2.2 shows the same distributions without the outliers, defined here as tracts where the square root of the number of foreclosures is greater than 20. Although there are less arbitrary ways to identify and remove outliers, comparing the two plots shows that the distribution of the square root of the number of foreclosures becomes less skewed and closer to a normal shape. In the end, I decided to use the square root of the number of foreclosures simply because it is easier to interpret. What is the square root of a rate, anyway? Regarding the shape of the distribution, I also think that for modeling purposes one should consider treating this variable as censored data: first model the probability that a tract has a zero foreclosure rate, and then include the inverse Mills ratio from this first step in a second-step regression, since leaving it out would create an omitted-variable problem. This is what Heckman did when facing a similar problem in estimating a Mincer wage equation, where wages are only observed for those who work.
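The two-step idea in the paragraph above can be sketched in base R on simulated data. Everything here is hypothetical (the variables `x` and `z`, the coefficients, and the error correlation of 0.6); it illustrates Heckman's estimator in general, not the model estimated in this report. Note that the two-step estimator targets sample selection, where the outcome is unobserved for part of the sample; if zero foreclosure rates are genuine observed outcomes, a Tobit or hurdle model may be the closer analogue.

```r
# Hypothetical simulated data: z enters only the selection equation
# (exclusion restriction); x enters both equations.
set.seed(42)
n <- 2000
x <- rnorm(n)
z <- rnorm(n)

# Correlated errors between the selection and outcome equations (rho = 0.6)
e_sel <- rnorm(n)
e_out <- 0.6 * e_sel + sqrt(1 - 0.6^2) * rnorm(n)

selected <- (0.5 + 1.0 * z + 0.5 * x + e_sel) > 0  # outcome observed only if TRUE
y <- 1 + 2 * x + e_out                              # used only where selected

# Step 1: probit model for the probability of being selected
probit <- glm(as.integer(selected) ~ z + x, family = binomial(link = "probit"))
lp <- predict(probit, type = "link")   # estimated linear index
imr <- dnorm(lp) / pnorm(lp)            # inverse Mills ratio

# Step 2: OLS on the selected subsample, adding the inverse Mills ratio
step2 <- lm(y ~ x + imr, subset = selected)
naive <- lm(y ~ x, subset = selected)   # ignores selection, for comparison

print(rbind(naive = c(coef(naive), imr = NA), heckman = coef(step2)))
```

Because the inverse Mills ratio is itself estimated, the second-step OLS standard errors are not valid as printed; dedicated implementations such as `heckit()` in the `sampleSelection` package correct them.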

Figure 2.1. Different plots of the distribution of Variables including potential outliers

Figure 2.2. Different plots of the distribution of Variables without potential outliers
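The square-root transformation and the outlier rule behind Figure 2.2 (square root of the number of foreclosures greater than 20) can also be checked numerically with a sample-skewness statistic. This is a base-R sketch on simulated, hypothetical counts standing in for `EST_FCS`; the `skewness()` helper is defined here rather than taken from a package.

```r
# Hypothetical right-skewed counts standing in for foreclosures per tract
set.seed(1)
n_fcs <- rpois(1845, lambda = rexp(1845, rate = 1 / 60))

# Sample skewness: the third standardized moment
skewness <- function(v) {
  m <- mean(v)
  mean((v - m)^3) / mean((v - m)^2)^1.5
}

sqrt_fcs <- sqrt(n_fcs)
keep <- sqrt_fcs <= 20   # drop tracts whose square root of foreclosures exceeds 20

print(c(raw          = skewness(n_fcs),
        sqrt         = skewness(sqrt_fcs),
        sqrt_trimmed = skewness(sqrt_fcs[keep])))
```

A skewness near zero indicates a roughly symmetric distribution; the square root typically pulls a long right tail in, and trimming the extreme tracts reduces it further.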

Task 2: OLS Model and Diagnostics

When trying to reduce the problems of non-normality and heteroskedasticity, a researcher usually has several options. The first is to change the model specification to control for omitted-variable bias, which can cause heteroskedasticity. In this case, model 2 in Table 1 includes new variables that I thought would help capture effects not included in model 1: I created the percentage of Black population (PCT_BLACK) and included the variable RURPOP. As Table 1 shows, PCT_BLACK is positive and significant, and the R² rises from 0.614 in model 1 to 0.677 in model 2.

As for the problem of non-normality, I tried standardizing the dependent variable by de-meaning it and dividing it by its standard deviation. The coefficients must now be interpreted as the change, in standard deviations of the foreclosure rate, associated with a one-unit change in a particular independent variable. Although these efforts make sense, they were not enough to solve the aforementioned issues: both the Shapiro-Wilk and the Breusch-Pagan tests reject their null hypotheses (normality and homoskedasticity, respectively) with p-values far below 0.01. This is not surprising for the normality test, since standardizing is a linear transformation that rescales the residuals without changing the shape of their distribution.

As Figure 1 suggests, there is likely an omitted variable in the spatial context that this simple OLS does not capture. Although COUNTY is included as a covariate, this is not enough to account for the potential spatial autocorrelation present in the DGP of foreclosures. Finally, Table 2 does not show many differences between the two models, except that the new variable PCT_BLACK remains significant after applying White's heteroskedasticity-robust standard errors.
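The two data-preparation steps discussed above can be illustrated in base R on hypothetical data (`tracts`, `x1`, `x2` and the population counts are made up). The sketch computes a within-tract Black population percentage with a row-wise denominator and contrasts it with dividing by the region-wide total; it then checks that z-scoring the dependent variable only rescales the slopes by 1/sd(y), leaving the R² and the Shapiro-Wilk statistic of the residuals unchanged, which is why standardizing cannot fix non-normality.

```r
set.seed(7)
n <- 500
tracts <- data.frame(
  BLACK = rpois(n, 800), WHITE = rpois(n, 2000), OTHER = rpois(n, 300),
  x1 = rnorm(n), x2 = rnorm(n)
)

# Within-tract percentage: the denominator is computed row by row
tracts$PCT_BLACK <- with(tracts, BLACK / (WHITE + BLACK + OTHER) * 100)
# Share of the whole region's population: sum() collapses to one scalar
tracts$SHARE_TOTAL <- with(tracts, BLACK / sum(WHITE + BLACK + OTHER) * 100)

# Skewed (exponential) errors, so the residuals are deliberately non-normal
tracts$y <- with(tracts, 2 + 0.5 * x1 - 0.3 * x2 + 0.02 * PCT_BLACK + rexp(n))
tracts$z_y <- (tracts$y - mean(tracts$y)) / sd(tracts$y)

fit_raw <- lm(y ~ x1 + x2 + PCT_BLACK, data = tracts)
fit_std <- lm(z_y ~ x1 + x2 + PCT_BLACK, data = tracts)

# Slopes are divided by sd(y); R^2 and the residual distribution's shape are not
print(all.equal(coef(fit_std)[-1], coef(fit_raw)[-1] / sd(tracts$y)))
print(all.equal(summary(fit_std)$r.squared, summary(fit_raw)$r.squared))
print(all.equal(shapiro.test(resid(fit_std))$statistic,
                shapiro.test(resid(fit_raw))$statistic))
```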

Table 1. OLS Model Comparisons

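# data.table (:=), stargazer (tables), car (vif), lmtest (bptest)
library(data.table); library(stargazer); library(car); library(lmtest)
# Standardize (z-score) the number and the rate of foreclosures
tracts.sub[,n_EST_FCS:=((EST_FCS-mean(EST_FCS))/sd(EST_FCS))]
tracts.sub[,n_EST_FCS_RT:=((EST_FCS_RT-mean(EST_FCS_RT))/sd(EST_FCS_RT))]
# NOTE: without `by`, sum() runs over all rows, so this is each tract's Black population as a
# percentage of the total population of all tracts; HAWN_PI also enters the denominator twice.
tracts.sub[,PCT_BLACK:=((BLACK/sum(WHITE + BLACK + AMERI_ES + ASIAN + HAWN_PI + HAWN_PI + OTHER + MULT_RACE + HISPANIC))*100)]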

# Model 2: standardized foreclosure rate; adds PCT_BLACK and RURPOP to model 1's covariates
model.2 <- lm(n_EST_FCS_RT ~ PCTFORBORN + MED_AGE + OOMEDVAL00 +
  PCTRENTER + HUAGE00 + BLS_UNEMP + PCT_BLACK + CBD_MI + RURPOP + COUNTY, data=tracts.sub)

stargazer(list(model.1,model.2),type = "text")
## 
## ==========================================================================
##                                       Dependent variable:                 
##                      -----------------------------------------------------
##                              EST_FCS_RT                n_EST_FCS_RT       
##                                 (1)                        (2)            
## --------------------------------------------------------------------------
## PCTFORBORN                   -0.051***                  -0.006***         
##                               (0.004)                    (0.001)          
##                                                                           
## MED_AGE                      -0.126***                  -0.032***         
##                               (0.011)                    (0.003)          
##                                                                           
## OOMEDVAL00                  -0.00002***                -0.00000***        
##                              (0.00000)                  (0.00000)         
##                                                                           
## PCTRENTER                    -0.021***                  -0.007***         
##                               (0.003)                    (0.001)          
##                                                                           
## HUAGE00                       0.022***                   0.007***         
##                               (0.004)                    (0.001)          
##                                                                           
## BLS_UNEMP                     0.965***                   0.243***         
##                               (0.056)                    (0.015)          
##                                                                           
## PCT_BLACK                                               20.404***         
##                                                          (1.082)          
##                                                                           
## COUNTYDeKalb County          -2.645***                  -0.431***         
##                               (0.494)                    (0.136)          
##                                                                           
## COUNTYDuPage County          -0.620***                   -0.119**         
##                               (0.215)                    (0.057)          
##                                                                           
## COUNTYGrundy County          -2.838***                   -0.399**         
##                               (0.727)                    (0.201)          
##                                                                           
## COUNTYKane County            -0.843***                    -0.098          
##                               (0.280)                    (0.076)          
##                                                                           
## COUNTYKendall County         -2.124***                    -0.300          
##                               (0.772)                    (0.214)          
##                                                                           
## COUNTYLake County              -0.055                     0.062           
##                               (0.208)                    (0.055)          
##                                                                           
## COUNTYMcHenry County         -1.610***                   -0.211**         
##                               (0.340)                    (0.094)          
##                                                                           
## COUNTYWill County            -0.785***                    -0.068          
##                               (0.262)                    (0.071)          
##                                                                           
## CBD_MI                        0.111***                   0.015**          
##                               (0.023)                    (0.006)          
##                                                                           
## RURPOP                                                   -0.00001         
##                                                         (0.00003)         
##                                                                           
## Constant                      6.105***                    -0.248          
##                               (0.678)                    (0.180)          
##                                                                           
## --------------------------------------------------------------------------
## Observations                   1,845                      1,845           
## R2                             0.614                      0.677           
## Adjusted R2                    0.611                      0.674           
## Residual Std. Error      2.154 (df = 1829)          0.571 (df = 1827)     
## F Statistic          194.031*** (df = 15; 1829) 225.324*** (df = 17; 1827)
## ==========================================================================
## Note:                                          *p<0.1; **p<0.05; ***p<0.01
vif(model.2)
##                GVIF Df GVIF^(1/(2*Df))
## PCTFORBORN 1.388900  1        1.178516
## MED_AGE    1.791116  1        1.338326
## OOMEDVAL00 1.368293  1        1.169741
## PCTRENTER  2.013352  1        1.418926
## HUAGE00    1.859476  1        1.363626
## BLS_UNEMP  1.681336  1        1.296663
## PCT_BLACK  1.566662  1        1.251664
## CBD_MI     1.193978  1        1.092693
## RURPOP     1.380897  1        1.175116
## COUNTY     2.672922  8        1.063375
shapiro.test(model.2$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  model.2$residuals
## W = 0.88925, p-value < 2.2e-16
bptest(model.2, studentize=TRUE)  # studentized BP test using model.2's own regressors
## 
##  studentized Breusch-Pagan test
## 
## data:  model.2
## BP = 99.852, df = 17, p-value = 0.00000000000009476
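For reference, the studentized (Koenker) Breusch-Pagan statistic can be computed by hand as n times the R² of an auxiliary regression of the squared OLS residuals on the model's regressors, compared with a chi-squared distribution whose degrees of freedom equal the number of non-constant regressors. To our understanding this is what `bptest(..., studentize = TRUE)` reports (hence df = 17 for the 17 slope coefficients of model.2), but the sketch below uses simulated data and plain `lm()`, not `lmtest` internals.

```r
set.seed(3)
n <- 400
x1 <- runif(n, 0, 10)
x2 <- rnorm(n)
# Hypothetical data whose error standard deviation grows with x1
y <- 1 + 0.5 * x1 + x2 + rnorm(n, sd = 0.2 + 0.3 * x1)
fit <- lm(y ~ x1 + x2)

u2 <- resid(fit)^2
X <- model.matrix(fit)

# Auxiliary regression of squared residuals on the non-constant regressors
# (lm() adds its own intercept, so the R^2 below is the usual centered one)
aux <- lm(u2 ~ X[, -1])

bp <- n * summary(aux)$r.squared
df <- ncol(X) - 1
p_value <- pchisq(bp, df = df, lower.tail = FALSE)
print(c(BP = bp, df = df, p.value = p_value))
```

A small p-value, as for model.2 above, rejects the null of homoskedastic errors.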

Table 2. OLS Model Comparisons with White’s Standard Errors

# ---------------------------------------------------------
# Define function that generates robust regression 
# estimates using White's standard errors.
# ---------------------------------------------------------
source(paste0(path,"robust.R"),echo=TRUE)
## 
## > summaryw <- function(model) {
## +     s <- summary(model)
## +     X <- model.matrix(model)
## +     u2 <- residuals(model)^2
## +     XDX <- 0
## +     for (i in .... [TRUNCATED]
stargazer(list(summaryw(model.1),summaryw(model.2)),type = "text")
## 
## ============================================================
##                      Estimate Std. Error t value Pr(> | t| )
## ------------------------------------------------------------
## (Intercept)           6.105     1.055     5.789       0     
## PCTFORBORN            -0.051    0.004    -12.591      0     
## MED_AGE               -0.126    0.024    -5.170    0.00000  
## OOMEDVAL00           -0.00002  0.00000   -11.493      0     
## PCTRENTER             -0.021    0.004    -4.964    0.00000  
## HUAGE00               0.022     0.005     4.847    0.00000  
## BLS_UNEMP             0.965     0.061    15.721       0     
## COUNTYDeKalb County   -2.645    0.218    -12.138      0     
## COUNTYDuPage County   -0.620    0.170    -3.656    0.0003   
## COUNTYGrundy County   -2.838    0.228    -12.459      0     
## COUNTYKane County     -0.843    0.231    -3.642    0.0003   
## COUNTYKendall County  -2.124    0.253    -8.406       0     
## COUNTYLake County     -0.055    0.228    -0.242     0.809   
## COUNTYMcHenry County  -1.610    0.191    -8.438       0     
## COUNTYWill County     -0.785    0.213    -3.677    0.0002   
## CBD_MI                0.111     0.025     4.408    0.00001  
## ------------------------------------------------------------
## 
## ============================================================
##                      Estimate Std. Error t value Pr(> | t| )
## ------------------------------------------------------------
## (Intercept)           -0.248    0.286    -0.866     0.386   
## PCTFORBORN            -0.006    0.001    -4.131    0.00004  
## MED_AGE               -0.032    0.007    -4.719    0.00000  
## OOMEDVAL00           -0.00000  0.00000   -9.872       0     
## PCTRENTER             -0.007    0.001    -6.182       0     
## HUAGE00               0.007     0.001     5.994       0     
## BLS_UNEMP             0.243     0.016    15.638       0     
## PCT_BLACK             20.404    1.739    11.735       0     
## CBD_MI                0.015     0.007     2.343     0.019   
## RURPOP               -0.00001  0.00002   -0.718     0.473   
## COUNTYDeKalb County   -0.431    0.074    -5.811       0     
## COUNTYDuPage County   -0.119    0.042    -2.820     0.005   
## COUNTYGrundy County   -0.399    0.079    -5.076    0.00000  
## COUNTYKane County     -0.098    0.060    -1.642     0.101   
## COUNTYKendall County  -0.300    0.081    -3.686    0.0002   
## COUNTYLake County     0.062     0.057     1.095     0.274   
## COUNTYMcHenry County  -0.211    0.056    -3.799    0.0001   
## COUNTYWill County     -0.068    0.055    -1.225     0.220   
## ------------------------------------------------------------
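The contents of `robust.R` are truncated above, but its loop over `u2` and `XDX` points to White's sandwich estimator. Below is a minimal base-R sketch of the HC0 version on simulated data, assuming (this is not confirmed by the source) that `summaryw()` uses HC0 without a small-sample correction and residual degrees of freedom for the p-values. The point estimates are the OLS coefficients, which is why the estimates in Table 2 match Table 1 and only the standard errors change.

```r
set.seed(11)
n <- 300
x <- rnorm(n)
# Hypothetical heteroskedastic data: error spread depends on x
y <- 1 + 2 * x + rnorm(n, sd = exp(0.5 * x))
fit <- lm(y ~ x)

X <- model.matrix(fit)
u <- resid(fit)

# Sandwich: (X'X)^-1 [sum_i u_i^2 x_i x_i'] (X'X)^-1
XtX_inv <- solve(crossprod(X))
meat <- crossprod(X * u)        # each row of X scaled by its residual
V_hc0 <- XtX_inv %*% meat %*% XtX_inv

se_hc0 <- sqrt(diag(V_hc0))
t_hc0 <- coef(fit) / se_hc0
p_hc0 <- 2 * pt(abs(t_hc0), df = fit$df.residual, lower.tail = FALSE)

print(cbind(Estimate = coef(fit), `Std. Error` = se_hc0,
            `t value` = t_hc0, `Pr(>|t|)` = p_hc0))
```

An off-the-shelf alternative is `lmtest::coeftest(model, vcov. = sandwich::vcovHC(model, type = "HC0"))`, which also offers the small-sample variants HC1 to HC3.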

Task 3: Spatial Regression

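As before, both the lag and the error models improve when PCT_BLACK is added to the estimation: the log likelihood rises and the AIC falls, from 7,003.111 to 6,864.274 for the lag model and from 7,077.905 to 7,007.199 for the error model. The rho and lambda parameters are both positive and significant. The former implies a positive and significant influence of the spatially lagged dependent variable on the dependent variable; the latter indicates positive spatial dependence among the unobserved variables in the error term. Because of this, I also estimated the Spatial Durbin Model (SDM), which adds spatial lags of the independent variables to the lag model. The SDM nests the error model: the error model is the special case of the SDM under the 'common-factor hypothesis' (the coefficients on the lagged independent variables equal minus rho times the coefficients on the unlagged ones), so the SDM captures the spatial error component through the lagged regressors without imposing that restriction. The results indicate that, although not all lagged independent variables are significant, most of the lagged socioeconomic variables are (PCTFORBORN, MED_AGE, OOMEDVAL00, PCTRENTER, BLS_UNEMP and PCT_BLACK, six of the eight non-county covariates), while only one lagged county dummy (Kane) is. This suggests that the spatial lags of these variables are important contributors to explaining the foreclosure rate. The SDM also has the highest log likelihood (-3,368.645) and the lowest AIC (6,807.291) of the five models.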
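The common-factor argument can be made explicit by rewriting the error model. Using the notation of the spatial models above (W is the spatial weights matrix, here `nb.W.queen`), premultiplying the error equation by (I - λW) turns it into a Durbin-type equation:

```latex
\begin{aligned}
y &= X\beta + u, \qquad u = \lambda W u + \varepsilon \\
(I - \lambda W)\,y &= (I - \lambda W)\,X\beta + \varepsilon \\
y &= \lambda W y + X\beta - \lambda W X \beta + \varepsilon
\end{aligned}
```

Comparing this with the SDM, y = ρWy + Xβ + WXθ + ε, the error model is the SDM with ρ = λ and θ = -ρβ. Estimating the unrestricted SDM, as in the Durbin column of Table 3, allows these restrictions to be tested, for instance with a likelihood-ratio test against the error model.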

Table 3. Spatial Lag, Error and Durbin Models

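# lagsarlm() and errorsarlm() are provided by spatialreg (formerly part of spdep)
library(spatialreg)

# ---------------------------------------------------------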
# ESTIMATE LAG MODEL
# ---------------------------------------------------------
model.lag <- lagsarlm(EST_FCS_RT ~ PCTFORBORN + MED_AGE +
OOMEDVAL00 + PCTRENTER + HUAGE00 + BLS_UNEMP + COUNTY + CBD_MI, data=tracts.sub, listw=nb.W.queen)

# ---------------------------------------------------------
# ESTIMATE LAG MODEL 2
# ---------------------------------------------------------
model.lag2 <- lagsarlm(EST_FCS_RT ~ PCTFORBORN + MED_AGE +
OOMEDVAL00 + PCTRENTER + HUAGE00 + PCT_BLACK + BLS_UNEMP + COUNTY + CBD_MI, data=tracts.sub, listw=nb.W.queen)

# ---------------------------------------------------------
# ESTIMATE ERROR MODEL
# ---------------------------------------------------------
model.err <- errorsarlm(EST_FCS_RT ~ PCTFORBORN + MED_AGE +
OOMEDVAL00 + PCTRENTER + HUAGE00 + BLS_UNEMP + COUNTY + CBD_MI, data=tracts.sub, listw=nb.W.queen, method="eigen", quiet=TRUE)

# ---------------------------------------------------------
# ESTIMATE ERROR MODEL 2
# ---------------------------------------------------------
model.err2 <- errorsarlm(EST_FCS_RT ~ PCTFORBORN + MED_AGE +
OOMEDVAL00 + PCTRENTER + HUAGE00 + BLS_UNEMP + PCT_BLACK + COUNTY + CBD_MI, data=tracts.sub, listw=nb.W.queen, method="eigen", quiet=TRUE)

# ---------------------------------------------------------
# ESTIMATE SPATIAL DURBIN MODEL (type="mixed" adds the spatially lagged regressors, lag.*)
# ---------------------------------------------------------
model.durb <- lagsarlm(EST_FCS_RT ~ PCTFORBORN + MED_AGE +
OOMEDVAL00 + PCTRENTER + HUAGE00 + BLS_UNEMP + PCT_BLACK + COUNTY + CBD_MI, data=tracts.sub, listw=nb.W.queen, method="eigen", quiet=TRUE,type="mixed")

stargazer(list(model.lag,model.lag2,model.err,model.err2,model.durb), column.labels=c("model.lag","model.lag2","model.err","model.err2","model.durb"),type = "text")
## 
## ===========================================================================================
##                                                 Dependent variable:                        
##                          ------------------------------------------------------------------
##                                                      EST_FCS_RT                            
##                                   spatial                   spatial            spatial   
##                               autoregressive                 error           autoregressive
##                           model.lag    model.lag2   model.err    model.err2    model.durb  
##                              (1)          (2)          (3)          (4)           (5)      
## -------------------------------------------------------------------------------------------
## PCTFORBORN                -0.012***      0.0001       -0.002       0.006        0.013**    
##                            (0.003)      (0.0004)     (0.005)      (0.005)       (0.005)    
##                                                                                            
## MED_AGE                   -0.052***    -0.050***    -0.039***    -0.032***     -0.032***   
##                            (0.008)      (0.008)      (0.010)      (0.010)       (0.011)    
##                                                                                            
## OOMEDVAL00               -0.00001***  -0.00000***  -0.00001***  -0.00000***   -0.00000***  
##                           (0.00000)    (0.00000)    (0.00000)    (0.00000)     (0.00000)   
##                                                                                            
## PCTRENTER                 -0.007***    -0.010***      0.004        0.0002        -0.001    
##                            (0.002)      (0.002)      (0.003)      (0.003)       (0.005)    
##                                                                                            
## HUAGE00                    0.006**      0.009***     0.016***     0.019***      0.019***   
##                            (0.003)      (0.003)      (0.004)      (0.004)       (0.005)    
##                                                                                            
## PCT_BLACK                              35.243***                 30.906***     29.664***   
##                                         (2.759)                   (3.540)       (3.607)    
##                                                                                            
## BLS_UNEMP                  0.401***     0.385***     0.871***     0.831***      0.741***   
##                            (0.040)      (0.041)      (0.073)      (0.072)       (0.078)    
##                                                                                            
## COUNTYDeKalb County        -0.621*       -0.229       -0.107       0.014         1.296     
##                            (0.349)      (0.334)      (1.051)      (0.998)       (1.253)    
##                                                                                            
## COUNTYDuPage County         -0.003       0.051        -0.387       -0.379        -0.079    
##                            (0.017)      (0.171)      (0.439)      (0.418)       (0.295)    
##                                                                                            
## COUNTYGrundy County         -0.735       -0.209       -0.835       -0.556        -0.279    
##                            (0.513)      (0.496)      (1.286)      (1.237)       (0.638)    
##                                                                                            
## COUNTYKane County           -0.107       0.078        0.376        0.411         1.270*    
##                            (0.199)      (0.191)      (0.621)      (0.588)       (0.704)    
##                                                                                            
## COUNTYKendall County        -0.468       -0.095       0.153        0.342         1.040     
##                            (0.544)      (0.952)      (0.966)      (0.936)       (1.024)    
##                                                                                            
## COUNTYLake County           0.122        0.242*       -0.334       -0.255        -0.018    
##                            (0.142)      (0.145)      (0.529)      (0.493)       (0.181)    
##                                                                                            
## COUNTYMcHenry County        -0.245       0.071        -0.570       -0.415        -0.057    
##                            (0.231)      (0.236)      (0.751)      (0.709)       (0.517)    
##                                                                                            
## COUNTYWill County           -0.263       -0.038       -0.310       -0.044        -0.150    
##                            (0.183)      (0.144)      (0.502)      (0.480)       (0.597)    
##                                                                                            
## CBD_MI                      0.011        -0.009       0.030        0.035         0.032     
##                            (0.014)      (0.015)      (0.035)      (0.034)       (0.047)    
##                                                                                            
## lag.PCTFORBORN                                                                 -0.020***   
##                                                                                 (0.007)    
##                                                                                            
## lag.MED_AGE                                                                    -0.062***   
##                                                                                 (0.016)    
##                                                                                            
## lag.OOMEDVAL00                                                                 -0.00000**  
##                                                                                (0.00000)   
##                                                                                            
## lag.PCTRENTER                                                                  -0.021***   
##                                                                                 (0.005)    
##                                                                                            
## lag.HUAGE00                                                                      -0.006    
##                                                                                 (0.007)    
##                                                                                            
## lag.BLS_UNEMP                                                                  -0.466***   
##                                                                                 (0.100)    
##                                                                                            
## lag.PCT_BLACK                                                                   13.297**   
##                                                                                 (5.788)    
##                                                                                            
## lag.COUNTYDeKalb County                                                          -1.889    
##                                                                                 (1.357)    
##                                                                                            
## lag.COUNTYDuPage County                                                          -0.138    
##                                                                                 (0.320)    
##                                                                                            
## lag.COUNTYGrundy County                                                          -0.107    
##                                                                                 (0.532)    
##                                                                                            
## lag.COUNTYKane County                                                           -1.537**   
##                                                                                 (0.762)    
##                                                                                            
## lag.COUNTYKendall County                                                         -1.713    
##                                                                                 (1.315)    
##                                                                                            
## lag.COUNTYLake County                                                            0.106     
##                                                                                 (0.257)    
##                                                                                            
## lag.COUNTYMcHenry County                                                         -0.139    
##                                                                                 (0.756)    
##                                                                                            
## lag.COUNTYWill County                                                            -0.129    
##                                                                                 (0.642)    
##                                                                                            
## lag.CBD_MI                                                                       -0.039    
##                                                                                 (0.057)    
##                                                                                            
## Constant                   1.525***     1.370***      0.785        0.421        4.518***   
##                            (0.442)      (0.458)      (0.814)      (0.789)       (0.724)    
##                                                                                            
## -------------------------------------------------------------------------------------------
## Observations                1,845        1,845        1,845        1,845         1,845     
## Log Likelihood            -3,483.556   -3,413.137   -3,520.952   -3,484.599    -3,368.645  
## sigma2                      2.294        2.171        2.246        2.186         2.099     
## Akaike Inf. Crit.         7,003.111    6,864.274    7,077.905    7,007.199     6,807.291   
## Wald Test (df = 1)       1,877.107*** 1,424.893*** 3,556.154*** 2,788.689***   642.051***  
## LR Test (df = 1)         1,084.789***  897.047***  1,009.996***  754.122***    479.306***  
## ===========================================================================================
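The `lag.*` rows in the Durbin column are the regressors premultiplied by the spatial weights matrix (WX). Assuming `nb.W.queen` is a row-standardized queen-contiguity matrix (style `"W"`, the default of spdep's `nb2listw()`; how it was built is not shown here), each `lag.` value is the average of that variable over a tract's neighbors. The base-R sketch below builds such a matrix on a hypothetical 4-by-5 grid of cells instead of real tract polygons, computes WX, and uses the same W for a Moran's I statistic; the helper names `queen_weights()` and `morans_i()` are illustrative, not library functions.

```r
# Row-standardized queen contiguity on a hypothetical nr x nc grid of cells
queen_weights <- function(nr, nc) {
  n <- nr * nc
  W <- matrix(0, n, n)
  idx <- function(i, j) (j - 1) * nr + i        # column-major cell index
  for (i in seq_len(nr)) for (j in seq_len(nc)) {
    for (di in -1:1) for (dj in -1:1) {
      if (di == 0 && dj == 0) next              # a cell is not its own neighbor
      ii <- i + di
      jj <- j + dj
      if (ii >= 1 && ii <= nr && jj >= 1 && jj <= nc) W[idx(i, j), idx(ii, jj)] <- 1
    }
  }
  W / rowSums(W)                                 # each row sums to 1
}

# Moran's I: (n / S0) * z'Wz / z'z with z centered
morans_i <- function(v, W) {
  z <- v - mean(v)
  (length(z) / sum(W)) * sum(z * (W %*% z)) / sum(z^2)
}

W <- queen_weights(4, 5)
set.seed(5)
X <- cbind(BLS_UNEMP = runif(20, 2, 15), MED_AGE = runif(20, 25, 50))

WX <- W %*% X                                    # spatially lagged covariates
colnames(WX) <- paste0("lag.", colnames(X))
print(head(cbind(X, WX)))

gradient <- rep(seq_len(5), each = 4)            # smooth west-to-east trend
print(morans_i(gradient, W))                     # positive: neighbors are alike
```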
## Note:                                                           *p<0.1; **p<0.05; ***p<0.01

Task 4: Self-Directed Task

My apologies, but I did not have time to complete this task.