Task 1: Data Preparation and Basic Analytics

Figure 1: Foreclosure Rates by County with legend, colors and axis adjustments

Task 2:

Figure 2: Bi-variate Scatter Plot and Regression Analysis for 4 potential predictors of Foreclosure Rate by County

Question 1a:

It seems that Unemployment Rate and Housing Age are the variables most clearly correlated with Foreclosure Rates by County. Both correlate positively with foreclosure: as they increase, so does a county's Foreclosure Rate, with the strength of the association summarized here by the R-squared.
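As a minimal sketch of how the sign of the slope and the R-squared are read in a bivariate regression, the block below uses made-up numbers (the vectors `unemployment` and `foreclosure` are hypothetical, not the lab data). It also checks that, with a single predictor, the R-squared is simply the squared Pearson correlation.

```r
# Hypothetical county-level values, for illustration only (not the lab data).
unemployment <- c(4.1, 5.3, 6.0, 6.8, 7.5, 8.2, 9.0, 9.7)
foreclosure  <- c(1.2, 1.9, 1.7, 2.6, 2.4, 3.3, 3.1, 3.9)

fit <- lm(foreclosure ~ unemployment)

slope     <- unname(coef(fit)["unemployment"])  # sign gives the direction
r_squared <- summary(fit)$r.squared             # share of variance explained
r         <- cor(unemployment, foreclosure)     # Pearson correlation

# In a one-predictor OLS, R-squared equals the squared correlation.
all.equal(r_squared, r^2)
```

A positive `slope` with a higher `r_squared` is what "most clearly correlated" refers to when comparing the four panels of Figure 2.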

Question 1b:

Figures 2.1 and 2.2 below assess whether the spatial concentration of racial groups helps explain the rate of foreclosures by county. First, I collapsed the data set from the tract level to the county level by taking the mean of the available variables per county. This created a new variable: the percentage of racial concentration in each county. Then I estimated an OLS regression relating the percentage of racial concentration to the rate of foreclosures at the county level. As expected, the results provide only weak evidence of a positive relation between the two variables.

Disclaimer: It has to be said that, although it seems reasonable not to find much of a correlation between these two variables (finding one would imply that racially concentrated units have serious financial problems), there could be aggregation and ecological fallacy problems in the previous graph and regression. The ideal is always to run estimations at the level at which the process makes sense, which in this case would have been Census Tracts; however, there is no information about the foreclosure rate at this level.
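To illustrate the aggregation concern, here is a small sketch with invented tract-level data (the data frame `tracts` and its columns are hypothetical). It collapses tracts to county means with `aggregate()` and compares the correlation within counties with the correlation across county means; the toy numbers are deliberately built so that the two disagree in sign.

```r
# Hypothetical tract-level data for three counties (illustration only).
tracts <- data.frame(
  county      = rep(c("A", "B", "C"), each = 4),
  racial_conc = c(10, 20, 30, 40,  30, 40, 50, 60,  50, 60, 70, 80),
  foreclosure = c( 4,  3,  2,  1,   7,  6,  5,  4,  10,  9,  8,  7)
)

# Collapse to the county level by taking the mean of each variable.
county_means <- aggregate(cbind(racial_conc, foreclosure) ~ county,
                          data = tracts, FUN = mean)

# Correlation inside each county (tract level).
within_cor <- sapply(split(tracts, tracts$county),
                     function(d) cor(d$racial_conc, d$foreclosure))

# Correlation across the county means (aggregated level).
between_cor <- cor(county_means$racial_conc, county_means$foreclosure)

within_cor
between_cor
```

In this constructed example the relation is negative inside every county but positive across county means, which is the kind of reversal the ecological fallacy warns about.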

Figure 2.1 Two-way Scatterplot between the Perc. of Foreclosures and Perc. of Racial Concentration at the County level

Figure 2.2 Comparison between the Perc. of Foreclosures and Perc. of Racial Concentration at the County level

Task 2: The Spatial Weights Matrix and Global Tests

Question 2:

Neither the Moran test nor the Moran scatterplot allows rejecting the null hypothesis of no global spatial autocorrelation in Foreclosure Rates at the county level (p-value = 0.2004). Given the results in Task 1, this is expected: whatever drives foreclosures seems to be more or less random in space, or at least to have no spatial clustering component at the county level. (Again, this may reflect an ecological fallacy problem.)

moran.test(counties$EST_FCS_RT, nb.W.queen, randomisation=TRUE, zero.policy=TRUE, alternative="two.sided", rank = FALSE, na.action=na.fail, spChk=NULL, adjust.n=TRUE)

## 
##  Moran I test under randomisation
## 
## data:  counties$EST_FCS_RT  
## weights: nb.W.queen  
## 
## Moran I statistic standard deviate = -1.2805, p-value = 0.2004
## alternative hypothesis: two.sided
## sample estimates:
## Moran I statistic       Expectation          Variance 
##     -0.0386220126     -0.0099009901      0.0005030537
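The reported standard deviate and p-value can be reproduced from the three sample estimates printed above. This is only a sketch of the arithmetic, assuming the usual normal approximation for the two-sided test; the variable names are mine.

```r
# Values copied from the moran.test() output above.
I_obs <- -0.0386220126   # Moran I statistic
I_exp <- -0.0099009901   # expectation under the null
I_var <-  0.0005030537   # variance under randomisation

z <- (I_obs - I_exp) / sqrt(I_var)   # standard deviate
p <- 2 * pnorm(-abs(z))              # two-sided p-value, normal approximation

round(z, 4)
round(p, 4)
```

Because the statistic lies only about 1.28 standard deviations below its expectation, the test does not reject the null of no spatial autocorrelation at conventional levels.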

moran.plot(counties$EST_FCS_RT, nb.W.queen, zero.policy=TRUE, spChk=NULL, xlim=c(-10, 60), ylim=c(0, 50), labels=as.character(counties$COUNTY), xlab="Foreclosure Rate",ylab="Spatial Lag of Foreclosure Rate", quiet=FALSE, pch=20, cex.lab=1.2, cex.axis=0.7, las=1, family="Arial")

## Potentially influential observations of
##   lm(formula = wx ~ x) :
## 
##                dfb.1_  dfb.x   dffit   cov.r   cook.d  hat    
## Cook County    -1.38_*  2.12_*  2.13_* 15.41_*  2.28_*  0.93_*
## DuPage County   0.39   -0.17    0.42    0.78_*  0.08    0.01  
## Kane County     0.22   -0.04    0.28    0.89_*  0.04    0.01  
## Lake County     1.01_* -0.34    1.17_*  0.20_*  0.30    0.01  
## McHenry County  0.28   -0.10    0.31    0.87_*  0.05    0.01  
## Will County     0.22   -0.02    0.29    0.87_*  0.04    0.01
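The table above comes from the regression of the spatial lag on the variable itself, `lm(wx ~ x)`, which is the line drawn in the Moran scatterplot. The sketch below uses a hypothetical five-area example with row-standardised weights to show two things: the slope of that regression equals Moran's I, and `influence.measures()` produces the same kind of diagnostics (dfbetas, dffit, cov.r, Cook's distance, hat values) that flag counties such as Cook.

```r
# Hypothetical example: five areas on a line; neighbours are adjacent areas.
x <- c(2, 4, 3, 8, 9)
adj <- matrix(0, 5, 5)
for (i in 1:4) {
  adj[i, i + 1] <- 1
  adj[i + 1, i] <- 1
}
W <- adj / rowSums(adj)            # row-standardised weights (rows sum to 1)

wx  <- as.vector(W %*% x)          # spatial lag of x
fit <- lm(wx ~ x)                  # regression behind the Moran scatterplot
slope <- unname(coef(fit)["x"])

# Moran's I by hand; with row-standardised weights and no isolates n / S0 = 1.
z <- x - mean(x)
moran_I <- sum(z * as.vector(W %*% z)) / sum(z^2)

all.equal(slope, moran_I)

# Influence diagnostics for each observation in the lag regression.
infl <- influence.measures(fit)
infl$is.inf                        # TRUE where a diagnostic flags the observation
```

An observation with a very large hat value, as Cook County has above, is one whose own value largely determines where the fitted line goes.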

Task 3: Local Measures of Spatial Autocorrelation

Question 3:

Figures 3.1 and 3.2 show the spatial clustering pattern emerging from the application of the Local Moran’s I test to the variables EST_FCS_RT (Foreclosure Rate) and EST_HCL_RT (High Cost Loans), respectively. The idea behind this analysis is that one should find a co-occurrence of spatial clusters and outliers in both variables, since high cost loans would have put households under a higher financial burden, hence increasing the risk of foreclosure.

However, there seems to be no spatial co-occurrence of these two variables (at least at the 5% and 10% significance levels) when using the Local Moran’s I. The rate of High Cost Loans (HCL) seems to show local spatial clustering particularly in the central counties within Illinois, whereas the Rate of Foreclosures seems to have no concentration in space.

Figure 3.3 provides a more detailed way to assess the particular influence of each county on the clustering/outlier pattern for each of these two variables. This figure combines the assessment of clusters and outliers in a single graph. Here the values of the variables and their respective lags are contrasted with the significance of their local Moran’s I. This Moran map reflects what was partially shown in the Moran scatterplot of Task 2. As can be seen from Figure 3.3, Foreclosure Rates show a pattern of negative (High-Low) spatial autocorrelation in which Cook County is high and its neighbors are low, which matches precisely the findings from the Moran scatterplot of Task 2. In the case of the Rate of High Cost Loans, some central counties cluster among low values (Low-Low), whereas some southern counties cluster among high values (High-High). In sum, although Foreclosure Rates and Rates of High Cost Loans do not seem to be related, each variable in itself tells a more interesting story about how each county seems to be influenced by its neighbors.

Figure 3.1 Spatial Clusters and Spatial Outliers for the Foreclosure Rate by County

Figure 3.2 Spatial Clusters and Spatial Outliers for the Rate of High Cost Loans by County

Figure 3.3 Moran Map for Foreclosure Rates and the Rate of High Cost Loans by County

#### Another way to check for local moran's patterns
# create a lagged variable
counties$lag_EST_FCS_RT <- lag.listw(nb.W.queen, counties$EST_FCS_RT)
counties$lag_EST_HCL_RT <- lag.listw(nb.W.queen, counties$EST_HCL_RT)

# Means of the variable and of its spatial lag define the four quadrants
# (named mean_fcs so that base::mean is not masked)
mean_fcs <- mean(counties$EST_FCS_RT)
mean_lag_fcs <- mean(counties$lag_EST_FCS_RT)

# Quadrant codes, assigned only where the local Moran p-value (column 5) is <= 0.05
counties$quad_sig <- NA
counties@data[(counties$EST_FCS_RT >= mean_fcs & counties$lag_EST_FCS_RT >= mean_lag_fcs) & (counties$locmoran[, 5] <= 0.05), "quad_sig"] <- 1 #high-high
counties@data[(counties$EST_FCS_RT <= mean_fcs & counties$lag_EST_FCS_RT <= mean_lag_fcs) & (counties$locmoran[, 5] <= 0.05), "quad_sig"] <- 2 #low-low
counties@data[(counties$EST_FCS_RT >= mean_fcs & counties$lag_EST_FCS_RT <= mean_lag_fcs) & (counties$locmoran[, 5] <= 0.05), "quad_sig"] <- 3 #high-low
counties@data[(counties$EST_FCS_RT <= mean_fcs & counties$lag_EST_FCS_RT >= mean_lag_fcs) & (counties$locmoran[, 5] <= 0.05), "quad_sig"] <- 4 #low-high
counties@data[ (counties$locmoran[, 5] > 0.05), "quad_sig"] <- 5 #Non-Significant

mean2 <- mean(counties$EST_HCL_RT)
mean2_lag <- mean(counties$lag_EST_HCL_RT)


counties$quad_sigHCL <- NA
counties@data[(counties$EST_HCL_RT >= mean2 & counties$lag_EST_HCL_RT >= mean2_lag) & (counties$locmoranHCL[, 5] <= 0.05), "quad_sigHCL"] <- 1 #high-high
counties@data[(counties$EST_HCL_RT <= mean2 & counties$lag_EST_HCL_RT <= mean2_lag) & (counties$locmoranHCL[, 5] <= 0.05), "quad_sigHCL"] <- 2 #low-low
counties@data[(counties$EST_HCL_RT >= mean2 & counties$lag_EST_HCL_RT <= mean2_lag) & (counties$locmoranHCL[, 5] <= 0.05), "quad_sigHCL"] <- 3 #high-low
counties@data[(counties$EST_HCL_RT <= mean2 & counties$lag_EST_HCL_RT >= mean2_lag) & (counties$locmoranHCL[, 5] <= 0.05), "quad_sigHCL"] <- 4 #low-high
counties@data[ (counties$locmoranHCL[, 5] > 0.05), "quad_sigHCL"] <- 5 #Non-Significant
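Since the same five assignments are repeated for each variable, the classification could be wrapped in a small helper. The function below is a sketch, not the code used for the figures: `classify_quadrant()` and its arguments are hypothetical names, and it treats values exactly equal to the mean as "high", which differs slightly from the overlapping `>=` / `<=` conditions above.

```r
# Hypothetical helper: classify observations into Moran-map categories.
classify_quadrant <- function(x, lag_x, p_value, alpha = 0.05) {
  high_x   <- x >= mean(x)           # at or above the mean of the variable
  high_lag <- lag_x >= mean(lag_x)   # at or above the mean of the spatial lag

  quad <- ifelse(high_x & high_lag, "High-High",
          ifelse(!high_x & !high_lag, "Low-Low",
          ifelse(high_x & !high_lag, "High-Low", "Low-High")))

  # Observations whose local Moran's I is not significant override the quadrant.
  quad[p_value > alpha] <- "Not significant"

  factor(quad, levels = c("High-High", "Low-Low", "High-Low",
                          "Low-High", "Not significant"))
}

# Toy input: four areas with made-up values, lags and p-values.
res <- classify_quadrant(x       = c(10, 1, 9, 2),
                         lag_x   = c(8, 2, 1, 9),
                         p_value = c(0.01, 0.02, 0.03, 0.50))
res
```

With the objects used above, a call might look like `classify_quadrant(counties$EST_HCL_RT, counties$lag_EST_HCL_RT, counties$locmoranHCL[, 5])`, which returns labelled categories instead of the numeric codes 1 to 5.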

Task 4: The MAUP problem

Question 4:

According to the Moran test and its scatterplot, there seems to be significant evidence of positive spatial autocorrelation of Foreclosures at the Census Tract level.

## 
##  Moran I test under randomisation
## 
## data:  ch.tracts.df$EST_FCS_RT  
## weights: nb.W.queen  
## 
## Moran I statistic standard deviate = 57.502, p-value < 2.2e-16
## alternative hypothesis: two.sided
## sample estimates:
## Moran I statistic       Expectation          Variance 
##      0.7396078127     -0.0005336179      0.0001656786
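One way to see how different the two scales are is to recover the number of spatial units from the printed expectations. Under the null, the expectation of Moran's I is -1 / (n - 1), so n can be backed out from each output. This is a sketch that assumes no units without neighbours were dropped from either test.

```r
# Expectations copied from the two moran.test() outputs in this report.
exp_county <- -0.0099009901   # county-level test (Task 2)
exp_tract  <- -0.0005336179   # tract-level test (Task 4)

# E[I] = -1 / (n - 1)  =>  n = 1 - 1 / E[I]
n_county <- round(1 - 1 / exp_county)
n_tract  <- round(1 - 1 / exp_tract)

c(counties = n_county, tracts = n_tract)
```

The tract-level test therefore works with many more, much smaller units than the county-level one, which is the change of scale discussed in this task.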

Question 5 and 6:

Figures 4.1 and 4.2 show much clearer evidence of the existence of spatial clusters of foreclosures at the Census Tract level. As mentioned before, this is expected: the foreclosure phenomenon (and its potential contagion) is likely to occur at more disaggregated spatial levels, since its negative externalities may arise at the housing-unit level, which is more likely to be captured at smaller levels of aggregation such as Census Tracts. Again, Figure 4.2 is presented as a way to assess the differences between negative and positive spatial autocorrelation in foreclosure rates. As can be seen, this figure confirms the well-known pattern of clustering of high foreclosure rates on the South Side of Chicago and of low foreclosure rates in the northern areas of Chicago.

Finally, this is a clear example of the potential dangers of using aggregated data to analyze patterns that may occur at a different spatial scale (the MAUP).
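The effect of aggregation on a global Moran's I can be shown with a deliberately simple toy case. The sketch below is not the Chicago data: it places 16 hypothetical units on a line, uses adjacency as the neighbour definition, and then averages them in blocks of four. The helper `moran_line()` is my own illustrative function and assumes row-standardised weights.

```r
# Hypothetical helper: Moran's I for units on a line, adjacent units as neighbours.
moran_line <- function(x) {
  n <- length(x)
  adj <- matrix(0, n, n)
  for (i in seq_len(n - 1)) {
    adj[i, i + 1] <- 1
    adj[i + 1, i] <- 1
  }
  W <- adj / rowSums(adj)            # row-standardised weights
  z <- x - mean(x)
  sum(z * as.vector(W %*% z)) / sum(z^2)
}

# Made-up fine-scale rates: alternating blocks of low and high values.
fine <- rep(c(1, 9, 1, 9), each = 4)

# Aggregate to coarser units by averaging each block of four.
coarse <- as.vector(tapply(fine, rep(1:4, each = 4), mean))

I_fine   <- moran_line(fine)     # clustering visible at the fine scale
I_coarse <- moran_line(coarse)   # pattern changes once units are merged

c(fine = I_fine, coarse = I_coarse)
```

At the fine scale the statistic is clearly positive, while after aggregation it turns negative: the same underlying surface yields different conclusions depending on the units used.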

Figure 4.1 Spatial Clusters and Outliers of Foreclosure Rates at the Census Tract Level, Chicago MSA

Figure 4.2 Moran Map for Foreclosure Rates and the Rate of High Cost Loans by Census Tract Chicago MSA

Task 5: Self-Directed Application

Question 7:

As can be seen, the Moran plot in Figure 5 shows that there are areas in the city with high spatial autocorrelation of lead in 2013. In particular, the area marked as High-High is an interesting result, since I believe these are old neighborhoods that might not have been renewed in a long time.

Figure 5. Moran Plot for the Lead Screening Rate for 2013

ELopezUP519LabW12

Esteban Lopez

April 13, 2016