Introduction

Food Apartheid in Chicago

Food apartheid, a term coined by food justice activist Karen Washington, refers to a system of segregation that divides those with access to an abundance of nutritious food from those who have been denied that access due to systemic injustice. Although it conveys a similar meaning to "food desert," the term food apartheid is increasingly used in its place because it better reflects the structural injustices and disparities in food access faced by low-income communities and communities of color. The term food desert describes only a geographic area with low access to healthy food, without accounting for the deeply rooted history of racial discrimination and injustice.


Chicago, the third-largest city in the United States, experiences severe food apartheid: according to the Greater Chicago Food Depository, one in five households in the Chicago area faces food insecurity. Food insecurity is especially prevalent in the community areas on the south side of Chicago, where the majority of residents are African American. One reason these areas suffer from poor food access is that there are not enough grocery stores, and even the existing ones are disappearing one by one. The presence of a grocery store in a community area is an important measure of food accessibility because a grocery store provides a diverse range of nutritious groceries, including fresh produce, fresh meat, deli items, and other packaged goods, all of which are crucial components of a healthy diet.


In this analysis, I focus on grocery store locations in the city of Chicago and their potential relationship with demographic factors, including race and socioeconomic status. To answer the main question of which areas of Chicago are affected by food apartheid and what characterizes those areas, I computed Moran’s I to measure the spatial autocorrelation of grocery store locations and then performed a spatial regression using simultaneous autoregressive (SAR) models to examine the relationship between the number of grocery stores and several independent variables while accounting for spatial effects.


Spatial Autocorrelation


Exploratory Data Analysis

Table 1. List of Community Areas in Chicago
ID Name
1 ROGERS PARK
2 WEST RIDGE
3 UPTOWN
4 LINCOLN SQUARE
5 NORTH CENTER
6 LAKE VIEW
7 LINCOLN PARK
8 NEAR NORTH SIDE
9 EDISON PARK
10 NORWOOD PARK
11 JEFFERSON PARK
12 FOREST GLEN
13 NORTH PARK
14 ALBANY PARK
15 PORTAGE PARK
16 IRVING PARK
17 DUNNING
18 MONTCLARE
19 BELMONT CRAGIN
20 HERMOSA
21 AVONDALE
22 LOGAN SQUARE
23 HUMBOLDT PARK
24 WEST TOWN
25 AUSTIN
26 WEST GARFIELD PARK
27 EAST GARFIELD PARK
28 NEAR WEST SIDE
29 NORTH LAWNDALE
30 SOUTH LAWNDALE
31 LOWER WEST SIDE
32 LOOP
33 NEAR SOUTH SIDE
34 ARMOUR SQUARE
35 DOUGLAS
36 OAKLAND
37 FULLER PARK
38 GRAND BOULEVARD
39 KENWOOD
40 WASHINGTON PARK
41 HYDE PARK
42 WOODLAWN
43 SOUTH SHORE
44 CHATHAM
45 AVALON PARK
46 SOUTH CHICAGO
47 BURNSIDE
48 CALUMET HEIGHTS
49 ROSELAND
50 PULLMAN
51 SOUTH DEERING
52 EAST SIDE
53 WEST PULLMAN
54 RIVERDALE
55 HEGEWISCH
56 GARFIELD RIDGE
57 ARCHER HEIGHTS
58 BRIGHTON PARK
59 MCKINLEY PARK
60 BRIDGEPORT
61 NEW CITY
62 WEST ELSDON
63 GAGE PARK
64 CLEARING
65 WEST LAWN
66 CHICAGO LAWN
67 WEST ENGLEWOOD
68 ENGLEWOOD
69 GREATER GRAND CROSSING
70 ASHBURN
71 AUBURN GRESHAM
72 BEVERLY
73 WASHINGTON HEIGHTS
74 MOUNT GREENWOOD
75 MORGAN PARK
76 OHARE
77 EDGEWATER


To begin with, I created a map of the community areas of Chicago. There are 77 community areas in total, each outlined with a red border. The name of the community area corresponding to each ID can be found in Table 1.


Table 2. Number of Grocery Stores in Each Community Area of Chicago
ID Name Number of Grocery Stores
8 NEAR NORTH SIDE 15
19 BELMONT CRAGIN 13
22 LOGAN SQUARE 12
28 NEAR WEST SIDE 9
6 LAKE VIEW 9
7 LINCOLN PARK 8
1 ROGERS PARK 7
3 UPTOWN 7
30 SOUTH LAWNDALE 7
31 LOWER WEST SIDE 7
24 WEST TOWN 6
25 AUSTIN 6
32 LOOP 6
14 ALBANY PARK 5
15 PORTAGE PARK 5
2 WEST RIDGE 5
63 GAGE PARK 5
77 EDGEWATER 5
41 HYDE PARK 4
23 HUMBOLDT PARK 4
44 CHATHAM 4
59 MCKINLEY PARK 4
5 NORTH CENTER 4
61 NEW CITY 4
75 MORGAN PARK 4
4 LINCOLN SQUARE 3
42 WOODLAWN 3
16 IRVING PARK 3
17 DUNNING 3
34 ARMOUR SQUARE 3
51 SOUTH DEERING 3
56 GARFIELD RIDGE 3
57 ARCHER HEIGHTS 3
62 WEST ELSDON 3
66 CHICAGO LAWN 3
71 AUBURN GRESHAM 3
12 FOREST GLEN 2
20 HERMOSA 2
21 AVONDALE 2
26 WEST GARFIELD PARK 2
29 NORTH LAWNDALE 2
33 NEAR SOUTH SIDE 2
43 SOUTH SHORE 2
45 AVALON PARK 2
46 SOUTH CHICAGO 2
52 EAST SIDE 2
58 BRIGHTON PARK 2
65 WEST LAWN 2
68 ENGLEWOOD 2
70 ASHBURN 2
73 WASHINGTON HEIGHTS 2
35 DOUGLAS 1
39 KENWOOD 1
40 WASHINGTON PARK 1
11 JEFFERSON PARK 1
13 NORTH PARK 1
10 NORWOOD PARK 1
49 ROSELAND 1
50 PULLMAN 1
53 WEST PULLMAN 1
55 HEGEWISCH 1
60 BRIDGEPORT 1
64 CLEARING 1
67 WEST ENGLEWOOD 1
69 GREATER GRAND CROSSING 1
74 MOUNT GREENWOOD 1
76 OHARE 1
9 EDISON PARK 1
36 OAKLAND 0
37 FULLER PARK 0
38 GRAND BOULEVARD 0
18 MONTCLARE 0
27 EAST GARFIELD PARK 0
47 BURNSIDE 0
48 CALUMET HEIGHTS 0
54 RIVERDALE 0
72 BEVERLY 0


Once the map of Chicago was created, I plotted the locations of grocery stores across Chicago in Figure 2, where each dot represents one grocery store. Figure 2 already suggests that there are more grocery stores on the north side of Chicago than on the south side. Table 2 lists the community areas and the number of grocery stores in each. While a few areas have more than 10 grocery stores, there are also community areas with none. In Figure 3, I filtered the community areas so that only those with either more than 10 grocery stores (in green) or zero grocery stores (in red) are shown. This figure highlights the discrepancy in the number of grocery stores between community areas and the fact that the areas filled in red tend to be located on the south side of the city. However, it is not appropriate to draw conclusions from this map alone, because it simply counts the grocery stores in each area and leaves many other factors unaccounted for. For example, although areas 18 and 54 both have zero grocery stores, access to grocery stores is likely better for residents of area 18 than for those living in area 54, because several grocery stores are located right at the border between areas 18 and 19. Therefore, we cannot assume that all nine red community areas have the same degree of accessibility to grocery stores.


Moran’s I

The initial visualizations suggest that community areas close to one another tend to have similar numbers of grocery stores. Because the locations of grocery stores do not appear to follow a completely random spatial pattern, I measured the degree of spatial clustering by computing the Moran’s I statistic.



The Moran’s I statistic is the correlation coefficient for the relationship between a variable (such as the number of grocery stores) and the values of that variable in neighboring areas. Before computing the correlation, the neighbors have to be defined. While there are many approaches to creating a list of neighbors, I used the poly2nb function, which builds a neighbors list from regions with contiguous boundaries, that is, regions sharing one or more boundary points. The next step is to add spatial weights to the neighbors list; this normalizes the Moran’s I statistic so that its values typically fall between -1 and 1.
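The two steps just described (a contiguity-based neighbors list, then weights) can be illustrated with a minimal pure-Python sketch. It is not the code behind this report, which relies on R functions such as poly2nb. It assumes a made-up 3 x 3 grid of square areas instead of the Chicago polygons, treats areas as neighbors when they touch along an edge or at a corner, and assumes row-standardized weights (each area's weights sum to 1). The helper names `queen_neighbors` and `row_standardize` are hypothetical.

```python
from itertools import product

def queen_neighbors(n_rows, n_cols):
    """Map each grid cell index to the sorted list of cells that touch it
    along an edge or at a corner (one or more shared boundary points)."""
    neighbors = {}
    for r, c in product(range(n_rows), range(n_cols)):
        cell = r * n_cols + c
        touching = []
        for dr, dc in product((-1, 0, 1), repeat=2):
            rr, cc = r + dr, c + dc
            if (dr, dc) != (0, 0) and 0 <= rr < n_rows and 0 <= cc < n_cols:
                touching.append(rr * n_cols + cc)
        neighbors[cell] = sorted(touching)
    return neighbors

def row_standardize(neighbors):
    """Give each neighbor of area i the weight 1 / (number of neighbors of i).
    Areas without neighbors are skipped."""
    return {i: {j: 1.0 / len(js) for j in js}
            for i, js in neighbors.items() if js}

nb = queen_neighbors(3, 3)
weights = row_standardize(nb)
print(nb[4])       # neighbors of the centre cell
print(weights[0])  # weights attached to the neighbors of a corner cell
```

With row-standardized weights, areas with many neighbors do not dominate the statistic simply because they have more adjacent areas.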


## 
## Call:
## lm(formula = num_grocery_lag ~ num_grocery, data = chicago_sf)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.1069 -1.0166 -0.3729  0.6489  5.0663 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.26701    0.27986   8.100 7.68e-12 ***
## num_grocery  0.32102    0.06381   5.031 3.25e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.69 on 75 degrees of freedom
## Multiple R-squared:  0.2523, Adjusted R-squared:  0.2423 
## F-statistic: 25.31 on 1 and 75 DF,  p-value: 3.248e-06


Once the neighbors list is created and the weights are calculated, we can compute a summary of the neighboring values for each community area (i.e., the weighted average of the number of grocery stores in the areas adjacent to it), which is referred to as a spatially lagged value (\(X_{lag}\)). Using the number of grocery stores in each community area, I plotted the spatially lagged number of grocery stores (\(X_{lag}\)) against the number of grocery stores in each community area (\(X\)). The Moran’s I coefficient between \(X_{lag}\) and \(X\) is the slope of the least squares regression line that best fits these points once the spread of both sets of data has been equalized, so it can be computed with a linear regression.
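To make the link between the regression slope and Moran’s I concrete, here is a small pure-Python sketch, again not the code used for this report. It uses five made-up areas arranged in a row with invented counts, assumes row-standardized weights, and compares the least squares slope of the lagged values on the original values with Moran’s I computed directly from its usual definition. All function names are hypothetical.

```python
# Toy example: five areas in a row, each neighboring the areas next to it.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
x = [1.0, 2.0, 4.0, 7.0, 6.0]  # made-up counts, not the Chicago data

def spatial_lag(values, neighbors):
    """Row-standardized lag: the mean of each area's neighboring values."""
    return [sum(values[j] for j in neighbors[i]) / len(neighbors[i])
            for i in range(len(values))]

def ols_slope(xs, ys):
    """Slope of the least squares line of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

def morans_i(values, neighbors):
    """Moran's I from its definition, with row-standardized weights."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = 0.0     # sum of all weights
    cross = 0.0  # sum of w_ij * z_i * z_j
    for i, js in neighbors.items():
        for j in js:
            w = 1.0 / len(js)
            s0 += w
            cross += w * z[i] * z[j]
    return (n / s0) * cross / sum(v * v for v in z)

x_lag = spatial_lag(x, neighbors)
slope = ols_slope(x, x_lag)
print(x_lag)
print(slope, morans_i(x, neighbors))  # the two values agree
```

The agreement between the two numbers depends on the weights being row-standardized; with other weighting schemes the slope and the statistic generally differ by a scaling factor.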


A slightly easier way to compute the Moran’s I statistic is to use the moran.test function, which conveniently returns the statistic together with a significance test. The output is as follows:


## 
##  Moran I test under randomisation
## 
## data:  chicago_sf$num_grocery  
## weights: chicago_nbw    
## 
## Moran I statistic standard deviate = 4.7576, p-value = 9.796e-07
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.321015134      -0.013157895       0.004933677


Both the linear regression and moran.test give the same result, \(I = 0.321\). Although the relationship is fairly weak, this suggests positive spatial autocorrelation. If there were no association between \(X_{lag}\) and \(X\), the slope would be close to flat, resulting in a Moran’s I value near 0.
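As a check on the moran.test output, the reported standard deviate can be reproduced from the three sample estimates it prints. Under the null hypothesis of no spatial autocorrelation, the expected value of Moran’s I is \(-1/(n-1)\), which with \(n = 77\) community areas matches the printed expectation:

\[E[I] = -\frac{1}{n-1} = -\frac{1}{76} \approx -0.01316\]

\[z = \frac{I - E[I]}{\sqrt{\mathrm{Var}(I)}} = \frac{0.32102 - (-0.01316)}{\sqrt{0.004934}} \approx \frac{0.33417}{0.07024} \approx 4.76\]

This agrees with the standard deviate of 4.7576 in the output, up to rounding.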


Significance Test

With a Moran’s I value of 0.321, what is left is to test the significance of this value. Here I used a Monte Carlo test to assess the significance of the Moran’s I value found above. In a Monte Carlo test, the attribute values (the number of grocery stores in this case) are randomly reassigned to the community areas in the data set and, for each permutation of the attribute values, a Moran’s I value is computed. The output is a sampling distribution of Moran’s I values under the null hypothesis that attribute values are randomly distributed across the city of Chicago. I then compared the observed Moran’s I value to this sampling distribution. Below are the null and alternative hypotheses for this significance test.


\[H_0: \text{There is NO spatial autocorrelation; Moran's I is close to 0}\]

\[H_A: \text{There EXISTS positive spatial autocorrelation; Moran's I} > 0\]


## $I
## [1] -0.001786983
## 
## $K
## [1] 6.277104
## 
##  Monte-Carlo simulation of Moran I
## 
## data:  chicago_sf$num_grocery 
## weights: chicago_nbw  
## number of simulations + 1: 500 
## 
## statistic = 0.32102, observed rank = 500, p-value = 0.002
## alternative hypothesis: greater


The last step is to plot the sampling distribution of the 499 simulated Moran’s I values as a histogram and see where the observed Moran’s I value of 0.321 lies.



The histogram indicates that the observed value of 0.321 is not a value one would expect to see if the numbers of grocery stores were randomly distributed across the community areas of Chicago. Additionally, with a p-value of 0.002, we can reject the null hypothesis and conclude that there is spatial autocorrelation in the number of grocery stores between community areas of Chicago.
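The permutation procedure described in this section can be sketched in pure Python as follows. This is an illustration under stated assumptions rather than the code used for this report: the data are made up (a 6 x 6 grid whose value is simply the row index, so similar values sit next to each other), neighbors share an edge, weights are row-standardized, and the one-sided p-value is taken as (number of simulated values at least as large as the observed one + 1) / (number of simulations + 1). The function names are hypothetical.

```python
import random

def rook_neighbors(n_rows, n_cols):
    """Neighbors of each grid cell that share an edge with it."""
    nb = {}
    for r in range(n_rows):
        for c in range(n_cols):
            cells = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            nb[r * n_cols + c] = [rr * n_cols + cc for rr, cc in cells
                                  if 0 <= rr < n_rows and 0 <= cc < n_cols]
    return nb

def morans_i(values, nb):
    """Moran's I with row-standardized weights (every area has a neighbor)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    # With row-standardized weights the sum of all weights equals n,
    # so the n / S0 factor in the usual formula is 1.
    cross = sum(z[i] * sum(z[j] for j in js) / len(js) for i, js in nb.items())
    return cross / sum(v * v for v in z)

def moran_permutation_test(values, nb, nsim=499, seed=1):
    """One-sided (greater) Monte Carlo test of Moran's I."""
    rng = random.Random(seed)
    observed = morans_i(values, nb)
    shuffled = list(values)
    simulated = []
    for _ in range(nsim):
        rng.shuffle(shuffled)  # reassign the values to areas at random
        simulated.append(morans_i(shuffled, nb))
    n_as_large = sum(1 for s in simulated if s >= observed)
    p_value = (n_as_large + 1) / (nsim + 1)
    return observed, simulated, p_value

# Made-up, strongly clustered data: the value of each cell is its row index.
nb = rook_neighbors(6, 6)
values = [float(r) for r in range(6) for _ in range(6)]
observed, simulated, p_value = moran_permutation_test(values, nb)
print(round(observed, 3), p_value)
```

With 499 simulations, the smallest p-value this procedure can return is 1/500 = 0.002, which is the value reported above when the observed statistic ranks highest among all 500 values.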


Spatial Regression

To take a step further and investigate the potential association between grocery store locations and other features, I included a spatial regression section that briefly touches on the use of a SAR (simultaneous autoregressive) model. To perform this type of regression, I used the lagsarlm function, which fits a model of the following form:

\[Y = \beta_0 + \beta_1X + \rho\sum_i w_iY_i + \varepsilon\]


where \(\rho\) describes the degree of correlation with neighbors, \(w_i\) is the weight on neighbor \(i\), \(Y_i\) is the value of the response in neighbor \(i\), \(\beta_0\) and \(\beta_1\) are the regression coefficients, just as in linear regression, and \(\varepsilon\) is the error term. If \(\rho\) is close to 1, it indicates high spatial autocorrelation in the response between neighboring areas, which should be accounted for in the analysis. If, on the other hand, \(\rho\) is close to 0, it indicates little to no spatial autocorrelation, in which case the results of the ordinary least squares (linear) regression can be trusted and used for the analysis.
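The same model can be written in matrix form, which makes the role of \(\rho\) easier to see. The notation here is an assumption introduced for illustration: \(W\) is the matrix of spatial weights, \(I_n\) is the \(n \times n\) identity matrix (not Moran’s I), \(\varepsilon\) is the error term, and \(I_n - \rho W\) is assumed to be invertible.

\[Y = \rho W Y + X\beta + \varepsilon\]

\[(I_n - \rho W)Y = X\beta + \varepsilon\]

\[Y = (I_n - \rho W)^{-1}(X\beta + \varepsilon)\]

Setting \(\rho = 0\) reduces the last line to \(Y = X\beta + \varepsilon\), the ordinary linear regression. For \(\rho \neq 0\), the value in one area depends on the predictors and errors of other areas through \((I_n - \rho W)^{-1}\), which is why observations can no longer be treated as independent.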

Exploratory Spatial Data Analysis

Racial Factors


I included only White and African American residents in this analysis, both to keep the analysis from becoming excessively long and, primarily, because these two groups show a very clear contrast in the community areas in which they live. One observation that is evident from Figure 8 and Figure 10 is that White residents tend to live on the north side of Chicago, making up more than 40% of the total population of those community areas. African American residents, on the other hand, tend to be clustered on the south side of the city, making up upwards of 60% to 80% of the entire population of those community areas. This suggests a moderate to strong spatial correlation in the race of residents across community areas, with residents of the same race tending to live close to each other, as the figures show. The scatter plots in Figures 9 and 11 show a slightly positive linear association between the number of grocery stores and the percentage of White residents, and a slightly negative linear association between the number of grocery stores and the percentage of African American residents in each community area. However, no conclusions can be drawn before significance testing.


Socioeconomic Factors


Figure 14 displays the average per capita income for each community area. In general, the average per capita income appears slightly higher in the community areas on the north side of Chicago than in those on the south side. There are also a few areas on the northeast side of the city where the average per capita income is much higher than in the rest of the city, and those neighborhoods are clustered together. Figure 15 describes the poverty rate (the share of residents who earn less than $25,000 annually) of each community area. There are noticeably fewer grocery stores in the areas that show high rates of poverty, and the majority of residents in these community areas are African American. The scatter plots in Figures 13 and 15 show a positive linear association between the number of grocery stores and per capita income, and a negative linear association between the number of grocery stores and the poverty rate of each community area. Again, however, no conclusions can be drawn before significance testing.


Interactive Map

This interactive map lets the user select a variable of interest and examine the distribution of demographic factors throughout the city of Chicago, overlaid with the grocery store locations. It may hint at spatial associations between grocery store locations and demographic factors that I did not include in this analysis. Select one variable of interest at a time, in addition to the grocery store layer.


Ordinary Least Squares Model (Linear Regression)

To test the significance of the independent variables in their relationship with the number of grocery stores in each community area, I started by fitting ordinary least squares regression models.


1. Number of Grocery stores and the percentage of white residents in each community area

## 
## Call:
## lm(formula = num_grocery ~ Pct_white, data = chicago_sf)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.1192 -1.9137 -0.4893  1.2508 10.6488 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.25352    0.48603   4.637 1.47e-05 ***
## Pct_white    0.03358    0.01279   2.626   0.0105 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.927 on 75 degrees of freedom
## Multiple R-squared:  0.08419,    Adjusted R-squared:  0.07198 
## F-statistic: 6.895 on 1 and 75 DF,  p-value: 0.01047

2. Number of Grocery stores and the percentage of African American residents in each community area

## 
## Call:
## lm(formula = num_grocery ~ Pct_black, data = chicago_sf)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.1902 -1.9168 -0.5258  0.8746 10.9073 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.286280   0.447768   9.573 1.22e-14 ***
## Pct_black   -0.030881   0.008688  -3.555 0.000659 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.829 on 75 degrees of freedom
## Multiple R-squared:  0.1442, Adjusted R-squared:  0.1328 
## F-statistic: 12.63 on 1 and 75 DF,  p-value: 0.0006594

3. Number of Grocery stores and the Per capita income (in thousands) in each community area

## 
## Call:
## lm(formula = num_grocery ~ income1000, data = chicago_sf)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -4.786 -1.850 -0.363  1.121 10.658 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.91187    0.62577   1.457    0.149    
## income1000   0.06616    0.01578   4.192 7.47e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.753 on 75 degrees of freedom
## Multiple R-squared:  0.1898, Adjusted R-squared:  0.179 
## F-statistic: 17.58 on 1 and 75 DF,  p-value: 7.474e-05

4. Number of Grocery stores and the percentage of residents earning less than $25,000 annually

## 
## Call:
## lm(formula = num_grocery ~ Pct_poverty, data = chicago_sf)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.3695 -1.9306 -0.6913  1.0628 11.3665 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   4.8597     0.6982   6.960  1.1e-09 ***
## Pct_poverty  -0.1650     0.0604  -2.732  0.00784 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.917 on 75 degrees of freedom
## Multiple R-squared:  0.09053,    Adjusted R-squared:  0.07841 
## F-statistic: 7.466 on 1 and 75 DF,  p-value: 0.007837


To briefly summarize the results of this series of linear regressions, all four independent variables of interest (% White, % African American, per capita income, and poverty rate) are significant predictors of the number of grocery stores in each community area. Interpretations of the linear regression specific to each independent variable are as follows:


  1. For every one-percentage-point increase in the percentage of White residents, the mean number of grocery stores increases by about 0.03 (p-value = 0.010).
  2. For every one-percentage-point increase in the percentage of African American residents, the mean number of grocery stores decreases by about 0.03 (p-value = 0.0007).
  3. For every $1,000 increase in the per capita income, the mean number of grocery stores increases by about 0.07 (p-value < 0.001).
  4. For every one-percentage-point increase in the percentage of residents earning less than $25,000, the mean number of grocery stores decreases by about 0.17 (p-value = 0.008).
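As a worked example of reading these coefficients, take the fitted line from the per capita income model output above (intercept 0.91187, slope 0.06616). The two income levels below are hypothetical values chosen only for illustration:

\[\widehat{\text{stores}} = 0.91187 + 0.06616 \times 30 \approx 2.90 \quad \text{(per capita income of \$30,000)}\]

\[\widehat{\text{stores}} = 0.91187 + 0.06616 \times 60 \approx 4.88 \quad \text{(per capita income of \$60,000)}\]

The reported t value follows from dividing the estimate by its standard error:

\[t = \frac{0.06616}{0.01578} \approx 4.19\]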


Simultaneous Autoregressive Model

While the job may seem done given the significant results above, we should not forget that we are dealing with spatial data, so it is necessary to test for spatial autocorrelation. As mentioned before, if spatial autocorrelation exists and is high, it needs to be accounted for by using a simultaneous autoregressive model. Otherwise, the assumption of independence, one of the conditions that must be met to trust the results of a linear regression, is violated. Below are the results from the simultaneous autoregressive models.


1. Number of Grocery stores and the percentage of white residents in each community area, accounting for spatial effects

## 
## Call:lagsarlm(formula = num_grocery ~ Pct_white, data = chicago_sf, 
##     listw = chicago_nbw)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -5.18597 -1.79870 -0.40542  1.04918 10.04833 
## 
## Type: lag 
## Coefficients: (asymptotic standard errors) 
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.159508   0.541014  2.1432   0.0321
## Pct_white   0.018964   0.011918  1.5912   0.1116
## 
## Rho: 0.45555, LR test value: 10.118, p-value: 0.0014684
## Asymptotic standard error: 0.13075
##     z-value: 3.4842, p-value: 0.00049358
## Wald statistic: 12.14, p-value: 0.00049358
## 
## Log likelihood: -185.8734 for lag model
## ML residual variance (sigma squared): 6.973, (sigma: 2.6406)
## Number of observations: 77 
## Number of parameters estimated: 4 
## AIC: NA (not available for weighted model), (AIC for lm: 387.86)
## LM test for residual autocorrelation
## test value: 2.8694, p-value: 0.090278

2. Number of Grocery stores and the percentage of African American residents in each community area, accounting for spatial effects

## 
## Call:lagsarlm(formula = num_grocery ~ Pct_black, data = chicago_sf, 
##     listw = chicago_nbw)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -5.49268 -1.67312 -0.55633  1.04410  9.60536 
## 
## Type: lag 
## Coefficients: (asymptotic standard errors) 
##               Estimate Std. Error z value  Pr(>|z|)
## (Intercept)  2.5664129  0.6668712  3.8484 0.0001189
## Pct_black   -0.0202729  0.0085562 -2.3694 0.0178174
## 
## Rho: 0.40763, LR test value: 7.8876, p-value: 0.0049775
## Asymptotic standard error: 0.13498
##     z-value: 3.0201, p-value: 0.0025273
## Wald statistic: 9.1207, p-value: 0.0025273
## 
## Log likelihood: -184.3804 for lag model
## ML residual variance (sigma squared): 6.7779, (sigma: 2.6034)
## Number of observations: 77 
## Number of parameters estimated: 4 
## AIC: NA (not available for weighted model), (AIC for lm: 382.65)
## LM test for residual autocorrelation
## test value: 1.2581, p-value: 0.26201

3. Number of Grocery stores and the Per capita income (in thousands) in each community area, accounting for spatial effects

## 
## Call:lagsarlm(formula = num_grocery ~ income1000, data = chicago_sf, 
##     listw = chicago_nbw)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -4.43027 -1.77474 -0.39477  0.92216 10.39496 
## 
## Type: lag 
## Coefficients: (asymptotic standard errors) 
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.369195   0.630457  0.5856 0.558145
## income1000  0.046726   0.015754  2.9660 0.003017
## 
## Rho: 0.36781, LR test value: 6.2974, p-value: 0.012091
## Asymptotic standard error: 0.1391
##     z-value: 2.6442, p-value: 0.0081876
## Wald statistic: 6.992, p-value: 0.0081876
## 
## Log likelihood: -183.0641 for lag model
## ML residual variance (sigma squared): 6.5994, (sigma: 2.5689)
## Number of observations: 77 
## Number of parameters estimated: 4 
## AIC: NA (not available for weighted model), (AIC for lm: 378.43)
## LM test for residual autocorrelation
## test value: 5.5531, p-value: 0.018448

4. Number of Grocery stores and the percentage of residents earning less than $25,000 annually, accounting for spatial effects

## 
## Call:lagsarlm(formula = num_grocery ~ Pct_poverty, data = chicago_sf, 
##     listw = chicago_nbw)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -5.21852 -1.59665 -0.63306  1.11119  9.74928 
## 
## Type: lag 
## Coefficients: (asymptotic standard errors) 
##              Estimate Std. Error z value  Pr(>|z|)
## (Intercept)  2.707736   0.821263  3.2970 0.0009771
## Pct_poverty -0.099946   0.055843 -1.7898 0.0734934
## 
## Rho: 0.45318, LR test value: 10.202, p-value: 0.0014027
## Asymptotic standard error: 0.13026
##     z-value: 3.479, p-value: 0.0005033
## Wald statistic: 12.103, p-value: 0.0005033
## 
## Log likelihood: -185.5636 for lag model
## ML residual variance (sigma squared): 6.9209, (sigma: 2.6308)
## Number of observations: 77 
## Number of parameters estimated: 4 
## AIC: NA (not available for weighted model), (AIC for lm: 387.33)
## LM test for residual autocorrelation
## test value: 2.9642, p-value: 0.085129


All four models return \(\rho\) values between 0.37 and 0.46, with p-values below 0.05. These moderately positive \(\rho\) values indicate that spatial autocorrelation is present, so the results of the SAR models differ, though not dramatically, from those of the linear regression models. Therefore, spatial autocorrelation should be taken into account when interpreting the regression results and drawing conclusions.

Interpretations of SAR models specific to each independent variable are as follows:


  1. After accounting for spatial autocorrelation between neighboring community areas, the p-value for the percentage of White residents is greater than 0.05. Therefore, we fail to reject the null hypothesis and conclude that this is not a statistically significant predictor of the number of grocery stores in each community area of Chicago (p-value = 0.112).

  2. After accounting for spatial autocorrelation between neighboring community areas, for every one-percentage-point increase in the percentage of African American residents, the mean number of grocery stores decreases by about 0.02 (p-value = 0.018).

  3. After accounting for spatial autocorrelation between neighboring community areas, for every $1,000 increase in the per capita income, the mean number of grocery stores increases by about 0.05 (p-value = 0.003).

  4. After accounting for spatial autocorrelation between neighboring community areas, the p-value for the poverty rate is greater than 0.05. Therefore, we fail to reject the null hypothesis and conclude that this is not a statistically significant predictor of the number of grocery stores in each community area of Chicago (p-value = 0.073).
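The p-values printed in the four SAR model outputs can be recomputed from the printed z values, which is a quick way to double-check the figures used in the interpretations. The sketch below is an illustration in pure Python, not part of the original analysis. It assumes the printed p-values are two-sided and based on the standard normal distribution, and the helper name `two_sided_p` is hypothetical.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# z values copied from the coefficient tables of the four SAR model outputs.
z_values = {
    "Pct_white": 1.5912,
    "Pct_black": -2.3694,
    "income1000": 2.9660,
    "Pct_poverty": -1.7898,
}

p_values = {name: two_sided_p(z) for name, z in z_values.items()}

for name, p in p_values.items():
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{name:12s} z = {z_values[name]:7.4f}  p = {p:.4f}  ({verdict} at 0.05)")
```

Only the percentage of African American residents and per capita income fall below the 0.05 threshold, matching the coefficient tables above.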


To sum up, the results changed once the SAR model, which accounts for spatial correlation, was used. Two of the variables that were significant in the linear regression (percentage of White residents and percentage of residents earning less than $25,000) are no longer significant under the SAR model. The other two variables (percentage of African American residents and per capita income) remain significant even after accounting for spatial autocorrelation. Therefore, the percentage of African American residents and the per capita income of a community area of Chicago are statistically significant predictors of the number of grocery stores in the area.


About Ecological Fallacy

An ecological fallacy is a formal fallacy in the interpretation of statistical data that occurs when relationships observed at the aggregate (group) level are falsely attributed to the individual level. It should be avoided because it can lead to erroneous conclusions and false assumptions about relationships, especially in social phenomena. An ecological fallacy can also be used to justify a false belief or assumption.


In this analysis, all of the demographic data and the grocery store counts are aggregated at the community area level. Therefore, the conclusions regarding spatial autocorrelation and the significance of demographic factors in relation to the number of grocery stores should not be applied to smaller neighborhoods, households, or individuals. The results could change, possibly drastically, if the same analysis were done with data gathered at a different level of aggregation.