Food apartheid, a term coined by food justice activist Karen Washington, refers to a system of segregation that divides those with access to an abundance of nutritious food from those who have been denied that access because of systemic injustice. Although it conveys a meaning similar to "food desert," the term food apartheid has increasingly replaced it in recent years. Food apartheid better reflects the structural injustices and disparities in food access faced by low-income communities and communities of color, whereas "food desert" describes only a geographic area with low access to healthy food, without accounting for the deeply rooted history of racial discrimination and injustice.
Although Chicago is the third-largest city in the United States, it experiences severe food apartheid: according to the Greater Chicago Food Depository, one in five households in the Chicago area faces food insecurity. Food insecurity is especially prevalent in the community areas of Chicago's South Side, where the majority of residents are African American. One reason these areas suffer from poor food access is that there are not enough grocery stores, and even the existing ones are disappearing one by one. The presence of a grocery store in a community area is an important measure of food access because grocery stores provide a diverse range of nutritious foods, including fresh produce, fresh meat, deli items, and other packaged goods, all of which are crucial components of a healthy diet.
In this analysis, I focus on the locations of grocery stores in the city of Chicago and their potential relationship with demographic factors, including race and socioeconomic status. To answer the main question of which areas of Chicago are affected by food apartheid, and what characterizes those areas, I computed Moran’s I to measure the spatial autocorrelation of the number of grocery stores across community areas, and then performed a spatial regression using a spatial autoregressive (SAR) model to examine the relationship between grocery store locations and several explanatory variables while accounting for spatial dependence.
ID | Name |
---|---|
1 | ROGERS PARK |
2 | WEST RIDGE |
3 | UPTOWN |
4 | LINCOLN SQUARE |
5 | NORTH CENTER |
6 | LAKE VIEW |
7 | LINCOLN PARK |
8 | NEAR NORTH SIDE |
9 | EDISON PARK |
10 | NORWOOD PARK |
11 | JEFFERSON PARK |
12 | FOREST GLEN |
13 | NORTH PARK |
14 | ALBANY PARK |
15 | PORTAGE PARK |
16 | IRVING PARK |
17 | DUNNING |
18 | MONTCLARE |
19 | BELMONT CRAGIN |
20 | HERMOSA |
21 | AVONDALE |
22 | LOGAN SQUARE |
23 | HUMBOLDT PARK |
24 | WEST TOWN |
25 | AUSTIN |
26 | WEST GARFIELD PARK |
27 | EAST GARFIELD PARK |
28 | NEAR WEST SIDE |
29 | NORTH LAWNDALE |
30 | SOUTH LAWNDALE |
31 | LOWER WEST SIDE |
32 | LOOP |
33 | NEAR SOUTH SIDE |
34 | ARMOUR SQUARE |
35 | DOUGLAS |
36 | OAKLAND |
37 | FULLER PARK |
38 | GRAND BOULEVARD |
39 | KENWOOD |
40 | WASHINGTON PARK |
41 | HYDE PARK |
42 | WOODLAWN |
43 | SOUTH SHORE |
44 | CHATHAM |
45 | AVALON PARK |
46 | SOUTH CHICAGO |
47 | BURNSIDE |
48 | CALUMET HEIGHTS |
49 | ROSELAND |
50 | PULLMAN |
51 | SOUTH DEERING |
52 | EAST SIDE |
53 | WEST PULLMAN |
54 | RIVERDALE |
55 | HEGEWISCH |
56 | GARFIELD RIDGE |
57 | ARCHER HEIGHTS |
58 | BRIGHTON PARK |
59 | MCKINLEY PARK |
60 | BRIDGEPORT |
61 | NEW CITY |
62 | WEST ELSDON |
63 | GAGE PARK |
64 | CLEARING |
65 | WEST LAWN |
66 | CHICAGO LAWN |
67 | WEST ENGLEWOOD |
68 | ENGLEWOOD |
69 | GREATER GRAND CROSSING |
70 | ASHBURN |
71 | AUBURN GRESHAM |
72 | BEVERLY |
73 | WASHINGTON HEIGHTS |
74 | MOUNT GREENWOOD |
75 | MORGAN PARK |
76 | OHARE |
77 | EDGEWATER |
To begin with, I created a map of Chicago's community areas. There are 77 community areas in total, each outlined with a red border. The name corresponding to each ID is listed in the table.
ID | Name | Number of Grocery Stores |
---|---|---|
8 | NEAR NORTH SIDE | 15 |
19 | BELMONT CRAGIN | 13 |
22 | LOGAN SQUARE | 12 |
28 | NEAR WEST SIDE | 9 |
6 | LAKE VIEW | 9 |
7 | LINCOLN PARK | 8 |
1 | ROGERS PARK | 7 |
3 | UPTOWN | 7 |
30 | SOUTH LAWNDALE | 7 |
31 | LOWER WEST SIDE | 7 |
24 | WEST TOWN | 6 |
25 | AUSTIN | 6 |
32 | LOOP | 6 |
14 | ALBANY PARK | 5 |
15 | PORTAGE PARK | 5 |
2 | WEST RIDGE | 5 |
63 | GAGE PARK | 5 |
77 | EDGEWATER | 5 |
41 | HYDE PARK | 4 |
23 | HUMBOLDT PARK | 4 |
44 | CHATHAM | 4 |
59 | MCKINLEY PARK | 4 |
5 | NORTH CENTER | 4 |
61 | NEW CITY | 4 |
75 | MORGAN PARK | 4 |
4 | LINCOLN SQUARE | 3 |
42 | WOODLAWN | 3 |
16 | IRVING PARK | 3 |
17 | DUNNING | 3 |
34 | ARMOUR SQUARE | 3 |
51 | SOUTH DEERING | 3 |
56 | GARFIELD RIDGE | 3 |
57 | ARCHER HEIGHTS | 3 |
62 | WEST ELSDON | 3 |
66 | CHICAGO LAWN | 3 |
71 | AUBURN GRESHAM | 3 |
12 | FOREST GLEN | 2 |
20 | HERMOSA | 2 |
21 | AVONDALE | 2 |
26 | WEST GARFIELD PARK | 2 |
29 | NORTH LAWNDALE | 2 |
33 | NEAR SOUTH SIDE | 2 |
43 | SOUTH SHORE | 2 |
45 | AVALON PARK | 2 |
46 | SOUTH CHICAGO | 2 |
52 | EAST SIDE | 2 |
58 | BRIGHTON PARK | 2 |
65 | WEST LAWN | 2 |
68 | ENGLEWOOD | 2 |
70 | ASHBURN | 2 |
73 | WASHINGTON HEIGHTS | 2 |
35 | DOUGLAS | 1 |
39 | KENWOOD | 1 |
40 | WASHINGTON PARK | 1 |
11 | JEFFERSON PARK | 1 |
13 | NORTH PARK | 1 |
10 | NORWOOD PARK | 1 |
49 | ROSELAND | 1 |
50 | PULLMAN | 1 |
53 | WEST PULLMAN | 1 |
55 | HEGEWISCH | 1 |
60 | BRIDGEPORT | 1 |
64 | CLEARING | 1 |
67 | WEST ENGLEWOOD | 1 |
69 | GREATER GRAND CROSSING | 1 |
74 | MOUNT GREENWOOD | 1 |
76 | OHARE | 1 |
9 | EDISON PARK | 1 |
36 | OAKLAND | 0 |
37 | FULLER PARK | 0 |
38 | GRAND BOULEVARD | 0 |
18 | MONTCLARE | 0 |
27 | EAST GARFIELD PARK | 0 |
47 | BURNSIDE | 0 |
48 | CALUMET HEIGHTS | 0 |
54 | RIVERDALE | 0 |
72 | BEVERLY | 0 |
Once the map of Chicago was created, I plotted the locations of grocery stores across the city in Figure 2, where each dot represents one grocery store. Figure 2 already suggests that there are more grocery stores on the North Side of Chicago than on the South Side. Table 2 lists the community areas and the number of grocery stores in each. While a few areas have more than 10 grocery stores, some have none at all. In Figure 3, I filtered the community areas so that only areas with either more than 10 grocery stores (in green) or zero grocery stores (in red) are shown. This figure highlights the discrepancy in the number of grocery stores between community areas and shows that the areas filled in red tend to be located on the South Side of the city. However, it would be premature to draw conclusions from this map alone, because it simply counts the number of grocery stores in each area and does not account for many other factors. For example, although areas 18 and 54 both have zero grocery stores, access to grocery stores may be much better for residents of area 18 than for those living in area 54, because several grocery stores are located right at the border between areas 18 and 19. Therefore, we cannot assume that all nine red community areas have the same degree of access to grocery stores.
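Counting grocery stores per community area, as in Table 2, is a point-in-polygon problem, and the Figure 3 highlighting is a simple rule on those counts. The Python sketch below illustrates both steps under stated assumptions: the two square "areas" and the store coordinates are made up, and the ray-casting test stands in for whatever spatial join the original R code used.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: cast a horizontal ray from (x, y) toward +x and
    count how many polygon edges it crosses; an odd count means inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_points_per_area(points, areas):
    """Return {area_id: number of points inside that area's polygon}."""
    counts = {area_id: 0 for area_id in areas}
    for x, y in points:
        for area_id, polygon in areas.items():
            if point_in_polygon(x, y, polygon):
                counts[area_id] += 1
                break  # assumes areas do not overlap
    return counts


def classify(count, high=10):
    """Figure 3 style label: 'green' for more than `high` stores,
    'red' for none, None for areas that are not highlighted."""
    if count > high:
        return "green"
    if count == 0:
        return "red"
    return None


# Hypothetical toy data: two adjacent unit squares and three store locations.
areas = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
stores = [(0.2, 0.3), (0.5, 0.5), (0.8, 0.1)]
counts = count_points_per_area(stores, areas)
print(counts)  # {'A': 3, 'B': 0}
print({a: classify(c, high=2) for a, c in counts.items()})  # {'A': 'green', 'B': 'red'}
```

The threshold is a parameter so the same rule reproduces the "more than 10" cutoff from the text with the default `high=10`.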
These initial visualizations suggest that community areas close to one another tend to have similar numbers of grocery stores. Because grocery store locations do not appear to follow a completely random spatial pattern, I measured the degree of spatial clustering by computing Moran’s I statistic.
Moran’s I measures the correlation between a variable (such as the number of grocery stores) and the values of that variable in neighboring areas. Before computing it, the neighbors have to be defined. There are many approaches to building a list of neighbors; I used the `poly2nb` function, which builds a neighbors list from regions with contiguous boundaries, that is, regions sharing one or more boundary points. The next step is to assign spatial weights to the neighbors list. The weights are row-standardized (the weights of each area's neighbors sum to 1), which normalizes the statistic so that Moran’s I values generally fall between -1 and 1 and can be read much like a correlation coefficient.
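To make the neighbor and weight definitions concrete, here is a minimal Python sketch on a hypothetical 3×3 grid of square "community areas" (real community-area polygons are irregular, so this only illustrates the idea). Contiguity based on sharing at least one boundary point means diagonal cells that touch at a corner count as neighbors; row standardization then gives each neighbor of area i the weight 1 / (number of neighbors of i).

```python
def queen_neighbors(n_rows, n_cols):
    """Neighbors of each cell in an n_rows x n_cols grid when cells that
    share an edge or a corner are neighbors."""
    nb = {}
    for r in range(n_rows):
        for c in range(n_cols):
            cell = r * n_cols + c
            nb[cell] = [
                rr * n_cols + cc
                for rr in range(r - 1, r + 2)
                for cc in range(c - 1, c + 2)
                if 0 <= rr < n_rows and 0 <= cc < n_cols and (rr, cc) != (r, c)
            ]
    return nb


def row_standardize(nb, n):
    """Dense n x n weight matrix in which every row sums to 1
    (areas without neighbors would keep a row of zeros)."""
    W = [[0.0] * n for _ in range(n)]
    for i, neighbors in nb.items():
        for j in neighbors:
            W[i][j] = 1.0 / len(neighbors)
    return W


nb = queen_neighbors(3, 3)
W = row_standardize(nb, 9)
print(nb[0])       # [1, 3, 4]: a corner cell has 3 neighbors, including the diagonal
print(len(nb[4]))  # 8: the center cell touches every other cell
print(W[0][1])     # 0.3333333333333333
```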
##
## Call:
## lm(formula = num_grocery_lag ~ num_grocery, data = chicago_sf)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.1069 -1.0166 -0.3729 0.6489 5.0663
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.26701 0.27986 8.100 7.68e-12 ***
## num_grocery 0.32102 0.06381 5.031 3.25e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.69 on 75 degrees of freedom
## Multiple R-squared: 0.2523, Adjusted R-squared: 0.2423
## F-statistic: 25.31 on 1 and 75 DF, p-value: 3.248e-06
Once the neighbors list is created and the weights are calculated, we can compute, for each community area, the weighted average of the number of grocery stores in its neighboring areas. This is referred to as the spatially lagged value (\(X_{lag}\)). Using the number of grocery stores in each community area computed earlier, I plotted the spatially lagged number of grocery stores (\(X_{lag}\)) against the number of grocery stores (\(X\)) for each community area. With row-standardized weights, Moran’s I is the slope of the least-squares regression line of \(X_{lag}\) on \(X\), so it can be computed with a simple linear regression, as in the `lm` output above.
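The equivalence between the regression slope and Moran’s I holds because, with row-standardized weights, the sum of all weights equals the number of areas and each lagged value is the average of the neighbors' values. The Python sketch below checks this on a hypothetical strip of six areas with made-up store counts; it mirrors, but is not, the R code behind the `lm` slope (0.32102) and the `moran.test` estimate (0.321015) reported here.

```python
def spatial_lag(W, x):
    """Spatially lagged value for each area: sum_j w_ij * x_j, which is the
    average of the neighbors' values when W is row-standardized."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def morans_i(W, x):
    """Moran's I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, where z are
    deviations from the mean and S0 is the sum of all weights."""
    n = len(x)
    mean = sum(x) / n
    z = [xi - mean for xi in x]
    s0 = sum(sum(row) for row in W)
    num = sum(W[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return (n / s0) * num / den


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical example: six areas in a row, each bordering the next.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
n = len(neighbors)
W = [[0.0] * n for _ in range(n)]
for i, nbrs in neighbors.items():
    for j in nbrs:
        W[i][j] = 1.0 / len(nbrs)  # row-standardize

x = [15, 13, 12, 2, 0, 1]  # made-up store counts, high at one end of the strip
x_lag = spatial_lag(W, x)
print(round(morans_i(W, x), 4), round(ols_slope(x, x_lag), 4))  # the two values agree
```

With binary (non-standardized) weights the lag would be a sum rather than an average, and the slope would generally differ from Moran’s I.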
A more direct way to obtain Moran’s I is the `moran.test` function, which returns the statistic together with a test of its significance. Its output is as follows:
##
## Moran I test under randomisation
##
## data: chicago_sf$num_grocery
## weights: chicago_nbw
##
## Moran I statistic standard deviate = 4.7576, p-value = 9.796e-07
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.321015134 -0.013157895 0.004933677
The linear regression and `moran.test` give the same value, \(I = 0.321\). Although the relationship is fairly weak, this indicates positive spatial autocorrelation: community areas with more grocery stores tend to border areas that also have more grocery stores. If there were no association between \(X_{lag}\) and \(X\), the regression line would be nearly flat, resulting in a Moran’s I value near 0.
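The other numbers in the `moran.test` output can be checked by hand. Under the null hypothesis, the expected value of Moran’s I depends only on the number of areas, \(n = 77\), and the reported "standard deviate" is the z-score of the observed statistic computed with the variance under randomisation (all inputs below are taken from the output above):

\[
\begin{aligned}
E[I] &= -\frac{1}{n-1} = -\frac{1}{76} \approx -0.013158 \\
z &= \frac{I - E[I]}{\sqrt{\mathrm{Var}(I)}} = \frac{0.321015 - (-0.013158)}{\sqrt{0.004933677}} \approx \frac{0.334173}{0.070240} \approx 4.758
\end{aligned}
\]

A z-score this large corresponds to the very small one-sided p-value (9.796e-07) in the output.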
With an observed Moran’s I of 0.321, what remains is to test whether this value is significant. I used a Monte Carlo (permutation) test. In a Monte Carlo test, the attribute values (here, the number of grocery stores) are randomly reassigned to the community areas, and a Moran’s I value is computed for each permutation. The result is a sampling distribution of Moran’s I values under the null hypothesis that the attribute values are randomly distributed across the city of Chicago, against which the observed Moran’s I is compared. The null and alternative hypotheses for this test are:
\[H_0: \text{There is NO spatial autocorrelation; Moran's I is close to 0}\]
\[H_A: \text{There EXISTS positive spatial autocorrelation; Moran's I} > 0\]
## $I
## [1] -0.001786983
##
## $K
## [1] 6.277104
##
## Monte-Carlo simulation of Moran I
##
## data: chicago_sf$num_grocery
## weights: chicago_nbw
## number of simulations + 1: 500
##
## statistic = 0.32102, observed rank = 500, p-value = 0.002
## alternative hypothesis: greater
The last step is to plot the 499 simulated Moran’s I values as a histogram and see where the observed value of 0.321 lies.
The histogram shows that the observed value of 0.321 is not a value one would expect if the numbers of grocery stores were randomly distributed across Chicago's community areas: none of the 499 simulated values is as large (observed rank = 500). With a p-value of 0.002, the smallest attainable with 499 simulations (1/500), we can reject the null hypothesis and conclude that the number of grocery stores exhibits positive spatial autocorrelation across the community areas of Chicago.
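The permutation procedure can be sketched in a few lines of Python. This is a simplified illustration, not the R implementation used for the output above: it shuffles the values across areas `nsim` times and uses the common rule p = (number of simulated I ≥ observed I + 1) / (nsim + 1), which gives 1/500 = 0.002 when no simulated value reaches the observed one. The 20-area strip and its values are hypothetical.

```python
import random


def morans_i(W, x):
    """Moran's I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    s0 = sum(map(sum, W))
    num = sum(W[i][j] * z[i] * z[j] for i in range(n) for j in range(n) if W[i][j])
    return (n / s0) * num / sum(v * v for v in z)


def moran_permutation_test(W, x, nsim=499, seed=1):
    """One-sided (alternative: greater) Monte Carlo test of Moran's I."""
    rng = random.Random(seed)
    observed = morans_i(W, x)
    shuffled = list(x)
    simulated = []
    for _ in range(nsim):
        rng.shuffle(shuffled)  # randomly reassign the values to areas
        simulated.append(morans_i(W, shuffled))
    n_extreme = sum(1 for s in simulated if s >= observed)
    p_value = (n_extreme + 1) / (nsim + 1)
    return observed, simulated, p_value


# Hypothetical example: 20 areas in a row with row-standardized contiguity weights.
n = 20
W = [[0.0] * n for _ in range(n)]
for i in range(n):
    nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
    for j in nbrs:
        W[i][j] = 1.0 / len(nbrs)

x = list(range(n))  # made-up values that increase steadily along the strip
observed, simulated, p_value = moran_permutation_test(W, x, nsim=499)
print(round(observed, 3), p_value)  # 0.971 0.002
```

The `simulated` list is what a histogram like the one described above would display, with the observed value marked on it.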
To go a step further and investigate whether grocery store locations are associated with other features, I added a brief spatial regression using a simultaneous autoregressive (SAR) lag model. To fit this model, I used the `lagsarlm` function, which estimates a model of the following form:
\[Y_i = \beta_0 + \beta_1 X_i + \rho\sum_{j} w_{ij} Y_j + \varepsilon_i\]
where \(Y_i\) is the outcome in community area \(i\) (here, the number of grocery stores), \(X_i\) is an explanatory variable, \(w_{ij}\) is the spatial weight of neighbor \(j\) for area \(i\), \(\varepsilon_i\) is the error term, and \(\beta_0\) and \(\beta_1\) are regression coefficients, just as in linear regression. The parameter \(\rho\) measures the strength of spatial dependence, that is, how strongly the outcome in an area depends on the outcomes in its neighbors. A \(\rho\) value close to 1 indicates strong spatial autocorrelation in the outcome, which must be accounted for in the analysis. If \(\rho\) is close to 0 (and not statistically significant), there is little to no spatial autocorrelation, in which case the results of an ordinary least squares (linear) regression can reasonably be trusted and used for the analysis.
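Because the outcome appears on both sides of the lag model, a change in an explanatory variable in one area spills over to its neighbors. In matrix form the model is \(Y = \rho W Y + X\beta + \varepsilon\), whose solution is \(Y = (I - \rho W)^{-1}(X\beta + \varepsilon)\). The Python sketch below simulates from this model (it does not estimate it the way `lagsarlm` does) on a hypothetical five-area strip with made-up values, solving by fixed-point iteration, which converges for row-standardized weights when \(|\rho| < 1\). It shows that raising \(X\) by one unit in every area raises \(Y\) by \(\beta_1/(1-\rho)\) rather than by \(\beta_1\).

```python
def solve_sar_lag(W, X, beta0, beta1, rho, eps, n_iter=200):
    """Solve Y = beta0 + beta1*X + rho*W*Y + eps by fixed-point iteration.
    With row-standardized W and |rho| < 1, each step shrinks the error by
    a factor |rho|, so Y converges to (I - rho*W)^(-1)(beta0 + beta1*X + eps)."""
    n = len(X)
    base = [beta0 + beta1 * X[i] + eps[i] for i in range(n)]
    Y = list(base)
    for _ in range(n_iter):
        Y = [base[i] + rho * sum(W[i][j] * Y[j] for j in range(n)) for i in range(n)]
    return Y


# Hypothetical five-area strip with row-standardized contiguity weights.
n = 5
W = [[0.0] * n for _ in range(n)]
for i in range(n):
    nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
    for j in nbrs:
        W[i][j] = 1.0 / len(nbrs)

X = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up explanatory variable
eps = [0.0] * n                 # no noise, to isolate the spatial multiplier
beta0, beta1, rho = 1.0, 2.0, 0.5

Y_before = solve_sar_lag(W, X, beta0, beta1, rho, eps)
Y_after = solve_sar_lag(W, [v + 1 for v in X], beta0, beta1, rho, eps)
change = [a - b for a, b in zip(Y_after, Y_before)]
print([round(c, 6) for c in change])  # [4.0, 4.0, 4.0, 4.0, 4.0], i.e. beta1 / (1 - rho)
```

This is why, when \(\rho\) is far from 0, the coefficient \(\beta_1\) from a lag model does not by itself describe the full effect of \(X\) on the outcome.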
I included only the White and African American populations in this analysis, partly to keep it from becoming too long but primarily because these two groups show very clear contrasts in the community areas where they live. One observation that is very evident from Figures 8 and 10 is that White residents tend to live on the North Side of Chicago, where they make up more than 40% of the population of those community areas. African American residents, on the other hand, tend to be clustered on the South Side of the city, where they make up 60% to 80% or more of the population of those community areas. This may suggest moderate to strong spatial autocorrelation in the racial composition of community areas, where residents of the same race tend to live close to one another, as the maps show. The scatter plots in Figures 9 and 11 show a slightly positive linear association between the number of grocery stores and the percentage of White residents, and a slightly negative linear association between the number of grocery stores and the percentage of African American residents in each community area. However, no conclusions can be drawn until significance has been tested.
Figure 14 displays the average per capita income of each community area. In general, average per capita income appears to be somewhat higher in the community areas on the North Side of Chicago than in those on the South Side. A few areas on the northeast side of the city, however, have a much higher per capita income than the rest of the city, and these neighborhoods are clustered together. Figure 15 shows the poverty rate (the share of people earning less than $25,000 annually) of each community area. There are noticeably fewer grocery stores in the areas with high poverty rates, and the majority of residents in these community areas are African American. The scatter plots in Figures 13 and 15 show a positive linear association between the number of grocery stores and per capita income, and a negative linear association between the number of grocery stores and the poverty rate of each community area. However, again, no conclusions can be drawn until significance has been tested.