Access to healthy food options is a key determinant of health. Food insecurity in low-income neighborhoods in Philadelphia, the site of this report's analysis, is an ongoing public health concern (Swavely et al., 2019). Access to healthy food varies by neighborhood in Philadelphia due to historical racism that has shaped the distribution of supermarkets, effectively creating food deserts in predominantly Black neighborhoods (Deener, 2017). Farmers' markets have potential as a community-led approach to addressing food insecurity; however, not every Philadelphia neighborhood has access to these markets. This report examines the distribution of farmers' markets in Philadelphia, shedding light on how they address or exacerbate existing inequity. We conducted point pattern analysis in R to determine whether farmers' markets are randomly located, dispersed, or clustered throughout the city, using a shapefile from the PA Spatial Data Access website that contains the locations of farmers' markets in Philadelphia in 2013. This spatial analysis of the distribution of farmers' markets provides insight into which neighborhoods lack access to locally grown fresh produce and can guide intervention efforts.
In point pattern analysis, which focuses exclusively on the spatial locations of events represented as points with no associated attributes beyond their coordinates, the primary goal is to evaluate whether the observed spatial pattern departs from complete spatial randomness (CSR). Under CSR, the location of any one point provides no information about the location of any other point, and all locations within the study area are equally likely to contain a point. Specifically, point pattern analysis addresses the following hypotheses:
H₀: The observed point pattern is random and does not differ significantly from the pattern expected under complete spatial randomness.
Hₐ: The observed point pattern is not random and exhibits significant clustering or dispersion.
Spatial clustering occurs when points are, on average, located closer together than would be expected under CSR, whereas spatial dispersion occurs when the average distance between points is greater than that expected under CSR.
One approach for assessing spatial randomness, clustering, or dispersion is the quadrat method. This method divides a study region R into a fixed number of equally sized cells, referred to as quadrats, and evaluates the distribution of points across those cells. The figure below illustrates examples of random, uniform (dispersed), and clustered point patterns as represented using quadrats.
Figure 1: Sample Random, Uniform (dispersed), and Clustered Quadrats (Briggs Henson University, 2010, as cited by Brusilovskiy, 2025)
Under the quadrat method, spatial patterns are evaluated using the variance-to-mean ratio (VMR), with the following interpretations:
Random: VMR ≈ 1
Uniform/Dispersed: VMR < 1
Clustered: VMR > 1
In Figure 1, the pattern highlighted in yellow represents complete spatial randomness, as the VMR is approximately equal to 1. The patterns highlighted in blue and pink represent spatial dispersion and spatial clustering, respectively, corresponding to VMR values less than 1 and greater than 1.
VMR is calculated as the ratio of the variance of point counts across quadrats to the mean number of points per quadrat:
\[ \text{VMR} = \frac{\operatorname{Var}(x)}{\bar{x}} \]
For example, in Figure 1's pink-highlighted case, there are m = 10 quadrats. Quadrats 1–4 and 7–10 contain zero points, while Quadrats 5 and 6 each contain ten points. The total number of points is therefore 20, yielding a mean of 2 points per quadrat after dividing the total number of points by the number of quadrats (20/10). Next, we calculate the squared deviation of each quadrat's point count from the mean of 2 using \((x_i - \bar{x})^2\), sum these deviations, and divide by the total number of quadrats minus one (9); the subtraction of one accounts for degrees of freedom. This yields a variance of approximately 17.778, which, when divided by the mean of 2, produces a VMR of about 8.889, indicating a highly clustered spatial pattern.
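This calculation can be verified with a few lines of base R. The vector below simply encodes the quadrat counts of the pink-highlighted pattern in Figure 1 and is included purely as an illustration.

```r
# Quadrat counts for the pink-highlighted (clustered) pattern in Figure 1:
# eight empty quadrats and two quadrats containing ten points each
counts <- c(0, 0, 0, 0, 10, 10, 0, 0, 0, 0)

vmr <- var(counts) / mean(counts)  # var() divides by (m - 1) = 9
vmr                                # ~8.889, well above 1, indicating clustering
```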
Although the quadrat method was historically a widely used tool in spatial point pattern analysis, its application has declined in recent years due to several well-documented limitations. First, measures of clustering are highly sensitive to the number and size of quadrats used. As illustrated in Figure 2, two quadrat layouts may contain identical spatial point patterns but differ in the number of cells. In this example, one layout yields a quadrat containing three points, while the corresponding area in an alternative layout contains seven points, despite the underlying point locations being identical. Because the VMR depends on how points are aggregated within cells, changes in grid configuration alone can lead to different results, making the analysis dependent on an arbitrary quadrat specification rather than the underlying spatial pattern.
Figure 2: Two quadrat layouts with different numbers of cells applied to the same spatial point pattern (Brusilovskiy, 2025)
Second, the quadrat method can obscure within-cell spatial structure. Two cells may contain the same number of points yet differ substantially in how those points are arranged. As shown in Figure 3, each quadrat contains three points; however, the points in the quadrat on the right are tightly clustered near the center, while those in the left quadrat are more evenly dispersed. Despite these differences, the quadrat method weights both cells equally, potentially masking meaningful spatial clustering that occurs across cells.
Figure 3: Two quadrats containing the same number of points but with different internal arrangements (Brusilovskiy, 2025)
Nearest Neighbor Analysis (NNA) is a point pattern analysis technique that addresses several limitations of the quadrat method by relying on interpoint distances rather than counts within arbitrarily defined cells. Specifically, NNA calculates the distance between each point 𝑖 and its closest neighboring point 𝑗, and then compares the average of these observed distances to the distance expected under complete spatial randomness (CSR). The ratio of the observed average distance to the expected average distance produces the Nearest Neighbor Index (NNI), which is defined as:
\[ \text{NNI} = \frac{\text{Observed Average Distance}}{\text{Expected Average Distance}} = \frac{\bar{D}_o}{\bar{D}_e} \]
In the NNI equation, the observed average distance (\(\bar{D}_o\)), our numerator value, is represented by the following formula:
\[ \bar{D}_o = \frac{\sum_{i=1}^{n} D_i}{n} \]
Where the numerator of the \(\bar{D}_o\) formula is the sum of the distances from each point i to its nearest neighbor, and the denominator is the total number of points in the study region (n). The expected average distance under complete spatial randomness (\(\bar{D}_e\)), the denominator of the NNI equation, is represented by the following formula:
\[ \bar{D}_e = \frac{0.5}{\sqrt{n/A}} \]
Where n is the number of points in the study area and A is the area of the study region. The following values of NNI correspond to a random, clustered, or dispersed pattern (a brief worked example follows the list):
NNI ≈ 1, there is a random pattern
NNI ≈ 0, there is a clustered pattern
NNI ≈ 2 (at most 2.149), there is a dispersed pattern.
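To make these formulas concrete, consider a purely hypothetical study region of 10,000 square feet containing n = 100 points with an observed average nearest neighbor distance of 3 feet:

\[ \bar{D}_e = \frac{0.5}{\sqrt{100/10{,}000}} = \frac{0.5}{0.1} = 5 \text{ feet}, \qquad \text{NNI} = \frac{\bar{D}_o}{\bar{D}_e} = \frac{3}{5} = 0.6 \]

Because this hypothetical NNI of 0.6 is below 1, the pattern would lean toward clustering, although a significance test (described below) is still needed before drawing that conclusion.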
The null and alternative hypotheses for Nearest Neighbor Analysis are very similar to those of other point pattern analysis methods, where:
H₀: The observed point pattern does not differ significantly from complete spatial randomness.
Hₐ: The observed point pattern is not random; it exhibits significant clustering or dispersion.
The significance tests used to reject or fail to reject the null hypothesis are derived from the following formula:
\[ z = \frac{\bar{D}_o - \bar{D}_e}{SE_{\bar{D}_o}} \]
The test statistic provided by the previous formula “has a z (standard normal) distribution” (Brusilovskiy, 2025), meaning the center of the distribution is 0, and the standard deviation is 1 (Bhandari, 2023). What we are essentially doing to generate the test statistic is subtracting the estimate under spatial randomness (\(\bar{D}_e\)) from the observed average distance (\(\bar{D}_o\)) and dividing by an estimate of the standard error of (\(\bar{D}_o\)), which is derived from the following formula:
\[ SE_{\bar{D}_o} = \sqrt{\left(\frac{1}{4\tan^{-1}(1)}-\frac{1}{4}\right)\frac{A}{n^2}} = \frac{0.26136}{\sqrt{n^2/A}} \]
For the purposes of this report, we will not explore the precise mathematical components used to calculate \(SE_{\bar{D}_o}\); however, it is needed to compute the test statistic comparing the observed distance with the expected distance under CSR. Incorporating the standard error accounts for sampling variability in the observed distances and allows the difference between \(\bar{D}_o\) and \(\bar{D}_e\) to be evaluated on a standardized scale. If this variability were not accounted for, differences between the observed and expected distances could be overstated, potentially leading to incorrect conclusions about the presence of spatial clustering or dispersion. As is the case with most standard normal tables, a z value of 1.96 corresponds to a p value of 0.05. A z score less than −1.96 indicates significant clustering: a negative value means that the average observed distance is smaller than the distance expected under complete spatial randomness (CSR), so the observed points are closer together than randomness would predict, and a z score below −1.96 translates to a p value of 0.05 or smaller. For example, if the observed average distance is 1 foot and the expected distance under CSR is 5 feet, the difference between the two values is negative (1 − 5 = −4), reflecting that distances between points in the observed data are much smaller than those expected under CSR, indicating clustering. Conversely, a z score greater than 1.96 indicates significant dispersion because the average observed distance is greater than the distance expected under CSR. Using the same example, if the observed average distance were 5 feet and the expected distance under CSR were 1 foot, the difference would be positive (5 − 1 = 4), reflecting that distances between points in the observed data are much larger than expected under CSR, indicating dispersion. These are simple examples, and the actual significance of the values depends on the study area, but they illustrate the distinction between z < −1.96 and z > 1.96.
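Continuing the hypothetical example introduced above (n = 100 points, A = 10,000 square feet, \(\bar{D}_o\) = 3 feet, \(\bar{D}_e\) = 5 feet), the standard error and z-score would be:

\[ SE_{\bar{D}_o} = \frac{0.26136}{\sqrt{100^2/10{,}000}} = 0.26136, \qquad z = \frac{3 - 5}{0.26136} \approx -7.65 \]

This invented z-score falls far below −1.96 and would therefore indicate statistically significant clustering.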
Figure 4: Standard Normal Table (Brusilovskiy, 2025)
Although Nearest Neighbor Analysis is considered more advanced than the quadrat method, because it uses distances between points rather than counts constrained to specific quadrats, it still has several limitations. First, there are various ways to define the study area in NNI, which may yield vastly different results. For example, if we use a minimum enclosing rectangle, the smallest possible rectangle that captures all points, the analysis will be heavily skewed by outliers such as the point on the far right in Figure 5.
Figure 5: A minimum enclosing rectangle with an outlier on the far bottom right corner (Brusilovskiy, 2025)
Second, because NNI examines the average distance between a point and its neighbor, it may overlook points that are clustered together but distant from other clusters, resulting in the identification of spatial clustering but failing to capture cases of spatial dispersion across clusters, as demonstrated in Figure 6.
Figure 6: An area of analysis with clusters that are dispersed
To illustrate these limitations using real data, consider the locations of several hospitals in Philadelphia (Figure 7).
Figure 7: A set of points, representing hospitals, in Philadelphia (Brusilovskiy, 2025)
At first glance, a cluster of hospitals appears to be located near Philadelphia’s Center City area. However, if we use a minimum enclosing rectangle around these points, which the nearest neighbor tool produces, we get a rectangle as displayed in Figure 8.
Figure 8: The minimum enclosing rectangle around the hospital points in Philadelphia (Brusilovskiy, 2025)
When the nearest neighbor analysis is run using only the points and the area within the minimum enclosing rectangle, it yields a z score of 1.19, which falls within the range of −1.96 to 1.96 and indicates no significant clustering or dispersion. Still, there is a clear visual cluster that the NNI analysis fails to detect. A possible solution is to use a rectangle similar in area to Philadelphia, although this would only resolve the first limitation of NNI, rectangle size, and not the second limitation, the analysis of dispersed clusters, where clusters are internally dense but separated from one another across space.
A point pattern analysis method capable of identifying dispersed clusters is Ripley's K-function analysis. The K-function measures the number of points within a given radius (d) around every point in the analysis, excluding the point around which the radius is drawn. It then compares the average number of points found within distance d of each point to the overall concentration of points in the study area, calculated by dividing the number of points n by the area of the study region a (n/a). The K-function at distance d, denoted K(d), is therefore the mean number of points within all radii d divided by the overall point density in the area. The following formula represents how K(d) is obtained:
\[ K(d) = \frac{ \left( \displaystyle \sum_{i=1}^{n} \# \big[ S \in \text{Circle}(s_i, d) \big] \right) / n }{ n / a } = \frac{\text{Mean # of points in all circles of radius } d} {\text{Mean point density in entire study region } a} \]
Within the K-function analysis, the expected value of K(d) under complete spatial randomness is the area of a circle of radius d, that is, d squared multiplied by π. The formula for K(d) under CSR is:
\[ K(d) = \pi d^2 \]
Departures from this expectation imply clustering or dispersion at scale d:
\[ \begin{aligned} K(d) &> \pi d^2 && \text{(clustering at scale } d\text{)}\\ K(d) &< \pi d^2 && \text{(dispersion at scale } d\text{)} \end{aligned} \]
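As a purely illustrative example, suppose d = 100 feet and the estimated K(100) is 50,000 square feet:

\[ \pi d^2 = \pi (100)^2 \approx 31{,}416 \quad \Rightarrow \quad K(100) \approx 50{,}000 > 31{,}416, \]

so this hypothetical pattern would exhibit clustering at the 100-foot scale.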
Alternatively, many statisticians and software packages report results of Ripley's K-function in L(d) terms (Brusilovskiy, 2025). Here, L(d) is derived from the following formula:
\[ L(d)=\sqrt{\frac{K(d)}{\pi}}-d \]
Under CSR, L(d) equals 0, since K(d) = πd² and the formula simplifies to d − d = 0:
\[ \text{Under CSR: } K(d)=\pi d^2 \Rightarrow L(d)=\sqrt{\frac{\pi d^2}{\pi}}-d=d-d=0 \]
For L(d), clustering and dispersion are identified as follows:
\[ \begin{aligned} L(d) &> 0 && \text{(clustering at scale } d\text{)}\\ L(d) &< 0 && \text{(dispersion at scale } d\text{)} \end{aligned} \]
Within K-function analysis, multiple distances around each point are used to identify the ranges at which points are clustered or dispersed. For example, if L(d) is greater than 0 for radii between 60 and 540 feet but L(d) = 0 at a radius of 600 feet, with 1,200 feet as the maximum pairwise distance, this implies spatial clustering at short distances but spatial randomness at larger distances. In general, 10 or 20 distance bands are used for K-function analysis (Brusilovskiy, 2025). To determine the maximum distance to include in the analysis, we calculate the distance between the two points that are farthest apart, also known as the maximum pairwise distance. To then determine the beginning and incremental distance of the bands, we use the following formula:
\[ \frac{1}{2}\ \text{maximum pairwise distance} \,/\, (\#\ \text{distance bands}) \]
For example, with a maximum pairwise distance of 1,200 feet and 10 desired distance bands, we plug our values into the equation:
\[ \frac{1}{2} \times 1{,}200 \,/\, 10 = 60 \]
This means we would start our first radius d at 60 feet and increment each subsequent radius by 60 feet until we reach 600 feet at the tenth band, as summarized in the table below (a short sketch of this calculation follows the table).
| Distance Bands | Radius (Feet) |
|---|---|
| 1 | 60 |
| 2 | 120 |
| 3 | 180 |
| 4 | 240 |
| 5 | 300 |
| 6 | 360 |
| 7 | 420 |
| 8 | 480 |
| 9 | 540 |
| 10 | 600 |
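The band calculation above can be reproduced with a short R sketch; the 1,200-foot maximum pairwise distance is the hypothetical value from the example, not a result from our data.

```r
# Hypothetical example: derive 10 distance bands from a 1,200 ft maximum pairwise distance
max.pairwise <- 1200
n.bands      <- 10

increment <- (0.5 * max.pairwise) / n.bands   # 60 ft between bands
bands     <- seq(increment, 0.5 * max.pairwise, by = increment)
bands                                         # 60, 120, 180, ..., 600
```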
Overall, the null and alternative hypotheses for K-function analysis are:
H₀: At distance d, the pattern is random (CSR at distance d)
Hₐ1: At distance d, the pattern is clustered
Hₐ2: At distance d, the pattern is uniform
As with most hypothesis tests, a decision rule is needed to reject or fail to reject the null hypothesis. With K-function analysis, however, there is no single test statistic; instead, a simulation envelope is constructed. First, either 9, 99, or 999 random permutations of the points are generated and evaluated across each distance d. For each distance d across the permutations, we find the lowest value of L(d), L−(d), the lower envelope, and the highest value of L(d), L+(d), the upper envelope, which are respectively the lowest and highest values we would expect at each distance d under CSR. We then compare our observed L(d) with the envelope. If the observed L(d) lies within the envelope, that is, between L−(d) and L+(d), we conclude that the pattern does not differ significantly from spatial randomness at that distance. If the observed L(d) is greater than the upper envelope L+(d), we infer significant spatial clustering at that radius d, and if it is lower than the lower envelope L−(d), we infer significant spatial dispersion. Figure 9 shows the lower and upper envelopes as dotted lines and the observed L(d) as the solid red line.
Figure 9: Sample observed L(d) with plotted envelopes and expected L(d), as produced by ArcGIS (Brusilovskiy, 2025)
As shown in Figure 9, distances up to around 200,000 exhibit significant spatial clustering because the observed 𝐿(𝑑) function lies above the upper simulation envelope, which represents the maximum values generated under complete spatial randomness (CSR). For distances between approximately 200,000 and 250,000, most observed values fall within the upper and lower envelopes, indicating no statistically significant departure from CSR at those spatial scales. At distances approaching 250,000, the observed 𝐿(𝑑) values fall below the lower envelope, indicating statistically significant spatial dispersion at larger distance bands.
The blue line in Figure 9 represents the theoretical expectation of 𝐿(𝑑)under CSR. Ideally, this expected value should lie within the simulation envelopes. However, when results are produced using ArcGIS, the theoretical 𝐿(𝑑) under CSR does not incorporate edge correction, which can cause it to fall outside the envelopes (Brusilovskiy, 2025). In contrast, the analysis conducted in R applies appropriate edge corrections, ensuring that the theoretical 𝐿(𝑑) under CSR lies within the upper and lower simulation envelopes.
When performing K-function or L-function analysis, edge effects arise because points near the boundary of the study region have neighborhoods of radius 𝑑 that extend beyond the area of observation. As a result, distances for boundary points are only partially observed, making them non-comparable to points located farther from the edge and potentially biasing estimates of spatial clustering or dispersion.
Ripley's edge correction addresses this issue by weighting contributions to the K-function based on the proportion of each point's neighborhood that lies within the study region. For example, if only half of a point's circle of radius d falls within the study area (visualized in Figure 10), the observed number of neighbors within that radius is divided by 0.5 to account for the missing portion of the neighborhood (essentially doubling that point's count).
Figure 10: Example where a radius d (in yellow) falls partially outside of the study area (Brusilovskiy, 2025)
In ArcGIS, Ripley's edge correction works best for rectangular study areas. When study regions are non-rectangular in ArcGIS, alternative approaches such as the simulated outer boundary values correction may be more appropriate (R can handle other shapes). This method reflects points located near the boundary across the edge of the study region, effectively creating mirrored points that compensate for missing neighbors outside the boundary (Figure 11) (Brusilovskiy, 2025). Still, because we are conducting our analysis in R, we will use Ripley's edge correction rather than the simulated outer boundary values correction.
Figure 11: Visual aid of the application of simulated outer boundary values correction (Brusilovskiy, 2025)
Still, there are cases in which raw spatial clustering provides little insight beyond what is already explained by underlying population patterns. For example, consider an analysis of clusters of individuals with at least two dependents across Pennsylvania using point pattern methods. Large clusters of points are likely to appear in highly populated areas simply because more people live there, and therefore more families meet the criterion. In this case, the observed clustering reflects population concentration rather than a substantively meaningful spatial process.
To address this limitation, a non-homogeneous K-function can be used to account for variation in an underlying intensity measure, such as population. Unlike the standard (homogeneous) K-function, this approach generates simulated points according to a spatially varying intensity surface rather than assuming uniform randomness across the study region. Practically, this requires population data for the spatial units in which points may fall, such as county-level population counts obtained from an appropriate shapefile.
The probability that a point is placed within a given county is proportional to that county’s population relative to the total population of the study area, producing values between 0 and 1. As a result, more simulated points are placed in areas with larger populations. This procedure allows points to be randomly distributed in a way that reflects population density, effectively controlling for population when assessing spatial clustering. Simulation envelopes are then constructed in the same manner as in the homogeneous case, using the upper and lower values of 𝐿 ( 𝑑 ) across permutations at each distance 𝑑
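In spatstat, this idea can be sketched as follows. The object name `markets.ppp` is hypothetical, and the sketch estimates the intensity surface with a kernel smoother for simplicity; with population data, the intensity surface would instead be constructed from those counts.

```r
library(spatstat)

# Inhomogeneous L-function: control for a spatially varying intensity.
# `markets.ppp` is an assumed ppp object of event locations in the study window.
lambda.hat <- density(markets.ppp)                 # kernel estimate of the intensity surface
L.inhom    <- Linhom(markets.ppp, lambda = lambda.hat)

# Envelopes built from points simulated under the estimated intensity surface,
# rather than under uniform (homogeneous) CSR
env.inhom <- envelope(markets.ppp, Linhom,
                      simulate = expression(rpoispp(lambda.hat)),
                      nsim = 9)
plot(env.inhom)
```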
First, we plotted the locations of farmers markets within the city of Philadelphia. An initial look at the plotted points reveals that there are very few markets in Northeast Philly and parts of North and South Philly. In contrast, Center City and Northwest Philly appear to have high concentrations of farmers markets located near one another. Also of note, several markets fall on or near the Philadelphia city boundary, a characteristic that will come into play later in our analysis.
Figure 12: Plotted Farmers Market Locations in Philadelphia
Next, we used the Nearest Neighbor Index to determine whether the farmers market data are random, clustered, or dispersed. The nndist.ppp command in R computed the distance from each point to its nearest neighbor. We then used these values to calculate the mean expected distance (4,001.35 ft), the mean observed distance (3,112.91 ft), and the standard error of the average observed distance (265.63 ft). With these values, we were then able to calculate the NNI for the data.
We calculated the NNI by dividing the observed average distance (3,112.91) by the expected average distance (4,001.35) and found that the NNI for this set of points is 0.778. An NNI close to 1 indicates a random pattern, an NNI close to 0 indicates a clustered pattern, and an NNI close to 2 indicates a dispersed pattern. This dataset's NNI value is less than 1, which suggests that the points are clustered. The hypotheses for our analysis were:
H0: The observed point pattern is random.
Ha: The observed point pattern is NOT random; there is either significant clustering or dispersion.
To determine the significance of this NNI, we calculated the z-score and its p-value. The z-test value was -3.346; notably, a negative z-score indicates clustering. We used a two-tailed test and, because our z value is less than -1.96, examined the lower tail, which corresponds to clustering. The p-value was 0.0004, which is < .05 and points to a statistically significant result: it is unlikely that the observed clustering is due to chance. Therefore, we reject the null hypothesis of randomness in favor of the alternative of a non-random distribution. The results are summarized in the table below, followed by a short R sketch of the calculation.
| Statistic | Value |
|---|---|
| NNI | 0.778 |
| Z-score | -3.346 |
| P-value | 0.0004 |
| Mean expected distance (ft) | 4,001.35 |
| Mean observed distance (ft) | 3,112.91 |
| SE (ft) | 265.63 |
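A minimal sketch of this calculation is shown below. It assumes the farmers market locations have already been read into a spatstat ppp object, which we call `markets.ppp` here; the object name is ours, not part of the original code, and the commented values are the approximate results reported above.

```r
library(spatstat)

# Nearest Neighbor Analysis, assuming `markets.ppp` is a ppp object of farmers
# market locations with the Philadelphia boundary set as its window
nn.dist <- nndist(markets.ppp)               # distance from each point to its nearest neighbor
n       <- npoints(markets.ppp)
A       <- area.owin(Window(markets.ppp))    # area of the study region

d.obs <- mean(nn.dist)                       # observed mean nearest neighbor distance (~3,112.91 ft)
d.exp <- 0.5 / sqrt(n / A)                   # expected mean distance under CSR (~4,001.35 ft)
se    <- 0.26136 / sqrt(n^2 / A)             # standard error of the observed mean distance (~265.63 ft)

nni <- d.obs / d.exp                         # ~0.778
z   <- (d.obs - d.exp) / se                  # ~ -3.346
p   <- pnorm(z)                              # lower-tail p-value (~0.0004)
```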
While this analysis provides useful information about the distribution of farmers markets in the city, NNI has several limitations: it considers only the average distance to each point's nearest neighbor, it depends greatly on the area of the study region, and it cannot detect the simultaneous presence of clustering and dispersion. To address some of these limitations, we conducted K-function analysis.
K-function analysis considers the mean number of events within a given radius and takes into account the overall point density in the study region. By capturing clustering or dispersion at multiple spatial scales, K-function analysis can reveal patterns that NNI misses. The first step in our K-function analysis was to calculate the maximum pairwise distance between points, 56,698.31 ft, and divide that value by 2 to determine the maximum distance for our analysis, approximately 28,349 ft. Because we conducted our analysis in R, we did not need to specify the distances at which the K-function should be evaluated. If we had conducted the analysis in ArcGIS, we would have set the beginning distance to 0 and the incremental distance to 2,500 feet. A beginning distance of 0 ft allows us to see the entire curve, including clustering and dispersion at small scales, while the incremental distance of 2,500 ft was based on taking half of the maximum pairwise distance and dividing it by the number of distance bands (10).
Before plotting the function, we applied Ripley's edge correction, which gives extra weight to points whose neighborhoods extend beyond the boundary of the study area. This correction is necessary for this analysis because several points lie on or near the boundary (see Figure 12). Below is the plotted estimated K-function with the Ripley edge correction.
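A sketch of how this estimate can be produced in spatstat, assuming the same hypothetical `markets.ppp` object described earlier:

```r
# Ripley-corrected estimate of the K-function for the observed pattern
K.hat <- Kest(markets.ppp, correction = "Ripley")
plot(K.hat, main = "Ripley's Estimated K-Function")
```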
Figure 13: Ripley's Estimated K-Function
Here we see that the estimated K(d) line (solid black line) for our observed data points falls above the expected K(d) line under Complete Spatial Randomness (CSR). The position of the observed K(d) line above the expected line indicates that the point pattern has more neighbors within distance r than expected under randomness. In other words, there is apparent clustering in the data.
Next, we computed Ripley's simulation envelopes to assess significance. For our analysis, we used 9 permutations. Figure 14 shows the plot of the Ripley-corrected K-function with a 90% confidence envelope.
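A minimal sketch of the envelope step, under the same assumptions:

```r
# Simulation envelope for the K-function from 9 random (CSR) patterns;
# with 9 simulations, the pointwise envelope corresponds to the 90% level described in the text
K.env <- envelope(markets.ppp, Kest, nsim = 9, correction = "Ripley")
plot(K.env, main = "Ripley's Khat with Confidence Envelopes")
```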
Figure 14: Ripley's Khat with Confidence Envelopes
The line of actual observations falls above the line of expected observations under CSR and outside of the 90% confidence envelope. The khat line falling above the envelope indicates statistically significant clustering at small, medium, and large distances. Because we used 9 permutations, we assert that if the points were truly random, 90% of randomly simulated point patterns would fall within this envelope.
For our next step, we examined Ripley's L-function, a variance-stabilized version of the K-function that makes it easier to visually compare values across distances. We computed the L-function with Ripley's edge correction and plotted it together with the theoretical L-function under CSR, as seen below:
Figure 15: Ripley's Estimated L-Function
Here, we see that the expected value under CSR is 0 and the observed L(d) − d line falls above 0, which suggests there is clustering in the point pattern.
To assess significance, we once again computed 9 simulations to create a 90% confidence envelope.
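The corresponding L-function envelope can be sketched the same way; subtracting r in the plot call centers the CSR expectation on the horizontal zero line.

```r
# Envelope for the L-function from 9 CSR simulations, plotted as L(r) - r
L.env <- envelope(markets.ppp, Lest, nsim = 9, correction = "Ripley")
plot(L.env, . - r ~ r, main = "Ripley's L-Function with Confidence Envelopes")
```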
Figure 16: Ripley's L-Function with Confidence Envelopes
The observed L(d)-d remains above the upper confidence envelope at small, medium, and large distances, indicating statistically significant clustering across spatial scales.
Finally, for easier viewing and interpretation, we rotated the plot clockwise by 45 degrees.
Figure 17: Ripley's L-Function with Confidence Envelopes (Rotated)
The results and our interpretation remain the same: the L(d) − d line stays above the confidence envelope at all distances, pointing to statistically significant clustering across spatial scales.
After interpreting these graphs, we are able to address the hypotheses for our K-function analysis, which were as follows:
H₀: At distance d, the pattern is random.
Hₐ1: At distance d, the pattern is clustered.
Hₐ2: At distance d, the pattern is uniform.
Based on our analysis across the K-function and L-function graphs, we reject the null hypothesis that at distance d, the pattern is random in favor of the first alternative hypothesis that at distance d, the pattern is clustered.
It is important to note that K-function analysis does not take population density into consideration. The clustering of farmers markets in certain parts of the city could be due to higher population in the areas where these clusters appear. In other words, the lack of farmers markets in Northeast Philly and certain parts of North and South Philly could reflect smaller populations in those parts of the city. We would expect that taking population into account would produce somewhat different results. However, more than anything, we suspect that the same social conditions that shape access to grocery stores, like poverty, may also shape where farmers markets are located in the city.
Farmers markets are a community resource that provides access to locally grown fresh produce. Neighborhoods with high poverty levels and a dearth of supermarkets, also known as food deserts, could particularly benefit from such a resource. In this report, we used Nearest Neighbor Analysis and K-function analysis to examine whether farmers markets in Philadelphia are randomly located, dispersed, or clustered. The results of both methods indicate statistically significant clustering in the point pattern. Our Nearest Neighbor Analysis produced an NNI < 1 (0.778), z < 0 (-3.346), and p < .05 (0.0004), pointing to significant clustering. These results are consistent with the K-function analysis: the K-function and L-function graphs show clustering because the observed lines lie above the confidence envelope. For both methods, we reject the null hypothesis of randomness in favor of the alternative hypothesis that the point pattern is not random; specifically, both methods detect significant clustering. These results are consistent with our expectations based on our initial visual inspection of the plotted map of farmers markets in the city: we see clusters of farmers markets in Center City and Northwest Philly, while Northeast Philly and parts of North and South Philly have very few markets. Considering both methods together provides a more nuanced understanding of how this clustering occurs. NNI provides insight into overall clustering, while K-function analysis shows what clustering looks like at multiple distances. Overlaying the farmers market points on a map of median household income by ZIP code reveals another pattern in the observations.
Figure 18: Farmers' Markets Overlaid on Median Household Income
The map in the figure above shows that clusters of farmers markets appear across different median income brackets. For example, there are several markets in Center City, an area with a higher median income, but hardly any in Northeast Philly, another higher-income area. On the other hand, we see a good number of markets in West Philly and University City, lower-income areas, but almost no markets in South Philly, which also has a lower median income. As such, this map presents a complicated picture of how income and farmers markets might be related. As previously mentioned, we anticipate that population could be shaping the distribution of farmers markets in this situation. Regardless, we cannot assert that there is a causal relationship between income and the location of farmers markets, so this map provides interesting additional context that should be examined further in future reports.
In conclusion, our analysis indicates that farmers markets in Philadelphia are clustered. This report's findings have several policy implications. Most notably, Philadelphia neighborhoods do not have equal access to farmers markets, meaning some residents are missing out on the benefits of the local fresh produce that these markets offer. Targeted interventions can address this disparity by creating and supporting community-driven initiatives focused on improving healthy food access. Such interventions could include investing in community gardens, sponsoring food markets, and providing subsidies to companies to open grocery stores in underserved areas.
Bhandari, P. (2023, June 21). The Standard Normal Distribution | Calculator, Examples & Uses. Scribbr. Retrieved December 11, 2025, from https://www.scribbr.com/statistics/standard-normal-distribution/
Brusilovskiy, E. (2025). Statistical and Data Mining Methods for Urban Data Analysis.
Deener, A. (2017). The origins of the food desert: Urban inequality as infrastructural exclusion. Social Forces, 95(3), 1285-1309.
Swavely, D., Whyte, V., Steiner, J. F., & Freeman, S. L. (2019). Complexities of addressing food insecurity in an urban population. Population Health Management, 22(4), 300-307.