Creating a Walkability Raster: Version 1.0

A technical outline of the algorithm used to produce the walkability raster seen in ArcGIS Hub analytics

Aaron Weinstock, ESRI R&D Arlington

Summary

A High Level View of the Walkability Analysis

The algorithm producing the ArcGIS Hub analytics walkability raster for a particular area of interest (AOI) follows these general steps:

1. Generate an appropriately sized sample of points for the AOI.
2. For each point in the sample, generate a service area of the walkable area around that sample point.
3. For each point in the sample, find the walk-time distance between the sample point and all facilities indicators that fall within the sample point’s service area.
4. For each point in the sample, calculate the percentage of area indicators that fall within the sample point’s service area, relative to all occurrences of that area indicator in the city.
5. For each point in the sample, apply the values collected in steps 3 and 4 to an existing formula to calculate a walkability score.
6. From the combination of all sample points and their walkability scores, interpolate a walkability score raster that covers the entirety of the AOI.

Below, the specifics of each step will be defined in technical detail. First, some terminology will be clarified.

Terminology used in the Analysis

AOI: An area of interest over which an administrative body would like to compute a walkability raster. Usually, this will be a city or county, but it must have a defined extent (available in a shapefile), and neighborhood boundaries should ideally be available.
Facilities Indicators: Private businesses, community infrastructure, or other amenities that a user might desire as a walkable facility. Generally, the amenities included here should be types of places to which a user would regularly go, either voluntarily or by need. Facilities indicators for raster creation include grocery stores, health centers, schools, and transit stops.
Area Indicators: Elements of a city that may generally influence whether or not a user walks on a particular route or through a particular area. Generally, the elements included here should have an obvious and intuitive correlation with the likelihood of walking. Area indicators for raster creation include incidences of crime, vehicle crashes involving pedestrians, historical sites listed on the National Register of Historic Places, and street trees.
Maximum Walk-Time Distance: The user-specified maximum distance, in minutes walking, that the user is willing to walk to each facilities indicator. This is defined for each facilities indicator, and the distances are not required to be unique. For raster creation, it is assumed that this distance is 10 minutes, regardless of facilities indicator.
Desired Number: The user-specified number of a facilities indicator that the user wants within the maximum walk-time distance for that facilities indicator. This is defined for each facilities indicator, and the numbers are not required to be unique. For raster creation, it is assumed that this number is 1 facility, regardless of facilities indicator.
Category Weight: The user-specified weight assigned to a facilities indicator, signifying the importance of that indicator to the user (e.g. a weight of 0 signifies an indicator does not matter to the user, while a weight of 1 signifies that indicator is the only one that matters to the user). This is defined for each facilities indicator, and the weights are not required to be unique. For raster creation, it is assumed that weights for all facilities indicators are equal.
Service Area: The area around a particular point accessible within a specified time or distance by using a specified travel mode. For the purpose of this walkability analysis, the travel mode is always walking. The time may be variable based on the maximum walk-time distances specified by a user; however, per the maximum walk-time distance assumption detailed above, raster creation involves 10 minute service areas.

Step 1: Generating a Sample

Generating a sample for the walkability analysis is a two-step process:

1.1. Derive an appropriate sample size from the spatial extent of the AOI, and the presence of streets within it. This may be done using existing ArcGIS tools or other software, though ArcGIS tools will likely do it the most efficiently.
1.2. Perform spatial stratified regular sampling, where the strata are the neighborhoods in the AOI. This must be done outside of existing ArcGIS tools, as the software’s sampling functionality currently only supports random sampling.

Step 1.1: Calculating Sample Size

Sample size is derived using the following process:

1.1.1. Lay a grid of points over the bounding box of the AOI. The distance between two adjacent points is set to half of the maximum radius of a service area, in an effort to design for service area overlap at all points.
- For the walkability raster, 10 minute service areas are assumed. Per ESRI’s existing default, a 5 km/hr walking speed is assumed. So, maximally, a service area will have radius 5/6 km. Consequently, the distance between two adjacent points is set at 5/12 km (or 0.2589 miles).
1.1.2. Clip the grid obtained in 1.1.1 by the boundary of the AOI, leaving only the points that lie within the AOI boundary.
1.1.3. For each grid point remaining after the clip, calculate the distance between the grid point and the nearest street in the AOI (from a shapefile of streets in the AOI).
1.1.4. From the data gathered in 1.1.3, count the number of grid points that are considered “proximate” to a street.
- “Proximate” is defined as within a one minute walk-time distance of a street, assuming a 5 km/hr walking speed. Because a primary goal of this step is to exclude points that are far from roads, the proximity value is the shortest possible straight-line distance associated with a one minute walk. Assuming a user always makes positive progress toward a street in either the \(x\) or \(y\) direction, this is the hypotenuse of a right isosceles triangle whose equal sides are each 30 walking seconds, or 5/120 km, in length. Consequently, the proximity value is 0.0589 km, or 58.9256 m.

The result of 1.1.4 will be the sample size. Henceforth, this overall sample size will be referred to as \(n\).
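
The distances used in 1.1.1 and 1.1.4 follow directly from the walking speed assumption and can be reproduced with a few lines of arithmetic. The following is a minimal sketch in Python. The function name `sample_size`, its input (a list of point-to-street distances in km), and the choice to count a distance exactly equal to the threshold as proximate are assumptions made for illustration.

```python
import math

WALK_SPEED_KM_PER_HR = 5.0    # walking speed assumed in the text
SERVICE_AREA_MINUTES = 10.0   # raster-creation assumption
KM_PER_MILE = 1.609344

# Maximum service area radius: distance walked in 10 minutes (5/6 km).
max_radius_km = WALK_SPEED_KM_PER_HR * SERVICE_AREA_MINUTES / 60.0

# Grid spacing is half of that radius (5/12 km, about 0.2589 miles).
grid_spacing_km = max_radius_km / 2.0
grid_spacing_miles = grid_spacing_km / KM_PER_MILE

# Proximity value: hypotenuse of a right isosceles triangle whose
# legs are each 30 walking seconds long (5/120 km).
leg_km = WALK_SPEED_KM_PER_HR * 30.0 / 3600.0
proximity_km = math.hypot(leg_km, leg_km)


def sample_size(distances_to_nearest_street_km, threshold_km=proximity_km):
    """Count grid points that are proximate to a street (step 1.1.4).

    Assumption: a point exactly at the threshold counts as proximate.
    """
    return sum(1 for d in distances_to_nearest_street_km if d <= threshold_km)


if __name__ == "__main__":
    print(round(grid_spacing_miles, 4), round(proximity_km * 1000, 4))
    # Hypothetical distances (km) from four clipped grid points to a street.
    print(sample_size([0.01, 0.05, 0.06, 0.2]))
```

In practice the distances would come from a nearest-feature query against the streets shapefile (step 1.1.3); only the counting rule is shown here.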

Step 1.2: Performing Stratified Regular Sampling

The sample itself is generated using the following process:

1.2.1. For each neighborhood in the AOI, perform a spatial intersect of the streets in the AOI and the neighborhood, yielding just the neighborhood and its streets.
1.2.2. For each neighborhood, calculate the percent of streets in that neighborhood relative to the AOI as a whole (in terms of absolute number of streets) using the intersections obtained in 1.2.1.
1.2.3. For each neighborhood, multiply the percentage obtained in 1.2.2 by \(n\) to obtain the number of samples allocated to that neighborhood. In an attempt to counteract the conservative nature of the proximity measurement applied in 1.1.4, round the obtained value up to the nearest whole number. Henceforth, the neighborhood sample sizes will be referred to as \(n_j\), where \(j\) represents a neighborhood.
1.2.4. For each neighborhood, perform spatial regular sampling within the neighborhood, with sample size \(n_j\). Upon obtaining these points, snap them to the nearest street. Regular sampling guarantees maximal spatial coverage of the sample points; snapping points to streets guarantees that each point will be a “valid” point from which to ultimately route. Note that spatial regular sampling is approximate, so the number of samples in a neighborhood may not exactly equal \(n_j\); however, it should always be close.
- Note that if neighborhoods are not provided, steps 1.2.1 through 1.2.3 should be skipped, and the regular sampling in this step should be completed with sample size \(n\). As above, the realized number of samples will be approximately \(n\).

The result of 1.2.4 is the sample that will be used to create the walkability raster. Henceforth, the set of sample points will be referred to as \(S\), with individual sample points in \(S\) referred to as \(s\).
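
The allocation in steps 1.2.2 and 1.2.3 can be illustrated as follows. This is a sketch under stated assumptions: the function name `allocate_samples` and the dictionary of street counts per neighborhood are hypothetical, and the regular sampling and snapping of step 1.2.4 are not shown.

```python
def allocate_samples(street_counts, n):
    """Return {neighborhood: n_j}, where n_j is the neighborhood's share
    of streets multiplied by n, rounded up to the nearest whole number.

    street_counts: {neighborhood name: number of streets} (hypothetical input)
    n: overall sample size from step 1.1.4
    """
    total = sum(street_counts.values())
    if total <= 0:
        raise ValueError("street_counts must contain at least one street")
    # Integer ceiling division avoids floating-point rounding surprises.
    return {
        name: (n * count + total - 1) // total
        for name, count in street_counts.items()
    }


if __name__ == "__main__":
    counts = {"A": 120, "B": 45, "C": 35}   # hypothetical neighborhoods
    print(allocate_samples(counts, 50))
```

Because every \(n_j\) is rounded up, the allocations can sum to slightly more than \(n\), which is the intended counterweight to the conservative proximity measurement in 1.1.4.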

Step 2: Generating Service Areas

Generating service areas for \(S\) is a one-step process, and can only be completed in ArcGIS:

2.1. Use the Generate Service Areas tool (toolbox: Ready To Use Tools > Generate Service Areas), with “Facilities” = \(S\), “Break Values” = the maximum of the maximum walk-time distances to the facilities indicators (which, under the assumptions of raster creation, is 10), “Break Units” = “Minutes”, and “Travel Mode” = “Walking”, to create service areas for the sample points.
- Note that if \(|S| > 1000\), service areas will need to be generated in a batch process [and then bound back together], as the Generate Service Areas tool can only take up to 1000 facilities points.

The result of 2.1 is the service areas polygons that will be used in analysis going forward; the file will consist of \(|S|\) polygons. Henceforth, the set of service area polygons will be referred to as \(A\), where \(A_s\) will represent the service area for sample point \(s\) in \(S\).
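
The batching described in the note to 2.1 amounts to splitting \(S\) into consecutive groups of at most 1000 points and combining the outputs afterwards. A minimal sketch is given below. The helper name `batches` is hypothetical, and the call to the Generate Service Areas tool itself is deliberately omitted.

```python
def batches(points, max_size=1000):
    """Yield consecutive slices of `points`, each with at most `max_size` items."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    for start in range(0, len(points), max_size):
        yield points[start:start + max_size]


if __name__ == "__main__":
    sample_ids = list(range(2500))            # hypothetical sample point IDs
    groups = list(batches(sample_ids))
    print([len(g) for g in groups])
    # Each group would be submitted to the service area tool separately,
    # and the resulting polygons appended in the same order to rebuild A.
```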

Step 3: Routing to Facilities Indicators

To calculate a walkability score, the walk-time distance (in minutes) between each sample point and all “walkable” facilities indicators must be calculated. An instance of a facilities indicator is considered “walkable” if it is no more than the maximum walk-time distance [specified for that facilities indicator] away from the sample point; for the creation of the raster, this distance is assumed to be 10 minutes for all facilities indicators. Obtaining these walk-time distances can be accomplished using existing ArcGIS tools or other software, but the process is currently designed to run outside of ArcGIS due to the time and credit-cost constraints of ESRI’s network tools. The algorithm, a four-step process, is documented below:

3.1. Approximate a street network for an AOI using the streets shapefile of the AOI – each intersection of streets will be a vertex, each street segment will be an edge, and the edge weights will be set equal to the length of the corresponding street segment.
3.2. For each service area, perform a spatial intersect between the service area and all facilities indicators, to yield the set of facilities indicators that are walkable from the sample point associated with that service area.
3.3. For each sample point and for each facilities indicator, use a shortest path algorithm on the approximated street network to find the shortest route between the sample point and walkable instances of that facilities indicator (resulting from the intersection performed in 3.2). Because the edge weights in the network are equal to street lengths, the length of the shortest path is equal to the on-the-ground distance of that path.
3.4. For each sample point, facilities indicator, and route, approximate walk-time distances using the 5 km/hr walking speed assumption. For interpretability and to be appropriately conservative, round all obtained walk-time distances up to the nearest whole minute. Remove any walk-time distance that is greater than the maximum walk-time distance. Such values can arise from variable maximum walk-time distances (though this is not relevant to raster creation) or from discrepancies between ESRI street networks and the approximate network defined above.

The result of 3.4 is a list of lists (or equivalent) – indexed by sample point ID and facilities indicator (respectively) – of walk-time distances from each sample point \(s\) to walkable facilities indicators. The outer list will have length \(|S|\), while each inner list will have length equal to the number of facilities indicators (which, for raster creation, is 4). Each of the four inner list elements is a vector (or equivalent) containing the walk-time distances for that facilities indicator; if no occurrences of a facilities indicator are walkable from a given sample point, the inner list vector associated with that facilities indicator is empty. Henceforth, this list will be referred to as \(M\), where \(M_{s,k,i}\) will represent the walk-time distance from sample point \(s\) to the \(i^{th}\) walkable instance of the \(k^{th}\) facilities indicator.
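
Steps 3.3 and 3.4 can be sketched for a single sample point and a single facilities indicator as shown below. The text does not name a particular shortest path algorithm; Dijkstra's algorithm is used here as one reasonable choice. The adjacency-list graph format, the use of meters for edge lengths, the function names, and the assumption that sample points and facilities have already been snapped to graph vertices are all illustrative.

```python
import heapq
import math

WALK_SPEED_KM_PER_HR = 5.0   # walking speed assumed in the text


def shortest_distances(graph, source):
    """Dijkstra's algorithm.

    graph: {vertex: [(neighbor, length_m), ...]} with non-negative lengths.
    Returns {vertex: shortest on-network distance in meters from source}.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, length in graph.get(u, []):
            nd = d + length
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def walk_minutes(distance_m):
    """Convert meters to walking minutes, rounded up to a whole minute."""
    return math.ceil(distance_m * 60 / (WALK_SPEED_KM_PER_HR * 1000))


def walk_times(graph, sample_vertex, facility_vertices, max_minutes=10):
    """Walk times from one sample point to the facilities found in step 3.2,
    dropping any that exceed the maximum walk-time distance (step 3.4)."""
    dist = shortest_distances(graph, sample_vertex)
    times = [walk_minutes(dist[f]) for f in facility_vertices if f in dist]
    return sorted(t for t in times if t <= max_minutes)


if __name__ == "__main__":
    # Hypothetical street network; edge lengths in meters.
    graph = {
        "s": [("a", 200), ("b", 500)],
        "a": [("s", 200), ("b", 100), ("g", 300)],
        "b": [("s", 500), ("a", 100), ("h", 600)],
        "g": [("a", 300)],
        "h": [("b", 600)],
    }
    print(walk_times(graph, "s", ["g", "h", "b"]))
```

Running `walk_times` once per sample point and facilities indicator, and storing the returned lists, yields a nested structure of the kind described for \(M\).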

Step 4: Obtaining Non-Accessibility Qualities

Calculating a walkability score also entails understanding some non-accessibility-related qualities of the area around a point; for this analysis, these qualities will be manifest in the percent of an area indicator occurring in a sample point’s service area. Obtaining these percentages may be done using existing ESRI tools or other software, and is a two step process:

4.1. For each service area, perform a spatial intersect between the service area and all area indicators, to yield the set of area indicators that could be reasonably observed while walking near to the sample point associated with that service area.
4.2. For each sample point and each area indicator, calculate the percent of the area indicator occurring in the sample point’s service area, relative to the total number of the area indicator in the AOI.

The result of 4.2 is a table (or equivalent) of percent-occurrences of area indicators in a sample point’s service area, where each row represents a sample point and each column represents an area indicator. Henceforth, this table will be referred to as \(T\), where \(T_{s,q}\) will represent the percentage of area indicator \(q\) occurring in the service area of sample point \(s\).
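
Step 4.2 reduces to a ratio of counts per area indicator. The sketch below assumes that the intersect in 4.1 has already produced a count of each area indicator inside every service area. The function name `build_table`, the dictionary layout, and the convention of returning 0 when an indicator has no occurrences in the AOI are assumptions. Values are stored as fractions between 0 and 1, on the assumption that the factor of 100 applied to \(T_{s,q}\) in the model converts them to percentages.

```python
def build_table(counts_by_sample, totals_in_aoi):
    """Return T as {sample_id: {indicator: fraction in service area}}.

    counts_by_sample: {sample_id: {indicator: count inside that service area}}
    totals_in_aoi:    {indicator: total count in the AOI}
    """
    table = {}
    for sample_id, counts in counts_by_sample.items():
        row = {}
        for indicator, total in totals_in_aoi.items():
            inside = counts.get(indicator, 0)
            # Assumption: an indicator absent from the AOI contributes 0.
            row[indicator] = inside / total if total > 0 else 0.0
        table[sample_id] = row
    return table


if __name__ == "__main__":
    totals = {"crime": 480, "street_trees": 6000}          # hypothetical
    counts = {
        "s1": {"crime": 12, "street_trees": 150},
        "s2": {"street_trees": 300},
    }
    for sample_id, row in build_table(counts, totals).items():
        print(sample_id, row)
```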

Step 5: Applying the Walkability Model

To obtain walkability scores, the data obtained in steps 3 and 4 will be applied to an existing model in a one-step process. For each sample point, the walk-time distances to facilities indicators and the percent of area indicators occurring in the sample point’s service area will be entered into a formula to calculate the score. The output will be a vector (or equivalent) of walkability scores for all sample points in \(S\), indexed by sample point ID, which can then be added as an attribute to \(S\) for ultimate interpolation. Henceforth, this vector of walkability scores will be referred to as \(W\). The model used to produce \(W\) is fully defined below:

Model

The walkability score at sample point \(s\) is calculated as:

\(W_{s} = min\bigg\{\sum\limits_{k=1}^{K} \Big[100c_k \cdot \min\Big\{\Omega_{s}, 1\Big\}\Big] + sign(\Psi_s) \cdot min\Big\{\Big|\Psi_s\Big|, \Omega_s\Big\} \ \ , \ \ 100\bigg\}\)

Where:

\(\Omega_{s} = \frac{\sum\limits_{i=1}^{n_{k}} \alpha(M_{s,k,i}) \cdot \beta(rank(M_{s,k,i}))}{\sum\limits_{i:rank(M_{s,k,i})<z_{k}} \beta(rank(M_{s,k,i}))}\)

And:

\(\Psi_{s} = \sum\limits_{q=1}^{Q} f_{q} \cdot 100T_{s,q}\)

Model notation is defined as follows:

\(W_s\): the walkability score for sample point \(s\)
\(s\): identifier for a sample point
\(k\): identifier for a facilities indicator
\(K\): the total number of facilities indicators considered; for raster creation, four are considered: grocery stores, health centers, schools, and transit stops
\(c_k\): the category weight associated with facilities indicator \(k\), subject to \(\sum_{k=1}^{K} c_{k} = 1\); for raster creation, it is assumed that \(c_k = 0.25, \ \forall k\)
\(i\): identifier for a walkable instance of a facilities indicator (where “walkable” means within \(A_s\), the service area of sample point \(s\))
\(n_{k}\): the total number of instances of facilities indicator \(k\) in \(A_s\)
\(M_{s,k,i}\): the walk-time distance from sample point \(s\) to the \(i^{th}\) walkable instance of the \(k^{th}\) facilities indicator
\(\alpha(\cdot)\): the decay function to weight \(i\) based on \(M_{s,k,i}\)
- \(\alpha(M_{s,k,i}) = 1 - \frac{M_{s,k,i} - 1}{2 \cdot y_{k} - 2}\)
- \(y_{k}\): the maximum walk-time distance to an instance of facilities indicator \(k\) (in minutes). Also used to define the walk-time extent of \(A\), the set of all service area polygons; for raster creation, it is assumed that \(y_k = 10, \ \forall k\)
- Defines a linear decay of “distance weight” as walk-time distance increases. Instances of a facilities indicator within 1 minute of a sample point will always have distance weight 1, while instances within \(y_k-1\) to \(y_k\) minutes will always have weight 0.5 (i.e., assume that walking 1 minute is always considered twice as advantageous as walking the maximum walk-time distance)
\(rank(M_{s,k,i})\): the rank of \(i\) in terms of its closeness to \(s\) (i.e. the closest instance will have rank 1, second closest will have rank 2…)
\(z_k\): the desired number of facilities indicator \(k\) within a walkable distance of \(s\); for raster creation, it is assumed that \(z_k = 1, \ \forall k\)
\(\beta(\cdot)\): the decay function to weight \(i\) based on \(rank(M_{s,k,i})\)
- \(\beta(rank(M_{s,k,i})) = minmax\big\{\frac{1}{1+e^{h \cdot (rank(M_{s,k,i}) - z_k - 0.5)}}\big\} = \frac{\frac{1}{1+e^{h \cdot (rank(M_{s,k,i}) - z_k - 0.5)}} - \frac{1}{1+e^{h \cdot (l - z_k - 0.5)}}}{\frac{1}{1+e^{h \cdot (0.5 - z_k)}} - \frac{1}{1+e^{h \cdot (l - z_k - 0.5)}}}\)
- \(h\): the steepness of the drop-off after the rank \(z_k\) instance of facilities indicator \(k\) – defined as the root of the following expression: \(\frac{\frac{1}{1+e^{h \cdot (0.5)}} - \frac{1}{1+e^{h \cdot (l - z_k - 0.5)}}}{b*\big\{\frac{1}{1+e^{h \cdot (0.5 - z_k)}} - \frac{1}{1+e^{h \cdot (l - z_k - 0.5)}}\big\}} - 1\)
  - \(b\): the desired weight for the rank \(z_k + 1\) instance of facilities indicator \(k\) (always set such that \(b = 0.2\))
  - There is no closed form solution for \(h\), so it is solved for numerically
- Defines a logistic decay of “proximity rank weight” as rank increases. Regardless of walk-time distance, the closest \(i\) to \(s\) will always have rank weight 1. More generally, all \(i\) with \(rank(M_{s,k,i}) \leq z_k\) will have a “high” rank weight, while all others will have a “low” rank weight. \(l\) defines a function length (the default is 100, though any large number will do), which sets the upper bound of the \(x\) domain on which the function is constructed, and is meant to guarantee that all \(i\) will have \(\beta(\cdot)\) > 0.
\(q\): identifier for an area indicator
\(Q\): the total number of area indicators considered; for raster creation, four are considered: incidences of crime, vehicle crashes involving pedestrians, historical sites listed on the National Register of Historic Places, and street trees
\(f_{q}\): the sign associated with area indicator \(q\), indicating whether it is considered to have a negative or positive impact on walkability (for example, crime would have \(f_{q} = -1\), because it is considered to make an area less walkable)
\(T_{s,q}\): the percentage of area indicator \(q\) occurring in the service area of sample point \(s\)
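
The two decay functions defined above, and the numerical solution for \(h\), can be sketched as follows. The text states only that \(h\) is solved for numerically; bisection is used here as one possible method, and the search bracket, tolerance, and function names are assumptions. The root condition is written as \(\beta(z_k + 1) = b\), which is algebraically equivalent to setting the expression given for \(h\) to zero.

```python
import math


def alpha(walk_minutes, y_k=10):
    """Linear distance decay: 1 at one minute, 0.5 at y_k minutes."""
    return 1.0 - (walk_minutes - 1.0) / (2.0 * y_k - 2.0)


def _logistic(x):
    """Evaluate 1 / (1 + e^x) without overflow for large |x|."""
    if x >= 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


def beta(rank, h, z_k=1, l=100):
    """Min-max scaled logistic rank decay: 1 at rank 1, 0 at rank l."""
    low = _logistic(h * (l - z_k - 0.5))
    high = _logistic(h * (0.5 - z_k))
    return (_logistic(h * (rank - z_k - 0.5)) - low) / (high - low)


def solve_h(z_k=1, l=100, b=0.2, lower=1e-6, upper=50.0, tol=1e-12):
    """Find h such that beta(z_k + 1) == b, by bisection.

    Assumption: the root lies in [lower, upper].
    """
    def g(h):
        return beta(z_k + 1, h, z_k, l) - b

    if g(lower) <= 0 or g(upper) >= 0:
        raise ValueError("bracket does not contain a sign change")
    while upper - lower > tol:
        mid = (lower + upper) / 2.0
        if g(mid) > 0:
            lower = mid
        else:
            upper = mid
    return (lower + upper) / 2.0


if __name__ == "__main__":
    h = solve_h()
    print(h, beta(1, h), beta(2, h), alpha(1), alpha(10))
```

Under the raster-creation defaults (\(z_k = 1\), \(l = 100\), \(b = 0.2\)), the term involving \(l\) is negligibly small, so the condition reduces to approximately \(e^{-0.5h} = 0.2\) and the solved value should be very close to \(2\ln 5 \approx 3.219\).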

Assumptions for Raster Creation

It is reasonable to assume, as a default, that a user is willing to walk no more than 10 minutes to each of the facilities indicators, and desires only 1 of each of the facilities indicators within a 10 minute distance. Furthermore, it is reasonable to assume as a default that all facilities indicators matter equally to a user.
A linear decay is reasonable for modeling a user’s walking habits with relationship to distance. Furthermore, this type of decay is true across all facilities indicators and all users.
For a maximum walk-time distance \(y_{k}\) minutes, it is reasonable to think that walking between 0 and 1 minutes is twice as preferable as walking between \(y_{k} - 1\) and \(y_{k}\) minutes.
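The concept of marginal utility applies to having facilities within a walkable distance of a user: each additional instance of a facilities indicator adds less value than the one before it.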
Logistic decay is appropriate for modeling marginal utility of having instances of a facilities indicator close to you. This means that, given a facilities indicator \(k\) and desired number \(z_k\), instances \(i\) of that facilities indicator with ranks \(1, 2, ..., z_k\) will have relatively high weight, followed by a steep drop such that instances \(z_k + 1, z_k + 2, ..., n_k\) will have relatively low weight.
Given a desired number \(z_k\) of a facilities indicator, it is reasonable to think that the rank \(z_k + 1\) instance of \(k\) has utility 0.2. Furthermore, this weight is true across all facilities indicators and all users.
Area indicators affect walkability in a “general” sense – they affect the walkability of an area overall, not the way in which users walk to facilities indicators in their service areas. In other words, the area indicators have no (or very limited) spatially or personally unique behavioral effects.

Step 6: Generating the Raster

Once walkability scores have been generated and added as an attribute to the original sample \(S\), generating the walkability raster for an AOI is a two-step process, best completed using existing ArcGIS tools:

6.1. Use the inverse distance weighting (IDW) tool (toolbox: Spatial Analyst > Interpolation > IDW) with “Search Radius: Number of Points” = 4, and “Power” = 2. Using these parameters allows for the best preservation of existing values of \(W\) while still revealing a smooth pattern of walkability in the AOI.
6.2. Use either:
- the Extract by Mask tool (toolbox: Spatial Analyst > Extraction > Extract by Mask), with “Input Raster” = result of 6.1 and “Feature Mask Data” = AOI boundary
- the Clip tool (toolbox: Data Management > Raster > Raster Processing > Clip), with “Input Raster” = result of 6.1, “Output Extent” = AOI boundary, and “Maintain Clipping Extent” checked,
to clip the raster obtained in 6.1 to the area covered by the AOI. Henceforth, this AOI-clipped raster will be referred to as \(R\).
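
To make the interpolation in 6.1 concrete, the calculation that inverse distance weighting performs at a single raster cell is sketched below, using the four nearest sample points and a power of 2. This illustrates the underlying formula only; it is not the ArcGIS IDW tool, and the function name `idw` and the tuple layout of the samples are assumptions.

```python
import math


def idw(x, y, samples, k=4, power=2.0):
    """Estimate a walkability score at (x, y) by inverse distance weighting.

    samples: list of (sx, sy, score) tuples in projected coordinates.
    Only the k nearest samples contribute; a location that coincides
    with a sample point returns that sample's score unchanged.
    """
    if not samples:
        raise ValueError("at least one sample point is required")
    nearest = sorted(samples, key=lambda p: math.hypot(p[0] - x, p[1] - y))[:k]
    numerator = 0.0
    denominator = 0.0
    for sx, sy, score in nearest:
        d = math.hypot(sx - x, sy - y)
        if d == 0.0:
            return score
        weight = 1.0 / d ** power
        numerator += weight * score
        denominator += weight
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical sample points with walkability scores.
    pts = [(0, 0, 40), (2, 0, 80), (0, 2, 60), (2, 2, 100), (10, 10, 0)]
    print(idw(1, 1, pts))   # four equidistant neighbors: their plain average
    print(idw(0, 0, pts))   # exactly on a sample point
```

Returning the sample's own score at zero distance is what preserves the existing values of \(W\) at the sample points, while the small neighbor count keeps the surface from being oversmoothed by distant points.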

\(R\) is the final output of the algorithm producing the ArcGIS Hub analytics walkability raster for an AOI. It will be presented as a static raster. Users will interact with \(R\) by entering a particular address, and observing how that point performs relative to its neighborhood and city with regard to walkability.