Estimating Walkability: Model Updates

How has the model of walkability changed after model building and verification?

Aaron Weinstock, ESRI R&D Arlington

1. Updated Mathematical Model for Walkability

We calculate walkability \(W\) at point \(o\) as:

\(W_{o} = \max\Bigg\{\sum\limits_{i=1}^{g} 100c_i \cdot \min\bigg\{\frac{\sum\limits_{p_{i}=1}^{n_{i}} \alpha(d_{p_{i}}) \cdot \beta(r_{p_{i}})}{\sum\limits_{{p_{i}:r_{p_{i}} \leq m_{i}}} \beta(r_{p_{i}})}, 1\bigg\} + \sum\limits_{j=1}^{t} f_{j} \cdot w_{j} \cdot 100s_{j} \ \ , \ \ 0\Bigg\}\)

For convenience, we may use the following short form notation:

\(\Omega_{o} = \frac{\sum\limits_{p_{i}=1}^{n_{i}} \alpha(d_{p_{i}}) \cdot \beta(r_{p_{i}})}{\sum\limits_{p_{i}:r_{p_{i}} \leq m_{i}} \beta(r_{p_{i}})} \ \ \ \text{and} \ \ \ \Psi_{o} = 100 \cdot \sum\limits_{j=1}^{t} f_{j} \cdot w_{j} \cdot s_{j}\)

Such that:

\(W_{o} = \max\bigg\{\sum\limits_{i=1}^{g} 100c_i \cdot \min\Big\{\Omega_{o}, 1\Big\} + \Psi_{o} \ \ , \ \ 0\bigg\}\)

We define the following notation for the model:

\(i\): identifier for a DUC (Diverse Use Category) – the infrastructure deemed important for assessing walkability
\(g\): the total number of DUC considered (as a default, we consider four: grocery stores, hospitals, schools, and transit stops)
\(c_{i}\): the category weight associated with DUC \(i\), subject to \(\sum_{i=1}^{g} c_{i} = 1\) (as a default, we consider all DUC to have \(c_i = 0.25\); in an interactive setting, users will specify this input and will be free to set \(c_i = 0\) to indicate that they “do not care” about DUC \(i\))
\(p_{i}\): identifier for a point of DUC \(i\) in the service area (walkable area) of point \(o\)
\(n_{i}\): the total number of points in DUC \(i\) in the service area of point \(o\)
\(y_{i}\): the maximum walkable distance for DUC \(i\) (in minutes) – used to define the extent of a service area (as a default, we consider all \(y_i = 10\). In an interactive setting, users will specify this input)
\(d_{p_{i}}\): the walk-time distance from \(o\) to point \(p_{i}\) (in minutes)
\(\alpha(\cdot)\): the decay function to weight a point \(p_{i}\) based on its walk-time distance from \(o\) (defined below)
- \(\alpha(d_{p_{i}}) = 1 - \frac{d_{p_{i}} - 1}{2 \cdot y_{i} - 2}\)
- Defines a linear decay of “distance weight” as walk-time distance increases. Points within 1 minute will always have distance weight 1, while points within \(y_{i}-1\) to \(y_{i}\) minutes will always have weight 0.5 (i.e., assume that walking 1 minute is always considered twice as advantageous as walking the maximum walkable distance)
\(r_{p_{i}}\): the closeness rank of point \(p_{i}\) to \(o\) amongst all points of DUC \(i\) in the service area (i.e. the closest point will have rank 1, second closest will have rank 2, and so on)
\(m_{i}\): the number of points of DUC \(i\) desired within a walkable distance of \(o\) (as a default, we consider all DUC to have \(m = 1\). In an interactive setting, users will specify this input)
\(\beta(\cdot)\): the decay function to weight a point \(p_{i}\) based on its rank of closeness to point \(o\) amongst all points of DUC \(i\) in the service area (defined below)
- \(\beta(r_{p_{i}}) = minmax\big\{\frac{1}{1+e^{k \cdot (r_{p_{i}} - m_i - 0.5)}}\big\} = \frac{\frac{1}{1+e^{k \cdot (r_{p_{i}} - m_i - 0.5)}} - \frac{1}{1+e^{k \cdot (l - m_i - 0.5)}}}{\frac{1}{1+e^{k \cdot (0.5 - m_i)}} - \frac{1}{1+e^{k \cdot (l - m_i - 0.5)}}}\)
- Defines a logistic decay of “closeness rank weight” as rank increases. Regardless of walk-time distance, the closest \(p_{i}\) to \(o\) will always have rank weight 1. More generally, all \(p_{i}\) with \(r_{p_{i}} \leq m_i\) will have a “high” rank weight, while all others will have a “low” rank weight. \(l\) is the function length (default is 100, though any large number will do): it sets the upper bound of the x-domain on which the function is constructed, and is meant to guarantee that all points will have rank weight > 0
- \(k\) – the steepness of the drop-off after the rank \(m_{i}\) point – is defined as the root of the following expression: \(\frac{\frac{1}{1+e^{k \cdot (0.5)}} - \frac{1}{1+e^{k \cdot (l - m_i - 0.5)}}}{b \cdot \big\{\frac{1}{1+e^{k \cdot (0.5 - m_i)}} - \frac{1}{1+e^{k \cdot (l - m_i - 0.5)}}\big\}} - 1\)
  - \(b\): the desired weight for the rank \(m_i + 1\) point (as a default, we set \(b = 0.2\))
  - There is no closed form solution for \(k\), so it is solved for numerically
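
The numerical solve for \(k\) can be sketched as follows. This is a minimal illustration, not the implementation used in the project: the function names (`rank_weight`, `solve_steepness`), the bisection method, and the search bracket `[1e-6, 50]` are all assumptions made for this example. The only inputs taken from the definitions above are \(m_i\), \(b\), and \(l\).

```python
import math


def _logistic(x: float) -> float:
    """Numerically stable evaluation of 1 / (1 + e^x)."""
    if x >= 0:
        z = math.exp(-x)  # underflows harmlessly to 0.0 for large x
        return z / (1.0 + z)
    return 1.0 / (1.0 + math.exp(x))


def rank_weight(r: int, m: int, k: float, l: int = 100) -> float:
    """Min-max scaled logistic rank weight beta(r)."""
    raw = _logistic(k * (r - m - 0.5))
    lo = _logistic(k * (l - m - 0.5))  # value at the end of the domain, r = l
    hi = _logistic(k * (0.5 - m))      # value at r = 1
    return (raw - lo) / (hi - lo)


def solve_steepness(m: int = 1, b: float = 0.2, l: int = 100,
                    k_lo: float = 1e-6, k_hi: float = 50.0,
                    tol: float = 1e-12) -> float:
    """Find k such that beta(m + 1) = b, by bisection on an assumed bracket."""
    def gap(k: float) -> float:
        return rank_weight(m + 1, m, k, l) - b

    if gap(k_lo) * gap(k_hi) > 0:
        raise ValueError("bracket [k_lo, k_hi] does not contain a root")
    while k_hi - k_lo > tol:
        mid = 0.5 * (k_lo + k_hi)
        if gap(k_lo) * gap(mid) <= 0:
            k_hi = mid  # root lies in the lower half
        else:
            k_lo = mid  # root lies in the upper half
    return 0.5 * (k_lo + k_hi)


k = solve_steepness()  # defaults: m = 1, b = 0.2, l = 100
print(k, rank_weight(1, 1, k), rank_weight(2, 1, k))
```

Bisection is used here only because it needs nothing beyond the standard library; a bracketing root finder such as `scipy.optimize.brentq` would serve the same purpose.
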
\(j\): identifier for a CA (City Attribute) – city characteristics deemed to have an effect on walkability
\(t\): the total number of CA considered (as a default, we consider four: crashes involving pedestrians, crime, historical sites [designated by the National Register of Historic Places], and street trees)
\(f_{j}\): the sign associated with CA \(j\), indicating whether it is considered to have a negative or positive impact on walkability (for example, crime would have \(f_{j} = -1\), because it is considered to make an area less walkable)
\(w_{j}\): the weight associated with CA \(j\) [currently, this is experimental, and isn’t implemented]
\(s_{j}\): the percentage of CA \(j\) occurring in the service area of point \(o\)
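
Putting the pieces together, the walk score at a single point \(o\) might be computed as in the sketch below. Several things here are assumptions for illustration only: the function names and tuple layouts, treating walk times under 1 minute as 1 minute, treating \(s_j\) as a proportion, setting \(w_j = 1\) (the weights are not yet implemented), and all example input values. The normalizing denominator is the best-case-scenario sum over ranks \(1, ..., m_i\) described in Section 2. The steepness \(k\) is passed in per DUC, since it depends on \(m_i\).

```python
import math
from typing import List, Sequence, Tuple


def _logistic(x: float) -> float:
    """Numerically stable evaluation of 1 / (1 + e^x)."""
    if x >= 0:
        z = math.exp(-x)
        return z / (1.0 + z)
    return 1.0 / (1.0 + math.exp(x))


def alpha(d: float, y: float) -> float:
    """Linear distance weight; times under 1 minute count as 1 (assumption)."""
    return 1.0 - (max(d, 1.0) - 1.0) / (2.0 * y - 2.0)


def beta(r: int, m: int, k: float, l: int = 100) -> float:
    """Min-max scaled logistic rank weight."""
    lo = _logistic(k * (l - m - 0.5))
    hi = _logistic(k * (0.5 - m))
    return (_logistic(k * (r - m - 0.5)) - lo) / (hi - lo)


# One DUC: (c_i, walk times in minutes, y_i, m_i, k_i)
Duc = Tuple[float, Sequence[float], float, int, float]
# One CA: (f_j, w_j, s_j)
Ca = Tuple[int, float, float]


def omega(walk_times: Sequence[float], y: float, m: int, k: float) -> float:
    """Category score for one DUC, normalized by the best-case score."""
    times = sorted(t for t in walk_times if t <= y)  # rank 1 = closest
    numerator = sum(alpha(d, y) * beta(r, m, k)
                    for r, d in enumerate(times, start=1))
    best_case = sum(beta(r, m, k) for r in range(1, m + 1))
    return numerator / best_case


def walk_score(ducs: List[Duc], cas: List[Ca]) -> float:
    """W_o = max{ sum_i 100 c_i min(Omega, 1) + Psi, 0 }."""
    access = sum(100.0 * c * min(omega(t, y, m, k), 1.0)
                 for c, t, y, m, k in ducs)
    psi = 100.0 * sum(f * w * s for f, w, s in cas)
    return max(access + psi, 0.0)


# Steepness giving beta(2) of about 0.2 when m = 1 and l = 100.
K = 2.0 * math.log(5.0)

# Hypothetical inputs: four DUC with default weights, two CA with w_j = 1.
example_ducs = [
    (0.25, [3.0, 7.0], 10.0, 1, K),  # grocery stores
    (0.25, [], 10.0, 1, K),          # hospitals (none in the service area)
    (0.25, [10.0], 10.0, 1, K),      # schools
    (0.25, [1.0], 10.0, 1, K),       # transit stops
]
example_cas = [(-1, 1.0, 0.02), (1, 1.0, 0.01)]  # e.g. crime, street trees

print(walk_score(example_ducs, example_cas))
```

With these made-up inputs the four categories contribute 25, 0, 12.5, and 25 points, the CA term subtracts 1 point, and the score comes out at about 61.5.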

2. Ideas We’ve Left Behind

The list below documents previous model features or ideas, and the reasons for ultimately moving past them.

Variable functional form of the decay curve, defined by CA. A “decay parameter” took the form of what is currently defined as \(\Psi_o\). If \(\Psi_o < 0\), we’d have exponential decay with parameter \(\Psi_o\); if \(\Psi_o = 0\), we’d have linear decay (same as currently defined); if \(\Psi_o > 0\) we’d have “flipped exponential” decay (exponential decay flipped across \(y = -x\)) with parameter \(\Psi_o\).
- Any reasonably interpretable model in the mathematical sense resulted in \(\alpha(y_i) = 0\)
- If a model was built ignoring the issue of interpretation, the curve too quickly approached \(y = 1\) on \(x \in [1, y_i]\) when the decay parameter was > 0, which resulted in “clusters” of walk scores at 25, 50, 75, and 100 (because many category scores were being inaccurately inflated to the maximum category score)
- This makes assumptions of both the weight at distance \(y_i\) and potentially three different functional forms. This is more assumptions than we’d like to make.
- The CA should not control both relative weights and functional form, so we’d have to introduce more inputs. This could be overkill [given our minimal background information on what affects walkability, and how the effects manifest]
- Putting the CA in the \(\alpha(d_{p_i})\) equation implies that all considered CA affect how far one walks in their service area. Though this may be true for some CA (e.g. crime), it may not be true for others (e.g. street trees). It is probably safer to generalize how CA affect walkability.

Solution: Eliminate CA from the distance decay curve altogether, fix the decay curve as linear, and add the CA as individual additive terms to the overall walk score. This generalizes the interpretation of the CA, and allows for more flexibility in weighting the CA if we see fit (e.g., maybe we want to model crime as more important to walkability than street trees)

Unbounded walk score – \(W_o \in [0, \infty)\)
- Category weights on a [0, 100] scale (as was the present constraint) make no sense if we’re not bounding the category scores on a base-10 scale. To have category weights make sense in an unbounded context, we would have to set weights based on the observed category scores, and this eliminates the ability to make relative comparisons (both within and across cities)

Solution: normalize category scores by a “best-case-scenario” (BCS) score, where \(BCS = \sum\limits_{r_{p_i} = 1}^{m_i} \beta(r_{p_i})\) (i.e., the category score if all desired points of DUC \(i\) are within 1 minute of \(o\)). If the category score is greater than the BCS, we simply record the category walk score as a “perfect” 1
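
As a worked illustration of the normalization, take the defaults \(y_i = 10\), \(m_i = 1\), and \(b = 0.2\), and suppose (hypothetically) that DUC \(i\) has two points in the service area, at walk times of 3 and 7 minutes. The best-case score and the raw category score are then:

\(BCS = \beta(1) = 1\)

\(\alpha(3) \cdot \beta(1) + \alpha(7) \cdot \beta(2) = \frac{8}{9} \cdot 1 + \frac{2}{3} \cdot 0.2 = \frac{46}{45} \approx 1.02\)

Because the raw score exceeds the BCS, the category walk score is recorded as 1, and DUC \(i\) contributes its full \(100c_i = 25\) points.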

Fixed steepness \(k\) for the drop off in marginal utility of the closeness rank weight decay curve
- There is no good reason to pick one! Using logistic decay is a novel concept in marginal utility modeling for walkability, so there is no research baseline on which to base a selection

Solution: numerically solve for steepness based on a pre-selected “boundary weight” for the rank \(m_i + 1\) point
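
A special case gives a quick sanity check on the numerical solution. This derivation is an addition for illustration and assumes \(m_i = 1\) and \(l\) large enough that the term \(\frac{1}{1+e^{k \cdot (l - m_i - 0.5)}}\) is negligible. The condition that the rank \(m_i + 1\) point has weight \(b\) then reduces to:

\(\frac{1 + e^{-k/2}}{1 + e^{k/2}} = e^{-k/2} = b \ \ \Rightarrow \ \ k \approx -2 \ln b\)

With \(b = 0.2\), this gives \(k \approx 2 \ln 5 \approx 3.22\). For \(m_i > 1\), the corresponding ratio \(\frac{1 + e^{k \cdot (0.5 - m_i)}}{1 + e^{k/2}}\) does not simplify in this way, so the numerical solve is still required.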

3. Assumptions

Although a user specifies a maximum walk time distance they’d allow, walk times within that range are not all equally desirable to that user.
A linear decay is desirable for modeling a user’s walking habits in relation to distance. Furthermore, this type of decay holds across all DUC and all users.
For a maximum walk time distance \(y_{i}\) minutes, it is reasonable to think that walking between 0 and 1 minutes is twice as preferable as walking between \(y_{i} - 1\) and \(y_{i}\) minutes.
Marginal utility can be applied to having things within a walkable distance of you.
Logistic decay is appropriate for modeling marginal utility of having points in a DUC close to you. This means that, given a desired number \(m_i\), points with ranks \(1, 2, ..., m_i\) will have relatively high weight, followed by a steep drop such that points \(m_i + 1, m_i + 2, ..., n_i\) will have relatively low weight.
The weight for the rank \(m_i + 1\) point can be reasonably preselected. Furthermore, this weight holds across all DUC and all users.
CA affect walkability in a “general” sense – they affect the walkability of an area overall, not the way in which users walk to points in their service areas. In other words, the CA have no (or very limited) spatially unique, individual behavioral effects.