This document describes how we used the HUD-aggregated USPS administrative data on address vacancies, or “USPS vacancy data,” and the University of New Orleans (UNO) Geography Department property condition survey, or “UNO Survey,” to estimate blight reduction in New Orleans from 2010-2012 and 2010-2014.
Estimates of blight reduction in New Orleans using USPS data and property condition surveys from UNO show that conservatively, blight was reduced by about 15,000 addresses and by perhaps as many as 20,000 addresses between September 2010 and October 2014-February 2015. This represents a 35%-46% decrease.
Through September 2010, the best known data source for tracking blight in New Orleans was United States Postal Service (USPS) vacancy data. This data reflected the numbers of “no-stat” addresses, which are vacant addresses that appear to postal service workers as if they have been/will be vacant for some time. Importantly, the registry covers data for addresses, rather than properties, meaning that the count of “no-stat” addresses is almost certainly greater than the count of blighted properties. The “no-stat” addresses also include vacant lots and unoccupied houses undergoing renovation. Despite these limitations, the USPS vacancy data was the most reliable source from 2005-2010. It had several benefits:
However, for reasons that remain unclear, the quality of the USPS vacancy data greatly diminished between September 2010 and March 2012. The data, which had been released on a quarterly basis before September 2010, became unavailable for 18 months. When it resurfaced, the count of no-stat addresses in Orleans Parish had fallen from 43,755 in September 2010 to an implausible 2,532 in March 2012. GNOCDC analyzed the compromised data in August 2012 and, by holding constant the total number of properties given in the September 2010 data release, estimated that there were actually roughly 35,700 “no-stat” addresses. However, given the poor quality of the data, GNOCDC made its August 2012 report the last of its kind.
Without the USPS vacancy data, the most reliable method for estimating blight in New Orleans is a longitudinal survey of blighted properties, maintained by Peter Yaukey of the UNO Geography Department since 2007. The three most recent surveys were performed in September 2010 (aligning with the last reliable version of the USPS data), in the period between October 2012 and April 2013, and in the period between October 2014 and February 2015.
UNO used a two-stage cluster sample, with census block groups as the primary sampling unit and houses within selected block groups as the secondary sampling unit. Block groups were selected at random from the population. Surveyors drew a route through each selected block group without prior knowledge of the survey area, and all houses on this route were sampled. With minimal exceptions, the same sample was used in each iteration of the survey.
There are some drawbacks to using the UNO survey for making citywide estimates of blighted addresses. The survey covers most of Orleans Parish, but excludes areas outside of the flood zone of Hurricane Katrina: the West Bank and much of the area between the Mississippi and St. Charles Ave. Also, due to the sample size and survey design, direct estimates from the UNO data are very imprecise, with confidence intervals too wide to support meaningful conclusions. But as the next section shows, we can use the UNO survey in combination with the USPS data to provide useful results.
On their own, neither the USPS vacancy data nor the UNO survey results are reliable enough to provide an estimate of the current level of blight in New Orleans. However, the USPS data, at least in the past, has served as a reasonable proxy for the total magnitude of blight, while the UNO survey gives a good sense of the change in blight. Therefore, combining the two data sets can offer more reliable and precise estimates than either source alone.
We make these estimates through the statistical technique of ratio estimation. Ratio estimation uses a well-known variable (called the auxiliary variable) to predict the value of a correlated variable of interest for which less data is available. In this case, the auxiliary variable is no-stat addresses in 2010, when there is both USPS and UNO data. The variable of interest is no-stat addresses in 2012-2013 and 2014-2015, for which there is only UNO data. By looking at the ratio in the level of blight from 2010 to more recent years in the UNO survey, we can estimate how many no-stats there would have been in the USPS data if it were still in good condition. And because patterns in blight should remain relatively constant over the span of 5 years (a high-blight neighborhood in 2010 will generally still be a high-blight neighborhood in 2015), the variables are correlated and ratio estimation should be a suitable technique.
To use ratio estimation, we first define the following variables:
\(N\): the total number of block groups under consideration
\(n\): the number of sampled block groups
\(M_i\): the total number of addresses in block group \(i\)
\(m_i\): the total number of sampled houses in block group \(i\)
\(X\): the total number of no-stat addresses in 2010 (43,755 from USPS data)
\(Y\): the total number of no-stat addresses in subsequent years
\(x_i\): the number of sampled houses in block group \(i\) that can be defined as no-stats in 2010
\(y_i\): the number of sampled houses in block group \(i\) that can be defined as no-stats in subsequent years
Standard ratio estimation to find a population total \(\hat{Y}\) is performed by: \[ \hat{Y}=\frac{\sum\limits_{i=1}^{n} y_i}{\sum\limits_{i=1}^{n} x_i} \cdot X \]
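As an illustration of the standard ratio estimator above, here is a minimal Python sketch. The per-block-group counts are made up for demonstration and are not survey data; the function name `ratio_estimate` is hypothetical. Only the baseline total of 43,755 comes from the text.

```python
def ratio_estimate(x, y, X):
    """Estimate the later-period total Y from sampled counts x, y
    and a known baseline total X, as Y_hat = (sum(y) / sum(x)) * X."""
    total_x = sum(x)
    if total_x == 0:
        raise ValueError("no baseline no-stats in the sample; ratio is undefined")
    return sum(y) / total_x * X


# Made-up counts for four sampled block groups (illustrative only).
x_2010 = [12, 30, 8, 50]    # x_i: sampled no-stats per block group in 2010
y_later = [9, 21, 6, 34]    # y_i: sampled no-stats in the same block groups later
X_USPS_2010 = 43_755        # X: USPS no-stat total for September 2010

estimate = ratio_estimate(x_2010, y_later, X_USPS_2010)
print(f"Estimated later-period no-stat total: {estimate:,.0f}")
```

With these invented counts the sample ratio is 70/100, so the estimate is simply 70% of the 2010 USPS total.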
However, this does not account for the fact that the sample was performed in two stages (a random selection of block groups and then a random selection of houses within selected block groups), and that houses in different block groups had different probabilities of being selected. To account for these unequal selection probabilities, we use: \[ \hat{Y}=\frac{\sum\limits_{i=1}^{n} w_{i} y_{i}}{\sum\limits_{i=1}^{n} w_{i} x_i} \cdot X \] where \(w_i=\frac{N}{n} \frac{M_i}{m_i}\) is the sampling weight of houses in block group \(i\). The weight is the inverse of the probability that a particular block group is chosen in the first stage of sampling (\(n/N\)) and that a particular house is then chosen within that block group in the second stage (\(m_i/M_i\)).
Another factor to consider is that we are using two different data sources with different definitions of what constitutes a blighted address. To address this, we consider three different models of the UNO data:
Model A uses only properties that are considered blighted in the UNO survey. This model is designed to serve as the most straightforward estimate of blight reduction.
Model B, designed to capture everything that could feasibly be considered a no-stat, uses properties that the UNO survey coded as blighted, properties under renovation, and vacant lots.
Model C is designed to line up most closely with the definition of a no-stat. It incorporates blighted properties, addresses under renovation, and vacant lots that had also been coded as vacant lots in the previous survey.
This handling of vacant lots in Model C is chosen because, as Allison Plyer and Elaine Ortiz note in their report “Benchmarks for Blight: How much blight does New Orleans have?”, many vacant lots are counted as no-stat addresses. However, the US Department of Housing and Urban Development's documentation of the no-stat data also implies that in most cases where demolition occurs, the address is removed from the registry. Therefore, Model C keeps the lots that have been listed as no-stats for a substantial period of time, but removes recent demolitions.
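The weighted estimator and the three model definitions can be sketched together in Python. This is an illustration under stated assumptions, not the code used for the analysis: the category labels (`"blight"`, `"renovation"`, `"vacant_lot"`, `"occupied"`), the record layout, the value `N = 10`, and all house records are hypothetical. Each house carries three codes: the survey before 2010, the 2010 survey, and the later survey, so that Model C can check the previous wave.

```python
BLIGHT, RENOVATION, VACANT_LOT, OCCUPIED = "blight", "renovation", "vacant_lot", "occupied"


def is_no_stat(model, code, previous_code):
    """Return True if a house counts as a no-stat under Model A, B, or C."""
    if model == "A":
        return code == BLIGHT
    if model == "B":
        return code in (BLIGHT, RENOVATION, VACANT_LOT)
    if model == "C":
        # Vacant lots count only if they were already vacant lots one survey earlier.
        return code in (BLIGHT, RENOVATION) or (
            code == VACANT_LOT and previous_code == VACANT_LOT
        )
    raise ValueError(f"unknown model: {model}")


def weighted_ratio_estimate(block_groups, model, N, X):
    """Weighted ratio estimate with w_i = (N / n) * (M_i / m_i)."""
    n = len(block_groups)
    numerator = denominator = 0.0
    for bg in block_groups:
        houses = bg["houses"]  # tuples of (pre_2010_code, code_2010, later_code)
        w_i = (N / n) * (bg["M"] / len(houses))
        x_i = sum(is_no_stat(model, c2010, pre) for pre, c2010, _ in houses)
        y_i = sum(is_no_stat(model, later, c2010) for _, c2010, later in houses)
        numerator += w_i * y_i
        denominator += w_i * x_i
    return numerator / denominator * X


# Hypothetical sample: two block groups out of an assumed N = 10.
sample = [
    {"M": 200, "houses": [
        (OCCUPIED, BLIGHT, BLIGHT),
        (VACANT_LOT, VACANT_LOT, VACANT_LOT),
        (BLIGHT, BLIGHT, VACANT_LOT),      # demolished since 2010
        (OCCUPIED, OCCUPIED, OCCUPIED),
    ]},
    {"M": 300, "houses": [
        (BLIGHT, RENOVATION, OCCUPIED),
        (OCCUPIED, BLIGHT, BLIGHT),
    ]},
]

estimates = {m: weighted_ratio_estimate(sample, m, N=10, X=43_755) for m in "ABC"}
for model, value in estimates.items():
    print(f"Model {model}: {value:,.0f}")
```

Because \(N/n\) is the same for every sampled block group, it cancels between numerator and denominator; only the within-block-group factor \(M_i/m_i\) changes the ratio. In this toy data the recently demolished house counts as a no-stat in the later period under Model B but not under Model C, which is the distinction the models are meant to capture.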
To find the variance and confidence intervals around our estimates, we use jackknife variance estimation. Jackknife estimation takes the following form: \[ \hat{\sigma}^2=\frac{n-1}{n}\sum\limits_{i=1}^{n}(\hat{Y}_{-i}-\hat{Y})^2 \]
where \(\hat{Y}_{-i}\) is the estimate \(\hat{Y}\) recomputed with the \(i\)th observation (here, the \(i\)th sampled block group) excluded. Essentially, the greater the effect each individual observation has on the final estimate, the greater the resulting variance.
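The jackknife formula above can be sketched in Python as a delete-one-block-group calculation. The weights and counts below are made up, the function names are hypothetical, and the 95% interval uses a normal approximation (\(\hat{Y} \pm 1.96\,\hat{\sigma}\)), which is an assumption of this sketch rather than something the text specifies.

```python
import math


def ratio_total(w, x, y, X):
    """Weighted ratio estimate of the total: sum(w*y) / sum(w*x) * X."""
    numerator = sum(wi * yi for wi, yi in zip(w, y))
    denominator = sum(wi * xi for wi, xi in zip(w, x))
    return numerator / denominator * X


def jackknife_variance(w, x, y, X):
    """Delete-one jackknife: (n-1)/n * sum over i of (Y_hat_{-i} - Y_hat)^2."""
    n = len(x)
    full = ratio_total(w, x, y, X)
    # Recompute the estimate n times, leaving out one block group each time.
    # The N/n part of the weights cancels in the ratio, so it needs no rescaling.
    leave_one_out = [
        ratio_total(w[:i] + w[i + 1:], x[:i] + x[i + 1:], y[:i] + y[i + 1:], X)
        for i in range(n)
    ]
    return (n - 1) / n * sum((est - full) ** 2 for est in leave_one_out)


# Made-up inputs for three block groups (illustrative only).
weights = [1.0, 1.0, 1.0]
x = [10, 20, 10]
y = [5, 10, 15]
X = 100

y_hat = ratio_total(weights, x, y, X)
variance = jackknife_variance(weights, x, y, X)
half_width = 1.96 * math.sqrt(variance)  # assumed normal approximation
print(f"estimate = {y_hat:.1f}, variance = {variance:.2f}")
print(f"approx. 95% CI: ({y_hat - half_width:.1f}, {y_hat + half_width:.1f})")
```

In this toy sample the third block group moves the estimate the most when removed, so it contributes heavily to the variance, matching the intuition described above.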