Once the data is cleaned, we can estimate the levels of blight under the following models. First, we define our variables:
load("BlightData.RData")
X <- sum(USPS$NOSTAT_RES) #total no-stat residential addresses in the 2010 USPS data
m.10 <- blight.counts$surveyed10 #addresses surveyed in each sampling unit, by year
m.12 <- blight.counts$surveyed12
m.14 <- blight.counts$surveyed14
weight.10 <- 1/m.10 #sampling weights: inverse of the number surveyed
weight.12 <- 1/m.12
weight.14 <- 1/m.14
blighted.10 <- blight.counts$blighted10 #Model A: blighted addresses only
demo.10 <- blight.counts$lot10 #all vacant lots
demoNC.10 <- blight.counts$lotNC10 #lots that were also vacant in the previous survey
proc.10 <- blight.counts$proc10 #houses undergoing renovation
BDP.10 <- blighted.10+demo.10+proc.10 #Model B: blighted + all lots + renovations
BDncP.10 <- blighted.10+demoNC.10+proc.10 #Model C: blighted + continuing lots + renovations
blighted.12 <- blight.counts$blighted12
demo.12 <- blight.counts$lot12
demoNC.12 <- blight.counts$lotNC12
proc.12 <- blight.counts$proc12
BDP.12 <- blighted.12+demo.12+proc.12
BDncP.12 <- blighted.12+demoNC.12+proc.12
blighted.14 <- blight.counts$blighted14
demo.14 <- blight.counts$lot14
demoNC.14 <- blight.counts$lotNC14
proc.14 <- blight.counts$proc14
BDP.14 <- blighted.14+demo.14+proc.14
BDncP.14 <- blighted.14+demoNC.14+proc.14
We can find the estimated blight in 2012 and 2014 based on each model:
wt.ratio <- function(var1, var2, weight1, weight2){
  #Calculates sample ratios for a two-stage sampling design
  #var1: a vector with the variables of interest in the auxiliary data
  #var2: same as above, for the data being estimated
  #weight1: a vector of sampling weights reflecting selection probabilities for units in the auxiliary data
  #weight2: same as above, for the data being estimated
  #value: the ratio of the weighted total of var2 to the weighted total of var1
  return(sum(weight2*var2)/sum(weight1*var1))
}
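As a quick sanity check on wt.ratio (using made-up inputs, not the blight data), equal weights collapse it to a simple ratio of totals:
wt.ratio(var1 = c(10, 20, 30), var2 = c(5, 10, 15), weight1 = rep(1, 3), weight2 = rep(1, 3)) #hypothetical toy inputs
## [1] 0.5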
ratioEstA.12 <- wt.ratio(blighted.10, blighted.12, weight.10, weight.12)
ratioEstA.14 <- wt.ratio(blighted.10, blighted.14, weight.10, weight.14)
ratioEstB.12 <- wt.ratio(BDP.10, BDP.12, weight.10, weight.12)
ratioEstB.14 <- wt.ratio(BDP.10, BDP.14, weight.10, weight.14)
ratioEstC.12 <- wt.ratio(BDncP.10, BDncP.12, weight.10, weight.12)
ratioEstC.14 <- wt.ratio(BDncP.10, BDncP.14, weight.10, weight.14)
estA.12 <- ratioEstA.12*X
estA.14 <- ratioEstA.14*X
estB.12 <- ratioEstB.12*X
estB.14 <- ratioEstB.14*X
estC.12 <- ratioEstC.12*X
estC.14 <- ratioEstC.14*X
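In notation, each estimate above scales the 2010 USPS total $X$ by a weighted ratio. For a given model and year $t \in \{2012, 2014\}$,

$$\hat{X}_t = X \cdot \frac{\sum_i w_{t,i}\,y_{t,i}}{\sum_i w_{10,i}\,y_{10,i}},$$

where $y_{t,i}$ is the model's blight count for sampling unit $i$ in year $t$ and $w_{t,i} = 1/m_{t,i}$ is its sampling weight.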
library(reshape2)
library(ggplot2)
library(scales)
blight.estimates <- data.frame(est.2010 = rep(X, 3), est.2012 = c(estA.12, estB.12, estC.12), est.2014 = c(estA.14, estB.14, estC.14))
rownames(blight.estimates) <- c("Model A", "Model B", "Model C")
blight.estimates$Model <- c("Model A", "Model B", "Model C")
estimates.molten <- melt(blight.estimates, variable.name = "Year", value.name = "Blighted.Addresses", id.vars = c("Model"))
estimates.molten$Year <- substr(estimates.molten$Year, 5, 8)
formatNum <- function(x) formatC(x, format = "fg", big.mark = ",") #adds comma separators to plot labels
ggplot(data = estimates.molten, aes(x = Year, y = Blighted.Addresses)) +
geom_bar(stat = "identity", fill = "dodgerblue", width = .75) +
facet_wrap("Model") +
geom_text(aes(label = formatNum(Blighted.Addresses)), vjust = -.3) +
theme(legend.position = "none", axis.title.y = element_blank(), axis.title.x = element_blank()) +
labs(title = "Estimated Number of Blighted Addresses In New Orleans")
By subtracting the 2012 and 2014 estimates from the USPS count in 2010, we can estimate the blight reduction as:
reduction <- X-blight.estimates[,2:3]
reduction$Model <- rownames(reduction)
reduction.molten <- melt(reduction, variable.name = "Year", value.name = "Blighted.Addresses", id.vars = c("Model"))
reduction.molten$Year <- substr(reduction.molten$Year, 5, 8)
ggplot(data = reduction.molten, aes(x = Year, y = Blighted.Addresses)) +
geom_bar(stat = "identity", fill = "dodgerblue", width = .75) +
facet_wrap("Model") +
geom_text(aes(label = formatNum(Blighted.Addresses)), vjust = -.3) +
theme(legend.position = "none", axis.title.y = element_blank(), axis.title.x = element_blank()) +
labs(title = "Estimated Blighted Addresses Reduced in New Orleans from 2010")
In terms of percent reduction, this is:
options(digits=3)
reduction[,1:2]/X #equivalent to (X - estimate)/X, the share of 2010 blight eliminated
## est.2012 est.2014
## Model A 0.316 0.614
## Model B 0.228 0.354
## Model C 0.315 0.460
From the above charts and table, we can see a sizeable reduction in blight regardless of the model (at least 15,000 addresses by 2014). However, there is a significant spread in the results, with Model A being the most aggressive and Model B the most conservative.
Using the jackknife variance, we estimate confidence intervals around our estimates as:
jk.var <- function(var1, var2, weight1, weight2, X){
  #Calculates the jackknife variance of a ratio estimate
  #var1: a vector with the variables of interest in the auxiliary data
  #var2: same as above, for the data being estimated
  #weight1: a vector of sampling weights reflecting selection probabilities for units in the auxiliary data
  #weight2: same as above, for the data being estimated
  #X: the population total used to scale the ratio estimates
  #value: the jackknife variance of the estimate
  n <- length(var1)
  sample.estimate <- X*wt.ratio(var1, var2, weight1, weight2)
  jk.estimates <- numeric(n)
  for(i in 1:n){
    #Recompute the estimate with sampling unit i deleted
    jk.estimates[i] <- X*wt.ratio(var1[-i], var2[-i], weight1[-i], weight2[-i])
  }
  jk.variance <- ((n-1)/n)*sum((jk.estimates-sample.estimate)^2)
  return(jk.variance)
}
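For reference, the delete-one jackknife variance implemented above is

$$\widehat{V}_{\text{jack}} \;=\; \frac{n-1}{n}\sum_{i=1}^{n}\left(\hat{X}_{(i)}-\hat{X}\right)^2,$$

where $\hat{X}_{(i)}$ is the ratio estimate recomputed with sampling unit $i$ deleted and $\hat{X}$ is the full-sample estimate.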
SE.est <- function(variance, n, crit.val = 2.0345){
  #Calculates the half-width (margin of error) of a confidence interval
  #variance: the variance around the point estimate
  #n: the number of primary sampling units used to make the estimate
  #crit.val: the critical value used to form the confidence interval; 2.0345 corresponds to 95% confidence at 33 (n-1) df
  #value: the half-width of the confidence interval around the point estimate
  SE <- crit.val*(sqrt(variance)/sqrt(n))
  return(SE)
}
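Each interval below then takes the form

$$\hat{X} \;\pm\; t_{0.975,\,n-1}\cdot\frac{\sqrt{\widehat{V}_{\text{jack}}}}{\sqrt{n}},$$

where $n$ is the number of primary sampling units; note that SE.A.12 and its counterparts are therefore half-widths rather than raw standard errors.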
n <- nrow(blight.counts)
SE.A.12 <- SE.est(variance = jk.var(blighted.10, blighted.12, weight.10, weight.12, X = X), n = n)
SE.A.14 <- SE.est(variance = jk.var(blighted.10, blighted.14, weight.10, weight.14, X = X), n = n)
SE.B.12 <- SE.est(variance = jk.var(BDP.10, BDP.12, weight.10, weight.12, X = X), n = n)
SE.B.14 <- SE.est(variance = jk.var(BDP.10, BDP.14, weight.10, weight.14, X = X), n = n)
SE.C.12 <- SE.est(variance = jk.var(BDncP.10, BDncP.12, weight.10, weight.12, X = X), n = n)
SE.C.14 <- SE.est(variance = jk.var(BDncP.10, BDncP.14, weight.10, weight.14, X = X), n = n)
From this, we can plot 95% confidence intervals around our estimates:
SE.vec <- c(0,0,0,SE.A.12,SE.B.12,SE.C.12,SE.A.14,SE.B.14,SE.C.14)
estimates.molten$SE <- SE.vec
estimates.molten$High <- formatNum(estimates.molten$Blighted.Addresses + estimates.molten$SE)
estimates.molten$Low <- formatNum(estimates.molten$Blighted.Addresses - estimates.molten$SE)
estimates.molten$Start <- formatNum(estimates.molten$Blighted.Addresses)
estimates.molten$Start[4:9] <- ""
estimates.molten$High[1:3] <- ""
estimates.molten$Low[1:3] <- ""
dodge <- position_dodge(width=0.95)
ggplot(data = estimates.molten, aes(x = Year, y = Blighted.Addresses, ymax = Blighted.Addresses+SE, ymin = Blighted.Addresses-SE)) +
geom_bar(stat = "identity", fill = "dodgerblue", position = dodge, width = .75) +
facet_wrap("Model") +
geom_errorbar(position = dodge, width = 0.25) +
geom_text(aes(label = Start), vjust = -.3) +
geom_text(aes(label = High), vjust = -1.5) +
geom_text(aes(label = Low), vjust = 2.5) +
theme(legend.position = "none", axis.title.y = element_blank(), axis.title.x = element_blank()) +
labs(title = "Estimated Number of Blighted Addresses In New Orleans")
This implies reductions from 2010 of:
reduction.molten$SE <- SE.vec[-c(1:3)]
reduction.molten$High <- formatNum(reduction.molten$Blighted.Addresses + reduction.molten$SE)
reduction.molten$Low <- formatNum(reduction.molten$Blighted.Addresses - reduction.molten$SE)
High <- reduction.molten$Blighted.Addresses + reduction.molten$SE
Low <- reduction.molten$Blighted.Addresses - reduction.molten$SE
(reduction.molten$Blighted.Addresses-High)/reduction.molten$Blighted.Addresses #relative half-width of each interval (negative by construction)
## [1] -0.0814 -0.0700 -0.0546 -0.0323 -0.0466 -0.0356
ggplot(data = reduction.molten, aes(x = Year, y = Blighted.Addresses, ymax = Blighted.Addresses+SE, ymin = Blighted.Addresses-SE)) +
geom_bar(stat = "identity", fill = "dodgerblue", width = .75) +
facet_wrap("Model") +
geom_errorbar(position = dodge, width = 0.25) +
geom_text(aes(label = Low), vjust = 2.5) +
geom_text(aes(label = High), vjust = -2.2) +
theme(legend.position = "none", axis.title.y = element_blank(), axis.title.x = element_blank()) +
labs(title = "Estimated Blighted Addresses Reduced in New Orleans from 2010")
While the estimates vary significantly between models (the spread between Models A and B is over 10,000 units by 2014), the estimates themselves are quite precise, which was the impetus for using ratio estimation. The widest confidence interval is about 1,700 units. The coefficients of variation for each estimate are below:
SD.A.12 <- sqrt(jk.var(blighted.10, blighted.12, weight.10, weight.12, X = X))
SD.A.14 <- sqrt(jk.var(blighted.10, blighted.14, weight.10, weight.14, X = X))
SD.B.12 <- sqrt(jk.var(BDP.10, BDP.12, weight.10, weight.12, X = X))
SD.B.14 <- sqrt(jk.var(BDP.10, BDP.14, weight.10, weight.14, X = X))
SD.C.12 <- sqrt(jk.var(BDncP.10, BDncP.12, weight.10, weight.12, X = X))
SD.C.14 <- sqrt(jk.var(BDncP.10, BDncP.14, weight.10, weight.14, X = X))
CV.A.12 <- SD.A.12/estA.12
CV.A.14 <- SD.A.14/estA.14
CV.B.12 <- SD.B.12/estB.12
CV.B.14 <- SD.B.14/estB.14
CV.C.12 <- SD.C.12/estC.12
CV.C.14 <- SD.C.14/estC.14
CV <- c(CV.A.12, CV.A.14, CV.B.12, CV.B.14, CV.C.12, CV.C.14)
Estimate <- c("CV-A-12", "CV-A-14", "CV-B-12", "CV-B-14", "CV-C-12", "CV-C-14")
data.frame(Estimate, CV)
## Estimate CV
## 1 CV-A-12 0.1076
## 2 CV-A-14 0.1474
## 3 CV-B-12 0.0593
## 4 CV-B-14 0.0732
## 5 CV-C-12 0.0721
## 6 CV-C-14 0.0868
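Here each coefficient of variation is the jackknife standard error divided by the corresponding point estimate:

$$\mathrm{CV} = \frac{\sqrt{\widehat{V}_{\text{jack}}}}{\hat{X}}$$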
Coefficients of variation above 1 generally indicate high variance. By this standard, all of the models are quite precise, though Model A has a higher variance than the other two models.
The three models can be described as follows:
| Parameter | Model A | Model B | Model C |
|---|---|---|---|
| UNO Survey Variables | Blighted addresses only | Blighted addresses, houses undergoing renovations, all vacant lots | Blighted addresses, houses undergoing renovations, vacant lots that were also vacant in the previous survey |
| Parsimony | Most straightforward | Adds complexity | Most complex |
| USPS Consistency | Excludes lots and houses undergoing renovation that would likely be considered no-stats | Includes vacant lots that likely wouldn’t be considered no-stats | Most consistent |
The models range from A, which is the most straightforward but possibly too simplistic, to C, which adds complexity in an attempt to align with the no-stat definition as closely as possible. Model B offers a middle ground, being more straightforward than C but more faithful to the USPS definitions than A.
All models show a sizeable reduction in no-stats since 2010, though the magnitude of that reduction varies substantially by model, with Model A providing the most aggressive estimates and Model B the most conservative. This is to be expected given the construction of the models. After all, there are two ways to get rid of a blighted structure: renovation and demolition. Model A counts both as reducing blight, Model B counts only renovations, and Model C counts renovations as well as some demolitions.
Based on our results, we can likely dismiss Model A as unrealistic, particularly in light of the 2014-2015 results. A 60% reduction from 2010 to 2014-2015 is overly aggressive and does not match historical trends. Models B and C both provide more realistic, and more precise, results. Based on these two models, we estimate that between 2010 and 2014-2015, New Orleans saw a reduction of approximately 15,000, and perhaps as many as 20,000, blighted addresses.