Once the data is cleaned, we can estimate the levels of blight under the following models. First, we define our variables:
load("BlightData.RData")
X <- sum(USPS$NOSTAT_RES) #total no-stat residential addresses in the 2010 USPS data
m.10 <- blight.counts$surveyed10 #addresses surveyed in each sampling unit, by year
m.12 <- blight.counts$surveyed12
m.14 <- blight.counts$surveyed14
weight.10 <- 1/m.10 #sampling weights: inverse of the number surveyed
weight.12 <- 1/m.12
weight.14 <- 1/m.14
blighted.10 <- blight.counts$blighted10 #Model A: blighted addresses only
demo.10 <- blight.counts$lot10 #all vacant lots
demoNC.10 <- blight.counts$lotNC10 #lots that were also vacant in the previous survey
proc.10 <- blight.counts$proc10 #houses undergoing renovation
BDP.10 <- blighted.10+demo.10+proc.10 #Model B: blighted + all lots + renovations
BDncP.10 <- blighted.10+demoNC.10+proc.10 #Model C: blighted + continuing lots + renovations
blighted.12 <- blight.counts$blighted12
demo.12 <- blight.counts$lot12
demoNC.12 <- blight.counts$lotNC12
proc.12 <- blight.counts$proc12
BDP.12 <- blighted.12+demo.12+proc.12
BDncP.12 <- blighted.12+demoNC.12+proc.12
blighted.14 <- blight.counts$blighted14
demo.14 <- blight.counts$lot14
demoNC.14 <- blight.counts$lotNC14
proc.14 <- blight.counts$proc14
BDP.14 <- blighted.14+demo.14+proc.14
BDncP.14 <- blighted.14+demoNC.14+proc.14
We can find the estimated blight in 2012 and 2014 based on each model:
wt.ratio <- function(var1, var2, weight1, weight2){
  #Calculates sample ratios for a two-stage sampling design
  #var1: a vector with the variables of interest in the auxiliary data
  #var2: same as above, for the data being estimated
  #weight1: a vector of sampling weights reflecting selection probabilities for units in the auxiliary data
  #weight2: same as above, for the data being estimated
  #value: the ratio of the weighted total of var2 to the weighted total of var1
  return(sum(weight2*var2)/sum(weight1*var1))
}
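As a quick sanity check on wt.ratio (using made-up inputs, not the blight data), equal weights collapse it to a simple ratio of totals:
wt.ratio(var1 = c(10, 20, 30), var2 = c(5, 10, 15), weight1 = rep(1, 3), weight2 = rep(1, 3)) #hypothetical toy inputs
## [1] 0.5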
ratioEstA.12 <- wt.ratio(blighted.10, blighted.12, weight.10, weight.12)
ratioEstA.14 <- wt.ratio(blighted.10, blighted.14, weight.10, weight.14)
ratioEstB.12 <- wt.ratio(BDP.10, BDP.12, weight.10, weight.12)
ratioEstB.14 <- wt.ratio(BDP.10, BDP.14, weight.10, weight.14)
ratioEstC.12 <- wt.ratio(BDncP.10, BDncP.12, weight.10, weight.12)
ratioEstC.14 <- wt.ratio(BDncP.10, BDncP.14, weight.10, weight.14)
estA.12 <- ratioEstA.12*X
estA.14 <- ratioEstA.14*X
estB.12 <- ratioEstB.12*X
estB.14 <- ratioEstB.14*X
estC.12 <- ratioEstC.12*X
estC.14 <- ratioEstC.14*X
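In notation, each estimate above scales the 2010 USPS total $X$ by a weighted ratio. For a given model and year $t \in \{2012, 2014\}$,

$$\hat{X}_t = X \cdot \frac{\sum_i w_{t,i}\,y_{t,i}}{\sum_i w_{10,i}\,y_{10,i}},$$

where $y_{t,i}$ is the model's blight count for sampling unit $i$ in year $t$ and $w_{t,i} = 1/m_{t,i}$ is its sampling weight.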
library(reshape2)
library(ggplot2)
library(scales)
blight.estimates <- data.frame(est.2010 = rep(X, 3), est.2012 = c(estA.12, estB.12, estC.12), est.2014 = c(estA.14, estB.14, estC.14))
rownames(blight.estimates) <- c("Model A", "Model B", "Model C")
blight.estimates$Model <- c("Model A", "Model B", "Model C")
estimates.molten <- melt(blight.estimates, variable.name = "Year", value.name = "Blighted.Addresses", id.vars = c("Model"))
estimates.molten$Year <- substr(estimates.molten$Year, 5, 8)
formatNum <- function(x) formatC(x, format = "fg", big.mark = ",") #adds comma separators to plot labels
ggplot(data = estimates.molten, aes(x = Year, y = Blighted.Addresses)) +
geom_bar(stat = "identity", fill = "dodgerblue", width = .75) +
facet_wrap("Model") +
geom_text(aes(label = formatNum(Blighted.Addresses)), vjust = -.3) +
theme(legend.position = "none", axis.title.y = element_blank(), axis.title.x = element_blank()) +
labs(title = "Estimated Number of Blighted Addresses In New Orleans")
By subtracting the 2012 and 2014 estimates from the USPS count in 2010, we can estimate the blight reduction as:
reduction <- X-blight.estimates[,2:3]
reduction$Model <- rownames(reduction)
reduction.molten <- melt(reduction, variable.name = "Year", value.name = "Blighted.Addresses", id.vars = c("Model"))
reduction.molten$Year <- substr(reduction.molten$Year, 5, 8)
ggplot(data = reduction.molten, aes(x = Year, y = Blighted.Addresses)) +
geom_bar(stat = "identity", fill = "dodgerblue", width = .75) +
facet_wrap("Model") +
geom_text(aes(label = formatNum(Blighted.Addresses)), vjust = -.3) +
theme(legend.position = "none", axis.title.y = element_blank(), axis.title.x = element_blank()) +
labs(title = "Estimated Blighted Addresses Reduced in New Orleans from 2010")
In terms of percent reduction, this is:
options(digits=3)
reduction[,1:2]/X #equivalent to (X - estimate)/X, the share of 2010 blight eliminated
## est.2012 est.2014
## Model A 0.316 0.614
## Model B 0.228 0.354
## Model C 0.315 0.460
From the above charts and table, we can see a sizeable reduction in blight regardless of the model (at least 15,000 addresses by 2014). However, there is a significant spread in the results, with Model A being the most aggressive and Model B the most conservative.
Using the jackknife variance, we estimate confidence intervals around our estimates as:
jk.var <- function(var1, var2, weight1, weight2, X){
  #Calculates the jackknife variance of a ratio estimate
  #var1: a vector with the variables of interest in the auxiliary data
  #var2: same as above, for the data being estimated
  #weight1: a vector of sampling weights reflecting selection probabilities for units in the auxiliary data
  #weight2: same as above, for the data being estimated
  #X: the population total used to scale the ratio estimates
  #value: the jackknife variance of the estimate
  n <- length(var1)
  sample.estimate <- X*wt.ratio(var1, var2, weight1, weight2)
  jk.estimates <- numeric(n)
  for(i in 1:n){
    #Recompute the estimate with sampling unit i deleted
    jk.estimates[i] <- X*wt.ratio(var1[-i], var2[-i], weight1[-i], weight2[-i])
  }
  jk.variance <- ((n-1)/n)*sum((jk.estimates-sample.estimate)^2)
  return(jk.variance)
}
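For reference, the delete-one jackknife variance implemented above is

$$\widehat{V}_{\text{jack}} \;=\; \frac{n-1}{n}\sum_{i=1}^{n}\left(\hat{X}_{(i)}-\hat{X}\right)^2,$$

where $\hat{X}_{(i)}$ is the ratio estimate recomputed with sampling unit $i$ deleted and $\hat{X}$ is the full-sample estimate.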
SE.est <- function(variance, n, crit.val = 2.0345){
  #Calculates the half-width (margin of error) of a confidence interval
  #variance: the variance around the point estimate
  #n: the number of primary sampling units used to make the estimate
  #crit.val: the critical value used to form the confidence interval; 2.0345 corresponds to 95% confidence at 33 (n-1) df
  #value: the half-width of the confidence interval around the point estimate
  SE <- crit.val*(sqrt(variance)/sqrt(n))
  return(SE)
}
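Each interval below then takes the form

$$\hat{X} \;\pm\; t_{0.975,\,n-1}\cdot\frac{\sqrt{\widehat{V}_{\text{jack}}}}{\sqrt{n}},$$

where $n$ is the number of primary sampling units; note that SE.A.12 and its counterparts are therefore half-widths rather than raw standard errors.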
n <- nrow(blight.counts)
SE.A.12 <- SE.est(variance = jk.var(blighted.10, blighted.12, weight.10, weight.12, X = X), n = n)
SE.A.14 <- SE.est(variance = jk.var(blighted.10, blighted.14, weight.10, weight.14, X = X), n = n)
SE.B.12 <- SE.est(variance = jk.var(BDP.10, BDP.12, weight.10, weight.12, X = X), n = n)
SE.B.14 <- SE.est(variance = jk.var(BDP.10, BDP.14, weight.10, weight.14, X = X), n = n)
SE.C.12 <- SE.est(variance = jk.var(BDncP.10, BDncP.12, weight.10, weight.12, X = X), n = n)
SE.C.14 <- SE.est(variance = jk.var(BDncP.10, BDncP.14, weight.10, weight.14, X = X), n = n)
From this, we can plot 95% confidence intervals around our estimates:
SE.vec <- c(0,0,0,SE.A.12,SE.B.12,SE.C.12,SE.A.14,SE.B.14,SE.C.14)
estimates.molten$SE <- SE.vec
estimates.molten$High <- formatNum(estimates.molten$Blighted.Addresses + estimates.molten$SE)
estimates.molten$Low <- formatNum(estimates.molten$Blighted.Addresses - estimates.molten$SE)
estimates.molten$Start <- formatNum(estimates.molten$Blighted.Addresses)
estimates.molten$Start[4:9] <- ""
estimates.molten$High[1:3] <- ""
estimates.molten$Low[1:3] <- ""
dodge <- position_dodge(width=0.95)
ggplot(data = estimates.molten, aes(x = Year, y = Blighted.Addresses, ymax = Blighted.Addresses+SE, ymin = Blighted.Addresses-SE)) +
geom_bar(stat = "identity", fill = "dodgerblue", position = dodge, width = .75) +
facet_wrap("Model") +
geom_errorbar(position = dodge, width = 0.25) +
geom_text(aes(label = Start), vjust = -.3) +
geom_text(aes(label = High), vjust = -1.5) +
geom_text(aes(label = Low), vjust = 2.5) +
theme(legend.position = "none", axis.title.y = element_blank(), axis.title.x = element_blank()) +
labs(title = "Estimated Number of Blighted Addresses In New Orleans")
This implies reductions from 2010 of:
reduction.molten$SE <- SE.vec[-c(1:3)]
reduction.molten$High <- formatNum(reduction.molten$Blighted.Addresses + reduction.molten$SE)
reduction.molten$Low <- formatNum(reduction.molten$Blighted.Addresses - reduction.molten$SE)
High <- reduction.molten$Blighted.Addresses + reduction.molten$SE
Low <- reduction.molten$Blighted.Addresses - reduction.molten$SE
(reduction.molten$Blighted.Addresses-High)/reduction.molten$Blighted.Addresses #relative half-width of each interval (negative by construction)
## [1] -0.0814 -0.0700 -0.0546 -0.0323 -0.0466 -0.0356
ggplot(data = reduction.molten, aes(x = Year, y = Blighted.Addresses, ymax = Blighted.Addresses+SE, ymin = Blighted.Addresses-SE)) +
geom_bar(stat = "identity", fill = "dodgerblue", width = .75) +
facet_wrap("Model") +
geom_errorbar(position = dodge, width = 0.25) +
geom_text(aes(label = Low), vjust = 2.5) +
geom_text(aes(label = High), vjust = -2.2) +
theme(legend.position = "none", axis.title.y = element_blank(), axis.title.x = element_blank()) +
labs(title = "Estimated Blighted Addresses Reduced in New Orleans from 2010")
While the estimates vary significantly between models (the spread between Models A and B is over 10,000 units by 2014), the estimates themselves are quite precise, which was the impetus for using ratio estimation. The widest confidence interval is about 1,700 units. The coefficients of variation for each estimate are below:
SD.A.12 <- sqrt(jk.var(blighted.10, blighted.12, weight.10, weight.12, X = X))
SD.A.14 <- sqrt(jk.var(blighted.10, blighted.14, weight.10, weight.14, X = X))
SD.B.12 <- sqrt(jk.var(BDP.10, BDP.12, weight.10, weight.12, X = X))
SD.B.14 <- sqrt(jk.var(BDP.10, BDP.14, weight.10, weight.14, X = X))
SD.C.12 <- sqrt(jk.var(BDncP.10, BDncP.12, weight.10, weight.12, X = X))
SD.C.14 <- sqrt(jk.var(BDncP.10, BDncP.14, weight.10, weight.14, X = X))
CV.A.12 <- SD.A.12/estA.12
CV.A.14 <- SD.A.14/estA.14
CV.B.12 <- SD.B.12/estB.12
CV.B.14 <- SD.B.14/estB.14
CV.C.12 <- SD.C.12/estC.12
CV.C.14 <- SD.C.14/estC.14
CV <- c(CV.A.12, CV.A.14, CV.B.12, CV.B.14, CV.C.12, CV.C.14)
Estimate <- c("CV-A-12", "CV-A-14", "CV-B-12", "CV-B-14", "CV-C-12", "CV-C-14")
data.frame(Estimate, CV)
## Estimate CV
## 1 CV-A-12 0.1076
## 2 CV-A-14 0.1474
## 3 CV-B-12 0.0593
## 4 CV-B-14 0.0732
## 5 CV-C-12 0.0721
## 6 CV-C-14 0.0868
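Here each coefficient of variation is the jackknife standard error divided by the corresponding point estimate:

$$\mathrm{CV} = \frac{\sqrt{\widehat{V}_{\text{jack}}}}{\hat{X}}$$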
Coefficients of variation above 1 generally indicate high variance. By this standard, all of the models are quite precise, though Model A has a higher variance than the other two models.
The three models can be described as follows:
| Parameter | Model A | Model B | Model C |
|---|---|---|---|
| UNO Survey Variables | Blighted addresses only | Blighted addresses, houses undergoing renovations, all vacant lots | Blighted addresses, houses undergoing renovations, vacant lots that were also vacant in the previous survey |
| Parsimony | Most straightforward | Adds complexity | Most complex |
| USPS Consistency | Excludes lots and houses undergoing renovation that would likely be considered no-stats | Includes vacant lots that likely wouldn’t be considered no-stats | Most consistent |
The models range from A, which is the most straightforward but possibly too simplistic, to C, which adds complexity in an attempt to align with the no-stat definition as closely as possible. Model B offers a middle ground, being more straightforward than C but more faithful to the USPS definitions than A.
All models show a sizeable reduction in no-stats since 2010, though the magnitude of that reduction varies substantially by model, with Model A providing the most aggressive estimates and Model B the most conservative. This is to be expected given the construction of the models. After all, there are two ways to get rid of a blighted structure: renovation and demolition. Model A counts both as reducing blight, Model B counts only renovations, and Model C counts renovations as well as some demolitions.
Based on our results, we can likely dismiss Model A as unrealistic, particularly in light of the 2014-2015 results. A 60% reduction from 2010 to 2014-2015 is overly aggressive and does not match historical trends. Models B and C both provide more realistic, and more precise, results. Based on these two models, we estimate that between 2010 and 2014-2015, New Orleans saw a reduction of approximately 15,000, and perhaps as many as 20,000, blighted addresses.