This example will use R to download American Community Survey summary file tables using the tidycensus package. The goal of this example is to illustrate how to download data from the Census API using R, how to create basic descriptive maps of attributes and how to construct a change map between two time periods.
The example will use data from San Antonio, Texas from the American Community Survey summary file. Sys.setenv(CENSUS_KEY="“) readRenviron(”~/.Renviron“) Sys.getenv(”CENSUS_KEY")
library(tidycensus)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3 ✓ purrr 0.3.4
## ✓ tibble 3.0.3 ✓ dplyr 1.0.3
## ✓ tidyr 1.1.1 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(sf)
## Linking to GEOS 3.8.1, GDAL 3.1.4, PROJ 6.3.1
library(tmap)
v10_Profile <- load_variables(2011, "acs5/profile", cache = TRUE) #demographic profile tables
v19_Profile <- load_variables(2019, "acs5/profile", cache = TRUE) #demographic
#Search for variables by using grep()
v10_Profile[grep(x = v10_Profile$label, "VACANT", ignore.case = TRUE), c("name", "label")]
## # A tibble: 2 x 2
## name label
## <chr> <chr>
## 1 DP04_0003 Estimate!!HOUSING OCCUPANCY!!Vacant housing units
## 2 DP04_0003P Percent!!HOUSING OCCUPANCY!!Vacant housing units
v19_Profile[grep(x = v19_Profile$label, "VACANT", ignore.case = TRUE), c("name", "label")]
## # A tibble: 2 x 2
## name label
## <chr> <chr>
## 1 DP04_0003 Estimate!!HOUSING OCCUPANCY!!Total housing units!!Vacant housing u…
## 2 DP04_0003P Percent!!HOUSING OCCUPANCY!!Total housing units!!Vacant housing un…
The data profile tables are very useful because they contain lots of pre-calculated variables.
vac10<-get_acs(geography = "tract",
state="TX",
county = "Dallas",
year = 2011,
variables="DP04_0003P" ,
geometry = T,
output = "wide")
## Getting data from the 2007-2011 5-year ACS
## Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
## Using the ACS Data Profile
#rename variables and filter missing cases
vac10 <- vac10%>%
mutate(pvac = DP04_0003PE,
pvac_er = DP04_0003PM/1.645,
pvac_cv =100* (pvac_er/pvac)) %>%
filter(complete.cases(pvac), is.finite(pvac_cv)==T)
head(vac10)
tm_shape(vac10)+
tm_polygons(c("pvac"), title=c("% in Vacancy"), palette="Blues", style="quantile", n=5)+
#tm_format("World", legend.outside=T, title.size =4)+
tm_scale_bar()+
tm_layout(title="Dallas Vacancy Rate Estimates - Quantile Breaks", title.size =1.5, legend.frame = TRUE, title.position = c('right', 'top'))+
tm_compass()
tm_shape(vac10)+
tm_polygons(c("pvac"), title=c("% in Vacancy"), palette="Blues", style="pretty", n=5)+
#tm_format("World", legend.outside=T, title.size =4)+
tm_scale_bar()+
tm_layout(title="Dallas Vacancy Rate Estimates - Pretty Breaks", title.size =1.5, legend.frame = TRUE, title.position = c('right', 'top'))+
tm_compass()
tm_shape(vac10)+
tm_polygons(c("pvac"), title=c("% in Vacancy"), palette="Blues", style="jenks", n=5)+
#tm_format("World", legend.outside=T, title.size =4)+
tm_scale_bar()+
tm_layout(title="Dallas Vacancy Rate Estimates - Jenks Breaks", title.size =1.5, legend.frame = TRUE, title.position = c('right', 'top'))+
tm_compass()
Here I generate a quantile break for the coefficient of variation in census tract vacerty estimates.
If you don’t remeber, the coefficient of variation is:
\[CV =\frac {\sigma }{\mu}\]
and is a measure of the variability in the estimate, relative to the estimate itself. If the CV is greater than 100, the estimate is very imprecise. The lower the value, the more precise the estimate. This is very important when using small area estimates from the ACS.
p1<-tm_shape(vac10)+
tm_polygons(c("pvac"), title=c("% in Vacancy"), palette="Blues", style="quantile", n=5)+
#tm_format("World", legend.outside=T, title.size =4)+
tm_scale_bar()+
tm_layout(title="Dallas Vacancy Rate Estimates", title.size =1.5, legend.frame = TRUE, title.position = c('right', 'top'))+
tm_compass()
p2<-tm_shape(vac10)+
tm_polygons(c("pvac_cv"), title=c("CV Vacancy"), palette="Blues", style="quantile", n=5)+
#tm_format("World", title="Dallas Vacancy Rate CV", legend.outside=T)+
tm_layout(title="Dallas Vacancy Rate CV", title.size =1.5, legend.frame = TRUE, title.position = c('right', 'top'))+
tm_scale_bar()+
tm_compass()
tmap_arrange(p1, p2)
plot(vac10$pvac, vac10$pvac_cv)
When we have data that are collected over times on the same geographies, we may be interested in whether the variable we’re mapping has changed much over time.
In the ACS, we can compare two estimates if the years used to produce the estimates do not overlap. For instance, we could compare the 2006-2010 estimates to the 2015-2019 estimates, but we could not compare the 2006-2010 to the 2008-22012, because they share years of data.
See this for the official position on the subject.
There is a fix that will let us do this. see below
Here we take the vacerty rate in tracts derived from the 2006-2010 ACS and compare it to the estimate from the 2015-2019 ACS
# get the 2019 estimates
vac19<-get_acs(geography = "tract",
state="TX",
county = "Dallas",
year = 2019,
variables="DP04_0003P" ,
geometry = T,
output = "wide")
## Getting data from the 2015-2019 5-year ACS
## Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
## Using the ACS Data Profile
##
|
| | 0%
|
|==== | 6%
|
|===== | 7%
|
|====== | 9%
|
|========= | 14%
|
|========== | 14%
|
|=========== | 15%
|
|================== | 25%
|
|================== | 26%
|
|======================================== | 57%
|
|==================================================== | 75%
|
|======================================================== | 80%
|
|=========================================================== | 84%
|
|============================================================= | 87%
|
|================================================================ | 91%
|
|================================================================ | 92%
|
|======================================================================| 100%
#rename variables and filter missing cases
vac19<- vac19%>%
mutate(pvac19 = DP04_0003PE,
pvac19_er = DP04_0003PM/1.645,
pvac19_cv =100* (pvac19_er/pvac19)) %>%
filter(complete.cases(pvac19), is.finite(pvac19_cv)==T)
head(vac19)
## Simple feature collection with 6 features and 7 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -97.01994 ymin: 32.74211 xmax: -96.75587 ymax: 32.97121
## geographic CRS: NAD83
## GEOID NAME DP04_0003PE DP04_0003PM
## 1 48113019204 Census Tract 192.04, Dallas County, Texas 7.4 5.0
## 2 48113015500 Census Tract 155, Dallas County, Texas 7.8 6.0
## 3 48113000900 Census Tract 9, Dallas County, Texas 8.0 4.2
## 4 48113014136 Census Tract 141.36, Dallas County, Texas 8.1 4.4
## 5 48113013711 Census Tract 137.11, Dallas County, Texas 4.5 3.5
## 6 48113013620 Census Tract 136.20, Dallas County, Texas 10.7 4.9
## geometry pvac19 pvac19_er pvac19_cv
## 1 MULTIPOLYGON (((-96.76902 3... 7.4 3.039514 41.07451
## 2 MULTIPOLYGON (((-97.01991 3... 7.8 3.647416 46.76175
## 3 MULTIPOLYGON (((-96.78961 3... 8.0 2.553191 31.91489
## 4 MULTIPOLYGON (((-96.99388 3... 8.1 2.674772 33.02188
## 5 MULTIPOLYGON (((-96.89057 3... 4.5 2.127660 47.28132
## 6 MULTIPOLYGON (((-96.82116 3... 10.7 2.978723 27.83854
#merge the two years worth of data
mdat<-tigris::geo_join(vac10, as.data.frame(vac19), by_sp="GEOID", by_df="GEOID")
## Warning: `group_by_()` is deprecated as of dplyr 0.7.0.
## Please use `group_by()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
head(mdat)
## Simple feature collection with 6 features and 14 fields
## Active geometry column: geometry.x
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: -96.80067 ymin: 32.83427 xmax: -96.57525 ymax: 32.94708
## geographic CRS: NAD83
## GEOID NAME.x DP04_0003PE.x
## 1 48113018120 Census Tract 181.20, Dallas County, Texas 10.2
## 2 48113012702 Census Tract 127.02, Dallas County, Texas 5.2
## 3 48113012800 Census Tract 128, Dallas County, Texas 2.9
## 4 48113012900 Census Tract 129, Dallas County, Texas 7.4
## 5 48113013004 Census Tract 130.04, Dallas County, Texas 5.7
## 6 48113019400 Census Tract 194, Dallas County, Texas 6.5
## DP04_0003PM.x pvac pvac_er pvac_cv
## 1 6.4 10.2 3.890578 38.14292
## 2 5.7 5.2 3.465046 66.63549
## 3 3.0 2.9 1.823708 62.88649
## 4 4.9 7.4 2.978723 40.25302
## 5 4.6 5.7 2.796353 49.05882
## 6 4.8 6.5 2.917933 44.89128
## NAME.y DP04_0003PE.y DP04_0003PM.y pvac19
## 1 Census Tract 181.20, Dallas County, Texas 4.6 4.2 4.6
## 2 Census Tract 127.02, Dallas County, Texas 3.8 3.6 3.8
## 3 Census Tract 128, Dallas County, Texas 1.1 1.4 1.1
## 4 Census Tract 129, Dallas County, Texas 6.4 5.5 6.4
## 5 Census Tract 130.04, Dallas County, Texas 3.2 3.6 3.2
## 6 Census Tract 194, Dallas County, Texas 7.8 5.4 7.8
## pvac19_er pvac19_cv rank geometry.x
## 1 2.5531915 55.50416 1 POLYGON ((-96.57525 32.9365...
## 2 2.1884498 57.59079 1 POLYGON ((-96.68648 32.8400...
## 3 0.8510638 77.36944 1 POLYGON ((-96.68664 32.8402...
## 4 3.3434650 52.24164 1 POLYGON ((-96.71915 32.8612...
## 5 2.1884498 68.38906 1 POLYGON ((-96.73348 32.8677...
## 6 3.2826748 42.08557 1 POLYGON ((-96.78691 32.8507...
## geometry.y
## 1 MULTIPOLYGON (((-96.61445 3...
## 2 MULTIPOLYGON (((-96.68511 3...
## 3 MULTIPOLYGON (((-96.70107 3...
## 4 MULTIPOLYGON (((-96.7251 32...
## 5 MULTIPOLYGON (((-96.73332 3...
## 6 MULTIPOLYGON (((-96.80067 3...
Here I create a function that implements the testing procedure used by the Census for comparing estimates across year
acstest<-function(names,geoid, est1, err1, est2, err2, alpha, yr1, yr2, span){
se1<-err1/qnorm(.90)
se2<-err2/qnorm(.90)
yrs1<-seq(yr1, to=yr1-span)
yrs2<-seq(yr2, to=yr2-span)
C<-mean(yrs2%in%yrs1)
diff<- (est1-est2)
test<-(est1-est2) / (sqrt(1-C)*sqrt((se1^2+se2^2)))
crit<-qnorm(1-alpha/2)
pval<-2*pnorm(abs(test),lower.tail=F)
result<-NULL
result[pval > alpha]<-"insignificant change"
result[pval < alpha & test < 0]<- "significant increase"
result[pval < alpha & test > 0]<-"significant decrease"
data.frame(name=names,geoid=geoid, est1=est1, est2=est2, se1=se1, se2=se2,difference=diff, test=test, result=result, pval=pval)
}
Here I use the function I just made to do the comparisons
diff1019<-acstest(names = mdat$NAME.x, geoid = mdat$GEOID, est1 = mdat$pvac, est2 = mdat$pvac19, err1 = mdat$pvac_er, err2=mdat$pvac19_er,alpha = .1, yr1 = 2010, yr2=2019, span = 5)
head(diff1019)
## name geoid est1 est2 se1
## 1 Census Tract 181.20, Dallas County, Texas 48113018120 10.2 4.6 3.035834
## 2 Census Tract 127.02, Dallas County, Texas 48113012702 5.2 3.8 2.703789
## 3 Census Tract 128, Dallas County, Texas 48113012800 2.9 1.1 1.423047
## 4 Census Tract 129, Dallas County, Texas 48113012900 7.4 6.4 2.324310
## 5 Census Tract 130.04, Dallas County, Texas 48113013004 5.7 3.2 2.182006
## 6 Census Tract 194, Dallas County, Texas 48113019400 6.5 7.8 2.276875
## se2 difference test result pval
## 1 1.9922659 5.6 1.5422018 insignificant change 0.1230246
## 2 1.7076565 1.4 0.4377872 insignificant change 0.6615405
## 3 0.6640886 1.8 1.1462233 insignificant change 0.2517028
## 4 2.6089196 1.0 0.2861950 insignificant change 0.7747287
## 5 1.7076565 2.5 0.9022720 insignificant change 0.3669124
## 6 2.5614847 -1.3 -0.3793238 insignificant change 0.7044474
table(diff1019$result)
##
## insignificant change significant decrease significant increase
## 342 117 43
acs_merge<-left_join(mdat, diff1019, by=c("GEOID"="geoid"))
tmap_mode("plot")
## tmap mode set to plotting
p1<-tm_shape(acs_merge)+
tm_polygons(c("pvac"), title=c("% in vacerty 2010"), palette="Blues", style="quantile", n=5)+
#tm_format("World", legend.outside=T, title.size =4)+
tm_scale_bar()+
tm_layout(title="San Antonio vacerty Rate Estimates 2010", title.size =1.5, legend.frame = TRUE, title.position = c('right', 'top'))+
tm_compass()
p2<-tm_shape(acs_merge)+
tm_polygons(c("pvac19"), title=c("% in vacerty 2019"), palette="Blues", style="quantile", n=5)+
#tm_format("World", title="San Antonio vacerty Rate CV", legend.outside=T)+
tm_layout(title="San Antonio vacerty Rate Estimate 2019", title.size =1.5, legend.frame = TRUE, title.position = c('right', 'top'))+
tm_scale_bar()+
tm_compass()
p3 <- tm_shape(acs_merge)+
tm_polygons(c("result"), title=c("Changes"), palette = "Dark2")+
#tm_format("World", title="San Antonio vacerty Rate CV", legend.outside=T)+
tm_layout(title="San Antonio vacerty Rate Estimate Changes", title.size =1.5, legend.frame = TRUE, title.position = c('right', 'top'))+
tm_scale_bar()+
tm_compass()
tmap_arrange(p1, p2, p3)