This example will use R to download American Community Survey summary file tables using the tidycensus package. The goal of this example is to illustrate how to download data from the Census API using R, how to create basic descriptive maps of attributes and how to construct a change map between two time periods.

The example will use data from San Antonio, Texas from the American Community Survey summary file. Sys.setenv(CENSUS_KEY="“) readRenviron(”~/.Renviron“) Sys.getenv(”CENSUS_KEY")

Load Libraries

library(tidycensus)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3     ✓ purrr   0.3.4
## ✓ tibble  3.0.3     ✓ dplyr   1.0.3
## ✓ tidyr   1.1.1     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(sf)
## Linking to GEOS 3.8.1, GDAL 3.1.4, PROJ 6.3.1
library(tmap)
v10_Profile <- load_variables(2011, "acs5/profile", cache = TRUE) #demographic profile tables
v19_Profile <- load_variables(2019, "acs5/profile", cache = TRUE) #demographic 

#Search for variables by using grep()

v10_Profile[grep(x = v10_Profile$label, "VACANT", ignore.case = TRUE), c("name", "label")]
## # A tibble: 2 x 2
##   name       label                                            
##   <chr>      <chr>                                            
## 1 DP04_0003  Estimate!!HOUSING OCCUPANCY!!Vacant housing units
## 2 DP04_0003P Percent!!HOUSING OCCUPANCY!!Vacant housing units
v19_Profile[grep(x = v19_Profile$label, "VACANT", ignore.case = TRUE), c("name", "label")]
## # A tibble: 2 x 2
##   name       label                                                              
##   <chr>      <chr>                                                              
## 1 DP04_0003  Estimate!!HOUSING OCCUPANCY!!Total housing units!!Vacant housing u…
## 2 DP04_0003P Percent!!HOUSING OCCUPANCY!!Total housing units!!Vacant housing un…

Extract from ACS summary file data profile variables from 2010 and 2019 for Bexar County, TX Census Tracts

The data profile tables are very useful because they contain lots of pre-calculated variables.

vac10<-get_acs(geography = "tract",
                state="TX",
                county = "Dallas",
                year = 2011,
                variables="DP04_0003P" ,
                geometry = T,
                output = "wide")
## Getting data from the 2007-2011 5-year ACS
## Downloading feature geometry from the Census website.  To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
## Using the ACS Data Profile
#rename variables and filter missing cases
vac10 <- vac10%>%
  mutate(pvac = DP04_0003PE,
         pvac_er = DP04_0003PM/1.645,
         pvac_cv =100* (pvac_er/pvac)) %>%
  filter(complete.cases(pvac), is.finite(pvac_cv)==T)

head(vac10)

Alternative break strategies

tm_shape(vac10)+
  tm_polygons(c("pvac"), title=c("% in Vacancy"), palette="Blues", style="quantile", n=5)+
  #tm_format("World", legend.outside=T, title.size =4)+
  tm_scale_bar()+
  tm_layout(title="Dallas Vacancy Rate Estimates - Quantile Breaks", title.size =1.5, legend.frame = TRUE, title.position = c('right', 'top'))+
  tm_compass()

tm_shape(vac10)+
  tm_polygons(c("pvac"), title=c("% in Vacancy"), palette="Blues", style="pretty", n=5)+
  #tm_format("World", legend.outside=T, title.size =4)+
  tm_scale_bar()+
  tm_layout(title="Dallas Vacancy Rate Estimates - Pretty Breaks", title.size =1.5, legend.frame = TRUE, title.position = c('right', 'top'))+
  tm_compass()

tm_shape(vac10)+
  tm_polygons(c("pvac"), title=c("% in Vacancy"), palette="Blues", style="jenks", n=5)+
  #tm_format("World", legend.outside=T, title.size =4)+
  tm_scale_bar()+
  tm_layout(title="Dallas Vacancy Rate Estimates - Jenks Breaks", title.size =1.5, legend.frame = TRUE, title.position = c('right', 'top'))+
  tm_compass()

Mapping of errors in estimates

Here I generate a quantile break for the coefficient of variation in census tract vacerty estimates.

If you don’t remeber, the coefficient of variation is:

\[CV =\frac {\sigma }{\mu}\]

and is a measure of the variability in the estimate, relative to the estimate itself. If the CV is greater than 100, the estimate is very imprecise. The lower the value, the more precise the estimate. This is very important when using small area estimates from the ACS.

p1<-tm_shape(vac10)+
  tm_polygons(c("pvac"), title=c("% in Vacancy"), palette="Blues", style="quantile", n=5)+
  #tm_format("World", legend.outside=T, title.size =4)+
  tm_scale_bar()+
  tm_layout(title="Dallas Vacancy Rate Estimates", title.size =1.5, legend.frame = TRUE, title.position = c('right', 'top'))+
  tm_compass()

p2<-tm_shape(vac10)+
  tm_polygons(c("pvac_cv"), title=c("CV Vacancy"), palette="Blues", style="quantile", n=5)+
  #tm_format("World", title="Dallas Vacancy Rate CV", legend.outside=T)+
  tm_layout(title="Dallas Vacancy Rate CV", title.size =1.5, legend.frame = TRUE, title.position = c('right', 'top'))+
  tm_scale_bar()+
  tm_compass()


tmap_arrange(p1, p2)

plot(vac10$pvac, vac10$pvac_cv)

Change map between two time points

When we have data that are collected over times on the same geographies, we may be interested in whether the variable we’re mapping has changed much over time.

In the ACS, we can compare two estimates if the years used to produce the estimates do not overlap. For instance, we could compare the 2006-2010 estimates to the 2015-2019 estimates, but we could not compare the 2006-2010 to the 2008-22012, because they share years of data.

See this for the official position on the subject.

There is a fix that will let us do this. see below

Compare vacerty rates over time

Here we take the vacerty rate in tracts derived from the 2006-2010 ACS and compare it to the estimate from the 2015-2019 ACS

# get the 2019 estimates
vac19<-get_acs(geography = "tract",
                state="TX",
                county = "Dallas",
                year = 2019,
                variables="DP04_0003P" ,
                geometry = T,
                output = "wide")
## Getting data from the 2015-2019 5-year ACS
## Downloading feature geometry from the Census website.  To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
## Using the ACS Data Profile
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |====                                                                  |   6%
  |                                                                            
  |=====                                                                 |   7%
  |                                                                            
  |======                                                                |   9%
  |                                                                            
  |=========                                                             |  14%
  |                                                                            
  |==========                                                            |  14%
  |                                                                            
  |===========                                                           |  15%
  |                                                                            
  |==================                                                    |  25%
  |                                                                            
  |==================                                                    |  26%
  |                                                                            
  |========================================                              |  57%
  |                                                                            
  |====================================================                  |  75%
  |                                                                            
  |========================================================              |  80%
  |                                                                            
  |===========================================================           |  84%
  |                                                                            
  |=============================================================         |  87%
  |                                                                            
  |================================================================      |  91%
  |                                                                            
  |================================================================      |  92%
  |                                                                            
  |======================================================================| 100%
#rename variables and filter missing cases
vac19<- vac19%>%
  mutate(pvac19 = DP04_0003PE,
         pvac19_er = DP04_0003PM/1.645,
         pvac19_cv =100* (pvac19_er/pvac19)) %>%
  filter(complete.cases(pvac19), is.finite(pvac19_cv)==T)

head(vac19)
## Simple feature collection with 6 features and 7 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -97.01994 ymin: 32.74211 xmax: -96.75587 ymax: 32.97121
## geographic CRS: NAD83
##         GEOID                                      NAME DP04_0003PE DP04_0003PM
## 1 48113019204 Census Tract 192.04, Dallas County, Texas         7.4         5.0
## 2 48113015500    Census Tract 155, Dallas County, Texas         7.8         6.0
## 3 48113000900      Census Tract 9, Dallas County, Texas         8.0         4.2
## 4 48113014136 Census Tract 141.36, Dallas County, Texas         8.1         4.4
## 5 48113013711 Census Tract 137.11, Dallas County, Texas         4.5         3.5
## 6 48113013620 Census Tract 136.20, Dallas County, Texas        10.7         4.9
##                         geometry pvac19 pvac19_er pvac19_cv
## 1 MULTIPOLYGON (((-96.76902 3...    7.4  3.039514  41.07451
## 2 MULTIPOLYGON (((-97.01991 3...    7.8  3.647416  46.76175
## 3 MULTIPOLYGON (((-96.78961 3...    8.0  2.553191  31.91489
## 4 MULTIPOLYGON (((-96.99388 3...    8.1  2.674772  33.02188
## 5 MULTIPOLYGON (((-96.89057 3...    4.5  2.127660  47.28132
## 6 MULTIPOLYGON (((-96.82116 3...   10.7  2.978723  27.83854
#merge the two years worth of data
mdat<-tigris::geo_join(vac10, as.data.frame(vac19), by_sp="GEOID", by_df="GEOID")
## Warning: `group_by_()` is deprecated as of dplyr 0.7.0.
## Please use `group_by()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
head(mdat)
## Simple feature collection with 6 features and 14 fields
## Active geometry column: geometry.x
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: -96.80067 ymin: 32.83427 xmax: -96.57525 ymax: 32.94708
## geographic CRS: NAD83
##         GEOID                                    NAME.x DP04_0003PE.x
## 1 48113018120 Census Tract 181.20, Dallas County, Texas          10.2
## 2 48113012702 Census Tract 127.02, Dallas County, Texas           5.2
## 3 48113012800    Census Tract 128, Dallas County, Texas           2.9
## 4 48113012900    Census Tract 129, Dallas County, Texas           7.4
## 5 48113013004 Census Tract 130.04, Dallas County, Texas           5.7
## 6 48113019400    Census Tract 194, Dallas County, Texas           6.5
##   DP04_0003PM.x pvac  pvac_er  pvac_cv
## 1           6.4 10.2 3.890578 38.14292
## 2           5.7  5.2 3.465046 66.63549
## 3           3.0  2.9 1.823708 62.88649
## 4           4.9  7.4 2.978723 40.25302
## 5           4.6  5.7 2.796353 49.05882
## 6           4.8  6.5 2.917933 44.89128
##                                      NAME.y DP04_0003PE.y DP04_0003PM.y pvac19
## 1 Census Tract 181.20, Dallas County, Texas           4.6           4.2    4.6
## 2 Census Tract 127.02, Dallas County, Texas           3.8           3.6    3.8
## 3    Census Tract 128, Dallas County, Texas           1.1           1.4    1.1
## 4    Census Tract 129, Dallas County, Texas           6.4           5.5    6.4
## 5 Census Tract 130.04, Dallas County, Texas           3.2           3.6    3.2
## 6    Census Tract 194, Dallas County, Texas           7.8           5.4    7.8
##   pvac19_er pvac19_cv rank                     geometry.x
## 1 2.5531915  55.50416    1 POLYGON ((-96.57525 32.9365...
## 2 2.1884498  57.59079    1 POLYGON ((-96.68648 32.8400...
## 3 0.8510638  77.36944    1 POLYGON ((-96.68664 32.8402...
## 4 3.3434650  52.24164    1 POLYGON ((-96.71915 32.8612...
## 5 2.1884498  68.38906    1 POLYGON ((-96.73348 32.8677...
## 6 3.2826748  42.08557    1 POLYGON ((-96.78691 32.8507...
##                       geometry.y
## 1 MULTIPOLYGON (((-96.61445 3...
## 2 MULTIPOLYGON (((-96.68511 3...
## 3 MULTIPOLYGON (((-96.70107 3...
## 4 MULTIPOLYGON (((-96.7251 32...
## 5 MULTIPOLYGON (((-96.73332 3...
## 6 MULTIPOLYGON (((-96.80067 3...

Here I create a function that implements the testing procedure used by the Census for comparing estimates across year

acstest<-function(names,geoid, est1, err1, est2, err2, alpha, yr1, yr2, span){
  
  se1<-err1/qnorm(.90)
  se2<-err2/qnorm(.90)
  yrs1<-seq(yr1, to=yr1-span)
  yrs2<-seq(yr2, to=yr2-span)

  C<-mean(yrs2%in%yrs1)
  diff<- (est1-est2)
  test<-(est1-est2) / (sqrt(1-C)*sqrt((se1^2+se2^2)))
  crit<-qnorm(1-alpha/2)
  pval<-2*pnorm(abs(test),lower.tail=F)
  result<-NULL
  result[pval > alpha]<-"insignificant change"
  result[pval < alpha & test < 0]<- "significant increase"
  result[pval < alpha & test > 0]<-"significant decrease" 
  
  data.frame(name=names,geoid=geoid, est1=est1, est2=est2, se1=se1, se2=se2,difference=diff, test=test, result=result, pval=pval)
}

Here I use the function I just made to do the comparisons

  diff1019<-acstest(names = mdat$NAME.x, geoid = mdat$GEOID, est1 = mdat$pvac, est2 = mdat$pvac19, err1 = mdat$pvac_er, err2=mdat$pvac19_er,alpha = .1, yr1 = 2010, yr2=2019, span = 5)

head(diff1019)
##                                        name       geoid est1 est2      se1
## 1 Census Tract 181.20, Dallas County, Texas 48113018120 10.2  4.6 3.035834
## 2 Census Tract 127.02, Dallas County, Texas 48113012702  5.2  3.8 2.703789
## 3    Census Tract 128, Dallas County, Texas 48113012800  2.9  1.1 1.423047
## 4    Census Tract 129, Dallas County, Texas 48113012900  7.4  6.4 2.324310
## 5 Census Tract 130.04, Dallas County, Texas 48113013004  5.7  3.2 2.182006
## 6    Census Tract 194, Dallas County, Texas 48113019400  6.5  7.8 2.276875
##         se2 difference       test               result      pval
## 1 1.9922659        5.6  1.5422018 insignificant change 0.1230246
## 2 1.7076565        1.4  0.4377872 insignificant change 0.6615405
## 3 0.6640886        1.8  1.1462233 insignificant change 0.2517028
## 4 2.6089196        1.0  0.2861950 insignificant change 0.7747287
## 5 1.7076565        2.5  0.9022720 insignificant change 0.3669124
## 6 2.5614847       -1.3 -0.3793238 insignificant change 0.7044474
table(diff1019$result)
## 
## insignificant change significant decrease significant increase 
##                  342                  117                   43
acs_merge<-left_join(mdat, diff1019, by=c("GEOID"="geoid"))

tmap_mode("plot")
## tmap mode set to plotting
p1<-tm_shape(acs_merge)+
  tm_polygons(c("pvac"), title=c("% in vacerty  2010"), palette="Blues", style="quantile", n=5)+
  #tm_format("World", legend.outside=T, title.size =4)+
  tm_scale_bar()+
  tm_layout(title="San Antonio vacerty Rate Estimates 2010", title.size =1.5, legend.frame = TRUE, title.position = c('right', 'top'))+
  tm_compass()

p2<-tm_shape(acs_merge)+
  tm_polygons(c("pvac19"), title=c("% in vacerty 2019"), palette="Blues", style="quantile", n=5)+
  #tm_format("World", title="San Antonio vacerty Rate CV", legend.outside=T)+
  tm_layout(title="San Antonio vacerty Rate Estimate 2019", title.size =1.5, legend.frame = TRUE, title.position = c('right', 'top'))+
  tm_scale_bar()+
  tm_compass()


p3  <- tm_shape(acs_merge)+
  tm_polygons(c("result"), title=c("Changes"), palette = "Dark2")+
  #tm_format("World", title="San Antonio vacerty Rate CV", legend.outside=T)+
  tm_layout(title="San Antonio vacerty Rate Estimate Changes", title.size =1.5, legend.frame = TRUE, title.position = c('right', 'top'))+
  tm_scale_bar()+
  tm_compass()
  

tmap_arrange(p1, p2, p3)