Getting Started

This sheet builds on “Handling Census Data in R - Part 1”. Please complete that first.

As before, every time you start R you need to load the libraries used in the analysis, so run this bit of code at the start of each session.

library(tidycensus)
library(sf)
library(ggplot2)
library(ggthemes)
library(dplyr)

To get started working with tidycensus, load the package along with the tidyverse packages you need (here dplyr and ggplot2), and set your Census API key. A key can be obtained from the Census API Key page. It will provide you with a 40-character text string. Please keep track of this key and store it in a safe place.

API_Key = 'yourapikey'  # non-working example: paste your own key here
census_api_key(API_Key, install = FALSE, overwrite = TRUE)
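If you would rather not paste the key into every script, one option is to keep it in an environment variable. The sketch below assumes the key lives in CENSUS_API_KEY, which is the variable tidycensus sets when you call census_api_key. The helper looks_like_key is hypothetical and only checks the length mentioned above; it does not contact the Census servers.

```r
# Hypothetical helper: TRUE if `key` is a single 40-character string.
# This is only a sanity check on length, not a validity check with the Census API.
looks_like_key <- function(key) {
  is.character(key) && length(key) == 1 && nchar(key) == 40
}

# Read the key from the environment instead of hard-coding it in the script.
key <- Sys.getenv("CENSUS_API_KEY")

if (looks_like_key(key)) {
  message("Found a 40-character key in CENSUS_API_KEY.")
} else {
  message("No key found. Run census_api_key('yourapikey', install = TRUE) once to store it.")
}
```

With install = TRUE, tidycensus writes the key to your .Renviron file so that later sessions pick it up automatically.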

Creating New Variables

Let’s say we are interested in gentrification and want to map the percentage of the population that is white in DC at the census tract level for 2018. Notice that geography = 'tract', which downloads census tract data this time. We are going to download two variables: one for the white population and one for the total population. Passing a named vector, c(wht_pop = "B02001_002", total_pop = "B01001_001"), sets the names of the columns right away.

wht_pop = get_acs(state = "DC",
                  geography = "tract",
                  year = 2018,
                  variables = c(wht_pop = "B02001_002", total_pop = "B01001_001"),
                  geometry = TRUE,   # also download the tract boundaries
                  output = 'wide')   # one row per tract, one column per variable
Getting data from the 2014-2018 5-year ACS
Downloading feature geometry from the Census website.  To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
Using FIPS code '11' for state 'DC'

head(wht_pop)
Simple feature collection with 6 features and 6 fields
geometry type:  POLYGON
dimension:      XY
bbox:           xmin: -77.07889 ymin: 38.88983 xmax: -76.95102 ymax: 38.9843
geographic CRS: NAD83
        GEOID                                                           NAME wht_popE wht_popM total_popE total_popM                       geometry
1 11001009501 Census Tract 95.01, District of Columbia, District of Columbia     2082      334       7277        644 POLYGON ((-77.01093 38.9446...
2 11001009604 Census Tract 96.04, District of Columbia, District of Columbia      105       57       2343        257 POLYGON ((-76.9618 38.89612...
3 11001010300   Census Tract 103, District of Columbia, District of Columbia      812      228       4008        617 POLYGON ((-77.03636 38.9748...
4 11001000600     Census Tract 6, District of Columbia, District of Columbia     4042      427       4969        338 POLYGON ((-77.07651 38.9422...
5 11001001200    Census Tract 12, District of Columbia, District of Columbia     4244      355       5247        487 POLYGON ((-77.07854 38.9465...
6 11001002102 Census Tract 21.02, District of Columbia, District of Columbia      824      197       5223        518 POLYGON ((-77.0199 38.95873...

Again, we are interested in the variables ending in ‘E’ (the estimates), so wht_popE and total_popE. The columns ending in ‘M’ hold the margins of error.
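To see where these column names come from, here is a small sketch in base R. It only mimics the naming pattern visible in the output above (each name in the vector followed by E or M); it does not call the Census API, and the object wide_cols is just an illustration.

```r
# The named vector passed to get_acs(): names on the left, ACS variable codes on the right.
vars <- c(wht_pop = "B02001_002", total_pop = "B01001_001")

# In wide output each name appears twice: once with E (estimate), once with M (margin of error).
wide_cols <- paste0(rep(names(vars), each = 2), c("E", "M"))
print(wide_cols)
```

If you leave the names off, the columns are labelled with the variable codes instead, which is harder to read later on.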

Now let’s calculate the percentage of the population that is white for each census tract. We are going to use the mutate function, which is part of the dplyr package (see its help page).

We are going to pass our wht_pop object to mutate using %>%, which is called a ‘pipe’. The pipe hands wht_pop to mutate as the data to work on.

wht_pop = wht_pop %>% mutate(white_per = wht_popE/total_popE)
head(wht_pop)
Simple feature collection with 6 features and 7 fields
geometry type:  POLYGON
dimension:      XY
bbox:           xmin: -77.07889 ymin: 38.88983 xmax: -76.95102 ymax: 38.9843
geographic CRS: NAD83
        GEOID                                                           NAME wht_popE wht_popM total_popE total_popM                       geometry
1 11001009501 Census Tract 95.01, District of Columbia, District of Columbia     2082      334       7277        644 POLYGON ((-77.01093 38.9446...
2 11001009604 Census Tract 96.04, District of Columbia, District of Columbia      105       57       2343        257 POLYGON ((-76.9618 38.89612...
3 11001010300   Census Tract 103, District of Columbia, District of Columbia      812      228       4008        617 POLYGON ((-77.03636 38.9748...
4 11001000600     Census Tract 6, District of Columbia, District of Columbia     4042      427       4969        338 POLYGON ((-77.07651 38.9422...
5 11001001200    Census Tract 12, District of Columbia, District of Columbia     4244      355       5247        487 POLYGON ((-77.07854 38.9465...
6 11001002102 Census Tract 21.02, District of Columbia, District of Columbia      824      197       5223        518 POLYGON ((-77.0199 38.95873...
   white_per
1 0.28610691
2 0.04481434
3 0.20259481
4 0.81344335
5 0.80884315
6 0.15776374
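To make the pipe less mysterious, here is a minimal sketch on a toy table. It uses the first two rows of the output above, typed in by hand, and the object names toy, piped and direct are made up for this example. It shows that the piped call and the ordinary function call give the same result.

```r
library(dplyr)

# Two tracts copied by hand from the output above (no geometry, just the counts).
toy <- data.frame(
  GEOID      = c("11001009501", "11001009604"),
  wht_popE   = c(2082, 105),
  total_popE = c(7277, 2343)
)

# With the pipe: toy is passed in as the first argument of mutate().
piped <- toy %>% mutate(white_per = wht_popE / total_popE)

# Without the pipe: exactly the same call, written the long way.
direct <- mutate(toy, white_per = wht_popE / total_popE)

identical(piped, direct)
piped
```

Note that white_per is a proportion between 0 and 1, not a percentage from 0 to 100. A tract with a total population of zero would give NaN here, so it may be worth checking for those before mapping.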

Visualize Populations

Now wht_pop has a new variable called ‘white_per’. Let’s look at a histogram of this data using ggplot2’s geom_histogram (see its help page). Here aes(white_per) tells R to base the aesthetics (hence ‘aes’) on the variable white_per. You can also set the number of bins to use in your histogram.

ggplot()+geom_histogram(data=wht_pop, aes(white_per), bins = 15)

This bimodal distribution already suggests that tracts tend to be either mostly white or mostly black, a clear sign of segregation in the city.

But let’s look at the map using geom_sf (the geom for simple features data).

ggplot()+geom_sf(data=wht_pop, aes(fill = white_per))

We can also control the color scheme and legend name using scale_fill_continuous. Notice that aes(fill = white_per) and scale_fill_continuous both refer to ‘fill’, which is the shading of each polygon. You can also set the ‘name’ of the fill color scheme with name="% White".

If useful, you can also color the outlines based on the data: put aes(colour = white_per) inside geom_sf and control the scheme with scale_colour_continuous.

ggplot() +
  geom_sf(data = wht_pop, aes(fill = white_per), color = 'white') +
  ggtitle('Segregation in DC 2018') +
  scale_fill_continuous(type = "viridis", name = "% White")

Reading Data from the Internet

One problem with this map is that water bodies aren’t included. We can easily access data directly online. Let’s take the example of the water bodies dataset on DC Open Data. To access the data online we just need the URL of the GeoJSON file, as used in the code below.

For this we will use read_sf from the sf package. sf lets us do much of what ArcMap can do. In this case read_sf reads the file from the internet for you and stores it in the same format as our census data.

Notice this is just the attribute data with a column called geometry that stores the outline of each river.

water = read_sf('https://opendata.arcgis.com/datasets/db65ff0038ed4270acb1435d931201cf_24.geojson')
head(water)
Simple feature collection with 6 features and 8 fields
geometry type:  POLYGON
dimension:      XY
bbox:           xmin: -77.08418 ymin: 38.88663 xmax: -77.01135 ymax: 38.94176
geographic CRS: WGS 84

We can simply plot it using geom_sf(data=water). This time I am setting the fill and color outside of aes(), because aes() is only used when you want data from the attribute table to control an aesthetic property. Here we just want all rivers, regardless of their attributes, to be light blue with a white outline.

ggplot()+geom_sf(data=water, fill='lightblue', color='white')
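The difference between setting a property inside and outside aes() is easier to see side by side. The sketch below builds two made-up unit squares to stand in for tracts (they are not real data, and the helper square is hypothetical), then draws them once with a mapped fill and once with a fixed fill.

```r
library(sf)
library(ggplot2)

# Hypothetical helper: a 1 x 1 square whose lower-left corner is at (x0, 0).
square <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0))))
}

# Two toy polygons with one attribute column, stored like the census data.
toy <- st_sf(
  white_per = c(0.2, 0.8),
  geometry  = st_sfc(square(0), square(1), crs = 4269)
)

# Mapped: fill depends on the white_per column, so it goes inside aes().
p_mapped <- ggplot() + geom_sf(data = toy, aes(fill = white_per), color = "white")

# Fixed: every polygon gets the same fill, so it goes outside aes().
p_fixed <- ggplot() + geom_sf(data = toy, fill = "lightblue", color = "white")
```

Printing p_mapped should give two squares in different shades plus a legend, while p_fixed should give two identical light blue squares and no legend.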

Now let’s see if we can overlay it on our census data. We create a new plot using ggplot() and then overlay each geom_sf layer. The white population layer ends up on the bottom because it is added to the map first. Notice that additional layers and settings are added to the map using +.

ggplot() +
  # percent white plot
  geom_sf(data = wht_pop, aes(fill = white_per), color = 'white') +
  ggtitle('Segregation in DC 2018') +
  scale_fill_continuous(type = "viridis", name = "% White") +
  # water plot
  geom_sf(data = water, fill = 'lightblue', color = 'white')
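You may have noticed in the printed output that the census tracts are in NAD83 while the water data is in WGS 84. ggplot2 reprojects the layers to a common coordinate reference system when drawing, so the overlay works as is. If you ever need the layers to match outside of a plot, a minimal sketch of doing it by hand with st_transform is shown here on a made-up point, assuming the standard EPSG codes 4326 for WGS 84 and 4269 for NAD83.

```r
library(sf)

# A made-up point near downtown DC, stored in WGS 84 (EPSG:4326).
pt_wgs84 <- st_sfc(st_point(c(-77.03, 38.90)), crs = 4326)

# Reproject it to NAD83 (EPSG:4269), the CRS reported for the tract data.
pt_nad83 <- st_transform(pt_wgs84, 4269)

# Check that the new object carries the NAD83 CRS.
st_crs(pt_nad83) == st_crs(4269)

# With the objects from this sheet the same idea would be:
# water <- st_transform(water, st_crs(wht_pop))
```

Matching the CRS yourself matters more for spatial operations such as joins or intersections than for plotting.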

Saving Map

ggplot2 also makes it easy to save your maps or graphs as images. You can save a variety of different graphics types, including tiff, png, or even svg (for use in Illustrator). See the ggsave help page.

ggsave(filename='path_to_folder/segregationmap.png', 
       plot = last_plot(),  # save the last thing you plotted
       device='png')
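If the saved image comes out at an awkward size, you can set the dimensions and resolution yourself. The sketch below uses a small made-up scatter plot in place of the map and writes to a temporary folder; the object names p and out_file and the size values are only examples.

```r
library(ggplot2)

# A small stand-in plot (made-up numbers), stored in an object instead of relying on last_plot().
p <- ggplot(data.frame(x = 1:3, y = c(2, 4, 3)), aes(x, y)) + geom_point()

# Write to a temporary folder so the example runs anywhere.
out_file <- file.path(tempdir(), "example_plot.png")

ggsave(filename = out_file,
       plot = p,
       width = 6, height = 4, units = "in",  # size of the image on the page
       dpi = 300)                            # resolution in dots per inch

file.exists(out_file)
```

Passing the plot object explicitly is a bit safer than last_plot() when a script draws several plots in a row.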
