Week 8 Lesson

In today’s class we’ll download the Congressional maps that we created, and create maps and charts to display them.

Outline

Download Congressional Maps

Spatial data in R

Spatial data in R, the sf package

Process your spatial data for mapping and analysis

Creating maps in R

ggplot2 geoms

Building a map

In-class assignment

Final Projects

Assignment 8

Congressional Maps

Creating district maps that balance many criteria is hard to do! Many criteria conflict with each other and the map drawers are required to decide between them. As we have seen through our readings and examples, the way the districts are drawn can have large impacts on whose voices are heard.

Let’s explore your Congressional maps using R to view, measure, and compare. Go to your map in Dave’s Redistricting App and download the geographic file using the instructions below. If you haven’t completed your map, download another published map from DRA here.

Open your map on DRA (or another published map)
Select Export Map (arrow icon on the upper-right)
Select “District Shapes (as .geojson)”
Save the .geojson
Do the same for the Official Map for your state
- find current congressional maps on DRA with “Latest Census and Voting Age Data” here
- link to GA 2020 Congressional Map

Spatial data

There are special file types necessary for adding a spatial dimension to your data. The two most common are:

shapefiles
geojsons

Both formats contain geographic information that describes the location of each observation. For a point file, that is most commonly the latitude and longitude of the point. For example, a spatial file of all schools in New York City would include the latitude and longitude coordinates of each school. In the file below, the coordinates are stored in the geometry column as (longitude, latitude).

A shapefile is a collection of files that together contain the location, data, and projection.

A geojson is a single file that contains all the same information.
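To make the idea of a spatial file concrete, here is a minimal sketch that builds a tiny point dataset, saves it as a single .geojson file, and reads it back. The two points and their names are made up for illustration (they are not real schools), and the file is written to a temporary location.

```r
library(sf)

# Hypothetical example data: two made-up points (not real schools)
schools <- data.frame(
  name = c("School A", "School B"),
  lon  = c(-73.99, -73.95),
  lat  = c(40.73, 40.78)
)

# Turn the table into a spatial object; crs = 4326 is WGS 84 (longitude, latitude)
schools_sf <- st_as_sf(schools, coords = c("lon", "lat"), crs = 4326)

# Write one self-contained .geojson file to a temporary location, then read it back
geojson_path <- tempfile(fileext = ".geojson")
st_write(schools_sf, geojson_path, quiet = TRUE)
schools_back <- st_read(geojson_path, quiet = TRUE)

# The data columns and the geometry column all come back from the one file
schools_back
```

Notice that the lon and lat columns are replaced by a single geometry column once the table becomes a spatial object.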

Spatial data in R, the `sf` package

There are many R packages that you can use to work with spatial data. We will use sf, which is one of the easiest to learn because it treats spatial data like a regular data frame, with the geometry stored as the last column.
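Here is a small sketch of what "just like a data frame" means in practice. The two square "districts" and their populations are made up, and square() is a hypothetical helper written only for this example.

```r
library(sf)
library(dplyr)

# Hypothetical helper: build a 1 x 1 square polygon starting at (x0, y0)
square <- function(x0, y0) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + 1, y0), c(x0 + 1, y0 + 1), c(x0, y0 + 1), c(x0, y0)
  )))
}

# Two made-up districts with a geometry column
districts <- st_sf(
  id       = c("1", "2"),
  TotalPop = c(700, 800),
  geometry = st_sfc(square(0, 0), square(1, 0), crs = 4326)
)

# An sf object is still a data frame
inherits(districts, "data.frame")

# dplyr verbs work as usual; the geometry column is kept automatically
big_districts <- districts %>%
  filter(TotalPop > 750) %>%
  select(id)

names(big_districts)
```

Even though select() only asks for id, the result still has a geometry column. If you want a plain table, you have to drop the geometry on purpose.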

Let’s import the official Congressional map for your state into our R environment to learn about spatial data in R. We will use the sf function st_read() to import our geojson file. To learn more about the basics of sf, take a look at the documentation and vignettes:

sf documentation
vignette 1

In your class8 project, create a new R script called “congressional_maps_exploration.R”

write a summary of the script purpose at the top
import the packages you’ll use in this script (tidyverse, sf)

First let’s look at the help section about the function st_read in R by typing into the console:

# exploring the official Congressional map in Georgia, and comparing it to the map that I drew in DRA

library(tidyverse)
library(sf)
library(scales)
library(viridis)

?st_read

# The help page for st_read will open in the "Help" pane in RStudio.

Now let’s use it

raw_congress_off <- st_read("data/raw/ga_cong_official.geojson")

## Reading layer `ga_cong_official' from data source 
##   `/Users/sarahodges/spatial/NewSchool/methods1-materials-fall2021/class8/data/raw/ga_cong_official.geojson' 
##   using driver `GeoJSON'
## Simple feature collection with 14 features and 17 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -85.60516 ymin: 30.356 xmax: -80.78296 ymax: 35.0013
## Geodetic CRS:  WGS 84

You’ll see a message similar to the one above in your console. It describes the data. The file:

Is a geojson
It has 14 rows (features) and 18 columns (17 fields plus the geometry column)
It is a polygon, so the geometry column is a list of all of the points that make up the shape
The map projection (coordinate reference system) is WGS 84 *

* We are not going to cover map projections. In a nutshell, they are the mathematical equations used to represent the round Earth on a flat map. The system used in this case is the World Geodetic System from 1984 (WGS 84). If you decide to do more work with mapping and spatial data, you will learn a lot more about projections, but for now we’ll gloss over them.

Let’s look at the geometry column

print(raw_congress_off$geometry)

## Geometry set for 14 features 
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -85.60516 ymin: 30.356 xmax: -80.78296 ymax: 35.0013
## Geodetic CRS:  WGS 84
## First 5 geometries:

## POLYGON ((-81.92944 31.8484, -81.93218 31.84604...

## POLYGON ((-84.28625 32.74763, -84.28733 32.7496...

## POLYGON ((-84.13301 33.47467, -84.13318 33.4746...

## POLYGON ((-83.91482 33.7442, -83.91562 33.7447,...

## POLYGON ((-84.44718 33.5912, -84.44623 33.59066...
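To see what "a list of all of the points that make up the shape" looks like, here is a minimal sketch using a made-up square instead of a real district. The coordinates are invented for illustration.

```r
library(sf)

# A made-up square polygon; the first and last points match to close the ring
corners <- rbind(c(-84, 33), c(-83, 33), c(-83, 34), c(-84, 34), c(-84, 33))
toy_shape <- st_sfc(st_polygon(list(corners)), crs = 4326)

# st_coordinates() lists every vertex as one row (X = longitude, Y = latitude)
vertices <- st_coordinates(toy_shape)
vertices

# Number of points that define the shape
nrow(vertices)
```

A real district has many more vertices than this square, which is why the geometry column can get large.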

Processing and exploring spatial data

You can treat a spatial data frame just like a regular data frame. Just be aware that the geometry column can be very large so it can take a long time to open. We’ll process the official congressional map so that it is ready to analyze and map:

select columns id, TotalPop, DemPct to Margin, and geometry
calculate the number of minority people and voters in each district

congress_off <- raw_congress_off %>% 
  select(id, TotalPop, DemPct:Margin, geometry) %>% 
  mutate(MinorityVAP = round(MinorityPct * TotalVAP, 0),
         BlackVAP = round(BlackPct * TotalVAP, 0),
         DemocratVoters = round(DemPct * TotalVAP, 0),
         RepublicanVoters = round(RepPct * TotalVAP, 0))

We’ll calculate some quick statistics about the official map. The fastest way to do that is to drop the geometry column using the st_drop_geometry() function. We’ll make the table look nice and readable as we create it by using:

back-ticks around the summary variable names so that they can include spaces
the scales package to format the numbers

congress_off_stats <- st_drop_geometry(congress_off) %>% 
  summarise(map = "official congressional map",
            `total population` = comma(sum(TotalPop)),
            `total voting-age population` = comma(sum(TotalVAP)),
            `percent Minority voting-age population (MVAP)` = percent(sum(MinorityVAP)/sum(TotalVAP)),
            `percent Black voting-age population (BVAP)` = percent(sum(BlackVAP)/sum(TotalVAP)),
            `percent Democratic voters` = percent(sum(DemocratVoters)/sum(TotalVAP)),
            `percent Republican voters` = percent(sum(RepublicanVoters)/sum(TotalVAP)),
            districts = n(),
            `number of Minority-majority districts` = length(id[MinorityPct >= .5]), #count the number of times the conditional is true
            `number of Black-majority districts` = length(id[BlackPct >= .5]),
            `number of Democrat districts` = length(id[DemPct > RepPct]), 
            `number of Republican districts` = length(id[RepPct > DemPct]),
            `number of competitive districts` = length(id[RepPct >= .45 & RepPct <= .55 & DemPct >= .45 & DemPct <= .55]))

congress_off_stats
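Here is a smaller sketch of the same pattern on made-up data, so you can check the logic by hand. It uses sum() on a condition as an alternative to length(id[...]); both count the number of rows where the condition is true. The three toy districts and their values are hypothetical, and only the column names mirror the DRA export.

```r
library(sf)
library(dplyr)

# Hypothetical toy districts; points stand in for the district shapes
toy <- st_sf(
  id     = c("1", "2", "3"),
  DemPct = c(0.62, 0.48, 0.35),
  RepPct = c(0.36, 0.50, 0.63),
  geometry = st_sfc(st_point(c(0, 0)), st_point(c(1, 0)), st_point(c(2, 0)),
                    crs = 4326)
)

# Drop the geometry first, then summarise like any other data frame
toy_stats <- st_drop_geometry(toy) %>%
  summarise(
    districts     = n(),
    dem_districts = sum(DemPct > RepPct),  # TRUE counts as 1, FALSE as 0
    competitive   = sum(DemPct >= .45 & DemPct <= .55 &
                        RepPct >= .45 & RepPct <= .55)
  )

toy_stats
```

Only district 2 has both parties between 45% and 55%, so it is the one competitive district in this toy example.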

Create maps in R using ggplot2

You can use ggplot2 to create maps in R using geom_sf().

ggplot2 geoms

You can define the type of plot by the type of geom that you add to your ggplot. There are MANY types of geoms.

Here is a full list and description
Here is the description of geom_sf

Build a map

Just like other plots, you build a map by adding instructions to the ggplot. Let’s start with a simple map of the official congressional map.

ggplot()  + 
  geom_sf(data = congress_off)

We’re actually just plotting the latitude and longitude from the geometry column. But I think it’s confusing to show the lat/long. Let’s remove the grid lines and labels, and create a choropleth map where we color each district by Percent Minority voters.

ggplot()  + 
  geom_sf(data = congress_off, mapping = aes(fill = MinorityPct)) +
  theme_void()

A choropleth map uses graduated color or patterns to show the range of a statistic.

Let’s select our color scheme. We’ll use the viridis color scheme and format the legend title and labels.

ggplot()  + 
  geom_sf(data = congress_off, mapping = aes(fill = MinorityPct)) +
  theme_void() +
  scale_fill_viridis(name="Minority Voting-Age Population (%)", labels=percent_format(accuracy = 1L))

I prefer for the color to get darker when concentrations increase. In the next iteration I’m going to:

reverse the direction of the color scale
change the color of the lines to white
add a title and source

ggplot()  + 
  geom_sf(data = congress_off, mapping = aes(fill = MinorityPct), color = "#ffffff") +
  theme_void() +
  scale_fill_viridis(name="Minority Voting-Age Population (%)", direction=-1, labels=percent_format(accuracy = 1L)) +
  labs(
    title = "Georgia, Official Congressional Districts",
    subtitle = "Percent Minority Voting-Age Population",
    caption = "Source: DRA | Voting: 2016-2020, Demographics: US Census, 2020 "
  )

Now we’ll create a map of partisan lean where we define the breaks specifically to show competitiveness as well as winner:

style by DemPct
use scale_fill_stepsn to define specific breaks and colors manually

# define the partisan colors
partisan_colors <- c('#bc131e', '#eb4956', '#c36e9e', '#7279db', '#3c6ebf', '#1f4bae')

ggplot()  + 
  geom_sf(data = congress_off, mapping = aes(fill = DemPct), color = "#ffffff") +
  theme_void() +
  scale_fill_stepsn(breaks=c(0, .4, .45, .5, .55, .6, 1),
                    colors = partisan_colors, 
                    name="Percent Democratic Votes (%)",
                    labels=percent_format(accuracy = 1L)) +
  labs(
    title = "Georgia, Official Congressional Districts",
    subtitle = "Percent Democratic Voters (2016-2020)",
    caption = "Source: Dave's Redistricting App"
  )
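The seven break points above define six intervals, one for each color in partisan_colors. If you want to check which interval a district falls into before mapping it, one option is base R's cut() with the same breaks. This is only a quick check on made-up vote shares, not a description of how ggplot2 bins values internally.

```r
# Hypothetical Democratic vote shares for four districts
dem_pct <- c(0.38, 0.47, 0.52, 0.71)

# The same break points used in scale_fill_stepsn() above
bins <- cut(dem_pct, breaks = c(0, .4, .45, .5, .55, .6, 1))

# Each vote share next to the interval it falls into
data.frame(dem_pct, bin = bins)

# Seven break points make six intervals
nlevels(bins)
```

The two middle intervals (45% to 50% and 50% to 55%) are the competitive ones, which is why they sit next to each other in the middle of the color list.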

Now that I have finalized two maps, let’s store them as named objects so that I can call them and save them to my output folder.

minority_official <- ggplot()  + 
  geom_sf(data = congress_off, mapping = aes(fill = MinorityPct), color = "#ffffff") +
  theme_void() +
  scale_fill_viridis(name="Minority Voting-Age Population (%)", direction=-1, labels=percent_format(accuracy = 1L)) +
  labs(
    title = "Georgia, Official Congressional Districts",
    subtitle = "Percent Minority Voting-Age Population",
    caption = "Source: DRA | Voting: 2016-2020, Demographics: US Census, 2020 "
  )

partisan_official <- ggplot()  + 
  geom_sf(data = congress_off, mapping = aes(fill = DemPct), color = "#ffffff") +
  theme_void() +
  scale_fill_stepsn(breaks=c(0, .4, .45, .5, .55, .6, 1),
                    colors = partisan_colors, 
                    name="Percent Democratic Votes (%)",
                    labels=percent_format(accuracy = 1L)) +
  labs(
    title = "Georgia, Official Congressional Districts",
    subtitle = "Percent Democratic Voters (2016-2020)",
    caption = "Source: Dave's Redistricting App"
  )

# save ggplot object using ggsave
ggsave("figures/congress_official_percent_minority.png", 
       plot = minority_official, # specify the ggplot object you stored
       units = "in",
       height = 5, width = 7)

ggsave("figures/congress_official_percent_dem.png", 
       plot = partisan_official, # specify the ggplot object you stored
       units = "in",
       height = 5, width = 7)

In-class exercise

Make maps and a summary table of your DRA maps (or pick another published map)

Read in the geojsons of your competitive map from last week and the more balanced map from this week
Process them in the same way as the official map
Create the same summary table as above
Create the same map of Pct Minority
Create the same map of DemPct
Create a map of Margin - this variable shows how much the winner of the election won by, and how polarized the district is
- For this map, use the scale_fill_gradient() function to define a color scale by the lowest color and the highest color
- scale_fill_gradient() help section in R Studio
- Read the color scales chapter for more context
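If you have not used scale_fill_gradient() before, here is a minimal sketch of how it works, using three made-up square "districts" rather than your DRA map. The Margin values, the two colors, and the square() helper are all assumptions for illustration; choose your own colors and check how Margin is stored in your file before formatting the legend.

```r
library(sf)
library(ggplot2)

# Hypothetical helper: a 1 x 1 square polygon starting at x = x0
square <- function(x0) {
  st_polygon(list(rbind(
    c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0)
  )))
}

# Three made-up districts with a made-up Margin column (stored as proportions)
toy_map <- st_sf(
  Margin   = c(0.05, 0.20, 0.45),
  geometry = st_sfc(square(0), square(1), square(2), crs = 4326)
)

# low = color for the smallest value, high = color for the largest value
margin_map <- ggplot() +
  geom_sf(data = toy_map, mapping = aes(fill = Margin), color = "#ffffff") +
  theme_void() +
  scale_fill_gradient(low = "#f0f0f0", high = "#252525",
                      name = "Margin of victory (%)",
                      labels = scales::percent_format(accuracy = 1))

margin_map
```

ggplot2 fills in all of the colors between low and high for you, so you only need to pick the two ends of the scale.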

Final Projects

The final project for this course is a research paper that uses R to answer a research question and visualize the results. The project can be on a topic of your choosing, and can be a small group project, or individual. The deliverables will include:

The raw data collected for the project
The scripts used to complete the analysis
- processing script(s)
- analysis script(s)
- visualization script(s)
A research paper that includes:
- Description of your research question(s)
- Context for why this question is important
- Short description of the data sources and analysis (the full methods will be at the end)
- Description of results + formatted data tables + formatted charts, maps
- Discussion of the results
- Description of the next steps to continue this research
- Methods appendix - description of your data sources, methods, and the assumptions that underlie your research

The research paper should be 3 pages, not counting graphics and the methods appendix.

If you are working with a group, your research paper should include at least one research question per person. Each person should have their own analysis and visualization scripts and their own methods appendix.

Assignment 8a. Readings

Read Chapter 2, “Collect, Analyze, Imagine, Teach,” from Data Feminism by Catherine D’Ignazio and Lauren F. Klein

Assignment 8b. R Assignment

Did you know that you can also import census data as a shapefile with the tidycensus package?! Just add the parameter geometry = T to your get_acs() function. See example below:

library(tidyverse)
library(tidycensus)

### load all the variables for the ACS
# acs201519 <- load_variables(2019, "acs5", cache = T)

raw_income = get_acs(geography = "state",
                   variables = "B19013_001",
                   year = 2019, 
                   geometry = T) # this parameter imports the geometry

## Getting data from the 2015-2019 5-year ACS

## Downloading feature geometry from the Census website.  To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.

Create 2 maps using data you download from the 2015-19 American Community Survey with the tidycensus package. You can create any maps you like. You can even use this assignment to start thinking about your final project if you are ready for that.

Here are some examples of maps you could make if you want some inspiration:

Idea 1. Download the median rent for every county in New York (The variable is called MEDIAN CONTRACT RENT in the ACS).

Map ideas:

Create a choropleth map of the median rent in each county in New York
Create a choropleth map of areas that are affordable for a person making New York minimum wage
- assume 30% of income for rent, and 40 hours a week at NY minimum wage (currently the minimum wage is $12.50 outside of NYC, Westchester and Long Island, source). Col
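The affordability threshold in the second map idea can be worked out in a few lines. This sketch uses the $12.50 wage, 40 hours a week, and 30% of income from the bullet above; the 52 paid weeks per year and the two example rents are my own assumptions.

```r
# Values from the assignment text
hourly_wage    <- 12.50
hours_per_week <- 40
rent_share     <- 0.30

# Assumption: 52 paid weeks per year
weeks_per_year <- 52

# Monthly income before taxes, then the share available for rent
monthly_income  <- hourly_wage * hours_per_week * weeks_per_year / 12
affordable_rent <- monthly_income * rent_share
affordable_rent  # about $650 per month

# Hypothetical median rents for two counties: is each one affordable?
median_rent <- c(600, 900)
median_rent <= affordable_rent
```

Once you have the real median rent for each county, the same comparison inside mutate() gives you a TRUE/FALSE column that you can use as the fill for your map.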

Idea 2. Download the PEOPLE REPORTING ANCESTRY table for every census tract in Kings County (Brooklyn)

Map ideas:

Create a choropleth map of the percent of people that have West Indian ancestry in each census tract
Create a choropleth map of the percent of people that have Polish ancestry in each census tract

Idea 3. Download the table for LANGUAGE SPOKEN AT HOME FOR THE POPULATION 5 YEARS AND OVER for every census tract in all 5 counties in New York City

Map ideas:

Create a choropleth map of the percent of people that “Speak only English”
Create a choropleth map of the percent of people that “Speak Spanish”

When you have finished your maps, save them to the figures folder in your class8 folder. Then go back through your script and clean it up: delete any lines of code that are not pertinent to your final analysis, write comments to explain all of your steps. Make this a script that you could open in a year and rerun.

At the bottom of the script write a commented out paragraph about your experience with this assignment. What was hard about this assignment, what you learned in this assignment, and how long it took you.

Upload your finalized script.
