The final project for this course is a research paper that uses R to answer a research question and visualize the results. The project can be on a topic of your choosing and can be done individually or in a small group. The deliverables will include:
The research paper should be 3 pages, not counting graphics and the methods section.
There are special file types necessary for adding a spatial dimension to your data. The two most common are:
Shapefiles are a collection of files that contain the coordinates that make up the shapes, the data associated with each shape, and other information about how your computer should draw the shapes on earth. You need all of the files in a shapefile; they are meaningless if they are separated. This is the most common spatial data format.
GeoJSONs are a single file that contains all the same information.
Both formats contain geographic information that describes the location of each observation. For a point file, that is most commonly the latitude and longitude of the point.
sf spatial data package
There are many R packages that you can use to work with spatial data. We’ll use sf because it treats spatial data like a regular dataframe, with the geometry stored as the last column.
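To make the "regular dataframe plus a geometry column" idea concrete, here is a minimal sketch. It assumes sf is installed and reads nc.shp, a small North Carolina counties shapefile that ships with the sf package, as a stand-in for the states shapefile used in this lesson.

```r
library(sf)

# Read the example shapefile bundled with sf (a stand-in for your own file)
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# The object is both an sf object and a data.frame
class(nc)

# The geometry column sits at the end of the column list
names(nc)

# Selecting columns keeps the geometry attached
head(nc[, c("NAME", "AREA")], 3)
```

Because the object is still a dataframe, the usual dplyr verbs such as filter(), mutate(), and joins should work on it as well.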
Download cb_2018_us_state_5m.zip [1.0 MB].
Open the part2 project in R.
Work in explore_educational_attainment.R in the scripts folder.
Use the st_read() function from sf to import the states shapefile.
library(tidyverse)
library(tidycensus)
library(sf)
# Download educational attainment counts (table B15003) for every state
raw_attainment_2020 <- get_acs(geography = "state",
                               variables = c(total_25_over = "B15003_001",
                                             bachelors = "B15003_022",
                                             masters = "B15003_023",
                                             professional = "B15003_024",
                                             phd = "B15003_025"),
                               year = 2020,
                               output = "wide")
# Create a new dataframe to calculate the share of adults with a
# bachelor's degree or higher, and remove Puerto Rico, Hawaii, and Alaska
attainment_2020 <- raw_attainment_2020 |>
  rename(state = NAME) |>
  mutate(pct_bachelors_plus = (bachelorsE + mastersE +
                                 professionalE + phdE) / total_25_overE) |>
  filter(state != "Puerto Rico",
         state != "Hawaii",
         state != "Alaska")
We’ll use a right join, right_join(), so that we only keep the contiguous US.
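As a rough sketch of how the right join drops the places that were filtered out, the example below uses toy data. The object names states_toy and attainment_toy, the point geometries, and the proportions are all placeholders invented for illustration (not real ACS estimates), and the join key NAME is an assumption about what the shapefile's state-name column is called.

```r
library(sf)
library(dplyr)

# Toy stand-in for the states shapefile: one point per "state"
states_toy <- st_sf(
  NAME = c("Georgia", "Alaska", "New York"),
  geometry = st_sfc(st_point(c(-83, 33)),
                    st_point(c(-150, 64)),
                    st_point(c(-75, 43)),
                    crs = 4326)
)

# Toy stand-in for attainment_2020 (placeholder values, Alaska already removed)
attainment_toy <- tibble(
  state = c("Georgia", "New York"),
  pct_bachelors_plus = c(0.25, 0.50)
)

# A right join keeps every row of the right-hand table (the attainment data)
# and only the shapes that match it, so Alaska is dropped
joined_toy <- states_toy |>
  right_join(attainment_toy, by = c("NAME" = "state"))

joined_toy
```

Putting the spatial object on the left keeps the result an sf object, so it can go straight into geom_sf().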
Just like other plots, you build a map with ggplot.
We’ll map the percent of people with at least a bachelor’s degree.
A choropleth map uses graduated color to show the variation in your data across your study area.
library(viridis) # provides scale_fill_viridis()
library(scales)  # provides percent_format()
ggplot(data = states_ed_attain,
       mapping = aes(fill = pct_bachelors_plus)) +
  geom_sf() +
  theme_void() +
  scale_fill_viridis(direction = -1,
                     name = "Bachelor's Degree or Higher (%)",
                     labels = percent_format(accuracy = 1L)) +
  labs(
    title = "Educational Attainment",
    subtitle = "Percent of Adults with at least a Bachelor's Degree",
    caption = "Source: American Community Survey, 2020"
  )
So far we have just accepted the defaults on how to display the data. An important step to building a map is understanding the shape of your data and how to best represent it in a map.
Look at the distribution of the percent of people with at least a bachelor’s degree to see how we should define the color scheme.
Run display.brewer.all() in your console to see the names of RColorBrewer palettes that we can use to represent our data in a choropleth map.
We’ll use scale_fill_fermenter() to define the bins as every 5 percentage points from 0% to 60%, and select the Blue to Purple palette.
ggplot(data = states_ed_attain,
       mapping = aes(fill = pct_bachelors_plus)) +
  geom_sf() +
  theme_void() +
  scale_fill_fermenter(breaks = c(0, .05, .1, .15, .2, .25, .3, .35,
                                  .4, .45, .5, .55, .6),
                       palette = "BuPu",
                       direction = 1,
                       name = "Bachelor's Degree or Higher (%)",
                       labels = percent_format(accuracy = 1L)) +
  labs(
    title = "Educational Attainment",
    subtitle = "Percent of Adults with at least a Bachelor's Degree",
    caption = "Source: American Community Survey, 2020"
  )
This map is being dominated by DC, and it’s so small you can’t even see it! How does it look if we remove it?
ggplot(data = states_ed_attain |>
         filter(state != "District of Columbia"),
       mapping = aes(fill = pct_bachelors_plus)) +
  geom_sf() +
  theme_void() +
  scale_fill_fermenter(breaks = c(0, .05, .1, .15, .2, .25, .3, .35,
                                  .4, .45, .5, .55, .6),
                       palette = "BuPu",
                       direction = 1,
                       name = "Bachelor's Degree or Higher (%)",
                       labels = percent_format(accuracy = 1L)) +
  labs(
    title = "Educational Attainment",
    subtitle = "Percent of Adults with at least a Bachelor's Degree",
    caption = "Source: American Community Survey, 2020"
  )
Normally I might provide a note explaining that we removed an outlier. I’ll skip it this time since DC is so small, and not a state.
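One way to check how a set of breaks divides your data before mapping it is to bin the values yourself. The sketch below uses base R's cut() with the same 5-point breaks as the maps above. The proportions are placeholders invented for illustration, not real ACS estimates, and cut() only approximates how a binned fill scale groups values.

```r
# Same breaks as the maps above: every 5 percentage points from 0% to 60%
breaks <- seq(0, 0.6, by = 0.05)

# Placeholder proportions, NOT real ACS estimates
pct_bachelors_plus <- c(0.22, 0.31, 0.38, 0.44, 0.59)

# Quick look at the spread of the values
summary(pct_bachelors_plus)

# Assign each value to a bin and count how many fall in each one
bins <- cut(pct_bachelors_plus, breaks = breaks)
table(bins)
```

If most of the counts land in a few neighboring bins and one value sits far away, that value is likely to dominate the color scale, which is what DC did in the map above.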
You can also import census data for most geographies as a spatial dataframe with the tidycensus package!
Add geometry = TRUE to your get_acs() or get_decennial() functions. See example below:
library(tidyverse)
library(tidycensus)
### load all the variables for the ACS
# acs201519 <- load_variables(2019, "acs5", cache = TRUE)
raw_income <- get_acs(geography = "county",
                      variables = c(total_25_over = "B15003_001",
                                    bachelors = "B15003_022",
                                    masters = "B15003_023",
                                    professional = "B15003_024",
                                    phd = "B15003_025"),
                      state = "GA",
                      year = 2020,
                      output = "wide",
                      geometry = TRUE) # this parameter imports the geometry
Note: some geographies are not available from tidycensus as spatial dataframes.
Create 2 maps using data you download from the 2018-22 American Community Survey with the tidycensus package. You can create any maps you like. You can even use this assignment to start thinking about your final project if you are ready for that.
When you have finished your maps, save them in the output folder of part2.
Upload your finalized script to CANVAS.
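For the saving step, a minimal sketch with ggsave() from ggplot2 is shown below. It is only an illustration: the map is drawn from the nc.shp example file bundled with sf, the file name example_map.png and the width, height, and dpi values are arbitrary choices, and the sketch writes to a temporary folder so it runs anywhere.

```r
library(sf)
library(ggplot2)

# Example data bundled with sf (a stand-in for your own census data)
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Build a simple choropleth and store it in an object
example_map <- ggplot(data = nc, mapping = aes(fill = AREA)) +
  geom_sf() +
  theme_void()

# This sketch saves into a temporary "output" folder;
# in your project, point out_dir at the output folder of part2 instead
out_dir <- file.path(tempdir(), "output")
dir.create(out_dir, showWarnings = FALSE)

map_path <- file.path(out_dir, "example_map.png")
ggsave(filename = map_path, plot = example_map,
       width = 8, height = 5, dpi = 300)
```

Passing the plot object explicitly to ggsave() avoids accidentally saving whichever plot happened to be displayed last.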
See the next slides for some examples of maps you could make if you want some inspiration:
Download the median rent for every county in New York (The variable is called MEDIAN CONTRACT RENT in the ACS).
Map ideas:
Use an if else statement to define NYC, Westchester and Long Island’s minimum wage differently than the rest of the state. Get 1 extra point if you figure out how to do that on your own, or just define minimum wage as $15 everywhere for now and I’ll show you how to do it next week.
Download the PEOPLE REPORTING ANCESTRY table for every census tract in Queens County.
Map ideas:
Download the table for LANGUAGE SPOKEN AT HOME FOR THE POPULATION 5 YEARS AND OVER for every census tract in all 5 counties in New York City.
Map ideas: