The final project for this course is a research paper that uses R to answer a research question and visualize the results. The topic is your choice, and you can work individually or in a small group. The deliverables will include:
The research paper should be 3 pages long, not counting graphics and the methods section.
There are special file types for adding a spatial dimension to your data. The two most common are:
shapefiles, a collection of files that together store the location, data, and projection
GeoJSON files, a single file that contains all the same information
Both formats contain geographic information that describes the location of each observation. For a point file, that is most commonly the latitude and longitude of each point.
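To see the difference between the two formats, here is a minimal sketch that builds a two-point sf object from made-up longitude/latitude values and writes it out both ways with sf's st_write(). It assumes sf is installed and writes to a temporary folder; the point names and coordinates are hypothetical.

```r
library(sf)

# Made-up point data: one longitude/latitude pair per observation
sites <- data.frame(
  name = c("Site A", "Site B"),
  lon  = c(-73.99, -73.85),
  lat  = c(40.73, 40.70)
)

# Turn the lon/lat columns into point geometries in WGS84 (EPSG:4326)
sites_sf <- st_as_sf(sites, coords = c("lon", "lat"), crs = 4326)

# Fresh temporary folder so the example can be re-run safely
out_dir <- tempfile("spatial_formats_")
dir.create(out_dir)

# A shapefile is written as several sidecar files (.shp, .shx, .dbf, .prj, ...)
st_write(sites_sf, file.path(out_dir, "sites.shp"), quiet = TRUE)

# GeoJSON is written as a single file
st_write(sites_sf, file.path(out_dir, "sites.geojson"), quiet = TRUE)

list.files(out_dir)

# Either format reads back into the same kind of sf object
from_shp  <- st_read(file.path(out_dir, "sites.shp"), quiet = TRUE)
from_json <- st_read(file.path(out_dir, "sites.geojson"), quiet = TRUE)
```

list.files(out_dir) should show several sites.* files for the shapefile (typically .shp, .shx, .dbf, and .prj) next to the single sites.geojson. Because the shapefile parts only work together, move or share them as a set.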
The sf spatial data package
There are many R packages for working with spatial data. sf is the easiest to use because it stores spatial data as a regular data frame, with the geometry in a special column (usually the last one), so you can work with it using the same tools you use for any other data frame.
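A minimal sketch of what "treats spatial data like a data frame" means in practice, assuming sf and dplyr are installed. The town names, populations, and coordinates are made up for illustration.

```r
library(sf)
library(dplyr)

# Made-up attribute data with a longitude/latitude pair for each town
towns <- data.frame(
  town = c("Town A", "Town B", "Town C"),
  population = c(12000, 45000, 30000),
  lon = c(-74.0, -73.8, -73.6),
  lat = c(40.7, 40.8, 40.9)
)

# Convert to sf: lon/lat are replaced by a point column named "geometry"
towns_sf <- st_as_sf(towns, coords = c("lon", "lat"), crs = 4326)

class(towns_sf)   # "sf" "data.frame"
names(towns_sf)   # "town" "population" "geometry"

# Regular dplyr verbs work; the geometry column is kept automatically
big_towns <- towns_sf %>%
  filter(population > 20000) %>%
  select(town)

names(big_towns)  # "town" "geometry"
```

Note that select(town) did not drop the geometry: in sf the geometry column is "sticky", which is why a filtered or summarized sf object can still be mapped.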
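Open the main_data project in R.
Create a new script called explore_educational_attainment.R in the scripts/data_exploration folder.
Use the st_read() function to import the states shapefile.
library(tidyverse)
library(tidycensus)
library(sf)

# Download 2016-2020 ACS 5-year educational attainment counts for every state
raw_attainment_2020 <- get_acs(geography = "state",
                               variables = c(total_25_over = "B15003_001",
                                             bachelors = "B15003_022",
                                             masters = "B15003_023",
                                             professional = "B15003_024",
                                             phd = "B15003_025"),
                               year = 2020,
                               survey = "acs5",
                               output = "wide")

# Create a new dataframe: calculate the share of adults 25 and over with at least
# a bachelor's degree, and remove Puerto Rico, Hawaii, and Alaska
attainment_2020 <- raw_attainment_2020 %>%
  rename(state = NAME) %>%
  mutate(pct_bachelors_plus = (bachelorsE + mastersE +
                                 professionalE + phdE) / total_25_overE) %>%
  filter(!state %in% c("Puerto Rico", "Hawaii", "Alaska"))
We'll use right_join(), with the states shapefile on the left and attainment_2020 on the right, so that we keep only the continental US: states with no match in attainment_2020 (Alaska, Hawaii, and Puerto Rico) are dropped.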
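A minimal sketch of the right join using small stand-ins: `states` plays the role of the shapefile you imported with st_read(), and `attainment` plays the role of attainment_2020. Both objects, their values, and the join key `NAME` are hypothetical; check the actual column name in your shapefile with names(states).

```r
library(sf)
library(dplyr)

# Hypothetical stand-in for the states shapefile: three unit squares
make_square <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1),
                        c(x0, 1), c(x0, 0))))
}
states <- st_sf(
  NAME = c("State A", "State B", "Alaska"),
  geometry = st_sfc(make_square(0), make_square(1), make_square(2), crs = 4326)
)

# Hypothetical stand-in for attainment_2020 (Alaska already filtered out)
attainment <- data.frame(
  state = c("State A", "State B"),
  pct_bachelors_plus = c(0.30, 0.42)
)

# Shapefile on the left, attribute table on the right: right_join() keeps only
# rows of `attainment`, so shapes without a match (here "Alaska") are dropped
states_joined <- states %>%
  right_join(attainment, by = c("NAME" = "state"))
```

Putting the sf object on the left matters: when the left table is sf, the result is also an sf object that geom_sf() can draw directly. In your script the equivalent would be something like states_ed_attain <- states %>% right_join(attainment_2020, by = c("NAME" = "state")), with the key adjusted to match your shapefile.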
Just like other plots, you build a map by adding instructions (layers) to a ggplot. Let's start with a simple map of the percent of adults with at least a bachelor's degree.
A choropleth map uses graduated colors or patterns to show how a statistic varies across geographic areas.
library(ggplot2)
library(viridis)   # scale_fill_viridis()
library(scales)    # percent_format()

ggplot() +
  geom_sf(data = states_ed_attain,
          mapping = aes(fill = pct_bachelors_plus)) +
  theme_void() +
  scale_fill_viridis(direction = -1,
                     name = "Bachelor's Degree or Higher (%)",
                     labels = percent_format(accuracy = 1L)) +
  labs(
    title = "Educational Attainment",
    subtitle = "Percent of Adults with at least a Bachelor's Degree",
    caption = "Source: American Community Survey, 2020"
  )
Using RColorBrewer palettes
Run display.brewer.all() in your console to see the names of the available palettes.
library(RColorBrewer)  # display.brewer.all()

ggplot() +
  geom_sf(data = states_ed_attain,
          mapping = aes(fill = pct_bachelors_plus)) +
  theme_void() +
  scale_fill_distiller(breaks = c(0, .1, .2, .3, .4, .5, .6),
                       palette = "Blues",
                       name = "Bachelor's Degree or Higher (%)",
                       labels = percent_format(accuracy = 1L)) +
  labs(
    title = "Educational Attainment",
    subtitle = "Percent of Adults with at least a Bachelor's Degree",
    caption = "Source: American Community Survey, 2020"
  )
You can also import census data with its geometry already attached, as an sf object, using the tidycensus package. Just add the parameter geometry = TRUE to your get_acs() or get_decennial() functions.
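A minimal sketch of the state-level download with geometry = TRUE, wrapped in a function so nothing is downloaded until you call it. It assumes tidycensus is installed and a Census API key has already been set with census_api_key(). The function name is hypothetical, and only two of the attainment variables are requested to keep it short.

```r
# Hypothetical helper: state-level ACS attainment with geometry attached.
# Assumes tidycensus is installed and a Census API key is set;
# nothing is downloaded until the function is called.
get_attainment_with_geometry <- function(year = 2020) {
  tidycensus::get_acs(
    geography = "state",
    variables = c(total_25_over = "B15003_001",
                  bachelors = "B15003_022"),
    year = year,
    output = "wide",
    geometry = TRUE   # return an sf object with a geometry column
  )
}
```

Calling attainment_geo <- get_attainment_with_geometry() should return an sf object, so you can pass it straight to geom_sf() without importing a separate shapefile or joining.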
Create 2 maps using data you download from the 2016-20 American Community Survey with the tidycensus package. You can create any maps you like. You can even use this assignment to start thinking about your final project if you are ready for that.
When you have finished your maps, save them in the output folder of main_data.
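One way to do this is ggplot2's ggsave(). A minimal sketch, assuming your working directory is the root of the main_data project (so "output" points to main_data/output); the toy square map and the file name map_1.pdf are placeholders for your own map.

```r
library(sf)
library(ggplot2)

# Placeholder map: a single made-up square polygon
square <- st_sfc(
  st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0)))),
  crs = 4326
)
toy <- st_sf(value = 1, geometry = square)

my_map <- ggplot() +
  geom_sf(data = toy, mapping = aes(fill = value)) +
  theme_void()

# Assumes the working directory is the main_data project root
dir.create("output", showWarnings = FALSE)

# ggsave() picks the file type from the extension (.pdf here)
ggsave(filename = file.path("output", "map_1.pdf"),
       plot = my_map,
       width = 8, height = 6, units = "in")
```

Swap the extension to .png to save an image instead.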
Upload your finalized script to CANVAS.
See the next slides for some examples of maps you could make if you want inspiration:
Download the median rent for every county in New York (the variable is called MEDIAN CONTRACT RENT in the ACS).
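A minimal sketch of the county-level download, assuming a Census API key is set. The variable ID B25058_001 for MEDIAN CONTRACT RENT is an assumption; confirm it by searching the output of load_variables(2020, "acs5") before relying on it. The function name is hypothetical.

```r
# Hypothetical helper: median contract rent for every county in New York State.
# Assumption: B25058_001 is the MEDIAN CONTRACT RENT estimate; verify with
# tidycensus::load_variables(2020, "acs5").
get_ny_county_rent <- function(year = 2020) {
  tidycensus::get_acs(
    geography = "county",
    state = "NY",
    variables = c(median_contract_rent = "B25058_001"),
    year = year,
    geometry = TRUE
  )
}
```

Because geometry = TRUE, ny_rent <- get_ny_county_rent() can be mapped directly with geom_sf(aes(fill = estimate)).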
Map ideas:
Download the PEOPLE REPORTING ANCESTRY table for every census tract in Queens County
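A sketch for tract-level data in a single county. It uses get_acs()'s table argument to pull every variable in a table at once. The table ID B04006 is an assumption to confirm with load_variables(), and the function name is hypothetical.

```r
# Hypothetical helper: PEOPLE REPORTING ANCESTRY for census tracts in Queens.
# Assumption: the table ID is B04006; verify with
# tidycensus::load_variables(2020, "acs5").
get_queens_ancestry <- function(year = 2020) {
  tidycensus::get_acs(
    geography = "tract",
    state = "NY",
    county = "Queens",
    table = "B04006",  # download every variable in the table
    year = year,
    geometry = TRUE
  )
}
```

The default tidy output has one row per tract per variable, so filter() to the ancestry variable you want before mapping; otherwise each tract is drawn many times.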
Map ideas:
Download the LANGUAGE SPOKEN AT HOME FOR THE POPULATION 5 YEARS AND OVER table for every census tract in all five counties of New York City
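A sketch for several counties at once: for tract-level data, get_acs() accepts a vector of county names. The county list assumes the five counties are New York, Kings, Queens, Bronx, and Richmond. The table ID C16001 is an assumption to confirm with load_variables(), and the function name is hypothetical.

```r
# The five New York City counties (assumed to be the ones the exercise means)
nyc_counties <- c("New York", "Kings", "Queens", "Bronx", "Richmond")

# Hypothetical helper: language spoken at home (population 5+) by tract.
# Assumption: the table ID is C16001; verify with
# tidycensus::load_variables(2020, "acs5").
get_nyc_language <- function(year = 2020) {
  tidycensus::get_acs(
    geography = "tract",
    state = "NY",
    county = nyc_counties,  # a vector of counties works for tract-level data
    table = "C16001",
    year = year,
    geometry = TRUE
  )
}
```

Passing the counties as one vector returns a single sf object covering all five, so there is no need to download each county and bind the results yourself.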
Map ideas: