Introduction

Using artificial student recruitment data for the Arizona-based Faux University, this code-through explores how to create simple maps, primarily with the ‘usmap’ package, that display student counts by location for each of the following recruitment funnel stages: students who applied to Faux University, students who were then admitted, and students who enrolled (matriculated).


Content Overview

This code-through will first walk you through wrangling and transforming the recruitment data prior to mapping each stage separately using the ‘usmap’ package.

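Secondary packages include:

mapview

pander

ggplot2

dplyr

sp

ggrepel (called directly as ggrepel::geom_label_repel())

scales (called directly as scales::comma())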

Install any of these packages that are missing with the install.packages() function.
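If you prefer not to check each package by hand, the sketch below installs only the ones that are missing. The package list mirrors the packages named above; the CRAN mirror URL is an assumption (any CRAN mirror works).

```r
# Packages used in this code-through, including the two called with ::
pkgs <- c("usmap", "mapview", "pander", "ggplot2", "dplyr", "sp",
          "ggrepel", "scales")

# requireNamespace() returns FALSE for any package that is not installed
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]

# Install only what is missing (mirror URL is one common choice)
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}
```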


Why You Should Care

Spatial data has applications across many sectors and industries, so it is worth learning how to show both the size of each group and where that group is located on a single map.


Learning Objectives

Within this code-through, you will learn how to do the following:

1) Create separate datasets from the Fall 2023 Faux University applicant pool:

Applicants

Admitted Students

Enrolled Students

2) Retain key fields:

Longitude

Latitude

ID

City

3) Complete a coordinate map test:

Conversion to spatial data

Plot maps

4) Transform data for mapping

5) Plot maps using the ‘usmap’ package


Load Packages and Data

# Set your working directory if you plan to read or save local files;
# the recruitment data below is read directly from GitHub.

library( pander )
library( usmap )
library( ggplot2 )
library( dplyr )
library( sp )
library( mapview )

Applicant_Data <- read.csv( "https://raw.githubusercontent.com/Bslyter/Faux_Files/main/Fall23_Faux_University_Recruit.csv")


head(Applicant_Data)


Create Separate Datasets

# Use dplyr's filter() to keep only admitted or only enrolled students

Admit_Data <- filter( Applicant_Data, ADMIT_COUNT == 1 )

Enroll_Data <- filter( Applicant_Data, ENRL_COUNT == 1 )
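Before mapping, it can help to check the size of each funnel stage and how many students move from one stage to the next. The sketch below uses a small hypothetical data frame with the same ADMIT_COUNT and ENRL_COUNT columns; with the real data you would use Applicant_Data, Admit_Data, and Enroll_Data instead of the toy objects.

```r
library(dplyr)

# Hypothetical toy data with the same funnel columns as Applicant_Data;
# all values are made up for illustration
toy_applicants <- data.frame(
  ID          = 1:6,
  City        = c("Phoenix", "Phoenix", "Tucson", "Tucson", "Flagstaff", "Mesa"),
  ADMIT_COUNT = c(1, 1, 1, 0, 1, 0),
  ENRL_COUNT  = c(1, 0, 1, 0, 0, 0)
)

toy_admits  <- filter(toy_applicants, ADMIT_COUNT == 1)
toy_enrolls <- filter(toy_applicants, ENRL_COUNT == 1)

# One row per funnel stage
funnel <- data.frame(
  Stage    = c("Applied", "Admitted", "Enrolled"),
  Students = c(nrow(toy_applicants), nrow(toy_admits), nrow(toy_enrolls))
)

# Share of the previous stage that reached this stage (NA for the first stage)
funnel$Conversion <- c(NA, funnel$Students[-1] / funnel$Students[-nrow(funnel)])
funnel
```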


Keep Key Fields

# Important note: the Fall23_Faux_University_Recruit dataset already contains latitude and longitude.
# Most data you intend to plot will not include coordinates.
# At best, it will include a full street address, city/state, or just ZIP codes.
# You will need to either look up coordinates manually or add them in bulk
# using a geocoding service.

# I recommend the tidygeocoder package, which can query the free Nominatim
# ("osm") geocoding service: https://cran.r-project.org/web/packages/tidygeocoder/readme/README.html

# Note that many geocoding services require an API key; Nominatim does not,
# but it limits how quickly requests can be sent.
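For a handful of cities, the manual lookup mentioned above can be as simple as joining a small table of coordinates onto your records. The sketch below assumes hypothetical records that carry only City and State; the coordinates are approximate city-center values, and for larger datasets a geocoding service is the better route.

```r
library(dplyr)

# Hypothetical records that arrive without coordinates (city/state only)
records <- data.frame(
  ID    = 101:104,
  City  = c("Phoenix", "Tucson", "Phoenix", "Flagstaff"),
  State = "AZ"
)

# Hand-built lookup table of approximate city-center coordinates (WGS84)
city_coords <- data.frame(
  City      = c("Phoenix", "Tucson", "Flagstaff"),
  State     = "AZ",
  Latitude  = c(33.4484, 32.2226, 35.1983),
  Longitude = c(-112.0740, -110.9747, -111.6513)
)

# Attach coordinates by matching on both city and state
records_geo <- left_join(records, city_coords, by = c("City", "State"))

# Cities missing from the lookup table come back with NA coordinates
unmatched <- filter(records_geo, is.na(Latitude))
records_geo
```

Matching on state as well as city avoids mixing up places that share a name, such as cities called Springfield in different states.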

# Applicant_Data1 will be used for a coordinate map test,
# just to ensure the dummy longitude and latitude values
# make sense spatially.

Applicant_Data1 <- select(Applicant_Data, c( 'Longitude','Latitude' ) )

Applicant_Data2 <- select(Applicant_Data, c( 'Longitude','Latitude','ID','City' ) )

Admit_Data2 <- select(Admit_Data, c('Longitude','Latitude','ID','City') )

Enroll_Data2 <- select(Enroll_Data, c( 'Longitude','Latitude','ID','City' ) )
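The map test below is visual. A quick numeric check can also catch missing or swapped coordinates before mapping. The sketch uses a hypothetical helper, check_coords(), and a rough bounding box around the 50 states (latitude 18 to 72, longitude -180 to 0); that box is an assumption and would also flag US territories such as Guam and the Aleutian islands west of the 180th meridian. On the real data you could run check_coords(Applicant_Data2).

```r
# Hypothetical helper: return rows whose coordinates are missing or fall
# outside a rough box around the 50 states (an assumption, see lead-in)
check_coords <- function(df) {
  bad <- is.na(df$Longitude) | is.na(df$Latitude) |
    df$Longitude < -180 | df$Longitude > 0 |
    df$Latitude < 18 | df$Latitude > 72
  df[bad, , drop = FALSE]
}

# Toy rows: a valid Phoenix point, the same point with latitude and
# longitude swapped, and a row with a missing longitude
toy_coords <- data.frame(
  ID        = 1:3,
  Longitude = c(-112.0740, 33.4484, NA),
  Latitude  = c(33.4484, -112.0740, 32.2226)
)

check_coords(toy_coords)  # flags IDs 2 and 3
```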


Coordinate Map Test

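# Convert the longitude/latitude columns to a SpatialPointsDataFrame in WGS84
Applicant_Data.sp <- SpatialPointsDataFrame(coords = Applicant_Data1,
                                            data = Applicant_Data1,
                                            proj4string = CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"))

# Open an interactive map to confirm the points land where expected
mapview(Applicant_Data.sp)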
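The same test can be done with the sf package, which is now more commonly used than sp for point data. The sketch below is an alternative, not the method used in this code-through; pts_df is a hypothetical stand-in for Applicant_Data1, and EPSG code 4326 identifies WGS84 longitude/latitude.

```r
library(sf)

# Hypothetical stand-in for Applicant_Data1 (longitude/latitude columns only)
pts_df <- data.frame(Longitude = c(-112.0740, -110.9747),
                     Latitude  = c(33.4484, 32.2226))

# coords lists the x (longitude) column first, then y (latitude)
pts_sf <- st_as_sf(pts_df, coords = c("Longitude", "Latitude"), crs = 4326)

# mapview() accepts sf objects as well as sp objects:
# mapview::mapview(pts_sf)
```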


Group Data

# Count unique IDs at each location; .groups = "drop" returns an ungrouped table

Applicant_Summary <- Applicant_Data2 %>%
  group_by(Longitude, Latitude, City) %>%
  summarize(count = n_distinct(ID), .groups = "drop")

Admit_Summary <- Admit_Data2 %>%
  group_by(Longitude, Latitude, City) %>%
  summarize(count = n_distinct(ID), .groups = "drop")

Enroll_Summary <- Enroll_Data2 %>%
  group_by(Longitude, Latitude, City) %>%
  summarize(count = n_distinct(ID), .groups = "drop")

head(Applicant_Summary)
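Because the same grouping step runs three times, it can be wrapped in a small function. The sketch below uses a hypothetical helper, summarize_by_location(), on toy data in which one applicant appears twice, to show that n_distinct() counts that student only once.

```r
library(dplyr)

# Hypothetical helper wrapping the grouping step used for all three stages
summarize_by_location <- function(df) {
  df %>%
    group_by(Longitude, Latitude, City) %>%
    summarize(count = n_distinct(ID), .groups = "drop")
}

# Toy input: applicant 1 appears twice (e.g., a duplicate record)
toy <- data.frame(
  Longitude = c(-112.0740, -112.0740, -112.0740, -110.9747),
  Latitude  = c(33.4484, 33.4484, 33.4484, 32.2226),
  ID        = c(1, 1, 2, 3),
  City      = c("Phoenix", "Phoenix", "Phoenix", "Tucson")
)

summarize_by_location(toy)  # Phoenix count 2, Tucson count 1
```

With this helper, each summary becomes a single call, for example Admit_Summary <- summarize_by_location(Admit_Data2).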


Transform Data

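# Transform for mapping (usmap): project longitude/latitude into the
# coordinate system used by plot_usmap(). This follows the usmap 0.6.0
# documentation cited below, where the projected columns are named x and y.

Applicant_Map <- usmap_transform(Applicant_Summary,
                                 input_names = c("Longitude", "Latitude"),
                                 output_names = c("x", "y") )

Admit_Map <- usmap_transform(Admit_Summary,
                             input_names = c("Longitude", "Latitude"),
                             output_names = c("x", "y") )

Enroll_Map <- usmap_transform(Enroll_Summary,
                              input_names = c("Longitude", "Latitude"),
                              output_names = c("x", "y") )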

head(Applicant_Map)
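The plotting code below maps aes(x = x, y = y), so it needs x and y columns. If your installed usmap version instead returns an sf object from usmap_transform() (check ?usmap_transform for your version), the sketch below copies the projected point coordinates into x and y columns. The helper add_xy_columns() is hypothetical, and the test data is a toy sf object.

```r
library(sf)

# Hypothetical helper: if map_data is an sf object, replace its geometry
# with plain x and y columns; any other input is returned unchanged
add_xy_columns <- function(map_data) {
  if (inherits(map_data, "sf")) {
    xy <- st_coordinates(map_data)       # matrix with X and Y columns for points
    map_data <- st_drop_geometry(map_data)
    map_data$x <- xy[, "X"]
    map_data$y <- xy[, "Y"]
  }
  map_data
}

# Toy sf input standing in for a transformed summary table
toy_sf <- st_as_sf(
  data.frame(City = c("Phoenix", "Tucson"), count = c(5L, 2L),
             lon = c(-112.0740, -110.9747), lat = c(33.4484, 32.2226)),
  coords = c("lon", "lat"), crs = 4326
)
toy_xy <- add_xy_columns(toy_sf)
```

Usage with the real data would look like Applicant_Map <- add_xy_columns(Applicant_Map).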


Plot Applicant Map

Before running the following code, think about how you would like this map to look.

  1. The state fill color is set to “gray93” in plot_usmap(). Experiment with different colors; a full palette of ggplot2 color names is available here: http://sape.inf.usi.ch/quick-reference/ggplot2/colour.

  2. The segment line connecting each city label to its geom_point is set to “black” with a segment size of 1. Change these values to make the line more or less prominent.

  3. Because the size of each point should represent the number of applicants at each set of coordinates, size must be mapped to count inside the aes() function. This is why the data were grouped by location and counted with the group_by() and summarize() functions.

  4. Each point (geom_point) is set to “magenta4.” Change this color as needed.

  5. The title, subtitle, and legend label can be updated within the labs() function.

  6. Ensure that data = Applicant_Map is set in every layer that uses it (ggrepel::geom_label_repel() and geom_point()).

plot_usmap(fill = "gray93", alpha = 0.99) +
  ggrepel::geom_label_repel(data = Applicant_Map,
             aes(x = x, y = y, label = City),
             size = 2, alpha = 0.99,
             label.r = unit(0.5, "lines"), label.size = 0.5,
             segment.color = "black", segment.size = 1,
             seed = 1234) +
  geom_point(data = Applicant_Map,
             aes(x = x, y = y, size = count),
             color = "magenta4", alpha = 0.69) +
  scale_size_continuous(range = c(1, 16),
                        labels = scales::comma) +
  labs(title = "Fall 2023 Faux University",
       subtitle = "Applications by City",
       size = "Count of Applicants") +
  theme(legend.position = "right")


Plot Admit Map

plot_usmap(fill = "gray93", alpha = 0.99) +
  ggrepel::geom_label_repel(data = Admit_Map,
             aes(x = x, y = y, label = City),
             size = 2, alpha = 0.99,
             label.r = unit(0.5, "lines"), label.size = 0.5,
             segment.color = "black", segment.size = 1,
             seed = 1234) +
  geom_point(data = Admit_Map,
             aes(x = x, y = y, size = count),
             color = "turquoise4", alpha = 0.69) +
  scale_size_continuous(range = c(1, 16),
                        labels = scales::comma) +
  labs(title = "Fall 2023 Faux University",
       subtitle = "Admitted Students by City",
       size = "Count of Admitted Students") +
  theme(legend.position = "right")


Plot Enroll Map

plot_usmap(fill = "gray93", alpha = 0.99) +
  ggrepel::geom_label_repel(data = Enroll_Map,
             aes(x = x, y = y, label = City),
             size = 2, alpha = 0.99,
             label.r = unit(0.5, "lines"), label.size = 0.5,
             segment.color = "black", segment.size = 1,
             seed = 1234) +
  geom_point(data = Enroll_Map,
             aes(x = x, y = y, size = count),
             color = "tomato1", alpha = 0.69) +
  scale_size_continuous(range = c(1, 16),
                        labels = scales::comma) +
  labs(title = "Fall 2023 Faux University",
       subtitle = "Enrolled Students by City",
       size = "Count of Enrolled Students") +
  theme(legend.position = "right")
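The three maps differ only in their data, point color, subtitle, and legend label, so the shared code can be wrapped in one function. The sketch below uses a hypothetical helper, plot_funnel_map(), that keeps every other setting from the blocks above; it assumes map_data already has x, y, City, and count columns.

```r
library(ggplot2)
library(usmap)

# Hypothetical helper: builds one funnel-stage map from the shared settings
plot_funnel_map <- function(map_data, point_color, subtitle, size_label) {
  plot_usmap(fill = "gray93", alpha = 0.99) +
    ggrepel::geom_label_repel(data = map_data,
                              aes(x = x, y = y, label = City),
                              size = 2, alpha = 0.99,
                              label.r = unit(0.5, "lines"), label.size = 0.5,
                              segment.color = "black", segment.size = 1,
                              seed = 1234) +
    geom_point(data = map_data,
               aes(x = x, y = y, size = count),
               color = point_color, alpha = 0.69) +
    scale_size_continuous(range = c(1, 16), labels = scales::comma) +
    labs(title = "Fall 2023 Faux University",
         subtitle = subtitle,
         size = size_label) +
    theme(legend.position = "right")
}

# Usage, reproducing the admitted-student map:
# plot_funnel_map(Admit_Map, "turquoise4",
#                 "Admitted Students by City", "Count of Admitted Students")
```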



Works Cited

This code-through references and cites the following sources:


Di Lorenzo, P. (2022, February 27). Advanced mapping. The Comprehensive R Archive Network (CRAN). Retrieved April 29, 2022, from https://cran.r-project.org/web/packages/usmap/vignettes/advanced-mapping.html

Wasser, L. (2014, December 1). R create a spatial bubble plot that overlays a basemap of the US and other spatial layers as needed. Stack Overflow. Retrieved April 29, 2022, from https://stackoverflow.com/questions/27328372/r-create-a-spatial-bubble-plot-that-overlays-a-basemap-of-the-us-and-other-spati

Rao, B., & ‘mk04.’ (2016, April 6). How to convert data frame to spatial coordinates. Stack Overflow. Retrieved April 29, 2022, from https://stackoverflow.com/questions/29736577/how-to-convert-data-frame-to-spatial-coordinates

Pauloo, R. (2018, October 27). Create spatialpointsdataframe. Stack Overflow. Retrieved April 29, 2022, from https://stackoverflow.com/questions/32583606/create-spatialpointsdataframe

Appelhans, T., Detsch, F., Reudenbach, C., & Woellauer, S. (n.d.). Interactive viewing of spatial data in R. Retrieved April 29, 2022, from https://r-spatial.github.io/mapview/

Wilson, A. (2012). Working with Spatial Data. Retrieved April 29, 2022, from https://cmerow.github.io/RDataScience/04_Spatial.html

Filter: Return rows with matching conditions. RDocumentation. (n.d.). Retrieved April 29, 2022, from https://www.rdocumentation.org/packages/dplyr/versions/0.7.8/topics/filter

Marsja, E. (2020, November 24). Select columns in R by name, index, letters, & certain words with dplyr. Retrieved April 29, 2022, from https://www.marsja.se/select-columns-in-r-by-name-index-letters-certain-words-with-dplyr/

Schork, J. (n.d.). Count unique values by group in R (3 examples): Distinct numbers. Statistics Globe. Retrieved April 29, 2022, from https://statisticsglobe.com/count-unique-values-by-group-in-r

usmap_transform: Convert coordinate data frame to usmap projection. RDocumentation. (n.d.). Retrieved April 29, 2022, from https://www.rdocumentation.org/packages/usmap/versions/0.6.0/topics/usmap_transform

ggplot2 Quick Reference: colour (and fill). Software and programmer Efficiency Research Group. (n.d.). Retrieved April 29, 2022, from http://sape.inf.usi.ch/quick-reference/ggplot2/colour