Mapping in R

This is a brief introduction to mapping in R using both ggplot2 (static maps) and leaflet (interactive maps).

# install and load packages ----
  #install.packages(c("rgdal","leaflet","ggplot2","acs","tigris","ggmap","stringr"))
  library(rgdal)
  library(leaflet)
  library(ggplot2)
  library(acs)
  library(rgeos)
  library(tigris)
  library(ggmap)
  library(stringr)
  library(raster)

# set working directory ----
  setwd('[DIRECTORY_WHERE_DATA_IS_STORED]')

# prep ----
  #Sys.setenv(GOOGLE_MAP_GEOCODING_KEY="AIzaSyDobnUnB0yf5LOK8sJmIBU8-fuzKfeYmYs") # need Google geocoding key to geocode addresses
  register_google(key = "[YOUR_GOOGLE_MAPS_API_KEY]")
  options(tigris_use_cache=TRUE) # will temporarily save downloaded data so it won't need to re-download 
                                 # - **only use if you're downloading data through R**
  state <- c(26) # also for downloading data - Michigan's FIPS code is 26
  api.key.install(key="[YOUR_CENSUS_API_KEY]") # need for downloading ACS data in R

We’ve installed and loaded packages that contain functions we’ll need to map. We’ve also set the working directory, loaded keys for geocoding using Google Maps API and for downloading American Community Survey data directly in R.

Next, we’ll initialize different boundaries that we can use when creating static maps in ggplot2.

# set map boundaries ----
  # State
    statex <- c(-90.41, -81.5)
    statey <- c(41.63,47.54)
  # Detroit
    detroitx <- c(-84, -81.5)
    detroity <- c(42.04,43)
  # Dearborn/Dearborn Heights
    ddhx <- c(-83.315,-83.137)
    ddhy <- c(42.267,42.353)
  # Flint 
    flintx <- c(-83.830,-83.557)
    flinty <- c(42.913,43.124)
  # Tecumseh
    tecumsehx <- c(-84.024,-83.862)
    tecumsehy <- c(41.965,42.050)

We’ll only use the state of Michigan boundaries, but you can play around and zoom in on Detroit, Dearborn/Dearborn Heights, Flint and Tecumseh. These are all areas of interest according to Leigh and Rafael.

Next, we’ll load a basic CSV file and explore some basic functions in R for exploring the data.

# loading data ----
  # CSV
    ruralclinics <- read.csv("ruralclinics.csv",header=T,stringsAsFactors = F)
    
    # some basic functions
    dim(ruralclinics) # gives dimensions of dataframe [rows columns]
    names(ruralclinics) # gives variable names of dataframe
    head(ruralclinics,n=5) # prints first 5 observations
    View(ruralclinics) # shows dataframe as a spreadsheet
    class(ruralclinics) #tells you what kind of object ruralclinics is
    class(ruralclinics$physician) # tells you the type of data the physician variable is
    class(ruralclinics$zip) # tells you the type of data the zip variable is
    
    class(ruralclinics$system) # character variable
    ruralclinics$system # prints the values of the variable
    ruralclinics$system <- as.factor(ruralclinics$system) # convert to factor
    class(ruralclinics$system) # factor variable
    levels(ruralclinics$system)

The class function is particularly useful. A lot of times errors will pop-up when variables are not in the form you think they are (sometimes numbers can be character variables or variables you think are factors will be character variables).

The following chunk will load a KML (can be seen as .kml.xml).

This next chunk covers downloading American Community Survey (ACS) data directly into R.

# downloading data ----
  # downloading ACS data
      geo<-geo.make(state='MI', # we want to map census tracts in Michigan
                    county="*",
                    tract="*")
      endyr=2015 # we want 2015 ACS data
      sp=5 # we want the 5-year estimate data
      tabnum = "B04006" # we want the ancestry data table
      ancestry <- acs.fetch(endyear=endyr,span=sp,geography=geo, # we can now pull the 2015 ACS 5-year Ancestry table by census tract
                            table.number=tabnum,col.names="pretty") # "pretty" keeps the long variable names
      View(ancestry)
  # see quick_acs.R file for coding on manipulating and mapping ACS data

A more detailed code exploring and mapping ACS data can be found in the quick_acs.R file in the CODE folder.

The next section shows how to download maps with different geographic subdivisions. It also demonstrates how to save these files for future use.

  # downloading geodata
  pumas <- pumas(state='MI',cb=TRUE) # will download Public Use Microdata Survey areas
  plot(pumas) # plot these for fun
  zipcodes <- read.csv("zip codes.csv",header=T)
  zips <- zctas(starts_with = c('48','49'),cb=TRUE) # will download zip codes map of Michigan 
  counties <- counties <- counties(state='MI',cb=T) # will download counties map of Michigan
  
# saving geodata ----
  writeOGR(obj=zips,dsn=".",layer="testzips",driver="ESRI Shapefile") # save zip code map for future use
  testzips <- readOGR(dsn=".",layer="testzips") # load new shapefile
  plot(testzips) # see what it looks like
  
  rm(testzips,counties,pumas) # removes objects from environment

Similarly, if you already have shapefiles that were saved previously or were downloaded (for example, via TIGER from the US Census), you can load the files as follows.

# loading Shapefiles ----
  # Shapefiles
  catch.zips <- readOGR(dsn=".",layer="catch.zips") # loads outline of UMRCC catchment area
  rcount <- readOGR(dsn=".",layer="rcount") # loads outline of rural counties according to Leigh
  counties <- readOGR(dsn=".",layer="counties") # loads county map of Michigan

This example will demonstrate how to geocode addresses (turn them into coordinates). You will need a Google Maps API key that was run earlier in the code to access the geocoding feature.

# geocoding addresses ----
  # we have the rural clinics CSV file loaded earlier
  View(ruralclinics)
  # it shows different variables, including street addresses, cities, state, and ZIP code
  
  # this is all we need to geocode addresses
  # Read in the CSV data and store it in a variable
  ruralclinics$fulladdress <- paste0(ruralclinics$address,", ",ruralclinics$city,", ",ruralclinics$state)

  # Initialize the data frame
  geocoded <- data.frame(stringsAsFactors = FALSE)

  # Loop through the addresses to get the latitude and longitude of each address and add it to the
  # origAddress data frame in new columns lat and lon
  for(i in 1:nrow(ruralclinics)){
    # Print("Working...")
    result <- geocode(ruralclinics$fulladdress[i], output = "latlona", source = "google")
    ruralclinics$lon[i] <- as.numeric(result[1])
    ruralclinics$lat[i] <- as.numeric(result[2])
    ruralclinics$geoAddress[i] <- as.character(result[3])
  }

  View(ruralclinics)
  write.csv(ruralclinics,file="test_geocoded.csv") # saves geocoded dataframe as CSV for future use

Now for the fun part. This next code will create a map of Michigan with county names, an outline of the UM Rogel Cancer Center catchment area (defined using ZIP codes), and a shaded area representing rural counties in the state (according to Leigh).

# mapping using ggplot ---- map of Michigan counties with names, UMRCC catchment area and rural counties

  # Find a center point for each region
  centers <- data.frame(gCentroid(counties, byid = TRUE, id=counties$NAME))
  centers$NAME <- rownames(centers)
  
  # plot
  ggplot() + # starts ggplot
    geom_polygon(data = counties , aes(x=long, y=lat, group = group),color="black",fill=NA) + # plot counties
    geom_polygon(data = rcount , aes(x=long, y=lat, group = group),color="red",fill="red",alpha=0.3) + # plots rural county outline
    geom_polygon(data = catch.zips , aes(x=long, y=lat, group = group),color="blue",fill=NA) + # plots UMRCC outline
    coord_map(xlim = statex, ylim = statey) + # sets map zoom
    geom_text(data = centers, aes(x=x,y=y,label = paste(as.character(NAME), sep="")), hjust = 0.5, color = "black",size=2) + # plots county names
    theme(axis.line=element_blank(),         # these are just visual options for the map - basically removes axes, lines, background, legend, etc
          axis.text.x=element_blank(),
          axis.text.y=element_blank(),
          axis.ticks=element_blank(),
          axis.title.x=element_blank(),
          axis.title.y=element_blank(),
          legend.position="none",
          panel.background=element_blank(),
          panel.border=element_blank(),
          panel.grid.major=element_blank(),
          panel.grid.minor=element_blank(),
          plot.background=element_blank())

Lastly, here is an example of mapping using leaflet - a package that creates interactive maps. It will map Michigan with points for pharmacies, hospitals, and lung cancer screening centers. It will also contain the outline for the UM Rogel Cancer Center catchment center and a shaded area representing rural counties.

# mapping using leaflet ---- map of Michigan with pharmacies, hospitals, lung cancer screening centers,
#                            UMRCC catchment area and rural counties
  pharms <- read.csv("pharmacy_geocoded.csv",header=T,stringsAsFactors = F)
  lung <- read.csv("acrlcc_geocoded.csv",header=T,stringsAsFactors = F)
  
  r <- 3 # marker radius
  w <- 2 # marker width
  o <- 1 # opacity
  
  # specify pop-up info for clicking on markers
  popup.pharms <- paste0("Brand: ", pharms$Brand,"<br>",
                         "Address: ",pharms$Address,"<br>",
                         "City: ",pharms$City,"<br>",
                         "Phone: ",pharms$Phone)
  popup.rhcs <- paste0("Name: ",rhcs$name,"<br>",
                       ifelse(rhcs$description=="","",paste0("Phone: ",rhcs$description)))
  popup.lung <- paste0("Name: ",lung$facility.name,"<br>",
                       "Address: ",lung$address,"<br>",
                       "City: ",lung$city,"<br>",
                       "Status: ",lung$status,"<br>",
                       "Expiration: ",lung$expiration)
  
  leaflet() %>%
    # Base groups
    addProviderTiles("CartoDB.Positron") %>%
    # Overlay outlines
    addPolygons(data=rcount,
                color='#FF0000',
                fillOpacity=0.3,
                weight=.01,
                smoothFactor=0.2
                #popup = paste0("County: ",counties$GEOID,"<br>","rural: ",counties$rural)
                ) %>%
    addPolygons(data=catch.zips,
                opacity=1,
                fill=0,
                color='#000000',
                weight=1.5,
                smoothFactor=0.5) %>%
    addPolygons(data=counties,
                opacity=.5,
                fill=0,
                color='#cccce5',
                weight=1,
                smoothFactor=0.5) %>%
    # Overlay groups
    addCircleMarkers(data=pharms, lat=~lat,lng=~lon,color="#f44336",radius=r,weight=w,opacity=o,group="Pharmacies",
                     popup=popup.pharms) %>%
    addCircleMarkers(data=rhcs, lat=~lat,lng=~long,color="#4caf50",radius=r,weight=w,opacity=o,group="RHCs",
                     popup=popup.rhcs) %>%
    addCircleMarkers(data=lung, lat=~lat,lng=~lon,color="#3f51b5",radius=r,weight=w,opacity=o,group="LCS Centers",
                     popup=popup.lung) %>%
    addLabelOnlyMarkers(data=centers,~x, ~y, label =  ~as.character(NAME),
                        labelOptions = labelOptions(noHide = T, direction = 'top', textOnly = T,style=list('font-weight'='normal'),padding='3px 8px',textsize='8px')) %>%
    #Layers control
    addLayersControl(
      overlayGroups=c("Pharmacies","RHCs","LCS Centers"),
      options=layersControlOptions(collapsed=F)
    ) %>%
    addLegend(position="bottomright",
              colors=c("#f44336","#4caf50","#3f51b5"),
              values=c("Pharmacies","RHCs","LCS Centers"),
              labels=c("Pharmacies","RHCs","LCS Centers"))

And there you have it! You’ve learned how to map static maps using ggplot2 and dynamic maps using leaflet.

sessionInfo()
## R version 3.4.0 (2017-04-21)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS  10.14
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] raster_2.6-7  ggmap_2.7.900 tigris_0.7    rgeos_0.3-26  acs_2.1.3    
##  [6] XML_3.98-1.11 stringr_1.3.0 ggplot2_3.0.0 leaflet_1.1.0 rgdal_1.2-18 
## [11] sp_1.2-7     
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.17      lattice_0.20-35   png_0.1-7        
##  [4] class_7.3-14      assertthat_0.2.0  rprojroot_1.3-2  
##  [7] digest_0.6.15     mime_0.5          R6_2.2.2         
## [10] plyr_1.8.4        backports_1.1.2   evaluate_0.10.1  
## [13] e1071_1.6-8       httr_1.3.1        pillar_1.2.1     
## [16] RgoogleMaps_1.4.2 rlang_0.2.1       lazyeval_0.2.1   
## [19] uuid_0.1-2        rmarkdown_1.9     labeling_0.3     
## [22] udunits2_0.13     foreign_0.8-69    htmlwidgets_1.2  
## [25] munsell_0.4.3     shiny_1.0.5       compiler_3.4.0   
## [28] httpuv_1.3.6.2    pkgconfig_2.0.1   htmltools_0.3.6  
## [31] tidyselect_0.2.4  tibble_1.4.2      dplyr_0.7.6      
## [34] withr_2.1.2       sf_0.6-1          bitops_1.0-6     
## [37] rappdirs_0.3.1    grid_3.4.0        jsonlite_1.5     
## [40] xtable_1.8-2      gtable_0.2.0      DBI_0.8          
## [43] magrittr_1.5      units_0.5-1       scales_0.5.0     
## [46] stringi_1.2.2     mapproj_1.2.6     bindrcpp_0.2.2   
## [49] rjson_0.2.20      tools_3.4.0       glue_1.3.0       
## [52] markdown_0.8      purrr_0.2.5       maps_3.3.0       
## [55] crosstalk_1.0.0   jpeg_0.1-8        yaml_2.1.19      
## [58] colorspace_1.3-2  maptools_0.9-2    classInt_0.1-24  
## [61] knitr_1.20        bindr_0.1.1