Mapping your own geocoded coordinates in R Studio using sf!

Prepared by Aaron Snow

As a graduate in GIS from BYU, I can tell you that ArcMap does NOT have an easy way to geocode custom coordinates (though ArcMap is very capable with pretty much everything else, just sayin’.) R Studio is where it’s at. I want to demonstrate how to make ANY map with whatever coordinates you want! It is easy even if you have never mapped stuff in R before.

To demonstrate a good example of this, you will need to download county data (not municipality or any other kind) from the following website. Make sure to download the shapefile, and not the geodatabase. R Studio is not very good with geodatabases, so get the shapefile.

Un-zip that file in a folder called “data”, and place that folder in your current directory. You should be able to see it the files tab.

Here is where we get into the nitty gritty. You will need to download the tidyverse package, and the sf, maps, maptools, and rgeos packages.

IF YOU ARE DOING THIS ON A MAC then pay close attention. There are some extra steps to take:

  1. Open Terminal (the tab right next to Console)

  2. Type xcode-select –install

A software update popup window should appear that will ask if you want to install command line developer tools. Click on “Install” (you don’t need to click on “Get Xcode”)

  1. Go to brew.sh, copy the big long command under “Install Homebrew” (starts with /usr/bin/ruby -e “$(curl -fsSL…), paste it into Terminal, and press enter.

This installs Homebrew, which is special software that lets you install Unix-y programs from the terminal.

  1. Type this line in Terminal to install gdal, geos, and proj:

brew install gdal geos proj udunits

  1. Wait for a few seconds/minutes while all that stuff compiles.

Verify that this all worked by running library(sf) in RStudio. You should see a message saying something like Linking to GEOS 3.6.1, GDAL 2.1.3, proj.4 4.9.3. Hooray! That means it’s working!

So now, after downloading these packages, let’s throw the ones we will need into our code. While I am at it, I will add a variable where my shapfile will load into.

library(tidyverse)
library(sf)

utah_counties <- st_read("data/Counties/Counties.shp")
## Reading layer `Counties' from data source `/Users/aaronsnow/Documents/MPA/Fall 2018/DataViz/dataviz/CodeRunThrough/CodeThrough/data/Counties/Counties.shp' using driver `ESRI Shapefile'
## Simple feature collection with 29 features and 12 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 228583.4 ymin: 4094744 xmax: 673944.6 ymax: 4653572
## epsg (SRID):    26912
## proj4string:    +proj=utm +zone=12 +datum=NAD83 +units=m +no_defs

Notice the st_read. That is basically the same as csv_read, except it is designed to read shapefiles!

Unless you are a geographer and are familiar with projections, don’t worry about the text that comes afterwards. That is basically letting you know that it worked.

Let’s see what that shapefile looks like: ggplot() usually has no parameters in the parentheses, all the action goes into the sf() function. Check it out!

ggplot() +
  geom_sf(data = utah_counties,color = "black", fill = "#228B22")+
  coord_sf(crs = 32612, datum = NA) +
  theme_void()

the coord_sf() function allows me to input the projection code that I would like. To be honest, there are thousands of projections and which one you use is entirely up to you. the 32612 is the UTM North Zone 12, which looks great with the state of Utah and Arizona.

Here is where I add my own specific geocoded coordinates. You can either get these coordinates from a GPS device, or you can simply go to Google Maps, right click on the spot you want to map, click on the link that says what’s here? and get the coordinates from there.

In this example, I want to map the national parks in Utah. In Google Maps, I found cooridnates for each park, and will load them into the tribble function.

park_data <- tribble(
  ~park, ~long, ~lat,
  "Zion", -112.999889, 37.280259,
  "Bryce Canyon", -112.181514, 37.584526,
  "Canyonlands", -109.86378, 38.326600,
  "Arches",-109.605731, 38.733819,
  "Capitol Reef", -111.270130, 38.364047
) %>% 
  st_as_sf(coords = c("long", "lat"), crs = 4326) %>% 
  st_transform(crs = 32612) %>% 
  mutate(long = st_coordinates(.)[,1],
         lat = st_coordinates(.)[,2])

Here is what the resulting data look like:

## Simple feature collection with 5 features and 3 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 322703.3 ymin: 4127838 xmax: 621185.9 ymax: 4288162
## epsg (SRID):    32612
## proj4string:    +proj=utm +zone=12 +datum=WGS84 +units=m +no_defs
## # A tibble: 5 x 4
##   park            long      lat           geometry
##   <chr>          <dbl>    <dbl>        <POINT [m]>
## 1 Zion         322703. 4127838. (322703.3 4127838)
## 2 Bryce Canyon 395682. 4160375. (395681.8 4160375)
## 3 Canyonlands  599315. 4242664. (599314.6 4242664)
## 4 Arches       621186. 4288162. (621185.9 4288162)
## 5 Capitol Reef 476401. 4246243.   (476401 4246243)

Notice how I am making three columns, naming them “park”, “long”, and “lat”, and below that, in that order, adding the name of the park, and longitude of the park, then the latitude!

R Studio will not originally recognize that those latitudes and longitudes are not just random numbers, so we tell R Studio that those are actual coordinates with the st_as_sf(coords = c…) thing. The crs is the coordinate system code that the shapefile includes, but I do not like that one, so I am transforming the coordinate projection from 4326 to 32612 (to be honest, either number is fine, but I will show you the difference later).

Let’s see what it looks like so far:

ggplot() +
  geom_sf(data = utah_counties,color = "black", fill = "#228B22") + 
  geom_sf(data = park_data, color = "white") +
  coord_sf(crs = 32612, datum = NA) +
  theme_void()

Notice that there is a second sf() function with my park_data object.

Now I want to label each point. To do so I am going to do the same thing with the tribble function, except instead of using the same coordinates, I am going to adjust them a bit (if I use the same coordinates, then the labels will cover the points. Changing them slightly will just move the location of the labels.)

park_label <- tribble(
  ~park, ~long, ~lat,
  "Zion", -113.599889, 37.280259,
  "Bryce Canyon", -112.181514, 37.884526,
  "Canyonlands", -109.86378, 38.026600,
  "Arches",-109.605731, 39.033819,
  "Capitol Reef", -111.270130, 38.564047
) %>% 
  st_as_sf(coords = c("long", "lat"), crs = 4326) %>% 
  st_transform(crs = 32612) %>% #
  mutate(long = st_coordinates(.)[,1],
         lat = st_coordinates(.)[,2])

Notice the small differenes in coordinate values. I had to run this a few times just to make sure that the labels were in a good location. If they weren’t I adjusted accordingly (adding an extra 0.05 to longitude to move it over just a smidge.)

Let’s look at what we have now!

ggplot() +
  geom_sf(data = utah_counties,color = "black", fill = "#228B22") + 
  geom_sf(data = park_data, color = "white") +
  geom_label(data = park_label, aes(x = long, y = lat, label = park)) +
  #coord_sf(crs = 4326, datum = NA) + #
  coord_sf(crs = 32612, datum = NA) + #
  theme_void()

What a lovely looking map!

Just for fun, let’s look at the difference in projections. If I had been doing this whole report with a 4326 value instead of a 32612, here is what it would look like!

Notice how this map is a little more rigid and square. This is a NAD 83 projection instead of a UTM Zone 12. Thought the projections are super different, the overall look isn’t. Projections for cartographers is more a design tool than anything else.