Lesson 4. More Data, More Maps!

Now that we know how to pull in data, check and transform Coordinate Reference Systems (CRS), and plot sf data.frames together - let’s practice doing the same thing with other geometry types. In this notebook we’ll be bringing in bike boulevards and schools, which will get us primed to think about spatial relationship queries.


Instructor Notes

Import Libraries

library(sf)
## Warning: package 'sf' was built under R version 4.1.2
library(here)

Load some data

counties = st_read(here("notebook_data",
                        "california_counties",
                        "CaliforniaCounties.shp"))
## Reading layer `CaliforniaCounties' from data source 
##   `/Users/ada/Projects/Events/2023/2023-02-16_21_DLab-GIS_in_R/github/R-Geospatial-Fundamentals-master/notebook_data/california_counties/CaliforniaCounties.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 58 features and 24 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -374445.4 ymin: -604500.7 xmax: 540038.5 ymax: 450022
## Projected CRS: NAD83 / California Albers
ca_places = st_read(here("notebook_data",
                         "census",
                         "Places",
                         "cb_2019_06_place_500k.shp"))
## Reading layer `cb_2019_06_place_500k' from data source 
##   `/Users/ada/Projects/Events/2023/2023-02-16_21_DLab-GIS_in_R/github/R-Geospatial-Fundamentals-master/notebook_data/census/Places/cb_2019_06_place_500k.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 1521 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -124.2694 ymin: 32.53416 xmax: -114.229 ymax: 41.99314
## Geodetic CRS:  NAD83

Mapping sf objects

Just like with other plots we can make in R, we can customize our maps’ colors, title, etc.

The most basic way to do this would be to use the sf::plot function.

# Make a map with plot
plot(counties$geometry, col='tan', border='darkgreen', main="CA counties")

However, we’ll get much more functionality and customizability if we use a special-purpose mapping package, rather than just relying on sf methods of base R functions.

R packages for Mapping sf objects

There are many different packages you can use to make maps of sf data, including:

  • ggplot2: which is fantastic if you are already a ggplot user
  • mapview: for a quick and easy interactive map
  • tmap: for great custom static and interactive maps
  • leaflet: for highly custom interactive maps that you can output and host on a website
  • shiny: for interactive R based applications that use leaflet maps

We won’t cover leaflet or shiny, but we will present an quick look at ggplot2 and mapview and then dive into tmap.

Maps with ggplot

Here is a simple example of a ggplot map syntax.

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.1.2
ggplot(counties, aes(fill = MED_AGE)) + 
  geom_sf() +  # tells ggplot that geographic data are being plotted
  scale_fill_viridis_c() +
  theme_minimal() + 
  labs(title = "2010 Median Age by County")

Maps with mapview

The mapview package is the easiest way to make a basic interactive map with very little code.

library(mapview)

# Interactive map of counties
mapview(counties)

# with areas colored by data values
mapview(counties, zcol = 'MED_AGE')

# and overlaying CA Places data
mapview(counties, zcol = 'MED_AGE') + mapview(ca_places)

Mapview is great for exploring your data. Less great for making pretty custom maps.

4.2 Mapping with tmap

Our go-to mapping package of choice is tmap. Its name stands for “thematic maps”, i.e. maps in which you can use the attribute values of the features to control the style properties of the geometry, thus creating effective data visualizations.

tmap is great for creating both static and interactive maps.

You’ll get plenty of introduction here in the workshop, but for additional support you can check out the tmap vignette or Google other tutorials and references.

The Geocomputation with R book also has a great chapter on making maps with tmap and other R libraries.

Let’s start by loading the package and creating a ‘quick tmap’.

# load tmap
library(tmap)
## Warning: package 'tmap' was built under R version 4.1.2
# make tmap fix invalid polygons
tmap_options(check.and.fix = TRUE)

# plot a 'quick tmap'
qtm(counties)
## Warning: The shape counties is invalid. See sf::st_is_valid

Nice!

That’s the quickest, simplest example of a static map that tmap can make.

About tmap_options(check.and.fix = TRUE)

  • Sometimes during data creation or processing, polygon geometry gets a bit messed up. It may look great but one or more of the polylines may self-intersect or not close (i.e. snap to a node). This can cause some functions to return an error message or warning. The tmap_option check.and.fix and repair invalid geometry so that it can render an interactive or static map properly. You can also use the sf function st_make_valid to repair invalid geometry. See the function documentation for more information.
# save fixed geom
counties <- st_make_valid(counties)

Toggling tmap modes

tmap has 2 modes:

  • Use tmap_mode('plot') to set the mapping mode for static maps

  • Use tmap_mode('view') to set the mapping mode for interactive maps

  • use ttm() to quickly toggle between modes

tmap loads up in ‘plot’ mode. Let’s switch it to ‘view’ mode and then take a look at that same map.

# toggle the mode (or ttm!)
ttm()
## tmap mode set to interactive viewing
# then make our quick tmap again
qtm(counties)

That’s outstanding! We get a clickable, scrollable, zoomable map built with the Leaflet Javascript library… right out of the box!

  • You can change the basemap using the Layer Control under the Zoom Controls.

And to create thematic maps, we can use tmap’s more verbose mapping functions to create a new tmap object and then add geometry layers to it, setting different aesthetic aspects of those layers.

For now, let’s recreate that same map we made above with plot, but this time using tmap syntax:

tm_shape(counties) +  # use the `tm_shape` function to create a tmap object
  tm_polygons(col = 'tan', # add `tm_polygons` layer, coloring as before
              border.col = 'darkgreen', 
              alpha = 0.5) # & making transparent

Above, tm_shape creates the new tmap object by identifying the data source for the first map layer (counties). Then, tm_polygons provides the aesthetics for how the data should be displayed.

  • If we only want to display the polygon outlines we would use tm_borders instead of tm_polygons:
tm_shape(counties) +  # use the `tm_shape` function to create a tmap object
  tm_borders(col = 'darkgreen', 
             alpha = 0.85) 

Now we have two primary mapping tools:

- `plot`: nice for simple, pared down plotting tasks
- `tmap`: quick maps, both static and interactive, with greater flexibility

4.3 More Data, More Maps!

In this section we are going to bring in some new data for mapping

  • Berkeley Bike Boulevards
  • Alameda County Schools

First, we will read in the Berkeley bike boulevards in data. Note two things that are different from our previous data:

  • We’re reading in data from a GeoJSON file and not a shapefile

  • The data have line geometry (our county and states data had polygon geometries)

bike_blvds = st_read(here("notebook_data",
                          "transportation",
                          "BerkeleyBikeBlvds.geojson"))
## Reading layer `BerkeleyBikeBlvds' from data source 
##   `/Users/ada/Projects/Events/2023/2023-02-16_21_DLab-GIS_in_R/github/R-Geospatial-Fundamentals-master/notebook_data/transportation/BerkeleyBikeBlvds.geojson' 
##   using driver `GeoJSON'
## Simple feature collection with 211 features and 10 fields
## Geometry type: MULTILINESTRING
## Dimension:     XY
## Bounding box:  xmin: 561541.2 ymin: 4189007 xmax: 566451.6 ymax: 4193483
## Projected CRS: WGS 84 / UTM zone 10N
plot(bike_blvds$geometry)

Of course, we could also use tmap to plot our lines:

# set to interactive view mode
tmap_mode('view')
## tmap mode set to interactive viewing
tm_shape(bike_blvds) +
  tm_lines()

As usual, we’ll want to do a bit of data exploration…

head(bike_blvds)
## Simple feature collection with 6 features and 10 fields
## Geometry type: MULTILINESTRING
## Dimension:     XY
## Bounding box:  xmin: 562293.8 ymin: 4189795 xmax: 562786.5 ymax: 4189899
## Projected CRS: WGS 84 / UTM zone 10N
##       BB_STRNAM BB_STRID    BB_FRO     BB_TO BB_SECID DIR_   Status ALT_bikeCA
## 1 Heinz/Russell      RUS       7th       8th    RUS01  E/W Existing         No
## 2 Heinz/Russell      RUS       8th       9th    RUS02  E/W Ezisting         No
## 3 Heinz/Russell      RUS       9th      10th    RUS03  E/W Existing         No
## 4 Heinz/Russell      RUS      10th San Pablo    RUS04  E/W Existing         No
## 5     San Pablo      RUS     Heinz   Russell    RUS05  N/S Existing         No
## 6       Russell      RUS San Pablo   Wallace    RUS06  E/W Exisitng          3
##   Shape_len len_km                       geometry
## 1 101.12817  0.101 MULTILINESTRING ((562293.8 ...
## 2 100.81407  0.101 MULTILINESTRING ((562391.6 ...
## 3 100.03740  0.100 MULTILINESTRING ((562489 41...
## 4 106.59288  0.107 MULTILINESTRING ((562585.7 ...
## 5  89.56348  0.090 MULTILINESTRING ((562688.9 ...
## 6  76.95699  0.077 MULTILINESTRING ((562711.4 ...
dim(bike_blvds)
## [1] 211  11
colnames(bike_blvds)
##  [1] "BB_STRNAM"  "BB_STRID"   "BB_FRO"     "BB_TO"      "BB_SECID"  
##  [6] "DIR_"       "Status"     "ALT_bikeCA" "Shape_len"  "len_km"    
## [11] "geometry"

Our bike boulevard data includes the following information:

  • BB_STRNAM - bike boulevard Streetname
  • BB_STRID - bike boulevard Street ID
  • BB_FRO - bike boulevard origin street
  • BB_TO - bike boulevard end street
  • BB_SECID- bike boulevard section id
  • DIR_ - cardinal directions the bike boulevard runs
  • Status - status on whether the bike boulevard exists
  • ALT_bikeCA - ?
  • Shape_len - length of the boulevard in meters
  • len_km - length of the boulevard in kilometers
  • geometry

Question

Why are there 211 features when we only have 8 bike boulevards?

And now take a look at our CRS…

st_crs(bike_blvds)
## Coordinate Reference System:
##   User input: WGS 84 / UTM zone 10N 
##   wkt:
## PROJCRS["WGS 84 / UTM zone 10N",
##     BASEGEOGCRS["WGS 84",
##         ENSEMBLE["World Geodetic System 1984 ensemble",
##             MEMBER["World Geodetic System 1984 (Transit)"],
##             MEMBER["World Geodetic System 1984 (G730)"],
##             MEMBER["World Geodetic System 1984 (G873)"],
##             MEMBER["World Geodetic System 1984 (G1150)"],
##             MEMBER["World Geodetic System 1984 (G1674)"],
##             MEMBER["World Geodetic System 1984 (G1762)"],
##             MEMBER["World Geodetic System 1984 (G2139)"],
##             ELLIPSOID["WGS 84",6378137,298.257223563,
##                 LENGTHUNIT["metre",1]],
##             ENSEMBLEACCURACY[2.0]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4326]],
##     CONVERSION["UTM zone 10N",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",0,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",-123,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",0.9996,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",500000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     USAGE[
##         SCOPE["Engineering survey, topographic mapping."],
##         AREA["Between 126°W and 120°W, northern hemisphere between equator and 84°N, onshore and offshore. Canada - British Columbia (BC); Northwest Territories (NWT); Nunavut; Yukon. United States (USA) - Alaska (AK)."],
##         BBOX[0,-126,84,-120]],
##     ID["EPSG",32610]]

We can see that the CRS of the bike blvds dataframe is UTM Zone 10N, NAD83. This is a common CRS for locations in Northern CA.

Alameda County Schools

Alright! Now that we have reviewed the bike boulevard data, we’re going to bring in Alameda County school data.

schools_df = read.csv(here("notebook_data",
                           "alco_schools.csv"))

head(schools_df)
##           X        Y                      Site               Address    City
## 1 -122.2388 37.74476 Amelia Earhart Elementary 400 Packet Landing Rd Alameda
## 2 -122.2519 37.73900       Bay Farm Elementary   200 Aughinbaugh Way Alameda
## 3 -122.2589 37.76206  Donald D. Lum Elementary    1801 Sandcreek Way Alameda
## 4 -122.2348 37.76525         Edison Elementary  2700 Buena Vista Ave Alameda
## 5 -122.2381 37.75396     Frank Otis Elementary      3010 Fillmore St Alameda
## 6 -122.2616 37.76911       Franklin Elementary  1433 San Antonio Ave Alameda
##   State Type API    Org
## 1    CA   ES 933 Public
## 2    CA   ES 932 Public
## 3    CA   ES 853 Public
## 4    CA   ES 927 Public
## 5    CA   ES 894 Public
## 6    CA   ES 893 Public
dim(schools_df)
## [1] 550   9

Questions Without looking ahead:

  1. Is this a geodataframe?
  2. How do you know?



This is not an sf data.frame! A couple of clues to figure that out are..

  1. We’re reading in data from a Comma Separated Value (CSV) file, which is not a geospatial data file format.
  2. There is no geometry column (although we do have latitude and longitude values).

We can also check the object class of the dataframe.

class(schools_df)
## [1] "data.frame"

Although we are loading the school data as a regular, non-spatial data.frame, we can use the function st_as_sf to transform it into an sf spatial data.frame by:

  • specifying the columns that contain the point coordinates

  • and the identifying the CRS of the data

Let’s take another look at the dataframe.

  • What columns contain the point coordinates?
  • What is the CRS of the point data?
head(schools_df, 2)
##           X        Y                      Site               Address    City
## 1 -122.2388 37.74476 Amelia Earhart Elementary 400 Packet Landing Rd Alameda
## 2 -122.2519 37.73900       Bay Farm Elementary   200 Aughinbaugh Way Alameda
##   State Type API    Org
## 1    CA   ES 933 Public
## 2    CA   ES 932 Public

Now lets use the st_as_sf function and what we know about the data to convert the dataframe to an sf spatial dataframe.

schools_sf = st_as_sf(schools_df,               # the name of the dataframe to make spatial
                       coords = c('X', 'Y'),    # the cols in that dataframe that contain the X & Y coordinate values
                       crs = 4326)              # the CRS, expressed as an EPSG code

# Take a look at the output
head(schools_sf)
## Simple feature collection with 6 features and 7 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -122.2616 ymin: 37.739 xmax: -122.2348 ymax: 37.76911
## Geodetic CRS:  WGS 84
##                        Site               Address    City State Type API    Org
## 1 Amelia Earhart Elementary 400 Packet Landing Rd Alameda    CA   ES 933 Public
## 2       Bay Farm Elementary   200 Aughinbaugh Way Alameda    CA   ES 932 Public
## 3  Donald D. Lum Elementary    1801 Sandcreek Way Alameda    CA   ES 853 Public
## 4         Edison Elementary  2700 Buena Vista Ave Alameda    CA   ES 927 Public
## 5     Frank Otis Elementary      3010 Fillmore St Alameda    CA   ES 894 Public
## 6       Franklin Elementary  1433 San Antonio Ave Alameda    CA   ES 893 Public
##                     geometry
## 1 POINT (-122.2388 37.74476)
## 2   POINT (-122.2519 37.739)
## 3 POINT (-122.2589 37.76206)
## 4 POINT (-122.2348 37.76525)
## 5 POINT (-122.2381 37.75396)
## 6 POINT (-122.2616 37.76911)

You’ll notice that the spatial dataframe is almost the same as the regular data.frame, except with one less column (because the two coordinate columns, X, and Y, were consumed into a single geometry column.

We can also check the class of the output spatial dataframe…

class(schools_sf)
## [1] "sf"         "data.frame"

Check the dimensions (number of rows and columns) - is it the same as for schools_df?

dim(schools_sf)
## [1] 550   8

Plot the point data

Now that it’s an sf data.frame, we can make a map of the schools with the plot function, just as we did for our other data sets. Notice that this is our first point dataset.

plot(schools_sf)

Question

How do we plot just the school points?

# plot the school points only
#plot(...)  

How can we use plot to overlay the schools on the Berkeley Bike Boulevards?

  • What might we need to first?
# Hint
st_crs(schools_sf) == st_crs(bike_blvds)
## [1] FALSE

So, we’ll want to transform the CRS to match that of our bike boulevard data.

# What are we setting the CRS to?
schools_utm10 = st_transform(schools_sf, st_crs(bike_blvds))

And make a map overlay of the two layers

plot(bike_blvds$geometry)
plot(schools_utm10$geometry, col= "red", add=T)

Plotting point data in tmap

We can also use tmap to plot any of our spatial dataframes.

Here, we can use the tmap tm_dots function for point data:

tm_shape(schools_utm10) +
  tm_dots(col = 'purple', 
          border.col = 'white', 
          border.lwd = 2, 
          size = 0.2)

Writing Data to a CSV file

In Lesson 2 we reviewed how to save sf data.frames to multiple file formats, such as the GeoJSON or shapefile formats, etc. However, point data is also often saved to a CSV file.

st_write(obj = schools_utm10, 
         dsn = here("outdata",
                    "schools_utm10.csv"), 
         layer_options = "geometry=AS_XY", 
         delete_dsn = T)

Use the RStudio File browser to take a look at the output.

4.4 Map Overlays with tmap

We can combine multiple spatial dataframes of different geometry types to create overlay maps.

Let’s take a look at how we do this in tmap with the schools and bike boulevard dataframes.

  • Note, we add lines to the tmap with the tm_lines function.
tm_shape(schools_utm10) + 
  tm_dots(size = 0.1) +
  tm_shape(bike_blvds) +
  tm_lines(col = 'red')

Questions What tmap functions do we use to

  • identify the spatial object that will be plotted?
  • add polygons to the map?
  • add polygon outlines but not the fill?
  • add points?
  • add lines?

Practice

Now try remaking the previous map with the schools_sf dataframe instead of the schools_utm10 dataframe. What does the output tell you about tmap?

## Your code here!

Yeh, tmap will dynamically transform (reproject) spatial dataframes so that they can overlay in the same map if the CRS of each dataframe is defined. If it is not defined, tmap will assume the WGS84 CRS (EPSG code 4326).

Asking questions of the data

If we want to answer questions like “What schools are close to bike boulevards in Berkeley?”, the above plot isn’t super helpful, since the extent covers all of Alameda county.

  • How could you answer this question with the data we have used in this lesson?

Exercise: Overlay Mapping

Let’s practice reading in and mapping some additional spatial data.

In the code cell provided below, write code to:

  1. Extract the Berkeley boundary from the CA Places data
  2. Create a static tmap with the Berkeley boundary, the Berkeley Bike Boulevards, and the Alameda County Schools, subset to the city of Berkeley.
  3. Repeat the above as an interactive tmap.
# YOUR CODE HERE

Solution hidden here!

To see it, right-click and select “inspect element” in your browser (or look in the 04_More_Data_More_Maps.Rmd file near line 427).

4.5 Recap

In this lesson we learned several new skills:

  • We looked at different R packages for making maps.
  • Took a deeper look at tmap and interactive mapping
  • Transformed a non-spatial data.frame into an sf spatial points data.frame - a very common work flow!
  • Worked with point and line spatial dataframes and created map overlays

4.6 Teaser for Part 2…

You may be wondering if and how we could make our maps more interesting and informative than we have so far.

To give you a tantalizing taste of Part 2, the answer is yes, we can! And here’s how (using an approach we hinted at earlier on)!

tm_shape(schools_utm10) + 
  tm_dots(col = 'Org', 
          palette = 'RdYlGn',
          size = 0.15, 
          border.col = 'black',
          title = 'Public and Private Schools, Alameda County')

 D-Lab @ University of California - Berkeley
 Team Geo