CLASS 7: Projecting data

library(sf)
## Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
library(raster)
## Loading required package: sp
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:raster':
## 
##     intersect, select, union
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(spData)
library(spDataLarge)
library(tmap)
library(maps)
library(ggplot2)
library(mapproj)

Coordinate Reference Systems

All kind of geographic information such as vector and raster data have intrinsic basic information, and probably the most important characteristic of all geographic information, the coordinates, based in the Coordinate Reference System. This CRS determine how the map is distributed on the earth surface. There are two kind of coordinate systems:

Geographic coordinate Systems:

These systems identify any location by using longitude and latitude. Longitude is the East-West direction from the first meridian. Latitude is angular distance North-South from the equatorial line.

The geographic coordinate systems are not measured in meters or miles, and usually represented by a spherical or ellipsoidal surface. Spherical representations are more beautiful but not useful since earth is not a perfect sphere and it distorses the real measures of earth. Therefore, sllipsoids are more accurate than sphere representations; these models are part of a larger set of components: the datum, which contains information of how ellipsoids must be represented according to certain parameters. All the information is recorded in the argument of poj4string. For more details: proj.org/usage/projections.html.

There are two types of datum, local and geocentric ones. The first ones, local, such as NAD83, are align with the surface and measurements of a particular location or zone. On the other hand, the geocentric one, the center of reference is the Earth’s center of gravity and the accuracy of projections are not optimized for any zone in particular, e.g. WGS84.

rm(list = ls())
data(World)  # The data is stored as an sf object
 
# Let's check its current coordinate system
st_crs(World)
## Coordinate Reference System:
##   User input: +proj=eck4 +lon_0=0 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs 
##   wkt:
## PROJCRS["unknown",
##     BASEGEOGCRS["unknown",
##         DATUM["World Geodetic System 1984",
##             ELLIPSOID["WGS 84",6378137,298.257223563,
##                 LENGTHUNIT["metre",1]],
##             ID["EPSG",6326]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8901]]],
##     CONVERSION["unknown",
##         METHOD["Eckert IV"],
##         PARAMETER["Longitude of natural origin",0,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["False easting",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]]]
World.ae <- st_transform(World, "+proj=aeqd +lat_0=0 +lon_0=0 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs")

tm_shape(World.ae) + tm_fill() + tm_graticules()

World.aemaine <- st_transform(World, "+proj=aeqd +lat_0=44.5 +lon_0=-69.8 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs")

tm_shape(World.aemaine) + tm_fill()  + tm_graticules() 

World.sin <- st_transform(World,"+proj=sinu +lon_0=0 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs")
tm_shape(World.sin) + tm_fill()    + tm_graticules() 

Projected coordinate reference systems:

Projected CRRs are draw on cartesian plane, a flat surface. As a consequence, there is an origin determined by x and y axis. All of the projections are based on a geographic CRS and convert then into a surface map. This projection from a 3D surface to 2D surface cannot be done without incorporating some distortion on area, direction, distance, or shape. Some of the projection methods can preserve one or two of these characteristics: equal-area preserves area, azimuthal preserve direction, equidistant preserve distance, and conformal preserve local shape.

states <- map_data("state")

usamap <- ggplot(states, aes(x=long, y=lat, group=group)) +
  geom_polygon(fill="white", colour="black")

usamap + coord_map("mercator")

usamap + coord_map("azequalarea")

usamap + coord_map("gilbert")

usamap + coord_map("orthographic")

usamap + coord_map("conic", lat0 = 30)

Projections in R

There are two kinds of CRS in R, epsg and proj4string. An epsg code is usually short; consequently, easier to remember, and it refers to a unique CRS. proj4string, on the other hand, has more flexibility; therefore, it is more complicated and harder to remember.

A quick way to find out about available CRS is via the rgdal::make_EPSG() function, which outputs a data frame of available projections:

crs_data <- rgdal::make_EPSG()
View(crs_data)

In sf the CRS of an object can be retrieved using st_crs():

vector_filepath <- system.file("vector/zion.gpkg", package = "spDataLarge")
new_vector <- st_read(vector_filepath)
## Reading layer `zion' from data source `C:\Program Files\R\R-4.0.2\library\spDataLarge\vector\zion.gpkg' using driver `GPKG'
## Simple feature collection with 1 feature and 11 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 302903.1 ymin: 4112244 xmax: 334735.5 ymax: 4153087
## projected CRS:  UTM Zone 12, Northern Hemisphere
st_crs(new_vector) # get CRS
## Coordinate Reference System:
##   User input: UTM Zone 12, Northern Hemisphere 
##   wkt:
## BOUNDCRS[
##     SOURCECRS[
##         PROJCRS["UTM Zone 12, Northern Hemisphere",
##             BASEGEOGCRS["GRS 1980(IUGG, 1980)",
##                 DATUM["unknown",
##                     ELLIPSOID["GRS80",6378137,298.257222101,
##                         LENGTHUNIT["metre",1,
##                             ID["EPSG",9001]]]],
##                 PRIMEM["Greenwich",0,
##                     ANGLEUNIT["degree",0.0174532925199433]]],
##             CONVERSION["UTM zone 12N",
##                 METHOD["Transverse Mercator",
##                     ID["EPSG",9807]],
##                 PARAMETER["Latitude of natural origin",0,
##                     ANGLEUNIT["degree",0.0174532925199433],
##                     ID["EPSG",8801]],
##                 PARAMETER["Longitude of natural origin",-111,
##                     ANGLEUNIT["degree",0.0174532925199433],
##                     ID["EPSG",8802]],
##                 PARAMETER["Scale factor at natural origin",0.9996,
##                     SCALEUNIT["unity",1],
##                     ID["EPSG",8805]],
##                 PARAMETER["False easting",500000,
##                     LENGTHUNIT["Meter",1],
##                     ID["EPSG",8806]],
##                 PARAMETER["False northing",0,
##                     LENGTHUNIT["Meter",1],
##                     ID["EPSG",8807]],
##                 ID["EPSG",16012]],
##             CS[Cartesian,2],
##                 AXIS["(E)",east,
##                     ORDER[1],
##                     LENGTHUNIT["Meter",1]],
##                 AXIS["(N)",north,
##                     ORDER[2],
##                     LENGTHUNIT["Meter",1]]]],
##     TARGETCRS[
##         GEOGCRS["WGS 84",
##             DATUM["World Geodetic System 1984",
##                 ELLIPSOID["WGS 84",6378137,298.257223563,
##                     LENGTHUNIT["metre",1]]],
##             PRIMEM["Greenwich",0,
##                 ANGLEUNIT["degree",0.0174532925199433]],
##             CS[ellipsoidal,2],
##                 AXIS["latitude",north,
##                     ORDER[1],
##                     ANGLEUNIT["degree",0.0174532925199433]],
##                 AXIS["longitude",east,
##                     ORDER[2],
##                     ANGLEUNIT["degree",0.0174532925199433]],
##             ID["EPSG",4326]]],
##     ABRIDGEDTRANSFORMATION["Transformation from GRS 1980(IUGG, 1980) to WGS84",
##         METHOD["Position Vector transformation (geog2D domain)",
##             ID["EPSG",9606]],
##         PARAMETER["X-axis translation",0,
##             ID["EPSG",8605]],
##         PARAMETER["Y-axis translation",0,
##             ID["EPSG",8606]],
##         PARAMETER["Z-axis translation",0,
##             ID["EPSG",8607]],
##         PARAMETER["X-axis rotation",0,
##             ID["EPSG",8608]],
##         PARAMETER["Y-axis rotation",0,
##             ID["EPSG",8609]],
##         PARAMETER["Z-axis rotation",0,
##             ID["EPSG",8610]],
##         PARAMETER["Scale difference",1,
##             ID["EPSG",8611]]]]
# If your coordinate system is wrong or fails, you can set a new one:
new_vector <- st_set_crs(new_vector, 4362) # set CRS, 4362 is for NAD83
## Warning: st_crs<- : replacing crs does not reproject data; use st_transform for
## that
ggplot(new_vector) + geom_sf() 

ggplot(data = world) +  geom_sf() +
  coord_sf(xlim = c(-102.15, -30.12), ylim = c(-30, 15), expand = FALSE)

#Japan
ggplot(data = world) +  geom_sf() +
  coord_sf(xlim = c(110, 150.12), ylim = c(25, 55), expand = FALSE)

Reprojecting geographic data

As it was mentioned, there are two kinds of CRS: geographic (long/lat) and projected (meters from datum). Some of functions assume that data has projected information, therefore, we have to be careful and check once we work with out geographic data.

japan <- data.frame(lon = 135, lat = 40) %>% 
  st_as_sf(coords = c("lon", "lat"))
st_is_longlat(japan)
## [1] NA

unless a CRS is manually specified or is loaded from a source that has CRS metadata, the CRS is NA.

japan_geo <- st_set_crs(japan, 4326)
st_is_longlat(japan_geo)
## [1] TRUE
ggplot(data = world) +  geom_sf() + geom_sf(data=japan_geo) +
  coord_sf(xlim = c(110, 150.12), ylim = c(25, 55), expand = FALSE) 

Datasets without a specified CRS can cause problems

japan_buff_no_crs <- st_buffer(japan, dist = 7)
japan_buff <- st_buffer(japan_geo, dist = 7)
## Warning in st_buffer.sfc(st_geometry(x), dist, nQuadSegs, endCapStyle =
## endCapStyle, : st_buffer does not correctly buffer longitude/latitude data
## dist is assumed to be in decimal degrees (arc_degrees).
ggplot(data = world) +  geom_sf() + geom_sf(data=japan_geo) + 
  geom_sf(data=japan_buff, color=alpha("red",1), fill=NA) + 
  coord_sf(xlim = c(110, 150.12), ylim = c(25, 55), expand = FALSE) 

The warning message is telling us that the result may be of limited use because it is in units of latitude and longitude, rather than meters assumed by st_buffer().

The warning suggests to reproject the data onto a projected CRS. This suggestion does not always need to be heeded: performing spatial and geometric operations makes little or no difference in some cases (e.g., spatial subsetting). But for operations involving distances such as buffering, the only way to ensure a good result is to create a projected copy of the data and run the operation on that. This is done in the code chunk below:

japan_proj <- data.frame(x = 4000000, y = 9000000) %>% 
  st_as_sf(coords = 1:2, crs = 27700)

The result is a new object that is identical to japan point, but reprojected onto a suitable CRS (in this case, the British National Grid, which has an EPSG code of 27700) that has units of meters.

st_crs(japan_proj)
## Coordinate Reference System:
##   User input: EPSG:27700 
##   wkt:
## PROJCRS["OSGB 1936 / British National Grid",
##     BASEGEOGCRS["OSGB 1936",
##         DATUM["OSGB 1936",
##             ELLIPSOID["Airy 1830",6377563.396,299.3249646,
##                 LENGTHUNIT["metre",1]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4277]],
##     CONVERSION["British National Grid",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",49,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",-2,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",0.9996012717,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",400000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",-100000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     USAGE[
##         SCOPE["unknown"],
##         AREA["UK - Britain and UKCS 49°46'N to 61°01'N, 7°33'W to 3°33'E"],
##         BBOX[49.75,-9.2,61.14,2.88]],
##     ID["EPSG",27700]]

Notable components of this CRS description include the EPSG code, the projection, the origin, and units. The fact that the units of the CRS are meters tells us that this is a projected CRS. This is used as the new buffer distance:

japan_proj_buff <- st_buffer(japan_proj, 1000000)
ggplot(data = world) +  geom_sf() + geom_sf(data=japan_proj) + 
  geom_sf(data=japan_proj_buff, color=alpha("red",1), fill=NA) + 
  coord_sf(datum = sf::st_crs(27700), xlim = c(110, 150.12), ylim = c(25, 55), expand = FALSE) 

When to reproject?

In the previous section we set the CRS manually, with st_set_crs(japan, 4326). In real applications, CRSs are usually set automatically. The main task involving CRSs is often to transform objects, from one CRS into another. But when should data be transformed? And into which CRS?

First, lets check what are the distances between two datsets:

#st_distance(japan_geo, japan_proj)

To make the japan and japan_proj objects geographically comparable one of them must be transformed into the CRS of the other. But which CRS to use? The answer is usually ‘to the projected CRS’, which in this case is the British National Grid (EPSG:27700):

japan2 <- st_transform(japan_geo, 27700)

Now that a transformed version of japan has been created, using the sf function st_transform(), the distance between the two representations of japan can be found.

st_distance(japan2, japan_proj)
## Units: [m]
##          [,1]
## [1,] 110357.7

Which CRS to use?

For geographic CRS, the answer is often WGS84 because it is the most common CRS in the world, so it is worth knowing its EPSG code: 4326.

In cases where an appropriate CRS is not immediately clear, the choice of CRS should depend on the properties that are most important to preserve in the subsequent maps and analysis. All CRSs are either equal-area, equidistant, conformal (with shapes remaining unchanged), or some combination of compromises of those.

When deciding on a custom CRS, consider these ones:

  • Lambert azimuthal equal-area (LAEA) projection for a custom local projection, which is an equal-area projection at all locations but distorts shapes beyond thousands of kilometers.
  • Azimuthal equidistant (AEQD) projections for a specifically accurate straight-line distance between a point and the centre point of the local projection.
  • Lambert conformal conic (LCC) projections for regions covering thousands of kilometers, with the cone set to keep distance and area properties reasonable between the secant lines.
  • Stereographic (STERE) projections for polar regions, but taking care not to rely on area and distance calculations thousands of kilometers from the center.
  • Universal Transverse Mercator (UTM), a set of CRS that divides the Earth into 60 longitudinal wedges and 20 latitudinal segments. The transverse Mercator projection used by UTM CRS is conformal but distorts areas and distances with increasing severity with distance from the center of the UTM zone.

To learn how the system works, let’s create a function, lonlat2UTM() to calculate the EPSG code associated with any point on the planet as follows:

lonlat2UTM = function(lonlat) {
  utm = (floor((lonlat[1] + 180) / 6) %% 60) + 1
  if(lonlat[2] > 0) {
    utm + 32600
  } else{
    utm + 32700
  }
}

The following command uses this function to identify the UTM zone and associated EPSG code for Nagoya and the Japan point we previously created:

epsg_utm_nag <- lonlat2UTM(c(136.9, 35.179))
epsg_utm_jpn <- lonlat2UTM(st_coordinates(japan))
st_crs(epsg_utm_nag)$proj4string
## [1] "+proj=utm +zone=53 +datum=WGS84 +units=m +no_defs"
st_crs(epsg_utm_jpn)$proj4string
## [1] "+proj=utm +zone=53 +datum=WGS84 +units=m +no_defs"

Maps of UTM zones such as that provided by dmap.co.uk confirm that London is in UTM zone 53U.

Reprojecting vector geometries

Using data for the cycle hire locations across London. The function st_crs() gives us a list of class crs, with names proj4string and epsg for its code. This is demonstrated below:

crs_lnd <- st_crs(cycle_hire_osm)
class(crs_lnd)
## [1] "crs"
crs_lnd$epsg
## [1] 4326

This duality of CRS objects means that they can be set either using an EPSG code or a proj4string.

cycle_hire_osm_projected <- st_transform(cycle_hire_osm, 27700)

The resulting object has a new CRS with an EPSG code 27700. We can find the name of the CRS:

crs_codes <- rgdal::make_EPSG()[1:2]
dplyr::filter(crs_codes, code == 27700)
##    code                              note
## 1 27700 OSGB 1936 / British National Grid
st_crs(27700)$proj4string
## [1] "+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +units=m +no_defs"

Modifying map projections

When mapping the world while preserving area relationships, the Mollweide projection is a good choice. To use this projection, we need to specify it using the proj4string element, “+proj=moll”, in the st_transform function:

world_mollweide <- st_transform(world, crs = "+proj=moll")
ggplot(world_mollweide) + geom_sf()

The below code transforms the coordinates to the Lambert azimuthal equal-area projection centered on longitude and latitude of 0.

world_laea1 <- st_transform(world, crs = "+proj=laea +x_0=0 +y_0=0 +lon_0=0 +lat_0=0")
ggplot(world_laea1) + geom_sf()

We can change the PROJ parameters, for example the center the map centered on New York City, using the +lon_0 and +lat_0 parameters.

world_laea2 <- st_transform(world, crs = "+proj=laea +x_0=0 +y_0=0 +lon_0=-74 +lat_0=40")
ggplot(world_laea2) + geom_sf()