Citation

Please cite this tutorial as:

Palacios, D.M. 2016. Marine Boundaries in R: Reading EEZ Shapefiles. RPubs. Accessed dd/mm/yyyy. https://rpubs.com/danielequs/marine_boundaries

2017 updates

As an additional resource, I recently became aware of the excellent Intro to Spatial Analysis in R by Jamie Afflerbach, which covers handling shapefiles, GeoTIFF files, and satellite remote sensing data in R.

Also, the new R package sf (for simple features) by Edzer Pebesma, makes working with complex geospatial objects much easier, including reading and mapping shapefiles. In the future I hope to update this post to implement sf functions.

The reluctant shapefile user

I don’t expect to win the 2016 R Shapefile Contest with this entry, but the contest inspired me to document a simple but rather tedious process I have had to perform on occasion. The task consists of reading in R a shapefile containing a marine political boundary (i.e., a polygon defined by longitude and latitude coordinates) known as the Exclusive Economic Zone (EEZ) that can be used to constrain further analyses.

(August 1, 2016 update: the Results from the R Shapefile Contest were posted… I got an honorable mention :-))

I’ve never been a fan of GIS software and my choice for using a language like R is that it has robust libraries for conducting virtually any type of analysis I might encounter, from data input to final graphing and mapping. However, on occasion I am provided data by colleagues, government agencies, or funders that come as shapefiles, and by necessity I need to be able to bring them into R. I was initially bogged down by the number of R packages that can read shapefiles (but grateful they exist, nonetheless). From my informal survey, these packages and corresponding functions include:

However, my impression from the majority of posts I read on Stack Overflow, GitHub, and elsewhere on the internet is that rgdal is the preferred package for vector data, and raster for raster data. If there are better options please let me know via Twitter. Also, it may be that some of the more derived packages simply implement wrappers around the functions from the more general packages.

One of the things I find especially frustrating about working with shapefiles is their complex hierarchical structure and the large amounts of metadata they contain. One can spend a lot of time trying to understand their structure and contents. That said, this is not unique to shapefiles; I have worked extensively with other file formats like netCDF and HDF that follow analogous structures.

As we will see in the remainder of this post, in the case of shapefiles the standard appears to be that the main data are under the object @data and, within it, the longitude and latitude vectors of interest are under $long and $lat, respectively. Also, the rgdal function ogrInfo() can be used to generate a listing of the “features” and “fields” in a shapefile, which can come in handy for further refining the specification of the data we’re interested in extracting.

Preliminaries

Load the R packages we will require:

library("rgdal") # for `ogrInfo()` and `readOGR()`
library("tools") # for `file_path_sans_ext()`
library("dplyr") # for `inner_join()`, `filter()`, `summarise()`, and the pipe operator (%>%)
library("ggplot2") # for `fortify()` and for plotting
library("sp") # for `point.in.polygon()` and `spDists()`
library("tidyr") # for `gather()`
library("readr") # for `write_tsv()`
library("mapproj") # for ggplot:coord_map()

Provide the function fortify.shape(), which puts the shapefile data in the object class data.frame, so that it can be used by ggplot2. (These steps follow Hadley Wickham’s post “Plotting polygon shapefiles”).

fortify.shape <- function(x){
  x@data$id <- rownames(x@data)
  x.f <- fortify(x, region = "id")
  x.join <- inner_join(x.f, x@data, by = "id")
}

Provide the function subset.shape(), which we will use to extract portions of the data (from the fortified data.frame object) for a smaller domain, since shapefiles often contain global data.

subset.shape <- function(x, domain){
  x.subset <- filter(x, long > domain[1] & 
                       long < domain[2] & 
                       lat > domain[3] & 
                       lat < domain[4])
  x.subset
}
# domain should be a vector of four values: c(xmin, xmax, ymin, ymax)

Note that the two functions fortify.shape() and subset.shape() could be provided inside a single function, such as the quick.subset() function in this example by Simon Goring (which, by the way, beautifully illustrates the use of various layers available from Natural Earth). While this conserves memory by not retaining the intermediate fortified object (which can be quite large), in our application we sometimes require fortifying but not subsetting a shapefile, so we provide the two functions separately.

Background on the EEZ

First some background on the EEZ. Management and conservation strategies dealing with a nation’s marine resources rely on relevant data from its jurisdictional waters. Through the United Nations Convention on the Law of the Sea (UNCLOS), countries have established their maritime limits and boundaries. The largest jurisdictional boundary is the Exclusive Economic Zone (EEZ), which extends 200 nautical miles from the Territorial Sea baseline, and includes the maritime boundaries with adjacent/opposite countries. Shoreward of this baseline, the Territorial Sea extends for 12 nautical miles.

For applications at the national level, scientists and managers are regularly required to assess marine resources within a country’s jurisdictional waters, i.e., where policy decisions can have an effect. For practical purposes, we’re interested in obtaining observations occurring in both the Territorial Sea and the adjacent EEZ, and we refer to them as “occurring within the EEZ” (also known as the “internal waters”). Because marine wildlife (say whales, which happen to be the taxa I study) regularly move for large distances across these boundaries, the first task for the analyst is to extract only those observations (collected by surveys or from satellite tracking) occurring within the EEZ.

Plotting the coastline and some animal observations

To exemplify the process we will focus on the EEZ jurisdictional boundary of the USA in the Pacific Ocean. But first things first: in order to get situated, we start by drawing the coastline and producing some observations.

There are many sources of data for the coastline, including data sets built into R packages maps and mapdata (for examples see this RPub presentation). However, since this post is about shapefiles we will use the Natural Earth 10-m-resolution coastline shapefile:

Specify the local directory and name of the Natural Earth shapefile (previously downloaded) and read its contents (global coastline data):

path.ne.coast <- ("/Users/palaciod/Documents/DMP/dmp_data/Natural Earth/ne_10m_coastline")
fnam.ne.coast <- "ne_10m_coastline.shp"
dat.coast <- readOGR(dsn = path.ne.coast, 
                     layer = file_path_sans_ext(fnam.ne.coast))
# A Large SpatialLinesDataFrame object with 4132 features and 2 fields (12.8 Mb)

Next, let’s fortify the global data and then extract the western coast of the USA along the Pacific Ocean for easier handling:

# Fortify the shapefile data using `fortify.shape()`:
dat.coast <- fortify.shape(dat.coast) # a 410951x8 dataframe

# Specify the desired domain (the West Coast of the USA):
domain <- c(-128, -115, 29, 50)

# Extract the coastline data for the desired domain using `subset.shape()`:
dat.coast.wc <- subset.shape(dat.coast, domain) # a 4871x8 dataframe

Produce a map of this coastline:

# Specify the spatial extent for our map (i.e., our study area; notice that its
# dimensions are different from the domain for which we extracted the
# coastline):
xlims <- c(-135, -116)
ylims <- c(29.5, 49.5)

# Generate a base map with the coastline:
p0 <- ggplot() + 
  geom_path(data = dat.coast.wc, aes(x = long, y = lat, group = group), 
            color = "black", size = 0.25) + 
  coord_map(projection = "mercator") + 
  scale_x_continuous(limits = xlims, expand = c(0, 0)) + 
  scale_y_continuous(limits = ylims, expand = c(0, 0)) + 
  labs(list(title = "", x = "Longitude", y = "Latitude"))
p0

Next, we simulate 500 spatial data points representing whale observations at sea. We pick a spot on the map as the center around which observations will be generated in a 10-degree radius (taking care not to generate observations on land). We also simulate the number of animals associated with each observation as a discrete variable with a Poisson distribution that is a function of the distance (using function spDists() in the sp package) between adjacent observations, suggesting a spatial aggregation behavior.

# Simulate 500 whale observations:
set.seed(250)
coord.ctr <- c(-124.25, 40)
long <- runif(n = 500, min = coord.ctr[1]-10, max = coord.ctr[1])
lat <- runif(n = 500, min = coord.ctr[2]-10, max = coord.ctr[2]+10)

# Compute all pair-wise distances between observations:
dist.obs <- spDists(cbind(long, lat), longlat = TRUE) # a matrix, in km
diag(dist.obs) <- NA # convert 0s to NAs along the diagonal

# For each observation obtain the distance to the nearest observation:
obs.dist.near <- as.data.frame(dist.obs) %>% 
  gather(key = obs_pair, value = pair_dist) %>% 
  group_by(obs_pair) %>% 
  summarise(near_dist = min(pair_dist, na.rm = TRUE))
obs.dist.near <- transform(obs.dist.near, obs_pair = as.factor(obs_pair))
#range(obs.dist.near$near_dist) # 1.668178 132.716169 km

# Simulate group size for each observation:
set.seed(123)
n.whales <- rpois(500, lambda = sqrt(obs.dist.near$near_dist)/4)+1
#range(n.whales) # 1-8

# Combine all vectors into a data frame:
sim.obs <- data.frame(long = long, lat = lat, n.whales = n.whales) # 500x3

Note that the scaling of the lambda parameter above (square-root of the distance divided by 4) is simply to achieve values like those observed in nature. Adding 1 avoids zeros, as all observations should be of at least one animal (the final range is 1-8 in this particular case).

Let’s examine the distribution of n.whales. Note that the peaks reflect the discrete nature of this variable.

ggplot(data = sim.obs, aes(x = n.whales, ..count..)) + 
  geom_density(adjust = 1, colour = "blue", fill = "blue", alpha = 0.25)

Let’s also map the observations, color-coded by group size.

p1 <- p0 + 
  geom_point(data = sim.obs, 
             aes(x = long, y = lat, colour = as.factor(n.whales)), 
             shape = 16, size = 1) + 
  scale_colour_brewer(palette = "YlOrBr", 
                      guide_legend(title = "Number \nof whales"))
p1

Reading NOAA’s USA EEZ shapefile

For the USA, the maritime boundaries are provided as a shapefile by NOAA’s Office of Coast Survey.

Specify the local directory and name of the NOAA shapefile (previously downloaded) and read the USA EEZ data:

path.eez.usa <- ("/Users/palaciod/Documents/DMP/dmp_data/World_EEZ/USMaritimeLimitsAndBoundariesSHP")
fnam.eez.usa <- "USMaritimeLimitsNBoundaries.shp"
eez.usa <- readOGR(dsn = path.eez.usa, layer = file_path_sans_ext(fnam.eez.usa))
# eez.usa has 259 features and 16 fields
# A Large SpatialLinesDataFrame object with 259 features and 16 fields (3.3 Mb)

# Fortify the shapefile data using `fortify.shape()`:
dat.eez.usa1 <- fortify.shape(eez.usa) # a 180400x22 dataframe

This file contains boundaries for the USA and its territories in several oceans including the Atlantic, Pacific, and around Alaska. Maintaining our focus, let’s visualize the file contents for the Pacific Ocean. We use the field REGION to subset the area of interest.

p.eez.noaa <- p0 +  
  geom_path(data = filter(dat.eez.usa1, REGION == "Pacific Coast"), 
            aes(x = long, y = lat, group = group), 
            colour = "red", size = 0.75)
p.eez.noaa

As promised in the information on NOAA’s Office of Coast Survey webpage, this shapefile contains maritime boundaries for the EEZ, the Contiguous Zone, and the Territorial Sea, in this case for the Pacific Coast region. Let’s separate them by color for better visualization by also specifying the fields TS, CZ, and EEZ.

p.eez.noaa2 <- p0 + 
  geom_path(data = filter(dat.eez.usa1, REGION == "Pacific Coast" & TS == 1), 
            aes(x = long, y = lat, group = group), 
            colour = "tomato2", size = 0.75) + 
  geom_path(data = filter(dat.eez.usa1, REGION == "Pacific Coast" & CZ == 1), 
            aes(x = long, y = lat, group = group), 
            colour = "purple2", size = 0.75) + 
  geom_path(data = filter(dat.eez.usa1, REGION == "Pacific Coast" & EEZ == 1), 
            aes(x = long, y = lat, group = group), 
            colour = "royalblue2", size = 0.75)
p.eez.noaa2

While technically these boundaries are correct, two issues arise that affect our ability to use these data:

Ideally, we want a closed polygon that extends from the EEZ to the coast and that includes the coastline (i.e., the “internal waters”). This could be achieved by brute force after much manipulation of the EEZ and coastline data sets, but fortunately there is a better option …

Reading the global EEZ shapefile from Marineregions.org

As an alternative to NOAA’s shapefile, the EEZ for countries worldwide can be obtained in a single shapefile from the Maritime Boundaries section of the Marineregions.org website.

(September 7, 2017 update: my original post used version 8 of the EEZ file. Marineregions.org has since produced version 9, which was completely recalculated and required some revisions to my code. The code for reading v. 8 is left commented below and new lines for v. 9 are added).

Note that low- and high-resolution options of the boundaries are available. For this application the low-resolution version is quite sufficient, especially because the difference in file size is quite substantial (12.4 MB vs. 164.7 MB). (Also note also that v. 9 has options for coordinates in the 180- and 360-degree systems, which can come in handy for some applications).

Specify the local directory and name of the Marineregions.org shapefile (previously downloaded) and read the global EEZ data:

#path.eez.world.v8 <- ("/Users/danielpalacios/Documents/DMP/dmp_data/World_EEZ/World_EEZ_v8_20140228_LR")
path.eez.world.v9 <- ("/Users/palaciod/Documents/DMP/dmp_data/World_EEZ/World_EEZ_v9_20161021_LR")
#fnam.eez.world.v8 <- "World_EEZ_v8_2014.shp"
fnam.eez.world.v9 <- "eez_lr.shp"
#eez.world.v8 <- readOGR(dsn = path.eez.world.v8, 
#                        layer = file_path_sans_ext(fnam.eez.world.v8))
eez.world.v9 <- readOGR(dsn = path.eez.world.v9, 
                        layer = file_path_sans_ext(fnam.eez.world.v9))
# A Large SpatialLinesDataFrame object with 281 features and 23 fields (18.9 Mb)

# Extract the EEZ for the USA:
#dat.eez.usa2 <- eez.world.v8[eez.world.v8@data$Country == "United States", ]
# For v. 9 use $Territory1 instead of $Country:
dat.eez.usa2 <- eez.world.v9[eez.world.v9@data$Territory1 == "United States", ]
# A Formal class Large SpatialPolygonsDataFrame

# Fortify the shapefile data:
#dat.eez.usa2 <- fortify(dat.eez.usa2)
# `fortify.shape()` did not work for v. 8 so we had to use `fortify()`.
# message: Regions defined for each Polygons
dat.eez.usa2 <- fortify.shape(dat.eez.usa2) # a 10298x30 dataframe

Some snooping around with str(dat.eez.usa2) indicates that dat.eez.usa2$piece == 2 indexes the polygons in the Pacific Ocean of the USA east of the Dateline, while dat.eez.usa2$PolygonID == 221 indexes just the West Coast (the polygon we’re interested in). (Also, note that dat.eez.usa2$hole == TRUE indexes islands/smaller polygons, so this variable might offer an additional criterion to include/exclude small polygons internal to the main polygons of interest).

p.eez.vliz <- p0 + 
  geom_path(data = filter(dat.eez.usa2, piece == 2 & PolygonID == 221), 
            aes(x = long, y = lat, group = group), 
            colour = "blue", size = 0.75)
p.eez.vliz

This looks exactly like what we want. We’re in business!

Extracting simulated whale observations inside the EEZ

Let’s extract the Pacific EEZ polygon into a dataframe for further use:

dat.eez.pac <- droplevels(filter(dat.eez.usa2, piece == 2 & PolygonID == 221)) # 2031x30

And export this polygon (long, lat vectors) as a tab-separated text file so that we don’t have to read and extract it every time we want to use it:

dat.eez.exp <- dat.eez.pac %>% select(long, lat) # 2031x2
path.eez.exp <- ("/Users/danielpalacios/Documents/DMP/dmp_data/World_EEZ/")
fnam.eez.exp <- "eez_usa_pac_lr_v9.txt"
write_tsv(dat.eez.exp, path = paste(path.eez.exp, fnam.eez.exp, sep = "/"))

[Unlike write.table() or write.csv(), write_tsv() in the readr package writes the data frame to a delimited text file without row names].

Use function point.in.polygon() in the sp package to determine if a simulated whale observation is inside the EEZ (0 is outside, 1 is inside), and extract those observations in a separate dataframe:

inside.eez <- point.in.polygon(sim.obs$long, sim.obs$lat, 
                           dat.eez.pac$long, dat.eez.pac$lat) # 500x1

# Add a column to sim.obs with this information:
sim.obs$eez <- inside.eez # 500x4

# Extract the observations in sim.obs that occur inside the EEZ:
sim.obs.eez <- sim.obs %>% 
  filter(eez == 1)  %>% 
  select(-eez) # 126x3

In this case 126 observations occurred inside the EEZ. Plot them to verify the process worked:

p2 <- p0 + 
  geom_point(data = sim.obs, aes(x = long, y = lat), 
             colour = "gray75", shape = 16, size = 1) + 
  geom_point(data = sim.obs.eez, 
             aes(x = long, y = lat, colour = as.factor(n.whales)), 
             shape = 16, size = 1) + 
  scale_colour_brewer(palette = "YlOrBr", 
                      guide_legend(title = "Number \nof whales")) + 
  geom_path(data = dat.eez.pac, aes(x = long, y = lat, group = group), 
            colour = "blue", size = 0.75)
p2

Looks good! Here’s where the fun begins for the analyst, who might be interested in finding out if there are spatial patterns in these data. For example, let’s see if there are “hotspots” of aggregation using 2-D kernel density estimation.

p3 <- p0 + 
  stat_density_2d(data = sim.obs.eez, geom = "polygon", 
                  aes(x = long, y = lat, fill = ..level..)) + 
  scale_fill_distiller(palette = "Spectral", 
                       guide_legend(title = "Probability \ndensity")) + 
  geom_point(data = sim.obs, aes(x = long, y = lat), 
             colour = "gray75", shape = 16, size = 1) + 
  geom_point(data = sim.obs.eez, aes(x = long, y = lat), 
             colour = "black", shape = 16, size = 1) + 
  geom_path(data = dat.eez.pac, aes(x = long, y = lat, group = group), 
            colour = "blue", size = 0.75) + 
  labs(list(title = "Simulated observations in \nthe Pacific EEZ of the USA", 
            x = "Longitude", y = "Latitude"))
p3

Obviously, since the simulated data were generated at random we don’t see strong patterns like we might observe in nature (the north-south anisotropy is an effect of the data having been generated with a restriction in longitude to avoid observations on land). Observed spatial patterns from such visualization with real data might be very useful in delineating zones of biological importance where special measures for management and conservation might be targeted, for instance in the designation of Critical Habitat.

We might also want to look a little closer at the spatial pattern we built into group size (n.whales), which could be achieved by generating an interpolated surface for this variable over the relevant longitude-latitude domain, for example using techniques like loess, GAM, inverse distance weighting, or kriging. The fun is almost endless, but we’ll leave it at this for now.

References

Flanders Marine Institute (2016). Maritime Boundaries Geodatabase: Maritime Boundaries and Exclusive Economic Zones (200NM), version 9. Available online at http://www.marineregions.org/ http://dx.doi.org/10.14284/242.

Office of Coast Survey (2013). Maritime Limits and Boundaries of United States of America, edition 4.2. Department of Commerce, National Oceanic and Atmospheric Administration, National Ocean Service, Office of Coast Survey (OCS). Silver Spring, Maryland. Available online at http://www.nauticalcharts.noaa.gov/csdl/mbound.htm