This is part of a series of introductory tutorials to work with geo-spatial data in R. I prepared these tutorials as a intuitive “hands on” introduction, but I provide links for those interested in more background and theory.
For this tutorial we will import and read spatial objects using some libraries in R. We will explore three packages that allow us to read spatial data in vector and raster format, and give examples of operations, analysis and visualisation of such data.
This tutorial consists entirely of commented code, and does not have exercises or questions. You can read the sections here and try the chunks of code in your own Rstudio session.
The focus of this tutorial will be on three packages for download of spatial data:
library(wdpar)
## Loading required package: sf
## Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
library(elevatr)
## elevatr v0.99.0 NOTE: Version 0.99.0 of 'elevatr' uses 'sf' and 'terra'. Use
## of the 'sp', 'raster', and underlying 'rgdal' packages by 'elevatr' is being
## deprecated; however, get_elev_raster continues to return a RasterLayer. This
## will be dropped in future versions, so please plan accordingly.
library(chirps)
Note that the package wdpar
requires the PhantomJS
executable, you might need to install this with:
install.packages("wdpar")
webdriver::install_phantomjs()
We will also be using additional functions for spatial operations and visualisation from the following packages:
library(sf)
library(terra)
## terra 1.7.55
library(leaflet)
Throughout this tutorial we will be using the pipe operator,
%>%
. You can use the pipe to rewrite multiple operations
in a way that you can read left-to-right, top-to-bottom. We’ll use
piping frequently because it considerably improves the readability of
code. The pipe is a defining feature of the tidyverse, so we will load
this set of packages as well.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ tidyr::extract() masks terra::extract()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
The tidyverse package also lubridate
that has functions
for handling dates, and ggplot2
, which allow us to create
plots in a similar modular fashion, but using the +
operator instead of the pipe.
We will focus on Bhutan. According to wikipedia, Bhutan is a landlocked South Asian country situated in the Eastern Himalayas, between China in the north and India in the south.
A mountainous country, Bhutan is known locally as “Druk Yul” or “Land of the Thunder Dragon”. The subalpine Himalayan mountains in the north rise from the country’s lush subtropical plains in the south. In the Bhutanese Himalayas, there are peaks higher than 7,000 metres (23,000 ft) above sea level.
Bhutan’s climate is as varied as its altitudes and is affected by monsoons. Western Bhutan is particularly affected by monsoons that bring between 60 and 90 per cent of the region’s rainfall.
Protected Planet provides the most comprehensive data for conservation areas worldwide. Specifically, it provides the World Database on Protected Areas (WDPA) and the World Database on Other Effective Area-Based Conservation Measures (WDOECM). We can take a look at the protected areas of Bhutan.
The wdpar
package provides an interface to data available on these datasets.
It also provides methods for cleaning data from these databases
following best practices.
We use the function wdpa_fetch
to download protected
area data for Bhutan from Protected Planet. We can achieve this by
specifying Bhutan’s country name or its ISO3 code (i.e. “BTN”).
I preloaded this query for the tutorial, but if you want to replicate these steps in an R session, you have two options:
This is the default behaviour, good enough for a test:
wdpa_query <- wdpa_fetch("BTN")
If you download the data into a persistent directory R won’t have to re-download the same dataset every time we restart our R session, and R can simply re-load previously downloaded datasets as needed.
First locate a local data folder for storing the data between
sessions. For example, using package here
:
here::i_am("workshop/tutorial-5/tutorial-5-read-raster-data.Rmd")
## here() starts at /Users/z3529065/proyectos/codeRs/geospatial-data-in-R
local_data_dir <- here::here("data")
Now, check if the data folder exists, and apply the
wdpa_query
function with an extra argument:
# Check if data folder exists
if (dir.exists(local_data_dir)) {
wdpa_query <- wdpa_fetch("BTN", download_dir = local_data_dir)
}
We can display the multiple columns of information returned by the database:
wdpa_query
## Simple feature collection with 22 features and 28 fields
## Geometry type: GEOMETRY
## Dimension: XY
## Bounding box: xmin: 88.87734 ymin: 26.70286 xmax: 92.12475 ymax: 28.24768
## Geodetic CRS: WGS 84
## # A tibble: 22 × 29
## WDPAID WDPA_PID PA_DEF NAME ORIG_NAME DESIG DESIG_ENG DESIG_TYPE IUCN_CAT
## * <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 555555576 5555555… 1 Bumd… Bumdeling Rams… Ramsar S… Internati… Not Rep…
## 2 5061 5061 1 Jigm… Jigme Do… Nati… Jigme Do… National II
## 3 5066 5066 1 Jigm… Jigme Si… Nati… Jigme Si… National II
## 4 5067 5067 1 Jomo… Jomotsan… Wild… Jomotsan… National IV
## 5 7996 7996 1 Roya… Royal Ma… Nati… Royal Ma… National II
## 6 20445 20445 1 Phru… Phrumsen… Nati… Phrumsen… National II
## 7 64414 64414 1 Jigm… Jigme Kh… Stri… Jigme Kh… National Ia
## 8 64415 64415 1 Sakt… Sakteng … Wild… Sakteng … National IV
## 9 555555575 5555555… 1 Khot… Khotokha Rams… Ramsar S… Internati… Not Rep…
## 10 64417 64417 1 Phib… Phibsoo … Wild… Phibsoo … National IV
## # ℹ 12 more rows
## # ℹ 20 more variables: INT_CRIT <chr>, MARINE <chr>, REP_M_AREA <dbl>,
## # REP_AREA <dbl>, NO_TAKE <chr>, NO_TK_AREA <dbl>, STATUS <chr>,
## # STATUS_YR <int>, GOV_TYPE <chr>, OWN_TYPE <chr>, MANG_AUTH <chr>,
## # MANG_PLAN <chr>, VERIF <chr>, METADATAID <int>, SUB_LOC <chr>,
## # PARENT_ISO <chr>, ISO3 <chr>, SUPP_INFO <chr>, CONS_OBJ <chr>,
## # geometry <GEOMETRY [°]>
The data is in the simple feature format, and we can apply functions
from package sf
to this dataset. For example calculate the
area of the spatial polygons:
st_area(wdpa_query)
## Units: [m^2]
## [1] 0 4380694698 1732881847 362780280 1059071253 907923436
## [7] 785350528 743018377 0 287333358 1535923663 4921758698
## [13] 255925613 292240882 408386824 595556490 206088757 232976328
## [19] 420192741 559502046 9813200 91342995
We will clean the data set using the function
wdpa_clean()
. This function triggers a workflow for
cleaning data from the WDPA database following best practices (outlined
in Runge et al. (2015) and Butchart et al. (2015)). For more information on
the data cleaning procedures applied here, see the help page of that
function.
Bhutan_PAs <- wdpa_clean(wdpa_query)
We can use functions from package leaflet
to create an
interactive preview of our data. Notice that we transform the data to a
rectangular coordinate reference system when we feed this into the
leaflet
function:
leaflet(Bhutan_PAs %>% st_transform("EPSG:4326")) %>%
addTiles() %>%
addPolygons(color = "green", weight = 2, label = Bhutan_PAs$ORIG_NAME)
Alternatively you could use functions from packages
mapview
or tmap
for the same purpose. See
other tutorials in this workshop for more information on these
packages.
Elevation data is used for a wide array of applications, including,
for example, visualization, hydrology, and ecological modelling. A
variety of APIs now exist that provide programmatic access to elevation
data. The elevatr
package was written to standarize access to elevation data from web
APIs.
There are several sources for digital elevation models such as the
Shuttle Radar Topography Mission (SRTM), the USGS National Elevation
Dataset (NED), Global DEM (GDEM), and others. The Terrain
Tiles on Amazon Web Services provides access to a synthesis
elevation product (created by the defunct Mapzen project) and can be
downloaded using the function get_elev_raster()
.
Here we select one protected area from our dataset:
JSW_NP <- Bhutan_PAs %>% filter(ORIG_NAME %in% "Jigme Singye Wangchuck National Park")
We provide this simple feature object to the function
get_elev_raster
and choose an intermediate zoom level
(z = 9
) that is a trade
off between resolution and time for download.
elev_data <- get_elev_raster(JSW_NP, z = 9)
## Mosaicing & Projecting
## Note: Elevation units are in meters.
Now we apply the function rast
from package
terra
to create a raster layer object:
JSW_NP_DEM <- rast(elev_data)
We can plot the raster data and show the protected area boundary on top:
plot(JSW_NP_DEM)
lines(JSW_NP, lwd=3, alpha=.6)
Or we can generate contour lines from this elevation surface.
plot(JSW_NP_DEM)
contour(JSW_NP_DEM,add=T)
The chirps package provides functionalities for reproducible analysis using the CHIRPS data (Funk et al. 2015). CHIRPS is a daily precipitation data set developed by the Climate Hazards Group for high resolution precipitation gridded data. Spanning 50°S - 50°N (and all longitudes) and ranging from 1981 to near-present (normally with a 45 day lag), CHIRPS incorporates 0.05 arc-degree resolution satellite imagery, and in-situ station data to create gridded precipitation time series for trend analysis and seasonal drought monitoring.
We take four random points within Bhutan protected areas to get the precipitation data for a full year.
First we will use the function st_transform
to reverse
the projection of the spatial object created before. Then we will use
the st_sample
function to select four random points within
the polygons of the protected areas.
set.seed(1227834)
Bhutan_PAs_latlong <- st_transform(Bhutan_PAs,"EPSG:4326")
JSW_NP_sample <- st_sample(Bhutan_PAs_latlong, 4) %>%
st_as_sf()
Let’s look at the location of these points:
plot(st_geometry(Bhutan_PAs_latlong))
plot(JSW_NP_sample, col=1:4, cex=2, pch=19, add=TRUE)
legend("left", fill=1:4,legend=1:4)
For this example we fetch the data from January to December 2020
using get_chirps()
. We use the server “ClimateSERV” using
the argument server = "ClimateSERV"
. This option is
recommended when working with few data points as the request could be
faster.
qry_dates <- c("2020-01-01","2020-12-31")
precip_data <- get_chirps(JSW_NP_sample,
qry_dates,
server = "ClimateSERV")
## Fetching data from ClimateSERV
## Getting your request...
We summarise the data by months using functions from libraries
lubridate
and dplyr
and use
ggplot
to create a plot of monthly precipitations.
precip_data %>%
mutate(month = factor(month(date))) %>%
group_by(id,month) %>%
summarise(total=sum(chirps), .groups="drop") %>%
ggplot() +
geom_col(aes(x=month, y = total)) +
facet_wrap(~id)
Points 1 and 2 are drier, while point locations 3 and 4 receive much more precipitation but have different seasonal patterns.
We can download raster data by specifying a polygon or bounding box.
We will first transform our protected area layer into a spatial vector
object using the function vect
from the terra
package (this is a bit annoying, but package terra
uses
different functions as sf
):
Bhutan_vect <- vect(Bhutan_PAs_latlong)
We will download data for all days in the month of July of 2020. The
default server = "CHC"
is used for this query, as it is
more efficient for multiple data points and dates.
qry_dates <- c("2020-07-01","2020-07-30")
precip_map <- get_chirps(Bhutan_vect,
qry_dates,
server = "CHC",
as.raster = TRUE)
## Fetching data as GeoTIFF files from CHC server
## Getting CHIRPS in a .05 deg resolution
The downloaded data includes one raster layer for each day of the month of July of 2020, we can view a couple of dates using the index of the layers:
plot(precip_map,c(1,10,20,30))
Or we can summarise the month data applying the sum
function to the raster layer object:
plot(sum(precip_map))
lines(Bhutan_vect)
points(JSW_NP_sample, col=1:4, cex=2)
text(Bhutan_vect,"ORIG_NAME",cex=0.5)
legend("top", fill=1:4,legend=1:4, xpd=NA, ncol=4)
I hope this has been useful. After this brief introduction you probably want to keep exploring other packages and data sources that can help you in your quest to discover the world!
Feel free to check the other tutorials in this workshop at the UNSW codeRs’ geospatial-data repository.