ANM302: Spatial analysis using RStudio: terra, sf and tmap

Author

Kylie Scales

Visualising and manipulating environmental data using the packages terra, ggplot2 and sf

Computer Lab 1: Importing, plotting, cropping spatial data in terra

Creating a new project

One of the most powerful and useful aspects of RStudio is its project management functionality. This ensures that all of our R code, data, and R-generated figures are stored together in a single project folder in your directory.

To create a New Project in RStudio:

  • Click the File menu button, then New Project….
  • Click Existing Directory.
  • Browse… to the location where your saved folder ANM302_ComputerLab is stored.
  • Click on this folder, then click Open. You want to read data (and write code, data and figures) to this ANM302_ComputerLab folder, not to the R or Data sub folders that lie within.
  • Click Create Project.

This step will allow you to keep your code for ANM302 together and easily accessible.

Now save and close any script files you have open, then quit RStudio.

Next, use your file browser to navigate to the ANM302_ComputerLab folder where your ANM302 data is stored. You should now see a file with the extension .Rproj. This is your ANM302 project.

Remembering that you have quit RStudio, you can now relaunch it by double-clicking your .Rproj file. This should be your “go-to” way of opening RStudio. It has one specific advantage beyond simply keeping track of your scripts: it automatically sets the working directory, saving us a bit of time and effort every week.
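
A quick way to confirm this is to ask R where it is working once the project is open. (A small sketch; the file.exists() check assumes the Data sub folder from your ANM302_ComputerLab download is in place.)

getwd() # Should print the path to your ANM302_ComputerLab folder
file.exists("Data") # TRUE if relative paths like "Data/..." will resolve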

Installing R packages

If this is the first time you have run this R script, you will need to install the necessary R packages to your local R library. We’ve put together a handy R script that allows you to install these packages in a couple of lines of code. You can source this script from your local folder using the code below.

#### Install packages
source("ANM302_install_packages_2026.R")
#### Update packages
update.packages(ask = FALSE,
                lib = my_lib, # my_lib is assumed to be defined by the sourced install script
                repos = "https://cloud.r-project.org")

Note that you only need to install R packages once (well, until the next time you download or update R).
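
If you prefer not to re-run the install script each time, here is a minimal “install only what’s missing” sketch (the package list is just an example subset):

# Install a package only if it isn't already in the local library
pkgs <- c("terra", "sf", "tmap", "ggplot2")
missing <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(missing) > 0) {
  install.packages(missing, repos = "https://cloud.r-project.org")
}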

Loading R libraries

Once packages are installed, you can load them from your local library using the library() function.

#### Load packages
library(terra)
library(maps)
library(sf)
library(ggplot2)
library(tmap)
library(tmaptools)
library(ggthemes)
library(dplyr)

Importing spatial data as a SpatRaster

The first step in manipulating environmental data in R is to import your dataset as a SpatRaster. This is a gridded object, with a value for the variable of interest in each grid cell. A gridded object is essentially a matrix of data organised in rows and columns; this is the format of pixelated images, such as .jpg. In the spatial context, grids are extremely useful, because we can think of maps as being specified in terms of latitude (rows) and longitude (columns).
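
To make the grid idea concrete, here is a tiny sketch (with invented values, purely for illustration) showing that a SpatRaster is essentially a matrix with spatial metadata attached:

# A 3 x 4 matrix becomes a 3-row, 4-column raster
m <- matrix(1:12, nrow = 3, ncol = 4)
r <- rast(m) # Rows map to y (latitude), columns to x (longitude)
r
plot(r)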

We know that koalas can be sensitive to heat stress, so perhaps a good place to start would be to explore the maximum temperatures that our study region in Southeast Queensland experiences.

We can get an indication of maximum temperature using climatological variables from the WorldClim portal http://worldclim.org. These climatological variables represent average conditions over the period 1970-2000. We’ll use the new version 2.0 dataset, at 30 arc-second resolution. As our surveys are taking place in September, we’ll use the monthly layer with the ‘09’ suffix.

We’ll use the rast function to import the data directly into our current R environment as a SpatRaster.

R knows this is an object with spatial characteristics.

tmax <- rast("Data/wc2.0_30s_tmax_09.tif")

We can look at the properties of the object, such as dimensions, resolution, extent and coordinate reference system by simply calling the object to the Console.

tmax
class       : SpatRaster 
size        : 21600, 43200, 1  (nrow, ncol, nlyr)
resolution  : 0.008333333, 0.008333333  (x, y)
extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source      : wc2.0_30s_tmax_09.tif 
name        : wc2.0_30s_tmax_09 
min value   :             -57.9 
max value   :              44.5 

Initial visualisation

Let’s plot the data to check it. Note that because this is a BIG, global dataset, it might take a moment or two to plot, especially on older machines.

plot(tmax)
Figure 1: Max. temp. (°C) in September

Looks good.

But, this global dataset is too large for our purposes. Let’s try restricting it to the extent of the Sunshine Coast and surrounds.

We can do that using the crop function from the terra package.

Cropping a raster to a smaller spatial extent

First, we need to define a spatial extent, using the ext function. For this, we need to know the vertices (corners) of our study area.

The function ext takes four arguments, in order: xmin, xmax, ymin, ymax. Remembering that we don’t have to name the arguments if we enter values in the same order as the arguments themselves, we can supply vertices using approximate latitude and longitude of the area we’re interested in.

studyarea <- ext(c(151, 154, -27, -24))

Or, alternatively, it’s possible to take a spatial extent from an existing raster or spatial object (assuming we have one, which we don’t, but we can use studyarea, which we just created to illustrate the principle).

studyarea2 <- ext(studyarea)

Now we can use this extent object to crop the original global SpatRaster, tmax, to the area we’ve specified.

The new extent is shown in the object summary.

tmaxSC <- crop(tmax, studyarea)
tmaxSC
class       : SpatRaster 
size        : 360, 360, 1  (nrow, ncol, nlyr)
resolution  : 0.008333333, 0.008333333  (x, y)
extent      : 151, 154, -27, -24  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source(s)   : memory
varname     : wc2.0_30s_tmax_09 
name        : wc2.0_30s_tmax_09 
min value   :              17.5 
max value   :              25.9 

Let’s run a quick plot to check, too…

plot(tmaxSC)

Great! Looks familiar! So we now have the data for our study region.

Challenge:
Now try cropping the global raster to a different part of the globe using the ext() function we introduced in this section. (e.g. Using Google Earth we can see Scotland lies between longitudes -8 and 0, and between latitudes 54 and 61).

Computer Lab 2: Reprojecting and plotting spatial data in terra

Currently, our data are in a latitude-longitude (geographic) Coordinate Reference System (CRS). Plotting these coordinates directly on a 2-dimensional plane distorts the data, because the Earth is (approximately) a sphere. To minimise the associated artefacts, we’re going to want to make sure our data are projected appropriately.

Let’s reproject the data from a latitude-longitude format to a Universal Transverse Mercator (UTM) projection, which allows us to measure things in metres.

The UTM system divides the Earth’s surface into 60 zones, each 6° of longitude in width. Each has a corresponding alphanumeric code. You can see all of the UTM zones here: http://www.dmap.co.uk/utmworld.pdf. The region of Southeast Queensland that we are interested in is in UTM Zone 56J.
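
Because the zones form a regular 6° grid starting at 180°W, you can work out the zone number directly from a longitude. A small worked example, using an approximate Sunshine Coast longitude of 153°E:

lon <- 153 # Approximate longitude of the Sunshine Coast
floor((lon + 180) / 6) + 1 # Gives 56, i.e. UTM Zone 56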

We need to tell R how to reproject our dataset to this UTM zone, using a standardised character string. We can get the relevant string for our region from https://epsg.io.

epsg.io doesn’t need the ‘J’ from ‘56J’, so we can drop that. It does, however, need to know that we are in the Southern Hemisphere, so we add an ‘S’ to make ‘UTM Zone 56S’ and use that as our SEARCH term on the home page. That gives us three options. We need to select the ‘WGS84’ option. This is the datum that is used in most GPS technologies. If we look at the EPSG number next to the reference to “WGS 84 / UTM zone 56S”, it gives us the EPSG numeric code assigned to this projection: “32756”.

We’ll assign it to an object, here called UTM56S.

UTM56S <- crs("EPSG:32756")

Now we’ll use this to reproject our cropped SpatRaster, tmaxSC. We’ll use the project function from terra.

tmaxSC.UTM <- project(tmaxSC,UTM56S)
tmaxSC.UTM
class       : SpatRaster 
size        : 379, 346, 1  (nrow, ncol, nlyr)
resolution  : 881.2017, 881.2017  (x, y)
extent      : 296550, 601445.8, 7011798, 7345773  (xmin, xmax, ymin, ymax)
coord. ref. : WGS 84 / UTM zone 56S (EPSG:32756) 
source(s)   : memory
name        : wc2.0_30s_tmax_09 
min value   :          17.70946 
max value   :          25.90000 

Success! You can see that the coord.ref description has changed to the new projection, as has the extent.

Just while we’re at it, let’s save the CRS of the original SpatRaster; we’ll likely need it later:

lonlat <- crs(tmaxSC) 

This is equivalent to specifying the EPSG code of the original SpatRaster:

lonlat <- crs("EPSG:4326")

Changing colour scales

R has a number of built-in colour scales. There is a good guide to colours in R here: https://www.nceas.ucsb.edu/~frazier/RSpatialGuides/colorPaletteCheatsheet.pdf

We could use the heat.colors scale to replot the data, using R’s basic plot function for a quick look at the data. The rev function reverses the colour scale, so that higher temperatures appear in warmer shades.

plot(tmaxSC.UTM, col = rev(heat.colors(255)))

We can also change the range of values defining the colour scale, using the range argument to plot.

plot(tmaxSC.UTM, col = rev(heat.colors(255)), range = c(15,27))

Challenge:
Which EPSG code would you use for Scotland? Hint: enter the appropriate UTM Zone into https://epsg.io.

Changing the spatial resolution of a SpatRaster

You may wish to change the spatial resolution of your SpatRaster. This means changing the size of the grid cells. You can do this using the resample function.

First, let’s get the resolution of the original layer (this is in metres, as x and then y):

res(tmaxSC.UTM)

Then, we need to define a new raster grid with the appropriate extent and projection. We can redefine the resolution using a new object, res.new.

Let’s go for 1km x 1km.

res.new <- c(1000,1000)
newgrid <- rast(ext = ext(tmaxSC.UTM), crs = crs(tmaxSC.UTM), res = res.new)
tmaxSC.1km <- resample(tmaxSC.UTM, newgrid)
tmaxSC.1km
class       : SpatRaster 
size        : 334, 305, 1  (nrow, ncol, nlyr)
resolution  : 1000, 1000  (x, y)
extent      : 296550, 601550, 7011798, 7345798  (xmin, xmax, ymin, ymax)
coord. ref. : WGS 84 / UTM zone 56S (EPSG:32756) 
source(s)   : memory
name        : wc2.0_30s_tmax_09 
min value   :          17.83633 
max value   :          25.89612 
plot(tmaxSC.1km, col = rev(heat.colors(255)), range = c(15,27))

We could also use aggregate to make the spatial resolution of an existing SpatRaster coarser. Let’s aggregate our 1-km resolution SpatRaster to make a 5-km resolution SpatRaster.

tmaxSC.5km <- aggregate(tmaxSC.1km, fact = 5, fun=mean, na.rm = TRUE)
tmaxSC.5km
class       : SpatRaster 
size        : 67, 61, 1  (nrow, ncol, nlyr)
resolution  : 5000, 5000  (x, y)
extent      : 296550, 601550, 7010798, 7345798  (xmin, xmax, ymin, ymax)
coord. ref. : WGS 84 / UTM zone 56S (EPSG:32756) 
source(s)   : memory
name        : wc2.0_30s_tmax_09 
min value   :          18.66823 
max value   :          25.82130 
plot(tmaxSC.5km, col = rev(heat.colors(255)), range = c(15,27))

Challenge:
Make a 20-km resolution SpatRaster of our study area in Southeast Queensland, and do so using the maximum values contained in the underlying 1-km cells.

Computer Lab 3: Creating maps using ggplot and tmap

First, let’s try plotting our spatial data as a map using ggplot...

We’ve used base plotting routines so far, but we’re going to get fancier, so we need to start playing in ggplot. Unfortunately, we need to convert a SpatRaster into a data.frame before we can use ggplot to make a map. However, this isn’t difficult:

# Returns xy coordinates for each cell with the corresponding values in those cells
dgrid <- as.data.frame(tmaxSC.1km, xy = TRUE) 
names(dgrid)[3] <- "vals" # Rename the values to something less unwieldy

Now, we can use ggplot:

ggplot() +
    geom_tile(data = dgrid,
                        aes(x = x, y = y, fill = vals))

Not perfect, but interesting. Let’s save this plot to an object, then modify it:

p1 <- ggplot() +
            geom_tile(data = dgrid,
                                aes(x = x, y = y, fill = vals))

OK, so we now have an object called p1, which is the basic plot, which we can now modify!

Let’s start by modifying the colour. Previously, we used heat.colors, but we might want to use a red-yellow-blue diverging colour palette. But where can we find out what sorts of palettes are available, you might ask? Well, here is the most useful place I know of: https://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3.

The default page identifies a blue-green (‘BuGn’) sequential palette. Let’s see if we can find a red-yellow-blue diverging palette. Yes, it’s called ‘RdYlBu’. Here’s how we use it:

p2 <- p1 +
        scale_fill_distiller(type = "seq", palette = "RdYlBu")
p2

This seems to have warm areas mapped in red, cool areas in blue, and intermediate temperatures mapped in lighter shades. We could reverse this pattern using the argument direction = 1 (the default for scale_fill_distiller is direction = -1), but we don’t need to do so here. But can we get rid of that dark grey sea?

p2 <- p1 +
  scale_fill_distiller(
    type = "seq",
    palette = "RdYlBu",
    na.value = "grey99",
    # grey99 is pretty close to white, but not quite...
    name = "Temperature (°C)"
  ) # A name for the legend
p2

Another thing that appears problematic is the aspect ratio: the map seems “stretched out” horizontally. Since our base SpatRaster has square 1 km x 1 km cells, we can force an equal aspect ratio on p2 with coord_equal() to resolve this:

p3 <- p2 +
            coord_equal()
p3

Looking good…but perhaps we want to clear up some niggles, like the coordinates being huge numbers. We could fix this sort of thing manually, but we can also resort to a theme. There are plenty of neat themes in package ggthemes, but the one we want is theme_map:

p4 <- p3 +
  # Use a clean map theme
  theme_map() +
  # Place the legend near the top in an area of blank sea
  theme(legend.position = c(0.66, 0.615),
        # Make the legend background transparent
        legend.background = element_blank())  
p4

That’s looking quite good now. Of course, we have built the plot stage by stage here (to save us repeating the code), but you could easily have done it in one go.
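
For reference, here is the whole map assembled in one go, simply combining the layers we added step by step above:

p_all <- ggplot() +
  geom_tile(data = dgrid, aes(x = x, y = y, fill = vals)) +
  scale_fill_distiller(type = "seq", palette = "RdYlBu",
                       na.value = "grey99", name = "Temperature (°C)") +
  coord_equal() +
  theme_map() +
  theme(legend.position = c(0.66, 0.615),
        legend.background = element_blank())
p_all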

Adding contours

Perhaps we want to add isotherms at 1° intervals to the plot. We can do this in ggplot using the contour geom:

p5 <- p4 +
    geom_contour(data = dgrid,
                                aes(x = x, y = y, z = vals),
                             binwidth = 1, # Contours at 1° intervals
                             colour = grey(0.5), # Make them light grey
                             linewidth = .5, # A thin line width
                             alpha = .5) # Slightly transparent
p5

Adding a scalebar and north arrow

The ggspatial package provides functions to add a scalebar and a north arrow to our ggplot2 objects. If you have installed the ggspatial package, you can add them using the annotation_scale() and annotation_north_arrow() functions.

library(ggspatial)
p6 <- p5 + 
  ggspatial::annotation_scale(width_hint = 0.5,  #A scale bar 
                              location = "br") + # In the bottom-right 
  ggspatial::annotation_north_arrow(which_north = "true", # Point to true north
                                    location = "br", # In the bottom-right-hand corner of the plot
                                    # Make it small-ish
                                    height = unit(.8, "cm"), width = unit(.5, "cm"),
                                    # Move it in from the frame edge a little in each direction
                                    pad_x = unit(.3, "cm"), pad_y = unit(1, "cm"))
p6
Using plotunit = 'm'
Warning in draw_panel(...): True north is not meaningful without coord_sf()

Save the map as a PDF

We should know how to do this from ANM203…

ggsave("Outputs/My first map.pdf", create.dir = TRUE, p6)
Saving 7 x 5 in image
Using plotunit = 'm'
Warning in draw_panel(...): True north is not meaningful without coord_sf()
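
If you ever need a size other than the 7 x 5 inch default, ggsave accepts explicit dimensions; a small sketch (the 18 x 13 cm size is only an example):

ggsave("Outputs/My first map.pdf", p6,
       width = 18, height = 13, units = "cm", create.dir = TRUE)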

Next, tmap

tm_shape(tmaxSC.1km) + # The baseline data; uses "+" to add complexity
  # Note that the legend scale runs low (top) to high (bottom), so reverse the colour sequence
  tm_raster(col.scale = tm_scale_continuous(values = "yl_or_rd"),
            col.legend = tm_legend("Mean Sept. Temp. (°C)")) + # Legend title
  tm_pos_auto_in() + # Automatic in-frame legend placement
  tm_scalebar() + # Add a scale bar
  tm_compass(position = c("right", "top")) + # Add a compass arrow in the top right
  tm_graticules() # If you happen to want them

Challenge:
In your console, try switching to an interactive view by running this code: tmap_mode("view")…then recreate the map. Notice how you can now zoom in and out? If you search the internet a bit, you will find all sorts of cool background layers you can add like this. To return to the conventional map for printing, run tmap_mode("plot").

Note: You can explore colour palettes to pass to the tm_scale_continuous() function using cols4all::c4a_gui

Writing a SpatRaster to a portable format

We’ve made some new objects for our project, which we now need to save outside the R environment. To do this, we can use the writeRaster function.

R has a native format for raster files, known as .grd. To save in .grd format,

writeRaster(tmaxSC.UTM, filename = "Outputs/tmaxSC_UTM.grd", overwrite= TRUE)

This works well enough, but the resultant files are not transferable to other platforms such as ArcGIS. A sensible alternative is to save in a different, more portable format, such as ASCII or GeoTIFF.

NB. Be sure to change the file extension, and don’t use full stops in the filename, or this might not work.

# An ASCII file readable by other GIS packages
writeRaster(tmaxSC, filename = "Outputs/tmaxSC.asc", filetype = "AAIGrid", overwrite = TRUE) 
# A georeferenced TIFF, or GeoTIFF
writeRaster(tmaxSC.UTM, filename = "Outputs/tmaxSC_UTM.tif", filetype = "GTiff", overwrite = TRUE) 

For a list of other possible formats, see ?writeRaster.
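
It is good practice to check the round trip: read the GeoTIFF back in and confirm that its properties match the object we wrote out.

tmax.check <- rast("Outputs/tmaxSC_UTM.tif") # Re-import the saved GeoTIFF
tmax.check # Compare extent, resolution and CRS with tmaxSC.UTM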

Computer Lab 4: Creating and using a RasterStack; Making a SpatialPoints object

Making a Raster Stack

Raster stacks allow us to store several SpatRasters of matching extent, resolution and projection in one multi-layer object.

Let’s make a Raster Stack of maximum temperature and minimum temperature over the study area.

First, we need to repeat the steps above to create a SpatRaster of minimum temperature.

tmin <- rast("Data/wc2.0_30s_tmin_09.tif")
tminSC <- crop(tmin, studyarea)
tminSC.UTM <- project(tminSC, UTM56S)
plot(tminSC.UTM, col = rev(heat.colors(255)))

Now we can stack the two SpatRaster objects using the c function.

tStack <- c(tmaxSC.UTM, tminSC.UTM)
tStack
class       : SpatRaster 
size        : 379, 346, 2  (nrow, ncol, nlyr)
resolution  : 881.2017, 881.2017  (x, y)
extent      : 296550, 601445.8, 7011798, 7345773  (xmin, xmax, ymin, ymax)
coord. ref. : WGS 84 / UTM zone 56S (EPSG:32756) 
source(s)   : memory
names       : wc2.0_30s_tmax_09, wc2.0_30s_tmin_09 
min values  :          17.70946,          6.837477 
max values  :          25.90000,         18.200001 

We can now see in the dimensions slot of tStack that there are two layers, and these are listed by name.
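
Individual layers can be pulled back out of the stack by position or by name, for example:

tStack[[2]] # The second layer (minimum temperature)
tStack[["wc2.0_30s_tmax_09"]] # The maximum temperature layer, by name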

Basic raster algebra

One advantage of a raster stack is that we can do quick-and-dirty plots:

plot(tStack)

Results are not pretty, but are OK for an initial inspection.

Another advantage (because the values in each grid cell line up across SpatRasters) is that we can manipulate layers simultaneously, and do fancy maths. Let’s illustrate this by using our RasterStack of maximum and minimum temperatures to calculate a new SpatRaster of temperature difference.

To achieve this, we will subtract minimum temperature tmin from maximum temperature tmax. We can do this easily, just by subtracting our tmin SpatRaster from the tmax SpatRaster.

tvar.UTM <- tmaxSC.UTM - tminSC.UTM
tvar.UTM
class       : SpatRaster 
size        : 379, 346, 1  (nrow, ncol, nlyr)
resolution  : 881.2017, 881.2017  (x, y)
extent      : 296550, 601445.8, 7011798, 7345773  (xmin, xmax, ymin, ymax)
coord. ref. : WGS 84 / UTM zone 56S (EPSG:32756) 
source(s)   : memory
name        : wc2.0_30s_tmax_09 
min value   :           5.00000 
max value   :          14.98775 

As we can see, this yields a single SpatRaster of the same resolution and extent as the RasterStack, with each grid cell containing the difference between the maximum and minimum temperatures for that grid cell.
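
Other cell-wise calculations work the same way. For example, a sketch of the midpoint between the maximum and minimum temperatures:

tmid.UTM <- (tmaxSC.UTM + tminSC.UTM) / 2 # Cell-wise average of the two layers
plot(tmid.UTM, col = rev(heat.colors(255)))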

Performing calculations over a RasterStack using lapp

We could also use the function lapp, with a short user-defined function that we want to apply to the layers (here, with arguments x and y). For more information on the range of ways in which one can apply this function, try typing ?lapp into your R Console:

tvarSC.UTM <- lapp(tStack, fun = function(x, y) {x-y})
tvarSC.UTM
class       : SpatRaster 
size        : 379, 346, 1  (nrow, ncol, nlyr)
resolution  : 881.2017, 881.2017  (x, y)
extent      : 296550, 601445.8, 7011798, 7345773  (xmin, xmax, ymin, ymax)
coord. ref. : WGS 84 / UTM zone 56S (EPSG:32756) 
source(s)   : memory
name        :     lyr1 
min value   :  5.00000 
max value   : 14.98775 

Challenge:
Now try to use ggplot or tmap to create a more impactful figure using this new tvarSC.UTM object.

Representing point locations as SpatialPoints

OK, so SpatRasters are a great way to store, manipulate and plot data. But we often want to extract data from maps. This entails querying spatial points within the mapped domain, and extracting the resulting data.

Assuming we have a series of sample locations in the file called PointLocations.csv, let’s see what we can do:

pts <- read.csv("Data/PointLocations.csv") # Get the data
head(pts)
         x         y
1 152.8292 -25.43750
2 152.3125 -26.97083
3 151.5542 -24.60417
4 151.4875 -24.42917
5 151.9542 -26.10417
6 151.5125 -24.82083

So, we have a data frame of x (= longitude) and y (= latitude) coordinates. Before we can really use these in a mapping context (at least in a serious way), we need to convert them into a type of spatial object. We do this with the sf (Simple Features) package.

When we create a new spatial points object from a table of coordinates, we need to assign it a CRS (in this case WGS 84).

# Convert pts from a data frame into an sf object, using x and y as coordinates
pts.sp <- st_as_sf(pts, coords = c("x", "y")) 

# Make sure that we assign the lon-lat CRS so that R knows how this object is projected
st_crs(pts.sp) <- crs(lonlat) 
pts.sp

Great. R tells us that we now have a ‘geometry’ of 50 features: 50 POINTs in the standard lon-lat CRS (WGS 84). Obviously, we couldn’t plot these directly onto our UTM-projected map, because the map and the points are on different projections, so we need to fix that before proceeding.

Reprojecting spatial data

Let’s reproject our lon-lat sf geometries into UTM space:

pts.UTM <- st_transform(pts.sp, UTM56S)
pts.UTM
Simple feature collection with 50 features and 0 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 302702 ymin: 7016610 xmax: 496272.5 ymax: 7340466
Projected CRS: WGS 84 / UTM zone 56S
First 10 features:
                   geometry
1  POINT (482822.8 7186596)
2    POINT (431771 7016610)
3  POINT (353627.2 7278112)
4  POINT (346664.7 7297420)
5  POINT (395424.3 7112360)
6  POINT (349668.9 7254072)
7    POINT (473071 7040764)
8  POINT (312823.5 7117898)
9  POINT (467722.5 7193951)
10 POINT (307701.4 7069806)

As simple as that…although it takes a little practice to avoid making mistakes!

Plotting spatial objects over SpatRasters

Now, we can simply add our spatial points to a plot of a projected SpatRaster:

plot(tmaxSC.UTM, col = rev(heat.colors(255)))
plot(pts.UTM, add = T)

Or in ggplot:

p6 +
    geom_sf(data = pts.UTM)
Coordinate system already present.
ℹ Adding new coordinate system, which will replace the existing one.

Notice how R adjusts the projection of the plot to match the projection of the sf point geometries we added. We can now also see the power of writing a ggplot to an object: it saves us retyping many lines of code!

Computer Lab 5: Extracting point data from a RasterStack

Sampling environmental data from point locations

Plotting the points is just one trick. An equally powerful trick is extracting data from the SpatRaster cells corresponding to those points. For this, we use the extract function:

tmax.pts <- extract(tmaxSC.UTM, pts.UTM, ID = F) 
# ID = FALSE otherwise it creates a column with a number for each point

head(tmax.pts)
  wc2.0_30s_tmax_09
1          24.48788
2          24.28773
3          23.10054
4          24.61214
5          23.76772
6          22.72997

Plot the extracted values as a histogram - are they as we would expect?

# Plot a histogram
# select only the column with the extracted values (n. 1)
hist(tmax.pts[, 1], col="grey", main="Max temp (°C)") 

Sampling environmental data from point locations over a RasterStack

Extracting values from a single SpatRaster is a useful trick, but it would be much more useful if we could extract values over an entire RasterStack. And it turns out that we can:

tStack.pts <- extract(tStack, pts.UTM, ID = F)

head(tStack.pts)
  wc2.0_30s_tmax_09 wc2.0_30s_tmin_09
1          24.48788          14.71119
2          24.28773          10.08944
3          23.10054          12.07985
4          24.61214          12.36176
5          23.76772          10.13163
6          22.72997          10.69059

As you can see, the names have been taken directly from the filenames used to build the RasterStack, so it is easy to keep track of which values go where. We can rename the variables in this object using names:

names(tStack.pts) <- c("tmax", "tmin")
head(tStack.pts)
      tmax     tmin
1 24.48788 14.71119
2 24.28773 10.08944
3 23.10054 12.07985
4 24.61214 12.36176
5 23.76772 10.13163
6 22.72997 10.69059

Combining extracted data and spatial locations into a dataframe

Of course, just having the values isn’t that useful if we don’t also have the metadata (e.g., the coordinates) that come with those values. Thankfully, R returns the extracted values in the same sequence in which the points were provided to the extract function.

This means that tying together the coordinates (or other meta-data) with the extracted values is relatively straightforward:

latlons <- st_coordinates(pts.sp) # Extract the sf point geometries as a matrix of lon-lat coordinates
  # Note that we made pts.UTM from pts.sp, so these are in the same order!
latlons <- as.data.frame(latlons) # Convert from matrix to data frame
names(latlons) <- c("lon","lat") # Change the names from X/Y to lon/lat
coords.UTM <- as.data.frame(st_coordinates(pts.UTM)) # Extract the sf point geometries as a matrix of eastings/northings
names(coords.UTM) <- c("UTMx", "UTMy") # Change the names from X/Y to UTM eastings/northings
temps <- cbind(latlons, coords.UTM, tStack.pts) # Combine these columns of data into a single data frame
head(temps)
       lon       lat     UTMx    UTMy     tmax     tmin
1 152.8292 -25.43750 482822.8 7186596 24.48788 14.71119
2 152.3125 -26.97083 431771.0 7016610 24.28773 10.08944
3 151.5542 -24.60417 353627.2 7278112 23.10054 12.07985
4 151.4875 -24.42917 346664.7 7297420 24.61214 12.36176
5 151.9542 -26.10417 395424.3 7112360 23.76772 10.13163
6 151.5125 -24.82083 349668.9 7254072 22.72997 10.69059

Presto!

Challenge:
The temps data frame already uses the short names tmax and tmin; if yours still shows wc2.0_30s_tmax_09 and wc2.0_30s_tmin_09, rename those columns to tmax and tmin. Then use ggplot() to create a histogram of tmin and tmax. You can find help on how to do this here: https://r-graph-gallery.com/220-basic-ggplot2-histogram.html

Computer Lab 6: Working with shapefiles

Let’s import a shapefile (an ESRI ArcGIS format) of Local Government Area (LGA) boundaries into our R environment.

NB. Do not move or copy the .shp out of its folder, or you risk corrupting it!

councils <- st_read("Data/Local_Government_Areas.shp")
Reading layer `Local_Government_Areas' from data source 
  `/Users/kscales/Library/CloudStorage/OneDrive-UniversityoftheSunshineCoast/USC/Course/ANM302/2026/ANM302_Workshop_Quarto/Data/Local_Government_Areas.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 78 features and 8 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 137.9946 ymin: -29.17927 xmax: 153.5519 ymax: -9.087991
Geodetic CRS:  WGS 84

OK, so what sort of object do we have now?

class(councils) 
[1] "sf"         "data.frame"

Interesting: it’s an sf object that is also a data.frame.

What variables are in the data.frame?

names(councils) 
[1] "ADMINTYPEN" "ADMINAREAN" "LGA_CODE"   "ABBREV_NAM" "LGA"       
[6] "CA_AREA_SQ" "SHAPE_Leng" "SHAPE_Area" "geometry"  

One of these is the listing of local government areas. Let’s look at it and check how R has coded it (by default, a character string; the commented line below shows how to convert it to a factor if needed):

councils$LGA
class(councils$LGA) # by default coded as a character string
#councils$LGA <- as.factor(councils$LGA) # set as a factor!

Let’s subset the sf object to just the four regional Local Government Authorities that we are interested in. We’ll do this using square-bracket indexing, but we could as easily have used subset, filter or some other similar function (see the sketch after this code block):

counc4 <-
  councils[councils$LGA == "Noosa Shire" |
             councils$LGA == "Sunshine Coast Regional" |
             councils$LGA == "Fraser Coast Regional" |
             councils$LGA == "Gympie Regional", ]

As with all spatial objects, we need to check that the Coordinate Reference System is appropriate:

st_crs(counc4)

This is in longitude-latitude. Let’s reproject to UTM Zone 56S

counc4.UTM <- st_transform(counc4, UTM56S)
counc4.UTM

Piece of cake!

Now, we’ll plot the boundaries of these four Local Government Authorities on one of the spatRasters that we made earlier. To do this, we simply use the add = TRUE argument to plot.

plot(tmaxSC.UTM)
plot(st_geometry(counc4.UTM), add = TRUE)

Note that we plot the st_geometry, because the sf object has many different variables, any of which could be used to provide a colour fill for the spatial polygons. Anyway, it seems as if the shapefile lines up perfectly with our reprojected SpatRaster, so all is good in the world.

As with many other objects, we can save this transformed sf object to our working directory as a shapefile, using the st_write function:

st_write(counc4.UTM, "Outputs/LGA_areas.shp", append = FALSE)

Here, append = FALSE lets us overwrite an existing shapefile with the same name; without it, st_write would refuse to write over one that already exists.

Rasterizing a shapefile

It’s one thing to plot the shapefile, but it would be more useful for extracting the underlying values if we turn it into a SpatRaster. Let’s try:

# Here we convert our sf object into a SpatRaster based on the structure of tmaxSC.UTM
# Select the field you want to rasterize
LGA.raster <- rasterize(counc4.UTM, tmaxSC.UTM, field = "LGA") 
LGA.raster
levels(LGA.raster)

Note that there are four ID values in the SpatRaster (0 - 3). These are the factor levels of the four councils we selected in the sf object (which also happen to be in alphabetical order).

Let’s explore with a plot…

plot(LGA.raster, col = terrain.colors(4))
plot(pts.UTM, add = TRUE)

You can see that some of our points fall outside these LGA boundaries. Let’s see what happens when we extract from the SpatRaster object:

temps$council <- extract(LGA.raster, temps[,3:4], ID = F)[, 1]
summary(temps)
      lon             lat              UTMx             UTMy        
 Min.   :151.0   Min.   :-26.97   Min.   :302702   Min.   :7016610  
 1st Qu.:151.5   1st Qu.:-26.40   1st Qu.:347416   1st Qu.:7079165  
 Median :151.8   Median :-25.85   Median :377402   Median :7140406  
 Mean   :151.9   Mean   :-25.62   Mean   :385361   Mean   :7166331  
 3rd Qu.:152.1   3rd Qu.:-24.76   3rd Qu.:412147   3rd Qu.:7260814  
 Max.   :153.0   Max.   :-24.04   Max.   :496272   Max.   :7340466  
      tmax            tmin                           council  
 Min.   :20.84   Min.   : 8.266   Fraser Coast Regional  : 3  
 1st Qu.:23.07   1st Qu.:10.096   Gympie Regional        : 5  
 Median :23.87   Median :11.851   Noosa Shire            : 0  
 Mean   :23.82   Mean   :11.609   Sunshine Coast Regional: 2  
 3rd Qu.:24.63   3rd Qu.:12.808   NAs                    :40  
 Max.   :25.18   Max.   :14.983                               

You’ll notice there are 40 NAs in the summary. We started with 50 points, so we know that only 10 of these fall inside one of our four councils.

Let’s look at the points that are not NAs, using the is.na() function, preceded by a !:

temps[!is.na(temps$council),]
        lon       lat     UTMx    UTMy     tmax     tmin
1  152.8292 -25.43750 482822.8 7186596 24.48788 14.71119
7  152.7292 -26.75417 473071.0 7040764 22.89594 10.75138
9  152.6792 -25.37083 467722.5 7193951 24.50000 14.18523
13 152.1292 -26.14583 412954.9 7107874 24.57445 10.54473
16 152.6458 -25.31250 464351.9 7200402 24.45067 14.44933
18 152.9208 -25.94583 492073.6 7130312 23.76119 13.36119
19 152.6875 -26.25417 468793.2 7096131 25.00000 11.80000
24 152.2458 -25.95417 424494.1 7129174 22.84997 10.66404
29 152.9208 -26.08750 492083.1 7114624 23.08954 12.03417
42 152.9625 -26.78750 496272.5 7037100 23.40462 12.60142
                   council
1    Fraser Coast Regional
7  Sunshine Coast Regional
9    Fraser Coast Regional
13         Gympie Regional
16   Fraser Coast Regional
18         Gympie Regional
19         Gympie Regional
24         Gympie Regional
29         Gympie Regional
42 Sunshine Coast Regional

Cropping a SpatRaster to an sf object

You also might wish to crop your SpatRaster to the extent of an sf object.

Let’s use the Noosa Shire polygon from our counc4.UTM sf object.

# Extract the Noosa Shire-only polygon
councN.UTM <- counc4.UTM[counc4.UTM$LGA == "Noosa Shire",]
councN.UTM # inspect the new sf object containing only Noosa Shire 

# Crop the raster to the polygon's extent
tmax.Noosa <- tmaxSC.UTM %>% 
  crop(councN.UTM)

plot(tmax.Noosa) # Plots the new raster, cropped to the extent of the Noosa Shire polygon
plot(st_geometry(councN.UTM), add = TRUE, col = NA) # Plot the Noosa Shire polygon outline on top of the SpatRaster

Now, you can see that the SpatRaster data is cropped to the extent of the sf object. This is really helpful when we’re working on projects that require a known spatial footprint. Note, however, that both objects in the call to crop must have the same projection, or things might not turn out the way you expect.
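
A quick way to guard against that is terra’s same.crs function, which confirms whether two objects share a CRS (vect() converts the sf polygon into terra’s vector format for the comparison):

same.crs(tmaxSC.UTM, vect(councN.UTM)) # TRUE here
# If it were FALSE, reproject first, e.g. st_transform(councN.UTM, crs(tmaxSC.UTM))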

Masking a SpatRaster to an sf object

Alternatively, you might wish to mask out an area of a SpatRaster using an sf object:

tmax.NoNoosa <- tmaxSC.UTM %>% 
  mask(councN.UTM, inverse = TRUE)
plot(tmax.NoNoosa) 

We can see that Noosa Shire LGA has now disappeared from our SpatRaster. If we were to extract values from the SpatRaster that fell within Noosa Shire, they would now be NA. Of course, we could instead mask everything but Noosa Shire in a fairly similar way:

tmax.NoosaMask <- tmaxSC.UTM %>% 
  mask(councN.UTM) %>%
  crop(councN.UTM)
plot(tmax.NoosaMask) 

Challenge:
Now try to use ggplot or tmap to create a more impactful figure by integrating the pts.UTM object with a cropped AND masked version of the four council areas.

Computer Lab 7: Building a Species Distribution Model

Now that we know the basics of loading and manipulating spatial data in R, we can turn our attention to building a species distribution model.

Species distribution models (SDMs) combine observations of species occurrence (from systematic surveys, citizen science, or animal tracking) with environmental data describing the conditions in the locations where the species has been observed, to provide information about the environmental conditions preferred by our species of interest. SDMs ask the question: which environmental variables might be important in predicting where this species might be found?

Workflow:

  • Work with presence/absence data to predict probability of occurrence.
  • Create a conditional inference tree to assess the order in which to include potentially important explanatory terms in our model.
  • Check for collinearity between explanatory variables.
  • Build a glm() to assess which terms are significant predictors of species presence/absence.
  • Assess the quality of model fit using AUC and evaluate().
  • Predict where we might find our species of interest in space, based on the environmental features that were significant in the model.

Real-world context

Usually, in ecology, we don’t have absence data; we only really collect presence data. In those instances, you have to find a way to generate absences before you can model the likelihood of presence of your species of interest. We have already done this for you here, but it is something to be mindful of when you work with this kind of data.

First, we need to load the packages we’ll use:

library(dismo)
library(partykit)
library(GGally)

1. Load data

Download “envtrain.csv” from Canvas, and save it in your Data folder. This is a .csv file of presence/absence records for an organism throughout South America, along with environmental data for each location.

dat <- read.csv("Data/envtrain.csv") 
head(dat) # pa = presence/absence (binomial); the bio variables are all numeric

str(dat) # biome needs to be a factor
dat$biome <- as.factor(dat$biome) # Convert biome to a factor variable
str(dat) # Check that the conversion worked

How many presence and absence records do we have?

table(dat$pa)

  0   1 
800  93 

How many records do we have in each biome?

table(dat$biome)

  1   2   3   4   5   7   8   9  10  12  13  14 
450  92  10  10   6 151  46  13  35  11  62   7 

But maybe some of these are correlated… Which predictor variables do we need to include in our model, and in what order?

2. Conditional Inference Tree

Generalised linear models (GLMs) rest on a number of assumptions; if you break them, the GLM will not be doing what it is meant to do:

  • Independence: every observation is independent of the others.
  • Explanatory variables are not collinear: collinearity means several variables fighting to explain the same thing in a model.

We need to check these assumptions, so that we can drop any collinear terms.

t1 <- ctree(pa ~ ., data = dat) # presence/absence, as a function of every variable in the data
plot(t1)

The first tree treated pa as a numeric variable; run it again with pa as a factor:

t1 <- ctree(as.factor(pa) ~ ., data = dat) # presence/absence, as a function of every variable in the data 
plot(t1)

The conditional inference tree shows that bio16 and bio7 are likely the two most important predictors for us here.

We don’t rely on these conditional inference trees all the time; we only really use them when we have unequal sample sizes and suspect we might have correlated variables.

Because we suspect that bio7 and bio16 might be the more important predictors here, we might want these to be first in our dataframe.

head(dat)
  pa bio1 bio12 bio16 bio17 bio5 bio6 bio7 bio8 biome
1  1  263  1639   724    62  338  191  147  261     1
2  1  263  1639   724    62  338  191  147  261     1
3  1  243  1693   775   186  318  150  168  264     1
4  1  240  1214   516   146  317  150  168  261     2
5  1  275  2259   956   208  335  231  104  270     1
6  1  271  2212   807   281  327  220  107  266     1
# Reorder our dataset
dat1 <- dat %>% 
    dplyr::select(pa, bio7, bio16, everything()) # This puts pa, bio7 and bio16 first, followed by everything else
  
head(dat1)
  pa bio7 bio16 bio1 bio12 bio17 bio5 bio6 bio8 biome
1  1  147   724  263  1639    62  338  191  261     1
2  1  147   724  263  1639    62  338  191  261     1
3  1  168   775  243  1693   186  318  150  264     1
4  1  168   516  240  1214   146  317  150  261     2
5  1  104   956  275  2259   208  335  231  270     1
6  1  107   807  271  2212   281  327  220  266     1

3. Check for collinearity

We can check this visually, using the pairs() plotting function (and then ggpairs() from the GGally package).

Collinearity means that the same variation in the data is explained by multiple variables; we only need to keep one of each correlated set.

pairs(dat1)

Collinearity will appear as a clearly identifiable relationship (a straight line) between two predictors (e.g., bio1 and bio6, bio1 and bio5, bio1 and bio8, bio12 and bio16).

Another way to do this:

ggpairs(dat1) 

Here we actually have a correlation coefficient, which gives a quantitative measure of the strength of the correlation (or collinearity) between each pair of variables. Values closer to 1 (in absolute value) indicate that the two variables are highly correlated with one another. Let’s remove variables with a correlation coefficient > 0.7.
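
If you prefer a programmatic version of the same check, here is a sketch that flags pairs of numeric predictors whose absolute correlation exceeds 0.7:

# Correlation matrix over the numeric predictors (excluding pa and the biome factor)
num <- dat1[, sapply(dat1, is.numeric) & names(dat1) != "pa"]
cm <- cor(num)
# Report each pair above the 0.7 threshold, counting each pair once
high <- which(abs(cm) > 0.7 & upper.tri(cm), arr.ind = TRUE)
data.frame(var1 = rownames(cm)[high[, 1]],
           var2 = colnames(cm)[high[, 2]],
           r = round(cm[high], 2))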

Which variables do we remove first? We usually make this decision based on our ecological knowledge, and the results of the conditional inference tree. The plot shows us that bio16 and bio12 are highly correlated, so we will remove bio1 and bio12, allowing us to keep bio16.

head(dat1)

# Drop the collinear predictors bio1 and bio12
dat2 <- dat1 %>% 
  dplyr::select(-"bio1", -"bio12") 

head(dat2)

4. Make a binomial GLM and simplify to a minimal adequate model

We’ll use dat2, our data frame of observations without the collinear variables, to model the influence of our predictors on the likelihood of presence of our focal organism.

Because pa is the response (0 = absence, 1 = presence), we use the binomial family.

m1 <- glm(pa ~ ., data = dat2, family = binomial) 
summary(m1) # This is quite tough to interpret, but we can see that bio16 and bio17 might be important here.

Call:
glm(formula = pa ~ ., family = binomial, data = dat2)

Coefficients:
              Estimate Std. Error z value Pr(>|z|)   
(Intercept)  5.251e+00  1.757e+00   2.989  0.00280 **
bio7        -1.293e-01  2.526e-01  -0.512  0.60890   
bio16        1.566e-03  5.190e-04   3.018  0.00254 **
bio17       -1.668e-03  7.708e-04  -2.164  0.03047 * 
bio5         8.274e-02  2.516e-01   0.329  0.74227   
bio6        -9.283e-02  2.528e-01  -0.367  0.71341   
bio8        -1.297e-04  1.780e-02  -0.007  0.99419   
biome2      -6.169e-01  6.578e-01  -0.938  0.34835   
biome3      -1.695e+01  3.335e+03  -0.005  0.99595   
biome4      -1.597e+01  3.128e+03  -0.005  0.99593   
biome5      -1.304e+01  4.247e+03  -0.003  0.99755   
biome7      -1.610e+01  7.823e+02  -0.021  0.98359   
biome8      -1.192e+01  1.462e+03  -0.008  0.99349   
biome9      -1.500e+01  2.683e+03  -0.006  0.99554   
biome10     -1.532e+01  1.595e+03  -0.010  0.99234   
biome12     -1.546e+01  3.083e+03  -0.005  0.99600   
biome13     -7.823e-04  5.249e-01  -0.001  0.99881   
biome14     -1.692e+00  1.173e+00  -1.442  0.14917   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 596.69  on 892  degrees of freedom
Residual deviance: 421.53  on 875  degrees of freedom
AIC: 457.53

Number of Fisher Scoring iterations: 18

How much of the overall deviance does our initial model explain?

1 - (m1$deviance / m1$null.deviance)
[1] 0.2935475

m1 explains approximately 30% of the deviance.

We need to simplify our model down to the minimum adequate model to narrow it down to the main drivers of species occurrence. We’ll do this using the AIC value.

 AIC(m1) 
[1] 457.5327
m2 <- step(m1) 

The step function removes variables that are not important for the model fit, as judged by AIC.

summary(m2) 

Call:
glm(formula = pa ~ bio7 + bio16 + bio17 + bio6, family = binomial, 
    data = dat2)

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)  5.3531252  1.6568504   3.231 0.001234 ** 
bio7        -0.0506727  0.0075221  -6.737 1.62e-11 ***
bio16        0.0015171  0.0004556   3.330 0.000869 ***
bio17       -0.0013347  0.0007325  -1.822 0.068435 .  
bio6        -0.0091471  0.0046363  -1.973 0.048504 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 596.69  on 892  degrees of freedom
Residual deviance: 437.50  on 888  degrees of freedom
AIC: 447.5

Number of Fisher Scoring iterations: 7

step has dropped a few predictors; we now have only four. But AIC is quite conservative, so we can be a bit harsher (on the basis of p-values).

drop1(m2, test = "Chi")
Single term deletions

Model:
pa ~ bio7 + bio16 + bio17 + bio6
       Df Deviance    AIC    LRT  Pr(>Chi)    
<none>      437.50 447.50                     
bio7    1   488.03 496.03 50.529 1.174e-12 ***
bio16   1   448.78 456.78 11.275 0.0007857 ***
bio17   1   440.88 448.88  3.375 0.0661844 .  
bio6    1   440.94 448.94  3.436 0.0638003 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

bio17 and bio6 are not significant at the 0.05 level (p = 0.066 and p = 0.064), so we can remove them, but only one at a time!

Let’s start with bio17.

m3 <- update(m2, ~. -bio17) 
summary(m3)

Call:
glm(formula = pa ~ bio7 + bio16 + bio6, family = binomial, data = dat2)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  4.524712   1.617869   2.797  0.00516 ** 
bio7        -0.046127   0.007084  -6.512 7.43e-11 ***
bio16        0.001106   0.000405   2.732  0.00630 ** 
bio6        -0.007370   0.004625  -1.593  0.11107    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 596.69  on 892  degrees of freedom
Residual deviance: 440.88  on 889  degrees of freedom
AIC: 448.88

Number of Fisher Scoring iterations: 7

Seems like we can drop bio6.

drop1(m3, test = "Chi") # Yep this agrees (p = 0.130240)
Single term deletions

Model:
pa ~ bio7 + bio16 + bio6
       Df Deviance    AIC    LRT  Pr(>Chi)    
<none>      440.88 448.88                     
bio7    1   488.81 494.81 47.932 4.413e-12 ***
bio16   1   448.79 454.79  7.909  0.004918 ** 
bio6    1   443.17 449.17  2.290  0.130240    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
m4 <- update(m3, ~. -bio6) 
summary(m4) 

Call:
glm(formula = pa ~ bio7 + bio16, family = binomial, data = dat2)

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)  2.3527667  0.8382263   2.807   0.0050 ** 
bio7        -0.0393479  0.0054645  -7.201 5.99e-13 ***
bio16        0.0010282  0.0003995   2.574   0.0101 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 596.69  on 892  degrees of freedom
Residual deviance: 443.17  on 890  degrees of freedom
AIC: 449.17

Number of Fisher Scoring iterations: 7

Seems like everything is significant, but we’ll check again with drop1:

drop1(m4, test = "Chi") 
Single term deletions

Model:
pa ~ bio7 + bio16
       Df Deviance    AIC    LRT  Pr(>Chi)    
<none>      443.17 449.17                     
bio7    1   526.50 530.50 83.335 < 2.2e-16 ***
bio16   1   450.14 454.14  6.971  0.008283 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

All predictors are now significant. If we dropped anything else, our model fit would deteriorate significantly. We’ve reached our minimum adequate model!

According to this model, bio16 and bio7 seem to be most important for predicting organism presence here. This matches our conditional inference tree results.

How much of the overall deviance does our minimum adequate model explain? We’d need to report this as a measure of model performance if we were publishing these results in a scientific paper or report.

1 - (m4$deviance/m4$null.deviance)
[1] 0.2572914

m4 explains approximately 26% of the deviance, which is slightly less, but we now don’t have collinearity to contend with.

The more predictors we have, the muddier the picture becomes. This is why, in ecology, we strive to simplify our models to really nut out what the main drivers of these patterns could be.

Having a low deviance explained (~26%) is actually really common in ecology, because there are lots of other predictors that we didn’t measure that could be responsible for explaining the remaining deviance.

5. AUC

AUC (Area Under the Receiver Operating Characteristic curve) is another useful tool for understanding how well our model performs in terms of predicting 1s where 1s should be and 0s where 0s should be in the observational dataset.

To calculate the AUC of our model, let’s split our model data frame into separate presence and absence data frames.

pres <- dat2 %>% filter(pa == 1)
abs <- dat2 %>% filter(pa == 0)

Now let’s calculate the AUC using the evaluate function from the dismo package.

evaluate(p = pres, a = abs, model = m4) 
class          : ModelEvaluation 
n presences    : 93 
n absences     : 800 
AUC            : 0.8595968 
cor            : 0.3326106 
max TPR+TNR at : -2.915618 

The AUC statistic (between 0 and 1) describes how well the model predicts species occurrence in space, where 1 = perfect discrimination and 0.5 = no better than chance.

Here our AUC is 0.86, which is pretty good.

6. Model predictions: Making a map of habitat suitability of the focal species

Let’s predict from our model to make maps of habitat suitability, or likelihood of occurrence, of the species of interest.

First, let’s load in the raster datasets from the dismo package.

r <- list.files(path = paste0(system.file(package = "dismo"), "/ex"), 
                pattern = "grd", full.names = TRUE) 
r
[1] "/Users/kscales/Library/R/arm64/4.6/library/dismo/ex/bio1.grd" 
[2] "/Users/kscales/Library/R/arm64/4.6/library/dismo/ex/bio12.grd"
[3] "/Users/kscales/Library/R/arm64/4.6/library/dismo/ex/bio16.grd"
[4] "/Users/kscales/Library/R/arm64/4.6/library/dismo/ex/bio17.grd"
[5] "/Users/kscales/Library/R/arm64/4.6/library/dismo/ex/bio5.grd" 
[6] "/Users/kscales/Library/R/arm64/4.6/library/dismo/ex/bio6.grd" 
[7] "/Users/kscales/Library/R/arm64/4.6/library/dismo/ex/bio7.grd" 
[8] "/Users/kscales/Library/R/arm64/4.6/library/dismo/ex/bio8.grd" 
[9] "/Users/kscales/Library/R/arm64/4.6/library/dismo/ex/biome.grd"

Build our predictor surfaces into a rasterstack (include all of them, even those that were not significant in the model; predict will simply ignore the layers that m4 doesn’t use).

predr <- rast(r) # Read them all in as rasters that are stacked together 
plot(predr) 

Let’s predict from our model (m4) over this rasterstack to get predictions of the likelihood of species occurrence.

p <- predict(predr, # Our raster stack of predictors 
             m4, # Our minimum adequate model 
             type = "response") # Classifies the type of predicted values we will get are based on our response variable (so between 0-1, because it is a binomial model). 
plot(p)

There we are! A map of the likelihood of presence of our species of interest, with yellow shades indicating a higher likelihood of encountering the species in that region.
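
Optionally, we can save this prediction surface for later use, just as we saved rasters in Lab 3 (the filename here is only an example):

writeRaster(p, "Outputs/sdm_prediction.tif", overwrite = TRUE)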

Let’s check out our model summary to remind us which variables are important in predicting species’ presence:

summary(m4) 

Call:
glm(formula = pa ~ bio7 + bio16, family = binomial, data = dat2)

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)  2.3527667  0.8382263   2.807   0.0050 ** 
bio7        -0.0393479  0.0054645  -7.201 5.99e-13 ***
bio16        0.0010282  0.0003995   2.574   0.0101 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 596.69  on 892  degrees of freedom
Residual deviance: 443.17  on 890  degrees of freedom
AIC: 449.17

Number of Fisher Scoring iterations: 7

And predr is itself a SpatRaster (as is the prediction p), which we can plot using ggplot or tmap, as we learnt earlier; a short sketch follows the printout below.

predr
class       : SpatRaster 
size        : 192, 186, 9  (nrow, ncol, nlyr)
resolution  : 0.5, 0.5  (x, y)
extent      : -125, -32, -56, 40  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs 
sources     : bio1.grd  
              bio12.grd  
              bio16.grd  
              ... and 6 more sources
names       : bio1, bio12, bio16, bio17, bio5, bio6, ... 
min values  :  -23,     0,     0,     0,   61, -212, ... 
max values  :  289,  7682,  2458,  1496,  422,  242, ... 
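
For example, a minimal sketch that maps the prediction surface p using the ggplot workflow from Lab 3 (this assumes ggplot2 and ggthemes are still loaded from earlier):

dpred <- as.data.frame(p, xy = TRUE) # Convert the prediction raster to a data frame
names(dpred)[3] <- "prob"
ggplot() +
  geom_tile(data = dpred, aes(x = x, y = y, fill = prob)) +
  scale_fill_distiller(palette = "RdYlBu", name = "P(presence)") +
  coord_equal() +
  theme_map()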

Conclusion

With these basics in hand, you should now be able to:

1 - Read in raster-type data

2 - Stack, crop, reproject, resample and mask raster-type data

3 - Read in spatial points using sf

4 - Extract data from a single SpatRaster or a stack of SpatRasters, and store the output in a data.frame

5 - Read a shapefile into R using sf and use the resulting object to manipulate raster-type data

6 - Produce pretty maps using tmap and ggplot2

7 - Build and evaluate a species distribution model using dismo

These are the skills that will underpin your spatial data analysis for Task 3.

Remember that both the terra and sf packages have excellent vignettes (available from CRAN), which you can use if you need to check functions, or to explore the functionality of the packages further.