I thought it would be useful (for me, if no one else) to document some R code I wrote for a recent project-within-project. The code brings together some of the functions and techniques I’ve been learning for handling spatial and non-spatial data in the R programming environment.

The aim of the research project I’m working on is to develop a model to quantify the resilience of the agricultural production system at regional (county) level in England and Wales, UK. One of the elements of the ‘resilience index’ is a composite environmental index, comprised of several sub-indexes including drought risk. Despite its reputation as a grey and rainy place, the impact of drought on the agricultural economy (particularly arable farming) of the UK is very real, and we needed to take this into consideration when formulating an environmental risk index for our model.

Spatial data was acquired for:

The first task was to decide how to go about analysing the Standardised Precipitation Index (SPI) data from the CEH (https://eip.ceh.ac.uk/apps/droughts/). The SPI is a widely-used index to characterise meteorological drought on a range of timescales. Over short timescales, the SPI is closely related to soil moisture, while at longer timescales, the SPI can be related to groundwater and reservoir storage. The SPI can be created for differing periods (1 to 24 months), using monthly input data.



For this project, a simple approach to creating a meaningful drought risk index was required to avoid any complex meteorological modelling beyond the scope of the project. The strategy adopted was to concentrate on using 1-month SPI data, to represent shorter, more intense droughts which could have immediate impacts on crop growth. Monthly SPI grids for 2006 to 2015 (the timescale over which the main project is focussed) were downloaded from the CEH in NetCDF format. I struggled to handle this format in R however, and so I contacted the CEH who kindly supplied me the data in TIFF format.

To analyse this data, we decided to concentrate on analysing the spatial distribution of the most intense 1-month droughts – those with a SPI value of -2 and below (extreme drought). This would hopefully give us a realistic picture of the potential threat to crop production from drought within the context of the temperate oceanic climate of England and Wales.

Raster SPI grids for the months of the main growing season (April-October) for each of the 10 years were extracted from the 120 original SPI grids, leaving a total of 70 TIFF files. These grids were then reclassified using R to extract data where SPI <=-2 (note looping can also be done with ‘lapply’):

#import required libraries
library(raster)
library(rgdal)

#get the list of rasters
#--pattern = "*.tif$"-- ensures only main tif files are selected and avoids error when
#reading rasters by skipping any associated aux/world files
grids <- list.files("./Data/Rasters/", pattern = "*.tif$")

#generate a reclassification matrix 
#in this example, values less that -2 are assigned a new value of '1' ('severe to extreme drought')
#and values greater than -2 assigned a new value of 0 (less then 'severe drought')
m <- c(-999, -2, 1,  -2, 999, 0)
rclmat <- matrix(m, ncol=3, byrow=TRUE)

#function to reclassify rasters and write a new reclassified tif file for each
batch_reclass <- function(grids){
for (i in 1:length(grids)) {
#read in raster
r <- raster(paste0("./Data/Rasters/", grids[i]))
#perform the reclassifcation
rc <- reclassify(r, rclmat)
#write each reclass to a new file 
writeRaster(rc, filename = paste0("./Data/Reclass/", "rc_", grids[i]), format="GTiff", overwrite=TRUE)
  }
}
#run the function
batch_reclass(grids)

The reclassified monthly grids were then added together for each year’s growing season, to compute raster SPI grids showing the extent of the land areas that had experienced at least one extreme 1-month drought for each year. The image below shows the SPI grid calculated for the year 2013 (1-month extreme drought areas in black).

The R code for the batch raster calculation process:

#import required libraries
library(raster)
library(rgdal)

#list of folders containing the raster files
#in this example the folders are named after years 
years <- 2006:2015

#initiate function (passing years/folders argument)
batch_rastSum <- function(years){

#loop through the different folders
for (i in 1:length(years)) {

#get a list of the input rasters in each folder
#pattern = "*.tif$" filters for main raster files only and skips any associated files (e.g. world files)
grids <- list.files(paste0("./Data/Reclass/", years[i], "/"), pattern = "*.tif$")

#create a raster stack from the input grids (in this example there are 12 tif files in each folder)
s <- raster::stack(paste0("./Data/Reclass/", years[i], "/", grids))

#run the sum function on the raster stack - i.e. add (non-cumulatively) the rasters together
r <- sum(s)

#write the output raster to file
r <- writeRaster(r, filename = paste0("./Data/YearSums/", "rc_Year_", years[i]), format="GTiff", overwrite=TRUE)

  }
}

#run the function 
batch_rastSum(years)

With the annual SPI <=-2 grids now generated, we could start to develop a way of quantifying the geographical relationship between the extent of extreme drought and arable farming within each of our study areas (counties) in England and Wales.

The goal was calculate the area of arable land affected by at least one extreme (SPI<=-2) 1-month drought in an annual growing season (April-Oct), as proportion of the overall agricultural land area in each county. This would, we think, provide us with a simple and meaningful metric that we could plug into our resilience model.

Agricultural land (arable and grassland) was extracted from the LCM 2007 1km raster data. Areas of arable land were extracted as a separate layer using the raster::reclassify function in R. This data, along with the annual SPI grid data was converted into vector polygon format. Annual SPI layers were clipped to the extent of arable land (i.e. we retained only the drought data that was geographically coincident with arable farmland).

With all data now represented as vector polygons we could now begin the process of calculating areas using intersections. The following code shows how this was done in R. No doubt this code could be improved, but it worked!

This first section of code performs geometric intersections using polygon shapefiles to calculate the areas of agricultural (all types) land and arable-only land within each county:

#import libraries
library(sf)
library(tidyverse)

#remove scientific notation when printing numeric values
#(allows easier visual checking of large values computed in square meters)
options(scipen = 999)

#load county areas polygon shapefile
counties <- st_read("./Data/counties_poly.shp")

#calculate the area of each county as new column "areaCounty"
#converting column type from "S3: Units" to numeric in the process
counties$areaCounty <- as.numeric(st_area(counties$geometry))

#load the arable land polygon data extracted from the CEH LCM2007 1km land cover raster) as tibble
arable <- st_read("./Data/lcm_arable_poly.shp")

#intersect the "arable" and "counties" layers 
#note: R will throw an error if coordinate reference systsems don't match -  "st_crs(x) == st_crs(y) is not TRUE"
int <- as_tibble(st_intersection(arable, counties))

#calculate the area of each polygon in intersect layer and add to attribute table
int$areaArable <- st_area(int$geoms)

#group data by county and calculate the total arable land area per county
tb_ArableSpiByCounty <- int %>%
  group_by(County_UA) %>%
  summarise(areaArable = (as.numeric(sum(areaArable)))) 

#remove temporary "int" layer
rm(int)

#join to original "counties" layer to add column of county area to table
tb_ArableSpiByCounty <- merge(counties, tb_ArableSpiByCounty, by = 'County_UA')

#load a shapefile of total agri areas in England and Wales
agri <- st_read("./Data/lcm_agri_poly.shp")

#intersect the "agri" and "counties" layers
int <- as_tibble(st_intersection(agri, counties))

#calculate the area of each polygon in intersect layer and add to attribute table
int$areaAgri <- st_area(int$geoms)

tb_AgriByCounty <- int %>%
  group_by(County_UA) %>%
  summarise(areaAgri = (as.numeric(sum(areaAgri)))) 

#join intersect to main "tb_ArableByCounty" tibble to add column of total agri land area within each county
tb_ArableSpiByCounty <- merge(tb_ArableSpiByCounty, tb_AgriByCounty, by = 'County_UA')

#remove temporary intersect layer
rm(int)

#calculate some statistics and add to the table:
#area of agricultural land as proportion of overall land area in each county 
tb_ArableSpiByCounty$prop_AgriCounty <- (tb_ArableSpiByCounty$areaAgri/tb_ArableSpiByCounty$areaCounty) *100
#area of arable land as a proportion of agricultural land in each county
tb_ArableSpiByCounty$prop_ArabCounty <- (tb_ArableSpiByCounty$areaArable/tb_ArableSpiByCounty$areaAgri) *100

#export the above merge as a shapefile to save progress
st_write(tb_ArableSpiByCounty, "./Data/AgriArablebyCounty.shp")

#import shapefile to resume analysis 
tb_ArableSpiByCounty <- st_read("./Data/AgriArablebyCounty.shp")



The next section of code then performs intersections on the annual SPI extreme drought layers (using a loop), and calculates the area of extreme drought as a proportion of the total area of agricultural land in each county. The results are exported as a CSV file.

#generate a list of years for use as filenames in the loop
#remove year 2012 fromt the list as no SPI<=-2 dtaa was recorded for this year
years <- 2006:2015
years <- years[-7]

#function
batch_int <- function(years) {
  
  for (i in 1:length(years)) {
  
  #load a polygon shapefile of SPI<-2 extracted for arable land only
  #"vClean" in the filepath refers to the fact that the data needed to be pre-processed to...
  #...clean self-interecting polygons that were causing errors when running intersection...
  #...the only bit done outside R using GRASS::v.vlean
  spi<- st_read(paste0("./Data/SPI_Arable_Polys_vClean/", years[i], ".shp"))
  
  #intersect the drought and counties polygons
  intSpi <- as_tibble(st_intersection(spi, counties))
  
  #calculate intersected SPI areas within each county and add as new column to intersect layer
  intSpi$spiArea <- as.numeric(st_area(intSpi$geoms))
  
  #group the intersect layer by county and sum areas of SPI polygons within each county
  intSpi <- intSpi %>%
    group_by(County_UA) %>%
    summarise(spiArea  = (sum(spiArea))) 
  
  #change spiArea column name to unique value (for each year/SPI shapefile)
  colnames(intSpi)[colnames(intSpi)=="spiArea"] <- paste0("spiArea", years[i])
  
  #merge the intersection with the main data table generated earlier (see code section one)
  #ensure "all.x = TRUE" to retain all county rows
  tb_ArableSpiByCounty <- merge(tb_ArableSpiByCounty, intSpi, by = 'County_UA', all.x = TRUE)
  
  #add new field to main table to calculate proportion of arable land affected by extreme drought (SPI<-2)...
  #...as a proportion of the overall agricultural land area in each county
  tb_ArableSpiByCounty[[paste0("spiProp", years[i])]] <- 
    (tb_ArableSpiByCounty[[paste0("spiArea", years[i])]]/tb_ArableSpiByCounty$areaAgri)*100
  
  #remove temporary layers
  rm(intSpi)
  rm(spi)
  
  #convert main table to tibble and remove geometry column for quick export to CSV
  result <- as_tibble(tb_ArableSpiByCounty)
  result$geometry <- NULL
  
  #write output as CSV, converting any "NA" values to "0" in the process
  write.csv(result, file = "./Data/1_final_table.csv", append =  TRUE, na= "0")
  
  }
}

batch_int(years)

# import and check the final table
df_final <- read.csv("./Data/1_final_table.csv")



And here’s part of the final results table. The key column is “spiProp2006” - the area of arable land affected by an extreme drought as a proportion of the area of agricultural land in each county. This example shows just the data for 2006, but the loop calculates the spiProp for all years.