Welcome to the second script of Module 2!
At the example of the CHIRPS dataset, you will here learn how to:

  • import, edit and display raster datasets
  • perform spatial operations with raster objects
  • download raster datasets
  • calculate time trends and assess their significance
  • parallelize workflows with foreach and doParallel

Please read the following code instructions carefully, try to understand the code that follows each instruction, execute it and see what happens. Do not hesitate to change the code or write your own code, and execute it as well!


1. Import, edit and display raster datasets

Again, we start by defining our working directory and loading all the packages that we will need:

dir <- "C:/Users/Max/Desktop/IAMO_neu/eLearning/Module_2_Climate_trends/M02_Part02_CHIRPS/"

library(R.utils)
library(raster)
library(exactextractr)
library(rgdal)
library(reshape2)
library(dplyr)
library(ggplot2)
library(viridis)
library(pals)
library(lubridate)
library(Kendall)
library(stringr)
library(doParallel)
library(foreach)

In the folder “data”, there are ten files with the extension “.gz”, which is a format similar to “.rar” or “.zip”. Each file is a zipped TIFF file of the CHIRPS dataset that displays the global precipitation on a specific day, with a resolution of 0.05 degrees. We choose the file for January 4th 1981 and unzip it with the function gunzip() from the package R.utils. The argument “remove = F” means that the “.gz”-file does not get deleted after unzipping:

gunzip(filename = paste0(dir, "data/chirps-v2.0.1981.01.04.tif.gz"),
       destname = paste0(dir, "data/chirps-v2.0.1981.01.04.tif"), remove = F, overwrite = T)

Have a look at the folder “data”, where you should now find the unzipped TIFF file! We can import this file as a RasterLayer object with the function raster() from the package raster:

chirps_raster <- raster(paste0(dir, "data/chirps-v2.0.1981.01.04.tif"))
class(chirps_raster)
## [1] "RasterLayer"
## attr(,"package")
## [1] "raster"

When we call this object, we get some basic information about its properties. “chirps_raster” is a raster that has 2000 rows and 7200 columns of data, with a spatial resolution of 0.05 degrees. The dataset extends from 50 degrees north to 50 degrees south. The coordinate reference systems (CRS), i.e. the coordinate system, is WGS84. The slot “names” contains the names of all layers in our object, which currently only has one layer:

chirps_raster
## class      : RasterLayer 
## dimensions : 2000, 7200, 14400000  (nrow, ncol, ncell)
## resolution : 0.05, 0.05  (x, y)
## extent     : -180, 180, -50, 50  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs 
## source     : chirps-v2.0.1981.01.04.tif 
## names      : chirps.v2.0.1981.01.04

We can quickly plot the data in the following way:

plot(chirps_raster)

We notice that all land masses appear uniformly green and that the legend extends far below zero, which does not seem right. This is because all NA-values in the CHIRPS dataset are coded as -9999. We can substitute all values of “-9999” to “NA” in the following way. The new plot looks more plausible. The numbers in the legend represent millimeters of precipitation!

chirps_raster[chirps_raster == -9999] <- NA
plot(chirps_raster)

For our analysis, we only need the part of the data that refers to Armenia. We can use a shapefile of the marzes to cut out the respective part from “chirps_raster”. However, we first have to make sure that all data is in the same coordinate system:

marzes <- readOGR(paste0(dir, "shapefiles/ARM_marzes_short_simplified_500.shp"), verbose=F)
marzes_new <- spTransform(marzes, proj4string(chirps_raster))

By applying the functions crop() and mask() from the package raster, we narrow down the spatial extent of the raster dataset to Armenia, and select only raster cells inside Armenia. We can check the extent of the newly created object “chirps_raster_crop_mask” by calling its properties:

chirps_raster_crop      <- crop(chirps_raster, marzes_new)
chirps_raster_crop_mask <- mask(chirps_raster_crop, marzes_new)

chirps_raster_crop_mask
## class      : RasterLayer 
## dimensions : 49, 64, 3136  (nrow, ncol, ncell)
## resolution : 0.05, 0.05  (x, y)
## extent     : 43.45, 46.65, 38.85, 41.3  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs 
## source     : memory
## names      : chirps.v2.0.1981.01.04 
## values     : 0, 6.394869  (min, max)

Let us plot the new, cropped raster, and add the marzes boundaries for reference:

plot(chirps_raster_crop_mask)
plot(marzes_new, add = T)

So far, we have only worked on one raster file. Let us unzip and import all 10 files.

For unzipping, we apply a for-loop, in which we iteratively change the file name. We can retrieve all file names in a given directory with the function list.files():

zipfiles <- list.files(paste0(dir, "data"), pattern = ".gz$")

for (i in 1:length(zipfiles))
{ gunzip(filename = paste0(dir, "data/", zipfiles[i]),
         destname = paste0(dir, "data/" , substring(zipfiles[i],1,26) ), remove = F, overwrite = T) }

We can import all unzipped TIFF files at once, using the function stack() from the package raster:

tiffiles <- list.files(paste0(dir, "data"), pattern = ".tif$", full.names = T)

chirps_stack <- stack(tiffiles)

The resulting object “chirps_stack” is a RasterStack, which is a multi-layered raster object. You can notice that the attributes are almost the same as of a single RasterLayer, with the difference that we now have a fourth dimension (nlayers = 10), and several values under the slot “names” which refer to the individual layers:

class(chirps_stack)
## [1] "RasterStack"
## attr(,"package")
## [1] "raster"
chirps_stack
## class      : RasterStack 
## dimensions : 2000, 7200, 14400000, 10  (nrow, ncol, ncell, nlayers)
## resolution : 0.05, 0.05  (x, y)
## extent     : -180, 180, -50, 50  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs 
## names      : chirps.v2.0.1981.01.01, chirps.v2.0.1981.01.02, chirps.v2.0.1981.01.03, chirps.v2.0.1981.01.04, chirps.v2.0.1981.01.05, chirps.v2.0.1981.01.06, chirps.v2.0.1981.01.07, chirps.v2.0.1981.01.08, chirps.v2.0.1981.01.09, chirps.v2.0.1981.01.10

We can call individual layers with double squared brackets, or by calling the layer name, similar to list objects:

chirps_stack[[2]]
## class      : RasterLayer 
## dimensions : 2000, 7200, 14400000  (nrow, ncol, ncell)
## resolution : 0.05, 0.05  (x, y)
## extent     : -180, 180, -50, 50  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs 
## source     : chirps-v2.0.1981.01.02.tif 
## names      : chirps.v2.0.1981.01.02
chirps_stack$chirps.v2.0.1981.01.02
## class      : RasterLayer 
## dimensions : 2000, 7200, 14400000  (nrow, ncol, ncell)
## resolution : 0.05, 0.05  (x, y)
## extent     : -180, 180, -50, 50  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs 
## source     : chirps-v2.0.1981.01.02.tif 
## names      : chirps.v2.0.1981.01.02

We can plot all 10 layers of “chirps_stack” at once in the following way:

plot(chirps_stack)

As you can see, we still have to process the stack in the same way we did before with “chirps_raster”. Even though we are now working with a RasterStack instead of a RasterLayer, we do not have to change anything in the syntax, the functions are automatically applied to each layer in “chirps_stack”. However, to decrease the time needed for processing, we will first crop and mask the stack, and only then convert the values of “-9999” to “NA”:

chirps_stack_crop      <- crop(chirps_stack, marzes_new)
chirps_stack_crop_mask <- mask(chirps_stack_crop, marzes_new)

chirps_stack_crop_mask[chirps_stack_crop_mask == -9999] <- NA

Let us plot the resulting 10 layers of Armenia. When there is not a single cell with rainfall on a given day, the country turns yellow because of the default color scale that is applied:

plot(chirps_stack_crop_mask)

We can of course change the coloration scheme, and only select some layers for plotting:

plot(chirps_stack_crop_mask[[7:10]], col = viridis(50))

2. Spatial operations with raster objects

The advantage of having raster layers of several days in the same RasterStack object is that we can easily perform mathematical operations on the entire stack. For example, we can calculate the sum of all 10 layers, i.e. the cumulative precipitation from January 1 to 10, 1981, in the following way:

chirps_sum <- sum(chirps_stack_crop_mask)
plot(chirps_sum, main="Cumulative precipitation from Jan 01 to Jan 10, 1981", col = rev(coolwarm(50)[1:25]))
plot(marzes_new, add = T)

We can of course apply other functions as well:

chirps_mean <- mean(chirps_stack_crop_mask)
plot(chirps_mean, main="Average daily precipitation between Jan 01 to Jan 10, 1981")

chirps_max <- max(chirps_stack_crop_mask)
plot(chirps_max, main="Maximum daily precipitation between Jan 01 to Jan 10, 1981")

chirps_min <- min(chirps_stack_crop_mask)
plot(chirps_min, main="Minimum daily precipitation between Jan 01 to Jan 10, 1981")

We can use the same syntax with which we changed the values “-9999” to “NA” to create a stack that for each day indicates whether there was rain or not, and then count the number of rainy days:

chirps_stack_2 <- chirps_stack_crop_mask

chirps_stack_2[chirps_stack_2 > 0] <- 1

rainy_days <- sum(chirps_stack_2)
plot(rainy_days, main="Number of rainy days from Jan 01 to Jan 10, 1981", col = rev(ocean.haline(10)))
plot(marzes_new, add = T)

We can of course also select individual layers from our raster stack and apply mathematical operators between them, and mix them with some constants:

nonsense_layer <- ((chirps_stack_crop_mask[[4]]+1) * (chirps_stack_crop_mask[[10]]+1)) + chirps_stack_2[[4]]*10
plot(nonsense_layer)

You can also select layers from an existing stack to create new stacks. When the layers come from different stacks, make sure they have the same properties, i.e. the same extent, the same coordinate system, etc.

new_stack <- stack(chirps_stack_2[[1]], chirps_stack_2[[3]])
new_stack
## class      : RasterStack 
## dimensions : 49, 64, 3136, 2  (nrow, ncol, ncell, nlayers)
## resolution : 0.05, 0.05  (x, y)
## extent     : 43.45, 46.65, 38.85, 41.3  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs 
## names      : chirps.v2.0.1981.01.01, chirps.v2.0.1981.01.03 
## min values :                      0,                      0 
## max values :                      1,                      0

Let us now combine raster and vector data. Recall the raster that contains cumulative precipitation, and the marzes shapefile:

plot(chirps_sum)
plot(marzes_new, add = T)

To calculate the mean of all pixels underlying a marz, we can apply the function exact_extract() from the package exactextractr. In other GIS software, this is often referred to as Zonal Statistics. The resulting vector “marz_means” contains 11 elements, that correspond to the 11 marzes contained in “marzes_new”. We can simply add this vector to the attribute table, because we know that the order of the values in “marz_means” corresponds to the order of the elements in the attribute table. We can then plot the marz-level values with spplot():

marz_means <- exact_extract(x=chirps_sum, y=marzes_new, fun="mean", progress = F)
marzes_new$precip_mean <- marz_means
spplot(marzes_new, "precip_mean", col.regions= rev(terrain.colors(20)), at=c(1:11))

Let’s also calculate the maximum instead of mean cell value for each marz:

marz_max <- exact_extract(x=chirps_sum, y=marzes_new, fun="max", progress = F)
marzes_new$precip_max <- marz_max
spplot(marzes_new, "precip_max", col.regions= rev(terrain.colors(20)), at=c(1:11))

3. Download CHIRPS data from the internet

Obviously, we want to scale up everything that we have done before, to calculate trends in marz-level precipitation over time! For this, we first need more data. Let us download all January CHIRPS data from 1981 to 2020.

All daily CHIRPS files are publicly available on an HTTPS-Server here. We can download any file from this server directly from within R, by specifying the respective link in the function download.file(). Let us download the file for January 11, 1981 to the folder “download”:

# download.file(url = "https://data.chc.ucsb.edu/products/CHIRPS-2.0/global_daily/tifs/p05/1981/chirps-v2.0.1981.01.11.tif.gz",
#               destfile = paste0(dir, "download/chirps-v2.0.1981.01.11.tif.gz") )

To download all January files, we have to sequentially change the file name by iterating over the year (1981 to 2020), and over the day (1 to 31). Note that every day needs to be a two-digit character. We can create a vector of two-digit characters with the function str_pad() from the package stringr. When you start the code below, go to the folder “download” to check if the download started. However, you can then cancel this chunk, because we will make this process faster in the second next chunk.

years <- c(1981:2020)
days  <- str_pad(c(1:31), 2, pad="0")
# 
# for (y in 1:length(years))
# { for (d in 1:length(days))
#   {  download.file(url = paste0("https://data.chc.ucsb.edu/products/CHIRPS-2.0/global_daily/tifs/p05/", 
#                                 years[y], "/chirps-v2.0.", years[y], ".01.", days[d], ".tif.gz"),
#               destfile = paste0(dir, "download/chirps-v2.0.", years[y], ".01.", days[d], ".tif.gz") )
# }}

You might have noticed that the sequential download as initiated by the previous chunk takes a lot of time. We can speed the download up by parallelizing it, i.e. we make use of the multiple CPU cores of our computer to download several files at once. How many cores you have, and hence how many jobs you can parallelize, depends on your computer. You can find out how many cores you have available in the following way:

detectCores()
## [1] 8

For example, if the above chunk yields “8”, it means that you could download 8 files in parallel. However, you should reserve at least one core for other tasks on your computer. We have to tell R to use several cores by creating a cluster. Let us create a cluster from n-1 cores, and repeat the download procedure by iterating over “year” in a foreach loop, which means we will parallelize the download for each year. Afterwards, we should stop the cluster before we proceed with the script. Start the following chunk, and then go to the folder “download” - you should see that several files are now always downloaded at once, and that the folder fills up faster than before. Wait until the download is finished:

# UseCores <- detectCores()-1              
# cl <- makeCluster(UseCores)     
# registerDoParallel(cl)
# 
# foreach(y=1:length(years)) %dopar%
# { for (d in 1:length(days))
#   {  download.file(url = paste0("https://data.chc.ucsb.edu/products/CHIRPS-2.0/global_daily/tifs/p05/", 
#                                 years[y], "/chirps-v2.0.", years[y], ".01.", days[d], ".tif.gz"),
#               destfile = paste0(dir, "download/chirps-v2.0.", years[y], ".01.", days[d], ".tif.gz") )
# }}    
#     
# stopCluster(cl)