Getting started with GHSLs

The Global Human Settlement Layer (GHSL) is a scientific tool offered by the European Commission Science Hub in order to provide global spatial information about the human presence on the planet over time. According to creators, it provides a globally-consistent, multi-scale and detailed representation of built-up areas. The data are free to download. More information, as well as the data repository can be found at the dedicated GSHL portal.

In this practical we will experiment with the capabilities that R currently offers in raster manipulation, particularly the raster library. We will use two raster files containing world population estimates at 1 km x 1 km Grid, on for 1975 and one for 2015. Let’s start by downloading this datasets into R:

Data Input

options(scipen=999)
# Don't forget to set your working directory first; it will download all the files there.

# Load necessary libraries
library(raster)
## Loading required package: sp
library(rgdal) 
## rgdal: version: 1.2-8, (SVN revision 663)
##  Geospatial Data Abstraction Library extensions to R successfully loaded
##  Loaded GDAL runtime: GDAL 1.10.1, released 2013/08/26
##  Path to GDAL shared files: /usr/share/gdal/1.10
##  Loaded PROJ.4 runtime: Rel. 4.9.2, 08 September 2015, [PJ_VERSION: 492]
##  Path to PROJ.4 shared files: (autodetected)
##  Linking to sp version: 1.2-5
# Download Raster data 

# For 1975, 1x1 km Grid
temp <- tempfile()
data_url <- "http://cidportal.jrc.ec.europa.eu/ftp/jrc-opendata/GHSL/GHS_POP_GPW4_GLOBE_R2015A/GHS_POP_GPW41975_GLOBE_R2015A_54009_1k/V1-0/GHS_POP_GPW41975_GLOBE_R2015A_54009_1k_v1_0.zip"
download.file(data_url, temp)
unzip(temp, exdir = "GHSL_pop/1975/")
unlink(temp)

# For 2015, 1x1 km Grid
temp <- tempfile()
data_url <- "http://cidportal.jrc.ec.europa.eu/ftp/jrc-opendata/GHSL/GHS_POP_GPW4_GLOBE_R2015A/GHS_POP_GPW42015_GLOBE_R2015A_54009_1k/V1-0/GHS_POP_GPW42015_GLOBE_R2015A_54009_1k_v1_0.zip"
download.file(data_url, temp)
unzip(temp, exdir = "GHSL_pop/2015/")
unlink(temp)

# Import the download rasters (tiff) for 1975 and 2015
pop1975 <- raster("GHSL_pop/1975/GHS_POP_GPW41975_GLOBE_R2015A_54009_1k_v1_0/GHS_POP_GPW41975_GLOBE_R2015A_54009_1k_v1_0.tif")
pop2015 <- raster("GHSL_pop/2015/GHS_POP_GPW42015_GLOBE_R2015A_54009_1k_v1_0/GHS_POP_GPW42015_GLOBE_R2015A_54009_1k_v1_0.tif")

# This will print some basic raster infomormation
GDALinfo("GHSL_pop/1975/GHS_POP_GPW41975_GLOBE_R2015A_54009_1k_v1_0/GHS_POP_GPW41975_GLOBE_R2015A_54009_1k_v1_0.tif")
## rows        15236 
## columns     35497 
## bands       1 
## lower left origin.x        -17619595 
## lower left origin.y        -6484971 
## res.x       1000 
## res.y       1000 
## ysign       -1 
## oblique.x   0 
## oblique.y   0 
## driver      GTiff 
## projection  +proj=moll +lon_0=0 +x_0=0 +y_0=0 +ellps=WGS84 +units=m +no_defs 
## file        GHSL_pop/1975/GHS_POP_GPW41975_GLOBE_R2015A_54009_1k_v1_0/GHS_POP_GPW41975_GLOBE_R2015A_54009_1k_v1_0.tif 
## apparent band summary:
##    GDType hasNoDataValue NoDataValue blockSize1 blockSize2
## 1 Float32          FALSE           0        256        256
## apparent band statistics:
##   Bmin    Bmax    Bmean     Bsd
## 1    0 3489988 7.510408 460.785
## Metadata:
## AREA_OR_POINT=Area
GDALinfo("GHSL_pop/2015/GHS_POP_GPW42015_GLOBE_R2015A_54009_1k_v1_0/GHS_POP_GPW42015_GLOBE_R2015A_54009_1k_v1_0.tif")
## rows        15236 
## columns     35497 
## bands       1 
## lower left origin.x        -17619595 
## lower left origin.y        -6484971 
## res.x       1000 
## res.y       1000 
## ysign       -1 
## oblique.x   0 
## oblique.y   0 
## driver      GTiff 
## projection  +proj=moll +lon_0=0 +x_0=0 +y_0=0 +ellps=WGS84 +units=m +no_defs 
## file        GHSL_pop/2015/GHS_POP_GPW42015_GLOBE_R2015A_54009_1k_v1_0/GHS_POP_GPW42015_GLOBE_R2015A_54009_1k_v1_0.tif 
## apparent band summary:
##    GDType hasNoDataValue NoDataValue blockSize1 blockSize2
## 1 Float32          FALSE           0        256        256
## apparent band statistics:
##   Bmin   Bmax    Bmean      Bsd
## 1    0 634492 13.59264 407.3207
## Metadata:
## AREA_OR_POINT=Area
# Download a basic world land polygons layer

temp <- tempfile()
data_url <- "http://data.openstreetmapdata.com/simplified-land-polygons-complete-3857.zip"
download.file(data_url, temp)
unzip(temp, exdir = "world_poly")
world_poly <- readOGR("world_poly/simplified-land-polygons-complete-3857", "simplified_land_polygons")
## OGR data source with driver: ESRI Shapefile 
## Source: "world_poly/simplified-land-polygons-complete-3857", layer: "simplified_land_polygons"
## with 62864 features
## It has 1 fields
unlink(temp)

# Fix to Mollweide Projection (equal area)
world_poly <- spTransform(world_poly, CRS("+proj=moll +lon_0=0 +x_0=0 +y_0=0 +ellps=WGS84 +units=m +no_defs"))

Preparing the rasters

Looking at the raster information above, the max cell values are 3,489,988 and 634,492 respectively, which are way off a normal density. These are probably outliers but we can easily correct them by adjusting cell values from a cut-off point and above. This can be calculated based on quantiles, but be aware that calculating this may take some time:

max_cval75 <- quantile(values(pop1975), 0.995) # no need to run this

Assuming a 99.5% upper cut-off point, the max cell value for 1975 would be max_cval75 = 116.8624. The 99.5% however is kind of arbitrary, and a more informative approach, possibly using density plots would be required to find a better value. For this practical, we can use the original max values that were placed by the creators, which can be found in the .ovr files:

# Set max values
max_cval75 <- 89.74 
max_cval15 <- 162.73  

We can now reclassify cells using these values. Still, working with rasters of that size is computationally very expensive, so in this instance we could take advantage of the core parallel processing that the raster package offers.

beginCluster() # Parallel processing - START
## Loading required namespace: parallel
## 28 cores detected, using 27
# Reclassify cell values 

pop1975_corr <- reclassify(pop1975, rcl = c(max_cval75, Inf, max_cval75))
pop2015_corr <- reclassify(pop2015, rcl = c(max_cval15, Inf, max_cval15))

endCluster() # Parallel processing - END

Simple Plots

Let’s make some basic plots of the two rasters. These are very simple plots and at this scale, they’re not too informing. Perhaps it would be best to work with a smaller region in the future.

# Simple plot of 1975
plot(pop1975_corr, 
     main="Population 1975, 1x1km Grid")