The Global Human Settlement Layer (GHSL) is a scientific tool offered by the European Commission Science Hub in order to provide global spatial information about the human presence on the planet over time. According to creators, it provides a globally-consistent, multi-scale and detailed representation of built-up areas. The data are free to download. More information, as well as the data repository can be found at the dedicated GSHL portal.
In this practical we will experiment with the capabilities that R currently offers in raster manipulation, particularly the raster library. We will use two raster files containing world population estimates at 1 km x 1 km Grid, on for 1975 and one for 2015. Let’s start by downloading this datasets into R:
options(scipen=999)
# Don't forget to set your working directory first; it will download all the files there.
# Load necessary libraries
library(raster)
## Loading required package: sp
library(rgdal)
## rgdal: version: 1.2-8, (SVN revision 663)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 1.10.1, released 2013/08/26
## Path to GDAL shared files: /usr/share/gdal/1.10
## Loaded PROJ.4 runtime: Rel. 4.9.2, 08 September 2015, [PJ_VERSION: 492]
## Path to PROJ.4 shared files: (autodetected)
## Linking to sp version: 1.2-5
# Download Raster data
# For 1975, 1x1 km Grid
temp <- tempfile()
data_url <- "http://cidportal.jrc.ec.europa.eu/ftp/jrc-opendata/GHSL/GHS_POP_GPW4_GLOBE_R2015A/GHS_POP_GPW41975_GLOBE_R2015A_54009_1k/V1-0/GHS_POP_GPW41975_GLOBE_R2015A_54009_1k_v1_0.zip"
download.file(data_url, temp)
unzip(temp, exdir = "GHSL_pop/1975/")
unlink(temp)
# For 2015, 1x1 km Grid
temp <- tempfile()
data_url <- "http://cidportal.jrc.ec.europa.eu/ftp/jrc-opendata/GHSL/GHS_POP_GPW4_GLOBE_R2015A/GHS_POP_GPW42015_GLOBE_R2015A_54009_1k/V1-0/GHS_POP_GPW42015_GLOBE_R2015A_54009_1k_v1_0.zip"
download.file(data_url, temp)
unzip(temp, exdir = "GHSL_pop/2015/")
unlink(temp)
# Import the download rasters (tiff) for 1975 and 2015
pop1975 <- raster("GHSL_pop/1975/GHS_POP_GPW41975_GLOBE_R2015A_54009_1k_v1_0/GHS_POP_GPW41975_GLOBE_R2015A_54009_1k_v1_0.tif")
pop2015 <- raster("GHSL_pop/2015/GHS_POP_GPW42015_GLOBE_R2015A_54009_1k_v1_0/GHS_POP_GPW42015_GLOBE_R2015A_54009_1k_v1_0.tif")
# This will print some basic raster infomormation
GDALinfo("GHSL_pop/1975/GHS_POP_GPW41975_GLOBE_R2015A_54009_1k_v1_0/GHS_POP_GPW41975_GLOBE_R2015A_54009_1k_v1_0.tif")
## rows 15236
## columns 35497
## bands 1
## lower left origin.x -17619595
## lower left origin.y -6484971
## res.x 1000
## res.y 1000
## ysign -1
## oblique.x 0
## oblique.y 0
## driver GTiff
## projection +proj=moll +lon_0=0 +x_0=0 +y_0=0 +ellps=WGS84 +units=m +no_defs
## file GHSL_pop/1975/GHS_POP_GPW41975_GLOBE_R2015A_54009_1k_v1_0/GHS_POP_GPW41975_GLOBE_R2015A_54009_1k_v1_0.tif
## apparent band summary:
## GDType hasNoDataValue NoDataValue blockSize1 blockSize2
## 1 Float32 FALSE 0 256 256
## apparent band statistics:
## Bmin Bmax Bmean Bsd
## 1 0 3489988 7.510408 460.785
## Metadata:
## AREA_OR_POINT=Area
GDALinfo("GHSL_pop/2015/GHS_POP_GPW42015_GLOBE_R2015A_54009_1k_v1_0/GHS_POP_GPW42015_GLOBE_R2015A_54009_1k_v1_0.tif")
## rows 15236
## columns 35497
## bands 1
## lower left origin.x -17619595
## lower left origin.y -6484971
## res.x 1000
## res.y 1000
## ysign -1
## oblique.x 0
## oblique.y 0
## driver GTiff
## projection +proj=moll +lon_0=0 +x_0=0 +y_0=0 +ellps=WGS84 +units=m +no_defs
## file GHSL_pop/2015/GHS_POP_GPW42015_GLOBE_R2015A_54009_1k_v1_0/GHS_POP_GPW42015_GLOBE_R2015A_54009_1k_v1_0.tif
## apparent band summary:
## GDType hasNoDataValue NoDataValue blockSize1 blockSize2
## 1 Float32 FALSE 0 256 256
## apparent band statistics:
## Bmin Bmax Bmean Bsd
## 1 0 634492 13.59264 407.3207
## Metadata:
## AREA_OR_POINT=Area
# Download a basic world land polygons layer
temp <- tempfile()
data_url <- "http://data.openstreetmapdata.com/simplified-land-polygons-complete-3857.zip"
download.file(data_url, temp)
unzip(temp, exdir = "world_poly")
world_poly <- readOGR("world_poly/simplified-land-polygons-complete-3857", "simplified_land_polygons")
## OGR data source with driver: ESRI Shapefile
## Source: "world_poly/simplified-land-polygons-complete-3857", layer: "simplified_land_polygons"
## with 62864 features
## It has 1 fields
unlink(temp)
# Fix to Mollweide Projection (equal area)
world_poly <- spTransform(world_poly, CRS("+proj=moll +lon_0=0 +x_0=0 +y_0=0 +ellps=WGS84 +units=m +no_defs"))
Looking at the raster information above, the max cell values are 3,489,988 and 634,492 respectively, which are way off a normal density. These are probably outliers but we can easily correct them by adjusting cell values from a cut-off point and above. This can be calculated based on quantiles, but be aware that calculating this may take some time:
max_cval75 <- quantile(values(pop1975), 0.995) # no need to run this
Assuming a 99.5% upper cut-off point, the max cell value for 1975 would be max_cval75 = 116.8624
. The 99.5% however is kind of arbitrary, and a more informative approach, possibly using density plots would be required to find a better value. For this practical, we can use the original max values that were placed by the creators, which can be found in the .ovr files:
# Set max values
max_cval75 <- 89.74
max_cval15 <- 162.73
We can now reclassify cells using these values. Still, working with rasters of that size is computationally very expensive, so in this instance we could take advantage of the core parallel processing that the raster
package offers.
beginCluster() # Parallel processing - START
## Loading required namespace: parallel
## 28 cores detected, using 27
# Reclassify cell values
pop1975_corr <- reclassify(pop1975, rcl = c(max_cval75, Inf, max_cval75))
pop2015_corr <- reclassify(pop2015, rcl = c(max_cval15, Inf, max_cval15))
endCluster() # Parallel processing - END
Let’s make some basic plots of the two rasters. These are very simple plots and at this scale, they’re not too informing. Perhaps it would be best to work with a smaller region in the future.
# Simple plot of 1975
plot(pop1975_corr,
main="Population 1975, 1x1km Grid")
# Plot of 2015, adding coastline polygons
plot(pop2015_corr,
main="Population 2015, 1x1km Grid")
plot(world_poly, add=T)
Working with such large datasets usually takes a copious amount of time; it is best if we extract an area of interest and performe some analysis there. Let’s take a look at the Guangzhou / Shenzhen / Hong Kong conurbation in SE Asia. We can input this region’s coordinates (source: http://www.findlatitudeandlongitude.com/?loc=guangzhou+china#.WAiBjp4oA_s) using the extent
and crop
command.
# Extract the Area of Interest (AOI)
# Note the extent order: {xmin, xmax, ymin, ymax}
Guangzhou_poly <- extent(10669476, 11019836, 2650781, 2946565) # or as(..., "SpatialPolygons") if there is need for a complex polygon
# Clip to the extents of the AOI
pop1975_corr_AOI <- crop(pop1975_corr, Guangzhou_poly)
pop2015_corr_AOI <- crop(pop2015_corr, Guangzhou_poly)
Before we plot the map, we need to descide on the colour palette. We will use a custom colour palette; it’s actually very easy to make and manipulate.
# Make custom colour palette
custom_ColPal <- colorRampPalette(c("white", "pink", "maroon"))
# Plots the palette, slicing it into x colours, which is customizable using custom_ColPal(x), eg:
plot(rep(1,50), col=(custom_ColPal(50)), pch=19, cex=2)
# Plot for 1975
plot(pop1975_corr_AOI,
col = custom_ColPal(32),
main="Guangzhou Region, Population 1975, 1x1km Grid")
plot(world_poly, add=T)
# Plot for 2015
plot(pop2015_corr_AOI,
col = custom_ColPal(32),
main="Guangzhou Region, Population 2015, 1x1km Grid")
plot(world_poly, add=T)
As far as map colours are concerned, you can also check out the
RColorbrewer
package for some very nice built in color ramps, and also the classInt
package for making breakpoints.
Now we can calculate and map the difference in population between 1975 and 2015. The easiest way to do so is to “subtract” the 1975 values from the 2015 values for each grid cell. In doing so however, cells with zero population (e.g.lakes, deserts, etc.) will appear to be classified the same as those cells in palces with no or very small change in population.
We can fix that by replacing 0’s with NAs for the two rasters prior to subtracting them. We use the mask
command here to update values:
# Convert zeros to NAs - may take some time
pop1975_corr_AOI_masked <- mask(pop1975_corr_AOI, world_poly, updatevalue = NA)
pop2015_corr_AOI_masked <- mask(pop2015_corr_AOI, world_poly, updatevalue = NA)
Now let’s calculate and save the resulting raster before plotting it.
# Calculate difference
pop_diff_AOI <- pop2015_corr_AOI_masked - pop1975_corr_AOI_masked
# Write new raster to disk
writeRaster(pop_diff_AOI, "Guangzhou_Region_Pop_Diff_1975-2015_1km_Grid.tif", format="GTiff", overwrite=TRUE)
# Make custom colour palette
custom_ColPal <- colorRampPalette(c("deepskyblue4", "darkcyan", "springgreen4", "yellow", "red", "darkmagenta"))
# Plots the palette, slicing it into x colours, which is customizable using custom_ColPal(x)
plot(rep(1,50), col=(custom_ColPal(50)), pch=19, cex=2)
# Plot population difference
plot(pop_diff_AOI,
col = custom_ColPal(36),
main="Guangzhou Region, Population Difference 1975-2015, 1x1km Grid")
plot(world_poly, add=T)
This map clearly shows the level of exapansion of the conurbation at the urban cluster’s periphery.
Hopefully, this is a good starting point to working with rasters. The actual GHS Layers are also very interesting to work with. The same datasets we used here are provided at even more granular level, at 250m x 250m Grid cell, and some amazing maps can be produced with them.