Background

A common GIS operation is that of dissolve - creating larger areas from a set of smaller ones using the value of an attribute. Typically we might start with a set of subnational regions, and create national boundaries from the country code extracted from the region codes.

This turns out to be slightly tricky in R, and my students have been confused by its complexity in the past.

Boundaries for EU countries and their subdivisions are available from the GISCO website. The map projection chosen by the EU for the display of their maps is the Lambert Azimuthal Equal Area, with a central meridian at 10$deg;E and a latitude of origin at 52°N (in a field about 20km south of Hildesheim in Germany).

Loading the libraries

library(rgdal)
library(rgeos)

The libraries are used for the following tasks:

Library Task
rgdal Read an ESRI shapefile with readOGR()
rgeos We use gUnaryUnion() for the dissolve

Loading the data

The NUTS (Nomenclature of Terroritial Units for Statistics) regions1 represent a hierarchical tesselation of the European territory. NUTS0 represent the countries of the EU. The EUROSTAT description identifies the various levels as:

Level Type
NUTS1 Major socio-economic regions
NUTS2 Basicegions for the application of regional policies
NUTS3 Small regions for specific diagnoses

The associated NUTS codes are also hierarchical, for example:

Code Level Area
AT 0 Austria
AT1 1 Eastern Austria (Ostösterreich)
AT11 2 AT11 Burgenland
AT111 3 AT111 Mittelburgenland

There are further subdivisions into Local Administrative Units - in Austria LAU 1 are the same as NUTS3, and LAU 2 are the 2357 Municipalities.

I have already done some processing on the NUTS boundaries, in particular extracting the NUTS3 boundaries from the dataset which I downloaded from GISCO2. The NUTS3 boundaries represent the 2010 version of the NUTS.

We plot the boundaries as a starting point.

NUTS3 <- readOGR(dsn=".",layer="NUTS3_2010",stringsAsFactors=FALSE)
plot(NUTS3)
box()

Preparing the data

The shapefile includes boundaries for the French colonies in the Caribbean, South America, and the Indian Ocean, as well as the Portuguese Açores and Madeira, and the Canary Islands (Spain). For the moment, we will omit them. The regions in question are:

Region Name
FR910 Gaudeloupe
FR920 Martinique
FR930 Guyane
FR940 Réunion
PT200 Região Autónoma dos Açores
PT300 Região Autónoma da Madeira
ES703 El Hierro [Canary Islands ES7)
ES704 Fuerteventura
ES705 Gran Canaria
ES706 La Gomera
ES707 La Palma
ES708 Lanzarote
ES709 Tenerife

Rather ironically El Hierro used to be the site of the Europe Meridian (Isla del Meridiano) prior to the adoption of the Greenwich Meridian as the mapping standard in the 19th century. As the most western island in Europe, longitudes measured from El Hierro3 are all positive.

To undertake the removal, we create a vector of the codes, and match this with the codes in the NUTS3 spatial polygons data frame (SPDF). The which() function identifies the record numbers in the SPDF, and we remove them.

ExReg <- c("FR910","FR920","FR930","FR940",
           "PT200","PT300",
           "ES703","ES704","ES705","ES706","ES707","ES708","ES709")
omits <- which(NUTS3$NUTS_ID %in% ExReg)
NUTS3 <- NUTS3[-omits,]

I have discovered that removing spatial units in this fashion can create a problem for the internal coherence of the SPDF. The IDs for the polygons and the attribute data records become mismatched. We have to update the the row names for the records in the data slot with the IDs from the polygons slot.

rownames(NUTS3@data) <- sapply(slot(NUTS3, "polygons"), function(x) slot(x, "ID"))

To create a country SPDF we can use the gUnaryUnion() function from the rgeos library to dissolve each country’s internal NUTS3 boundaries. The output from gUnaryUnion() is a Spatial Polygons object, so we have to add on some attribute data with the correct row IDs. The NUTS0 codes are the first two characters of the NUTS_ID item in the NUTS3 data.

Country <- substr(NUTS3$NUTS_ID,1,2)
Cty <- gUnaryUnion(NUTS3,Country)
IDs <- data.frame(ID=sapply(slot(Cty, "polygons"), function(x) slot(x, "ID")))
rownames(IDs)  <- IDs$ID
Cty <- SpatialPolygonsDataFrame(Cty,IDs)

Et voila!.

plot(NUTS3,border="lightgrey")
plot(Cty,add=T)
xy <- coordinates(Cty)
text(xy[,1],xy[,2],Cty$ID,cex=0.5,col="red")
box()

Putting it all togther

We need a function to act as a shortcut. Something along the lines of:

dissolve <- function(SPDF,attribute){
   rownames(SPDF@data) <- sapply(slot(SPDF, "polygons"), function(x) slot(x, "ID"))   
   Temp <- gUnaryUnion(SPDF,attribute)
   IDlist <- data.frame(ID=sapply(slot(Temp, "polygons"), function(x) slot(x, "ID")))
   rownames(IDlist)  <- IDlist$ID
   SpatialPolygonsDataFrame(Temp,IDlist)
}

We can now try it out. We start by re-reading the data:

NUTS3 <- readOGR(dsn=".",layer="NUTS3_2010",stringsAsFactors=FALSE)
ExReg <- c("FR910","FR920","FR930","FR940",
           "PT200","PT300",
           "ES703","ES704","ES705","ES706","ES707","ES708","ES709")
omits <- which(NUTS3$NUTS_ID %in% ExReg)
NUTS3 <- NUTS3[-omits,]
Country <- substr(NUTS3$NUTS_ID,1,2)

Having re-read the data, and created the dissolve item, we can do the dissolve:

Cty <- dissolve(NUTS3,Country)
plot(NUTS3,border="lightgrey")
plot(Cty,add=T)
xy <- coordinates(Cty)
text(xy[,1],xy[,2],Cty$ID,cex=0.5,col="red")
box()

And all seems to function satisfactorily

Why have I done this?

Currently the input to gUnaryUnion() is a spatial polygons data frame (SPDF) and what ArcGIS would call a dissolve item. The vector which is the dissolve item should contain data of type character, and be of the same length as the SPDF.

The gUnaryUnion() output is a spatial polygons object, and the user is required to rebuild this as he or she sees fit. One approach to rebuilding is to create another SPDF with a single item in the data slot - the values of the dissolve item. That means the user can subsequently carry out table joins, merges, spatial joins or whatever, based on the ID.

If there are associated pre-dissolve data, these can be processed across the IDs to create vectors of means, medians, standard errors, matrices of dummy variables, and so on, which then then easily merged into the output SPDF’s data slot using the ID as a key.



  1. http://ec.europa.eu/eurostat/web/nuts/overview

  2. http://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/administrative-units-statistical-units/nuts#nuts10

  3. In English publications this meridian is sometimes known as the Ferro Meridian.