A common GIS operation is that of dissolve - creating larger areas from a set of smaller ones using the value of an attribute. Typically we might start with a set of subnational regions, and create national boundaries from the country code extracted from the region codes.
This turns out to be slightly tricky in R, and my students have been confused by its complexity in the past.
Boundaries for EU countries and their subdivisions are available from the GISCO website. The map projection chosen by the EU for the display of their maps is the Lambert Azimuthal Equal Area, with a central meridian at 10$deg;E and a latitude of origin at 52°N (in a field about 20km south of Hildesheim in Germany).
library(rgdal)
library(rgeos)
The libraries are used for the following tasks:
| Library | Task |
|---|---|
| rgdal | Read an ESRI shapefile with readOGR() |
| rgeos | We use gUnaryUnion() for the dissolve |
The NUTS (Nomenclature of Terroritial Units for Statistics) regions1 represent a hierarchical tesselation of the European territory. NUTS0 represent the countries of the EU. The EUROSTAT description identifies the various levels as:
| Level | Type |
|---|---|
| NUTS1 | Major socio-economic regions |
| NUTS2 | Basicegions for the application of regional policies |
| NUTS3 | Small regions for specific diagnoses |
The associated NUTS codes are also hierarchical, for example:
| Code | Level | Area |
|---|---|---|
| AT | 0 | Austria |
| AT1 | 1 | Eastern Austria (Ostösterreich) |
| AT11 | 2 | AT11 Burgenland |
| AT111 | 3 | AT111 Mittelburgenland |
There are further subdivisions into Local Administrative Units - in Austria LAU 1 are the same as NUTS3, and LAU 2 are the 2357 Municipalities.
I have already done some processing on the NUTS boundaries, in particular extracting the NUTS3 boundaries from the dataset which I downloaded from GISCO2. The NUTS3 boundaries represent the 2010 version of the NUTS.
We plot the boundaries as a starting point.
NUTS3 <- readOGR(dsn=".",layer="NUTS3_2010",stringsAsFactors=FALSE)
plot(NUTS3)
box()
The shapefile includes boundaries for the French colonies in the Caribbean, South America, and the Indian Ocean, as well as the Portuguese Açores and Madeira, and the Canary Islands (Spain). For the moment, we will omit them. The regions in question are:
| Region | Name |
|---|---|
| FR910 | Gaudeloupe |
| FR920 | Martinique |
| FR930 | Guyane |
| FR940 | Réunion |
| PT200 | Região Autónoma dos Açores |
| PT300 | Região Autónoma da Madeira |
| ES703 | El Hierro [Canary Islands ES7) |
| ES704 | Fuerteventura |
| ES705 | Gran Canaria |
| ES706 | La Gomera |
| ES707 | La Palma |
| ES708 | Lanzarote |
| ES709 | Tenerife |
Rather ironically El Hierro used to be the site of the Europe Meridian (Isla del Meridiano) prior to the adoption of the Greenwich Meridian as the mapping standard in the 19th century. As the most western island in Europe, longitudes measured from El Hierro3 are all positive.
To undertake the removal, we create a vector of the codes, and match this with the codes in the NUTS3 spatial polygons data frame (SPDF). The which() function identifies the record numbers in the SPDF, and we remove them.
ExReg <- c("FR910","FR920","FR930","FR940",
"PT200","PT300",
"ES703","ES704","ES705","ES706","ES707","ES708","ES709")
omits <- which(NUTS3$NUTS_ID %in% ExReg)
NUTS3 <- NUTS3[-omits,]
I have discovered that removing spatial units in this fashion can create a problem for the internal coherence of the SPDF. The IDs for the polygons and the attribute data records become mismatched. We have to update the the row names for the records in the data slot with the IDs from the polygons slot.
rownames(NUTS3@data) <- sapply(slot(NUTS3, "polygons"), function(x) slot(x, "ID"))
To create a country SPDF we can use the gUnaryUnion() function from the rgeos library to dissolve each country’s internal NUTS3 boundaries. The output from gUnaryUnion() is a Spatial Polygons object, so we have to add on some attribute data with the correct row IDs. The NUTS0 codes are the first two characters of the NUTS_ID item in the NUTS3 data.
Country <- substr(NUTS3$NUTS_ID,1,2)
Cty <- gUnaryUnion(NUTS3,Country)
IDs <- data.frame(ID=sapply(slot(Cty, "polygons"), function(x) slot(x, "ID")))
rownames(IDs) <- IDs$ID
Cty <- SpatialPolygonsDataFrame(Cty,IDs)
Et voila!.
plot(NUTS3,border="lightgrey")
plot(Cty,add=T)
xy <- coordinates(Cty)
text(xy[,1],xy[,2],Cty$ID,cex=0.5,col="red")
box()
We need a function to act as a shortcut. Something along the lines of:
dissolve <- function(SPDF,attribute){
rownames(SPDF@data) <- sapply(slot(SPDF, "polygons"), function(x) slot(x, "ID"))
Temp <- gUnaryUnion(SPDF,attribute)
IDlist <- data.frame(ID=sapply(slot(Temp, "polygons"), function(x) slot(x, "ID")))
rownames(IDlist) <- IDlist$ID
SpatialPolygonsDataFrame(Temp,IDlist)
}
We can now try it out. We start by re-reading the data:
NUTS3 <- readOGR(dsn=".",layer="NUTS3_2010",stringsAsFactors=FALSE)
ExReg <- c("FR910","FR920","FR930","FR940",
"PT200","PT300",
"ES703","ES704","ES705","ES706","ES707","ES708","ES709")
omits <- which(NUTS3$NUTS_ID %in% ExReg)
NUTS3 <- NUTS3[-omits,]
Country <- substr(NUTS3$NUTS_ID,1,2)
Having re-read the data, and created the dissolve item, we can do the dissolve:
Cty <- dissolve(NUTS3,Country)
plot(NUTS3,border="lightgrey")
plot(Cty,add=T)
xy <- coordinates(Cty)
text(xy[,1],xy[,2],Cty$ID,cex=0.5,col="red")
box()
And all seems to function satisfactorily
Currently the input to gUnaryUnion() is a spatial polygons data frame (SPDF) and what ArcGIS would call a dissolve item. The vector which is the dissolve item should contain data of type character, and be of the same length as the SPDF.
The gUnaryUnion() output is a spatial polygons object, and the user is required to rebuild this as he or she sees fit. One approach to rebuilding is to create another SPDF with a single item in the data slot - the values of the dissolve item. That means the user can subsequently carry out table joins, merges, spatial joins or whatever, based on the ID.
If there are associated pre-dissolve data, these can be processed across the IDs to create vectors of means, medians, standard errors, matrices of dummy variables, and so on, which then then easily merged into the output SPDF’s data slot using the ID as a key.
http://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/administrative-units-statistical-units/nuts#nuts10↩
In English publications this meridian is sometimes known as the Ferro Meridian.↩