Basic Choropleth Mapping with GISTools

Overview

GISTools is a set of utilities for creating maps and manipulating spatial data in R. It is very useful as a quick way to get websites with maps published in them, or to make reproducible documents containing maps.

For example, this web page was created from an R Markdown document. Markdown is a simple formatting syntax for authoring web pages. (If you use RStudio then open an R Markdown document and click the MD toolbar button for help on Markdown - when you click the Knit HTML button a web page will be generated that includes both content as well as the output of any embedded R code chunks within the document).

You will need to have installed the GISTools package and its dependencies (maptools, RColorBrewer, rgeos, MASS) before you can work through the example below. If do not already have this package, the installion is probably most easily done by checking the 'install dependencies' option when installing via the GUI, or by using

install.packages('GISTools',depend=T)

from the command line. The package requires a version of R no older than 2.15.0.

Getting started with GISTools

Initially, load the package and select the georgia data set:

library(GISTools)
data(georgia)

georgia is a SpatialPolygonsDataFrame containing various information about counties in the state of Georgia, from the 1990 US Census. One variable record is the median income per county (MedInc):

hist(georgia$MedInc, col = "red")

plot of chunk unnamed-chunk-2


With GISTools you can also create choropleth maps quickly:

choropleth(georgia, georgia$MedInc)

plot of chunk unnamed-chunk-3


That was a very quick but not very helpful map. It is useful for exploration of spatial pattern if you know more about the magnitude of the income values, the name of the area you are examining and so on, but for a wider audience you should add more detail to the map, such as a legend, a north arrow and some explanatory text. This is possible with a few more GISTools function calls:

# Compute a rate for mapping and put it in a variable - re-scale to avoid
# a profusion of zeros
incomes <- georgia$MedInc/1000
# Create a shading scheme
shades <- auto.shading(incomes, n = 6, cutter = rangeCuts, cols = brewer.pal(6, 
    "Greens"))
# Draw a map based on this scheme
choropleth(georgia, incomes, shades)
# Add a legend based on this scheme
choro.legend(-82, 34.87, shades, fmt = "%4.1f", title = "Median Income (1000$'s)")
# Add a title to the map
title("Median Income in Georgia (1990)")
# ... and a little more explanatory text
text(-85.3, 30.4, "Source: US Census")
# ... and a north-point arrow
north.arrow(-80.9, 30.4, 0.1)
# and draw a box around it
box(which = "outer")

plot of chunk unnamed-chunk-4


A Little More Explanation

The choropleth command above pretty much does as the name suggests. Given a SpatialPolygons or SpatialPolygonsDataFrame object and a numeric vector with as many elements as there are polygons, it draws a choropleth map. As a default it automatically provides a shading colour scheme and a set of intervals. However these schemes are only automatic, not necessarily optimal - and in some cases not even appropriate! The intention was just to provide a 'quick look' option for users with some prior knowledge of the variable being mapped. This is very much for private rather than public use.

However, if a shading parameter is also given, more control is offered. This is what happened in the second map example above. The shading parameter is an object of class shading - this contains information about the two key characteristics of choropleth map shading - the colours used and the numerical intervals into which the data are divided. These are passed to the choropleth function, to actually create the map, and also to the choro.legend function, to create the legend. Obviously, for the map to make sense, the same shading object should be used for the map and the legend.

One way of creating a shading object is through the auto.shading command. As the name suggests, this isn't a completely manual approach – in fact this is what the default approach in choropleth uses – but by overriding defaults, quite a few characteristics can be changed. In particular n controls the number of classes, cols specify the shading colours, and cutter is an R function to choose a set of interval cut-points given the data to be plotted. rangeCuts and quantileCuts are two ready-made such functions, and respectively provide equal interval and quantile-based categorisations.

Another way of creating a shading is by using the shading command. This simply takes a list of numerical cut values, and a corresponding list of colours, to create the shading object. This provides full manual control, if none of the automated methods work well. Note that there must be one more colour than there are break values. If the cut points list is c(x1,x2,...,xn) then this implies intervals of c(-Inf,x1),c(x1,x2), … , c(xn,Inf) - so that the number of classes is one more than the number of cut points provided.

As an example, here is a choropleth map based on round numbers of income:

# Create a different shading scheme - this time based on multiples of 10k
shades2 <- shading(c(30, 40, 50, 60, 70), cols = brewer.pal(6, "Blues"))
# Draw a map based on this scheme
choropleth(georgia, incomes, shades2)
# Add a legend based on this scheme
choro.legend(-82, 34.87, shades2, fmt = "%4.1f", title = "Median Income (1000$'s)")
# add the embellishments
title("Median Income in Georgia (1990)")
text(-85.3, 30.4, "Source: US Census")
north.arrow(-80.9, 30.4, 0.1)
box(which = "outer")

plot of chunk unnamed-chunk-5

a final thing to note is that the colour list can consist of elements of any kind of valid colour specification in R. Here, brewer.pal is used to generate lists of palettes from Cynthia Brewer's ColorBrewer map colouring system.


brewer.pal can be used to produce divergent colour schemes, where hue is used to indicate whether mapped values are above or below a datum, and intensity is used to show the amount above or below. A typical datum might be zero, where maps show whether values are positive or negative. The following example, based on the income data, uses the mean value as a datum.

# Function to make symmetric equal size shading intervals centred on the
# mean
symRangeCuts <- function(x, n, params = NA) {
    x.bar <- mean(x)
    cuts.list <- rangeCuts(c(-x + x.bar, x - x.bar), n) + x.bar
    return(cuts.list)
}
# Create a shading scheme using this function
shades3 <- auto.shading(incomes, n = 9, cutter = symRangeCuts, cols = brewer.pal(9, 
    "PuOr"))
# Proceed as before - this time the box is thicker...
choropleth(georgia, incomes, shades3)
choro.legend(-82, 34.87, shades3, fmt = "%3.0f", title = "Median Income (1000$'s)", 
    bty = "n")
title("Median Income in Georgia (1990)", cex.main = 2)
text(-85.3, 30.4, "Source: US Census")
north.arrow(-80.9, 30.4, 0.1)
box(which = "outer", lwd = 8)

plot of chunk unnamed-chunk-6

The fact there are no areas shaded with the highest intensities of brown, but the appearance of some shaded in the highest intensities of purple suggest that there is a positive skew to the data. This is also suggested by the histogram drawn at the beginning of this session. The example also shows how to specify divergent colour schemes in RColorBrewer and how to define your own cutter functions. The function symRangeCuts provides a set of equal interval cuts on the range

mean(x) - max(abs(x - mean(x))) to
mean(x) + max(abs(x - mean(x)))

ensuring a set of intervals in which the central category contains the datum.

Notice that I have also played around with some of the other function parameters to change the appearance of the map - in particular, the surrounding box is now thicker, and the legend no longer has a box. Also, the format of the numbers in the legend has been altered. The latter is specified by the fmt parameter in the choro.legend function, and is a C-style format, as used in the R (and C) function sprintf.


One thing you will not have seen in the above example is the inclusion of a scale bar. This is because the previous maps were shown in geographical coordinates, (Latitude and Longitude) - and the ratio scale in the x-direction to the scale in the y-direction changes for different latitudes, so in general the notion of a fixed scale is meaningless. However, when the georgia data set was loaded, another SpatialPolygonsDataFrame called georgia2 was also loaded. This is an equivalent set of zones to georgia but this time it is in the Albers equal area projection.

With this map projection, a scale becomes meaningfull, and so the code below adds this:

choropleth(georgia2,incomes,shades3)
choro.legend(1300000,1400000,shades3,fmt="%3.0f",title="Median Income (1000$'s)",bty='n')
title('Median Income in Georgia (1990)',cex.main=2)
text(976000,880000,'Source: US Census')
north.arrow(1400000,900000,10000)
map.scale(1400000,880000,60000,"km",3,20,sfcol='brown')
box(which='outer',lwd=8)

plot of chunk unnamed-chunk-7

One key difference between this code and that used earlier is that since the map is now based on a projected coordinate system, the spatial units used to place the legend, north arrow and so on have changed. Since the map is a slightly different shape, I have also moved these features relative to one another, to better fit the new map shape. The new function is map.scale which is used to draw the scale. The first two parameters provide the location for the scale, and the next gives the length of the scale in map units. Here the units are metres, so the length of 60,000 implies the scale has a length of 60km. The next parameter gives the units that the scale will be labelled in, and the next parameter the number of sub-divisions on the scale. This will have three. The next parameter gives the amount that the numerical values on the subdivisions will step up by. Here we wish that to be 20km. Finally, sfcol = "brown" tells the function that the subdivisions in the scale should be coloured in brown. The idea of specifing the subdivision step-up value and the length separately is that this makes it possible to create a scale in different spatial units to the map units in the projection. here, for example, labelling the scale in metres would be unweildy (too many zeros) and so it was labelled in km instead.