Preliminaries

For this practical session you will need to have the following packages installed and loaded - skip this if yopu are using a lab machine.

# test for package existance and install
if (!is.element("sf", installed.packages())) 
    install.packages("sf", dep = T)
if (!is.element("GISTools", installed.packages()))
    install.packages("GISTools", dep = T)
if (!is.element("tmap", installed.packages()))
    install.packages("tmap", dep = T)
if (!is.element("rgdal", installed.packages()))
    install.packages("rgdal", dep = T)
if (!is.element("grid", installed.packages()))
    install.packages("grid", dep = T)
if (!is.element("spdep", installed.packages()))
    install.packages("spdep", dep = T)
# load into the R session
library(sf)
library(GISTools)
library(tmap)
library(rgdal)
library(grid)
library(spdep)

1. Introduction: the sf revolution

This session will introduce both the sp and sf spatial data formats. It will demonstrate how to undertake some simple Choropleth mapping using the tmap package. It will introduce and apply some basic cluster analysis (Moran’s I, G-statistic). Along the way it will do some reading and writing of spatial data, such as shapefiles, into and out of R.

Why Spatial Data? Nearly all data are spatial they are collected some-where and in fact these arguments can be extended to the spatio-temporal domain: all data are spatiotemporal – they are collected some-where and at some-time.

In the preceding description of data each of the records (rows) related to some kind of real world feature (a person, a transaction, a date, etc) and the columns represent some attribute associated with that feature. In spatial data, each record typically represents some real world geographical feature - a place, a route, a region etc. In fact geographic data typically represent points, lines or areas.

There is some key background information to working with spatial data in R. R is dynamic - things don’t stay the same. New tools, packages and functions are constantly be produced, they are updated to improve and develop them. In most cases this is not problematic as the update almost always extends the functionality of the package without affecting the original code. However in a few instances, specific packages are completely re-written without backwards compatibility. If this happens then R code that previously worked may not work with the new package as the functions may take different parameters and arguments. However, there is usually a period of a few package versions before the code stops to work all together.

Occasionally a completely new paradigm is introduced and this has been the case recently for spatial data in R with the advent of the sf package.

Much of the code for handling spatial data in this workshop is built around data structures defined in the sp package and it defines the following spatial objects in R:

The sp package underpins many of the packages that you have loaded either directly or indirectly (ie they are loaded by other packages) have dependencies on sp: the GISTools package implicitly links to maptools, sp, rgeos etc. You may have noticed these packages being listed as GISTools loads.

However, recently a new class of spatial object has been defined called sf of Simple Feature that seeks to encode spatial data in a way that conforms to a formal standard (ISO 19125-1:2004). This emphasises the spatial geometry of objects, the way that objects are stored in databases. In brief, the aim of the team developing sf (actually they are the same people who developed sp so they do know what they are doing!!) is to provide a standard format for spatial data. An overview of the evolution of spatial data in R can be found at https://edzer.github.io/UseR2017/.

Their idea is that a feature is a thing, or an object in the real world, such as a building or a tree. As is the case with objects, they often consist of other objects. This is the case with features too: a set of features can form a single feature. A forest stand can be a feature, a forest can be a feature, a city can be a feature. A satellite image pixel can be a feature, a complete image can be a feature too. Features have a geometry describing where on Earth the feature is located, and they have attributes, which describe other properties. There many sf object types but the key ones (that are similar to lines, points and areas*) are in the table below (taken from the sf vignette).

Ultimately, it is planned that sf formats will completely replace sp and packages that use sp (such GWmodel for GWR) will all have to be updated to use sf at some point – but that is a few years away.

So…

in this session we will introduce both sp and sf as well as different mapping routines (with the plot function and with functions in the tmap package) and the tools to move between them. The later sessions on regression and spatial interaction models will mainly use sf.

2. Spatial Data

This section will first examine sp and then sf. It will also describe some mapping commands how to move (convert) between different spatial data formats on the way!

2.1 sp format

The GISTools package comes with spatial data that we can use to illustrate the sp format.

Load the georgia data and examine the georgia object. You can see that it has a class of SpatialPolygonsDataFrame.

library(GISTools)
# laod the data
data(georgia)
# look at what is loaded
ls()
# examine them
class(georgia)
summary(georgia)

This is a bit like an ESRI polygon shapefile: it has some spatial properties and attribute properties held in a data.frame. Is has a proj4string but is not projected - ie it has latitude and longitude in degrees rather than being projected for example in metres.

You can access the data.frame in a number of ways to explore the attributes:

head(data.frame(georgia))
head(georgia@data)

Note that the head() function was included int he above to limit the print out to the first 6 lines or records.

Similarity you can generate summaries of the data values:

summary(georgia@data)
##     Latitude        Longitud         TotPop90         PctRural     
##  Min.   :30.72   Min.   :-85.50   Min.   :  1915   Min.   :  2.50  
##  1st Qu.:31.79   1st Qu.:-84.44   1st Qu.:  9220   1st Qu.: 54.70  
##  Median :32.75   Median :-83.69   Median : 16934   Median : 72.30  
##  Mean   :32.81   Mean   :-83.58   Mean   : 40744   Mean   : 70.18  
##  3rd Qu.:33.79   3rd Qu.:-82.85   3rd Qu.: 36058   3rd Qu.:100.00  
##  Max.   :34.92   Max.   :-81.09   Max.   :648951   Max.   :100.00  
##     PctBach          PctEld          PctFB           PctPov     
##  Min.   : 4.20   Min.   : 1.46   Min.   :0.040   Min.   : 2.60  
##  1st Qu.: 7.60   1st Qu.: 9.81   1st Qu.:0.415   1st Qu.:14.05  
##  Median : 9.40   Median :12.07   Median :0.720   Median :18.60  
##  Mean   :10.95   Mean   :11.74   Mean   :1.131   Mean   :19.34  
##  3rd Qu.:12.00   3rd Qu.:13.70   3rd Qu.:1.265   3rd Qu.:24.65  
##  Max.   :37.50   Max.   :22.96   Max.   :6.740   Max.   :35.90  
##     PctBlack           X                 Y                 ID       
##  Min.   : 0.00   Min.   : 635964   Min.   :3401148   Min.   :13001  
##  1st Qu.:11.75   1st Qu.: 741144   1st Qu.:3523380   1st Qu.:13082  
##  Median :27.64   Median : 809737   Median :3636468   Median :13161  
##  Mean   :27.39   Mean   : 820944   Mean   :3636238   Mean   :13161  
##  3rd Qu.:40.06   3rd Qu.: 894584   3rd Qu.:3741656   3rd Qu.:13242  
##  Max.   :79.64   Max.   :1059706   Max.   :3872640   Max.   :13321  
##      Name               MedInc     
##  Length:159         Min.   :23456  
##  Class :character   1st Qu.:29773  
##  Mode  :character   Median :34307  
##                     Mean   :37147  
##                     3rd Qu.:41204  
##                     Max.   :81603

You can plot the spatial data using a basic plot command (we will do more advanced mapping later in this session) and of critical importance you can subset spatially using the same notation as you would do for a a matrix or data.frame (note the use of the add = T in the code below to add a plot layer to the preceding layer):

# 1 row, 2 columns
par(mfrow = c(1,2))
# left hand plot
plot(georgia)
plot(georgia[81,], add = T, col = "tomato")
# right hand plot
plot(georgia)
# set index
index <- c(81, 82, 83, 150, 62, 53, 21, 16, 124, 121, 17)
plot(georgia[index,], add = T, col = "tomato")
Maps of counties in Georgia, USA and a subset.

Maps of counties in Georgia, USA and a subset.

# rest par
par(mfrow = c(1,1))

The subsetting operation above using index can also be used to create a spatial subset:

g.subset <- georgia[index,]
# check the dimensions (rows, columns)
dim(georgia)
## [1] 159  14
dim(g.subset)
## [1] 11 14
# plot
plot(g.subset, col = "wheat")
# overlay the full data
plot(georgia, add = T)
Examples of the `plot` function for mapping.

Examples of the plot function for mapping.

Operations for data.frames of sp objects needs a bit more care than ordinary data.frames. Variables (attributes) can be accessed in the same way using $:

g.subset$PctBach
##  [1]  6.2  7.7  4.9  9.8  5.3  9.1  9.9 19.9  8.6 17.3  9.6

The PctBach attribute is the 5th field in the g.subset and if this was just a data.frame called df then this could be accessed using:

df[,5]

However for the data.frame of an sp object this prints out all of the spatial information as well. Try entering:

g.subset[,5]

To overcome this the data.frame needs to be directly invoked in as above:

g.subset@data[,3:5]
##     TotPop90 PctRural PctBach
## 80     17408    100.0     6.2
## 81      8247     53.8     7.7
## 82      8329    100.0     4.9
## 149    19112     67.1     9.8
## 61      2357    100.0     5.3
## 52     20546     64.2     9.1
## 20      7744     52.1     9.9
## 15     43125     63.2    19.9
## 123    13842     79.3     8.6
## 120   189719      9.9    17.3
## 16     20579     72.3     9.6
data.frame(g.subset[1:6,12:14])
##        ID       Name MedInc
## 80  13163  Jefferson  28203
## 81  13165    Jenkins  25951
## 82  13167    Johnson  26671
## 149 13303 Washington  31894
## 61  13125   Glascock  33055
## 52  13107    Emanuel  26639

Tasks

Task 1 Write some code to determine the mean value and standard deviation of median income variable MedInc from the georgia data. Hints: the mean and sd functions can help you!

Task 2 Write some code to subset and plot the counties in the 25 percentile of median income from the georgia data. Hint: the quantile function can help you and you may want to assign the result to an R object that you can use to determine a threshold with which to subset the data as was done with index above, using a logical statement.

Task 3

Finally, go back to Session 3 and examine how the tb$rural variable was created for the tb tibble object. The same operations can be used to on the data.frame of sp objects. You should create a variable called rural that separates the counties of georgia into Rural and Non-Rural classes and determine how many counties are Rural and Non-Rural using this definition. Hint the table function can help here.

2.2 sf format

The sf format seeks to implement tidy data. Data is tidy when (Wickham 2014):

  • Each variable forms a column.
  • Each observation forms a row.
  • Each type of observational unit forms a table.

The tidy data rules for sf / simple feature implies the need for data.frames where each feature forms a row. A single column (a list-column) contains the geometry for each observation.

The sf package puts features in sf tables deriving from data.frame or tbl_df, which have geometries in a list-column of class sfc, where each list element is the geometry of a single feature of class sfg. Feature geometries are represented in R by

  • a numeric vector for a single point (POINT)
  • a numeric matrix (each row a point) for a set of points (MULTIPOINT or LINESTRING)
  • a list of matrices for a set of set of points (MULTIINESTRING, POLYGON)
  • a list of lists of matrices (MULTIPOLYGON)
  • a list of anything mentioned above (GEOMETRYCOLLECTION) (all other classes also fall in one of these categories)

Other tidy aspects of sf include:

  • all functions/methods start with st_ (press tab to search), use _ and lower case
  • all function have data as first argument and are pipe-friendly (see Session 3)
  • read_sf is an alias for st_read with tidy defaults:
    • silent, stringAsFactors = FALSE

A quick look

The code below loads a dataset that is included in the sf package. It is of North Carolina (see https://cran.r-project.org/web/packages/spdep/vignettes/sids.pdf) and is loaded using the st_read function. This is a polygon spatial dataset…

# read the data
nc <- st_read(system.file("shape/nc.shp", package="sf"))
## Reading layer `nc' from data source `/Library/Frameworks/R.framework/Versions/3.3/Resources/library/sf/shape/nc.shp' using driver `ESRI Shapefile'
## Simple feature collection with 100 features and 14 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
## epsg (SRID):    4267
## proj4string:    +proj=longlat +datum=NAD27 +no_defs

which can of course be mapped

# do a quick tmap of the data
qtm(nc)
Quick `tmap` maps of North Carolina.

Quick tmap maps of North Carolina.

# and others mapping attributes
qtm(nc, fill = c("BIR74", "NWBIR79"))
Quick `tmap` maps of North Carolina.

Quick tmap maps of North Carolina.

Note that here we are using the qtm (quick tmap) function from the tmap package and not the plot function as above but also note that tmap can work with both sp and sf data formats. We can examine the data in the same way as a-spatial data:

dim(nc)
head(nc)
class(nc)

And again, you can see that nc has 100, 15 rows and columns respectively, that the class of nc indicate that it is sf class as well as data.frame. For data.frame related formats, R prints out all records and all columns. Try entering:

nc

Again you can generate and examine summaries of the nc data:

summary(nc)

Finally, it is also possibly to stop the legend and the body of the map from overlapping - although this can leave less space for the map itself:

qtm(nc, fill = "BIR74") +
  tm_layout(legend.outside = TRUE) 

qtm(nc, fill = "NWBIR79") +
  tm_layout(legend.outside = TRUE) 

sf vignettes

The sf package has a number of vignettes or tutorials that you could explore. These include an overview of the format, reading and writing from and to sf formats including conversions to and from sp and sf, and some illustrations of how sf objects can be manipulated. The code below will create a new window with a list of sf vignettes:

library(sf)
vignette(package = "sf")

To display a specific vignette topic, this can be called using the vignette function. A full description of the sf format can be found in thesf1` vignette

vignette("sf1", package = "sf")


Vignettes are an important part of R packages. They provide additional explanations of the package functionality to that found in the example code at the end of a help page. They can be accessed using the vignette function or through the R help. The sf1 vignette could also be accessed via the package help index: enter help(sf), then navigate to the index through the link at the bottom of the overview page and then click on the User guides, package vignettes and other documentation link.



2.3 Conversions between sp and sf

You will note that there are a number of functions that can be used to read, write and convert from and to sf. These will be summarised later in this session. The sf2 vignette covers these topics.

The fundamental function for converting to sf is st_as_sf(). In the code below it is used to convert the the georgia sp object to sf:

# conversion to sf
georgia_sf = st_as_sf(georgia)
class(georgia_sf)
## [1] "sf"         "data.frame"
georgia_sf
## Simple feature collection with 159 features and 15 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -85.6052 ymin: 30.35541 xmax: -80.84126 ymax: 35.00068
## epsg (SRID):    4326
## proj4string:    +proj=longlat +ellps=WGS84 +no_defs
## First 20 features:
##    Latitude  Longitud TotPop90 PctRural PctBach PctEld PctFB PctPov
## 0  31.75339 -82.28558    15744     75.6     8.2  11.43  0.64   19.9
## 1  31.29486 -82.87474     6213    100.0     6.4  11.77  1.58   26.0
## 2  31.55678 -82.45115     9566     61.7     6.6  11.11  0.27   24.1
## 3  31.33084 -84.45401     3615    100.0     9.4  13.17  0.11   24.8
## 4  33.07193 -83.25085    39530     42.7    13.3   8.64  1.43   17.5
## 5  34.35270 -83.50054    10308    100.0     6.4  11.37  0.34   15.1
## 6  33.99347 -83.71181    29721     64.6     9.2  10.63  0.92   14.7
## 7  34.23840 -84.83918    55911     75.2     9.0   9.66  0.82   10.7
## 8  31.75940 -83.21976    16245     47.0     7.6  12.81  0.33   22.0
## 9  31.27424 -83.23179    14153     66.2     7.5  11.98  1.19   19.3
## 10 32.80451 -83.69915   149967     16.1    17.0  12.23  1.06   19.2
## 11 32.43552 -83.33121    10430     57.9    10.3  12.60  0.64   18.3
## 12 31.19702 -81.98323    11077    100.0     5.8   9.02  0.33   18.2
## 13 30.84653 -83.57726    15398     65.6     9.1  13.68  1.76   25.9
## 14 32.02037 -81.43763    15438     80.6    11.8   7.22  0.45   13.2
## 15 32.39071 -81.74391    43125     63.2    19.9   9.56  1.16   27.5
## 16 33.05837 -81.99939    20579     72.3     9.6  10.60  0.43   30.3
## 17 33.28834 -83.95713    15326     73.4     7.2  10.41  0.72   15.6
## 18 31.52793 -84.61891     5013    100.0    10.1  15.94  0.10   31.8
## 19 30.91895 -81.63783    30167     47.1    13.5   4.78  2.14   11.5
##    PctBlack         X       Y    ID     Name MedInc     rural
## 0     20.76  941396.6 3521764 13001  Appling  32152     Rural
## 1     26.86  895553.0 3471916 13003 Atkinson  27657     Rural
## 2     15.42  930946.4 3502787 13005    Bacon  29342     Rural
## 3     51.67  745398.6 3474765 13007    Baker  29610     Rural
## 4     42.39  849431.3 3665553 13009  Baldwin  36414 Non-Rural
## 5      3.49  819317.3 3807616 13011    Banks  41783     Rural
## 6     11.44  803747.1 3769623 13013   Barrow  49829     Rural
## 7      9.21  699011.5 3793408 13015   Bartow  47309     Rural
## 8     31.33  863020.8 3520432 13017 Ben Hill  28009 Non-Rural
## 9     11.62  859915.8 3466377 13019  Berrien  33786     Rural
## 10    41.68  809736.9 3636468 13021     Bibb  34872 Non-Rural
## 11    22.36  844270.1 3595691 13023 Bleckley  36643     Rural
## 12     4.58  979288.9 3463849 13025 Brantley  34189     Rural
## 13    41.47  827822.0 3421638 13027   Brooks  28244     Rural
## 14    14.85 1023145.0 3554982 13029    Bryan  57758     Rural
## 15    25.95  994903.4 3600493 13031  Bulloch  31519     Rural
## 16    52.19  971593.8 3671394 13033    Burke  29434     Rural
## 17    35.48  782448.2 3684504 13035    Butts  42829     Rural
## 18    58.89  724741.2 3492653 13037  Calhoun  26022     Rural
## 19    20.19 1008480.0 3437933 13039   Camden  45987 Non-Rural
##                          geometry
## 0  MULTIPOLYGON (((-82.2251968...
## 1  MULTIPOLYGON (((-82.6292953...
## 2  MULTIPOLYGON (((-82.5217056...
## 3  MULTIPOLYGON (((-84.1407012...
## 4  MULTIPOLYGON (((-83.2742309...
## 5  MULTIPOLYGON (((-83.3985061...
## 6  MULTIPOLYGON (((-83.5375061...
## 7  MULTIPOLYGON (((-84.6532974...
## 8  MULTIPOLYGON (((-83.1778106...
## 9  MULTIPOLYGON (((-83.1461029...
## 10 MULTIPOLYGON (((-83.8923034...
## 11 MULTIPOLYGON (((-83.2267074...
## 12 MULTIPOLYGON (((-81.7317123...
## 13 MULTIPOLYGON (((-83.7364883...
## 14 MULTIPOLYGON (((-81.4359970...
## 15 MULTIPOLYGON (((-81.8412857...
## 16 MULTIPOLYGON (((-81.8522949...
## 17 MULTIPOLYGON (((-83.8633193...
## 18 MULTIPOLYGON (((-84.4504165...
## 19 MULTIPOLYGON (((-81.9361419...

Notice how when georgia_sf is called the spatial information and the first 10 records of the attribute table are printed to the screen, rather than the entire object as with sp: for comparison you could enter:

georgia

The plot function is also different: it will create maps of sf objects and if the sf object has attributes, it will shade the first few of these:

# all attributes
plot(georgia_sf)
# selected attribute
plot(georgia_sf[, 6])
# selected attributes
plot(georgia_sf[,c(4,5)])

Finally, note that sf objects have a data.frame. This was covered earlier, but for the moment you could compare the data.frame of sp and sf objects""

## sp SpatialPolygonDataFrame object 
head(data.frame(georgia))
## sf polygon object
head(data.frame(georgia_sf))

Note that the data.frame of the sf objects have a geometry attributes.

We can also convert to sp by using the as function:

g2 <- as(georgia_sf, "Spatial")
class(g2)
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"

3. Reading and Writing Spatial Data

As we are working with spatial data it seems appropriate to consider how to get spatial data in and out of R.

Very often we have data that is in a particular format such as shapefile format. R has the ability to read and write data from and to many different spatial data formats using functions in the rgdal and sf packages - we will consider them both here:

Reading to and wrtiting from sp format

However, as the sp package and data formats are depreciated, so are their functions for reading and writing spatial data. The rgdal package includes 2 generic functions for reading and writing all kinds of spatial data - pretty much you name the format and it has a driver for that format. These are readOGR() and writeOGR().

Consider the georgia sp format object - we can write this out to a shape file using the writeOGR() function as follows:

writeOGR(obj=georgia, dsn=".", layer="georgia", driver="ESRI Shapefile") 

You will see that a shapefile has been written into your current working directory, with its associated supporting files (.dbf, etc) that can be recognised by other applications (QGIS etc). Similarly this can be read into R and assigned to a variable using the readOGR function:

new.georgia <- readOGR("georgia.shp") 

If you enter:

class(new.georgia)

You will see that the class of the new.georgia object is sp.

You should examine the writeOGR and readOGR function in the rgdal package. R is also able to read and write other proprietory spatial data formats using a number of packages, which you should be able to find through a search of the R help system or via an internet search engine. The rgdal package, which is the R version of the Geospatial Data Abstraction Library. It includes a number of methods for reading and writing spatial objects, including to and from SpatialXDataFrame objects:

The full syntax can be important - the code below overwrites any existing similarly named file:

writeOGR(new.georgia, dsn = ".", layer = "georgia", 
         driver="ESRI Shapefile", overwrite_layer = T)

The dsn parameter is important here: for shape files it determines the folder the files are written to. In the above example it was set to "." which places the files in the current working directory. You could specify a file path here.

For a PC might be something like:D:\MyDocuments\MyProject\DataFiles
And for a MAC: /Users/lex/my_docs/project

The setwd() and getwd() functions can be used in determining and setting the file path. You may want to set the file path and then use the dsn setting as above:

setwd("/Users/lex/my_docs/project")
writeOGR(new.georgia, dsn = ".", layer = "georgia", 
         driver="ESRI Shapefile", overwrite_layer = T)

Or you could use the getwd() function, save the results to variable and pass this to writeOGR:

td <- getwd()
writeOGR(new.georgia, dsn = td, layer = "georgia", 
         driver="ESRI Shapefile", overwrite_layer = T)

You should also examine the functions for reading and writing raster layers in rgdalwhich are readGDAL and writeGDAL. These read and write functions in rgdal are incredibly powerful and can read / write almost any format of spatial data.

Reading to and writing from sf format

Spatial data can be also be read in and written out using the sf functions: st_read and st_write.

For example to read in the georgia.shp shapefile that was created above (and to overwrite `g2) the following code can be used:

setwd("/Users/geoaco")
g2 <- st_read("georgia.shp")

Note that we need to set the working directory to ensure that st_read looks in the right place to read the file from. We see here that a single argument is used to find both the data source and the layer. This works when the data source contains a single layer.

To write a simple features object to a file, we need at least two arguments, the object and a filename:

st_write(g2, "georgia.shp")

As before this will not work if the georgia.shp file exists in the working directory. To overcome this adjust the code to the below:

st_write(g2, "georgia.shp", delete_layer = T)

The file name is taken as the data source name. The default for the layer name is the basename (filename without path) of the the data source name. For this, st_write needs to guess the driver. The above command is, for instance, equivalent to:

st_write(g2, dsn = "georgia.shp", layer = "georgia.shp", 
         driver = "ESRI Shapefile", delete_layer = T)

Typical users will use a file name with path for filename, or first set R’s working directory with setwd() and use file name without path.

Note that the output driver is guessed from the datasource name, either from its extension (.shp: ESRI Shapefile), or its prefix (PG:: PostgreSQL). The list of extensions with corresponding driver (short driver name) can be found in the sf2 vignette. You can examine this at a later date.

vignette("sf2", package = "sf")

4. Mapping with tmap

The tmap package is for thematic visualization, with a focus on mapping the spatial distribution of data attributes. It has a similar grammar to plotting with ggplot in that it seeks to handle each element of the map separately in a series of layers and in so doing to exercise control over each element. This is different to the basic plot functions used above to map sp and sf data.In earlier sections qtm() was used to compose a quick map and now we will expand this more fully. The code in this section will use the georgia / georgia_sf dataset.

4.1 Mapping spatial data (without attributes)

A number of plot parameters additional to the ones that have previously been used exist, including different window sizes, multiple plots in the same window, polygon or area shading, hatching, boundary thickness, boundary colour, labeling etc. First, install tmap in the usual way if you have not already done so and load it. Then plot georgia with a single shade and a background colour using the qtm or quick tmap function:

library(tmap)
qtm(georgia, fill = "red", style = "natural")

Note the use of the style parameter. This is a shortcut to a predefined style within the tmap package, in this case named tm_style_natural. These can be called in abbreviated form using qtm. You should explore the qtm function through the help and the vignette (see below). Full control of shading and map styles can be exercised using the full tm_shape function. This will be used mostly from this point in this practical work.

A useful introductory overview of tmap can be found in the tmap-nutshell vignette. You can uses this later to get some further ideas about mapping options:

vignette("tmap-nutshell")
Like ggplot the key point is that tmap is invoked with tm_shape and then the kind of mapping is specified with further Aesthetics commands to control and specify the base layers:
base description
tm_polygons Create a polygon layer (with borders)
tm_symbols Create a layer of symbols
tm_lines Create a layer of lines
tm_raster Create a raster layer
tm_text Create a layer of text labels

Further aesthetics for derived layers are described in

help(tmap)

Areas

It is possible to generate an outline of the area using the st_union() function in sf. An alternative for sp' is the gUnaryUnion() function in the rgeos package loaded with GISTools. The manipulation of spatial data using overlay, union and intersection functions is covered in more depth in Chapter 5 of Comber and Brunsdon (2015).

# do a merge
g <- st_union(georgia_sf)
# for sp
#georgia.outline <- gUnaryUnion(georgia, id = NULL)

Now run the code snippets below and inspect the results. You should see how the tmap functions are added as a series of layers to the map in a similar way to ggplot.

# plot the spatial layers
tm_shape(georgia_sf) +
  tm_fill("indianred") 

Add the border:

tm_shape(georgia_sf) +
  tm_fill("indianred") +
  tm_borders(lty = "dashed", col = "khaki")

Adding some styling:

tm_shape(georgia_sf) +
  tm_fill("indianred") +
  tm_borders(lty = "dashed", col = "khaki") +
  tm_style_natural(bg.color = "grey75") 

Adding the outline:

tm_shape(georgia_sf) +
  tm_fill("indianred") +
  tm_borders(lty = "dashed", col = "khaki") +
  tm_style_natural(bg.color = "grey75") +
  # now add the outline
  tm_shape(g) +
  tm_borders(lwd = 3) 

And finally putting it all together:

tm_shape(georgia_sf) +
  tm_fill("indianred") +
  tm_borders(lty = "dashed", col = "khaki") +
  tm_style_natural(bg.color = "grey75") +
  # now add the outline
  tm_shape(g) +
  tm_borders(lwd = 3) +
  tm_layout(title = "The State of Georgia", 
    title.size = 1, title.position = c(0.55, "top"))

So what you can see in the above code are two sets of tmap plot commands: the first set plots the georgia_sf dataset, specifying a dashed blue line to show the county boundaries, a red fill colour for the county and a map background colour of wheat. The second set adds the outline created by the union operation with a thicker line width before the title and subtitle are added. It is also possible to plot multiple different maps from different datasets together but this requires a bit more control over the tmap parameters. Note the way that the aspect of the second plot is used. The value was determined through trial and error. The code below assigns the map to a variable, and then a second set of functions, is used to manipulate these in a plot window. You may need to install the grid package:

# 1st plot of georgia
t1 <- tm_shape(georgia_sf) +
  tm_fill("coral") +
  tm_borders() +
  tm_style_natural(bg.color = "grey55") 
# 2nd plot of georgia2
t2 <- tm_shape(georgia2) +
  tm_fill("orange") +
  tm_borders() +
  tm_style_natural(bg.color = "grey75")+
  # the asp paramter controls aspect
  # this is makes the 2nd plot align
  tm_layout(asp = 0.86)

Now you can specify the layout of the combined map plot:

# open a new plot page
grid.newpage()
# set up the layout
pushViewport(viewport(layout=grid.layout(1,2)))
# plot using the print command
print(t1, vp=viewport(layout.pos.col = 1, height = 5))
print(t2, vp=viewport(layout.pos.col = 2, height = 5))

Thus different plot parameters can be used for different subsets of the data such that they are plotted in ways that are different from the default. Sometimes we would like to label the features in our maps. Have a look at the names of the counties in the georgia_sf dataset. These are held in the 13th attribute column and names(georgia_sf) will return a list of the names of all attributes:

data.frame(georgia_sf)[,13]

It would be useful to display these on the map and this can be done using the tm_text function in the maptools package that is loaded with tmap. The result is shown in the figure below.

tm_shape(georgia_sf) +
  tm_fill("white") +
  tm_borders() +
  tm_text("Name", size = 0.3) +
  tm_layout(frame = FALSE)

And we can subset the data as with sp format. The code below subsets the counties of Jefferson, Jenkins, Johnson, Washington, Glascock, Emanuel, Candler, Bulloch, Screven, Richmond and Burke.

# the county indices below were extracted from the data.frame
index <- c(81, 82, 83, 150, 62, 53, 21, 16, 124, 121, 17)
georgia_sf.sub <- georgia_sf[index,]

The notation is the same as for `sp object earlier, re-enforcing the fact that it is possible to select individual areas or polygons from spatial datasets using the bracket notation as used in matrices and vectors.

The subset can be plotted:

tm_shape(georgia_sf.sub) +
  tm_fill("gold1") +
  tm_borders("grey") +
  tm_text("Name", size = 1) +
  # add the outline
  tm_shape(g) +
  tm_borders(lwd = 2) +
  # specify some layout parameters
  tm_layout(frame = FALSE, title = "A subset of Georgia", 
    title.size = 2, title.position = c(0., "bottom"))

Finally, we can bring the different spatial data that has been created together in a single map. In the code below you should note how the different tm_shape, tm_fill etc functions are used to set up each layer of the map and that tmap determines the map extent from the layers:

# the 1st layer
tm_shape(georgia_sf) +
  tm_fill("white") +
  tm_borders("grey", lwd = 0.5) +
  # the 2nd layer
  tm_shape(g) +
  tm_borders(lwd = 2) +
  # the 3rd layer
  tm_shape(georgia_sf.sub) +
  tm_fill("lightblue") +
  tm_borders() +
  # specify some layout parameters
  tm_layout(frame = T, title = "Georgia with a subset of counties", 
    title.size = 1, title.position = c(0.02, "bottom"))

Points

Here we will briefly deal with mapping points using tmap. The code below creates some points from the coordinates of the georgia_sf polygons centroids (ignore the warnings!)

pts_sf <- st_centroid(georgia_sf)
## Warning in st_centroid.sfc(st_geometry(x), of_largest_polygon =
## of_largest_polygon): st_centroid does not give correct centroids for
## longitude/latitude data

These can be plotted in the same way as before but rather than using tm_polygons, tm_fill or tm_borders the tm_bubbles or tm_dots aesthetics can be specified:

# the 1st layer
tm_shape(georgia_sf) +
  tm_fill("olivedrab4") +
  tm_borders("grey", lwd = 1) +
  # the point layer
  tm_shape(pts_sf) +
  tm_dots(size = 1, alpha = 0.5)

Note how in the code above the tm_shape function was invoked for each layer.

4.2 Choropleth maps (with attributes)

Areas

It is also possible to plot multiple attributes simply by passing them to the tm_fill function. The plot window may need to be expanded to accommodate the legend:

tm_shape(georgia_sf) +
  tm_fill(c("PctRural", "PctBlack")) +
  tm_borders() +
  tm_layout(legend.format = list(digits = 0), 
            legend.position = c("right", "top"),
            legend.text.size = 0.5,
            legend.title.size = 0.8)

Note the number of plot parameters that are set by tm_legend - it is worthwhile exploring the help for this:

?tm_layout

You could also try legend.position = c("centre","bottom")). It looks bad! Further documentation ontm_layout` can be found at

Try repeating the tmap code above using a different variable such as PctEld. What happens? You will see that tmap automatically determines the number of classes to be included and the class intervals or breaks. Finally a colour shading scheme is automatically allocated to the map and the legend is included in the map. All of these, and many other of the default mapping settings that tmmap uses can be controlled and modified.

The tmap package has inbuilt tools change the way the class interval boundaries are computed. As a default, they are based on equal-sized intervals of the attribute being mapped, and different palette styles are available. Have a look at the help for tm_polygons and you will see that a number of different plotting styles are available. You should explore these.

The class intervals can be changed to quantiles or any other range of intervals using the breaks parameter. For example, the code below produces 2 maps with equal intervals (left), intervals based on k-means and with quantiles as (right).

However, the maps are not printed out. Instead they are assigned to p1 andp2. These can be called directly - try entering p2 at the console or used to create a multiple map figure using the pushViewport function in the grid package:

# with equal interval
p1 <-tm_shape(georgia_sf) +
  tm_polygons("PctBlack", title = "% Black", palette = "Greys") +
  tm_layout(legend.title.size = 1)
# with a style = kmeans
p2 <- tm_shape(georgia_sf) +
  tm_polygons("PctRural", title = "% Rural", palette = "Oranges",
    style = "kmeans") +
  tm_layout(legend.title.size = 1)

A multiple map figure can be created using the pushViewport function as below

# Multiple plots using the grid package
grid.newpage()
# set up the layout (rows, columns)
pushViewport(viewport(layout=grid.layout(1,2)))
# plot using he print command
print(p1, vp=viewport(layout.pos.col = 1, height = 5))
print(p2, vp=viewport(layout.pos.col = 2, height = 5))

Finally the class intervals can be controlled by manually specifying the breaks parameter:

tm_shape(georgia_sf) +
  tm_polygons("PctPov", breaks=seq(0, 100, by=25))

and this can be done in many different ways:

tm_shape(georgia_sf) +
  tm_polygons("PctPov", breaks=c(1,5,10, 20,30,40))

It is also possible to alter the colours used in a shading scheme. The default colour scheme uses increasing intensities of yellow red. You will have noticed the use of the different palettes in the above. The RColorBrewer package has a number of graduated lists of colours and is automatically loaded with both tmap and GISTools. The palettes designed by Cynthia Brewer and they are intended to optimise the perceptual difference between each shade in the palette, so that visually each shading colour is distinct (sometimes even when converted to a greyscale). The palettes available in this package are displayed with the command:

display.brewer.all()

This displays the various colour palettes and their names in a plot window. To generate a list of colours from one of these palettes, for example enter the following:

brewer.pal(5,'Blues')
## [1] "#EFF3FF" "#BDD7E7" "#6BAED6" "#3182BD" "#08519C"

This is a list of colour codes used by R to specify the palette. The arguments to brewer.pal specify that a five-stage palette based on shades of blue is required. The output of brewer.pal can be fed into tmap to give alternative colours in shading schemes. For example, enter the code below and a choropleth map shaded in green is displayed with its legend. The palette argument in tm_polygons specifies the new colours in the shading scheme.

tm_shape(georgia_sf) +
  tm_polygons("MedInc", title = "Median Income", palette = "Reds") +
  tm_layout(legend.title.size = 1, frame = F)

Note that the same map would be produced if the tm_fill function was used instead of tm_polygons in this case, however without a tm_borders function, the census block outlines are not plotted:

tm_shape(georgia_sf) +
  tm_fill("PctEld", title = "Median Income", palette = "Blues") +
  tm_layout(legend.title.size = 1, frame = F, legend.format = list(digits = 1))

It is also possible to display a histogram of the distribution of the variable or attribute being mapped using the legend.hist parameter. This is a very useful for choropleth mapping as it gives a distribution of the attributes being examined. Bringing this all together allows you to create a map that with a number of refinements as in the figure below. Note for example the minus sign before the palette parameter is used to reverse the palette order and the various parameters passed to the tm_layout function.

tm_shape(georgia_sf) +
  tm_polygons("PctEld", title = "Percentage Elderly", palette = "-GnBu",
              style = "kmeans",
              legend.hist = T) +
  tm_scale_bar(width = 0.22, position = c("right", "top")) +
  tm_compass()+
  tm_layout(frame = F, title = "Georgia", 
            title.size = 2, 
            #title.position = c(0.55, "top"), 
            legend.hist.size = 0.5, 
            legend.outside = T) 

New variables and attributes can be computed and mapped attributes values on the fly in tmap. The code below plots population density using the convert2density function applied to the TotPop90 attribute.

tm_shape(georgia_sf) +
    tm_fill(col="TotPop90", convert2density=TRUE,
        style="kmeans", title = expression("Population (per " * km^2 * ")"), 
        legend.hist=F, id="name") +
    tm_borders("grey25", alpha=.5) + 
  tm_format_NLD_wide()

The convert2density function automatically converts the projection units (in this case degrees of latitude and longitude) to a projection in metres and then determines areal density in km2. You can check this by creating your own population density values, and examining the explanations of how the functions operate in the help pages for the functions used such as st_area.

Final note: here we are just using variables that are already included with the R spatial data objects. Of course you can create new variables in your analysis and assign them to the an sp or sf object.

The code below assigns some random numbers to a variable and generates a choropleth map:

georgia_sf$random <- rnorm(nrow(georgia_sf))
tm_shape(georgia_sf) +
    tm_polygons(col="random", style="kmeans", 
            title = "Random", palette = "RdBu") +
    tm_format_NLD_wide()

In summary, the tm_fill and tm_polygons functions in the tmap package generate choropleth maps of attributes held in SpatialPolygonsDataFrame (sp) or Simple Feature (sf) data objects. They automatically shade the variables using equal intervals. The intervals and the palettes can both be adjusted. It is instructive to examine the plotting functions and the way they operate. You can see the code in any R function by simply calling it without any arguments or brackets:

tm_polygons

The function code detail is displayed in the R console window. You will see that it takes a number of arguments and a number of default parameters. In addition to using the R help system to understand functions, examining functions in this way can also provide you with insight into their operation.

Points

Again we will briefly deal with mapping values or attributes for points using tmap. There are 2 basic options: change the size or the shading of the points according to the attribute value. We can use the pts_sf created earlier to illustrate these approaches.

First by size using tm_bubbles:

# the 1st layer
tm_shape(georgia_sf) +
  tm_fill("olivedrab4") +
  tm_borders("grey", lwd = 1) +
  # the points layer
  tm_shape(pts_sf) +
  tm_bubbles("PctBlack", title.size = "% Black", col = "gold")+
  tm_format_NLD()

Second by shade using tm_dots:

# the 1st layer
tm_shape(georgia_sf) +
  tm_fill("grey70") +
  tm_borders("gold", lwd = 1) +
  # the points layer
  tm_shape(pts_sf) +
  tm_dots("PctFB", size = 0.7, title = "% Foreign", shape = 15) +
  tm_layout(legend.outside = T, frame = F)

5. Simple spatial analysis

In this section two methods for undertaking classic cluster analysis will be demonstrated: Moran’s I (Anselin, 1995) and G-statistic (Ord and Getis, 1995). Now underpinning these is some concept of contiguity or a measure of how geographically connected the various spatial units are int he study areas, such as the counties of Georgia. One way to determine this is to generate some measure of adjacency for each unit to indicate which other units it is topologically connected to. Moran’s I and G-statistic seek to identify areas within this adjacency network that have a greater number of similar values of an attribute than would be expected by chance and they do this in slightly different ways.

Here the poly2nb() function in the spdep package can be used to determine the adjacency relation between polygons. You should examine the help and note in particular the queen parameter.

You should also note that the poly2nb() function takes an sp object (SpatialPolygons) not an sf one. There may be a way to compute neighbours in sf but I couldn’t find it on a plane coming back from China with no internet….!

The functions for Moran’s I and G-statistic are localmoran and localG in the spdep package. You should examine the help foe these and see what arguments they take:

?localmoran
?localG

You will see that they both take a numeric vector and a listw objected created by nb2listw. If in turn you examine the help for nb2listw

?nb2listw

you will see that this function requires an object of class nb as input. This is where the poly2nb function comes in. The sequence of analysis for polygons is then:

poly2nb
nb2listw
local model

For points it is similar except that the dnearneigh is used:

dnearneigh
nb2listw
local model

So first, we create the create nb object using poly2nb and nb2listw:

lw <- poly2nb(georgia)
lw <- nb2listw(lw, zero.policy=TRUE)

Next we create the variable - we are using Median Income here:

var = georgia$MedInc

Then we calculate the cluster measure

lG <- localG(var,lw) # Create local G

And finally this can be mapped after the cluster measure has been added to the spatial object. We will use the sp version just to show that tmap works withspas well assf` and critically we will use a divergent palette to show hotpots (greater than expected) and cold spots (less than expected)

georgia$lG <- lG
tm_shape(georgia) +
  tm_polygons("lG", title = "Clustering", palette = "PRGn",
              style = "kmeans",
              legend.hist = T) +
  tm_layout(frame = F, title = "G-stat of Income",  
            legend.hist.size = 0.5, 
            legend.outside = T) 

The form of the Moran’s I is very similar and we already have the nb object:

lI <- localmoran(var,lw) # Create local G

This generates a matrix of values. Here we are interested in the first column. This can again be added to the spatial data as an attribute and mapped.

georgia$lI <- lI[,1]
tm_shape(georgia) +
  tm_polygons("lI", title = "Clustering", palette = "PRGn",
              style = "kmeans",
              legend.hist = T) +
  tm_layout(frame = F, title = "Moran's I of Income",  
            legend.hist.size = 0.5, 
            legend.outside = T) 

The two cluster maps j=have broadly similar patterns but there are some nuanced differences. Interestingly the output of the localmoran function includes p-values which could be used to indicate areas of significant clustering.

6. Writing maps out

This is very short sections that simply introduces 3 methods for writing maps out to PDF, PNG and TIFF files.

There are a number of functions that do this:

pdf()
png()
tiff()

Examine the help for these.

The key thing you need to know is that these functions all open a file. The open file needs to be closed using dev.off() after all the map has been written to it. So the syntax is:

pdf(file = "MyPlot.pdf", other setting)
<tmap code?
dev.off()  

You should try to write out a .png file of the map using the code below - note that you may want to set the working directory that you write to first:

setwd('~/Desktop/my_docs_mac/leeds_work/research/Groningen/')
# open the file
png(filename = "Figure1.png", w = 5, h = 7, units = "in", res = 150)
# make the map
tm_shape(georgia_sf) +
  tm_fill("olivedrab4") +
  tm_borders("grey", lwd = 1) +
  # the points layer
  tm_shape(pts_sf) +
  tm_bubbles("PctBlack", title.size = "% Black", col = "gold")+
  tm_format_NLD()
# close the png file
dev.off()

Task 4 Your task is to write code that will produce a map of the counties in Georgia, shaded in a colour scheme of your choice but using 10 classes describing the distribution of median income in 1000s of dollars (this is described by the MedInc attribute in the data frame). The maps should include a scalebar and a legend and the code should write the map to a TIFF file, with a resolution of 300 dots per inch and a map size of 7 by 7 inches.

# Hints
display.brewer.all()   # to show the Brewer palettes
breaks            # to specify class breaks OR
style             # in the tm_fill / tm_polygons help
# Tools
library(ggplot2) # for the mapping tools
data(georgia)       # the Georgia data in the GISTools package
st_as_sf(georgia) # to convert the data to sf format
tm_layout         # takes many parameters eg legend.position

7. Answers to Tasks

Task 1 Write some code to determine the mean value and standard deviation of median income variable MedInc from the georgia data. Hints: the mean and sd functions can help you!

## [1] 10508.17
## [1] 37147.09

Task 2 Write some code to subset and plot the counties in the 25 percentile of median income from the georgia data. Hint: the quantile function can help you and you may want to assign the result to an R object that you can use to determine a threshold with which to subset the data as was done with index above, using a logical statement.

thresh = quantile(georgia$MedInc, 0.25)
index <- georgia$MedInc < thresh
g.subset <- georgia[index,]
par(mar = c(0,0,0,0))
plot(georgia)
plot(g.subset, add = T, col = "forestgreen")

NB tmap had not been introduced when this task was set

Task 3 Finally, go back to Session 3 and examine how the tb$rural variable was created for the tb tibble object. The same operations can be used to on the data.frame of sp objects. You should create a variable called rural that separates the counties of georgia into Rural and Non-Rural classes and determine how many counties are Rural and Non-Rural using this definition. Hint the table function can help here.

georgia$rural <- as.factor((georgia$PctRural > 50) +  0)
levels(georgia$rural) <- list("Non-Rural" = 0, "Rural"=1)
table(georgia$rural)

Task 4. Plots and maps: working with map data Your map should look something like the below.

georgia_sf$MedInc = georgia_sf$MedInc / 1000 
# open the tif file and give it a name
tiff("MyTif.tiff", width=7,height=7,units='in',res=300)
# start the tmap commands
tm_shape(georgia_sf) +
  tm_polygons("MedInc", title = "Median Income ($1000s)", palette = "GnBu",
    style = "equal", n = 10) +
  tm_layout(legend.title.size = 1, 
    legend.format = list(digits = 0), frame = F) +
  tm_legend(legend.outside=TRUE)
# close the tiff file
dev.off()

References

Anselin, L. 1995. Local indicators of spatial association, Geographical Analysis, 27, 93–115;

Brunsdon, C. and Comber, L., 2015. An introduction to R for spatial analysis and mapping. Sage.

Ord, J. K. and Getis, A. 1995 Local spatial autocorrelation statistics: distributional issues and an application. Geographical Analysis, 27, 286–306;

Wilkinson, 2005