For this practical session you will need to have the following packages installed and loaded - skip this if yopu are using a lab machine.
# test for package existance and install
if (!is.element("sf", installed.packages()))
install.packages("sf", dep = T)
if (!is.element("GISTools", installed.packages()))
install.packages("GISTools", dep = T)
if (!is.element("tmap", installed.packages()))
install.packages("tmap", dep = T)
if (!is.element("rgdal", installed.packages()))
install.packages("rgdal", dep = T)
if (!is.element("grid", installed.packages()))
install.packages("grid", dep = T)
if (!is.element("spdep", installed.packages()))
install.packages("spdep", dep = T)
# load into the R session
library(sf)
library(GISTools)
library(tmap)
library(rgdal)
library(grid)
library(spdep)
sf
revolutionThis session will introduce both the sp
and sf
spatial data formats. It will demonstrate how to undertake some simple Choropleth mapping using the tmap
package. It will introduce and apply some basic cluster analysis (Moran’s I, G-statistic). Along the way it will do some reading and writing of spatial data, such as shapefiles, into and out of R.
Why Spatial Data? Nearly all data are spatial they are collected some-where and in fact these arguments can be extended to the spatio-temporal domain: all data are spatiotemporal – they are collected some-where and at some-time.
In the preceding description of data each of the records (rows) related to some kind of real world feature (a person, a transaction, a date, etc) and the columns represent some attribute associated with that feature. In spatial data, each record typically represents some real world geographical feature - a place, a route, a region etc. In fact geographic data typically represent points, lines or areas.
There is some key background information to working with spatial data in R. R is dynamic - things don’t stay the same. New tools, packages and functions are constantly be produced, they are updated to improve and develop them. In most cases this is not problematic as the update almost always extends the functionality of the package without affecting the original code. However in a few instances, specific packages are completely re-written without backwards compatibility. If this happens then R code that previously worked may not work with the new package as the functions may take different parameters and arguments. However, there is usually a period of a few package versions before the code stops to work all together.
Occasionally a completely new paradigm is introduced and this has been the case recently for spatial data in R with the advent of the sf
package.
Much of the code for handling spatial data in this workshop is built around data structures defined in the sp
package and it defines the following spatial objects in R:
The sp
package underpins many of the packages that you have loaded either directly or indirectly (ie they are loaded by other packages) have dependencies on sp
: the GISTools
package implicitly links to maptools
, sp
, rgeos
etc. You may have noticed these packages being listed as GISTools
loads.
However, recently a new class of spatial object has been defined called sf
of Simple Feature
that seeks to encode spatial data in a way that conforms to a formal standard (ISO 19125-1:2004). This emphasises the spatial geometry of objects, the way that objects are stored in databases. In brief, the aim of the team developing sf
(actually they are the same people who developed sp
so they do know what they are doing!!) is to provide a standard format for spatial data. An overview of the evolution of spatial data in R can be found at https://edzer.github.io/UseR2017/.
Their idea is that a feature is a thing, or an object in the real world, such as a building or a tree. As is the case with objects, they often consist of other objects. This is the case with features too: a set of features can form a single feature. A forest stand can be a feature, a forest can be a feature, a city can be a feature. A satellite image pixel can be a feature, a complete image can be a feature too. Features have a geometry describing where on Earth the feature is located, and they have attributes, which describe other properties. There many sf
object types but the key ones (that are similar to lines, points and areas*) are in the table below (taken from the sf
vignette).
Ultimately, it is planned that sf
formats will completely replace sp
and packages that use sp
(such GWmodel
for GWR) will all have to be updated to use sf
at some point – but that is a few years away.
in this session we will introduce both sp
and sf
as well as different mapping routines (with the plot
function and with functions in the tmap
package) and the tools to move between them. The later sessions on regression and spatial interaction models will mainly use sf
.
This section will first examine sp
and then sf
. It will also describe some mapping commands how to move (convert) between different spatial data formats on the way!
sp
formatThe GISTools
package comes with spatial data that we can use to illustrate the sp
format.
Load the georgia
data and examine the georgia
object. You can see that it has a class of SpatialPolygonsDataFrame
.
library(GISTools)
# laod the data
data(georgia)
# look at what is loaded
ls()
# examine them
class(georgia)
summary(georgia)
This is a bit like an ESRI polygon shapefile: it has some spatial properties and attribute properties held in a data.frame
. Is has a proj4string
but is not projected - ie it has latitude and longitude in degrees rather than being projected for example in metres.
You can access the data.frame
in a number of ways to explore the attributes:
head(data.frame(georgia))
head(georgia@data)
Note that the head()
function was included int he above to limit the print out to the first 6 lines or records.
Similarity you can generate summaries of the data values:
summary(georgia@data)
## Latitude Longitud TotPop90 PctRural
## Min. :30.72 Min. :-85.50 Min. : 1915 Min. : 2.50
## 1st Qu.:31.79 1st Qu.:-84.44 1st Qu.: 9220 1st Qu.: 54.70
## Median :32.75 Median :-83.69 Median : 16934 Median : 72.30
## Mean :32.81 Mean :-83.58 Mean : 40744 Mean : 70.18
## 3rd Qu.:33.79 3rd Qu.:-82.85 3rd Qu.: 36058 3rd Qu.:100.00
## Max. :34.92 Max. :-81.09 Max. :648951 Max. :100.00
## PctBach PctEld PctFB PctPov
## Min. : 4.20 Min. : 1.46 Min. :0.040 Min. : 2.60
## 1st Qu.: 7.60 1st Qu.: 9.81 1st Qu.:0.415 1st Qu.:14.05
## Median : 9.40 Median :12.07 Median :0.720 Median :18.60
## Mean :10.95 Mean :11.74 Mean :1.131 Mean :19.34
## 3rd Qu.:12.00 3rd Qu.:13.70 3rd Qu.:1.265 3rd Qu.:24.65
## Max. :37.50 Max. :22.96 Max. :6.740 Max. :35.90
## PctBlack X Y ID
## Min. : 0.00 Min. : 635964 Min. :3401148 Min. :13001
## 1st Qu.:11.75 1st Qu.: 741144 1st Qu.:3523380 1st Qu.:13082
## Median :27.64 Median : 809737 Median :3636468 Median :13161
## Mean :27.39 Mean : 820944 Mean :3636238 Mean :13161
## 3rd Qu.:40.06 3rd Qu.: 894584 3rd Qu.:3741656 3rd Qu.:13242
## Max. :79.64 Max. :1059706 Max. :3872640 Max. :13321
## Name MedInc
## Length:159 Min. :23456
## Class :character 1st Qu.:29773
## Mode :character Median :34307
## Mean :37147
## 3rd Qu.:41204
## Max. :81603
You can plot the spatial data using a basic plot
command (we will do more advanced mapping later in this session) and of critical importance you can subset spatially using the same notation as you would do for a a matrix
or data.frame
(note the use of the add = T
in the code below to add a plot layer to the preceding layer):
# 1 row, 2 columns
par(mfrow = c(1,2))
# left hand plot
plot(georgia)
plot(georgia[81,], add = T, col = "tomato")
# right hand plot
plot(georgia)
# set index
index <- c(81, 82, 83, 150, 62, 53, 21, 16, 124, 121, 17)
plot(georgia[index,], add = T, col = "tomato")
Maps of counties in Georgia, USA and a subset.
# rest par
par(mfrow = c(1,1))
The subsetting operation above using index
can also be used to create a spatial subset:
g.subset <- georgia[index,]
# check the dimensions (rows, columns)
dim(georgia)
## [1] 159 14
dim(g.subset)
## [1] 11 14
# plot
plot(g.subset, col = "wheat")
# overlay the full data
plot(georgia, add = T)
Examples of the plot
function for mapping.
Operations for data.frame
s of sp
objects needs a bit more care than ordinary data.frame
s. Variables (attributes) can be accessed in the same way using $
:
g.subset$PctBach
## [1] 6.2 7.7 4.9 9.8 5.3 9.1 9.9 19.9 8.6 17.3 9.6
The PctBach
attribute is the 5th field in the g.subset
and if this was just a data.frame
called df
then this could be accessed using:
df[,5]
However for the data.frame
of an sp
object this prints out all of the spatial information as well. Try entering:
g.subset[,5]
To overcome this the data.frame
needs to be directly invoked in as above:
g.subset@data[,3:5]
## TotPop90 PctRural PctBach
## 80 17408 100.0 6.2
## 81 8247 53.8 7.7
## 82 8329 100.0 4.9
## 149 19112 67.1 9.8
## 61 2357 100.0 5.3
## 52 20546 64.2 9.1
## 20 7744 52.1 9.9
## 15 43125 63.2 19.9
## 123 13842 79.3 8.6
## 120 189719 9.9 17.3
## 16 20579 72.3 9.6
data.frame(g.subset[1:6,12:14])
## ID Name MedInc
## 80 13163 Jefferson 28203
## 81 13165 Jenkins 25951
## 82 13167 Johnson 26671
## 149 13303 Washington 31894
## 61 13125 Glascock 33055
## 52 13107 Emanuel 26639
Task 1 Write some code to determine the mean value and standard deviation of median income variable MedInc
from the georgia
data. Hints: the mean
and sd
functions can help you!
Task 2 Write some code to subset and plot the counties in the 25 percentile of median income from the georgia
data. Hint: the quantile
function can help you and you may want to assign the result to an R object that you can use to determine a threshold with which to subset the data as was done with index
above, using a logical statement.
Task 3
Finally, go back to Session 3 and examine how the tb$rural
variable was created for the tb
tibble
object. The same operations can be used to on the data.frame
of sp
objects. You should create a variable called rural
that separates the counties of georgia
into Rural and Non-Rural classes and determine how many counties are Rural and Non-Rural using this definition. Hint the table
function can help here.
sf
formatThe sf
format seeks to implement tidy data. Data is tidy when (Wickham 2014):
The tidy data rules for sf
/ simple feature implies the need for data.frame
s where each feature forms a row. A single column (a list-column) contains the geometry for each observation.
The sf
package puts features in sf
tables deriving from data.frame
or tbl_df
, which have geometries in a list-column of class sfc
, where each list element is the geometry of a single feature of class sfg
. Feature geometries are represented in R by
Other tidy aspects of sf
include:
st_
(press tab to search), use _ and lower caseread_sf
is an alias for st_read
with tidy defaults:
The code below loads a dataset that is included in the sf
package. It is of North Carolina (see https://cran.r-project.org/web/packages/spdep/vignettes/sids.pdf) and is loaded using the st_read
function. This is a polygon spatial dataset…
# read the data
nc <- st_read(system.file("shape/nc.shp", package="sf"))
## Reading layer `nc' from data source `/Library/Frameworks/R.framework/Versions/3.3/Resources/library/sf/shape/nc.shp' using driver `ESRI Shapefile'
## Simple feature collection with 100 features and 14 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
## epsg (SRID): 4267
## proj4string: +proj=longlat +datum=NAD27 +no_defs
which can of course be mapped
# do a quick tmap of the data
qtm(nc)
Quick tmap
maps of North Carolina.
# and others mapping attributes
qtm(nc, fill = c("BIR74", "NWBIR79"))
Quick tmap
maps of North Carolina.
Note that here we are using the qtm
(quick tmap) function from the tmap
package and not the plot
function as above but also note that tmap
can work with both sp
and sf
data formats. We can examine the data in the same way as a-spatial data:
dim(nc)
head(nc)
class(nc)
And again, you can see that nc
has 100, 15 rows and columns respectively, that the class of nc
indicate that it is sf
class as well as data.frame
. For data.frame
related formats, R prints out all records and all columns. Try entering:
nc
Again you can generate and examine summaries of the nc
data:
summary(nc)
Finally, it is also possibly to stop the legend and the body of the map from overlapping - although this can leave less space for the map itself:
qtm(nc, fill = "BIR74") +
tm_layout(legend.outside = TRUE)
qtm(nc, fill = "NWBIR79") +
tm_layout(legend.outside = TRUE)
sf
vignettesThe sf
package has a number of vignettes or tutorials that you could explore. These include an overview of the format, reading and writing from and to sf
formats including conversions to and from sp
and sf
, and some illustrations of how sf
objects can be manipulated. The code below will create a new window with a list of sf
vignettes:
library(sf)
vignette(package = "sf")
To display a specific vignette topic, this can be called using the vignette
function. A full description of the sf format can be found in the
sf1` vignette
vignette("sf1", package = "sf")
Vignettes are an important part of R packages. They provide additional explanations of the package functionality to that found in the example code at the end of a help page. They can be accessed using the vignette
function or through the R help. The sf1
vignette could also be accessed via the package help index: enter help(sf)
, then navigate to the index through the link at the bottom of the overview page and then click on the User guides, package vignettes and other documentation link.
sp
and sf
You will note that there are a number of functions that can be used to read, write and convert from and to sf
. These will be summarised later in this session. The sf2
vignette covers these topics.
The fundamental function for converting to sf
is st_as_sf()
. In the code below it is used to convert the the georgia
sp
object to sf
:
# conversion to sf
georgia_sf = st_as_sf(georgia)
class(georgia_sf)
## [1] "sf" "data.frame"
georgia_sf
## Simple feature collection with 159 features and 15 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -85.6052 ymin: 30.35541 xmax: -80.84126 ymax: 35.00068
## epsg (SRID): 4326
## proj4string: +proj=longlat +ellps=WGS84 +no_defs
## First 20 features:
## Latitude Longitud TotPop90 PctRural PctBach PctEld PctFB PctPov
## 0 31.75339 -82.28558 15744 75.6 8.2 11.43 0.64 19.9
## 1 31.29486 -82.87474 6213 100.0 6.4 11.77 1.58 26.0
## 2 31.55678 -82.45115 9566 61.7 6.6 11.11 0.27 24.1
## 3 31.33084 -84.45401 3615 100.0 9.4 13.17 0.11 24.8
## 4 33.07193 -83.25085 39530 42.7 13.3 8.64 1.43 17.5
## 5 34.35270 -83.50054 10308 100.0 6.4 11.37 0.34 15.1
## 6 33.99347 -83.71181 29721 64.6 9.2 10.63 0.92 14.7
## 7 34.23840 -84.83918 55911 75.2 9.0 9.66 0.82 10.7
## 8 31.75940 -83.21976 16245 47.0 7.6 12.81 0.33 22.0
## 9 31.27424 -83.23179 14153 66.2 7.5 11.98 1.19 19.3
## 10 32.80451 -83.69915 149967 16.1 17.0 12.23 1.06 19.2
## 11 32.43552 -83.33121 10430 57.9 10.3 12.60 0.64 18.3
## 12 31.19702 -81.98323 11077 100.0 5.8 9.02 0.33 18.2
## 13 30.84653 -83.57726 15398 65.6 9.1 13.68 1.76 25.9
## 14 32.02037 -81.43763 15438 80.6 11.8 7.22 0.45 13.2
## 15 32.39071 -81.74391 43125 63.2 19.9 9.56 1.16 27.5
## 16 33.05837 -81.99939 20579 72.3 9.6 10.60 0.43 30.3
## 17 33.28834 -83.95713 15326 73.4 7.2 10.41 0.72 15.6
## 18 31.52793 -84.61891 5013 100.0 10.1 15.94 0.10 31.8
## 19 30.91895 -81.63783 30167 47.1 13.5 4.78 2.14 11.5
## PctBlack X Y ID Name MedInc rural
## 0 20.76 941396.6 3521764 13001 Appling 32152 Rural
## 1 26.86 895553.0 3471916 13003 Atkinson 27657 Rural
## 2 15.42 930946.4 3502787 13005 Bacon 29342 Rural
## 3 51.67 745398.6 3474765 13007 Baker 29610 Rural
## 4 42.39 849431.3 3665553 13009 Baldwin 36414 Non-Rural
## 5 3.49 819317.3 3807616 13011 Banks 41783 Rural
## 6 11.44 803747.1 3769623 13013 Barrow 49829 Rural
## 7 9.21 699011.5 3793408 13015 Bartow 47309 Rural
## 8 31.33 863020.8 3520432 13017 Ben Hill 28009 Non-Rural
## 9 11.62 859915.8 3466377 13019 Berrien 33786 Rural
## 10 41.68 809736.9 3636468 13021 Bibb 34872 Non-Rural
## 11 22.36 844270.1 3595691 13023 Bleckley 36643 Rural
## 12 4.58 979288.9 3463849 13025 Brantley 34189 Rural
## 13 41.47 827822.0 3421638 13027 Brooks 28244 Rural
## 14 14.85 1023145.0 3554982 13029 Bryan 57758 Rural
## 15 25.95 994903.4 3600493 13031 Bulloch 31519 Rural
## 16 52.19 971593.8 3671394 13033 Burke 29434 Rural
## 17 35.48 782448.2 3684504 13035 Butts 42829 Rural
## 18 58.89 724741.2 3492653 13037 Calhoun 26022 Rural
## 19 20.19 1008480.0 3437933 13039 Camden 45987 Non-Rural
## geometry
## 0 MULTIPOLYGON (((-82.2251968...
## 1 MULTIPOLYGON (((-82.6292953...
## 2 MULTIPOLYGON (((-82.5217056...
## 3 MULTIPOLYGON (((-84.1407012...
## 4 MULTIPOLYGON (((-83.2742309...
## 5 MULTIPOLYGON (((-83.3985061...
## 6 MULTIPOLYGON (((-83.5375061...
## 7 MULTIPOLYGON (((-84.6532974...
## 8 MULTIPOLYGON (((-83.1778106...
## 9 MULTIPOLYGON (((-83.1461029...
## 10 MULTIPOLYGON (((-83.8923034...
## 11 MULTIPOLYGON (((-83.2267074...
## 12 MULTIPOLYGON (((-81.7317123...
## 13 MULTIPOLYGON (((-83.7364883...
## 14 MULTIPOLYGON (((-81.4359970...
## 15 MULTIPOLYGON (((-81.8412857...
## 16 MULTIPOLYGON (((-81.8522949...
## 17 MULTIPOLYGON (((-83.8633193...
## 18 MULTIPOLYGON (((-84.4504165...
## 19 MULTIPOLYGON (((-81.9361419...
Notice how when georgia_sf
is called the spatial information and the first 10 records of the attribute table are printed to the screen, rather than the entire object as with sp
: for comparison you could enter:
georgia
The plot
function is also different: it will create maps of sf
objects and if the sf
object has attributes, it will shade the first few of these:
# all attributes
plot(georgia_sf)
# selected attribute
plot(georgia_sf[, 6])
# selected attributes
plot(georgia_sf[,c(4,5)])
Finally, note that sf
objects have a data.frame
. This was covered earlier, but for the moment you could compare the data.frame
of sp
and sf
objects""
## sp SpatialPolygonDataFrame object
head(data.frame(georgia))
## sf polygon object
head(data.frame(georgia_sf))
Note that the data.frame
of the sf
objects have a geometry
attributes.
We can also convert to sp
by using the as
function:
g2 <- as(georgia_sf, "Spatial")
class(g2)
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"
As we are working with spatial data it seems appropriate to consider how to get spatial data in and out of R.
Very often we have data that is in a particular format such as shapefile format. R has the ability to read and write data from and to many different spatial data formats using functions in the rgdal
and sf
packages - we will consider them both here:
sp
formatHowever, as the sp
package and data formats are depreciated, so are their functions for reading and writing spatial data. The rgdal
package includes 2 generic functions for reading and writing all kinds of spatial data - pretty much you name the format and it has a driver for that format. These are readOGR()
and writeOGR()
.
Consider the georgia
sp
format object - we can write this out to a shape file using the writeOGR()
function as follows:
writeOGR(obj=georgia, dsn=".", layer="georgia", driver="ESRI Shapefile")
You will see that a shapefile has been written into your current working directory, with its associated supporting files (.dbf
, etc) that can be recognised by other applications (QGIS etc). Similarly this can be read into R and assigned to a variable using the readOGR
function:
new.georgia <- readOGR("georgia.shp")
If you enter:
class(new.georgia)
You will see that the class of the new.georgia
object is sp
.
You should examine the writeOGR
and readOGR
function in the rgdal
package. R is also able to read and write other proprietory spatial data formats using a number of packages, which you should be able to find through a search of the R help system or via an internet search engine. The rgdal
package, which is the R version of the Geospatial Data Abstraction Library. It includes a number of methods for reading and writing spatial objects, including to and from SpatialXDataFrame
objects:
The full syntax can be important - the code below overwrites any existing similarly named file:
writeOGR(new.georgia, dsn = ".", layer = "georgia",
driver="ESRI Shapefile", overwrite_layer = T)
The dsn
parameter is important here: for shape files it determines the folder the files are written to. In the above example it was set to "."
which places the files in the current working directory. You could specify a file path here.
For a PC might be something like:D:\MyDocuments\MyProject\DataFiles
And for a MAC: /Users/lex/my_docs/project
The setwd()
and getwd()
functions can be used in determining and setting the file path. You may want to set the file path and then use the dsn
setting as above:
setwd("/Users/lex/my_docs/project")
writeOGR(new.georgia, dsn = ".", layer = "georgia",
driver="ESRI Shapefile", overwrite_layer = T)
Or you could use the getwd()
function, save the results to variable and pass this to writeOGR
:
td <- getwd()
writeOGR(new.georgia, dsn = td, layer = "georgia",
driver="ESRI Shapefile", overwrite_layer = T)
You should also examine the functions for reading and writing raster layers in rgdal
which are readGDAL
and writeGDAL
. These read and write functions in rgdal
are incredibly powerful and can read / write almost any format of spatial data.
sf
formatSpatial data can be also be read in and written out using the sf
functions: st_read
and st_write
.
For example to read in the georgia.shp
shapefile that was created above (and to overwrite `g2) the following code can be used:
setwd("/Users/geoaco")
g2 <- st_read("georgia.shp")
Note that we need to set the working directory to ensure that st_read
looks in the right place to read the file from. We see here that a single argument is used to find both the data source and the layer. This works when the data source contains a single layer.
To write a simple features object to a file, we need at least two arguments, the object and a filename:
st_write(g2, "georgia.shp")
As before this will not work if the georgia.shp
file exists in the working directory. To overcome this adjust the code to the below:
st_write(g2, "georgia.shp", delete_layer = T)
The file name is taken as the data source name. The default for the layer name is the basename (filename without path) of the the data source name. For this, st_write
needs to guess the driver. The above command is, for instance, equivalent to:
st_write(g2, dsn = "georgia.shp", layer = "georgia.shp",
driver = "ESRI Shapefile", delete_layer = T)
Typical users will use a file name with path for filename, or first set R’s working directory with setwd()
and use file name without path.
Note that the output driver is guessed from the datasource name, either from its extension (.shp: ESRI Shapefile), or its prefix (PG:: PostgreSQL). The list of extensions with corresponding driver (short driver name) can be found in the sf2
vignette. You can examine this at a later date.
vignette("sf2", package = "sf")
tmap
The tmap
package is for thematic visualization, with a focus on mapping the spatial distribution of data attributes. It has a similar grammar to plotting with ggplot
in that it seeks to handle each element of the map separately in a series of layers and in so doing to exercise control over each element. This is different to the basic plot
functions used above to map sp
and sf
data.In earlier sections qtm()
was used to compose a quick map and now we will expand this more fully. The code in this section will use the georgia
/ georgia_sf
dataset.
A number of plot parameters additional to the ones that have previously been used exist, including different window sizes, multiple plots in the same window, polygon or area shading, hatching, boundary thickness, boundary colour, labeling etc. First, install tmap
in the usual way if you have not already done so and load it. Then plot georgia
with a single shade and a background colour using the qtm
or quick tmap function:
library(tmap)
qtm(georgia, fill = "red", style = "natural")
Note the use of the style
parameter. This is a shortcut to a predefined style within the tmap
package, in this case named tm_style_natural
. These can be called in abbreviated form using qtm
. You should explore the qtm
function through the help and the vignette (see below). Full control of shading and map styles can be exercised using the full tm_shape
function. This will be used mostly from this point in this practical work.
A useful introductory overview of tmap
can be found in the tmap-nutshell
vignette. You can uses this later to get some further ideas about mapping options:
vignette("tmap-nutshell")
Like ggplot
the key point is that tmap
is invoked with tm_shape
and then the kind of mapping is specified with further Aesthetics commands to control and specify the base layers:
base | description |
---|---|
tm_polygons | Create a polygon layer (with borders) |
tm_symbols | Create a layer of symbols |
tm_lines | Create a layer of lines |
tm_raster | Create a raster layer |
tm_text | Create a layer of text labels |
Further aesthetics for derived layers are described in
help(tmap)
It is possible to generate an outline of the area using the st_union()
function in sf
. An alternative for sp' is the gUnaryUnion()
function in the rgeos
package loaded with GISTools
. The manipulation of spatial data using overlay, union and intersection functions is covered in more depth in Chapter 5 of Comber and Brunsdon (2015).
# do a merge
g <- st_union(georgia_sf)
# for sp
#georgia.outline <- gUnaryUnion(georgia, id = NULL)
Now run the code snippets below and inspect the results. You should see how the tmap
functions are added as a series of layers to the map in a similar way to ggplot
.
# plot the spatial layers
tm_shape(georgia_sf) +
tm_fill("indianred")
Add the border:
tm_shape(georgia_sf) +
tm_fill("indianred") +
tm_borders(lty = "dashed", col = "khaki")
Adding some styling:
tm_shape(georgia_sf) +
tm_fill("indianred") +
tm_borders(lty = "dashed", col = "khaki") +
tm_style_natural(bg.color = "grey75")
Adding the outline:
tm_shape(georgia_sf) +
tm_fill("indianred") +
tm_borders(lty = "dashed", col = "khaki") +
tm_style_natural(bg.color = "grey75") +
# now add the outline
tm_shape(g) +
tm_borders(lwd = 3)
And finally putting it all together:
tm_shape(georgia_sf) +
tm_fill("indianred") +
tm_borders(lty = "dashed", col = "khaki") +
tm_style_natural(bg.color = "grey75") +
# now add the outline
tm_shape(g) +
tm_borders(lwd = 3) +
tm_layout(title = "The State of Georgia",
title.size = 1, title.position = c(0.55, "top"))
So what you can see in the above code are two sets of tmap
plot commands: the first set plots the georgia_sf
dataset, specifying a dashed blue line to show the county boundaries, a red fill colour for the county and a map background colour of wheat. The second set adds the outline created by the union operation with a thicker line width before the title and subtitle are added. It is also possible to plot multiple different maps from different datasets together but this requires a bit more control over the tmap
parameters. Note the way that the aspect of the second plot is used. The value was determined through trial and error. The code below assigns the map to a variable, and then a second set of functions, is used to manipulate these in a plot window. You may need to install the grid
package:
# 1st plot of georgia
t1 <- tm_shape(georgia_sf) +
tm_fill("coral") +
tm_borders() +
tm_style_natural(bg.color = "grey55")
# 2nd plot of georgia2
t2 <- tm_shape(georgia2) +
tm_fill("orange") +
tm_borders() +
tm_style_natural(bg.color = "grey75")+
# the asp paramter controls aspect
# this is makes the 2nd plot align
tm_layout(asp = 0.86)
Now you can specify the layout of the combined map plot:
# open a new plot page
grid.newpage()
# set up the layout
pushViewport(viewport(layout=grid.layout(1,2)))
# plot using the print command
print(t1, vp=viewport(layout.pos.col = 1, height = 5))
print(t2, vp=viewport(layout.pos.col = 2, height = 5))
Thus different plot parameters can be used for different subsets of the data such that they are plotted in ways that are different from the default. Sometimes we would like to label the features in our maps. Have a look at the names of the counties in the georgia_sf
dataset. These are held in the 13th attribute column and names(georgia_sf)
will return a list of the names of all attributes:
data.frame(georgia_sf)[,13]
It would be useful to display these on the map and this can be done using the tm_text
function in the maptools
package that is loaded with tmap
. The result is shown in the figure below.
tm_shape(georgia_sf) +
tm_fill("white") +
tm_borders() +
tm_text("Name", size = 0.3) +
tm_layout(frame = FALSE)
And we can subset the data as with sp
format. The code below subsets the counties of Jefferson, Jenkins, Johnson, Washington, Glascock, Emanuel, Candler, Bulloch, Screven, Richmond and Burke.
# the county indices below were extracted from the data.frame
index <- c(81, 82, 83, 150, 62, 53, 21, 16, 124, 121, 17)
georgia_sf.sub <- georgia_sf[index,]
The notation is the same as for `sp
object earlier, re-enforcing the fact that it is possible to select individual areas or polygons from spatial datasets using the bracket notation as used in matrices and vectors.
The subset can be plotted:
tm_shape(georgia_sf.sub) +
tm_fill("gold1") +
tm_borders("grey") +
tm_text("Name", size = 1) +
# add the outline
tm_shape(g) +
tm_borders(lwd = 2) +
# specify some layout parameters
tm_layout(frame = FALSE, title = "A subset of Georgia",
title.size = 2, title.position = c(0., "bottom"))
Finally, we can bring the different spatial data that has been created together in a single map. In the code below you should note how the different tm_shape
, tm_fill
etc functions are used to set up each layer of the map and that tmap
determines the map extent from the layers:
# the 1st layer
tm_shape(georgia_sf) +
tm_fill("white") +
tm_borders("grey", lwd = 0.5) +
# the 2nd layer
tm_shape(g) +
tm_borders(lwd = 2) +
# the 3rd layer
tm_shape(georgia_sf.sub) +
tm_fill("lightblue") +
tm_borders() +
# specify some layout parameters
tm_layout(frame = T, title = "Georgia with a subset of counties",
title.size = 1, title.position = c(0.02, "bottom"))
Here we will briefly deal with mapping points using tmap
. The code below creates some points from the coordinates of the georgia_sf
polygons centroids (ignore the warnings!)
pts_sf <- st_centroid(georgia_sf)
## Warning in st_centroid.sfc(st_geometry(x), of_largest_polygon =
## of_largest_polygon): st_centroid does not give correct centroids for
## longitude/latitude data
These can be plotted in the same way as before but rather than using tm_polygons
, tm_fill
or tm_borders
the tm_bubbles
or tm_dots
aesthetics can be specified:
# the 1st layer
tm_shape(georgia_sf) +
tm_fill("olivedrab4") +
tm_borders("grey", lwd = 1) +
# the point layer
tm_shape(pts_sf) +
tm_dots(size = 1, alpha = 0.5)
Note how in the code above the tm_shape
function was invoked for each layer.
It is also possible to plot multiple attributes simply by passing them to the tm_fill
function. The plot window may need to be expanded to accommodate the legend:
tm_shape(georgia_sf) +
tm_fill(c("PctRural", "PctBlack")) +
tm_borders() +
tm_layout(legend.format = list(digits = 0),
legend.position = c("right", "top"),
legend.text.size = 0.5,
legend.title.size = 0.8)
Note the number of plot parameters that are set by tm_legend
- it is worthwhile exploring the help for this:
?tm_layout
You could also try legend.position = c("centre","bottom")). It looks bad! Further documentation on
tm_layout` can be found at
Try repeating the tmap
code above using a different variable such as PctEld
. What happens? You will see that tmap
automatically determines the number of classes to be included and the class intervals or breaks. Finally a colour shading scheme is automatically allocated to the map and the legend is included in the map. All of these, and many other of the default mapping settings that tmmap
uses can be controlled and modified.
The tmap
package has inbuilt tools change the way the class interval boundaries are computed. As a default, they are based on equal-sized intervals of the attribute being mapped, and different palette styles are available. Have a look at the help for tm_polygons
and you will see that a number of different plotting styles are available. You should explore these.
The class intervals can be changed to quantiles or any other range of intervals using the breaks
parameter. For example, the code below produces 2 maps with equal intervals (left), intervals based on k-means and with quantiles as (right).
However, the maps are not printed out. Instead they are assigned to p1
andp2
. These can be called directly - try entering p2
at the console or used to create a multiple map figure using the pushViewport
function in the grid
package:
# with equal interval
p1 <-tm_shape(georgia_sf) +
tm_polygons("PctBlack", title = "% Black", palette = "Greys") +
tm_layout(legend.title.size = 1)
# with a style = kmeans
p2 <- tm_shape(georgia_sf) +
tm_polygons("PctRural", title = "% Rural", palette = "Oranges",
style = "kmeans") +
tm_layout(legend.title.size = 1)
A multiple map figure can be created using the pushViewport
function as below
# Multiple plots using the grid package
grid.newpage()
# set up the layout (rows, columns)
pushViewport(viewport(layout=grid.layout(1,2)))
# plot using he print command
print(p1, vp=viewport(layout.pos.col = 1, height = 5))
print(p2, vp=viewport(layout.pos.col = 2, height = 5))
Finally the class intervals can be controlled by manually specifying the breaks
parameter:
tm_shape(georgia_sf) +
tm_polygons("PctPov", breaks=seq(0, 100, by=25))
and this can be done in many different ways:
tm_shape(georgia_sf) +
tm_polygons("PctPov", breaks=c(1,5,10, 20,30,40))
It is also possible to alter the colours used in a shading scheme. The default colour scheme uses increasing intensities of yellow red. You will have noticed the use of the different palettes in the above. The RColorBrewer
package has a number of graduated lists of colours and is automatically loaded with both tmap
and GISTools
. The palettes designed by Cynthia Brewer and they are intended to optimise the perceptual difference between each shade in the palette, so that visually each shading colour is distinct (sometimes even when converted to a greyscale). The palettes available in this package are displayed with the command:
display.brewer.all()
This displays the various colour palettes and their names in a plot window. To generate a list of colours from one of these palettes, for example enter the following:
brewer.pal(5,'Blues')
## [1] "#EFF3FF" "#BDD7E7" "#6BAED6" "#3182BD" "#08519C"
This is a list of colour codes used by R to specify the palette. The arguments to brewer.pal
specify that a five-stage palette based on shades of blue is required. The output of brewer.pal
can be fed into tmap
to give alternative colours in shading schemes. For example, enter the code below and a choropleth map shaded in green is displayed with its legend. The palette
argument in tm_polygons
specifies the new colours in the shading scheme.
tm_shape(georgia_sf) +
tm_polygons("MedInc", title = "Median Income", palette = "Reds") +
tm_layout(legend.title.size = 1, frame = F)
Note that the same map would be produced if the tm_fill
function was used instead of tm_polygons in this case, however without a tm_borders
function, the census block outlines are not plotted:
tm_shape(georgia_sf) +
tm_fill("PctEld", title = "Median Income", palette = "Blues") +
tm_layout(legend.title.size = 1, frame = F, legend.format = list(digits = 1))
It is also possible to display a histogram of the distribution of the variable or attribute being mapped using the legend.hist
parameter. This is a very useful for choropleth mapping as it gives a distribution of the attributes being examined. Bringing this all together allows you to create a map that with a number of refinements as in the figure below. Note for example the minus sign before the palette
parameter is used to reverse the palette order and the various parameters passed to the tm_layout
function.
tm_shape(georgia_sf) +
tm_polygons("PctEld", title = "Percentage Elderly", palette = "-GnBu",
style = "kmeans",
legend.hist = T) +
tm_scale_bar(width = 0.22, position = c("right", "top")) +
tm_compass()+
tm_layout(frame = F, title = "Georgia",
title.size = 2,
#title.position = c(0.55, "top"),
legend.hist.size = 0.5,
legend.outside = T)
New variables and attributes can be computed and mapped attributes values on the fly in tmap
. The code below plots population density using the convert2density
function applied to the TotPop90
attribute.
tm_shape(georgia_sf) +
tm_fill(col="TotPop90", convert2density=TRUE,
style="kmeans", title = expression("Population (per " * km^2 * ")"),
legend.hist=F, id="name") +
tm_borders("grey25", alpha=.5) +
tm_format_NLD_wide()
The convert2density
function automatically converts the projection units (in this case degrees of latitude and longitude) to a projection in metres and then determines areal density in km2. You can check this by creating your own population density values, and examining the explanations of how the functions operate in the help pages for the functions used such as st_area
.
Final note: here we are just using variables that are already included with the R spatial data objects. Of course you can create new variables in your analysis and assign them to the an sp
or sf
object.
The code below assigns some random numbers to a variable and generates a choropleth map:
georgia_sf$random <- rnorm(nrow(georgia_sf))
tm_shape(georgia_sf) +
tm_polygons(col="random", style="kmeans",
title = "Random", palette = "RdBu") +
tm_format_NLD_wide()
In summary, the tm_fill
and tm_polygons
functions in the tmap
package generate choropleth maps of attributes held in SpatialPolygonsDataFrame (sp
) or Simple Feature (sf
) data objects. They automatically shade the variables using equal intervals. The intervals and the palettes can both be adjusted. It is instructive to examine the plotting functions and the way they operate. You can see the code in any R function by simply calling it without any arguments or brackets:
tm_polygons
The function code detail is displayed in the R console window. You will see that it takes a number of arguments and a number of default parameters. In addition to using the R help system to understand functions, examining functions in this way can also provide you with insight into their operation.
Again we will briefly deal with mapping values or attributes for points using tmap
. There are 2 basic options: change the size or the shading of the points according to the attribute value. We can use the pts_sf
created earlier to illustrate these approaches.
First by size using tm_bubbles
:
# the 1st layer
tm_shape(georgia_sf) +
tm_fill("olivedrab4") +
tm_borders("grey", lwd = 1) +
# the points layer
tm_shape(pts_sf) +
tm_bubbles("PctBlack", title.size = "% Black", col = "gold")+
tm_format_NLD()
Second by shade using tm_dots
:
# the 1st layer
tm_shape(georgia_sf) +
tm_fill("grey70") +
tm_borders("gold", lwd = 1) +
# the points layer
tm_shape(pts_sf) +
tm_dots("PctFB", size = 0.7, title = "% Foreign", shape = 15) +
tm_layout(legend.outside = T, frame = F)
In this section two methods for undertaking classic cluster analysis will be demonstrated: Moran’s I (Anselin, 1995) and G-statistic (Ord and Getis, 1995). Now underpinning these is some concept of contiguity or a measure of how geographically connected the various spatial units are int he study areas, such as the counties of Georgia. One way to determine this is to generate some measure of adjacency for each unit to indicate which other units it is topologically connected to. Moran’s I and G-statistic seek to identify areas within this adjacency network that have a greater number of similar values of an attribute than would be expected by chance and they do this in slightly different ways.
Here the poly2nb()
function in the spdep
package can be used to determine the adjacency relation between polygons. You should examine the help and note in particular the queen
parameter.
You should also note that the poly2nb()
function takes an sp
object (SpatialPolygons
) not an sf
one. There may be a way to compute neighbours in sf
but I couldn’t find it on a plane coming back from China with no internet….!
The functions for Moran’s I and G-statistic are localmoran
and localG
in the spdep
package. You should examine the help foe these and see what arguments they take:
?localmoran
?localG
You will see that they both take a numeric vector and a listw
objected created by nb2listw
. If in turn you examine the help for nb2listw
?nb2listw
you will see that this function requires an object of class nb
as input. This is where the poly2nb
function comes in. The sequence of analysis for polygons is then:
poly2nb
nb2listw
local model
For points it is similar except that the dnearneigh
is used:
dnearneigh
nb2listw
local model
So first, we create the create nb
object using poly2nb
and nb2listw
:
lw <- poly2nb(georgia)
lw <- nb2listw(lw, zero.policy=TRUE)
Next we create the variable - we are using Median Income here:
var = georgia$MedInc
Then we calculate the cluster measure
lG <- localG(var,lw) # Create local G
And finally this can be mapped after the cluster measure has been added to the spatial object. We will use the sp
version just to show that tmap works with
spas well as
sf` and critically we will use a divergent palette to show hotpots (greater than expected) and cold spots (less than expected)
georgia$lG <- lG
tm_shape(georgia) +
tm_polygons("lG", title = "Clustering", palette = "PRGn",
style = "kmeans",
legend.hist = T) +
tm_layout(frame = F, title = "G-stat of Income",
legend.hist.size = 0.5,
legend.outside = T)
The form of the Moran’s I is very similar and we already have the nb
object:
lI <- localmoran(var,lw) # Create local G
This generates a matrix of values. Here we are interested in the first column. This can again be added to the spatial data as an attribute and mapped.
georgia$lI <- lI[,1]
tm_shape(georgia) +
tm_polygons("lI", title = "Clustering", palette = "PRGn",
style = "kmeans",
legend.hist = T) +
tm_layout(frame = F, title = "Moran's I of Income",
legend.hist.size = 0.5,
legend.outside = T)
The two cluster maps j=have broadly similar patterns but there are some nuanced differences. Interestingly the output of the localmoran
function includes p-values which could be used to indicate areas of significant clustering.
This is very short sections that simply introduces 3 methods for writing maps out to PDF, PNG and TIFF files.
There are a number of functions that do this:
pdf()
png()
tiff()
Examine the help for these.
The key thing you need to know is that these functions all open a file. The open file needs to be closed using dev.off()
after all the map has been written to it. So the syntax is:
pdf(file = "MyPlot.pdf", other setting)
<tmap code?
dev.off()
You should try to write out a .png
file of the map using the code below - note that you may want to set the working directory that you write to first:
setwd('~/Desktop/my_docs_mac/leeds_work/research/Groningen/')
# open the file
png(filename = "Figure1.png", w = 5, h = 7, units = "in", res = 150)
# make the map
tm_shape(georgia_sf) +
tm_fill("olivedrab4") +
tm_borders("grey", lwd = 1) +
# the points layer
tm_shape(pts_sf) +
tm_bubbles("PctBlack", title.size = "% Black", col = "gold")+
tm_format_NLD()
# close the png file
dev.off()
Task 4 Your task is to write code that will produce a map of the counties in Georgia, shaded in a colour scheme of your choice but using 10 classes describing the distribution of median income in 1000s of dollars (this is described by the MedInc
attribute in the data frame). The maps should include a scalebar and a legend and the code should write the map to a TIFF file, with a resolution of 300 dots per inch and a map size of 7 by 7 inches.
# Hints
display.brewer.all() # to show the Brewer palettes
breaks # to specify class breaks OR
style # in the tm_fill / tm_polygons help
# Tools
library(ggplot2) # for the mapping tools
data(georgia) # the Georgia data in the GISTools package
st_as_sf(georgia) # to convert the data to sf format
tm_layout # takes many parameters eg legend.position
Task 1 Write some code to determine the mean value and standard deviation of median income variable MedInc
from the georgia
data. Hints: the mean
and sd
functions can help you!
## [1] 10508.17
## [1] 37147.09
Task 2 Write some code to subset and plot the counties in the 25 percentile of median income from the georgia
data. Hint: the quantile
function can help you and you may want to assign the result to an R object that you can use to determine a threshold with which to subset the data as was done with index
above, using a logical statement.
thresh = quantile(georgia$MedInc, 0.25)
index <- georgia$MedInc < thresh
g.subset <- georgia[index,]
par(mar = c(0,0,0,0))
plot(georgia)
plot(g.subset, add = T, col = "forestgreen")
NB tmap
had not been introduced when this task was set
Task 3 Finally, go back to Session 3 and examine how the tb$rural
variable was created for the tb
tibble
object. The same operations can be used to on the data.frame
of sp
objects. You should create a variable called rural
that separates the counties of georgia
into Rural and Non-Rural classes and determine how many counties are Rural and Non-Rural using this definition. Hint the table
function can help here.
georgia$rural <- as.factor((georgia$PctRural > 50) + 0)
levels(georgia$rural) <- list("Non-Rural" = 0, "Rural"=1)
table(georgia$rural)
Task 4. Plots and maps: working with map data Your map should look something like the below.
georgia_sf$MedInc = georgia_sf$MedInc / 1000
# open the tif file and give it a name
tiff("MyTif.tiff", width=7,height=7,units='in',res=300)
# start the tmap commands
tm_shape(georgia_sf) +
tm_polygons("MedInc", title = "Median Income ($1000s)", palette = "GnBu",
style = "equal", n = 10) +
tm_layout(legend.title.size = 1,
legend.format = list(digits = 0), frame = F) +
tm_legend(legend.outside=TRUE)
# close the tiff file
dev.off()
Anselin, L. 1995. Local indicators of spatial association, Geographical Analysis, 27, 93–115;
Brunsdon, C. and Comber, L., 2015. An introduction to R for spatial analysis and mapping. Sage.
Ord, J. K. and Getis, A. 1995 Local spatial autocorrelation statistics: distributional issues and an application. Geographical Analysis, 27, 286–306;
Wilkinson, 2005