Mapping a Table of Data with Esri Shapefiles in R

Kevin McCormack Dr. Mary Smyth Sinead Phelan

Introduction

In this tutorial we discuss how to join Tanzanian GDP data, in CSV format, with two Esri Shapefiles, all opensource, and construct the map below, within the R environment.

Tanzania   + geom_sf(data=TanzmyDFW, fill ="skyblue")  + theme(panel.background = element_rect(fill = "white"),
                                                               panel.border = element_rect(colour = "blue",fill = NA)                                                           ) 

plot of chunk unnamed-chunk-1

Key steps

The following steps are involved in the construction of this map:

a) Create a CSV file with GDP data extracted from an official Tanzanian publication,

b) Create an R data-frame from this data,

c) Add a geospatial reference to this data-frame,

d) Download two Esri shapefiles, regions and water bodies, from the Tanzanian National Bureau of Statistics' (TNBS) website,

Steps continued

e) Create geospatial data-frames from the Esri shapefiles,

f) Join the region geospatial data-frame to the GDP data-frame,

g) Plot the data using the “ggplot2” package, and overlay the region and water bodies geospatial data frames,

h) Plot multiple time series maps and produce interactive zoomable web based maps using the “tmap” package.

Data Source

Data Source - Tanzanian National Bureau of Statistics

a) The Tanzanian National Bureau of Statistics (TNBS) has been established as an autonomous public office by the Statistics Act, 2015

b) It has the mandate to provide official statistics to the Government, business community and the public at large.

c) The Act also gives NBS the mandate to play the role as a co-coordinating agency, within the National Statistical System (NSS) to ensure that quality official statistics is produced.

d) Before the enactment of the Statistics Act of 2015, the NBS was one of the Government Executive Agencies which was established on the 26th March,1999 under the Executive Agencies Act, 1997.

Data

For this tutorial we will be using the data extracted from “Table 21: Regional Shares of GDP at Current Market Prices”, of the TNBS publication “National Accounts of Tanzania Mainland 2008 - 2016”. (See Section 10)

http://www.nbs.go.tz/nbs/takwimu/na/National_Accounts_Statistics_of_Tanzania%20_Mainland_2016.pdf

The data was manually extracted from the above PDF file to the following CSV file

Tanzania RegionCodes GDP Pct.csv

Note that the Region column is referenced as “Region_Num” to allow for linking to the Regions Esri shapefile later in this tutorial. Data is not available for all regions.

However, in order for the map to include the shape of all regions in Tanzania, references for the missing regions are added to the end of the data table.

RStudio

R is a language and environment for statistical computing and graphics and is rapidly becoming the leading programming language in statistics and data analytics.

It is recommended to use R-studio, which, provides popular open source and enterprise-ready professional software for the R statistical computing environment.

R-studio can be downloaded here: https://www.rstudio.com/

Reading the GDP data

Firstly, set the working directory, for example:

setwd("C:/My Documents/R/SDG/Tanzania/")

Next the CSV file is read in and an R data-frame is created, which is referenced as “region”.

  region <- read.csv("~/Tanzania RegionCodes GDP Pct.csv")

Region

head(region)
   Region_Nam 2007 2010 2011 2012 2013 2014 2015 2016
1      Dodoma 3.01 3.17 3.09 3.10 3.03 3.04 2.90 2.92
2      Arusha 4.52 4.87 4.73 4.77 4.74 4.75 4.70 4.71
3 Kilimanjaro 4.71 4.45 4.60 4.54 4.54 4.54 4.54 4.45
4       Tanga 5.58 4.65 4.75 4.69 4.67 4.66 4.66 4.68
5    Morogoro 5.41 4.87 4.93 4.89 4.84 4.85 4.90 4.81
6       Pwani 1.89 1.88 1.86 1.85 1.81 1.81 1.81 1.80

Esri Shapefiles - Tanzania

The Esri shape files used in this tutorial are called Regions.shp and Water_body.shp.

They were downloaded to the working directory from the TNBS website.

http://www.nbs.go.tz/nbstz/index.php/english/statistics-by-subject/population-and-housing-census/258-2012-phc-shapefiles-level-one-and-two

dlshape=function(shploc, shpfile) {
  temp=tempfile()
  download.file(shploc, temp)
  unzip(temp)
  shp.data <- sapply(".", function(f) {
    fp <- file.path(temp, f)
    return(readOGR(".",shpfile))
  })
}
TanzmyDFW = dlshape(shploc="http://www.nbs.go.tz/nbs/takwimu/references/GIS_Maps.zip", "Water_Body")
TanzmyDF= dlshape(shploc="http://www.nbs.go.tz/nbs/takwimu/references/GIS_Maps.zip", "Regions")
unlink(temp)
)

Creating an R data-frame from an Esri Shapefile.

To create an R data-frame from an Esri Shapefile one first needs to load the following library:

library(sf)

Using the st_read( ) function the Esri Regions shapefile is read in as a data frame, “TanzmyDF”.

TanzmyDF <-  st_read("Regions.shp",stringsAsFactors = FALSE)
Reading layer `Regions' from data source `C:\Users\oluingm\Documents\Tanzania\Regions.shp' using driver `ESRI Shapefile'
Simple feature collection with 30 features and 2 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: 29.38008 ymin: -11.76235 xmax: 40.44473 ymax: -0.983143
epsg (SRID):    NA
proj4string:    +proj=longlat +ellps=clrk80 +no_defs

Water Bodies shapefile

Using the st_read( ) function the Esri water_body shapefile is read in as a data frame, “TanzmyDFW”.

TanzmyDFW <- st_read("Water_body.shp", stringsAsFactors = FALSE)
Reading layer `Water_body' from data source `C:\Users\oluingm\Documents\Tanzania\Water_body.shp' using driver `ESRI Shapefile'
Simple feature collection with 17 features and 17 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: 29.03289 ymin: -12.48018 xmax: 41.58952 ymax: -0.9913369
epsg (SRID):    NA
proj4string:    +proj=longlat +ellps=clrk80 +no_defs

Figure 4: TanzmyDFW dataframe.

Merging the geospatial and GDP data frames, TanzmyDF and region

The merge( ) function is use to join geospatial and GDP data to create a data-frame titled TanzReg, using the 'Region_Nam' reference.

Figure 5: TanzReg data-frame

TanzReg <- merge( TanzmyDF, region, by='Region_Nam')
head(TanzReg)[4:11]
Simple feature collection with 6 features and 7 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: 30.41067 ymin: -9.040651 xmax: 39.55289 ymax: -0.983143
epsg (SRID):    NA
proj4string:    +proj=longlat +ellps=clrk80 +no_defs
   2010  2011  2012  2013  2014  2015  2016                       geometry
1  4.87  4.73  4.77  4.74  4.75  4.70  4.71 MULTIPOLYGON (((36.41799 -2...
2 16.81 16.69 16.93 17.28 17.20 17.20 17.02 MULTIPOLYGON (((39.12354 -6...
3  3.17  3.09  3.10  3.03  3.04  2.90  2.92 MULTIPOLYGON (((35.20109 -6...
4    NA    NA    NA    NA    NA    NA    NA MULTIPOLYGON (((31.81682 -2...
5  5.27  5.29  5.25  5.18  5.19  5.30  4.92 MULTIPOLYGON (((34.93466 -6...
6  3.90  3.96  3.94  3.94  3.94  3.94  3.95 MULTIPOLYGON (((31.69661 -2...

Mapping the GDP data

There are quite a number of R libraries that can be use in mapping geospatial data and in this tutorial the ggplot( ) function from the ggplot2 library is used. First load the library:

The plot is build up using this function followed by a “+” to build up the layers.

The R code below creates a map referenced as “Tanzania” as follows:

  • selects the TanzReg data to be mapped - ggplot,

  • identifies the year 2016 as the fill - geom_sf

  • provides the colours & limits for the scale fill and the name of the scale - scale_fill_gradient2, and where there is no data, the region is coloured yellow, “na.value”.

Mapping the GDP data Contd.

  • provides a title - ggtitle

  • removes the default axis titles and axis ticks - theme.

Mapping the GDP data code

Tanzania <- ggplot(TanzReg) + # input data
  geom_sf(aes(fill=TanzReg$`2016`)) +
       scale_fill_gradient2 (low = "white", high = "red", # colours 
                        limits = c(0, 20), #limts
                        na.value = "yellow", # colour when there are not values
                        name = "Regional GDP at Current Market Prices") + # legend options
  ggtitle("Regional Shares of GDP at Current Market Prices,Tanzania Mainland, 2016") +
     labs( caption = "Figure 6: Regional GDP, 2016 - no water bodies")+
  theme(plot.title = element_text(hjust = 0.1)) + theme(axis.text = element_blank(), axis.title = element_blank(),  axis.ticks = element_blank() ) 
Tanzania

No Water bodies

Tanzania

plot of chunk unnamed-chunk-11

Overlaying the water bodies map

In order to improve the look of the map, we will overlay the water bodies data-frame and add a blank background, using the following code.

Tanzania # original map +
geom_sf(data=TanzmyDFW, fill ="skyblue") + # overlay the water-bodies
blank() # blank background

Water bodies map

Tanzania + geom_sf(data=TanzmyDFW, fill ="skyblue") + blank()

plot of chunk unnamed-chunk-13

Thematic Maps - tmap

  • Thematic maps are geographical maps in which spatial data distributions are visualized. This package offers a flexible, layer-based, and easy to use approach to create thematic maps, such as choropleths1.

  • With the tmap package, thematic maps can be generated with great flexibility. The syntax for creating plots is similar to that of ggplot2, but tailored to maps. The initial command specifies the shape object and data input (tm_shape()) and is followed by the map layer (e.g., tm_polygons()). Layers can be stacked similar to ggplot2 using the + symbol. In addition, attribute elements can be added to the map and maps can be faceted similar to using facets in ggplot2.

  • With tmap there is not a lot of data preparation that needs to happen before mapping. With very little code you can create a simple map.

  • Furthermore, tmap has a unique capability to generate static and interactive maps using the same code via tmap_mode().

1 Choropleth: Areal regions, such as countries or municipalities, are filled with colours that represent a variable which is either a density or a ratio. The usage of class intervals encourages the readability of the data values.

Static Plotting or Interactive Viewing - tmap_mode

The global option tmap.mode determines the whether thematic maps are plotted in the graphics device, or shown as an interactive leaflet map. The function tmap_mode is a wrapper to set this global option.

There are two options for tmap_mode, plot or view, both of which will be applied to the TanzReg dataframe created in Section 8.

  • “plot”

Thematic maps are shown in the graphics device. This is the default mode, and supports all tmap's features and extensive layout settings tm_layout

tmap_mode Contd.

  • “view”

Thematic maps are viewed interactively in the web browser or RStudio's Viewer pane.

Maps are fully interactive with tiles from OpenStreetMap or other map providers (see tm_tiles).

This mode generates a leaflet widget, which can also be directly obtained with tmap_leaflet.

With RMarkdown, it is possible to publish it to an HTML page.

Static Plot

First load the tmap library.

library (tmap) # for static and interactive maps

Next set the mode to plot, this is the default mode.

  tmap_mode("plot")

The plot is build up using this function followed by a “+” to build up the layers.

The R code below creates a map referenced as “Tanzania2016” as follows:

  • selects the TanzReg data to be mapped - tm_shape

  • identifies the year 2016 as the fill - tm_polygons

Static Plot contd.

  • the colours & limits for the scale fill and the name of the scale are automatically selected, and where there is no data, the region is coloured grey, “na.value”.

  • provides a title - title

  • a main title is provided, including position and size - tm_layout

  • removes the default axis titles and axis ticks - theme.

No water bodies

Figure 8: Regional GDP, 2016 - no water bodies

Tanzania2016

plot of chunk unnamed-chunk-16

Overlaying the water bodies map

In order to improve the look of the map, we will overlay the water bodies data-frame and add a blank background, using the following code. (See Figure 9)

(TanzaniaB <- tm_shape(TanzmyDFW) + tm_fill(col = "lightblue") )
(TanzaniaA2016 <- Tanzania2016 + TanzaniaB)

Figure 9: Regional GDP, 2016 - including water bodies

TanzaniaA2016

plot of chunk unnamed-chunk-18

Plotting multiple maps - tmap_arrange()

Multiple maps can be arranged in a single metaplot with tmap_arrange(), which can be used to visualise a time series.

For example, if the script in Section 11.2 is run for the years 2014 and 2015, (i.e. col=“2014” and col=“ 2015” in the tm_polygons statement). There are now have three maps of a time series that can be visualised.

The following code arranges the three maps in a single metaplot, where the output for 2014 and 2015 are TansaniaA2014 and Tanzania2015A respectively. (See Figure 10)

Figure 10: Regional GDP, 2014-16

tmap_arrange(TanzaniaA2014, TanzaniaA2015, TanzaniaA2016 )

plot of chunk unnamed-chunk-19

Interactive Maps

While static can enhance geographic datasets, interactive maps can take them to a new level.

Interactivity can take many forms, the most common and useful of which is the ability to pan around and zoom into any part of a geographic dataset overlaid on a 'web map' to show context.

It should be noted that the map is always projected according to the Web Mercator projection.

Although this projection is the de facto standard for interactive web-based mapping, it lacks the equal-area property, which is important for many thematic maps, especially choropleths.

Interactive Maps contd.

A unique feature of tmap, previously mentioned, is its ability to create static and interactive maps using the same code.

Maps can be viewed interactively at any point by switching to view mode, using the command tmap_mode(“view”).

This is demonstrated in the code below, which creates an interactive map of Tanzania on the tmap object Tanzania2014, created in Section 11.3 and illustrated in Figure 11.

tmap_mode("view")
TanzaniaA2014 +
tm_legend(outside = TRUE) +
tm_view(view.legend.position = c("left", "bottom", legend.title.size = .5,
legend.text.size = .5))

Map

live interactive map