Introduction

The tmap package is an easy way to plot thematic maps in R. Thematic maps are geographical maps in which spatial data distributions are visualized. The package offers a flexible, layer-based, and easy-to-use approach to creating thematic maps, such as choropleths and bubble maps. The syntax for creating plots is similar to that of ggplot2.

Content Overview

There are many features within the tmap package, so today we will only go over how to create a simple tmap, add data to the map, and then spend time understanding different ways to modify components and aesthetics. This code through is aimed at those who have some experience with the R programming language and want to take their understanding and abilities to the next level.

Sections

Installing packages and uploading data
Creating your first tmap
Bounding Boxes
Clustering data
Customizing Aesthetics
Interactivity

Installing packages and uploading data

# Packages 

library( geojsonio )   # read GeoJSON files
library( sp )          # work with shapefiles
library( sf )          # work with shapefiles - simple features format
library( mclust )      # cluster analysis 
library( tmap )        # theme maps
library( ggplot2 )     # graphing 
library( ggthemes )    # nice formats for ggplots
library( dplyr )       # data wrangling 
library( pander )      # formatting RMD tables
library( tidycensus )  # download Census data
library( cartogram )   # spatial maps w/ tract size bias reduction
library( maptools )    # spatial object manipulation

This code through will use Census data from the 2012 American Community Survey for Seattle, Washington.

github.url <- "https://raw.githubusercontent.com/jalehend/Sea_Data/main/seattle_dorling.geojson"
sea <- geojson_read( x=github.url,  what="sp" )

Creating your first tmap

The first step to creating a tmap is to get your data into a spatial polygons data frame with a suitable projection. Reading the GeoJSON file with what="sp" already returned a SpatialPolygonsDataFrame, so all that is left is to reproject it with spTransform(). It is tempting to think of spatial data as just a “map”, but there is much more to it. By analyzing spatial data we are able to understand how certain variables impact our lives: where we live, why certain locations are popular travel destinations, why brands are more successful in certain locations, etc. The tmap package allows us to explore these questions.

sea <- spTransform( sea, CRS("+init=epsg:3395") )
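
To get a feel for what the reprojection does, here is a minimal sketch on a single point, assuming the sf package. The coordinates are a hypothetical location near downtown Seattle chosen for illustration; they are not taken from the dataset.

```r
library( sf )

# Hypothetical point near downtown Seattle, in longitude/latitude (EPSG:4326)
pt.lonlat <- st_sfc( st_point( c( -122.33, 47.61 ) ), crs = 4326 )

# Reproject to World Mercator (EPSG:3395), the same CRS used for the map
pt.merc <- st_transform( pt.lonlat, 3395 )

# The coordinates are now in meters instead of degrees
coords <- st_coordinates( pt.merc )
coords
```

After the transformation, x and y are large numbers measured in meters, which is why map coordinates in this projection run into the millions.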

Once the data is transformed spatially, we are able to place it into a tmap.

tmap_mode("view")   # interactive mode; use "plot" for a static map
tm_shape(sea) + 
  # color each polygon by hinc12, split into 7 quantile bins
  tm_polygons( col="hinc12", n=7, style="quantile", palette="Spectral" )
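
The n=7 and style="quantile" arguments above ask tmap to split hinc12 into seven classes that each hold about the same number of polygons. Here is a rough base R sketch of that idea, using simulated income values (hypothetical, not the Seattle data). tmap's own break computation may differ in detail.

```r
set.seed( 1234 )

# Hypothetical incomes for 70 areas (simulated, right-skewed like real income data)
income <- rlnorm( 70, meanlog = 11, sdlog = 0.5 )

# Seven quantile bins need eight break points, at probabilities 0, 1/7, ..., 1
n.bins <- 7
breaks <- quantile( income, probs = seq( 0, 1, length.out = n.bins + 1 ) )

# Assign each value to a bin and count the members of each bin
bins <- cut( income, breaks = breaks, include.lowest = TRUE )
bin.counts <- table( bins )
bin.counts
```

Because the breaks are placed at quantiles, every bin ends up with the same number of areas even though the bins cover very different income ranges. That is what keeps a few very high values from dominating the color scale.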

Bounding Boxes

Congratulations! You have your first tmap, with your spatial data covering the Greater Seattle area. You can zoom in and out, but it is better to have the map open already focused on the data that is most important. You can do this by creating a bounding box, which sets the initial extent of the map. The box is defined by minimum and maximum x and y coordinates in the map's coordinate system. One way to find them is to click on a base R plot of the shape using the locator() function.

# user-defined bounding box to move closer to subjects 
bb <- st_bbox( c( xmin =  -13626971, xmax = -13582988, 
                  ymax = 6078671, ymin = 5955885 ), 
               crs = st_crs("+init=epsg:3395"))

tmap_mode("view")
tm_shape( sea, bbox=bb ) + 
  tm_polygons( col="hinc12", n=10, style="quantile", palette="Spectral" ) +
  tm_layout( "Seattle Dorling Cartogram", title.position=c("right","top") )
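
If you would rather start from longitude and latitude than from clicked map coordinates, one option is to define the box in degrees and reproject it. The sketch below assumes the sf package; the corner values are hypothetical and only roughly cover Seattle.

```r
library( sf )

# Hypothetical corners in longitude/latitude (EPSG:4326), for illustration only
bb.lonlat <- st_bbox( c( xmin = -122.45, xmax = -122.20,
                         ymin = 47.45,   ymax = 47.75 ),
                      crs = st_crs( 4326 ) )

# Turn the box into a polygon, reproject it to EPSG:3395, and take its bounding box
bb.merc <- st_bbox( st_transform( st_as_sfc( bb.lonlat ), 3395 ) )
bb.merc
```

The result can be passed to tm_shape() through the bbox argument in the same way as bb.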

Now you notice that you are more zoomed in on your data and are able to quickly get a better idea of what the data is actually telling you. The next step is to cluster your data.

Clustering data

Clustering is an important way to analyze city data. A cluster analysis organizes the data into groups of neighborhoods that share similar characteristics, so that each group can be given an accurate label. Those labels give life to the neighborhoods and cities that you are analyzing.

The first step to running a cluster analysis is to extract the data from the spatial object and save it as a separate data frame. Once that is done, we also need to transform the variables into z scores so that each variable holds equal weight. Z scores have a mean of 0 and a standard deviation of 1, and typically range from about -3 to +3.

d1 <- sea@data
# head(d1[,1:6])

keep.these <- c("pnhwht12", "pnhblk12", "phisp12", "pntv12", "pfb12", "polang12", 
"phs12", "pcol12", "punemp12", "pflabf12", "pprof12", "pmanuf12", 
"pvet12", "psemp12", "hinc12", "incpc12", "ppov12", "pown12", 
"pvac12", "pmulti12", "mrent12", "mhmval12", "p30old12", "p10yrs12", 
"p18und12", "p60up12", "p75up12", "pmar12", "pwds12", "pfhh12")

d2 <- select( d1, all_of( keep.these ) )   # all_of() selects columns named in a character vector
d3 <- apply( d2, 2, scale )

head( d3[,1:6] ) %>% pander()

pnhwht12   pnhblk12   phisp12   pntv12     pfb12     polang12
 1.122     -0.323     -0.8162   -0.1682    -1.221    -1.219
-0.9925     0.446      0.7033   -0.06314    0.5707    0.68
 0.7237    -0.6321    -0.5386   -0.2733    -0.8335   -0.9982
 0.7436    -0.5444     0.1169    0.9753    -1.027    -0.7875
-2.298      1.115      3.61     -0.2857     1.178     1.943
 1.114     -0.3753    -0.6713   -0.1373    -1.004    -1.212
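
To check what the z score transformation does, here is a small base R sketch on toy data. The data frame and its column names are hypothetical and used only for illustration.

```r
# Hypothetical toy data: two variables on very different scales
toy <- data.frame( income      = c( 30000, 45000, 60000, 90000, 120000 ),
                   pct.poverty = c( 0.25, 0.18, 0.12, 0.07, 0.03 ) )

# Same call pattern as the Seattle code: scale each column, result is a matrix
z <- apply( toy, 2, scale )

# After scaling, every column has mean 0 and standard deviation 1
col.means <- colMeans( z )
col.sds   <- apply( z, 2, sd )
round( col.means, 10 )
col.sds

# The same z score computed by hand for one column
z.manual <- ( toy$income - mean( toy$income ) ) / sd( toy$income )
all.equal( z[ , "income" ], z.manual )
```

Without this step, a variable measured in dollars would swamp a variable measured as a proportion when the clustering model compares observations.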

The next step is to run the cluster analysis using the mclust package. Without getting into too much detail, mclust performs model-based clustering, classification, and density estimation based on finite normal mixture modelling. Basically, mclust fits a model that organizes your data into a number of clusters, or groups, based on commonality. For more information, visit https://cran.r-project.org/web/packages/mclust/vignettes/mclust.html. (This part of the code typically takes a little while to process.)

# library( mclust )
set.seed( 1234 )
fit <- Mclust( d3 )
sea$cluster <- as.factor( fit$classification )
summary( fit )

## ---------------------------------------------------- 
## Gaussian finite mixture model fitted by EM algorithm 
## ---------------------------------------------------- 
## 
## Mclust VVE (ellipsoidal, equal orientation) model with 6 components: 
## 
##  log-likelihood   n  df       BIC       ICL
##       -12195.62 567 800 -29463.53 -29498.65
## 
## Clustering table:
##   1   2   3   4   5   6 
## 169  62  88 129  62  57
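
The fitted object holds more than summary() prints. As a quick sketch, the code below fits a model to the small faithful dataset that ships with R (so it runs in seconds) and pulls out a few components. The object name toy.fit is hypothetical; the same components are available on the fit object above.

```r
library( mclust )

set.seed( 1234 )

# Built-in dataset: eruption length and waiting time for Old Faithful
toy.fit <- Mclust( faithful )

toy.fit$G                                  # number of clusters the model selected
toy.fit$modelName                          # covariance structure of the selected model
table( toy.fit$classification )            # how many observations fall in each cluster
head( round( toy.fit$uncertainty, 3 ) )    # per-observation uncertainty of the assignment
```

By default Mclust() compares models with 1 to 9 clusters and keeps the one with the best BIC, which is how the Seattle data ended up with 6 clusters without us asking for a specific number.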

Customizing Aesthetics

The summary above shows that the model split Seattle into 6 clusters. Below, we will add a little bit of aesthetic customization to be able to see and understand the clusters more. We are still using our tmap package, but we are being a bit more descriptive.

Within the tmap, we now set the colors (col) by cluster and use the “Paired” color palette for the 6 clusters. This helps each neighborhood stand out according to the cluster it belongs to.

tmap_mode("view")
tm_shape( sea, bbox=bb ) + 
  tm_polygons( col="cluster", palette = "Paired", n = 6,
               title="Community Types")

This is a very basic level of customization. I encourage you to mess around with the various color palettes to see what palettes help make clusters stick out more and which ones are not as aesthetically pleasing for a map.
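
One way to compare candidates before putting them on a map is to look at the colors directly. This sketch assumes the RColorBrewer package, which provides the "Paired" palette; the other two palette names are just examples of qualitative palettes from the same package.

```r
library( RColorBrewer )

# A few qualitative palettes that work for unordered categories such as clusters
pal.names <- c( "Paired", "Set2", "Dark2" )

# Six colors from each palette, one per cluster
pals <- lapply( pal.names, function( p ) brewer.pal( 6, p ) )
names( pals ) <- pal.names
pals

# Uncomment to draw one palette as color swatches
# display.brewer.pal( 6, "Paired" )
```

Any of these names can be supplied to the palette argument of tm_polygons() in place of "Paired".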

Now that we have a basic level of customization, we can move on to understanding how to make this map more interactive.

Interactivity

The first step to creating more interactivity is to create labels for your clusters. Below is the code used to create cluster labels and add them into your dataset for future use. I have already been able to analyze my clusters, so I will not go into that in this tutorial, but I encourage you to analyze your clusters to see what characteristics bring those neighborhoods together.

# cluster is a factor, so relabel its levels rather than assigning new strings;
# assigning a string that is not an existing level would produce NA
sea$cluster <- factor( sea$cluster, levels = as.character( 1:6 ),
                       labels = c( "Established", "Disadvantaged", 
                                   "Diverse Middle Class", "Wealthy", 
                                   "Progressive Urban", "Suburbia" ) )
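
A note on relabeling: sea$cluster was created with as.factor(), and a factor only accepts values that are already among its levels. The toy sketch below (a hypothetical vector, base R only) shows what happens when a new string is assigned directly, and how relabeling through factor() avoids the problem.

```r
# Hypothetical cluster assignments stored as a factor with levels "1", "2", "3"
cl <- factor( c( 1, 2, 2, 3, 1 ) )

# Direct assignment: "Established" is not an existing level, so R inserts NA
# (suppressWarnings() only hides the "invalid factor level" warning)
bad <- cl
suppressWarnings( bad[ bad == "1" ] <- "Established" )
bad

# Relabeling the levels keeps every observation and renames all groups at once
good <- factor( cl, levels = c( "1", "2", "3" ),
                labels = c( "Established", "Disadvantaged", "Diverse Middle Class" ) )
good
```

Converting the column with as.character() before assigning new strings is another option, at the cost of losing the level order.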

Now that you have your clusters labeled, you can begin looking at creating more interactive tmaps. In the code below, I added “popup.vars”. A great feature of tmap is the ability to get a closer look at each neighborhood and some of the variables in your dataset. For this example, I am only adding the cluster group as “Community Type” and the population.

tmap_mode("view")
tm_shape( sea, bbox=bb ) + 
  tm_polygons( col="cluster", palette = "Paired", n = 6,
               title="Community Types",
               popup.vars = c("Community Type: " = "cluster", 
                              "Population: " = "POP"))

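If you want to experiment with popups without the Seattle file, the same pattern can be tried on the World dataset that ships with tmap. This is a sketch under assumptions: it uses the same tmap syntax as the code above (newer tmap releases may print messages about renamed arguments), and the column names come from the World dataset, not from this code through's data.

```r
library( tmap )

# Example dataset bundled with tmap: one polygon per country
data( "World" )

tmap_mode( "view" )

# Color countries by a categorical variable, as the clusters are colored above,
# and choose which variables appear when a polygon is clicked
m <- tm_shape( World ) +
  tm_polygons( col = "income_grp", palette = "Paired",
               title = "Income group",
               id = "name",                               # label shown on hover
               popup.vars = c( "Country: "         = "name",
                               "Population: "      = "pop_est",
                               "Life expectancy: " = "life_exp" ) )

# Display the map when running interactively
if ( interactive() ) print( m )
```

The names on the left of each popup.vars entry are the labels shown in the popup, and the values on the right are column names in the data.
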
Interactive Maps with R and tmap

Jacob Hendershot

30 APR 2022