1. Introduction

This is an R Markdown Notebook aimed at learning how to make thematic maps. It is based on the cartography R package developed by Timothée Giraud as well as on code written by Bhaskar Karambelkar.

The G4D Assignment No. 1 comprises two tasks: (i) to reproduce and publish a notebook similar to this one; and (ii) to add new examples of R mapping functionalities to the notebook based on the geospatial data processing skills learned at Data Camp. The assignment can be done using RStudio Cloud (for editing the document) and RPubs (for publishing the document).

Please note that this assignment is due on 21.11.2018 at noon time (12:00 m). Please send me an e-mail once you have published your work.

2. Static Mapping

Before loading any library, please make sure that you have installed it in your RStudio Cloud project:

library(rgdal)
## Loading required package: sp
## rgdal: version: 1.3-6, (SVN revision 773)
##  Geospatial Data Abstraction Library extensions to R successfully loaded
##  Loaded GDAL runtime: GDAL 2.1.3, released 2017/20/01
##  Path to GDAL shared files: /usr/share/gdal/2.1
##  GDAL binary built with GEOS: TRUE 
##  Loaded PROJ.4 runtime: Rel. 4.9.2, 08 September 2015, [PJ_VERSION: 492]
##  Path to PROJ.4 shared files: (autodetected)
##  Linking to sp version: 1.3-1
library(sf)
## Linking to GEOS 3.5.1, GDAL 2.1.3, PROJ 4.9.2
library(sp)
library(cartography)

2.1 How to import a geospatial vector data file

The rgdal & sp way looks like this:
library(rgdal)
###  getting vector data
#download.file("http://biogeo.ucdavis.edu/data/diva/adm/COL_adm.zip", 
#destfile = "./datos/COL_adm.zip" , mode='wb')

#unzip("./datos/COL_adm.zip", exdir = "./datos")

spcol <- readOGR(dsn = "./datos/COL_adm1.shp", verbose = FALSE)
class(spcol)
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"

You can pick your own html colors to plot your country:

spcol@data$COLOUR <- "#FFFFFF"
spcol@data$COLOUR[(as.numeric(as.character(spcol@data$ID_1)) %% 10) == 0] <- "#006837"
spcol@data$COLOUR[(as.numeric(as.character(spcol@data$ID_1)) %% 10) == 1] <- "#52BE80"
spcol@data$COLOUR[(as.numeric(as.character(spcol@data$ID_1)) %% 10) == 2] <- "#28B463"
spcol@data$COLOUR[(as.numeric(as.character(spcol@data$ID_1)) %% 10) == 3] <- "#D4EFDF"
spcol@data$COLOUR[(as.numeric(as.character(spcol@data$ID_1)) %% 10) == 4] <- "#138D75"
spcol@data$COLOUR[(as.numeric(as.character(spcol@data$ID_1)) %% 10) == 5] <- "#76D7C4"
spcol@data$COLOUR[(as.numeric(as.character(spcol@data$ID_1)) %% 10) == 6] <- "#A3E4D7"
spcol@data$COLOUR[(as.numeric(as.character(spcol@data$ID_1)) %% 10) == 7] <- "#D0ECE7"
spcol@data$COLOUR[(as.numeric(as.character(spcol@data$ID_1)) %% 10) == 8] <- "#FAE5D3"
spcol@data$COLOUR[(as.numeric(as.character(spcol@data$ID_1)) %% 10) == 9] <- "#A9CCE3"

plot(spcol, col=spcol$COLOUR, main = "Colombia's Departments")

##### The sf way looks like this:

library(sf)

sfcol <- st_read(dsn = "./datos/COL_adm1.shp", quiet = TRUE)
class(sfcol)
## [1] "sf"         "data.frame"
sfcol$COLOUR <- spcol@data$COLOUR

plot(st_geometry(sfcol), col=sfcol$COLOUR, main = "Colombia's Departments")

The cartography way

The __cartography_ package seems to provide useful mapping functionalities. Besides, it comes with geospatial data for Europe.

library(cartography)
# Load data
data(nuts2006)

# Plot a layer with the extent of the EU28 countries with only a background color
plot(nuts0.spdf, border = NA, col = NA, bg = "#A6CAE0")
# Plot non european space
plot(world.spdf, col  = "#E3DEBF", border=NA, add=TRUE)
# Plot a layer of countries borders
plot(nuts0.spdf, border = "grey20", lwd = 3, add = TRUE)
# Plot a layer of NUTS1
plot(nuts1.spdf, border = "grey30", lwd = 2, add = TRUE)
# Plot a layer of NUTS2
plot(nuts2.spdf, border = "grey40", lwd = 0.5, add = TRUE)
# Plot a layer of NUTS3
plot(nuts3.spdf, border = "grey20", lwd = 0.1, add = TRUE)

##### Label Map

Of course, a map without labels is not very useful. Let’s make a population map:

# Layout plot
layoutLayer(title = "Most Populated Countries in Europe", # title of the map
            author = "Author: Your name here",  # 
            sources = "Sources: Please give credit", # 
            scale = NULL, # no scale
            col = NA, # no color for the title box 
            coltitle = "black", # color of the title
            frame = FALSE,  # no frame around the map
            bg = "#A6CAE0", # background of the map
            extent = nuts0.spdf) # set the extent of the map

# Non European space
plot(world.spdf, col = "#AAB7B8", border = NA, add = TRUE)
# European (EU28) countries
plot(nuts0.spdf, col = "#F0B27A",border = "white", lwd = 1, add = TRUE)

# Selection of the 10 most populated countries of Europe
dflab <- nuts0.df[order(nuts0.df$pop2008, decreasing = TRUE),][1:10,]
# Label creation
dflab$lab <- paste(dflab$id, "\n", round(dflab$pop2008/1000000,0), "M", sep ="")

# Label plot of the 10 most populated countries
labelLayer(spdf = nuts0.spdf, # SpatialPolygonsDataFrame used to plot he labels
           df = dflab, # data frame containing the lables
           txt = "lab", # label field in df
           col = "#690409", # color of the labels
           cex = 0.6, # size of the labels
           font = 2) # label font

# Add an explanation text
text(x = 5477360, y = 4177311, labels = "The 10 most populated countries of Europe
Total population 2008 [millions]", cex = 0.7, adj = 0)

Note that you can pick your own html HEX color codes from here(e.g. bright red color HEX code is #FA0C05).

Choropleth Map

Choropleth maps display geographical areas or regions which are coloured, shaded or patterned in relation to a data variable. This provides a way to visualise values over a geographical area, which can show variation or patterns across the displayed location.

A common error when making choropleth maps is to encode raw data values (such as population) rather than using normalized values (calculating population per square kilometre for example) to produce a density map.

Let’s make a map of annual population growth rate, that is the rate at which the number of individuals in a region increases in a given time period, expressed as a fraction of the initial population. These data comes from the Eurostat website. For an explanation of NUTS Levels (Nomenclature of Territorial Units for Statistics) see here.

# Compute the compound annual growth rate
nuts2.df$cagr <- (((nuts2.df$pop2008 / nuts2.df$pop1999)^(1/9)) - 1) * 100

# Set a custom color palette
cols <- carto.pal(pal1 = "green.pal", # first color gradient
                  n1 = 2, # number of colors in the first gradiant
                  pal2 = "red.pal", # second color gradient
                  n2 = 4) # number of colors in the second gradiant

# Plot a layer with the extent of the EU28 countries with only a background color
plot(nuts0.spdf, border = NA, col = NA, bg = "#A6CAE0")
# Plot non european space
plot(world.spdf, col  = "#E3DEBF", border=NA, add=TRUE)

# Plot the compound annual growth rate
choroLayer(spdf = nuts2.spdf, # SpatialPolygonsDataFrame of the regions
           df = nuts2.df, # data frame with compound annual growth rate
           var = "cagr", # compound annual growth rate field in df
           breaks = c(-2.43,-1,0,0.5,1,2,3.1), # list of breaks
           col = cols, # colors 
           border = "grey40", # color of the polygons borders
           lwd = 0.5, # width of the borders
           legend.pos = "right", # position of the legend
           legend.title.txt = "Compound Annual\nGrowth Rate", # title of the legend
           legend.values.rnd = 2, # number of decimal in the legend values
           add = TRUE) # add the layer to the current plot

# Plot a layer of countries borders
plot(nuts0.spdf,border = "grey20", lwd=0.75, add=TRUE)

# Layout plot
layoutLayer(title = "Demographic Trends", author = "cartography", 
            sources = "Eurostat, 2008", frame = TRUE, col = NA, 
            scale = NULL,coltitle = "black",
            south = TRUE) # add a south arrow

The previous examples are just some of the map types which can be produced using the cartography package.

3. Interactive Mapping

The ggiraph way

This ggiraph package provides functionalities for interactive visualization. It uses data from rnaturalearth as well as functionalities from ggplot2. This example splits the code in several “chunks” just to see what is happening inside the box:

library(ggplot2)
library(ggiraph)
library(rnaturalearth)
library(readr)
library(RCurl)
## Loading required package: bitops
urlfile <- "https://raw.github.com/bhaskarvk/user2017.geodataviz/master/inst/extdata/africa-internet_usage-2015.csv"

internet_usage <- read.csv(urlfile)

head(internet_usage)
names(internet_usage) <- c("Country Name",  "Country Code",   "Series Name",  "Series Code", 
"2014 [YR2014]", "2015 [YR2015]", "2015 [YR20156]")


names(internet_usage)
## [1] "Country Name"   "Country Code"   "Series Name"    "Series Code"   
## [5] "2014 [YR2014]"  "2015 [YR2015]"  "2015 [YR20156]"
world <- sf::st_as_sf(rnaturalearth::countries110)

## str(world)
length(unique(world$iso_a3))
## [1] 175
africa <- dplyr::filter(world, region_un=='Africa') %>%
  dplyr::left_join(internet_usage %>% dplyr::select(
    `Country Code`, `2015 [YR2015]`
  ) %>% dplyr::rename(iso_a3=`Country Code`, internet.usage.2015=`2015 [YR2015]`),
  by = 'iso_a3') %>%
  st_transform(crs="+proj=laea +lon_0=18.984375")
head(africa)
africa.centers <- st_centroid(africa)

africa.spdf <- methods::as(africa, 'Spatial')
africa.spdf@data$id <- row.names(africa.spdf@data)

africa.tidy <- broom::tidy(africa.spdf)
library(htmlwidgets)
library(hrbrthemes)
## NOTE: Either Arial Narrow or Roboto Condensed fonts are *required* to use these themes.
##       Please use hrbrthemes::import_roboto_condensed() to install Roboto Condensed and
##       if Arial Narrow is not on your system, please see http://bit.ly/arialnarrow
library(colormap)
library(widgetframe)

africa.tidy <- dplyr::left_join(africa.tidy, africa.spdf@data, by='id')

g <- ggplot(africa.tidy) +
  geom_polygon_interactive(
    color='black',
    aes(long, lat, group=group, fill=internet.usage.2015,
        tooltip=sprintf("%s<br/>%s",iso_a3,internet.usage.2015))) +
 hrbrthemes::theme_ipsum() +
  colormap::scale_fill_colormap(
    colormap=colormap::colormaps$copper, reverse = T) +
  labs(title='Internet Usage in Africa in 2015', subtitle='As Percent of Population',
       caption='Source: World Bank Open Data.')

g

Our World in Data

Our World in Data is an online publication that was mentioned in our first G4D session. This publication shows how living conditions are changing. It covers a wide range of topics: Trends in health, food provision, the growth and distribution of incomes, violence, rights, wars, culture, energy use, education, and environmental changes. For each topic the quality of the data is discussed and, by pointing the visitor to the sources, this website is also a database of databases. Covering all of these aspects in one resource makes it possible to understand how the observed long-run sustainable development trends are interlinked.

The project Our World in Data is produced by the Oxford Martin Programme on Global Development at the University of Oxford. Visualizations are licensed under CC BY-SA and may be freely adapted for any purpose. Data is available for download in CSV format. You are free to make use of anything you find there in your academic work.

As an illustration, we will go now to Our World in Data and download a CSV file containing all data used in The size of the poverty gap: some hints regarding the cost of ending extreme poverty. Then, we will upload the file to our Rstudio Cloud Project using the cloud interface. After uploading the file, we can continue:

poverty_gap <- read.csv("./datos/poverty-gap-index.csv")

#unique(poverty_gap$Code)

length(unique(poverty_gap$Code))
## [1] 133

Let’s select only one year for mapping povert_gap.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
npov <- filter(poverty_gap, Year==1992)

names(npov) <-   c("Entity", "Code",   "Year",   "Pov_1992")  

head(npov)

Now, we need to join our world geospatial object with the poverty_gap non-spatial object:

nworld <- left_join(world, npov, by = c('iso_a3' = 'Code'))

                    
nworld %>%  st_transform(crs="+proj=laea +lon_0=18.984375")

head(nworld)

Let#s check (and solve) if there are missing values (denoted as NA in R language):

nworld$Pov_1992

nworld$Pov_1992[is.na(nworld$Pov_1992)] <- 0

More processing on the spatial objects. Type ?sf::st_centroid in the console if you need help.

world.centers <- st_centroid(nworld)

world.spdf <- methods::as(nworld, 'Spatial')
world.spdf@data$id <- row.names(world.spdf@data)

world.tidy <- broom::tidy(world.spdf)

Joining tables, again:

world.tidy <- dplyr::left_join(world.tidy, world.spdf@data, by='id')
summary(world.tidy$Pov_1992)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   0.000   1.429   0.000  58.850

And now, mapping:

g <- ggplot(world.tidy) +
  geom_polygon_interactive(
    color='black',
    aes(long, lat, group=group, fill=(Pov_1992),
        tooltip=sprintf("%s<br/>%s",iso_a3,Pov_1992))) +
 hrbrthemes::theme_ipsum() +
  colormap::scale_fill_colormap(
    colormap=colormap::colormaps$freesurface_red, reverse = T) +
  labs(title='Poverty Gap in the World in 1992', subtitle='As Percent of Population',
       caption='Source: World Bank Open Data.')

g

Using plotly

The plotly package provides access to an online visualitation platform. Before running code in this section, you need to sign up at plotly[https://plot.ly/r/getting-started/]

Sys.setenv("plotly_username"="xxxx")
Sys.setenv("plotly_api_key"="xxxxxxxxxxxxxxx")
options(browser = 'false')
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
df <- read.csv('https://raw.githubusercontent.com/plotly/datasets/master/2014_world_gdp_with_codes.csv')

# light grey boundaries
l <- list(color = toRGB("grey"), width = 0.5)

# specify map projection/options
g <- list(
  showframe = FALSE,
  showcoastlines = FALSE,
  projection = list(type = 'Mercator')
)

p <- plot_geo(df) %>%
  add_trace(
    z = ~GDP..BILLIONS., color = ~GDP..BILLIONS., colors = 'Blues',
    text = ~COUNTRY, locations = ~CODE, marker = list(line = l)
  ) %>%
  colorbar(title = 'GDP Billions US$', tickprefix = '$') %>%
  layout(
    title = '2014 Global GDP<br>Source:<a href="https://www.cia.gov/library/publications/the-world-factbook/fields/2195.html">CIA World Factbook</a>',
    geo = g
  )

# Create a shareable link to your chart
# Set up API credentials: https://plot.ly/r/getting-started
chart_link = api_create(p, filename="nchoropleth-ag")
## Found a grid already named: 'nchoropleth-ag Grid'. Since fileopt='overwrite', I'll try to update it
## Found a plot already named: 'nchoropleth-ag'. Since fileopt='overwrite', I'll try to update it
chart_link

Using leaflet

The following code is a just a plain visualization:

library(leaflet)
plt <- leaflet() %>%
  setView(lat = 50.85045, lng = 4.34878, zoom=13) %>%
  addTiles(group="OSM") %>%
  addProviderTiles(providers$CartoDB.DarkMatter, group="Dark") %>%
  addProviderTiles(providers$CartoDB.Positron, group="Light") %>%
  addLayersControl(baseGroups=c('OSM','Dark','Light'))
  
plt

More important is the next example which uses a dataset from the recent 2014-2016 West African Ebola outbreak. This example is taken from here

urlfile2 <- "https://raw.github.com/amcrisan/EpiDesignPattern/master/data/ebola_metadata.RDS"
myfile <- download.file(urlfile2,"./datos/ebola_metadata.RDS", method="auto")
metadata <- readRDS("./datos/ebola_metadata.RDS")
library(ape)
#library(ggtree)
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date
library(tidyr)
## 
## Attaching package: 'tidyr'
## The following object is masked from 'package:RCurl':
## 
##     complete
library(dplyr)
library(ggmap)
## 
## Attaching package: 'ggmap'
## The following object is masked from 'package:plotly':
## 
##     wind
library(RColorBrewer)
library(dygraphs)
library(xts)
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## 
## Attaching package: 'xts'
## The following object is masked from 'package:leaflet':
## 
##     addLegend
## The following objects are masked from 'package:dplyr':
## 
##     first, last
library(leaflet)
# First, we need to create some new data
# Counting number cases per country
aggDat<-metadata %>%
       filter(Country !="?") %>%
       group_by(Country,country_lon,country_lat) %>%
       dplyr::count()%>%
       mutate(popup=sprintf("%s = %d cases",Country,n)) #create a popup for the map
# Here's a very quick look at what this command generates for us:
aggDat
# Now, we'll define the color palette
pal <- colorFactor(
  palette = c('red', 'blue', 'green', 'purple', 'orange'),
  domain = aggDat$Country)
# Now, we'll create the Map
# This first command will creat an empty map
m<-leaflet(aggDat)
  m %>%
        addTiles()%>% 
        addCircleMarkers(
          lng=~country_lon,
          lat= ~country_lat,
          radius=~sqrt(n)*2,
          color = ~pal(Country),
          stroke = FALSE, fillOpacity = 0.7,
          label=~as.character(popup),
          labelOptions = labelOptions(noHide = T),
          options = leafletOptions(minZoom = 0, maxZoom = 10,scroolWheelZoom=FALSE))

By modifying the code, it is possible to get a dissagregated visualization:

aggDat<-metadata %>%
        filter(Country !="?") %>%
        group_by(Country,Region,region_lon,region_lat) %>%
        dplyr::count()%>% 
        mutate(popup=sprintf("%s (%s) = %d cases",Region,Country,n))
      
m<-leaflet(aggDat)
      
m %>%
  addTiles()%>% 
  addCircleMarkers(
    lng=~region_lon,
    lat= ~region_lat,
    radius=~sqrt(n)*2,
    color = ~pal(Country), #we actually colour the points by country here
    stroke = FALSE, fillOpacity = 0.7,
    label=~as.character(popup),
    labelOptions = labelOptions(noHide = F)
  )
## Warning in validateCoords(lng, lat, funcName): Data contains 3 rows with
## either missing or invalid lat/lon values and will be ignored

Leaflet also allows you to automatically generate clusters that will vary when you zoom in and out on the map. We didn’t implement this feature in the shiny application, but I wanted to let you know this is available.

m<-leaflet(metadata) 
# by providing the region latitude and longtitude co-ordinates we allow clustering of regional samples
m %>%
  addTiles()%>%
  addCircleMarkers(
    lng=~region_lon,
    lat= ~region_lat,
    stroke = FALSE, fillOpacity = 0.5,
    clusterOptions= markerClusterOptions(titile="regional clusters") #cluster options
  )
## Warning in validateCoords(lng, lat, funcName): Data contains 34 rows with
## either missing or invalid lat/lon values and will be ignored

The cluster colour here are automatically assigned by the Leaflet package, and are as follows:

  • orange / red for the largest clusters
  • yellow for “medium” clusters
  • green for small clusters

By click on a point, the clustering algorithm reveals the geospatial locations that were aggregated into a cluster (which its a neat and handy feature).

4. Addition of new content

This section is a blank section. Please complete it by focusing on any of the 17 Sustainable Development Goals and using the R skills learned at DataCamp.

Bis bald!!!