Spatial Data
In the social sciences, vectors within our data frames are frequently tied to geographical units: census divisions, voting districts, cities, etc. Various functions within the ggplot2 package allow us to map these vectors, as well as geographical objects such as city boundaries, roads, water sources and trees. In this lab, we will: (1) visualize demographic statistics across census divisions in Canada, (2) import and transform data from shapefiles, and (3) map point, line and polygon data.
Relevant functions: ggplot(), mapcan(), st_read(), geom_sf().
1. Visualizing Continuous Data on a Map
We will first use spatial and census data from the mapcan package. R has many packages that specialize in data by geographical unit for various countries. The maps package offers a general collection of world and country maps, but some packages are country-specific (e.g. the geobr package for Brazil, or the mapcan package for Canada). Today, we’ll be using mapcan.
1.1 Mapping census divisions in Canada
We start by creating a simple map of the census divisions in Canada.
# Loading necessary packages
library(mapcan)
library(ggplot2)
# Creating a data frame called "map_data"
# Note: this data is encompassed within the mapcan package and is made
# accessible via the mapcan() function
map_data <- mapcan(boundaries = census, type = standard)
# Proving that this is a normal data frame like those we used before
class(map_data)
## [1] "data.frame"
# Printing out the column names of that data frame
colnames(map_data)
## [1] "long" "lat" "order"
## [4] "hole" "piece" "group"
## [7] "census_division_name" "census_division_type" "pr_sgc_code"
## [10] "pr_alpha" "pr_english" "pr_french"
## [13] "census_division_code"
# Creating the map
ggplot(map_data, aes(long, lat, group = group)) +
geom_polygon(fill="white", color="#454746") + # Plotting the census divisions
theme_mapcan() + # Using the presentation style from this package
coord_fixed() + # Fixing the coordinates to avoid distortion
ggtitle("Map of Canada by Census Divisions") # Including a title
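To see why the map above needs group = group, here is a minimal sketch on toy data (two made-up squares, not taken from mapcan). It assumes only that ggplot2 is installed; the names squares, p_grouped and p_ungrouped are hypothetical.

```r
library(ggplot2)

# Two unit squares stored in one long data frame, one row per vertex
# (toy data for illustration only)
squares <- data.frame(
  long  = c(0, 1, 1, 0,   2, 3, 3, 2),
  lat   = c(0, 0, 1, 1,   0, 0, 1, 1),
  group = rep(c("a", "b"), each = 4)
)

# With group = group, each square is drawn as its own closed shape
p_grouped <- ggplot(squares, aes(long, lat, group = group)) +
  geom_polygon(fill = "white", color = "#454746") +
  coord_fixed()

# Without it, all eight vertices are joined into one single polygon
p_ungrouped <- ggplot(squares, aes(long, lat)) +
  geom_polygon(fill = "white", color = "#454746") +
  coord_fixed()
```

Printing p_grouped and p_ungrouped side by side shows the difference: the second plot connects the two squares with stray edges, which is what would happen to the census divisions without the group aesthetic.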
1.2 Creating a merged dataset
We then want to visualize a variable across these census divisions. To do this, we must merge information on the share of the population born outside Canada in each division (also included in the mapcan package) into our map_data data frame.
# Extracting a data frame called "census_2016" from the mapcan package
census_2016 <- mapcan::census_pop2016 # "mapcan::" here means "from the mapcan package"
# Changing the name of the first column for matching purposes
# It should match that of the geographical unit variable in map_data (i.e. "census_division_code")
colnames(census_2016)[1] <- "census_division_code"
# Loading the dplyr package for left_join
library(dplyr)
# Merging two columns of the census_2016 data frame to our map_data data frame
# by geographical unit (called "census_division_code")
map_data <- left_join(map_data, census_2016[,c("census_division_code","born_outside_canada_share")], by="census_division_code")
# Printing the summary of the variable on immigration share per district
summary(map_data$born_outside_canada_share)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00000 0.01604 0.03450 0.05147 0.07117 0.51153
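After a merge like this one, it can be worth checking that every geographical unit found a match. The sketch below uses small made-up stand-ins for map_data and census_2016 (toy_map and toy_census are hypothetical names, and the values are invented), and assumes dplyr is installed.

```r
library(dplyr)

# Toy stand-ins for map_data and census_2016 (values are made up)
toy_map <- data.frame(
  census_division_code = c(1001, 1002, 1003),
  long = c(-52.7, -53.1, -55.6),
  lat  = c(47.5, 47.0, 48.9)
)
toy_census <- data.frame(
  census_division_code = c(1001, 1002),
  born_outside_canada_share = c(0.04, 0.02)
)

# The key column should have the same type in both tables before joining
same_type <- class(toy_map$census_division_code) ==
  class(toy_census$census_division_code)

# left_join() keeps every row of the first table, in its original order
merged <- left_join(toy_map, toy_census, by = "census_division_code")

# Rows of the map with no census match: listed by anti_join(),
# and visible as NA values in the merged column
unmatched <- anti_join(toy_map, toy_census, by = "census_division_code")
n_missing <- sum(is.na(merged$born_outside_canada_share))
```

Because left_join() preserves the row order of its first argument, the polygon vertices stay in drawing order, which matters since geom_polygon() connects points in the order of the rows.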
1.3 Mapping immigrant population by census district
We can now replace the white fill of our polygons with a continuous color scale representing the share of individuals in each district who were born outside Canada.
# Creating the map
ggplot(map_data, aes(long, lat, group = group, fill= born_outside_canada_share)) +
# ^ Notice how we just added a "fill" argument that wasn't there before?
# This indicates we want to fill the polygons with different colors based
# on the values of the born_outside_canada_share variable
geom_polygon() + # Plotting the census divisions (removing the previous arguments)
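scale_fill_viridis_c(name = "% of population born \noutside of Canada",
labels = scales::percent) +
# ^ Specifying a color range from the "viridis" color palette + naming legend
# labels = scales::percent displays the shares (0 to 1) as percentages in the legend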
theme_mapcan() + # Using the presentation style from this package
coord_fixed() + # Fixing the coordinates to avoid distortion
ggtitle("Immigrant Population per Census District") # Including a title
If we want to keep only census districts located in Ontario, we can simply subset the dataset using basic R grammar. Note: the pr_alpha variable refers to the province abbreviation.
# Creating the map
ggplot(map_data[map_data$pr_alpha=="ON",], # Subsetting the data
aes(long, lat, group = group, fill= born_outside_canada_share)) +
# ^ Filling the polygons with different colors based
# on the values of the born_outside_canada_share variable
geom_polygon() + # Plotting the census divisions (removing the previous arguments)
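scale_fill_viridis_c(name = "% of population born \noutside of Canada",
labels = scales::percent) +
# ^ Specifying a color range from the "viridis" color palette + naming legend
# labels = scales::percent displays the shares (0 to 1) as percentages in the legend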
theme_mapcan() + # Using the presentation style from this package
coord_fixed() + # Fixing the coordinates to avoid distortion
ggtitle("Immigrant Population per Census District") # Including a title
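The same subsetting logic extends to several provinces at once. Here is a short sketch in base R on a made-up stand-in for map_data (toy_map and its values are hypothetical); only the pr_alpha column name comes from the lab.

```r
# Toy stand-in for map_data (values are made up)
toy_map <- data.frame(
  pr_alpha = c("ON", "ON", "QC", "BC", "MB"),
  born_outside_canada_share = c(0.29, 0.10, 0.14, 0.28, 0.18)
)

# One province, as in the lab
ontario <- toy_map[toy_map$pr_alpha == "ON", ]

# Several provinces at once with %in%
central <- toy_map[toy_map$pr_alpha %in% c("ON", "QC"), ]

# The same selection written with subset(), which some find easier to read
central_alt <- subset(toy_map, pr_alpha %in% c("ON", "QC"))
```

Either subset can be passed directly as the first argument of ggplot(), exactly as in the Ontario example above.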
2. Mapping Geographical Objects
In this portion of the lab, we will be using openly accessible data from opendata.london.ca, the municipal government's open data platform, which allows users to explore and download a plethora of datasets related to the city of London, Ontario. These datasets are available in various formats (among others, .csv, .shp and .geojson).
2.1 Importing Shape Files
I have downloaded the relevant data for today’s lab as shapefiles, and have shared them in the #graphs channel within our Slack. Please download the entirety of the folder associated with each dataset (i.e. the folder encompassing the following file formats: .cpg, .dbf, .prj, .shp and .shx).
In this case, I wanted us to map the boundaries of London, as well as the roads, water sources and trees within the city. You will find below how to import these various files into R.
WARNING: Please note that the class recording used the rgdal package. I have changed this to the sf package in this lab material, because rgdal has now been retired, and most students won’t be able to install it. You’ll notice that this version is slightly easier than that in the recording! :)
# Setting the working directory to the folder containing this script (requires RStudio)
setwd(dirname(rstudioapi::getSourceEditorContext()$path))
# Loading the sf package to load the data
library(sf)
# Loading the shapefiles using st_read()
# City boundaries data
boundaries_ldn <- st_read("./Shapefiles/City_Boundary/London%2C_Canada_City_Boundary.shp", quiet = TRUE)
# Roads data
roads_ldn <- st_read("./Shapefiles/Single_Line_Roads/Single_Line_Roads.shp", quiet = TRUE)
# Water data
water_ldn <- st_read("./Shapefiles/Water/Water.shp", quiet = TRUE)
# Trees data
trees_ldn <- st_read("./Shapefiles/Trees/Trees.shp", quiet = TRUE)
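Once shapefiles are imported, it is often useful to inspect their geometry type and coordinate reference system (CRS), and to transform them if needed. The sketch below does this on a single toy point rather than on the London files, whose CRS is not stated in this lab. The coordinates and the target CRS (EPSG:26917, NAD83 / UTM zone 17N) are assumptions chosen for illustration.

```r
library(sf)

# A single toy point near London, ON, in longitude/latitude (WGS 84, EPSG:4326)
pt_lonlat <- st_sf(
  name = "toy point",
  geometry = st_sfc(st_point(c(-81.2453, 42.9849)), crs = 4326)
)

# Inspecting an sf object: geometry type and coordinate reference system
geom_type <- st_geometry_type(pt_lonlat)
epsg_before <- st_crs(pt_lonlat)$epsg

# Transforming to a projected CRS whose units are metres
pt_utm <- st_transform(pt_lonlat, crs = 26917)
epsg_after <- st_crs(pt_utm)$epsg

# The coordinates are now eastings and northings instead of degrees
st_coordinates(pt_utm)
```

The same calls should work on the imported objects, e.g. st_crs(boundaries_ldn). When layers have different CRSs, coord_sf() reprojects them to a common one for plotting, so an explicit st_transform() is mostly needed for measurements or spatial operations.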
2.2 Visualizing Spatial Data
Please note that when adding multiple layers of data, ggplot2 draws them in the order in which you add them to ggplot(), so later layers appear on top of earlier ones. You’ll notice that in the final example below, I add the roads after the trees to make sure that they appear clearly and are not hidden by them.
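The effect of layer order can be seen on a toy example: a filled square and a point at its centre (made-up geometries, hypothetical object names), assuming sf and ggplot2 are installed.

```r
library(sf)
library(ggplot2)

# Toy geometries: a unit square and a point at its centre (no CRS needed here)
square <- st_sf(geometry = st_sfc(st_polygon(list(rbind(
  c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0)
)))))
centre <- st_sf(geometry = st_sfc(st_point(c(0.5, 0.5))))

# Point added last: it is drawn on top of the filled square and stays visible
p_visible <- ggplot() +
  geom_sf(data = square, fill = "grey80") +
  geom_sf(data = centre, colour = "red", size = 4)

# Point added first: the filled square is drawn over it and hides it
p_hidden <- ggplot() +
  geom_sf(data = centre, colour = "red", size = 4) +
  geom_sf(data = square, fill = "grey80")
```

In the maps of London, the boundary polygon uses fill = NA, so it hides nothing; order only starts to matter once layers have opaque fills or dense points.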
2.2.1 Polygons: City Boundaries
We begin by plotting the boundaries of the city of London.
# Creating the map
ggplot() +
geom_sf(data = boundaries_ldn, colour = "black", fill = NA) +
# ^ this is the line that adds the boundaries
# Note: the "fill" argument defines the color within the polygon (here, we don't want any, so we write NA)
theme_mapcan() +
coord_sf() +
ggtitle("Boundaries of London, ON")
2.2.2 Lines: Roads
We then add a line feature class of all the roads within London.
# Creating the map
ggplot() +
geom_sf(data = boundaries_ldn, colour = "black", fill = NA) +
geom_sf(data=roads_ldn, color="black", linewidth=0.4) +
# ^ this is the line that adds the roads
# Note: the "linewidth" argument defines the width of the lines
# (in ggplot2 versions before 3.4.0, use "size" instead)
theme_mapcan() +
coord_sf() +
ggtitle("Map of London, ON") +
theme(legend.position = "none")
2.2.3 Polygons: Water
Here, we visualize polygons representing water features within the municipal boundaries of London, as identified through aerial imagery.
# Creating the map
ggplot() +
geom_sf(data = boundaries_ldn, colour = "black", fill = NA) +
geom_sf(data=roads_ldn, color="black", linewidth=0.4) +
geom_sf(data =water_ldn, colour = "blue", fill = "blue", alpha=0.4, linewidth=0.2) +
# ^ this is the line that adds the water
# Note: the "alpha" argument makes the water more transparent
# I found it more aesthetically pleasing to have a darker border than the filled color here,
# but you could have a single-tone visualization if you removed the alpha argument
theme_mapcan() +
coord_sf() +
ggtitle("Map of London, ON") +
theme(legend.position = "none")
2.2.4 Points: Trees
We are now adding the trees outside of natural forested lands within the municipal boundaries of the City of London as identified through aerial imagery.
# Creating the map
ggplot() +
geom_sf(data = boundaries_ldn, colour = "black", fill = NA) +
geom_sf(data = trees_ldn, color="darkgreen", size=0.5, alpha=0.01) +
# ^ this is the line that adds the trees
# Note: the "alpha" argument makes the trees more transparent to avoid oversaturating the map
geom_sf(data=roads_ldn, color="black", linewidth=0.4) +
geom_sf(data =water_ldn, colour = "blue", fill = "blue", alpha=0.4, linewidth=0.2) +
theme_mapcan() +
coord_sf() +
ggtitle("Map of London, ON")
# You'll notice that this last map takes longer to load... it's because there are a LOT of trees in London, ON!
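If a point layer is too slow to draw while you are still adjusting a map, one option is to work with a random sample of the points and switch back to the full layer at the end. This is a sketch on made-up points standing in for trees_ldn (toy_trees is a hypothetical name and the 10% sample size is an arbitrary choice).

```r
library(sf)

set.seed(42)  # makes the random sample reproducible

# Toy stand-in for trees_ldn: 10,000 random points (made-up coordinates)
toy_trees <- st_as_sf(
  data.frame(x = runif(10000), y = runif(10000)),
  coords = c("x", "y")
)

# Keeping a random 10% of the rows; sf objects can be indexed like data frames
keep <- sample(nrow(toy_trees), size = 1000)
toy_trees_sample <- toy_trees[keep, ]
```

The sampled object keeps its sf class and geometry column, so it can be passed to geom_sf() in place of the full layer.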
Exercise
Using ggplot(), create a map displaying the boundaries of London as well as the Parks polygon data and the Roads line data (available in the Dropbox folder shared via Slack). Give a grey color to the roads and a green color to the parks. When you are done, send a screenshot of your finished work in the #graphs channel of our Slack.
Please note that you can also download and map any spatial data of your choosing from opendata.london.ca if you want to explore this further while other students finish the exercise.
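If you download a dataset in GeoJSON format rather than as a shapefile, st_read() can import it in the same way. The sketch below avoids assuming any particular file from the portal: it writes a small made-up layer to a temporary .geojson file and reads it back (toy_layer and geojson_path are hypothetical names).

```r
library(sf)

# Toy point layer (made-up attribute and coordinates, in WGS 84)
toy_layer <- st_sf(
  label = c("a", "b"),
  geometry = st_sfc(
    st_point(c(-81.25, 42.98)),
    st_point(c(-81.23, 43.00)),
    crs = 4326
  )
)

# Writing the layer to a temporary GeoJSON file
geojson_path <- tempfile(fileext = ".geojson")
st_write(toy_layer, geojson_path, quiet = TRUE)

# Reading it back with st_read(), exactly as with a .shp file
toy_back <- st_read(geojson_path, quiet = TRUE)
```

Unlike a shapefile, a GeoJSON dataset is a single file, so there is no folder of companion files to keep together.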