Creating maps is something you don’t have to do commonly but if ever you are in a rush with some project where you have to display data using maps (as it happened to me), or you are just interested in the matter, this tutorial will give you a fast introduction about the data you need and two practical tools for creating maps using R. For the examples we’re going to use maps of Mexico, but it must not be difficult to extend it to any other map.
So even if you don’t have any geospatial data but just a bunch of longitudes and latitudes, you should keep reading cause this tutorial is for you.
Spatial data is the input you need for getting a map as an output. It is a way of representing polygons in a computer. spacial data is often stored in shapefiles.The shapefile format is a digital vector storage format for storing geometric location and associated attribute information.
Shapefile structure is made of 3 mandatory files:
1.- Shapefile shape format (.shp) is the file where all the polygons are stored.
2.- Shapefile shape index format (.shx) stores an index of the geometric entities
3.- Shape attribute format (.dbf) this file stores information about the polygons. (e.g. country, population, postal code etc.)
We are going to use readOGR() function from “rgdal” library to read our spatial data. if you don’t have rgdal lirbary installed yet, just write:
install.packages("rgdal")
On the INEGI webpage, you can download the geospatial information of Mexico. ‘areas_geoestadisticas_estatales.shp’ is the datafile that divides Mexico into 32 polygons (states).
library(rgdal)
mex <- readOGR(dsn ="conjunto_de_datos/areas_geoestadisticas_estatales.shp")
## OGR data source with driver: ESRI Shapefile
## Source: "conjunto_de_datos/areas_geoestadisticas_estatales.shp", layer: "areas_geoestadisticas_estatales"
## with 32 features
## It has 2 fields
class(mex)
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"
Spatial polygons have some special attributes that you can find using the @ (as if it were $). The information about the map is stored in @data as a data frame.
class(mex@data)
## [1] "data.frame"
head(mex@data)
## CVE_ENT NOM_ENT
## 0 01 Aguascalientes
## 1 02 Baja California
## 2 03 Baja California Sur
## 3 04 Campeche
## 4 05 Coahuila de Zaragoza
## 5 06 Colima
If you want to add information to your map, here is precisely where you have to do it. Just to make more interesting our map, let’s download the population size of each state here. From the webpage, select poblacion total > entidad y municipio > sexo>ver consulta, download it as .csv format and name it poblacion.csv, then enter the following code to clean a little bit and take a look.
Very important: When preparing your data to be merged, you have to be careful that the information match perfectly. In this example, info dataset has the record “Distrito Federal” and mex@data “Ciudad de México”.Both records refer to the same place but with a different name. We have to change it so they can match. This is an easy example but in some cases this can be a real data cleaning challenge.
info <- read.csv("poblacion.csv",skip=6,header = T) ## Read Data
head(info) ## show raw data
## X X.1 Total Hombre Mujer X.2
## 1 Total 112,336,538 54,855,231 57,481,307 NA
## 2 01 Aguascalientes 1,184,996 576,638 608,358 NA
## 3 02 Baja California 3,155,070 1,591,610 1,563,460 NA
## 4 03 Baja California Sur 637,026 325,433 311,593 NA
## 5 04 Campeche 822,441 407,721 414,720 NA
## 6 05 Coahuila de Zaragoza 2,748,391 1,364,197 1,384,194 NA
info <- info[1:33,1:5] ## Subset it so we remove Na and empty observations
colnames(info)[1:2] <- c("CVE_ENT","NOM_ENT") ## Give a handy name to the columns
for(i in 1:3){ ## Remove the comas and convert it to numeric
info[,i+2] <- as.character(info[,i+2])
info[,i+2]<- gsub(",","",info[,i+2])
info[,i+2]<- as.numeric(info[,i+2])
}
info$NOM_ENT <- as.character(info$NOM_ENT)
info[10,2] <- "Ciudad de México"
head(info) ##Show clean data
## CVE_ENT NOM_ENT Total Hombre Mujer
## 1 Total 112336538 54855231 57481307
## 2 01 Aguascalientes 1184996 576638 608358
## 3 02 Baja California 3155070 1591610 1563460
## 4 03 Baja California Sur 637026 325433 311593
## 5 04 Campeche 822441 407721 414720
## 6 05 Coahuila de Zaragoza 2748391 1364197 1384194
Now, we can merge mex@data (the map information) and info (population size). Because of the way we named the columns in the dataset info, we don’t have to specify the columns by we want to make the join (because they are the same).
mex@data <- merge(mex@data,info)
## Loading required package: sp
head(mex@data)
## CVE_ENT NOM_ENT Total Hombre Mujer
## 1 01 Aguascalientes 1184996 576638 608358
## 2 02 Baja California 3155070 1591610 1563460
## 3 03 Baja California Sur 637026 325433 311593
## 4 04 Campeche 822441 407721 414720
## 5 05 Coahuila de Zaragoza 2748391 1364197 1384194
## 6 06 Colima 650555 322790 327765
The easiest way to plotting a map is using the base plotting system (BPS). This is the default R package for making plots and its basic command is plot(). An interesting thing about this function is that it is so flexible that it´ll figure out the plot that best fits your data eg. histogram, scatterplot or map.
The plot() function can take as an input the .shp data.
plot(mex,main="My first map",col="khaki4")
To plot a specific state follow the next step:
Recall from the data that “Ciudad de México” has CVE_ENT = 09
cdmx <- mex$CVE_ENT=="09"
plot(mex[cdmx,],col="grey")
note: Previously we said that the information of the map is stored in mex@data, so you may ask why didn’t we subset the data by
## Remember this is incorrect
cdmx <- mex@data$CVE_ENT=="09"
plot(mex[cdmx,],col="grey")
The reason is the following: recall a shapefile is composed of 3 files. The polygons, the information, and the index. Well, if we promnpt ‘mex@data$CVE_ENT==“09”’, we are going to have the whole map but with missing information.With ‘mex$CVE_ENT==“09”’ we subset the three files.
Finally, suppose that we want to stratify states by their population. We´re going to assign each state to a stratum depending on the percentile they belong to, 0-33.33, 33.34-66.66 or 66.67-100 and we want each stratum to have a different color.
First we calculate the .33 and .66 percentile so we can split the data
quantile(info$Total,c(.33,.66))
## 33% 66%
## 1899415 3453925
If the population of a state is less than 1,899,415 we assign it to the 1st stratum, if it’s between 1,899,416 and 3,453,925 we assign it to 2nd stratum and if it is bigger than 3,453,925 it is assigned to the 3rd stratum. Here is a way to do this:
first <- mex$Total <= quantile(info$Total,c(.33,.66))[1]
second <- mex$Total > quantile(info$Total,c(.33,.66))[1] & mex$Total <= quantile(info$Total,c(.33,.66))[2]
third <- mex$Total > quantile(info$Total,c(.33,.66))[2]
plot(mex,main="Pupulation of Mexico 2005")
plot(mex[first,],col="cadetblue2",add=T)
plot(mex[second,],col="chartreuse3",add=T)
plot(mex[third,],col="firebrick1",add=T)
legend("bottomleft", legend=c("Low Pop", "Avg Pop","High Pop"),
col=c("cadetblue2","chartreuse3", "firebrick1"),pch=15,bty="n")
Leaflet is an open source JavaScript library for interactive maps. It can be used in many programming languages and fortunately one of them is R. Leaflet is great because it is powerful and easy to implement. Its weakness is that if you’re using too much data, it tends to be slow and heavy.
You can install leaflet writing the following code:
install.packages("leaflet")
The function leaflet() returns a canvas you can work on. addTiles() function puts a map of the world in the canvas. Write leaflet() %>% addTiles() at the prompt with no arguments to see the result. Also, leaflet() takes as an argument the data, but first we have to explain something.
Leaflet does not understand our spacial polygons data as they are. Before we can use them we must transform them. ‘rgdal’ library also provides the spTransform() function which makes the data understandable for leaflet.
The next easy step is to transform the data (remember we previously load rgdal with library(rgdal))
mexTransfor <- spTransform(mex, CRS("+init=epsg:4326"))
So even though our data didn’t change, it is now ready to use.
Some important features of Leaflet (like centering your map or adding markers) must be entered as latitude, longitude coordinates.A useful function for getting this information is geocode() from ggmap library.
you can install ggmap with the following code:
install.packages('ggmap')
geocode() allows you to find coordinates of places. It does this by looking up on Google maps so you can enter practically any place as you would do at google maps. The output of geocode is a list with the longitude ($lon) and latitude ($lat)
We are making a map of Mexico so we want our map to be centered around Mexico. Also, we would like to place a marker in the state with the lowest population (Baja California Sur).
library(ggmap)
## Loading required package: ggplot2
mex_loc <- geocode("Mexico")
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=Mexico&sensor=false
BCS_loc <- geocode("Baja california Sur")
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=Baja%20california%20Sur&sensor=false
print(paste('Mexico coordinates are: Longitude:',mex_loc$lon,'Latitude:',mex_loc$lat))
## [1] "Mexico coordinates are: Longitude: -102.552784 Latitude: 23.634501"
This two variables, mex_loc and BCS_loc, contain the latitude an longitude of each place.
The following code is a general template for a map. addpolygons() allow us to create the polygons over the map and addMarkers to put up the markers on the map. The popup argument lets us display information when you click on the object.
library(leaflet)
p1 <- leaflet(mexTransfor) %>% addTiles() %>% setView(lng = mex_loc$lon, lat = mex_loc$lat,zoom=5)%>%
addPolygons(color = "#444444", weight = 1, smoothFactor = 0.5,
opacity = 1.0, fillOpacity = 0.5,
popup=~paste(NOM_ENT,'Pop=',Total),
fillColor = ~colorQuantile("YlOrRd", Total)(Total),
highlightOptions = highlightOptions(color = "white", weight = 2,
bringToFront = TRUE))%>%
addMarkers(BCS_loc$lon,BCS_loc$lat ,popup='Lowest population')
p1