Setup: Install/Load Packages

library(knitr)
library(rmarkdown)
library(maps)
## 
##  # ATTENTION: maps v3.0 has an updated 'world' map.        #
##  # Many country borders and names have changed since 1990. #
##  # Type '?world' or 'news(package="maps")'. See README_v3. #
library(geosphere)
## Loading required package: sp

Step 1: Draw Base Maps Using map() Function from maps Package

map("state")  

Notice that this map covers only the contiguous United States, i.e., it leaves out Alaska and Hawaii. To include these states, or any non-US locations, call the map() function with the argument “world”.

map("world")

Step 2: Setting Appropriate Boundaries

Often, you need to focus on a limited subarea of the map above. The extent of the subarea depends on the data you are working with. In this example, we are dealing with domestic flights for the U.S., so we can use the xlim and ylim arguments to limit the map to a rectangle that covers a given range of longitude (xlim) and latitude (ylim).

We can also set the “fill” option in the map() function to TRUE to fill regions with specific colors.

xlim <- c(-171.738281, -56.601563)
ylim <- c(12.039321, 71.856229)
map("world", col="#f2f2f2", fill=TRUE, bg="white", lwd=0.05, xlim=xlim, ylim=ylim)

Step 3: Drawing Connections

A connection between two locations is drawn with the gcIntermediate() function from the geosphere package, which computes intermediate points along the great circle joining the two endpoints. Note that the connection, although drawn on a 2-dimensional map, is curved: it follows the shortest path on the sphere implied by the input data, namely the latitude and longitude of the endpoints.
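As a minimal sketch of a single connection, the snippet below draws one arc on the map of the contiguous states. The coordinates are approximate values assumed for illustration (roughly Los Angeles and New York) and are not taken from the datasets used later; the names la, ny, and arc are likewise hypothetical.

```r
library(maps)
library(geosphere)

# Approximate (longitude, latitude) pairs, assumed for illustration only.
la <- c(-118.41, 33.94)
ny <- c(-73.78, 40.64)

# n = 50 intermediate points plus the two endpoints (addStartEnd = TRUE).
arc <- gcIntermediate(la, ny, n = 50, addStartEnd = TRUE)

map("state")
lines(arc, col = "red", lwd = 2)
```

gcIntermediate() expects each endpoint as longitude first, then latitude, which is why the code in the following sections passes long before lat.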

Industrial Strength Demo

We load one dataset to represent airports (nodes) and another to represent flights (edges). These datasets are available courtesy of the fine folks at flowingdata.com, who took data available from the Bureau of Transportation Statistics and Data Expo 2009 and transformed them into .csv files.

Loading Flight/Airport Data

airports <- read.csv("http://datasets.flowingdata.com/tuts/maparcs/airports.csv", header=TRUE) 
flights <- read.csv("http://datasets.flowingdata.com/tuts/maparcs/flights.csv", header=TRUE, as.is=TRUE)

Draw Multiple Connections

We want to map all connections for American Airlines. Each iteration of the loop looks up the two airports of one route and draws the great-circle connection between them.

map("world", col="#f2f2f2", fill=TRUE, bg="white", lwd=0.05, xlim=xlim, ylim=ylim)
 
# Keep only the American Airlines routes.
fsub <- flights[flights$airline == "AA",]
for (j in seq_len(nrow(fsub))) {
    # Look up the two endpoint airports of route j by IATA code.
    air1 <- airports[airports$iata == fsub[j,]$airport1,]
    air2 <- airports[airports$iata == fsub[j,]$airport2,]

    # Sample points along the great circle between the two airports.
    inter <- gcIntermediate(c(air1[1,]$long, air1[1,]$lat), c(air2[1,]$long, air2[1,]$lat), n=100, addStartEnd=TRUE)

    lines(inter, col="black", lwd=0.8)
}

Introduce Color for Clarity

The graphic produced by the above chunk is a network of flights that is of limited interpretability. We can improve this by introducing colors. More specifically, we can use a function called colorRampPalette() to build a gradient of shades, and then pick the shade of each line according to the number of flights between its two nodes. Hence, the darker the line, the more traffic there is between the two nodes at its ends, and vice versa. The code is as follows:

pal <- colorRampPalette(c("#f2f2f2", "black"))
colors <- pal(100)
 
map("world", col="#f2f2f2", fill=TRUE, bg="white", lwd=0.05, xlim=xlim, ylim=ylim)
 
fsub <- flights[flights$airline == "AA",]
maxcnt <- max(fsub$cnt)
for (j in seq_len(nrow(fsub))) {
    air1 <- airports[airports$iata == fsub[j,]$airport1,]
    air2 <- airports[airports$iata == fsub[j,]$airport2,]

    inter <- gcIntermediate(c(air1[1,]$long, air1[1,]$lat), c(air2[1,]$long, air2[1,]$lat), n=100, addStartEnd=TRUE)
    # Scale the route's count to an index into the palette. Clamp to at least 1,
    # because round() can return 0 for small counts and colors[0] selects no color.
    colindex <- max(1, round( (fsub[j,]$cnt / maxcnt) * length(colors) ))

    lines(inter, col=colors[colindex], lwd=0.8)
}
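The count-to-shade mapping used in the loop above can be checked in isolation. The sketch below uses hypothetical counts (not taken from the flights dataset) and a vectorized form of the same scaling. The clamp with pmax() is an added safeguard: round() returns 0 for very small ratios, and index 0 selects no color.

```r
pal <- colorRampPalette(c("#f2f2f2", "black"))
colors <- pal(100)

# Hypothetical counts, chosen for illustration only.
cnt <- c(1, 44, 430, 860)
maxcnt <- max(cnt)

# Same scaling as in the loop; pmax() clamps the index to at least 1,
# since round() gives 0 when cnt / maxcnt is below 0.005.
colindex <- pmax(1, round(cnt / maxcnt * length(colors)))
print(data.frame(cnt, colindex, shade = colors[colindex]))
```

The largest count maps to the last entry of the palette (black), and the smallest counts map to the lightest shade.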

The problem with this graphic, which isn’t apparent from the graphic itself, is that longer, less prominent flights obscure the more popular connections. Take a look at the flights dataset.

head(flights)
##   airline airport1 airport2 cnt
## 1      AA      DFW      SJU 120
## 2      AA      MSP      DFW 326
## 3      AA      LGA      ORD 860
## 4      AA      TPA      JFK  56
## 5      AA      STT      BOS  44
## 6      AA      PHX      DFW 550

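Notice that the dataset isn’t ordered by the number of connections (the ‘cnt’ feature), so a lightly shaded, low-count line can be drawn on top of a darker, high-count one. We fix this problem simply by ordering the connections from the smallest to the largest count, so that the busiest routes are drawn last. This results in a less fragmented representation.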
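To see what the reordering does, here is a small sketch that applies order() to the six rows printed by head(flights) above. The data frame sample_flights is a hand-entered copy of those rows, used here only for illustration.

```r
# The six rows shown by head(flights) above, re-entered by hand.
sample_flights <- data.frame(
  airline  = rep("AA", 6),
  airport1 = c("DFW", "MSP", "LGA", "TPA", "STT", "PHX"),
  airport2 = c("SJU", "DFW", "ORD", "JFK", "BOS", "DFW"),
  cnt      = c(120, 326, 860, 56, 44, 550),
  stringsAsFactors = FALSE
)

# order() returns the row positions that sort cnt in ascending order.
sorted <- sample_flights[order(sample_flights$cnt), ]
print(sorted)
```

After sorting, the busiest route in this sample (LGA to ORD, with a count of 860) is in the last row, so a loop over the rows draws it last and on top of the others.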

pal <- colorRampPalette(c("#f2f2f2", "black"))
colors <- pal(100)
 
map("world", col="#f2f2f2", fill=TRUE, bg="white", lwd=0.05, xlim=xlim, ylim=ylim)
 
fsub <- flights[flights$airline == "AA",]
# Sort ascending by count so the busiest routes are drawn last, on top.
fsub <- fsub[order(fsub$cnt),]
maxcnt <- max(fsub$cnt)
for (j in seq_len(nrow(fsub))) {
    air1 <- airports[airports$iata == fsub[j,]$airport1,]
    air2 <- airports[airports$iata == fsub[j,]$airport2,]

    inter <- gcIntermediate(c(air1[1,]$long, air1[1,]$lat), c(air2[1,]$long, air2[1,]$lat), n=100, addStartEnd=TRUE)
    # Clamp to at least 1: round() can return 0 for small counts and colors[0] selects no color.
    colindex <- max(1, round( (fsub[j,]$cnt / maxcnt) * length(colors) ))

    lines(inter, col=colors[colindex], lwd=0.8)
}