library(knitr)
library(rmarkdown)
library(maps)
##
## # ATTENTION: maps v3.0 has an updated 'world' map. #
## # Many country borders and names have changed since 1990. #
## # Type '?world' or 'news(package="maps")'. See README_v3. #
library(geosphere)
## Loading required package: sp
map("state")
Notice that this map covers only the contiguous United States, i.e., it leaves out Hawaii and Alaska. To include these states, or if you need to include non-US locations, call the map() function with the argument “world”.
map("world")
Often, you need to focus on a limited subarea of the map above. The extent of the subarea depends on the data you are working with. In this example, we are dealing with domestic flights for the U.S., so we can use the xlim and ylim arguments to limit the map to a rectangle that covers a given range of longitudes (xlim) and latitudes (ylim).
We can also set the “fill” option in the map() function to TRUE to fill regions with specific colors.
xlim <- c(-171.738281, -56.601563)
ylim <- c(12.039321, 71.856229)
map("world", col="#f2f2f2", fill=TRUE, bg="white", lwd=0.05, xlim=xlim, ylim=ylim)
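If the bounding box should follow the data rather than hand-picked values, one option is to derive it from the coordinates themselves. The following is a minimal sketch, assuming a hypothetical data frame `pts` with `long` and `lat` columns (the coordinates are approximate and used only for illustration) and an arbitrary padding of 5 degrees:

```r
# Hypothetical points; coordinates are approximate and for illustration only.
pts <- data.frame(
  iata = c("DFW", "JFK", "ORD"),
  long = c(-97.04, -73.78, -87.90),
  lat  = c(32.90, 40.64, 41.98)
)

pad <- 5  # padding in degrees, an arbitrary choice

# range() returns c(min, max); widen it on both sides by the padding.
xlim_data <- range(pts$long) + c(-pad, pad)
ylim_data <- range(pts$lat) + c(-pad, pad)
```

The resulting vectors can be passed to map() as the xlim and ylim arguments in place of the fixed values.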
Connections between two geographic locations are drawn using the gcIntermediate() function from the geosphere package. It computes intermediate points along the great circle between the two endpoints. It is important to know that the connection, while drawn on a 2-dimensional map, is in fact curved to account for the spherical geometry implied by the input data, namely the latitude and longitude of the endpoints.
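To get a feel for what gcIntermediate() returns, here is a small sketch with two approximate, illustrative coordinate pairs (they are assumptions, not values taken from the datasets used below). Note that geosphere expects longitude first, then latitude:

```r
library(geosphere)

# Approximate (longitude, latitude) pairs, for illustration only.
p1 <- c(-97.04, 32.90)   # roughly Dallas/Fort Worth
p2 <- c(-73.78, 40.64)   # roughly New York JFK

# n intermediate points along the great circle; addStartEnd = TRUE
# also includes the two endpoints, giving n + 2 rows.
inter <- gcIntermediate(p1, p2, n = 100, addStartEnd = TRUE)

dim(inter)   # 102 rows, 2 columns (longitude, latitude)
```

The result is a two-column matrix of coordinates, which is why it can be handed directly to lines() to draw the arc on an existing map.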
We load one dataset to represent airports (nodes) and another to represent flights (edges). These datasets are available courtesy of the fine folks at flowingdata.com, who took data available from the Bureau of Transportation Statistics and Data Expo 2009, and transformed them into .csv files.
airports <- read.csv("http://datasets.flowingdata.com/tuts/maparcs/airports.csv", header=TRUE)
flights <- read.csv("http://datasets.flowingdata.com/tuts/maparcs/flights.csv", header=TRUE, as.is=TRUE)
We want to map all connections for American Airlines (airline code “AA”). Each iteration of the loop looks up the two airports of one connection and draws the great-circle line between them.
map("world", col="#f2f2f2", fill=TRUE, bg="white", lwd=0.05, xlim=xlim, ylim=ylim)
fsub <- flights[flights$airline == "AA",]
for (j in seq_len(nrow(fsub))) {
  # Look up the two airports of connection j
  air1 <- airports[airports$iata == fsub[j,]$airport1,]
  air2 <- airports[airports$iata == fsub[j,]$airport2,]
  # Points along the great circle between the two airports
  inter <- gcIntermediate(c(air1[1,]$long, air1[1,]$lat), c(air2[1,]$long, air2[1,]$lat), n=100, addStartEnd=TRUE)
  lines(inter, col="black", lwd=0.8)
}
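The loop looks up each airport with a logical subset on every iteration. As a possible alternative, the row positions can be found once for all connections with match(). This is only a sketch on hypothetical toy data (`airports_toy` and `flights_toy`, with approximate coordinates), not the method used in this tutorial:

```r
# Toy stand-ins for the airports and flights tables (illustrative values).
airports_toy <- data.frame(
  iata = c("DFW", "JFK", "ORD"),
  long = c(-97.04, -73.78, -87.90),
  lat  = c(32.90, 40.64, 41.98),
  stringsAsFactors = FALSE
)
flights_toy <- data.frame(
  airline  = c("AA", "AA"),
  airport1 = c("DFW", "ORD"),
  airport2 = c("JFK", "DFW"),
  stringsAsFactors = FALSE
)

# match() gives, for each code, the row of its first occurrence in iata.
i1 <- match(flights_toy$airport1, airports_toy$iata)
i2 <- match(flights_toy$airport2, airports_toy$iata)

# Attach the endpoint coordinates as new columns.
flights_toy$long1 <- airports_toy$long[i1]
flights_toy$lat1  <- airports_toy$lat[i1]
flights_toy$long2 <- airports_toy$long[i2]
flights_toy$lat2  <- airports_toy$lat[i2]
```

Codes without a match yield NA coordinates, which makes missing airports easy to spot before any lines are drawn.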
The graphic produced by the chunk above shows a network of flights that is hard to interpret, because every connection is drawn in the same color. We can improve this by introducing color. More specifically, we can use a function called colorRampPalette() to build a gradient of shades and pick a shade for each connection according to the number of flights between its two nodes. Hence, the darker the line, the more traffic there is between the two nodes at its ends, and vice versa. The code is as follows:
pal <- colorRampPalette(c("#f2f2f2", "black"))
colors <- pal(100)
map("world", col="#f2f2f2", fill=TRUE, bg="white", lwd=0.05, xlim=xlim, ylim=ylim)
fsub <- flights[flights$airline == "AA",]
maxcnt <- max(fsub$cnt)
for (j in seq_len(nrow(fsub))) {
  air1 <- airports[airports$iata == fsub[j,]$airport1,]
  air2 <- airports[airports$iata == fsub[j,]$airport2,]
  inter <- gcIntermediate(c(air1[1,]$long, air1[1,]$lat), c(air2[1,]$long, air2[1,]$lat), n=100, addStartEnd=TRUE)
  # Scale the count to a palette index; clamp to 1 because an index of 0 selects no color
  colindex <- max(1, round( (fsub[j,]$cnt / maxcnt) * length(colors) ))
  lines(inter, col=colors[colindex], lwd=0.8)
}
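To see how counts translate into palette entries, and why an index of 0 needs guarding against (R vectors are 1-indexed), consider this small sketch. The counts are hypothetical and chosen only to show the edge cases:

```r
pal <- colorRampPalette(c("#f2f2f2", "black"))
colors <- pal(100)

# Hypothetical counts: one very small, one mid-range, one maximal.
cnt <- c(2, 430, 860)
maxcnt <- max(cnt)

# Scale each count to the palette length and round to a whole index.
raw_index <- round((cnt / maxcnt) * length(colors))
raw_index   # 0 50 100

# Index 0 selects nothing in R, so clamp to the first palette entry.
safe_index <- pmax(1, raw_index)
colors[safe_index]
```

With the unclamped index, the smallest count would map to no color at all; clamping assigns it the lightest shade instead.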
The problem with this graphic, which isn’t apparent from the graphic itself, is that longer, less popular flights obscure the more popular connections, because the lines are drawn in the order in which the rows appear in the data. Take a look at the flights dataset.
head(flights)
## airline airport1 airport2 cnt
## 1 AA DFW SJU 120
## 2 AA MSP DFW 326
## 3 AA LGA ORD 860
## 4 AA TPA JFK 56
## 5 AA STT BOS 44
## 6 AA PHX DFW 550
Notice that the dataset isn’t ordered by the number of connections (the ‘cnt’ feature). We fix this problem simply by ordering the connections from the lowest to the highest count, so that the busiest connections are drawn last and end up on top. This results in a less fragmented representation.
pal <- colorRampPalette(c("#f2f2f2", "black"))
colors <- pal(100)
map("world", col="#f2f2f2", fill=TRUE, bg="white", lwd=0.05, xlim=xlim, ylim=ylim)
fsub <- flights[flights$airline == "AA",]
fsub <- fsub[order(fsub$cnt),]
maxcnt <- max(fsub$cnt)
for (j in seq_len(nrow(fsub))) {
  air1 <- airports[airports$iata == fsub[j,]$airport1,]
  air2 <- airports[airports$iata == fsub[j,]$airport2,]
  inter <- gcIntermediate(c(air1[1,]$long, air1[1,]$lat), c(air2[1,]$long, air2[1,]$lat), n=100, addStartEnd=TRUE)
  # Scale the count to a palette index; clamp to 1 because an index of 0 selects no color
  colindex <- max(1, round( (fsub[j,]$cnt / maxcnt) * length(colors) ))
  lines(inter, col=colors[colindex], lwd=0.8)
}
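The effect of order() can be checked on the six rows printed by head(flights) earlier. This sketch rebuilds just those rows in a data frame named `fhead` (a name introduced here for illustration) and sorts them by count:

```r
# The six rows shown in the head(flights) output.
fhead <- data.frame(
  airline  = "AA",
  airport1 = c("DFW", "MSP", "LGA", "TPA", "STT", "PHX"),
  airport2 = c("SJU", "DFW", "ORD", "JFK", "BOS", "DFW"),
  cnt      = c(120, 326, 860, 56, 44, 550),
  stringsAsFactors = FALSE
)

# order() returns the row permutation that sorts cnt in ascending order.
ord <- order(fhead$cnt)
fsorted <- fhead[ord, ]

fsorted$cnt   # 44 56 120 326 550 860
```

After sorting, the LGA to ORD connection, the busiest of the six, is the last row and would therefore be drawn last, on top of the others.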