The data used for this network visualization was downloaded as two separate CSV files, one for nodes and one for edges, from the Stanford URLs shown in the code below.
#Load packages; library() stops with an error if a package is missing
library(igraph)
library(classInt)
library(curl)
vertices <- read.csv(curl("https://stacks.stanford.edu/file/druid:mn425tz9757/orbis_nodes_0514.csv"))
edges <- read.csv(curl("https://stacks.stanford.edu/file/druid:mn425tz9757/orbis_edges_0514.csv"))
The network could not be plotted from the data exactly as listed in the CSV files because some of the coordinates were blank. Setting the missing values to 0 skewed the visualization: the vertices with missing coordinates sat in one corner and all other points formed one large cluster away from them. The best solution, therefore, was to remove the vertices with missing coordinates from the data.
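#Work on a copy so the original vertex table stays intact
vertices2 <- vertices
#Set NA values to 0 so they can be identified after the graph is built
vertices2[is.na(vertices2)] <- 0
#Create igraph object (directed)
orbis <- graph.data.frame(edges, vertices=vertices2, directed=TRUE)
#Remove vertices with missing coordinates (x was set to 0 above)
vertices.null <- V(orbis)[V(orbis)$x == 0]
orbis <- orbis - vertices.null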
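The zero-fill step above works as a marker only if no real vertex has an x value of 0. A minimal sketch of an alternative, assuming a vertex table with `id`, `x`, and `y` columns and an edge table with `source` and `target` columns (these column names and the toy values are illustrative, not taken from the ORBIS files), drops the incomplete rows before the graph is built:

```r
# Toy vertex table; vertex 3 has no coordinates (hypothetical data).
toy_vertices <- data.frame(
  id = c(1, 2, 3, 4),
  x  = c(41.9, 37.9, NA, 31.2),
  y  = c(12.5, 23.7, NA, 29.9)
)

# Toy edge table; two edges touch the incomplete vertex 3.
toy_edges <- data.frame(
  source = c(1, 2, 3, 1),
  target = c(2, 3, 4, 4)
)

# Keep only vertices whose coordinates are both present.
has_coords <- complete.cases(toy_vertices[, c("x", "y")])
clean_vertices <- toy_vertices[has_coords, ]

# Keep only edges whose two endpoints both survived the filter.
keep_edge <- toy_edges$source %in% clean_vertices$id &
  toy_edges$target %in% clean_vertices$id
clean_edges <- toy_edges[keep_edge, ]

print(clean_vertices$id)
print(nrow(clean_edges))
```

Tables cleaned this way could then be passed to `graph.data.frame` in place of `edges` and `vertices2`, which would make the later removal step unnecessary.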
The downloaded network is very large, and nothing can be made out in the default test plot. Further work is therefore needed to make the visualization meaningful.
plot(orbis)
The network is very large and it is hard to distinguish points from each other. I decided to experiment by changing the layout.
#Test plot using a circular layout (orbis.expense has not been created yet, so plot orbis)
plot(orbis, layout=layout.circle, vertex.color="dark green", vertex.label=NA)
#Test plot using another layout option
plot(orbis, layout=layout.fruchterman.reingold, vertex.label=NA, vertex.size=3, edge.arrow.size=0)
I was interested in looking at the expense of going from one point to another within the network and visualizing the network accordingly. I used the Fisher-Jenks calculation from the classInt package to detect natural breaks.
edgesb <- edges
breaks <- classIntervals(edges$expense, n=5, style="fisher")$brks
I then created a column numbered 1 through 5 with 5 as the least expensive (most accessible) and 1 as the most expensive (least accessible).
#Create an empty column to hold the category of each edge
br <- numeric(nrow(edgesb))
edgesb <- data.frame(edgesb, br)
#Assign values to new column with subsetting
#Lower bounds are inclusive so that values equal to a break (including the maximum) are not skipped
edgesb$br[edges$expense < breaks[2]] <- 5
edgesb$br[edges$expense >= breaks[2] & edges$expense < breaks[3]] <- 4
edgesb$br[edges$expense >= breaks[3] & edges$expense < breaks[4]] <- 3
edgesb$br[edges$expense >= breaks[4] & edges$expense < breaks[5]] <- 2
edgesb$br[edges$expense >= breaks[5] & edges$expense <= breaks[6]] <- 1
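The five assignments above can also be written as a single lookup. The sketch below uses base R's `findInterval`; the break values and expenses are made up for illustration and are not the ones computed from the ORBIS data.

```r
# Hypothetical break points, standing in for the six values that
# classIntervals() returns for n = 5 classes.
toy_breaks <- c(0, 2, 5, 9, 14, 20)

# Hypothetical edge expenses, including the minimum and the maximum.
toy_expense <- c(0, 1.5, 2, 7.2, 13.9, 20)

# findInterval() returns the index of the interval each value falls in:
# 1 for [0, 2), 2 for [2, 5), and so on up to 5 for [14, 20].
# rightmost.closed = TRUE keeps the maximum in class 5 rather than a sixth bin.
class_index <- findInterval(toy_expense, toy_breaks, rightmost.closed = TRUE)

# Reverse the scale so that 5 is the cheapest class and 1 the most expensive.
toy_br <- 6 - class_index

print(toy_br)
```

With real data, `toy_expense` and `toy_breaks` would be replaced by `edges$expense` and `breaks`. Every value between the first and last break lands in exactly one class, so no edge is left with the placeholder value of 0.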
This part of the code creates an igraph object as before. Even though I knew the data was directed in the original format, I decided to do an undirected visualization because it gave me the freedom to modify more aspects such as edge width. Some of the visualization changes that I made on the directed version were not incorporated into the plot.
#Create undirected igraph object that carries the br column as an edge attribute
orbis.expense <- graph.data.frame(edgesb, vertices=vertices2, directed=FALSE)
#Repeat process of removing missing data
vertices.null.exp <- V(orbis.expense)[V(orbis.expense)$x == 0]
orbis.expense <- orbis.expense - vertices.null.exp
This plot sets the width of each edge equal to the category calculated above. The vertex size has also been reduced to make the change in width more visible.
#Change width property to equal the categories calculated above
E(orbis.expense)$width <- E(orbis.expense)$br
plot(orbis.expense, vertex.label=NA, vertex.size=2)
I decided to make edges that represent road travel brown and all others light blue. I also made a plot with each node's size equal to its degree so that the better connected nodes stand out more. Ultimately, the size of the network makes more meaningful visualizations difficult, unless it could be projected onto a Leaflet map.
E(orbis.expense)$color <- "light blue"
E(orbis.expense)[ type == 'road']$color <- "brown"
plot(orbis.expense, vertex.label=NA, vertex.size=3, vertex.shape='square', vertex.color='dark green')
plot(orbis.expense, vertex.label=NA, vertex.size=degree(orbis.expense), vertex.shape='square', vertex.color='dark green')
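Using raw degree as the vertex size lets a few hubs grow far larger than the rest. One option, sketched below with a hypothetical helper `rescale_size` and made-up degree values, is to map degree linearly onto a fixed size range before plotting.

```r
# Hypothetical helper: map a numeric vector linearly onto [min_size, max_size].
rescale_size <- function(values, min_size = 2, max_size = 10) {
  value_range <- range(values)
  # If every value is identical, fall back to the midpoint size.
  if (value_range[1] == value_range[2]) {
    return(rep((min_size + max_size) / 2, length(values)))
  }
  scaled <- (values - value_range[1]) / (value_range[2] - value_range[1])
  min_size + scaled * (max_size - min_size)
}

# Made-up degree values standing in for degree(orbis.expense).
toy_degree <- c(1, 3, 5, 9)
toy_sizes <- rescale_size(toy_degree)

print(toy_sizes)
```

The result could be supplied to the plot as `vertex.size = rescale_size(degree(orbis.expense))`. The size bounds of 2 and 10 are arbitrary defaults and would likely need tuning for a network of this size.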