1 Load Packages

The only package we use is igraph which is available on CRAN and maintained by Gábor Csárdi.

library("igraph")
## Warning: package 'igraph' was built under R version 3.3.3
## 
## Attaching package: 'igraph'
## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum
## The following object is masked from 'package:base':
## 
##     union

2 Spatial Example - Wetland connectivity

For this example, we will use a set of 81 prairie wetlands for which we have location information. The file ‘wetland_centroids.csv’ includes a unique identifier (‘BAS_NUM’), an assigned hydroregime(‘BAS_REGIME’), and coordinates for the centroid (‘X_COORD’ and ‘Y_COORD’ in UTM) for each wetland.

The Universal Transever Mercator (UTM) coordinate system uses meters as the base unit, and thus is handy for making measurements. Information about the UTM coordinate system can be found at http://wiki.gis.com/wiki/index.php/Universal_Transverse_Mercator

3 Import Spatial Data

We will first use the list of coordinates for wetland centroids. Other options will be discussed below.

locs <- read.csv("wetland_centroids.csv", header=T)
dat <- locs
dat$id <- row.names(dat)

4 Create Adjacency Matrix

Create an adjacency matrix with the Distance Matrix Computation (‘dist{stats}’) and the coodinates of the weltand centroids. This computes the distance between all pairs of wetland centroids and results in a distance matrix (in meters).

d <- dist(cbind(dat$X_COORD,dat$Y_COORD))
d.m <- as.matrix(d)

d.m.m <- d.m #create a working copy of the distance matrix for further manipulation

Note: In this distance matrix, all nodes (wetlands) are connected. This can be adjusted by changing values to ‘0’, which indicate that the nodes are not connected.

5 Set Critical Distance Limit

Use a critical distance limit, based on dispersal distance or other criteria, to assign which nodes (wetlands) are connected. Note: The critical distance must be in meters (to match the units in the UTM coordinate system)

c.dist <- 500 #500 m = 0.5 km

d.m.m[d.m.m>c.dist]=0 #sets adjacency to '0' or not connected for distances greater than the critical distance

6 Create Network from the Adjacency Matrix

Create a network for the adjaceny matrix which has been modified based on the critical distance limit (assigned above). Note that the functions ‘graph.adjacency()’ and ‘graph_from_adjacency_matrix()’ both work for this one-mode network matrix.

net1 <- graph.adjacency(d.m.m,mode=c("undirected"),weighted=T)
net1a <- graph_from_adjacency_matrix(d.m.m,mode=c("undirected"),weighted=T)

7 Visualize/Plot the Network

Spatial networks are typically displayed based on the spatially explicit locations of the nodes. Thus, we will define the layout (‘wet.layout’) based on the coordinates of the wetland centroids.

wet.layout <- as.matrix(cbind(dat$X_COORD,dat$Y_COORD))

plot(net1, layout=wet.layout)

The default plot is not very pretty. Several options to try changing are: ‘vertex.size’, ‘vertex.color’, ‘vertex.label’, ‘frame’, and adding text for the axes. For example:

plot(net1,layout=wet.layout, vertex.size=3, vertex.color="red", vertex.label=NA, frame=TRUE, 
     xlab="Easting (m)", ylab="Northing (m)", main=paste("Critical distance =",c.dist/1000, "km"))

8 Metrics for Network and Nodes

Many of the basic metrics are relevant for this example of wetland connectivity.

At the scale of the network, we can calculate:

lth <- length(V(net1)) #number of nodes (wetlands)
lth
## [1] 81
den <- graph.density(net1) #number of links / number of possible links
den
## [1] 0.09876543
dia <- diameter(net1) #maximum distance (m) of the shortest paths between pairs of nodes
dia
## [1] 3875.819
apl <- average.path.length(net1) #average number of edges between pairs of nodes
apl
## [1] 3.385053
clc <- transitivity(net1, type="global") #graph level property: For ‘global’ a single number, or NaN if there are no connected triples in the graph
clc
## [1] 0.7457114

For each node, we can calculate:

dgr <- degree(net1)
dgr 
##  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 
##  3  0  3  2  1  0  3  6  2  4  2  4  4  3  3  4  1  9 10  2  6  4  0  5 11 
## 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 
## 13  6 14  9 12  8  6  7  6 13  7 10 15  5  6  8  8 12 14  6  3 13 10 13 13 
## 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 
## 11 12 12 13 12 14 13 13 14 11  9 10  8  8 12 11  7 11 12  8 12 12 12 13 11 
## 76 77 78 79 80 81 
##  2  1 10 11  6  0
dgr.dist <- degree.distribution(net1, cumulative = TRUE, mode="all") #distribution of degree across all nodes in the network
plot( x=0:max(dgr), y=1-dgr.dist, pch=19, cex=1.2, col="orange", 
      xlab="Degree", ylab="Cumulative Frequency")

lcc <- transitivity(net1, type="local") #node-level: For ‘local’ a vector of transitivity scores, one for each node

btw <- betweenness(net1)

9 Identify and Plot Critical Network Components

The above metrics can be used to identify critical components of the network.

First, we can find the path associated with the graph diameter, ‘diameter()’ returns the distance and ‘get_diameter()’ or ‘get.diameter()’ returns the nodes along the first path found with that distance.

g.dia <- get.diameter(net1)
g.dia
## + 12/81 vertices, named, from 6b5e5ca:
##  [1] 39 40 41 27 24 31 38 50 58 63 76 77

By modifying the attributes, we can visualize these nodes and edges associated with the graph diameter:

E(net1)$color <- "gray"
E(net1)$width <- 1
E(net1, path=g.dia)$color <- "red"
E(net1, path=g.dia)$width <- 2
V(net1)$label.color <- "blue"
V(net1)$color <- "SkyBlue2"
V(net1)[g.dia]$label.color <- "black"
V(net1)[g.dia]$color <- "red"

plot(net1, layout=wet.layout, vertex.size=3, vertex.label=NA, frame=TRUE, 
     xlab="Easting (m)", ylab="Northing (m)", main=paste("Critical distance =", c.dist/1000, "km"))

legend("topleft", legend=c("Wetlands (nodes)", "Edges less than 0.5 km", paste("Graph Diameter =", round(dia,1), "m")) ,
           pch=c(21, NA, NA), pt.bg=c("SkyBlue2",NA, NA), pt.cex=1.5, lty=c(NA, 1, 1), lwd=c(NA, 1, 2), seg.len=1, col=c("black", "gray", "red"), cex=0.8, bty="n", ncol=1)

Similarly, we can identify and highlight nodes with high betweeness centrality.

sorted.btw.rank <- order(betweenness(net1), decreasing = TRUE)
n <- 20 #higlight top 20 node

b <- sorted.btw.rank[1:n]

E(net1)$color <- "gray"
E(net1)$width <- 1
V(net1)$label.color <- "blue"
V(net1)$color <- "light blue"
V(net1)[b]$label.color <- "black"
V(net1)[b]$color <- "red"
plot(net1, layout=wet.layout, vertex.size=3, 
            #vertex.color=msum$nodecolor1, 
            vertex.label=NA, frame=TRUE, xlab="Easting (m)", ylab="Northing (m)", 
            main=paste("Critical distance =", c.dist/1000, "km"))

legend("topleft", legend=c("Wetlands (nodes)", "Edges less than 0.5 km", "Top 20 stepping stones"),
           pch=c(21, NA, 21), pt.bg=c("SkyBlue2",NA, "red"), pt.cex=1.5, lty=c(NA, 1, NA), lwd=c(NA, 1, NA), seg.len=1, col=c("black", "gray", "black"), cex=0.8, bty="n", ncol=1)

10 Clusters of Connected Components

There are multiple options for looking for subgroups, communities, or clusters in a network. Here we will use the igraph ‘clusters()’ function to check for clusters of connected components.

comps <- clusters(net1)$membership
cls <- no.clusters(net1)
cls
## [1] 8

Then, we can summarize the number of nodes (wetlands) within each cluster.

graph.clusters <- clusters(net1)
summary(graph.clusters$csize) #min, max, median, mean, IQR 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    1.00    2.00   10.12   10.25   51.00
graph.clusters$csize #number of nodes in each cluster
## [1]  9  1  1  3 51  1 14  1
meanclssize <- mean(graph.clusters$csize) #mean number of nodes in each cluster
meanclssize
## [1] 10.125

Finally, we can view the clusters within the network.

samcol <- rainbow(max(comps))
V(net1)$color <- samcol[comps]

plot(net1,layout=wet.layout, vertex.size=5, vertex.label=NA, frame=TRUE, 
     xlab="Easting (m)", ylab="Northing (m)", main=paste("Critical distance =", c.dist/1000, "km"))

plot(net1, layout=wet.layout, vertex.size=5, vertex.label.cex=0.6, vertex.label.dist=2)

Other clustering and community detection methods include identification of cliques (complete subgraphs of an undirected graph) and Community detection based on edge betweenness, propagating labels, or greedy optimization of modularity

For example, we can create a subgraph from one of clusters…

V(net1)$membership <- comps
net1.sub <- induced_subgraph(net1, vids=which(V(net1)$membership==1), impl ="copy_and_delete")

dat.sub <- subset(dat, id %in% c(V(net1.sub)$name))

wet.layout.sub <- as.matrix(cbind(dat.sub$X_COORD,dat.sub$Y_COORD))

plot(net1.sub, layout=wet.layout.sub, vertex.label.cex=1, vertex.label.dist=2)

And then evaluate cliques and communities within this cluster

clqs <- cliques(net1.sub) #Identification of cliques (complete subgraphs of an undirected graph)
sapply(cliques(net1.sub), length)
##  [1] 1 1 1 2 1 2 1 2 1 2 3 2 1 2 3 2 1 2 3 4 3 2 3 2 2 1 2 3 4 3 2 3 2
cl <- largest_cliques(net1.sub)

E(net1.sub)$color <- "gray"
E(net1.sub)$width <- 1
V(net1.sub)$label.color <- "blue"
V(net1.sub)$color <- "SkyBlue2"
V(net1.sub)[cl[[1]]]$label.color <- "black"
V(net1.sub)[cl[[1]]]$color <- "gold"

plot(net1.sub, layout=wet.layout.sub, vertex.label.cex=0.6, main="Largest Clique (complete subgraph)")

ceb <- cluster_edge_betweenness(net1.sub) #Community detection based on edge betweenness (Newman-Girvan)
## Warning in cluster_edge_betweenness(net1.sub): At community.c:
## 460 :Membership vector will be selected based on the lowest modularity
## score.
## Warning in cluster_edge_betweenness(net1.sub): At community.c:
## 467 :Modularity calculation with weighted edge betweenness community
## detection might not make sense -- modularity treats edge weights as
## similarities while edge betwenness treats them as distances
dendPlot(ceb, mode="hclust")

plot(ceb, net1.sub, layout=wet.layout.sub, main="Communities (based on edge betweenness)")

clp <- cluster_label_prop(net1.sub) #Community detection based on propagating labels
plot(clp, net1.sub, layout=wet.layout.sub, main="Communities (based on propagating labels)")

cfg <- cluster_fast_greedy(net1.sub) #Community detection based on greedy optimization of modularity
plot(cfg, net1.sub, layout=wet.layout.sub, main="Communities (based on greedy optimization of modularity)")

V(net1.sub)$community <- cfg$membership
colrs <- rainbow(10) 
plot(net1.sub, layout=wet.layout.sub, vertex.color=colrs[V(net1.sub)$community], main="Communities (based on greedy optimization of modularity) - Plot 2")

11 Importing Data: Edge List Approach

The edge list approach can also be used for creating a spatial network.

In this example we will import an edge list that was created in ArcGIS: ‘wetland_distances.csv’. This edge list has the distance between wetlands based on the wetland edges instead of the wetland centroids.

We will also import the list of nodes (‘wetland_centroids.csv’) and add fields for ‘id’ and ‘label’.

n.dist <- read.csv("wetland_distances.csv", header=T)

n <- length(unique(n.dist$BAS_NUM)) #check the number of nodes in the edge list
n
## [1] 81
locs <- read.csv("wetland_centroids.csv", header=T)

dat2 <- locs
dat2$id <- as.integer(as.character(dat2$BAS_NUM))
dat2$label <- as.character(dat2$BAS_NUM)

Follow the same approach as above, set a critical distance limit (in meters) and modify the edge list to remove the edges that are longer that critical distance. Before creating a network we also create ‘from’, ‘to’, and ‘weight’ fields from the original variables.

c.dist <- 500

wet.edge <- subset(n.dist, distance <= c.dist)
  
wet.edge$from <- wet.edge$BAS_NUM
wet.edge$to <- wet.edge$comp_ID
wet.edge$weight <- wet.edge$distance

Create and plot the network using the edge list and node list.

net.wet <- graph_from_data_frame(d = wet.edge, vertices = dat2, directed = FALSE)

plot(net.wet, layout=wet.layout,vertex.size=3, frame=TRUE, edge.curved=0, vertex.label.cex=0.6, vertex.label.dist=NA,
     xlab="Easting (m)", ylab="Northing (m)", main=paste("Critical distance =", c.dist/1000, "km"))

Create a directed graph, with seasonal and semipermanent wetlands as ‘sources’, and temporary wetlands as ‘sinks’.

dat.ssp <- subset(dat2, BAS_REGIME != "TEMPORARY") 

wet.edge.dir <- subset(wet.edge, from %in% c(dat.ssp$BAS_NUM))
  
net.wet2 <- graph_from_data_frame(d = wet.edge.dir, vertices = dat2, directed = TRUE)

V(net.wet2)$regime <-dat2$BAS_REGIME
colrs <- c("SkyBlue2", "blue", "gold") 

plot(net.wet2, edge.arrow.size=.3, layout=wet.layout,vertex.size=4, frame=TRUE, edge.curved=0,vertex.color=colrs[V(net.wet2)$regime], vertex.label.cex=0.6, vertex.label.dist=NA,
     xlab="Easting (m)", ylab="Northing (m)", main=paste("Critical distance =", c.dist/1000, "km"))

legend("topleft", legend=c("Wetlands (nodes)", "Temporary", "Seasonal", "Semipermanent", "Edges less than 0.5 km"), pch=c(NA, 21, 21, 21, NA), pt.bg=c(NA, "gold", "SkyBlue2","blue", NA),  pt.cex=1.5, lty=c(NA, NA, NA, NA, 1), lwd=c(NA, NA, NA, NA, 1), seg.len=1, col=c(NA, "black", "black","black","black","gray"), cex=0.8, bty="n", ncol=1)

12 Importing Data: Starting with GIS layer or shapefile

The details of working with spatial data are not covered in the this workshop; however, this method is mentioned because it is possible (and often very handy) to start a network analysis of spatial data with a GIS layer or shapefile.

The package ’maptools’can be used for reading and handling spatial objects in R. THis package is available on CRAN and maintained by Roger Bivand.

For this example, we successfully used ‘maptools’ to import a shapefile of the wetlands (polygons) that included UTM coordinates for the wetland centroids. These centroids were then used to create the distance matrix (as described in Create Adjacency Matrix section).

13 Options for Further Practice

  1. Identify and plot cut-points in a network using the igraph function ‘articulation.points()’. Cut-points are pivotal points of articulation between elements that make up a component.

  2. Remove selected nodes and re-create network and metrics. Removal could be based on a nodes being cut-points, nodes with high (or low) degree or betweenness centrality, a specific hydroregime (e.g. temporary wetlands), or other criteria.

  3. Find the critical distance limit at which network coalescence is achieved (network consists on only 1 component). Although there is surely a way to calculate this, it can be found simply by changing the critical distance limit and re-running the ‘Clusters of Connected Components’ until there is only 1 cluster.

  4. Compare network and metrics for distances between wetlands based on wetland centroids (like network ‘net1’) versus wetland edges (like network ‘net.wet’)