Directed igraph objects

Create a directed graph from a data frame with the graph_from_data_frame() function. Setting its second argument, directed, to TRUE ensures that the result is a directed graph.

Get the graph object from a data frame.

# Load igraph and preview the edge list
library(igraph)
head(measles)
##   from to
## 1   45  1
## 2   45  2
## 3  172  3
## 4  180  4
## 5   45  5
## 6  180  6
# Get the graph object
g <- graph_from_data_frame(measles, directed = TRUE)
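
Because the measles data frame itself is not reproduced here, the same call can be tried on a small made-up edge list. The sketch below assumes a hypothetical data frame toy with the same from/to layout; the names toy and g_toy and the vertex labels are illustrative only.

```r
library(igraph)

# Hypothetical edge list: each row is one transmission, from infector to infected
toy <- data.frame(
  from = c("A", "A", "B", "B", "C"),
  to   = c("B", "C", "D", "E", "F")
)

# directed = TRUE keeps the from -> to orientation of every row
g_toy <- graph_from_data_frame(toy, directed = TRUE)

vcount(g_toy)  # 6 vertices
ecount(g_toy)  # 5 edges
```

Vertex names are taken from the values in the first two columns, so no separate vertex table is needed for a simple edge list like this one.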

Check if the graph object is directed or weighted

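Inspect whether a graph object is directed and/or weighted with is_directed() and is_weighted() (the current names for the older is.directed() and is.weighted()).

# Is the graph directed?
is_directed(g)
## [1] TRUE
# Is the graph weighted?
is_weighted(g)
## [1] FALSE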
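
The measles graph is unweighted, so the TRUE case for the weight check never shows up above. As a minimal sketch, assuming a hypothetical edge list toy_w with an extra column named weight, graph_from_data_frame() stores that column as an edge attribute and the graph is then reported as weighted. The calls is_directed() and is_weighted() used here are the current names of is.directed() and is.weighted().

```r
library(igraph)

# Hypothetical edge list with a third column named weight;
# graph_from_data_frame() stores extra columns as edge attributes
toy_w <- data.frame(
  from   = c("A", "A", "B"),
  to     = c("B", "C", "C"),
  weight = c(2, 1, 5)
)
g_w <- graph_from_data_frame(toy_w, directed = TRUE)

is_directed(g_w)  # TRUE
is_weighted(g_w)  # TRUE, because an edge attribute called weight exists

# The same edges without the weight column give an unweighted graph
g_u <- graph_from_data_frame(toy_w[, c("from", "to")], directed = TRUE)
is_weighted(g_u)  # FALSE
```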

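Check where each edge originates and points

In igraph, a directed edge runs from its tail to its head: tail_of() returns the vertex from which each edge originates, and head_of() returns the vertex to which it points. Both functions take two arguments, the first being the graph object and the second the edges to examine. For all edges you can use E(g). The table below tabulates the head of every edge, so it counts how many edges each vertex receives; every vertex that appears receives exactly one. The labels in the table are igraph's internal vertex ids rather than the vertex names.

# Tabulate the vertex at the head of each edge (call reconstructed from the description above)
table(head_of(g, E(g)))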
## 
##   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  17  18  19  20  21 
##   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1 
##  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41 
##   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1 
##  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  61  62  63 
##   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1 
##  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83 
##   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1 
##  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100 101 102 103 
##   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1 
## 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 
##   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1 
## 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 
##   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1 
## 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 
##   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1 
## 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 
##   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1 
## 184 185 186 187 
##   1   1   1   1
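
A minimal sketch on a hypothetical three-edge graph (toy and g_toy are made-up names) showing which end of an edge each accessor returns. In igraph an edge runs from its tail to its head, so tail_of() gives the origin and head_of() gives the target; as_ids() is used only to print vertex names instead of a vertex sequence.

```r
library(igraph)

# Hypothetical edge list: A -> B, A -> C, B -> D
toy <- data.frame(from = c("A", "A", "B"), to = c("B", "C", "D"))
g_toy <- graph_from_data_frame(toy, directed = TRUE)

# tail_of(): the vertex each edge starts from
tails <- as_ids(tail_of(g_toy, E(g_toy)))
tails  # "A" "A" "B"

# head_of(): the vertex each edge points to
heads <- as_ids(head_of(g_toy, E(g_toy)))
heads  # "B" "C" "D"

# Count how many edges originate from each vertex
table(tails)
```

Tabulating the tails counts outgoing edges per vertex, while tabulating the heads counts incoming edges.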

Identifying edges for each vertex

This section shows how to determine whether an edge exists between two vertices, and how to find all edges going into or out of a given vertex.

# Make a basic plot
plot(g, 
     vertex.label.color = "black", 
     edge.color = 'gray77',
     vertex.size = 0,
     edge.arrow.size = 0.1,
     layout = layout_nicely(g))

# Is there an edge going from vertex 184 to vertex 178?
g['184', '178']

## [1] 1
# Is there an edge going from vertex 178 to vertex 184?
g['178', '184']
## [1] 0

incident() function

# Show all edges going to or from vertex 184
incident(g, '184', mode = c("all"))
## + 6/184 edges from 8d31a69 (vertex names):
## [1] 184->45  184->182 184->181 184->178 184->183 184->177
# Show all edges going out from vertex 184
incident(g, '184', mode = c("out"))
## + 6/184 edges from 8d31a69 (vertex names):
## [1] 184->45  184->182 184->181 184->178 184->183 184->177
# Show all edges going in to vertex 184
incident(g, '184', mode = c("in"))
## + 0/184 edges from 8d31a69 (vertex names):
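
The same checks can be tried on a small graph where the answers are easy to verify by eye. This is a sketch on a hypothetical edge list (toy and g_toy are made-up names), not the measles data.

```r
library(igraph)

# Hypothetical transmission tree: A infects B and C, B infects D and E, C infects F
toy <- data.frame(
  from = c("A", "A", "B", "B", "C"),
  to   = c("B", "C", "D", "E", "F")
)
g_toy <- graph_from_data_frame(toy, directed = TRUE)

# Indexing the graph like an adjacency matrix: row = origin, column = target
g_toy["A", "B"]  # 1, there is an edge A -> B
g_toy["B", "A"]  # 0, there is no edge B -> A

# Edges touching B, split by direction
n_all <- length(incident(g_toy, "B", mode = "all"))  # 3
n_out <- length(incident(g_toy, "B", mode = "out"))  # 2
n_in  <- length(incident(g_toy, "B", mode = "in"))   # 1
```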

Neighbors

Often in network analysis it is important to explore the pattern of connections that exist between vertices. One way is to identify the neighboring vertices of each vertex. You can then determine which neighbors are shared, even by vertices that are not connected to each other, which indicates how two vertices may have an indirect relationship through others.

Use the neighbors() function to identify neighbors, and the shared neighbors of pairs of vertices.

neighbors() function
# Identify all neighbors of vertex 12 regardless of direction
neighbors(g, '12', mode = c('all'))
## + 5/187 vertices, named, from 8d31a69:
## [1] 45  13  72  89  109
# Identify other vertices that direct edges towards vertex 12
neighbors(g, '12', mode = c('in'))
## + 1/187 vertex, named, from 8d31a69:
## [1] 45
# Identify any vertices that receive an edge from vertex 42 and direct an edge to vertex 124
n1 <- neighbors(g, '42', mode = c('out'))
n2 <- neighbors(g, '124', mode = c('in'))
intersection(n1, n2)
## + 1/187 vertex, named, from 8d31a69:
## [1] 7
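
A minimal sketch of the same shared-neighbor pattern on a hypothetical four-edge graph (toy, g_toy, from_a and to_d are made-up names). It looks for vertices that receive an edge from A and also direct an edge to D.

```r
library(igraph)

# Hypothetical edge list: A -> B, A -> C, B -> D, C -> E
toy <- data.frame(from = c("A", "A", "B", "C"), to = c("B", "C", "D", "E"))
g_toy <- graph_from_data_frame(toy, directed = TRUE)

# Neighbors of B regardless of direction (A and D), then incoming only (A)
nb_all <- neighbors(g_toy, "B", mode = "all")
nb_in  <- neighbors(g_toy, "B", mode = "in")

# Vertices that A points to, and vertices that point to D
from_a <- neighbors(g_toy, "A", mode = "out")  # B and C
to_d   <- neighbors(g_toy, "D", mode = "in")   # B

# Their intersection is the set of intermediaries between A and D
shared <- intersection(from_a, to_d)
as_ids(shared)  # "B"
```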

Distances between vertices

The inter-connectivity of a network can be assessed by examining the number and length of paths between vertices. A path is a chain of connections between vertices. The geodesic distance between two vertices is the number of edges on the shortest path between them. Vertices that are directly connected have a geodesic distance of 1; those that share a neighbor but are not connected to each other have a geodesic distance of 2, and so on. In directed networks, the direction of edges can be taken into account. If one vertex cannot be reached from another by following directed edges, the geodesic distance between them is infinite.
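
As a small illustration of directed versus undirected geodesic distance, the sketch below uses a hypothetical three-vertex chain (g_toy and the d_* names are made up) and the distances() function with its mode argument.

```r
library(igraph)

# Hypothetical directed chain A -> B -> C
g_toy <- graph_from_data_frame(
  data.frame(from = c("A", "B"), to = c("B", "C")),
  directed = TRUE
)

# Following edge direction, C is two steps from A
d_fwd <- distances(g_toy, v = "A", to = "C", mode = "out")
d_fwd["A", "C"]  # 2

# No directed path leads from C back to A
d_rev <- distances(g_toy, v = "C", to = "A", mode = "out")
d_rev["C", "A"]  # Inf

# Ignoring direction, the distance is again 2
d_any <- distances(g_toy, v = "C", to = "A", mode = "all")
d_any["C", "A"]  # 2
```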

In this section you will find the longest paths between vertices in a network and identify the vertices that lie within a given number of connections of a particular vertex. For disease transmission networks such as the measles dataset, this helps you to see how quickly the disease spreads through the network.

Path length and diameter of the network

The diameter of a network is the length of the longest of its geodesic (shortest) paths. You can find it in any network using farthest_vertices(), which returns the diameter distance and the two vertices at either end of that path.

get_diameter(g) returns the exact sequence of vertices along the path. If there is more than one longest path, these functions return only one of them.

# Which two vertices are the furthest apart in the graph ?
farthest_vertices(g) 
## $vertices
## + 2/187 vertices, named, from 8d31a69:
## [1] 184 162
## 
## $distance
## [1] 5
# Shows the path sequence between two furthest apart vertices.
get_diameter(g)
## + 6/187 vertices, named, from 8d31a69:
## [1] 184 178 42  7   123 162
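
The two functions can be checked on a graph small enough to trace by hand. This sketch assumes a hypothetical transmission tree (toy and g_toy are made-up names) with a single longest directed path, A -> B -> D -> G.

```r
library(igraph)

# Hypothetical tree with one longest chain: A -> B -> D -> G
toy <- data.frame(
  from = c("A", "A", "B", "B", "C", "D"),
  to   = c("B", "C", "D", "E", "F", "G")
)
g_toy <- graph_from_data_frame(toy, directed = TRUE)

# End points and length of the longest geodesic
fv <- farthest_vertices(g_toy)
fv$distance          # 3
as_ids(fv$vertices)  # the two end points, A and G

# The full vertex sequence along that path
path <- get_diameter(g_toy)
as_ids(path)         # "A" "B" "D" "G"

# diameter() returns just the length
diameter(g_toy)      # 3
```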

ego() function

Use ego() to find all vertices that are reachable within two connections of vertex 42, and then those that can reach vertex 42 within two connections. The first argument of ego() is the graph object, the second is the maximum number of connections between the vertices, the third is the vertex of interest, and the fourth determines whether you consider connections going out of or into the vertex of interest. The result is a list with one vertex sequence per vertex of interest, and each sequence includes the vertex of interest itself.

# Identify vertices that are reachable within two connections from vertex 42
ego(g, 2, '42', mode = c('out'))
## [[1]]
## + 13/187 vertices, named, from 8d31a69:
##  [1] 42  7   106 43  123 101 120 124 125 128 129 108 127
# Identify vertices that can reach vertex 42 within two connections
ego(g, 2, '42', mode = c('in'))
## [[1]]
## + 3/187 vertices, named, from 8d31a69:
## [1] 42  178 184
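
A short sketch of the same two calls on a hypothetical tree (toy and g_toy are made-up names), where the neighborhoods can be read straight off the edge list.

```r
library(igraph)

# Hypothetical tree: A -> B, A -> C, B -> D, B -> E, C -> F, D -> G
toy <- data.frame(
  from = c("A", "A", "B", "B", "C", "D"),
  to   = c("B", "C", "D", "E", "F", "G")
)
g_toy <- graph_from_data_frame(toy, directed = TRUE)

# Reachable from B within two steps: B itself, D and E (one step), G (two steps)
out2 <- ego(g_toy, 2, "B", mode = "out")[[1]]
as_ids(out2)

# Able to reach B within two steps: B itself and A
in2 <- ego(g_toy, 2, "B", mode = "in")[[1]]
as_ids(in2)
```

Taking [[1]] extracts the single vertex sequence from the returned list, since only one vertex of interest was supplied.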

Identifying key vertices

Perhaps the most straightforward measure of vertex importance is the degree of a vertex. The out-degree of a vertex is the number of outgoing edges it directs to other vertices. The in-degree is the number of edges it receives from other vertices. In the measles network, individuals who infect many others have a high out-degree. The code below checks whether individuals infect roughly equal numbers of other children, or whether a few key children have high out-degrees and infect many others.

# Calculate the out-degree of each vertex
g.outd <- degree(g, mode = "out")

# View a summary of the out-degrees of all individuals with table()
table(g.outd)
## g.outd
##   0   1   2   3   4   6   7   8  30 
## 125  21  16  12   6   2   3   1   1
# Make a histogram of out-degrees
hist(g.outd, breaks = 30)

which.max() function

Determine which vertex has the highest out-degree in the network by applying which.max() to the vector g.outd. The result is a named integer: the name (45) is the vertex and the value (1) is its position in g.outd.

# Find the vertex that has the maximum out-degree
which.max(g.outd)
## 45 
##  1
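
The degree workflow can be sketched end to end on a hypothetical graph (toy, g_toy, out_deg and in_deg are made-up names) in which A is the obvious high-out-degree vertex.

```r
library(igraph)

# Hypothetical edge list: A infects B, C and D; B infects E
toy <- data.frame(from = c("A", "A", "A", "B"), to = c("B", "C", "D", "E"))
g_toy <- graph_from_data_frame(toy, directed = TRUE)

# Named vectors of out- and in-degree, one entry per vertex
out_deg <- degree(g_toy, mode = "out")
in_deg  <- degree(g_toy, mode = "in")

# Three vertices infect nobody, one infects one, one infects three
table(out_deg)

# Name of the vertex with the highest out-degree
top <- names(which.max(out_deg))
top  # "A"
```

Wrapping which.max() in names() returns the vertex name directly rather than its position in the vector.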

Betweenness

Another measure of the importance of a given vertex is its betweenness. This is an index of how frequently the vertex lies on shortest paths between any two vertices in the network. It can be thought of as how critical the vertex is to the flow of information through a network. Individuals with high betweenness are key bridges between different parts of a network. In our measles transmission network, vertices with high betweenness are those children who were central to passing on the disease to other parts of the network. In this exercise, you will identify the betweenness score for each vertex and then make a new plot of the network adjusting the vertex size by its betweenness score to highlight these key vertices.

# Calculate betweenness of each vertex
g.b <- betweenness(g, directed = TRUE)

# Show histogram of vertex betweenness
hist(g.b, breaks = 80)

# Create plot with vertex size determined by betweenness score
plot(g, 
     vertex.label = NA,
     edge.color = 'black',
     vertex.size = sqrt(g.b)+1,
     edge.arrow.size = 0.05,
     layout = layout_nicely(g))
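
To make the betweenness scores concrete, the sketch below computes them on a hypothetical five-vertex graph (toy, g_toy and b are made-up names) where the shortest paths can be counted by hand.

```r
library(igraph)

# Hypothetical edge list: A -> B -> C -> D, plus B -> E
toy <- data.frame(from = c("A", "B", "C", "B"), to = c("B", "C", "D", "E"))
g_toy <- graph_from_data_frame(toy, directed = TRUE)

b <- betweenness(g_toy, directed = TRUE)
b

# B sits on the directed shortest paths A->C, A->D and A->E: score 3
# C sits on A->D and B->D: score 2
# A, D and E are never in the middle of a path: score 0
```

The source vertex A scores zero even though every path starts there, because betweenness only counts vertices that lie between two others.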

Visualizing important nodes and edges

One issue with the measles dataset is that there are three individuals for whom nothing is known about who infected them. One of these individuals (vertex 184) appears ultimately responsible for spreading the disease to many others, even though they did not directly infect many individuals. However, because vertex 184 has no incoming edge, it never lies between two other vertices on a shortest path, so its betweenness is zero. One way to explore the importance of this vertex is to visualize the geodesic distances of connections going out from this individual. The plot created below shows these distances from this patient zero.

make_ego_graph() function

Use make_ego_graph() to create a subset of the network made up of the vertices that are connected to vertex 184. The first argument is the original graph g. The second argument is the maximum number of connections separating a vertex from the vertex of interest; here diameter(g) is used, which returns the length of the longest path in the network. The third argument is the vertex of interest, 184. The final argument is the mode; in this instance all connections are included regardless of direction. make_ego_graph() returns a list with one graph per vertex of interest, hence the [[1]].

# Make an ego graph
g184 <- make_ego_graph(g, diameter(g), nodes = '184', mode = c("all"))[[1]]

distances() function

Create an object dists that contains the geodesic distance of every vertex from vertex 184. Use the function distances() to calculate this.

# Get a vector of geodesic distances of all vertices from vertex 184 
dists <- distances(g184, "184")
dists
##     45 180 42 182 12 181 22 10 31 34 17 93 178 184 8 56 58 186 11 19 64 179 54
## 184  1   2  2   1  2   1  2  2  2  2  2  2   1   0 2  2  2   2  2  2  2   2  3
##     74 5 78 39 82 44 1 47 183 97 7 21 37 106 16 116 14 79 4 6 145 148 153 73
## 184  2 2  2  2  2  2 2  2   1  2 3  2  2   3  2   2  2  3 3 3   2   3   2  3
##     156 68 123 102 98 169 177 2 9 13 15 20 23 24 25 26 27 28 29 30 32 33 35 36
## 184   2  3   4   3  3   3   1 2 2  3  2  2  2  3  3  2  3  3  3  2  2  2  2  3
##     38 40 41 43 46 48 49 50 51 52 53 55 57 59 60 61 62 63 65 66 67 69 70 71 72
## 184  3  3  3  3  2  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  4  3  3  3
##     75 76 77 80 81 83 84 85 86 87 88 89 90 91 92 94 95 96 99 100 101 103 104
## 184  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  2  3  3  3   3   4   3   3
##     105 107 108 109 111 112 113 114 115 117 118 119 120 121 124 125 126 127 128
## 184   3   3   4   3   3   3   3   3   3   3   3   3   4   3   4   4   3   4   4
##     129 130 131 132 134 135 136 137 138 139 140 142 143 144 149 150 151 152 154
## 184   4   3   3   3   3   4   3   3   3   4   4   3   3   3   3   4   3   3   3
##     155 157 158 159 160 161 162 163 164 165 166 168 170 171 185 187
## 184   4   3   3   3   4   4   5   5   4   4   3   4   3   4   3   3

plot() function

Assign the attribute color to each vertex, selecting each color by the vertex's geodesic distance. The color palette colors has a length equal to the maximal geodesic distance plus one, so that vertices at the same distance are plotted in the same color and patient zero (distance 0) has its own color.

Use plot() to visualize the network g184. The vertex label should be the geodesic distances dists.

# Create a color palette of length equal to the maximal geodesic distance plus one.
colors <- c("black", "red", "orange", "blue", "dodgerblue", "cyan")

# Set color attribute to vertices of network g184.
V(g184)$color <- colors[dists+1]

# Visualize the network based on geodesic distance from vertex 184 (patient zero).
plot(g184, 
     vertex.label = dists, 
     vertex.label.color = "white",
     vertex.label.cex = .6,
     edge.color = 'black',
     vertex.size = 7,
     edge.arrow.size = .05,
     main = "Geodesic Distances from Patient Zero"
     )
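
The ego-graph, distance and color steps can be rehearsed on a hypothetical five-vertex graph before running them on the full network. In this sketch toy, g_toy, g_a, d and pal are made-up names, and the distance matrix is reduced to a named vector with [1, ] purely to make the indexing explicit.

```r
library(igraph)

# Hypothetical edge list: A -> B, A -> C, B -> D, D -> E
toy <- data.frame(from = c("A", "A", "B", "D"), to = c("B", "C", "D", "E"))
g_toy <- graph_from_data_frame(toy, directed = TRUE)

# Everything within diameter(g_toy) steps of A, in either direction
g_a <- make_ego_graph(g_toy, diameter(g_toy), nodes = "A", mode = "all")[[1]]

# distances() returns a one-row matrix; [1, ] drops it to a named vector
d <- distances(g_a, "A")[1, ]

# One color per distance 0, 1, ..., max(d)
pal <- c("black", "red", "orange", "blue")[seq_len(max(d) + 1)]
V(g_a)$color <- pal[d + 1]

# Vertex, distance from A, and assigned color side by side
data.frame(vertex = V(g_a)$name, distance = d, color = V(g_a)$color)
```

Adding 1 to the distances is what maps distance 0 to the first palette entry, since R vectors are indexed from 1.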

Reference

DataCamp: Network Analysis in R, by James Curley