Human displacement worldwide is at the highest level ever recorded. The United Nation’s refugee agency (UNHCR) recently reported that one in every 122 people on the planet is now a displaced individual (a refugee, an internally displaced individual or an asylum seeker).1
Living through displacement is a traumatic event and demands the effective coordination of aid agencies to assist these individuals. Once families and individuals have moved into new locations, they can often be missed by the formal response and support mechanisms. Of the 14.7 million people who were displaced from their homes in 2010, due to conflict or other natural disasters, the majority (an estimated 52%) ended up living outside the formal humanitarian response mechanism of refugee or IDP camps.2
As humanitarian aid agencies, we need to understand where people are moving; the routes they take; and once they arrive in safer locations, who the key contacts are within the communities (especially as people move into the informal support structure provided by host communities instead of the formal humanitarian response mechanisms).
Having access to this type of knowledge will supplement efforts to assist people during the act of migration and once they are settled (temporarily or otherwise). The motivation for Part 1 of this project is based on a requirement that aid agencies may benefit by being able to predict changes in the migration decisions of migrants. Part 2 is motivated by the fact that, in many cases, aid workers cannot access the most vulnerable in communities. To better understand the social networks that can be formed once people settle, is crucial. By modelling the various phenomena, the objective is to uncover novel methodologies that aid agencies can use to understand the problem, in order to be better prepared to respond appropriately.
Consider the following scenario: The regional capital in a given country is currently at the heart of major conflict between government forces and rebel groups. The city will fall in a matter of days to the rebels and many people are fleeing the city. The direction of flight is given in Figure 1:
Figure 1: Direction of flight, risk and major migration points.3
The focus of Part 1 of this work is motivated by the fact that aid agencies may have multiple opportunities to engage with displaced people as they migrate from one location to another. The challenge, however, is that aid agencies may not know where these people are at any given point in time, especially if alternative routes exist. However, if migration paths can be weighted in some fashion, it may be possible to identify logical ‘shortest path’ migration choices.
In Figure 1, the weights along a given path indicate the cost of travelling that path (i.e. the risk to the migrant). For the purpose of illustration, this was selected to be a integer value from a low of 1 to a maximum risk rating of 12 on any given route. Factors influencing a path or route’s weighting may be things such as the presence of official and unofficial checkpoints (the latter being more dangerous); mined roads; the existence of bandits and people smugglers; roads that are targeted as being of military strategic importance et cetera.
Notice that the act of migration (transitioning from one location to another) is a problem well-suited to network modelling and shortest path calculations.
Understanding the logical routes (paths) that people may take has important implications since it may influence the establishment of humanitarian security corridors or can inform how aid agencies should shift their resources to appropriate locations where people are known to be moving to. For example, by understanding the likely routes people will take, aid agencies may be better informed to establish Way Stations and Transit Centres4 in appropriate locations (stocked and staffed accordingly in those geographic spots where people are expect to arrive).
Given the inherent nature of the problem, Graph Theory was applied to model the decision-making process of people in transition. The nature of a graph is to encode rich semantic meaning. Notice there is structure in the graph used to represent Figure 1. This structure can include directionality if required and certainly weighting for risk on each edge.
Two Dynamic Programming algorithms are presented, namely Dijkstra Shortest Path and Floyd-Warshall algorithms, to determine the shortest path between (any) two vertices in the graph. In this work, the algorithms are implemented from scratch to help the reader understand the logic. Of course many software packages now include libraries to automate the process instead.
The data from Figure 1 was put into an appropriate structure for analyzing the problem. This structure includes the ability to set the existing edge weights for the graph. The matrix structure used in Dijkstra algorithm is given here:
## C T1 V1 T2 V2 WS1 WS2 TC1 TC2 RRC
## C 0 8 10 12 0 0 0 0 0 0
## T1 8 0 2 0 0 0 0 0 12 0
## V1 10 2 0 0 2 0 0 0 0 0
## T2 12 0 0 0 0 5 0 10 0 0
## V2 0 0 2 0 0 1 7 0 0 0
## WS1 0 0 0 5 1 0 0 2 0 0
## WS2 0 0 0 0 7 0 0 2 2 0
## TC1 0 0 0 10 0 2 2 0 5 8
## TC2 0 12 0 0 0 0 2 5 0 3
## RRC 0 0 0 0 0 0 0 8 3 0
Where:
| Acronym | Means: |
|---|---|
| C | Regional Capital |
| T1 | Town 1 |
| T2 | Town 2 |
| V1 | Village 1 |
| V2 | Village 2 |
| WS1 | Way Station 1 |
| WS2 | Way Station 2 |
| TC1 | Transit Centre 1 |
| TC2 | Transit Centre 2 |
| RRC | Refugee Registration Centre |
In the data structure for Dijkstra’s algorithm, a weight of zero indicates no path.
Visualizing the graph in R:
The objective is to determine the lowest cost route between the Regional Capital (green vertex) and the final destination point of the Refugee Registration Centre (the red node).
When applying shortest path algorithms with graphs, the elegance of the processing comes from avoiding computationally expensive traversals of the graph. For example, one brute force strategy would be to compute n (the number of nodes) times the traversal paths for every other node (aka vertex) in the graph and then select the route with the minimal weighting. This would entail a lot of processing. However, with the dynamic programming algorithms being presented, the optimal shortest path solution can grow from finding optimal shortest-path sub-solutions. This helps negate some of the expensive and unnecessary processing espoused by a brute force alternative (i.e. finding the optimal solutions to a smaller problem, then growing the larger solution by relaxing constraints as the path is extended out further and further).
Dutch computer scientist Edward Dijkstra published the algorithm in 1959, which solves the single shortest path for a graph with non-negative edge paths, producing a shortest path tree. The pseudo-code is given in the references.
Implementation of Dijkstra’s algorithm O(\({ n }^{ 2 }\))5:
dijkstra <- function(g, start) {
# Set all distances to infinity
distance_g <- rep(Inf, length = vcount(g))
names(distance_g) <- V(g)$name
# Set distance to source vertex to zero
distance_g[start] <- 0
# Q, the queue that initially contains all the vertices
Q <- V(g)$name
# Previous node in optimal path
previous <- rep(NA, length = vcount(g))
names(previous) <- V(g)$name
# While queue is not empty
while(length(Q) > 0) {
# Select the element of Q with the minimum distance
min <- distance_g[Q[1]]
if ( length(Q) > 1 ) {
for (i in 2:length(Q)) {
if (min > (distance_g[Q[i]])) {
min <- distance_g[Q[i]]
}
}
}
u <- names(min)
# Remove u from list of visited vertices
Q <- Q[which(Q!=u)]
# for every neighbour (v) of u
for (v in neighbors(g,u)) {
alt <- distance_g[u] + g[u,v] # weight going from u to it's neighbour v
# if shortest path is found (i.e. weight on neighbour is greater than going through vertex u)
if (alt < distance_g[v]) {
distance_g[v] <- alt
previous[v] <- u
}
}
}
return( list(shortPath = distance_g, previous = previous, lastVertex = u))
}
To determine the shortest path from the Regional Capital (C) to the Refugee Registration Centre (RRC):
results <- dijkstra(g, start = "C")
The shortest path between the source vertex (fixed) and other vertices in the graph is given by:
## C T1 V1 T2 V2 WS1 WS2 TC1 TC2 RRC
## 0 8 10 12 12 13 17 15 19 22
Important note: This does not imply shortest distances from any arbitrary starting point. For example, the shortest starting point from T1 to TC2 is actually an indirect path with cost of 11,6 not 19 as a tertiary glance of output may suggest. This output says that the lowest cost from “C” to TC2 is 19.
What is returned from Dijkstra’s algorithm is information on the shortest path from a single source to every other vertex on the graph (an optimal sub-graph). The Floyd-Warshall algorithm, to be presented shortly, will provide information on all the vertices in one run of the algorithm. This is a substantial difference between the two algorithms.
The final lowest cost of travelling from the Regional Capital to the Refugee Registration Centre can explicitly be extract as:
## RRC
## 22
Alternatively the appropriate igraph library call could be used:
shortest.paths(g, v='C', to='RRC', algorithm = "dijkstra")
## RRC
## C 22
To retrieve the path taken from the results set, a data structure has been added to Dijkstra’s algorithm to store this information. The list item (called “previous”) is traversed backwards from the destination node to the source:
shortDPath <- function(results) {
Seq <- vector(mode = "character", length = 0L)
u <- results$lastVertex
while( !is.na(u) ) {
Seq <- append(Seq, u)
u <- results$previous[u]
}
names(Seq) <- NULL
return(rev(Seq))
}
Seq <- shortDPath(results)
The path is given as:
## C -> V1 -> V2 -> WS1 -> TC1 -> WS2 -> TC2 -> RRC
A slightly more robust version of a shortest path algorithm is given by the work of Bernard Roy (in 1959) and Robert Floyd (1962). Where robustness includes the capability to handle negative edge weights, assuming non-negative cycles. It also has the ability to provide details on the lowest cost of moving from every vertex as a potential source to every other vertex pair in the graph, in a single run of the algorithm. This is achieved by running at Order(\({ n }^{ 3 }\)). Note, Floyd-Warshall will not fail to give the correct solution when using negative weights, while Dijkstra’s may fail.
The matrix representation of the graph in Figure 1, as used by the algorithm is given by:
## C T1 V1 T2 V2 WS1 WS2 TC1 TC2 RRC
## C 0 8 10 12 Inf Inf Inf Inf Inf Inf
## T1 8 0 2 Inf Inf Inf Inf Inf 12 Inf
## V1 10 2 0 Inf 2 Inf Inf Inf Inf Inf
## T2 12 Inf Inf 0 Inf 5 Inf 10 Inf Inf
## V2 Inf Inf 2 Inf 0 1 7 Inf Inf Inf
## WS1 Inf Inf Inf 5 1 0 Inf 2 Inf Inf
## WS2 Inf Inf Inf Inf 7 Inf 0 2 2 Inf
## TC1 Inf Inf Inf 10 Inf 2 2 0 5 8
## TC2 Inf 12 Inf Inf Inf Inf 2 5 0 3
## RRC Inf Inf Inf Inf Inf Inf Inf 8 3 0
For this algorithm, diagonal entries show an edge weight of zero (as with Dijkstra, there is no path from a node back on to itself). Infinity is used as the initial estimate for a cost between vertices where a direct edge does not exist. The remaining values are the initial known costs. This data structure can be set-up to be non-symmetrical (i.e. to enforce directionality if so desired).
Implementation of Floyd-Warshall’s Algorithm O(\({ n }^{ 3 }\))7:
floydWarshall <- function(d) {
# |V(G)| = 10
num_v <- dim(d)[1]
# p[i,j] will store the path (the vertex k), to take in order to get to min path
# Initialized to NaN
p <- matrix(NA, nrow = num_v, ncol = num_v)
dimnames(p) <- dimnames(d)
# pathlist <- c('C', 'T1', 'V1', 'T2', 'V2', 'WS1', 'WS2', 'TC1', 'TC2', 'RRC')
pathlist <- dimnames(d)[[1]]
# Traverse the 'Graph' to find the shorest path:
for (k in 1:num_v) {
for (i in 1: num_v) {
for (j in 1: num_v) {
# This is the key optimization step:
if ( d[i,k] + d[k,j] < d[i,j]) {
# there is a smaller cost by going through the intermediary vertex
# update minimum distance/cost:
d[i,j] <- d[i,k] + d[k,j]
# store the path:
p[i,j] <- pathlist[k]
}
}
}
}
return (list(dist = d, path = p, pathlist = pathlist))
}
results <- floydWarshall(d)
(results$dist)
## C T1 V1 T2 V2 WS1 WS2 TC1 TC2 RRC
## C 0 8 10 12 12 13 17 15 19 22
## T1 8 0 2 10 4 5 9 7 11 14
## V1 10 2 0 8 2 3 7 5 9 12
## T2 12 10 8 0 6 5 9 7 11 14
## V2 12 4 2 6 0 1 5 3 7 10
## WS1 13 5 3 5 1 0 4 2 6 9
## WS2 17 9 7 9 5 4 0 2 2 5
## TC1 15 7 5 7 3 2 2 0 4 7
## TC2 19 11 9 11 7 6 2 4 0 3
## RRC 22 14 12 14 10 9 5 7 3 0
Notice the result set being returned is the lowest cost of moving from any vertex, as a potential source, to every other vertex pair in the graph. In one run of this algorithm, it can ascertain that the true lowest cost moving from T1 to TC2 is indeed 11 (see end-note 6).
Reconstructing the shortest path for this scenario entails working backwards from final to source recursively:
shortFWPath <- function(i, j, dist, path, pathlist) {
if ( is.infinite(dist[i,j]) ) {
return ("No Path")
}
intermediate <- path[i,j]
if ( is.na(path[i,j])) {
return ("->")
} else {
return( c(shortFWPath(i, match(intermediate, pathlist), dist, path, pathlist),
intermediate,
shortFWPath( match(intermediate, pathlist), j, dist, path, pathlist) ) )
}
}
## [1] "The shortest path from the Capital (C) to the Refugee Registration Center (RRC):"
## [1] "C -> V1 -> V2 -> WS1 -> TC1 -> WS2 -> TC2 -> RRC"
## [1] "The mimimal distance has value of: 22"
If it is assumed that the path from Town 1 to the second Transit Centre (TC2) is cleared of paramilitary action in the southern direction, the risk rating is dropped to -2, while the northern direction retains its original risk rating. Floyd-Warshall handles negative weights assuming the weights are none negative cycles.
d <- fwData()
# Change - note Floyd-Warshall works with negatives but not with negative cycles:
d['T1', 'TC2'] <- -2
d['TC2', 'T1'] <- 12
Running the Floyd-Warshall algorithm:
## C T1 V1 T2 V2 WS1 WS2 TC1 TC2 RRC
## C 0 8 10 12 12 12 8 10 6 9
## T1 8 0 2 9 4 4 0 2 -2 1
## V1 10 2 0 8 2 3 2 4 0 3
## T2 12 10 8 0 6 5 9 7 8 11
## V2 12 4 2 6 0 1 4 3 2 5
## WS1 13 5 3 5 1 0 4 2 3 6
## WS2 17 9 7 9 5 4 0 2 2 5
## TC1 15 7 5 7 3 2 2 0 4 7
## TC2 19 11 9 11 7 6 2 4 0 3
## RRC 22 14 12 14 10 9 5 7 3 0
## [1] "The shortest path from the Capital (C) to the Refugee Registration Center (RRC):"
## [1] "C -> T1 -> TC2 -> RRC"
## [1] "The mimimal distance has value of: 9"
This is the correct result: a shortened distance and a new direct path. How does it compare with Dijkstra?
d <- dijskstraData()
d['T1', 'TC2'] <- -2
(d)
## C T1 V1 T2 V2 WS1 WS2 TC1 TC2 RRC
## C 0 8 10 12 0 0 0 0 0 0
## T1 8 0 2 0 0 0 0 0 -2 0
## V1 10 2 0 0 2 0 0 0 0 0
## T2 12 0 0 0 0 5 0 10 0 0
## V2 0 0 2 0 0 1 7 0 0 0
## WS1 0 0 0 5 1 0 0 2 0 0
## WS2 0 0 0 0 7 0 0 2 2 0
## TC1 0 0 0 10 0 2 2 0 5 8
## TC2 0 12 0 0 0 0 2 5 0 3
## RRC 0 0 0 0 0 0 0 8 3 0
g <- graph.adjacency(d, mode = 'directed', weighted = TRUE, diag = FALSE)
results <- dijkstra(g, start = "C")
print(results$shortPath)
## C T1 V1 T2 V2 WS1 WS2 TC1 TC2 RRC
## 0 8 10 12 12 12 8 10 6 9
The results in this instance are correct (again Dijkstra may not fail, while Floyd-Warshall will not fail when negative weights are included).
We can get the sense of a break down in the Dijkstra implementation when retrieving the path:
Seq <- shortDPath(results)
cat(Seq, sep = " -> ")
## C -> T1 -> TC2 -> WS2 -> TC1 -> WS1
E(g, path=Seq)$color <- 'red'
plot(g)
In this example, Dijkstra’s implementation yielded the correct cost but failed in generating the path from Regional Capital (C) to Refugee Registration Centre (RRC).
The essence of the preceding work has been to show how realities in the humanitarian sector can benefit from applying analytic modelling using graphs. These data structures offer an opportunity to encode rich semantic meaning into the analytic tool. Further, the work presented applies well-known algorithms to the problem of identifying optimal paths. Granted, a lot of assumptions have been made, not the least of which is the ability to access accurate and timely information on risks.
This said, the work can yield important information that may be very hard to discern through manual processing, especially as the size of the data gets larger, which will become more apparent in Part 2.
As a final note on Part 1, the choice of algorithms is important. In the preceding work, Floyd-Warshall’s implementations prove to be more robust for the field practitioners’ needs, given the scenario presented (weights can change, we may need to get a full overview of the entire graph’s shortest paths from every vertex to another node).
The following hypothetical community will be represented as a graph and used to run some measures of centrality:
This network can be mapped to an adjacency matrix where a vertex v in the matrix is \({a}_{v,t}\) = 1 if the vertex v is connected to vertex t, 0 otherwise:
## Nelson Bart Apu Barney Lenny Kent
## Nelson 0 1 0 0 0 0
## Bart 1 0 1 0 0 0
## Apu 0 1 0 1 1 1
## Barney 0 0 1 0 1 0
## Lenny 0 0 1 1 0 1
## Kent 0 0 1 0 1 0
Note that the social network will often be very sparse (represented as having many zeros in the table). Diagonal entries would mean a connection to oneself and are by convention denoted as 0 (no connection).
As a graph, it can be shown as follows:
## IGRAPH UNW- 6 7 --
## + attr: name (v/c), weight (e/n)
## + edges (vertex names):
## [1] Nelson--Bart Bart --Apu Apu --Barney Apu --Lenny
## [5] Apu --Kent Barney--Lenny Lenny --Kent
This is clearly a very simple graph, but in the words of Edward Dijkstra, “Simplicity is a prerequisite for reliability.”9 This graph can be used to understand the various measures of centrality.
There are different ways of measuring centrality under graph theory.
Degree Centrality is one such measure that counts the number of other vertices that a particular vertex is connected to. For example, vertex 3 (Apu) has degree centrality of 4
(g.data$deg <- degree(g)) # or rowSums(adj.mat)
## Nelson Bart Apu Barney Lenny Kent
## 1 2 4 2 3 2
Degree centrality is simply the number of nodes that are adjacent to the node in question. People who are popular within a community should have the highest number of friends.
Since this is not normalized, it is hard to compare the raw integer of Apu’s rating of 4, with an integer value from another network of, for example, 20. At a quick glance, it may be tempted to say that the second person has a higher degree of friends in his/her respective network. This argument will not hold, if the second network is comprised of 100 people versus the six in the current network. That is, \(\frac{4}{6} > \frac{2}{100}\)
The maximum number of connections that would be possible in this population of 6 people, would be 5 connections. The maximum in a network of 20, would be 19 (i.e. n - 1). Using a Normalized Degree Centrality is recommended and is given by:
norm.deg.centrality <- function(M) {
n <- dim(M)[2]
apply(M, 2, function(x) {sum(x)/(n-1)})
}
(g.data$norm.cent <- norm.deg.centrality(adj.mat))
## Nelson Bart Apu Barney Lenny Kent
## 0.2 0.4 0.8 0.4 0.6 0.4
Apu has connections with 80% of the community.
An alternative measure that can be used is that of Closeness Centrality. The closeness centrality of a vertex is defined by the inverse of the average length of the shortest paths to/from all the other vertices. This score gives a higher rating to those nodes that have a short path distance to all other nodes.
Recall that the geodesic distance (shortest path) is given by the following:
# call shortest.paths(g) or use what I have already
adj.mat[adj.mat == 0] <- Inf
diag(adj.mat) <- 0
floydWarshall(adj.mat)$dist
## Nelson Bart Apu Barney Lenny Kent
## Nelson 0 1 2 3 3 3
## Bart 1 0 1 2 2 2
## Apu 2 1 0 1 1 1
## Barney 3 2 1 0 1 2
## Lenny 3 2 1 1 0 1
## Kent 3 2 1 2 1 0
The closeness centrality measure, using the shortest path information:
closeless.centrality <- function(M, gp) {
lapply(M[1], function(x) { 1/colSums( shortest.paths(gp) ) } )[[1]]
}
(g.data$clos.cent <- closeless.centrality(adj.mat, g) )
## Nelson Bart Apu Barney Lenny Kent
## 0.08333333 0.12500000 0.16666667 0.11111111 0.12500000 0.11111111
# or simply use igraph closeness function
# ( closeness(g, mode='all') )
Nodes (people) that have low scores, may be of concern to aid workers. They represent individuals who are further away from the central figures in the community, such that if we need to rapidly get information to them, it would be more problematic.
In this simple graph structure, Apu continues to reign supreme as the central figure when wanting to get messages to the community. Notice however, that vertices 2 and 5 (Bart and Lenny respectively), have the second highest scores. If analysts were to rely solely on the normalized centrality scores, Bart (vertex 2) could have been missed as an important member of the community. The reason being that Bart is close to the centre of the network. This attribute is being picked up in the closeness centrality measure.
The concept of bridge nodes is better assessed by the Betweenness Centrality measure, and would be an ideal measure for the purposes of this exercise. This measure counts the number of shortest paths in a network that go through a specific vertex. Those vertices with high betweenness, play a key role in the communication or relaying of information within the network. The betweenness centrality of a vertex is defined as:
# igraph betweenness
( g.data$btw.cent <- betweenness(g) )
## Nelson Bart Apu Barney Lenny Kent
## 0.0 4.0 6.5 0.0 0.5 0.0
Here, Bart, Apu and Lenny stand out as nodes that act as bridges to the other nodes’ shortest paths when communicating information. However, notice how Bart’s score is much higher now than that of Lenny (in comparison to the Closeness Centrality measure). Bart is a vital link in getting information to and from Nelson. Lenny is important in getting information to and from Kent and Barney, but Apu could also do the same. As such, Lenny holds a lower Betweenness score than Bart. Should there be a need to rapidly disseminate information, using the Betweenness Centrality score would be a good idea.
Betweenness Centrality Score of Vertex \({v}_{i}\) = \(\sum _{ { v }_{ s }\neq { v }_{ i }\neq { v }_{ t }\in V,s<t }^{ }{ \frac { { \sigma }_{ st }({ v }_{ i }) }{ { \sigma }_{ st } } }\)
Where \({ \sigma }_{ st }({ v }_{ i })\) is the number (not length) of shortest paths between vertex \({v}_{s}\) and \({v}_{t}\) passing through \({v}_{i}\) and \({ \sigma }_{ st }\) is the number of shortest paths between vertex \({v}_{s}\) and \({v}_{t}\).
For example, the Betweenness Centrality for vertex 5, Lenny, is given by:
# Number of shortest path bwt Nelson and Kent is equal to 1
# This path doesn't use Lenny as a bridge, so the score is 0/1 = 0
all_shortest_paths(g, from = 1, to = 6)
## $res
## $res[[1]]
## + 4/6 vertices, named:
## [1] Nelson Bart Apu Kent
##
##
## $nrgeo
## [1] 1 1 1 0 0 1
# The same will be true for vertex 2 and 6, and vertex 3 and 6
# Number of shortest path bwt Barney and Kent = 2
# 1 many pass through Lenny, so the score is 1/2 = 0.5
all_shortest_paths(g, from = 4, to = 6)
## $res
## $res[[1]]
## + 3/6 vertices, named:
## [1] Barney Lenny Kent
##
## $res[[2]]
## + 3/6 vertices, named:
## [1] Barney Apu Kent
##
##
## $nrgeo
## [1] 0 1 1 1 1 2
| t/s | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| 6 | 0/1 | 0/1 | 0/1 | 1/2 = 0.5 |
A final measure of centrality is given by the Eigenvector Centrality, which is a measure of the influence of a node’s connection in the network to other “popular” nodes. Higher eigenvector centrality scores are given to a node if it connects to many high-scoring nodes. The importance of one node is defined by the importance of that particular node’s “friends”.
( g.data$eign.cent <- evcent(g)$vector )
## Nelson Bart Apu Barney Lenny Kent
## 0.1582229 0.4280856 1.0000000 0.6965127 0.8844756 0.6965127
Nelson (vertex 1) does not connect to many high-scoring nodes (he only connects to Bart) and therefore has a notably lower eigenvector centrality score.
In summary, across the various measures, the following information has been collected for this community:
## Person deg norm.cent clos.cent btw.cent eign.cent
## 1 Nelson 1 0.2 0.08333333 0.0 0.1582229
## 2 Bart 2 0.4 0.12500000 4.0 0.4280856
## 3 Apu 4 0.8 0.16666667 6.5 1.0000000
## 4 Barney 2 0.4 0.11111111 0.0 0.6965127
## 5 Lenny 3 0.6 0.12500000 0.5 0.8844756
## 6 Kent 2 0.4 0.11111111 0.0 0.6965127
How does is this information helpful? One interesting read is that of Conway10 who describes “critical gatekeepers” as people having high betweenness centrality scores and low eigenvector centrality measurements. His interpretation is that these are individuals who connect people in a network who would otherwise be isolated from the core.
library(ggplot2)
library(ggthemes)
ggplot(data = g.data, aes(x = eign.cent, y = jitter(btw.cent))) + geom_point(aes(colour=Person)) +
theme_stata() + ggtitle("Comparing Eigenvector and Betweenness Centrality") +
xlab("Eigenvector Centrality") + ylab("Betweenness Centrality (Jittered)")
The importance of Apu is without question. He is a conduit for passing information onto others in the network (high betweenness centrality) and is also well connected to the most popular of people in this network (high eigenvector centrality). Further, using Conway’s classification, Bart would be viewed as a critical gate-keeper given his high betweenness score and relatively smaller eigenvector centrality score. The lower eigenvector score is due to the fact that Nelson is disconnected from the core network. Since Bart has a high betweenness score, he would be pivotal in getting information to Nelson.
It goes without saying that this is a small example, but the value of Graph and Social Network Analysis is obvious. Consider the case, where just a bit more data has been added, as in this random graph:
set.seed(1970)
h <- erdos.renyi.game(20, 0.3)
plot.igraph(h, vertex.label=V(h)$name, layout=layout.fruchterman.reingold, edge.color='black', edge.weight=E(h)$weights)
This is the case of 20 individuals. Isolating key individuals in the network becomes difficult to do manually. However, vital information on critical gatekeepers or people who act as bridges to the community, is easily achieved using SNA tools.
Extracting the highest scoring individual on the betweenness score:
max(betweenness(h))
## [1] 24.54698
which(betweenness(h) == max(betweenness(h)))
## [1] 15
And finally comparing the “critical gatekeepers”:
df <- data.frame(Person = seq(1,20), btw.cent = numeric(20), eign.cent = numeric(20))
df$btw.cent <- betweenness(h)
df$eign.cent <- evcent(h)$vector
( sorted.df <- df[order(df$btw.cent, df$eign.cent), ] )
## Person btw.cent eign.cent
## 3 3 0.7297619 0.3723812
## 10 10 0.7928571 0.3067415
## 5 5 1.8459707 0.4303435
## 7 7 2.9047619 0.3431103
## 13 13 3.5623016 0.3494582
## 17 17 3.8231685 0.5433029
## 18 18 3.8781136 0.3614756
## 4 4 4.3154762 0.2954993
## 6 6 5.0935287 0.6612864
## 12 12 7.1743590 0.6203213
## 9 9 7.6510073 0.4326130
## 11 11 7.7446581 0.5698994
## 8 8 8.4083333 0.4494209
## 2 2 10.0297619 0.5277134
## 19 19 10.4170635 0.7496878
## 16 16 12.3514652 0.5233444
## 20 20 13.2765263 0.7146714
## 14 14 16.0178571 0.6382962
## 1 1 16.4360501 0.8094327
## 15 15 24.5469780 1.0000000
ggplot(data = sorted.df, aes(x = eign.cent, y = btw.cent, label = Person)) + geom_text() + geom_point(alpha=0.4, colour='red') +
theme_foundation() + ggtitle("Eigenvector and Betweenness Centrality in |V| = 20 Graph") +
xlab("Eigenvector Centrality") + ylab("Betweenness Centrality")
Individuals 1, 14, 15, 16 and 20 may be identified as people to target as being central figures to the network.
The final part of the analysis looks at the question of which individuals inside an undercover network are central. Having this type of information would be a boon to aid staff in getting information to the informal settlements and gathering information from the community. Considering the fact that these undercover networks can grow to be hundreds if not thousands of people, the power of leveraging graphs and social network analysis tools becomes apparent.
Graph theory and social network analysis are methodologies of interest that should be considered when modelling the informal settlements of migrants. These examples are a small representation of what is possible.
UN High Commissioner for Refugees (UNHCR), World at War. Global Trends Forced Displacement in 2014, June 2015, available at: http://unhcr.org/556725e69.html#_ga=1.74777362.1880093282.1447460740 [accessed 14 November 2015]↩
UN High Commissioner for Refugees (UNHCR), IDPs in Host Families and Host Communities: Assistance for Hosting Arrangements, April 2012, available at: http://www.refworld.org/docid/4fe8732c2.html [accessed 25 October 2015].↩
Inspired by Corsellis T. & Vitale, A. (2004). Transitional Settlements Displaced Populations. The ShelterProject.org, University of Cambridge, Cambridge UK. Trial edition.↩
These are interim stopping points for people, typically stocked with clean water, food and medical supplies. Transit Centres provide overnight accommodation. Both way stations and transit centres may also entail preliminary registrations.↩
Pseudo-code from https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm↩
The shortest path is not a direct path (cost would equal 12), but rather an indirect path of T1->V1->V2->WS1->TC1->WS2->TC2↩
Pseudo-code from https://en.wikipedia.org/wiki/Floyd%E2%80%93Warshall_algorithm↩
Krebs, V., Mapping networks of terrorist cells. Connections 24(3), 2002, available at: http://insna.org/PDF/Connections/v24/2001_I-3-7.pdf [accessed 25 October 2015].↩
See https://doccamiryan.wordpress.com/2010/07/07/new-conceptions-around-sna-measures-conway-2009-thanks-to-matt-b/↩