Networks either explicitly, or implicitly represent the flow of resources, information, influence or similar factors through a set of entities. It naturally follows that an analyst may wish to track the potential flow of those factors through the network. It can be particularly helpful to identify the paths or entities that hold the potential to facilitate, impede, or otherwise broker the flow of such factors.
We will be using data from Zachary’s Karate Club network. To download the example data, go to the Network Data link on the course website. You will find the edgelist data Zachary.csv there.
Create a folder labeled “Cohesive Subgroups” someplace on your computer, such as your Desktop or wherever you will be able to easily find it again. Then, set your working directory to that folder in RStudio:
As we move into more involved algorithms, we cannot continue to use just one analytic package. We are therefore going to be availing ourselves of Carter Butts’ “sna” package from this point to the end of the course. Actually, Butts first wrote “network” and “sna,” but has since subsumed those packages into the larger suite called “statnet.” We will, therefore, be referring to this package as “statnet” in this and all following practica.
If you have not done so at this point, go ahead and install statnet.
install.packages("statnet", dependencies=TRUE)
If you have already installed statnet, but it has been a while, you may update it to the newest version as well.
update.packages("statnet")
For more information on installing or updating statnet, feel free to visit their webpage on the topic.
If you like, you may choose to start by loading igraph and statnet at the beginning.
library(igraph)
library(statnet)
If you have elected to load both packages at the outset, you may have noticed a warning that was generated after you loaded the second package.
Because both packages are designed for use in social network
analysis, they often give their functions the same names. Although this
makes perfect sense, it is often cause for some confusion in R. You
will, therefore, see a warning when you load both packages together:
“The following objects are masked from ‘package:igraph’:
”
The packages are “masked,” because the sna package, which is integral to statnet, has eleven functions with the same names as those used in igraph. (betweenness, bonpow, closeness, components, degree, dyad.census, evcent, hierarchy, is.connected, neighborhood, triad.census) Since these functions are “masked” from igraph, R will default to running the sna version of the functions listed above, unless you give it more direction. In other words, you have a workaround: tell R which package you intend to use when you write your script.
The solution to the masking problem is fairly simple. Just identify
the package that you intend to use at the time. So if, for example, you
want to calculate betweenness for a network that you loaded in
igraph, just enter something along the lines of
igraph::betweenness(g)
, or
bet <- igraph::betweenness(g)
. Entering the package
name, followed by double colons (sna::
, or
igraph::
) simply tells R where to look for the function you
need. This is something that come in very handy if you are using a mix
of packages in one script. You don’t have to load each package
separately if you do not so wish.
One last caveat: Although we will be loading
statnet from this point on, most of the basic network
calculations that you have made up to this point are done using
sna, which is a part of the statnet suite. We,
therefore, need to identify sna as the package that we are
using in some cases, and other packages in other cases. When in doubt
about which package actually contains the function that you wish to use,
just check it out by using the help function: ?degree
. In
the upper left-hand side of the help document, you will see the function
name, followed by the package where it resides
(degree {sna}
).
When working with more than one analytic package in R, it is easy to
create confusion. If, for example, you try to use a network that was
loaded in statnet to make calculations in igraph,
igraph will give you a warning:
Error in igraph::degree(net) : Not a graph object
If you forget that you loaded the network in statnet, or if
you forgot to convert the network to an igraph object, this can be
fairly confusing. To help avoid such unnecessary confusion, we’ll use
the following convention to identify whether a network has been loaded
in igraph or statnet. All networks that are loaded
using the statnet suite will use “net
”, and all
networks loaded in igraph will use “g
.”
Using this convention, we may now load the network both ways.
You can downoad the example data here
el <- read.csv(file.choose(), header = FALSE) # Read in the edgelist
g <- graph.data.frame(el, directed = TRUE) # Load with igraph
net<-network(el, directed=TRUE, matrix.type="edgelist") # Load with network (for sna)
If you are not starting from an edgelist or matrix, then you still
have the option of converting a network data object from igraph to sna,
or from sna to igraph, using the intergraph
package. If you
would like to briefly explore your options with intergraph, visit this
short
tutorial in using its functions.
Practicum 4 already
introduced how to decipher the information about an igraph
network object. The sna package is a little more
straightforward. Type net
into the R console to get the
details on the network object that sna uses.
net
## Network attributes:
## vertices = 34
## directed = TRUE
## hyper = FALSE
## loops = FALSE
## multiple = FALSE
## bipartite = FALSE
## total edges= 155
## missing edges= 0
## non-missing edges= 155
##
## Vertex attribute names:
## vertex.names
##
## No edge attributes
This was actually created in a package called “network.” The
summary tells us that the network has 34 nodes, directed ties, no loops,
etc. Unlike igraph, the default summary does not list the actual edges.
To see the edgelist, along with the above information, use
summary(net)
.
For more information on how to read the summary, you can check out the original (2008) paper introducing the network package.: http://www.jstatsoft.org/v24/i02/.
There are a lot of ways to operationalize community structure. The simplest, and most straightforward is to simply establish what components there are in the network, so that one may then focus only on the largest - or main - component.
Running component identification and extraction in igraph is
particularly simple. The function name is “decompose
” and
it isn’t as gross as it sounds. As the name implies, the function will
decompose a network into a list of subnetworks that represent all of
either, the strong components, or the weak components, contained within
the network.
# ?decompose # for more information on options
dg <- decompose(g)
dg # Check out the subnetworks, if any.
## [[1]]
## IGRAPH f5cda7f DN-- 34 155 --
## + attr: name (v/c)
## + edges from f5cda7f (vertex names):
## [1] 1 ->3 1 ->4 1 ->5 1 ->6 1 ->7 1 ->8 1 ->9 1 ->11 1 ->12 1 ->13
## [11] 1 ->14 1 ->18 1 ->20 1 ->22 1 ->32 2 ->1 2 ->3 2 ->4 2 ->8 2 ->14
## [21] 2 ->18 2 ->20 2 ->22 2 ->31 3 ->1 3 ->2 3 ->4 3 ->8 3 ->9 3 ->10
## [31] 3 ->14 3 ->28 3 ->29 3 ->33 4 ->1 4 ->2 4 ->3 4 ->8 4 ->13 4 ->14
## [41] 5 ->1 5 ->7 5 ->11 6 ->1 6 ->7 6 ->11 6 ->17 7 ->1 7 ->5 7 ->6
## [51] 7 ->17 8 ->1 8 ->2 8 ->3 8 ->4 9 ->1 9 ->3 9 ->31 9 ->33 9 ->34
## [61] 10->3 10->34 11->1 11->5 11->6 12->1 13->1 13->4 14->1 14->2
## [71] 14->3 14->4 14->34 15->33 15->34 16->33 16->34 17->6 17->7 18->1
## + ... omitted several edges
The Karate Club network has only one component. So, this example won’t offer a particularly interesting result. But, hopefully, you get the idea.
If there are multiple components contained within the network, then
the object you create, as we did in the example above (dg
),
will contain a list of those components. You can simply type in the name
of the object (dg
) to see them. In many cases, these lists
can be quite long. So, you can consider the edgelist for any one of them
by referencing them using double square brackets, like so:
dg[[1]]
Another way of limiting the number of components that you extract is
to limit the size of components that you consider to be meaningful. This
is especially helpful if the network is particularly large, or contains
a lot of isolates. To do so, set the minimum number of vertices to some
number greater than two. For more information on this and other options,
check out the help section: ?decompose
.
dg <- decompose(g, mode="weak",
min.vertices = 5)
You’ll notice, above, that the subnetworks that are pulled using the
decompose
function are stored as edgelists. That means that
you may use them as igraph objects just as they are. Use square brackets
as we discuss above to identify which component you would like to
plot.
plot.igraph(dg[[1]]) # Plot the largest one
Another option you will want to consider is saving the component you are interested in analyzing under a new name so that you can focus on it without constantly having to reference the network decomposition.
Note the double square brackets [[]]
in the code below.
They are used to reference one or more specific elements in a vector of
elements. If there had been more than one component in the object we
named dg
, then each would be stored inside the
dg
object and they would be organized by number in a
vector. So, to extract the the third component from the network, you
would enter dg[[3]]
. In this case, there was only one
(dg[[1]]
), so we use the script below to save that
particular subnetwork under a new name.
g2 <- dg[[1]] # Save the largest as a new network object
# Next, go on to analyze the subnetwork
In practice, we will frequently be interested in only the very largest component in a network.
To automatically extract the largest component you can use the
which.max
function. which.max
will identify
the largest of the components. We use which.max
in
conjunction with the sapply
function. sapply
is a function that will iterate a command over all items in a list. In
this case, we are asking R to evaluate the order (number of nodes) in
each of the components in dg
using the vcount
function in igraph. which.max
will identify the largest
one, to be saved in a new object named “max.g
”. We can then
extract that subnetwork from the dg
object and save it as
“g2
”.
dg <- decompose.graph(g)
max.g <- which.max(sapply(dg, vcount))
g2 <- dg[[max.g]]
# Then use g2 as you would above.
Of course, there is only one component in this network. So,
g2
will look exactly like g
this time. For
larger, disconnected, networks, this will be an important part of your
preprocessing before further analysis.
Extracting components in statnet is a bit more involved. The
decomposition of the network is not automatic in statnet. You
will therefore need to identify which nodes belong in which components,
and then use that information to extract
(get.inducedSubgraph
) a subset of vertices - and the edges
between them - using that information.
Note: Since we are using statnet for the following
example, you will notice that we have used the naming convention of
net
to signify that this is a network object that is
suitable for use in statnet, not igraph. We will do this for all network
objects used by statnet.
Here is the process for investigating and extracting components in statnet.
You will notice net %v% "component"
in the code below.
This is one way to either extract or add attributes to the vertices in
statnet. If you read it literally, it is saying: “in the network
named”net
” there is (or shall be) an attribute of the
vertices (%v%
) named "component"
. The code is
essentially set up as network_name %v% "attribute_name"
. As
such, this replaces both get.vertex.attribute()
and
set.vertex.attribute()
. You can also do this for all edges
or arcs, using %e%
. For more information, see
?get.vertex.attribute()
.
# ?component.dist # for more information on options
cd<-component.dist(net,
connected="weak") # Run components algorithm for "weak" components
net %v% "component" <- cd$membership # Add "component" to vertex attributes
cd$membership # Check out the membership to see what is prevalent
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
cd$csize # Check out the size of the various components
## [1] 34
subnet <- get.inducedSubgraph(net,v=which(net %v% "component"==1)) # Component 1 was largest
subnet # Check out the new subnetwork
## Network attributes:
## vertices = 34
## directed = TRUE
## hyper = FALSE
## loops = FALSE
## multiple = FALSE
## bipartite = FALSE
## total edges= 155
## missing edges= 0
## non-missing edges= 155
##
## Vertex attribute names:
## component vertex.names
##
## No edge attributes
gplot(subnet,
usearrows = FALSE,
displaylabels = TRUE)
Although the concept of cliques - maximal complete subgroups - is fairly easy to grasp, they can be somewhat difficult to analyze appropriately.
Remember that a clique is a maximal subset of a network, where each node in the set is adjacent to all other nodes in the set. The term “maximal” just means that every node that can be included in the subset has been included, and that no additional nodes may be included without violating its definition: [density(clique)=1.0]
Eliciting cliques is fairly straightforward. For this to be somewhat more realistic in social terms, we’ll adhere to the practice of considering only cliques comprised of three or more nodes to be meaningful.
We’ll start in igraph. To see your clique analysis options
in igraph, use the help function:
?igraph::cliques
First, we can find the number of cliques in a network using the
count_max_cliques()
function. Again, we are only interested
in cliques sized 3 or greater.
Note: By default, igraph treats networks as being undirected when running all operations on cliques. If you have a directed network, then you will get a warning about this every time you do a clique analysis.
count_max_cliques(g, min=3) # counts the number of cliques (minimum size=3)
## [1] 25
We can see that there are 25 cliques of size three or greater in this network.
We are often interested in the largest clique or cliques in a
network. For information on the size of the largest clique, use
clique_num()
.
clique_num(g) # size of largest clique(s)
## [1] 5
As we can see, the largest clique, or cliques, contain five nodes.
From here, we can extract the cliques.
# To list the cliques out on your screen
max_cliques(g, min=3) # all maximal cliques
# Alternately, create an object and call up the cliques
# one by one.
mc <- max_cliques(g, min=3)
mc[[1]] # View clique #1
mc[[21]] # View clique #21
mc[[23]] # View clique #23
We can see above, in count_max_cliques
, that there are
25 cliques with three or more nodes. We did not run max_cliques here,
since it would take a lot of space to display all 25 cliques. But, you
are welcome to do so.
To save space, we save the results as a new object, named
“mc
.” You can see all the cliques by simply typing in
mc, or by using the double square braces ([[]]
) as
we did above to identify which of the 25 cliques (or subnetworks) whose
membership we are interested in seeing.
Try this for yourself. The function will give you a list of all cliques, as we define them. Again, the word “maximal” just means that you cannot add more nodes to the clique without violating the definition of a clique.
We may also turn our focus to just the largest clique or cliques,
largest_cliques(g) # all largest cliques
## Warning in largest_cliques(g): At core/cliques/maximal_cliques_template.h:269 :
## Edge directions are ignored for maximal clique calculation.
## [[1]]
## + 5/34 vertices, named, from 752bfa8:
## [1] 2 1 4 3 8
##
## [[2]]
## + 5/34 vertices, named, from 752bfa8:
## [1] 2 1 4 3 14
cliq <- largest_cliques(g) # get the largest cliques of g
## Warning in largest_cliques(g): At core/cliques/maximal_cliques_template.h:269 :
## Edge directions are ignored for maximal clique calculation.
The largest_cliques
function lists two cliques, each
with five vertices. If you save this information as an object (which we
named “cliq
” in this case), you may then plot the largest
cliques.
As above, the double square brackets ([[]]
) are used to
reference items within the list of elements in the object we named
“cliq
”.
plot(induced_subgraph(g, cliq[[1]]))
plot(induced_subgraph(g, cliq[[2]]))
You can also get a little more fancy and plot them side-by-side. The
length
function measures how many items are in the list of
elements in the object. for()
is a type of function that
will loop a function of set of functions oer a specified number of
times. In this case, you can read this as: for
each number
(i
) from 1 to however many are in the cliq
object (1:length(cliq)
), do the following… The
i
in plot(induced_subgraph(g, cliq[[i]]))
will
be replaced in each of a series of numbers. In this case, there are only
two subgraphs in cliq
.
par(mfrow = c(1, 2)) # set graphics to plot 1 row, 2 cols
for(i in 1:length(cliq)){ # plot the 2 largest cliques
plot(induced_subgraph(g, cliq[[i]]))
}
par(mfrow = c(1, 1)) # Reset the plot window
# so you are not stuck
# with 2 columns
As mentioned, that was the fancy version. You can accomplish the same thing by specifying each network to be plotted.
par(mfrow = c(1, 2)) # set graphics to plot 1 row, 2 cols
plot(induced_subgraph(g, cliq[[1]])) # plot one of the 2 largest cliques
plot(induced_subgraph(g, cliq[[2]])) # plot the other of the 2 largest cliques
par(mfrow = c(1, 1)) # Reset the plot window
# so you are not stuck
# with 2 columns
This will accomplish the same thing as the code above.
Statnet goes a little farther in the analysis of cliques. Though, it is also a little more involved.
Note, since we are using statnet for the following
example, you will notice that we have used the naming convention of
net
for all network objects used by statnet. (This is the
last time we will mention it.)
census <- clique.census(net, mode="graph")
ls(census) # Unpack the "census" object
## [1] "clique.count" "cliques"
The census
object that was generated using the
clique.census
function in statnet has two types of
information. We look at each, below.
census$clique.count # Number of cliques for each vertex
## Agg 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 11 2 1 3 0 0 0 0 0 0 2 0 1 0 1 0 0 0 0 0 1 0 0 0 1 1 1 0
## 3 21 9 3 2 1 2 3 3 0 2 0 2 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1
## 4 3 1 0 1 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
## 5 2 2 2 2 2 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
## 28 29 30 31 32 33 34
## 1 0 0 0 0 0 0 0
## 2 2 1 0 1 1 0 3
## 3 1 1 1 0 3 7 9
## 4 0 0 1 1 0 2 2
## 5 0 0 0 0 0 0 0
The clique.count
shows how many times each node appears
in a clique of a particular size. In this case, we have no cliques
larger than five nodes, so the y axis only goes up to five.
census$cliques # List all cliques
We have suppressed the output for the list of actual cliques. It was too long. So, try this on your own.
Though, as with igraph, you can use this information to plot. Though, the plots are a little more versatile.
First, let’s change the layout of the network to better reflect
similarity between nodes by clique overlap. To do so, run the
clique.census
again. But, this time, add the option to save
a “co-membership” matrix (clique.comembership="sum"
). You
are summing up all the times that two nodes appear in the same clique
These sums are saved as links between all pairs of nodes, in matrix
format. Because that is all we are interested in this time, that is the
only output we are preserving from this clique census. That is why we
have “$clique.comemb
” included on the outside paren.
The co-membership matrix provides some advantages. You may use the clique overlap counts as a means for understanding how similar the various nodes are to one another. One of the more well-used methods for doing this is to create coordinates using a technique known as multidimensional scaling (MDS). MDS will create a distance matrix based on overlap, such that nodes with more overlap will have smaller distances between them in an n-dimensional space.
For more information on MDS, consult this conceptual tutorial. Or, you can investigate your options in R. And, for those who wish to go deeper, here is a chapter that provides a more comprehensive background into the procedure.
comemb <- clique.census(net, clique.comembership="sum")$clique.comemb # Co-membership matrix
coords <- cmdscale(1/(1+comemb)) # Perform an MDS (multidimensional scaling)
# the more overlap,
# the "closer" the vertices
gplot(net,
coord=coords,
usearrows=FALSE,
displaylabels=TRUE)
Using an inverse of multidimensional scaling (see:
?cmdscale
), you just produced a distance matrix of
similarities and used that to plot the network in statnet. (The
“1+” part was just to get rid of the zeroes in the matrix.) This
effectively re-envisions the social space according to clique
overlap.
You may have noticed that gplot
, the visualizer for
statnet, operates differently from plot.igraph
.
For a fairly complete list of the different options in
gplot
, check the help section; ?gplot
.
We can also reuse the co-membership matrix to make a weighted
visualization of the network using co-membership as the weights.
The CoNet%e%'weight'
code chunk is similar to igraph’s
E(g)$attribute
function.
CoNet <- network(comemb, matrix.type="adjacency",
directed=FALSE,
ignore.eval=FALSE, # Keep edge weights
names.eval="weight") # names.eval stores edge weights
# as edge attribute named "weight".
CoWeights <- as.sociomatrix(CoNet,"weight")
gplot(net,coord=coords,
usearrows=FALSE,
displaylabels=TRUE,
edge.lwd=CoNet%e%'weight'*2) # Use the edge attribute, "weight" (multiplied by 2
# to emphasize differences).
Last, to get a better idea of node similarity, we can use a dendrogram to visualize the similarity by co-membership. We generally use hierarchical clustering to establish a “hierarchy” of similarities.
hc <- hclust(as.dist(1/(CoWeights+1))) # Hierarchal clustering by co-membership
plot(hc) # Plot the dendrogram
# Optional:
rect.hclust(hc, h=0.6) # Plot a cutoff point
# Use the rect.hclust() to make the pattern easier to see.
We can find k-cores with both igraph and statnet.
coreness(g)
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
## 8 8 8 8 6 6 6 8 8 4 6 2 4 8 4 4 4 4 4 6 4 4 4 6 6 6
## 27 28 29 30 31 32 33 34
## 4 6 6 6 8 6 8 8
As you can see, the coreness function produces a table listing to which core each node belongs. Vertex names are on the top row of the table, and core classifications are on the bottom row.
We can use this to plot the network.
kcore <- coreness(g) # Extract k-cores as a data object.
V(g)$core <- kcore # Add the cores as a vertex attribute
plot.igraph(g, vertex.color=V(g)$core) # plot
If you would like to extract one or more of the cores as a subnetwork, then first consider the values of the various cores. Remember that k-cores are assigned according to degree within each core. That means that the densest core will have the greatest values.
In this case, let’s extract the densest core. Then, we can relax our criterion a little and extract a slightly larger subset.
To start, consider the values of the various cores.
kcore
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
## 8 8 8 8 6 6 6 8 8 4 6 2 4 8 4 4 4 4 4 6 4 4 4 6 6 6
## 27 28 29 30 31 32 33 34
## 4 6 6 6 8 6 8 8
It may also help to see how many nodes are included in each core. (Keep in mind that smaller valued cores contain all larger valued cores.)
table(kcore)
## kcore
## 2 4 6 8
## 1 11 12 10
As you can see, the greatest value is eight (8), so all nodes with a k-core value of eight (the 8-core) will be in the densest portion of the network.
We can extract the 8-core(s) from the network using the
induced_subgraph()
function. The arguments you will need
are the name of the network, from which you are extracting some nodes
and their accompanying ties, and a list of the nodes you wish to
extract. In this case we can use the information contained in the
“kcore” object that we made when we ran the k-cores algorithm.
To extract only the densest core, we’ll use only those nodes that reside in that core (the 8-core).
Some operators you may need for this:
Operator | Meaning |
---|---|
> | Greater than |
< | Less than |
== | Equal to |
>= | Greater than or equal to |
<= | Less than or equal to |
Note that the “==” double equal sign used in the argument. That is
used because R, like other programming languages, uses =
to
indicate assignment, just like “<-
”. The double equal
sign is used to say that we mean “equals” in the mathematical sense.
g8c <- induced_subgraph(g, kcore==8)
To extract a slightly larger and less dense core, we can extract all nodes in the 6-core (there is no 7-core in this network), which also contains the 8-core.
Recall that lower value cores should always include the higher value cores. Because k-cores identify dense regions in the network, they are defined as regions where each node is connected with at least k other nodes. Therefore, everyone in the 8-core also qualifies for membership in the 6-core, since they are connected with at least 6 alters in the core.
g6c <- induced_subgraph(g, kcore>=6) # The value of the k-core is greather than
# or equal to 6
Feel free to plot either, or both.
plot(g8c)
plot(g6c)
Running k-cores in statnet is not all that involved.
Keep in mind, this is a function of relative density. So, degree within the core is what is being calculated.
kcores(net)
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
## 8 8 8 8 6 6 6 8 8 4 6 2 4 8 4 4 4 4 4 6 4 4 4 6 6 6
## 27 28 29 30 31 32 33 34
## 4 6 6 6 8 6 8 8
For both igraph and statnet, we can clearly see that there are four cores. There is one node in the two core, there are eleven nodes in the four core, twelve nodes in the six core, and there are ten nodes in the eight core.
As with igraph, we can use this information to plot. The difference is, we do not need to make the cores into a network attribute first.
set.seed(18675309)
kco <- kcores(net)
gplot(net,
vertex.col=kco,
usearrows=FALSE,
displaylabels=TRUE)
If you consider the values of each of the cores, you will note that the 8-core is the densest region. We can extract that region for further analysis.
net_kc <- net[kco>7, kco>7]
net_kc
## 1 2 3 4 8 9 14 31 33 34
## 1 0 0 1 1 1 1 1 0 0 0
## 2 1 0 1 1 1 0 1 1 0 0
## 3 1 1 0 1 1 1 1 0 1 0
## 4 1 1 1 0 1 0 1 0 0 0
## 8 1 1 1 1 0 0 0 0 0 0
## 9 1 0 1 0 0 0 0 1 1 1
## 14 1 1 1 1 0 0 0 0 0 1
## 31 0 1 0 0 0 1 0 0 1 1
## 33 0 0 1 0 0 1 0 1 0 1
## 34 0 0 0 0 0 1 1 1 1 0
We can also plot just that region.
gplot(net[kco>7, kco>7],
usearrows=FALSE,
displaylabels=TRUE)
We can also extract the region as a subnetwork for further analysis as we did in igraph.
net8c <- get.inducedSubgraph(net, which(kco==8))
plot(net8c, displaylabels=TRUE)
For this and all the following, we’ll stay in igraph.
Note: you may need to convert a directed graph to undirected for some of these. If you receive a warning about directed networks, use the following to make the network undirected.
g <- as.undirected(g, mode="collapse")
Now, run the edge betweenness algorithm and then save the membership as a node attribute in the network.
eb <- cluster_edge_betweenness(g)
You have a few alternatives that you can use to check to see which
nodes are in which groups. Here, we’ll present some of the built-in
options. Consider that the cluster_edge_betweenness
function produces a lot of output that you will not immediately see
until it is unpacked. To take a look at what is produced, examine the
object that we just created to store the algorithm’s output:
ls(eb)
. This will list the contents of the object that you
just made called “eb
.”
A lot of information is packed into the eb
object. The
igraph package can unpack these for you. For a list of the, consult
?communities
. Here, we’ll use a few of these built-in
functions to see how the nodes in the network were grouped by the edge
betweenness algorithm.
For an overview of which nodes are in which group, use the
membership
function.
membership(eb)
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
## 1 1 2 1 3 3 3 1 4 5 3 1 1 1 4 4 3 1 4 1 4 1 4 4 2 2
## 27 28 29 30 31 32 33 34
## 4 2 2 4 4 2 4 4
You can store this information in the network as a vertex attribute. Below, we’ll call the attribute “groups.” Though, if you are going to store the results of a few different conmmunity structure algorithms, it is a good idea to use a more specific attribute name, such as “groups_eb” so that you can tell them apart later.
V(g)$group <- membership(eb) # assign membership to vertices
You can also count the number of nodes in each group (community) using:
sizes(eb)
## Community sizes
## 1 2 3 4 5
## 10 6 5 12 1
Looking at the groupings, above, it is apparent that groups one (1) and four (4) have the most members. If you like, you can extract each.
When doing this, remember that you added the membership to the network object as a vertex attribute names “group.” We can use that to extract each group.
g_eb1 <- induced_subgraph(g, V(g)$group==1) # extract nodes in group 1
g_eb4 <- induced_subgraph(g, V(g)$group==4) # extract nodes in group 4
plot(g_eb1)
plot(g_eb4)
To view the groups in a visualization, you will need to first supply the object with the grouping information, and then the network to which the groups belong.
Here, we are also adding a title (main=
), labeling the
vertices by their index number (vertex.label=
), and
specifying which layout we want to use (layout=
). If you
are just taking a quick look, then plot(eb, g)
will
work.
plot(eb, g,
vertex.label = V(g)$id, # Uses index number instead of name
layout = layout_with_kk,
main="Max Modularity Solution")
# This overlays with a community polygon.
# internal edges = dark green
# external edges = red
This solution looks a little odd to me. This is especially true if you consider that node 10 is in its own group. To help decide what may be a better cut, we can use a dendrogram to get a better idea of who is in which group each time the network was parsed.
By default, the edge betweenness function will produce a group structure that corresponds with the greatest modularity value. However, the greatest modularity value is not always the best representation of how the network should be parsed.
To get a quick look at the various ways that the algorithm could
divide the network, use a dendrogram. In the dendrogram, the
mode="hclust"
argument allows us to use the
rect=2
argument to help identify what a two group solution
would look like. Try changing the number of groups to see what the
various solutions would look like.
plot_dendrogram(eb,
mode="hclust",
rect=2)
Read the dendrogram from top to bottom. If you were to delete the top
horizontal line, then you would separate the network into two groups,
like we have identified using the rect=2
argument. The next
cut (remove the next horizontal line in your mind) produced three
groups, but node ten was then on its own as a group unto itself.
The dendrogram should give you a little better idea of how the network may be parsed and the membership of each group as it is parsed using a particular algorithm. In this case, we can see how the Karate Club network would divide into two groups using edge betweenness.
If, you would rather seek a two community solution rather than the solution that is identified as “optimal” using modularity (the default for edge betweenness and other algorithms), you may specify that - for any of the following algorithms. The example below uses edge betweenness, but you can substitute any other community detection algorithm that uses modularity.
V(g)$group <- membership(eb)
# Change the number of communities in the solution.
eb$membership <- cut_at(eb, 2) # 2 communities
plot(eb, g,
vertex.label = V(g)$id, # Uses index number instead of name
layout = layout_with_kk,
main="Two-Community Solution")
You may incorporate the above in the remaining algorithms on this
page.
sg <- cluster_spinglass(g)
plot(sg, g)
# Save your work in a data frame.
SG.Communities <- data.frame(sg$membership, sg$names)
head(SG.Communities ) # Take a peek
## sg.membership sg.names
## 1 2 1
## 2 2 2
## 3 2 3
## 4 2 4
## 5 4 5
## 6 4 6
Spinglass will allow the user to specify the maximum number of
communites using the spins
argument. To see the two
community solution for spinglass, set spins
equal to 2.
sg_2 <- cluster_spinglass(g, spins=2)
plot(sg_2, g,
vertex.label = V(g)$id, # Uses index number instead of name
layout = layout_with_kk,
main="Two-Community Spinglass Solution")
sg_results_members<- list()
sg_results_modularity <- c()
for (i in 1:5) {
set.seed(i)
sg <- cluster_spinglass(g)
sg_membership <- list(sg$names)
sg_names <- list(sg$membership)
sg_mod <- list(cluster_spinglass(g)$modularity)
sg_results_members[i] <- sg_membership
sg_results_modularity <- c(sg_results_modularity, sg_mod)
}
sg_results_modularity %>% unlist() %>% mean()
sg_results_modularity %>% unlist() %>% max()
sg_results_modularity %>% unlist() %>% min()
sg_results_modularity %>% unlist() %>% median()
best_sg <- which(max(unlist(sg_results_modularity)) %in% sg_results_modularity)
spins <- data.frame(sg_membership[best_sg],
sg_names[best_sg]) %>%
`colnames<-`(c("vertex", "group"))
head(spins) # Take a peek
lvc <- cluster_louvain(g)
sizes(lvc)
## Community sizes
## 1 2 3 4
## 12 5 11 6
Louv <- data.frame(lvc$membership,
lvc$names)
head(Louv)
## lvc.membership lvc.names
## 1 1 1
## 2 1 2
## 3 1 3
## 4 1 4
## 5 2 5
## 6 2 6
plot(lvc, g, vertex.label=V(g)$id)
gBlocks <- cohesive_blocks(g)
blocks(gBlocks)
## [[1]]
## + 34/34 vertices, named, from 60c2cd2:
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30 31 32 33 34
##
## [[2]]
## + 28/34 vertices, named, from 60c2cd2:
## [1] 1 2 3 4 8 9 10 13 14 15 16 18 19 20 21 22 23 24 25 26 27 28 29 30 31
## [26] 32 33 34
##
## [[3]]
## + 6/34 vertices, named, from 60c2cd2:
## [1] 1 5 6 7 11 17
##
## [[4]]
## + 5/34 vertices, named, from 60c2cd2:
## [1] 1 2 3 4 8
##
## [[5]]
## + 7/34 vertices, named, from 60c2cd2:
## [1] 1 2 3 9 31 33 34
##
## [[6]]
## + 5/34 vertices, named, from 60c2cd2:
## [1] 1 5 6 7 11
##
## [[7]]
## + 5/34 vertices, named, from 60c2cd2:
## [1] 1 2 3 4 14
##
## [[8]]
## + 10/34 vertices, named, from 60c2cd2:
## [1] 3 24 25 26 28 29 30 32 33 34
igraph::hierarchy(gBlocks)
## IGRAPH 60fe8be D--- 8 7 --
## + edges from 60fe8be:
## [1] 1->2 1->3 2->4 2->5 3->6 2->7 2->8
parent(gBlocks)
## [1] 0 1 1 2 2 3 2 2
cohesion(gBlocks)
## [1] 1 2 2 4 3 3 4 3
max_cohesion(gBlocks)
## [1] 4 4 4 4 3 3 3 4 3 2 3 1 2 4 2 2 2 2 2 2 2 2 2 3 3 3 2 3 3 3 3 3 3 3
graphs_from_cohesive_blocks(gBlocks, g)
## [[1]]
## IGRAPH 4938e9d UN-- 34 78 --
## + attr: name (v/c), core (v/n), group (v/n)
## + edges from 4938e9d (vertex names):
## [1] 1 --2 1 --3 2 --3 1 --4 2 --4 3 --4 1 --5 1 --6 1 --7 5 --7
## [11] 6 --7 1 --8 2 --8 3 --8 4 --8 1 --9 3 --9 3 --10 1 --11 5 --11
## [21] 6 --11 1 --12 1 --13 4 --13 1 --14 2 --14 3 --14 4 --14 6 --17 7 --17
## [31] 1 --18 2 --18 1 --20 2 --20 1 --22 2 --22 24--26 25--26 3 --28 24--28
## [41] 25--28 3 --29 24--30 27--30 2 --31 9 --31 1 --32 25--32 26--32 29--32
## [51] 3 --33 9 --33 15--33 16--33 19--33 21--33 23--33 24--33 30--33 31--33
## [61] 32--33 9 --34 10--34 14--34 15--34 16--34 19--34 20--34 21--34 23--34
## [71] 24--34 27--34 28--34 29--34 30--34 31--34 32--34 33--34
##
## [[2]]
## IGRAPH 9fbfd61 UN-- 28 67 --
## + attr: name (v/c), core (v/n), group (v/n)
## + edges from 9fbfd61 (vertex names):
## [1] 1 --2 1 --3 2 --3 1 --4 2 --4 3 --4 1 --8 2 --8 3 --8 4 --8
## [11] 1 --9 3 --9 3 --10 1 --13 4 --13 1 --14 2 --14 3 --14 4 --14 1 --18
## [21] 2 --18 1 --20 2 --20 1 --22 2 --22 24--26 25--26 3 --28 24--28 25--28
## [31] 3 --29 24--30 27--30 2 --31 9 --31 1 --32 25--32 26--32 29--32 3 --33
## [41] 9 --33 15--33 16--33 19--33 21--33 23--33 24--33 30--33 31--33 32--33
## [51] 9 --34 10--34 14--34 15--34 16--34 19--34 20--34 21--34 23--34 24--34
## [61] 27--34 28--34 29--34 30--34 31--34 32--34 33--34
##
## [[3]]
## IGRAPH 445db92 UN-- 6 10 --
## + attr: name (v/c), core (v/n), group (v/n)
## + edges from 445db92 (vertex names):
## [1] 1--5 1--6 1--7 5--7 6--7 1--11 5--11 6--11 6--17 7--17
##
## [[4]]
## IGRAPH 7f18be2 UN-- 5 10 --
## + attr: name (v/c), core (v/n), group (v/n)
## + edges from 7f18be2 (vertex names):
## [1] 1--2 1--3 2--3 1--4 2--4 3--4 1--8 2--8 3--8 4--8
##
## [[5]]
## IGRAPH 25aaa17 UN-- 7 13 --
## + attr: name (v/c), core (v/n), group (v/n)
## + edges from 25aaa17 (vertex names):
## [1] 1 --2 1 --3 2 --3 1 --9 3 --9 2 --31 9 --31 3 --33 9 --33 31--33
## [11] 9 --34 31--34 33--34
##
## [[6]]
## IGRAPH dbe38dd UN-- 5 8 --
## + attr: name (v/c), core (v/n), group (v/n)
## + edges from dbe38dd (vertex names):
## [1] 1--5 1--6 1--7 5--7 6--7 1--11 5--11 6--11
##
## [[7]]
## IGRAPH 94e5950 UN-- 5 10 --
## + attr: name (v/c), core (v/n), group (v/n)
## + edges from 94e5950 (vertex names):
## [1] 1--2 1--3 2--3 1--4 2--4 3--4 1--14 2--14 3--14 4--14
##
## [[8]]
## IGRAPH ad628e3 UN-- 10 20 --
## + attr: name (v/c), core (v/n), group (v/n)
## + edges from ad628e3 (vertex names):
## [1] 24--26 25--26 3 --28 24--28 25--28 3 --29 24--30 25--32 26--32 29--32
## [11] 3 --33 24--33 30--33 32--33 24--34 28--34 29--34 30--34 32--34 33--34
plot(gBlocks, g,
layout = layout_with_kk)
plot_hierarchy(gBlocks)
eb <- cluster_edge_betweenness(g)
algorithm(eb) # algorithm used to create community object
## [1] "edge betweenness"
is_hierarchical(eb) # hierarchical?
## [1] TRUE
sizes(eb) # community szes
## Community sizes
## 1 2 3 4 5
## 10 6 5 12 1
compare(eb, lvc) # compare community objects
## [1] 0.4600528
as.dendrogram(eb) # converts to a dendrogram object
## 'dendrogram' with 2 branches and 34 members total, at height 33
plot_dendrogram(eb)
plot_dendrogram(eb, mode = "hclust")
data.frame(eb$membership, # which vertex in which group
eb$names)
## eb.membership eb.names
## 1 1 1
## 2 1 2
## 3 2 3
## 4 1 4
## 5 3 5
## 6 3 6
## 7 3 7
## 8 1 8
## 9 4 9
## 10 5 10
## 11 3 11
## 12 1 12
## 13 1 13
## 14 1 14
## 15 4 15
## 16 4 16
## 17 3 17
## 18 1 18
## 19 4 19
## 20 1 20
## 21 4 21
## 22 1 22
## 23 4 23
## 24 4 24
## 25 2 25
## 26 2 26
## 27 4 27
## 28 2 28
## 29 2 29
## 30 4 30
## 31 4 31
## 32 2 32
## 33 4 33
## 34 4 34
This should give you a fair start at what is possible in finding and defining cohesive subgroups in R. Consult your notes of chapter 5 of the text to refresh your memory on why and how to use each approach.
You have two choices for this practicum. You may either use the
“talked to” network from the Death of Stalin; or, if you are up for a
challenge, you may use a network of first responders in the hurricane
Katrina disaster. The hurricane Katrina network was collected using
secondary sources, well after the event. You
can read the article on the topic here: https://www.cmu.edu/joss/content/articles/volume13/ButtsActonMarcum.pdf
+ To download the data, once again, go to the Network
Data link on the course website. You will find the edgelist
data (KatrinaResponse.csv) on the Network
Data page of the website.
Use the network of your choice to do the following:
Remember: This should be an undirected network.
Good luck!