As with any analytic technique, network analysis has measures that are descriptive in nature. These measures are similar to descriptive measures in statistics and operate at two levels: global and local. Global measures are designed to summarize the state of an entire network. Such measures include the size, order, and diameter of the network, as well as measures that provide insight into its general topography, such as clusterability, fragmentation, or other aspects of cohesion within the network.
The goal of this practicum is to use global measures to compare two networks. You will be comparing networks of students (your colleagues) at two points in time. The first time point (time 0) was elicited when they began the semester, and the second (time 2) was elicited near the end of the semester. To capture the nature of these students’ familiarity with one another, we will focus on a network in which students reported whose input resonated with them.
To download the data, go to the Course Data link on the Canvas site and look for Sprintensive.zip. Do not forget to unzip the folder!
Note: The data are stored in a zip file. You have to unzip it before you can open and use the data. You cannot load a zip file into R and have it work. (Well, you can, but it is a lot more difficult, and I am not covering how to work within a zipped file here.)
Interpret the measures below using the readings and your notes. You should not expect that every measure will be different from time 0 to time 2. In some cases, there may be no change. In others, we will expect to see a change over time.
Create a folder labeled “Practicum 3” someplace on your computer, such as your Desktop or wherever you will be able to easily find it again. Then, set your working directory to that folder in RStudio:
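A minimal sketch of that step follows. The path is hypothetical: so that the example runs on any computer, it creates the "Practicum 3" folder inside R's temporary directory. In practice, replace `practicum_dir` with the location of the folder you created (in RStudio, Session > Set Working Directory > Choose Directory... does the same thing through the menus).

```r
# Hypothetical location: a "Practicum 3" folder inside R's temporary directory.
# Replace this with the path to the folder you created (e.g., on your Desktop).
practicum_dir <- file.path(tempdir(), "Practicum 3")
dir.create(practicum_dir, showWarnings = FALSE)  # create it only if it is missing

setwd(practicum_dir)  # make it the working directory
getwd()               # confirm where R will now look for files
```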
Although it is not required, it can be helpful to follow along with this tutorial using the same example data.
For the purpose of the example, I will use a classic network: Padgett’s “Florentine Families”. This network is derived from John Padgett’s work in researching the rise of the Medici family in a scenario where powerful families were competing for political power in Florence, Italy. The two dominant families were the Medicis and the Strozzis.
If you wish to know more, you can read the original manuscript that he submitted for publication.
If you would like to use the Padgett networks to follow along with this tutorial, you can download them from the same place that you downloaded the Sprintensive data. Look for “Padgett_business.net” (business ties) and the corresponding marriage-ties .net file.
Now, let’s get started.
To begin, load igraph in R. (If it is not installed yet, run install.packages("igraph") first.)
library(igraph)
There are a lot of options for loading data into igraph. This time, we will be using a “.net” file, a format developed for a program called Pajek. The .net format is an extension of an edgelist, but it can incorporate a much wider variety of information than a more traditional edgelist.
Feel free to research this further. For now, just know that the data are in .net format, so that is how we are loading them this time.
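To make the "extension of an edgelist" idea concrete, here is a small sketch in base R. The three-family file is hypothetical and shows only the two most basic Pajek sections: a `*Vertices` list (ids and labels, which is how isolates can be stored) followed by an `*Edges` list, which is an ordinary edgelist of vertex ids. Real .net files, including the Padgett ones, may also carry coordinates, weights, or `*Arcs` for directed ties, which this sketch ignores; `read_graph(..., format="pajek")` remains the way to load them.

```r
# A tiny, hypothetical Pajek (.net) file: three families and two ties.
net_lines <- c(
  "*Vertices 3",
  '1 "Medici"',
  '2 "Strozzi"',
  '3 "Pazzi"',
  "*Edges",
  "1 2",
  "1 3"
)
net_file <- tempfile(fileext = ".net")
writeLines(net_lines, net_file)

# Read it back as plain text and locate the start of the edge section.
lines <- readLines(net_file)
edge_start <- grep("^\\*Edges", lines)

# Vertex section: an id followed by a quoted label.
vertices <- read.table(text = lines[2:(edge_start - 1)],
                       col.names = c("id", "name"), stringsAsFactors = FALSE)

# Edge section: a plain edgelist of vertex ids.
edges <- read.table(text = lines[(edge_start + 1):length(lines)],
                    col.names = c("from", "to"))

# Translate the ids into family names.
named_edges <- data.frame(from = vertices$name[match(edges$from, vertices$id)],
                          to   = vertices$name[match(edges$to, vertices$id)],
                          stringsAsFactors = FALSE)
named_edges
```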
g1 <- read_graph(file.choose(), format="pajek") # Business Ties
g2 <- read_graph(file.choose(), format="pajek") # Marriage Ties
# Then, take a look at each.
plot(g1)
plot(g2)
In this example, the network on the left (g1) represents the business (trading) ties between Florentine families, and the network on the right (g2) represents the marriage ties.
Topographical measures, for the most part, each use a single number to summarize some aspect of a network. They are meant as shorthand tools that give a better perspective on a network or grounds for comparison between two or more networks.
There are a number of ways to measure such generalized aspects of networks. Here, we divide topographical measures into five general classifications:
Let’s start with the most common and conceptually basic summaries that describe entire networks.
The size and order of the networks appear in the first line of igraph's printed summary of each network.
# To make it easier to follow, I'm identifying each network as
# either "# Business", or "# Marriage". If you are running
# your own analysis, then you should change the tags to
# reflect what each of the networks is, or just leave them
# off completely.
#
g1 # Business
## IGRAPH 69d17af UNW- 16 15 --
## + attr: id (v/c), name (v/c), x (v/n), y (v/n), z (v/n), label.cex
## | (v/n), weight (e/n)
## + edges from 69d17af (vertex names):
## [1] Barbadori --Castellani Barbadori --Ginori
## [3] Barbadori --Medici Barbadori --Peruzzi
## [5] Bischeri --Guadagni Bischeri --Lamberteschi
## [7] Bischeri --Peruzzi Castellani --Lamberteschi
## [9] Castellani --Peruzzi Ginori --Medici
## [11] Guadagni --Lamberteschi Lamberteschi--Peruzzi
## [13] Medici --Pazzi Medici --Salviati
## + ... omitted several edges
g2 # Marriage
## IGRAPH b7ad9f2 UNW- 16 20 --
## + attr: id (v/c), name (v/c), x (v/n), y (v/n), z (v/n), label.cex
## | (v/n), weight (e/n)
## + edges from b7ad9f2 (vertex names):
## [1] Acciaiuoli--Medici Albizzi --Ginori Albizzi --Guadagni
## [4] Albizzi --Medici Barbadori --Castellani Barbadori --Medici
## [7] Bischeri --Guadagni Bischeri --Peruzzi Bischeri --Strozzi
## [10] Castellani--Peruzzi Castellani--Strozzi Guadagni --Lamberteschi
## [13] Guadagni --Tornabuoni Medici --Ridolfi Medici --Salviati
## [16] Medici --Tornabuoni Pazzi --Salviati Peruzzi --Strozzi
## [19] Ridolfi --Strozzi Ridolfi --Tornabuoni
Looking at the first line, you will see “IGRAPH (some number) UNW- 16 15 --”. The letters UNW indicate that the network is undirected (U), has named vertices (N), and is weighted (W). The last two numbers are the order (number of nodes) and size (number of ties) of the network, respectively.
In the Florentine families example, we can see that both networks have the same number of nodes (16 families), but the marriage network has 20 ties, whereas the business network has only 15.
What is the length of the longest geodesic?
diameter(g1) # Business
## [1] 5
diameter(g2) # Marriage
## [1] 5
It is also interesting to see that, although the marriage network has more ties (and therefore a higher density), the diameter of the two networks is the same.
If you consider the diameter of the network to reflect the path of greatest inefficiency in the network, then what do you make of this?
What is the average geodesic distance in the network?
mean_distance(g1) # Business (mean_distance() is the current name for average.path.length())
## [1] 2.381818
mean_distance(g2) # Marriage
## [1] 2.485714
Here, we can see that distances in the business network are slightly shorter than those in the marriage network. This should have implications for interpreting which of the two networks has more structural impediments or, conversely, which network allows information and resources to move more efficiently.
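If it helps to see what diameter() and the average geodesic distance are summarizing, the base-R sketch below computes all geodesics with a breadth-first search on a hypothetical toy network (a five-node chain plus one isolate). As with igraph's distances(), pairs with no connecting path are left as Inf, and only the finite distances enter the maximum (diameter) and the mean (average path length). This is an illustration of the definitions, not igraph's implementation, and it ignores edge weights.

```r
# Hypothetical toy network: a chain 1-2-3-4-5 plus an isolate (node 6).
n <- 6
edges <- rbind(c(1, 2), c(2, 3), c(3, 4), c(4, 5))

# Symmetric adjacency matrix for an undirected network.
A <- matrix(0, n, n)
A[edges] <- 1
A[edges[, 2:1]] <- 1

# Breadth-first search from every node; unreachable pairs stay Inf.
geodesics <- function(A) {
  n <- nrow(A)
  D <- matrix(Inf, n, n)
  for (s in seq_len(n)) {
    D[s, s] <- 0
    frontier <- s
    level <- 0
    while (length(frontier) > 0) {
      level <- level + 1
      # unvisited neighbours of the current frontier are one step further away
      reached <- colSums(A[frontier, , drop = FALSE]) > 0
      nbrs <- which(reached & is.infinite(D[s, ]))
      D[s, nbrs] <- level
      frontier <- nbrs
    }
  }
  D
}

D <- geodesics(A)
finite <- D[upper.tri(D) & is.finite(D)]  # each connected pair counted once

diameter_toy <- max(finite)   # longest geodesic: 4 (node 1 to node 5)
avg_path_toy <- mean(finite)  # average geodesic: 20 / 10 = 2
```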
Generally speaking, how do you prefer to sum up the way a network is connected? Here are some options.
How densely connected are the nodes in these networks?
edge_density(g1) # Business
## [1] 0.125
edge_density(g2) # Marriage
## [1] 0.1666667
Density is greater in the marriage network. So, Florence’s great families were more tightly tied together through marriage than through business.
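The density values above can be reproduced by hand from the order and size printed earlier. For a simple undirected network, density is the number of observed ties divided by the number of possible ties, n(n - 1)/2. The small helper below is a hypothetical illustration of that arithmetic, not an igraph function.

```r
# Density of a simple undirected network: observed ties / possible ties,
# where possible ties among n nodes = n * (n - 1) / 2.
undirected_density <- function(n_nodes, n_ties) {
  n_ties / (n_nodes * (n_nodes - 1) / 2)
}

undirected_density(16, 15)  # Business: 15 / 120 = 0.125
undirected_density(16, 20)  # Marriage: 20 / 120 = 0.1666667
```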
How many ties do the nodes in these networks have, on average?
mean(degree(g1)) # Business
## [1] 1.875
mean(degree(g2)) # Marriage
## [1] 2.5
Average degree is closely related to density, but it can be more meaningful when comparing networks of different sizes. These networks have the same number of nodes, but average degree is a little easier to interpret. Here, we can see that Florentine families had, on average, 1.9 business ties and 2.5 marriage ties.
Density and average degree agree on this point: the marriage ties appear to reflect greater interdependency among the families. Each of the two measures says this, but in a different way.
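The sketch below shows why average degree travels better across networks of different sizes. In an undirected network each tie adds one to the degree of two nodes, so average degree is 2 × ties / nodes, which equals density × (n − 1). The two "small" and "large" networks are hypothetical: both have an average degree of 3, yet their densities differ by a factor of ten.

```r
# Average degree of an undirected network: each tie counts toward two nodes.
average_degree <- function(n_nodes, n_ties) 2 * n_ties / n_nodes

average_degree(16, 15)  # Business: 1.875
average_degree(16, 20)  # Marriage: 2.5

# Hypothetical networks of different sizes with the same average degree.
average_degree(10, 15)    # small network: 3
average_degree(100, 150)  # large network: 3

# Their densities differ sharply, because density = average degree / (n - 1).
15 / (10 * 9 / 2)      # small network: 0.3333333
150 / (100 * 99 / 2)   # large network: 0.03030303
```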
Is the network generally more spaced out or bunched up?
igraph has no native function for calculating this measure, so we can get around that by writing our own function.
In their basic form, functions are built like this:
Name_of_New_Function <- function(arguments){
some_value <- some_process_or_set_of_functions(arguments)
return(some_value)
}
What you see above are the essential building blocks of a function in R. This is essentially how most of the other functions that you are using on this page were originally built.
To start, you give your new function a name. Above, that name is “Name_of_New_Function”. Next, you will tell the function what sort of inputs to expect. Those are called “arguments” and you will see that they appear inside of the function as well.
The stuff between the curly braces is all of the things that we will do to process the inputs (arguments). The last thing that we do inside the curly braces is to tell the function what to spit out for us to see. That is the return() that you see in the last line.
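As a concrete instance of that template, here is a small, hypothetical function (not part of igraph) that takes one argument, a vector of node degrees, processes it inside the curly braces, and returns the number of isolates.

```r
# A hypothetical function built from the template above.
Count_Isolates <- function(degrees) {
  n_isolates <- sum(degrees == 0)  # process the argument: count zero-degree nodes
  return(n_isolates)               # hand the result back to the user
}

Count_Isolates(c(2, 0, 1, 0, 3))  # 2
```

With an igraph object loaded, the same function could be called as Count_Isolates(degree(g1)).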
Now, putting this knowledge to use, we can create a function that will run through the steps that we need to do in order to calculate compactness. For more information about this, take another look at the sample chapter in our reading for the week. That will explain what is going on here. But, in the meantime, feel free to cut and paste the following into R.
Compactness <- function(g) {
gra.geo <- distances(g) ## get geodesics (Inf for pairs with no path)
gra.rdist <- 1/gra.geo ## get reciprocal of geodesics (1/Inf is already 0)
diag(gra.rdist) <- NA ## assign NA to diagonal (1/0 gives Inf there)
gra.rdist[gra.rdist == Inf] <- 0 ## safeguard: replace any remaining infinity with 0
# Compactness = mean of reciprocal distances
comp.igph <- mean(gra.rdist, na.rm=TRUE)
return(comp.igph)
}
Now, you can run the new Compactness() function as follows. If you are using RStudio, you will notice that it will already be suggesting the name of the new function as you type it in.
Compactness(g1) # Business
## [1] 0.2522222
Compactness(g2) # Marriage
## [1] 0.4376389
Higher values reflect that distances, on average, are shorter (more compact), because compactness averages the reciprocals of the geodesic distances. At first glance, then, the marriage network looks far more compact than the business network. But if you look at the visualizations above, you may notice that there are five families in the business network that are not connected to any others. Distances to and from those isolates are infinite, so their reciprocals count as zero and drag the average down. By treating those pairs this way, we are actually tricking ourselves. To do this correctly, consider removing isolates from the network and then running the function.
g1_noi <- delete_vertices(g1, which(degree(g1)<1))
Compactness(g1_noi) # Business
## [1] 0.550303
g2_noi <- delete_vertices(g2, which(degree(g2)<1))
Compactness(g2_noi) # Marriage
## [1] 0.5001587
Now, we can see that the business network (or the connected part of it, at least) is more compact than the marriage network. For a better idea of why this is, plot the networks without their isolates (plot(g1_noi) and plot(g2_noi)) for a visual representation.
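The effect of isolates on compactness can be seen on a toy example. The distance matrix below is hypothetical (a three-node chain a-b-c plus two isolates), and the helper applies the same steps as the Compactness() function above directly to a distance matrix, so it runs in base R without igraph.

```r
# Hypothetical geodesic distances: chain a-b-c plus isolates d and e.
# Inf marks pairs with no connecting path.
D <- matrix(Inf, 5, 5, dimnames = list(letters[1:5], letters[1:5]))
diag(D) <- 0
D["a", "b"] <- D["b", "a"] <- 1
D["b", "c"] <- D["c", "b"] <- 1
D["a", "c"] <- D["c", "a"] <- 2

# Same steps as Compactness(), applied to a distance matrix.
compactness_from_distances <- function(D) {
  R <- 1 / D      # reciprocal distances; 1/Inf is 0 in R
  diag(R) <- NA   # ignore each node's distance to itself
  mean(R, na.rm = TRUE)
}

compactness_from_distances(D)            # 0.25: the isolates pull it down
compactness_from_distances(D[1:3, 1:3])  # 0.8333333: connected part only
```

The reciprocal distances within the chain sum to 5 in both cases; only the number of pairs in the denominator changes (20 with the isolates, 6 without).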
For now, we will skip actually analyzing reciprocity in these networks since each of them is undirected. I can, however, demonstrate what it looks like to run reciprocity.
igraph offers two measures of reciprocity, each with a slightly different meaning. Feel free to check into the difference between these options (?reciprocity). But, for the purpose of this course, we will be using the “classic” measure of reciprocity as a proportion of ties within a network that are mutual. This allows us to compare igraph output with statnet output, which we will be using later in the course. (statnet offers no version of reciprocity that is equivalent to igraph’s default.)
reciprocity(g, mode="ratio") # g must be a directed igraph object
If you choose not to use the mode="ratio" argument, then you will receive a very different-looking measure. Look carefully at the output and help file when you choose which version to use.
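To see why the two versions look so different, the base-R sketch below applies both definitions, as described on igraph's ?reciprocity help page, to a hypothetical three-person directed network. The default counts the share of individual ties that are returned. "ratio" counts mutual pairs among all pairs that have at least one tie.

```r
# Hypothetical directed network: A[i, j] = 1 means i nominated j.
A <- matrix(0, 3, 3)
A[cbind(c(1, 2, 2, 3), c(2, 1, 3, 1))] <- 1  # ties 1->2, 2->1, 2->3, 3->1

mutual    <- A == 1 & t(A) == 1  # TRUE where a tie is returned
connected <- A == 1 | t(A) == 1  # TRUE where a pair has any tie

# "ratio": mutual pairs / pairs with at least one tie (each pair counted once)
ratio <- sum(mutual[upper.tri(A)]) / sum(connected[upper.tri(A)])

# default: share of individual ties that are reciprocated
edgewise <- sum(mutual) / sum(A)

ratio     # 0.3333333 (1 mutual pair out of 3 connected pairs)
edgewise  # 0.5 (2 of the 4 ties are returned)
```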
To what degree could we expect these networks to easily break down into subgroups?
transitivity(g1) # Business
## [1] 0.4166667
transitivity(g2) # Marriage
## [1] 0.1914894
Look again at the visualizations above. How many triangles do you see in each? The triangles are cliques (or indications of the presence of cliques), in which all three nodes are tied to one another. The more “cliquey” of the two networks is the business network, indicating that it is more likely to break down into subgroups or factions.
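By default, igraph's transitivity() reports the global ratio of closed triples: three times the number of triangles divided by the number of connected triples (a node together with two of its neighbors). The sketch below computes that ratio in base R for a hypothetical four-node network with one triangle and one pendant tie.

```r
# Hypothetical undirected network: triangle 1-2-3 plus a pendant tie 3-4.
A <- matrix(0, 4, 4)
ties <- rbind(c(1, 2), c(2, 3), c(1, 3), c(3, 4))
A[ties] <- 1
A[ties[, 2:1]] <- 1

# Each triangle contributes 6 to the trace of A^3 (3 starting nodes x 2 directions).
triangles <- sum(diag(A %*% A %*% A)) / 6

# Connected triples centred on each node: choose(degree, 2).
triples <- sum(choose(rowSums(A), 2))

# Global transitivity: share of connected triples that are closed.
transitivity_toy <- 3 * triangles / triples
transitivity_toy  # 0.6 (3 * 1 / 5)
```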
Again, the Padgett data are presented here for the purpose of illustration and some small amount of context. For a much more detailed interpretation of the above measures, you must still read the chapter and consider what the measure is designed to convey.
Your work will be in a very different context. You are being asked to compare the same group at two points in time. See the instructions below for more on that.
There are two ways to understand how dominated a network is by one or a few nodes. Each of them looks at differences among the nodes' centrality scores, but because they measure those differences from different reference points, they make for very different measures.
Is the network dominated by one, or a few, central actors?
centr_degree(g1)$centralization # Business
centr_betw(g1)$centralization
centr_clo(g1)$centralization
centr_eigen(g1)$centralization
centr_degree(g2)$centralization # Marriage
centr_betw(g2)$centralization
centr_clo(g2)$centralization
centr_eigen(g2)$centralization
That is a lot of measures, and they are provided for example only. It is strongly recommended that you not just run a bunch of measures like this. Rather, take your time to select the measure that best expresses what it is you are hoping to understand.
For example, if we are considering something like the Medici network, then we are interested in general access to power. For that reason, I would evaluate the network for how dominated it is in terms of access to power. Eigenvector centrality provides us with that perspective, so we can use eigenvector centralization as our comparison measure.
centr_eigen(g1)$centralization # Business
## [1] 0.7004496
centr_eigen(g2)$centralization # Marriage
## [1] 0.5614437
Looking at the output, it is apparent that the network of business ties is clearly more dominated by a central node (or nodes) in terms of access to power.
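The general centralization idea is easiest to see with degree. The sketch below follows Freeman's classic formula: sum how far each node falls below the most central node, then divide by the largest value that sum could take, which for degree in an undirected network of n nodes is (n − 1)(n − 2), reached by a star. The two score vectors are hypothetical. igraph's centr_degree() may normalize with a somewhat different theoretical maximum (it depends on its loops argument), so its values need not match this sketch exactly.

```r
# Freeman-style degree centralization for an undirected network (a sketch).
degree_centralization <- function(deg) {
  n <- length(deg)
  observed <- sum(max(deg) - deg)    # shortfall of every node relative to the top node
  max_possible <- (n - 1) * (n - 2)  # the same sum for a star with n nodes
  observed / max_possible
}

star  <- c(4, 1, 1, 1, 1)  # one hub tied to four others
chain <- c(1, 2, 2, 2, 1)  # a five-node path

degree_centralization(star)   # 1: maximally dominated
degree_centralization(chain)  # 0.1666667
```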
As with centralization, we are asking whether the network is dominated by one or a few actors.
sd(degree(g1)) # Business
sd(betweenness(g1))
sd(closeness(g1), na.rm=TRUE) # isolates can have NaN closeness
sd(eigen_centrality(g1)$vector)
sd(degree(g2)) # Marriage
sd(betweenness(g2))
sd(closeness(g2), na.rm=TRUE)
sd(eigen_centrality(g2)$vector)
As above, we will just run one of the above measures, presumably after thinking this through and selecting the one that describes a feature of the network that we wish to compare.
For the sake of comparison, we are using eigenvector centrality.
sd(eigen_centrality(g1)$vector) # Business
## [1] 0.2980252
sd(eigen_centrality(g2)$vector) # Marriage
## [1] 0.2796051
Again, we can note that the business network is more dominated than the marriage network, although the difference is much smaller. This is because centralization compares each centrality value to the largest value, whereas standard deviation compares each value to the average. Think about this for a moment and decide which of these approaches makes the most sense to you before selecting the one you prefer.
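The two approaches can even rank networks in opposite order. In the hypothetical example below, both score vectors describe six nodes, so the centralization denominator is the same for both and only the numerator (total shortfall from the top score) matters. The one-hub network is more centralized, but the two-hub network has the larger standard deviation.

```r
# Two hypothetical sets of centrality scores (e.g., degrees) for six nodes.
one_hub  <- c(5, 1, 1, 1, 1, 1)
two_hubs <- c(5, 5, 1, 1, 1, 1)

# Centralization numerator: shortfall from the single highest score.
sum(max(one_hub) - one_hub)    # 20
sum(max(two_hubs) - two_hubs)  # 16

# Standard deviation: spread around the average score.
sd(one_hub)   # 1.632993
sd(two_hubs)  # 2.065591
```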
With networks, it is often interesting to ask why the ties were formed the way they were. One potential answer is that people formed ties with those similar to themselves (homophily), or with those who were most different (heterophily). We have two ways of measuring this. For categorical data, we use the E-I Index. For continuous data, we use measures of autocorrelation. Unfortunately, at present, neither igraph nor statnet includes the E-I Index, and igraph does not include measures of (continuous) autocorrelation.
All is not lost, however. Your former colleague, Brendan Knapp, has written a function that we can use to calculate the E-I Index when using either igraph or statnet.
If you have data with an attribute, you will be able to check whether people tend to operate within their own attribute, or tend to mix between attributes. We do this with the E-I Index (among other tools).
Before you run this calculation, however, you will have to install the package that provides it.
If you are reading the text, Understanding Dark Networks: A Strategic Framework for the Use of Social Network Analysis, you may notice that there is a major typo on page 104 that misspecifies the equation for the E-I Index. The actual equation is: \[\text{E-I Index} = \frac{E - I}{E + I}\] where E is the number of ties between actors in different groups (external ties) and I is the number of ties between actors in the same group (internal ties).
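A hand calculation of the formula, on a hypothetical four-person network with a categorical attribute, is sketched below in base R. It is only meant to show the arithmetic of the whole-network index, not how the homophily package implements ei_index(). Values range from −1 (all ties internal) to +1 (all ties external); negative values suggest homophily.

```r
# Hypothetical network: four people in two groups, three undirected ties.
group <- c("A", "A", "B", "B")
ties  <- rbind(c(1, 2), c(3, 4), c(2, 3))

# A tie is internal when both ends share the attribute, external otherwise.
same_group <- group[ties[, 1]] == group[ties[, 2]]
internal <- sum(same_group)   # I = 2
external <- sum(!same_group)  # E = 1

ei <- (external - internal) / (external + internal)
ei  # -0.3333333: more ties within groups than across them
```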
The E-I Index is not available in many R packages, and it is not as simple to program as it seems. To make your life simpler, first install a package called homophily, which includes a function called ei_index. The package is written and maintained by Brendan Knapp.
The catch is that homophily is only available through GitHub, so you cannot install it in the conventional way with install.packages(). GitHub is a repository for open-source software, such as R packages in development. Follow these steps to install homophily:
install.packages("devtools")
library(devtools)
install_github("knapply/homophily")
Once the package is installed, load homophily.
library(homophily)
The generic method for using the E-I Index in homophily is ei_index(g, node_attr_name="attribute"), where “g” is the igraph object, with a qualitative attribute (“attribute”) assigned to each of the vertices.
If you wish to use Brendan’s example, enter the following:
ei_index(jemmah_islamiyah, node_attr_name = "role")
## [1] 0.3650794
To use the example from page 104 of the Disrupting Dark Networks book using the Noordin data, you will use the Noordin combined network and add the Noordin membership attribute (labeled as “NordNet” here). Go to the Network Data link on the website. You are looking for:
Noordin 139 - Combined Network-Aggregated.net
Noordin Membership Attribute.csv
Load each, in that order, and then use the following scripts to remove isolates.
g <- read_graph(file.choose(), format="pajek") # "Noordin 139 - Combined Network-Aggregated.net"
mem.att <- read.csv(file.choose(), header=TRUE) # "Noordin Membership Attribute.csv"
V(g)$NordNet <- as.character(mem.att$member[match(V(g)$name, mem.att$ID)])
g <- simplify(g) # The .net file doubles the edges; simplify() collapses duplicate edges (and removes loops).
iso <- V(g)[degree(g)==0] # Identify isolates
g <- delete_vertices(g, iso)
With the isolates removed, you can add a few fancy aspects to the network data. First, set the size that you want the visualized nodes to have using V(g)$size=5. Next, color the nodes according to the group to which they belong, using the final three lines of the script below.
V(g)$size=5
V(g)$color=V(g)$NordNet # Base the color of vertices on the membership attribute
V(g)$color=gsub("1","red",V(g)$color) # Specify each of the colors you wish to use
V(g)$color=gsub("0","yellow",V(g)$color)
Once you have completed that, you should be able to visualize something like this:
# plot the Noordin network
plot(g, layout=layout_with_fr, vertex.label=NA) # layout_with_fr is the current name for layout.fruchterman.reingold
To get the E-I Index for the Noordin network, enter the following:
ei_index(g, node_attr_name="NordNet")
## [1] -0.4220532
The networks that you are being asked to evaluate consist of colleagues of yours who participated in “Sprintensive”, an intense four-quarter semester in which one cohort of students took all four courses together as a group. We suspected that a program of this sort would produce increased ties between participating students, so we used network analysis to test our suspicion.
Students were asked at the beginning of the semester to indicate colleagues to whom they felt a connection on multiple levels (acquaintances, friends, close friends, sit near, resonate). One of these levels, “Whose ideas most frequently resonate with you?”, may be taken to indicate either feelings of closeness or, possibly, an ideological or cognitive convergence. You are free to decide what you consider the “resonate” question to indicate about the members of the network, but you will need to interpret the changes over time in that context.
The full text of the question was:
Whose ideas most frequently resonate with you?
Please choose all that apply. (Think about whose comments most often feel comfortable, familiar, or otherwise similar to your own thoughts and ideas.)
Submit a Google document with the following to satisfy the terms of this practicum:
Please note that the boxes in the table are included for your reference. The only lines that should appear in your table are the three bold horizontal lines.
Good luck!