Centrality in igraph - for the most part

The purpose of this exercise is to further acquaint you with entering data, importing large amounts of data, and manipulating it into a usable form.

One of the most fundamental (and critical) skills in network analysis is gathering data and getting them into the programs that you will use to analyze them. Although this practicum does not cover every aspect of data collection and formatting, it is designed to provide some fairly universal skills in extracting, entering, and analyzing data.

Practicum 2 introduced some of the fundamentals of getting network data loaded into R. Although that exercise involved a small network (no more than ten nodes), it still gave you a chance to see some of the many challenges that can arise when loading network data into R. Now that you have a little practical experience, we’ll put your new skills to work on something more challenging: loading a much larger network and running a few diagnostics to produce a summary description and a visualization.

Because we are still in the beginning stages of working with network data in R, you will see a lot of repetition between this practicum and practicum 2. That is intentional. Practicum 2 was more of a confidence builder, and most people found it easiest to copy and paste the scripts directly into R. This time, we’ll use an example network to illustrate your objective and then provide instructions on where to find the data you will be using to complete this practicum.

You are welcome to help one another with this. But I expect everyone to do their own work. If you do not do your own work on at least some level, then these lessons will be much more difficult for you to use later.

Getting Started with the Example Data

This part is important. Please be sure to do this part again:

Create a folder labeled “Practicum 4” someplace on your computer, such as your Desktop or wherever you will be able to easily find it again. Then, set your working directory to that folder in RStudio:

Session
- Set Working Directory
- Choose Directory…

Although it is not required, it can be helpful to follow along with this tutorial using the example data that we are using. Brendan Knapp has converted a classic network (Zachary’s “Karate Club” network) into an edgelist, and I have added some changes to give us a chance to demonstrate some simple data cleaning and manipulation.

We will use the modified version of Zachary’s Karate Club network to walk through the process you will use to load and clean the edgelist data. The changes that I made give us something to fix once the data are loaded.

The edgelist can be found at goo.gl/dXmeMK. If you wish to use the Karate Club network to follow along with the tutorial, download it as a CSV file and save it to the folder that you just created by following the directions above.

Loading the CSV File

Because you are working with what will ultimately be network data, you should first go ahead and load a network analysis package. We are learning igraph first in this course, since it is somewhat better suited to visualization and has a wide range of available analyses.

So, go ahead and load the igraph package to get started.

library(igraph)

With igraph loaded, go ahead and read in the CSV.

Again, just to be clear, we are using Zachary’s “Karate Club” network as an example. The output you see from here on will be for that network. The network you are using for this practicum will look considerably different.

NetworkEL <- read.csv(file.choose(), header=TRUE, na.strings="")
        # na.strings="" tells R that blank cells are missing values.

Troubleshooting: Tools for Working with Edgelists

Sorting and Inspecting

You will use the head() function below to print the top portion of NetworkEL to the console. Using head() instead of calling the variable directly is a good habit, because it avoids printing large objects to the console. tail() is also available to print the bottom portion of NetworkEL.

With that, you should inspect your data. Start by looking at how many rows and columns are in the data.

dim(NetworkEL)

## [1] 82  3

The [1] at the start of the output is simply R’s index for the first element printed; the two numbers that follow are the number of rows and the number of columns. In this case, we can see that there are 82 rows and 3 columns in the Karate Club data. That could be a problem, since we need only two columns for an edgelist.

The best way to see what is going on in the data is to take an actual look at what you just imported. You have to be careful, though. For really large data sets like the one that you will be working with later, the data can quickly fill up the screen. To keep things a little more under control, try using the head() and tail() functions. They allow you to view just the first six and last six observations, respectively.

head(NetworkEL)

##   V1 V2  X
## 1  1  2 NA
## 2  1  3 NA
## 3  1  4 NA
## 4  1  5 NA
## 5  1  6 NA
## 6  1  7 NA

The output indicates that we have three columns: “V1”, “V2”, and “X”. The unlabeled numbers on the left are row names, not a data column. The “X” column was probably caused by a stray space or other hidden character in the spreadsheet, and it may cause us some trouble if we try to convert the data to a network as they are now. Issues like these are common and easy to fix.

If you see more than two columns, try this:

NetworkEL <- NetworkEL[ , c(1,2)]   # Re-save the edgelist using only columns 1 and 2
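To see why the subset works, here is a sketch on a small, made-up data frame (toyEL is hypothetical, not the Karate Club data) that mimics the stray X column. Selecting columns by name is an alternative that does not depend on column order, assuming your headers are V1 and V2 as they are here.

```r
# Hypothetical edgelist with a stray, empty third column, similar to what
# read.csv() produces when a spreadsheet has an unlabeled column
toyEL <- data.frame(V1 = c(1, 1, 2),
                    V2 = c(2, 3, 3),
                    X  = c(NA, NA, NA))

# Keep only the first two columns, by position ...
byPosition <- toyEL[ , c(1, 2)]

# ... or, equivalently here, by name
byName <- toyEL[ , c("V1", "V2")]

dim(byPosition)                  # 3 rows, 2 columns
identical(byPosition, byName)    # both approaches give the same result
```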

The data are presently arranged in order of the first column, which represents the sender (ego). For a different perspective, re-sort the data numerically (or alphabetically, if your nodes have names) by the second column, which represents the receiver (alter):

NetworkEL <- NetworkEL[order(NetworkEL[, 2]), ]   # Sort the rows by the values in column 2

head(NetworkEL)

##    V1 V2
## 1   1  2
## 2   1  3
## 17  2  3
## 18 NA  3
## 3   1  4
## 19  2  4

The output above is the edgelist that we’ll be converting to a network. The row names on the left show that the rows have been reordered. Each row represents an edge between a sender (ego) and a receiver (alter). Before we convert the edgelist to a network, though, it is a good idea to check for errors one more time.

When we re-sorted the data above, it became apparent that there is a missing value in the row labeled 18. Missing values should also be resolved before converting the data to a network.

Missing Data

When you convert an edgelist to a network in igraph, each ego should have an alter, and vice versa. In other words, for each node sending a tie, there must be one receiving it, and the other way around. That means that we cannot have any missing data in our edgelist.

To check for missing values, use the is.na() function. It returns TRUE for each cell that is missing and FALSE for each cell that is not. To make this easier to read, sum the results: R treats TRUE as 1 and FALSE as 0, so the sum is the total number of missing values.

sum(is.na(NetworkEL))

## [1] 4

This indicates that we have four missing values in the data set.
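Before deleting anything, it can help to see where the missing values are. A minimal base R sketch on a hypothetical toy edgelist (toyEL is made up): colSums(is.na()) reports missing values per column, and complete.cases() flags the rows that na.omit() would remove.

```r
# Hypothetical edgelist with two missing endpoints
toyEL <- data.frame(V1 = c(1, NA, 2, 3),
                    V2 = c(2, 3, NA, 4))

sum(is.na(toyEL))               # total number of missing cells: 2
colSums(is.na(toyEL))           # missing values per column: V1 = 1, V2 = 1

incomplete <- !complete.cases(toyEL)
toyEL[incomplete, ]             # rows 2 and 3 each contain an NA

# na.omit() drops exactly those rows
cleaned <- na.omit(toyEL)
nrow(cleaned)                   # 2
```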

The quickest way to address this situation is to simply delete all rows with missing values. In this example, I added the missing values for demonstration purposes. In other circumstances, you would probably want to go back and check to see why certain values are missing and replace them so that you do not lose information.

But, this is an example and we don’t have any good reason to keep the rows with missing values. So let’s delete them!

NetworkEL <- na.omit(NetworkEL)
    # Then check for missings again.
sum(is.na(NetworkEL))

## [1] 0

There. Now we have the correct number of columns and there are no longer missing values in the edgelist. It is time to convert the data into a network.

Conversion from Edgelist to Network in `igraph`

An edgelist is one of the more common ways that a graph can be represented, and igraph allows us to construct a graph object directly from an edgelist by using the graph_from_edgelist() function.

With unfamiliar functions, your first step should always be to read the documentation. Take a look at it by running ?graph_from_edgelist, which searches for the documentation in RStudio’s help tab.

Think of what you see as a guide to how to use the function.

The Description section gives you a summary and basic details of the function’s intended use.
The Usage section details how the function is actually defined and the arguments that it expects.
The Arguments section explains what each of the arguments are referencing.

In graph_from_edgelist’s Description, we are told that it expects the edgelist to be a two-column matrix. In R, a matrix is a specific type of object. We can check to see if our NetworkEL is a matrix by running:

class(NetworkEL)

## [1] "data.frame"

We see that NetworkEL is actually a data.frame. While similar in concept, matrices and data frames are treated differently. This is because data frames can contain different types of columns/variables (character, numeric, etc.) while matrices only contain data of the same type.
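A small base R sketch of that difference, using hypothetical data frames made up for illustration: as.matrix() has to pick a single type for the whole matrix, so mixed columns are coerced to the most general type present. This is also why an edgelist of country names becomes a character matrix; graph_from_edgelist() accepts either a numeric or a character matrix.

```r
# A data frame can mix column types ...
mixed <- data.frame(from = c("USA", "Canada"),
                    to   = c(2, 3),
                    stringsAsFactors = FALSE)
sapply(mixed, class)         # "character" and "numeric"

# ... but a matrix holds a single type, so as.matrix() coerces
# everything to the most general type present (here, character)
m <- as.matrix(mixed)
typeof(m)                    # "character"

# An all-numeric data frame becomes a numeric matrix
numeric_m <- as.matrix(data.frame(V1 = c(1, 2), V2 = c(3, 4)))
typeof(numeric_m)            # "double"
```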

In order for us to pass our edgelist to graph_from_edgelist(), we need to convert it into a matrix. Let’s create a new variable for the new, converted object. This is accomplished by running:

Network_Matrix <- as.matrix(NetworkEL)

We can then inspect Network_Matrix’s class to see whether the conversion worked. (In R 4.0 and later, the output lists both "matrix" and "array".)

class(Network_Matrix)

## [1] "matrix"

head(Network_Matrix)
    # You get the idea...

Returning to the documentation, the Usage section shows how the arguments are expected. Notice the directed = TRUE portion of the function’s arguments. This means that the directed argument has a default value, which is used whenever we don’t supply our own. In other words, unless we specify otherwise, graph_from_edgelist() creates a directed graph.

Because our data do not indicate that one party is the source of each tie (that is, the edges have no direction), our network is undirected.

Refresher on `igraph` Objects

Now, let’s create our network:

g is the variable to which we are assigning the network
Network_Matrix is the object we created by converting NetworkEL
we are explicitly setting directed = FALSE:

g <- graph_from_edgelist(Network_Matrix, directed=FALSE)
# Or, if it should be a directed network:
gD <- graph_from_edgelist(Network_Matrix, directed=TRUE)  # Keep in mind that this network 
                                                          # is not actually directed. We 
                                                          # are just pretending here for the 
                                                          # sake of having a directed example.

We can then look at our new igraph object by simply calling its variable:

g

## IGRAPH 2eef9f6 U--- 34 78 -- 
## + edges from 2eef9f6:
##  [1]  1-- 2  1-- 3  2-- 3  1-- 4  2-- 4  3-- 4  1-- 5  1-- 6  1-- 7  5-- 7
## [11]  6-- 7  1-- 8  2-- 8  3-- 8  4-- 8  1-- 9  3-- 9  3--10  1--11  5--11
## [21]  6--11  1--12  1--13  4--13  1--14  2--14  3--14  4--14  6--17  7--17
## [31]  1--18  2--18  1--20  2--20  1--22  2--22 24--26 25--26  3--28 24--28
## [41] 25--28  3--29 24--30 27--30  2--31  9--31  1--32 25--32 26--32 29--32
## [51]  3--33  9--33 15--33 16--33 19--33 21--33 23--33 24--33 30--33 31--33
## [61] 32--33  9--34 10--34 14--34 15--34 16--34 19--34 20--34 21--34 23--34
## [71] 24--34 27--34 28--34 29--34 30--34 31--34 32--34 33--34

gD

## IGRAPH bcfab25 D--- 34 78 -- 
## + edges from bcfab25:
##  [1]  1-> 2  1-> 3  2-> 3  1-> 4  2-> 4  3-> 4  1-> 5  1-> 6  1-> 7  5-> 7
## [11]  6-> 7  1-> 8  2-> 8  3-> 8  4-> 8  1-> 9  3-> 9  3->10  1->11  5->11
## [21]  6->11  1->12  1->13  4->13  1->14  2->14  3->14  4->14  6->17  7->17
## [31]  1->18  2->18  1->20  2->20  1->22  2->22 24->26 25->26  3->28 24->28
## [41] 25->28  3->29 24->30 27->30  2->31  9->31  1->32 25->32 26->32 29->32
## [51]  3->33  9->33 15->33 16->33 19->33 21->33 23->33 24->33 30->33 31->33
## [61] 32->33  9->34 10->34 14->34 15->34 16->34 19->34 20->34 21->34 23->34
## [71] 24->34 27->34 28->34 29->34 30->34 31->34 32->34 33->34

Let’s unpack the information contained in an igraph object:

IGRAPH simply marks g as an igraph object
2eef9f6, bcfab25, or whatever follows IGRAPH is simply how igraph identifies the graph internally
- it’s not important for our purposes and will be referred to as the arbitrary igraph name
U--- (or, for other networks, something like UN--) describes g. Each of the four positions is a flag:
- U tells us that g is an undirected graph
  - D would tell us that it is a directed graph, as it does for gD
- N in the second position would indicate a named graph, in which the vertices have a name attribute. g shows a dash there because its vertices are identified only by number
- A dash means that a property does not apply to g, but we will see these flags in the future:
  - W would refer to a weighted graph, where edges have a weight attribute
  - B would refer to a bipartite graph, where vertices have a type attribute
34 refers to the number of vertices in g
78 refers to the number of edges in g
attr: lists the attributes within the graph. It does not appear in the output above because there are no attributes in this network. But when you load networks that use names instead of numbers, you will see name listed after attr:. You will see multiple attributes in the future.
- (v/c), which will appear following name, tells us that it is a vertex attribute of a character data type. character is simply what R calls a string.
- In the future we will also see:
  - (e/c) or (e/n) referring to edge attributes that are of character or numeric data types
  - (g/c) or (g/n) referring to graph attributes that are of character or numeric data types
+ edges from *arbitrary igraph name* (vertex names): lists a sample of the graph’s edges using the names of the vertices they connect. Because g has no vertex names, its version of this line reads + edges from 2eef9f6: and uses vertex numbers instead.

Analysis

Use the following code to produce the measures discussed in chapter 4 of Understanding Dark Networks.

Local Measures in `igraph`

There are many, many available centrality measures that have been developed for network analysis. At present, there is no program that is so comprehensive that it includes all of the measures. We will, therefore, limit this discussion to a subset of the measures that are included in igraph. These include the “big four” measures (degree, betweenness, closeness, and eigenvector) and a few useful others. In what follows, we introduce:

Degree Centrality
- In-degree
- Out-degree
Eigenvector Centrality
Hubs & Authorities
Closeness Centrality
Reach Centrality
Betweenness Centrality

Degree Centrality

Each of these measures has settings that change how it treats the network you are analyzing, so it is always a good idea to check the defaults with ?. For instance, ?degree will produce the help page for degree centrality. Under the Usage heading you will find:

degree(graph, v = V(graph), 
      mode = c("all", "out", "in", "total"), 
      loops = TRUE, normalized = FALSE)

This tells us that igraph’s defaults, among other things, count loops when calculating degree (loops = TRUE). We can also see that the output will be raw counts of degree rather than normalized scores (normalized = FALSE). So if you simply calculate degree using degree(g), R will return raw counts. To produce normalized output, add normalized = TRUE to your script: degree(g, normalized=TRUE).
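For a graph with n vertices, igraph’s normalized degree divides each raw count by n - 1, the largest degree a vertex can have in a simple graph. A quick base R check of that arithmetic, assuming the Karate Club values reported in this tutorial (34 vertices in g, and first six raw degrees of 16, 9, 10, 6, 3, and 4); for a network without loops or repeated edges, these should match the first six values of degree(g, normalized=TRUE).

```r
raw_degree <- c(16, 9, 10, 6, 3, 4)   # first six raw degrees of g in this tutorial
n <- 34                               # number of vertices in g

# Normalized degree: raw degree divided by the maximum possible, n - 1
normalized <- raw_degree / (n - 1)
round(normalized, 3)                  # 0.485 0.273 0.303 0.182 0.091 0.121
```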

The mode argument tells you what your options are for dealing with directed networks. Notice that the first option in the list is “all”. This is the default, and it produces a value of degree that adds all incoming and outgoing ties (the sum of all ties adjacent to a node). To calculate indegree instead, use degree(gD, mode="in"). The mode argument has no effect on undirected networks.
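To see what each mode counts, here is a base R sketch that tallies ties in a small, hypothetical directed edgelist (el and nodes are made up for illustration). For a graph without loops, these tallies correspond to what degree() reports with mode="out", mode="in", and the default mode="all" for a directed graph built from the same edgelist.

```r
# Hypothetical directed edgelist: each row is sender -> receiver
el <- matrix(c(1, 2,
               1, 3,
               2, 3,
               3, 1),
             ncol = 2, byrow = TRUE)
nodes <- 1:3

# Out-degree: how often each node appears as a sender (column 1)
outdeg <- table(factor(el[, 1], levels = nodes))
# In-degree: how often each node appears as a receiver (column 2)
indeg  <- table(factor(el[, 2], levels = nodes))
# "all" mode: every tie adjacent to the node, regardless of direction
alldeg <- outdeg + indeg

rbind(out = outdeg, "in" = indeg, all = alldeg)
```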

Try this out with the two networks (g, and gD) that we created above.

Degree <- degree(g)
Indegree.Undirected <- degree(g, mode="in")
Outdegree.Undirected <- degree(g, mode="out")

Degree.Directed <- degree(gD)
Indegree <- degree(gD, mode="in")
Outdegree <- degree(gD, mode="out")

To see what the output for each of these looks like, use the head() function. That will show you the first six observations.

A method that many may find easier, or more intuitive, is to group all of these measures together into one table and look at them all at once. To do this, use the cbind() function, which binds the vectors together as the columns of a matrix for comparison.

CompareDegree <- cbind(Degree, Indegree.Undirected, Outdegree.Undirected, Degree.Directed, Indegree, Outdegree)
# Then look at just the first few observations, to save space.
head(CompareDegree)

##      Degree Indegree.Undirected Outdegree.Undirected Degree.Directed
## [1,]     16                  16                   16              16
## [2,]      9                   9                    9               9
## [3,]     10                  10                   10              10
## [4,]      6                   6                    6               6
## [5,]      3                   3                    3               3
## [6,]      4                   4                    4               4
##      Indegree Outdegree
## [1,]        0        16
## [2,]        1         8
## [3,]        2         8
## [4,]        3         3
## [5,]        1         2
## [6,]        1         3

This is a little bit of a manufactured comparison, since the network data that we are using were really designed as an undirected network. Normally, you will not see the indegree and outdegree from the directed network summing to the same total as the undirected version of the network. But, hopefully, you get the idea.

That was a fairly lengthy treatment of degree centrality. The following centralities will be fairly minimal in explanation and interpretation. The degree example should demonstrate how to modify the centrality measures to work with digraphs.

Eigenvector Centrality

Eigenvector centrality calculations are iterative and can produce a wealth of information. Most users will only be interested in the centrality scores, however. Therefore, when you calculate eigenvector centrality, you will need to tell igraph that you are only interested in the vector of centrality scores that you are calculating.

Eig <- eigen_centrality(g)$vector    # evcent() in older versions of igraph
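Eigenvector centrality gives each node a score proportional to the sum of its neighbors’ scores, which is why it is computed iteratively. The base R sketch below shows the classic power-iteration idea on a small, hypothetical four-node graph (the matrix A and the helper power_iteration() are made up for illustration). It is not how igraph computes the scores internally (igraph uses an eigensolver), but, assuming a connected network that is not bipartite, it should converge to the leading eigenvector, rescaled so that the top score is 1.

```r
# Adjacency matrix of a hypothetical undirected graph:
# a triangle (1-2-3) with a pendant vertex 4 attached to vertex 3
A <- matrix(c(0, 1, 1, 0,
              1, 0, 1, 0,
              1, 1, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)

power_iteration <- function(A, iterations = 1000, tol = 1e-10) {
  x <- rep(1, nrow(A))                 # start with equal scores
  for (i in seq_len(iterations)) {
    x_new <- as.vector(A %*% x)        # each score = sum of neighbors' scores
    x_new <- x_new / max(x_new)        # rescale so the top score is 1
    if (max(abs(x_new - x)) < tol) break
    x <- x_new
  }
  x_new
}

eig_scores <- power_iteration(A)
round(eig_scores, 3)                   # vertex 3 (three neighbors) scores highest
```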

Hubs and Authorities

Hub and authority (prominence) scores are calculated in a manner that is very similar to eigenvector centrality. You will, therefore, have to isolate only the vector of scores in each of these as well.

Hub <- hub_score(g)$vector                # hub.score() in older versions of igraph
Authority <- authority_score(g)$vector    # authority.score() in older versions of igraph
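Hub and authority scores come from Kleinberg’s HITS idea: a good hub points to good authorities, and a good authority is pointed to by good hubs. The base R sketch below (the helper hits() and both toy matrices are hypothetical) alternates those two updates until they settle. On the directed toy graph, vertex 3, which receives the most ties, becomes the top authority, and vertex 1, which sends ties to the best-connected receivers, becomes the top hub. On the undirected toy graph the adjacency matrix is symmetric, so both scores reduce to eigenvector centrality.

```r
hits <- function(A, iterations = 1000, tol = 1e-10) {
  hub  <- rep(1, nrow(A))
  auth <- rep(1, nrow(A))
  for (i in seq_len(iterations)) {
    auth_new <- as.vector(t(A) %*% hub)     # pointed to by good hubs
    auth_new <- auth_new / max(auth_new)
    hub_new  <- as.vector(A %*% auth_new)   # points to good authorities
    hub_new  <- hub_new / max(hub_new)
    converged <- max(abs(hub_new - hub), abs(auth_new - auth)) < tol
    hub  <- hub_new
    auth <- auth_new
    if (converged) break
  }
  list(hub = hub, authority = auth)
}

# Hypothetical directed graph: 1->2, 1->3, 2->3, 4->3
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 3))
A_directed <- matrix(0, 4, 4)
A_directed[edges] <- 1                      # row = sender, column = receiver
d <- hits(A_directed)

# Hypothetical undirected graph: triangle 1-2-3 plus pendant vertex 4
A_paw <- matrix(c(0, 1, 1, 0,
                  1, 0, 1, 0,
                  1, 1, 0, 1,
                  0, 0, 1, 0), nrow = 4, byrow = TRUE)
u <- hits(A_paw)
round(u$hub - u$authority, 6)               # all zero: the scores coincide
```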

Closeness

Closeness <- closeness(g)
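igraph’s raw closeness score is the reciprocal of the sum of a node’s shortest-path distances to all other nodes; for example, the 0.01724 reported for Karate Club node 1 later in this tutorial is 1/58. The base R sketch below computes that quantity with breadth-first search on a small, hypothetical path graph (the adjacency list adj and the helper bfs_distances() are made up for illustration). It assumes an unweighted, connected network.

```r
# Hypothetical undirected graph as an adjacency list: a path 1 - 2 - 3 - 4
adj <- list(c(2), c(1, 3), c(2, 4), c(3))

# Breadth-first search: shortest-path distance (in steps) from `source`
bfs_distances <- function(adj, source) {
  dist <- rep(Inf, length(adj))
  dist[source] <- 0
  queue <- source
  while (length(queue) > 0) {
    v <- queue[1]
    queue <- queue[-1]
    for (w in adj[[v]]) {
      if (is.infinite(dist[w])) {      # first time w is reached
        dist[w] <- dist[v] + 1
        queue <- c(queue, w)
      }
    }
  }
  dist
}

# Raw closeness: 1 / (sum of distances to all other vertices)
closeness_scores <- sapply(seq_along(adj), function(v) {
  1 / sum(bfs_distances(adj, v))
})
closeness_scores   # ends of the path: 1/6; middle vertices: 1/4
```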

Reach

Reach is not actually included in the igraph suite. But it is fairly easy to calculate, since it is based on a count of how many nodes lie within k steps of each node (the functions below divide that count by the total number of nodes in the network).

Thankfully, Dr. Daizaburo Shizuka has written a short function to calculate reach centrality in a network. Note: these functions are for undirected graphs only; they will treat directed graphs as though they are undirected.

To use reach centrality, you will first need to run the function, and then run the calculation.

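# Function for 2-step reach: the share of all vertices that lie within
# 2 steps of each vertex (the vertex itself is included in the count)
reach2 <- function(x){
    r <- vector(length = vcount(x))
    for (i in 1:vcount(x)){
        n <- neighborhood(x, 2, nodes = i)   # vertices within 2 steps of i
        ni <- unlist(n)
        l <- length(ni)
        r[i] <- l / vcount(x)
    }
    r
}

# Function for 3-step reach: the same idea, within 3 steps
reach3 <- function(x){
    r <- vector(length = vcount(x))
    for (i in 1:vcount(x)){
        n <- neighborhood(x, 3, nodes = i)   # vertices within 3 steps of i
        ni <- unlist(n)
        l <- length(ni)
        r[i] <- l / vcount(x)
    }
    r
}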

# Now, run the calculations.
Reach_2 <- reach2(g)    # Note the differences between the object
Reach_3 <- reach3(g)    #   names and the function names!
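As a cross-check on what reach2() and reach3() compute, the same quantity can be obtained from powers of the adjacency matrix: entry (i, j) of (I + A)^k is positive exactly when j is within k steps of i. This base R sketch (the helper reach_k() and the path graph are hypothetical) counts the ego itself, as reach2() and reach3() do, so for an undirected graph it should agree with those functions.

```r
# Adjacency matrix of a hypothetical undirected path graph 1 - 2 - 3 - 4
A <- matrix(c(0, 1, 0, 0,
              1, 0, 1, 0,
              0, 1, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)

# Share of all vertices (including the vertex itself) within k steps
reach_k <- function(A, k) {
  n <- nrow(A)
  step <- diag(n) + A                    # stay put, or move one step
  within_k <- diag(n)
  for (i in seq_len(k)) within_k <- within_k %*% step
  rowSums(within_k > 0) / n              # count vertices reachable at all
}

reach_k(A, 2)   # 0.75 1.00 1.00 0.75
```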

Betweenness

Betweenness <- betweenness(g)
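Betweenness sums, over every pair of other nodes, the share of their shortest paths that pass through a given node. igraph uses Brandes’ fast algorithm; the brute-force base R sketch below (bfs_paths() and betweenness_brute() are hypothetical helpers) spells the definition out directly for a small undirected, unweighted graph, counting each pair once. On a four-node cycle, each node lies on one of the two shortest paths between its two neighbors, so each scores 0.5. It scales poorly, so treat it as a way to build intuition, not as something to run on the treaty network.

```r
# Hypothetical undirected graph: a 4-cycle 1 - 2 - 3 - 4 - 1
adj <- list(c(2, 4), c(1, 3), c(2, 4), c(1, 3))

# BFS from `source`: distances and number of shortest paths to each vertex
bfs_paths <- function(adj, source) {
  n <- length(adj)
  dist  <- rep(Inf, n); dist[source] <- 0
  sigma <- rep(0, n);   sigma[source] <- 1
  queue <- source
  while (length(queue) > 0) {
    v <- queue[1]; queue <- queue[-1]
    for (w in adj[[v]]) {
      if (is.infinite(dist[w])) {           # first time w is reached
        dist[w] <- dist[v] + 1
        queue <- c(queue, w)
      }
      if (dist[w] == dist[v] + 1) {         # v lies on a shortest path to w
        sigma[w] <- sigma[w] + sigma[v]
      }
    }
  }
  list(dist = dist, sigma = sigma)
}

betweenness_brute <- function(adj) {
  n <- length(adj)
  D <- S <- matrix(0, n, n)
  for (s in seq_len(n)) {
    r <- bfs_paths(adj, s)
    D[s, ] <- r$dist
    S[s, ] <- r$sigma
  }
  b <- rep(0, n)
  for (v in seq_len(n)) {
    for (s in seq_len(n)) for (t in seq_len(n)) {
      if (s < t && s != v && t != v && is.finite(D[s, t]) &&
          D[s, v] + D[v, t] == D[s, t]) {
        # share of the s-t shortest paths that pass through v
        b[v] <- b[v] + S[s, v] * S[v, t] / S[s, t]
      }
    }
  }
  b
}

betweenness_brute(adj)   # 0.5 0.5 0.5 0.5
```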

Comparing Centrality Scores

To keep all your work in one place, put it all together in one table. You can then export it to a spreadsheet, or sort it in R (see the “Some More Advanced Options for Doing the Above” section for that).

centralities <- cbind(Degree, Eig, Hub, Authority, Closeness, Reach_2, Reach_3, Betweenness)

# Save it to your computer as a spreadsheet
write.csv(centralities, file="centralities.csv")

Comparing the Various Measures Statistically

R is primarily a statistical program. So you can take advantage of that by seeing how correlated the various centralities are.

round(cor(centralities), 2)

##             Degree  Eig  Hub Authority Closeness Reach_2 Reach_3
## Degree        1.00 0.92 0.92      0.92      0.77    0.44    0.52
## Eig           0.92 1.00 1.00      1.00      0.90    0.70    0.66
## Hub           0.92 1.00 1.00      1.00      0.90    0.70    0.66
## Authority     0.92 1.00 1.00      1.00      0.90    0.70    0.66
## Closeness     0.77 0.90 0.90      0.90      1.00    0.87    0.84
## Reach_2       0.44 0.70 0.70      0.70      0.87    1.00    0.66
## Reach_3       0.52 0.66 0.66      0.66      0.84    0.66    1.00
## Betweenness   0.91 0.80 0.80      0.80      0.72    0.42    0.43
##             Betweenness
## Degree             0.91
## Eig                0.80
## Hub                0.80
## Authority          0.80
## Closeness          0.72
## Reach_2            0.42
## Reach_3            0.43
## Betweenness        1.00

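The cor() function produces a correlation matrix. In this case, we can see the correlation values for each pair of centralities. For example, degree is correlated with eigenvector, hub, and authority scores at about 0.92. Eigenvector, hub, and authority scores are perfectly correlated with one another here because, in a connected undirected network like this one, hub and authority scores are computed from the same symmetric adjacency matrix and end up matching eigenvector centrality.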

The round() function rounds numbers to a set number of decimal places. It takes two arguments: (1) the object, number, or vector that you are rounding, and (2) the number of digits that should appear after the decimal point. Here we have rounded to two digits. To see the unrounded values, run cor(centralities).

Please keep in mind that this is just a “quick and dirty” way of comparing the centralities. In most cases, when you are comparing numeric attributes of vertices, it is best to use statistical tests that are designed with network data in mind. Tools such as conditional uniform graph (CUG) tests, the quadratic assignment procedure (QAP), stochastic actor-oriented models (SAOM), and exponential random graph models (ERGM / P*) are explicitly designed to handle the interdependencies present within the network.

We will cover those in a later section.

The World Treaty Index’s WTI Bilateral Agreement Dataset

To retrieve the data you will be using for this practicum, go to the World Treaty Index and download their database of bilateral agreements.

If you are typing this into your browser, go to the World Treaty Index: http://worldtreatyindex.com/, go to “download”, and select the “WTI Bilateral Agreement Dataset”.

Now, move the CSV file into the Practicum 4 folder you created for this purpose. That folder is where you will be doing all of your work.

These data are being used for practice purposes. To find out more about them, spend a little time on the website reading what it says about the data and their proper analysis. There are essentially 18 variables and 61,346 treaties in this spreadsheet. But we are interested in the network created by all of these bilateral agreements, so all you really need is the list of countries participating in each agreement.

Look through the headers in the dataset for “party1” and “party2”. These are the only two columns you will need. You may either copy both columns into a new spreadsheet, or delete all columns that are not “party1” or “party2”.

Important Note:
Although this is a dataset of bilateral agreements, it still includes some unilateral “agreements.” This means that there are missing values, and you will have to decide what to do about them. You could copy the sender’s name into the receiver’s column, and vice versa, to make a self-tie (a.k.a. a “loop”). Alternatively, you can check the comments section of the dataset to decide who the implicit recipient of the tie should be. Or, of course, you could simply delete the rows with missing values altogether.
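A base R sketch of the column selection and two of the missing-value options above, on a few made-up rows (only the column names party1 and party2 come from the dataset description; the country pairs and the extra column are hypothetical). It assumes the blanks were read in as NA, for example by using na.strings="" in read.csv() as in the Karate Club example. Checking the comments column has to be done by hand, so that option is not shown.

```r
# Hypothetical rows standing in for the WTI file
wti <- data.frame(party1 = c("Canada", "France", "Japan"),
                  party2 = c("Mexico", NA, "Korea"),
                  extra  = c("a", "b", "c"),
                  stringsAsFactors = FALSE)

el <- wti[, c("party1", "party2")]      # keep only the two party columns

# Option 1: turn a unilateral "agreement" into a self-tie (loop)
el_loops <- el
missing2 <- is.na(el_loops$party2)
el_loops$party2[missing2] <- el_loops$party1[missing2]
missing1 <- is.na(el_loops$party1)
el_loops$party1[missing1] <- el_loops$party2[missing1]

# Option 2: drop rows with a missing party altogether
el_dropped <- na.omit(el)

nrow(el_loops)    # 3: every row kept, France now tied to itself
nrow(el_dropped)  # 2
```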

The decision about what to do about the missing values in this network is yours. But be ready to explain what you did and why.

Deliverable Instructions

Submit a Google document with the following to satisfy the terms of this practicum:

Describe the data - as though you were preparing this for someone other than me - and respond to the following:
- Select some global measures that you think are appropriate for this network and interpret them. (See Practicum 3)
- Explain what the empty cells signify and what you decided to do with them
List five countries that you suspect are the most active in their participation in treaties. (Just guess, or list the first five you can think of if you have no idea.) Then provide the list and the name of the measure you used for each of the following:
- List the top 5 or 10 countries (with their corresponding values of the measure you used) that have entered into the largest number of treaties overall.
- List the top 5 or 10 countries (with their corresponding values of the measure you used) that you consider to be brokers in this network.
- List the top 5 or 10 countries (with their corresponding values of the measure you used) that have the largest neighborhood within two steps in this network.
- List the top 5 or 10 countries (with their corresponding values of the measure you used) that you feel occupy the most critical junctures in this network. Then tell why you select that particular measure for this network.
Include the best visualization you can manage, as well as a caption that tells what layout you used and what the visualization demonstrates.

Include your name, and the title of the practicum.

Some More Advanced Options for Doing the Above

Make the centrality scores into vertex attributes

In each of the following examples, you are adding a vertex attribute. The word following V(g)$ is the name of the attribute being added to the network data. The function to the right of the assignment arrow (<-) is calculating the value of the attribute for each node in the network. Once you have finished these, check your work using g. You should see these six centralities listed among the vertex attributes.

A word to the wise: if you want to be able to access these another day, be sure to save your network (save(g, file="CentralityNetwork.Rda")) and reload it later with load("CentralityNetwork.Rda"). If you forget to save, you can always re-run this script. But that is more work, isn’t it?

V(g)$degree <- degree(g)                        # Degree centrality
V(g)$eig <- eigen_centrality(g)$vector          # Eigenvector centrality
V(g)$hubs <- hub_score(g)$vector                # "Hub" centrality
V(g)$authorities <- authority_score(g)$vector   # "Authority" centrality
V(g)$closeness <- closeness(g)                  # Closeness centrality
V(g)$betweenness <- betweenness(g)              # Vertex betweenness centrality
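The save() and load() round trip recommended above works the same way for any R object. This sketch uses a small data frame (scores, hypothetical) standing in for g and a temporary file, so nothing is written to your Practicum 4 folder; with the real network you would call save(g, file = ...) and load() in the same way.

```r
# A small object standing in for the network g (hypothetical)
scores <- data.frame(vertex = 1:3, degree = c(2, 1, 1))

path <- tempfile(fileext = ".Rda")      # temporary file, removed when R exits
save(scores, file = path)               # stores the object together with its name

rm(scores)                              # simulate starting a new session
loaded <- load(path)                    # restores `scores`; returns its name
loaded                                  # "scores"
exists("scores")                        # TRUE
```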

Alternate way to Create a Data Frame

centrality <- data.frame(row.names   = V(g)$name,
                         degree      = V(g)$degree,
                         closeness   = V(g)$closeness,
                         betweenness = V(g)$betweenness,
                         eigenvector = V(g)$eig)

centrality <- centrality[order(row.names(centrality)),]   # Sort by row name (as text, so "10" comes before "2")

head(centrality)

##    degree  closeness betweenness eigenvector
## 1      16 0.01724138 231.0714286   0.9521324
## 10      2 0.01315789   0.4476190   0.2749981
## 11      3 0.01149425   0.3333333   0.2034715
## 12      1 0.01111111   0.0000000   0.1415663
## 13      2 0.01123596   0.0000000   0.2256638
## 14      5 0.01562500  24.2158730   0.6065744

Save the data frame as a .csv file

Many will choose this method just to make life easier for themselves. Once you have saved this in CSV format, you can sort the output in a spreadsheet program such as Excel, Numbers, Google Sheets, etc.

write.csv(centrality, file = "Centrality.csv")

If you prefer to sort your data in R…

If we use the data frame that we created and named “centrality”, earlier, we can sort it multiple ways. For this example, we’ll only sort on the betweenness measure.

Highest betweenness scores

# Minimum value
min(centrality$betweenness)

## [1] 0

# Maximum value
max(centrality$betweenness)

## [1] 231.0714

# Top six

head(centrality[order(-centrality$betweenness),])

##    degree  closeness betweenness eigenvector
## 1      16 0.01724138   231.07143   0.9521324
## 34     17 0.01666667   160.55159   1.0000000
## 33     12 0.01562500    76.69048   0.8266589
## 3      10 0.01694915    75.85079   0.8495542
## 32      6 0.01639344    73.00952   0.5116565
## 9       5 0.01562500    29.52937   0.6090684

# Top five
head(centrality[order(-centrality$betweenness),], n=5)

##    degree  closeness betweenness eigenvector
## 1      16 0.01724138   231.07143   0.9521324
## 34     17 0.01666667   160.55159   1.0000000
## 33     12 0.01562500    76.69048   0.8266589
## 3      10 0.01694915    75.85079   0.8495542
## 32      6 0.01639344    73.00952   0.5116565

# Top 10
head(centrality[order(-centrality$betweenness),], n=10)

##    degree  closeness betweenness eigenvector
## 1      16 0.01724138   231.07143   0.9521324
## 34     17 0.01666667   160.55159   1.0000000
## 33     12 0.01562500    76.69048   0.8266589
## 3      10 0.01694915    75.85079   0.8495542
## 32      6 0.01639344    73.00952   0.5116565
## 9       5 0.01562500    29.52937   0.6090684
## 2       9 0.01470588    28.47857   0.7123351
## 14      5 0.01562500    24.21587   0.6065744
## 20      3 0.01515152    17.14683   0.3961622
## 6       4 0.01162791    15.83333   0.2128838
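If the direction of these sorts is ever unclear, a quick check on a small, made-up named vector (btw is hypothetical) shows that order(-x) puts the highest values first, order(x) puts the lowest first, and sort() with decreasing = TRUE is a shortcut when you only need the scores themselves.

```r
# Hypothetical betweenness scores for five named vertices
btw <- c(a = 0, b = 12.5, c = 3, d = 40, e = 7)

# Highest first: negate the scores, or use decreasing = TRUE
names(btw)[order(-btw)]                    # "d" "b" "e" "c" "a"
names(btw)[order(btw, decreasing = TRUE)]  # same order

# Lowest first
names(btw)[order(btw)]                     # "a" "c" "e" "b" "d"

# The top three as a named vector
head(sort(btw, decreasing = TRUE), n = 3)
```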

Lowest betweenness scores

max(degree(g))   # For comparison: the highest degree score in g

## [1] 17

# Bottom six
head(centrality[order(centrality$betweenness),])

##    degree  closeness betweenness eigenvector
## 12      1 0.01111111           0  0.14156633
## 13      2 0.01123596           0  0.22566382
## 15      2 0.01123596           0  0.27159396
## 16      2 0.01123596           0  0.27159396
## 17      2 0.00862069           0  0.06330461
## 18      2 0.01136364           0  0.24747879

# Bottom five
head(centrality[order(centrality$betweenness),], n=5)

##    degree  closeness betweenness eigenvector
## 12      1 0.01111111           0  0.14156633
## 13      2 0.01123596           0  0.22566382
## 15      2 0.01123596           0  0.27159396
## 16      2 0.01123596           0  0.27159396
## 17      2 0.00862069           0  0.06330461

# Bottom ten
head(centrality[order(centrality$betweenness),], n=10)

##    degree  closeness betweenness eigenvector
## 12      1 0.01111111           0  0.14156633
## 13      2 0.01123596           0  0.22566382
## 15      2 0.01123596           0  0.27159396
## 16      2 0.01123596           0  0.27159396
## 17      2 0.00862069           0  0.06330461
## 18      2 0.01136364           0  0.24747879
## 19      2 0.01123596           0  0.27159396
## 21      2 0.01123596           0  0.27159396
## 22      2 0.01136364           0  0.24747879
## 23      2 0.01123596           0  0.27159396

Visualization Options

Normally, igraph will calculate new coordinates every time you visualize a network. This can be annoying because it changes the orientation of the network every time. For those times when you prefer to show the network in one static position, for example to compare differences in node size or color, you can save the coordinates from a particular layout and reuse them as often as you like.

Here, lay is a variable to which you’re assigning the coordinates that igraph calculates using an algorithm of your choosing. Here’s g using a force-directed algorithm called Kamada-Kawai, named after its creators.

lay <- layout_with_kk(g)

plot(g, layout = lay, 
     vertex.label = NA)
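igraph layouts such as layout_with_kk() return an ordinary numeric matrix with one row per vertex and two columns (x and y), which is why saving one in lay and reusing it keeps the picture fixed. A minimal base R sketch of a hand-built circular layout (the helper circle_layout() is hypothetical; igraph’s own layout_in_circle() does this for you):

```r
# Hypothetical helper: place n vertices evenly around a unit circle
circle_layout <- function(n) {
  angle <- 2 * pi * (seq_len(n) - 1) / n
  cbind(x = cos(angle), y = sin(angle))   # n x 2 coordinate matrix
}

lay_circle <- circle_layout(34)   # one row per vertex in g
dim(lay_circle)                   # 34 2

# Saved coordinates never change between calls, so every plot that
# uses layout = lay_circle places the vertices in the same spots.
identical(lay_circle, circle_layout(34))
```

You could then draw the network with plot(g, layout = lay_circle, vertex.label = NA).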

This is a complicated network for which Kamada-Kawai doesn’t yield a very helpful plot. You can find more layouts to try in the igraph documentation by searching in the Help pane or running ?layout_. Simply assign the coordinates to the lay variable, and use your new lay as the layout argument when you use the plot() function.

Here’s another example using another force-directed algorithm called Fruchterman-Reingold, named after its creators.

lay <- layout_with_fr(g)

plot(g, layout = lay, 
     vertex.label = NA)

Try some of the options from Practicum 2 to see if you can render a better visualization with some other layout.

Here is what you can do with the same layout:

Size vertices by centrality measure!

Note: Some centrality measures result in small values. To make the relative differences between them visible, it is important to rescale the centrality vector. We have noted the places where the centrality output was rescaled. There is no real rule to this. In this case, we tried out a few scaling values until the visualization took on the visual aspect we hoped to display.

It is important to rescale an entire vector of output at once in the same manner. Do not change only a few values, as that will change the relative differences between them.
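Multiplying every score by the same constant, as in the examples that follow, is one way to rescale a whole vector at once. Another is a linear min-max rescaling, sketched below in base R (the helper rescale_sizes() and its default sizes of 3 and 15 are hypothetical choices). It keeps the ordering and the relative spacing of the scores, maps them onto a fixed range of vertex sizes, and keeps zero-score vertices visible, although, unlike a constant multiplier, it does not preserve ratios between scores.

```r
# Hypothetical helper: linearly map a whole vector of scores onto a
# range of vertex sizes, applying the same transformation to every value
rescale_sizes <- function(x, min_size = 3, max_size = 15) {
  if (max(x) == min(x)) return(rep((min_size + max_size) / 2, length(x)))
  min_size + (x - min(x)) / (max(x) - min(x)) * (max_size - min_size)
}

btw <- c(0, 10, 40, 231)          # hypothetical betweenness scores
rescale_sizes(btw)                # smallest becomes 3, largest becomes 15
```

With igraph loaded, a call such as plot.igraph(g, layout=lay, vertex.size=rescale_sizes(betweenness(g))) would apply it to the real scores.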

lay <- layout_with_fr(g)   # save coordinates (layout.fruchterman.reingold() in older versions of igraph)
plot.igraph(g, layout=lay)

plot.igraph(g, layout=lay, 
            vertex.size=degree(g), 
            main="Degree")

plot.igraph(gD, layout=lay, 
            vertex.size=degree(gD, mode="in"), 
            main="Indegree")

plot.igraph(gD, layout=lay, 
            vertex.size=degree(gD, mode="out"), 
            main="Outdegree")

plot.igraph(g, layout=lay, 
            vertex.size=eigen_centrality(g)$vector*15,   # Rescaled by multiplying by 15
            main="Eigenvector")

plot.igraph(gD, layout=lay, 
            vertex.size=hub_score(gD)$vector*15,   # Rescaled by multiplying by 15
            main="Hubs")

plot.igraph(gD, layout=lay, 
            vertex.size=authority_score(gD)$vector*15,   # Rescaled by multiplying by 15
            main="Authorities")

plot.igraph(g, layout=lay, 
            vertex.size=closeness(g)*1000,    # Rescaled by multiplying by 1000
            main="Closeness (X 1000)")

plot.igraph(g, layout=lay, 
            vertex.size=betweenness(g)*0.25,    # Rescaled by multiplying by 0.25
            main="Betweenness")

plot.igraph(g, layout=lay, 
            vertex.size=reach2(g)*10,   # Rescaled by multiplying by 10
            main="Reach 2")

plot.igraph(g, layout=lay, 
            vertex.size=reach3(g)*10,    # Rescaled by multiplying by 10
            main="Reach 3")