Introduction to igraph in R

Practical Session I

Before you begin, you should go ahead and change the location where R will save your data and other work. For now, just use your computer’s desktop. In the future, you’ll want to create a new folder for every project you conduct. But, this time, click the following in RStudio:

Session
- Set Working Directory
- Choose Directory…

Getting started in igraph

First time only: install the package “igraph”

install.packages("igraph", dependencies=TRUE)

Within the R Console, load package “igraph”

Just type the following to start igraph:

library(igraph)

## 
## Attaching package: 'igraph'

## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum

## The following object is masked from 'package:base':
## 
##     union

Also, if you want your various analyses from this exercise to turn out like the ones on this page, then enter the following into R:

set.seed(8675309)

Entering and loading network data

Random

The easiest way to start out is to just simulate a network. Simulations can be useful if you are trying to demonstrate something, or - later - if you are trying to test something. For now, they are just an easy way for you to get a network into R.

One way of doing that is to create a scale-free network:

g <- sample_pa(50)

You just generated a “scale-free” network with 50 nodes using an algorithm that was introduced by Barabasi and Albert. But, we are getting a little ahead of ourselves. Before going into scale-free networks and network generation in general, let’s take a moment to unpack what you just did.

You created a data object named “g”. At the same time, you used the assignment operator (<-) to put information into the object. The assignment operator assigns values on the right to objects on the left. So, after executing x <- 50, the value of x is 50. The arrow can be read as 50 goes into x. If you are familiar with other coding languages, then you may be relieved to learn that you can also use = for assignments. But the = sign will not work as an assignment in every context in R, and its use can become confusing. Because of the slight differences in syntax, it is good practice to always use <- for assignments.
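As a small illustration of the assignment operator, the sketch below uses only base R. The object names x, y, and z are placeholders chosen for this example.

```r
x <- 50      # the value 50 goes into x
x            # typing the name prints the stored value

y = 50       # "=" also assigns when it is used on a line of its own
identical(x, y)

# Inside a function call, "=" names an argument instead of creating an object.
# Here it labels the elements of a vector; no objects called a or b are created.
z <- c(a = 1, b = 2)
names(z)
```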

As you will notice in a lot of instances, RStudio is here to help. RStudio offers shortcuts for routine keystrokes like assignment and others. On a PC, typing Alt + - (push Alt at the same time as the - key) will write <- in a single keystroke. If you are using a Mac, typing Option + - (push Option at the same time as the - key) does the same thing.

Objects can be given any name such as network, this_really_big_network_I_like, or really whatever you wish to call the new data object. The object names that you use should be explicit and not too long. They cannot start with a number (2x is not valid, but x2 is). R is case sensitive (e.g., age is different from Age). There are some names that cannot be used because they are reserved words in R (e.g., if, else, for; see ?Reserved for a complete list). In general, even if it’s allowed, it’s best to not use the names of existing functions and objects (e.g., c, T, mean, data, df, weights). If in doubt, check the help to see if the name is already in use. It’s also best to avoid dots (.) within an object name as in my.dataset. There are many functions in R with dots in their names for historical reasons, but because dots have a special meaning in R (for methods) and other programming languages, it’s best to avoid them. It is also recommended to use nouns for object names, and verbs for function names. It’s important to be consistent in the styling of your code (where you put spaces, how you name objects, etc.). Using a consistent coding style makes your code clearer to read for your future self and your collaborators.

If you would like to know more about the details of what the function sample_pa is doing, or if you are interested in what options you have when you use it, then consult the help section for this function. You have two options for accessing help. You can use help(sample_pa), or you can type ?sample_pa. These both do the same thing: they open the help page for you. You can use them to get help with any function you find in R.
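The two help calls look like this in practice. The args() line is an extra that is not mentioned above; it is a base R function that prints only a function's argument list, which can be handy when you just want a quick reminder.

```r
library(igraph)

help(sample_pa)   # opens the documentation page for sample_pa
?sample_pa        # shorthand for the same thing
args(sample_pa)   # prints only the argument names and their defaults
```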

Functions are scripts that people have written to save time and effort for others. They automate more complicated sets of commands including operations, assignments, etc. Many functions are predefined, or can be made available by importing R packages like igraph or statnet, which you will see later in the course. A function usually includes one or more inputs called arguments. Functions often (but not always) return a value.
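To make the idea of arguments and return values concrete, here is a minimal sketch of a home-made function. The name describe_size and its arguments are hypothetical, invented only for this illustration.

```r
# A hypothetical helper with two arguments; "label" has a default value.
describe_size <- function(n, label = "network") {
  paste("A", label, "with", n, "nodes")   # the last expression is the return value
}

describe_size(50)                             # uses the default label
describe_size(n = 10, label = "toy network")  # arguments can be passed by name
```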

In this case, the value that we generated was stored in the object “g”. If you run the function without assigning its result to g, then R will simply print the igraph network that the function creates. Of course, you can also type g into the console and you will also see the object. Try each. We’ll wait.
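The difference between running a function and storing its result can be sketched as follows. The calls to vcount() and ecount() are additions for this example; they are igraph functions that report the number of nodes and edges.

```r
library(igraph)

sample_pa(50)        # not assigned: R prints the network and then discards it
g <- sample_pa(50)   # assigned: nothing is printed, but g now holds the network
g                    # typing the name prints the stored network

vcount(g)            # number of nodes
ecount(g)            # number of edges
```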

Let’s get this over with early: Plotting

Plotting, or visualizing networks, is something that you will be doing a lot of in this course. First, and foremost, visualizations will help you to gain some general insight into medium to small networks, and sometimes large to very large networks as well. We use this as a diagnostic, and as a way to set up our expectations for the next round of exploratory analyses.

Visualizations are also - quite often - pretty and pleasing to the eye. They may impress your boss, or, in a best case scenario, they can get some information across succinctly. You will use them as a “wow factor” in some, but not all, reports that you give that involve networks.

Remember: visualizations do not always help. Some people are not at all visually inclined. In those circumstances, output and graphs may be a better option for getting your point across.

In any case, to see what you have created, enter the following into R:

plot(g)

You will be able to do this every time you create, modify, or enter network information. Try this for each of the networks you create below.

Also, you will note that every time we create a new network in this example page, we name it the same thing: “g”. This is done for the sake of convenience. But, it means that every time we create something new, we are overwriting the last thing that we made. If you want to save a network that you made, you can do so by just assigning the network object to a new name like so:

SomeNewName <- g
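A short sketch of why the copy matters: once a network is stored under its own name, overwriting g leaves the copy untouched. The name first_network is a placeholder for this example.

```r
library(igraph)

g <- sample_pa(50)
first_network <- g     # keep a copy under a new name
g <- sample_pa(20)     # overwrite g with a different, smaller network

vcount(first_network)  # the copy still has its original nodes
vcount(g)              # g now refers to the new network
```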

Now, with that out of the way, you can proceed to try out some other cool stuff.

More examples of growing a random network…

Try each of the following to get an idea of what each will do. Remember to check the help section for each one using the ? operator.

Also, plot each one. Because we are naming them all the same way, plot(g) will work each time.

We will work on “best practices” for coding this stuff as we progress through the term. For now, consider this to be an efficiency measure that you may do again in the future.

g <- growing.random.game(50, m=2)
g <- simplify(g) # "Simplify" removes loops and multiple edges.

The simplify function removes stuff that we often don’t want in a simple, binary network. We will learn more about loops and multiple edges later in the course. But, for now, just understand that I am trying to keep this pretty simple.

Use the plot(g) function now. Run the growing random game again and plot it. Then, simplify and plot it again to get an idea of what the simplification is doing.
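If the effect of simplify is hard to spot in a random network, a tiny hand-made graph may help. This is only a sketch: it assumes a reasonably recent igraph in which graph_from_edgelist is the newer name for graph.edgelist, and the object names messy and tidy are made up for this example.

```r
library(igraph)

# Edge 1-2 is entered twice (a multiple edge) and vertex 3 points to itself (a loop).
messy <- graph_from_edgelist(rbind(c(1, 2), c(1, 2), c(2, 3), c(3, 3)),
                             directed = FALSE)
ecount(messy)      # counts all four edges as entered
is_simple(messy)   # reports whether the graph is free of loops and multiple edges

tidy <- simplify(messy)
ecount(tidy)       # only 1-2 and 2-3 remain
is_simple(tidy)
```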

The following algorithm does not require you to simplify the network you create.

g <- erdos.renyi.game(50, 5/50)
degree.distribution(g)

##  [1] 0.02 0.04 0.10 0.16 0.10 0.16 0.14 0.14 0.06 0.04 0.02 0.02
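The numbers printed by the degree distribution are proportions of nodes, starting with degree 0 in the first position. The sketch below checks that reading. It assumes the newer function names sample_gnp and degree_distribution, which replace erdos.renyi.game and degree.distribution in recent igraph versions.

```r
library(igraph)
set.seed(8675309)

g <- sample_gnp(50, 5/50)      # same random graph model as erdos.renyi.game(50, 5/50)
dd <- degree_distribution(g)   # dd[1] is the share of nodes with degree 0

sum(dd)                        # the proportions add up to 1
mean(degree(g))                # on average close to (50 - 1) * 5/50 = 4.9
```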

Igraph data entry

There are many ways to get data into igraph. You can load an R data object that you have previously saved, of course. But, most of the time, you will be finding and entering your own data.

Here are a few options for you to get data into R.

Using an edgelist

(taken almost directly from igraph help section) First create an edgelist and save it as a text file (__.txt).

To keep this easy, copy and paste the following into a text editor (not Microsoft Word or similar), and save it as “edgelist.txt”.

jim,tab
tab,jen
joe,john
john,tab
tab,joe
jen,john
tab,john
jen,tab

The first column is the person sending the tie and the second column is the person receiving the tie. So, in the first row, you can see that jim sends a tie to tab. If this is a sentiment relation, such as “likes,” then this is saying that jim likes tab. Look through the edgelist to see if tab likes jim back.
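Once you have looked through the list by eye, one way to check your answer in R is sketched below. It types the same eight ties directly into base R; the helper name sends_tie is hypothetical.

```r
# The same eight ties as in edgelist.txt: column 1 sends, column 2 receives.
el <- cbind(c("jim", "tab", "joe", "john", "tab", "jen", "tab", "jen"),
            c("tab", "jen", "john", "tab", "joe", "john", "john", "tab"))

# Hypothetical helper: is there a row with this sender and this receiver?
sends_tie <- function(sender, receiver) {
  any(el[, 1] == sender & el[, 2] == receiver)
}

sends_tie("jim", "tab")   # the tie from the first row
sends_tie("tab", "jim")   # does tab like jim back?
```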

el <- as.matrix(read.table(file.choose(), sep=",")) # Load your edgelist as a two column matrix.
g <- graph.edgelist(el, directed=TRUE)    # Convert it into an igraph object.

Note: New Technique

You may have noticed all of the annoying parentheses we had in the script, above. There are a lot because we used something referred to as “nesting” to save a little space. Nesting is a practice of putting one function inside another. Above, we have two functions nested inside of a third. I am including a script below that will do the same thing. But, you will note that it takes a little more typing and it requires us to pass the object from one function to another. We can even modify an object and replace the contents of the object with the modified version.

See? Confusing. That is why we nested the functions above.

You will see that there are a lot of other ways to do this same thing, especially once you learn about the “tidyverse”. But we’ll have to leave that for later.

el <- read.table(file.choose(), sep=",")  # Read in the table
el <- as.matrix(el)                       # Now, format it as a matrix

The sep="," bit is important. It tells the read.table function that the two columns are separated by commas. This function takes what it sees pretty literally. So, if there are spaces after the commas, then R will think they are part of the name unless you tell the function strip.white=TRUE. Again, check out the help section (?read.table) if you are going to use this again.

If you don’t add that argument then the function will use its default, which is to use white space as the separator. The sep="," argument works the same with .csv files. You can also use less commonly used characters as your separator with this function. The mind just boggles at your options…
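The effect of spaces after the commas can be seen without creating a file, because read.table also accepts its input through the text argument. This is a sketch with two made-up rows; the object names raw, loose, and tight are placeholders.

```r
raw <- "jim, tab\ntab, jen"   # note the space after each comma

loose <- read.table(text = raw, sep = ",", stringsAsFactors = FALSE)
tight <- read.table(text = raw, sep = ",", strip.white = TRUE,
                    stringsAsFactors = FALSE)

loose$V2   # the leading space is kept as part of each name
tight$V2   # strip.white = TRUE removes it
```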

Go ahead. Plot it. You know you want to.

You can also create the same edgelist in R

(best if it is a short list)

el <- cbind( c("jim", "tab", "joe", "john", "tab", "jen", "tab", "jen"), 
             c("tab", "jen", "john", "tab", "joe", "john", "john", "tab"))
g <- graph.edgelist(el, directed=TRUE)

This method works the same way as the one you used above. Except, now the sending column is all enclosed in the first vector, and the receivers are in the second vector. We bind those together into columns (cbind()) to make the same matrix as we did when we imported the edgelist.

Take a look at the network object you just created by simply typing g into R. We will explain what all the stuff means later. But, for now, you can see that this is an igraph object with one attribute (name), and you can see the edgelist you just entered. We’ll come back to this later in the semester.

Adding some attributes:

When you add attributes, it is a super good idea to know the order in which the various nodes have been saved. To do this, you can check the names of each of the nodes and their order using:

get.vertex.attribute(g, "name")

## [1] "jim"  "tab"  "jen"  "joe"  "john"

Now that you can see the order, you can assign the attributes that each node (or tie) has.

Vertex attributes

V(g)$name # Check the Vertex IDs
  # Then use that information to set other vertex attributes:
g <- set.vertex.attribute(g, "gender", value=c("male", "female", "female", "male", "male"))
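Typing attribute values in vertex order works, but it is easy to get the order wrong. One alternative, sketched here as a suggestion and not as the method used above, is to keep a named lookup vector and let R match on the vertex names. The name gender_lookup is hypothetical, and graph_from_edgelist is assumed to be available as the newer name for graph.edgelist.

```r
library(igraph)

el <- cbind(c("jim", "tab", "joe", "john", "tab", "jen", "tab", "jen"),
            c("tab", "jen", "john", "tab", "joe", "john", "john", "tab"))
g <- graph_from_edgelist(el, directed = TRUE)

# Hypothetical lookup: element names are vertex names, values are the attribute.
gender_lookup <- c(john = "male", jen = "female", jim = "male",
                   tab = "female", joe = "male")

# Index the lookup by the vertex names so the order always matches the graph.
V(g)$gender <- unname(gender_lookup[V(g)$name])
data.frame(name = V(g)$name, gender = V(g)$gender)
```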

Edge Attributes

  # We can also add edge weights.
g <- set.edge.attribute(g, "weight", value=c(1, 1, 2, 1, 3, 4, 2, 1))

Now use those attributes to color your nodes and edges:

V(g)$color <- V(g)$gender
V(g)$color <- gsub("female","orange",V(g)$color)
V(g)$color <- gsub("male","purple",V(g)$color)
  # You can add color to your edges as well (If you can think of a good reason for it.)
E(g)$color <- c("red", "blue", "red", "purple", "red", "blue", "red", "purple")

When you see V(g), you are doing something with vertices (nodes) in the network. When you see E(g), you are working with the edges. The $ picks out one attribute by name, and the <- after it assigns the values. You are essentially introducing some new aspect of the node or edge attribute list when you use this.
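Here is a compact sketch of reading and writing attributes with V(g), E(g), and $ on a three-person toy network. The ifelse() call is an alternative to the gsub() approach: it maps each gender to a color in one step and does not depend on the order of the replacements. The functions vertex_attr_names and edge_attr_names are assumed from a recent igraph.

```r
library(igraph)

g <- graph_from_edgelist(cbind(c("jim", "tab"), c("tab", "jen")), directed = TRUE)

V(g)$name                                        # read an existing vertex attribute
V(g)$gender <- c("male", "female", "female")     # create a new vertex attribute
V(g)$color <- ifelse(V(g)$gender == "female", "orange", "purple")
E(g)$weight <- c(1, 2)                           # create a new edge attribute

vertex_attr_names(g)   # lists the vertex attributes that now exist
edge_attr_names(g)     # lists the edge attributes
```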

That was a lot of stuff, done quickly. Don’t worry if you are not quite sure what you just did. Just go back and see if you can work it out.

We’ll spend more time going through each step and what it means in the course. Until then, use the help function (?) liberally.

To look at your network:

There are a few ways to look at your network, and not all of them involve visualization.

 fix(g) # Use caution. It can cause grief if you change anything manually.
 print(g, e=TRUE, v=TRUE)
 plot(g)
 plot(g, edge.width=E(g)$weight) # To plot using edge weights

Using data frames with igraph

You can also enter the data yourself using data frames.
(This was taken directly from igraph help section.)

What you see below is similar to what we entered above. But, here it is a bit more efficient and compact.

actors <- data.frame(name=c("Alice", "Bob", "Cecil", "David", "Esmeralda"),
                     age=c(48,33,45,34,21),
                     gender=c("F","M","F","M","F"))

relations <- data.frame(from=c("Bob", "Cecil", "Cecil", "David", "David", "Esmeralda"),
                        to=c("Alice", "Bob", "Alice", "Alice", "Bob", "Alice"),
                        same.dept=c(FALSE,FALSE,TRUE,FALSE,FALSE,TRUE),
                        friendship=c(4,5,5,2,1,1), 
                        advice=c(4,5,5,4,2,3))

g <- graph.data.frame(relations, directed=TRUE, vertices=actors)

Now, type g into the R console to see what your network looks like. You should have multiple edge and vertex attributes.
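If the printed summary is hard to read, the attributes can also be listed and pulled back out as data frames. This sketch uses a cut-down version of the example with three people and assumes the newer names graph_from_data_frame and as_data_frame (called with the igraph:: prefix in case another package defines a function of the same name).

```r
library(igraph)

actors <- data.frame(name = c("Alice", "Bob", "Cecil"),
                     age = c(48, 33, 45))
relations <- data.frame(from = c("Bob", "Cecil", "Cecil"),
                        to = c("Alice", "Bob", "Alice"),
                        friendship = c(4, 5, 5))
g <- graph_from_data_frame(relations, directed = TRUE, vertices = actors)

vertex_attr_names(g)                          # attributes stored on the nodes
edge_attr_names(g)                            # attributes stored on the ties
igraph::as_data_frame(g, what = "vertices")   # node table
igraph::as_data_frame(g, what = "edges")      # edge table
```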

You are welcome to plot this as well. Why not?

Some simple analyses you can do in igraph

If you recall the centrality measures that we used in the example analyses on the first day, then you are welcome to see who is most “central” in this (or another) network. As in the example from the first day, there are the “big four” centralities: degree, betweenness, closeness, and eigenvector.

Calculating them in igraph is easy.

deg <- degree(g)            # Degree centrality

clo <- closeness(g)         # Closeness centrality

## Warning in closeness(g): At centrality.c:2617 :closeness centrality is not
## well-defined for disconnected graphs

bet <- betweenness(g)       # Betweenness centrality

eig <- evcent(g)$vector     # Eigenvector centrality

The tough part is making sense of them. We don’t really know much about the context around this network, since this is just a made-up exercise. But, go ahead and take a look at each one by simply typing deg, clo, bet, or eig into R. Or, you can use the following script to combine them all into a neat table:

name <- get.vertex.attribute(g, "name")

cent_table <- cbind(name, deg, clo, bet, eig)

Now, type cent_table into R.
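One thing to be aware of: cbind() builds a matrix, and a matrix can hold only one type of data, so binding names together with scores turns the scores into text. If you want to sort or do arithmetic on the scores, a data frame may serve better. The sketch below shows the difference on the same six ties; the names cent_matrix and cent_df are placeholders, and only degree and betweenness are included to keep it short.

```r
library(igraph)

relations <- data.frame(from = c("Bob", "Cecil", "Cecil", "David", "David", "Esmeralda"),
                        to = c("Alice", "Bob", "Alice", "Alice", "Bob", "Alice"))
g <- graph_from_data_frame(relations, directed = TRUE)

cent_matrix <- cbind(V(g)$name, degree(g), betweenness(g))
is.character(cent_matrix)          # the numbers were converted to text

cent_df <- data.frame(name = V(g)$name, deg = degree(g), bet = betweenness(g))
is.numeric(cent_df$deg)            # the scores stay numeric
cent_df[order(-cent_df$deg), ]     # so the table can be sorted by degree
```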

Degree distribution

You may also want to draw a degree distribution for your network if it is large enough. A degree distribution is something that shows how “popularity” or direct influence on others is distributed throughout the network. (That is an oversimplification. But more on that later.)

hist(degree(g))   # Histogram of how many ties each node has

(This network really isn’t large enough to justify a degree distribution. This one is so boring. Wow, I am so embarrassed.)

Well, with only five people, you can read each person’s degree straight from deg without needing a plot. Once the networks start to get really large, you will find this sort of thing really handy.
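To get a feel for what a degree distribution looks like on a larger network, the sketch below simulates one with 1,000 nodes and tabulates the degrees. The object names big and deg_big are placeholders, and the network is simulated, so it says nothing about real data.

```r
library(igraph)
set.seed(8675309)

big <- sample_pa(1000, directed = FALSE)   # simulated preferential attachment network
deg_big <- degree(big)

table(deg_big)                             # how many nodes have each degree
hist(deg_big, breaks = 0:max(deg_big),
     main = "Degree distribution", xlab = "Degree")
```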

To produce a table showing whose centrality is whose, you can create a data object:

First, you will need to identify who is who with: V(g)$name

object.name <- cbind(V(g)$name, deg, clo, bet, eig)

You can now export it in a .csv file if you like.

write.csv(object.name, file="centrality.csv")

Saving your work

If you would like to export the centrality scores in a single spreadsheet, first bind them into one table, then save them as a csv file.

  # First, merge vectors into table, store as 'cent'
cent <- cbind(deg, clo, bet, eig)

  # Next, save them as a .csv file.
write.csv(cent, file="Centrality.csv") # You may want to choose a working directory first.
         # If you need to find out where it went, use: getwd()

You can also save the igraph network as a data object.

save(g, file = "example_network.RDA")
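To get the network back in a later session, use load() with the same file name. The sketch below saves to R's temporary folder so that it does not clutter your working directory; in your own work you would use your project folder instead.

```r
library(igraph)

g <- sample_pa(50)
path <- file.path(tempdir(), "example_network.RDA")   # temporary location for this demo

save(g, file = path)   # write the object to disk
rm(g)                  # remove it from the workspace, as if R had been restarted
load(path)             # restores the object under its original name, g
vcount(g)
```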

Some visualization options

2-Dimensions

plot(g)

tkplot(g)

3-Dimensions

rglplot(g, layout=layout.fruchterman.reingold(g, dim=3))

#################################################################################
# Add-ons for the visualizers:
#       For node labels:    vertex.label=V(g)$name
#       For other layouts:  layout=layout.spring
#                           layout=layout.sphere
#                           layout=layout.fruchterman.reingold.grid
#                           layout=layout.kamada.kawai
#                           layout=layout.mds
#   Size of nodes               vertex.size=20
#   Size nodes by betweenness:  vertex.size=bet
#   Size nodes by eigenvector:  vertex.size=eig
#   Size nodes by closeness:    vertex.size=clo
#   Size nodes by degree:       vertex.size=deg
#
# Example: plot(g, vertex.size=clo*20, layout=layout.mds, vertex.label=V(g)$name)
#
# You can also set the layout as an attribute of the network.
#   g <- set.graph.attribute(g, "layout", layout.kamada.kawai(g))
#
# To move the label out of the middle of the vertex use "vertex.label.dist"
#   plot(g, vertex.label=V(g)$name, vertex.label.dist=3)
#  
# You may choose to resize the vertex IDs. To do so, you will add that information to 
#   the vertex itself before plotting. Here are some examples:
#    V(g)$label.cex <- deg
#    V(g)$label.cex <- log(deg)
#    V(g)$label.cex <- sqrt(deg)
#    V(g)$label.cex <- deg^2
#
#################################################################################
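As one possible way of combining a few of these options, the sketch below sizes nodes by degree and hides the labels. It assumes a recent igraph in which layout_with_kk is the newer name for layout.kamada.kawai; the scaling of the node sizes is an arbitrary choice made for this example.

```r
library(igraph)
set.seed(8675309)

g <- sample_pa(30, directed = FALSE)
deg <- degree(g)

plot(g,
     vertex.size = 5 + 3 * sqrt(deg),   # larger nodes for higher degree
     vertex.label = NA,                 # hide the labels to reduce clutter
     layout = layout_with_kk(g))        # Kamada-Kawai layout
```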

That is a lot of stuff to do just for an exercise in using R. Later, we will deal with why we do the stuff above. But, for now, I just want you to get used to the scripting interface.

So, feel free to mix and match the above scripts and see what you can get R to do with only a few commands.