Phil Murphy & Brendan Knapp
knitr::opts_chunk$set(fig.width = 10, fig.height = 8)
First, load the igraph package to work with network data.
library(igraph)
The dataset that we're going to be using in this example, Davis' Southern Women, is available in Pajek's .net format, but one of the perks of working with R is that you are able to read and write practically any kind of data file. This includes Pajek files, which are simply text files that follow a format Pajek knows how to parse.
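To see that a Pajek file really is plain text, the following minimal sketch writes a small toy graph to a temporary .net file, prints the file's contents, and reads it back in. The ring graph and the object names (toy, tmp, pajek_lines, toy_back) are illustrative assumptions and are not part of the Southern Women data.

```r
library(igraph)

# A toy undirected ring with 4 vertices and 4 edges (illustrative only)
toy <- make_ring(4)

# Write it to a temporary file in Pajek format
tmp <- tempfile(fileext = ".net")
write_graph(toy, tmp, format = "pajek")

# A Pajek file is ordinary text, so base R can display it
pajek_lines <- readLines(tmp)
print(pajek_lines)

# Read the file back into igraph
toy_back <- read_graph(tmp, format = "pajek")
toy_back
```

The same read_graph() call works on any .net file path, so file.choose() can be swapped for a fixed path when a script needs to run unattended.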
A few points about Pajek (and network analysis software in general) are worth mentioning if you're looking to explore other platforms. The following adapts some of the points that Benjamin blogged about.
+ R and Python have exploded in recent years.
+ [...] R is that you're also learning to work in one of the major platforms for statistical analysis.
+ R is also a fully-featured development environment with capabilities that are growing at an increasing rate.
+ [...] R because it's pointing and clicking, Pajek is less user friendly than some other GUI programs like UCINET, Gephi, Visone, etc. and its interface can be a headache until you really learn it.
+ [...] R and its packages apart from the rest of the herd.
+ [...] R.
+ Pajek-3XL can handle networks beyond TWO BILLION vertices.
+ [...] R means you can do more than someone who can only use one platform.
Ultimately, the software that you choose should be based on your own personal preferences. If you are likely to use network analysis a lot, or are likely to use it in combination with other analyses, then R is likely a good choice. On the other hand, if you are coding averse or just prefer the comfort of the point-and-click interface, then there are many powerful and reliable programs, such as Pajek and UCINET, that you may find more to your taste.
With that, let's read our Pajek file, which uses the extension .net...
g <- read_graph(file.choose(), format = "pajek")
...and take a look.
g
## IGRAPH 0aaa3be UNW- 32 89 --
## + attr: id (v/c), name (v/c), x (v/n), y (v/n), z (v/n), weight
## | (e/n)
## + edges from 0aaa3be (vertex names):
## [1] EVELYN --1 EVELYN --2 EVELYN --3 EVELYN --4 EVELYN --5
## [6] EVELYN --6 EVELYN --8 LAURA --1 LAURA --2 LAURA --3
## [11] LAURA --5 LAURA --6 LAURA --7 THERESA --2 THERESA --3
## [16] THERESA --4 THERESA --5 THERESA --6 THERESA --7 THERESA --8
## [21] BRENDA --1 BRENDA --3 BRENDA --4 BRENDA --5 BRENDA --6
## [26] BRENDA --7 CHARLOTTE--3 CHARLOTTE--4 CHARLOTTE--5 FRANCES --3
## [31] FRANCES --5 FRANCES --6 ELEANOR --5 ELEANOR --6 ELEANOR --7
## + ... omitted several edges
If the network that you are working with is a one-mode network, then you are essentially done with the data loading process.
Inspecting the igraph object gives us the header UNW-, which tells us that g is undirected, the vertices are named, and that the edges are weighted. We didn't need to tell igraph whether or not the edges should have directions as Pajek files already specify whether they are directed or undirected by labeling the ties as being arcs or edges.
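The letters in the header can also be checked one at a time. The sketch below builds a tiny stand-in graph from three of the ties shown in the printout above (the weight of 1 is a made-up value for illustration, and toy is a hypothetical object name) and queries each property with igraph's is_directed(), is_named(), and is_weighted().

```r
library(igraph)

# Three ties taken from the printout above, with a made-up weight of 1
el <- data.frame(
  from   = c("EVELYN", "EVELYN", "LAURA"),
  to     = c("1", "2", "1"),
  weight = c(1, 1, 1)
)
toy <- graph_from_data_frame(el, directed = FALSE)

is_directed(toy)  # first header letter: U (undirected) or D (directed)
is_named(toy)     # second header letter: N when vertices have a name attribute
is_weighted(toy)  # third header letter: W when edges have a weight attribute
```

Running the same three calls on g should agree with what the UNW- header reports.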
You may already know that the "Southern Women" network is actually a two-mode, or bipartite network. So, ultimately, we will want the - in UNW- to include a B for bipartite.
Because igraph does not automatically recognize two-mode networks, it is necessary to tell igraph that there are two types of vertices. There are multiple methods for doing this. We cover two options here:
+ the bipartite_mapping() function (also available under its older name, bipartite.mapping()), and
+ assigning the modes manually.
The bipartite_mapping() function
Igraph can evaluate whether the network that you have entered meets the criteria of a two-mode network. Those criteria are that (1) there are two sets of nodes in the network, and (2) there are only ties between node sets and not within them. That is, there are two sets of entities in the network, and the entities from each set are only connected with one another through the other node set. If the network meets the criteria, igraph will identify which nodes belong in each mode.
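The two criteria can be illustrated on toy graphs. This is a minimal sketch, assuming made-up vertex names (A, B, E1, E2) and hypothetical object names; it contrasts a graph whose ties only run between two sets with a triangle, which cannot be split into two such sets.

```r
library(igraph)

# Two "people" (A, B) and two "events" (E1, E2); ties only run between the sets
el <- matrix(c("A", "E1",
               "A", "E2",
               "B", "E2"),
             ncol = 2, byrow = TRUE)
two_mode <- graph_from_edgelist(el, directed = FALSE)

# A triangle: every vertex is tied to both of the others,
# so no two-set split without within-set ties exists
triangle <- make_ring(3)

res_two_mode <- bipartite_mapping(two_mode)
res_triangle <- bipartite_mapping(triangle)

res_two_mode$res   # does two_mode meet the criteria?
res_triangle$res   # does triangle meet the criteria?

# Label the proposed modes with the vertex names for easier reading
types <- setNames(res_two_mode$type, V(two_mode)$name)
types
```

Which set is labelled TRUE and which FALSE is igraph's choice; only the split itself is meaningful.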
To see what the function does, try running it:
bipartite_mapping(g)
## $res
## [1] TRUE
##
## $type
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [12] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE
## [23] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
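Igraph returns two responses:
+ whether the network meets the criteria of a two-mode network ($res), and
+ which mode each vertex has been assigned to ($type).
The "type" vector is what igraph uses to identify the two modes. We can add this into the network fairly easily.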
# First, don't take chances.
g2M <- g  # Create a new network so you don't accidentally
          # overwrite your network with a mistake
V(g2M)$type <- bipartite_mapping(g2M)$type  # Add the "type" attribute
                                            # to the network.
Igraph's bipartite_mapping() function is certainly handy, and will likely be all you need most of the time. However, there will also be times when you do not want igraph to decide which node belongs in which mode for you. In those cases, you can assign node classes manually.
Keep in mind that igraph denotes different modes by whether they are TRUE or FALSE. Now, look at the igraph object above. The vertices are identified with names:
+ the names of the "Southern Women", and
+ the events which they attended, annotated as numbers.
The easiest means of assigning nodes into modes is to work with the data as an edgelist, which will be formatted so that the "Southern Women" are in the first column and the events are in the second column. This will make labeling the type of the vertices relatively simple.
+ Use as_edgelist() to create an edgelist object called g_el.
+ Use head() to inspect the edgelist.
g_el <- as_edgelist(g)
head(g_el)
## [,1] [,2]
## [1,] "EVELYN" "1"
## [2,] "EVELYN" "2"
## [3,] "EVELYN" "3"
## [4,] "EVELYN" "4"
## [5,] "EVELYN" "5"
## [6,] "EVELYN" "6"
Next, let's check to see what kind of object as_edgelist() returns.
class(g_el)
## [1] "matrix"
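The exact output of class() depends on the R version: from R 4.0.0 onward a matrix reports both "matrix" and "array". The sketch below, which uses a small stand-in matrix (m is a hypothetical name, with values taken from the printout above), shows a check that does not depend on the version.

```r
# A small character matrix shaped like the edgelist
m <- matrix(c("EVELYN", "EVELYN", "1", "2"), ncol = 2)

class(m)               # "matrix" alone before R 4.0.0, "matrix" "array" from 4.0.0
is.matrix(m)           # version-independent test for a matrix
inherits(m, "matrix")  # also works when class() returns more than one value
```

Using is.matrix(g_el) in scripts avoids comparisons such as class(g_el) == "matrix", which return a vector of length two on newer R versions.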
To make life simple, let's name the column headers of our edgelist so that they reflect which column refers to a person and which refers to an event.
Since we now know that g_el is a matrix, we can use colnames() to assign headers.
To supply both names at once, we use c() to combine "person" and "event" into a single character vector.
colnames(g_el) <- c("person", "event")
Let's take a look at g_el with head().
head(g_el)
## person event
## [1,] "EVELYN" "1"
## [2,] "EVELYN" "2"
## [3,] "EVELYN" "3"
## [4,] "EVELYN" "4"
## [5,] "EVELYN" "5"
## [6,] "EVELYN" "6"
In order to label each vertex's type, we're going to use a logical test that will return TRUE or FALSE based on which column of g_el each vertex is in. There are a few ways to do this. In this case, we're going to use ifelse() to...
+ check whether the name of a vertex is %in% the first column of g_el, which we named with the header "person"
+ since igraph wants types to be TRUE or FALSE, use those values for our second and third arguments
Note: If you are interested in learning more about logical comparisons in R, visit our helper resources on that topic.
V(g)$type <- ifelse(V(g)$name %in% g_el[, "person"], TRUE, FALSE)
g
## IGRAPH 0aaa3be UNWB 32 89 --
## + attr: id (v/c), name (v/c), x (v/n), y (v/n), z (v/n), type
## | (v/l), weight (e/n)
## + edges from 0aaa3be (vertex names):
## [1] EVELYN --1 EVELYN --2 EVELYN --3 EVELYN --4 EVELYN --5
## [6] EVELYN --6 EVELYN --8 LAURA --1 LAURA --2 LAURA --3
## [11] LAURA --5 LAURA --6 LAURA --7 THERESA --2 THERESA --3
## [16] THERESA --4 THERESA --5 THERESA --6 THERESA --7 THERESA --8
## [21] BRENDA --1 BRENDA --3 BRENDA --4 BRENDA --5 BRENDA --6
## [26] BRENDA --7 CHARLOTTE--3 CHARLOTTE--4 CHARLOTTE--5 FRANCES --3
## [31] FRANCES --5 FRANCES --6 ELEANOR --5 ELEANOR --6 ELEANOR --7
## + ... omitted several edges
Now igraph knows that our network is bipartite, which we can tell from the B in UNWB.
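The manual assignment can also be confirmed without reading the header. The following is a minimal sketch on a toy graph built from three of the ties shown above (toy, el, and is_person are hypothetical names). It also shows that %in% already returns TRUE or FALSE, so for this purpose the ifelse() wrapper is optional. It assumes, as in this dataset, that no event shares a name with a person.

```r
library(igraph)

# A small edgelist with named columns, mirroring g_el
el <- matrix(c("EVELYN", "1",
               "EVELYN", "2",
               "LAURA",  "1"),
             ncol = 2, byrow = TRUE,
             dimnames = list(NULL, c("person", "event")))
toy <- graph_from_edgelist(el, directed = FALSE)

# %in% yields a logical vector directly
is_person <- V(toy)$name %in% el[, "person"]

# The ifelse() form used in the text gives the same values
via_ifelse <- ifelse(V(toy)$name %in% el[, "person"], TRUE, FALSE)
identical(is_person, via_ifelse)

V(toy)$type <- is_person

is_bipartite(toy)    # TRUE once a "type" vertex attribute is present
table(V(toy)$type)   # number of vertices in each mode
```

Note that is_bipartite() only checks for the presence of the type attribute; bipartite_mapping() is the function that tests whether the ties actually respect the two-mode criteria.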