Import Pajek (.net) files

Phil Murphy & Brendan Knapp

knitr::opts_chunk$set(fig.width = 10, fig.height = 8)

First, load the igraph package to work with network data.

library(igraph)

Pajek and file types

The dataset that we're going to use in this example, Davis' Southern Women, is available in Pajek's .net format. One of the perks of working with R is that you can read and write practically any kind of data file. This includes Pajek files, which are simply text files that follow a format Pajek knows how to parse.
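To see that a Pajek file really is plain text, here is a minimal sketch that writes a tiny network to a temporary .net file and prints its raw lines. The toy graph and the temporary file are made up for illustration; they are not the Southern Women data.

```r
library(igraph)

# A small, hypothetical undirected network (not the Southern Women data)
toy <- graph_from_literal(A - B, B - C)

# Write it in Pajek format to a temporary file
toy_file <- tempfile(fileext = ".net")
write_graph(toy, toy_file, format = "pajek")

# A Pajek file is plain text, so readLines() can display it directly
pajek_lines <- readLines(toy_file)
print(pajek_lines)

# Reading the file back gives an igraph object of the same size
toy_back <- read_graph(toy_file, format = "pajek")
vcount(toy_back)
ecount(toy_back)
```

The printed lines should show a vertex section followed by an edge section, which is the structure igraph parses when it reads the file back.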

Brendan's Aside:

A quick note about Pajek (and network analysis software in general) that is worth mentioning if you're looking to explore other platforms. The following adapts some of the points that Benjamin blogged about here.

Ultimately, the software that you choose should be based on your own personal preferences. If you are likely to use network analysis a lot, or are likely to use it in combination with other analyses, then R is likely a good choice. On the other hand, if you are coding averse or just prefer the comfort of the point-and-click interface, then there are many powerful and reliable programs, such as Pajek and UCINET, that you may find more to your taste.

Loading .net files into igraph

With that, let's read our Pajek file, which uses the extension .net...

g <- read_graph(file.choose(), format = "pajek")

...and take a look.

g
## IGRAPH 0aaa3be UNW- 32 89 -- 
## + attr: id (v/c), name (v/c), x (v/n), y (v/n), z (v/n), weight
## | (e/n)
## + edges from 0aaa3be (vertex names):
##  [1] EVELYN   --1 EVELYN   --2 EVELYN   --3 EVELYN   --4 EVELYN   --5
##  [6] EVELYN   --6 EVELYN   --8 LAURA    --1 LAURA    --2 LAURA    --3
## [11] LAURA    --5 LAURA    --6 LAURA    --7 THERESA  --2 THERESA  --3
## [16] THERESA  --4 THERESA  --5 THERESA  --6 THERESA  --7 THERESA  --8
## [21] BRENDA   --1 BRENDA   --3 BRENDA   --4 BRENDA   --5 BRENDA   --6
## [26] BRENDA   --7 CHARLOTTE--3 CHARLOTTE--4 CHARLOTTE--5 FRANCES  --3
## [31] FRANCES  --5 FRANCES  --6 ELEANOR  --5 ELEANOR  --6 ELEANOR  --7
## + ... omitted several edges

If the network that you are working with is a one-mode network, then you are essentially done with the data loading process.

Inspecting the igraph object gives us the header UNW-, which tells us that g is undirected, the vertices are named, and that the edges are weighted. We didn't need to tell igraph whether or not the edges should have directions as Pajek files already specify whether they are directed or undirected by labeling the ties as being arcs or edges.
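Each letter in the header can also be checked directly. The following is a sketch on a small, made-up network (the object `toy` is an assumption for illustration); the same functions can be called on g.

```r
library(igraph)

# Hypothetical toy network with named vertices and weighted edges
toy <- graph_from_literal(A - B, B - C)
E(toy)$weight <- c(1, 2)

is_directed(toy)   # FALSE corresponds to "U" (a directed graph shows "D")
is_named(toy)      # TRUE corresponds to "N"
is_weighted(toy)   # TRUE corresponds to "W"
is_bipartite(toy)  # FALSE here: there is no "type" vertex attribute, so no "B"
```

Note that is_bipartite() only checks whether a "type" vertex attribute exists; it does not test the structure of the ties.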

You may already know that the "Southern Women" network is actually a two-mode, or bipartite, network. So, ultimately, we will want the - in UNW- to be replaced by a B for bipartite.

Because igraph does not automatically recognize two-mode networks, it is necessary to tell igraph that there are two types of vertices. There are multiple methods for doing this. We cover two options here:


Igraph's bipartite_mapping() function

Igraph can evaluate whether the network that you have entered meets the criteria of a two-mode network. Those criteria are that (1) there are two sets of nodes in the network, and (2) ties exist only between the node sets and not within them. That is, there are two sets of entities in the network, and the entities from each set are only connected with one another through the other node set. If the network meets the criteria, igraph will identify which nodes belong in each mode.
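As a rough illustration of these criteria, consider two small ring networks (both are made-up examples, not part of the data). A ring of four nodes can be split into two sets with ties only between the sets, while a triangle cannot.

```r
library(igraph)

square   <- make_ring(4)  # cycle of 4 nodes: can be split into two sets
triangle <- make_ring(3)  # cycle of 3 nodes: cannot be split this way

# $res reports whether the criteria are met
square_check   <- bipartite_mapping(square)
triangle_check <- bipartite_mapping(triangle)

square_check$res    # TRUE: the square meets the criteria
square_check$type   # neighbouring nodes receive opposite values
triangle_check$res  # FALSE: some tie must fall within a set
```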

To see what the function does, try running it:

bipartite_mapping(g)
## $res
## [1] TRUE
## 
## $type
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [12] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
## [23]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE

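Igraph returns two responses: $res, which reports whether the network meets the criteria of a two-mode network, and $type, a logical vector that places each vertex in one of the two modes.

The "type" vector is what igraph uses to identify the two modes. We can add it to the network as a vertex attribute fairly easily.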

# First, don't take chances.
g2M <- g        # Create a new network so you don't accidentally
                #  overwrite your network with a mistake
V(g2M)$type <- bipartite_mapping(g2M)$type  # Add the "type" attribute
                                            #  to the network.


Manual Assignment

Igraph's bipartite_mapping() function is certainly handy, and will likely be all you need most of the time. However, there will also be times when you do not want igraph to decide for you which node belongs in which mode. In those cases, you can assign node classes manually.

Keep in mind that igraph denotes the two modes by whether a vertex's type is TRUE or FALSE. Now, look at the igraph object above. The vertices are identified with names: the names of the "Southern Women", and the events which they attended, labeled with numbers.

The easiest means of assigning nodes into modes is to work with the data as an edgelist, which will be formatted so that the "Southern Women" are in the first column and the events are in the second column. This will make labeling the type of the vertices relatively simple.

as_edgelist()
g_el <- as_edgelist(g)

head(g_el)
##      [,1]     [,2]
## [1,] "EVELYN" "1" 
## [2,] "EVELYN" "2" 
## [3,] "EVELYN" "3" 
## [4,] "EVELYN" "4" 
## [5,] "EVELYN" "5" 
## [6,] "EVELYN" "6"

Next, let's check to see what kind of object as_edgelist() returns.

class(g_el)
## [1] "matrix"

To make life simple, let's name the column headers of our edgelist so that they reflect which column refers to a person and which refers to an event.

Since we now know that g_el is a matrix, we can use colnames() to assign headers.

To assign both names at once, we use c() to combine "person" and "event" into a single character vector.

colnames(g_el) <- c("person", "event")

Let's take a look at g_el with head().

head(g_el)
##      person   event
## [1,] "EVELYN" "1"  
## [2,] "EVELYN" "2"  
## [3,] "EVELYN" "3"  
## [4,] "EVELYN" "4"  
## [5,] "EVELYN" "5"  
## [6,] "EVELYN" "6"

In order to label each vertex's type, we're going to use a logical test that will return TRUE or FALSE based on which column of g_el each vertex is in. There are a few ways to do this. In this case, we're going to use ifelse() to...

Note: If you are interested in learning more about logical comparisons in R, visit our helper resources on that topic.

V(g)$type <- ifelse(V(g)$name %in% g_el[,"person"], TRUE, FALSE)

g
## IGRAPH 0aaa3be UNWB 32 89 -- 
## + attr: id (v/c), name (v/c), x (v/n), y (v/n), z (v/n), type
## | (v/l), weight (e/n)
## + edges from 0aaa3be (vertex names):
##  [1] EVELYN   --1 EVELYN   --2 EVELYN   --3 EVELYN   --4 EVELYN   --5
##  [6] EVELYN   --6 EVELYN   --8 LAURA    --1 LAURA    --2 LAURA    --3
## [11] LAURA    --5 LAURA    --6 LAURA    --7 THERESA  --2 THERESA  --3
## [16] THERESA  --4 THERESA  --5 THERESA  --6 THERESA  --7 THERESA  --8
## [21] BRENDA   --1 BRENDA   --3 BRENDA   --4 BRENDA   --5 BRENDA   --6
## [26] BRENDA   --7 CHARLOTTE--3 CHARLOTTE--4 CHARLOTTE--5 FRANCES  --3
## [31] FRANCES  --5 FRANCES  --6 ELEANOR  --5 ELEANOR  --6 ELEANOR  --7
## + ... omitted several edges

Now igraph knows that our network is bipartite, which we can tell from the B in UNWB.
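Because igraph accepts whatever "type" values it is given, it can be worth confirming that a manual assignment really separates the two modes. The following is a minimal sketch on a made-up edgelist (the names in `el` and the object `toy` are hypothetical); the same checks can be run on g and g_el.

```r
library(igraph)

# Hypothetical two-column edgelist: people in column 1, events in column 2
el <- cbind(person = c("ANN", "ANN", "BEA"), event = c("1", "2", "2"))
toy <- graph_from_edgelist(el, directed = FALSE)

# Same idea as above: TRUE for people, FALSE for events
V(toy)$type <- V(toy)$name %in% el[, "person"]

# How many vertices fall in each mode?
table(V(toy)$type)

# Every tie should connect a TRUE vertex to a FALSE vertex
tie_ends <- ends(toy, E(toy), names = FALSE)
ties_cross_modes <- all(V(toy)$type[tie_ends[, 1]] != V(toy)$type[tie_ends[, 2]])
ties_cross_modes
```

Since %in% already returns TRUE or FALSE, this sketch assigns its result directly; wrapping it in ifelse(..., TRUE, FALSE), as above, gives the same values.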