Exploring network visualization techniques

Do you remember how to install and load a package? What are the two required functions?

Kenyan Households contact network dataset:

Let’s try with an actual dataset that you can download here. This dataset examines the number and frequency of contacts between members of different households in Kenya. The code below uses an age group category as one of its variables; each category covers a range of ages, listed in the table below. You can find more details here.

Age group category	Age range (in years)
1	<1 (infant)
2	1-5 (pre-school)
3	6-15 (primary school)
4	16-19 (secondary school)
5	20-49 (adults)
6	≥50 (elderly)

Note that you can examine a much larger dataset by using data on contacts both within and between households, but you’ll need to combine the data from two csv files; the commented-out code below shows how. For this example we will only use data for contacts between households. Find the file paths for the csv files scc2034_kilifi_all_contacts_across_households.csv and node_attributes.csv, and create dataframes named d1 and node_attributes, respectively, from each file. Do you remember how to do this using the read.csv command?
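As a reminder of the mechanics, here is a minimal, self-contained sketch of read.csv. The file is a throwaway temporary file with made-up rows (an assumption for illustration only); for the workshop data you would instead pass the path to each csv file on your own machine.

```r
## Toy stand-in for node_attributes.csv (values are made up for illustration)
toy_path <- tempfile(fileext = ".csv")
writeLines(c("ID,age,sex", "E10,3,F", "E11,0,M"), toy_path)

## read.csv returns a data frame; the first line of the file supplies the column names
toy_attributes <- read.csv(toy_path, stringsAsFactors = FALSE)
str(toy_attributes)
```

The same call with your own file path in place of toy_path gives you d1 or node_attributes.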

Take a look at the datasets d1 and node_attributes which include data about contacts between different household members and attributes such as age and sex for each individual.

head(d1, n=3)

##   h1 m1 h2 m2 age1 age2 sex1 sex2 duration day hour
## 1  E 23  F  3    3    3    F    M       40   1   14
## 2  E 23  F  3    3    3    F    M       40   1   14
## 3  E 23  F  3    3    3    F    M       20   1   14

head(node_attributes, n=3)

##    ID age sex
## 1 E10   3   F
## 2 E11   0   M
## 3 E13   3   F

Let’s combine the household index with individual index to get unique IDs for each individual:

d1$V1 <- paste0(d1$h1, d1$m1)
d1$V2 <- paste0(d1$h2, d1$m2)
d1 <- d1[with(d1, order(V2, V1)), ] ## sort rows by V2, then V1

## Note the lines commented below between *** can be used to combine data from both csv files.
## ************
##d2 <- read.csv("~/scc2034_kilifi_all_contacts_within_households.csv")
##d2$V1 <- paste0(d2$h1, d2$m1)
##d2$V2 <- paste0(d2$h2, d2$m2)

##d3 <- cbind(c(d1$V1, d2$V1), c(d1$V2, d2$V2), rbind(d1[5:9], d2[5:9]))
##colnames(d3)[1:2] <- c("V1","V2")
## If you are using d3, you will need to modify the edgelist name and column numbers below
## ************

Now using this dataframe, let’s create an edgelist and a network graph object using functions in the igraph package. Note that you’ll first have to convert your dataframe d1 into a matrix object since the function graph.edgelist only handles matrix objects. The get.adjacency function will create the appropriate adjacency matrix from your edgelist. This adjacency matrix is then converted into a network graph object.

library(igraph) ## load igraph for the graph functions used below
myEdgeList <- as.matrix(d1) ## *d3 here if using both datasets
g <- graph.edgelist(myEdgeList[,12:13], directed=FALSE) ## *use [,1:2] for d3
## create an adjacency matrix from your edgelist
net <- graph.adjacency(get.adjacency(g), weighted=TRUE, mode="upper")

N.B. We define the mode in the adjacency graph as upper or lower to signify that the matrix is symmetric and there is no directionality in contacts.

3. Assigning attributes to your nodes and edges

Maybe you have categorical and numerical variables that give us more information about individuals and interactions between individuals in a network. What kind of data would you need to define these attributes? Always consider all the data available before you start making your network (or any graph for that matter!)

Let’s assign vertex and edge attributes to our network object, based on the age and sex of individuals. We’ll also use the duration of contacts between individuals to define edge weights: how “heavy” an edge is reflects how much total contact time a pair of individuals shared. You will pass the weight to the edge.width parameter when you plot the network.

## Here we are matching the sex with the vertices by including it as an attribute for each vertex.
V(net)$sex <- as.character(node_attributes$sex[match(V(net)$name,node_attributes$ID)])

## We can also assign edge attributes, or include weighted edges
## First, extract the duration of contacts by summing for each dyad
temp <- aggregate(duration~V1+V2, sum, data=d1)
## let's sort these by ID
temp <- temp[order(temp$V2),]
## now assign these durations as edge weights.  
E(net)$weight <- as.numeric(temp[,3])

## Warning in eattrs[[name]][index] <- value: number of items to replace is
## not a multiple of replacement length
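The warning above signals that the number of aggregated durations differs from the number of edges, so weights assigned by position may not line up with the intended dyads. One way to avoid positional matching, sketched here on made-up toy data, is to carry the duration along as an edge attribute and let igraph sum it when repeated contacts are merged. The object names (toy, g_toy, net_toy) are hypothetical, and this is an alternative under those assumptions rather than the workflow used in the rest of this tutorial.

```r
library(igraph)

## Toy contact records (made-up values): the E23-F3 dyad appears three times,
## once with the endpoints swapped
toy <- data.frame(V1 = c("E23", "E23", "F3", "E10"),
                  V2 = c("F3", "F3", "E23", "F1"),
                  weight = c(40, 40, 20, 15),
                  stringsAsFactors = FALSE)

## Columns after the first two become edge attributes, so each duration
## travels with its own edge
g_toy <- graph_from_data_frame(toy, directed = FALSE)

## Collapse repeated contacts into one edge per dyad, summing the durations
net_toy <- simplify(g_toy, edge.attr.comb = list(weight = "sum"))

## Inspect the result: one row per dyad with its total duration
as_data_frame(net_toy, what = "edges")
```

Because the graph is undirected, E23-F3 and F3-E23 are treated as the same dyad, which the aggregate() call on ordered V1 and V2 columns does not do.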

One last thing before we plot our network: I want to color nodes by the attribute sex and define a specific layout for the network. Recall that the layout is not deterministic and will be different each time you rerun this line. Save the layout in an object and pass it to the layout parameter when you plot the network.

## Remember the color transparency function?
transp <- function(col, alpha=.5){
  res <- apply(col2rgb(col), 2, function(c) rgb(c[1]/255, c[2]/255, c[3]/255, alpha))
  return(res)
}

## Choose your node colors
Fcolor <- transp("green", alpha=0.5)
Mcolor <- transp("dodgerblue", alpha=0.5)
V(net)$color <- V(net)$sex
V(net)$color <- gsub("F", Fcolor, V(net)$color) # Females will be green
V(net)$color <- gsub("M", Mcolor, V(net)$color) # Males will be blue

## set the network layout
layout_net <- layout_with_lgl(net)
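The gsub approach above works here, but it depends on the order of the substitutions, because hex color strings can themselves contain the letter F. A less fragile option, sketched below with made-up sex codes, is to index a named palette by the codes. The names sex, palette_sex and node_color are hypothetical, and adjustcolor (from grDevices) is swapped in for transp only to keep the sketch self-contained.

```r
## Made-up sex codes standing in for V(net)$sex
sex <- c("F", "M", "M", "F")

## adjustcolor() from grDevices adds transparency, similar to transp() above
palette_sex <- c(F = adjustcolor("green", alpha.f = 0.5),
                 M = adjustcolor("dodgerblue", alpha.f = 0.5))

## Index the named palette by the codes: one color per node, no text substitution
node_color <- unname(palette_sex[sex])
node_color
```

With the real network, the same lookup would read V(net)$color <- unname(palette_sex[V(net)$sex]).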

4. Plotting your network

Great, now let’s plot!

plot(net, 
     vertex.label=NA, 
     layout=layout_net, 
     edge.width=E(net)$weight/100,
     vertex.label.cex=0.6, 
     vertex.label.color="black")

Let’s say you want to delete edges that are minor (contacts shorter than the mean contact duration). You’ll have to decide what cut-off value is meaningful for your data.

cut.off <- mean(temp$duration) ## set a cut.off value
net_sparse <- delete_edges(net, E(net)[weight<cut.off]) ## save a new network with short contacts deleted
plot(net_sparse, 
     vertex.label=NA, 
     layout=layout_net, 
     edge.width=E(net_sparse)$weight/100, 
     vertex.label.cex=0.6, 
     vertex.label.color="black")
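To get a feel for how much the choice of cut-off matters, here is a small sketch on made-up durations (the vector w is hypothetical, not taken from the Kilifi data). When a few contacts are very long, the mean sits well above most values, so a mean-based cut-off removes many more edges than a median-based one.

```r
## Made-up dyad durations with one very long contact
w <- c(5, 10, 10, 20, 300)

mean(w)    # 69: pulled upward by the single long contact
median(w)  # 10

## Number of edges that would survive each cut-off
sum(w >= mean(w))    # 1
sum(w >= median(w))  # 4
```

Plotting a histogram of E(net)$weight before choosing a threshold is one way to check whether your own data are skewed like this.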

Okay, now let’s try and modify node attributes to color individuals by their age group instead of sex.

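## Assign vertex color based on the age group of individuals
## match() lines the ages up with the vertex names, as we did for sex
V(net)$agegroup <- node_attributes$age[match(V(net)$name, node_attributes$ID)]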

## How many subgroups do we have?
unique(V(net)$agegroup)

## [1] 3 0 1 4 2

## so we need 5 colors for these subgroups
sub_color <- transp(c("green", "tomato", "dodgerblue", "yellow","gray"), alpha=0.5)

plot(net, 
     vertex.label=NA, 
     layout=layout_net, 
     edge.width=E(net)$weight/100,
     vertex.label.cex=0.6, 
     vertex.label.color="black",
     vertex.color=sub_color[V(net)$agegroup + 1] ## age codes start at 0, so shift by 1 to index the 5 colors
     )
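Because the age codes in this dataset include 0 (see the unique() output above), it is worth knowing how R treats a 0 index when you look up colors by position. The sketch below uses made-up codes; pal and age_code are hypothetical names.

```r
pal <- c("green", "tomato", "dodgerblue", "yellow", "gray")
age_code <- c(3, 0, 1, 4, 2)   # made-up codes that include 0

## A 0 index is silently dropped, so the result is shorter than the input
length(pal[age_code])        # 4

## Shifting by 1 gives one color per code
by_position <- pal[age_code + 1]

## Alternatively, name the palette by code and look colors up as text
names(pal) <- 0:4
by_name <- unname(pal[as.character(age_code)])

identical(by_position, by_name)  # TRUE
```

Either form returns exactly one color per vertex, which is what vertex.color expects.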

What if you want to highlight specific subgroups in the network? You can mark specific groups in the network based on their vertex id using the mark.groups parameter.

plot(net, 
     vertex.label=NA, 
     layout=layout_net,
     edge.width=E(net)$weight/100, 
     vertex.label.cex=0.6, 
     vertex.label.color="black",
     mark.groups=list(c(1,2,3), c(4:8)), 
     mark.col=c("#C5E5E7","#ECD89A"), 
     mark.border=NA
     )

You can mark groups by the household too!

Ideally, you would include a column in your node_attributes data frame that includes all this relevant information or you would extract the household from the ID column. As an exercise, think about how you might get this information from the ID columns in node_attributes.

Since I don’t already have a column in my node_attributes data frame that includes only the household, I am going to look through my vertices individually and see which nodes correspond to which household. Note that households are indicated by letters.

V(net)

## + 28/28 vertices, named, from 502d350:
##  [1] L1  E10 L3  L7  E11 E13 E15 E16 E17 E2  E20 F10 E22 F3  E23 E25 E27
## [18] E30 E4  E6  F1  F11 E9  F12 E5  F5  F7  F9

It looks like vertices 1, 3, 4 are in household L, vertices 2, 5:11, 13, 15:20, 23, 25 are in household E, and vertices 12, 14, 21:22, 24, 26:28 are in household F.

plot(net, 
     vertex.label=NA, 
     layout=layout_net, 
     edge.width=E(net)$weight/100, 
     vertex.label.cex=0.6, 
     vertex.label.color="black",
     mark.groups=list(c(1,3,4), c(2, 5:11, 13, 15:20, 23, 25), c(12, 14, 21:22, 24, 26:28)), 
     mark.col=c("#FFFF0080","#9ACD3280", "#FA807280"), 
     mark.border=NA
     )
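Reading group membership off the printed vertex list by eye is error-prone. As a sketch of the exercise mentioned earlier, the household letter can be extracted from each ID and used to build the groups programmatically. The vector ids is copied from the V(net) output above; household and groups are hypothetical names.

```r
## Vertex names copied from the V(net) output above
ids <- c("L1", "E10", "L3", "L7", "E11", "E13", "E15", "E16", "E17", "E2",
         "E20", "F10", "E22", "F3", "E23", "E25", "E27", "E30", "E4", "E6",
         "F1", "F11", "E9", "F12", "E5", "F5", "F7", "F9")

## Drop the trailing digits to keep only the household letter
household <- sub("[0-9]+$", "", ids)

## Group vertex positions by household; the list has one element per household
groups <- split(seq_along(ids), household)
groups$L   # 1 3 4
groups$F   # 12 14 21 22 24 26 27 28
```

With the real network you would use V(net)$name in place of ids and pass unname(groups) to mark.groups.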

Well, that’s not super informative; let’s try only 2 subgroups:

plot(net, 
     vertex.label=NA, 
     layout=layout_net, 
     edge.width=E(net)$weight/100, 
     vertex.label.cex=0.6, 
     vertex.label.color="black",
     mark.groups=list(c(1,3,4), c(12, 14, 21:22, 24, 26:28)), 
     mark.col=c("#FFFF0080", "#FA807280"), 
     mark.border=NA
     )

Okay, that looks better. Try playing around with sub-grouping if you think it highlights important features in your network.

There are plenty of other graphical options you can explore!

Find out how to modify:

vertex labels
vertex colors
vertex border colors
edge width
edge color
dotted/dashed edges
edge labels
curved edges
graph layout (?layout_)

Hint: ?igraph and ?igraph.plotting
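As a starting point, the sketch below sets one plotting parameter for each item in the list on a small toy graph. The graph g_demo and all the values chosen are assumptions for illustration; try each parameter on net and consult the help pages for the full set of options.

```r
library(igraph)

g_demo <- make_ring(5)                 # small toy graph, not the household network

plot(g_demo,
     vertex.label = LETTERS[1:5],      # vertex labels
     vertex.color = "lightblue",       # vertex fill color
     vertex.frame.color = "gray40",    # vertex border color
     edge.width = 2,                   # edge width
     edge.color = "tomato",            # edge color
     edge.lty = 2,                     # 2 = dashed, 3 = dotted
     edge.label = 1:5,                 # edge labels
     edge.curved = 0.2,                # curved edges
     layout = layout_in_circle(g_demo))  # graph layout
```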

PART II - INTERACTIVE NETWORK VISUALIZATION

Let’s try and visualize interactive networks. You’ll need to install a new package called visNetwork. Once installed on your computer, load the package. This is a neat package that allows you to publish interactive networks on HTML pages, for example. So you can share your cool visualizations on a website or blog.

The visNetwork package requires data that represent the nodes in a network and the edges to and from those nodes. We’ll use the same edgelist and node attributes dataframes that we used earlier, but they need to be dataframes with specific column names. Take a look at the code below to see how to rename column headers in a newly created dataframe.

N.B. This package will only work when you knit your Rmarkdown document as an HTML file, and not a PDF file.

## let's make new dataframes with just the edges to and from nodes, and the node IDs
## data.frame() keeps duration numeric; cbind() would coerce every column to text
edges <- data.frame(d1$V1, d1$V2, d1$duration)
nodes <- data.frame(node_attributes$ID)
colnames(edges) <- c("from", "to","value")
colnames(nodes) <- c("id")
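The edges dataframe above has one row per recorded contact, so the same pair can appear many times. If you would rather have one edge per pair, a possible variation (sketched here on made-up records, with hypothetical names toy, toy_edges and toy_nodes) is to sum the durations first and then rename the columns.

```r
## Made-up contact records standing in for d1
toy <- data.frame(V1 = c("E23", "E23", "E10"),
                  V2 = c("F3", "F3", "F1"),
                  duration = c(40, 20, 15),
                  stringsAsFactors = FALSE)

## One row per dyad, with the summed duration as the edge value
toy_edges <- aggregate(duration ~ V1 + V2, data = toy, FUN = sum)
names(toy_edges) <- c("from", "to", "value")

## Nodes: every ID that appears at either end of an edge
toy_nodes <- data.frame(id = unique(c(as.character(toy_edges$from),
                                      as.character(toy_edges$to))),
                        stringsAsFactors = FALSE)
```

These two dataframes could then be passed to visNetwork(toy_nodes, toy_edges) in the same way as nodes and edges.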

Try clicking on one of the nodes and drag it around. Cool, right?

visNetwork(nodes, edges)

Alright, now let’s add some features to the network. Note that the visNetwork package has specifications for each argument that you can use to modify the network properties. Make sure you check out the help page for this package, or use this vignette.

Let’s color nodes by sex and add a legend to indicate what node colors refer to. Let’s also include a feature that highlights nodes in red when you click on them.

nodes$size <- (node_attributes$age+1)*10 ## node size here is a function of the age group; +1 keeps age group 0 from getting size 0
nodes$group <- node_attributes$sex ## nodes are colored by sex
nodes$color.highlight.border <- "red" ## highlights node borders in red when you click on them
visNetwork(nodes, edges) %>% 
  visLegend(useGroups=TRUE) ## adds a legend based on the groups defined above

You’ll notice that the symbol %>% appears after visNetwork(nodes, edges). This is the pipe operator: it passes the network built so far on to the next function. Each time you add a function that modifies options for your network, end the previous line with %>%. See how this works in the code below.
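If the pipe is new to you, this small sketch shows the idea on plain numbers instead of a network. It assumes the magrittr package is installed; the values are made up.

```r
library(magrittr)  # provides %>%

x <- c(1, 4, 9)

## Nested call: read from the inside out
nested <- sum(sqrt(x))

## Piped call: read top to bottom, one step per line
piped <- x %>%
  sqrt() %>%
  sum()

identical(nested, piped)  # TRUE
```

In the same way, visNetwork(nodes, edges) %>% visLegend() hands the network to visLegend as its first argument.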

We can also change the shape of nodes, label nodes by ID, and include a dropdown menu that allows you to highlight groups of nodes as well as the nodes nearest to the selected node.

nodes$label <- nodes$id ## adds labels next to each node

visNetwork(nodes, edges) %>% 
  visLegend() %>%
  visNodes(shape = "square") %>% ## you can change the shape of nodes too!
  visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE)

Sometimes, scrolling in and out and navigating within the frame of the network gets tricky. Let’s add a widget that allows you to control how you navigate.

visNetwork(nodes, edges) %>% 
  visLegend() %>% 
  visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE) %>%
  visInteraction(navigationButtons = TRUE)

Fun, right? Okay, one last thing: sometimes, you just want a static network represented on your HTML page. To freeze the network, add the following line of code:

visNetwork(nodes, edges) %>% 
  visLegend() %>% 
  visInteraction(dragNodes = FALSE, dragView = FALSE, zoomView = FALSE)

This just scratches the surface of cool ways to visualize network data. Make sure you check out different features, and try working with a different dataset and create your own networks!


5/10/2018
