Network Analysis for Contact Tracing

The following outlines an experimental approach to the analysis of contact tracing data, leveraging network analysis methods to observe connections between sources and targets of infectious disease.

The example here uses sample contact tracing data, generously provided by the authors of the Epidemiologist R Handbook, and explores relationships between sources and targets during a COVID-19 outbreak. Sources are people with illness, and targets are the people they have had recent, proximal contact with. By documenting these source and target connections, we can create an interactive map of acyclic exposure situations and hopefully learn something from it.

# install and load packages
if (!require("pacman")) install.packages("pacman")

library(pacman) 

pacman::p_load(
  tidyverse, # wrangling 
  visNetwork, # network visualization
  prettydoc) # knitting

# import data from url 
relationships <- read_rds(
  url("https://github.com/appliedepi/epiRhandbook_eng/blob/master/data/godata/relationships_clean.rds?raw=true")) %>% 
  # select "to" and "from" variables
  select(source_visualid, target_visualid) %>%
  # shorten id strings for legibility by removing "-2020"
  mutate(source_visualid = str_remove_all(source_visualid, "(-2020)"),
         target_visualid = str_remove_all(target_visualid, "(-2020)"))

This dataset is one of four used in a wider contact investigation exercise in the handbook's contact tracing chapter. As an example of wrangling data for network analysis, I focus on the relationships dataset used in that chapter, and on two of its variables in particular: source_visualid and target_visualid. Together, these two columns record the frequency and direction of connections between people, as observed through a contact tracing procedure.

It may not seem obvious, but each observation in the relationships data represents a contact event between a source and a target. To see this, we can count events in both the source and target variables.

# count source contact events
relationships %>% 
  count(source_visualid, sort = TRUE)

## # A tibble: 23 x 2
##    source_visualid     n
##    <chr>           <int>
##  1 <NA>               17
##  2 CASE-0001          13
##  3 CASE-0002           5
##  4 CASE-0005           5
##  5 CASE-0013           5
##  6 CASE-0004           4
##  7 CASE-0018           4
##  8 CASE-0023           4
##  9 CASE-0034           4
## 10 CASE-0006           3
## # … with 13 more rows

# count target contact events
relationships %>% 
  count(target_visualid, sort = TRUE)

## # A tibble: 84 x 2
##    target_visualid     n
##    <chr>           <int>
##  1 <NA>                4
##  2 CONTACT-0046        3
##  3 CONTACT-0056        3
##  4 CASE-0006           2
##  5 CASE-0008           2
##  6 CASE-0009           2
##  7 CONTACT-0015        2
##  8 CONTACT-0027        2
##  9 CONTACT-0028        2
## 10 CONTACT-0029        2
## # … with 74 more rows
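
The `<NA>` rows at the top of both counts show that some contact events are missing a source or a target id. A minimal sketch of tallying those gaps in one step follows. It runs on a small made-up tibble (`toy_relationships`, with hypothetical ids) rather than the real data, so the numbers are illustrative only.

```r
library(dplyr)
library(tibble)

# toy stand-in for the relationships data (hypothetical ids)
toy_relationships <- tribble(
  ~source_visualid, ~target_visualid,
  "CASE-0001",      "CONTACT-0001",
  "CASE-0001",      "CONTACT-0002",
  NA_character_,    "CASE-0002",
  "CASE-0002",      NA_character_
)

# count missing values in every id column at once
missing_ids <- toy_relationships %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

missing_ids
```

Swapping `toy_relationships` for `relationships` should give the same kind of one-row summary for the real contact events.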

The two main ingredients of a network graph are nodes and edges. Nodes can be thought of as units of observation, and edges are the threads that connect nodes together. To make nodes, we fork our source and target variables into separate objects that will become the distinct units, in this case the unique people included in the contact tracing investigation.

# establish nodes for cases, "sources" of contagion
source_nodes <- relationships %>% 
  distinct(source_visualid) %>% 
  rename(label = source_visualid)

# establish contacts nodes, or "targets"
target_nodes <- relationships %>% 
  distinct(target_visualid) %>% 
  rename(label = target_visualid)

# join both into one tibble
ct_nodes <- full_join(source_nodes,
                      target_nodes,
                      by = "label") %>%
  # make unique id number for each node
  rowid_to_column("id") %>%
  # remove missing values
  na.omit()

head(ct_nodes)

## # A tibble: 6 x 2
##      id label    
##   <int> <chr>    
## 1     1 CASE-0016
## 2     3 CASE-0045
## 3     4 CASE-0004
## 4     5 CASE-0010
## 5     6 CASE-0034
## 6     7 CASE-0037
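
Notice that the ids above jump from 1 to 3. My reading is that the row with a missing label is numbered by rowid_to_column() before na.omit() drops it. A small sketch with hypothetical labels (`toy_labels` is made up for illustration) shows the effect, along with an alternative ordering that gives consecutive ids:

```r
library(dplyr)
library(tibble)

# hypothetical labels, with one missing value in the middle
toy_labels <- tibble(label = c("CASE-0001", NA_character_, "CASE-0002"))

# numbering first, then dropping NA, leaves a gap where the NA row was
ids_with_gap <- toy_labels %>%
  rowid_to_column("id") %>%
  na.omit()

# dropping NA first, then numbering, gives consecutive ids
ids_consecutive <- toy_labels %>%
  filter(!is.na(label)) %>%
  rowid_to_column("id")

ids_with_gap
ids_consecutive
```

Either ordering should work for the graph, since the ids only need to be unique, not consecutive.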

Once we have a list of the distinct nodes, each with a unique id value, we can join ct_nodes to a new fork of the relationships data that will make our graph edges. The id variable is joined twice: once matched on source_visualid and renamed from, and once matched on target_visualid and renamed to.

# create "edges", lines between nodes 
ct_edges <- relationships %>% 
  select(target_visualid, source_visualid) %>% 
  # join the sources and rename those ids as from
  left_join(ct_nodes, by = c("source_visualid" = "label")) %>% 
  rename(from = id) %>% 
  # then join the targets and rename those ids as to
  left_join(ct_nodes, by = c("target_visualid" = "label")) %>% 
  rename(to = id)
  
head(ct_edges)

## # A tibble: 6 x 4
##   target_visualid source_visualid  from    to
##   <chr>           <chr>           <int> <int>
## 1 CONTACT-0027    CASE-0016           1    24
## 2 CASE-0014       <NA>               NA    17
## 3 CASE-0031       <NA>               NA    18
## 4 CASE-0021       CASE-0045           3    25
## 5 CONTACT-0020    CASE-0004           4    26
## 6 CONTACT-0038    CASE-0010           5    27
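
Rows with NA in from are contact events where the source is unknown. As a rough sketch, not part of the original analysis, we could set those aside and count how many targets each source points to (the out-degree of each node). The `toy_edges` tibble below is hypothetical and only mimics the from and to columns of ct_edges.

```r
library(dplyr)
library(tibble)

# toy edges in the same shape as the from/to columns of ct_edges
toy_edges <- tibble(
  from = c(1L, 1L, NA, 3L, 1L),
  to   = c(2L, 4L, 3L, 5L, 6L)
)

# keep only edges where both endpoints are known
complete_edges <- toy_edges %>%
  filter(!is.na(from), !is.na(to))

# out-degree: number of targets each source node points to
out_degree <- complete_edges %>%
  count(from, name = "n_targets", sort = TRUE)

out_degree
```

On the real data, the same two steps applied to ct_edges would rank sources by their number of recorded contacts, much like the count of source_visualid earlier but keyed by node id.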

The ct_edges object now contains all the observable connections between distinct nodes, with directionality given by the from and to variables. With nodes and edges in hand, we can put together a fairly nice interactive graph using the visNetwork package. We can also add a few basic options, like a node id selection dropdown.

# visualise
visNetwork(ct_nodes, ct_edges) %>% 
  visNodes(shadow = list(enabled = TRUE, 
                         size = 10)) %>% 
  # options for edges
  visEdges(arrows = "middle",
           width = 2, 
           hoverWidth = 20) %>%
  # other options
  visOptions(nodesIdSelection = TRUE)

This interactive object is a visualization of the connections from sources (cases) to targets (contacts) in the relationships dataset. I strongly suggest zooming in and out of the graph in your browser; clicking and dragging the nodes helps make sense of how the shapes fit together. The graph can zoom out pretty easily as well, so just reload this page if it gets lost in whitespace.
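
One possible refinement, which I have not applied to the graph above, is to color cases and contacts differently. visNetwork reads a group column from the nodes data frame, so a sketch could derive that column from the label prefix. The toy nodes and edges, the group names, and the colors below are all assumptions for illustration.

```r
library(dplyr)
library(stringr)
library(tibble)
library(visNetwork)

# hypothetical nodes, labelled in the same style as ct_nodes
toy_nodes <- tibble(
  id = 1:4,
  label = c("CASE-0001", "CASE-0002", "CONTACT-0001", "CONTACT-0002")
) %>%
  # derive a group from the label prefix; visNetwork styles nodes by `group`
  mutate(group = if_else(str_starts(label, "CASE"), "case", "contact"))

# hypothetical edges between those nodes
toy_edges <- tibble(from = c(1L, 1L, 2L), to = c(2L, 3L, 4L))

# build the graph with one color per group and a legend
toy_graph <- visNetwork(toy_nodes, toy_edges) %>%
  visGroups(groupname = "case", color = "red") %>%
  visGroups(groupname = "contact", color = "darkblue") %>%
  visEdges(arrows = "middle") %>%
  visLegend()

toy_graph
```

The same mutate() step could be added to ct_nodes before it is passed to visNetwork().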

Can we learn anything about the outbreak from this network graph? The visualization is interesting to look at and play around with, but in many ways it is a beginning, an exploratory view into what network analysis can offer the field of contact tracing.

If you like what you see, or have specific questions or feedback, feel free to email me directly: avery.richards@berkeley.edu

Avery Richards

12/20/2021