Introduction

This codethrough explores some basics of networks and then takes a deeper dive into the R package DiagrammeR.

Before beginning, make sure that DiagrammeR is installed and loaded into your R session. This codethrough will also use functions from the dplyr and kableExtra packages.

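# Clear out your Global Environment
rm(list = ls())

# Install DiagrammeR only if it is not already installed
if (!requireNamespace("DiagrammeR", quietly = TRUE)) install.packages("DiagrammeR")

library(DiagrammeR)
library(dplyr)      # data manipulation
library(pander)
library(kableExtra) # table styling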

Content Overview

I will demonstrate how to construct a network using data from a course discussion board. I will then show how this information may be visualized.

Why You Should Care

This topic is valuable because networks are everywhere! Think about the last 10 people you have spoken with. Now how about the last 10 people each of them spoke to? That’s a network!

Networks also show up in the news. Networks like the one below have been used to track the spread of disease!

Genetic Network Analysis Provides Snapshot of Pandemic Origins

Following COVID-19, many classes moved online, changing the ways we all interact. This example will look at how students interacted with one another in a class discussion board.

Learning Objectives

Specifically, you’ll learn how to…

1.) Load data into DiagrammeR both manually and from existing data files
2.) Graph a network
3.) Customize how the network is visualized

Definitions

Let’s start with a few definitions. For more detail, see this fabulous overview of network visualization.

Nodes: Vertices in a network. These can be people, places, organizations, etc. In this example, they are students in a class identified by their initials.

Edges: Connections between nodes. These can be conversations, relationships, etc. In this example, an edge represents one student replying to another student’s discussion post.

Node Attributes: Characteristics of the node. In this example, node attributes include the student’s area of study and whether or not they had previous experience with R.

Edge Attributes: Characteristics of the edge/relationship. This could include when the connection occurred or the type of relationship. In this example, edge attributes include the time that the post was made.
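To make these four definitions concrete, here is a minimal sketch with two students. The attribute names `field` and `time` (and the timestamp value) are hypothetical, chosen only for illustration; `create_node_df()` and `create_edge_df()` store extra named arguments as attribute columns.

```r
library(DiagrammeR)

# Two nodes (students), each carrying a node attribute `field`
toy_nodes <- create_node_df(n = 2,
                            label = c("UA", "GB"),
                            field = c("Economics", "Economics"))

# One directed edge: node 1 (UA) replied to node 2 (GB);
# `time` is a hypothetical edge attribute
toy_edges <- create_edge_df(from = 1, to = 2,
                            time = "2020-06-01 10:15")

toy_graph <- create_graph(nodes_df = toy_nodes, edges_df = toy_edges)

get_node_df(toy_graph)  # node table, including the `field` column
get_edge_df(toy_graph)  # edge table, including the `time` column
```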

Creating a dgr_graph

We will begin with an example created from a class discussion board.

In order to map a network, DiagrammeR requires a graph object of class dgr_graph. For more detailed instructions, see the DiagrammeR documentation on CRAN.

Creating a Network Manually

One option is to enter a network into R manually. This is done by first creating the nodes and then the edges. Although simple, this is not recommended for large networks.

Note that nodes are automatically assigned ID numbers (1, 2, 3, …) in the order they are created, and the edges must refer to those ID numbers. So, in this example, if we wanted to say that student UA commented on student GB’s post, we would create an edge with ID number 1 in the “from” column and ID number 2 in the “to” column.
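Typing ID numbers by hand is error-prone. One alternative, sketched here under the assumption that you record replies by initials (the `replies` table is hypothetical), is to let base R's `match()` look up each initial's position in the label vector, which is exactly its node ID.

```r
library(DiagrammeR)

# Node labels in creation order, so UA gets ID 1, GB gets ID 2, MP gets ID 3
labels <- c("UA", "GB", "MP")

# Hypothetical replies recorded by initials:
# who replied (responder) to whose post (poster)
replies <- data.frame(responder = c("UA", "MP"),
                      poster    = c("GB", "UA"))

# match() returns each initial's position in `labels`, i.e. its node ID
edge_from_initials <- create_edge_df(from = match(replies$responder, labels),
                                     to   = match(replies$poster, labels))
edge_from_initials[, c("from", "to")]
```

A misspelled initial makes `match()` return `NA`, so checking `anyNA()` on the two ID vectors is a cheap guard before building the graph.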

# Create nodes (initials and area of study for all class members)
# Nodes receive IDs 1-25 in the order the labels are listed
class_node <-
  create_node_df(n = 25,
                 label = c("UA", "GB","MP","MR","BL",
                           "SW", "AE","IQ","SC","BS",
                           "KP","EC","RZ","MO","AC",
                           "DS","CK","AG","CC","AF",
                           "SK","AS","SS","BP","SQ"),
                 shape = "circle",
                 type = "student",
                 # Add area of study as a node attribute
                 data = c("Economics", "Economics","Criminal Justice",
                          "Psychology","Accounting","Applied Linguistics",
                          "Public Health", "Political Science", "Neuroscience",
                          "Economics", "Public Health", "Political Science",
                          "Finance", "Finance", "Political Science", "Political Science",
                          "Economics","Public Health","Public Policy","Economics",
                          "Political Science","Public Policy","Criminal Justice",
                          "Economics","Public Health"))

# Create an edge list (interactions on discussion board #2)
class_edge <-
  create_edge_df(
    # Student ID of the person who replied to a post
    from = c(13,11,11,8,11,14,11,17,17,23,
             15,23,20,4,11,23,4,3,4,19,11,
             4,13,4,22,14),
    # Student ID of the original poster
    to   = c(17,13,3,3,8,6,5,5,19,14,22,22,
             11,11,4,20,20,21,21,15,15,15,
             12,12,12,23))

class_disscussion_manual <-
  create_graph(nodes_df = class_node,
               edges_df = class_edge)


#Let's check our Network
ndf_manual <- get_node_df(class_disscussion_manual) %>%
  head()
kable(ndf_manual) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))

id	type	label	shape	data
1	student	UA	circle	Economics
2	student	GB	circle	Economics
3	student	MP	circle	Criminal Justice
4	student	MR	circle	Psychology
5	student	BL	circle	Accounting
6	student	SW	circle	Applied Linguistics

edf_manual <- get_edge_df(class_disscussion_manual) %>%
  head()
kable(edf_manual) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))

id	from	to	rel
1	13	17	NA
2	11	13	NA
3	11	3	NA
4	8	3	NA
5	11	8	NA
6	14	6	NA
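Beyond eyeballing `head()`, DiagrammeR's `count_nodes()` and `count_edges()` give a quick size check. The sketch below uses a small toy graph; applied to `class_disscussion_manual`, the counts should be 25 nodes and 26 edges, matching the lengths of the label and from/to vectors above.

```r
library(DiagrammeR)

# Toy graph: three students, two replies
toy <- create_graph(
  nodes_df = create_node_df(n = 3, label = c("UA", "GB", "MP")),
  edges_df = create_edge_df(from = c(1, 3), to = c(2, 2))
)

# Compare these against the lengths of the vectors you typed in
count_nodes(toy)  # 3
count_edges(toy)  # 2
```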

This is based on code from Node and Edge Data Frames.

Creating a Network from Data

Now let’s try generating a network from an existing data set. This is the preferred method for large networks and/or networks with a lot of attributes.

Examining Your Data

The first data set, discussion_nodes.csv, contains the initials of all students. It also includes some basic demographic information, such as field of interest and whether or not the student has experience in R.

The second data set, discussion_edges.csv, contains one row for each reply on a single discussion board. Let’s look at the data:

#Data set with network nodes
nodes_messy <- read.csv("discussion_nodes.csv")

n_m <- head(nodes_messy)

kable(n_m)

ï..Initials	Student.ID	Graduate	Field	Experienced.in.R.
UA	1	1	Economics	0
GB	2	1	Economics	1
MP	3	1	Criminal Justice	0
MR	4	1	Psychology	1
BL	5	1	Accounting	0
SW	6	1	Applied Linguistics	1

## Let's clean up the columns using dplyr
## (the "ï.." prefix comes from a byte-order mark at the start of the CSV file)
nodes <- nodes_messy %>%
    rename("Initials" = ï..Initials,
           "Experienced.in.R" = Experienced.in.R.) %>%
    select("Student.ID", "Initials", "Graduate",
           "Field", "Experienced.in.R")

n <-head(nodes)

kable(n) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))

Student.ID	Initials	Graduate	Field	Experienced.in.R
1	UA	1	Economics	0
2	GB	1	Economics	1
3	MP	1	Criminal Justice	0
4	MR	1	Psychology	1
5	BL	1	Accounting	0
6	SW	1	Applied Linguistics	1
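Instead of renaming the mangled `ï..Initials` column after the fact, you can ask `read.csv()` to strip the byte-order mark (BOM) while reading. The sketch below writes a tiny hypothetical CSV with a BOM to a temporary file so it runs on its own; with the real data you would pass `fileEncoding = "UTF-8-BOM"` when reading discussion_nodes.csv, assuming the file is UTF-8 encoded.

```r
# Write a tiny CSV that starts with a UTF-8 byte-order mark (BOM),
# mimicking a file saved by Excel (hypothetical example data)
tmp <- tempfile(fileext = ".csv")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)),
           charToRaw("Initials,Student.ID\nUA,1\nGB,2\n")),
         tmp)

# Declaring the encoding lets read.csv() drop the BOM, so the first
# column is named "Initials" instead of a mangled name such as "ï..Initials"
clean_read <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
names(clean_read)
```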

#Data set with network edges
edges_messy <-read.csv("discussion_edges.csv")

e_m <- head(edges_messy)

kable(e_m)

##Clean up time
edges <- edges_messy %>%
  rename ("Poster.Initials" = ï..Poster.Initials, 
          "Responder.Initials"= Responder.Inititials) %>%
  select("Poster.ID", "Responder.ID")
  
e <- edges %>%
     head()

kable(e) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))

Poster.ID	Responder.ID
17	13
13	11
3	11
3	8
8	11
6	14

Turning Data into a dgr_graph

Now that our data is cleaned up, let’s load it into a network graph.

class_disscussion_auto <- 
  create_graph() %>% # create an empty dgr_graph
  # retrieve network nodes from the data frame "nodes";
  # the Initials column becomes the node label
  add_nodes_from_table(table = nodes,
                       label_col = Initials) %>%
  # retrieve network edges from the data frame "edges"
  add_edges_from_table(table = edges,
                       from_col = Responder.ID,
                       to_col = Poster.ID,
                       from_to_map = Student.ID,
                       rel_col = Time.of.Response)
# Note: the numbers in Student.ID correspond to the numbers
# in the Responder.ID and Poster.ID columns.
# If these numbers do not match, DiagrammeR cannot bring
# the two data sets together.
# Also note: select() above kept only Poster.ID and Responder.ID,
# so Time.of.Response is not in "edges"; this is likely why the
# rel column in the edge table below is NA.
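Because the edge IDs must line up with `Student.ID`, it can be worth checking for unmatched IDs before building the graph. This is a minimal base-R sketch on hypothetical stand-ins for `nodes` and `edges` (same column names, made-up values); with the real data you would run the same `setdiff()` on the cleaned tables.

```r
# Hypothetical tables shaped like `nodes` and `edges`
toy_nodes <- data.frame(Student.ID = 1:3, Initials = c("UA", "GB", "MP"))
toy_edges <- data.frame(Poster.ID = c(2, 3, 4), Responder.ID = c(1, 1, 2))

# IDs used in the edge list that have no matching Student.ID
missing_ids <- setdiff(unique(c(toy_edges$Poster.ID, toy_edges$Responder.ID)),
                       toy_nodes$Student.ID)
missing_ids  # 4: a poster ID with no row in the node table
```

An empty result means every edge endpoint has a node; anything else points to a typo or a missing student row.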

#Let's check our Network
ndf_auto <- get_node_df(class_disscussion_auto) %>%
  head()
kable(ndf_auto) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))

id	type	label	Student.ID	Graduate	Field	Experienced.in.R
1	NA	UA	1	1	Economics	0
2	NA	GB	2	1	Economics	1
3	NA	MP	3	1	Criminal Justice	0
4	NA	MR	4	1	Psychology	1
5	NA	BL	5	1	Accounting	0
6	NA	SW	6	1	Applied Linguistics	1

edf_auto <- get_edge_df(class_disscussion_auto) %>%
  head()
kable(edf_auto) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))

id	from	to	rel
1	13	17	NA
2	11	13	NA
3	11	3	NA
4	8	3	NA
5	11	8	NA
6	14	6	NA

You may have noticed something interesting in the node table. Even though we did not specifically mention “Field” or “Experienced.in.R” in the code to create the dgr_graph, these variables were still included as node attributes: add_nodes_from_table() carries over the table’s remaining columns. We will come back to these attributes shortly.

The code used to generate the dgr_graph is adapted from examples provided on the extremely helpful DiagrammeR GitHub page.
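To see that carry-over in isolation, here is a small sketch with a hypothetical three-row node table that uses the same column names as `nodes`. Only `label_col` is specified, yet the other columns end up in the node data frame.

```r
library(DiagrammeR)
library(dplyr)

# Hypothetical node table with the same columns as `nodes`
toy_nodes <- data.frame(Student.ID = 1:3,
                        Initials = c("UA", "GB", "MP"),
                        Field = c("Economics", "Economics", "Criminal Justice"),
                        Experienced.in.R = c(0, 1, 0))

g <- create_graph() %>%
  add_nodes_from_table(table = toy_nodes, label_col = Initials)

# Initials became `label`; the remaining columns became node attributes
names(get_node_df(g))
```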

Graph Your Network

Now that we have made our network(s), we should graph them!

render_graph(class_disscussion_manual)

render_graph(class_disscussion_auto)

As you can see, these are both directed networks. This means that the direction of the interaction matters, and it is indicated by an arrow (instead of just a line).
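Directedness is a property of the dgr_graph itself, so you can check it or switch it off. A minimal sketch on a two-node toy graph, using DiagrammeR's `is_graph_directed()` and `set_graph_undirected()`; graphs from `create_graph()` are directed unless you say otherwise.

```r
library(DiagrammeR)

# Toy graph: MO replied to SS
g <- create_graph(
  nodes_df = create_node_df(n = 2, label = c("MO", "SS")),
  edges_df = create_edge_df(from = 1, to = 2)
)

is_graph_directed(g)             # TRUE by default

# Treat replies as mutual ties instead; edges then render as plain lines
g_undirected <- set_graph_undirected(g)
is_graph_directed(g_undirected)  # FALSE
```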

If we look at student MP, we can see that they replied to student SK’s post. Students MO and SS replied to each other.
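Reciprocal replies like MO and SS can also be found programmatically rather than by scanning the picture. This base-R sketch reuses the label and from/to vectors from the manual example. It flags an edge as reciprocated when its reverse also exists, and it shows that MR and KP also replied to each other.

```r
# Labels and edge vectors copied from the manual example
labels <- c("UA", "GB", "MP", "MR", "BL", "SW", "AE", "IQ", "SC", "BS",
            "KP", "EC", "RZ", "MO", "AC", "DS", "CK", "AG", "CC", "AF",
            "SK", "AS", "SS", "BP", "SQ")
from <- c(13, 11, 11, 8, 11, 14, 11, 17, 17, 23, 15, 23, 20,
          4, 11, 23, 4, 3, 4, 19, 11, 4, 13, 4, 22, 14)
to   <- c(17, 13, 3, 3, 8, 6, 5, 5, 19, 14, 22, 22, 11,
          11, 4, 20, 20, 21, 21, 15, 15, 15, 12, 12, 12, 23)

# An edge is reciprocated if the reverse edge (to -> from) also exists
reciprocated <- paste(from, to) %in% paste(to, from)

# Keep each reciprocated pair once, smaller ID first
pairs <- unique(data.frame(a = pmin(from, to)[reciprocated],
                           b = pmax(from, to)[reciprocated]))

paste(labels[pairs$a], labels[pairs$b], sep = " <-> ")
# "MO <-> SS" "MR <-> KP"
```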

Good news! Both networks look the same, which suggests that the manual data entry matches the CSV files. For the sake of simplicity, we will use the network generated from the CSV files (“class_disscussion_auto”) for the rest of the examples.
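A visual match is a good first check, but comparing the edge lists directly is more reliable for larger networks. This sketch uses two hypothetical toy graphs whose edges were entered in different orders; the helper `edge_keys()` is illustrative, not a DiagrammeR function. It assumes both graphs number the students identically, which holds for the auto graph only if the node table is sorted by Student.ID.

```r
library(DiagrammeR)

# Two versions of the same three-edge network, entered in different orders
g1 <- create_graph(nodes_df = create_node_df(n = 3),
                   edges_df = create_edge_df(from = c(1, 2, 3), to = c(2, 3, 1)))
g2 <- create_graph(nodes_df = create_node_df(n = 3),
                   edges_df = create_edge_df(from = c(3, 1, 2), to = c(1, 2, 3)))

# Hypothetical helper: each edge as a "from->to" string, sorted,
# so edge IDs and row order are ignored
edge_keys <- function(graph) {
  edf <- get_edge_df(graph)
  sort(paste(edf$from, edf$to, sep = "->"))
}

identical(edge_keys(g1), edge_keys(g2))  # TRUE
```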

For more information on creating network graphs, see Creating Simple Graphs from NDFs/EDFs.

Customizing Our Graph

Although DiagrammeR automatically generates a network graph, there are many ways that we can customize our network graphs to make them even more useful!

Data Arrangement

We can use the layout argument of render_graph() to arrange the nodes differently:

render_graph(class_disscussion_auto, 
             layout = "nicely")

Networks are extremely visual! So choosing the layout that best suits your data can make a huge difference.
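To compare arrangements side by side, you can render one graph with several of the layouts that `render_graph()` documents ("nicely", "circle", "tree", "kk", "fr"). The sketch uses a four-node toy cycle. Each call returns an htmlwidget, which appears when printed in RStudio or a knitted HTML document.

```r
library(DiagrammeR)

# Toy graph: four students in a reply cycle
g <- create_graph(
  nodes_df = create_node_df(n = 4, label = c("UA", "GB", "MP", "MR")),
  edges_df = create_edge_df(from = c(1, 2, 3, 4), to = c(2, 3, 4, 1))
)

# Same graph, three node placements
w_circle <- render_graph(g, layout = "circle")
w_kk     <- render_graph(g, layout = "kk")  # Kamada-Kawai force-directed
w_fr     <- render_graph(g, layout = "fr")  # Fruchterman-Reingold

w_circle  # print one to display it
```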

Selecting Nodes by Attributes

First, let’s learn a little more about the node attribute “Field”.

# First, let's learn more about our data using dplyr
sumstats_nodes <- nodes %>%
  group_by(Field) %>%
  summarize(Students = n()) %>%   # number of students in each field
  arrange(desc(Students)) %>%
  ungroup()
kable(sumstats_nodes) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))

Field	Students
Economics	6
Political Science	5
Public Health	4
Criminal Justice	2
Finance	2
Public Policy	2
Accounting	1
Applied Linguistics	1
Neuroscience	1
Psychology	1
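dplyr also offers a one-step shorthand for the same tally: `count()` is equivalent to `group_by()` followed by `summarize(n = n())`, and `sort = TRUE` orders by descending count. The sketch runs on a hypothetical three-row stand-in for `nodes`.

```r
library(dplyr)

# Hypothetical stand-in for the `nodes` data frame
toy_nodes <- data.frame(Field = c("Economics", "Finance", "Economics"))

# Count students per field, largest group first (count column is named `n`)
field_counts <- toy_nodes %>% count(Field, sort = TRUE)
field_counts
```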

Among students on the discussion board, Economics was the most common field.

So let’s highlight the students in Economics in orange.

# Use select_nodes() to pick out the students in Economics,
# then color only the selected nodes orange
nodeseconomics_graph <-
  class_disscussion_auto %>%
  select_nodes(conditions = Field == "Economics") %>%
  set_node_attrs_ws(node_attr = fillcolor, value = "orange") %>%
  clear_selection()

render_graph(nodeseconomics_graph)
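If a condition does not highlight what you expect, `get_selection()` shows which node IDs it actually picked up before any attributes are changed. A sketch on a hypothetical three-student graph with a `Field` attribute:

```r
library(DiagrammeR)
library(dplyr)

# Hypothetical three-student graph with a Field node attribute
g <- create_graph(
  nodes_df = create_node_df(n = 3, label = c("UA", "MP", "BS"),
                            Field = c("Economics", "Criminal Justice", "Economics"))
)

# Node IDs matched by the condition
selected <- g %>%
  select_nodes(conditions = Field == "Economics") %>%
  get_selection()
selected

# Apply the color, then confirm that only those nodes changed
econ_graph <- g %>%
  select_nodes(conditions = Field == "Economics") %>%
  set_node_attrs_ws(node_attr = fillcolor, value = "orange") %>%
  clear_selection()
get_node_df(econ_graph)[, c("label", "Field", "fillcolor")]
```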

That was pretty cool! Now we will give every field its own color using the Spectral color palette, so that each color represents a different field of study.

nodes_field_graph <-
  class_disscussion_auto %>%
  colorize_node_attrs(node_attr_from = Field,
                      node_attr_to = fillcolor,
                      palette = "Spectral",
                      alpha = 90)
  
render_graph(nodes_field_graph)
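To confirm that the mapping is one color per field, you can cross-tabulate the attribute against the generated fill colors. This sketch uses a hypothetical four-student graph with three fields. The Spectral palette from RColorBrewer supplies at most 11 colors, which covers the 10 fields in this class.

```r
library(DiagrammeR)
library(dplyr)

# Hypothetical graph: four students, three fields
g <- create_graph(
  nodes_df = create_node_df(n = 4, label = c("UA", "MP", "BS", "SW"),
                            Field = c("Economics", "Criminal Justice",
                                      "Economics", "Applied Linguistics"))
) %>%
  colorize_node_attrs(node_attr_from = Field,
                      node_attr_to = fillcolor,
                      palette = "Spectral")

ndf <- get_node_df(g)
# Each field should map to exactly one color
table(ndf$Field, ndf$fillcolor)
```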

Further Resources

Hopefully this codethrough piqued your interest in networks and DiagrammeR!

Learn more about visualizing networks with the following:

Resource I Preparing Network Data in R
Resource II Introduction to Network Analysis
Resource III Creating a Node Selection

Works Cited

This codethrough references and cites the following sources:

cran.r-project.org DiagrammeR
cran.r-project.org Creating Simple Graphs from NDFs/EDFs
cran.r-project.org Selections
Iannone, R. (2020) GitHub

Graphing Networks in DiagrammeR

Georgia State University - Coding in R for Policy Analytics

Melinda Reed

27 July 2020