This codethrough explores some basics of networks and takes a deeper dive into the R package DiagrammeR.
Before beginning, make sure that you install DiagrammeR and load it into your library. This codethrough also uses functions from the dplyr and kableExtra packages.
#Clear out your Global Environment
rm(list = ls())
#Install DiagrammeR (only needed once)
install.packages("DiagrammeR")
library(DiagrammeR)
library(dplyr)
library(pander)
library(kableExtra)

I will demonstrate how to construct a network using data from a course discussion board. I will then show how this information may be visualized.
This topic is valuable because networks are everywhere! Think about the last 10 people you have spoken with. Now how about the last 10 people each of them spoke to? That’s a network!
Now what about in the news? Networks like the one below (found here) have been used to track the spread of disease!
Genetic Network Analysis Provides Snapshot of Pandemic Origins
Following COVID-19, many classes moved online, changing the ways we all interact. This example will look at how students interacted with one another in a class discussion board.
Specifically, you’ll learn how to…
1.) Load data into DiagrammeR both manually and from existing data files
2.) Graph a network
3.) Customize how the network is visualized
Let’s start with a few definitions. For more detail, see this fabulous overview of network visualization.
Nodes: Vertices in a network. These can be people, places, organizations, etc. In this example, they are students in a class identified by their initials.
Edges: Connections between nodes. These can be conversations, relationships, etc. In this example, an interaction occurred when a student posted on another student’s discussion post.
Node Attributes: Characteristics of the node. In this example, node attributes include the student’s area of study and whether or not they had previous experience with R.
Edge Attributes: Characteristics of the edge/relationship. This could include when the connection occurred or the type of relationship. In this example, edge attributes include the time that the post was made.
We will begin with an example created from a class discussion board.
To map a network, DiagrammeR requires a graph object of class dgr_graph. For more detailed instructions, click here.
One option is to enter a network into R manually. This is done by first creating the nodes and then the edges. Although simple, this is not recommended for large networks.
Note, the nodes will automatically be assigned an ID number. The edges should align with those ID numbers. So, in this example, if we wanted to say that student UA commented on student GB’s post, we would create a row with ID number 1 in the “from” column and ID number 2 in the “to” column.
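As a minimal sketch of the ID convention described above, the two-student graph below records that UA (ID 1) commented on GB's post (ID 2). It assumes DiagrammeR is installed; the object names `two_nodes`, `two_edges`, and `two_graph` are made up for this illustration.

```r
library(DiagrammeR)

# Two students; IDs 1 and 2 are assigned automatically in label order
two_nodes <- create_node_df(n = 2, label = c("UA", "GB"))

# One edge: the replier (UA, ID 1) points to the original poster (GB, ID 2)
two_edges <- create_edge_df(from = 1, to = 2)

two_graph <- create_graph(nodes_df = two_nodes, edges_df = two_edges)

# Inspect the edge data frame: one row with from = 1 and to = 2
get_edge_df(two_graph)
```

The full class network below follows the same pattern, just with 25 nodes and 26 edges.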
# Create nodes (initials and area of study for all class members)
class_node<-
create_node_df(n = 25,
label = c("UA", "GB","MP","MR","BL",
"SW", "AE","IQ","SC","BS",
"KP","EC","RZ","MO","AC",
"DS","CK","AG","CC","AF",
"SK","AS","SS","BP","SQ"),
shape = "circle",
type = "student",
data = c("Economics", "Economics","Criminal Justice",
"Psychology","Accounting","Applied Linguistics",
"Public Health", "Political Science", "Neuroscience",
"Economics", "Public Health", "Political Science",
"Finance", "Finance", "Political Science", "Political Science",
"Economics","Public Health","Public Policy","Economics",
"Political Science","Public Policy","Criminal Justice",
"Economics","Public Health"))
#Add area of study as a node attribute
#Create an edgelist (interaction on discussion board #2)
class_edge<-
create_edge_df(from = c(13,11,11,8,11,14,11,17,17,23,
15,23,20,4,11,23,4,3,4,19,11,
4,13,4,22,14),
#StudentID of the person who replied to a post
to = c(17,13,3,3,8,6,5,5,19,14,22,22,
11,11,4,20,20,21,21,15,15,15,
12,12,12,23))
#StudentID of the original poster
class_disscussion_manual <-
create_graph(nodes_df = class_node,
edges_df = class_edge)
#Let's check our Network
ndf_manual <- get_node_df(class_disscussion_manual) %>%
head()
kable(ndf_manual) %>%
kable_styling(bootstrap_options = c("striped", "hover"))

| id | type | label | shape | data |
|---|---|---|---|---|
| 1 | student | UA | circle | Economics |
| 2 | student | GB | circle | Economics |
| 3 | student | MP | circle | Criminal Justice |
| 4 | student | MR | circle | Psychology |
| 5 | student | BL | circle | Accounting |
| 6 | student | SW | circle | Applied Linguistics |
edf_manual <- get_edge_df(class_disscussion_manual) %>%
head()
kable(edf_manual) %>%
kable_styling(bootstrap_options = c("striped", "hover"))

| id | from | to | rel |
|---|---|---|---|
| 1 | 13 | 17 | NA |
| 2 | 11 | 13 | NA |
| 3 | 11 | 3 | NA |
| 4 | 8 | 3 | NA |
| 5 | 11 | 8 | NA |
| 6 | 14 | 6 | NA |
This is based on code from Node and Edge Data Frames
Now let’s try generating a network from an existing data set. This is the preferred method for large networks and/or networks with a lot of attributes.
The first data set, discussion nodes, contains the initials of all students. It also includes some basic demographic information, such as field of interest and whether or not the student has experience in R.
The second data set, discussion edges, contains one line for each response on a single discussion board. Let’s look at the data:
#Data set with network nodes
nodes_messy <- read.csv("discussion_nodes.csv")
n_m <- head(nodes_messy)
kable(n_m)

| ï..Initials | Student.ID | Graduate | Undergraduate | Career | Field | Experienced.in.R. |
|---|---|---|---|---|---|---|
| UA | 1 | 1 | 0 | 0 | Economics | 0 |
| GB | 2 | 1 | 0 | 0 | Economics | 1 |
| MP | 3 | 1 | 0 | 0 | Criminal Justice | 0 |
| MR | 4 | 1 | 0 | 0 | Psychology | 1 |
| BL | 5 | 1 | 0 | 0 | Accounting | 0 |
| SW | 6 | 1 | 0 | 0 | Applied Linguistics | 1 |
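The `ï..` prefix on the first column name is typically what a UTF-8 byte-order mark (BOM) at the start of a CSV looks like after `read.csv()`. Assuming that is the cause here, passing `fileEncoding = "UTF-8-BOM"` removes it on import. The sketch below writes a tiny stand-in file (not the real discussion_nodes.csv) to show the effect; `tmp` and `nodes_demo` are illustrative names.

```r
# Write a small stand-in CSV that starts with a UTF-8 byte-order mark
tmp <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
writeBin(c(bom, charToRaw("Initials,Student.ID\nUA,1\nGB,2\n")), tmp)

# Reading with fileEncoding = "UTF-8-BOM" drops the mark from the header
nodes_demo <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
names(nodes_demo)
```

If the file is read this way, the first column arrives already named `Initials`, so it would not need to be renamed during clean-up.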
##Let's clean up the columns using dplyr
nodes <-nodes_messy %>%
rename ("Initials" = ï..Initials,
"Experienced.in.R" = Experienced.in.R.) %>%
select("Student.ID", "Initials", "Graduate",
"Field", "Experienced.in.R")
n <-head(nodes)
kable(n) %>%
kable_styling(bootstrap_options = c("striped", "hover"))

| Student.ID | Initials | Graduate | Field | Experienced.in.R |
|---|---|---|---|---|
| 1 | UA | 1 | Economics | 0 |
| 2 | GB | 1 | Economics | 1 |
| 3 | MP | 1 | Criminal Justice | 0 |
| 4 | MR | 1 | Psychology | 1 |
| 5 | BL | 1 | Accounting | 0 |
| 6 | SW | 1 | Applied Linguistics | 1 |
#Data set with network edges
edges_messy <- read.csv("discussion_edges.csv")
e_m <- head(edges_messy)
kable(e_m)
##Clean up time
edges <- edges_messy %>%
rename ("Poster.Initials" = ï..Poster.Initials,
"Responder.Initials"= Responder.Inititials) %>%
select("Poster.ID", "Responder.ID")
e <- edges %>%
head()
kable(e) %>%
kable_styling(bootstrap_options = c("striped", "hover"))

| Poster.ID | Responder.ID |
|---|---|
| 17 | 13 |
| 13 | 11 |
| 3 | 11 |
| 3 | 8 |
| 8 | 11 |
| 6 | 14 |
Now that our data is cleaned up, let’s load it into a network graph.
class_disscussion_auto <-
create_graph() %>% #create an empty dgr_graph
#retrieve network nodes from the dataframe "nodes"
add_nodes_from_table(table = nodes,
label_col = Initials)%>%
#retrieve network edges from the dataframe "edges"
add_edges_from_table(table = edges,
from_col = Responder.ID,
to_col = Poster.ID,
from_to_map = Student.ID,
rel_col = Time.of.Response)
#Note, the numbers in Student.ID correspond to the numbers
##in the Responder.ID and Poster.ID columns
##If these numbers do not match, DiagrammeR cannot bring
##the two datasets together
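One way to check this before building the graph is to confirm that every ID in the edge table also appears in the node table. The sketch below uses base R only and small made-up data frames; `nodes_demo` and `edges_demo` are hypothetical stand-ins for `nodes` and `edges`, with one deliberately unmatched ID.

```r
# Hypothetical stand-ins for the cleaned node and edge tables
nodes_demo <- data.frame(Student.ID = 1:3,
                         Initials = c("UA", "GB", "MP"))
edges_demo <- data.frame(Poster.ID = c(2, 3, 4),
                         Responder.ID = c(1, 1, 2))

# Collect every ID used in the edge table
edge_ids <- unique(c(edges_demo$Poster.ID, edges_demo$Responder.ID))

# IDs that appear in the edges but have no matching node
missing_ids <- setdiff(edge_ids, nodes_demo$Student.ID)
missing_ids
```

An empty result means every edge can be matched to a node. Here ID 4 is reported because no student with that ID exists in the stand-in node table.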
#Let's check our Network
ndf_auto <- get_node_df(class_disscussion_auto) %>%
head()
kable(ndf_auto) %>%
kable_styling(bootstrap_options = c("striped", "hover"))

| id | type | label | Student.ID | Graduate | Field | Experienced.in.R |
|---|---|---|---|---|---|---|
| 1 | NA | UA | 1 | 1 | Economics | 0 |
| 2 | NA | GB | 2 | 1 | Economics | 1 |
| 3 | NA | MP | 3 | 1 | Criminal Justice | 0 |
| 4 | NA | MR | 4 | 1 | Psychology | 1 |
| 5 | NA | BL | 5 | 1 | Accounting | 0 |
| 6 | NA | SW | 6 | 1 | Applied Linguistics | 1 |
edf_auto <- get_edge_df(class_disscussion_auto) %>%
head()
kable(edf_auto) %>%
kable_styling(bootstrap_options = c("striped", "hover"))

| id | from | to | rel |
|---|---|---|---|
| 1 | 13 | 17 | NA |
| 2 | 11 | 13 | NA |
| 3 | 11 | 3 | NA |
| 4 | 8 | 3 | NA |
| 5 | 11 | 8 | NA |
| 6 | 14 | 6 | NA |
You may have noticed something interesting in the node table. Even though we did not specifically mention “Field” or “Experience in R” in the code to create the dgr_graph, these variables were still included as node attributes. We will come back to these attributes shortly.
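The same behavior can be seen on a much smaller table. The sketch below is an illustration on made-up data (`nodes_demo`, `edges_demo`, and `demo_graph` are hypothetical names, and DiagrammeR is assumed to be installed): `Field` is never named in any call, yet it shows up among the node attributes.

```r
library(DiagrammeR)

# Hypothetical stand-ins for the cleaned node and edge tables
nodes_demo <- data.frame(
  Student.ID = 1:3,
  Initials = c("UA", "GB", "MP"),
  Field = c("Economics", "Economics", "Criminal Justice"),
  stringsAsFactors = FALSE
)
edges_demo <- data.frame(Poster.ID = c(2L, 3L),
                         Responder.ID = c(1L, 1L))

# Build the graph step by step: empty graph, then nodes, then edges
demo_graph <- create_graph()
demo_graph <- add_nodes_from_table(demo_graph,
                                   table = nodes_demo,
                                   label_col = Initials)
demo_graph <- add_edges_from_table(demo_graph,
                                   table = edges_demo,
                                   from_col = Responder.ID,
                                   to_col = Poster.ID,
                                   from_to_map = Student.ID)

# Every column of nodes_demo not used as the label becomes a node attribute
colnames(get_node_df(demo_graph))
```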
The code used to generate the dgr_graph is adapted from examples provided on the extremely helpful DiagrammeR GitHub page.
Now that we have made our network(s), we should graph them!
As you can see, these are both directed networks. This means that the direction of the interaction matters and is indicated by an arrow (instead of just a line). If we look at student MP, we can see that they replied to student SK’s post. Students MO and SS replied to each other.
Good news! Both networks look the same, which suggests that we didn’t make any errors in our data entry. For the sake of simplicity, we will be using the network generated from the csv files (“class_disscussion_auto”) for the rest of the examples.
For more information on creating network graphs, see Creating Simple Graphs from NDFs/EDFs.
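The rendering step itself is a single call to `render_graph()`. The sketch below is not the class network; it is a three-student stand-in (`demo_nodes`, `demo_edges`, and `demo_graph` are hypothetical names) in which UA and GB reply to each other and MP replies to UA. It assumes DiagrammeR is installed.

```r
library(DiagrammeR)

# A small stand-in graph: UA -> GB, GB -> UA, MP -> UA
demo_nodes <- create_node_df(n = 3, label = c("UA", "GB", "MP"))
demo_edges <- create_edge_df(from = c(1, 2, 3), to = c(2, 1, 1))
demo_graph <- create_graph(nodes_df = demo_nodes, edges_df = demo_edges)

# create_graph() builds a directed graph unless told otherwise
is_graph_directed(demo_graph)

# Draw the network; arrows point from the replier to the original poster
demo_plot <- render_graph(demo_graph)
demo_plot
```

The same call applied to `class_disscussion_manual` or `class_disscussion_auto` draws the full class networks.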
Although DiagrammeR automatically generates a network graph, there are many ways that we can customize our network graphs to make them even more useful!
We can tell it to arrange the data differently:
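As a sketch of how the arrangement can be changed, `render_graph()` accepts a `layout` argument. The example below uses a four-student stand-in graph rather than the class network (`demo_nodes`, `demo_edges`, `demo_graph`, `circle_plot`, and `fr_plot` are hypothetical names) and assumes DiagrammeR is installed.

```r
library(DiagrammeR)

# A small stand-in graph with four students replying in a ring
demo_nodes <- create_node_df(n = 4, label = c("UA", "GB", "MP", "MR"))
demo_edges <- create_edge_df(from = c(1, 2, 3, 4), to = c(2, 3, 4, 1))
demo_graph <- create_graph(nodes_df = demo_nodes, edges_df = demo_edges)

# Place the nodes evenly around a circle
circle_plot <- render_graph(demo_graph, layout = "circle")

# Use the Fruchterman-Reingold force-directed layout instead
fr_plot <- render_graph(demo_graph, layout = "fr")

circle_plot
```

Other documented layout values include "nicely", "tree", and "kk"; which one reads best depends on the shape of the network.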
Networks are extremely visual! So choosing the arrangement that is best suited to your data can make a huge difference.

First, let’s learn a little more about the node attribute “Field”.
#First, let's learn more about our data using dplyr
sumstats_nodes <- nodes %>%
group_by(Field) %>%
summarize(Fields = n()) %>%
arrange(desc(Fields)) %>%
ungroup()
kable(sumstats_nodes) %>%
kable_styling(bootstrap_options = c("striped", "hover"))

| Field | Fields |
|---|---|
| Economics | 6 |
| Political Science | 5 |
| Public Health | 4 |
| Criminal Justice | 2 |
| Finance | 2 |
| Public Policy | 2 |
| Accounting | 1 |
| Applied Linguistics | 1 |
| Neuroscience | 1 |
| Psychology | 1 |
Among students on the discussion board, Economics was the most common field.
So let’s highlight the students in Economics in orange.
nodeseconomics_graph <-
class_disscussion_auto %>%
select_nodes(conditions = Field == "Economics") %>%
set_node_attrs_ws(node_attr = fillcolor, value = "orange") %>%
clear_selection()
#Use select_nodes to highlight students in economics
render_graph(nodeseconomics_graph)

#Color every node by its Field using the "Spectral" palette
nodes_field_graph <-
class_disscussion_auto %>%
colorize_node_attrs(node_attr_from = Field,
node_attr_to = fillcolor,
palette = "Spectral",
alpha = 90)
render_graph(nodes_field_graph)

Hopefully this codethrough piqued your interest in networks and DiagrammeR!
Learn more about Visualizing Networks with the following:
Resource I Preparing Network Data in R
Resource II Introduction to Network Analysis
Resource III Creating a Node Selection
This codethrough references and cites the following sources:
cran.r-project.org DiagrammeR
cran.r-project.org Creating Simple Graphs from NDFs/EDFs
cran.r-project.org Selections
Iannone, R. (2020) GitHub