Introduction

This codethrough explores some basics of networks and then takes a deeper dive into the R package DiagrammeR.

Before beginning, make sure that you install DiagrammeR and load it into your library. This codethrough also uses functions from the packages dplyr and kableExtra.
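Here is a minimal setup sketch. It assumes the three packages come from CRAN; the install line is commented out so that it only runs when you actually need it.

```r
# Install once if needed (assumes a configured CRAN mirror):
# install.packages(c("DiagrammeR", "dplyr", "kableExtra"))

library(DiagrammeR)  # building and rendering networks
library(dplyr)       # data wrangling and the %>% pipe
library(kableExtra)  # styling the summary tables
```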


Content Overview

I will demonstrate how to construct a network using data from a course discussion board. I will then show how this information may be visualized.


Why You Should Care

This topic is valuable because networks are everywhere! Think about the last 10 people you have spoken with. Now how about the last 10 people each of them spoke to? That’s a network!

Now what about in the news? Networks like the one below (found here) have been used to track the spread of disease!


Genetic Network Analysis Provides Snapshot of Pandemic Origins


Following COVID-19, many classes moved online, changing the ways we all interact. This example will look at how students interacted with one another in a class discussion board.


Learning Objectives

Specifically, you’ll learn how to…


1.) Load data into DiagrammeR both manually and from existing data files
2.) Graph a network
3.) Customize how the network is visualized



Definitions

Let’s start with a few definitions. For more detail, see this fabulous overview of network visualization.


Nodes: Vertices in a network. These can be people, places, organizations, etc. In this example, they are students in a class identified by their initials.

Edges: Connections between nodes. These can be conversations, relationships, etc. In this example, an interaction occurred when a student posted on another student’s discussion post.

Node Attributes: Characteristics of the node. In this example, node attributes include the student’s area of study and whether or not they had previous experience with R.

Edge Attributes: Characteristics of the edge/relationship. This could include when the connection occurred or the type of relationship. In this example, edge attributes include the time that the post was made.


Creating a dgr_graph

We will begin with an example created from a class discussion board.

To map a network, DiagrammeR requires a graph object of class dgr_graph. For more detailed instructions, see the DiagrammeR documentation.

Creating a Network Manually

One option is to enter a network into R manually. This is done by first creating the nodes and then the edges. Although simple, this is not recommended for large networks.


Note that DiagrammeR assigns each node an ID number automatically, and the edges must refer to those ID numbers. So, in this example, if we wanted to say that student UA commented on student GB’s post, we would create a row with ID number 1 in the “from” column and ID number 2 in the “to” column.
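The manual approach can be sketched with create_node_df(), create_edge_df(), and create_graph(). This is only a sketch under a few assumptions: it uses the six students shown in the node table, the edges are made up for illustration (only the UA to GB row comes from the explanation above), and the object names, including class_discussion_manual, are my own.

```r
library(DiagrammeR)

# Node data frame: IDs 1 to 6 are assigned automatically, in order
nodes_manual <- create_node_df(
  n     = 6,
  type  = rep("student", 6),
  label = c("UA", "GB", "MP", "MR", "BL", "SW"),
  shape = rep("circle", 6),
  data  = c("Economics", "Economics", "Criminal Justice",
            "Psychology", "Accounting", "Applied Linguistics")
)

# Edge data frame: illustrative edges only, not the real discussion data.
# The first row says that UA (ID 1) commented on GB's (ID 2) post.
edges_manual <- create_edge_df(
  from = c(1L, 3L, 4L),
  to   = c(2L, 1L, 2L)
)

# Combine nodes and edges into a dgr_graph (name is an assumption)
class_discussion_manual <- create_graph(
  nodes_df = nodes_manual,
  edges_df = edges_manual
)

# Inspect what went in
get_node_df(class_discussion_manual)
get_edge_df(class_discussion_manual)
```

Because no rel value is supplied, the rel column of the edge table is filled with NA.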

id  type     label  shape   data
1   student  UA     circle  Economics
2   student  GB     circle  Economics
3   student  MP     circle  Criminal Justice
4   student  MR     circle  Psychology
5   student  BL     circle  Accounting
6   student  SW     circle  Applied Linguistics

id  from  to  rel
1   13    17  NA
2   11    13  NA
3   11    3   NA
4   8     3   NA
5   11    8   NA
6   14    6   NA

This is based on code from Node and Edge Data Frames


Creating a Network from Data

Now let’s try generating a network from an existing data set. This is the preferred method for large networks and/or networks with a lot of attributes.


Examining Your Data

The first data set, discussion nodes, contains the initials of all students. It also includes some basic demographic information, such as field of interest and whether or not the student has experience in R.

The second data set, discussion edges, contains one line for each response on a single discussion board. Let’s look at the data:

Initials  Student.ID  Graduate  Undergraduate  Career  Field                Experienced.in.R.
UA        1           1         0              0       Economics            0
GB        2           1         0              0       Economics            1
MP        3           1         0              0       Criminal Justice     0
MR        4           1         0              0       Psychology           1
BL        5           1         0              0       Accounting           0
SW        6           1         0              0       Applied Linguistics  1

Student.ID  Initials  Graduate  Field                Experienced.in.R
1           UA        1         Economics            0
2           GB        1         Economics            1
3           MP        1         Criminal Justice     0
4           MR        1         Psychology           1
5           BL        1         Accounting           0
6           SW        1         Applied Linguistics  1

Poster.ID  Responder.ID
17         13
13         11
3          11
3          8
8          11
6          14
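The step from the raw node table to the cleaned one can be sketched with dplyr. To keep the example runnable on its own, the raw data is typed in as a stand-in for the first six rows; in practice it would come from read.csv(). The file name in the comment is a placeholder, not the real one, and fileEncoding = "UTF-8-BOM" is one way to keep a byte-order mark from garbling the first column name.

```r
library(dplyr)

# In practice (file name is a placeholder):
# discussion_nodes_raw <- read.csv("discussion_nodes.csv",
#                                  fileEncoding = "UTF-8-BOM")

# Stand-in for the first six rows of the raw node file
discussion_nodes_raw <- data.frame(
  Initials          = c("UA", "GB", "MP", "MR", "BL", "SW"),
  Student.ID        = 1:6,
  Graduate          = rep(1L, 6),
  Undergraduate     = rep(0L, 6),
  Career            = rep(0L, 6),
  Field             = c("Economics", "Economics", "Criminal Justice",
                        "Psychology", "Accounting", "Applied Linguistics"),
  Experienced.in.R. = c(0L, 1L, 0L, 1L, 0L, 1L),
  stringsAsFactors  = FALSE
)

# Drop the trailing dot from the last column name, then keep and
# reorder only the columns used in the rest of the codethrough
discussion_nodes <- discussion_nodes_raw %>%
  rename(Experienced.in.R = Experienced.in.R.) %>%
  select(Student.ID, Initials, Graduate, Field, Experienced.in.R)

head(discussion_nodes)
```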

Turning Data into a dgr_graph

Now that our data is cleaned up, let’s load it into a network graph.
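One way to do this is with add_nodes_from_table() and add_edges_from_table(). The sketch below assumes DiagrammeR 1.0 or later (where column names are passed unquoted), uses only the six students shown in the tables, and invents a few edges among them so that it runs on its own. The argument choices are my reconstruction, not necessarily the exact original call.

```r
library(DiagrammeR)
library(dplyr)  # loaded for the %>% pipe

# Cleaned node table (first six students)
discussion_nodes <- data.frame(
  Student.ID       = 1:6,
  Initials         = c("UA", "GB", "MP", "MR", "BL", "SW"),
  Graduate         = rep(1L, 6),
  Field            = c("Economics", "Economics", "Criminal Justice",
                       "Psychology", "Accounting", "Applied Linguistics"),
  Experienced.in.R = c(0L, 1L, 0L, 1L, 0L, 1L),
  stringsAsFactors = FALSE
)

# Illustrative edges among these six students (not the real data)
discussion_edges <- data.frame(
  Poster.ID    = c(2L, 1L, 2L),
  Responder.ID = c(1L, 3L, 4L)
)

class_discussion_auto <- create_graph() %>%
  add_nodes_from_table(
    table     = discussion_nodes,
    label_col = Initials       # initials become the node labels
  ) %>%
  add_edges_from_table(
    table       = discussion_edges,
    from_col    = Responder.ID,  # the responder points at ...
    to_col      = Poster.ID,     # ... the author of the post
    from_to_map = Student.ID     # node attribute the IDs refer to
  )

get_node_df(class_discussion_auto)
get_edge_df(class_discussion_auto)
```

Only the label column is named explicitly; every other column in the node table is carried along as a node attribute.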

id  type  label  Student.ID  Graduate  Field                Experienced.in.R
1   NA    UA     1           1         Economics            0
2   NA    GB     2           1         Economics            1
3   NA    MP     3           1         Criminal Justice     0
4   NA    MR     4           1         Psychology           1
5   NA    BL     5           1         Accounting           0
6   NA    SW     6           1         Applied Linguistics  1

id  from  to  rel
1   13    17  NA
2   11    13  NA
3   11    3   NA
4   8     3   NA
5   11    8   NA
6   14    6   NA

You may have noticed something interesting in the node table. Even though we did not specifically mention “Field” or “Experience in R” in the code to create the dgr_graph, these variables were still included as node attributes. We will come back to these attributes shortly.


The code used to generate the dgr_graph is adapted from examples provided on the extremely helpful DiagrammeR GitHub page.


Graph Your Network

Now that we have made our network(s), we should graph them!

As you can see, these are both directed networks. This means that the direction of the interaction matters and is indicated by an arrow (instead of just a line).
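Rendering is a single call to render_graph(). The sketch below uses a tiny stand-in graph (two students, one reply) so it runs by itself; demo_graph is a made-up name, and in the codethrough you would pass your own dgr_graph instead.

```r
library(DiagrammeR)

# Tiny stand-in graph: GB (ID 2) receives a reply from UA (ID 1)
demo_graph <- create_graph(
  nodes_df = create_node_df(n = 2, label = c("UA", "GB")),
  edges_df = create_edge_df(from = 1L, to = 2L)
)

# create_graph() builds directed graphs unless told otherwise
is_graph_directed(demo_graph)

# Draw the network; the arrow shows who replied to whom
render_graph(demo_graph)
```

Passing directed = FALSE to create_graph() would give plain lines instead of arrows.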


If we look at student MP, we can see that they replied to student SK’s post. Students MO and SS replied to each other.


Good news! Both networks look the same. This means that we didn’t make any errors in our data entry. For the sake of simplicity, we will use the network generated from the CSV files (“class_discussion_auto”) for the rest of the examples.


For more information on creating network graphs, see Creating Simple Graphs from NDFs/EDFs.


Customizing Our Graph

Although DiagrammeR automatically generates a network graph, there are many ways that we can customize our network graphs to make them even more useful!

Data Arrangement

We can tell it to arrange the data differently:

Networks are extremely visual, so choosing the arrangement best suited to your data can make a huge difference.
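The arrangement is controlled by the layout argument of render_graph(). Here is a small sketch on a made-up six-node graph; the edges are illustrative, and the layouts shown are some of the built-in options rather than necessarily the ones used in the original figures.

```r
library(DiagrammeR)

# Illustrative six-node graph (edges are invented for the demo)
layout_demo <- create_graph(
  nodes_df = create_node_df(
    n     = 6,
    label = c("UA", "GB", "MP", "MR", "BL", "SW")
  ),
  edges_df = create_edge_df(
    from = c(1L, 3L, 4L, 5L, 6L),
    to   = c(2L, 1L, 2L, 2L, 5L)
  )
)

# Same network, different arrangements
circle_view <- render_graph(layout_demo, layout = "circle")  # nodes on a ring
tree_view   <- render_graph(layout_demo, layout = "tree")    # hierarchical
kk_view     <- render_graph(layout_demo, layout = "kk")      # Kamada-Kawai
fr_view     <- render_graph(layout_demo, layout = "fr")      # Fruchterman-Reingold

# Print any of them to display it
circle_view
```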

Selecting Nodes by Attributes

First, let’s learn a little more about the node attribute “Field”.

Field                Students
Economics            6
Political Science    5
Public Health        4
Criminal Justice     2
Finance              2
Public Policy        2
Accounting           1
Applied Linguistics  1
Neuroscience         1
Psychology           1

Among students on the discussion board, Economics was the most common field.
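A summary like the table above can be produced by counting students per field with dplyr. In this sketch the Field column is rebuilt from the counts in the table so that the code runs on its own; with a real graph you would start from get_node_df() instead. The column name Students is my choice.

```r
library(dplyr)

# Stand-in for the Field column of the node table,
# rebuilt from the counts shown above
node_fields <- data.frame(
  Field = rep(
    c("Economics", "Political Science", "Public Health",
      "Criminal Justice", "Finance", "Public Policy",
      "Accounting", "Applied Linguistics", "Neuroscience", "Psychology"),
    times = c(6, 5, 4, 2, 2, 2, 1, 1, 1, 1)
  ),
  stringsAsFactors = FALSE
)

# Count students per field, most common first
field_counts <- node_fields %>%
  count(Field, name = "Students", sort = TRUE)

field_counts

# Optional styling for a rendered report:
# knitr::kable(field_counts) %>% kableExtra::kable_styling()
```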

So let’s highlight the students in Economics in orange.
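One way to do this is to select the matching nodes and set their fillcolor. The sketch below assumes DiagrammeR 1.0 or later and uses a six-student stand-in graph with invented edges; field_graph and highlighted are made-up names.

```r
library(DiagrammeR)
library(dplyr)  # loaded for the %>% pipe

# Six-student stand-in graph with Field as a node attribute
field_graph <- create_graph(
  nodes_df = create_node_df(
    n     = 6,
    label = c("UA", "GB", "MP", "MR", "BL", "SW"),
    Field = c("Economics", "Economics", "Criminal Justice",
              "Psychology", "Accounting", "Applied Linguistics")
  ),
  edges_df = create_edge_df(
    from = c(1L, 3L, 4L),   # illustrative edges only
    to   = c(2L, 1L, 2L)
  )
)

highlighted <- field_graph %>%
  select_nodes(conditions = Field == "Economics") %>%         # pick the nodes
  set_node_attrs_ws(node_attr = fillcolor, value = "orange") %>%  # color them
  clear_selection()                                            # tidy up

render_graph(highlighted)
```

Nodes outside the selection keep the default fill, so the Economics students stand out.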

That was pretty cool! Now we will give every field another color, using the Spectral color palette. Each color will represent a different field of study.
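Coloring by every value of an attribute can be sketched with colorize_node_attrs(), which maps each distinct value to a color from a named palette. As before, this assumes DiagrammeR 1.0 or later and a six-student stand-in graph with invented edges; the object names are my own.

```r
library(DiagrammeR)
library(dplyr)  # loaded for the %>% pipe

# Six-student stand-in graph with Field as a node attribute
palette_graph <- create_graph(
  nodes_df = create_node_df(
    n     = 6,
    label = c("UA", "GB", "MP", "MR", "BL", "SW"),
    Field = c("Economics", "Economics", "Criminal Justice",
              "Psychology", "Accounting", "Applied Linguistics")
  ),
  edges_df = create_edge_df(
    from = c(1L, 3L, 4L),   # illustrative edges only
    to   = c(2L, 1L, 2L)
  )
)

# One Spectral color per distinct field, written to fillcolor
colored <- palette_graph %>%
  colorize_node_attrs(
    node_attr_from = Field,
    node_attr_to   = fillcolor,
    palette        = "Spectral"
  )

render_graph(colored)
```

Students in the same field share a fill color, so the two Economics students here render alike.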


Further Resources

Hopefully this codethrough piqued your interest in networks and DiagrammeR!

Learn more about Visualizing Networks with the following:




Works Cited

This codethrough references and cites the following sources: