The cliche goes that the world is an increasingly interconnected place, and the connections between different entities are often best represented with a graph. Graphs are comprised of vertices (also often called “nodes”) and edges connecting those nodes. In this assignment, we will learn how to visualize networks using the igraph package in R.
For this assignment, we will visualize social networking data using anonymized data from Facebook; this data was originally curated in a recent paper about computing social circles in social networks. In our visualizations, the vertices in our network will represent Facebook users and the edges will represent these users being Facebook friends with each other.
The first file we will use, edges.csv, contains variables V1 and V2, which label the endpoints of edges in our network. Each row represents a pair of users in our graph who are Facebook friends. For a pair of friends A and B, edges.csv will only contain a single row – the smaller identifier will be listed first in this row. From this row, we will know that A is friends with B and B is friends with A.
The second file, users.csv, contains information about the Facebook users, who are the vertices in our network. This file contains the following variables:
id: A unique identifier for this user; this is the value that appears in the rows of edges.csv
gender: An identifier for the gender of a user taking the values A and B. Because the data is anonymized, we don’t know which value refers to males and which value refers to females.
school: An identifier for the school the user attended taking the values A and AB (users with AB attended school A as well as another school B). Because the data is anonymized, we don’t know the schools represented by A and B.
locale: An identifier for the locale of the user taking the values A and B. Because the data is anonymized, we don’t know which value refers to what locale.
Load the data from edges.csv into a data frame called edges, and load the data from users.csv into a data frame called users.
# Load data
edges = read.csv("edges.csv")
users = read.csv("users.csv")
# The number of observations
nrow(users)
## [1] 59
59 users.
# Output summary
(nrow(edges)*2)/nrow(users)
## [1] 4.949153
Average Number of Friends Per User = 4.949
# Most common locale
z = table(users$locale, users$school)
kable(z)
A | AB | ||
---|---|---|---|
3 | 0 | 0 | |
A | 6 | 0 | 0 |
B | 31 | 17 | 2 |
Locale B.
# Tabulate if AB or B is gender specific school
z = table(users$gender, users$school)
kable(z)
A | AB | ||
---|---|---|---|
1 | 1 | 0 | |
A | 11 | 3 | 1 |
B | 28 | 13 | 1 |
We see from table(users$gender, users$school) that both genders A and B have attended schools A and B.
We will be using the igraph package to visualize networks; install and load this package using the install.packages and library commands.
# Load igraph
library("igraph")
We can create a new graph object using the graph.data.frame() function.
# Set graph object
g = graph.data.frame(edges, FALSE, users)
Now, we want to plot our graph. By default, the vertices are large and have text labels of a user’s identifier. Because this would clutter the output, we will plot with no text labels and smaller vertices:
# Plot graph
plot(g, vertex.size=5, vertex.label=NA)
In this graph, there are a number of groups of nodes where all the nodes in each group are connected but the groups are disjoint from one another, forming “islands” in the graph. Such groups are called “connected components,” or “components” for short.
4 connected components with at least 2 nodes.
7 users are there with no friends in the network.
# Compute the degree
z = degree(g)
k = table(z >=10)
kable(k)
Var1 | Freq |
---|---|
FALSE | 50 |
TRUE | 9 |
9 users with 10 more friends in this network.
In a network, it’s often visually useful to draw attention to “important” nodes in the network. While this might mean different things in different contexts, in a social network we might consider a user with a large number of friends to be an important user. From the previous problem, we know this is the same as saying that nodes with a high degree are important users.
To visually draw attention to these nodes, we will change the size of the vertices so the vertices with high degrees are larger. To do this, we will change the “size” attribute of the vertices of our graph to be an increasing function of their degrees:
# Change the size attribute
V(g)$size = degree(g)/2+2
Now that we have specified the vertex size of each vertex, we will no longer use the vertex.size parameter when we plot our graph:
# Plot new graph
plot(g, vertex.label=NA)
# Maximum degree of g
max(V(g)$size)
## [1] 11
# Minimum degree of g
min(V(g)$size)
## [1] 2
Thus far, we have changed the “size” attributes of our vertices. However, we can also change the colors of vertices to capture additional information about the Facebook users we are depicting.
When changing the size of nodes, we first obtained the vertices of our graph with V(g) and then accessed the the size attribute with V(g)\(size. To change the color, we will update the attribute V(g)\)color.
To color the vertices based on the gender of the user, we will need access to that variable. When we created our graph g, we provided it with the data frame users, which had variables gender, school, and locale. These are now stored as attributes V(g)$gender, V(g)$school, and V(g)$locale.
We can update the colors by setting the color to black for all vertices, than setting it to red for the vertices with gender A and setting it to gray for the vertices with gender B:
# Update the colors
V(g)$color = "black"
V(g)$color[V(g)$gender == "A"] = "red"
V(g)$color[V(g)$gender == "B"] = "gray"
Plot the resulting graph.
plot(g, vertex.label=NA)
Gender B.
# Update colors for school
V(g)$color = "black"
V(g)$color[V(g)$school == "A"] = "red"
V(g)$color[V(g)$school == "AB"] = "gray"
# Plot graph
plot(g, vertex.label=NA)
Yes.
Some, but not all, of the high-degree users attended school A
# Update colors of locale
V(g)$color <- "black"
V(g)$color[V(g)$locale == "A"] <- "red"
V(g)$color[V(g)$locale == "B"] <- "green"
# Plot graph
plot(g, vertex.label=NA)
Locale B.
Locale A.
The help page is a helpful tool when making visualizations. Answer the following questions with the help of ?igraph.plotting and experimentation in your R console.
?igraph.plotting
rglplot
edge.width