Load packages: readr for reading in the data frame, tidyverse for tidying the data, ggplot2 for plotting, and igraph and ggraph for SNA visualization.
library(tidyverse)
library(ggplot2)
library(tidygraph)
library(ggraph)
library(igraph)
library(readr)
library(readxl)
#read in year 1 and add numbers for colnames
year_1_collaboration <- read_excel("data/year_1_collaboration.xlsx",
col_names = FALSE)
#glimpse your data to understand the observations and variables
head(year_1_collaboration)
## # A tibble: 6 x 43
## ...1 ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10 ...11 ...12 ...13
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0 0 3 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 4 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 3 0 0 0 0
## 6 3 0 0 0 0 0 0 0 0 0 0 0 0
## # ... with 30 more variables: ...14 <dbl>, ...15 <dbl>, ...16 <dbl>,
## # ...17 <dbl>, ...18 <dbl>, ...19 <dbl>, ...20 <dbl>, ...21 <dbl>,
## # ...22 <dbl>, ...23 <dbl>, ...24 <dbl>, ...25 <dbl>, ...26 <dbl>,
## # ...27 <dbl>, ...28 <dbl>, ...29 <dbl>, ...30 <dbl>, ...31 <dbl>,
## # ...32 <dbl>, ...33 <dbl>, ...34 <dbl>, ...35 <dbl>, ...36 <dbl>,
## # ...37 <dbl>, ...38 <dbl>, ...39 <dbl>, ...40 <dbl>, ...41 <dbl>,
## # ...42 <dbl>, ...43 <dbl>
From inspecting the data we see that there are 43 columns and 43 rows, with numbers ranging from 0 to 4. It appears that this data set records the ties between people in some form. However, to visualize it we must first restructure the data into a formal matrix, and then convert it to the network class R object required by the ggraph package. In order to identify the dyads in our network, we anonymize the columns and rows as 1:43 so that every column and row has a unique name.
#change to a formal matrix class first, because setting row names on a tibble is deprecated
year_1_matrix <- as.matrix(year_1_collaboration)
rownames(year_1_matrix) <- 1:43
colnames(year_1_matrix) <- 1:43
Now we will need to change the matrix into a table graph so that we can use it in ggraph.
year_1_network <- as_tbl_graph(year_1_matrix, directed = TRUE)
Checking the class of year_1_network shows that it is both a table graph (tbl_graph) and an igraph object.
#check the class
class(year_1_network)
## [1] "tbl_graph" "igraph"
I know that plot(), as implemented by the igraph package, will draw a simple sociogram of the network to visualize the relationships. Creating this first gives me a frame of reference when I try to create a visualization with ggraph.
#plot using igraph plot argument
plot(year_1_network)
ggraph also has a simple function, autograph(), for drawing a quick sociogram. Before using it on the network, I wanted to explore the igraph function graph_from_data_frame() to see whether I could build a graph from the original data frame and from the matrix, and then plot each with autograph(). This was unsuccessful: the graphs produced did not look similar to the igraph visualization, meaning I am unable to use either the initial data frame or the matrix this way. graph_from_data_frame() expects an edge list whose first two columns name the two ends of each tie, not an adjacency matrix.
graph <- graph_from_data_frame(year_1_matrix)
graph2 <- graph_from_data_frame(year_1_collaboration)
autograph(graph)
autograph(graph2)
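To see why the adjacency-style inputs do not work with graph_from_data_frame(), here is a minimal sketch on a made-up 3 x 3 matrix. The names toy_matrix, g_adj, edge_list and g_el are illustrative only and are not part of the collaboration data. The matrix can either be read directly with graph_from_adjacency_matrix() or be reshaped into an edge list first:

```r
library(igraph)

# Toy weighted adjacency matrix (hypothetical data): rows send ties, columns receive them
toy_matrix <- matrix(c(0, 3, 0,
                       4, 0, 0,
                       0, 2, 0),
                     nrow = 3, byrow = TRUE,
                     dimnames = list(as.character(1:3), as.character(1:3)))

# Route 1: read the matrix directly as a weighted, directed adjacency matrix
g_adj <- graph_from_adjacency_matrix(toy_matrix, mode = "directed", weighted = TRUE)

# Route 2: reshape to an edge list (from, to, weight), keeping only non-zero cells
idx <- which(toy_matrix != 0, arr.ind = TRUE)
edge_list <- data.frame(from = rownames(toy_matrix)[idx[, "row"]],
                        to = colnames(toy_matrix)[idx[, "col"]],
                        weight = toy_matrix[idx])
g_el <- graph_from_data_frame(edge_list, directed = TRUE)

# Both routes should find the same number of ties
ecount(g_adj)
ecount(g_el)
```

Either route keeps the cell values as an edge attribute called weight, which is the same kind of information that as_tbl_graph() stores for year_1_network.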
From reading more about ggraph I learned that a plot is built from three main components: the layout, the edges, and the nodes. The geom_node_*() functions and their arguments add and modify the nodes of the network plot, and the geom_edge_*() functions and their arguments do the same for the edges.
autograph(year_1_network)
Let’s create our first sociogram in the ggraph format, which includes the layout, edges, and nodes.
#create ggraph with layout, edge and nodes
ggraph(year_1_network) +
geom_edge_link() +
geom_node_point()
Draw edges: here is geom_edge_link() in action, which draws a straight line between the connected nodes. Using after_stat(index) splits the line into many small fragments, which makes it possible to draw a gradient along the edge to show direction:
ggraph(year_1_network, layout = "stress") +
geom_edge_link(aes(alpha = after_stat(index)))
In this data set the relationships are weighted from 1 to 4, so the lines are colored according to the weight. The weight attribute was created from the matrix values when we built year_1_network. Take a look below. I first thought that weight was only for directed networks, but edge weights can be stored on undirected networks as well.
#add color to the type of weighted relationship 1:4
ggraph(year_1_network) +
geom_edge_link(aes(color = weight)) +
geom_node_point()
To show parallel edges you can use either geom_edge_fan() or geom_edge_parallel(), which helps when multiple edges run between the same nodes.
ggraph(year_1_network, layout = "stress") +
geom_edge_fan()
Data Imaginist explains “the nodes in a graph are the abstract concepts of entities, and the layout is their physical placement, the node geoms is the visual manifestation of the entities.”
The x and y positions are encoded in x and y columns of the layout. This means that geom_node_*() can default the x and y aesthetics, so you see something like a scatter plot:
ggraph(year_1_network, layout = 'kk') +
geom_node_point()
ggraph(year_1_network, layout = 'partition') +
geom_node_tile(aes(y = -y, fill = depth))
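To make the x and y columns visible, the layout can be computed on its own with create_layout(). This is a small sketch on a toy ring graph (toy_ring and toy_layout are illustrative names, not part of the collaboration data):

```r
library(tidygraph)
library(ggraph)

# Small ring graph used only for illustration
toy_ring <- create_ring(5)

# create_layout() computes node positions and returns them as a data frame
toy_layout <- create_layout(toy_ring, layout = "kk")

# The x and y columns are what geom_node_point() maps to by default
toy_layout[, c("x", "y")]
```

Each row is one node, so plotting these two columns as points is what produces the scatter-plot look.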
ggraph geoms get a filter aesthetic that allows you to quickly filter the input data. Its use can be illustrated when plotting a tree. In the plot below, only the terminal nodes are drawn, by filtering on the logical leaf column provided by the dendrogram layout.
ggraph(year_1_network, layout = 'dendrogram', circular = TRUE) +
geom_edge_diagonal() +
geom_node_point(aes(filter = leaf)) +
coord_fixed()
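The dendrogram layout is designed for tree-shaped graphs, so the leaf filter is easiest to check on a toy tree. The sketch below assumes a small binary tree built with tidygraph's create_tree(); toy_tree, tree_layout and leaf_plot are illustrative names:

```r
library(tidygraph)
library(ggraph)

# Toy binary tree with 7 nodes (hypothetical example, not the collaboration data)
toy_tree <- create_tree(7, children = 2)

# The dendrogram layout adds a logical leaf column to the node data
tree_layout <- create_layout(toy_tree, layout = "dendrogram")
table(tree_layout$leaf)

# filter = leaf keeps only the rows where leaf is TRUE,
# so points are drawn for the terminal nodes only
leaf_plot <- ggraph(toy_tree, layout = "dendrogram") +
  geom_edge_diagonal() +
  geom_node_point(aes(filter = leaf))
```

Printing leaf_plot draws the tree with points on the terminal nodes and bare edges everywhere else.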
l <- ggraph(year_1_network, layout = 'partition', circular = TRUE)
l + geom_edge_diagonal(aes(width = after_stat(index), alpha = after_stat(index)), lineend = 'round') +
scale_edge_width(range = c(0.2, 1.5)) +
geom_node_point(aes(colour = depth)) +
coord_fixed()
l <- ggraph(year_1_network, layout = 'partition', circular = TRUE)
l + geom_node_arc_bar(aes(fill = depth)) +
coord_fixed()
Add labels for the node names (1:43).
#add node names (1:43)
ggraph(year_1_network) +
geom_edge_link() +
geom_node_point() +
geom_node_text(aes(label = name), repel=TRUE)
What if we add arrows? Will that make it better? With arrows, the edges need to stop before they reach the node so that the arrowhead is not obscured. This is possible in ggraph using the start_cap and end_cap aesthetics, which allow you to specify a clipping region around the terminal nodes.
ggraph(year_1_network, layout = 'graphopt') +
geom_edge_link(aes(start_cap = label_rect(node1.name), end_cap = label_rect(node2.name)), arrow = arrow(type = "closed", length = unit(1, 'mm'))) +
geom_node_text(aes(label = name)) +
theme_graph()
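When the nodes are drawn as points rather than text labels, a circular cap may be simpler than label_rect(). The following is a sketch on a toy directed ring, with the 3 mm radius chosen arbitrarily; toy_ring and capped_plot are illustrative names:

```r
library(tidygraph)
library(ggraph)

# Toy directed ring used only for illustration
toy_ring <- create_ring(5, directed = TRUE)

# circle(3, "mm") clips each edge 3 mm from the node centre at both ends,
# so the arrowhead stops short of the point instead of being drawn under it
capped_plot <- ggraph(toy_ring, layout = "circle") +
  geom_edge_link(arrow = arrow(type = "closed", length = unit(2, "mm")),
                 start_cap = circle(3, "mm"),
                 end_cap = circle(3, "mm")) +
  geom_node_point(size = 3)
```

Increasing the cap radius leaves more empty space around each node, which can help when the arrows crowd together.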
This didn’t look very good so I needed to look up layouts and see how I can fix this.
Try this one with the arrows, and change the seed to find a graph you like. The position of the nodes is not specified by the data, so they are placed randomly. To make the output repeatable, set the random seed before making the plot. You can try different seed values until you get a result that you like:
set.seed(300)
# Remove unnecessary margins
par(mar = c(0, 0, 0, 0))
plot(year_1_network, layout = layout_with_fr, vertex.size = 8,
edge.arrow.size = 0.2, vertex.label = NA)
set.seed(300)
# Remove unnecessary margins
par(mar = c(0, 0, 0, 0))
plot(year_1_network, layout = layout_with_fr, vertex.size = 8,
edge.arrow.size = 0.5, vertex.label = NA)
set.seed(1000)
# Remove unnecessary margins
par(mar = c(0, 0, 0, 0))
# igraph's plot() is base graphics, so ggraph layers cannot be added with `+`;
# show the node names through vertex.label instead
plot(year_1_network, layout = layout_with_fr, vertex.size = 8,
edge.arrow.size = 1.0, vertex.label = V(year_1_network)$name)
set.seed(224)
# Remove unnecessary margins
par(mar = c(0, 0, 0, 0))
plot(year_1_network, layout = layout_with_fr, vertex.size = 8,
edge.arrow.size = 0.5, vertex.label = NA)
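The effect of the seed can be checked without drawing anything by computing the layout coordinates directly. This sketch uses a toy ring graph (toy_ring and the layout_* names are illustrative) and assumes the Fruchterman-Reingold layout draws its starting positions from R's random number generator:

```r
library(igraph)

# Toy graph used only for illustration
toy_ring <- make_ring(10)

# Same seed twice: the two layouts should match
set.seed(300)
layout_a <- layout_with_fr(toy_ring)
set.seed(300)
layout_b <- layout_with_fr(toy_ring)

# A different seed will usually give different coordinates
set.seed(1000)
layout_c <- layout_with_fr(toy_ring)

# TRUE when the two same-seed layouts are equal
isTRUE(all.equal(layout_a, layout_b))
```

Each layout is a matrix with one row per node and two columns for the x and y coordinates.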
I found an article explaining how to clip the edges at the nodes so they don’t overlap.
set.seed(40)
ggraph(year_1_network, layout = "stress") +
geom_edge_link(arrow = arrow()) +
geom_node_point(aes(colour = name), size = 4)
attributes(year_1_network)
## $class
## [1] "tbl_graph" "igraph"
##
## $active
## [1] "nodes"
# setting theme_graph
set_graph_style()
year_1_network %>%
activate(nodes) %>%
mutate(pagerank = centrality_pagerank()) %>%
activate(edges) %>%
mutate(betweenness = centrality_edge_betweenness()) %>%
ggraph() +
geom_edge_link(aes(alpha = betweenness)) +
geom_node_point(aes(size = pagerank, colour = pagerank)) +
# discrete colour legend
scale_color_gradient(guide = 'legend')
Here I tried to use year_1_network to visualize communities of nodes, but it gave me an error saying that activate could not be applied to an object of class “igraph”.
Now I have to figure out how to change the graph to undirected, which looks like another Google search. After googling for some time, I was not able to find a good explanation of how to convert between directed and undirected graphs.
insert error pictures here
Look at communities within the network. We first need to set the graph style, then change year_1_network to undirected, and then convert that to a table graph class so that we can activate the nodes.
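The directed-to-undirected step can be tried on a toy graph first. This is a minimal sketch with made-up ties; summing the weights of reciprocal ties is an assumption made for illustration, not something the collaboration data dictates. Newer igraph releases may prefer the name as_undirected() for the same function.

```r
library(igraph)

# Toy directed graph (hypothetical): 1 -> 2 and 2 -> 1 are reciprocal, 1 -> 3 is one-way
toy_directed <- graph_from_data_frame(
  data.frame(from = c("1", "2", "1"),
             to = c("2", "1", "3"),
             weight = c(3, 4, 2)),
  directed = TRUE)

# "collapse" merges all edges between a pair into one undirected edge;
# edge.attr.comb says how to combine their weights (summed here, as an assumption)
toy_undirected <- as.undirected(toy_directed, mode = "collapse",
                                edge.attr.comb = list(weight = "sum"))

is_directed(toy_undirected)
ecount(toy_undirected)
E(toy_undirected)$weight
```

The other modes behave differently: "each" keeps one undirected edge per directed edge, and "mutual" keeps only pairs that are tied in both directions.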
# setting theme_graph
set_graph_style()
#Convert Year 1 graph from directed to undirected
#"collapse" merges all ties between a pair of nodes into one undirected edge
year_1_undirected <- as.undirected(year_1_network,
mode = "collapse")
ggraph_network1 <- as_tbl_graph(year_1_undirected)
# visualize communities of nodes
ggraph_network1 %>%
activate(nodes) %>%
mutate(community = as.factor(group_louvain())) %>%
ggraph() +
geom_edge_link() +
geom_node_point(aes(colour = community), size = 5)
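To get a feel for what group_louvain() reports, the underlying igraph function cluster_louvain() can be run on a toy graph where the answer is easy to guess. The sketch below builds two tightly connected groups joined by a single tie (all names are illustrative); the Louvain method in igraph works on undirected graphs, which is why the conversion above is needed.

```r
library(igraph)

# Two 4-node cliques joined by a single bridging tie (toy data)
clique_a <- make_full_graph(4)
clique_b <- make_full_graph(4)
toy_graph <- add_edges(disjoint_union(clique_a, clique_b), c(1, 5))

# Louvain can visit nodes in random order, so fix the seed for repeatability
set.seed(1)
toy_communities <- cluster_louvain(toy_graph)

# Community id for each node, and the number of communities found
membership(toy_communities)
length(toy_communities)
```

Nodes in the same clique should share a community id, which is the same information that the community column holds in the ggraph plot.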
Pick your own palette at https://colorbrewer2.org/#type=sequential&scheme=YlOrRd&n=3
# define a custom color palette
got_palette <- c("#1A5878", "#C44237", "#AD8941", "#E99093",
"#50594B", "#8968CD", "#9ACD32", "#feb24c")
# compute a clustering for node colors
V(ggraph_network1)$clu <- as.character(membership(cluster_louvain(ggraph_network1)))
# compute degree as node size
V(ggraph_network1)$size <- degree(ggraph_network1)
ggraph(ggraph_network1,layout = "stress")+
geom_edge_link0(aes(edge_width = weight),edge_colour = "grey66")+
geom_node_point(aes(fill = clu,size = size),shape = 21)+
geom_node_text(aes(filter = size >= 26, label = name),family="serif")+
scale_fill_manual(values = got_palette)+
scale_edge_width(range = c(0.2,3))+
scale_size(range = c(1,8))+
theme_graph()+
theme(legend.position = "right")