library(tidyverse)
library(igraph) # This is the package to analyze the network
library(visNetwork) # Creates visualizations of the network
library(DT)
library(plotly)
Network analysis is the analysis of groups of individuals and the links between them. The links might be relationships, communication lines, spread of a contagious disease, followers on social networks, etc.
Network analysis data are often set up in two datasets: One for the individuals or nodes in the network, and one for the links or connections between them.
There are two data files for this introductory exercise, one called terrorist_nodes.csv and the other called terrorist_links.csv. The data come from this Datacamp lesson on network analysis: https://www.datacamp.com/courses/network-science-in-r-a-tidy-approach
A synonym for nodes is vertices. Synonyms for links are edges and connections.
Read in the data:
terrorist_nodes <- read_csv("terrorist_nodes.csv")
Parsed with column specification:
cols(
id = col_double(),
name = col_character()
)
terrorist_links <- read_csv("terrorist_links.csv")
Parsed with column specification:
cols(
from = col_double(),
to = col_double()
)
Look at the nodes data with datatable().
terrorist_nodes %>%
datatable(rownames = F)
Do the same for the links data below:
terrorist_links %>%
datatable(rownames = F)
Let’s create a quick network diagram with visNetwork(). Put the nodes and then the links into the parentheses, separated by a comma. We’ll adjust it and make it look better later.
visNetwork(terrorist_nodes, terrorist_links)
In order to get statistics on the network, we put it into a format for the package igraph using graph_from_data_frame(). The most important part of the network is the list of links, so our terrorist_links goes first. Next, another name for ‘nodes’ is ‘vertices’, so we set vertices = terrorist_nodes. Finally, this is not a directed network - all the relationships here are considered to be two-way - so we set directed = F.
terrorist_network <- graph_from_data_frame(terrorist_links,
vertices = terrorist_nodes,
directed = F)
We can display the network, but in itself it doesn’t tell us much.
terrorist_network
IGRAPH cde917f UN-- 64 243 --
+ attr: name (v/c)
+ edges from cde917f (vertex names):
[1] Jamal Zougam--Mohamed Bekkali Jamal Zougam--Mohamed Chaoui Jamal Zougam--Vinay Kholy
[4] Jamal Zougam--Suresh Kumar Jamal Zougam--Mohamed Chedadi Jamal Zougam--Imad Eddin Barakat
[7] Jamal Zougam--Abdelaziz Benyaich Jamal Zougam--Abu Abderrahame Jamal Zougam--Amer Azizi
[10] Jamal Zougam--Abu Musad Alsakaoui Jamal Zougam--Mohamed Atta Jamal Zougam--Ramzi Binalshibh
[13] Jamal Zougam--Mohamed Belfatmi Jamal Zougam--Said Bahaji Jamal Zougam--Galeb Kalaje
[16] Jamal Zougam--Abderrahim Zbakh Jamal Zougam--Naima Oulad Akcha Jamal Zougam--Abdelkarim el Mejjati
[19] Jamal Zougam--Basel Ghayoun Jamal Zougam--S B Abdelmajid Fakhet Jamal Zougam--Jamal Ahmidan
[22] Jamal Zougam--Hamid Ahmidan Jamal Zougam--Abdeluahid Berrak Jamal Zougam--Said Berrak
+ ... omitted several edges
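The header line of that printout does summarize the graph: U means undirected, N means the nodes are named, and the two numbers are the node and link counts. As a hedged sketch on a made-up four-person network (toy_links and toy_network are hypothetical names, not part of the exercise data), the same facts can be checked with igraph helper functions:

```r
library(igraph)

# Hypothetical toy data: 4 people and 4 two-way links
toy_links <- data.frame(from = c(1, 1, 2, 3),
                        to   = c(2, 3, 3, 4))
toy_network <- graph_from_data_frame(toy_links, directed = FALSE)

toy_network               # the header line reads UN-- followed by 4 4
is_directed(toy_network)  # FALSE, which is the "U" in the header
is_named(toy_network)     # TRUE, which is the "N" in the header
```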
Now that we have the network, we can use igraph to pull different types of information out of it.
To start, we can count the number of nodes in the network. igraph calls them vertices, so we count vertices by piping terrorist_network into vcount().
terrorist_network %>%
vcount()
[1] 64
That number should match the number of terrorists in terrorist_nodes.
Because links (or ties or connections) are called edges in igraph, count them by piping the network into ecount(). Do that below:
terrorist_network %>%
ecount()
[1] 243
That number should match the number of rows in our terrorist_links list.
Density is the number of connections divided by the number of potential connections.
For example, among 4 people, there are 6 potential friendships (1-2, 1-3, 1-4, 2-3, 2-4, and 3-4). But not all pairs will actually be friends. If there are 4 friendships among those 4 people, that is a network density of 4/6, or .67. A high network density indicates a close-knit group of people.
To calculate the number of potential links, use n(n-1)/2. So for 4 people, there are 4(3)/2 = 6. For 64 terrorists, there are 64(63)/2 = 2016 potential links. We know there are 243 links, so the density is 243/2016 = .12.
You can get density directly without doing the math by piping the network into edge_density(). Do that below:
terrorist_network %>%
edge_density()
[1] 0.1205357
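To connect the formula to the function, here is a minimal sketch that works through the 4-person example by hand and compares it with edge_density(). The toy data and the names toy_links, toy_network, potential_links and density_by_hand are made up for illustration.

```r
library(igraph)

# Hypothetical toy data: 4 people and 4 friendships
toy_links <- data.frame(from = c(1, 1, 2, 3),
                        to   = c(2, 3, 3, 4))
toy_network <- graph_from_data_frame(toy_links, directed = FALSE)

n <- vcount(toy_network)
potential_links <- n * (n - 1) / 2                  # 4 * 3 / 2 = 6
density_by_hand <- ecount(toy_network) / potential_links

density_by_hand              # 4 / 6
edge_density(toy_network)    # same value, calculated by igraph
```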
Distances are the shortest paths between nodes. Even if two nodes are not directly connected, you can hop from one link to another to get there. Looking at the diagram, there are clearly lots of nodes that are just one or two hops apart, but some appear to be 5 or more apart.
terrorist_network %>%
distances() %>%
datatable()
One oddity to note about the matrix is that it counts connections both ways: A connection from terrorist 1 to terrorist 2 is one connection, and that same connection from terrorist 2 to terrorist 1 is counted again. That makes sense in a directed network, but not really in an undirected network. So there might be twice as many connections as you are expecting.
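If you want each pair counted only once, one option is to keep just the cells above the diagonal of the matrix. This is a sketch on a made-up four-person network (toy_links, toy_network, toy_distances and pair_distances are hypothetical names), not a step the exercise requires:

```r
library(igraph)

# Hypothetical toy data: 4 people and 4 two-way links
toy_links <- data.frame(from = c(1, 1, 2, 3),
                        to   = c(2, 3, 3, 4))
toy_network <- graph_from_data_frame(toy_links, directed = FALSE)

toy_distances <- distances(toy_network)

# The full matrix holds every pair twice, plus a diagonal of zeros
length(toy_distances)    # 4 x 4 = 16 entries

# upper.tri() keeps the cells above the diagonal: one per pair, no self-distances
pair_distances <- toy_distances[upper.tri(toy_distances)]
length(pair_distances)   # 4 * 3 / 2 = 6 pairs
```

The same subsetting would work on the full terrorist network before building a histogram, at the cost of one extra line.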
To graph the connections we need to convert the matrix format into a dataframe that plotly can understand. The following will create a histogram of the distances of each possible pair of nodes:
terrorist_network %>%
distances() %>%
as.vector() %>% # these two lines convert the distances matrix
as_tibble() %>% # to something plotly can graph
plot_ly(x = ~value) %>%
add_histogram()
We can see that most terrorists are connected by 2 or 3 hops, but some are connected by 1 and some by 6. There are 64 at 0: This is just the number of terrorists total, which are counted as connecting to themselves with 0 steps.
We can boil all that down to one number with mean_distance(). Pipe the network into that function below:
terrorist_network %>%
mean_distance()
[1] 2.690972
The number you get should be near the middle of the histogram above.
The diameter of a network is the longest of the above distances, that is, the longest shortest path between any two nodes.
Pipe the network into get_diameter() to get the specific path, and the nodes along it, that makes up the diameter.
terrorist_network %>%
get_diameter()
+ 7/64 vertices, named, from cde917f:
[1] Anwar Adnan Ahmad Abdelkarim el Mejjati Jamal Zougam Naima Oulad Akcha Rafa Zuher
[6] José Emilio Suárez Antonio Toro
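There are 7 terrorists on that path, which means 6 hops between the two ends. That matches the largest distance in the histogram above.
If you don’t care so much about who the specific nodes are, you can count the nodes on the diameter path by piping the network into get_diameter(), and then piping that into another line with length(). Keep in mind that this counts nodes, so the diameter in hops is one less; igraph’s diameter() returns the hop count directly.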
terrorist_network %>%
get_diameter() %>%
length()
[1] 7
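The difference between counting nodes and counting hops is easiest to see on a tiny example. The sketch below uses a made-up chain of four people (chain_links and chain_network are hypothetical names):

```r
library(igraph)

# Hypothetical toy data: 4 people in a single chain, 1 - 2 - 3 - 4
chain_links <- data.frame(from = c(1, 2, 3),
                          to   = c(2, 3, 4))
chain_network <- graph_from_data_frame(chain_links, directed = FALSE)

length(get_diameter(chain_network))  # 4 nodes on the longest path
diameter(chain_network)              # 3 hops between the two ends
```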
Let’s go back to visNetwork and the diagram.
First, I recommend always using a layout. You can set visNetwork layouts with visIgraphLayout(). Here’s one example:
visNetwork(terrorist_nodes, terrorist_links) %>%
visIgraphLayout(layout = "layout_in_circle")
Create two more graphs below, one with “layout_on_sphere” and another with “layout_on_grid”.
visNetwork(terrorist_nodes, terrorist_links) %>%
visIgraphLayout(layout = "layout_on_sphere")
visNetwork(terrorist_nodes, terrorist_links) %>%
visIgraphLayout(layout = "layout_on_grid")
But let’s stick with a standard one: layout_nicely. It uses an algorithm that generates a nice readable layout.
visNetwork(terrorist_nodes, terrorist_links) %>%
visIgraphLayout(layout = "layout_nicely")
To see the names of the terrorists in the diagram, add a ‘label’ column to terrorist_nodes. It’s just the same as the name column, but it has to be titled ‘label’ so it shows up in the diagram.
terrorist_nodes <- terrorist_nodes %>%
mutate(label = name)
terrorist_nodes %>%
datatable()
It seems redundant to have two columns with the same information, but we do that so we can see the names in the graph.
visNetwork(terrorist_nodes, terrorist_links) %>%
visIgraphLayout(layout = "layout_nicely")
Now we can see the names of the terrorists. You’ll probably need to zoom in to see them.
You can also add a column in nodes called ‘title’, which will appear when you hover over the node. We can just mutate yet another column with the names of the terrorists, this time called ‘title’. Do that below, piping terrorist_nodes into mutate(title = name).
terrorist_nodes <- terrorist_nodes %>%
mutate(title = name)
terrorist_nodes
After creating the title column, go back up to the diagram and run it again. If you hover over a node, the name should pop up for you.
Add a pipe and a new line with visOptions(highlightNearest = T) to the chunk below. Now when you click on one terrorist, that terrorist and their contacts will be highlighted.
visNetwork(terrorist_nodes, terrorist_links) %>%
visIgraphLayout(layout = "layout_nicely") %>%
visOptions(highlightNearest = T)
Add nodesIdSelection = T inside the parentheses of visOptions(). You should get a drop-down menu listing each terrorist.
visNetwork(terrorist_nodes, terrorist_links) %>%
visIgraphLayout(layout = "layout_nicely") %>%
visOptions(highlightNearest = T, nodesIdSelection = T)
Finally, using main = "" in the visNetwork() call adds a title:
visNetwork(terrorist_nodes,
terrorist_links,
main = "Network of Terrorists involved in the 2004 Madrid Bombing") %>%
visIgraphLayout(layout = "layout_nicely") %>%
visOptions(highlightNearest = T, nodesIdSelection = T)
The number of links each person has is called degree, and can be found by piping terrorist_network into degree(). Do that below:
terrorist_network %>%
degree()
Jamal Zougam Mohamed Bekkali Mohamed Chaoui Vinay Kholy
29 2 27 10
Suresh Kumar Mohamed Chedadi Imad Eddin Barakat Abdelaziz Benyaich
10 7 22 6
Abu Abderrahame Omar Dhegayes Amer Azizi Abu Musad Alsakaoui
4 2 18 10
Mohamed Atta Ramzi Binalshibh Mohamed Belfatmi Said Bahaji
10 10 11 11
Galeb Kalaje Abderrahim Zbakh Farid Oulad Ali José Emilio Suárez
16 15 6 8
Khalid Ouled Akcha Rafa Zuher Naima Oulad Akcha Abdelkarim el Mejjati
5 3 16 8
Anwar Adnan Ahmad Basel Ghayoun S B Abdelmajid Fakhet Jamal Ahmidan
4 11 12 14
Said Ahmidan Hamid Ahmidan Mustafa Ahmidan Antonio Toro
3 12 5 5
Mohamed Oulad Akcha Rachid Oulad Akcha Mamoun Darkazanli Fouad El Morabit Anghar
5 5 4 2
Abdeluahid Berrak Said Berrak Waanid Altaraki Almasri Abddenabi Koujma
11 17 1 1
Otman El Gnaut Abdelilah el Fouad Parlindumgan Siregar El Hemir
10 1 1 4
Anuar Asri Rifaat Rachid Adli Ghasoub Al Albrash Said Chedadi
1 1 1 2
Mohamed Bahaiah Taysir Alouny OM. Othman Abu Qutada Shakur
4 6 8 10
Driss Chebli Abdul Fatal Mohamed El Egipcio Nasredine Boushoa
2 2 13 1
Semaan Gaby Eid Emilio Llamo Ivan Granados Raul Gonzales Perez
11 6 6 6
El Gitanillo Moutaz Almallah Mohamed Almallah Yousef Hichman
6 2 2 2
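Degree is nothing more than a count of how often each person appears in the links data. As a rough check, assuming a made-up four-person network (toy_links, toy_network, link_counts and toy_degree are hypothetical names), the count can be done by hand and compared with degree():

```r
library(igraph)

# Hypothetical toy data: 4 people and 4 two-way links
toy_links <- data.frame(from = c(1, 1, 2, 3),
                        to   = c(2, 3, 3, 4))
toy_network <- graph_from_data_frame(toy_links, directed = FALSE)

# Count how often each id shows up at either end of a link
link_counts <- table(c(toy_links$from, toy_links$to))
link_counts

# igraph's answer, named by node
toy_degree <- degree(toy_network)
toy_degree
```

Because every link has two ends, the degrees always add up to twice the number of links.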
This is an important enough measure that we might create a new variable out of it and include it in the data. The following uses mutate() to create a new variable called degree, and then shows it in a table, arranged with the highest degree at the top.
terrorist_nodes <- terrorist_nodes %>%
mutate(degree = degree(terrorist_network))
terrorist_nodes %>%
arrange(-degree) %>%
datatable()
This shows how many connections each terrorist has. Jamal Zougam was one of the first to be arrested after the bombing. He owned a mobile phone shop, which probably has something to do with the number of connections he had to the other terrorists.
We can see the distribution of the degrees with a histogram. The chunk below creates a histogram with plotly. Add nbinsx = inside of add_histogram() to show more bars than the default shows.
terrorist_nodes %>%
plot_ly(x = ~degree) %>%
add_histogram()
You can see that there are many terrorists with 10 or fewer connections, and just a few terrorists with more than 20 connections.
If the node data has a column called ‘value’, the size of the nodes will be adjusted by that variable.
The following mutates a new column called value, and sets value = degree.
terrorist_nodes <- terrorist_nodes %>%
mutate(value = degree)
visNetwork(terrorist_nodes,
terrorist_links,
main = "Network of Terrorists involved in the 2004 Madrid Bombing") %>%
visIgraphLayout(layout = "layout_nicely") %>%
visOptions(highlightNearest = T, nodesIdSelection = T)
Another measure is closeness. Like degree, it’s a measure of the importance or centrality of an individual. It is based on how many hops it takes to reach that node from every other node: igraph adds up the node’s distances to all the other nodes and takes the reciprocal. The higher the closeness, the easier it is to get to that node.
Use closeness() to display the closeness of each terrorist in the network. Pipe terrorist_network into closeness().
terrorist_network %>%
closeness()
Jamal Zougam Mohamed Bekkali Mohamed Chaoui Vinay Kholy
0.009259259 0.005917160 0.009090909 0.007142857
Suresh Kumar Mohamed Chedadi Imad Eddin Barakat Abdelaziz Benyaich
0.007142857 0.006896552 0.007936508 0.006329114
Abu Abderrahame Omar Dhegayes Amer Azizi Abu Musad Alsakaoui
0.006172840 0.005405405 0.007518797 0.006493506
Mohamed Atta Ramzi Binalshibh Mohamed Belfatmi Said Bahaji
0.006493506 0.006493506 0.006535948 0.006535948
Galeb Kalaje Abderrahim Zbakh Farid Oulad Ali José Emilio Suárez
0.007407407 0.007518797 0.005847953 0.005128205
Khalid Ouled Akcha Rafa Zuher Naima Oulad Akcha Abdelkarim el Mejjati
0.005952381 0.005813953 0.007633588 0.006369427
Anwar Adnan Ahmad Basel Ghayoun S B Abdelmajid Fakhet Jamal Ahmidan
0.004716981 0.007352941 0.007518797 0.007874016
Said Ahmidan Hamid Ahmidan Mustafa Ahmidan Antonio Toro
0.005376344 0.007299270 0.005747126 0.003952569
Mohamed Oulad Akcha Rachid Oulad Akcha Mamoun Darkazanli Fouad El Morabit Anghar
0.005649718 0.005649718 0.004716981 0.005291005
Abdeluahid Berrak Said Berrak Waanid Altaraki Almasri Abddenabi Koujma
0.007751938 0.008064516 0.005181347 0.005128205
Otman El Gnaut Abdelilah el Fouad Parlindumgan Siregar El Hemir
0.007092199 0.004807692 0.005128205 0.006289308
Anuar Asri Rifaat Rachid Adli Ghasoub Al Albrash Said Chedadi
0.005128205 0.003891051 0.005319149 0.005405405
Mohamed Bahaiah Taysir Alouny OM. Othman Abu Qutada Shakur
0.004716981 0.005649718 0.006802721 0.006493506
Driss Chebli Abdul Fatal Mohamed El Egipcio Nasredine Boushoa
0.005347594 0.004854369 0.007194245 0.005291005
Semaan Gaby Eid Emilio Llamo Ivan Granados Raul Gonzales Perez
0.006849315 0.005000000 0.005000000 0.005000000
El Gitanillo Moutaz Almallah Mohamed Almallah Yousef Hichman
0.005000000 0.005000000 0.005000000 0.004878049
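These small numbers are reciprocals of summed distances. Jamal Zougam’s 0.009259259, for example, is 1/108, meaning his distances to the other 63 terrorists add up to 108. The sketch below reproduces the calculation by hand on a made-up four-person network, assuming the network is connected (toy_links, toy_network, total_distance and closeness_by_hand are hypothetical names):

```r
library(igraph)

# Hypothetical toy data: 4 people and 4 two-way links
toy_links <- data.frame(from = c(1, 1, 2, 3),
                        to   = c(2, 3, 3, 4))
toy_network <- graph_from_data_frame(toy_links, directed = FALSE)

# Sum of shortest-path distances from each node to all the others
total_distance <- rowSums(distances(toy_network))

closeness_by_hand <- 1 / total_distance
closeness_by_hand
closeness(toy_network)   # igraph's values, for comparison
```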
The following creates a new closeness column in the terrorist_nodes data, and also overwrites the ‘value’ column so that it holds closeness instead of degree.
terrorist_nodes <- terrorist_nodes %>%
mutate(closeness = closeness(terrorist_network)) %>%
mutate(value = closeness)
terrorist_nodes %>%
arrange(-closeness) %>%
datatable()
Closeness and degree are both measures of the centrality of each terrorist in the network. They’re pretty highly correlated - terrorists with high degree also have high closeness - but they’re not exactly the same.
Generate the visNetwork again. Now, since value has the closeness numbers, the sizes of the nodes will be based on that instead of degree.
visNetwork(terrorist_nodes,
terrorist_links,
main = "Network of Terrorists involved in the 2004 Madrid Bombing") %>%
visIgraphLayout(layout = "layout_nicely") %>%
visOptions(highlightNearest = T, nodesIdSelection = T)
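Betweenness in network analysis is a measure of how many shortest paths run through a particular part of the network. Here we look at the betweenness of links: the number of shortest paths between pairs of nodes that use a particular link. Nodes have betweenness too (igraph’s betweenness() calculates it), but edge_betweenness() below works on links.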
For example, in a city, there are some streets that are very commonly used because they are between important areas. Many people drive on Main St. in the Heights because it’s one of just a few commonly-used roads between the Heights and the rest of Billings. We could say Main St. has high betweenness.
Terrorists that are go-betweens for many other terrorists will have high betweenness, and are very important because, if those links can be disrupted, it will have a damaging effect on the communication in the network as a whole.
People who study internet connections are interested in betweenness. If a cable that carries a lot of internet traffic - say, between the US and Europe - is disrupted, it could cause internet outages across the world.
Pipe the terrorist_network into edge_betweenness() below:
terrorist_network %>%
edge_betweenness()
[1] 33.000000 1.500000 14.316667 14.316667 43.252814 25.934554 27.115222 28.900000 17.149206 19.358553
[11] 19.358553 19.358553 23.698113 35.906172 14.980952 51.638889 78.582791 100.883211 12.073810 8.202381
[21] 73.011242 29.951834 38.620519 5.700000 10.552381 33.461172 12.492532 19.358553 35.029004 30.000000
[31] 12.450000 12.450000 24.182173 24.431888 25.900000 16.446825 17.525219 17.525219 17.525219 21.514780
[41] 32.346648 14.638095 47.888889 71.925281 90.533211 10.907143 7.101190 65.478458 26.154215 36.752021
[51] 5.200000 10.552381 11.568290 17.525219 34.654004 1.000000 4.866667 7.166667 1.916667 2.500000
[61] 11.500000 2.333333 4.950000 4.866667 7.166667 1.916667 2.500000 11.500000 2.333333 4.950000
[71] 23.851984 30.992136 24.883658 28.388492 10.263528 23.907359 13.882011 50.904212 11.666667 9.288743
[81] 9.288743 9.288743 8.788743 13.483981 12.126190 27.283364 80.734369 25.265507 10.319048 13.485165
[91] 63.000000 89.612149 8.865909 9.288743 48.837546 32.805195 2.866667 12.095788 7.866667 5.333333
[101] 5.455409 5.455409 5.455409 6.455409 10.367314 4.792857 14.131349 36.244605 12.146825 4.769048
[111] 63.000000 8.235165 4.365909 5.455409 14.733766 1.000000 1.000000 1.500000 2.000000 4.872076
[121] 1.000000 1.000000 1.500000 2.000000 4.872076 1.000000 1.500000 2.000000 4.872076 1.000000
[131] 2.833333 5.872076 1.500000 14.162454 9.333981 2.000000 53.771429 27.341763 11.171429 3.259524
[141] 39.878355 3.832576 4.872076 13.575433 15.535714 7.666667 10.767857 39.921429 10.554762 26.594444
[151] 26.594444 17.888889 63.000000 63.000000 10.033333 15.250866 31.550000 4.795635 4.795635 49.982026
[161] 19.343716 63.000000 97.095023 4.166667 4.166667 4.166667 4.166667 43.438095 10.692063 10.692063
[171] 43.912698 76.664493 11.427381 15.266667 21.483333 12.074242 23.417857 23.417857 25.383369 63.000000
[181] 46.884656 46.884656 46.884656 12.295788 1.000000 1.000000 14.115344 2.853571 14.196429 4.113095
[191] 3.478571 20.840476 15.925214 5.258547 2.567857 31.000000 6.121212 31.616001 18.879759 63.000000
[201] 152.151075 25.125000 6.875000 34.737213 9.725214 23.009521 17.245310 10.914071 10.914071 10.914071
[211] 10.914071 1.000000 1.000000 14.115344 40.366342 9.882323 11.342857 12.449675 43.085381 276.127575
[221] 6.352381 7.368290 18.911147 2.056818 5.590909 63.000000 7.818498 14.115344 11.728571 62.000000
[231] 62.000000 64.747404 64.747404 64.747404 64.747404 45.754690 1.000000 1.000000 1.000000 1.000000
[241] 1.000000 1.000000 1.000000
This shows each network connection, and how valuable and commonly used it is in the network.
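A long vector of numbers is hard to read, so it can help to see betweenness on a network small enough to check by eye. The sketch below uses a made-up chain of four people (chain_links and chain_network are hypothetical names). It relies on edge_betweenness() returning values in the same order as the rows of the links data, and it also calls betweenness(), igraph’s node-level version of the measure.

```r
library(igraph)

# Hypothetical toy data: 4 people in a single chain, 1 - 2 - 3 - 4
chain_links <- data.frame(from = c(1, 2, 3),
                          to   = c(2, 3, 4))
chain_network <- graph_from_data_frame(chain_links, directed = FALSE)

# Link version: how many shortest paths run along each link
chain_links$betweenness <- edge_betweenness(chain_network)
chain_links

# The busiest link is the middle one, 2 - 3
chain_links[which.max(chain_links$betweenness), ]

# Node version: how many shortest paths pass through each person
betweenness(chain_network)
```

The two people in the middle of the chain carry all the traffic between the ends, which is why they and the link between them score highest.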
Create a new code chunk below that adds a new betweenness column to the terrorist_links data. Also add a column called value so that visNetwork adjusts the width of each line based on value. Model your commands after the closeness chunk above. Make sure you modify terrorist_links rather than terrorist_nodes. Also, create the table with descending values of betweenness.
terrorist_links <- terrorist_links %>%
mutate(betweenness = edge_betweenness(terrorist_network)) %>%
mutate(value = betweenness)
terrorist_links %>%
arrange(-betweenness) %>% # highest betweenness at the top
datatable()
Now look what happens to the lines when we create the network again with visNetwork:
visNetwork(terrorist_nodes,
terrorist_links,
main = "Network of Terrorists involved in the 2004 Madrid Bombing") %>%
visIgraphLayout(layout = "layout_nicely") %>%
visOptions(highlightNearest = T, nodesIdSelection = T)
There should be one particularly thick line apparent in the network diagram. Who are the terrorists that form this important link?
Network analysis can identify groups of individuals that have many connections between them. This is called community detection.
One method is called infomap, and uses the infomap.community() function (current versions of igraph call it cluster_infomap()). Pipe the network into it below:
terrorist_network %>%
infomap.community()
IGRAPH clustering infomap, groups: 6, mod: 0.44
+ groups:
$`1`
[1] "Jamal Zougam" "Mohamed Bekkali" "Mohamed Chaoui" "Mohamed Chedadi"
[5] "Imad Eddin Barakat" "Abdelaziz Benyaich" "Abu Abderrahame" "Omar Dhegayes"
[9] "Amer Azizi" "Abu Musad Alsakaoui" "Mohamed Atta" "Ramzi Binalshibh"
[13] "Mohamed Belfatmi" "Said Bahaji" "Galeb Kalaje" "Fouad El Morabit Anghar"
[17] "Abdeluahid Berrak" "Otman El Gnaut" "Parlindumgan Siregar" "El Hemir"
[21] "Ghasoub Al Albrash" "Said Chedadi" "OM. Othman Abu Qutada" "Shakur"
[25] "Driss Chebli" "Mohamed El Egipcio"
$`2`
+ ... omitted several groups/vertices
To display the group that each terrorist belongs to, further pipe the above into membership().
terrorist_network %>%
infomap.community() %>%
membership()
Jamal Zougam Mohamed Bekkali Mohamed Chaoui Vinay Kholy
1 1 1 2
Suresh Kumar Mohamed Chedadi Imad Eddin Barakat Abdelaziz Benyaich
2 1 1 1
Abu Abderrahame Omar Dhegayes Amer Azizi Abu Musad Alsakaoui
1 1 1 1
Mohamed Atta Ramzi Binalshibh Mohamed Belfatmi Said Bahaji
1 1 1 1
Galeb Kalaje Abderrahim Zbakh Farid Oulad Ali José Emilio Suárez
1 2 4 3
Khalid Ouled Akcha Rafa Zuher Naima Oulad Akcha Abdelkarim el Mejjati
4 4 4 5
Anwar Adnan Ahmad Basel Ghayoun S B Abdelmajid Fakhet Jamal Ahmidan
5 2 2 2
Said Ahmidan Hamid Ahmidan Mustafa Ahmidan Antonio Toro
2 2 2 3
Mohamed Oulad Akcha Rachid Oulad Akcha Mamoun Darkazanli Fouad El Morabit Anghar
4 4 5 1
Abdeluahid Berrak Said Berrak Waanid Altaraki Almasri Abddenabi Koujma
1 2 4 2
Otman El Gnaut Abdelilah el Fouad Parlindumgan Siregar El Hemir
1 3 1 1
Anuar Asri Rifaat Rachid Adli Ghasoub Al Albrash Said Chedadi
2 3 1 1
Mohamed Bahaiah Taysir Alouny OM. Othman Abu Qutada Shakur
5 5 1 1
Driss Chebli Abdul Fatal Mohamed El Egipcio Nasredine Boushoa
1 5 1 2
Semaan Gaby Eid Emilio Llamo Ivan Granados Raul Gonzales Perez
3 3 3 3
El Gitanillo Moutaz Almallah Mohamed Almallah Yousef Hichman
3 6 6 3
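Community detection is easier to sanity-check on a network where the groups are obvious. The sketch below uses a made-up network of two tightly knit groups of four people joined by a single link (toy_links, toy_network and toy_groups are hypothetical names), and calls cluster_infomap(), the current igraph name for the same method. Infomap involves random starts, so set.seed() is included on the assumption that you want repeatable output; the grouping it finds may still differ between setups.

```r
library(igraph)

# Hypothetical toy data: two groups of 4 where everyone knows everyone,
# joined by one link between person 4 and person 5
toy_links <- data.frame(
  from = c(1, 1, 1, 2, 2, 3,  5, 5, 5, 6, 6, 7,  4),
  to   = c(2, 3, 4, 3, 4, 4,  6, 7, 8, 7, 8, 8,  5)
)
toy_network <- graph_from_data_frame(toy_links, directed = FALSE)

set.seed(1)                       # assumption: fixed seed for repeatable runs
toy_groups <- cluster_infomap(toy_network)

membership(toy_groups)            # group number for each person
sizes(toy_groups)                 # how many people landed in each group
```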
We can also mutate a new variable with each terrorist’s group:
terrorist_nodes <- terrorist_nodes %>%
mutate(group = membership(infomap.community(terrorist_network)))
terrorist_nodes %>%
datatable()
Now when we create the graph again, the nodes will automatically be colored by group membership.
visNetwork(terrorist_nodes,
terrorist_links,
main = "Network of Terrorists involved in the 2004 Madrid Bombing") %>%
visIgraphLayout(layout = "layout_nicely") %>%
visOptions(highlightNearest = T, nodesIdSelection = T)
Finally, inside visOptions() add selectedBy = "group". That will allow you to select entire groups with the drop-down menu.
visNetwork(terrorist_nodes,
terrorist_links,
main = "Network of Terrorists involved in the 2004 Madrid Bombing") %>%
visIgraphLayout(layout = "layout_nicely") %>%
visOptions(highlightNearest = T, nodesIdSelection = T, selectedBy = "group")
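Over the course of the exercise, the diagram picked up its features from specially named columns: label, title, value and group in the nodes data, and value in the links data. As a recap, here is a minimal sketch that sets all of them at once on made-up data (toy_nodes, toy_links, toy_diagram and the people in them are hypothetical):

```r
library(visNetwork)

# Hypothetical toy data using the column names visNetwork looks for
toy_nodes <- data.frame(
  id    = 1:4,
  label = c("Ann", "Bob", "Cat", "Dan"),  # text drawn with each node
  title = c("Ann", "Bob", "Cat", "Dan"),  # text shown on hover
  value = c(2, 2, 3, 1),                  # scales the node size
  group = c("a", "a", "a", "b")           # sets the node color
)
toy_links <- data.frame(
  from  = c(1, 1, 2, 3),
  to    = c(2, 3, 3, 4),
  value = c(1, 1, 1, 3)                   # scales the line width
)

toy_diagram <- visNetwork(toy_nodes, toy_links, main = "Toy network") %>%
  visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE, selectedBy = "group")
toy_diagram
```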