Zachary's karate club is a classic example of social relationships within a small group. The dataset records the interactions of club members outside the club, and it also documents the conflict between the instructor, Mr Hi, and the club president, John A, over the price of lessons. In the end, half of the members formed a new club around Mr Hi, and the other half either stayed at the old karate club or gave up karate. Let us start analysing the karate club data. Before the analysis, we first load the required libraries.
library(igraph)
library(igraphdata)
data("karate")
With the libraries and the karate data loaded, we can start the analysis. First, we check what the data looks like and which attributes it has. The following code converts the graph to a data frame, shows its first records, and counts the nodes (club members) and edges (interactions).
library(knitr)
#convert the graph into data frame
df = as_data_frame(karate)
#displaying the first 6 records in table.
kable(head(df),caption = "Zachary Karate Club Data")
from | to | weight |
---|---|---|
Mr Hi | Actor 2 | 4 |
Mr Hi | Actor 3 | 5 |
Mr Hi | Actor 4 | 3 |
Mr Hi | Actor 5 | 3 |
Mr Hi | Actor 6 | 3 |
Mr Hi | Actor 7 | 3 |
#number of nodes in the graph
vcount(karate)
## [1] 34
#number of edges in the graph
ecount(karate)
## [1] 78
The table shows the first 6 records of the karate club data; the graph has 34 nodes and 78 edges. To find the attributes attached to the data, we can use the _attr functions (vertex_attr_names(), edge_attr_names(), vertex_attr(), edge_attr()). We can also list the first few nodes and edges with the V() and E() functions.
#Names of first 10 nodes(club members)
V(karate)[1:10]$name
## [1] "Mr Hi" "Actor 2" "Actor 3" "Actor 4" "Actor 5" "Actor 6"
## [7] "Actor 7" "Actor 8" "Actor 9" "Actor 10"
#First 10 interactions between members (edges)
E(karate)[1:10]
## + 10/78 edges from 4b458a1 (vertex names):
## [1] Mr Hi--Actor 2 Mr Hi--Actor 3 Mr Hi--Actor 4 Mr Hi--Actor 5
## [5] Mr Hi--Actor 6 Mr Hi--Actor 7 Mr Hi--Actor 8 Mr Hi--Actor 9
## [9] Mr Hi--Actor 11 Mr Hi--Actor 12
#Attributes attached with nodes
vertex_attr_names(karate)
## [1] "Faction" "name" "label" "color"
#Attributes attached with edges
edge_attr_names(karate)
## [1] "weight"
#Access the values of a vertex attribute (attribute names are case-sensitive)
vertex_attr(karate,"name",index = c(1:5))
## [1] "Mr Hi" "Actor 2" "Actor 3" "Actor 4" "Actor 5"
#Access the values of an edge attribute
edge_attr(karate,"weight",index = c(1:5))
## [1] 4 5 3 3 3
#Information about first 5 members
V(karate)[[1:5]]
## + 5/34 vertices, named, from 4b458a1:
## Faction name label color
## 1 1 Mr Hi H 1
## 2 1 Actor 2 2 1
## 3 1 Actor 3 3 1
## 4 1 Actor 4 4 1
## 5 1 Actor 5 5 1
#Information about first 5 interactions
E(karate)[[1:5]]
## + 5/78 edges from 4b458a1 (vertex names):
## tail head tid hid weight
## 1 Mr Hi Actor 2 1 2 4
## 2 Mr Hi Actor 3 1 3 5
## 3 Mr Hi Actor 4 1 4 3
## 4 Mr Hi Actor 5 1 5 3
## 5 Mr Hi Actor 6 1 6 3
So, we have seen that nodes have 4 attributes (Faction, name, label, and color) and edges have 1 attribute (weight). A fifth node attribute, community, is added later in the community detection step. We have also seen the first 5 members and the first 5 interactions.
# Select the club members of two communities (1 and 4).
# The community attribute is only assigned in the community detection step further below.
c1 = V(karate)$community == 1
c2 = V(karate)$community == 4
#Edges between both communities
E(karate)[V(karate)[c1]%--%V(karate)[c2]]
## + 0/78 edges from 4b458a1 (vertex names):
The query returns no edges between the members of these two communities. Keep in mind that the community attribute comes from the community detection step shown later; if that step has not been run yet, both selections are empty and the result is trivially zero edges. Next, we identify the club members who interact with Mr Hi and those who interact with John A, by listing only the nodes that have a relationship with "Mr Hi" or "John A".
#See the club members interacted with Mr Hi and John A.
neighbors(karate, "Mr Hi") #Interaction of club members with Mr Hi
## + 16/34 vertices, named, from 4b458a1:
## [1] Actor 2 Actor 3 Actor 4 Actor 5 Actor 6 Actor 7 Actor 8 Actor 9
## [9] Actor 11 Actor 12 Actor 13 Actor 14 Actor 18 Actor 20 Actor 22 Actor 32
neighbors(karate, "John A") #Interaction of club members with John A
## + 17/34 vertices, named, from 4b458a1:
## [1] Actor 9 Actor 10 Actor 14 Actor 15 Actor 16 Actor 19 Actor 20 Actor 21
## [9] Actor 23 Actor 24 Actor 27 Actor 28 Actor 29 Actor 30 Actor 31 Actor 32
## [17] Actor 33
The output shows that Mr Hi has 16 interactions, while John A has 17. Now, let us take a look at the karate club network.
plot(karate)
The plot shows a structure of two different communities: the yellow nodes are the club members who are close to Mr Hi, while the blue nodes are the members who are close to John A. We can investigate further with community detection algorithms. Here, an interactive graph is displayed with the visNetwork library.
library(visNetwork)
# Spinglass community detection (stochastic: memberships can change between runs)
fc = cluster_spinglass(karate)
V(karate)$community = fc$membership
nodes <- data.frame(id = V(karate)$name, title = V(karate)$name, group = V(karate)$community)
# Rows stay in graph vertex order so the per-vertex vectors assigned below match the ids
edges <- as_data_frame(karate, what="edges")[1:2]
#visNetwork(nodes, edges) %>%
# visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE) %>%
# visLegend()
vis.nodes <- nodes
vis.links <- edges
vis.nodes$shape <- "dot"
vis.nodes$shadow <- TRUE # Nodes will drop shadow
vis.nodes$title <- V(karate)$name # Text on click
vis.nodes$label <- V(karate)$name # Node label
vis.nodes$size <- degree(karate)+25 # Node size
vis.nodes$borderWidth <- 2 # Node border width
vis.links$width <- E(karate)$weight # line width
vis.links$color <- "gray" # line color
vis.links$arrows <- "middle" # arrows: 'from', 'to', or 'middle'
vis.links$smooth <- FALSE # should the edges be curved?
vis.links$shadow <- FALSE # edge shadow
# One colour per community (4 were found here; add colours if a run returns more)
vis.nodes$color.background <- c("slategrey", "tomato", "gold", "skyblue")[V(karate)$community]
vis.nodes$color.border <- "black"
vis.nodes$color.highlight.background <- "orange"
vis.nodes$color.highlight.border <- "darkred"
visNetwork(vis.nodes, vis.links)
The spinglass algorithm found 4 different communities in the karate club, and the community of John A looks bigger than the others.
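One way to put a number on how separated the detected communities are is to count edges by community pair. The base-R sketch below does this on a small hypothetical edge list; `toy_edges`, `membership`, and all values in them are assumptions made for illustration, not the karate data.

```r
# Hypothetical toy edge list and community labels (not the karate data)
toy_edges <- data.frame(
  from = c("a", "a", "b", "c", "d", "e"),
  to   = c("b", "c", "c", "d", "e", "f"),
  stringsAsFactors = FALSE
)
membership <- c(a = 1, b = 1, c = 1, d = 2, e = 2, f = 2)

# Look up the community of each endpoint by node name
from_comm <- membership[toy_edges$from]
to_comm   <- membership[toy_edges$to]

# Order each pair so that (1, 2) and (2, 1) fall in the same table cell
lo <- pmin(from_comm, to_comm)
hi <- pmax(from_comm, to_comm)
pair_counts <- table(lo, hi)
print(pair_counts)

# Number of edges whose endpoints sit in different communities
n_crossing <- sum(from_comm != to_comm)
print(n_crossing)
```

With the objects from this post, `membership` could presumably be built as `setNames(V(karate)$community, V(karate)$name)`, provided the community attribute has already been assigned.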
To find important nodes, different centrality measures are used (e.g., betweenness, degree, or closeness). We now calculate these centralities for the dataset and attach them to the nodes as attributes.
#Compute betweenness centrality (the karate graph is undirected)
BetC = betweenness(karate,directed = FALSE)
#Compute closeness centrality
CloC = closeness(karate,mode = "all")
#Compute degree centrality (all edges incident to each node)
DegC = degree(karate,mode = "all")
#Add attribute to the nodes
V(karate)$BetC = BetC
V(karate)$CloC = CloC
V(karate)$DegC = DegC
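As a rough illustration of what two of these scores measure, the following base-R sketch computes degree and closeness by hand on a small hypothetical path graph. The graph, `adj`, and the helper `bfs_dist` are assumptions made for this example. The sketch ignores edge weights, whereas igraph uses the weight attribute by default when one is present, so it is not meant to reproduce the karate numbers.

```r
# Toy undirected path graph a - b - c - d (hypothetical, not the karate data)
adj <- matrix(0, 4, 4, dimnames = list(letters[1:4], letters[1:4]))
adj["a", "b"] <- adj["b", "a"] <- 1
adj["b", "c"] <- adj["c", "b"] <- 1
adj["c", "d"] <- adj["d", "c"] <- 1

# Degree centrality: number of neighbours of each node
deg <- rowSums(adj)

# Unweighted shortest-path lengths from one source via breadth-first search
bfs_dist <- function(adj, source) {
  dist <- rep(Inf, nrow(adj))
  names(dist) <- rownames(adj)
  dist[source] <- 0
  frontier <- source
  while (length(frontier) > 0) {
    nxt <- character(0)
    for (v in frontier) {
      nb <- names(which(adj[v, ] > 0))
      fresh <- nb[is.infinite(dist[nb])]  # neighbours not reached yet
      dist[fresh] <- dist[v] + 1
      nxt <- c(nxt, fresh)
    }
    frontier <- unique(nxt)
  }
  dist
}

# Closeness: reciprocal of the summed distances to all other nodes
clo <- sapply(rownames(adj), function(v) 1 / sum(bfs_dist(adj, v)))

print(deg)
print(round(clo, 3))
```

The two middle nodes get the higher closeness because their summed distances to the others are smaller, which is the same intuition behind the closeness scores computed above.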
These centrality scores make it easy to find and display the important persons. The following code displays the most important persons by degree.
#grouping the top nodes and other than top nodes
important = as.vector(ifelse(degree(karate) >= 9, "Top" , "Simple"))
#making nodes and edges data frames
nodes <- data.frame(id = V(karate)$name, title = V(karate)$name, group = important)
# Rows stay in graph vertex order so the shapes, sizes and colours assigned below match the ids
edges <- as_data_frame(karate, what="edges")[1:2]
vis.nodes <- nodes
vis.links <- edges
#giving some styles to nodes and edges
vis.nodes$shape <- as.vector(ifelse(important=="Top", "square" , "dot"))
vis.nodes$shadow <- TRUE # Nodes will drop shadow
vis.nodes$title <- vis.nodes$id # Text on click
vis.nodes$label <- vis.nodes$id # Node label
vis.nodes$size <- degree(karate)+10 # Node size
vis.nodes$borderWidth <- 2 # Node border width
vis.links$width <- E(karate)$weight # line width
vis.links$color <- "gray" # line color
vis.links$arrows <- "to" # arrows: 'from', 'to', or 'middle'
vis.links$smooth <- FALSE # should the edges be curved?
vis.links$shadow <- FALSE # edge shadow
vis.nodes$color.background <- as.vector(ifelse(important=="Top", "slategrey" , "tomato"))
vis.nodes$color.border <- "black"
vis.nodes$color.highlight.background <- "orange"
vis.nodes$color.highlight.border <- "darkred"
visnet3 = visNetwork(vis.nodes, vis.links)
visnet3 <- visGroups(visnet3, groupname = "Top", shape = "square",
color = list(background = "gray", border="black"))
visnet3 <- visGroups(visnet3, groupname = "Simple", shape = "dot",
color = list(background = "tomato", border="black"))
visLegend(visnet3, main="Legend", position="right", ncol=1)
In the above graph, the square nodes are the most important in the network. We can also list the top 5 persons by degree as follows.
sort(degree(karate),decreasing = TRUE)[1:5]
## John A Mr Hi Actor 33 Actor 3 Actor 2
## 17 16 12 10 9
Next, we show the top 5 persons by betweenness centrality.
#grouping the top nodes and other than top nodes
important = as.vector(ifelse(betweenness(karate,directed = FALSE) >= 38.13333, "Top" , "Simple"))
#making nodes and edges data frames
nodes <- data.frame(id = V(karate)$name, title = V(karate)$name, group = important)
# Rows stay in graph vertex order so the shapes, sizes and colours assigned below match the ids
edges <- as_data_frame(karate, what="edges")[1:2]
vis.nodes <- nodes
vis.links <- edges
#giving some styles to nodes and edges
vis.nodes$shape <- as.vector(ifelse(important=="Top", "square" , "dot"))
vis.nodes$shadow <- TRUE # Nodes will drop shadow
vis.nodes$title <- vis.nodes$id # Text on click
vis.nodes$label <- vis.nodes$id # Node label
vis.nodes$size <- betweenness(karate,directed = FALSE)*0.2 # Node size
vis.nodes$borderWidth <- 2 # Node border width
vis.links$width <- E(karate)$weight # line width
vis.links$color <- "gray" # line color
vis.links$arrows <- "to" # arrows: 'from', 'to', or 'middle'
vis.links$smooth <- FALSE # should the edges be curved?
vis.links$shadow <- FALSE # edge shadow
vis.nodes$color.background <- as.vector(ifelse(important=="Top", "slategrey" , "tomato"))
vis.nodes$color.border <- "black"
vis.nodes$color.highlight.background <- "orange"
vis.nodes$color.highlight.border <- "darkred"
visnet3 = visNetwork(vis.nodes, vis.links)
visnet3 <- visGroups(visnet3, groupname = "Top", shape = "square",
color = list(background = "gray", border="black"))
visnet3 <- visGroups(visnet3, groupname = "Simple", shape = "dot",
color = list(background = "tomato", border="black"))
visLegend(visnet3, main="Legend", position="right", ncol=1)
sort(betweenness(karate,directed = FALSE),decreasing = TRUE)[1:5]
## Mr Hi John A Actor 20 Actor 32 Actor 33
## 250.15000 209.50000 127.06667 66.33333 38.13333
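The "Top" group in this code relies on a cut-off such as `>= 38.13333` copied from the printed output, which has to be updated by hand if the data or the measure changes. A minimal sketch of an alternative is to derive the flag from the ranking itself. `flag_top_k` is a hypothetical helper written for this example, not a function of igraph or visNetwork, and the sample input reuses the five betweenness values printed above.

```r
# Flag the k highest-scoring entries of a named numeric vector.
# flag_top_k is a hypothetical helper, not part of igraph or visNetwork.
flag_top_k <- function(scores, k = 5) {
  # ties.method = "min" gives tied scores the same rank, so ties at the
  # cut-off are all kept (the result can then contain more than k "Top" entries)
  ranks <- rank(-scores, ties.method = "min")
  ifelse(ranks <= k, "Top", "Simple")
}

# The five betweenness values printed above, reused as sample input
scores <- c("Mr Hi" = 250.15, "John A" = 209.5, "Actor 20" = 127.06667,
            "Actor 32" = 66.33333, "Actor 33" = 38.13333)

important_demo <- flag_top_k(scores, k = 3)
print(important_demo)
```

In the code of this post, a call such as `flag_top_k(betweenness(karate, directed = FALSE), k = 5)` could presumably take the place of the thresholded `ifelse()` line.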
Finally, we show the top 5 persons by closeness centrality.
#grouping the top nodes and other than top nodes
important = as.vector(ifelse(closeness(karate,mode = "all") >= 0.006134969, "Top" , "Simple"))
#making nodes and edges data frames
nodes <- data.frame(id = V(karate)$name, title = V(karate)$name, group = important)
# Rows stay in graph vertex order so the shapes, sizes and colours assigned below match the ids
edges <- as_data_frame(karate, what="edges")[1:2]
vis.nodes <- nodes
vis.links <- edges
#giving some styles to nodes and edges
vis.nodes$shape <- as.vector(ifelse(important=="Top", "diamond" , "dot"))
vis.nodes$shadow <- TRUE # Nodes will drop shadow
vis.nodes$title <- vis.nodes$id # Text on click
vis.nodes$label <- vis.nodes$id # Node label
vis.nodes$size <- closeness(karate,mode = "all")*4000 # Node size
vis.nodes$borderWidth <- 2 # Node border width
vis.links$width <- 2 # line width
vis.links$color <- "gray" # line color
#vis.links$arrows <- "from" # arrows: 'from', 'to', or 'middle'
vis.links$smooth <- FALSE # should the edges be curved?
vis.links$shadow <- FALSE # edge shadow
vis.nodes$color.background <- as.vector(ifelse(important=="Top", "slategrey" , "tomato"))
vis.nodes$color.border <- "black"
vis.nodes$color.highlight.background <- "orange"
vis.nodes$color.highlight.border <- "darkred"
visnet3 = visNetwork(vis.nodes, vis.links)
visnet3 <- visGroups(visnet3, groupname = "Top", shape = "diamond",
color = list(background = "gray", border="black"))
visnet3 <- visGroups(visnet3, groupname = "Simple", shape = "dot",
color = list(background = "tomato", border="black"))
visLegend(visnet3, main="Legend", position="right", ncol=1)
sort(closeness(karate,mode = "all"),decreasing = TRUE)[1:5]
## Mr Hi John A Actor 20 Actor 13 Actor 21
## 0.007575758 0.007575758 0.007246377 0.006134969 0.006134969
In the above examples, we have seen that Mr Hi and John A are the most important members of the network; the club divided into two parts around them. The next question is: who is most similar to Mr Hi, and who is most similar to John A? To measure similarity, social network analysis uses different similarity measures (e.g., cosine, Jaccard, Adamic-Adar, resource allocation, preferential attachment). We now look for the members most similar to Mr Hi and John A; in the output below, row 1 is Mr Hi and row 34 is John A.
library(linkprediction)
proxfun(karate,method = "jaccard")[1,]
## 1 2 3 4 5 6 7
## 1.00000000 0.38888889 0.23809524 0.29411765 0.11764706 0.11111111 0.11111111
## 8 9 10 11 12 13 14
## 0.17647059 0.05000000 0.05882353 0.11764706 0.00000000 0.05882353 0.16666667
## 15 16 17 18 19 20 21
## 0.00000000 0.00000000 0.12500000 0.05882353 0.00000000 0.05555556 0.00000000
## 22 23 24 25 26 27 28
## 0.05882353 0.00000000 0.00000000 0.05555556 0.05555556 0.00000000 0.05263158
## 29 30 31 32 33 34
## 0.11764706 0.00000000 0.11111111 0.00000000 0.12000000 0.13793103
proxfun(karate,method = "jaccard")[34,]
## 1 2 3 4 5 6 7
## 0.13793103 0.13043478 0.28571429 0.04545455 0.00000000 0.00000000 0.00000000
## 8 9 10 11 12 13 14
## 0.00000000 0.10000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
## 15 16 17 18 19 20 21
## 0.05555556 0.05555556 0.00000000 0.00000000 0.05555556 0.00000000 0.05555556
## 22 23 24 25 26 27 28
## 0.00000000 0.05555556 0.15789474 0.11111111 0.11111111 0.05555556 0.05000000
## 29 30 31 32 33 34
## 0.05263158 0.16666667 0.10526316 0.09523810 0.52631579 1.00000000
The similarity scores show that Mr Hi is most similar to Actor 2 (0.389). For John A, the highest score other than his own is with Actor 33 (0.526), well ahead of Actor 3 (0.286). This means that Mr Hi and Actor 2, and likewise John A and Actor 33, interact with many of the same members, which suggests a strong relationship within each pair. The similarity score tells us how similar two nodes are: a score of 1 means the nodes are 100% similar, while a score of 0 means they are 0% similar.
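To make the Jaccard score concrete, the base-R sketch below computes it as the number of shared neighbours divided by the number of distinct neighbours of the two nodes. This reading is consistent with the output above: Mr Hi and Actor 2 have degrees 16 and 9, and 7 shared neighbours out of 18 distinct ones gives 0.389. The toy graph and the names `nbrs` and `jaccard` are assumptions made for illustration; this is not the implementation used by the linkprediction package.

```r
# Toy undirected graph as neighbour lists (hypothetical, not the karate data)
nbrs <- list(
  a = c("b", "c", "d"),
  b = c("a", "c"),
  c = c("a", "b", "d"),
  d = c("a", "c")
)

# Jaccard similarity of two nodes: shared neighbours / all distinct neighbours
jaccard <- function(u, v, nbrs) {
  shared <- length(intersect(nbrs[[u]], nbrs[[v]]))
  total  <- length(union(nbrs[[u]], nbrs[[v]]))
  if (total == 0) 0 else shared / total
}

# Similarity of node "a" to every other node, highest first
others <- setdiff(names(nbrs), "a")
sim_a <- sapply(others, function(v) jaccard("a", v, nbrs))
print(sort(sim_a, decreasing = TRUE))

# Leaving out the node itself avoids picking the trivial self-similarity of 1
best <- names(which.max(sim_a))
print(best)
```

Excluding the node itself before taking the maximum matters when reading a full similarity row, because the entry on the diagonal is always 1.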