Social Network Analysis: Community Detection

Introduction

When we talk about networks and network analysis, perhaps the first thing that comes to our mind is social media and sites like Facebook or Instagram. However, we may find networks within many other domains throughout human history, such as trade, social connections, or infrastructure networks. Nevertheless, mathematical studies of this subject are relatively recent in history, and they have been responsible for many technological breakthroughs, particularly in the telecommunications industry.

Within the study of networks, a network is said to have community structure if the nodes (vertices) of the network can be grouped into sets of nodes that are densely connected between each other. In essence, network communities are groups of nodes such that nodes inside the group connected with many more edges than between groups.

The Importance of Community Detection

When we analyze networks, it is important to discover communities. With social media, for example, this is useful for creating algorithms to discover users that have common interests and keep them connected. Moreover, community detection can also be used with machine learning to detect communities with similar properties. Moreover, since people tend to group with others similar to them, community detection helps us to identify users with a high number of connections and observe the depth of their reach within a network.

Modularity

Modularity is a measure of community detection which measures the strength of division of a network into communities. Networks with a high modularity have a dense connection between the nodes within modules but a spread connections between nodes in different modules.

Detecting Communities with R Programming

In order to create a Community Detection Analysis with R, first we load the required libraries and set a random seed:

library(igraph)
library(NetData)
library(cluster)
library(sna)
library(knitr)

set.seed(123)

Our dataset is collected from the managers of a high-tech company. The company manufactured high-tech equipment on the west coast of the United States and had just over 100 employees with 21 managers. Each manager was asked to whom you go to for advice, who is your friend, and to whom do you report to.

##   ego alter advice_tie friendship_tie reports_to_tie
## 1   1     1          0              0              0
## 2   1     2          1              1              1
## 3   1     3          0              0              0
## 4   1     4          1              1              1
## 5   1     5          0              0              0
## 6   1     6          0              0              0

Community Structure

There are different ways to detect communities within a network. In this report we will display three:

Walktrap Method
Greedy Modularity Optimization
Edge Betweeness Method

Network Visualization

First, we visualize our network with 21 nodes, 1 for each manager:

layout.0 <- layout.fruchterman.reingold(krack_full)
plot(krack_full, layout=layout.0, edge.arrow.size=.5)

From a first glance, it is hard to detect any communities.

Pre-Processing

For a better understanding of our model, we’ll set the network to undirected and remove isolated nodes. In essence, the number of edges decreases as reciprocated directed ties are consolidated into single undirected ties, and the number of nodes decreases as isolates are removed.

krack_full_und <- as.undirected(krack_full, mode='collapse')

krack_full_no_iso <- igraph::delete.vertices(krack_full_und, V(krack_full_und)[igraph::degree(krack_full_und)==0])

Finally, we are ready to apply our different community detection algorithms.

Walktrap Method

The walktrap algorithm is used to identify communities in large networks via random walks. These random walks are then used to compute distances between nodes. Then, nodes are assigned into groups with small and large community distances via bottom-up hierarchical clustering.

It is important to mention that this algorithm considers only one community per node, which in some cases may be not right.

(krack_full_comm_wt <- walktrap.community(krack_full_no_iso, steps=200,modularity=TRUE))

## IGRAPH clustering walktrap, groups: 21, mod: 0
## + groups:
##   $`1`
##   [1] "1"
##   
##   $`2`
##   [1] "2"
##   
##   $`3`
##   [1] "3"
##   
##   $`4`
##   + ... omitted several groups/vertices

plot(krack_full_comm_wt, krack_full_no_iso)

From this plot we can identify that this method isolates all of the nodes to an individual network, which is not what we’re looking for. This happens because walktrap considers only one community per node and does not overlap. However, there’s still similarities between our nodes that we can observe in the following dendrogram:

krack_full_comm_dend <- as.dendrogram(krack_full_comm_wt, use.modularity=TRUE)
plot(krack_full_comm_dend)

Finally, as expected, our modularity function returns 0 since there aren’t any formed communities with the walktrap method:

modularity(krack_full_comm_wt, krack_full_comm_wt$membership)

## [1] 0

Greedy Modularity Optimization

The greedy modularity optimization is another method which helps us detect communities by iteratively including and removing nodes to maximize the modularity.

(krack_full_comm_fg <- fastgreedy.community(krack_full_no_iso, modularity=TRUE))

## IGRAPH clustering fast greedy, groups: 2, mod: 0.052
## + groups:
##   $`1`
##    [1] "2"  "5"  "6"  "7"  "8"  "9"  "13" "14" "15" "18" "21"
##   
##   $`2`
##    [1] "1"  "3"  "4"  "10" "11" "12" "16" "17" "19" "20"
##

plot(krack_full_comm_fg, krack_full_no_iso)

From this plot, we can observe 2 very clear communities with some nodes that are part of bothm which are called bridges.

Following, our similarity between nodes is observed with the following dendrogram which is splitted into two sides:

krack_full_comm_dend <- as.dendrogram(krack_full_comm_fg, use.modularity=TRUE)
plot(krack_full_comm_dend)

Finally, we have a Modularity of 0.52, which is clearly higher than the walktrap method, but it’s still very close to 0. However, it makes sense given the clumped distribution of our network. We should keep in mind that a low modularity means that there is not a sparse connection between nodes in different communities.

modularity(krack_full_comm_fg, krack_full_comm_fg$membership)

## [1] 0.05219335

Edge Betweeness Method

Edge betweenness is a centrality measure for edges in a network. Additionally, it may be used to partition the nodes in a network into communities because edges with high betweenness are usually connections between densely connected clusters of nodes. Thus, we iteratively find and remove the edge with largest betweenness to divide a network into a hierarchy of nested communities.

(krack_full_comm_eb <- edge.betweenness.community(krack_full_no_iso, modularity=TRUE))

## IGRAPH clustering edge betweenness, groups: 1, mod: 0
## + groups:
##   $`1`
##    [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14"
##   [15] "15" "16" "17" "18" "19" "20" "21"
##

plot(krack_full_comm_eb, krack_full_no_iso)

As opposed from walktrap where we find 21 individual communities, Edge Betweeness gives us 1 sole community for the whole network.

Furthermore, we observe our similarity between nodes with our dendrogram and it is clear how they all belong to the same community:

krack_full_comm_dend <- as.dendrogram(krack_full_comm_eb, use.modularity=TRUE)

plot(krack_full_comm_dend)

Finally, our modularity is observed as 0 since, again, there are no visible community splits, only 1 big community.

modularity(krack_full_no_iso, krack_full_comm_eb$membership)

## [1] 0

Conclusion

Community detection’s application has importance in understanding and evaluating the structure of large and complex networks. Different from Machine learning’s clustering methods, this approach uses the properties of edges in graphs or networks which is more suitable for network analysis.

Our network is clearly not divided into visible communities and its quite a challenge to observe any communities within it. Nevertheless, thanks to the Greedy Modularity approach we can observe a split between 2 communities, with some bridges in between both.

References

Bitten, H. (2019, July 4). Detecting communities in a language Co-occurrence network. Medium. https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab

Cruz, M. D. (2021, June 24). Social network Analysis - Community detection. Medium. https://towardsdatascience.com/social-network-analysis-community-detection-2b19e836c76c

Jayawickrama, T. D. (2021, February 1). Community detection algorithms. Medium. https://towardsdatascience.com/community-detection-algorithms-9bd8951e7dae

Modularity (networks). (2008, July 11). Wikipedia, the free encyclopedia. Retrieved March 20, 2022, from https://en.wikipedia.org/wiki/Modularity_(networks)

Nasini, Stefano. (2022). Social Network Analysis. [Course]. Lille: IESEG Management School. MSc in Big Data Analytics.