library('igraph')
TO SAVE YOUR TIME, PLEASE START DOWNLOADING THIS NETWORK RIGHT NOW
Graph clique is a subset of vertices of a graph such that every two vertices in the clique are adjacent.
How many cliques can you see on this graph?
plot(graph.famous("bull"))
There was a couple of definitions about the cliques in graph on the lecture.
A maximum clique is a clique that cannot be extended by including one more adjacent vertex (not included in larger one). Can you name maximum cliques in the given graph?
A maximal clique is a clique of the largest possible size in a given graph.
And, finally, graph clique number is the size of the maximum clique. Bull graph’s clique number is 3.
maximal.cliques returns lists of vertices, that form a maximum graph. Let’s see maximum cliques for a bull graph:
maximal.cliques(graph.famous("bull"))
## [[1]]
## [1] 4 2
##
## [[2]]
## [1] 5 3
##
## [[3]]
## [1] 1 2 3
Let’s demonstrate some useful functions for finding cliques. Our graph today is again Zachary’s Karate Club graph:
g = graph.famous("Zachary")
plot(g)
We can define sizes of maximal cliques we interested in:
maximal.cliques(g, min = 4, max = 5) # maximal cliques of sizes 4 and 5
## [[1]]
## [1] 24 34 33 30
##
## [[2]]
## [1] 34 9 33 31
##
## [[3]]
## [1] 2 1 4 3 8
##
## [[4]]
## [1] 2 1 4 3 14
maximal.cliques returns lists of vertices - maximal cliques. clique.number returns graph’s clique number.
Let’s find and show maximal cliques for Zachary Carate Club graph: lrg = largest.cliques(g) returns ids of nodes - largest cliques
largest = largest.cliques(g)
op = par(mfrow = c(1,2))
labels = rep(0, vcount(g))
labels[largest[[1]]] = 2
plot(g, vertex.color = labels)
labels = rep(0, vcount(g))
labels[largest[[2]]] = 2
plot(g, vertex.color = labels)
par(op)
k-core is a maximal subset of vertices such that each is connected to at least k others in the subset.
R has a function wich calculates the coreness for each vertex. The coreness of a vertex is k if it belongs to the k-core but not to the (k+1)-core.
# Let's make some graph
z<-graph.empty(n=11, directed = FALSE)
z <- add.edges(z,c(1,2, 1,3, 1,4, 1,6, 1,5, 2,3, 2,4, 3,10, 3,11, 3,8, 3,4, 4,8, 4,7, 8,9, 10,11))
plot(z)
Now we find maximum k-core and pick out it on graph
coreness <- graph.coreness(z)
max_cor <- max(coreness)
max_cor
## [1] 3
color_bar <- heat.colors(max_cor)
plot(z, vertex.color = color_bar[coreness])
edge.betweenness.community [Newman and Girvan, 2004]fastgreedy.community [Clauset et al., 2004] (modularity optimization method)label.propagation.community [Raghavan et al., 2007]leading.eigenvector.community [Newman, 2006]multilevel.community [Blondel et al., 2008] (the Louvain method)optimal.community [Brandes et al., 2008]spinglass.community [Reichardt and Bornholdt, 2006]walktrap.community [Pons and Latapy, 2005]infomap.community [Rosvall and Bergstrom, 2008]Edge betweenness is equal to the number of shortest paths from all vertices to all others that pass through that edge.
g<-graph.empty(n=6, directed = FALSE)
g <- add.edges(g,c(1,2, 2,3, 1,3, 2,4, 4,5, 4,6, 5,6))
plot(g)
betw <- edge.betweenness(g)
#E(g)
#betw
The Newman-Girvan algorithm detects communities by progressively removing edges from the original network. The Girvan-Newman algorithm focuses on edges that are most likely “between” communities.
Algorithm:
The best partition is selected based on modularity.
There is edge.betweenness.community function in R
g <- graph.famous("Zachary")
eb <- edge.betweenness.community(g)
plot(eb, g)
## A bit more hand-made way
# color_map = c("grey","blue","black","yellow","red","green")
# membership = cutat(eb, no = 4)
# membership = eb$membership
# plot(g, vertex.color = eb$membership)
Also you can obtain dendrogram:
dendPlot(eb, mode="hclust", rect = 5)
## Optionally you can run this
# dend <- as.dendrogram(eb)
# plot(dend)
Alternatively to the previous method, this one is agglomerative. Intially consider a network s.t. * There is no edges * All clusters consist of a single vertex
Iteratively add an edge that delivers maximum modularity gain and merge correspondent communitues.
g <- graph.famous(name = "Zachary")
mm <- fastgreedy.community(g)
plot(rev(mm$modularity), xlab = 'Number of clusters', ylab = 'Modularity value')
which.max(rev(mm$modularity))
## [1] 3
plot(mm, g)
Label propagation algorithm consists of four steps:
Warning! Due to step 2 you may get different results.
g <- graph.famous("Zachary")
lp <- label.propagation.community(g)
plot(lp, g)
Load wikipedia network in R and run some community detection algorithm. Extract article names in some communities and check whether they make sense?
g <- read.graph('wikipedia.gml', format = 'gml')
g <- as.undirected(g)
The next lines of code might be usefull for interpretation
mm <- fastgreedy.community(g)
l <- V(g)$label[mm$membership == 2]
text <- paste(l, collapse = ' ')
#install.packages(c("tm", "SnowballC", "wordcloud", "RColorBrewer", "XML"))
library(wordcloud)
## Loading required package: RColorBrewer
wordcloud(text, type="text",
lang="english", excludeWords = NULL,
textStemming = FALSE, colorPalette="Dark2",
max.words=200)
## Loading required package: tm
## Loading required package: NLP