Introduction

In a directed network, nodes are linked via other nodes if they have common neighbor by in or out direction. This principle is the base of shared nearest neigbor (SNN) projection. For example in the (in direction) projection of traveling network, two cities are connected if one traveler goes to the same city than from another city. And in the other (out direction) projected network two destination cities are connected if from a city one traveler goes to a city and another one travels to the other. If there are more than one travels are between two cities than the network is weighted. In the projected case the edge weight of projected network is the minimum of common edges.

The algorithm operates with edge list of a network, which is more efficient than with large and sparse matrix.

Projected network shows similarity by in/out travels of nodes which can be used for different aims.

Workflow

  1. Generate test network
    • Adjacency matrix of weighted-directed network
    • Plot network
  2. Shared nearest neighbor projections
    • By in direction
    • By out direction

Libraries

library("igraph")
library("reshape2")
library("dplyr")
library("data.table")

1. Generate test network

A <- matrix(c(0 ,   2   ,   0   ,   0   ,   0   ,   2   , 1  ,
              0 ,   0   ,   0   ,   0   ,   2   ,   1   , 4  ,
              0 ,   0   ,   0   ,   0   ,   3   ,   2   , 2  ,
              0 ,   0   ,   0   ,   0   ,   0   ,   0   , 0  ,
              2 ,   2   ,   3   ,   5   ,   0   ,   0   , 0  ,
              6 ,   4   ,   3   ,   2   ,   0   ,   0   , 0  ,
              0 ,   3   ,   0   ,   1   ,   0   ,   0   , 0  ), 
            nrow=7, ncol=7, byrow = T)
rownames(A) <- paste("x", 1:7, sep="")
colnames(A) <- paste("x", 1:7, sep="")
g <- graph.adjacency(A, mode="directed", weighted = T)
plot(g, edge.color="gray60", edge.width=E(g)$weight, vertex.size=20, vertex.label.cex=0.9, margin=0, asp=0.8, edge.curved=TRUE)

2. Shared nearest neighborhood

By IN/TO direction

Common neighbor is a node where travel to a traveler from other nodes. x2 is a common neighbor of x1 and x7 in the example network.

edges <- melt(A)
edges <- edges[edges$value!=0,]
colnames(edges) <- c("from", "to", "weight")
edges$from <- as.character(edges$from)
edges$to <- as.character(edges$to)

With the help of Stackoverflow response.

sim <- setDT(edges)[, {
  x <- combn(from, 2, FUN = list)
  data.frame(do.call(rbind, x),
             simvalue = sapply(x, function(y) min(weight[from %in% y])),
             stringsAsFactors= FALSE)
  }, by=to]

sim <- sim[,2:4]
colnames(sim) <- c("node1", "node2", "simvalue")
sim <- sim %>% group_by(node1, node2) %>% summarise(sum(simvalue))
colnames(sim) <- c("node1", "node2", "simvalue")
g.sim.in <- graph_from_edgelist(as.matrix(sim[,1:2]), directed=FALSE)
E(g.sim.in)$weight <- sim$simvalue
plot(g.sim.in, edge.color="gray60", edge.width=E(g)$weight, vertex.size=20, vertex.label.cex=0.9, margin=0, asp=0.8)

x4 has only indegree, and the outdegree is zero. That’s why x4 is not shown in the projected network.

By OUT/FROM direction

Common neighbor is a node from where a traveler goes to other cities. x3 is a common neighbor of x5 and x7 in the example network.

sim <- setDT(edges)[, {
  x <- combn(to, 2, FUN = list)
  data.frame(do.call(rbind, x),
             simvalue = sapply(x, function(y) min(weight[to %in% y])),
             stringsAsFactors= FALSE)
  }, by=from]

sim <- sim[,2:4]
colnames(sim) <- c("node1", "node2", "simvalue")
sim <- sim %>% group_by(node1, node2) %>% summarise(sum(simvalue))
colnames(sim) <- c("node1", "node2", "simvalue")
g.sim.out <- graph_from_edgelist(as.matrix(sim[,1:2]), directed=F)
E(g.sim.out)$weight <- sim$simvalue
plot(g.sim.out, edge.color="gray60", edge.width=E(g)$weight, vertex.size=20, vertex.label.cex=0.9, margin=0, asp=0.8)


Be happyR! :)

feedback: lgadar@gmail.com