Complex Networks: Assignment 1

This is a report for the first assignment of the AMSI Summer school course Complex Networks

Introduction

There are 32 files each is a vole networks. The networks are represented by adjacent matrixs. In this assignment, I need to:
1. calculate the average path length for all networks: using breadth first seaerch algorithm to transfer adjacent matrix to path matrix.
2. calculate the two clustering coefficients.
3. analyse how the average path length is affected by the size of the network and discuss if the vole networks fit the definition of small-world.

Code for calculating the results

Transfering adjacent matrix to path matrix

In order to calculate average path length, I need to implement breadth first seaerch algorithm to transfer adjacent matrix to path matrix.

# transfer the adjacent matrix to path matrix
## search function for one coloumn
s <- function(x){
                b = which(x == 1) # return index of those there is an edge
                b
}
## search function for target coloumns
myf01 <- function(x){
        x <- as.matrix(x)
        mylist <- apply(x, 2, s) # return index of edges for each target node. list
        b <- as.vector(unlist(mylist)) # transfer list to a single vector
        index <-  duplicated(b) # find duplicate element
        b <- b[!index] # delete duplicate element
        b
}
## transfer function, using function s and myf01
breadthfirst1 <- function(B){
        B <- as.matrix(B)
        n <- ncol(B) # noumber of nodes
        P <- B
        P[,] = 0 # initiate the path matrix, a matrix with n*n size and everywhere is 0
        visited <- NULL # initiate visited vector
        for (r in 1:n){ # choose a root in every iteration
                A <- B
                verte <- myf01(A[,r]) # search nodes who are linked to the root
                P[verte, r] <- 1 # for rth coloumn, set those nodes to be 1 in Path matrix
                A[c(r, verte), ] <- 0 # set the visited nodes' rows to be 0 in the Adjacent matrix
                visited <- c(r, verte) # add to visited
                p <- 2 # path length
                while (length(verte) > 0){
                        vvv <- myf01(A[, verte])
                        P[vvv, r] <- p
                        A[vvv, ] <- 0
                        visited <- c(visited, vvv)
                        verte <- vvv
                        p <- p + 1
                }
        }
        P
}

Deal with disconnected graphs and get average path length

Because there exist disconnected graphs, I need to think about how to calculate the avearage path length.

In a network, the average path length is the average shortest path between two nodes. If two nodes are disconnected, meaning there is no path between them, then the path length between them is in infinite. As a consequence, if a network contains disconnected components (collections of nodes that have no paths between them), then the mean path length also diverges to infinity. One way to avoid this problem is to calculate average path length only from nodes in the largest connected component.(Nykamp DQ, “Mean path length definition.”)

Thus here in the code I first delete isolate nodes. Then I identify disconnected parts and calculate the average path length for the largest connected component.

# deal with the disconnected part

# delete isolate nodes
deleteisolate <- function(A){
        a <- apply(A, 2, function(x) sum(x) == 0)
        A[!a, !a]
}

# identify disconnected part and get pathmatrix for each part
disjointpathmatrix <- function(A){
        p <- A
        q <- A
        diag(p) <- 1
        if (sum(p == 0) != 0){
                a <- apply(p,2, function(x) which(x == 0))
                index <- as.integer(row.names(p))
                groups <- list()
                pathmatrixs <- list()
                i <- 1
                while (length(a) != 0){
                        group <- lapply(a, function(x) identical(a[[1]], x))
                        groups[i] <- list(NULL)
                        groups[[i]] <- index[unlist(group)]
                        pathmatrixs[i] <- list(NULL)
                        pathmatrixs[[i]] <- q[unlist(group), unlist(group)]
                        q <- q[!unlist(group), !unlist(group)]
                        index <- index[!unlist(group)]
                        a <- a[!unlist(group)]
                        i <- i+1
                }
                pathmatrixs
        }
        else A
}

# calculate average path length for each part

averagepathlength <- function(A){
        if (is.matrix(A)){
                n <- ncol(A)
                sum(A)/(n*(n-1))
        }
        else {
                size <- unlist(lapply(A, ncol))
                a <- which(size == max(size))
                lapply(A, function(x) sum(x)/(ncol(x)*(ncol(x)-1)))[a]
        }
}

Calculate clustering coefficients

By implementing the two formular in the lecture notes I wrote the following function to calculate two clastering coefficients.

# Clustering coefficients: first method
clustercoefficient1 <- function(a){
        A <- as.matrix(a)
        n <- ncol(A)
        N <- 0
        N3 <- 0
        for (i in 1:n){
                for (j in 1:n){
                        for (k in 1:n){
                                if (k > j & j> i) {
                                        N <- N + A[i,j]*A[i,k]*A[j,k]
                                        N3 <- N3 + A[i,j]*A[i,k] + A[j,i]*A[j,k] + A[k,i]*A[k,j]
                                }
                        }
                }
        }
        3*N/N3
}

# Clustering coefficients: second method
clustercoefficient2 <- function(a){
        A <- as.matrix(a)
        n <- ncol(A)
        N <- 0
        N3 <- 0
        Ci <- c()
        for (i in 1:n){
                for (j in 1:n){
                        for (k in 1:n){
                                if (k > j) {
                                        N <- N + A[i,j]*A[i,k]*A[j,k]
                                        N3 <- N3 + A[i,j]*A[i,k]
                                }
                        }
                }
                if(N3 != 0){
                        Ci <- c(Ci, N/N3)
                }
                else Ci <- c(Ci, 0)
                N <- 0
                N3 <- 0
                
        }
        sum(Ci)/n
}

Put results together

# get the size, mean degree, average path length(select the max one) and two coefficient for each table
results <- function(a){
        a <- as.matrix(a)
        mytable <- deleteisolate(a) # delete isolate node first
        pathmatrix <- breadthfirst1(mytable) # then make the pathmatrix
        pathmatrixs <- disjointpathmatrix(pathmatrix)
        averagepl <- max(unlist(averagepathlength(pathmatrixs)))
        coeff1 <- clustercoefficient1(a)
        coeff2 <- clustercoefficient2(a)
        n <- ncol(a) # size of the network
        lamada <- sum(apply(a, 2, sum))/n # mean degree
        c(n, lamada, averagepl, coeff1, coeff2)
}

Get results

After creating all the functions, I read the 32 networks into R and using these functions to calculate the results.

# read all the tables in R and do the analysis
filename <- c()
for (i in 1:32){
        filename[i] <- paste("gvole",i,".txt", sep = "")
}
analysis <- c()

options(digits=2)
setwd("~/Desktop/2016AMSI/complex networks/vole_adjacency_matrices")
for (i in 1:32){
        a <- read.table(filename[i])
        result <- results(a)
        analysis <- cbind(analysis, result)
}

myresult <- as.data.frame(t(analysis))
row.names(myresult) <- 1:32
colnames(myresult) <- c("size", "average degree", "average path length", "cluster coefficient 1", "cluster coefficient 2")
myresult$bigo <- log(myresult$`size`)/log(myresult$`average degree`)

Analysis

results

The following table is the results from previous calculation and ordered by the network size. And I draw three figures to analysis if vole networks fit the definition of small-world.

myresultordered <- myresult[order(myresult$`size`),]
myresultordered

##    size average degree average path length cluster coefficient 1
## 21   17      0.3529412            1.666667             0.0000000
## 22   19      1.6842105            2.400000             0.4615385
## 20   21      0.8571429            1.300000             0.8000000
## 17   28      2.6428571            1.755556             0.5825243
## 1    29      2.2758621            3.409357             0.5526316
## 19   37      1.6756757            2.928105             0.3913043
## 5    43      5.4883721            2.848283             0.4722639
## 16   49      4.9795918            3.784864             0.5581395
## 27   49      2.8979592            3.752228             0.3954545
## 2    50      4.0800000            3.019231             0.4639831
## 18   64      3.7500000            4.000726             0.5060241
## 4    65      4.0000000            5.726062             0.4921466
## 32   69      3.4202899            5.590430             0.4230769
## 28   77      8.0000000            2.871497             0.5154420
## 26   81      3.2345679            3.816667             0.5315534
## 3    84      6.2857143            2.957242             0.4746094
## 6    87      6.9425287            2.918487             0.4570082
## 23   96      5.7291667            3.358098             0.4673807
## 29   98      5.3877551            3.442105             0.4428822
## 31   98      4.3877551            4.905285             0.5270270
## 25  107      5.4579439            4.270707             0.5188776
## 8   124      3.8870968            4.773226             0.4510135
## 10  125      4.0160000            5.862643             0.5436242
## 30  135      6.9037037            3.653841             0.4878301
## 9   141      4.9787234            5.060367             0.5236052
## 15  141      4.1560284            8.488258             0.5593220
## 24  141      6.9929078            3.369513             0.4448952
## 7   149      7.0469799            3.818441             0.5344353
## 11  164      8.0121951            3.676865             0.4974801
## 13  219      5.7534247            5.518799             0.5680603
## 14  232      5.2758621            4.295957             0.5795738
## 12  242      8.9504132            3.839340             0.5081858
##    cluster coefficient 2       bigo
## 21             0.0000000  -2.720441
## 22             0.2368421   5.648295
## 20             0.1666667 -19.750302
## 17             0.5892857   3.428686
## 1              0.3471264   4.094679
## 19             0.2424710   6.994968
## 5              0.5901859   2.209051
## 16             0.5924684   2.424285
## 27             0.3512148   3.657703
## 2              0.4800577   2.782186
## 18             0.4359003   3.146484
## 4              0.5411622   3.011184
## 32             0.5610766   3.443132
## 28             0.6465211   2.088929
## 26             0.5502254   3.743476
## 3              0.6253054   2.410306
## 6              0.6173597   2.304787
## 23             0.6362381   2.614818
## 29             0.6491109   2.722457
## 31             0.6008437   3.100428
## 25             0.6117751   2.753465
## 8              0.5866843   3.550427
## 10             0.6181177   3.472891
## 30             0.6425180   2.538886
## 9              0.6206059   3.083006
## 15             0.6113849   3.473887
## 24             0.6612692   2.544485
## 7              0.6652077   2.562710
## 11             0.6723626   2.450722
## 13             0.7861623   3.079830
## 14             0.8099106   3.274968
## 12             0.7420513   2.504421

library(ggplot2)
qplot(myresultordered$`bigo`[-c(1,3)], myresultordered$`average path length`[-c(1,3)]) + ylab("average path length") + xlab("log(n)/log(λ)") + ggtitle("plot 1")

qplot(1:32, myresultordered$`cluster coefficient 1`) + geom_abline(intercept = 0.5, slope = 0) + ylab("cluster coefficient 1") + xlab("networks") + ggtitle("plot 2")

qplot(1:32, myresultordered$`cluster coefficient 2`) + geom_abline(intercept = 0.5, slope = 0) + ylab("cluster coefficient 2") + xlab("networks") + ggtitle("plot 3")

According to the definition of small-world, A small-world graph, G, is a graph with relatively small average path length and a relatively high cluster coefficient., we check if
1. average path length grow with network size as O(log(n)/log(λ)) where λ is the mean degree of the network.
2. cluster coefficient greater than 0.5
From plot1, we can see that average path length is growing as O(log(n)/log(λ)) getting bigger.(I delete two negtive number in O(log(n)/log(λ) since they don’t make sense and make the figure unclear).
Thus I think the vole network meet the requirement 1.
From plot2 and plot3, we can see that not all the cluster coefficient are larger than 0.5.
For plot2, half of the cluster coefficient are larger than 0.5 and these value tend to be bigger as size growing.
For plot3, all the large size networks’ cluster coefficient are larger than 0.5.
Thus I think the vole network meet the requirement2.
In total, we can say that vole network is a small-world network.