1. (4 points) What is Degree Correlation? Categorize the following networks into Neutral, Assortative, or Disassortative networks-Citation, Facebook, Email, WWW, Internet, IMDB, Random graph, Scientific coauthorship. Degree correlation is displayed if the number of links between the high and low-degree nodes is systematically different from what is expected by chance. We try to find if the network is assortative (hubs connect to other hubs, and small degree nodes to small degree nodes), disassortative (hubs connect to nodes with small degree) or neutral (links are random). In social networks, nodes tend to be connected with other nodes with similar degree values. This tendency is referred to as assortative mixing, or assortativity. On the other hand, technological and biological networks typically show disassortative mixing, or dissortativity, as high degree nodes tend to attach to low degree nodes. Probability that a randomly chosen node has degree k is represented as Qk = Kpk/. Networks-Citation Assortative Facebook Assortative Email Disassortative WWW Disassortative Internet Assortative IMDB Assortative Random graph Neutral
    Scientific coauthorship Assortative

  2. (4 points) With respect to epidemic spreading, what is the SIRS model? Mention all the “state” and their transitions. An epidemic model is a simplified means of describing the transmission of communicable disease through individuals. We assume full mixing, infection rate is a probability per unit of time, and that an infection will be transmitted between Susceptible and Infected who are connected. SIRS is a type of epidemic model that is an extension of the SIR model, but it allows members of the recovered class to be free of infection and rejoin the susceptible class. It stands for: Susceptible -> Infected -> Recovered -> Susceptible

While in the susceptible stage, a host does not have the disease yet, but can potentially get it from a host who is infected. While in the infected stage, a host has the disease and can potentially pass it to another.
While in recovery, a host immune system fights back and retains immunity for some time, and then loses immunity and become susceptible again. The infection rate determines how fast hosts move from Susceptible to Infected. Recovery rate determines how fast a host moves from Infected to Recovery. If recover rate > infected rate, then there is no epidemic.

  1. (4 points) Give an example of a network (i.e., state or draw) where the Eigenvector centrality and Katz centrality are identical. Give an example of a network (i.e., state or draw) where the Eigenvector centrality and Katz centrality are not identical. A practical problem with eigenvector centrality is that it works well only if the graph is (strongly) connected and symmetric. Real undirected networks typically have a large connected component, of size proportional to the network size. However, real directed networks do not. If a directed network is not strongly connected, only vertices that are in strongly connected components or in the out-component of such components can have non-zero eigenvector centrality. The other vertices, such as those in the in-components of strongly connected components, all have, with little justification, null centrality. This happens because nodes with no incoming edges have, by definition, a null eigenvector centrality score, and so have nodes that are pointed to by only nodes with a null centrality score.

A way to work around this problem is to give each node a small amount of centrality for free, regardless of the position of the vertex in the network. Hence, each node has a minimum, positive amount of centrality that it can transfer to other nodes by referring to them. In particular, the centrality of nodes that are never referred to is exactly this minimum positive amount, while linked nodes have higher centrality. It follows that highly linked nodes have high centrality, regardless of the centrality of the linkers. However, nodes that receive few links may still have high centrality if the linkers have large centrality. This method has been proposed by Leo Katz (A new status index derived from sociometric analysis. Psychometrika, 1953) and later refined by Charles H. Hubbell (An input-output approach to clique identification. Sociometry, 1965). The Katz centrality thesis is then: A node is important if it is linked from other important nodes or if it is highly linked.

Identical Eigenvector and Katz Centrality

library("network")
library(igraph)
n<-3
p.<-1
g<-erdos.renyi.game(n,p.,type = c("gnp"),directed = FALSE,loops = FALSE)
plot.igraph(g)

eigen_centrality(g,directed=FALSE,scale=TRUE,weights=NULL)
## $vector
## [1] 1 1 1
## 
## $value
## [1] 2
## 
## $options
## $options$bmat
## [1] "I"
## 
## $options$n
## [1] 3
## 
## $options$which
## [1] "LA"
## 
## $options$nev
## [1] 1
## 
## $options$tol
## [1] 0
## 
## $options$ncv
## [1] 0
## 
## $options$ldv
## [1] 0
## 
## $options$ishift
## [1] 1
## 
## $options$maxiter
## [1] 1000
## 
## $options$nb
## [1] 1
## 
## $options$mode
## [1] 1
## 
## $options$start
## [1] 1
## 
## $options$sigma
## [1] 0
## 
## $options$sigmai
## [1] 0
## 
## $options$info
## [1] 0
## 
## $options$iter
## [1] 1
## 
## $options$nconv
## [1] 1
## 
## $options$numop
## [1] 3
## 
## $options$numopb
## [1] 0
## 
## $options$numreo
## [1] 3
alpha_centrality(g,nodes=V(g),alpha= 1,loops=FALSE,
  exo=-1,weights=NULL,tol=1e-07,sparse=FALSE)
## [1] 1 1 1

Different Eigenvector and Katz Centrality

n<-5
p.<-.5
kg<-erdos.renyi.game(n,p.,type = c("gnp"),directed = TRUE,loops = FALSE)
plot.igraph(kg)

eigen_centrality(kg,directed=TRUE,scale=TRUE,weights=NULL)
## $vector
## [1] 0.5698403 0.7548777 1.0000000 0.7548777 1.0000000
## 
## $value
## [1] 3.079596
## 
## $options
## $options$bmat
## [1] "I"
## 
## $options$n
## [1] 5
## 
## $options$which
## [1] "LR"
## 
## $options$nev
## [1] 1
## 
## $options$tol
## [1] 0
## 
## $options$ncv
## [1] 0
## 
## $options$ldv
## [1] 0
## 
## $options$ishift
## [1] 1
## 
## $options$maxiter
## [1] 1000
## 
## $options$nb
## [1] 1
## 
## $options$mode
## [1] 1
## 
## $options$start
## [1] 1
## 
## $options$sigma
## [1] 0
## 
## $options$sigmai
## [1] 0
## 
## $options$info
## [1] 0
## 
## $options$iter
## [1] 5
## 
## $options$nconv
## [1] 1
## 
## $options$numop
## [1] 11
## 
## $options$numopb
## [1] 0
## 
## $options$numreo
## [1] 11
alpha_centrality(kg,nodes=V(kg),alpha= 1,loops=FALSE,
  exo=-1,weights=NULL,tol=1e-07,sparse=FALSE)
## [1] 0.2 0.4 0.8 0.4 0.8
  1. (4 points) Consider the SIS model will full fixing. Suppose “a” and “b” represent the number of persons in the Susceptible and Infected states respectively. Assume p is the random contact rate of persons in the susceptible state and q is the rate at which infected ones become susceptible again after recovery. Write the equations that show the time evolution of the two states. You do not have to solve the equations. Under what conditions will the disease die out? In the SIS model, recovery puts a person back into the Susceptible stage so there is no permanent immunity. Susceptible ds/dt = qb - pab Infected db/dt = pab - qb p + q = 1 If p < q, the disease dies out.

  2. (4 points) What is Eigenvector Centrality (EC)? Why is it useful? Find the EC for all the nodes in a k-regular network (i.e, a network where all nodes have degree k). A natural extension of degree centrality is eigenvector centrality. Degree centrality awards one centrality point for every link a node receives. In Eigenvector centrality not all vertices are equivalent. Some are more relevant than others, and, reasonably, endorsements from important nodes count more. The eigenvector centrality thesis states that a node is important if it is linked to by other important nodes. Eigenvector centrality differs from degree centrality because a node receiving many links does not necessarily have a high eigenvector centrality (it might be that all linkers have low or null eigenvector centrality). Moreover, a node with high eigenvector centrality is not necessarily highly linked (the node might have few but important linkers). The eigenvector centrality is proportional to the sum of the scores of all nodes which are connected to it.It is useful in considering the overall centrality of a node and not just the immediate centrality of a node. Consequently, a node has high value of EC either if it is connected to many other nodes or if it is connected to others that themselves have high EC. The EC for all the nodes in a k-regular network (i.e, a network where all nodes have degree k) is 1, summing up to k. Node 1,2,3,4,5

n<-5
p.<-1
g<-erdos.renyi.game(n,p.,type = c("gnp"),directed = FALSE,loops = FALSE)
plot.igraph(g)

eigen_centrality (g, directed = FALSE, scale = TRUE, weights = NULL,options = arpack_defaults)
## $vector
## [1] 1 1 1 1 1
## 
## $value
## [1] 4
## 
## $options
## $options$bmat
## [1] "I"
## 
## $options$n
## [1] 5
## 
## $options$which
## [1] "LA"
## 
## $options$nev
## [1] 1
## 
## $options$tol
## [1] 0
## 
## $options$ncv
## [1] 0
## 
## $options$ldv
## [1] 0
## 
## $options$ishift
## [1] 1
## 
## $options$maxiter
## [1] 1000
## 
## $options$nb
## [1] 1
## 
## $options$mode
## [1] 1
## 
## $options$start
## [1] 1
## 
## $options$sigma
## [1] 0
## 
## $options$sigmai
## [1] 0
## 
## $options$info
## [1] 0
## 
## $options$iter
## [1] 1
## 
## $options$nconv
## [1] 1
## 
## $options$numop
## [1] 3
## 
## $options$numopb
## [1] 0
## 
## $options$numreo
## [1] 3
  1. (4 points) Consider a linear network with n nodes numbered 1, 2, 3, .. n. Find the Betweenness centrality nodes 1, 2, 3,..n. Give the general equation for the ith node. (A node is “in between” if it is the source or destination.)
linnet<-graph_from_literal(1---2, 2---3,3---4,4---5,5----6)
plot(linnet)

Node 1,2,3,4,5,6 BC 5,9,11,11,9,5

In the situation where a node is considered in between if it is included as being a source or destination, and there is only one geodesic path (like in a linear network), then the general formula is:

formula 7.35 Newman In this example, we have an undirected network in which there is only one geodesic path between any pair of vertices. There could be no path if, for example, the vertices lie in different components. The betweenness centrality of a vertex i is defined to be the number of those paths that pass thru i.
n^i st is 1 if vertex i lies on the geodesic path from s to t, 0 otherwise.

  1. (4 points) Consider a 2-dimensional infinite discrete lattice. Let all lattice points be occupied with probability p. Draw a free-hand sketch of the average cluster size as p is increased from 0 to 1 considering i) all finite clusters, and ii) all clusters. Mark the point where phase transition occurs. In Notes graph where it has p =.6 and pics on the web

  2. (4 points) Discuss why scale-free networks, such as the Internet, are robust to random attacks but vulnerable to systematic attacks. The focus of robustness in complex networks is the response of the network to the removal of nodes or links. Percolation theory models the process of randomly placing pebbles on an n-dimensional lattice with probability p, and predicts the sudden formation of a single large cluster at a critical probability p. The mathematical model of robustness as a process can be thought of as an inverse of the percolation process. The Molloy-Reed criterion is derived from the basic principle that in order for a giant component to exist, on average each node in the network must have at least two links. The mathematical derivation for the threshold at which a complex network will lose its giant component is based on the Molloy-Reed criterion. This threshhold is represented by Fc, or the critical threshold for the fraction of nodes needed to be removed for the breakdown of the giant component of a complex network. The Fc for a random network is lower than for a scale free network. As a matter of fact, for a scale free network, it is close to 1. This means that we would need to randomly remove approximately 97% of nodes in a scale free network for the breakdown to occur. This shows the robustness against random attacks of a scale free network like the internet as we would need to bring down approximately 97% of all routers simultaneously.
    Systematic and targeted attacks have a lower Fc. We can attack from the highest degree node to the lowest degree node and harm the nodes having more connections, but we must know the topology of the internet.

pic from notes

  1. (4 points) Consider an undirected tree of n vertices. A particular edge in the tree joins vertices 1 and 2 and divides the tree into two disjoint regions of n1 and n2 vertices as sketched below. Show that the closeness centralities C1 and C2 of the two vertices are related by 1/C1 + n1/n = 1/C2 + n2/n. (n1 and n2 are generic - do not assume them to be 8 and 12 respectively as shown in the figure below.)

Ci = n/sumj dij C1 = 20/49

20/49
## [1] 0.4081633

C2 = 20/45

20/45
## [1] 0.4444444

n1/n = 8/20 n2/n = 12/20

1/C1 + n1/n =

(1/(20/49)) + (8/20)
## [1] 2.85

1/C2 + n2/n =

(1/(20/45)) + (12/20)
## [1] 2.85

Therefore, 1/C1 + n1/n = 1/C2 + n2/n is true and there is a relation. If the network is divided into 2 like the example and it is connected by a single link, then each disjoint side will be proportional to the network totalso n1 and n2 do not have to be 8 and 12 respectively. Below, I also tried a different approach and arrived at similar results.

The closeness centrality requires to sum over all distances from a certain vertex to all other vertices. Below are the calculations that show that C1 and C2 divides the tree into 2 disjoint regions. I calculated the closeness centrality for C1 for both n1 and n2(The disjoint regoins), then did the same for C2. We divide by n-1 for both n1 and n2 because we account for C1 and C2 vertices, respectively. The size of each region n1 or n2, are proportional to the total n, but the distance vary for the vertices because of the extra link to reach the opposite disjoint region. One can see the difference between C1n1 and C2n1 is 1, and the difference between C2n1 and C2n2 is 1, which the particular edge that joins the vertices that join the disjoint regions.

C1 Closeness Centrality in n1

(1+1+2+2+3+1+2)/7
## [1] 1.714286

C1 Closeness Centrality in n2

(2+3+3+4+2+3+3+3+4+4+5)/11
## [1] 3.272727

Total C1 Closeness Centrality

((1+1+2+2+3+1+2)/7)+((2+3+3+4+2+3+3+3+4+4+5)/11)
## [1] 4.987013

C2 Closeness Centrality in n1

(2+2+3+3+4+2+3)/7
## [1] 2.714286

C2 Closeness Centrality in n2

(1+2+2+3+1+2+2+2+3+3+4)/11
## [1] 2.272727

Total C2 Closeness Centrality

((2+2+3+3+4+2+3)/7)+((1+2+2+3+1+2+2+2+3+3+4)/11)
## [1] 4.987013