(4 points) What is Degree Correlation? Categorize the following networks into Neutral, Assortative, or Disassortative networks-Citation, Facebook, Email, WWW, Internet, IMDB, Random graph, Scientific coauthorship. Degree correlation is displayed if the number of links between the high and low-degree nodes is systematically different from what is expected by chance. We try to find if the network is assortative (hubs connect to other hubs, and small degree nodes to small degree nodes), disassortative (hubs connect to nodes with small degree) or neutral (links are random). In social networks, nodes tend to be connected with other nodes with similar degree values. This tendency is referred to as assortative mixing, or assortativity. On the other hand, technological and biological networks typically show disassortative mixing, or dissortativity, as high degree nodes tend to attach to low degree nodes. Probability that a randomly chosen node has degree k is represented as Qk = Kpk/
Scientific coauthorship Assortative
(4 points) With respect to epidemic spreading, what is the SIRS model? Mention all the “state” and their transitions. An epidemic model is a simplified means of describing the transmission of communicable disease through individuals. We assume full mixing, infection rate is a probability per unit of time, and that an infection will be transmitted between Susceptible and Infected who are connected. SIRS is a type of epidemic model that is an extension of the SIR model, but it allows members of the recovered class to be free of infection and rejoin the susceptible class. It stands for: Susceptible -> Infected -> Recovered -> Susceptible
While in the susceptible stage, a host does not have the disease yet, but can potentially get it from a host who is infected. While in the infected stage, a host has the disease and can potentially pass it to another.
While in recovery, a host immune system fights back and retains immunity for some time, and then loses immunity and become susceptible again. The infection rate determines how fast hosts move from Susceptible to Infected. Recovery rate determines how fast a host moves from Infected to Recovery. If recover rate > infected rate, then there is no epidemic.
A way to work around this problem is to give each node a small amount of centrality for free, regardless of the position of the vertex in the network. Hence, each node has a minimum, positive amount of centrality that it can transfer to other nodes by referring to them. In particular, the centrality of nodes that are never referred to is exactly this minimum positive amount, while linked nodes have higher centrality. It follows that highly linked nodes have high centrality, regardless of the centrality of the linkers. However, nodes that receive few links may still have high centrality if the linkers have large centrality. This method has been proposed by Leo Katz (A new status index derived from sociometric analysis. Psychometrika, 1953) and later refined by Charles H. Hubbell (An input-output approach to clique identification. Sociometry, 1965). The Katz centrality thesis is then: A node is important if it is linked from other important nodes or if it is highly linked.
Identical Eigenvector and Katz Centrality
library("network")
library(igraph)
n<-3
p.<-1
g<-erdos.renyi.game(n,p.,type = c("gnp"),directed = FALSE,loops = FALSE)
plot.igraph(g)
eigen_centrality(g,directed=FALSE,scale=TRUE,weights=NULL)
## $vector
## [1] 1 1 1
##
## $value
## [1] 2
##
## $options
## $options$bmat
## [1] "I"
##
## $options$n
## [1] 3
##
## $options$which
## [1] "LA"
##
## $options$nev
## [1] 1
##
## $options$tol
## [1] 0
##
## $options$ncv
## [1] 0
##
## $options$ldv
## [1] 0
##
## $options$ishift
## [1] 1
##
## $options$maxiter
## [1] 1000
##
## $options$nb
## [1] 1
##
## $options$mode
## [1] 1
##
## $options$start
## [1] 1
##
## $options$sigma
## [1] 0
##
## $options$sigmai
## [1] 0
##
## $options$info
## [1] 0
##
## $options$iter
## [1] 1
##
## $options$nconv
## [1] 1
##
## $options$numop
## [1] 3
##
## $options$numopb
## [1] 0
##
## $options$numreo
## [1] 3
alpha_centrality(g,nodes=V(g),alpha= 1,loops=FALSE,
exo=-1,weights=NULL,tol=1e-07,sparse=FALSE)
## [1] 1 1 1
Different Eigenvector and Katz Centrality
n<-5
p.<-.5
kg<-erdos.renyi.game(n,p.,type = c("gnp"),directed = TRUE,loops = FALSE)
plot.igraph(kg)
eigen_centrality(kg,directed=TRUE,scale=TRUE,weights=NULL)
## $vector
## [1] 0.5698403 0.7548777 1.0000000 0.7548777 1.0000000
##
## $value
## [1] 3.079596
##
## $options
## $options$bmat
## [1] "I"
##
## $options$n
## [1] 5
##
## $options$which
## [1] "LR"
##
## $options$nev
## [1] 1
##
## $options$tol
## [1] 0
##
## $options$ncv
## [1] 0
##
## $options$ldv
## [1] 0
##
## $options$ishift
## [1] 1
##
## $options$maxiter
## [1] 1000
##
## $options$nb
## [1] 1
##
## $options$mode
## [1] 1
##
## $options$start
## [1] 1
##
## $options$sigma
## [1] 0
##
## $options$sigmai
## [1] 0
##
## $options$info
## [1] 0
##
## $options$iter
## [1] 5
##
## $options$nconv
## [1] 1
##
## $options$numop
## [1] 11
##
## $options$numopb
## [1] 0
##
## $options$numreo
## [1] 11
alpha_centrality(kg,nodes=V(kg),alpha= 1,loops=FALSE,
exo=-1,weights=NULL,tol=1e-07,sparse=FALSE)
## [1] 0.2 0.4 0.8 0.4 0.8
(4 points) Consider the SIS model will full fixing. Suppose “a” and “b” represent the number of persons in the Susceptible and Infected states respectively. Assume p is the random contact rate of persons in the susceptible state and q is the rate at which infected ones become susceptible again after recovery. Write the equations that show the time evolution of the two states. You do not have to solve the equations. Under what conditions will the disease die out? In the SIS model, recovery puts a person back into the Susceptible stage so there is no permanent immunity. Susceptible ds/dt = qb - pab Infected db/dt = pab - qb p + q = 1 If p < q, the disease dies out.
(4 points) What is Eigenvector Centrality (EC)? Why is it useful? Find the EC for all the nodes in a k-regular network (i.e, a network where all nodes have degree k). A natural extension of degree centrality is eigenvector centrality. Degree centrality awards one centrality point for every link a node receives. In Eigenvector centrality not all vertices are equivalent. Some are more relevant than others, and, reasonably, endorsements from important nodes count more. The eigenvector centrality thesis states that a node is important if it is linked to by other important nodes. Eigenvector centrality differs from degree centrality because a node receiving many links does not necessarily have a high eigenvector centrality (it might be that all linkers have low or null eigenvector centrality). Moreover, a node with high eigenvector centrality is not necessarily highly linked (the node might have few but important linkers). The eigenvector centrality is proportional to the sum of the scores of all nodes which are connected to it.It is useful in considering the overall centrality of a node and not just the immediate centrality of a node. Consequently, a node has high value of EC either if it is connected to many other nodes or if it is connected to others that themselves have high EC. The EC for all the nodes in a k-regular network (i.e, a network where all nodes have degree k) is 1, summing up to k. Node 1,2,3,4,5
n<-5
p.<-1
g<-erdos.renyi.game(n,p.,type = c("gnp"),directed = FALSE,loops = FALSE)
plot.igraph(g)
eigen_centrality (g, directed = FALSE, scale = TRUE, weights = NULL,options = arpack_defaults)
## $vector
## [1] 1 1 1 1 1
##
## $value
## [1] 4
##
## $options
## $options$bmat
## [1] "I"
##
## $options$n
## [1] 5
##
## $options$which
## [1] "LA"
##
## $options$nev
## [1] 1
##
## $options$tol
## [1] 0
##
## $options$ncv
## [1] 0
##
## $options$ldv
## [1] 0
##
## $options$ishift
## [1] 1
##
## $options$maxiter
## [1] 1000
##
## $options$nb
## [1] 1
##
## $options$mode
## [1] 1
##
## $options$start
## [1] 1
##
## $options$sigma
## [1] 0
##
## $options$sigmai
## [1] 0
##
## $options$info
## [1] 0
##
## $options$iter
## [1] 1
##
## $options$nconv
## [1] 1
##
## $options$numop
## [1] 3
##
## $options$numopb
## [1] 0
##
## $options$numreo
## [1] 3
linnet<-graph_from_literal(1---2, 2---3,3---4,4---5,5----6)
plot(linnet)
Node 1,2,3,4,5,6 BC 5,9,11,11,9,5
In the situation where a node is considered in between if it is included as being a source or destination, and there is only one geodesic path (like in a linear network), then the general formula is:
formula 7.35 Newman In this example, we have an undirected network in which there is only one geodesic path between any pair of vertices. There could be no path if, for example, the vertices lie in different components. The betweenness centrality of a vertex i is defined to be the number of those paths that pass thru i.
n^i st is 1 if vertex i lies on the geodesic path from s to t, 0 otherwise.
(4 points) Consider a 2-dimensional infinite discrete lattice. Let all lattice points be occupied with probability p. Draw a free-hand sketch of the average cluster size as p is increased from 0 to 1 considering i) all finite clusters, and ii) all clusters. Mark the point where phase transition occurs. In Notes graph where it has p =.6 and pics on the web
(4 points) Discuss why scale-free networks, such as the Internet, are robust to random attacks but vulnerable to systematic attacks. The focus of robustness in complex networks is the response of the network to the removal of nodes or links. Percolation theory models the process of randomly placing pebbles on an n-dimensional lattice with probability p, and predicts the sudden formation of a single large cluster at a critical probability p. The mathematical model of robustness as a process can be thought of as an inverse of the percolation process. The Molloy-Reed criterion is derived from the basic principle that in order for a giant component to exist, on average each node in the network must have at least two links. The mathematical derivation for the threshold at which a complex network will lose its giant component is based on the Molloy-Reed criterion. This threshhold is represented by Fc, or the critical threshold for the fraction of nodes needed to be removed for the breakdown of the giant component of a complex network. The Fc for a random network is lower than for a scale free network. As a matter of fact, for a scale free network, it is close to 1. This means that we would need to randomly remove approximately 97% of nodes in a scale free network for the breakdown to occur. This shows the robustness against random attacks of a scale free network like the internet as we would need to bring down approximately 97% of all routers simultaneously.
Systematic and targeted attacks have a lower Fc. We can attack from the highest degree node to the lowest degree node and harm the nodes having more connections, but we must know the topology of the internet.
pic from notes
Ci = n/sumj dij C1 = 20/49
20/49
## [1] 0.4081633
C2 = 20/45
20/45
## [1] 0.4444444
n1/n = 8/20 n2/n = 12/20
1/C1 + n1/n =
(1/(20/49)) + (8/20)
## [1] 2.85
1/C2 + n2/n =
(1/(20/45)) + (12/20)
## [1] 2.85
Therefore, 1/C1 + n1/n = 1/C2 + n2/n is true and there is a relation. If the network is divided into 2 like the example and it is connected by a single link, then each disjoint side will be proportional to the network totalso n1 and n2 do not have to be 8 and 12 respectively. Below, I also tried a different approach and arrived at similar results.
The closeness centrality requires to sum over all distances from a certain vertex to all other vertices. Below are the calculations that show that C1 and C2 divides the tree into 2 disjoint regions. I calculated the closeness centrality for C1 for both n1 and n2(The disjoint regoins), then did the same for C2. We divide by n-1 for both n1 and n2 because we account for C1 and C2 vertices, respectively. The size of each region n1 or n2, are proportional to the total n, but the distance vary for the vertices because of the extra link to reach the opposite disjoint region. One can see the difference between C1n1 and C2n1 is 1, and the difference between C2n1 and C2n2 is 1, which the particular edge that joins the vertices that join the disjoint regions.
C1 Closeness Centrality in n1
(1+1+2+2+3+1+2)/7
## [1] 1.714286
C1 Closeness Centrality in n2
(2+3+3+4+2+3+3+3+4+4+5)/11
## [1] 3.272727
Total C1 Closeness Centrality
((1+1+2+2+3+1+2)/7)+((2+3+3+4+2+3+3+3+4+4+5)/11)
## [1] 4.987013
C2 Closeness Centrality in n1
(2+2+3+3+4+2+3)/7
## [1] 2.714286
C2 Closeness Centrality in n2
(1+2+2+3+1+2+2+2+3+3+4)/11
## [1] 2.272727
Total C2 Closeness Centrality
((2+2+3+3+4+2+3)/7)+((1+2+2+3+1+2+2+2+3+3+4)/11)
## [1] 4.987013