Google Page Rank, Power Iteration and the Second Eigenvalue of the Google Matrix

Sandipan Dey

2 Jan 2017

  • In this article, the basic concepts of the Google Page Rank Algorithm will be discussed along with a few examples.

  • The web can be represented as a directed graph, with nodes representing web pages and edges representing the hyperlinks between them.
  • Probabilistic point of view: since the importance of a web page is measured by its popularity (how many incoming links it has), the importance of page i can be viewed as the probability that a random surfer, who opens a browser to an arbitrary page and starts following hyperlinks, visits page i.
  • The process can be modeled as a random walk on the web graph.
  • A small but positive fraction of the time, the surfer abandons the current page, chooses a different page from the web uniformly at random, and “teleports” there. The damping factor p is the probability of this teleportation step.
  • The PageRank vector for a web graph with transition matrix A and damping factor p is the unique probabilistic eigenvector of the matrix \(M = (1-p)A + pB\), where B is the matrix with every entry equal to 1/n (as shown in the following figure), corresponding to the dominant eigenvalue 1.
  • By the Perron-Frobenius theorem, if the underlying web graph is connected and aperiodic, then the power-iteration algorithm used to compute the page ranks is guaranteed to converge to a steady-state vector, which is precisely the vector of page ranks of all the nodes (see the sketch after this list).
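To make the construction concrete, here is a minimal Python sketch (using numpy) of the Google matrix and the power iteration just described. The 4-page adjacency matrix is a made-up toy example (not the exam problem discussed below), and the convention that p is the teleportation probability follows the bullets above.

```python
import numpy as np

def google_matrix(A, p=0.2):
    """Google matrix M = (1 - p) * A + p * B, where A is the
    column-stochastic transition matrix of the web graph and B is
    the matrix with every entry 1/n (uniform teleportation)."""
    n = A.shape[0]
    return (1 - p) * A + p * np.ones((n, n)) / n

def power_iteration(M, tol=1e-5, max_iter=1000):
    """Iterate x <- M x from the uniform vector until the L1 change
    drops below tol; returns the PageRank vector and #iterations."""
    n = M.shape[0]
    x = np.ones(n) / n
    for k in range(1, max_iter + 1):
        x_next = M @ x
        x_next /= x_next.sum()      # keep x a probability vector
        if np.abs(x_next - x).sum() < tol:
            return x_next, k
        x = x_next
    return x, max_iter

# toy 4-page web (hypothetical example): adj[i, j] = 1 iff page j links to page i
adj = np.array([[0, 0, 1, 1],
                [1, 0, 0, 0],
                [1, 1, 0, 1],
                [1, 1, 0, 0]], dtype=float)
A = adj / adj.sum(axis=0)           # normalize columns -> transition matrix
ranks, iters = power_iteration(google_matrix(A, p=0.2))
print(ranks, iters)
```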

  • The next figure and animations show an example problem for computing page ranks that appeared in the final exam of the same edX (CornellX) course Networks, Crowds and Markets.
  • The following animation shows the page ranks computed using the power-iteration algorithm without a damping factor. As can be seen, the nodes drawn with larger sizes have higher page ranks.
  • The next animation shows the page ranks computed using the power-iteration algorithm with a damping factor of 0.2.
  • The next animations show the page ranks computed for an Erdos-Renyi random graph with n=100 nodes and edge probability \(p=\frac{1}{\sqrt{n}}\), using the power-iteration algorithm and also directly as the dominant eigenvector of the matrix M (see the sketch after this list). As can be seen, the nodes with ids in larger fonts have higher page ranks.
  • The next animation shows how fast the power-iteration algorithm converges to the true page ranks (computed directly as the normalized dominant eigenvector) for the same E-R random graph.
  • The convergence rate of the power-iteration algorithm is governed by the ratio of the two dominant eigenvalues \(\frac{\lambda_2}{\lambda_1}\) of the matrix M: the approximation error shrinks roughly like \(\left(\frac{\lambda_2}{\lambda_1}\right)^k\) after k iterations. Since \(\lambda_1 = 1\), the page rank computation converges faster when the second eigenvalue is smaller.
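The following is a minimal sketch of this computation, assuming numpy and networkx; the random seed, the choice of a directed graph, and the uniform handling of dangling nodes (nodes with no out-links) are my own choices, not taken from the original animations. It computes the page ranks both by power iteration and directly as the dominant eigenvector of M, and checks that the two agree.

```python
import numpy as np
import networkx as nx

n = 100
G = nx.erdos_renyi_graph(n, 1 / np.sqrt(n), directed=True, seed=42)

# column-stochastic transition matrix: column j spreads page j's rank
# uniformly over its out-links; dangling nodes get a uniform column
adj = nx.to_numpy_array(G).T        # adj[i, j] = 1 iff j -> i
out_deg = adj.sum(axis=0)
A = np.where(out_deg > 0, adj / np.where(out_deg > 0, out_deg, 1), 1.0 / n)

p = 0.2                             # teleportation (damping) probability
M = (1 - p) * A + p / n             # Google matrix

# page ranks via power iteration
x = np.ones(n) / n
for _ in range(200):
    x = M @ x

# page ranks directly as the dominant eigenvector of M
vals, vecs = np.linalg.eig(M)
v = np.real(vecs[:, np.argmax(np.real(vals))])
v /= v.sum()                        # normalize to a probability vector

print(np.abs(x - v).max())          # the two computations agree
```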

  • The next figures show the result of an experiment (sketched below) that starts with an Erdos-Renyi random graph with n=100 nodes and edge probability p=1/2, then iteratively removes 10 edges at random from the graph, runs the power-iteration algorithm, and records the number of iterations needed to converge with an error of at most \(10^{-5}\), also noting the ratio \(\frac{\lambda_2}{\lambda_1}\) each time.
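A sketch of this experiment, under the same assumptions as the previous snippet (numpy, networkx, a chosen seed, a directed graph, and uniform handling of dangling nodes); it records the eigenvalue ratio and the iteration count after each batch of edge removals.

```python
import numpy as np
import networkx as nx

def transition_matrix(G, n):
    """Column-stochastic transition matrix; dangling nodes -> uniform column."""
    adj = nx.to_numpy_array(G).T
    d = adj.sum(axis=0)
    return np.where(d > 0, adj / np.where(d > 0, d, 1), 1.0 / n)

def iterations_to_converge(M, tol=1e-5, max_iter=5000):
    """Number of power iterations until the L1 change drops below tol."""
    x = np.ones(M.shape[0]) / M.shape[0]
    for k in range(1, max_iter + 1):
        x_next = M @ x
        if np.abs(x_next - x).sum() < tol:
            return k
        x = x_next
    return max_iter

n, p = 100, 0.2
G = nx.erdos_renyi_graph(n, 0.5, directed=True, seed=42)
rng = np.random.default_rng(42)

for step in range(20):              # remove 10 random edges per step
    edges = list(G.edges())
    for idx in rng.choice(len(edges), size=10, replace=False):
        G.remove_edge(*edges[idx])
    M = (1 - p) * transition_matrix(G, n) + p / n
    lam = np.sort(np.abs(np.linalg.eigvals(M)))[::-1]
    print(step, lam[1] / lam[0], iterations_to_converge(M))
```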

  • As can be seen, the ratio of the dominant eigenvalues (equivalently \(\lambda_2\), since \(\lambda_1 = 1\)) decreases as edges are removed and, as expected, the number of iterations taken to converge (with the same accuracy threshold) also decreases, i.e., convergence becomes faster as the second eigenvalue becomes smaller.