Google's PageRank Algorithm

K.S.
2022-05-11

Introduction

  • A directed graph of connected importance. [1]
  • A graph is represented by nodes connected by weighted edges.
  • Edge weights are based on the number of nodes that they are connected to.
  • Once the weights are defined this can be translated into an adjacency matrix.

  • PageRank (PR)

Intro (cont.)

  • The PageRank algorithm dives into graph theory with the mathematics of linear algebra.
  • Relies on matrices, eigenvalues, and Markov Chains.
  • Important to understand when doing any kind of web development.

Description of Method

  • Rogers explains that the Google PageRank algorithm can be viewed as a sort of democracy within pages[2].
  • If a page is cited as a link on another page, this citation is considered as a 'vote' that the page is important.
  • The number of 'votes' a page receives contributes to its rank based on an iterative algorithm.

Description of Method (cont.)

First, let us discuss the variables needed for this algorithm.

  1. The pages \( T_1, T_2, ..., T_n \) that are pointing, or 'voting.'
  2. The variable \( PR(A) \), which represents the last PageRank calculated for a given page \( A \).
  3. The damping variable \( d \), set between \( 0-1 \), usually \( 0.85 \).
  4. The variable \( C(A) \), which represents the 'votes', or citations, that are outgoing from the current page \( A \).

Description of Method (cont.)

Second, the steps for calculating PageRank of a given page \( A \):

  1. Take one minus the damping variable or \( 1 - d \).

\[ \begin{align} (1 - d) \end{align} \]

  1. Multiply the damping variable \( d \) by the sum of the PR of each page that points to \( A \) over the cost or number of outgoing links from that page.

\[ \begin{align} d*\sum_{k=1}^n\left(\frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)}\right) \end{align} \]

Description of Method (cont.)

  1. Add the calculation from Step 1 to the repetitive sum from Step 2.

\[ \begin{align} PR(A) = (1-d) + d*\sum_{k=1}^n\left(\frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)}\right) \end{align} \]

Algorithm Review

  • These steps are called recursively until all pages have their rank.

Let's put them to use in a short example:

  • Given two connected pages \( A \) and \( B \), we will recursively calculate their page rank until the average between the pages is found to be our expected value of \( 1 \).

Octave Code

d = 0.85; # The dampening variable
a = 0;
b = 0;
i = 45; # Our guess for some repetition

while (i != 0)
  a = (1 - d) + d*b;
  b = (1 - d) + d*a;
  printf("a = %8d \t b = %8d\n", [a, b])
  i -= 1;
endwhile

Commands and Output

a = 0.556295     b =  0.62285
a = 0.832657     b = 0.857758
a = 0.936887     b = 0.946354
a = 0.976197     b = 0.979767
a = 0.991023     b = 0.992369
a = 0.996614     b = 0.997122
a = 0.998723     b = 0.998915
a = 0.999518     b = 0.999591
a = 0.999818     b = 0.999846
a = 0.999931     b = 0.999942
a = 0.999974     b = 0.999978
a =  0.99999     b = 0.999992
a = 0.999996     b = 0.999997
a = 0.999999     b = 0.999999
a = 0.999999     b =        1
a =        1     b =        1

Description of Results

  • From the Octave code slides, we see that our guess caused the average between pages \( A \) and \( B \) to approach and eventually make it to an average page rank of 1.
  • Even with the simplest example of two interconnected pages, our algorithm had to go through about 50 iterations before the expected average was reached.

A Peek at more Complexity

  • Next, let's peek at a more complex example as we dive deeper into the Google PageRank algorithm.

Pseudo Code from [3]

  • Parameter M adjacency matrix where \( M_{i,j} \) represents the link from \( j \) to \( i \), such that for all \( j \), \( \sum(i, M_{i,j}) = 1 \)
  • Parameter \( d \): damping factor
  • Parameter \( v-quadratic-error \): quadratic error for vector v
  • Return \( v \), a vector of ranks such that \( v_i \) is the \( i \)-th rank from \( [0, 1] \)

Octave Code from [3]

function [v] = rank2(M, d, v_quadratic_error)
  N = size(M, 2)
  v = rand(N, 1);
  v = v ./ norm(v, 1);
  last_v = ones(N, 1) * inf;
  M_hat = (d.*M) + (((1-d)/N).*ones(N,N));

  while (norm(v-last_v,2)>v_quadratic_error)
    last_v = v;
    v = M_hat * v;
  endwhile

endfunction

Commands and Output [3]

M = [ 0  0  0  0 1; 
     0.5 0  0  0 0; 
     0.5 0  0  0 0; 
      0  1 0.5 0 0; 
      0  0 0.5 1 0];

rank2(M, 0.85, 0.001)
ranks =

   0.2545
   0.1382
   0.1382
   0.2057
   0.2635

References

  • [1] Machado, Jonathan. Linear Algebra Application: Google PAGERANK Algorithm. Department of Mathematics and Statistics, University of North Carolina at Greensboro, Greensboro, NC 27402, USA, mathstats.uncg.edu/sites/yasaki/publications/machado-google-pagerank-linear-algebra-project.pdf.
  • [2] Rogers, Ian. “Understanding Google Page Rank.” Atom, 10 Apr. 2002, ianrogers.uk/google-page-rank/.
  • [3] Wikimedia Foundation. (2022, May 7). PageRank. Wikipedia. Retrieved May 9, 2022, from https://en.wikipedia.org/wiki/PageRank