Google's PageRank Algorithm

K.S.
2022-05-11

Introduction

A directed graph of connected importance. [1]
A graph is represented by nodes connected by weighted edges.
Edge weights are based on the number of nodes that they are connected to.
Once the weights are defined this can be translated into an adjacency matrix.

PageRank (PR)

Intro (cont.)

The PageRank algorithm dives into graph theory with the mathematics of linear algebra.
Relies on matrices, eigenvalues, and Markov Chains.
Important to understand when doing any kind of web development.

Description of Method

Rogers explains that the Google PageRank algorithm can be viewed as a sort of democracy within pages[2].
If a page is cited as a link on another page, this citation is considered as a 'vote' that the page is important.
The number of 'votes' a page receives contributes to its rank based on an iterative algorithm.

Description of Method (cont.)

First, let us discuss the variables needed for this algorithm.

The pages \( T_1, T_2, ..., T_n \) that are pointing, or 'voting.'
The variable \( PR(A) \), which represents the last PageRank calculated for a given page \( A \).
The damping variable \( d \), set between \( 0-1 \), usually \( 0.85 \).
The variable \( C(A) \), which represents the 'votes', or citations, that are outgoing from the current page \( A \).

Description of Method (cont.)

Second, the steps for calculating PageRank of a given page \( A \):

Take one minus the damping variable or \( 1 - d \).

\[ \begin{align} (1 - d) \end{align} \]

Multiply the damping variable \( d \) by the sum of the PR of each page that points to \( A \) over the cost or number of outgoing links from that page.

\[ \begin{align} d*\sum_{k=1}^n\left(\frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)}\right) \end{align} \]

Description of Method (cont.)

Add the calculation from Step 1 to the repetitive sum from Step 2.

\[ \begin{align} PR(A) = (1-d) + d*\sum_{k=1}^n\left(\frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)}\right) \end{align} \]

Algorithm Review

These steps are called recursively until all pages have their rank.

Let's put them to use in a short example:

Given two connected pages \( A \) and \( B \), we will recursively calculate their page rank until the average between the pages is found to be our expected value of \( 1 \).

Octave Code

d = 0.85; # The dampening variable
a = 0;
b = 0;
i = 45; # Our guess for some repetition

while (i != 0)
  a = (1 - d) + d*b;
  b = (1 - d) + d*a;
  printf("a = %8d \t b = %8d\n", [a, b])
  i -= 1;
endwhile

Commands and Output

a = 0.556295     b =  0.62285
a = 0.832657     b = 0.857758
a = 0.936887     b = 0.946354
a = 0.976197     b = 0.979767
a = 0.991023     b = 0.992369
a = 0.996614     b = 0.997122
a = 0.998723     b = 0.998915
a = 0.999518     b = 0.999591
a = 0.999818     b = 0.999846
a = 0.999931     b = 0.999942
a = 0.999974     b = 0.999978
a =  0.99999     b = 0.999992
a = 0.999996     b = 0.999997
a = 0.999999     b = 0.999999
a = 0.999999     b =        1
a =        1     b =        1

Description of Results

From the Octave code slides, we see that our guess caused the average between pages \( A \) and \( B \) to approach and eventually make it to an average page rank of 1.
Even with the simplest example of two interconnected pages, our algorithm had to go through about 50 iterations before the expected average was reached.

A Peek at more Complexity

Next, let's peek at a more complex example as we dive deeper into the Google PageRank algorithm.

Pseudo Code from [3]

Parameter M adjacency matrix where \( M_{i,j} \) represents the link from \( j \) to \( i \), such that for all \( j \), \( \sum(i, M_{i,j}) = 1 \)
Parameter \( d \): damping factor
Parameter \( v-quadratic-error \): quadratic error for vector v
Return \( v \), a vector of ranks such that \( v_i \) is the \( i \)-th rank from \( [0, 1] \)

Octave Code from [3]

function [v] = rank2(M, d, v_quadratic_error)
  N = size(M, 2)
  v = rand(N, 1);
  v = v ./ norm(v, 1);
  last_v = ones(N, 1) * inf;
  M_hat = (d.*M) + (((1-d)/N).*ones(N,N));

  while (norm(v-last_v,2)>v_quadratic_error)
    last_v = v;
    v = M_hat * v;
  endwhile

endfunction

Commands and Output [3]

M = [ 0  0  0  0 1; 
     0.5 0  0  0 0; 
     0.5 0  0  0 0; 
      0  1 0.5 0 0; 
      0  0 0.5 1 0];

rank2(M, 0.85, 0.001)

References

[1] Machado, Jonathan. Linear Algebra Application: Google PAGERANK Algorithm. Department of Mathematics and Statistics, University of North Carolina at Greensboro, Greensboro, NC 27402, USA, mathstats.uncg.edu/sites/yasaki/publications/machado-google-pagerank-linear-algebra-project.pdf.
[2] Rogers, Ian. “Understanding Google Page Rank.” Atom, 10 Apr. 2002, ianrogers.uk/google-page-rank/.
[3] Wikimedia Foundation. (2022, May 7). PageRank. Wikipedia. Retrieved May 9, 2022, from https://en.wikipedia.org/wiki/PageRank