DATA 605 Assignment 10: Page Rank

Matrices A & B

As shown in the notes, the matrix \(A\) is given by \[A = \left[ \begin{array}{c} 0 & \frac{1}{2} & \frac{1}{2} & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ \frac{1}{3} & \frac{1}{3} & 0 & 0 & \frac{1}{3} & 0 \\ 0 & 0 & 0 & 0 & \frac{1}{2} & \frac{1}{2} \\ 0 & 0 & 0 & \frac{1}{2} & 0 & \frac{1}{2} \\ 0 & 0 & 0 & 1 & 0 & 0 \end{array} \right]\]

Node 2 in this example is a dangling node i.e. it has no outlinks. This means that the matrix \(A\) above, as well as the decayed matrix \(B\) would not be row-stochastic and would yield a page rank vector of nearly or entirely all-zero entries. As such, the second row of \(A\) is replaced with a uniform row vector with values of \(/frac{1}{6}\), indicating that because there are no outlinks from node 2, there is an even probability that a user travels next to any of the six nodes in the example. The revised matrix \(A\) is given by

\[A = \left[ \begin{array}{c} 0 & \frac{1}{2} & \frac{1}{2} & 0 & 0 & 0 \\ \frac{1}{6} & \frac{1}{6} & \frac{1}{6} & \frac{1}{6} & \frac{1}{6} & \frac{1}{6} \\ \frac{1}{3} & \frac{1}{3} & 0 & 0 & \frac{1}{3} & 0 \\ 0 & 0 & 0 & 0 & \frac{1}{2} & \frac{1}{2} \\ 0 & 0 & 0 & \frac{1}{2} & 0 & \frac{1}{2} \\ 0 & 0 & 0 & 1 & 0 & 0 \end{array} \right]\]

The matrix \(B\) is obtained by \[B = 0.85 \times A + \frac{0.15}{n} \approx \left[ \begin{array}{c} 0.025 & 0.45 & 0.45 & 0.025 & 0.025 & 0.025 \\ 0.1667 & 0.1667 & 0.1667 & 0.1667 & 0.1667 & 0.1667 \\ 0.3083 & 0.3083 & 0.025 & 0.025 & 0.3083 & 0.025 \\ 0.025 & 0.025 & 0.025 & 0.025 & 0.45 & 0.45 \\ 0.025 & 0.025 & 0.025 & 0.45 & 0.025 & 0.45 \\ 0.025 & 0.025 & 0.025 & 0.875 & 0.025 & 0.025 \end{array} \right]\]

In R, these are stored as below:

A <- matrix(
  c(0, 1/2, 1/2, 0, 0, 0,
  0, 0, 0, 0, 0, 0,
  1/3, 1/3, 0, 0, 1/3, 0,
  0, 0, 0, 0, 1/2, 1/2,
  0, 0, 0, 1/2, 0, 1/2,
  0, 0, 0, 1, 0, 0),
  nrow = 6, byrow = TRUE)

A[2, ] <- rep(1/6, 6)

B <- 0.85 * A + 0.15 / nrow(A)

Power Iterations

The following function is created to perform power iterations on \(B\) until convergence, utilizing a uniform rank vector \[r^T = \left[ \begin{array}{c} \frac{1}{6} & \frac{1}{6} & \frac{1}{6} & \frac{1}{6} & \frac{1}{6} & \frac{1}{6} \end{array} \right]\]

power_iterate <- function(mat, vec) {
  converged <- FALSE
  n <- 0
  while(!converged) {
    vec <- crossprod(mat, vec)
    n <- n + 1
    if(identical(crossprod(mat, vec), vec)) {
      converged <- TRUE
    }
  }
  print(paste('Converged in', n, 'iterations'))
  return(vec)
}

r <- matrix(rep(1/nrow(B), nrow(B)), ncol=1)
it_results <- power_iterate(B, r)

[1] "Converged in 66 iterations"

The page rank vector and associated page rankings, as calculated, are:

page	vector	rank
1	0.0517	6
2	0.0737	4
3	0.0574	5
4	0.3487	1
5	0.1999	3
6	0.2686	2

Eiegen-Decomposition

For this exercise, we are interested in the largest eigenvalue of \(B\), as well as the associated left eigenvector of \(B\). Taking only the real components, this can be shown to be

Re(eigen(B)$values[1])

[1] 1

ev_results <- matrix(Re(eigen(t(B))$vectors[,1]))

The eigenvectors returned by the eigen function return normal vectors (i.e. vectors with length 1); the returned vector is divided by its sum to give a vector with a sum 1. This vector is identical to the vector returned using the power_iterate function:

ev_results <- ev_results / sum(ev_results)
identical(round(ev_results, 13), round(it_results, 13))

[1] TRUE

page	vector	rank
1	0.0517	6
2	0.0737	4
3	0.0574	5
4	0.3487	1
5	0.1999	3
6	0.2686	2

Visualizing the Network

Using the igraph package, the network can be visualized in a directed graph, and the page rank of the nodes in the network returned.

The igraph package handles decay using a damping factor of 0.85 and automatically assigns a uniform random probability to dangling nodes, which matches the two approaches outlined above. The page rank vector returned matches that returned through power iteration:

identical(round(it_results, 13), round(g_results, 13))

[1] TRUE

Comparison of Results

As shown in the above sections, the page rank vector for the given universe of six pages was derived through three methods:

Power iteration of the matrix \(B\)
Eigenvector corresponding to \(\lambda = 1\) for the matrix \(B\)
igraph implementation using the matrix \(A\)

The three methods return the same results (to 13 decimal points of accuracy) once the eigenvector is scaled from being a unit vector.

	Power Iteration	Eigenvector	Graph	Rank
1	0.0517047	0.0517047	0.0517047	6
2	0.0736793	0.0736793	0.0736793	4
3	0.0574124	0.0574124	0.0574124	5
4	0.3487037	0.3487037	0.3487037	1
5	0.1999038	0.1999038	0.1999038	3
6	0.2685961	0.2685961	0.2685961	2