You’ll verify for yourself that PageRank works by performing calculations on a small universe of web pages. Let’s use the 6-page universe that we had in the course notes.
\[\mathbf{A} =\begin{bmatrix} 0 & \frac { 1 }{ 2 } & \frac { 1 }{ 2 } & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ \frac { 1 }{ 3 } & \frac { 1 }{ 3 } & 0 & 0 & \frac { 1 }{ 3 } & 0 \\ 0 & 0 & 0 & 0 & \frac { 1 }{ 2 } & \frac { 1 }{ 2 } \\ 0 & 0 & 0 & \frac { 1 }{ 2 } & 0 & \frac { 1 }{ 2 } \\ 0 & 0 & 0 & 1 & 0 & 0 \end{bmatrix}\]
For this directed graph, perform the following calculations in R.
The matrix above does not define a well-behaved discrete-time Markov chain, since it is not stochastic (square with non-negative real elements whose rows each sum to one), irreducible (a path exists from every node to every other node), and aperiodic (the greatest common divisor, or period, of the lengths of all cycles in the graph is 1). This is due to the all-zero dangling node in the second row, which represents a page with no outgoing links of its own. The matrix \(\mathbf{G}\) below, in which the dangling row is replaced by a uniform distribution over all six pages, is therefore introduced to show how the same calculations yield the results expected for a well-behaved matrix (network, graph).
\[\mathbf{A} =\begin{bmatrix} 0 & \frac { 1 }{ 2 } & \frac { 1 }{ 2 } & 0 & 0 & 0 \\ \color{red}{0} & \color{red}{0} & \color{red}{0} & \color{red}{0} & \color{red}{0} & \color{red}{0} \\ \frac { 1 }{ 3 } & \frac { 1 }{ 3 } & 0 & 0 & \frac { 1 }{ 3 } & 0 \\ 0 & 0 & 0 & 0 & \frac { 1 }{ 2 } & \frac { 1 }{ 2 } \\ 0 & 0 & 0 & \frac { 1 }{ 2 } & 0 & \frac { 1 }{ 2 } \\ 0 & 0 & 0 & 1 & 0 & 0 \end{bmatrix}, \quad \mathbf{G} =\begin{bmatrix} 0 & \frac { 1 }{ 2 } & \frac { 1 }{ 2 } & 0 & 0 & 0 \\ \color{red}{\frac { 1 }{ 6 }} & \color{red}{\frac { 1 }{ 6 }} & \color{red}{\frac { 1 }{ 6 }} & \color{red}{\frac { 1 }{ 6 }} & \color{red}{\frac { 1 }{ 6 }} & \color{red}{\frac { 1 }{ 6 }} \\ \frac { 1 }{ 3 } & \frac { 1 }{ 3 } & 0 & 0 & \frac { 1 }{ 3 } & 0 \\ 0 & 0 & 0 & 0 & \frac { 1 }{ 2 } & \frac { 1 }{ 2 } \\ 0 & 0 & 0 & \frac { 1 }{ 2 } & 0 & \frac { 1 }{ 2 } \\ 0 & 0 & 0 & 1 & 0 & 0 \end{bmatrix}\]
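As a quick sanity check (not part of the assignment prompt; the names A_rows and G_rows below are just illustrative helpers), the row sums expose the dangling node in \(\mathbf{A}\) and confirm that every row of \(\mathbf{G}\) sums to one.
A_rows <- matrix(c(0,1/2,1/2,0,0,0,
                   0,0,0,0,0,0,
                   1/3,1/3,0,0,1/3,0,
                   0,0,0,0,1/2,1/2,
                   0,0,0,1/2,0,1/2,
                   0,0,0,1,0,0), nrow = 6, byrow = TRUE)  # A exactly as printed above
G_rows <- A_rows
G_rows[2, ] <- 1/6   # replace the dangling row (page 2) with a uniform distribution
rowSums(A_rows)      # row 2 sums to 0, so A is not row-stochastic
rowSums(G_rows)      # every row sums to 1, so G is row-stochastic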
(1.a) Form the \(\mathbf{A}\) matrix. Then, introduce decay (multiply by a number smaller than unity) and form the \(\mathbf{B}\) matrix as we did in the course notes.
# Entries of the notes' A in row-major order
a <- c(0,1/2,1/2,0,0,0, 0,0,0,0,0,0, 1/3,1/3,0,0,1/3,0,
       0,0,0,0,1/2,1/2, 0,0,0,1/2,0,1/2, 0,0,0,1,0,0)
n <- sqrt(length(a))                              # number of pages
behavior <- rowSums(matrix(a, nrow = n))          # row sums under a column-wise fill
fill_byrow <- all(behavior == 1 | behavior == 0)  # refill by row only if that fill is already row-stochastic
# Either way, the stored A is the transpose of the notes' A, i.e. column-stochastic
# (each non-dangling column sums to 1), which is the form the iteration r <- B %*% r expects.
(A <- matrix(a, nrow = n, ncol = n, byrow = fill_byrow))
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0.0 0 0.3333333 0.0 0.0 0
## [2,] 0.5 0 0.3333333 0.0 0.0 0
## [3,] 0.5 0 0.0000000 0.0 0.0 0
## [4,] 0.0 0 0.0000000 0.0 0.5 1
## [5,] 0.0 0 0.3333333 0.5 0.0 0
## [6,] 0.0 0 0.0000000 0.5 0.5 0
G <- A; G[ ,2] <- rep(1 / n, n); G   # replace the dangling node's all-zero column (page 2) with a uniform 1/n
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0.0 0.1666667 0.3333333 0.0 0.0 0
## [2,] 0.5 0.1666667 0.3333333 0.0 0.0 0
## [3,] 0.5 0.1666667 0.0000000 0.0 0.0 0
## [4,] 0.0 0.1666667 0.0000000 0.0 0.5 1
## [5,] 0.0 0.1666667 0.3333333 0.5 0.0 0
## [6,] 0.0 0.1666667 0.0000000 0.5 0.5 0
Damping effectively links every page to every other page in this matrix (network, graph, web), since the term \(\frac{(1-d)}{n}\) is added to every entry, as seen below. \[\mathbf{B} = d \times \mathbf{A} + \frac{\left( 1 - d \right)}{n}\mathbf{J},\] where \(\mathbf{J}\) is the \(n \times n\) all-ones matrix (in R, adding the scalar \((1-d)/n\) to the matrix has the same effect).
d <- 0.85
(B_1 <- d * A + (1 - d) / n)
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0.025 0.025 0.3083333 0.025 0.025 0.025
## [2,] 0.450 0.025 0.3083333 0.025 0.025 0.025
## [3,] 0.450 0.025 0.0250000 0.025 0.025 0.025
## [4,] 0.025 0.025 0.0250000 0.025 0.450 0.875
## [5,] 0.025 0.025 0.3083333 0.450 0.025 0.025
## [6,] 0.025 0.025 0.0250000 0.450 0.450 0.025
In R, %*% is matrix multiplication, whereas ^ acts element-wise; the matrix power \(\mathbf{A}^n\) therefore has to be built up by repeated A %*% A products, which is what the power iteration below does with \(\mathbf{B}\).
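A minimal illustration of the difference, using a small 2-by-2 matrix that is not part of the assignment data:
M <- matrix(c(0, 1, 1, 0), nrow = 2)   # small 2 x 2 permutation matrix
M %*% M                                # matrix product: the 2 x 2 identity
M^2                                    # element-wise power: each entry squared, so M is unchanged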
pagerank <- function(B) {
  pages <- dim(B)[1]
  Test  <- B       # running power of B: B, B^2, B^3, ...
  n     <- 1
  check <- FALSE
  # start from the uniform vector with entries 1/pages
  r <- r_i <- r_f <- matrix(rep(1 / pages, pages), nrow = pages, ncol = 1)
  while (!check) {
    r_i  <- r                # rank vector before this pass
    Test <- Test %*% B       # raise B to the next power
    r    <- Test %*% r       # propagate the rank vector
    r_f  <- r                # rank vector after this pass
    n    <- n + 1
    check <- isTRUE(all.equal(r_i, r_f))  # stop once successive vectors agree
  }
  r <- r_f / sqrt(sum(r_f^2))             # normalize to unit length
  results <- list("Iterations" = n,
                  "r" = r,
                  "Stabilized" = round(Test, 7))
  return(results)
}
(PageRank_1 <- pagerank(B_1))
## $Iterations
## [1] 26
##
## $r
## [,1]
## [1,] 0.07608991
## [2,] 0.11007101
## [3,] 0.08481827
## [4,] 0.71837250
## [5,] 0.39690895
## [6,] 0.54891237
##
## $Stabilized
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0.0048201 0.0012777 0.0066545 0.0119616 0.0119616 0.0119616
## [2,] 0.0069728 0.0018483 0.0096263 0.0173035 0.0173035 0.0173035
## [3,] 0.0053731 0.0014243 0.0074178 0.0133337 0.0133337 0.0133337
## [4,] 0.0455075 0.0120629 0.0628256 0.1129303 0.1129303 0.1129303
## [5,] 0.0251434 0.0066649 0.0347119 0.0623953 0.0623953 0.0623953
## [6,] 0.0347725 0.0092173 0.0480054 0.0862907 0.0862907 0.0862907
At all.equal's default tolerance of \(1.5 \times 10^{-8}\), \(r_f \rightarrow r_i\) after \(n = 26\) iterations.
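That tolerance is not chosen here; it is all.equal's own default, \(\sqrt{\epsilon}\) for double-precision machine epsilon, which can be confirmed directly (a quick check outside the assignment code):
sqrt(.Machine$double.eps)              # ~1.5e-08, the default tolerance used by all.equal
formals(all.equal.numeric)$tolerance   # the default is defined as sqrt(.Machine$double.eps)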
max(Re(eigen(B_1)$values))
## [1] 0.9516527
data.frame(
"r" = PageRank_1$r[ ,1] / sqrt(sum(PageRank_1$r[ ,1]^2)),
"v" = matrix(Re(eigen(B_1)$vectors)[ ,1]))
## r v
## 1 0.07608991 0.07608991
## 2 0.11007101 0.11007101
## 3 0.08481827 0.08481827
## 4 0.71837250 0.71837250
## 5 0.39690895 0.39690895
## 6 0.54891237 0.54891237
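The two columns agree exactly because eigen() returns eigenvectors scaled to unit length (up to sign), which is the same normalization applied to \(r\); a quick check, with v_1 as an illustrative helper name:
v_1 <- Re(eigen(B_1)$vectors)[ ,1]
sqrt(sum(v_1^2))   # eigen() scales each eigenvector to unit length, matching the normalization of r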
sum(PageRank_1$Stabilized[,1])
## [1] 0.1225894
Given the matrix \(\mathbf{B_1}\), obtained by damping the matrix \(\mathbf{A}\) which is not well-behaved, \(r = v\) for \(\max{\left( \lambda \right)}\), but the largest eigenvalue of \(\mathbf{B_1}\) is \(\lambda = 0.9516527 \neq 1\). Furthermore, the Perron-Frobenius guarantees for a stochastic matrix do not hold here. Although the largest eigenvalue has an associated eigenvector that is strictly positive, \(\mathbf{B_1}\) is not column-stochastic because of the dangling node, so the columns of the stabilized matrix do not converge to a probability distribution: the first column sums to \(0.1225894 \neq 1\).
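The leak is visible directly in the column sums (a quick check, not part of the assignment prompt):
colSums(A)     # column 2 (the dangling node) is all zeros, so probability leaks out at page 2
colSums(B_1)   # after damping, column 2 sums to 6 * 0.025 = 0.15; every other column sums to 1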
B_2 <- d * G + (1 - d) / n
PageRank_2 <- pagerank(B_2)
max(Re(eigen(B_2)$values))
## [1] 1
data.frame(
"r" = PageRank_2$r[ ,1] / sqrt(sum(PageRank_2$r[ ,1]^2)),
"v" = matrix(Re(eigen(B_2)$vectors)[ ,1]))
## r v
## 1 0.1044385 0.1044385
## 2 0.1488249 0.1488249
## 3 0.1159674 0.1159674
## 4 0.7043472 0.7043472
## 5 0.4037861 0.4037861
## 6 0.5425377 0.5425377
sum(PageRank_2$Stabilized[,1])
## [1] 1
Given the matrix \(\mathbf{B_2}\), obtained by damping the matrix \(\mathbf{G}\) which is well-behaved, \(r = v\) for \(\max{\left( \lambda \right)}\) and the largest eigenvalue of \(\mathbf{B_2}\) is \(\lambda = 1\). Furthermore, the Perron-Frobenius guarantees for a stochastic matrix hold. The largest eigenvalue has an associated eigenvector that is strictly positive, and the stabilized columns converge to a probability distribution: each column sums to one.
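As a further check (outside the assignment code), every column of \(\mathbf{B_2}\) sums to one, and rescaling the unit-length vector \(r\) so that its entries sum to one expresses the PageRank values as probabilities.
colSums(B_2)                       # every column sums to 1: B_2 is column-stochastic
PageRank_2$r / sum(PageRank_2$r)   # the same ranking, rescaled to a probability vector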
library(igraph)
g_1 <- graph_from_adjacency_matrix(t(A), weighted = TRUE, mode = "directed")   # t(A) restores the row-oriented adjacency matrix from the notes
data.frame(
"r" = PageRank_1$r,
"v" = matrix(Re(eigen(B_1)$vectors)[ ,1]),
"igraph" = matrix(page_rank(g_1)$vector) / sqrt(sum(matrix(page_rank(g_1)$vector)^2))
)
## r v igraph
## 1 0.07608991 0.07608991 0.1044385
## 2 0.11007101 0.11007101 0.1488249
## 3 0.08481827 0.08481827 0.1159674
## 4 0.71837250 0.71837250 0.7043472
## 5 0.39690895 0.39690895 0.4037861
## 6 0.54891237 0.54891237 0.5425377
When normalized, the igraph package's page_rank method does not return the same vector as the previous two approaches for matrix \(\mathbf{A}\), which is not well-behaved. Its values instead match those obtained from \(\mathbf{G}\) below, which suggests that page_rank handles the dangling node internally rather than leaving the all-zero row in place.
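That correspondence can be checked directly (a hedged sketch; pr_1 is just an illustrative helper name, and the comparison is expected, not guaranteed, to fall within all.equal's tolerance):
pr_1 <- page_rank(g_1)$vector   # igraph's PageRank for the graph built from A
all.equal(as.numeric(PageRank_2$r),
          as.numeric(pr_1 / sqrt(sum(pr_1^2))))   # expected to agree with the results from G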
g_2 <- graph_from_adjacency_matrix(t(G), weighted = TRUE, mode = "directed")
data.frame(
"r" = PageRank_2$r,
"v" = matrix(Re(eigen(B_2)$vectors)[ ,1]),
"igraph" = matrix(page_rank(g_2)$vector) / sqrt(sum(matrix(page_rank(g_2)$vector)^2))
)
## r v igraph
## 1 0.1044385 0.1044385 0.1044385
## 2 0.1488249 0.1488249 0.1488249
## 3 0.1159674 0.1159674 0.1159674
## 4 0.7043472 0.7043472 0.7043472
## 5 0.4037861 0.4037861 0.4037861
## 6 0.5425377 0.5425377 0.5425377
When normalized, the igraph package's page_rank method does return the same vector as the previous two approaches for matrix \(\mathbf{G}\), which is well-behaved.
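For completeness, page_rank applies the same damping idea: its damping argument defaults to 0.85, the same \(d\) used above, and the vector it returns is already a probability distribution, so only the unit-length rescaling is needed for the comparison (a small sketch; pr_2 is an illustrative helper name):
pr_2 <- page_rank(g_2, damping = d)$vector   # the damping argument defaults to 0.85
sum(pr_2)                                    # page_rank returns a probability vector: sums to 1
pr_2 / sqrt(sum(pr_2^2))                     # rescaled to unit length for comparison with r and v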
par(mfrow=c(1, 2))
plot.igraph(g_1, main="Matrix A", xlab ="Not Well-Behaved")
plot.igraph(g_2, main="Matrix G", xlab ="Well-Behaved")
http://mathworld.wolfram.com/Perron-FrobeniusTheorem.html
http://www.math.cornell.edu/~mec/Winter2009/RalucaRemus/Lecture3/lecture3.html
http://www.cs.cmu.edu/~elaw/pagerank.pdf
http://cesg.tamu.edu/wp-content/uploads/2014/09/ECEN689-lec18.pdf
http://www.math.uah.edu/stat/markov/Periodicity.html
http://www.mathematik.uni-ulm.de/stochastik/lehre/ss06/markov/skript_engl/node12.html
http://igraph.org/r/doc/page_rank.html
https://cran.r-project.org/web/packages/igraph/igraph.pdf
http://michael.hahsler.net/SMU/LearnROnYourOwn/code/igraph.html