1. Final, Problem 1. 30 Points. Playing with PageRank

You’ll verify for yourself that PageRank works by performing calculations on a small universe of web pages. Let’s use the 6 page universe that we had in the previous discussion For this directed graph, perform the following calculations in R.

1.1 Form the A matrix. Then, introduce decay and form the B matrix as we did in the course notes. (5 Points)

Matrix A from the notes:

A <- matrix(c(0, 1/2, 1/2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1/3, 1/3, 0, 0, 1/3, 0, 0, 0, 0, 0, 1/2, 1/2, 0, 0, 0, 1/2, 0, 1/2, 0, 0, 0, 1, 0, 0), nrow = 6, byrow = TRUE)
A
##           [,1]      [,2] [,3] [,4]      [,5] [,6]
## [1,] 0.0000000 0.5000000  0.5  0.0 0.0000000  0.0
## [2,] 0.0000000 0.0000000  0.0  0.0 0.0000000  0.0
## [3,] 0.3333333 0.3333333  0.0  0.0 0.3333333  0.0
## [4,] 0.0000000 0.0000000  0.0  0.0 0.5000000  0.5
## [5,] 0.0000000 0.0000000  0.0  0.5 0.0000000  0.5
## [6,] 0.0000000 0.0000000  0.0  1.0 0.0000000  0.0

The second row should be filled with 1/6 instead of 0s as we are building the matrix for the the 6 page universe:

ri <- matrix(c(1/6, 1/6, 1/6, 1/6, 1/6, 1/6),ncol=6)
A[2,]<-ri
A
##           [,1]      [,2]      [,3]      [,4]      [,5]      [,6]
## [1,] 0.0000000 0.5000000 0.5000000 0.0000000 0.0000000 0.0000000
## [2,] 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667
## [3,] 0.3333333 0.3333333 0.0000000 0.0000000 0.3333333 0.0000000
## [4,] 0.0000000 0.0000000 0.0000000 0.0000000 0.5000000 0.5000000
## [5,] 0.0000000 0.0000000 0.0000000 0.5000000 0.0000000 0.5000000
## [6,] 0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000000

\[ A= \begin{bmatrix} 0 & 1/2 & 1/2 & 0 & 0 & 0 \\ 1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 \\ 1/3 & 1/3 & 0 & 0 & 1/3 & 0 \\ 0& 0& 0& 0& 1/2& 1/2 \\ 0& 0& 0& 1/2& 0&1/2 \\ 0& 0& 0& 1& 0& 0 \end{bmatrix}\]

A new matrix B which is a decayed version of the original matrix A: \[B=0.85\cdot A + \frac{1-d}{n}\]

In the classic version by Page and Brin, the decay and n (size of matrix A) is: \[d=0.85, n=6\]

decay=0.85
n <- nrow(A)
B <- decay * A + (1-decay)/n
B 
##           [,1]      [,2]      [,3]      [,4]      [,5]      [,6]
## [1,] 0.0250000 0.4500000 0.4500000 0.0250000 0.0250000 0.0250000
## [2,] 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667
## [3,] 0.3083333 0.3083333 0.0250000 0.0250000 0.3083333 0.0250000
## [4,] 0.0250000 0.0250000 0.0250000 0.0250000 0.4500000 0.4500000
## [5,] 0.0250000 0.0250000 0.0250000 0.4500000 0.0250000 0.4500000
## [6,] 0.0250000 0.0250000 0.0250000 0.8750000 0.0250000 0.0250000

\[ B = \begin{bmatrix} 0.025 & 0.45 & 0.45 & 0.025 & 0.025 & 0.025 \\ 0.167 & 0.167 & 0.167 & 0.167 & 0.167 & 0.167 \\ 0.308 & 0.308 & 0.025 & 0.025 & 0.308 & 0.025 \\ 0.025 & 0.025 & 0.025 & 0.025 & 0.45 & 0.45 \\ 0.025 & 0.025 & 0.025 & 0.45 & 0.025 & 0.45 \\ 0.025 & 0.025 & 0.025 & 0.875 & 0.025 & 0.025 \end{bmatrix} \]

1.2 Start with a uniform rank vector r and perform power iterations on B till convergence. That is, compute the solution r = B_n × r. Attemptthis for a sufficiently large n so that r actually converges. (5 Points)

The convergence happens at the interation=30.

r_i <- rep(1/6, 6)

n <- 0
iter_0  <- matrix.power(t(B), n) %*%  r_i
iter_0 
##           [,1]
## [1,] 0.1666667
## [2,] 0.1666667
## [3,] 0.1666667
## [4,] 0.1666667
## [5,] 0.1666667
## [6,] 0.1666667
n <- 10
iter_10 <- matrix.power(t(B), n) %*%  r_i
iter_10 
##            [,1]
## [1,] 0.05205661
## [2,] 0.07428990
## [3,] 0.05782138
## [4,] 0.34797267
## [5,] 0.19975859
## [6,] 0.26810085
n <- 20
iter_20 <- matrix.power(t(B), n) %*%  r_i
iter_20
##            [,1]
## [1,] 0.05170616
## [2,] 0.07368173
## [3,] 0.05741406
## [4,] 0.34870083
## [5,] 0.19990313
## [6,] 0.26859408
n <- 30
r_converge <- matrix.power(t(B), n) %*%  r_i
r_converge
##            [,1]
## [1,] 0.05170475
## [2,] 0.07367927
## [3,] 0.05741242
## [4,] 0.34870367
## [5,] 0.19990381
## [6,] 0.26859607
n <- 40
iter_40 <- matrix.power(t(B), n) %*%  r_i
iter_40
##            [,1]
## [1,] 0.05170475
## [2,] 0.07367926
## [3,] 0.05741241
## [4,] 0.34870369
## [5,] 0.19990381
## [6,] 0.26859608

1.3 Compute the eigen-decomposition of B and verify that you indeed get an eigenvalue of 1 as the largest eigenvalue and that its corresponding eigenvector is the same vector that you obtained in the previous power iteration method. Further, this eigenvector has all positive entries and it sums to 1.

The eigen-decomposition of B:

eig_b <- eigen(B) 
eig_b
## eigen() decomposition
## $values
## [1]  1.00000000  0.57619235 -0.42500001 -0.42499999 -0.34991524 -0.08461044
## 
## $vectors
##            [,1]       [,2]          [,3]          [,4]         [,5]
## [1,] -0.4082483 -0.7278031 -5.345224e-01  5.345225e-01 -0.795670150
## [2,] -0.4082483 -0.3721164 -5.216180e-09 -5.216180e-09  0.059710287
## [3,] -0.4082483 -0.5389259  5.345225e-01 -5.345225e-01  0.602762996
## [4,] -0.4082483  0.1174605 -2.672613e-01  2.672612e-01  0.002611877
## [5,] -0.4082483  0.1174605 -2.672613e-01  2.672612e-01  0.002611877
## [6,] -0.4082483  0.1174605  5.345225e-01 -5.345224e-01  0.002611877
##              [,6]
## [1,]  0.486246420
## [2,] -0.673469294
## [3,]  0.556554233
## [4,] -0.009145393
## [5,] -0.009145393
## [6,] -0.009145393

The eigenvalue of 1 is the largest eigenvalue:

max_value <- which.max(eigen(B)$values)
max_value
## [1] 1

The eigenvector has all positive entries and it sums to 1:

pagerank_2 <- eig_b$vectors[,1]/sum(eig_b$vectors[,1])
sum(pagerank_2)
## [1] 1

1.4 Use the graph package in R and its page.rank method to compute the Page Rank of the graph as given in A. Note that you don’t need to apply decay. The package starts with a connected graph and applies decay internally. Verify that you do get the same PageRank vector as the two approaches above.

We got the same results from the plot:

graph <- graph_from_adjacency_matrix(A, weighted = T)
plot(graph)

pageRank <- page.rank(graph)$vector

as.matrix(page.rank(graph)$vector)
##            [,1]
## [1,] 0.05170475
## [2,] 0.07367926
## [3,] 0.05741241
## [4,] 0.34870369
## [5,] 0.19990381
## [6,] 0.26859608
results <- (round(r_converge, 4) == round(pageRank, 4))
results
##      [,1]
## [1,] TRUE
## [2,] TRUE
## [3,] TRUE
## [4,] TRUE
## [5,] TRUE
## [6,] TRUE