Page Rank

Verify that PageRank works by performing calculations on a small universe of web pages. We'll use the six-page universe from the course notes. For this directed graph, perform the following calculations in R.

1. Form the A matrix. Then, introduce decay and form the B matrix as we did in the course notes.

# Form the A matrix.

r1 <- c(0,   1/2, 1/2, 0,   0,   0)
r2 <- c(0,   0,   0,   0,   0,   0)
r3 <- c(1/3, 1/3, 0,   0,   1/3, 0)
r4 <- c(0,   0,   0,   0,   1/2, 1/2)
r5 <- c(0,   0,   0,   1/2, 0,   1/2)
r6 <- c(0,   0,   0,   1,   0,   0)


A <- matrix(c(r1, r2, r3, r4, r5, r6), byrow = T, nrow = 6, ncol = 6 )
A
##           [,1]      [,2] [,3] [,4]      [,5] [,6]
## [1,] 0.0000000 0.5000000  0.5  0.0 0.0000000  0.0
## [2,] 0.0000000 0.0000000  0.0  0.0 0.0000000  0.0
## [3,] 0.3333333 0.3333333  0.0  0.0 0.3333333  0.0
## [4,] 0.0000000 0.0000000  0.0  0.0 0.5000000  0.5
## [5,] 0.0000000 0.0000000  0.0  0.5 0.0000000  0.5
## [6,] 0.0000000 0.0000000  0.0  1.0 0.0000000  0.0
# convert the transition matrix to a column stochastic matrix

A <- t(A)
A
##      [,1] [,2]      [,3] [,4] [,5] [,6]
## [1,]  0.0    0 0.3333333  0.0  0.0    0
## [2,]  0.5    0 0.3333333  0.0  0.0    0
## [3,]  0.5    0 0.0000000  0.0  0.0    0
## [4,]  0.0    0 0.0000000  0.0  0.5    1
## [5,]  0.0    0 0.3333333  0.5  0.0    0
## [6,]  0.0    0 0.0000000  0.5  0.5    0
# create the initial probability vector, which is 1/n where n = the number of websites
(r <- rep(1/6, 6))
## [1] 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667

Calculate 1 iteration

  • This should match Table 1-Iteration 1 from the wk10_PangeRank.pdf reading
A %*% r
##            [,1]
## [1,] 0.05555556
## [2,] 0.13888889
## [3,] 0.08333333
## [4,] 0.25000000
## [5,] 0.13888889
## [6,] 0.16666667

Calculate 2 iterations

  • This should match Table 1-Iteration 2 from the wk10_PangeRank.pdf reading
A %*% A %*% r
##            [,1]
## [1,] 0.02777778
## [2,] 0.05555556
## [3,] 0.02777778
## [4,] 0.23611111
## [5,] 0.15277778
## [6,] 0.19444444

Before adding decay, first confirm that the column sums of matrix A equal 1.

Matrix A

##      [,1] [,2]      [,3] [,4] [,5] [,6]
## [1,]  0.0    0 0.3333333  0.0  0.0    0
## [2,]  0.5    0 0.3333333  0.0  0.0    0
## [3,]  0.5    0 0.0000000  0.0  0.0    0
## [4,]  0.0    0 0.0000000  0.0  0.5    1
## [5,]  0.0    0 0.3333333  0.5  0.0    0
## [6,]  0.0    0 0.0000000  0.5  0.5    0
# column sums of A
colSums(A)
## [1] 1 0 1 1 1 1

We see that the column for node 2 sums to zero, which indicates a “dangling node,” i.e., a node with no outlinks. Node 2 is essentially disconnected from the remaining 5 nodes. Without accounting for this in the matrix, the power iterations in the next step will not stabilize.

To account for this, we’ll replace the vector for node 2 with the assumption that a user is equally likely to navigate to any one of the 6 web pages, i.e., a probability of 1/6 for each.
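More generally, dangling columns can be detected and patched programmatically once the matrix is in column-stochastic form. A minimal sketch; fix_dangling is a hypothetical helper, not part of the course notes:

# Replace any all-zero column (a dangling node) with a uniform 1/n column.
fix_dangling <- function(M) {
  n <- nrow(M)
  dangling <- which(colSums(M) == 0)
  if (length(dangling) > 0) M[, dangling] <- 1/n
  M
}

Below, the same fix is applied by hand: the all-zero row for node 2 is replaced with rep(1/6, 6) before transposing.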

r2_alt <- rep(1/6, 6)

# recreate the A matrix

A <- matrix(c(r1, r2_alt, r3, r4, r5, r6), byrow = T, nrow = 6, ncol = 6 )

# convert to column-stochastic
A <- t(A)

A
##      [,1]      [,2]      [,3] [,4] [,5] [,6]
## [1,]  0.0 0.1666667 0.3333333  0.0  0.0    0
## [2,]  0.5 0.1666667 0.3333333  0.0  0.0    0
## [3,]  0.5 0.1666667 0.0000000  0.0  0.0    0
## [4,]  0.0 0.1666667 0.0000000  0.0  0.5    1
## [5,]  0.0 0.1666667 0.3333333  0.5  0.0    0
## [6,]  0.0 0.1666667 0.0000000  0.5  0.5    0
# reconfirm the matrix column sums

colSums(A)
## [1] 1 1 1 1 1 1

Now, introduce decay and form the B matrix

\[ B = 0.85 \times A + \frac{0.15}{n} \]

(B <- 0.85 * A + 0.15/6)
##       [,1]      [,2]      [,3]  [,4]  [,5]  [,6]
## [1,] 0.025 0.1666667 0.3083333 0.025 0.025 0.025
## [2,] 0.450 0.1666667 0.3083333 0.025 0.025 0.025
## [3,] 0.450 0.1666667 0.0250000 0.025 0.025 0.025
## [4,] 0.025 0.1666667 0.0250000 0.025 0.450 0.875
## [5,] 0.025 0.1666667 0.3083333 0.450 0.025 0.025
## [6,] 0.025 0.1666667 0.0250000 0.450 0.450 0.025
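Since each column of A sums to 1, each column of B should sum to 0.85 + 6 × (0.15/6) = 1 as well; a quick optional check:

# decay preserves the column-stochastic property
colSums(B)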

2. Start with a uniform rank vector \(r\) and perform power iterations on \(B\) until convergence. That is, compute the solution \(r = B^n \times r\). Attempt this for a sufficiently large \(n\) so that \(r\) actually converges.

Good reference papers for this topic:

http://www.math.cornell.edu/~mec/Winter2009/RalucaRemus/Lecture2/lecture2.html
http://www.math.cornell.edu/~mec/Winter2009/RalucaRemus/Lecture3/lecture3.html

require(matrixcalc)
## Loading required package: matrixcalc
# create the vector r with the starting probabilities of hitting one out of the 6 pages
# this is equivalent to 1/6 or 1/n where n = number of web pages

(r <- rep(1/6, 6))
## [1] 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667
# for power iterations, use the matrix.power function from the package matrixcalc
# repeat using a higher number of iterations until convergence 

# two iterations

(matrix.power(B, 2) %*% r)   # equivalent to B %*% B %*% r
##            [,1]
## [1,] 0.08245370
## [2,] 0.12318287
## [3,] 0.08934028
## [4,] 0.28118056
## [5,] 0.19342593
## [6,] 0.23041667
# 20 iterations
(matrix.power(B, 20) %*% r)  
##            [,1]
## [1,] 0.05170616
## [2,] 0.07368173
## [3,] 0.05741406
## [4,] 0.34870083
## [5,] 0.19990313
## [6,] 0.26859408
# 30 iterations
(matrix.power(B, 30) %*% r)  
##            [,1]
## [1,] 0.05170475
## [2,] 0.07367927
## [3,] 0.05741242
## [4,] 0.34870367
## [5,] 0.19990381
## [6,] 0.26859607
# Convergence at n = 40 iterations
r_converge <- matrix.power(B, 40) %*% r

r_converge
##            [,1]
## [1,] 0.05170475
## [2,] 0.07367926
## [3,] 0.05741241
## [4,] 0.34870369
## [5,] 0.19990381
## [6,] 0.26859608
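As a cross-check on the choice n = 40, the same fixed point can be reached by iterating r <- B %*% r until successive vectors stop changing, rather than picking a power by hand. A minimal sketch; the tolerance and iteration cap are arbitrary choices, not from the course notes:

# Iterate r <- B %*% r until the largest change falls below a tolerance.
power_iterate <- function(B, r, tol = 1e-10, max_iter = 1000) {
  for (i in seq_len(max_iter)) {
    r_new <- B %*% r
    if (max(abs(r_new - r)) < tol) {
      message("converged after ", i, " iterations")
      return(r_new)
    }
    r <- r_new
  }
  warning("did not converge within max_iter iterations")
  r
}

power_iterate(B, rep(1/6, 6))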

3. Compute the eigen-decomposition of B and verify that you indeed get an eigenvalue of 1 as the largest eigenvalue and that its corresponding eigenvector is the same vector that you obtained in the previous power-iteration method. Further, verify that this eigenvector has all positive entries and that it sums to 1.

eigen_B <- eigen(B)

# eigen() returns some complex eigenvalues here; as.numeric() keeps only the real
# parts (hence the warning below), which is fine because the dominant eigenvalue
# of a column-stochastic matrix is real
eigenvalues_B <- as.numeric(eigen_B$values)
## Warning: imaginary parts discarded in coercion
# confirm the largest eigenvalue is 1
max(eigenvalues_B)
## [1] 1
# and the corresponding eigenvector is the same vector that you obtained in the previous power iteration method

eigenvectors_B <- as.numeric(eigen_B$vectors[, which.max(eigenvalues_B)])

An initial look at the resulting eigenvector shows a different result than expected:

eigenvectors_B
## [1] 0.1044385 0.1488249 0.1159674 0.7043472 0.4037861 0.5425377
# summing the eigenvectors_B results in a value greater than 1

sum(eigenvectors_B)
## [1] 2.019902

However, we know that an eigenvector can be scaled by any nonzero scalar and remain an eigenvector. To normalize, we’ll multiply each entry of the eigenvector by 1/sum(eigenvectors_B):

eigen_decomp <- (1/sum(eigenvectors_B))*eigenvectors_B; eigen_decomp
## [1] 0.05170475 0.07367926 0.05741241 0.34870369 0.19990381 0.26859608
# confirm all entries greater than 0
eigen_decomp > 0
## [1] TRUE TRUE TRUE TRUE TRUE TRUE
# summing the normalized eigenvectors now results in a sum of 1
sum(eigen_decomp)
## [1] 1
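As an extra sanity check, an eigenvector with eigenvalue 1 should be a fixed point of B, i.e. it should satisfy B %*% r = r:

# the normalized eigenvector should be unchanged by B (eigenvalue 1)
all.equal(as.numeric(B %*% eigen_decomp), eigen_decomp)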

4. Use the graph package in R and its page.rank method to compute the Page Rank of the graph as given in A. Note that you don’t need to apply decay. The package starts with a connected graph and applies decay internally. Verify that you do get the same PageRank vector as the two approaches above.

suppressWarnings(suppressMessages(library(igraph)))

# Create the adjacency graph from the matrix A
# Note: rebuild A in its original row form (row i = links out of page i) and use
# the original all-zero r2 row rather than r2_alt; page.rank handles the dangling
# node and the decay internally

A <- matrix(c(r1, r2, r3, r4, r5, r6), byrow = T, nrow = 6, ncol = 6 )
A
##           [,1]      [,2] [,3] [,4]      [,5] [,6]
## [1,] 0.0000000 0.5000000  0.5  0.0 0.0000000  0.0
## [2,] 0.0000000 0.0000000  0.0  0.0 0.0000000  0.0
## [3,] 0.3333333 0.3333333  0.0  0.0 0.3333333  0.0
## [4,] 0.0000000 0.0000000  0.0  0.0 0.5000000  0.5
## [5,] 0.0000000 0.0000000  0.0  0.5 0.0000000  0.5
## [6,] 0.0000000 0.0000000  0.0  1.0 0.0000000  0.0
graph_A <- graph_from_adjacency_matrix(A, weighted = T, mode = "directed")
plot(graph_A, main = "Example with 6 URLs")

# calculate PageRank using page.rank from the igraph package
pageRank <- page.rank(graph_A)$vector

pageRank
## [1] 0.05170475 0.07367926 0.05741241 0.34870369 0.19990381 0.26859608
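page.rank also exposes the damping factor it applies internally; its default of 0.85 matches the decay used to build B. Passing it explicitly gives a quick check (assuming the igraph default has not changed):

# make the 0.85 damping factor explicit; this is igraph's default
pageRank_explicit <- page.rank(graph_A, damping = 0.85)$vector
all.equal(pageRank, pageRank_explicit)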
# show equivalence between power iteration and page.rank, down to 8 decimal places
round(r_converge, 8) == round(pageRank, 8)
##      [,1]
## [1,] TRUE
## [2,] TRUE
## [3,] TRUE
## [4,] TRUE
## [5,] TRUE
## [6,] TRUE
# show equivalence between power iteration and eigen-decomposition, down to 8 decimal places
round(r_converge, 8) == round(eigen_decomp, 8)
##      [,1]
## [1,] TRUE
## [2,] TRUE
## [3,] TRUE
## [4,] TRUE
## [5,] TRUE
## [6,] TRUE

PageRank Results Comparison

Power Iteration   eigen-decomposition   igraph page.rank
      0.0517047             0.0517047          0.0517047
      0.0736793             0.0736793          0.0736793
      0.0574124             0.0574124          0.0574124
      0.3487037             0.3487037          0.3487037
      0.1999038             0.1999038          0.1999038
      0.2685961             0.2685961          0.2685961
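A comparison table like the one above can be assembled directly from the three vectors, for example with knitr::kable (assuming the knitr package is available):

comparison <- data.frame(
  `Power Iteration`     = as.numeric(r_converge),
  `eigen-decomposition` = eigen_decomp,
  `igraph page.rank`    = as.numeric(pageRank),
  check.names = FALSE
)
knitr::kable(comparison, digits = 7)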

Alternate method to calculate PageRank using magrittr pipes

library(magrittr)   # provides %>%, use_series(), and set_colnames()

pr <- graph.adjacency(A, weighted = T) %>%   # build the directed graph from A
      page.rank(directed = TRUE) %>%         # compute PageRank (decay applied internally)
      use_series("vector") %>%               # extract the rank vector
      sort(decreasing = TRUE) %>%            # order pages from highest to lowest rank
      as.matrix %>%
      set_colnames("page.rank")
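Printing pr lists the pages in descending order of PageRank; consistent with the vectors above, page 4 ranks highest (≈ 0.349), followed by pages 6 and 5.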