Final_Assignment: PageRank

Final Project (1/3) - PLAYING WITH PAGERANK

Required Libraries

library(matlib)
library(Matrix)
library(expm)
library(igraph)

This document will examine and verify that PageRank works by performing calculations on a small universe of web pages. We will analyze the 6 page universe provided in the course for the following calculations.

1. Matrices Formation

Forming the A matrix

An A matrix will be formed, then decayed to create a B matrix

#Transition Matrix for six websites(nodes)
A <- matrix(c(0,1/2,1/2,0,0,0,0,0,0,0,0,0,1/3,1/3,0,0,1/3,0,0,0,0,0,1/2,1/2,0,0,0,1/2,0,1/2,0,0,0,1,0,0),nrow  = 6)
A = t(A)
round(A, 3)

##       [,1]  [,2] [,3] [,4]  [,5] [,6]
## [1,] 0.000 0.500  0.5  0.0 0.000  0.0
## [2,] 0.000 0.000  0.0  0.0 0.000  0.0
## [3,] 0.333 0.333  0.0  0.0 0.333  0.0
## [4,] 0.000 0.000  0.0  0.0 0.500  0.5
## [5,] 0.000 0.000  0.0  0.5 0.000  0.5
## [6,] 0.000 0.000  0.0  1.0 0.000  0.0

Forming the B matrix (decay)

Using the insight of Page and Brin to handle the concept of decay (disjoint graphs). The transition matrix will become well-behaved and have rows that sum to 1. The introduction of a new matrix B which is a decayed version of the original matrix A (meaning each element of A is multiplied by a number smaller than unity) together with a uniform vector length of n which assigns some finite probability for reaching any web page at random.

To make every row add up to unity, the uniform probability is properly adjusted to be \((1-d)\) where d is the decay applied to the original transition matrix A. In the classic version by Page and Brin, they chose: \(d = 0.85\)

\[ B = d * A + \frac{1-d}{n} \\ B = 0.85 * A + \frac{1 - 0.85}{n} \\ B = 0.85 * A + \frac{0.15}{n} \]

Converting A matrix to a stochasitc matrix. Apply the decay to create a matrix B.

n=dim(A)[1]
A = apply(A,1,  function(x) if(sum(x)!=1) return(x+ (1/n)) else return(x))

B <- 0.85 * A + 0.15 / nrow(A)
round(B, 3)

##       [,1]  [,2]  [,3]  [,4]  [,5]  [,6]
## [1,] 0.025 0.167 0.308 0.025 0.025 0.025
## [2,] 0.450 0.167 0.308 0.025 0.025 0.025
## [3,] 0.450 0.167 0.025 0.025 0.025 0.025
## [4,] 0.025 0.167 0.025 0.025 0.450 0.875
## [5,] 0.025 0.167 0.308 0.450 0.025 0.025
## [6,] 0.025 0.167 0.025 0.450 0.450 0.025

2. Power Iterations

Perform a power iterations with a uniform rank vector r on the B matrix till convergence. That is, compute the solution \(r = B^n * r\). Attempt this for a sufficiently large n so that r actually converges.

Step 2a. Create an 1 x n vector \(r\) which represents the PageRank of all the n pages.

#initial rank of the URLs (vector r)
r <- matrix(c(1/6,1/6,1/6,1/6,1/6,1/6),ncol=1)
r

##           [,1]
## [1,] 0.1666667
## [2,] 0.1666667
## [3,] 0.1666667
## [4,] 0.1666667
## [5,] 0.1666667
## [6,] 0.1666667

Step 2b. Performing power iterations on B matrix till convergence (r converges at 30).

#Power iterations on decomposition matrix - 30 
br = format((B %^% 30) %*% r, scientific=F)
br = as.numeric(br)
br

## [1] 0.05170475 0.07367927 0.05741242 0.34870367 0.19990381 0.26859607

Step 2c. Performing Step 2. on a sufficiently large n so that r actually converges.

#Power iterations on decomposition matrix - 1000
format((B %^% 1000) %*% r, scientific=F)

##      [,1]        
## [1,] "0.05170475"
## [2,] "0.07367926"
## [3,] "0.05741241"
## [4,] "0.34870369"
## [5,] "0.19990381"
## [6,] "0.26859608"

3. Eigen Decomposition

Compute the eigen-decomposition of B and verify that you indeed get an eigenvalue of 1 as the largest eigenvalue and that its corresponding eigenvector is the same vector that you obtained in the previous power iteration method. Further, this eigenvector has all positive entries and it sums to 1.

Computing the eigen-decompostion on B matrix.

#Eigen value for B
eigen(B)

## eigen() decomposition
## $values
## [1]  1.00000000+0i  0.57619235+0i -0.42500000+0i -0.42500000-0i -0.34991524+0i
## [6] -0.08461044+0i
## 
## $vectors
##              [,1]          [,2]                        [,3]
## [1,] 0.1044385+0i  0.2931457+0i  2.945054e-15+5.507002e-22i
## [2,] 0.1488249+0i  0.5093703+0i -1.223015e-15-0.000000e+00i
## [3,] 0.1159674+0i  0.3414619+0i -2.241513e-15-6.032865e-22i
## [4,] 0.7043472+0i -0.5890805+0i -7.071068e-01+0.000000e+00i
## [5,] 0.4037861+0i -0.1413606+0i  7.071068e-01+0.000000e+00i
## [6,] 0.5425377+0i -0.4135367+0i  0.000000e+00-2.145851e-08i
##                             [,4]           [,5]            [,6]
## [1,]  2.945054e-15-5.507002e-22i -0.06471710+0i -0.212296003+0i
## [2,] -1.223015e-15+0.000000e+00i  0.01388698+0i  0.854071294+0i
## [3,] -2.241513e-15+6.032865e-22i  0.07298180+0i -0.363638739+0i
## [4,] -7.071068e-01+0.000000e+00i -0.66058664+0i  0.018399984+0i
## [5,]  7.071068e-01-0.000000e+00i  0.73761812+0i -0.304719509+0i
## [6,]  0.000000e+00+2.145851e-08i -0.09918316+0i  0.008182973+0i

View the sum of eigenvalue vectors

# Sum of the vector is greater than 1 for the eigen value 1
sum(eigen(B)$vectors[,1])

## [1] 2.019902+0i

# Change it to unit vector
e1 = as.numeric(eigen(B)$vectors[,1]/ sum(eigen(B)$vectors[,1]))
e1

## [1] 0.05170475 0.07367926 0.05741241 0.34870369 0.19990381 0.26859608

# Difference between power method and eigen vector is negligible
sum(br - e1)

## [1] -1e-08

print(paste0("The sum of the eigenvector is: ", sum(e1)))

## [1] "The sum of the eigenvector is: 1"

# Difference between power method and eigen vector is negligible
#sum(br - e1)
paste0("The difference between power method and eigen vector contains a negligible value of: ", sum(br - e1))

## [1] "The difference between power method and eigen vector contains a negligible value of: -1.00000000294309e-08"

paste0("The power method and eigen vector is identical, up to the Millionths: ",identical(round(e1, 6), round(br, 6)))

## [1] "The power method and eigen vector is identical, up to the Millionths: TRUE"

4. Visualization

Graph of Page Rank - A matrix : the graph package starts with a connected graph and applies decay internally.

#Converting to graph from adjacency matrix
g1 = graph_from_adjacency_matrix(t(A),weighted = T)
#Plot the graph
plot(g1)

#Resultant vector
page_rank(g1)$vector

## [1] 0.05170475 0.07367926 0.05741241 0.34870369 0.19990381 0.26859608

5. Approach Comparison - Results

All calculations: Power Iteration, Eigenvector, and PageRank produced identical values.

data.frame(url = 1:6) %>%
    dplyr::mutate(Power_Iteration = br,
                  Eigenvector = e1,
                  `Graph (page_rank)` = page_rank(g1)$vector)

##   url Power_Iteration Eigenvector Graph (page_rank)
## 1   1      0.05170475  0.05170475        0.05170475
## 2   2      0.07367927  0.07367926        0.07367926
## 3   3      0.05741242  0.05741241        0.05741241
## 4   4      0.34870367  0.34870369        0.34870369
## 5   5      0.19990381  0.19990381        0.19990381
## 6   6      0.26859607  0.26859608        0.26859608

Additional Final Reports

RPubs - Final (2/3): Digit Recognizer

RPubs - Final (3/3): House Prices - Advanced Regression Techniques Competition