Problem set 1. Playing with PageRank

You’ll verify for yourself that PageRank works by performing calculations on a small universe of web pages. Let’s use the 6 page universe that we had in the course notes. For this directed graph, perform the following calculations in R.

Form the A matrix. Then, introduce decay and form the B matrix as we did in the course notes.

library(Matrix)

library(igraph)
## 
## Attaching package: 'igraph'
## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum
## The following object is masked from 'package:base':
## 
##     union
#created a matrix just to show if there are connections between the pages
A <- matrix(data = c(0,1,1,0,0,0,0,0,0,0,0,0,1,1,0,0,1,0,0,0,0,0,1,1,0,0,0,1,0,1,0,0,0,1,0,0), 
            byrow = TRUE, nrow = 6, ncol = 6)
rownames(A) <- colnames(A) <- 1:6


decnorm <- function(M, decay=0.85){
  #Introduces decay and normalizes
  for (i in 1:nrow(M)){
    x <- sum(M[i,])
    for(j in 1:ncol(M)){
      if(x>0){
        M[i,j] <- M[i,j]/x
      }else{
        #we have a node with no outlinks, so we make it stochastic (thanks to Marco on MB)
        M[i,j] <- 1/nrow(M)
      }
      
    }
  }
  #Introduce Decay
  N <- decay * M + (1 - decay)/nrow(A)
  return(N)
}



B <- decnorm(A)
B
##           1         2         3         4         5         6
## 1 0.0250000 0.4500000 0.4500000 0.0250000 0.0250000 0.0250000
## 2 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667
## 3 0.3083333 0.3083333 0.0250000 0.0250000 0.3083333 0.0250000
## 4 0.0250000 0.0250000 0.0250000 0.0250000 0.4500000 0.4500000
## 5 0.0250000 0.0250000 0.0250000 0.4500000 0.0250000 0.4500000
## 6 0.0250000 0.0250000 0.0250000 0.8750000 0.0250000 0.0250000
#intial rank
r <- matrix(data = 1/nrow(B), nrow = nrow(B))

Start with a uniform rank vector r and perform power iterations on B till convergence. That is, compute the solution \[ B^n × r \] Attempt this for a sufficiently large n so that r actually converges.

#after one iteration I have convergence, which makes me think I did something wrong.  
crossprod(B,r)
##         [,1]
## 1 0.09583333
## 2 0.16666667
## 3 0.11944444
## 4 0.26111111
## 5 0.16666667
## 6 0.19027778
sum(crossprod(B,r))
## [1] 1
iterate <- function(M,n=50){
  #calculation to find page rank
  iter <- diag(nrow(B))
  #initial rank
  r <- matrix(data = 1/nrow(M), nrow = nrow(M))
  for(i in 1:n){
    iter <- crossprod(iter,M)
  }
  return(crossprod(iter,r))
}


x <- iterate(B,1)
x
##         [,1]
## 1 0.09583333
## 2 0.16666667
## 3 0.11944444
## 4 0.26111111
## 5 0.16666667
## 6 0.19027778
# I created a function anyway and went with 100 iterations
my_calc <- iterate(B,100)
my_calc
##         [,1]
## 1 0.05170475
## 2 0.07367926
## 3 0.05741241
## 4 0.34870369
## 5 0.19990381
## 6 0.26859608

Compute the eigen-decomposition of B and verify that you indeed get an eigenvalue of 1 as the largest eigenvalue and that its corresponding eigenvector is the same vector that you obtained in the previous power iteration method.Further, thiseigenvector has all positive entries and it sums to 1.

eigenV <- eigen(B)

#the largest eigenvalue is 1
eigenV$values
## [1]  1.00000000  0.57619235 -0.42500001 -0.42499999 -0.34991524 -0.08461044
#not sure where to go next
vec <- eigenV$vectors[,1]
vec
## [1] -0.4082483 -0.4082483 -0.4082483 -0.4082483 -0.4082483 -0.4082483

Use the graph package in R and its page.rank method to compute the Page Rank of the graph as given in A. Note that you don’t need to apply decay. The package starts with a connected graph and applies decay internally. Verify that you do get the same PageRank vector as the two approaches above.

g <- graph.adjacency(A)
get.edgelist(g)
##       [,1] [,2]
##  [1,] "1"  "2" 
##  [2,] "1"  "3" 
##  [3,] "3"  "1" 
##  [4,] "3"  "2" 
##  [5,] "3"  "5" 
##  [6,] "4"  "5" 
##  [7,] "4"  "6" 
##  [8,] "5"  "4" 
##  [9,] "5"  "6" 
## [10,] "6"  "4"
plot(g)

page.rank(g)
## $vector
##          1          2          3          4          5          6 
## 0.05170475 0.07367926 0.05741241 0.34870369 0.19990381 0.26859608 
## 
## $value
## [1] 1
## 
## $options
## NULL
graph_calc <- as.numeric(page.rank(g)$vector)
graph_calc
## [1] 0.05170475 0.07367926 0.05741241 0.34870369 0.19990381 0.26859608
df <- as.data.frame(graph_calc)
df["my_calc"] <- as.numeric(my_calc)
df
##   graph_calc    my_calc
## 1 0.05170475 0.05170475
## 2 0.07367926 0.07367926
## 3 0.05741241 0.05741241
## 4 0.34870369 0.34870369
## 5 0.19990381 0.19990381
## 6 0.26859608 0.26859608
df["my_rank"] <- rank(-df$my_calc)
df["graph_rank"] <- rank(-df$graph_calc)
df
##   graph_calc    my_calc my_rank graph_rank
## 1 0.05170475 0.05170475       6          6
## 2 0.07367926 0.07367926       4          4
## 3 0.05741241 0.05741241       5          5
## 4 0.34870369 0.34870369       1          1
## 5 0.19990381 0.19990381       3          3
## 6 0.26859608 0.26859608       2          2

My pagerank vector seems to be correct.