Data 605 - Week 10

1. Playing With Page Rank

From the course notes, Form the \(A\) matrix. Then, introduce decay and form the B matrix as we did in the course notes.

The network, as defined in the class notes, is below.

Forming the A matrix, where \(A_{ij} = \frac{1}{|P_i|}\), i.e. - the probability of randomly going from i \(\rightarrow\) j.

A <- matrix(c(
    0, 1/2, 1/2, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 
    1/3, 1/3, 0, 0, 1/3, 0,
    0, 0, 0, 0, 1/2, 1/2,
    0, 0, 0, 1/2, 0, 1/2,
    0, 0, 0, 1, 0, 0
    ),
    byrow = TRUE, nrow = 6
)

Our matrix is not stochastic, it has a disjointed node, 2. URL 2 is an island - it has no way off via an outgoing URL link. In order to for the matrix to converge and have a eigen value of 1 (under the Perron Frobenius Theorem), we need to do a little operation before forming the decay matrix \(B\).

The probability of randomly landing on a disconnected URL, in our case URL 2, is the same probability of randomly landing on any URL. Therefore, the disconnected row has the probability of \(\frac{1}{n}\), where \(n\) is the size of the URL-verse

(A_stoc <- (apply(A, 1, sum)==0)*1/nrow(A)+A)

##           [,1]      [,2]      [,3]      [,4]      [,5]      [,6]
## [1,] 0.0000000 0.5000000 0.5000000 0.0000000 0.0000000 0.0000000
## [2,] 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667
## [3,] 0.3333333 0.3333333 0.0000000 0.0000000 0.3333333 0.0000000
## [4,] 0.0000000 0.0000000 0.0000000 0.0000000 0.5000000 0.5000000
## [5,] 0.0000000 0.0000000 0.0000000 0.5000000 0.0000000 0.5000000
## [6,] 0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000000

# ensuring our matrix is stochastic:
apply(A_stoc, 1, sum)

## [1] 1 1 1 1 1 1

The matrix \(B\) will adjust our altered stochastic matrix \(A\) to also account for the probability of randomly landing on any of the webpages randomly. The decay matrix will take the form of \(B = 0.85A + \frac{0.15}{n}\)

(B <- 0.85*A_stoc + 0.15/(nrow(A)))

##           [,1]      [,2]      [,3]      [,4]      [,5]      [,6]
## [1,] 0.0250000 0.4500000 0.4500000 0.0250000 0.0250000 0.0250000
## [2,] 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667
## [3,] 0.3083333 0.3083333 0.0250000 0.0250000 0.3083333 0.0250000
## [4,] 0.0250000 0.0250000 0.0250000 0.0250000 0.4500000 0.4500000
## [5,] 0.0250000 0.0250000 0.0250000 0.4500000 0.0250000 0.4500000
## [6,] 0.0250000 0.0250000 0.0250000 0.8750000 0.0250000 0.0250000

Start with a uniform rank vector r and perform power iterations on B till convergence. That is, compute the solution \(r = B^n\times r\). Attempt this for a sufficiently large \(n\) so that r actually converges.

Since the initial rank of all pages is \(1/(\Sigma\:pages)\), all pages will have an initial rank of 0.167.

We will run power iterations on B until it converges within an error of 1/1e8.

r_i <- matrix(c(rep(1/nrow(A), nrow(A))), ncol = 1)

converge <- matrix(rep(1/1e8, nrow(A)), ncol = 1)
n <- 1
compare <- matrix(rep(0, nrow(A)), ncol = 1)

while(sum(abs(t(B)%^%n%*%r_i-compare)>converge)!=0) {
    compare <- t(B)%^%n%*%r_i
    n <- n+1
}
(n)

## [1] 30

Ensuring that our matrix is converging and not decaying towards 0, we inspect the matrix before and after our resulted \(30^{th}\) iteration.

(t(B)%^%(n-1)%*%r_i)

##            [,1]
## [1,] 0.05170476
## [2,] 0.07367928
## [3,] 0.05741242
## [4,] 0.34870367
## [5,] 0.19990381
## [6,] 0.26859607

(t(B)%^%(n+1)%*%r_i)

##            [,1]
## [1,] 0.05170475
## [2,] 0.07367927
## [3,] 0.05741242
## [4,] 0.34870368
## [5,] 0.19990381
## [6,] 0.26859608

From the output it is clear that out iterations are causing the Page Rank \(r\) to converge.

Compute the eigen-decomposition of B and verify that you indeed get an eigenvalue of 1 as the largest eigenvalue and that its corresponding eigenvector is the same vector that you obtained in the previous power iteration method. Further, this eigenvector has all positive entries and it sums to 1.

Confirming that the largest eigenvalue of our matrix \(B\) is 1. There are imaginary rows in the eigen column vectors, so the values must have been coerced to imaginary with all imaginary portions of 0:

#eigen object for B
eig_B <- eigen(t(B))

eig_B$values

## [1]  1.00000000+0i  0.57619235+0i -0.42500000+0i -0.42500000-0i
## [5] -0.34991524+0i -0.08461044+0i

We can see that the first column of our eiganvalues is indeed 1.

Testing to see if the corresponding column of the eigenvector matches our solution for page rank:

(test_equal <- data.frame(comp_solution = t(B)%^%(n-1)%*%r_i,
                          eigenVector = as.numeric(eig_B$vectors[,1])))

##   comp_solution eigenVector
## 1    0.05170476   0.1044385
## 2    0.07367928   0.1488249
## 3    0.05741242   0.1159674
## 4    0.34870367   0.7043472
## 5    0.19990381   0.4037861
## 6    0.26859607   0.5425377

Our solution is not normalized, unlike R’s eigenvector output. Normalizing our solution and comparing to the eigenvector:

r <- t(B)%^%(n-1)%*%r_i
r_norm <- r/sqrt(sum(r^2))
test_equal$comp_solution <- r_norm
test_equal

##   comp_solution eigenVector
## 1     0.1044385   0.1044385
## 2     0.1488249   0.1488249
## 3     0.1159675   0.1159674
## 4     0.7043472   0.7043472
## 5     0.4037861   0.4037861
## 6     0.5425377   0.5425377

We see that our solution \(r\) is equal to the eigenvector or the corresponding eigenvalue = 1.

Checking to see that our \(r\) solution is stochastic, i.e. sums to 1:

sum(r)

## [1] 1

Use the graph package in R and its page.rank method to compute the Page Rank of the graph as given in A. Note that you don’t need to apply decay. The package starts with a connected graph and applies decay internally. Verify that you do get the same PageRank vector as the two approaches above.

The page rank algorithm is available from the igraph package. The mapping of the network is shown below:

url_graph <- graph.adjacency(A, mode = "directed", weighted = TRUE)
plot(url_graph)

Testing whether igraph’s page.rank produces the same results as our previous methods above:

p_rank <- page.rank(url_graph)$vector
data.frame(comp_solution = r, igraph_solution = p_rank)

##   comp_solution igraph_solution
## 1    0.05170476      0.05170475
## 2    0.07367928      0.07367926
## 3    0.05741242      0.05741241
## 4    0.34870367      0.34870369
## 5    0.19990381      0.19990381
## 6    0.26859607      0.26859608

We see that the solution provided by page.rank is the same up to 7 decimal places as our computational solution. The page.rank alogorithm may use approximations due to possibile large matricies it could have to solve.

Data 605 - Week 10

Liam Byrne

April 9, 2017

1. Playing With Page Rank