library(igraph)  # used to compute the Page Rank of the n square matrix A

Final Exam


Final Problem 1. 30 Points

PAGERANK: AN APPLICATION OF PROBABILITY AND LINEAR ALGEBRA

1. Playing with PageRank

You’ll verify for yourself that PageRank works by performing calculations on a small universe of web pages.

Let’s use the 6 page universe that we had in the previous discussion For this directed graph, perform the following calculations in R.

* Form the A matrix. Then, introduce decay and form the B matrix as we did in the course notes. (5 Points)

Form the A matrix
a <- c(0  , 1/2, 1/2,   0,   0,   0,
       0  ,   0,   0,   0,   0,   0,
       1/3, 1/3,   0,   0, 1/3,   0,
       0  ,   0,   0,   0, 1/2, 1/2,
       0  ,   0,   0, 1/2,   0, 1/2,
       0  ,   0,   0,   1,   0,   0)


n <- 6   # Use 6 to correspond with the 6 URLs of our test universe

A <- matrix(data = a, nrow = n, ncol = n, byrow = TRUE)

# Print the A matrix
A
#>           [,1]      [,2] [,3] [,4]      [,5] [,6]
#> [1,] 0.0000000 0.5000000  0.5  0.0 0.0000000  0.0
#> [2,] 0.0000000 0.0000000  0.0  0.0 0.0000000  0.0
#> [3,] 0.3333333 0.3333333  0.0  0.0 0.3333333  0.0
#> [4,] 0.0000000 0.0000000  0.0  0.0 0.5000000  0.5
#> [5,] 0.0000000 0.0000000  0.0  0.5 0.0000000  0.5
#> [6,] 0.0000000 0.0000000  0.0  1.0 0.0000000  0.0


Apply the decay and form the B matrix

\(B = d \times A + \frac{(1-d)}{n}\)

d <- 0.85  # chosen decay d = 0.85 (a.k.a damping factor)

B1 <- d * A + ( 1 - d ) / n

B1
#>           [,1]      [,2]  [,3]  [,4]      [,5]  [,6]
#> [1,] 0.0250000 0.4500000 0.450 0.025 0.0250000 0.025
#> [2,] 0.0250000 0.0250000 0.025 0.025 0.0250000 0.025
#> [3,] 0.3083333 0.3083333 0.025 0.025 0.3083333 0.025
#> [4,] 0.0250000 0.0250000 0.025 0.025 0.4500000 0.450
#> [5,] 0.0250000 0.0250000 0.025 0.450 0.0250000 0.450
#> [6,] 0.0250000 0.0250000 0.025 0.875 0.0250000 0.025


* Start with a uniform rank vector r and perform power iterations on B until convergence. That is, compute the solution \(r = B^n \times r\). Attempt this for a sufficiently large \(n\) so that \(r\) actually converges. (5 Points)


Define function that will compute the solution to \(r = B^n \times r\)
pageRank <- function(B, MaxIterations = 100) {

  urls        <- dim(B)[1]
  kth_power_B <- B
  i           <- 0
  converged_i <- TRUE
  converged   <- FALSE
  
  # define initial uniform rank vector r
  r <- r_f <- matrix( rep( 1 / urls, urls ), nrow = urls, ncol = 1)
  
  #while ( (converged != TRUE) & (i < MaxIterations) ) {
  while ( (converged != converged_i) & (i < MaxIterations) ) {
    
    # perform a new iteration because r_f has not yet converged
  
    converged_i <- converged
    
    # new ith r vector
    r_i <- r                         
    
    # matrix multiplication
    kth_power_B < kth_power_B %*% B  
    
    # probability of finding a user on a page/url
    r <- kth_power_B %*% r           
    
    r_f <- r
    
    # increase iteration counter
    i <- i + 1

    # check if the new r_f vector converged to the previous ith r vector
    converged <- all.equal(r_i, r_f)

  }
  
  # after the r_f converged, normalize it
  r <- r_f / sqrt( sum( r_f ^ 2 ) )
    
  output <- list("Converged" = converged,
                 "Iterations" = i,
                 "r" = r,
                 "Stabilized_B" =  kth_power_B)
  
  return(output)
}


Perform power iterations on B until convergence of r is achieved or max number of iterations is achieved.
r_ranking <- pageRank(B1, 100)


After \(n = 23\) iterations, the converged \(r_f\) vector is:

r_ranking["r"]
#> $r
#>           [,1]
#> [1,] 0.2159123
#> [2,] 0.0572329
#> [3,] 0.2980792
#> [4,] 0.5358032
#> [5,] 0.5358032
#> [6,] 0.5358032


The stabilized B matrix is:

r_ranking["Stabilized_B"]
#> $Stabilized_B
#>           [,1]      [,2]  [,3]  [,4]      [,5]  [,6]
#> [1,] 0.0250000 0.4500000 0.450 0.025 0.0250000 0.025
#> [2,] 0.0250000 0.0250000 0.025 0.025 0.0250000 0.025
#> [3,] 0.3083333 0.3083333 0.025 0.025 0.3083333 0.025
#> [4,] 0.0250000 0.0250000 0.025 0.025 0.4500000 0.450
#> [5,] 0.0250000 0.0250000 0.025 0.450 0.0250000 0.450
#> [6,] 0.0250000 0.0250000 0.025 0.875 0.0250000 0.025


* Compute the eigen-decomposition of \(B\) and verify that you indeed get an eigenvalue of \(1\) as the largest eigenvalue and that its corresponding eigenvector is the same vector that you obtained in the previous power iteration method. Further, this eigenvector has all positive entries and it sums to \(1\).(10 points)


Compute the eigen-decomposition of \(B\)
eigen_decomp <- eigen(B1)

eigen_decomp
#> eigen() decomposition
#> $values
#> [1]  0.95165271 -0.42500000 -0.42500000  0.41436582 -0.34733611 -0.01868242
#> 
#> $vectors
#>            [,1]          [,2]          [,3]        [,4]         [,5]
#> [1,] -0.2159123  5.345225e-01  5.345225e-01 -0.77869913 -0.775079691
#> [2,] -0.0572329 -3.580707e-17 -3.124265e-17 -0.07537631  0.009090375
#> [3,] -0.2980792 -5.345225e-01 -5.345225e-01 -0.61034825  0.631781588
#> [4,] -0.5358032  2.672612e-01  2.672612e-01  0.07169632  0.002637034
#> [5,] -0.5358032  2.672612e-01  2.672612e-01  0.07169632  0.002637034
#> [6,] -0.5358032 -5.345225e-01 -5.345225e-01  0.07169632  0.002637034
#>             [,6]
#> [1,] -0.55060201
#> [2,]  0.61511270
#> [3,] -0.56386946
#> [4,]  0.01322899
#> [5,]  0.01322899
#> [6,]  0.01322899


Find the largest eigenvalue of the eigen-decomposition
maxEigenVal <- max(eigen_decomp$values)

maxEigenVal
#> [1] 0.9516527

The maximum value of the eigen-decomposition is not \(1\), but it is close to it.


Compare the eigenvector of the highest eigenvalue with the \(r\) vector of the power iteration method
eigenvec <- eigen_decomp$vectors[,1]

comp_vecs <- data.frame(
  "r" = r_ranking$r,
  "eigenvector" = eigenvec
)

comp_vecs
#>           r eigenvector
#> 1 0.2159123  -0.2159123
#> 2 0.0572329  -0.0572329
#> 3 0.2980792  -0.2980792
#> 4 0.5358032  -0.5358032
#> 5 0.5358032  -0.5358032
#> 6 0.5358032  -0.5358032

We notice that the values of the eigenvector are all negative but they have the same magnitude as the values of the \(r\) vector of the power iteration method.


Check if eigenvector has all positive entries and it sums to \(1\).
# sum all elements of the eigenvector
sum(eigenvec)
#> [1] -2.178634
# sum all elements of the eigenvector in normalized form
sum(1/sum(eigenvec) * eigenvec)
#> [1] 1

From the results above, we can see that all the elements of the eigenvector are negative and do not add up to \(1\). However, we can see that the normalized version of the eigenvector does add up to \(1\).


* Use the \(graph\) package in \(R\) and its \(page.rank\) method to compute the Page Rank of the graph as given in A. Note that you don’t need to apply decay. The package starts with a connected graph and applies decay internally. Verify that you do get the same PageRank vector as the two approaches above. (10 points)


Create a directed graph from the A adjacency matrix
A_graph <- igraph::graph_from_adjacency_matrix( adjmatrix = A, mode = "directed", weighted = TRUE)


Plot the graph
plot(A_graph)


Compute the Page Rank of the graph
igraphPageRank <- igraph::page.rank(A_graph)

as.matrix(igraphPageRank$vector)
#>            [,1]
#> [1,] 0.05170475
#> [2,] 0.07367926
#> [3,] 0.05741241
#> [4,] 0.34870369
#> [5,] 0.19990381
#> [6,] 0.26859608


Verify that you do get the same PageRank vector as the two approaches above.
comp_vecs <- data.frame(
  "r" = r_ranking$r,
  "eigenvector" = eigenvec,
  "igraph_vector" = igraphPageRank$vector
)

comp_vecs
#>           r eigenvector igraph_vector
#> 1 0.2159123  -0.2159123    0.05170475
#> 2 0.0572329  -0.0572329    0.07367926
#> 3 0.2980792  -0.2980792    0.05741241
#> 4 0.5358032  -0.5358032    0.34870369
#> 5 0.5358032  -0.5358032    0.19990381
#> 6 0.5358032  -0.5358032    0.26859608

From the results above, we can see that the PageRank vector is not the same as the vectors obtained in the two previous approaches above.



---
title: "DATA605 - Final Exam - Problem 1"
author: "Esteban Aramayo"
date: "2022-05-22"
output: openintro::lab_report
---

```{r global-options, include=FALSE}
knitr::opts_chunk$set(echo=TRUE, warning=FALSE, message=FALSE,
                      collapse = FALSE,
                      comment = "#>" )
```

```{r}
library(igraph)  # used to compute the Page Rank of the n square matrix A
```



# Final Exam


<br>

## Final Problem 1. 30 Points

### PAGERANK: AN APPLICATION OF PROBABILITY AND LINEAR ALGEBRA

### 1. Playing with PageRank

You'll verify for yourself that PageRank works by performing calculations on a small universe of web pages.

Let's use the 6 page universe that we had in the previous discussion For this directed graph, perform the following calculations in R.


<img src="Toy_Example_6_URL.png" style="width:160px;height=80px">

<!-- ![Toy Example with 6 URLs](Toy_Example_6_URL.png){width=50%}{height=50%} -->


#### * Form the A matrix. Then, introduce decay and form the B matrix as we did in the course notes. (5 Points)


##### Form the A matrix


```{r}
a <- c(0  , 1/2, 1/2,   0,   0,   0,
       0  ,   0,   0,   0,   0,   0,
       1/3, 1/3,   0,   0, 1/3,   0,
       0  ,   0,   0,   0, 1/2, 1/2,
       0  ,   0,   0, 1/2,   0, 1/2,
       0  ,   0,   0,   1,   0,   0)


n <- 6   # Use 6 to correspond with the 6 URLs of our test universe

A <- matrix(data = a, nrow = n, ncol = n, byrow = TRUE)

# Print the A matrix
A

```


<!-- From the output above, we notice that the 2nd row of the A matrix contains all zeroes. This is because the node 2 of our URL universe has no outgoing links. This can cause convergence problems when performing power iterations of the A matrix to generate the B matrix. To address this potential issue, we will assign a value of 1/n to each element of the 2nd row to ensure that the probability of visiting any of the $n$ URLs from node 2 is equally probable and also will allow us to ensure that the total probability of that row adds up to 1. -->

<!-- Redefine the A matrix -->

```{r eval=FALSE, echo=FALSE}
# a <- c(0  , 1/2, 1/2,   0,   0,   0,
#        1/n, 1/n, 1/n, 1/n, 1/n, 1/n,
#        1/3, 1/3,   0,   0, 1/3,   0,
#        0  ,   0,   0,   0, 1/2, 1/2,
#        0  ,   0,   0, 1/2,   0, 1/2,
#        0  ,   0,   0,   1,   0,   0)
# 
# A <- matrix(data = a, nrow = n, ncol = n, byrow = TRUE)
# 
# # Print the A matrix
# A
```



<br>

##### Apply the decay and form the B matrix

$B = d \times A + \frac{(1-d)}{n}$

```{r}
d <- 0.85  # chosen decay d = 0.85 (a.k.a damping factor)

B1 <- d * A + ( 1 - d ) / n

B1
```



<br>

#### * Start with a uniform rank vector r and perform power iterations on B until convergence. That is, compute the solution $r = B^n \times r$. Attempt this for a sufficiently large $n$ so that $r$ actually converges. (5 Points)

<br>

##### Define function that will compute the solution to $r = B^n \times r$

```{r}
pageRank <- function(B, MaxIterations = 100) {

  urls        <- dim(B)[1]
  kth_power_B <- B
  i           <- 0
  converged_i <- TRUE
  converged   <- FALSE
  
  # define initial uniform rank vector r
  r <- r_f <- matrix( rep( 1 / urls, urls ), nrow = urls, ncol = 1)
  
  #while ( (converged != TRUE) & (i < MaxIterations) ) {
  while ( (converged != converged_i) & (i < MaxIterations) ) {
    
    # perform a new iteration because r_f has not yet converged
  
    converged_i <- converged
    
    # new ith r vector
    r_i <- r                         
    
    # matrix multiplication
    kth_power_B < kth_power_B %*% B  
    
    # probability of finding a user on a page/url
    r <- kth_power_B %*% r           
    
    r_f <- r
    
    # increase iteration counter
    i <- i + 1

    # check if the new r_f vector converged to the previous ith r vector
    converged <- all.equal(r_i, r_f)

  }
  
  # after the r_f converged, normalize it
  r <- r_f / sqrt( sum( r_f ^ 2 ) )
    
  output <- list("Converged" = converged,
                 "Iterations" = i,
                 "r" = r,
                 "Stabilized_B" =  kth_power_B)
  
  return(output)
}

```

<br>

##### Perform power iterations on B until convergence of r is achieved or max number of iterations is achieved.

```{r}
r_ranking <- pageRank(B1, 100)
```

<br>

After $n = `r r_ranking["Iterations"]`$ iterations, the converged $r_f$ vector is:

```{r}
r_ranking["r"]
```

<br>

The stabilized B matrix is:

```{r}
r_ranking["Stabilized_B"]
```




<br>

#### * Compute the eigen-decomposition of $B$ and verify that you indeed get an eigenvalue of $1$ as the largest eigenvalue and that its corresponding eigenvector is the same vector that you obtained in the previous power iteration method. Further, this eigenvector has all positive entries and it sums to $1$.(10 points)

<br>

##### Compute the eigen-decomposition of $B$

```{r}
eigen_decomp <- eigen(B1)

eigen_decomp
```

<br>

##### Find the largest eigenvalue of the eigen-decomposition

```{r}
maxEigenVal <- max(eigen_decomp$values)

maxEigenVal
```

The maximum value of the eigen-decomposition is not $1$, but it is close to it.

<br>

##### Compare the eigenvector of the highest eigenvalue with the $r$ vector of the power iteration method

```{r}

eigenvec <- eigen_decomp$vectors[,1]

comp_vecs <- data.frame(
  "r" = r_ranking$r,
  "eigenvector" = eigenvec
)

comp_vecs

```

We notice that the values of the eigenvector are all negative but they have the same magnitude as the values of the $r$ vector of the power iteration method.

<br>

##### Check if eigenvector has all positive entries and it sums to $1$.

```{r}
# sum all elements of the eigenvector
sum(eigenvec)

# sum all elements of the eigenvector in normalized form
sum(1/sum(eigenvec) * eigenvec)
```

From the results above, we can see that **all the elements of the eigenvector are negative** and do not add up to $1$. However, we can see that **the normalized version of the eigenvector** does add up to $1$.



<br>

#### * Use the $graph$ package in $R$ and its $page.rank$ method to compute the Page Rank of the graph as given in A. Note that you don't need to apply decay. The package starts with a connected graph and applies decay internally. Verify that you do get the same PageRank vector as the two approaches above. (10 points)

<br>

##### Create a directed graph from the A adjacency matrix


```{r}
A_graph <- igraph::graph_from_adjacency_matrix( adjmatrix = A, mode = "directed", weighted = TRUE)
```

<br>

##### Plot the graph

```{r}
plot(A_graph)
```

<br>

##### Compute the Page Rank of the graph

```{r}
igraphPageRank <- igraph::page.rank(A_graph)

as.matrix(igraphPageRank$vector)
```

<br>

##### Verify that you do get the same PageRank vector as the two approaches above.

```{r}
comp_vecs <- data.frame(
  "r" = r_ranking$r,
  "eigenvector" = eigenvec,
  "igraph_vector" = igraphPageRank$vector
)

comp_vecs
```


From the results above, we can see that the PageRank vector is not the same as the vectors obtained in the two previous approaches above.

<br>

<br>

