CUNY Data 605 - Assignment# 10

This document pertains to assignment# 10 and is meant to explore the PageRank. We will do so by using various method for calcuating the PageRank of a limited universe of 6 websites.

The relations between the websites is as follows:

Site	connects to
1	2
1	3
2	none
3	1
3	2
3	5
4	5
4	6
5	4
5	6
6	4

These relationships can be represented in the following Adjancy Matrix.

connections <- c(0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0)

ad_A <- matrix(connections, nrow=6, ncol=6, byrow = TRUE)

ad_A

##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    0    1    1    0    0    0
## [2,]    0    0    0    0    0    0
## [3,]    1    1    0    0    1    0
## [4,]    0    0    0    0    1    1
## [5,]    0    0    0    1    0    1
## [6,]    0    0    0    1    0    0

From examining the adjancy matrix and the sites relationships, it is clear that we have node 2 as a dandling node. There are no outlink from node 2. We will remedy this by replacing row 2 in Matrix by value 1.

# Handling dangling node; 2, by adding 1 all accross in row 2

ad_A_fix<- ad_A
ad_A_fix[2, ] <- c(1, 1, 1, 1, 1, 1)

ad_A_fix

##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    0    1    1    0    0    0
## [2,]    1    1    1    1    1    1
## [3,]    1    1    0    0    1    0
## [4,]    0    0    0    0    1    1
## [5,]    0    0    0    1    0    1
## [6,]    0    0    0    1    0    0

We will now convert our matrix into a row-wise stochastic matrix by dividing each row by the number of outlinks for that node. For node 2 (i.e. row 2, we will divide by 6 so that sum for the row equal to 1).

#Convert Adjancy matrix into stochastic matrix by dividing each rows by the number of outlinks for this node, 
#sum of each row is 1, for row 2 we will divide by number of nodes.

sto_A <- ad_A_fix
sto_A[1,] <- ad_A_fix[1,]/2
sto_A[2,] <- ad_A_fix[2,]/6
sto_A[3,] <- ad_A_fix[3,]/3
sto_A[4,] <- ad_A_fix[4,]/2
sto_A[5,] <- ad_A_fix[5,]/2
sto_A[6,] <- ad_A_fix[6,]/1

sto_A

##           [,1]      [,2]      [,3]      [,4]      [,5]      [,6]
## [1,] 0.0000000 0.5000000 0.5000000 0.0000000 0.0000000 0.0000000
## [2,] 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667
## [3,] 0.3333333 0.3333333 0.0000000 0.0000000 0.3333333 0.0000000
## [4,] 0.0000000 0.0000000 0.0000000 0.0000000 0.5000000 0.5000000
## [5,] 0.0000000 0.0000000 0.0000000 0.5000000 0.0000000 0.5000000
## [6,] 0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000000

We will now introduce the “decay” element and construc the matrix B based on following formula:
\(B\quad =\quad \alpha \cdot A\quad +\quad \frac { (1-\alpha ) }{ n } \cdot E,\quad where\quad E\quad is\quad matrix\quad with\quad all\quad 1's\).
We will use \(\alpha =0.85\)

ones <- rep(1, 36)

E <- matrix (ones, nrow=6, ncol=6, byrow=TRUE)
E

##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    1    1    1    1    1
## [2,]    1    1    1    1    1    1
## [3,]    1    1    1    1    1    1
## [4,]    1    1    1    1    1    1
## [5,]    1    1    1    1    1    1
## [6,]    1    1    1    1    1    1

a <- 0.85
n <- 6
B <- a*sto_A + (1-a)/n*E

B

##           [,1]      [,2]      [,3]      [,4]      [,5]      [,6]
## [1,] 0.0250000 0.4500000 0.4500000 0.0250000 0.0250000 0.0250000
## [2,] 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667
## [3,] 0.3083333 0.3083333 0.0250000 0.0250000 0.3083333 0.0250000
## [4,] 0.0250000 0.0250000 0.0250000 0.0250000 0.4500000 0.4500000
## [5,] 0.0250000 0.0250000 0.0250000 0.4500000 0.0250000 0.4500000
## [6,] 0.0250000 0.0250000 0.0250000 0.8750000 0.0250000 0.0250000

We will now perform the power interactions on matrix B untill we have convergence. We will do so by calling the following function. This function will calculate successive\({ r }_{ i+1 }\quad =\quad { r }_{ i }\cdot B\) until we have \({ r }_{ i+1 }={ r }_{ i }\).

B_converge <- function(B, r, l){
  # Calculate r%*%B until convergence
  r_i<- r
  n <- 0
  loop_indicator <- TRUE
  while (loop_indicator){
    n <- n + 1
#   r_j <- B%*%r_i
    r_j <- r_i%*%B
    if (all(r_j == r_i)){
      loop_indicator <- FALSE  # end while loop when convergence occurred
      result <- list(r_i, n-1, 'converged')   # return results
    }else if(n>= l){
      loop_indicator <- FALSE  # end while loop when convergence occurred
      result <- list(r_i, n-1, 'limit')   # return results
    } #end of if else
    r_i<-r_j
  } #end of while loop
  return(result)
}

We will start calling the function with a limit of 100 (the iterations will end eventhough we have no convergence). We will start with a uniform rank vector, c(1/6, 1/6, 1/6, 1/6, 1/6, 1/6)

r0 <- c(1/6, 1/6, 1/6, 1/6, 1/6, 1/6)

res1 <- B_converge (B, r0, 100)
res1[[3]]

## [1] "converged"

r_final <- res1[[1]]
r_final

##            [,1]       [,2]       [,3]      [,4]      [,5]      [,6]
## [1,] 0.05170475 0.07367926 0.05741241 0.3487037 0.1999038 0.2685961

We wil now calculate the (left) eigen values and vectors for the matrix B.

b_eigen_vectors <- eigen(t(B))$vectors
b_eigen_values <- eigen(t(B))$values

b_eigen_values

## [1]  1.00000000+0i  0.57619235+0i -0.42500000+0i -0.42500000-0i
## [5] -0.34991524+0i -0.08461044+0i

b_eigen_vectors

##              [,1]          [,2]                      [,3]
## [1,] 0.1044385+0i  0.2931457+0i  2.486934e-15+0.0000e+00i
## [2,] 0.1488249+0i  0.5093703+0i -8.528385e-16-6.9832e-23i
## [3,] 0.1159674+0i  0.3414619+0i -1.930646e-15-0.0000e+00i
## [4,] 0.7043472+0i -0.5890805+0i -7.071068e-01+0.0000e+00i
## [5,] 0.4037861+0i -0.1413606+0i  7.071068e-01+0.0000e+00i
## [6,] 0.5425377+0i -0.4135367+0i  0.000000e+00-1.7058e-08i
##                           [,4]           [,5]            [,6]
## [1,]  2.486934e-15-0.0000e+00i -0.06471710+0i -0.212296003+0i
## [2,] -8.528385e-16+6.9832e-23i  0.01388698+0i  0.854071294+0i
## [3,] -1.930646e-15+0.0000e+00i  0.07298180+0i -0.363638739+0i
## [4,] -7.071068e-01+0.0000e+00i -0.66058664+0i  0.018399984+0i
## [5,]  7.071068e-01-0.0000e+00i  0.73761812+0i -0.304719509+0i
## [6,]  0.000000e+00+1.7058e-08i -0.09918316+0i  0.008182973+0i

v1 <- Re(b_eigen_vectors[,1])

As expected, we have the largest eignvalue (in absolute value) = 1 and the corresponding vector v1.

We will now compare this vector with the vector we found by convergence.

v2 <- v1/r_final
v2

##          [,1]     [,2]     [,3]     [,4]     [,5]     [,6]
## [1,] 2.019902 2.019902 2.019902 2.019902 2.019902 2.019902

Since their are a scalar multiple of each other, we can conclude that the eigen vector and the vector found by convergence are the same.
We will now find the PageRank by using the igraph package.

library(igraph)

## 
## Attaching package: 'igraph'

## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum

## The following object is masked from 'package:base':
## 
##     union

g <- graph_from_adjacency_matrix(ad_A, 'directed', weighted=TRUE)
plot(g)

v3 <- page_rank(g)$vector
v3

## [1] 0.05170475 0.07367926 0.05741241 0.34870369 0.19990381 0.26859608

v3/r_final

##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    1    1    1    1    1

CUNY Data 605 - Assignment# 10

Valerie Briot

November 18, 2016