Answer:
Next, create a link (or connectivity) matrix A in R/RStudio that represents this system of equations. Matrix A should be a column stochastic matrix. (Note: a "column stochastic" matrix has all non-negative entries, and each of its columns sums to 1 (one).)
Question a. What is the link (or connectivity) matrix A that represents this system of
equations?
Question b. Is the link matrix "column-stochastic"? Why?

## 3. Set up the link (connectivity) matrix
## each column of A lists the outbound-link weights of one page
A <- matrix(c(0, 0, 1, 1/2, 1/3, 0, 0, 0, 1/3, 1/2, 0, 1/2, 1/3, 1/2, 0, 0), 4, 4, byrow = TRUE)
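One way to verify the column-stochastic property directly in R (expected results noted in the comments):
## check the column-stochastic property of A
all(A >= 0)    # expect TRUE: all entries are non-negative
colSums(A)     # expect each column sum to equal 1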
Find the eigenvalues and eigenvectors for the matrix A using the eigen() function we have discussed in R/RStudio. (Note: in general the eigenvectors produced will have both real and imaginary parts. We are only interested in the eigenvector associated with the eigenvalue 1; it has only real components.) To obtain the eigenvector of whole numbers shown in Bryan and Leise's paper, multiply the eigenvector returned by eigen() by 31 (remember that eigenvectors are not unique). This returns the desired eigenvector [12 4 9 6]^T.
Last, scale this eigenvector to obtain the individual "importance score values". You can use the sum of the whole-number eigenvector components to do this. Answer:
## Find and normalize the dominant eigenvector to compute the pageRanks
ea <- eigen(A)
ea
## eigen() decomposition
## $values
## [1] 1.0000000+0.0000000i -0.3606233+0.4109755i -0.3606233-0.4109755i
## [4] -0.2787533+0.0000000i
##
## $vectors
## [,1] [,2] [,3]
## [1,] 0.7210101+0i -0.7552157+0.0000000i -0.7552157+0.0000000i
## [2,] 0.2403367+0i 0.3036721+0.3460725i 0.3036721-0.3460725i
## [3,] 0.5407576+0i 0.0931532-0.2746779i 0.0931532+0.2746779i
## [4,] 0.3605051+0i 0.3583904-0.0713946i 0.3583904+0.0713946i
## [,4]
## [1,] 0.5064856+0i
## [2,] -0.6056557+0i
## [3,] -0.3815392+0i
## [4,] 0.4807092+0i
# normalize the dominant eigenvector so its components sum to 1
ea_vectors <- ea$vectors[, 1]   # eigenvector for the dominant eigenvalue 1
sca <- sum(ea_vectors)          # sum over the dominant eigenvector only, not all of ea$vectors
ea_vectors <- ea_vectors / sca
ea_vectors
## [1] 0.3870968+0i 0.1290323+0i 0.2903226+0i 0.1935484+0i
ea_vectors <- Re(ea_vectors)   # drop the zero imaginary parts
ea_vectors * 31                # recover the whole-number eigenvector
## [1] 12 4 9 6
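As a quick sanity check, the whole-number vector should be fixed by the link matrix, since it is an eigenvector for eigenvalue 1:
## verify that A x = x for the eigenvalue-1 eigenvector
x <- ea_vectors * 31
A %*% x    # expect (12, 4, 9, 6) back, up to rounding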
## automate the process to determine pageRank
##
## first set-up a representative webgraph
##
library(igraph)
## Warning: package 'igraph' was built under R version 3.4.4
##
## Attaching package: 'igraph'
## The following objects are masked from 'package:stats':
##
## decompose, spectrum
## The following object is masked from 'package:base':
##
## union
g <- graph(c(1, 2, 1, 3, 1, 4, 2, 3, 2, 4, 3, 1, 4, 1, 4, 3), directed = TRUE)
# plot the webgraph
tkplot(g)
## [1] 1
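The "## [1] 1" above is just the id of the interactive window that tkplot() opens. For a static plot, e.g. in a knitted document, igraph's plot() method can be used instead:
## static alternative to the interactive tkplot() window
plot(g)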
## (additional solution) 4. Use the first method to compute the pageRanks
library(matlib)
## Warning: package 'matlib' was built under R version 3.4.4
Imatrix4 <- diag(4)                        # 4 x 4 identity matrix
ea2 <- gaussianElimination(A - Imatrix4)   # row-reduce A - I to expose the null space
ea2[4, 4] = -1                             # last RREF row is zero; mark the free variable
ea2_wholeNumbers <- ea2 * -6               # clear the fractions to get whole numbers
ea2_wholeNumbers
## [,1] [,2] [,3] [,4]
## [1,] -6 0 0 12
## [2,] 0 -6 0 4
## [3,] 0 0 -6 9
## [4,] 0 0 0 6
ea2_scaled <- ea2_wholeNumbers/31
ea2_scaled[,4]
## [1] 0.3870968 0.1290323 0.2903226 0.1935484
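As a cross-check on this Gaussian-elimination solution, the scaled vector should lie in the null space of A − I:
## sanity check: (A - I) applied to the page-rank vector should be ~0
max(abs((A - Imatrix4) %*% ea2_scaled[, 4]))   # expect ~0, up to floating-point error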
####################################
Question a. Briefly discuss what the "importance scores" or page ranks of the nodes in
the web shown in Figure 2.1 are. Include in your discussion how the nodes are
ordered by page rank and why they are ordered in this way.
## This finds the modified matrix M
B = get.adjacency(g, sparse = FALSE)   # adjacency matrix of the webgraph
B = t(B / rowSums(B))                  # scale rows by out-degree, transpose to column-stochastic form
B
## [,1] [,2] [,3] [,4]
## [1,] 0.0000000 0.0 1 0.5
## [2,] 0.3333333 0.0 0 0.0
## [3,] 0.3333333 0.5 0 0.5
## [4,] 0.3333333 0.5 0 0.0
n = nrow(B)
U = matrix(data = rep(1/n, n^2), nrow = n, ncol = n)
U
## [,1] [,2] [,3] [,4]
## [1,] 0.25 0.25 0.25 0.25
## [2,] 0.25 0.25 0.25 0.25
## [3,] 0.25 0.25 0.25 0.25
## [4,] 0.25 0.25 0.25 0.25
beta = 0.85
M = beta*B + (1-beta)*U
M
## [,1] [,2] [,3] [,4]
## [1,] 0.0375000 0.0375 0.8875 0.4625
## [2,] 0.3208333 0.0375 0.0375 0.0375
## [3,] 0.3208333 0.4625 0.0375 0.4625
## [4,] 0.3208333 0.4625 0.0375 0.0375
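Because B and U are each column stochastic, their convex combination M is as well, which is easy to confirm:
## M should remain column stochastic
colSums(M)    # expect each column sum to equal 1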
Question a. Equation 3.1 has two terms, (1 − m)A and mS. What does each of these
terms represent?
Question b. Consistent with Bryan and Leise's paper you were instructed to use a
value of 0.15 for the weighting factor m. Why use this particular value?
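Note that the script writes Equation 3.1 with beta = 1 − m, so beta = 0.85 corresponds to m = 0.15; in the code, B plays the role of the link matrix A and U the role of S. A one-line check using the objects defined above confirms the two forms agree:
## Bryan and Leise's M = (1 - m)A + mS with m = 0.15 matches beta = 0.85 above
m <- 0.15
all.equal(M, (1 - m) * B + m * U)   # expect TRUE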
Find the eigenvalues and eigenvectors for the matrix M using the eigen() function as we have discussed in R/RStudio. The values of the eigenvector associated with the dominant eigenvalue need to be scaled as before.
Question a. Briefly discuss the page rank values obtained for the modified link matrix
M as compared to those obtained from the original link matrix A. Include in
your discussion any changes in the order of the page ranks and why or why not such
changes occur.

## Calculate the eigenvalues/eigenvectors of M
e = eigen(M)
v <- e$vectors[,1]
# Scale the values of the dominant eigenvector of M to get the pageRanks
v <- Re(v) / sum(Re(v))   # drop the zero imaginary parts and normalize to sum to 1
v
## [1] 0.3681507 0.1418094 0.2879616 0.2020783
## Now use the page_rank() function to find the pageRanks
page_rank(g)$vector
## [1] 0.3681507 0.1418094 0.2879616 0.2020783
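The agreement is no accident: page_rank() uses a damping factor of 0.85 by default, the same value as beta above. Spelling the argument out makes the correspondence explicit:
## page_rank() with the damping factor made explicit
page_rank(g, damping = 0.85)$vector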
In general, the power method relies on repeatedly applying a matrix to find a single eigenvalue, e.g. the largest or smallest eigenvalue associated with a system; for more information see Page and Brin's original paper (1999), "The PageRank Citation Ranking: Bringing Order to the Web". For example, civil engineers may want to find the smallest eigenvalue for a bridge or support column. This smallest eigenvalue occurs when the system is under maximum load, and the corresponding eigenvector represents the shape of the system, e.g. the shape of the support column at the instant of failure under this maximum load. The single eigenvalue we are interested in is 1 (one). This is the largest, dominant eigenvalue, and it has an associated dominant eigenvector.
There are a couple of things you should understand in order to reliably use an algorithm embodying the power method. Be aware that there is a difference between a simple loop used to repeat an operation a fixed number of times and a procedure (whether written as a loop or as a recursive function that calls itself) that repeats until some condition is satisfied. Loops in algorithms are typically set to iterate a specified number of times; in general, a convergence-driven algorithm will continue to run until its convergence criterion is met. So, it is important to carefully specify the convergence criterion, and to supply arguments, i.e. input values and matrices, that actually allow the iteration to reach it.
Computations using numerical methods normally do not produce exact values. A convergence criterion represents the tolerance allowed for this inexact nature of numerical methods. Convergence criteria are chosen so that the difference in a computed value from iteration to iteration approaches zero; assuming the algorithm has been coded correctly, the computed value then approaches the desired exact value. Typical convergence criteria are small, e.g. the difference in the computed value is 0.001 or less.
However, numerical methods might not produce monotonic results: the differences in computed values can both decrease and increase. Therefore, care must be taken in implementing convergence criteria. Many different problems can occur; for example, an algorithm might never converge to a given criterion. If there are problems with either the convergence criterion or with the data values in the arguments, an infinite loop can occur, and you might or might not get an error message. Alternatively, you can set an algorithm to run for only a fixed number of iterations, but then it may fail to converge.
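To make these ideas concrete, here is a minimal sketch of a convergence-driven power iteration written as a reusable function (the function name and defaults are illustrative, not part of the lab script):
## power iteration: repeatedly apply M until successive rank vectors agree
power_iterate <- function(M, tol = 1e-5, maxiter = 1000) {
  r <- rep(1 / nrow(M), nrow(M))        # start from the uniform vector
  for (i in 1:maxiter) {
    r_new <- as.vector(M %*% r)         # one application of the matrix
    if (max(abs(r_new - r)) < tol) {    # convergence criterion satisfied
      return(list(ranks = r_new, iterations = i))
    }
    r <- r_new
  }
  warning("did not converge within maxiter iterations")
  list(ranks = r, iterations = maxiter)
}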
To use the script you will need to install and attach 3 (three) additional packages in R/RStudio: igraph, expm, and arpack. For some reason the dependencies wouldn't install with igraph, so I had to use a command along the following lines (reconstructed here; the exact call may have differed) rather than the pull-down menu:
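install.packages("igraph", dependencies = TRUE)   # hypothetical reconstruction of the install command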
Before we implement the power method, the R script uses a slightly different syntax to recompute the modified matrix M.
## Now use a slightly different method to compute the pageRanks of M
library(expm)
## Warning: package 'expm' was built under R version 3.4.4
## Loading required package: Matrix
##
## Attaching package: 'expm'
## The following object is masked from 'package:Matrix':
##
## expm
n = nrow(M)
U = matrix(data=rep(1/n, n^2), nrow = n, ncol = n)
U
## [,1] [,2] [,3] [,4]
## [1,] 0.25 0.25 0.25 0.25
## [2,] 0.25 0.25 0.25 0.25
## [3,] 0.25 0.25 0.25 0.25
## [4,] 0.25 0.25 0.25 0.25
beta = 0.85
C = beta*B + (1-beta)*U
C
## [,1] [,2] [,3] [,4]
## [1,] 0.0375000 0.0375 0.8875 0.4625
## [2,] 0.3208333 0.0375 0.0375 0.0375
## [3,] 0.3208333 0.4625 0.0375 0.4625
## [4,] 0.3208333 0.4625 0.0375 0.0375
r = matrix(data = rep(1/n, n), nrow = n, ncol = 1)
r
## [,1]
## [1,] 0.25
## [2,] 0.25
## [3,] 0.25
## [4,] 0.25
## Use the power method to compute page ranks by setting the number of
## iterations as 100
pageRanks <- t(C %^% 100 %*% r)
pageRanks
## [,1] [,2] [,3] [,4]
## [1,] 0.3681507 0.1418094 0.2879616 0.2020783
Question a. Briefly discuss how the power method is implemented in this part of the
laboratory, e.g. what matrix operations are carried out, is the original link matrix A
used in the computations, etc.
Question b. Briefly discuss how the page ranks ("importance scores") computed after
100 iterations compare to those calculated previously including any changes in the
ordering of the nodes. Also include in your discussion how you would determine the
number of iterations to use if this were required input to run the algorithm.
## Use the power method to compute page ranks using convergence criteria
set.seed(1234)           # has no effect here: the computation is deterministic
n <- length(M[, 1])
iter <- 1
maxiter <- 1000          # safety cap on the number of iterations
tolerance <- 0.00001     # convergence criterion
ccrit <- 100             # start larger than tolerance so the loop runs
pwr <- M %*% M
ckCC <- matrix(nrow = n, ncol = maxiter)   # store the ranks from each iteration
ckCC[1, 1] <- 100
while ((abs(ccrit) > tolerance) && (iter < maxiter)) {
  iter <- iter + 1
  pwr <- M %*% pwr                 # raise M to the next power
  PRout <- pwr %*% r               # current page-rank estimate
  ckCC[, iter] <- PRout[, ]
  ccrit <- ckCC[1, iter - 1] - ckCC[1, iter]   # change in node 1's rank
}
print(iter)
## [1] 12
print(t(PRout))
## [,1] [,2] [,3] [,4]
## [1,] 0.3681576 0.141809 0.2879588 0.2020746
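The converged result can be compared directly against the 100-iteration result from Part 10:
## largest difference between the converged ranks and the 100-iteration ranks
max(abs(t(PRout) - pageRanks))   # expect a small value, comparable to the tolerance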
Question a. How many iterations are required for the algorithm to converge? Is it
important to know how many iterations are required? Why? How does the number
of iterations required for the algorithm to converge to a solution compare to the
specified 100 iterations in Part 10?
Question b. If the number of iterations is significantly less than 100 is that important
and why or why not? For example, mention whether or not the convergence criteria
used is sufficient. Also, if the number of iterations required is small briefly explain
how this can be true in the context of extremely large matrix operations.
Question c. Briefly discuss how the page ranks obtained using convergence criteria
compare with those obtained using a specified number of iterations. Include in your
discussion whether or not any differences in values are important and why or why
not.