Ken Jinkun Xiao
March 19, 2014
Vectors in R can be constructed using the c() function. Patterned vectors can be constructed using seq(), rep() and \( : \). Matrices can be constructed using the functions matrix(), cbind() or rbind().
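For example, here are a few of these constructors in action (the values are arbitrary, chosen only for illustration):
c(2, 5, 9)                # combine values into a vector
seq(1, 9, by = 2)         # 1 3 5 7 9
rep(1:3, times = 2)       # 1 2 3 1 2 3
1:4                       # same as seq(1, 4)
matrix(1:6, nrow = 2)     # fill a 2 x 3 matrix column by column
cbind(1:3, 4:6)           # bind two columns into a 3 x 2 matrix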
The class of Hilbert matrices is useful when studying the stability of numerical linear algebra techniques: \[ H_{ij} = 1/(i+j - 1) \]
Here is the \( 4 \times 4 \) Hilbert matrix:
[,1] [,2] [,3] [,4]
[1,] 1.0000000 0.5000000 0.3333333 0.2500000
[2,] 0.5000000 0.3333333 0.2500000 0.2000000
[3,] 0.3333333 0.2500000 0.2000000 0.1666667
[4,] 0.2500000 0.2000000 0.1666667 0.1428571
We could construct the Hilbert matrix by binding columns together as follows:
1/cbind(seq(1, 4), seq(2, 5), seq(3, 6), seq(4,7))
[,1] [,2] [,3] [,4]
[1,] 1.0000000 0.5000000 0.3333333 0.2500000
[2,] 0.5000000 0.3333333 0.2500000 0.2000000
[3,] 0.3333333 0.2500000 0.2000000 0.1666667
[4,] 0.2500000 0.2000000 0.1666667 0.1428571
In this example, rbind() would give the same result, because of symmetry.
Note that
rep(1:4,rep(4,4))+rep(0:3,4)
[1] 1 2 3 4 2 3 4 5 3 4 5 6 4 5 6 7
So
matrix(rep(1:4,rep(4,4))+rep(0:3,4),nrow=4)
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 2 3 4 5
[3,] 3 4 5 6
[4,] 4 5 6 7
We can get the Hilbert matrix using:
H4 <- 1/matrix(rep(1:4,rep(4,4))+rep(0:3,4),nrow=4)
H4
[,1] [,2] [,3] [,4]
[1,] 1.0000000 0.5000000 0.3333333 0.2500000
[2,] 0.5000000 0.3333333 0.2500000 0.2000000
[3,] 0.3333333 0.2500000 0.2000000 0.1666667
[4,] 0.2500000 0.2000000 0.1666667 0.1428571
This is a better way to construct Hilbert matrices, since we can now use the idea for any size of matrix:
n <- 4 # this gives a 4 x 4 Hilbert Matrix
H4 <- 1/matrix(rep(1:n,rep(n,n))+rep(0:(n-1),n),nrow=n)
H4
[,1] [,2] [,3] [,4]
[1,] 1.0000000 0.5000000 0.3333333 0.2500000
[2,] 0.5000000 0.3333333 0.2500000 0.2000000
[3,] 0.3333333 0.2500000 0.2000000 0.1666667
[4,] 0.2500000 0.2000000 0.1666667 0.1428571
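We can wrap the same idea in a small helper function (hilbert() is just an illustrative name, not a built-in function), so that any \( n \times n \) Hilbert matrix can be produced with one call:
hilbert <- function(n) {
  # entry (i, j) is 1/(i + j - 1)
  1/matrix(rep(1:n, rep(n, n)) + rep(0:(n - 1), n), nrow = n)
}
hilbert(4)   # reproduces H4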
Bonus question: create an Excel file that takes the dimension as a cell input and produces the corresponding Hilbert matrix.
Example:
matrix(seq(1, 6), nrow=2)
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
Example 2:
cbind(H4, 1/5:8)
[,1] [,2] [,3] [,4] [,5]
[1,] 1.0000000 0.5000000 0.3333333 0.2500000 0.2000000
[2,] 0.5000000 0.3333333 0.2500000 0.2000000 0.1666667
[3,] 0.3333333 0.2500000 0.2000000 0.1666667 0.1428571
[4,] 0.2500000 0.2000000 0.1666667 0.1428571 0.1250000
We access the \( (i, j) \) element of a matrix \( X \) using \( X[i,j] \).
H4[3, 2]
[1] 0.25
We can access the ith row using \( X[i,] \), and the jth column using \( X[, j] \).
H4[3,]
[1] 0.3333333 0.2500000 0.2000000 0.1666667
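Similarly, we can extract the third column:
H4[, 3]
[1] 0.3333333 0.2500000 0.2000000 0.1666667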
Usually, the rows and columns of a matrix do not have names:
colnames(H4)
NULL
rownames(H4)
NULL
We can add names to columns and/or rows:
colnames(H4) <- c("h1", "h2", "h3", "h4")
Example:
colnames(H4)
[1] "h1" "h2" "h3" "h4"
We can now access the second column of H4 with
H4[,"h2"]
[1] 0.5000000 0.3333333 0.2500000 0.2000000
We can create a matrix with rownames by applying rbind() to some vectors.
colors <- c("red", "blue", "yellow")
numbers <- c("one", "two", "three")
charmat <- rbind(colors, numbers)
charmat
[,1] [,2] [,3]
colors "red" "blue" "yellow"
numbers "one" "two" "three"
The dimension of a matrix is its number of rows and its number of columns. For example,
dim(charmat)
[1] 2 3
We can count the numbers of rows and columns with nrow() and ncol():
nrow(charmat)
[1] 2
ncol(charmat)
[1] 3
The determinant can be found in R using the det() function, as in
H4
h1 h2 h3 h4
[1,] 1.0000000 0.5000000 0.3333333 0.2500000
[2,] 0.5000000 0.3333333 0.2500000 0.2000000
[3,] 0.3333333 0.2500000 0.2000000 0.1666667
[4,] 0.2500000 0.2000000 0.1666667 0.1428571
det(H4)
[1] 1.653439e-07
A diagonal matrix can be obtained using the diag() function applied to a vector (which will contain the elements of the diagonal):
diag(1:4)
[,1] [,2] [,3] [,4]
[1,] 1 0 0 0
[2,] 0 2 0 0
[3,] 0 0 3 0
[4,] 0 0 0 4
The diagonal elements of a matrix can be obtained using the diag() function, as in
diag(H4)
[1] 1.0000000 0.3333333 0.2000000 0.1428571
Exercise: What is the result of the following?
diag(diag(H4))
The outer product of column vectors x and y is \( xy^T \).
The outer() function computes this product, by default, but can produce other forms as well.
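For example, using two short, arbitrary vectors:
u <- 1:3
v <- c(2, 5)
outer(u, v)   # the outer product u v^T: a 3 x 2 matrix with rows (2, 5), (4, 10), (6, 15)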
The function outer() can be used to perform an operation on all possible pairs of elements coming from two vectors. For example, we will compute all quotients among pairs of elements of the sequence running from 1 through 5.
x1 <- seq(1, 5)
outer(x1, x1, "/")
[,1] [,2] [,3] [,4] [,5]
[1,] 1 0.5 0.3333333 0.25 0.2
[2,] 2 1.0 0.6666667 0.50 0.4
[3,] 3 1.5 1.0000000 0.75 0.6
[4,] 4 2.0 1.3333333 1.00 0.8
[5,] 5 2.5 1.6666667 1.25 1.0
Replacing the division operation with the subtraction operator gives all pairwise differences.
outer(x1, x1, "-")
[,1] [,2] [,3] [,4] [,5]
[1,] 0 -1 -2 -3 -4
[2,] 1 0 -1 -2 -3
[3,] 2 1 0 -1 -2
[4,] 3 2 1 0 -1
[5,] 4 3 2 1 0
The third argument can be any function that takes two vector arguments. The second argument can differ from the first. For example,
y <- seq(5, 10)
outer(x1, y, "+")
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 6 7 8 9 10 11
[2,] 7 8 9 10 11 12
[3,] 8 9 10 11 12 13
[4,] 9 10 11 12 13 14
[5,] 10 11 12 13 14 15
The apply() function is very useful and is often more efficient than a for() loop. For example, we can compute the product of the elements in each column of H4 using
apply(H4, 2, prod)
h1 h2 h3 h4
0.041666667 0.008333333 0.002777778 0.001190476
The second argument specifies whether the operation is to be applied to rows (1) or columns (2).
The third argument specifies the function which should be applied. In this case, the prod(x) function multiplies the elements of x.
test <- matrix(1:6, nrow=3)
test
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
apply(test, 1, sum)
[1] 5 7 9
apply(test, 2, sum)
[1] 6 15
The \( (i, j) \) element of the following outer product matrix is of the form \( x_i^{y_j} \)
X <- outer(1:3, 1:2, function(x,y) x^y)
X
[,1] [,2]
[1,] 1 1
[2,] 2 4
[3,] 3 9
We can compute the trace of a matrix \( X \) using
diag(X)
[1] 1 4
sum(diag(X))
[1] 5
sum(diag(H4))
[1] 1.67619
The t() function is used to calculate the matrix transpose \( X^T \):
t(charmat)
colors numbers
[1,] "red" "one"
[2,] "blue" "two"
[3,] "yellow" "three"
The functions lower.tri() and upper.tri() can be used to obtain the lower and upper triangular parts of matrices.
The output of the functions is a matrix of logical elements, with TRUE representing the relevant triangular elements. For example,
lower.tri(H4)
[,1] [,2] [,3] [,4]
[1,] FALSE FALSE FALSE FALSE
[2,] TRUE FALSE FALSE FALSE
[3,] TRUE TRUE FALSE FALSE
[4,] TRUE TRUE TRUE FALSE
We can obtain the lower triangular matrix whose nonzero elements match the lower triangular elements of H4 by using
Hnew <- H4
Hnew[upper.tri(H4, diag=TRUE)] <- 0
Hnew
h1 h2 h3 h4
[1,] 0.0000000 0.00 0.0000000 0
[2,] 0.5000000 0.00 0.0000000 0
[3,] 0.3333333 0.25 0.0000000 0
[4,] 0.2500000 0.20 0.1666667 0
Multiplication of a matrix by a scalar constant is the same as multiplication of a vector by a constant.
For example, using the X matrix from the previous section, we can multiply each element by 2 as in
Y <- 2 * X
Y
[,1] [,2]
[1,] 2 2
[2,] 4 8
[3,] 6 18
Elementwise addition of matrices also proceeds as for vectors. For example,
Y + X
[,1] [,2]
[1,] 3 3
[2,] 6 12
[3,] 9 27
When adding matrices, always ensure that the dimensions match. If they do not, an error message will appear, as in
t(Y) + X
The command X * Y performs elementwise multiplication. Note that this differs from the usual form of matrix multiplication that we will discuss below. For example,
X * Y
[,1] [,2]
[1,] 2 2
[2,] 8 32
[3,] 18 162
Again, in order for this kind of multiplication to work, the dimensions of the matrices must match.
If A and B are matrices, then the matrix product AB is the matrix representing the composition of the two operations: first apply B, then apply A to the result.
For matrix multiplication to be a properly defined operation, the matrices to be multiplied must conform.
That is, the number of columns of the first matrix must match the number of rows of the second matrix.
The resulting matrix AB will have its row dimension taken from A and its column dimension taken from B.
In R, this form of matrix multiplication can be performed using the operator %*%, for example
t(Y) %*% X
[,1] [,2]
[1,] 28 72
[2,] 72 196
We saw that t(Y) has 3 columns and X has 3 rows, so we can perform the multiplication \( Y^TX \).
The result is a \( 2\times2 \) matrix, since t(Y) has 2 rows and X has 2 columns.
If we failed to transpose Y, we would obtain an error, as in
Y %*% X
The crossprod() function is a somewhat more efficient way to calculate \( Y^TX \):
crossprod(Y,X)
[,1] [,2]
[1,] 28 72
[2,] 72 196
The first argument of crossprod() is transposed automatically.
t(Y) %*% X needs to make a new object t(Y) before performing the multiplication. If Y is a large matrix, this will consume a lot of memory and noticeable computation time.
The crossprod(Y, X) function call can access the elements of Y directly, since the \( (i, j) \) element of \( Y^T \) is simply the \( (j, i) \) element of Y
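A rough way to see this is to time both forms on larger matrices (a sketch only: the matrices are filled with arbitrary random values, and timings will vary by machine):
set.seed(1)
bigA <- matrix(rnorm(1000 * 1000), nrow = 1000)
bigB <- matrix(rnorm(1000 * 1000), nrow = 1000)
system.time(t(bigA) %*% bigB)       # explicitly forms t(bigA) before multiplying
system.time(crossprod(bigA, bigB))  # computes t(bigA) %*% bigB directly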
The inverse of a square \( n \times n \) matrix A, denoted by \( A^{-1} \), is the solution to the matrix equation \( AA^{-1} = I \), where \( I \) is the \( n \times n \) identity matrix.
We can view this as n separate systems of linear equations in \( n \) unknowns, whose solutions are the columns of \( A^{-1} \).
For example, with
\[ A=\begin{pmatrix} 3 & -4 \\ 1 & 2 \\ \end{pmatrix} \]
the two columns of \( A^{-1} \) are the solutions of the two systems \( Ax = [1, 0]^T \) and \( Ax = [0, 1]^T \).
The usual computational approach to finding \( A^{-1} \) involves solving these two systems of equations.
However, this is not always a good idea!
Often the reason we are trying to find \( A^{-1} \) is so that we can solve a system \( Ax = b \) with the solution \( x = A^{-1}b \).
It doesn't make sense from a computational point of view to solve n systems of linear equations in order to obtain a result which will be used as the solution to one system.
If we know how to solve systems, we should use that knowledge to solve \( Ax = b \) directly.
Furthermore, using \( A^{-1} \) may give a worse approximation to the final result than the direct approach, because many more operations are involved, giving opportunities for much more rounding error to creep into our results.
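In R terms, this means preferring solve(A, b) over forming the inverse and multiplying. A small sketch using the \( 2 \times 2 \) matrix above, with an arbitrary right-hand side:
A <- matrix(c(3, 1, -4, 2), nrow = 2)
b <- c(1, 2)           # an arbitrary right-hand side, for illustration only
solve(A, b)            # solves Ax = b directly (preferred)
solve(A) %*% b         # forms the inverse first, then multiplies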
The general strategy for solving a system of equations \( Ax = b \) is to break down the problem into simpler ones.
This often involves rewriting the matrix A in a special form; one such form is called the LU decomposition.
In the LU decomposition, we write A as a product of two matrices L and U.
The matrix L is lower triangular with unit values on the diagonal, \[ L=\begin{pmatrix} 1 & 0 & \cdots & 0 \\ l_{21} & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ l_{n1} & l_{n2} & \cdots & 1 \\ \end{pmatrix} \]
U is upper triangular, i.e.
\[ U=\begin{pmatrix} u_{11} & u_{12} & \cdots & u_{1n} \\ 0 & u_{22} & \cdots & u_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & u_{nn} \\ \end{pmatrix} \]
Consider the matrix
\[ A=\begin{pmatrix} 2 & 4 & 3 \\ 6 & 16 & 10 \\ 4&12 & 9 \\ \end{pmatrix} \]
Writing the entries of A as \( a_{ij} \), we can use the relation
\[ a_{ij}=\sum_{k=1}^3l_{ik}u_{kj} \]
to solve for the unknown entries of \( L \) and \( U \) one at a time. For this A, the result is
\[ L=\begin{pmatrix} 1 & 0 & 0 \\ 3 & 1 & 0 \\ 2 & 1 & 1 \\ \end{pmatrix}, \qquad U=\begin{pmatrix} 2 & 4 & 3 \\ 0 & 4 & 1 \\ 0 & 0 & 2 \\ \end{pmatrix} \]
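As a quick check in R that these factors reproduce A (the matrices are typed in by hand from the calculation above):
A <- matrix(c(2, 6, 4, 4, 16, 12, 3, 10, 9), nrow = 3)  # the example matrix
L <- matrix(c(1, 3, 2, 0, 1, 1, 0, 0, 1), nrow = 3)     # unit lower triangular factor
U <- matrix(c(2, 0, 0, 4, 4, 0, 3, 1, 2), nrow = 3)     # upper triangular factor
L %*% U   # reproduces A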
Once we have L and U in hand, solving the system of equations \( Ax = b \) is easy.
We write the system as \( L(Ux) = b \), set \( y = Ux \) and solve \( Ly = b \) for \( y \) first.
Because \( L \) is lower triangular, this is straightforward using a procedure known as forward elimination.
Continuing the example above, with \( b = [-1, -2, -7]^T \), and setting \( y = [y_1, y_2, y_3]^T \), we make use of the relation
\[ b_i=\sum_{j=1}^{3}l_{ij}y_j \]
to calculate \( y_1 = b_1 = -1 \), then \( y_2 = b_2 - l_{21}y_1 = -2 - 3(-1) = 1 \), and finally \( y_3 = b_3 - l_{31}y_1 - l_{32}y_2 = -7 - 2(-1) - 1(1) = -6 \).
Finally, we solve Ux = y. This time the fact that U is upper triangular means solving for the entries in reverse order is easy, using a procedure called back substitution: \( x_3 = y_3/u_{33} = -3 \), then \( x_2 = (y_2 - u_{23}x_3)/u_{22} = 1 \), and \( x_1 = (y_1 - u_{12}x_2 - u_{13}x_3)/u_{11} = 2 \).
By working through these steps successively, the problem of solving Ax = b has been reduced to solving 15 successive linear equations, each with just one unknown.
The procedure is easily automated.
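For the example above, the two triangular solves can be carried out with the base R functions forwardsolve() and backsolve() (discussed again below), using the L and U matrices entered earlier:
b <- c(-1, -2, -7)
y <- forwardsolve(L, b)   # forward elimination: solves L y = b, giving -1, 1, -6
x <- backsolve(U, y)      # back substitution: solves U x = y, giving 2, 1, -3
x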
In fact, the default method used in R for solving linear equations is based on this technique; the only substantial difference is that the rows are rearranged (pivoted) before factoring so that rounding error is minimized.
In R, matrices are inverted and linear systems of equations are solved using the solve() or qr.solve() functions.
solve() uses a method based on the LU decomposition; qr.solve() is based on the QR decomposition that is described later.
As an example, we compute the inverse of the \( 4 \times 4 \) Hilbert matrix.
H4inv <- solve(H4)
H4inv
[,1] [,2] [,3] [,4]
h1 16 -120 240 -140
h2 -120 1200 -2700 1680
h3 240 -2700 6480 -4200
h4 -140 1680 -4200 2800
To verify that this is the inverse of H4, we can check that the product of H4inv and H4 is the \( 4 \times 4 \) identity.
H4inv %*% H4
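Because of rounding error, the off-diagonal entries of this product may print as very small nonzero numbers rather than exact zeros; the base function zapsmall() rounds them away for display:
zapsmall(H4inv %*% H4)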
The function solve(A, b) gives the solution to systems of equa- tions of the form \( Ax = b \).
For example, let us find x such that \( H_4x = b \) where \( H_4 \) is the \( 4 \times 4 \) Hilbert matrix and \( b = [1, 2, 3, 2]^T \).
b <- c(1, 2, 3, 2)
x <- solve(H4, b)
x
h1 h2 h3 h4
216 -2460 5880 -3780
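We can check the solution by multiplying back; up to rounding error, this reproduces b:
H4 %*% x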
Eigenvalues and eigenvectors can be computed using the function eigen(). For example,
eigen(H4)
$values
[1] 1.5002142801 0.1691412202 0.0067382736 0.0000967023
$vectors
[,1] [,2] [,3] [,4]
[1,] 0.7926083 0.5820757 0.1791863 -0.02919332
[2,] 0.4519231 -0.3705022 -0.7419178 0.32871206
[3,] 0.3224164 -0.5095786 0.1002281 -0.79141115
[4,] 0.2521612 -0.5140483 0.6382825 0.51455275
Let \( x_1 \) denote the first column of the $vectors output, i.e. \( [0.793, 0.452, 0.322, 0.252]^T \). This is the first eigenvector, corresponding to the eigenvalue \( 1.50 \). Thus,
\[ H_4x_1 = 1.50x_1 \]
Denoting the second and third columns of $vectors by \( x_2 \) and \( x_3 \), we have
\[ H_4x_2 = 0.169x_2 \] \[ H_4x_3 = 0.00674x_3 \]
and
\[ H_4x_4 = 0.0000967x_4. \]
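We can verify one of these relations numerically; the difference below should be (numerically) zero:
e <- eigen(H4)
H4 %*% e$vectors[, 1] - e$values[1] * e$vectors[, 1]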
Additional matrix decompositions and related linear algebra topics are covered in the rest of this chapter.
Matrix operations become increasingly expensive as matrices grow large. Therefore, whenever possible we work with matrices that have special structure in order to reduce the amount of computation required.
The singular value decomposition of a square matrix A consists of three square matrices, \( U \), \( D \) and \( V \) . The matrix \( D \) is a diagonal matrix. The relation among these matrices is \[ A=UDV^T \]
The matrices \( U \) and \( V \) are said to be orthogonal, which means that \( U^{-1} = U^T \) and \( V^{-1} = V^T \).
The singular value decomposition of a matrix is often used to obtain accurate solutions to linear systems of equations.
The elements of D are called the singular values of A. Note that
\[ A^TA=(UDV^T)^TUDV^T=VD^TU^TUDV^T=VD^2V^{-1} \]
where \( D \) is a diagonal matrix with non-negative numbers. This is a similarity transformation which tells us that the squares of the singular values of \( A \) are the eigenvalues of \( A^TA \).
The singular value decomposition can be obtained using the function svd().
For example, the singular value decomposition of the \( 4 \times 4 \) Hilbert matrix \( H_4 \) is
H4.svd<-svd(H4)
H4.svd
$d
[1] 1.5002142801 0.1691412202 0.0067382736 0.0000967023
$u
[,1] [,2] [,3] [,4]
[1,] -0.7926083 0.5820757 -0.1791863 -0.02919332
[2,] -0.4519231 -0.3705022 0.7419178 0.32871206
[3,] -0.3224164 -0.5095786 -0.1002281 -0.79141115
[4,] -0.2521612 -0.5140483 -0.6382825 0.51455275
$v
[,1] [,2] [,3] [,4]
[1,] -0.7926083 0.5820757 -0.1791863 -0.02919332
[2,] -0.4519231 -0.3705022 0.7419178 0.32871206
[3,] -0.3224164 -0.5095786 -0.1002281 -0.79141115
[4,] -0.2521612 -0.5140483 -0.6382825 0.51455275
We can verify that these components can be multiplied in the appropriate way to reconstruct \( H4 \).
H4.svd$u %*% diag(H4.svd$d) %*% t(H4.svd$v)
[,1] [,2] [,3] [,4]
[1,] 1.0000000 0.5000000 0.3333333 0.2500000
[2,] 0.5000000 0.3333333 0.2500000 0.2000000
[3,] 0.3333333 0.2500000 0.2000000 0.1666667
[4,] 0.2500000 0.2000000 0.1666667 0.1428571
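We can also confirm that the squares of the singular values of \( H_4 \) match the eigenvalues of \( H_4^TH_4 \), as claimed above:
H4.svd$d^2
eigen(crossprod(H4))$values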
Because of the properties of the U, V and D matrices, the singular value decomposition provides a simple way to compute a matrix inverse. For example, \( H_4^{-1}=VD^{-1}U^T \) and can be recalculated as
H4.svd$v %*% diag(1/H4.svd$d) %*% t(H4.svd$u)
[,1] [,2] [,3] [,4]
[1,] 16 -120 240 -140
[2,] -120 1200 -2700 1680
[3,] 240 -2700 6480 -4200
[4,] -140 1680 -4200 2800
If a matrix A is positive definite, it possesses a square root. In fact, there are usually several matrices B such that \( B^2 = A \). The Choleski decomposition is similar, but the idea is to find an upper triangular matrix U such that \( U^TU = A \).
The function chol() accomplishes this task.
For example, we can compute the Choleski decomposition of the \( 4 \times 4 \) Hilbert matrix.
H4.chol <- chol(H4)
H4.chol
h1 h2 h3 h4
[1,] 1 0.5000000 0.3333333 0.25000000
[2,] 0 0.2886751 0.2886751 0.25980762
[3,] 0 0.0000000 0.0745356 0.11180340
[4,] 0 0.0000000 0.0000000 0.01889822
crossprod(H4.chol, H4.chol) # Multiplying U^T U
h1 h2 h3 h4
h1 1.0000000 0.5000000 0.3333333 0.2500000
h2 0.5000000 0.3333333 0.2500000 0.2000000
h3 0.3333333 0.2500000 0.2000000 0.1666667
h4 0.2500000 0.2000000 0.1666667 0.1428571
Once the Choleski decomposition of a matrix A = UTU has been obtained, we can compute the inverse of A using the fact that \( A^{-1}=U^{-1}U^{-T} \) (where \( U^{-T} \) is a short way to write the transpose of \( U^{-1} \)).
This computation is much more stable than direct calculation of \( A^{-1} \) by Gaussian elimination.
The function chol2inv() does this calculation.
For example, we can compute the inverse of \( H4 \) as
chol2inv(H4.chol)
[,1] [,2] [,3] [,4]
[1,] 16 -120 240 -140
[2,] -120 1200 -2700 1680
[3,] 240 -2700 6480 -4200
[4,] -140 1680 -4200 2800
Once the Choleski decomposition has been obtained, we can compute solutions to linear systems of the form
\[ Ax=b \]
If \( A = U^TU \), then we see that \( Ux = U^{-T}b \). Therefore, the solution x can be obtained in a two step procedure:
Solve \( U^Ty = b \) for y. The solution will satisfy \( y = U^{-T}b \).
Solve \( Ux = y \).
The first system is lower triangular, so forward elimination can be used to solve it.
The function forwardsolve() can be used for this.
The second system is upper triangular, so back substitution using function backsolve() can be used.
For the problem \( H_4x = b \), where \( b = [1 2 3 2]^T \) , we can proceed as follows:
b <- c(1, 2, 3, 2)
y <- forwardsolve(t(H4.chol), b)
backsolve(H4.chol, y)
[1] 216 -2460 5880 -3780
Lastly, we will learn about an important decomposition used in regression analysis.
This time, \[ A=QR \]
where \( Q \) is an orthogonal matrix, and \( R \) is an upper triangular matrix.
This decomposition can be applied even if \( A \) is not square.
Again, this decomposition can be used to obtain accurate solutions to linear systems of equations.
For example, suppose we want to solve
\[ Ax=b \]
for \( x \), given the \( n \times n \) matrix \( A \) and the n-vector \( b \). If we compute the QR decomposition of \( A \) first, we can write
\[ QRx = b \]
Multiplying through by \( Q^T \) on the left gives
\[ Rx=Q^Tb \]
This is an easier system to solve, because R is an upper triangular matrix.
To obtain the decomposition, we use the qr() function.
H4.qr <- qr(H4)
The functions qr.Q() and qr.R() can be applied to this object to obtain the explicit Q and R matrices.
For our example, we have
Q <- qr.Q(H4.qr)
Q
[,1] [,2] [,3] [,4]
[1,] -0.8381164 0.5226484 -0.1539728 -0.02630668
[2,] -0.4190582 -0.4417133 0.7277538 0.31568019
[3,] -0.2793721 -0.5288214 -0.1395055 -0.78920046
[4,] -0.2095291 -0.5020717 -0.6536092 0.52613364
R <- qr.R(H4.qr)
R
h1 h2 h3 h4
[1,] -1.193152 -0.6704931 -0.474932601 -0.3698354709
[2,] 0.000000 -0.1185333 -0.125655095 -0.1175419928
[3,] 0.000000 0.0000000 -0.006221774 -0.0095660929
[4,] 0.000000 0.0000000 0.000000000 0.0001879049
We can recover \( H_4 \) by multiplying \( Q \) by \( R \):
Q %*% R
h1 h2 h3 h4
[1,] 1.0000000 0.5000000 0.3333333 0.2500000
[2,] 0.5000000 0.3333333 0.2500000 0.2000000
[3,] 0.3333333 0.2500000 0.2000000 0.1666667
[4,] 0.2500000 0.2000000 0.1666667 0.1428571
The inverse of \( H4 \) can be obtained quickly from \( R^{-1}Q^T \) , since R is upper triangular.
qr.solve(R) %*% t(Q)
[,1] [,2] [,3] [,4]
h1 16 -120 240 -140
h2 -120 1200 -2700 1680
h3 240 -2700 6480 -4200
h4 -140 1680 -4200 2800
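Similarly, we can solve \( H_4x = b \) directly from the QR factors, since only a back substitution is required (qr.solve(H4, b) does the same thing in one step):
b <- c(1, 2, 3, 2)
backsolve(R, crossprod(Q, b))   # solves R x = Q^T b; reproduces 216, -2460, 5880, -3780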