\(\def\a{\boldsymbol{a}}\) \(\def\b{\boldsymbol{b}}\) \(\def\c{\boldsymbol{c}}\) \(\def\f{\boldsymbol{f}}\) \(\def\g{\boldsymbol{g}}\) \(\def\h{\boldsymbol{h}}\) \(\def\j{\boldsymbol{j}}\) \(\def\u{\boldsymbol{u}}\) \(\def\x{\boldsymbol{x}}\) \(\def\y{\boldsymbol{y}}\) \(\def\A{\boldsymbol{\mathrm{A}}}\) \(\def\B{\boldsymbol{\mathrm{B}}}\) \(\def\C{\boldsymbol{\mathrm{C}}}\) \(\def\D{\boldsymbol{\mathrm{D}}}\) \(\def\E{\boldsymbol{\mathrm{E}}}\) \(\def\I{\boldsymbol{\mathrm{I}}}\) \(\def\J{\boldsymbol{\mathrm{J}}}\) \(\def\M{\boldsymbol{\mathrm{M}}}\) \(\def\O{\boldsymbol{\mathrm{O}}}\) \(\def\P{\boldsymbol{\mathrm{P}}}\) \(\def\Q{\boldsymbol{\mathrm{Q}}}\) \(\def\T{\boldsymbol{\mathrm{T}}}\) \(\def\U{\boldsymbol{\mathrm{U}}}\) \(\def\X{\boldsymbol{\mathrm{X}}}\) \(\def\zeros{\boldsymbol{0}}\) \(\def\diag{\mathrm{diag}}\) \(\def\rank{\mathrm{rank}}\) \(\def\trace{\mathrm{tr}}\) \(\def\tr{^\top}\) \(\def\ds{\displaystyle}\) \(\def\bea{\begin{eqnarray}}\) \(\def\nnn{\nonumber}\) \(\def\eea{\nnn\end{eqnarray}}\)
Definition 2.1.1 (p.5): A matrix is a rectangular array of elements.
For example, \(\A=\left(\begin{array}{ccc} 4 & 6 & -1 \\ 3 & 0 & 1 \end{array}\right)\) is a \(2\times 3\) matrix. The entry in the 1st row and 3rd column is \(a_{13}=-1\).
Definition 2.1.2 (p.6): A vector is a matrix with a single row or column.
For example, \(\x=\left(\begin{array}{c} 2 \\ -5 \\ 3.4 \\ 0\end{array}\right)\) is a 4-dimensional column vector; \(\x'=\left( 2,-5,3.4,0 \right)\) is a 4-dimensional row vector.
Definition 2.1.3 (p.6): Two matrices \(\A=\left(a_{ij}\right)\) and \(\B=\left(b_{ij}\right)\) are equal if both have the same size and \(a_{ij}=b_{ij}\) for all \(i\) and \(j\).
Definition 2.1.4 (p.7): The transpose of an \(m\times n\) matrix \(\A\) is the \(n\times m\) matrix \(\A'\) (or \(\A\tr\)) obtained by interchanging the rows and columns of \(\A\): \[ \A'=(a_{ij})'=(a_{ji})=\left(\begin{array}{ccc} a_{11} & \cdots & a_{m1} \\ \vdots & \ddots & \vdots \\ a_{1n} & \cdots & a_{mn} \end{array}\right) . \]
For example, the transpose of \(\A=\left(\begin{array}{ccc} 4 & 6 & -1 \\ 3 & 0 & 1 \end{array}\right)\) is \(\A'=\left(\begin{array}{cc} 4 & 3 \\ 6 & 0 \\ -1 & 1 \end{array}\right)\).
Theorem 2.1.1 (p.7): \((\A')'=\A\)
Definition 2.1.4 (p.7): A matrix \(\A\) is symmetric if \(\A=\A'\).
Definition 2.1.5 (p.7): The diagonal of an \(m\times n\) matrix \(\A=(a_{ij})\) consists of the elements \(a_{11}, \ldots, a_{\ell\ell}\), where \(\ell=\min\left\{m,n\right\}\).
For example, if \(\A=\left(\begin{array}{ccc} 4 & 6 & -1 \\ 3 & 0 & 1 \end{array}\right)\), then the diagonal of \(\A\) consists of the elements \(4\) and \(0\).
Definition 2.1.6 (p.7): A matrix \(\A=(a_{ij})\) is diagonal if \(a_{ij}=0\) for all \(i\) and \(j\) such that \(i\neq j\). We use \(\diag(a_{11},\ldots,a_{nn})\) or \(\diag(\A)\) to denote a square diagonal matrix with diagonal elements \(a_{11},\ldots,a_{nn}\).
For example, if \(\A=\left(\begin{array}{ccc} 4 & 6 & -1 \\ 3 & 0 & 1 \end{array}\right)\), then \(\diag(\A)=\diag(4,0)=\left(\begin{array}{cc} 4 & 0 \\ 0 & 0 \end{array}\right)\).
Definition 2.1.7 (p.8): A matrix \(\A=(a_{ij})\) is upper triangular if all elements below the diagonal are \(0\); that is, \(a_{ij}=0\) for all \(i>j\). Similarly, a matrix \(\A=(a_{ij})\) is lower triangular if \(a_{ij}=0\) for all \(i<j\).
Definition 2.1.8 (p.8): The identity matrix of order \(n\) is an \(n\times n\) diagonal matrix such that \(a_{11}=\cdots=a_{nn}=1\). It is denoted by \(\I\) or \(\I_n\).
Here is some other notation reserved for special matrices: \[ \j=\left(\begin{array}{c} 1 \\ \vdots \\ 1\end{array}\right), \J=\left(\begin{array}{ccc} 1 & \cdots & 1 \\ \vdots & \ddots & \vdots \\ 1 & \cdots & 1 \end{array}\right), \zeros=\left(\begin{array}{c} 0 \\ \vdots \\ 0\end{array}\right), \O=\left(\begin{array}{ccc} 0 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 0 \end{array}\right). \] In each case, subscripts can be used to indicate the size of the vector or matrix.
R Example 2.1.1: The matrix \(\A=\left(\begin{array}{ccc} 4 & 6 & -1 \\ 3 & 0 & 1 \end{array}\right)\) can be created in R using any of the following commands.
A=cbind(c(4,3),c(6,0),c(-1,1))
A=matrix(c(4,3,6,0,-1,1),2,3)
A=rbind(c(4,6,-1),c(3,0,1))
A=matrix(c(4,6,-1,3,0,1),2,3,byrow=TRUE)
A
## [,1] [,2] [,3]
## [1,] 4 6 -1
## [2,] 3 0 1
Technically in R, vectors are different from matrices as shown below.
x=c(2,-5,3.4,0)
x
## [1] 2.0 -5.0 3.4 0.0
is.vector(x)
## [1] TRUE
is.matrix(x)
## [1] FALSE
X=matrix(x)
X
## [,1]
## [1,] 2.0
## [2,] -5.0
## [3,] 3.4
## [4,] 0.0
is.vector(X)
## [1] FALSE
is.matrix(X)
## [1] TRUE
Now, we illustrate syntax in R for computing and working with some of the definitions. The transpose of A can be computed using the function t.
t(A)
## [,1] [,2]
## [1,] 4 3
## [2,] 6 0
## [3,] -1 1
Here is the diagonal of A obtained with the function diag.
diag(A)
## [1] 4 0
The function diag can also be used to create a square diagonal matrix with the diagonal elements specified by an input vector.
diag(c(1,2))
## [,1] [,2]
## [1,] 1 0
## [2,] 0 2
So combining these two uses of diag produces the square diagonal matrix \(\diag(\A)\) from Definition 2.1.6.
diag(diag(A))
## [,1] [,2]
## [1,] 4 0
## [2,] 0 0
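Relatedly, here is a minimal sketch (not from the text) illustrating Definition 2.1.7 in R: the base functions upper.tri and lower.tri flag the entries above and below the diagonal, so multiplying elementwise keeps only the triangular part of interest.
A*upper.tri(A,diag=TRUE) # zeroes the entries below the diagonal
A*lower.tri(A,diag=TRUE) # zeroes the entries above the diagonal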
The function rep is an easy way to create \(\j\).
j=rep(1,3);j
## [1] 1 1 1
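Similarly, \(\J\) can be created with the matrix function (a small sketch, not from the text, using a \(3\times 3\) size for illustration).
J=matrix(1,3,3);J # 3 x 3 matrix of ones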
The function diag also provides a convenient way to create \(\I\) because it behaves differently when its input is a single number: diag(n) returns the \(n\times n\) identity matrix.
I=diag(4);I
## [,1] [,2] [,3] [,4]
## [1,] 1 0 0 0
## [2,] 0 1 0 0
## [3,] 0 0 1 0
## [4,] 0 0 0 1
If we actually want to create a \(1\times 1\) matrix with entry \(4\), we can do so directly.
matrix(4,1,1)
## [,1]
## [1,] 4
The function matrix can also be used to create \(\O\).
O=matrix(0,2,3);O
## [,1] [,2] [,3]
## [1,] 0 0 0
## [2,] 0 0 0
The difference is defined by \(\D=\A-\B\) where \(\D=(d_{ij})=(a_{ij}-b_{ij})\). Here we say that \(\A\) and \(\B\) are conformal for addition.
The sum of two conformal matrices can be computed in R using the + operator.
A=rbind(c(6,-1),c(0,4))
B=rbind(c(1,0),c(2,1))
A+B
## [,1] [,2]
## [1,] 7 -1
## [2,] 2 5
So, we see that \(\A+\B=\left(\begin{array}{cc} 7 & -1 \\ 2 & 5 \end{array}\right)\).
The product of a scalar and a matrix can be computed using the * operator since * is applied elementwise.
3*A
## [,1] [,2]
## [1,] 18 -3
## [2,] 0 12
So, we see that \(3\A=\left(\begin{array}{cc} 18 & -3 \\ 0 & 12 \end{array}\right)\).
A matrix product is computed in R with the %*% operator.
A%*%B
## [,1] [,2]
## [1,] 4 -1
## [2,] 8 4
So, we see that \(\A\B=\left(\begin{array}{cc} 4 & -1 \\ 8 & 4 \end{array}\right)\).
It is important to note that the * operator instead computes the elementwise product.
A*B
## [,1] [,2]
## [1,] 6 0
## [2,] 0 4
The elementwise product is often useful for products involving vectors. For instance, suppose \(\x=\left(\begin{array}{c} 1 \\ 2 \\ -2\end{array}\right)\). Here are two different ways to compute the square of the length of \(\x\).
x=c(1,2,-2)
t(x)%*%x
## [,1]
## [1,] 9
x*x
## [1] 1 4 4
sum(x*x)
## [1] 9
So, we see that \(\x'\x=\ds{\sum_{i=1}^3 x_i^2}=9\). Note that, in R, the first expression is treated as a matrix while the second is treated as a vector.
sqrt(sum(x^2))
## [1] 3
So, the length of \(\x\) is \(\sqrt{9}=3\).
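As an aside (a base R alternative not shown in the text), the function crossprod computes \(\x'\x\) directly.
crossprod(x) # same as t(x)%*%x, returned as a 1 x 1 matrix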
Using these types of matrix products (each \(\T_i\) below is nonsingular and performs an elementary row operation, so premultiplying by it does not change the rank), we can obtain a convenient “row equivalent” matrix. \[ \rank(\A)=\rank(\T_1\A), \ \ \ \ \ \T_1\A= \left(\begin{array}{ccc}0 & 1 & 0\\ 1 & 0 & 0 \\ 0 & 0 & 1\end{array}\right) \left(\begin{array}{ccc}0 & 3 & -6\\ 1 & -1 & 3 \\ 3 & -1 & 5\end{array}\right) = \left(\begin{array}{ccc}1 & -1 & 3\\ 0 & 3 & -6 \\ 3 & -1 & 5\end{array}\right) \] \[ \rank(\T_1\A)=\rank(\T_2\T_1\A), \ \ \ \ \ \T_2\T_1\A= \left(\begin{array}{ccc}1 & 0 & 0\\ 0 & 1 & 0 \\ -3 & 0 & 1\end{array}\right) \left(\begin{array}{ccc}1 & -1 & 3\\ 0 & 3 & -6 \\ 3 & -1 & 5\end{array}\right) = \left(\begin{array}{ccc}1 & -1 & 3\\ 0 & 3 & -6 \\ 0 & 2 & -4\end{array}\right) \] \[ \rank(\T_2\T_1\A)=\rank(\T_3\T_2\T_1\A), \ \T_3\T_2\T_1\A= \left(\begin{array}{ccc}1 & 0 & 0\\ 0 & \frac{1}{3} & 0 \\ 0 & 0 & 1\end{array}\right) \left(\begin{array}{ccc}1 & -1 & 3\\ 0 & 3 & -6 \\ 0 & 2 & -4\end{array}\right) = \left(\begin{array}{ccc}1 & -1 & 3\\ 0 & 1 & -2 \\ 0 & 2 & -4\end{array}\right) \] \[ \rank(\T_3\T_2\T_1\A)=\rank(\T_4\T_3\T_2\T_1\A), \ \T_4\T_3\T_2\T_1\A= \left(\begin{array}{ccc}1 & 0 & 0\\ 0 & 1 & 0 \\ 0 & 0 & \frac{1}{2}\end{array}\right) \left(\begin{array}{ccc}1 & -1 & 3\\ 0 & 1 & -2 \\ 0 & 2 & -4\end{array}\right) = \left(\begin{array}{ccc}1 & -1 & 3\\ 0 & 1 & -2 \\ 0 & 1 & -2\end{array}\right) \] \[ \rank(\T_4\T_3\T_2\T_1\A)=\rank(\T_5\T_4\T_3\T_2\T_1\A), \ \T_5\T_4\T_3\T_2\T_1\A= \left(\begin{array}{ccc}1 & 0 & 0\\ 0 & 1 & 0 \\ 0 & -1 & 1\end{array}\right) \left(\begin{array}{ccc}1 & -1 & 3\\ 0 & 1 & -2 \\ 0 & 1 & -2\end{array}\right) = \left(\begin{array}{ccc}1 & -1 & 3\\ 0 & 1 & -2 \\ 0 & 0 & 0\end{array}\right) \] \[ \rank(\T_5\T_4\T_3\T_2\T_1\A)=\rank(\T_6\T_5\T_4\T_3\T_2\T_1\A), \ \T_6\T_5\T_4\T_3\T_2\T_1\A= \left(\begin{array}{ccc}1 & 1 & 0\\ 0 & 1 & 0 \\ 0 & 0 & 1\end{array}\right) \left(\begin{array}{ccc}1 & -1 & 3\\ 0 & 1 & -2 \\ 0 & 0 & 0\end{array}\right) = \left(\begin{array}{ccc}1 & 0 & 1\\ 0 & 1 & -2 \\ 0 & 0 & 0\end{array}\right) \] Since \(\T_6\T_5\T_4\T_3\T_2\T_1\A\) has two rows with leading 1’s, \(\rank(\A)=\rank(\T_6\T_5\T_4\T_3\T_2\T_1\A)=2\).
The rank of a matrix can be computed in R using the function rankMatrix from the Matrix package.
library(Matrix)
A_2_4_1=matrix(c(0, 3, -6, 1, -1, 3, 3, -1, 5),3,3,byrow=TRUE)
A_2_4_1
## [,1] [,2] [,3]
## [1,] 0 3 -6
## [2,] 1 -1 3
## [3,] 3 -1 5
rankMatrix(A_2_4_1)[1]
## [1] 2
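To connect rankMatrix with the hand computation, here is a minimal sketch (not from the text) that builds the elementary matrices \(\T_1,\ldots,\T_6\) in R and checks that their product times \(\A\) reproduces the row-equivalent matrix above.
T1=rbind(c(0,1,0),c(1,0,0),c(0,0,1)) # swap rows 1 and 2
T2=rbind(c(1,0,0),c(0,1,0),c(-3,0,1)) # subtract 3 times row 1 from row 3
T3=diag(c(1,1/3,1)) # divide row 2 by 3
T4=diag(c(1,1,1/2)) # divide row 3 by 2
T5=rbind(c(1,0,0),c(0,1,0),c(0,-1,1)) # subtract row 2 from row 3
T6=rbind(c(1,1,0),c(0,1,0),c(0,0,1)) # add row 2 to row 1
T6%*%T5%*%T4%*%T3%*%T2%*%T1%*%A_2_4_1 # two nonzero rows, so the rank is 2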
Definition 2.5.1 (p.21): A square matrix \(\A\) is nonsingular (or invertible) if there exists a square matrix \(\A^{-1}\) such that \(\A^{-1}\A=\A\A^{-1}=\I\). The matrix \(\A^{-1}\) is called the inverse of \(\A\). If an inverse for \(\A\) does not exist, then \(\A\) is singular.
Theorem 2.5.1 (p.23): Let \(\A\) be a square matrix. Then \(\A\) is nonsingular if and only if \(\A\) is full rank.
Example 2.5.1: Find the inverse of \(\A=\left(\begin{array}{cc}2 & 5\\1 & 4\end{array}\right)\), if it exists.
Answer: A systematic way to do this for general square matrices is to use row operations to put the augmented matrix \(\left(\A \ \Big| \ \I\right)\) in “reduced row echelon form”. \[ \left(\A \ \Big| \ \I\right)=\left(\begin{array}{cc|cc} 2 & 5 & 1 & 0\\ 1 & 4 & 0 & 1\end{array}\right) \] \[ \T_1=\left(\begin{array}{cc} 0 & 1 \\ 1 & 0 \end{array}\right), \ \ \ \ \T_1\left(\A\ \Big| \ \I\right)=\left(\begin{array}{cc|cc} 1 & 4 & 0 & 1 \\ 2 & 5 & 1 & 0\end{array}\right) \] \[ \T_2=\left(\begin{array}{cc} 1 & 0 \\ -2 & 1 \end{array}\right), \ \ \ \ \T_2\T_1\left(\A \ \Big| \ \I\right)=\left(\begin{array}{cc|cc} 1 & 4 & 0 & 1 \\ 0 & -3 & 1 & -2\end{array}\right) \] \[ \T_3=\left(\begin{array}{cc} 1 & 0 \\ 0 & -\frac{1}{3} \end{array}\right), \ \ \ \ \T_3\T_2\T_1\left(\A \ \Big| \ \I\right)=\left(\begin{array}{cc|cc} 1 & 4 & 0 & 1 \\ 0 & 1 & -\frac{1}{3} & \frac{2}{3}\end{array}\right) \] \[ \T_4=\left(\begin{array}{cc} 1 & -4 \\ 0 & 1 \end{array}\right), \ \ \ \ \T_4\T_3\T_2\T_1\left(\A \ \Big| \ \I\right)=\left(\begin{array}{cc|cc} 1 & 0 & \frac{4}{3} & -\frac{5}{3} \\ 0 & 1 & -\frac{1}{3} & \frac{2}{3}\end{array}\right) \] Since \(\T_4\T_3\T_2\T_1\left(\A \ \Big| \ \I\right)= \left(\T_4\T_3\T_2\T_1\A \ \Big| \ \T_4\T_3\T_2\T_1\right)= \left(\I \ \Big| \ \T_4\T_3\T_2\T_1\right)\), the left side of the augmented matrix shows that \(\T_4\T_3\T_2\T_1=\A^{-1}\), and the right side of the augmented matrix therefore displays \(\A^{-1}\).
So, we have the result \(\A^{-1}=\ds{\left(\begin{array}{cc} \frac{4}{3} & -\frac{5}{3} \\ -\frac{1}{3} & \frac{2}{3}\end{array}\right)}\).
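Here is a minimal sketch (not from the text) that carries out the same augmented-matrix reduction in R using the elementary matrices from the worked example.
A=cbind(c(2,1),c(5,4))
AI=cbind(A,diag(2)) # the augmented matrix ( A | I )
T1=rbind(c(0,1),c(1,0)) # swap rows 1 and 2
T2=rbind(c(1,0),c(-2,1)) # subtract 2 times row 1 from row 2
T3=rbind(c(1,0),c(0,-1/3)) # multiply row 2 by -1/3
T4=rbind(c(1,-4),c(0,1)) # subtract 4 times row 2 from row 1
T4%*%T3%*%T2%*%T1%*%AI # left block is I, right block is the inverse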
For \(2\times 2\) matrices, there is a shortcut formula \(\left(\begin{array}{cc} a_{11} & a_{12} \\ a_{21} & a_{22}\end{array}\right)^{-1}=\frac{1}{a_{11}a_{22}-a_{12}a_{21}}\left(\begin{array}{cc} a_{22} & -a_{12} \\ -a_{21} & a_{11}\end{array}\right)\), provided \(a_{11}a_{22}-a_{12}a_{21}\neq 0\). So, we see that \(\A^{-1}=\frac{1}{3}\left(\begin{array}{cc} 4 & -5 \\ -1 & 2\end{array}\right)\).
The function solve in R computes the inverse of a matrix.
A=cbind(c(2,1),c(5,4))
solve(A)
## [,1] [,2]
## [1,] 1.3333333 -1.6666667
## [2,] -0.3333333 0.6666667
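The \(2\times 2\) shortcut formula above is also easy to code directly; the following sketch (the helper name inv2x2 is made up for illustration) should agree with solve(A).
inv2x2=function(M){ # shortcut inverse of a 2 x 2 matrix
  d=M[1,1]*M[2,2]-M[1,2]*M[2,1] # the quantity a11*a22 - a12*a21
  rbind(c(M[2,2],-M[1,2]),c(-M[2,1],M[1,1]))/d
}
inv2x2(A)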
We can use the function chol in R to obtain the Cholesky decomposition of the positive definite symmetric matrix \(\A=\left(\begin{array}{ccc} 5 & 0 & 1 \\ 0 & 3 & 0 \\ 1 & 0 & 1\end{array}\right)\).
A=cbind(c(5,0,1),c(0,3,0),c(1,0,1))
T=chol(A); T
## [,1] [,2] [,3]
## [1,] 2.236068 0.000000 0.4472136
## [2,] 0.000000 1.732051 0.0000000
## [3,] 0.000000 0.000000 0.8944272
So let \(\T=\left(\begin{array}{ccc} \sqrt{5} & 0 & \frac{1}{\sqrt{5}} \\ 0 & \sqrt{3} & 0 \\ 0 & 0 & \frac{2}{\sqrt{5}}\end{array}\right)\). The following command checks that \(\T'\T=\A\).
t(T)%*%T
## [,1] [,2] [,3]
## [1,] 5 0 1
## [2,] 0 3 0
## [3,] 1 0 1
Here is some R syntax for drawing lines in the plane.
plot(-5:5,-5:5,type="n",xlab=expression(italic(x)[1]),ylab=expression(italic(x)[2]),family="serif",cex.lab=1.5)
abline(h=-2,col="red") #this option draws a horizontal line at x2=-2
abline(a=-3,b=1,col="blue") #a is the y-intercept and b is the slope of the line x2=x1-3
x1=c(-5,5); x2=-5+3*x1; points(x1,x2,type="l",col="#FFA200") #this shows how to plot the line directly with two points
Definition 2.8.1 (p.33): A generalized inverse of an \(n\times p\) matrix \(\A\) is any matrix \(\A^-\) which satisfies \(\A\A^-\A=\A\).
Theorem 2.8.1 (p.34): Suppose \(\A=\left(\begin{array}{cc}\A_{11} & \A_{12} \\ \A_{21} & \A_{22}\end{array}\right)\) is an \(n\times p\) matrix of rank \(r\) where \(\A_{11}\) is an \(r\times r\) matrix of rank \(r\). Then \[ \A^-=\left(\begin{array}{cc} \A_{11}^{-1} & \O \\ \O & \O\end{array}\right) \] is a generalized inverse of \(\A\).
Every matrix has a generalized inverse.
If a square matrix is invertible, then its inverse is the unique generalized inverse.
Theorem 2.8.2 (p.37): The system of equations \(\A\x=\b\) is consistent if and only if \(\A\A^-\b=\b\) for every generalized inverse \(\A^-\) of \(\A\).
Example 2.8.1: Find a generalized inverse of \(\A=\left(\begin{array}{ccc} 3 & 1 & 4\\ 5 & 2 & 7 \\ 1 & 0 & 1\end{array}\right)\).
Answer: It is easy to see that \(\rank(\A)=2\) since the third column is the sum of the first two columns, and the second column is not a multiple of the first. So, since \(\left(\begin{array}{cc} 3 & 1 \\ 5 & 2\end{array}\right)^{-1}=\left(\begin{array}{cc} 2 & -1 \\ -5 & 3\end{array}\right)\), \(\A^{-}=\left(\begin{array}{ccc} 2 & -1 & 0\\ -5 & 3 & 0 \\ 0 & 0 & 0\end{array}\right)\) is a generalized inverse of \(\A\).
There are many other generalized inverses of \(\A\) such as \(\A_2^{-}=\left(\begin{array}{ccc} -1 & 0 & 4\\ 0 & 0 & 0 \\ 1 & 0 & -3\end{array}\right)\) and \(\A_3^{-}=\left(\begin{array}{ccc} 0 & 0 & 0\\ 0 & 0.5 & -3.5 \\ 0 & 0 & 1\end{array}\right)\).
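As a quick check (a sketch, not from the text), the defining property \(\A\A^-\A=\A\) can be verified in R for the generalized inverse found in Example 2.8.1.
A=rbind(c(3,1,4),c(5,2,7),c(1,0,1))
Aginv=rbind(c(2,-1,0),c(-5,3,0),c(0,0,0)) # generalized inverse from Example 2.8.1
A%*%Aginv%*%A # reproduces A
# MASS::ginv(A) would give yet another (Moore-Penrose) generalized inverse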
The determinant of an \(n\times n\) matrix \(\A=(a_{ij})\) is \(\ds{\det(\A)=\sum_{j=1}^n (-1)^{i+j} a_{ij}\det(\M_{ij})}\), where \(i\) is any fixed integer between \(1\) and \(n\) and \(\M_{ij}\) is the submatrix formed by deleting the \(i\)th row and \(j\)th column of \(\A\).
The notation \(\left| \A \right|\) is sometimes also used to represent the determinant of \(\A\).
Example 2.9.1: Expanding along the third row (\(i=3\)) gives \[\bea \nnn \left|\begin{array}{ccc} 2 & 4 & 6\\ 2 & -1 & 4 \\ 0 & 1 & 5\end{array}\right|&=& 0 \left|\begin{array}{cc} 4 & 6\\ -1 & 4\end{array}\right| -1 \left|\begin{array}{cc} 2 & 6\\ 2 & 4\end{array}\right| +5 \left|\begin{array}{cc} 2 & 4\\ 2 & -1\end{array}\right| \\ \nnn &=& 0 - 1(8-12) + 5(-2-8) \\ \nnn &=& 0+4-50 = -46. \eea\]
A different but equivalent definition is described on p.37 based on permutations of \(\left\{1,\ldots,n\right\}\).
Theorem 2.9.1: If \(\T=(t_{ij})\) is an \(n\times n\) triangular matrix, then \(\ds{\det(\T)=\prod_{i=1}^n t_{ii}}\).
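For instance, a quick sketch (not from the text) checks Theorem 2.9.1 on the upper triangular Cholesky factor T computed earlier.
det(T) # determinant of the triangular matrix T
prod(diag(T)) # equals the product of its diagonal elements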
Theorem 2.9.2 (p.38): Suppose \(\A\) is a square matrix.
Theorem 2.9.3 (p.40): If \(\A\) and \(\B\) are square matrices with the same size, then \(\det(\A\B)=\det(\A)\det(\B)=\det(\B)\det(\A)=\det(\B\A)\).
This gives us another way to find determinants using row (or column) operations. For example, in Example 2.9.1, consider the following products. \[ \T_1\A= \left(\begin{array}{ccc}\frac{1}{2} & 0 & 0\\ 0 & 1 & 0 \\ 0 & 0 & 1\end{array}\right) \left(\begin{array}{ccc}2 & 4 & 6\\ 2 & -1 & 4 \\ 0 & 1 & 5\end{array}\right) = \left(\begin{array}{ccc}1 & 2 & 3\\ 2 & -1 & 4 \\ 0 & 1 & 5\end{array}\right) \] \[ \T_2\T_1\A= \left(\begin{array}{ccc}1 & 0 & 0\\ -2 & 1 & 0 \\ 0 & 0 & 1\end{array}\right) \left(\begin{array}{ccc}1 & 2 & 3\\ 2 & -1 & 4 \\ 0 & 1 & 5\end{array}\right) = \left(\begin{array}{ccc}1 & 2 & 3\\ 0 & -5 & -2 \\ 0 & 1 & 5\end{array}\right) \] \[ \T_3\T_2\T_1\A= \left(\begin{array}{ccc}1 & 0 & 0\\ 0 & 0 & 1 \\ 0 & 1 & 0\end{array}\right) \left(\begin{array}{ccc}1 & 2 & 3\\ 0 & -5 & -2 \\ 0 & 1 & 5\end{array}\right) = \left(\begin{array}{ccc}1 & 2 & 3\\ 0 & 1 & 5 \\ 0 & -5 & -2\end{array}\right) \] \[ \T_4\T_3\T_2\T_1\A= \left(\begin{array}{ccc}1 & 0 & 0\\ 0 & 1 & 0 \\ 0 & 5 & 1\end{array}\right) \left(\begin{array}{ccc}1 & 2 & 3\\ 0 & 1 & 5 \\ 0 & -5 & -2\end{array}\right) = \left(\begin{array}{ccc}1 & 2 & 3\\ 0 & 1 & 5 \\ 0 & 0 & 23\end{array}\right) \] Then we see that
\[\bea \nnn \det(\T_4\T_3\T_2\T_1\A)&=&\det(\T_4)\det(\T_3)\det(\T_2)\det(\T_1)\det(\A) \\ \nnn 23 &=& (1)(-1)(1)\left(\frac{1}{2}\right)\det(\A) \\ \nnn \det(\A) &=& 23(-2)=-46. \eea\]
The determinant can be computed in R using the function det.
A=rbind(c(2,4,6),c(2,-1,4),c(0,1,5))
det(A)
## [1] -46
Definition 2.12.1 (p.46): An eigenvector of square matrix \(\A\) is a non-zero vector \(\x\) such that \(\A\x=\lambda\x\) for some scalar \(\lambda\). The scalar \(\lambda\) is called an eigenvalue of \(\A\).
Definition 2.12.2 (p.47): The characteristic equation for \(\A\) is \(\det(\A-\lambda\I)=0\).
To find eigenvalue-eigenvector pairs of an \(n\times n\) matrix \(\A\), we first find all eigenvalues by solving the characteristic equation (which sets an \(n\)th-degree polynomial in \(\lambda\) equal to zero). For each eigenvalue \(\lambda\), we then find the nonzero vectors \(\x\) (up to a constant multiple) such that \((\A-\lambda\I)\x=\zeros\).
Example 2.12.1: Find all eigenvalue-eigenvector pairs of \(\A=\left(\begin{array}{cc} 5 & 2\\2 & 2 \end{array}\right)\).
Answer: Since \(\ds{\det(\A-\lambda\I)=\left|\begin{array}{cc}5-\lambda & 2\\2 & 2-\lambda\end{array}\right|=\lambda^2-7\lambda+6}\), the characteristic equation \(\lambda^2-7\lambda+6=0\) yields two eigenvalues \(\lambda_1=6\) and \(\lambda_2=1\). Solving \((\A-\lambda_1\I)\x=\zeros\), we obtain the solution \(\x=c\left(\begin{array}{c} 2\\ 1\end{array}\right)\), so \(\lambda_1=6\) paired with any nonzero multiple of \(\left(\begin{array}{c} 2\\ 1\end{array}\right)\) is an eigenvalue-eigenvector pair. Solving \((\A-\lambda_2\I)\x=\zeros\), we obtain the solution \(\x=c\left(\begin{array}{c} 1\\ -2\end{array}\right)\), so \(\lambda_2=1\) paired with any nonzero multiple of \(\left(\begin{array}{c} 1\\ -2\end{array}\right)\) is an eigenvalue-eigenvector pair.
Theorem 2.12.1 (p.51): If \(\A\) is an \(n\times n\) matrix and \(\P\) is a nonsingular \(n\times n\) matrix, then \(\P^{-1}\A\P\) has the same eigenvalues as \(\A\).
Theorem 2.12.2 (p.51): Let \(\A\) be an \(n\times n\) symmetric matrix with eigenvalues \(\lambda_1,\lambda_2,\ldots\lambda_n\). Let \(\x_i\) be an eigenvector corresponding to the eigenvalue \(\lambda_i\) for \(i=1,\ldots,n\).
Theorem 2.12.3 (p.51): Suppose \(\A\) is an \(n\times n\) symmetric matrix with eigenvalues \(\lambda_1,\lambda_2,\ldots\lambda_n\). Let \(\u_i\) be an eigenvector corresponding to the eigenvalue \(\lambda_i\) for \(i=1,\ldots,n\) such that \(\u_i'\u_j=\left\{\begin{array}{cl} 1&\mbox{if } i=j\\0&\mbox{if } i\neq j\end{array}\right.\). Then \[ \A=\U\D\U\tr=\sum_{i=1}^n \lambda_i\u_i\u_i' \] where \(\D=\diag(\lambda_1,\lambda_2,\ldots,\lambda_n)\) and \(\U\) is a matrix with columns \(\u_1,\u_2,\ldots,\u_n\).
The expression for \(\A\) in Theorem 2.12.3 is called the spectral decomposition of \(\A\).
Theorem 2.12.4 (p.52): Suppose \(\A\) is an \(n\times n\) matrix with eigenvalues \(\lambda_1,\lambda_2,\ldots\lambda_n\). Then the following results hold.
Definition 2.12.3 (p.53): Let \(\A=\U\D\U\tr\) be a positive definite matrix where \(\D=\diag(\lambda_1,\lambda_2,\ldots,\lambda_n)\) and \(\U\) is an orthogonal matrix. Then its square root matrix is \(\A^{1/2}=\U\D^{1/2}\U\tr\) where \(\D^{1/2}=\diag(\sqrt{\lambda_1},\sqrt{\lambda_2},\ldots,\sqrt{\lambda_n})\).
R Example 2.12.1: Let’s see how to obtain a spectral decomposition in R using the built-in eigen function.
A=rbind(c(5,2),c(2,2))
eigen.A=eigen(A)
lambda=eigen.A$values;lambda
## [1] 6 1
U=eigen.A$vectors;U
## [,1] [,2]
## [1,] -0.8944272 0.4472136
## [2,] -0.4472136 -0.8944272
The eigenvalues are stored in the vector lambda and the respective eigenvectors are stored in the columns of U. The following command verifies that the spectral decomposition equals A.
U%*%diag(lambda)%*%t(U)
## [,1] [,2]
## [1,] 5 2
## [2,] 2 2
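Building on this output, here is a minimal sketch (not from the text) of the square root matrix from Definition 2.12.3.
Ahalf=U%*%diag(sqrt(lambda))%*%t(U) # A^(1/2) = U D^(1/2) U'
Ahalf%*%Ahalf # reproduces A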
Definition 2.14.1 (p.57): Let \(f(\X)\) be a real-valued function of the elements of an \(m\times n\) dimensional matrix \[ \X=(x_{ij})=\left(\begin{array}{ccc} x_{11} & \cdots & x_{1n}\\ \vdots & \ddots & \vdots\\ x_{m1} & \cdots & x_{mn} \end{array} \right) . \] The derivative of \(f\) with respect to \(\X\) is defined as \[ \frac{\partial f}{\partial \X}= \left(\begin{array}{ccc} \frac{\partial f}{\partial x_{11}} & \cdots & \frac{\partial f}{\partial x_{1n}} \\ \vdots & \ddots & \vdots \\ \frac{\partial f}{\partial x_{m1}} & \cdots & \frac{\partial f}{\partial x_{mn}} \end{array}\right). \]
This includes the case of differentiating by a column vector or by a row vector.
Theorem 2.14.1 (p.56): If \(\x\) is an \(n\)-dimensional vector and \(\c\) is an \(n\)-dimensional vector of constants, then \(\ds{\frac{\partial[\c'\x]}{\partial\x}=\frac{\partial[\x'\c]}{\partial\x}=\c}\).
Theorem 2.14.2 (p.56): If \(\x\) is an \(n\)-dimensional vector and \(\C\) is an \(n \times n\) matrix of constants, then \(\ds{\frac{\partial[\x'\C\x]}{\partial\x}=(\C+\C')\x}\). If \(\C\) is also symmetric, then \(\ds{\frac{\partial[\x'\C\x]}{\partial\x}=2\C\x}\).
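Here is a small numerical sketch (not from the text; the helper num_grad and the example C and x are made up) that checks Theorem 2.14.2 with a forward-difference approximation to the gradient.
num_grad=function(f,x,eps=1e-6){ # forward-difference gradient of f at x
  sapply(seq_along(x),function(i){
    e=rep(0,length(x)); e[i]=eps
    (f(x+e)-f(x))/eps
  })
}
C=rbind(c(2,1),c(0,3))
x=c(1,-2)
num_grad(function(x) as.numeric(t(x)%*%C%*%x),x) # approximately equal to the next line
(C+t(C))%*%x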
Definition 2.14.2 (p.58): Let \[ \A(x)=(a_{ij}(x))=\left( \begin{array}{ccc} a_{11}(x) & \cdots & a_{1n}(x) \\ \vdots & \ddots & \vdots \\ a_{m1}(x) & \ldots & a_{mn}(x) \end{array} \right) \] be an \(m\times n\) matrix with elements which are functions of a scalar \(x\). Then the derivative of \(\A\) with respect to \(x\) is \[ \frac{d \A(x)}{dx}=\left(\frac{d a_{ij}}{dx}\right)=\left( \begin{array}{ccc} \frac{da_{11}}{dx} & \cdots & \frac{da_{1n}}{dx} \\ \vdots & \ddots & \vdots \\ \frac{da_{m1}}{dx} & \ldots & \frac{da_{mn}}{dx} \end{array} \right). \]
Definition 2.14.3 (p.60): Let \(\x\) be an \(n\)-dimensional column vector and let \(\h(\x)=\left(\begin{array}{ccc} h_1(\x),\ldots,h_m(\x)\end{array} \right)\) be a row vector-valued function. Then the derivative of \(\h\) with respect to \(\x\) is \[ \frac{\partial}{\partial \x}h(\x)=\left(\begin{array}{c} \frac{\partial}{\partial x_1} \\ \vdots \\ \frac{\partial}{\partial x_n}\end{array}\right)\left(\begin{array}{ccc} h_1(\x),\ldots,h_m(\x)\end{array}\right)= \left( \begin{array}{ccc} \frac{\partial h_1}{\partial x_1} & \cdots & \frac{\partial h_m}{\partial x_1} \\ \vdots & \ddots & \vdots \\ \frac{\partial h_1}{\partial x_n} & \ldots & \frac{\partial h_m}{\partial x_n} \end{array} \right) . \]
Theorem 2.14.3: If \(\x\) is an \(n\)-dimensional vector and \(\g(\x)\) and \(\h(\x)\) are \(n\)-dimensional vector-valued functions of \(\x\), then
Rencher, Alvin C, and G Bruce Schaalje. 2008. Linear Models in Statistics. John Wiley & Sons.