Show that \(A^TA \neq AA^T\) in general. (Proof and demonstration.)
Here we take a generalized 2x2 matrices with elements (a,b,c,d) and show how changing the order of multiplication with it’s transpose provides different results unless (a,b,c,d) meet specific criteria.
Here we transpose A.
\[ A= \left[ {\begin{array}{cc} a & b \\ c & d \\ \end{array} } \right] \]
\[ A^T= \left[ {\begin{array}{cc} a & c \\ b & d \\ \end{array} } \right] \]
Here we multiply \(AA^T\) and \(A^TA\)
\[ AA^T= \left[ {\begin{array}{cc} a^2+b^2 & ac+bd \\ ac+bd & c^2+d^2 \\ \end{array} } \right] \]
\[ A^TA= \left[ {\begin{array}{cc} a^2+c^2 & ab+cd \\ ab+cd & b^2+d^2 \\ \end{array} } \right] \]
When we set the matrices equal to each other element-wise we get
three equations that must be true for the two orders of multiplication
to be true. An obvious case is if b=c
. However not in all
cases will they be equal, so we have proved \(AA^T\neq A^TA\)
\(a^2+b^2=a^2+c^2\)
\(c^2+d^2=b^2+d^2\)
\(ac+bd=ab+cd\)
Since not for all values of (a,b,c,d) will the three equations above be true, we have proved \(AA^T\neq A^TA\)
We’re going to demonstrate this with a simple matrix.
Here we transpose B.
\[ B= \left[ {\begin{array}{cc} 1 & 2 \\ 3 & 4 \\ \end{array} } \right] \]
\[ B^T= \left[ {\begin{array}{cc} 1 & 3 \\ 2 & 4 \\ \end{array} } \right] \]
Here we multiply \(BB^T\) and \(B^TB\)
\[ BB^T= \left[ {\begin{array}{cc} 1^2+2^2 & 1*3+2*4 \\ 1*3+2*4 & 3^2+4^2 \\ \end{array} } \right] \]
\[ B^TB= \left[ {\begin{array}{cc} 1^2+3^2 & 1*2+3*4 \\ 1*2+3*4 & 2^2+4^2 \\ \end{array} } \right] \]
Here we multiply \(BB^T\) and \(B^TB\)
\[ BB^T= \left[ {\begin{array}{cc} 5 & 11 \\ 11 & 25 \\ \end{array} } \right] \]
\[ B^TB= \left[ {\begin{array}{cc} 10 & 14 \\ 14 & 20 \\ \end{array} } \right] \]
Above you can see that \(BB^T\neq B^TB\)
Here we input the matrix B
into R and calculate both
\(BB^T\) and \(B^TB\) to show they equal what we
calculated by hand above.
B <- matrix(c(1,3,2,4),ncol=2)
B
## [,1] [,2]
## [1,] 1 2
## [2,] 3 4
B%*%t(B)
## [,1] [,2]
## [1,] 5 11
## [2,] 11 25
t(B)%*%B
## [,1] [,2]
## [1,] 10 14
## [2,] 14 20
For a special type of square matrix A, we get \(A^TA=AA^T\). Under what conditions could this be true?
We incidentally solved this in completing Part 1. If a 2x2 matrix satisfies the following criteria then in fact \(A^TA=AA^T\).
\(a^2+b^2=a^2+c^2\)
\(c^2+d^2=b^2+d^2\)
\(ac+bd=ab+cd\)
Any 2x2 matrix where b=c
is symmetric
and would satisfy the criteria. An example of this is the Identity
matrix.
\[ Symmetric\ Matrix= \left[ {\begin{array}{cc} a & b \\ b & d \\ \end{array} } \right] \]
When a matrix satisfies the property that it commutes with it’s
transpose then necessarily it is symmetric. I tried coming up with a 2x2
matrix that wasn’t symmetric but did commute with it’s transpose but the
answer is in the criteria; I don’t think you can satisfy all three
equations unless b=c
.
It looks like there are complex matrices that are normal when they commute with their transpose, and they might have additional properties, but another trail for another day. From the wikipedia article for ‘Normal matrix’, “a complex square matrix \(A\) is normal if it commutes with its conjugate transpose \(A*\)”.
Matrix factorization is a very important problem. There are supercomputers built just to do matrix factorizations. Every second you are on an airplane, matrices are being factorized. Radars that track flights use a technique called Kalman filtering. At the heart of Kalman Filtering is a Matrix Factorization operation. Kalman Filters are solving linear systems of equations when they track your flight using radars.
Write an R function to factorize a square matrix A (4x4 3x3 or 2x2) into LU. You don’t have to worry about permuting rows of A.
To verify our work we’re going to use the lu
function
from the pracma
library.
library("pracma")
Here are the matrices we’ve used so far to test our functions.
#A <- matrix(c(1,3, 2,4),ncol=2)
A <- matrix(c(2,4, 3,5),ncol=2)
B <- matrix(c(1,2,-2, 2,3,3, 3,1,-2),ncol=3)
#B <- matrix(c(1,-2,3, 4,8,4, -3,5,7),ncol=3)
C <- matrix(c(1,3,-1,-3, -2,-9,2,-6, -2,0,4,26, -3,-9,7,2),ncol=4)
Here is the function we built to do LU factorization on a 2x2 matrix.
The return code I found from typing lu
in the console to
read the underlying code for that function.
lu_2by2 <- function(A) {
L <- diag(2)
U <- A
L[2,1] <- A[2,1]/A[1,1]
U[2,1] <- U[2,1]-L[2,1]*U[1,1]
U[2,2] <- U[2,2]-L[2,1]*U[1,2]
return(list(L = L, U = U))
}
The function we wrote should return the same output as the
pracma
lu
function for any given 2x2
matrix.
lu_2by2(A)
## $L
## [,1] [,2]
## [1,] 1 0
## [2,] 2 1
##
## $U
## [,1] [,2]
## [1,] 2 3
## [2,] 0 -1
lu(A)
## $L
## [,1] [,2]
## [1,] 1 0
## [2,] 2 1
##
## $U
## [,1] [,2]
## [1,] 2 3
## [2,] 0 -1
Here is the function we built to do LU factorization on a 3x3 matrix. The line doing the row operation on U[3,1] for the second time is redundant since U[2,1] is already zero however I left it in to look for patterns.
lu_3by3 <- function(B) {
L <- diag(3)
U <- B
L[2,1] <- U[2,1]/U[1,1]
U[2,1] <- U[2,1]-L[2,1]*U[1,1]
U[2,2] <- U[2,2]-L[2,1]*U[1,2]
U[2,3] <- U[2,3]-L[2,1]*U[1,3]
L[3,1] <- U[3,1]/U[1,1]
U[3,1] <- U[3,1]-L[3,1]*U[1,1]
U[3,2] <- U[3,2]-L[3,1]*U[1,2]
U[3,3] <- U[3,3]-L[3,1]*U[1,3]
L[3,2] <- U[3,2]/U[2,2]
U[3,1] <- U[3,1]-L[3,2]*U[2,1]
U[3,2] <- U[3,2]-L[3,2]*U[2,2]
U[3,3] <- U[3,3]-L[3,2]*U[2,3]
return(list(L = L, U = U))
}
The function we wrote should return the same output as the
pracma
lu
function for any given 3x3
matrix.
lu_3by3(B)
## $L
## [,1] [,2] [,3]
## [1,] 1 0 0
## [2,] 2 1 0
## [3,] -2 -7 1
##
## $U
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 0 -1 -5
## [3,] 0 0 -31
lu(B)
## $L
## [,1] [,2] [,3]
## [1,] 1 0 0
## [2,] 2 1 0
## [3,] -2 -7 1
##
## $U
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 0 -1 -5
## [3,] 0 0 -31
Here is the function we built to do LU factorization on a 4x4 matrix. There are a lot of redundant row operations since progressively we have more zeros; however I was interested in perserving the pattern. Next steps would be to remove unnecessary lines of code, switch to a vectorized programming solution, and to generalize the pattern for all sizes of matrix.
lu_4by4 <- function(C) {
L <- diag(4)
U <- C
L[2,1] <- U[2,1]/U[1,1]
U[2,1] <- U[2,1]-L[2,1]*U[1,1]
U[2,2] <- U[2,2]-L[2,1]*U[1,2]
U[2,3] <- U[2,3]-L[2,1]*U[1,3]
U[2,4] <- U[2,4]-L[2,1]*U[1,4]
L[3,1] <- U[3,1]/U[1,1]
U[3,1] <- U[3,1]-L[3,1]*U[1,1]
U[3,2] <- U[3,2]-L[3,1]*U[1,2]
U[3,3] <- U[3,3]-L[3,1]*U[1,3]
U[3,4] <- U[3,4]-L[3,1]*U[1,4]
L[4,1] <- U[4,1]/U[1,1]
U[4,1] <- U[4,1]-L[4,1]*U[1,1]
U[4,2] <- U[4,2]-L[4,1]*U[1,2]
U[4,3] <- U[4,3]-L[4,1]*U[1,3]
U[4,4] <- U[4,4]-L[4,1]*U[1,4]
L[3,2] <- U[3,2]/U[2,2]
U[3,1] <- U[3,1]-L[3,2]*U[2,1]
U[3,2] <- U[3,2]-L[3,2]*U[2,2]
U[3,3] <- U[3,3]-L[3,2]*U[2,3]
U[3,4] <- U[3,4]-L[3,2]*U[2,4]
L[4,2] <- U[4,2]/U[2,2]
U[4,1] <- U[4,1]-L[4,2]*U[2,1]
U[4,2] <- U[4,2]-L[4,2]*U[2,2]
U[4,3] <- U[4,3]-L[4,2]*U[2,3]
U[4,4] <- U[4,4]-L[4,2]*U[2,4]
L[4,3] <- U[4,3]/U[3,3]
U[4,1] <- U[4,1]-L[4,3]*U[3,1]
U[4,2] <- U[4,2]-L[4,3]*U[3,2]
U[4,3] <- U[4,3]-L[4,3]*U[3,3]
U[4,4] <- U[4,4]-L[4,3]*U[3,4]
return(list(L = L, U = U))
}
The function we wrote should return the same output as the
pracma
lu
function for any given 4x4
matrix.
lu_4by4(C)
## $L
## [,1] [,2] [,3] [,4]
## [1,] 1 0 0 0
## [2,] 3 1 0 0
## [3,] -1 0 1 0
## [4,] -3 4 -2 1
##
## $U
## [,1] [,2] [,3] [,4]
## [1,] 1 -2 -2 -3
## [2,] 0 -3 6 0
## [3,] 0 0 2 4
## [4,] 0 0 0 1
lu(C)
## $L
## [,1] [,2] [,3] [,4]
## [1,] 1 0 0 0
## [2,] 3 1 0 0
## [3,] -1 0 1 0
## [4,] -3 4 -2 1
##
## $U
## [,1] [,2] [,3] [,4]
## [1,] 1 -2 -2 -3
## [2,] 0 -3 6 0
## [3,] 0 0 2 4
## [4,] 0 0 0 1