CUNY 605 Fundamentals of Computational Mathematics 2017
(1) Show that \(A^T A\neq AA^T\) in general. (Proof and demonstration.)
Please typeset your response using LaTeX mode in RStudio. If you do it on paper, please either scan or take a picture of the work and submit it. Please ensure that your image is legible and that your submissions are named using your first initial, last name, assignment, and problem set within the assignment.
Suppose A is an m × n matrix, where m is the number of rows and n is the number of columns. Its transpose \(A^T\) is then the n × m matrix whose number of rows is n and whose number of columns is m.
\(A = \left[\begin{array}{rrrr} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \\ \end{array} \right]\) \(, A^T = \left[\begin{array}{rrrr} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \\ \end{array} \right]\)
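As a quick illustration of how transposition swaps the dimensions, here is a small R check (the 2 x 3 matrix below is an arbitrary choice used only for this illustration):

M <- matrix(1:6, nrow = 2) # a 2 x 3 matrix
dim(M)    # 2 3
dim(t(M)) # 3 2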
In order to construct the proof, we will rely on the following properties of the transpose.
1. \([(A)^t]_{ji} = [A]_{ij}\)
This is the definition of a transpose of a matrix.
2. \((A + B)^t = A^t + B^t\)
Theorem TMA: Transpose and Matrix Addition. Suppose that A and B are m × n matrices. Then \((A + B)^t = A^t + B^t\).
3. \((\alpha A)^t = \alpha A^t\)
Theorem TMSM: Transpose and Matrix Scalar Multiplication. Suppose that \(\alpha \in \mathbb{C}\) and A is an m × n matrix. Then \((\alpha A)^t = \alpha A^t\).
4. \((AB)^t = B^tA^t\)
Theorem EMP: Entries of Matrix Products. Suppose A is an m × n matrix and B is an n × p matrix. Then for \(1 \le i \le m\), \(1 \le j \le p\), the individual entries of AB are given by \([AB]_{ij} = [A]_{i1}[B]_{1j} + [A]_{i2}[B]_{2j} + [A]_{i3}[B]_{3j} + \cdots + [A]_{in}[B]_{nj} = \sum_{k=1}^n [A]_{ik}[B]_{kj}\).
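These properties are easy to spot-check numerically in R. The small random matrices, dimensions, and the scalar below are arbitrary choices used only for illustration:

set.seed(1)
X <- matrix(rnorm(6), nrow = 2)   # 2 x 3
Y <- matrix(rnorm(6), nrow = 2)   # 2 x 3
Z <- matrix(rnorm(12), nrow = 3)  # 3 x 4, so X %*% Z is defined
alpha <- 2.5
all.equal(t(X)[3, 2], X[2, 3])        # property 1: [A^t]_{ji} = [A]_{ij}
all.equal(t(X + Y), t(X) + t(Y))      # property 2 (Theorem TMA)
all.equal(t(alpha * X), alpha * t(X)) # property 3 (Theorem TMSM)
all.equal(t(X %*% Z), t(Z) %*% t(X))  # property 4: (AB)^t = B^t A^t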
Before demonstrating the proof, let's also clarify one definition:
Definition SYM: Symmetric Matrix. The matrix A is symmetric if \(A = A^t\).
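As a quick illustration of this definition in R (the 2 x 2 matrix below is an arbitrary choice):

S <- matrix(c(1, 2, 2, 3), ncol = 2) # equal off-diagonal entries
isSymmetric(S) # TRUE: S equals t(S)
all(S == t(S)) # TRUE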
We will start by evaluating nonsymmetric matrices to see whether the statement in the question stem is true.
Let us assume, for the sake of contradiction, that \(A^TA = AA^T\). (Note that when \(m \neq n\), \(A^TA\) is an n × n matrix while \(AA^T\) is m × m, so the two products cannot even have the same dimensions; the interesting case is therefore a square, nonsymmetric A.) Let's evaluate the left side of the equation first.
\(A^TA = \left[\begin{array}{rrrr} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \\ \end{array} \right]\) \(* \left[\begin{array}{rrrr} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \\ \end{array} \right]\)
\(= \left[\begin{array}{rrrr} a_{11}a_{11} + a_{21}a_{21} + \cdots + a_{m1}a_{m1} & a_{11}a_{12} + a_{21}a_{22} + \cdots + a_{m1}a_{m2} & \cdots & a_{11}a_{1n} + a_{21}a_{2n} + \cdots + a_{m1}a_{mn} \\ a_{12}a_{11} + a_{22}a_{21} + \cdots + a_{m2}a_{m1} & a_{12}a_{12} + a_{22}a_{22} + \cdots + a_{m2}a_{m2} & \cdots & a_{12}a_{1n} + a_{22}a_{2n} + \cdots + a_{m2}a_{mn} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n}a_{11} + a_{2n}a_{21} + \cdots + a_{mn}a_{m1} & a_{1n}a_{12} + a_{2n}a_{22} + \cdots + a_{mn}a_{m2} & \cdots & a_{1n}a_{1n} + a_{2n}a_{2n} + \cdots + a_{mn}a_{mn} \\ \end{array} \right]\)
Now, let’s evaluate the right side of the equation \(AA^T\).
\(AA^T = \left[\begin{array}{rrrr} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \\ \end{array} \right]\) \(* \left[\begin{array}{rrrr} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \\ \end{array} \right]\)
\(= \left[\begin{array}{rrrr} a_{11}a_{11} + a_{12}a_{12} + \cdots + a_{1n}a_{1n} & a_{11}a_{21} + a_{12}a_{22} + \cdots + a_{1n}a_{2n} & \cdots & a_{11}a_{m1} + a_{12}a_{m2} + \cdots + a_{1n}a_{mn} \\ a_{21}a_{11} + a_{22}a_{12} + \cdots + a_{2n}a_{1n} & a_{21}a_{21} + a_{22}a_{22} + \cdots + a_{2n}a_{2n} & \cdots & a_{21}a_{m1} + a_{22}a_{m2} + \cdots + a_{2n}a_{mn} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1}a_{11} + a_{m2}a_{12} + \cdots + a_{mn}a_{1n} & a_{m1}a_{21} + a_{m2}a_{22} + \cdots + a_{mn}a_{2n} & \cdots & a_{m1}a_{m1} + a_{m2}a_{m2} + \cdots + a_{mn}a_{mn} \\ \end{array} \right]\)
Now, given the assumption that the left side of the equation equals the right side of the equation (which, as noted, also requires A to be square)…
\(\left[\begin{array}{rrrr} a_{11}a_{11} + a_{21}a_{21} + \cdots + a_{m1}a_{m1} & a_{11}a_{12} + a_{21}a_{22} + \cdots + a_{m1}a_{m2} & \cdots & a_{11}a_{1n} + a_{21}a_{2n} + \cdots + a_{m1}a_{mn} \\ a_{12}a_{11} + a_{22}a_{21} + \cdots + a_{m2}a_{m1} & a_{12}a_{12} + a_{22}a_{22} + \cdots + a_{m2}a_{m2} & \cdots & a_{12}a_{1n} + a_{22}a_{2n} + \cdots + a_{m2}a_{mn} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n}a_{11} + a_{2n}a_{21} + \cdots + a_{mn}a_{m1} & a_{1n}a_{12} + a_{2n}a_{22} + \cdots + a_{mn}a_{m2} & \cdots & a_{1n}a_{1n} + a_{2n}a_{2n} + \cdots + a_{mn}a_{mn} \\ \end{array} \right]\)
\(= \left[\begin{array}{rrrr} a_{11}a_{11} + a_{12}a_{12} + \cdots + a_{1n}a_{1n} & a_{11}a_{21} + a_{12}a_{22} + \cdots + a_{1n}a_{2n} & \cdots & a_{11}a_{m1} + a_{12}a_{m2} + \cdots + a_{1n}a_{mn} \\ a_{21}a_{11} + a_{22}a_{12} + \cdots + a_{2n}a_{1n} & a_{21}a_{21} + a_{22}a_{22} + \cdots + a_{2n}a_{2n} & \cdots & a_{21}a_{m1} + a_{22}a_{m2} + \cdots + a_{2n}a_{mn} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1}a_{11} + a_{m2}a_{12} + \cdots + a_{mn}a_{1n} & a_{m1}a_{21} + a_{m2}a_{22} + \cdots + a_{mn}a_{2n} & \cdots & a_{m1}a_{m1} + a_{m2}a_{m2} + \cdots + a_{mn}a_{mn} \\ \end{array} \right]\)
If we compare the top-left entries, \((A^TA)_{1,1} = (AA^T)_{1,1}\)…
\(a_{11}a_{11} + a_{21}a_{21} + a_{31}a_{31} + \cdots + a_{m1}a_{m1} = a_{11}a_{11} + a_{12}a_{12} + a_{13}a_{13} + \cdots + a_{1n}a_{1n}\)
Let's subtract \(a_{11}a_{11}\) from each side.
\(a_{21}a_{21} + a_{31}a_{31} + \cdots + a_{m1}a_{m1} = a_{12}a_{12} + a_{13}a_{13} + \cdots + a_{1n}a_{1n}\)
Recall that we specified at the start of our proof that A is not a symmetric matrix, so the value \(a_{ij}\) does not necessarily equal \(a_{ji}\) for \(1 \le i \le m\), \(1 \le j \le n\). The left side above is the sum of squares of the remaining entries of the first column of A, while the right side is the sum of squares of the remaining entries of the first row of A. Given the matrix's asymmetry, there is no reason these two sums must agree, so the equality \(a_{21}a_{21} + a_{31}a_{31} + \cdots + a_{m1}a_{m1} = a_{12}a_{12} + a_{13}a_{13} + \cdots + a_{1n}a_{1n}\) cannot hold for every such A. This contradicts our assumption, and therefore \(A^T A = AA^T\) is not true for nonsymmetric matrices in general.
Hence, in general: \(A^T A \neq AA^T\).
# Example
A <- matrix(seq(from = 1, to = 9), ncol = 3, byrow = TRUE)
A.transpose <- t(A)
paste0("A.transpose %*% A: ")
## [1] "A.transpose %*% A: "
Ex1 <- A.transpose %*% A
Ex1
## [,1] [,2] [,3]
## [1,] 66 78 90
## [2,] 78 93 108
## [3,] 90 108 126
paste0("A %*% A.transpose: ")
## [1] "A %*% A.transpose: "
Ex2 <- A %*% A.transpose
Ex2
## [,1] [,2] [,3]
## [1,] 14 32 50
## [2,] 32 77 122
## [3,] 50 122 194
paste0("Does A.transpose %*% A = A %*% A.transpose? ")
## [1] "Does A.transpose %*% A = A %*% A.transpose? "
Ex1 == Ex2
## [,1] [,2] [,3]
## [1,] FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE
## [3,] FALSE FALSE FALSE
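To tie these numbers back to the entry-wise argument above: for this A, the (1,1) entry of \(A^TA\) is the sum of squares of A's first column, while the (1,1) entry of \(AA^T\) is the sum of squares of A's first row, and the two differ.

c(Ex1[1, 1], sum(A[, 1]^2)) # both 66: the first column is 1, 4, 7
c(Ex2[1, 1], sum(A[1, ]^2)) # both 14: the first row is 1, 2, 3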
(2) For a special type of square matrix A, we get \(A^T A = AA^T\). Under what conditions could this be true? (Hint: The Identity matrix I is an example of such a matrix).
As we showed above, nonsymmetric matrices do not satisfy the equation. But what about symmetric matrices?
For example:
sym1 <- matrix(c(2,1,1,1,2,1,1,1,2), ncol = 3, byrow = FALSE)
sym2 <- matrix(c(1,0,0,0,1,0,0,0,1), ncol = 3, byrow = FALSE)
sym1
## [,1] [,2] [,3]
## [1,] 2 1 1
## [2,] 1 2 1
## [3,] 1 1 2
sym2
## [,1] [,2] [,3]
## [1,] 1 0 0
## [2,] 0 1 0
## [3,] 0 0 1
paste0("Does sym1.transpose %*% sym1 = sym1 %*% sym1.transpose?")
## [1] "Does sym1.transpose %*% sym1 = sym1 %*% sym1.transpose?"
sym1.transpose <- t(sym1)
sym1.transpose %*% sym1
## [,1] [,2] [,3]
## [1,] 6 5 5
## [2,] 5 6 5
## [3,] 5 5 6
sym1.transpose %*% sym1 == sym1 %*% sym1.transpose
## [,1] [,2] [,3]
## [1,] TRUE TRUE TRUE
## [2,] TRUE TRUE TRUE
## [3,] TRUE TRUE TRUE
paste0("Does sym2.transpose %*% sym2 = sym2 %*% sym2.transpose?")
## [1] "Does sym2.transpose %*% sym2 = sym2 %*% sym2.transpose?"
sym2.transpose <- t(sym2)
sym2.transpose %*% sym2
## [,1] [,2] [,3]
## [1,] 1 0 0
## [2,] 0 1 0
## [3,] 0 0 1
sym2.transpose %*% sym2 == sym2 %*% sym2.transpose
## [,1] [,2] [,3]
## [1,] TRUE TRUE TRUE
## [2,] TRUE TRUE TRUE
## [3,] TRUE TRUE TRUE
It appears that these particular symmetric matrices make the statement \(A^T A = AA^T\) true. But is this true for all symmetric matrices?
The very definition of a symmetric matrix (as noted earlier) states that \(A = A^T\). Substituting A for \(A^T\) on both sides gives \(A^TA = AA = AA^T\), which is trivially true. As a result, we have shown that every symmetric matrix makes the statement \(A^T A = AA^T\) true.
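As one more spot-check, here is an arbitrary symmetric matrix built as M + t(M), which is always symmetric (the random matrix and seed below are arbitrary choices for illustration):

set.seed(42)
M <- matrix(rnorm(16), ncol = 4)
sym3 <- M + t(M)                              # M + t(M) is always symmetric
all.equal(t(sym3) %*% sym3, sym3 %*% t(sym3)) # TRUE, as expected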
Matrix factorization is a very important problem. There are supercomputers built just to do matrix factorizations. Every second you are on an airplane, matrices are being factorized. Radars that track flights use a technique called Kalman filtering. At the heart of Kalman Filtering is a Matrix Factorization operation. Kalman Filters are solving linear systems of equations when they track your flight using radars.
Write an R function to factorize a square matrix A into LU or LDU, whichever you prefer. Please submit your response in an R Markdown document using our class naming convention.
You don't have to worry about permuting rows of A, and you can assume that A is less than 5x5 if you need to hard-code any variables in your code. If you are doing the entire assignment in R, then please submit only one markdown document for both problems.
# Let's write a function that factors a matrix into a lower triangular matrix L and an upper triangular matrix U
# As the question states, this assumes matrices smaller than 5 x 5
# Reference: Week 2 Module 1 PDF Lecture
library(Matrix)
LU_convert <- function(A){
pivot <- A[1,1] # First pivot; must be nonzero
matrix.size <- nrow(A) # Determine the size of the matrix
matrix.size.col <- ncol(A)
if (matrix.size != matrix.size.col){stop("A must be a square matrix.")} # If matrix A is not a square matrix, stop
if (matrix.size == 2 & pivot != 0){ # For matrix sizes 2 x 2
print("Original Matrix: ")
print(A)
A.1=rbind(A[1,], A[2,]-A[1,]*A[2,1]/A[1,1]) # Zero out A[2,1]
print("Upper Triangle")
print(A.1)
B=matrix(c(1,0,0,1), ncol = 2, byrow = FALSE) # Creates the lower triangle
B[2,1]=A[2,1]/pivot
print("Lower Triangle")
print(B)
print("L %*% U: ")
B %*% A.1
}else if(matrix.size == 3 & pivot != 0){ # For matrix sizes 3 x 3
print("Original Matrix: ")
print(A)
A.1=rbind(A[1,], A[2,]-A[1,]*A[2,1]/A[1,1], A[3,]) # Zero out A[2,1]
A.1=rbind(A.1[1,], A.1[2,], A.1[3,]-A.1[1,]*A.1[3,1]/A.1[1,1]) # Zero out A[3,1]
pivot2=A.1[2,2] # Setting new pivot in 2nd column
if (pivot2 == 0){stop("Zero pivot encountered; row permutation would be required.")}
A.2=rbind(A.1[1,], A.1[2,], A.1[3,]-A.1[2,]*A.1[3,2]/pivot2) # Zero out A[3,2]
print("Upper Triangle")
print(A.2)
B=matrix(c(1,0,0,0,1,0,0,0,1), ncol = 3, byrow = FALSE) # Creating the lower matrix
B[2,1]=A[2,1]/pivot
B[3,1]=A[3,1]/pivot
B[3,2]=A.1[3,2]/pivot2
print("Lower Triangle")
print(B)
print("L %*% U: ")
B %*% A.2
}else if(matrix.size == 4 & pivot != 0){ # For matrix sizes 4 x 4
print("Original Matrix: ")
print(A)
A.1=rbind(A[1,], A[2,]-A[1,]*A[2,1]/A[1,1], A[3,], A[4,]) # Zero out A[2,1]
A.1=rbind(A.1[1,], A.1[2,], A.1[3,]-A.1[1,]*A.1[3,1]/A.1[1,1], A.1[4,]) # Zero out A[3,1]
A.1=rbind(A.1[1,], A.1[2,], A.1[3,], A.1[4,]-A.1[1,]*A.1[4,1]/A.1[1,1]) # Zero out A[4,1]
pivot2=A.1[2,2] # Setting new pivot in 2nd column
if (pivot2 == 0){stop("Zero pivot encountered; row permutation would be required.")}
A.2=rbind(A.1[1,], A.1[2,], A.1[3,]-A.1[2,]*A.1[3,2]/pivot2, A.1[4,]) # Zero out A[3,2]
A.2=rbind(A.2[1,], A.2[2,], A.2[3,], A.2[4,]-A.2[2,]*A.2[4,2]/pivot2) # Zero out A[4,2]
pivot3=A.2[3,3]
if (pivot3 == 0){stop("Zero pivot encountered; row permutation would be required.")}
A.3=rbind(A.2[1,], A.2[2,], A.2[3,], A.2[4,]-A.2[3,]*A.2[4,3]/pivot3) # Zero out [4,3]
print("Upper Triangle")
print(A.3)
B=matrix(c(1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1), ncol = 4, byrow = FALSE) # Creates the lower triangle
B[2,1]=A[2,1]/pivot
B[3,1]=A[3,1]/pivot
B[4,1]=A[4,1]/pivot
B[3,2]=A.1[3,2]/pivot2
B[4,2]=A.1[4,2]/pivot2
B[4,3]=A.2[4,3]/pivot3
print("Lower Triangle")
print(B)
print("L %*% U: ")
B %*% A.3
}else if (pivot == 0){
print("First column is a non-nonzero column.")
}
}
Let's try the function and see if our examples work:
two <- matrix(c(2,3,4,5), ncol = 2)
three <- matrix(c(1,5,2,4,5,2,-1,-3,11), ncol = 3)
four <- matrix(c(1,5,4,21,17,16,-1,-3,4,10,32,15,3,-1,2,7), ncol = 4)
print("Two by Two Matrix: ")
## [1] "Two by Two Matrix: "
LU_convert(two)
## [1] "Original Matrix: "
## [,1] [,2]
## [1,] 2 4
## [2,] 3 5
## [1] "Upper Triangle"
## [,1] [,2]
## [1,] 2 4
## [2,] 0 -1
## [1] "Lower Triangle"
## [,1] [,2]
## [1,] 1.0 0
## [2,] 1.5 1
## [1] "L %*% U: "
## [,1] [,2]
## [1,] 2 4
## [2,] 3 5
print("Three by Three Matrix: ")
## [1] "Three by Three Matrix: "
LU_convert(three)
## [1] "Original Matrix: "
## [,1] [,2] [,3]
## [1,] 1 4 -1
## [2,] 5 5 -3
## [3,] 2 2 11
## [1] "Upper Triangle"
## [,1] [,2] [,3]
## [1,] 1 4 -1.0
## [2,] 0 -15 2.0
## [3,] 0 0 12.2
## [1] "Lower Triangle"
## [,1] [,2] [,3]
## [1,] 1 0.0 0
## [2,] 5 1.0 0
## [3,] 2 0.4 1
## [1] "L %*% U: "
## [,1] [,2] [,3]
## [1,] 1 4 -1
## [2,] 5 5 -3
## [3,] 2 2 11
print("Four by Four Matrix: ")
## [1] "Four by Four Matrix: "
LU_convert(four)
## [1] "Original Matrix: "
## [,1] [,2] [,3] [,4]
## [1,] 1 17 4 3
## [2,] 5 16 10 -1
## [3,] 4 -1 32 2
## [4,] 21 -3 15 7
## [1] "Upper Triangle"
## [,1] [,2] [,3] [,4]
## [1,] 1 17 4 3.0000
## [2,] 0 -69 -10 -16.0000
## [3,] 0 0 26 6.0000
## [4,] 0 0 0 31.3612
## [1] "Lower Triangle"
## [,1] [,2] [,3] [,4]
## [1,] 1 0.000000 0.0000000 0
## [2,] 5 1.000000 0.0000000 0
## [3,] 4 1.000000 1.0000000 0
## [4,] 21 5.217391 -0.6471572 1
## [1] "L %*% U: "
## [,1] [,2] [,3] [,4]
## [1,] 1 17 4 3
## [2,] 5 16 10 -1
## [3,] 4 -1 32 2
## [4,] 21 -3 15 7
As you can see in these examples, the function decomposes the matrix A into a lower triangular matrix L and an upper triangular matrix U, and then confirms that the product L %*% U does indeed equal A.
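For reference, the same elimination steps can also be written as a single loop that works for any square size with nonzero pivots (still with no row permutations). This is only a sketch for comparison; the name LU_general and its structure are my own illustrative choices, not part of the assignment's required solution.

# A loop-based, Doolittle-style LU sketch (assumes a square matrix with nonzero pivots)
LU_general <- function(A) {
  n <- nrow(A)
  if (n != ncol(A)) stop("A must be a square matrix.")
  L <- diag(n) # the lower triangle starts as the identity
  U <- A       # the upper triangle starts as a copy of A
  for (j in seq_len(n - 1)) { # eliminate below each pivot column
    if (U[j, j] == 0) stop("Zero pivot encountered; row permutation would be required.")
    for (i in (j + 1):n) {
      L[i, j] <- U[i, j] / U[j, j]        # store the multiplier
      U[i, ] <- U[i, ] - L[i, j] * U[j, ] # zero out U[i, j]
    }
  }
  list(L = L, U = U)
}

res <- LU_general(three)          # reuse the 3 x 3 example from above
all.equal(res$L %*% res$U, three) # should be TRUE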