Math Review

1 Theory

Either look at the math camp notes, or at some simplified online blog posts.

  • The inverse of a matrix is defined only for square matrices; it cannot be calculated for non-square matrices.
  • Furthermore, not all square matrices have an inverse: it requires linear independence of all rows and columns, i.e. no row or column of the matrix is redundant.
  • A simple way to see when the inverse of a matrix exists is to write out the formula for the inverse as the adjoint of the matrix (which always exists) divided by the determinant of the matrix: \[ A^{-1} = \frac{\operatorname{adj}(A)}{\det(A)} \]
  • The determinant of the matrix is 0 if the rows or columns are linearly dependent. Since we cannot divide by zero, the matrix inverse does not exist if the determinant is 0. In this case, we say we have a singular matrix.
  • A matrix is non-singular if and only if its inverse exists, i.e. its determinant is not 0.
  • We need a non-singular (data) matrix for our linear regression.

Thus, you should always inspect your variables and check for duplicated or perfectly collinear columns in your data, so that you know whether you can estimate the “beta” coefficients of your linear regression: finding them requires inverting a matrix built from your data. Most statistical software will automatically drop perfectly collinear variables, but will return an error if the matrix is not invertible.
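
To make this concrete, here is a minimal sketch (with made-up data, not part of the original notes) of computing \(\hat{\beta} = (X^\top X)^{-1} X^\top y\) by hand; solve() fails at exactly this step when a regressor is redundant:

# Minimal OLS-by-hand sketch (illustrative, made-up data)
set.seed(1)
x1 <- rnorm(10)
y  <- 1 + 3 * x1 + rnorm(10)

X <- cbind(1, x1)                    # design matrix: intercept + x1
solve(t(X) %*% X) %*% t(X) %*% y     # beta-hat, roughly c(1, 3)

X_bad <- cbind(1, x1, 2 * x1)        # third column duplicates x1
# solve(t(X_bad) %*% X_bad)          # error: system is exactly singular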

2 Implementation in R

Will you be able to invert the following matrices?

2.1 Matrix A

\[ \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ \end{bmatrix} \]

A <- matrix(data = c(1,2,
                     3,4),
            nrow = 2,
            ncol = 2, 
            byrow = TRUE
            )
A
     [,1] [,2]
[1,]    1    2
[2,]    3    4
det(A)
[1] -2

Yes, you can invert matrix A as its determinant is not 0. It is a non-singular matrix.

solve(A) # inverse of A 
     [,1] [,2]
[1,] -2.0  1.0
[2,]  1.5 -0.5
A %*% solve(A)  # Identity matrix
     [,1]         [,2]
[1,]    1 1.110223e-16
[2,]    0 1.000000e+00
solve(A) %*% A  # Identity matrix
              [,1]         [,2]
[1,]  1.000000e+00 4.440892e-16
[2,] -5.551115e-17 1.000000e+00

Entries such as 1.110223e-16 are floating-point rounding error: both products equal the identity matrix up to machine precision.

2.1.1 Alternative ways to create a matrix

2.1.1.1 Row bind

Stack rows under each other.

a1 <- c(1,2)
a2 <- c(3,4)

?rbind
A  <- rbind(a1,a2)
A
   [,1] [,2]
a1    1    2
a2    3    4
det(A)
[1] -2
solve(A)
       a1   a2
[1,] -2.0  1.0
[2,]  1.5 -0.5

2.1.1.2 Column bind

Combine columns side by side.

a1 <- c(1,3)
a2 <- c(2,4)

?cbind
A <- cbind(a1,a2)

det(A)
[1] -2
solve(A)
   [,1] [,2]
a1 -2.0  1.0
a2  1.5 -0.5

2.2 RConics for adjugate of a matrix

# install.packages("RConics")
require(RConics)
Loading required package: RConics
?RConics

R <- matrix(data = c(1,0,1,
                     2,2,1,
                     0,0,3),
            nrow = 3,
            ncol = 3, 
            byrow = TRUE
            )
R
     [,1] [,2] [,3]
[1,]    1    0    1
[2,]    2    2    1
[3,]    0    0    3
adj_R <- adjoint(R) # Compute the classical adjoint (also called adjugate) of a square matrix. The adjoint is the transpose of the cofactor matrix.
adj_R
     [,1] [,2] [,3]
[1,]    6    0   -2
[2,]   -6    3    1
[3,]    0    0    2
det(R)
[1] 6
R_inverse <- adj_R/det(R)
R_inverse
     [,1] [,2]       [,3]
[1,]    1  0.0 -0.3333333
[2,]   -1  0.5  0.1666667
[3,]    0  0.0  0.3333333
solve(R) # cross-check: matches adj(R)/det(R)
     [,1] [,2]       [,3]
[1,]    1  0.0 -0.3333333
[2,]   -1  0.5  0.1666667
[3,]    0  0.0  0.3333333

2.3 Matrix B

Row 1 and row 3 are the same!

\[ \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 1 & 2 & 3 \\ \end{bmatrix} \]

B <- matrix(data = c(1,2,3,
                     4,5,6,
                     1,2,3),
            nrow = 3,
            ncol = 3, 
            byrow = TRUE
            )
B
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    1    2    3
typeof(B)
[1] "double"
?det
det(B)
[1] 0
??inverse
# Use the solve() function to calculate the inverse.

# solve(B)

# Using tryCatch to catch and print the error message
result <- tryCatch({
  solve(B)
}, error = function(e) {
  cat("Error: ", conditionMessage(e), "\n")
})
Error:  Lapack routine dgesv: system is exactly singular: U[3,3] = 0 

2.4 Matrix C

\[ \begin{bmatrix} 1 & 2 & 2 \\ -2 & 0.5 & 0 \\ -2 & 5 & 4 \\ \end{bmatrix} \]

You may not be able to eyeball that the third row is a linear combination of the first two rows, with an equal weight of 2 on row 1 and row 2. But the determinant identifies this (det(C) = 0), so the inverse of matrix C does not exist. The solve(C) command therefore gives an error, as expected.

# Row 1
m11 <- 1
m12 <- 2
m13 <- 2

# Row 2
m21 <- -2
m22 <- .5
m23 <- 0

# Row 3 is a linear combination of Row 1 and Row 2
C <- matrix(data = c(m11,m12,m13,
                     m21,m22,m23,
                     2*m11+2*m21,2*m12+2*m22,2*m13+2*m23),
            nrow = 3,
            ncol = 3, 
            byrow = TRUE)
C
     [,1] [,2] [,3]
[1,]    1  2.0    2
[2,]   -2  0.5    0
[3,]   -2  5.0    4
det(C)
[1] 0
#solve(C)

# Using tryCatch to catch and print the error message
result <- tryCatch({
  solve(C)
}, error = function(e) {
  cat("Error: ", conditionMessage(e), "\n")
})
Error:  Lapack routine dgesv: system is exactly singular: U[3,3] = 0 

2.5 Matrix D

You may not be able to eyeball that the third column is a linear combination of the first two columns, with a weight of -2 on column 1 and a weight of 3 on column 2; the pattern is harder to see here than in matrix C above. But the determinant again identifies this: det(D) is numerically zero, so the inverse of D does not exist. The solve(D) command gives an error, as expected.

\[ \begin{bmatrix} 1 & -6 & -20 \\ 2 & -1 & -7 \\ 3 & -3 & 3 \\ \end{bmatrix} \]

# Col 1
m11 <- 1
m21 <- 2
m31 <- 3

# Col 2
m12 <- -6
m22 <- -1
m32 <- 3

w_col1 <- -2
w_col2 <- 3
D <- matrix(data = c(m11,m21,m31,
                     m12,m22,m32,
                     w_col1 * m11 + w_col2 * m12, w_col1 * m21 + w_col2 * m22, w_col1 * m31 + w_col2 * m32),
            nrow = 3,
            ncol = 3, 
            byrow = FALSE)
D
     [,1] [,2] [,3]
[1,]    1   -6  -20
[2,]    2   -1   -7
[3,]    3    3    3
det(D)
[1] 1.049161e-14
#solve(D)

# Using tryCatch to catch and print the error message
result <- tryCatch({
  solve(D)
}, error = function(e) {
  cat("Error: ", conditionMessage(e), "\n")
})
Error:  system is computationally singular: reciprocal condition number = 2.77556e-18 
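
Note that det(D) came out as roughly 1e-14 rather than exactly 0: floating-point rounding is why solve() reports the system as "computationally" rather than "exactly" singular. For such near-singular cases, the numerical rank is a more robust diagnostic than testing det(D) == 0 (a short sketch using base R's qr()):

qr(D)$rank       # 2, less than ncol(D) = 3, so D is rank-deficient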

2.6 Matrix E

The determinant of E is not defined if the matrix contains NA (missing) or non-numeric values.

2.6.1 NA values

# define columns
e1 <- c(1,3)
e2 <- c(2,NA)

E <- cbind(e1,e2)
E
     e1 e2
[1,]  1  2
[2,]  3 NA
det(E)
[1] NA

Do not simply replace NA values with 0, since that changes your data; think about how to impute missing values instead. The code below only shows the mechanical effect of zero-filling:

E[is.na(E)] <- 0  # zero-fill (for illustration only: this alters the data)
det(E)
[1] -6
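
If you need a placeholder that at least respects the data, one simple (if crude) option is mean imputation; this is an illustrative sketch, not a general recommendation:

# Mean-imputation sketch: replace each NA with its column mean
E <- cbind(e1 = c(1, 3), e2 = c(2, NA))
for (j in seq_len(ncol(E))) {
  E[is.na(E[, j]), j] <- mean(E[, j], na.rm = TRUE)
}
E        # the NA in e2 is replaced by 2, the column mean
det(E)   # now computable: 1*2 - 2*3 = -4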

3 Matrix Algebra

3.1 Derivative of a Linear Form

Let \(a\) and \(b\) be \(K \times 1\) column vectors. Consider the scalar product:

\[ f(b) = a^\top b = \sum_{i=1}^K a_i b_i \]

Then the derivative with respect to \(b\) is:

\[ \frac{\partial (a^\top b)}{\partial b} = a \]

This is because \(a^\top b\) is a linear combination of the \(b_i\); differentiating with respect to \(b_i\) leaves only its coefficient, \(a_i\).

Since \(a^\top b = b^\top a\) (a scalar), we also have:

\[ \frac{\partial (b^\top a)}{\partial b} = a \]

3.1.1 Intuition

The expression \(a^\top b\) is simply the dot product between two vectors:

\[ a^\top b = \sum_{i=1}^K a_i b_i \]

This is a scalar: a weighted sum of the components of \(b\), where the weights are given by the elements of \(a\).

When we take the derivative of this scalar with respect to the vector \(b\), we are effectively collecting the coefficients \(a_i\):

\[ \frac{\partial}{\partial b} \begin{bmatrix} a_1 b_1 + a_2 b_2 + \dots + a_K b_K \end{bmatrix} = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_K \end{bmatrix} = a \]

Thus, the gradient of \(a^\top b\) with respect to \(b\) is simply \(a\).
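
As a quick numerical sanity check (an illustrative sketch, not part of the original notes), a central-difference approximation of the gradient recovers \(a\):

# Finite-difference check: the gradient of f(b) = a'b equals a
a  <- c(2, -1, 4)
f  <- function(b) sum(a * b)             # a'b as a scalar
b0 <- c(1, 1, 1)
h  <- 1e-6
grad <- sapply(seq_along(b0), function(i) {
  e <- replace(rep(0, length(b0)), i, h) # perturb coordinate i
  (f(b0 + e) - f(b0 - e)) / (2 * h)      # central difference
})
round(grad, 6)                           # 2 -1 4, i.e. a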

3.2 Derivative of a Quadratic Form

Let \(A\) be a symmetric \(K \times K\) matrix, and \(b\) a \(K \times 1\) vector. Then:

\[ f(b) = b^\top A b \]

This is a quadratic form, which expands to:

\[ \sum_{i=1}^K \sum_{j=1}^K b_i A_{ij} b_j \]
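
For example (a worked case added for concreteness), with \(K = 2\):

\[ b^\top A b = A_{11} b_1^2 + (A_{12} + A_{21}) b_1 b_2 + A_{22} b_2^2 \]

Differentiating with respect to \(b_1\) gives \(2 A_{11} b_1 + (A_{12} + A_{21}) b_2\), which is exactly the first entry of \((A + A^\top) b\), and equals \(2(Ab)_1\) when \(A\) is symmetric.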

Taking the derivative with respect to \(b\), we get:

\[ \frac{\partial (b^\top A b)}{\partial b} = 2 A b \quad \text{(if } A = A^\top \text{)} \]

This is the vector version of the familiar scalar derivative \(\frac{d}{dx}(ax^2) = 2ax\).

If \(A\) is not symmetric, the derivative becomes:

\[ \frac{\partial (b^\top A b)}{\partial b} = A^\top b + A b = (A + A^\top) b \]
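
The same finite-difference check as in Section 3.1 (again an illustrative sketch) confirms the general formula for a non-symmetric \(A\):

# Finite-difference check: the gradient of b'Ab equals (A + A')b
A  <- matrix(c(1, 2, 0,
               3, 1, 4,
               1, 0, 2), nrow = 3, byrow = TRUE)  # deliberately not symmetric
f  <- function(b) as.numeric(t(b) %*% A %*% b)
b0 <- c(1, -1, 2)
h  <- 1e-6
grad <- sapply(seq_along(b0), function(i) {
  e <- replace(rep(0, length(b0)), i, h)
  (f(b0 + e) - f(b0 - e)) / (2 * h)
})
cbind(numeric = round(grad, 5),
      analytic = as.numeric((A + t(A)) %*% b0))   # the two columns agree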


3.3 Summary

Expression                      Derivative w.r.t. \(b\)    Condition
\(a^\top b\) or \(b^\top a\)    \(a\)                      Always
\(b^\top A b\)                  \(2Ab\)                    If \(A\) is symmetric
\(b^\top A b\)                  \((A + A^\top)b\)          If \(A\) is not symmetric

4 Appendix

The appendix below might be of particular interest.