If \(X\) is an \(n\times m\) tall matrix, with \(n\geq m\), then the columns of \(X\) span a subspace of \(\mathbb{R}^n\), which we call the column-space of \(X\). Thus the column-space of \(X\) is the set of all linear combinations of the columns of \(X\). The null-space of \(X\) is the set of all elements \(y\) of \(\mathbb{R}^n\) such that \(y'X=0\).
The matrix \(X\) can be column-singular. A basis for the column-space of \(X\) is an \(n\times r\) column-nonsingular matrix \(Q\) such that the column-space of \(Q\) is the column-space of \(X\). The number of columns in the basis is the column-rank of \(X\), and thus the rank of \(X\). The rank is the dimensionality of the column-space. Note that if \(Q\) is a basis for a subspace, then so is \(QB\) for any non-singular \(B\).
If \(X=QR\) is a QR-decomposition (or any full-rank decomposition) of \(X\), then \(Q\) is a basis for the column-space of \(X\). As the QR-decomposition, and the Gram-Schmidt algorithm, show, a basis can always be chosen to be orthonormal, i.e. such that \(Q'Q=I\). Note that if \(Q\) is an orthonormal basis for a subspace, then so is \(QK\) for any square orthonormal \(K\).
set.seed(12345)
x <- matrix (rnorm (40), 10, 4)
x[,3] <- (x[, 1] + x[, 2]) / 2
x
## [,1] [,2] [,3] [,4]
## [1,] 0.5855288 -0.1162478 0.23464051 0.81187318
## [2,] 0.7094660 1.8173120 1.26338903 2.19683355
## [3,] -0.1093033 0.3706279 0.13066227 2.04919034
## [4,] -0.4534972 0.5202165 0.03335964 1.63244564
## [5,] 0.6058875 -0.7505320 -0.07232227 0.25427119
## [6,] -1.8179560 0.8168998 -0.50052806 0.49118828
## [7,] 0.6300986 -0.8863575 -0.12812949 -0.32408658
## [8,] -0.2761841 -0.3315776 -0.30388085 -1.66205024
## [9,] -0.2841597 1.1207127 0.41827645 1.76773385
## [10,] -0.9193220 0.2987237 -0.31029915 0.02580105
z <- qr (x)
z $ pivot
## [1] 1 2 4 3
z $ rank
## [1] 3
print (q <- qr.Q(z))
## [,1] [,2] [,3] [,4]
## [1,] -0.23639535 -0.031314525 -0.25812994 0.108543255
## [2,] -0.28643248 -0.805572716 0.18947595 -0.010200237
## [3,] 0.04412899 -0.130939066 -0.59304249 -0.489557998
## [4,] 0.18309026 -0.144384151 -0.42420664 0.836989084
## [5,] -0.24461474 0.214660988 -0.35681261 -0.056199971
## [6,] 0.73396276 -0.081522285 -0.08657856 -0.142027101
## [7,] -0.25438948 0.264727463 -0.20566446 -0.006608343
## [8,] 0.11150372 0.166248868 0.40547100 0.114107299
## [9,] 0.11472372 -0.402023129 -0.15493187 -0.088087839
## [10,] 0.37115757 0.003611426 -0.01724983 -0.061178503
print (r <- qr.R(z))
## [,1] [,2] [,3] [,4]
## [1,] -2.476905 0.8296643 -0.02404181 -8.236203e-01
## [2,] 0.000000 -2.5509241 -3.35729693 -1.275462e+00
## [3,] 0.000000 0.0000000 -2.71590986 -5.839747e-17
## [4,] 0.000000 0.0000000 0.00000000 -2.523113e-16
(q%*%r)[,z $ pivot]
## [,1] [,2] [,3] [,4]
## [1,] 0.5855288 -0.1162478 0.23464051 0.81187318
## [2,] 0.7094660 1.8173120 1.26338903 2.19683355
## [3,] -0.1093033 0.3706279 0.13066227 2.04919034
## [4,] -0.4534972 0.5202165 0.03335964 1.63244564
## [5,] 0.6058875 -0.7505320 -0.07232227 0.25427119
## [6,] -1.8179560 0.8168998 -0.50052806 0.49118828
## [7,] 0.6300986 -0.8863575 -0.12812949 -0.32408658
## [8,] -0.2761841 -0.3315776 -0.30388085 -1.66205024
## [9,] -0.2841597 1.1207127 0.41827645 1.76773385
## [10,] -0.9193220 0.2987237 -0.31029915 0.02580105
In our example the first three columns of q are a basis for the column space
If \(Q\) is an \(n\times r\) orthonormal matrix, then we write \(Q_\perp\) for the orthonormal complement of \(Q\). Thus \(Q_\perp\) is \(n\times(n-r)\), with \(Q_\perp'Q_\perp=I\) and \(Q'Q_\perp=0\).
If \(X=QR\), in which both \(Q\) and \(X\) are \(n\times m\) matrices, and \(Q'Q=I\), then \(rank(X)=rank(R)\), and \(rank(R)\) is the number of non-zero diagonal elements of \(R\). The columns of \(Q\) corresponding to the non-zero diagonal elements of \(R\) are a basis for the column-space of \(X\), the remaining \(m-r\) columns are in the null-space of \(X\).
How do we find an orthonormal basis for the null space ? One way is to observe that \(QQ'+Q_\perp Q_\perp'=I\) and thus \(I-QQ'=Q_\perp Q_\perp'\). This is a full-rank decomposition of \(I-QQ'\), and thus we can find a \(Q_\perp\) by applying QR to \(I-QQ'\).
qc <- q[,1 : 3]
h <- diag (10) - tcrossprod (qc)
g <- qr (h)
g $ rank
## [1] 7
g $ pivot
## [1] 1 2 3 4 5 6 7 8 9 10
print (qn <- qr.Q(g))
## [,1] [,2] [,3] [,4] [,5]
## [1,] -0.93621876 -2.081668e-17 1.127570e-17 3.469447e-17 7.112366e-17
## [2,] 0.04702749 -4.805167e-01 -2.802944e-17 -2.859825e-17 -1.751998e-16
## [3,] 0.15674801 -2.529540e-02 -7.771733e-01 2.447264e-17 1.023006e-16
## [4,] 0.07555934 -2.695985e-02 3.745412e-01 -7.867419e-01 6.867938e-17
## [5,] 0.15296403 -3.397870e-01 2.641300e-01 2.481476e-01 -7.046801e-01
## [6,] -0.15872790 -3.505139e-01 1.008711e-01 2.772396e-01 3.422030e-02
## [7,] 0.11208367 -3.622960e-01 1.322896e-01 8.926671e-02 5.531289e-01
## [8,] -0.14550998 -1.995348e-01 -3.539372e-01 -3.988236e-01 -4.018511e-01
## [9,] 0.02719617 5.471638e-01 1.801485e-01 2.536412e-01 -1.849296e-01
## [10,] -0.08908208 -2.428190e-01 2.356563e-02 1.059980e-01 2.490136e-02
## [,6] [,7] [,8] [,9] [,10]
## [1,] 2.775558e-17 1.457168e-16 -0.30306443 -0.01666805 -0.17711171
## [2,] -1.672076e-16 -4.476284e-16 0.07658474 0.74736371 -0.44997151
## [3,] 1.336856e-16 2.139362e-16 -0.57771634 0.12245707 0.14845840
## [4,] 1.054146e-16 8.277104e-17 -0.38480279 0.16401580 0.24360978
## [5,] 6.887359e-17 2.731398e-16 -0.38620693 -0.26257366 -0.12300500
## [6,] -4.592326e-01 -4.076424e-17 0.06319015 0.20651571 0.71147827
## [7,] -5.284422e-02 -5.872590e-01 -0.23764703 -0.30995887 -0.15665851
## [8,] -7.361299e-02 -5.322586e-01 0.42910812 -0.13447270 0.04755846
## [9,] 3.581643e-02 -6.073014e-01 -0.15175476 0.41207868 0.07713411
## [10,] 8.829559e-01 -5.488726e-02 0.06057375 0.06093294 0.36150593
print (rn <- qr.R(g))
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] -0.9362188 0.04702749 0.1567480 0.07555934 0.1529640 -0.1587279
## [2,] 0.0000000 -0.48051671 -0.0252954 -0.02695985 -0.3397870 -0.3505139
## [3,] 0.0000000 0.00000000 -0.7771733 0.37454117 0.2641300 0.1008711
## [4,] 0.0000000 0.00000000 0.0000000 -0.78674187 0.2481476 0.2772396
## [5,] 0.0000000 0.00000000 0.0000000 0.00000000 -0.7046801 0.0342203
## [6,] 0.0000000 0.00000000 0.0000000 0.00000000 0.0000000 -0.4592326
## [7,] 0.0000000 0.00000000 0.0000000 0.00000000 0.0000000 0.0000000
## [8,] 0.0000000 0.00000000 0.0000000 0.00000000 0.0000000 0.0000000
## [9,] 0.0000000 0.00000000 0.0000000 0.00000000 0.0000000 0.0000000
## [10,] 0.0000000 0.00000000 0.0000000 0.00000000 0.0000000 0.0000000
## [,7] [,8] [,9] [,10]
## [1,] 0.11208367 -1.455100e-01 2.719617e-02 -8.908208e-02
## [2,] -0.36229597 -1.995348e-01 5.471638e-01 -2.428190e-01
## [3,] 0.13228961 -3.539372e-01 1.801485e-01 2.356563e-02
## [4,] 0.08926671 -3.988236e-01 2.536412e-01 1.059980e-01
## [5,] 0.55312895 -4.018511e-01 -1.849296e-01 2.490136e-02
## [6,] -0.05284422 -7.361299e-02 3.581643e-02 8.829559e-01
## [7,] -0.58725901 -5.322586e-01 -6.073014e-01 -5.488726e-02
## [8,] 0.00000000 5.293721e-16 5.482222e-16 -3.272848e-16
## [9,] 0.00000000 0.000000e+00 2.000124e-16 5.422542e-17
## [10,] 0.00000000 0.000000e+00 0.000000e+00 -2.335682e-16
qn <- qn[, 1 : 7]
crossprod (qn, x)
## [,1] [,2] [,3] [,4]
## [1,] -2.775558e-16 -1.353084e-16 -1.665335e-16 2.775558e-16
## [2,] -1.110223e-16 8.881784e-16 3.885781e-16 3.330669e-16
## [3,] -1.110223e-16 -1.110223e-16 -9.020562e-17 0.000000e+00
## [4,] 5.551115e-17 -1.665335e-16 -4.163336e-17 5.551115e-17
## [5,] 4.163336e-17 -5.551115e-16 -2.498002e-16 1.110223e-16
## [6,] -2.359224e-16 -1.665335e-16 -2.567391e-16 3.885781e-16
## [7,] 2.498002e-16 -1.165734e-15 -4.996004e-16 -4.440892e-16
If \(Q\) is a basis for the column-space of \(X\) and \(Q_\perp\) is a basis for the null-space, then both \(QQ'\) and \(Q_\perp Q_\perp'\) are orthogonal projectors. They are square, symmetric, and idempotent. Moreover their rank is equal to their trace and as we shall see later their eigenvalues are either zero or one (the number of eignevalues equal to one is their rank).
An important decomposition of any vector \(y\in\mathbb{R}^n\) is \[y=QQ'y+(I-QQ')y=QQ'y+Q_\perp Q_\perp'y,\] which implies a corresponding decomposition of the sum of squares \[y'y=y'QQ'y+y'Q_\perp Q_\perp'y.\]
Consider the problem of finding the vector \(z\) in the column-space of \(X\) which is closest to \(y\) in the least-squares sense. This is the same as finding \(b\) such that \((y-Qb)'(y-Qb)\) is minimized over \(b\), where \(Q\) is an orthonormal basis for the column-space of \(X\). Now \[ (y-Qb)'(y-Qb)=y'y-2y'Qb+b'b=y'y-y'QQ'y+(b-Q'y)'(b-Q'y) \] which is equal to its minimum value \[ y'y-y'QQ'y=y'(I-QQ')y=y'Q_\perp Q_\perp'y \] for \(b=Q'y\) and thus \(Qb=QQ'y\).
So \(QQ'y\) is the vector in the column-space of \(X\) closest to \(y\), i.e. the projection of \(y\) on the column-space. And \[y-QQy=Q_\perp Q_\perp'y\] is the residual, which also happens to be the vector in the null-space of \(X\) closest to \(y\), i.e. the projection of \(y\) on the null-space.