From SVD to Cholesky decomposition and Spectral decomposition
Any matrix \(A\) can be factored as \(U\Sigma V^{T}\); this factorization is called the Singular Value Decomposition (SVD) \[\text{for any }A \in R^{m*n},\ A = U\Sigma V^{T},\\ U \in R^{m*m},\ UU^{T} = U^{T}U = I_{m}\\ V \in R^{n*n},\ VV^{T} = V^{T}V = I_{n}\\ \Sigma \in R^{m*n}\begin{cases} \Sigma_{ij} = 0,\ i \neq j\\ \Sigma_{ij} = \sigma_{i},\ i = j,\ i = 1,2,\dots,\min(m,n) \end{cases}\]
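As a quick numerical check (a minimal sketch using NumPy; the 4×3 matrix below is just an arbitrary example, not from the text), we can verify \(A = U\Sigma V^{T}\) and the orthogonality of \(U\) and \(V\):

```python
import numpy as np

# An arbitrary 4x3 example matrix (m = 4, n = 3).
A = np.arange(12, dtype=float).reshape(4, 3)

# Full SVD: U is 4x4, Vt is 3x3, s holds the min(m, n) singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Build Sigma (4x3) with the singular values on its diagonal.
Sigma = np.zeros_like(A)
Sigma[:len(s), :len(s)] = np.diag(s)

print(np.allclose(A, U @ Sigma @ Vt))     # A = U Sigma V^T
print(np.allclose(U @ U.T, np.eye(4)))    # U U^T = I_m
print(np.allclose(Vt.T @ Vt, np.eye(3)))  # V V^T = I_n
```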
Multiplying \(A\) on the left by its transpose \(A^{T}\) gives \[
A^{T}A = (U\Sigma V^{T})^{T}(U\Sigma V^{T}) = V\Sigma^{T}\Sigma V^{T} = VDV^{T},\ V^{T}V = VV^{T} = I_{n}\\
D_{i,j}= \begin{cases}
0,\ i \neq j\\
\sigma_{i}^{2},\ i = j,\ i = 1,2,\dots,n\ (\text{with }\sigma_{i} = 0 \text{ for } i > \min(m,n))
\end{cases}
\] where \(A^{T}A = (A^{T}A)^{T}\), so \(A^{T}A \in R^{n*n}\) is a symmetric matrix.
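A similar sketch (same arbitrary example matrix) confirming that \(A^{T}A = VDV^{T}\) with the squared singular values on the diagonal of \(D\):

```python
import numpy as np

A = np.arange(12, dtype=float).reshape(4, 3)
U, s, Vt = np.linalg.svd(A, full_matrices=True)
V = Vt.T

# D is n x n diagonal with sigma_i^2 on the diagonal (here min(m, n) = n = 3).
D = np.diag(s**2)

print(np.allclose(A.T @ A, V @ D @ V.T))  # A^T A = V D V^T
print(np.allclose(A.T @ A, (A.T @ A).T))  # A^T A is symmetric
```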
From the derivation above we know that a symmetric matrix \(A \in R^{n*n}\) generated as \(B^{T}B,\ B \in R^{m*n}\) can be diagonalized. The question now becomes: can a symmetric matrix \(A \in R^{n*n}\) be decomposed into \(B^{T}B,\ B \in R^{m*n}\)? If so, we can conclude that any symmetric matrix admits an orthogonal diagonalization.
\[ \text{assume } A \text{ is a symmetric matrix and it can't be diagonalized}\\ \rightarrow \text{ it must have a generalized eigenvector of order 2 or higher (learned in Jordan form)}\\ (A - \lambda I)^{j}x = 0,\ j \geq 2,\text{ but } (A - \lambda I)x \neq 0\\ \text{assume } j = 2:\ 0 = x^{T}(A - \lambda I)^{2}x = ((A - \lambda I)x)^{T}((A - \lambda I)x) = \lVert (A - \lambda I)x \rVert^{2} > 0\ \text{(contradiction)}\\ \text{also, for eigenvectors with } \lambda_1 \neq \lambda_2:\ \lambda_{1}x_{1}^{T}x_{2} = (Ax_{1})^{T}x_{2} = x_{1}^{T}A^{T}x_{2} = x_{1}^{T}Ax_{2} = \lambda_{2}x_{1}^{T}x_{2}\\ \text{because }\lambda_1 \neq \lambda_2 \rightarrow x_{1}^{T}x_{2} = 0\text{ (orthogonal)}\\ \text{so } A = PDP^{T} = B^{T}B,\ B = D^{\frac{1}{2}}P^{T}\ (\text{this requires the eigenvalues in } D \text{ to be nonnegative}) \] From this derivation we see that a symmetric matrix \(A \in R^{n*n}\) (with nonnegative eigenvalues) can be decomposed into \(B^{T}B,\ B \in R^{m*n}\).
In summary, a symmetric matrix can be diagonalized, and its eigenvector matrix is an orthogonal matrix.
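A small numerical sketch of this conclusion, assuming nonnegative eigenvalues (the positive-definite case discussed next); the symmetric matrix is built from an arbitrary random matrix:

```python
import numpy as np

# Build a symmetric positive semi-definite matrix from an arbitrary B0.
B0 = np.random.default_rng(0).normal(size=(5, 3))
A = B0.T @ B0                           # symmetric, 3x3

# Orthogonal diagonalization A = P D P^T (eigh handles symmetric matrices).
eigvals, P = np.linalg.eigh(A)
D = np.diag(eigvals)

print(np.allclose(A, P @ D @ P.T))      # A = P D P^T
print(np.allclose(P @ P.T, np.eye(3)))  # P is orthogonal

# B = D^{1/2} P^T reproduces A as B^T B (needs eigenvalues >= 0).
B = np.diag(np.sqrt(eigvals)) @ P.T
print(np.allclose(A, B.T @ B))
```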
In statistics, if the matrix under discussion is a covariance matrix, it is generally positive definite or positive semi-definite; the semi-definite case only arises when some variance equals zero, in which case that component loses its randomness (is degenerate). So we may as well assume that the matrices discussed below are positive definite; a covariance matrix is also symmetric (i.e. \(cor(x,y) = cor(y,x)\)).
A symmetric (positive-definite) matrix can be factored as \[\begin{align*} \Sigma &= PDP^{T}\\ &= PD^{\frac{1}{2}}D^{\frac{1}{2}}P^{T} \\ &= LL^{T} \end{align*}\] with \(L = PD^{\frac{1}{2}}\); we call this the Cholesky decomposition (strictly speaking, the Cholesky factor is taken to be lower triangular, but any factorization of the form \(LL^{T}\) serves the same purpose here).
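A minimal sketch with `np.linalg.cholesky`; the 3×3 covariance matrix below is a hypothetical example (NumPy returns the lower-triangular factor \(L\)):

```python
import numpy as np

# A hypothetical 3x3 covariance matrix (symmetric positive definite).
Sigma = np.array([[4.0, 2.0, 0.6],
                  [2.0, 3.0, 0.4],
                  [0.6, 0.4, 1.0]])

L = np.linalg.cholesky(Sigma)       # lower-triangular factor
print(np.allclose(Sigma, L @ L.T))  # Sigma = L L^T
```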
A symmetric matrix \(\Sigma = PDP^{T}\) can be expanded as \(\lambda_{1,1}p_{1}p_{1}^{T} + \lambda_{2,2}p_{2}p_{2}^{T}+\dots + \lambda_{p,p}p_{p}p_{p}^{T}\)
\[\begin{align*} \Sigma &= PDP^{T} \\ &=\begin{pmatrix} p_{1} &p_{2}&\dots&p_{p} \end{pmatrix} \begin{pmatrix} \lambda_{1,1} & 0 & \dots & 0\\ 0 & \lambda_{2,2} & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0\\ 0 & \dots& 0 & \lambda_{p,p} \end{pmatrix} \begin{pmatrix} p_{1}^{T}\\ p_{2}^{T}\\ \vdots\\ p_{p}^{T} \end{pmatrix}\\ &= \lambda_{1,1}p_{1}p_{1}^{T} + \lambda_{2,2}p_{2}p_{2}^{T}+\dots + \lambda_{p,p}p_{p}p_{p}^{T}\\ \text{where }p_{i} &= \text{ the eigenvector of }\lambda_{i,i} \end{align*}\] We call this the Spectral decomposition.
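A short sketch of the rank-one expansion, reusing the same hypothetical covariance matrix:

```python
import numpy as np

Sigma = np.array([[4.0, 2.0, 0.6],
                  [2.0, 3.0, 0.4],
                  [0.6, 0.4, 1.0]])

# Columns of P are the eigenvectors p_i, eigvals holds lambda_{i,i}.
eigvals, P = np.linalg.eigh(Sigma)

# Sum of rank-one terms lambda_{i,i} * p_i p_i^T.
expansion = sum(lam * np.outer(p, p) for lam, p in zip(eigvals, P.T))
print(np.allclose(Sigma, expansion))
```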
QR decomposition and the linear model
Consider the normal-equation solution \[\hat\beta = (X^{T}X)^{-1}X^{T}Y\] In numerical computation, explicitly inverting a matrix (or dividing by one) is inefficient, so we try to avoid such operations.
In linear algebra we learned the QR decomposition \[ A \in R^{m*n},\ m \geq n,\ A = QR, \\ \text{ where } Q \in R^{m*n},\ Q^{T}Q = I_{n}\\ R \in R^{n*n} \text{ is upper triangular} \] (this is the reduced form; note that in general \(QQ^{T} \neq I_{m}\)).
Therefore \[\begin{align*} X^{T}X\beta &= X^{T}Y\\ R^{T}Q^{T}QR\beta &= R^{T}Q^{T}Y\\ R^{T}R\beta &= R^{T}Q^{T}Y\\ \text{if }X\text{ is full rank, } & R^{T} \text{ is invertible}\\ R\beta &= Q^{T}Y \end{align*}\] and since \(R\) is upper triangular, \(R\beta = Q^{T}Y\) can be solved by back substitution without forming any inverse.
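A sketch of this route, assuming simulated data for \(X\) and \(Y\) (not from the text): the reduced QR from `np.linalg.qr` gives a square upper-triangular \(R\), and `scipy.linalg.solve_triangular` does the back substitution for \(R\beta = Q^{T}Y\):

```python
import numpy as np
from scipy.linalg import solve_triangular

# Simulated design matrix (intercept + 2 covariates) and response.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
beta_true = np.array([1.0, 2.0, -0.5])
Y = X @ beta_true + rng.normal(scale=0.1, size=100)

Q, R = np.linalg.qr(X)                  # reduced QR: Q is 100x3, R is 3x3
beta_qr = solve_triangular(R, Q.T @ Y)  # back substitution on R beta = Q^T Y

# Compare with the normal-equation solution (X^T X)^{-1} X^T Y.
beta_ne = np.linalg.solve(X.T @ X, X.T @ Y)
print(np.allclose(beta_qr, beta_ne))
```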
Making \(X\) full rank is not hard; it only takes a little care (for example, avoiding perfectly collinear columns).
Next, consider the hat matrix \[\begin{align*} H &= X(X^{T}X)^{-1}X^{T}\\ &= QR(R^{T}R)^{-1}R^{T}Q^{T}\\ &= QRR^{-1}R^{-T}R^{T}Q^{T}\\ &= QQ^{T} \end{align*}\]
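Continuing the same simulated example, one can check numerically that \(QQ^{T}\) from the reduced QR agrees with \(X(X^{T}X)^{-1}X^{T}\):

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])

Q, _ = np.linalg.qr(X)                    # reduced QR
H_qr = Q @ Q.T                            # hat matrix via QR
H_ne = X @ np.linalg.solve(X.T @ X, X.T)  # X (X^T X)^{-1} X^T
print(np.allclose(H_qr, H_ne))
```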
As we can see, the QR decomposition lets us solve for \(\hat\beta\) by least squares more efficiently.