Si tenemos una matriz \(A \in \mathbb{R}^{m \times n}\) (cuadrada o no, simetrica o no) tiene una factorización de la forma:
\[ A = U \Sigma V^T \] donde \(U \in \mathbb{R}^{m \times n}\) con columnas ortogonales y \(V \in \mathbb{R}^{n \times n}\) matriz ortogonal y D una matriz diagonal\((n \times n)\). Este resultado se llama descomposición en valores singulares”(DVS), y es una de las más importantes entre las descomposiciones de matrices. Además, \(( \Sigma )\) contiene los valores singulares, que se obtienen como:
\[ \sigma_i = \sqrt{\lambda_i(A^T A)} \] Ahora, explicaré como obtenerla considerando datos simulados, y con una amplicación de rango \(1\).
En lugar de usar una matriz fija, generamos una matriz aleatoria controlada.
set.seed(123) # Reproducibilidad
# Dimensiones
m <- 6 # número de filas
n <- 4 # número de columnas
# Simulación de matriz con estructura
A <- matrix(rnorm(m * n, mean = 0, sd = 1), nrow = m)
A
## [,1] [,2] [,3] [,4]
## [1,] -0.56047565 0.4609162 0.4007715 0.7013559
## [2,] -0.23017749 -1.2650612 0.1106827 -0.4727914
## [3,] 1.55870831 -0.6868529 -0.5558411 -1.0678237
## [4,] 0.07050839 -0.4456620 1.7869131 -0.2179749
## [5,] 0.12928774 1.2240818 0.4978505 -1.0260044
## [6,] 1.71506499 0.3598138 -1.9666172 -0.7288912
AtA <- t(A) %*% A
AtA
## [,1] [,2] [,3] [,4]
## [1,] 5.759821 -0.2938070 -4.2990114 -3.3468073
## [2,] -0.293807 4.1110473 -0.4680829 0.2337788
## [3,] -4.299011 -0.4680829 7.7903245 1.3554449
## [4,] -3.346807 0.2337788 1.3554449 3.4871599
eig <- eigen(AtA)
lambda <- eig$values
V <- eig$vectors
lambda
## [1] 12.3469903 4.6430094 3.4993672 0.6589858
V
## [,1] [,2] [,3] [,4]
## [1,] 0.629272015 0.2771954 -0.2420838 0.68452529
## [2,] 0.007386372 -0.7356815 -0.6752641 0.05231205
## [3,] -0.696811267 0.4511279 -0.4767018 0.28929749
## [4,] -0.344116596 -0.4223968 0.5081035 0.66707986
Los valores singulares dependen de los autovalores de \(A^TA\)
\[\begin{equation} A^T A v_i = \lambda_i v_i \end{equation}\]
Son las raíces cuadradas de los autovalores.
\[\begin{equation} \sigma_i = \sqrt \lambda_i \end{equation}\]
sigma <- sqrt(lambda)
sigma
## [1] 3.5138284 2.1547643 1.8706596 0.8117794
\[\begin{equation} \Sigma = \begin{pmatrix} \sigma_1 & 0 & 0 \\ 0 & \sigma_2 & 0 \\ 0 & 0 & \sigma_3 \end{pmatrix} \end{equation}\]
Sigma <- diag(sigma)
Sigma
## [,1] [,2] [,3] [,4]
## [1,] 3.513828 0.000000 0.00000 0.0000000
## [2,] 0.000000 2.154764 0.00000 0.0000000
## [3,] 0.000000 0.000000 1.87066 0.0000000
## [4,] 0.000000 0.000000 0.00000 0.8117794
Se obtiene usando los autovectores de \(A^TA\)
\[\begin{equation} u_i = \frac{Av_i}{\sigma_i} \end{equation}\]
U <- A %*% V %*% diag(1 / sigma)
U
## [,1] [,2] [,3] [,4]
## [1,] -0.24756400 -0.28304723 -0.005476774 0.2762503
## [2,] -0.01952801 0.51816137 0.329821171 -0.6246890
## [3,] 0.49249712 0.52797492 -0.102170103 0.1945325
## [4,] -0.32131770 0.57807166 -0.362817440 0.4884259
## [5,] 0.02747886 -0.09593674 -0.864144229 -0.4777961
## [6,] 0.76927053 -0.17106970 -0.048657246 0.1695804
\[\begin{equation} A = U \Sigma V^T \end{equation}\]
A_reconstruida <- U %*% Sigma %*% t(V)
A_reconstruida
## [,1] [,2] [,3] [,4]
## [1,] -0.56047565 0.4609162 0.4007715 0.7013559
## [2,] -0.23017749 -1.2650612 0.1106827 -0.4727914
## [3,] 1.55870831 -0.6868529 -0.5558411 -1.0678237
## [4,] 0.07050839 -0.4456620 1.7869131 -0.2179749
## [5,] 0.12928774 1.2240818 0.4978505 -1.0260044
## [6,] 1.71506499 0.3598138 -1.9666172 -0.7288912
error <- sum((A - A_reconstruida)^2)
error
## [1] 5.097937e-30
svd_A <- svd(A)
U_svd <- svd_A$u
D_svd <- svd_A$d
V_svd <- svd_A$v
U_svd
## [,1] [,2] [,3] [,4]
## [1,] -0.24756400 -0.28304723 0.005476774 0.2762503
## [2,] -0.01952801 0.51816137 -0.329821171 -0.6246890
## [3,] 0.49249712 0.52797492 0.102170103 0.1945325
## [4,] -0.32131770 0.57807166 0.362817440 0.4884259
## [5,] 0.02747886 -0.09593674 0.864144229 -0.4777961
## [6,] 0.76927053 -0.17106970 0.048657246 0.1695804
D_svd
## [1] 3.5138284 2.1547643 1.8706596 0.8117794
V_svd
## [,1] [,2] [,3] [,4]
## [1,] 0.629272015 0.2771954 0.2420838 0.68452529
## [2,] 0.007386372 -0.7356815 0.6752641 0.05231205
## [3,] -0.696811267 0.4511279 0.4767018 0.28929749
## [4,] -0.344116596 -0.4223968 -0.5081035 0.66707986
Sigma_R <- diag(D_svd)
A_rec <- U_svd %*% Sigma_R %*% t(V_svd)
A_rec
## [,1] [,2] [,3] [,4]
## [1,] -0.56047565 0.4609162 0.4007715 0.7013559
## [2,] -0.23017749 -1.2650612 0.1106827 -0.4727914
## [3,] 1.55870831 -0.6868529 -0.5558411 -1.0678237
## [4,] 0.07050839 -0.4456620 1.7869131 -0.2179749
## [5,] 0.12928774 1.2240818 0.4978505 -1.0260044
## [6,] 1.71506499 0.3598138 -1.9666172 -0.7288912
Reducimos dimensionalidad usando el primer valor singular.
\[\begin{equation} A_1 = \sigma_1 u_1 v_1^T \end{equation}\]
\[\begin{equation} A_1 = \sigma_1 u_1 v_1^T \end{equation}\]
u1 <- U_svd[, 1, drop = FALSE]
v1 <- V_svd[, 1, drop = FALSE]
sigma1 <- D_svd[1]
A1 <- sigma1 * u1 %*% t(v1)
A1
## [,1] [,2] [,3] [,4]
## [1,] -0.54740209 -0.0064253860 0.60615431 0.29934613
## [2,] -0.04317942 -0.0005068385 0.04781384 0.02361261
## [3,] 1.08898693 0.0127824893 -1.20586701 -0.59551111
## [4,] -0.71048290 -0.0083396226 0.78673845 0.38852666
## [5,] 0.06076000 0.0007131987 -0.06728132 -0.03322653
## [6,] 1.70097554 0.0199659895 -1.88353986 -0.93017630
error_rango1 <- sum((A - A1)^2)
error_rango1
## [1] 8.801362
k <- 2
Uk <- U_svd[, 1:k]
Vk <- V_svd[, 1:k]
Sigmak <- diag(D_svd[1:k])
Ak <- Uk %*% Sigmak %*% t(Vk)
Ak
## [,1] [,2] [,3] [,4]
## [1,] -0.716463571 0.4422668 0.3310114 0.55696599
## [2,] 0.266313539 -0.8219067 0.5515052 -0.44800005
## [3,] 1.404341439 -0.8241740 -0.6926361 -1.07605573
## [4,] -0.365206071 -0.9247105 1.3486671 -0.13761429
## [5,] 0.003457877 0.1527941 -0.1605390 0.05409179
## [6,] 1.598797200 0.2911491 -2.0498323 -0.77447454
error_k <- sum((A - Ak)^2)
error_k
## [1] 4.158353
\[\begin{equation} \frac{\sigma_i ^2}{\sum \sigma_i^2} \end{equation}\]
total <- sum(D_svd^2)
var_exp <- D_svd^2 / total * 100
var_acum <- cumsum(var_exp)
data.frame(
Componente = 1:length(D_svd),
Varianza = var_exp,
Acumulada = var_acum
)
## Componente Varianza Acumulada
## 1 1 58.382752 58.38275
## 2 2 21.954473 80.33723
## 3 3 16.546760 96.88399
## 4 4 3.116015 100.00000