이 글은 고려대학교 통계학과 신승준 교수님의 2018년 1학기 통계 계산 방법론(대학원) 강의를 바탕으로 작성되었습니다.
Houserhodler Transformation
\[U=I-duu^T, \quad \quad where \;\; d=\frac{2}{u^Tu}\]
d is constant
U is nxn symmetric & orthogonal matrix
I is nxn Identity matrix
u is nx1 vector
Householder transformation은
\(Ux = se_1\) where \(e_1^T=(1,0, \cdots,0)\) for some constant s
\(u=x+se_1\) where \(s^2=x^Tx\)가 되도록하는 벡터 \(u\)를 찾는 것이다.
Householder <- function(x)
{
s <- sign(x[1]) * c(sqrt(t(x) %*% x))
nn <- length(x)
e1 <- rep(0, nn)
e1[1] <- 1
u <- x + s * e1
d <- 2 / c(t(u) %*% u) # matrix -> scalar로 만들어 주기 위해 c()에 넣어줌
U <- diag(nn) - outer(d*u, u)
list(U = U, u = u/s)
}x <- 1:4
Householder(x)## $U
## [,1] [,2] [,3] [,4]
## [1,] -0.1825742 -0.3651484 -0.5477226 -0.7302967
## [2,] -0.3651484 0.8872516 -0.1691226 -0.2254968
## [3,] -0.5477226 -0.1691226 0.7463161 -0.3382452
## [4,] -0.7302967 -0.2254968 -0.3382452 0.5490064
##
## $u
## [1] 1.1825742 0.3651484 0.5477226 0.7302967
x <- 0:3
Householder(x)## $U
## [,1] [,2] [,3] [,4]
## [1,] 1 0.0000000 0.0000000 0.0000000
## [2,] 0 0.8571429 -0.2857143 -0.4285714
## [3,] 0 -0.2857143 0.4285714 -0.8571429
## [4,] 0 -0.4285714 -0.8571429 -0.2857143
##
## $u
## [1] NaN Inf Inf Inf
Application to Regression
Linaer regression은 다음 식을 minimize해서 풀 수 있다.
\[||Y-X\beta||^2=||UY-UX\beta||^2=||Z-\begin{pmatrix} R \\ 0 \end{pmatrix}\beta||^2= ||\begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix}-\begin{pmatrix} R\beta \\ 0 \end{pmatrix}||^2\]
\(Z_1=R\beta\)일 때 식이 minimize 되므로 backsolve를 이용해서 \(\beta\)를 구해주면 된다.
- backsolve(R, z1)
HouseReg <- function(X, y) {
p <- ncol(X)
n <- nrow(X)
Xy <- cbind(X, y)
u <- as.list(1:p)
# initialization
temp.hh <- Householder(X[,1])
U <- temp.hh$U
u[[1]] <- temp.hh$u
for (i in 2:p) {
x <- (U %*% X)[i:n,i] # 새로운 x
temp.hh <- Householder(x)
u[[i]] <- temp.hh$u
temp.U <- temp.hh$U
temp.I <- diag(i-1)
temp.01 <- matrix(0, i-1, n - i +1)
temp.02 <- matrix(0, n - i + 1, i-1)
U <- rbind(cbind(temp.I, temp.01), cbind(temp.02, temp.U)) %*% U # U1 %*% U2를 새로운 U로 저장
}
temp <- U %*% Xy
R <- temp[1:p,1:p]
z1 <- temp[1:p,p+1]
z2 <- temp[-(1:p),p+1]
obj <- list(U = U, R = R, z = c(z1, z2), z1 = z1, z2 = z2, u = u)
}X <- cbind(rep(1, 6), 1:6, (1:6)^2)
X## [,1] [,2] [,3]
## [1,] 1 1 1
## [2,] 1 2 4
## [3,] 1 3 9
## [4,] 1 4 16
## [5,] 1 5 25
## [6,] 1 6 36
y <- c(4.5, 5.5, 6.5, 8, 10, 12)
y## [1] 4.5 5.5 6.5 8.0 10.0 12.0
p <- ncol(X)
n <- nrow(X)
Xy <- cbind(X, y)
Xy## y
## [1,] 1 1 1 4.5
## [2,] 1 2 4 5.5
## [3,] 1 3 9 6.5
## [4,] 1 4 16 8.0
## [5,] 1 5 25 10.0
## [6,] 1 6 36 12.0
u <- as.list(1:p)
# initialization
temp.hh <- Householder(X[,1])
U <- temp.hh$U
u[[1]] <- temp.hh$u
for (i in 2:p) {
x <- (U %*% X)[i:n,i] # 새로운 x
temp.hh <- Householder(x)
u[[i]] <- temp.hh$u
temp.U <- temp.hh$U
temp.I <- diag(i-1)
temp.01 <- matrix(0, i-1, n - i +1)
temp.02 <- matrix(0, n - i + 1, i-1)
U <- rbind(cbind(temp.I, temp.01), cbind(temp.02, temp.U)) %*% U # U1 %*% U2를 새로운 U로 저장
}U## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] -0.40824829 -0.4082483 -0.40824829 -0.4082483 -0.4082483 -0.408248290
## [2,] -0.59761430 -0.3585686 -0.11952286 0.1195229 0.3585686 0.597614305
## [3,] 0.54554473 -0.1091089 -0.43643578 -0.4364358 -0.1091089 0.545544726
## [4,] 0.02715063 0.1686667 -0.66001772 0.6965468 -0.2234604 -0.008886065
## [5,] -0.09551508 0.4267771 -0.43546097 -0.2991286 0.6751073 -0.271779698
## [6,] -0.41074462 0.6944567 0.05763483 -0.2321982 -0.4326116 0.323462922
round(U%*%X, 3)## [,1] [,2] [,3]
## [1,] -2.449 -8.573 -37.151
## [2,] 0.000 4.183 29.283
## [3,] 0.000 0.000 6.110
## [4,] 0.000 0.000 0.000
## [5,] 0.000 0.000 0.000
## [6,] 0.000 0.000 0.000
temp <- U %*% Xy
round(temp, 3)## y
## [1,] -2.449 -8.573 -37.151 -18.984
## [2,] 0.000 4.183 29.283 6.275
## [3,] 0.000 0.000 6.110 0.982
## [4,] 0.000 0.000 0.000 -0.009
## [5,] 0.000 0.000 0.000 0.184
## [6,] 0.000 0.000 0.000 0.044
R <- temp[1:p,1:p]
round(R, 3)##
## [1,] -2.449 -8.573 -37.151
## [2,] 0.000 4.183 29.283
## [3,] 0.000 0.000 6.110
z1 <- temp[1:p,p+1]
z2 <- temp[-(1:p),p+1]
obj <- list(U = U, R = R, z = c(z1, z2), z1 = z1, z2 = z2, u = u)hat.beta <- backsolve(R, z1)
SSE <- sum((obj$z2)^2)
SSR <- sum((obj$z1)^2)
SSR## [1] 400.7143
sum((R %*% hat.beta)^2)## [1] 400.7143
Comparision to lm()
lm.obj <- lm(y ~ -1 + X) # 이미 X에 constant term이 포함되어 있어서 빼줌
hat.beta.lm <- lm.obj$coefficients
lm.ss <- anova(lm.obj)
SSR.lm <- lm.ss$`Sum Sq`[1]
SSE.lm <- lm.ss$`Sum Sq`[2]
list(hat.beta = cbind(hat.beta, hat.beta.lm),
SSE = c(SSE, SSE.lm),
SSR = c(SSR, SSR.lm))## $hat.beta
## hat.beta hat.beta.lm
## X1 4.0000000 4.0000000
## X2 0.3750000 0.3750000
## X3 0.1607143 0.1607143
##
## $SSE
## [1] 0.03571429 0.03571429
##
## $SSR
## [1] 400.7143 400.7143
Comparison to qr()
qr.obj <- qr(X) # QR decompositon
qr.obj## $qr
## [,1] [,2] [,3]
## [1,] -2.4494897 -8.5732141 -37.1505944
## [2,] 0.4082483 4.1833001 29.2831009
## [3,] 0.4082483 -0.0537243 6.1101009
## [4,] 0.4082483 -0.2927700 0.6606006
## [5,] 0.4082483 -0.5318157 0.3871728
## [6,] 0.4082483 -0.7708615 -0.2135819
##
## $rank
## [1] 3
##
## $qraux
## [1] 1.408248 1.185321 1.606702
##
## $pivot
## [1] 1 2 3
##
## attr(,"class")
## [1] "qr"
qr.Q <- qr.Q(qr.obj) # Q for QR decomp
qr.Q## [,1] [,2] [,3]
## [1,] -0.4082483 -0.5976143 0.5455447
## [2,] -0.4082483 -0.3585686 -0.1091089
## [3,] -0.4082483 -0.1195229 -0.4364358
## [4,] -0.4082483 0.1195229 -0.4364358
## [5,] -0.4082483 0.3585686 -0.1091089
## [6,] -0.4082483 0.5976143 0.5455447
qr.Q.complete <- qr.Q(qr.obj, complete = T) # t(U)
qr.Q.complete## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] -0.4082483 -0.5976143 0.5455447 0.027150626 -0.09551508 -0.41074462
## [2,] -0.4082483 -0.3585686 -0.1091089 0.168666677 0.42677705 0.69445665
## [3,] -0.4082483 -0.1195229 -0.4364358 -0.660017723 -0.43546097 0.05763483
## [4,] -0.4082483 0.1195229 -0.4364358 0.696546844 -0.29912858 -0.23219822
## [5,] -0.4082483 0.3585686 -0.1091089 -0.223460360 0.67510728 -0.43261156
## [6,] -0.4082483 0.5976143 0.5455447 -0.008886065 -0.27177970 0.32346292
qr.Qty <- qr.qty(qr.obj, y) # z
qr.y.hat <- qr.qy(qr.obj, c(qr.Qty[1:3], rep(0, 3))) # fitted values
qr.resid <- qr.qy(qr.obj, c(rep(0, 3), qr.Qty[4:6])) # residuals
SSR.qr <- sum(qr.y.hat^2)
SSE.qr <- sum(qr.resid^2)list(hat.beta = cbind(hat.beta, hat.beta.lm, qr.y.hat),
SSE = c(SSE, SSE.lm, SSE.qr),
SSR = c(SSR, SSR.lm, SSR.qr))## $hat.beta
## hat.beta hat.beta.lm qr.y.hat
## [1,] 4.0000000 4.0000000 4.535714
## [2,] 0.3750000 0.3750000 5.392857
## [3,] 0.1607143 0.1607143 6.571429
## [4,] 4.0000000 4.0000000 8.071429
## [5,] 0.3750000 0.3750000 9.892857
## [6,] 0.1607143 0.1607143 12.035714
##
## $SSE
## [1] 0.03571429 0.03571429 0.03571429
##
## $SSR
## [1] 400.7143 400.7143 400.7143
QR <- qr.Q(qr.obj) %*% qr.R(qr.obj)
QR## [,1] [,2] [,3]
## [1,] 1 1 1
## [2,] 1 2 4
## [3,] 1 3 9
## [4,] 1 4 16
## [5,] 1 5 25
## [6,] 1 6 36
X## [,1] [,2] [,3]
## [1,] 1 1 1
## [2,] 1 2 4
## [3,] 1 3 9
## [4,] 1 4 16
## [5,] 1 5 25
## [6,] 1 6 36