이 글은 고려대학교 통계학과 신승준 교수님의 2018년 1학기 통계 계산 방법론(대학원) 강의를 바탕으로 작성되었습니다.


Houserhodler Transformation

\[U=I-duu^T, \quad \quad where \;\; d=\frac{2}{u^Tu}\]


Householder transformation은

\(Ux = se_1\) where \(e_1^T=(1,0, \cdots,0)\) for some constant s

\(u=x+se_1\) where \(s^2=x^Tx\)가 되도록하는 벡터 \(u\)를 찾는 것이다.

Householder <- function(x)
{
  s <- sign(x[1]) * c(sqrt(t(x) %*% x))
  nn <- length(x)
  e1 <- rep(0, nn)
  e1[1] <- 1
  u <- x + s * e1
  d <- 2 / c(t(u) %*% u) # matrix -> scalar로 만들어 주기 위해 c()에 넣어줌
  U <- diag(nn) - outer(d*u, u)
  list(U = U, u = u/s)
}
x <- 1:4
Householder(x)
## $U
##            [,1]       [,2]       [,3]       [,4]
## [1,] -0.1825742 -0.3651484 -0.5477226 -0.7302967
## [2,] -0.3651484  0.8872516 -0.1691226 -0.2254968
## [3,] -0.5477226 -0.1691226  0.7463161 -0.3382452
## [4,] -0.7302967 -0.2254968 -0.3382452  0.5490064
## 
## $u
## [1] 1.1825742 0.3651484 0.5477226 0.7302967
x <- 0:3
Householder(x)
## $U
##      [,1]       [,2]       [,3]       [,4]
## [1,]    1  0.0000000  0.0000000  0.0000000
## [2,]    0  0.8571429 -0.2857143 -0.4285714
## [3,]    0 -0.2857143  0.4285714 -0.8571429
## [4,]    0 -0.4285714 -0.8571429 -0.2857143
## 
## $u
## [1] NaN Inf Inf Inf

Application to Regression


Linaer regression은 다음 식을 minimize해서 풀 수 있다.

\[||Y-X\beta||^2=||UY-UX\beta||^2=||Z-\begin{pmatrix} R \\ 0 \end{pmatrix}\beta||^2= ||\begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix}-\begin{pmatrix} R\beta \\ 0 \end{pmatrix}||^2\]

\(Z_1=R\beta\)일 때 식이 minimize 되므로 backsolve를 이용해서 \(\beta\)를 구해주면 된다.


HouseReg <- function(X, y) {
  p <- ncol(X)
  n <- nrow(X)
  Xy <- cbind(X, y)

  u <- as.list(1:p)

  # initialization
  temp.hh <- Householder(X[,1])
  U <- temp.hh$U
  u[[1]] <- temp.hh$u


  for (i in 2:p) {
    x <- (U %*% X)[i:n,i] # 새로운 x
    temp.hh <- Householder(x)
    u[[i]] <- temp.hh$u
    temp.U <- temp.hh$U
    temp.I <- diag(i-1)
    temp.01 <- matrix(0, i-1, n - i +1)
    temp.02 <- matrix(0, n - i + 1, i-1)
    U <- rbind(cbind(temp.I, temp.01), cbind(temp.02, temp.U)) %*% U # U1 %*% U2를 새로운 U로 저장
  }

  temp <- U %*% Xy
  R <- temp[1:p,1:p]
  z1 <- temp[1:p,p+1]
  z2 <- temp[-(1:p),p+1]

  obj <- list(U = U, R = R, z = c(z1, z2), z1 = z1, z2 = z2, u = u)
}
X <- cbind(rep(1, 6), 1:6, (1:6)^2)
X
##      [,1] [,2] [,3]
## [1,]    1    1    1
## [2,]    1    2    4
## [3,]    1    3    9
## [4,]    1    4   16
## [5,]    1    5   25
## [6,]    1    6   36
y <- c(4.5, 5.5, 6.5, 8, 10, 12)
y
## [1]  4.5  5.5  6.5  8.0 10.0 12.0
p <- ncol(X)
n <- nrow(X)
Xy <- cbind(X, y)
Xy
##                y
## [1,] 1 1  1  4.5
## [2,] 1 2  4  5.5
## [3,] 1 3  9  6.5
## [4,] 1 4 16  8.0
## [5,] 1 5 25 10.0
## [6,] 1 6 36 12.0
u <- as.list(1:p)

# initialization
temp.hh <- Householder(X[,1])
U <- temp.hh$U
u[[1]] <- temp.hh$u


for (i in 2:p) {
  x <- (U %*% X)[i:n,i] # 새로운 x
  temp.hh <- Householder(x)
  u[[i]] <- temp.hh$u
  temp.U <- temp.hh$U
  temp.I <- diag(i-1)
  temp.01 <- matrix(0, i-1, n - i +1)
  temp.02 <- matrix(0, n - i + 1, i-1)
  U <- rbind(cbind(temp.I, temp.01), cbind(temp.02, temp.U)) %*% U # U1 %*% U2를 새로운 U로 저장
}
U
##             [,1]       [,2]        [,3]       [,4]       [,5]         [,6]
## [1,] -0.40824829 -0.4082483 -0.40824829 -0.4082483 -0.4082483 -0.408248290
## [2,] -0.59761430 -0.3585686 -0.11952286  0.1195229  0.3585686  0.597614305
## [3,]  0.54554473 -0.1091089 -0.43643578 -0.4364358 -0.1091089  0.545544726
## [4,]  0.02715063  0.1686667 -0.66001772  0.6965468 -0.2234604 -0.008886065
## [5,] -0.09551508  0.4267771 -0.43546097 -0.2991286  0.6751073 -0.271779698
## [6,] -0.41074462  0.6944567  0.05763483 -0.2321982 -0.4326116  0.323462922
round(U%*%X, 3)
##        [,1]   [,2]    [,3]
## [1,] -2.449 -8.573 -37.151
## [2,]  0.000  4.183  29.283
## [3,]  0.000  0.000   6.110
## [4,]  0.000  0.000   0.000
## [5,]  0.000  0.000   0.000
## [6,]  0.000  0.000   0.000
temp <- U %*% Xy
round(temp, 3)
##                                  y
## [1,] -2.449 -8.573 -37.151 -18.984
## [2,]  0.000  4.183  29.283   6.275
## [3,]  0.000  0.000   6.110   0.982
## [4,]  0.000  0.000   0.000  -0.009
## [5,]  0.000  0.000   0.000   0.184
## [6,]  0.000  0.000   0.000   0.044
R <- temp[1:p,1:p]
round(R, 3)
##                           
## [1,] -2.449 -8.573 -37.151
## [2,]  0.000  4.183  29.283
## [3,]  0.000  0.000   6.110
z1 <- temp[1:p,p+1]
z2 <- temp[-(1:p),p+1]
obj <- list(U = U, R = R, z = c(z1, z2), z1 = z1, z2 = z2, u = u)
hat.beta <- backsolve(R, z1)
SSE <- sum((obj$z2)^2)
SSR <- sum((obj$z1)^2)

SSR
## [1] 400.7143
sum((R %*% hat.beta)^2)
## [1] 400.7143

Comparision to lm()

lm.obj <- lm(y ~ -1 + X) # 이미 X에 constant term이 포함되어 있어서 빼줌
hat.beta.lm <- lm.obj$coefficients
lm.ss <- anova(lm.obj)
SSR.lm <- lm.ss$`Sum Sq`[1]
SSE.lm <- lm.ss$`Sum Sq`[2]
list(hat.beta = cbind(hat.beta, hat.beta.lm),
     SSE = c(SSE, SSE.lm),
     SSR = c(SSR, SSR.lm))
## $hat.beta
##     hat.beta hat.beta.lm
## X1 4.0000000   4.0000000
## X2 0.3750000   0.3750000
## X3 0.1607143   0.1607143
## 
## $SSE
## [1] 0.03571429 0.03571429
## 
## $SSR
## [1] 400.7143 400.7143

Comparison to qr()

qr.obj <- qr(X) # QR decompositon
qr.obj
## $qr
##            [,1]       [,2]        [,3]
## [1,] -2.4494897 -8.5732141 -37.1505944
## [2,]  0.4082483  4.1833001  29.2831009
## [3,]  0.4082483 -0.0537243   6.1101009
## [4,]  0.4082483 -0.2927700   0.6606006
## [5,]  0.4082483 -0.5318157   0.3871728
## [6,]  0.4082483 -0.7708615  -0.2135819
## 
## $rank
## [1] 3
## 
## $qraux
## [1] 1.408248 1.185321 1.606702
## 
## $pivot
## [1] 1 2 3
## 
## attr(,"class")
## [1] "qr"
qr.Q <- qr.Q(qr.obj) # Q for QR decomp
qr.Q
##            [,1]       [,2]       [,3]
## [1,] -0.4082483 -0.5976143  0.5455447
## [2,] -0.4082483 -0.3585686 -0.1091089
## [3,] -0.4082483 -0.1195229 -0.4364358
## [4,] -0.4082483  0.1195229 -0.4364358
## [5,] -0.4082483  0.3585686 -0.1091089
## [6,] -0.4082483  0.5976143  0.5455447
qr.Q.complete <- qr.Q(qr.obj, complete = T) # t(U)
qr.Q.complete
##            [,1]       [,2]       [,3]         [,4]        [,5]        [,6]
## [1,] -0.4082483 -0.5976143  0.5455447  0.027150626 -0.09551508 -0.41074462
## [2,] -0.4082483 -0.3585686 -0.1091089  0.168666677  0.42677705  0.69445665
## [3,] -0.4082483 -0.1195229 -0.4364358 -0.660017723 -0.43546097  0.05763483
## [4,] -0.4082483  0.1195229 -0.4364358  0.696546844 -0.29912858 -0.23219822
## [5,] -0.4082483  0.3585686 -0.1091089 -0.223460360  0.67510728 -0.43261156
## [6,] -0.4082483  0.5976143  0.5455447 -0.008886065 -0.27177970  0.32346292
qr.Qty <- qr.qty(qr.obj, y) # z
qr.y.hat <- qr.qy(qr.obj, c(qr.Qty[1:3], rep(0, 3))) # fitted values
qr.resid <- qr.qy(qr.obj, c(rep(0, 3), qr.Qty[4:6])) # residuals

SSR.qr <- sum(qr.y.hat^2)
SSE.qr <- sum(qr.resid^2)
list(hat.beta = cbind(hat.beta, hat.beta.lm, qr.y.hat),
     SSE = c(SSE, SSE.lm, SSE.qr),
     SSR = c(SSR, SSR.lm, SSR.qr))
## $hat.beta
##       hat.beta hat.beta.lm  qr.y.hat
## [1,] 4.0000000   4.0000000  4.535714
## [2,] 0.3750000   0.3750000  5.392857
## [3,] 0.1607143   0.1607143  6.571429
## [4,] 4.0000000   4.0000000  8.071429
## [5,] 0.3750000   0.3750000  9.892857
## [6,] 0.1607143   0.1607143 12.035714
## 
## $SSE
## [1] 0.03571429 0.03571429 0.03571429
## 
## $SSR
## [1] 400.7143 400.7143 400.7143
QR <- qr.Q(qr.obj) %*% qr.R(qr.obj)
QR
##      [,1] [,2] [,3]
## [1,]    1    1    1
## [2,]    1    2    4
## [3,]    1    3    9
## [4,]    1    4   16
## [5,]    1    5   25
## [6,]    1    6   36
X
##      [,1] [,2] [,3]
## [1,]    1    1    1
## [2,]    1    2    4
## [3,]    1    3    9
## [4,]    1    4   16
## [5,]    1    5   25
## [6,]    1    6   36