2.12 Example

<목표>

1) gala데이터를 이용해 (X^{T} X)^{-1}와 Beta_hat(=(X^{T} X)^{-1} X^{T}Y) 구하기

2) lm함수 vs 직접 행렬연산 비교

Load data

library(faraway)
data(gala)
head(gala)
##              Species Endemics  Area Elevation Nearest Scruz Adjacent
## Baltra            58       23 25.09       346     0.6   0.6     1.84
## Bartolome         31       21  1.24       109     0.6  26.3   572.33
## Caldwell           3        3  0.21       114     2.8  58.7     0.78
## Champion          25        9  0.10        46     1.9  47.4     0.18
## Coamano            2        1  0.05        77     1.9   1.9   903.82
## Daphne.Major      18       11  0.34       119     8.0   8.0     1.84
dim(gala)
## [1] 30  7

데이터 정보

행(row): 섬 이름
<변수(column)>
- Species: 해당 섬에서 발견된 거북이 종의 갯수
- Endemics: 해당 섬에서“만” 유일하게 발견된 거북이 종의 갯수
- Area: 해당 섬의 크기(km^{2})
- Elevation: 해당 섬의 가장 높은 고도(높이)(m)
- Nearest: 다른 섬과의 거리(km)
- Scruz: Santa Cruz 섬과의 거리(km)
- Adjacent: 인접한 섬의 크기(km^{2})

lm 함수

설명변수: 5개(Area, Elevation, Nearest, Scrua, Adjacent )
종속변수: 1개(Species)
상수항 포함
gfit <- lm(Species ~ Area + Elevation + Nearest + Scruz + Adjacent,data=gala)
summary(gfit)
## 
## Call:
## lm(formula = Species ~ Area + Elevation + Nearest + Scruz + Adjacent, 
##     data = gala)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -111.679  -34.898   -7.862   33.460  182.584 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  7.068221  19.154198   0.369 0.715351    
## Area        -0.023938   0.022422  -1.068 0.296318    
## Elevation    0.319465   0.053663   5.953 3.82e-06 ***
## Nearest      0.009144   1.054136   0.009 0.993151    
## Scruz       -0.240524   0.215402  -1.117 0.275208    
## Adjacent    -0.074805   0.017700  -4.226 0.000297 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 60.98 on 24 degrees of freedom
## Multiple R-squared:  0.7658, Adjusted R-squared:  0.7171 
## F-statistic:  15.7 on 5 and 24 DF,  p-value: 6.838e-07
gs <- summary(gfit)
names(gs)
##  [1] "call"          "terms"         "residuals"     "coefficients" 
##  [5] "aliased"       "sigma"         "df"            "r.squared"    
##  [9] "adj.r.squared" "fstatistic"    "cov.unscaled"

(X^{T} X)^{-1}

gs$cov.unscaled
##               (Intercept)          Area     Elevation       Nearest
## (Intercept)  9.867829e-02  3.778242e-05 -1.561976e-04 -2.339027e-04
## Area         3.778242e-05  1.352247e-07 -2.593617e-07  1.294003e-06
## Elevation   -1.561976e-04 -2.593617e-07  7.745339e-07 -3.549366e-06
## Nearest     -2.339027e-04  1.294003e-06 -3.549366e-06  2.988732e-04
## Scruz       -3.760293e-04 -4.913149e-08  3.080831e-07 -3.821077e-05
## Adjacent     2.309832e-05  4.620303e-08 -1.640241e-07  1.424729e-06
##                     Scruz      Adjacent
## (Intercept) -3.760293e-04  2.309832e-05
## Area        -4.913149e-08  4.620303e-08
## Elevation    3.080831e-07 -1.640241e-07
## Nearest     -3.821077e-05  1.424729e-06
## Scruz        1.247941e-05 -1.958356e-07
## Adjacent    -1.958356e-07  8.426543e-08

Beta_hat = (X^{T} X){-1}X^{T}Y

gs$coefficients[,1]
##  (Intercept)         Area    Elevation      Nearest        Scruz 
##  7.068220709 -0.023938338  0.319464761  0.009143961 -0.240524230 
##     Adjacent 
## -0.074804832

lm함수 안쓰고 직접구해보자!

- 상수항 만들어주기위해 1(컬럼)벡터 추가
- 추정에 쓰일 설명변수 5개(Area, Elevation, Nearest, Scrua, Adjacent)만 남기기
- 행렬 연산을 위해 as.matrix()로 class 변환
x <- cbind(1,gala[,-c(1,2)])
x_mat <- as.matrix(x)
y_vec <- gala$Species
head(x_mat)
##              1  Area Elevation Nearest Scruz Adjacent
## Baltra       1 25.09       346     0.6   0.6     1.84
## Bartolome    1  1.24       109     0.6  26.3   572.33
## Caldwell     1  0.21       114     2.8  58.7     0.78
## Champion     1  0.10        46     1.9  47.4     0.18
## Coamano      1  0.05        77     1.9   1.9   903.82
## Daphne.Major 1  0.34       119     8.0   8.0     1.84
dim(x_mat)
## [1] 30  6
X^{T}X 계산
- t() Transpose 연산 함수
xtx_mat <- t(x_mat)%*%x_mat
head(xtx_mat)
##                  1        Area  Elevation   Nearest     Scruz    Adjacent
## 1            30.00     7851.26    11041.0    301.80   1709.30     7832.95
## Area       7851.26 23708665.46 10852798.5  39240.84 275516.84  5950313.65
## Elevation 11041.00 10852798.53  9218227.0 109139.20 616237.80  8553187.95
## Nearest     301.80    39240.84   109139.2   8945.30  34527.34    37196.67
## Scruz      1709.30   275516.84   616237.8  34527.34 231613.77   534409.98
## Adjacent   7832.95  5950313.65  8553187.9  37196.67 534409.98 23719568.46
dim(xtx_mat)
## [1] 6 6
(X^{T}X){-1} 계산
- solve() Inverse 연산 함수
xtxi_mat <- solve(xtx_mat)
xtxi_mat
##                       1          Area     Elevation       Nearest
## 1          9.867829e-02  3.778242e-05 -1.561976e-04 -2.339027e-04
## Area       3.778242e-05  1.352247e-07 -2.593617e-07  1.294003e-06
## Elevation -1.561976e-04 -2.593617e-07  7.745339e-07 -3.549366e-06
## Nearest   -2.339027e-04  1.294003e-06 -3.549366e-06  2.988732e-04
## Scruz     -3.760293e-04 -4.913149e-08  3.080831e-07 -3.821077e-05
## Adjacent   2.309832e-05  4.620303e-08 -1.640241e-07  1.424729e-06
##                   Scruz      Adjacent
## 1         -3.760293e-04  2.309832e-05
## Area      -4.913149e-08  4.620303e-08
## Elevation  3.080831e-07 -1.640241e-07
## Nearest   -3.821077e-05  1.424729e-06
## Scruz      1.247941e-05 -1.958356e-07
## Adjacent  -1.958356e-07  8.426543e-08
dim(xtxi_mat)
## [1] 6 6
Beta_hat 계산
Beta_hat = (X^{T} X){-1}X^{T}Y
beta_hat <- xtxi_mat%*%t(x_mat)%*%y_vec
head(beta_hat)
##                   [,1]
## 1          7.068220709
## Area      -0.023938338
## Elevation  0.319464761
## Nearest    0.009143961
## Scruz     -0.240524230
## Adjacent  -0.074804832
dim(beta_hat)
## [1] 6 1

계산과정에서의 차원 변화

cat(sprintf("xtxi_dim: %iX%i \n
t(x_mat)_dim: %iX%i \n
y_vec_dim: %iX%i \n
beta_hat_dim: %iX%i \n", 
dim(xtxi_mat)[1],dim(xtxi_mat)[2],
dim(t(x_mat))[1],dim(t(x_mat))[2],
dim(as.matrix(y_vec))[1],dim(as.matrix(y_vec))[2],
dim(beta_hat)[1],dim(beta_hat)[2]))
## xtxi_dim: 6X6 
## 
## t(x_mat)_dim: 6X30 
## 
## y_vec_dim: 30X1 
## 
## beta_hat_dim: 6X1

lm함수와 행렬연산을 통해 구한값 비교

(X^{T} X)^{-1} 값 비교

gs$cov.unscaled
##               (Intercept)          Area     Elevation       Nearest
## (Intercept)  9.867829e-02  3.778242e-05 -1.561976e-04 -2.339027e-04
## Area         3.778242e-05  1.352247e-07 -2.593617e-07  1.294003e-06
## Elevation   -1.561976e-04 -2.593617e-07  7.745339e-07 -3.549366e-06
## Nearest     -2.339027e-04  1.294003e-06 -3.549366e-06  2.988732e-04
## Scruz       -3.760293e-04 -4.913149e-08  3.080831e-07 -3.821077e-05
## Adjacent     2.309832e-05  4.620303e-08 -1.640241e-07  1.424729e-06
##                     Scruz      Adjacent
## (Intercept) -3.760293e-04  2.309832e-05
## Area        -4.913149e-08  4.620303e-08
## Elevation    3.080831e-07 -1.640241e-07
## Nearest     -3.821077e-05  1.424729e-06
## Scruz        1.247941e-05 -1.958356e-07
## Adjacent    -1.958356e-07  8.426543e-08
xtxi_mat
##                       1          Area     Elevation       Nearest
## 1          9.867829e-02  3.778242e-05 -1.561976e-04 -2.339027e-04
## Area       3.778242e-05  1.352247e-07 -2.593617e-07  1.294003e-06
## Elevation -1.561976e-04 -2.593617e-07  7.745339e-07 -3.549366e-06
## Nearest   -2.339027e-04  1.294003e-06 -3.549366e-06  2.988732e-04
## Scruz     -3.760293e-04 -4.913149e-08  3.080831e-07 -3.821077e-05
## Adjacent   2.309832e-05  4.620303e-08 -1.640241e-07  1.424729e-06
##                   Scruz      Adjacent
## 1         -3.760293e-04  2.309832e-05
## Area      -4.913149e-08  4.620303e-08
## Elevation  3.080831e-07 -1.640241e-07
## Nearest   -3.821077e-05  1.424729e-06
## Scruz      1.247941e-05 -1.958356e-07
## Adjacent  -1.958356e-07  8.426543e-08
gs$cov.unscaled - xtxi_mat
##               (Intercept)          Area     Elevation       Nearest
## (Intercept) -1.387779e-16 -1.355253e-19  1.355253e-19  2.764716e-18
## Area        -8.131516e-20 -1.323489e-22  4.235165e-22 -1.694066e-21
## Elevation    2.710505e-19  4.764560e-22 -1.164670e-21  3.811648e-21
## Nearest      2.656295e-18 -2.964615e-21  1.524659e-20 -2.710505e-19
## Scruz        2.710505e-19  7.477713e-22 -2.011703e-21  6.776264e-21
## Adjacent    -3.049319e-20 -9.264423e-23  2.382280e-22 -8.470329e-22
##                     Scruz      Adjacent
## (Intercept)  3.794708e-19 -1.761829e-19
## Area         4.235165e-22 -1.389663e-22
## Elevation   -1.270549e-21  4.499863e-22
## Nearest      2.710505e-20 -2.752857e-21
## Scruz       -5.082198e-21  1.032321e-21
## Adjacent     2.646978e-22 -6.617445e-23
round(gs$cov.unscaled - xtxi_mat)
##             (Intercept) Area Elevation Nearest Scruz Adjacent
## (Intercept)           0    0         0       0     0        0
## Area                  0    0         0       0     0        0
## Elevation             0    0         0       0     0        0
## Nearest               0    0         0       0     0        0
## Scruz                 0    0         0       0     0        0
## Adjacent              0    0         0       0     0        0

Beta_hat 값 비교

gs$coefficients[,1]
##  (Intercept)         Area    Elevation      Nearest        Scruz 
##  7.068220709 -0.023938338  0.319464761  0.009143961 -0.240524230 
##     Adjacent 
## -0.074804832
beta_hat
##                   [,1]
## 1          7.068220709
## Area      -0.023938338
## Elevation  0.319464761
## Nearest    0.009143961
## Scruz     -0.240524230
## Adjacent  -0.074804832
gs$coefficients[,1] - beta_hat
##                    [,1]
## 1         -4.236611e-13
## Area       1.144917e-16
## Elevation  2.775558e-16
## Nearest    2.046106e-14
## Scruz     -7.494005e-16
## Adjacent  -1.387779e-17
round(gs$coefficients[,1] - beta_hat)
##           [,1]
## 1            0
## Area         0
## Elevation    0
## Nearest      0
## Scruz        0
## Adjacent     0