Transformations on the Regressors

1st iteration

  • step 1 : fit the simple linear regression model -> estimate \(\beta_0, \beta_1\)
windmill <- read.csv("windmill.csv")
windmill <- windmill[, c(1:3)]
names(windmill) <- c("i", "xi", "yi")
lm(yi~xi, data=windmill)
## 
## Call:
## lm(formula = yi ~ xi, data = windmill)
## 
## Coefficients:
## (Intercept)           xi  
##      0.1309       0.2411
# > sum(windmill$xi)
# [1] 153.3
# > sum(windmill$yi)
# [1] 40.24
# > sum(windmill$xi^2)
# [1] 1093.59
# > sum(windmill$yi^2)
# [1] 74.98149
# > sum(windmill$yi*windmill$xi)
# [1] 283.7812
# > length(windmill$xi)

=> \(\hat\beta_0=0.1309, \ \ \hat\beta_1=0.2411\)

  • step 2 : estimate \(\hat\gamma\)

    • define \(w = x \ln x\)
    • \(E(y) = \beta_0^*+\beta_1^*x+(\alpha-1)\beta_1w=\beta_0^*+\beta_1^*x+\gamma w\)
    • least squares gives \(\hat{y}=\hat{\beta}_0^*+\hat{\beta}_1^*x+\hat{\gamma}w\)
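
Where the \(w = x \ln x\) term comes from: a short derivation, assuming the underlying model is \(E(y)=\beta_0+\beta_1 x^{\alpha}\) and that \(x^{\alpha}\) is expanded to first order around the starting guess \(\alpha_0=1\). The unstarred \(\beta_0, \beta_1\) are the coefficients of that assumed model; the starred ones above are what the augmented fit actually estimates.

```latex
\[
x^{\alpha} \approx x^{\alpha_0} + (\alpha-\alpha_0)\left.\frac{d\,x^{\alpha}}{d\alpha}\right|_{\alpha=\alpha_0}
           = x + (\alpha-1)\,x\ln x \qquad (\alpha_0 = 1)
\]
\[
E(y) = \beta_0 + \beta_1 x^{\alpha}
     \approx \beta_0 + \beta_1 x + (\alpha-1)\beta_1\, x\ln x
     = \beta_0 + \beta_1 x + \gamma w,
\qquad \gamma = (\alpha-1)\beta_1,\quad w = x\ln x
\]
\[
\Rightarrow\quad \alpha = \frac{\gamma}{\beta_1} + 1
\]
```

So the coefficient on \(w\) carries the information about how far \(\alpha\) is from 1, which is why \(\hat\gamma\) is estimated next.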
x <- windmill$xi
windmill$w <- x * log(x) # w = x ln x

lm(yi~xi+w, data=windmill)
## 
## Call:
## lm(formula = yi ~ xi + w, data = windmill)
## 
## Coefficients:
## (Intercept)           xi            w  
##     -2.4168       1.5344      -0.4626

==> \(\hat\gamma=-0.4626\)

  • step 3 : compute \(\alpha\)

    • \(\alpha_1 = {\hat{\gamma} \over \hat{\beta_1}}+1 = {-0.4626 \over 0.2411}+1 = -0.92\)

2nd iteration

  • step 1 : define a new regressor variable \(x'=x^{-0.92}\) and fit the model
windmill$xii <- windmill$xi^-0.92
lm(yi~xii, data=windmill)
## 
## Call:
## lm(formula = yi ~ xii, data = windmill)
## 
## Coefficients:
## (Intercept)          xii  
##       3.101       -6.683

\(\hat{y}=\hat{\beta}_0+\hat{\beta}_1x'=3.101-6.683x'\)

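  • next, form \(w'=x' \ln x'\) and refit with both \(x'\) and \(w'\) to get a new \(\hat\gamma\)
  • -> update \(\alpha\) and repeat until it stops changing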
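
The repeat step can be sketched as a loop. This is a minimal illustration, not the code behind `car::boxTidwell()`: the function name `box_tidwell_sketch`, its arguments, and the stopping rule are hypothetical, and it assumes a single strictly positive regressor. It also assumes a multiplicative update: after transforming to \(x'=x^{\alpha}\), the ratio \(\hat\gamma/\hat\beta_1+1\) is treated as the exponent still needed on \(x'\), so the running power is multiplied by it.

```r
# Hypothetical helper: iterate the three steps above for one positive regressor.
# Assumption: multiplicative update alpha_new = alpha * (gamma / beta1 + 1).
box_tidwell_sketch <- function(x, y, tol = 1e-3, max_iter = 25) {
  stopifnot(all(x > 0))            # log() of the regressor must be defined
  alpha <- 1                       # starting guess: no transformation
  converged <- FALSE
  for (iter in seq_len(max_iter)) {
    xp <- x^alpha                  # current transformed regressor x'
    w  <- xp * log(xp)             # w' = x' ln x'
    beta1 <- coef(lm(y ~ xp))[["xp"]]      # step 1: slope of y on x'
    gamma <- coef(lm(y ~ xp + w))[["w"]]   # step 2: coefficient on w'
    alpha_new <- alpha * (gamma / beta1 + 1)  # step 3: update the power
    converged <- abs(alpha_new - alpha) < tol
    alpha <- alpha_new
    if (converged) break
  }
  list(alpha = alpha, iterations = iter, converged = converged)
}
```

With the data loaded as above, `box_tidwell_sketch(windmill$xi, windmill$yi)` runs the iteration. On the first pass `alpha` is 1, so the update reduces to the step 3 formula \(\hat\gamma/\hat\beta_1+1\). The result can be checked against `boxTidwell()`.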

Using car::boxTidwell()

After 3 iterations, it gives \(\alpha=-0.83\)

library(car)
## Loading required package: carData
boxTidwell(yi ~ xi, data = windmill)
##  MLE of lambda Score Statistic (z)  Pr(>|z|)    
##       -0.83334             -9.1324 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## iterations =  3
windmill$xiii <- windmill$xi^-0.83

Linearity comparison

par(mfrow=c(2,2))
plot(windmill$xi, windmill$yi)
plot(windmill$xii, windmill$yi)
plot(windmill$xiii, windmill$yi)

Residual analysis

fit1 <- lm(yi~xi, data=windmill)
par(mfrow=c(2,2))
plot(fit1)

fit2 <- lm(yi~xii, data=windmill)
par(mfrow=c(2,2))
plot(fit2)