Box Cox Transformation

In order to perform linear regression, there are certain assumptions that need to be considered before building a model. Assumptions regarding residuals, such as normal distribution, constant variance i.e homoscadaticity need to be checked while building a model. However, there will be times when the above assumptions are violated by data. In such instances data transformation can be employed to transform residuals from having a non-normal distribution to a normal distribution. In this post we will look at one such technique called “Box Cox” transformation.

Linear transformation can be applied on both predicitors and response variable. Box cox transformation is applied on the response variable.

library(readr) # importing csv files
library(MASS) # Boxcox function
library(car) # qqPlot function
library(moments) # skeweness and kurtosis functions

TGT <- read_csv("C:/Users/welcome/Downloads/TGT.csv")

attach(TGT)

Data preparation

As the class of TAT is integer, it is converted to numeric
Grades “ASE1”, “ASE2”, “ASE3”,“ASE4”, “ASE5” are changed to 1, 2, 3, 4 & 5 respectively, and class of Grade is converted from character to numeric.

TGT[,"TAT"] <- as.numeric(TAT) # convert to num


TGT[which(GRADE == 'ASE1'), "GRADE"] <- 1

TGT[which(GRADE == 'ASE2'), "GRADE"] <- 2

TGT[which(GRADE == 'ASE3'), "GRADE"] <- 3

TGT[which(GRADE == 'ASE4'), "GRADE"] <- 4

TGT[which(GRADE == 'ASE5'), "GRADE"] <- 5

TGT[,"GRADE"] <- as.numeric(TGT$GRADE)

Model fitting ( Before transformation)

TAT is regressed with GRADE with out any data transformation.

f <- lm(TAT ~ as.factor(GRADE))

plot(f$fitted.values,  rstandard(f)) # Examine raw residuals  vs fitted values

From the above plot, it appears that there is a trend in the residual distribution, which takes the shape of an outward funnel.

hist(f$residuals) # examine histogram of raw residuals

Histogram indicate distribution of raw residuals is postively skewed.

qqPlot(rstandard(f)) # examine quantile-quantile plot for raw residuals

skewness(f$residuals)

## [1] 1.076633

kurtosis(f$residuals)

## [1] 5.420123

Estimation of likelihood function

Lets estimate the maximum likelihood function for linear transformation.

b <- boxcox(TAT ~ GRADE)

lambda <- b$x # lambda values

lik <- b$y # log likelihood values for SSE

bc <- cbind(lambda, lik) # combine lambda and lik

sorted_bc <- bc[order(-lik),] # values are sorted to identify the lambda value for the maximum log likelihood for obtaining minimum SSE

head(sorted_bc, n = 10)

##          lambda       lik
##  [1,] 0.3838384 -10576.90
##  [2,] 0.3434343 -10576.99
##  [3,] 0.4242424 -10579.12
##  [4,] 0.3030303 -10579.44
##  [5,] 0.4646465 -10583.62
##  [6,] 0.2626263 -10584.28
##  [7,] 0.5050505 -10590.36
##  [8,] 0.2222222 -10591.55
##  [9,] 0.5454545 -10599.31
## [10,] 0.1818182 -10601.31

The lambda value for for the maximum log likeihood for obtaining minimum SSE is 0.38( also can be seen in the box cox plot above). We will consider 0.5 for easy transformation of values back to their original scale.

Model fitting (After transformation)

After identifying the lambda value, which is 0.5, lets fit the model with the transformed data.

f1 <- lm(TAT^(1/2) ~ as.factor(GRADE))

plot(f1$fitted.values,  rstandard(f1)) # Examine raw residuals vs fitted values

The distribution of residuals around the fitted line has improved from the previous model. Nearly close to constant variance.

hist(f1$residuals) # examine histogram of raw residuals

Histogram indicate distribution of raw residuals is very close to normal distribution.

qqPlot(rstandard(f1)) # examine quantile-quantile plot for raw residuals

skewness(f1$residuals)

## [1] 0.2029516

kurtosis(f1$residuals)

## [1] 3.475396

The adjusted R square value has improved from model fitted with untransformed data to model fitted with transformed data.

adj_rsq1 <- summary(f)$adj.r.squared

adj_rsq2 <- summary(f1)$adj.r.squared

cat("Adjusted R-Square (before transformation):", adj_rsq1, "Adjusted R-Square (after transformation):", adj_rsq2, sep = "\n")

## Adjusted R-Square (before transformation):
## 0.04481049
## Adjusted R-Square (after transformation):
## 0.04714018

Transforming values back to their original scale

The estimated lambda value for linear transformation of TAT is 0.38. However, with 0.38, it would be difficult to revert the transformed values to their original scale. Therefore, we take lambda as 0.5 which is nothing but taking square root of the values.

After transforming TAT by taking square root, the fitted values have a different scale from the fitted values without transformation. In order to back transform the fitted values to their original scale we simply take the square of the values.

Original regression equation without transformation:

\[\ y^{'} = B_{0}^{'} + B_{1}^{'}x_{1}\] Regression equation with transformation: \[\sqrt{y^{'}} = B_{0}^{'} + B_{1}^{'}x_{1}\]

Regression equation with back transformation: \[\ y^{'2} = B_{0}^{'} + B_{1}^{'}x_{1}\]

original <- f$fitted.values

after_transformation <- f1$fitted.values

back_transformation <- f1$fitted.values^2

fittedvalues <- data.frame(original, after_transformation, back_transformation)

head(fittedvalues, n = 10)

##    original after_transformation back_transformation
## 1  48.61722             6.778532             45.9485
## 2  48.61722             6.778532             45.9485
## 3  48.61722             6.778532             45.9485
## 4  48.61722             6.778532             45.9485
## 5  48.61722             6.778532             45.9485
## 6  48.61722             6.778532             45.9485
## 7  48.61722             6.778532             45.9485
## 8  48.61722             6.778532             45.9485
## 9  48.61722             6.778532             45.9485
## 10 48.61722             6.778532             45.9485