In order to perform linear regression, there are certain assumptions that need to be considered before building a model. Assumptions regarding residuals, such as normal distribution, constant variance i.e homoscadaticity need to be checked while building a model. However, there will be times when the above assumptions are violated by data. In such instances data transformation can be employed to transform residuals from having a non-normal distribution to a normal distribution. In this post we will look at one such technique called “Box Cox” transformation.
Linear transformation can be applied on both predicitors and response variable. Box cox transformation is applied on the response variable.
library(readr) # importing csv files
library(MASS) # Boxcox function
library(car) # qqPlot function
library(moments) # skeweness and kurtosis functions
TGT <- read_csv("C:/Users/welcome/Downloads/TGT.csv")
attach(TGT)
TGT[,"TAT"] <- as.numeric(TAT) # convert to num
TGT[which(GRADE == 'ASE1'), "GRADE"] <- 1
TGT[which(GRADE == 'ASE2'), "GRADE"] <- 2
TGT[which(GRADE == 'ASE3'), "GRADE"] <- 3
TGT[which(GRADE == 'ASE4'), "GRADE"] <- 4
TGT[which(GRADE == 'ASE5'), "GRADE"] <- 5
TGT[,"GRADE"] <- as.numeric(TGT$GRADE)
TAT is regressed with GRADE with out any data transformation.
f <- lm(TAT ~ as.factor(GRADE))
plot(f$fitted.values, rstandard(f)) # Examine raw residuals vs fitted values
From the above plot, it appears that there is a trend in the residual distribution, which takes the shape of an outward funnel.
hist(f$residuals) # examine histogram of raw residuals
Histogram indicate distribution of raw residuals is postively skewed.
qqPlot(rstandard(f)) # examine quantile-quantile plot for raw residuals
skewness(f$residuals)
## [1] 1.076633
kurtosis(f$residuals)
## [1] 5.420123
Lets estimate the maximum likelihood function for linear transformation.
b <- boxcox(TAT ~ GRADE)
lambda <- b$x # lambda values
lik <- b$y # log likelihood values for SSE
bc <- cbind(lambda, lik) # combine lambda and lik
sorted_bc <- bc[order(-lik),] # values are sorted to identify the lambda value for the maximum log likelihood for obtaining minimum SSE
head(sorted_bc, n = 10)
## lambda lik
## [1,] 0.3838384 -10576.90
## [2,] 0.3434343 -10576.99
## [3,] 0.4242424 -10579.12
## [4,] 0.3030303 -10579.44
## [5,] 0.4646465 -10583.62
## [6,] 0.2626263 -10584.28
## [7,] 0.5050505 -10590.36
## [8,] 0.2222222 -10591.55
## [9,] 0.5454545 -10599.31
## [10,] 0.1818182 -10601.31
The lambda value for for the maximum log likeihood for obtaining minimum SSE is 0.38( also can be seen in the box cox plot above). We will consider 0.5 for easy transformation of values back to their original scale.
After identifying the lambda value, which is 0.5, lets fit the model with the transformed data.
f1 <- lm(TAT^(1/2) ~ as.factor(GRADE))
plot(f1$fitted.values, rstandard(f1)) # Examine raw residuals vs fitted values
The distribution of residuals around the fitted line has improved from the previous model. Nearly close to constant variance.
hist(f1$residuals) # examine histogram of raw residuals
Histogram indicate distribution of raw residuals is very close to normal distribution.
qqPlot(rstandard(f1)) # examine quantile-quantile plot for raw residuals
skewness(f1$residuals)
## [1] 0.2029516
kurtosis(f1$residuals)
## [1] 3.475396
The adjusted R square value has improved from model fitted with untransformed data to model fitted with transformed data.
adj_rsq1 <- summary(f)$adj.r.squared
adj_rsq2 <- summary(f1)$adj.r.squared
cat("Adjusted R-Square (before transformation):", adj_rsq1, "Adjusted R-Square (after transformation):", adj_rsq2, sep = "\n")
## Adjusted R-Square (before transformation):
## 0.04481049
## Adjusted R-Square (after transformation):
## 0.04714018
The estimated lambda value for linear transformation of TAT is 0.38. However, with 0.38, it would be difficult to revert the transformed values to their original scale. Therefore, we take lambda as 0.5 which is nothing but taking square root of the values.
After transforming TAT by taking square root, the fitted values have a different scale from the fitted values without transformation. In order to back transform the fitted values to their original scale we simply take the square of the values.
Original regression equation without transformation:
\[\ y^{'} = B_{0}^{'} + B_{1}^{'}x_{1}\] Regression equation with transformation: \[\sqrt{y^{'}} = B_{0}^{'} + B_{1}^{'}x_{1}\]
Regression equation with back transformation: \[\ y^{'2} = B_{0}^{'} + B_{1}^{'}x_{1}\]
original <- f$fitted.values
after_transformation <- f1$fitted.values
back_transformation <- f1$fitted.values^2
fittedvalues <- data.frame(original, after_transformation, back_transformation)
head(fittedvalues, n = 10)
## original after_transformation back_transformation
## 1 48.61722 6.778532 45.9485
## 2 48.61722 6.778532 45.9485
## 3 48.61722 6.778532 45.9485
## 4 48.61722 6.778532 45.9485
## 5 48.61722 6.778532 45.9485
## 6 48.61722 6.778532 45.9485
## 7 48.61722 6.778532 45.9485
## 8 48.61722 6.778532 45.9485
## 9 48.61722 6.778532 45.9485
## 10 48.61722 6.778532 45.9485