Assumptions
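Linearity
Below are the regression summaries of the stepwise model (model.admission), followed by the model with all variables (model.admission.all).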
##
## Call:
## lm(formula = Chance.of.Admit ~ University.Rating + CGPA + GRE.Score +
## LOR + Research + TOEFL.Score, data = admission_train)
##
## Residuals:
##       Min        1Q    Median        3Q       Max
## -0.265067 -0.023077  0.009819  0.034848  0.155793
##
## Coefficients:
##                     Estimate Std. Error t value             Pr(>|t|)
## (Intercept)       -1.2581051  0.1210271 -10.395 < 0.0000000000000002 ***
## University.Rating  0.0080847  0.0040786   1.982             0.048149 *
## CGPA               0.1123190  0.0110055  10.206 < 0.0000000000000002 ***
## GRE.Score          0.0018775  0.0005874   3.196             0.001505 **
## LOR                0.0157465  0.0045408   3.468             0.000583 ***
## Research           0.0275544  0.0075070   3.670             0.000276 ***
## TOEFL.Score        0.0030492  0.0010032   3.040             0.002527 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.06155 on 393 degrees of freedom
## Multiple R-squared: 0.8117, Adjusted R-squared: 0.8089
## F-statistic: 282.4 on 6 and 393 DF, p-value: < 0.00000000000000022
##
## Call:
## lm(formula = Chance.of.Admit ~ ., data = admission_train)
##
## Residuals:
##       Min        1Q    Median        3Q       Max
## -0.264996 -0.023110  0.009709  0.034993  0.155816
##
## Coefficients:
##                     Estimate Std. Error t value             Pr(>|t|)
## (Intercept)       -1.2584116  0.1213883 -10.367 < 0.0000000000000002 ***
## GRE.Score          0.0018755  0.0005899   3.179             0.001594 **
## TOEFL.Score        0.0030544  0.0010115   3.020             0.002696 **
## University.Rating  0.0081625  0.0044621   1.829             0.068114 .
## SOP               -0.0002268  0.0052454  -0.043             0.965540
## LOR                0.0158052  0.0047446   3.331             0.000947 ***
## CGPA               0.1124005  0.0111793  10.054 < 0.0000000000000002 ***
## Research           0.0275598  0.0075177   3.666             0.000280 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.06163 on 392 degrees of freedom
## Multiple R-squared: 0.8117, Adjusted R-squared: 0.8084
## F-statistic: 241.4 on 7 and 392 DF, p-value: < 0.00000000000000022
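The model with all variables (model.admission.all) has two predictors that are not significant at the 5% level: SOP (p = 0.966) and University.Rating (p = 0.068). In the stepwise model (model.admission), every predictor is significant at that level. We take this as evidence that each retained predictor has a linear relationship with Chance.of.Admit.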
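Below is a minimal sketch of how this comparison could be run. It assumes model.admission came from backward elimination with base R's step(). The original code is not shown, so the selection direction is an assumption. Because admission_train is not available here, the built-in mtcars data stands in for it, with mpg as the response. The names model_all, model_step and p_values() are illustrative only.

```r
# Stand-in data: the built-in mtcars data set replaces admission_train and
# mpg replaces Chance.of.Admit (assumption for illustration only).

# Model with every available predictor (analogous to model.admission.all)
model_all <- lm(mpg ~ ., data = mtcars)

# Backward stepwise selection by AIC (analogous to model.admission);
# trace = 0 suppresses the step-by-step log.
model_step <- step(model_all, direction = "backward", trace = 0)

# p-values of the predictors, with the intercept row dropped
p_values <- function(model) {
  coefs <- summary(model)$coefficients
  coefs[rownames(coefs) != "(Intercept)", "Pr(>|t|)"]
}

# Predictors that are NOT significant at the 5% level in each model
print(names(which(p_values(model_all) >= 0.05)))
print(names(which(p_values(model_step) >= 0.05)))

# Adjusted R-squared of both models, for comparison
print(c(all = summary(model_all)$adj.r.squared,
        stepwise = summary(model_step)$adj.r.squared))
```

Backward elimination by AIC removes predictors whose contribution does not justify their cost in the criterion. That is why the stepwise model tends to keep significant terms. However, AIC selection alone does not guarantee that every retained predictor passes the 5% threshold, so the p-values are still worth checking.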
Error’s Normality
Let’s visualize the residuals of both models with histograms:


Both models produce visually identical residual histograms. Judging normality from a chart is subjective, though, so let’s compare the two models formally with the Shapiro-Wilk test.
H0: the residuals are normally distributed (not rejected if p-value > 0.05)
H1: the residuals are not normally distributed (H0 is rejected if p-value < 0.05)
##
## Shapiro-Wilk normality test
##
## data: model.admission$residuals
## W = 0.92251, p-value = 0.0000000000001648
##
## Shapiro-Wilk normality test
##
## data: model.admission.all$residuals
## W = 0.92263, p-value = 0.0000000000001693
Both models have p-values far below 0.05, so we reject H0: the residuals of both models are not normally distributed.
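The decision rule above can be wrapped in a small helper. This is a sketch rather than the original code. check_normality() is a hypothetical name, and a model fitted on the built-in mtcars data stands in for the admission models.

```r
# Hypothetical helper: Shapiro-Wilk test on a model's residuals,
# followed by the 5% decision rule used above.
check_normality <- function(model, alpha = 0.05) {
  res <- residuals(model)
  test <- shapiro.test(res)  # stats::shapiro.test accepts 3 to 5000 values
  list(
    W = unname(test$statistic),
    p_value = test$p.value,
    normal = test$p.value > alpha  # TRUE: fail to reject H0 (normal residuals)
  )
}

# Stand-in model on the built-in mtcars data (not the admission data)
model <- lm(mpg ~ wt + qsec + am, data = mtcars)
result <- check_normality(model)
str(result)
```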
Heteroscedasticity
This assumption checks whether the residuals have constant variance or instead form a pattern. Let’s plot the residuals to see how they are distributed:


Based on the scatter plot above, there is no visible pattern in the residuals. However, a visual check is subjective, so it is better to confirm it statistically with the Breusch-Pagan test.
H0: the error variance is constant and shows no pattern, i.e. homoscedasticity (not rejected if p-value > 0.05)
H1: the error variance is not constant and forms a pattern, i.e. heteroscedasticity (H0 is rejected if p-value < 0.05)
##
## studentized Breusch-Pagan test
##
## data: model.admission
## BP = 20.731, df = 6, p-value = 0.00205
##
## studentized Breusch-Pagan test
##
## data: model.admission.all
## BP = 23.236, df = 7, p-value = 0.001551
Again, both models have p-values below 0.05 (0.00205 and 0.001551), so we reject H0. The residuals of both models are heteroscedastic, even though no pattern was visible in the scatter plot.
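The studentized Breusch-Pagan output above matches the format of lmtest::bptest(). The base-R function below is a sketch of what that statistic measures; bp_studentized() is a hypothetical name. It regresses the squared residuals on the model's regressors and reports n times the R-squared of that auxiliary regression. This statistic is compared against a chi-squared distribution with one degree of freedom per predictor, which is why the outputs show df = 6 and df = 7. A model on the built-in mtcars data stands in for the admission models.

```r
# Sketch of the studentized (Koenker) Breusch-Pagan test in base R.
# Hypothetical helper name; assumes the model has an intercept and uses the
# model's own regressors to explain the variance (lmtest::bptest's default).
bp_studentized <- function(model) {
  X <- model.matrix(model)        # regressors, including the intercept column
  e2 <- residuals(model)^2        # squared residuals
  aux <- lm.fit(X, e2)            # auxiliary regression of e^2 on X
  r2 <- 1 - sum(aux$residuals^2) / sum((e2 - mean(e2))^2)
  n <- length(e2)
  stat <- n * r2                  # BP = n * R^2 of the auxiliary regression
  df <- aux$rank - 1              # number of regressors, excluding intercept
  p <- pchisq(stat, df = df, lower.tail = FALSE)
  c(BP = stat, df = df, p.value = p)
}

# Stand-in model on the built-in mtcars data (not the admission data)
model <- lm(mpg ~ wt + qsec + am, data = mtcars)
bp <- bp_studentized(model)
print(bp)
# TRUE would mean heteroscedastic residuals at the 5% level
print(unname(bp["p.value"] < 0.05))
```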
Multicollinearity
Multicollinearity means that the predictor variables are strongly correlated with each other. We check it with the Variance Inflation Factor (VIF). A common rule of thumb treats a VIF above 10 for any predictor as a sign that the model is multicollinear.
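For reference, the VIF of predictor $j$ is built from $R_j^2$, the R-squared obtained by regressing that predictor on all the other predictors:

$$
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
$$

A VIF of 10 therefore corresponds to $R_j^2 = 0.9$, meaning 90% of that predictor's variance is explained by the other predictors. The largest VIF in the outputs below, about 4.8, corresponds to $R_j^2 \approx 0.79$.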
## University.Rating              CGPA         GRE.Score               LOR
##          2.303959          4.678755          4.516008          1.843592
##          Research       TOEFL.Score
##          1.456273          3.926298
##         GRE.Score       TOEFL.Score University.Rating               SOP
##          4.543281          3.981715          2.750559          2.799827
##               LOR              CGPA          Research
##          2.007702          4.815469          1.456683
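Every VIF value is below 5, well under the threshold of 10. The largest is CGPA, at 4.68 in the stepwise model and 4.82 in the model with all variables. None of the predictors are strongly correlated with each other, so neither model shows multicollinearity.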
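The sketch below computes VIF values directly from their definition in base R, to show where the numbers come from. vif_manual() is a hypothetical helper. For models with only numeric predictors it should agree with car::vif(), which is one common way to produce output like the one above. The mtcars model again stands in for the admission models.

```r
# Sketch: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
# predictor j on all other predictors. Hypothetical helper name; assumes
# numeric predictors only (no factor terms) and at least two predictors.
vif_manual <- function(model) {
  X <- model.matrix(model)[, -1, drop = FALSE]  # drop the intercept column
  sapply(colnames(X), function(j) {
    others <- X[, colnames(X) != j, drop = FALSE]
    r2 <- summary(lm(X[, j] ~ others))$r.squared
    1 / (1 - r2)
  })
}

# Stand-in model on the built-in mtcars data (not the admission data)
model <- lm(mpg ~ wt + qsec + am, data = mtcars)
vifs <- vif_manual(model)
print(round(vifs, 3))
# TRUE would flag multicollinearity under the VIF > 10 rule of thumb
print(any(vifs > 10))
```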