In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the ‘outcome variable’) and one or more independent variables (often called ‘predictors’, ‘covariates’, or ‘features’).
Primarily, the following types of regression models are used:
Linear regression
Logistic regression
Polynomial regression
Stepwise regression
Ridge regression
Lasso regression
ElasticNet regression
Linear regression is used for predictive analysis. It is a linear approach to modeling the relationship between a scalar response (the criterion) and one or more explanatory variables (the predictors), and it focuses on the conditional probability distribution of the response given the values of the predictors. With many predictors, linear regression carries a danger of overfitting.
The formula for linear regression is:
\[Y’ = bX + A\]
Logistic regression is used when the dependent variable is dichotomous. Logistic regression estimates the parameters of a logistic model and is a form of binomial regression. It is used to model data with two possible outcomes and the relationship between those outcomes and the predictors. The equation for logistic regression, written in terms of the log-odds l, is:
\[l = \beta_{0}+\beta_{1}x_{1}+\beta_{2}x_{2}\]
Polynomial regression is used for curvilinear data and is fitted with the method of least squares. The goal is to model the expected value of a dependent variable y as an nth-degree polynomial of the independent variable x. The equation for polynomial regression is:
\[y = \beta_{0}+\beta_{1}x+\beta_{2}x^{2}+\cdots+\beta_{n}x^{n}+\epsilon\]
Stepwise regression is a method of fitting regression models in which the choice of predictive variables is carried out by an automatic procedure. At each step, a variable is added to or removed from the set of explanatory variables. The approaches to stepwise regression are forward selection, backward elimination, and bidirectional elimination. The formula for the standardized regression coefficient used in stepwise selection is
\[b_{j.std} = b_{j}(s_{x} * s_{y}^{-1})\]
Ridge regression is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates are unbiased, but their variances are large, so they may be far from the true values. By adding a degree of bias to the regression estimates, ridge regression reduces these standard errors. The formula for ridge regression is
\[\beta = (X^{T}X + \lambda * I)^{-1}X^{T}y\]
Lasso regression is a regression analysis method that performs both variable selection and regularization. Lasso regression uses soft thresholding and selects only a subset of the provided covariates for use in the final model. In its general form, the lasso minimizes an objective of the form
\[N^{-1}\sum^{N}_{i=1}f(x_{i}, y_{i}, \alpha, \beta)\]
ElasticNet regression is a regularized regression method that linearly combines the penalties of the lasso and ridge methods. ElasticNet regression is used for support vector machines, metric learning, and portfolio optimization. The penalty combines the lasso (L1) and ridge (L2) penalties:
\[\lambda_{1}\|\beta\|_{1} + \lambda_{2}\|\beta\|_{2}^{2}\]
With slight modifications, other forms of regression models have been formulated for specific applications:
Quantile Regression
Principal Components Regression
Partial Least Squares (PLS) Regression
Support Vector Regression
Ordinal Regression
Poisson Regression
Negative Binomial Regression
Quasi-Poisson Regression
Cox Regression
Tobit Regression
How to choose the correct regression model?
If the dependent variable is continuous and the model suffers from collinearity, or there are many independent variables, you can try PCR, PLS, ridge, lasso, and elastic net regressions. You can select the final model based on adjusted R-squared, RMSE, AIC, and BIC.
If you are working with count data, you should try Poisson, quasi-Poisson, and negative binomial regression. To avoid overfitting, we can use cross-validation to evaluate models used for prediction. We can also use ridge, lasso, and elastic net regression to correct overfitting.
Try support vector regression when the relationship between the variables is non-linear.
speed dist
1 4 2
2 4 10
3 7 4
4 7 22
5 8 16
6 9 10
7 10 18
8 10 26
9 10 34
10 11 17
[1] 50 2
speed dist
Min. : 4.0 Min. : 2.00
1st Qu.:12.0 1st Qu.: 26.00
Median :15.0 Median : 36.00
Mean :15.4 Mean : 42.98
3rd Qu.:19.0 3rd Qu.: 56.00
Max. :25.0 Max. :120.00
Assumption 1: The regression model is linear in parameters.

An example of a model equation that is linear in parameters:

\[Y = a + \beta_{1}X_{1} + \beta_{2}X_{2}\]

Assumption 2: The mean of the residuals is zero.

How to check? Compute the mean of the residuals; if it is zero (or very close to zero), this assumption holds for the model.

Assumption 3: Homoscedasticity of the residuals (equal variance).

Let us check these assumptions for the dataset cars.
Call:
lm(formula = dist ~ speed, data = cars)
Residuals:
Min 1Q Median 3Q Max
-29.069 -9.525 -2.272 9.215 43.201
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -17.5791 6.7584 -2.601 0.0123 *
speed 3.9324 0.4155 9.464 1.49e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 15.38 on 48 degrees of freedom
Multiple R-squared: 0.6511, Adjusted R-squared: 0.6438
F-statistic: 89.57 on 1 and 48 DF, p-value: 1.49e-12
[1] 8.65974e-17
The mean of the model residuals is very close to zero. Hence Assumption 2 is met.
# Checking for heteroscedasticity
# Ho: Variance of the model residuals is constant
# Ha: Variance of the model residuals is not constant
lmtest::bptest(model)
studentized Breusch-Pagan test
data: model
BP = 3.2149, df = 1, p-value = 0.07297
Non-constant Variance Score Test
Variance formula: ~ fitted.values
Chisquare = 4.650233, Df = 1, p = 0.031049
The assumption of homoscedasticity is not met: although the studentized Breusch-Pagan test fails to reject the null hypothesis of constant variance at the 5% level (p = 0.073), the non-constant variance score test rejects it (p = 0.031).
Definition
Multivariate Regression is a method used to measure the degree to which more than one independent variable (predictor) and more than one dependent variable (response) are linearly related.
\[Y' = \beta_{0}+\beta_{1}x_{1}+\beta_{2}x_{2}+\beta_{3}x_{3}+\cdots+\beta_{n}x_{n}+\epsilon\]
In the above equation, \[\beta_{0}\] is the intercept and \[\beta_{1}, \beta_{2}, \beta_{3}, \ldots, \beta_{n}\] are the regression coefficients corresponding to the predictors \[x_{1}, x_{2}, x_{3}, \ldots, x_{n}\]. \[Y'\] is the dependent variable.
[1] 210 8
The data contains the following parameters for seeds of three varieties of wheat (Kama = 1, Rosa = 2, and Canadian = 3):
1. area A,
2. perimeter P,
3. compactness C = 4*pi*A/P^2,
4. length of kernel,
5. width of kernel,
6. asymmetry coefficient
7. length of kernel groove
8. Wheat Variety
Source:M. Charytanowicz, J. Niewczas, P. Kulczycki, P.A. Kowalski, S. Lukasik, S. Zak, ‘A Complete Gradient Clustering Algorithm for Features Analysis of X-ray Images’, in: Information Technologies in Biomedicine, Ewa Pietka, Jacek Kawa (eds.), Springer-Verlag, Berlin-Heidelberg, 2010, pp. 15-24.
Contributors gratefully acknowledge support of their work by the Institute of Agrophysics of the Polish Academy of Sciences in Lublin.
Area Perimeter Compactness length_of_kernel
Min. :10.59 Min. :12.41 Min. :0.8081 Min. :4.899
1st Qu.:12.27 1st Qu.:13.45 1st Qu.:0.8569 1st Qu.:5.262
Median :14.36 Median :14.32 Median :0.8734 Median :5.524
Mean :14.85 Mean :14.56 Mean :0.8710 Mean :5.629
3rd Qu.:17.30 3rd Qu.:15.71 3rd Qu.:0.8878 3rd Qu.:5.980
Max. :21.18 Max. :17.25 Max. :0.9183 Max. :6.675
width_of_kernel asymmetry_coefficient length_of_kernel_groove
Min. :2.630 Min. :0.7651 Min. :4.519
1st Qu.:2.944 1st Qu.:2.5615 1st Qu.:5.045
Median :3.237 Median :3.5990 Median :5.223
Mean :3.259 Mean :3.7002 Mean :5.408
3rd Qu.:3.562 3rd Qu.:4.7687 3rd Qu.:5.877
Max. :4.033 Max. :8.4560 Max. :6.550
Wheat_Variety
1:70
2:70
3:70
Assumption 1: The regression model is linear in parameters.

An example of a model equation that is linear in parameters:

\[Y = a + \beta_{1}X_{1} + \beta_{2}X_{2}\]

Assumption 2: The mean of the residuals is zero.

How to check? Compute the mean of the residuals; if it is zero (or very close to zero), this assumption holds for the model.

Assumption 3: Homoscedasticity of the residuals (equal variance).

Let us check these assumptions for the dataset seeds.
Call:
lm(formula = length_of_kernel_groove ~ ., data = seeds[, -8])
Residuals:
Min 1Q Median 3Q Max
-0.43176 -0.08756 0.01050 0.09855 0.36292
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.191174 2.419963 5.038 1.04e-06 ***
Area 0.495831 0.083106 5.966 1.07e-08 ***
Perimeter -0.690882 0.179084 -3.858 0.000154 ***
Compactness -6.052916 1.756266 -3.446 0.000690 ***
length_of_kernel 0.663013 0.149602 4.432 1.53e-05 ***
width_of_kernel -0.821043 0.264683 -3.102 0.002196 **
asymmetry_coefficient 0.035005 0.007384 4.740 4.01e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1431 on 203 degrees of freedom
Multiple R-squared: 0.9177, Adjusted R-squared: 0.9152
F-statistic: 377.1 on 6 and 203 DF, p-value: < 2.2e-16
[1] 6.988691e-18
The mean of the model residuals is very close to zero. Hence Assumption 2 is met.
# Checking for heteroscedasticity
# Ho: Variance of the model residuals is constant
# Ha: Variance of the model residuals is not constant
lmtest::bptest(model)
studentized Breusch-Pagan test
data: model
BP = 7.5698, df = 6, p-value = 0.2713
We can see that the p-value (0.2713) is much larger than the significance level of 0.05.
Non-constant Variance Score Test
Variance formula: ~ fitted.values
Chisquare = 0.3126986, Df = 1, p = 0.57603
The assumption of homoscedasticity is met, as both tests fail to reject the null hypothesis of constant variance.
So the model is robust: all three assumptions are met, and the model has an R-squared value above 0.90.
The various metrics used to evaluate the results of the prediction are:
Mean Squared Error(MSE)
Root-Mean-Squared-Error(RMSE)
Mean-Absolute-Error(MAE)
R² or Coefficient of Determination
Adjusted R²
Mean Squared Error: MSE, or Mean Squared Error, is one of the most commonly used metrics for regression tasks. It is simply the average of the squared differences between the target values and the values predicted by the regression model. Because it squares the differences, it penalizes even small errors, which can overstate how bad the model is. It is often preferred over other metrics because it is differentiable and hence can be optimized more easily.
\[MSE = \frac{1}{N}\sum_{i=1}^{N}(y_{i} - \hat{y}_{i})^{2}\]
Root Mean Squared Error: RMSE is the most widely used metric for regression tasks and is the square root of the average squared difference between the target value and the value predicted by the model. It is preferred in some cases because the errors are squared before averaging, which places a high penalty on large errors. This makes RMSE useful when large errors are undesirable.
\[RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_{i} - \hat{y}_{i})^{2}}\]
Mean Absolute Error: MAE is the average of the absolute differences between the target values and the values predicted by the model. MAE is more robust to outliers and does not penalize errors as heavily as MSE. MAE is a linear score, which means all the individual differences are weighted equally. It is not suitable for applications where you want to pay more attention to outliers.
\[MAE = \frac{1}{N}\sum_{i=1}^{N}|y_{i} - \hat{y}_{i}|\]
R²: The coefficient of determination, or R², is another metric for evaluating the performance of a regression model. It compares the current model with a constant baseline and tells us how much better the model is. The constant baseline is obtained by taking the mean of the data and drawing a line at that mean. R² is a scale-free score: regardless of how large or small the values are, R² is always less than or equal to 1.
\[R^{2} = 1 - \frac{\sum_{i=1}^{N}(y_{i} - \hat{y}_{i})^{2}}{\sum_{i=1}^{N}(y_{i} - \bar{y})^{2}}\]
Adjusted R²: Adjusted R² conveys the same meaning as R² but improves on it. R² suffers from the problem that its value increases as more terms are added even when the model is not actually improving, which can mislead the researcher. Adjusted R² is always lower than R², as it adjusts for the number of predictors and increases only when there is a real improvement.
\[R^{2}_{adj} = 1 - \frac{(1 - R^{2})(N - 1)}{N - p - 1}\]
where N is the number of observations and p the number of predictors.
Why is R² Negative?
There is a misconception that the R² score ranges from 0 to 1, but it actually ranges from -∞ to 1. Because of this misconception, people are sometimes alarmed when R² comes out negative, believing that to be impossible.
R² can be negative for the following reasons:
1. The model does not follow the trend of the data.
2. Because of a large number of outliers, the MSE of the model exceeds the MSE of the mean baseline.
3. The intercept is mistakenly omitted when building the model.
Dr. Amita Sharma
Post Doc from Erasmus University, the Netherlands, PhD, MBA
Assistant Professor
Institute of Agri-Business Management
Swami Keshwanand Rajasthan Agricultural University
Bikaner (Raj) India
Visit the blog : www.thinkingai.in
Arun Kumar Sharma
Machine Learning Enthusiast, Hobbyist, writer, blogger and S&M Training Professional
Certified Business Analytics Professional
Certified in Predictive Analytics from IIMx Bangalore
Certified in Macroeconomic Forecasting from IMFx
Certified in Text Analytics from openSAP
Contact for How Machine Learning can Transform Your Business: 9468567418/aks10000@gmail.com
---
title: "REGRESSION ANALYSIS"
output:
flexdashboard::flex_dashboard:
orientation: columns
vertical_layout: fill
social : ["facebook","twitter", "menu"]
source_code : embed
---
```{r}
library(flexdashboard)
options(rgl.printRglwidget = TRUE)
```
Regression Analysis: Definition {data-navmenu="MENU"}
=============================================
Definition
In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome variable') and one or more independent variables (often called 'predictors', 'covariates', or 'features').
Regression Analysis: Types {data-navmenu="MENU"}
=======================================
Column
---------------------------------------
Primarily, the following types of regression models are used:

- Linear regression
- Logistic regression
- Polynomial regression
- Stepwise regression
- Ridge regression
- Lasso regression
- ElasticNet regression

Linear regression is used for predictive analysis. It is a linear approach to modeling the relationship between a scalar response (the criterion) and one or more explanatory variables (the predictors), and it focuses on the conditional probability distribution of the response given the values of the predictors. With many predictors, linear regression carries a danger of overfitting.
The formula for linear regression is:
$$Y’ = bX + A$$
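
A minimal sketch of fitting such a model in R with `lm()`, using the built-in `cars` data (the same data explored in detail later in this dashboard):

```{r echo=TRUE}
# A hedged sketch: simple linear regression of stopping distance on speed
# using the built-in cars data.
lm(dist ~ speed, data = cars)
```
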
Logistic regression is used when the dependent variable is dichotomous. Logistic regression estimates the parameters of a logistic model and is a form of binomial regression. It is used to model data with two possible outcomes and the relationship between those outcomes and the predictors. The equation for logistic regression, written in terms of the log-odds l, is:
$$l = \beta_{0}+\beta_{1}x_{1}+\beta_{2}x_{2}$$
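
A minimal sketch of a logistic fit with `glm()`; the built-in `mtcars` data and its binary `am` column (transmission type) are used here purely as an illustrative dichotomous outcome:

```{r echo=TRUE}
# A hedged sketch: logistic regression with a binomial family; am (0/1) from
# the built-in mtcars data serves as an example binary outcome.
logit_fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)
summary(logit_fit)
```
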
Polynomial regression is used for curvilinear data and is fitted with the method of least squares. The goal is to model the expected value of a dependent variable y as an nth-degree polynomial of the independent variable x. The equation for polynomial regression is:
$$y = \beta_{0}+\beta_{1}x+\beta_{2}x^{2}+\cdots+\beta_{n}x^{n}+\epsilon$$
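
A minimal sketch of a degree-2 polynomial fit using `poly()`, again with the built-in `cars` data as an illustrative example:

```{r echo=TRUE}
# A hedged sketch: quadratic polynomial regression of stopping distance on
# speed, using orthogonal polynomial terms.
poly_fit <- lm(dist ~ poly(speed, 2), data = cars)
summary(poly_fit)
```
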
Stepwise regression is a method of fitting regression models in which the choice of predictive variables is carried out by an automatic procedure. At each step, a variable is added to or removed from the set of explanatory variables. The approaches to stepwise regression are forward selection, backward elimination, and bidirectional elimination. The formula for the standardized regression coefficient used in stepwise selection is
$$b_{j.std} = b_{j}(s_{x} * s_{y}^{-1})$$
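
A minimal sketch of bidirectional stepwise selection with base R's `step()`, using the built-in `mtcars` data as an illustrative example:

```{r echo=TRUE}
# A hedged sketch: AIC-based bidirectional stepwise selection starting from a
# full model of mpg on all other mtcars variables.
full_fit <- lm(mpg ~ ., data = mtcars)
step_fit <- step(full_fit, direction = "both", trace = FALSE)
summary(step_fit)
```
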
Ridge regression is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates are unbiased, but their variances are large, so they may be far from the true values. By adding a degree of bias to the regression estimates, ridge regression reduces these standard errors. The formula for ridge regression is
$$\beta = (X^{T}X + \lambda * I)^{-1}X^{T}y$$
Lasso regression is a regression analysis method that performs both variable selection and regularization. Lasso regression uses soft thresholding and selects only a subset of the provided covariates for use in the final model. In its general form, the lasso minimizes an objective of the form
$$N^{-1}\sum^{N}_{i=1}f(x_{i}, y_{i}, \alpha, \beta)$$
ElasticNet regression is a regularized regression method that linearly combines the penalties of the lasso and ridge methods. ElasticNet regression is used for support vector machines, metric learning, and portfolio optimization. The penalty combines the lasso (L1) and ridge (L2) penalties:
$$\lambda_{1}\|\beta\|_{1} + \lambda_{2}\|\beta\|_{2}^{2}$$
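
A minimal sketch of the three penalized methods above, assuming the `glmnet` package is installed: `alpha = 0` gives ridge, `alpha = 1` gives lasso, and intermediate values give elastic net. The built-in `mtcars` data serve only as an illustration:

```{r echo=TRUE}
# A hedged sketch of ridge, lasso, and elastic net fits with glmnet
# (assumes the glmnet package is available).
library(glmnet)
x <- as.matrix(mtcars[, c("wt", "hp", "disp", "qsec")])
y <- mtcars$mpg
ridge_fit <- glmnet(x, y, alpha = 0)    # ridge: L2 penalty only
lasso_fit <- glmnet(x, y, alpha = 1)    # lasso: L1 penalty only
enet_fit  <- glmnet(x, y, alpha = 0.5)  # elastic net: mix of L1 and L2
# Lasso coefficients at the lambda chosen by cross-validation
coef(cv.glmnet(x, y, alpha = 1), s = "lambda.min")
```
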
With slight modifications, other forms of regression models have been formulated for specific applications:

- Quantile Regression
- Principal Components Regression
- Partial Least Squares (PLS) Regression
- Support Vector Regression
- Ordinal Regression
- Poisson Regression
- Negative Binomial Regression
- Quasi-Poisson Regression
- Cox Regression
- Tobit Regression

How to Choose Right Regression Model {data-navmenu="MENU"}
=====================================
How to choose the correct regression model?
If the dependent variable is continuous and the model suffers from collinearity, or there are many independent variables, you can try PCR, PLS, ridge, lasso, and elastic net regressions. You can select the final model based on adjusted R-squared, RMSE, AIC, and BIC.
If you are working with count data, you should try Poisson, quasi-Poisson, and negative binomial regression, as in the sketch below.
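
A minimal sketch of count-data models with `glm()`, using the built-in `warpbreaks` data as an illustrative count outcome (the negative binomial analogue would be `MASS::glm.nb()`, assuming the MASS package is available):

```{r echo=TRUE}
# A hedged sketch: Poisson and quasi-Poisson regressions on the number of warp
# breaks per loom from the built-in warpbreaks data.
pois_fit  <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson)
qpois_fit <- glm(breaks ~ wool + tension, data = warpbreaks, family = quasipoisson)
summary(pois_fit)
```
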
To avoid overfitting, we can use cross-validation to evaluate models used for prediction, as sketched below. We can also use ridge, lasso, and elastic net regression to correct overfitting.
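
A minimal base-R sketch of 5-fold cross-validation applied to the univariate `cars` model used later in this dashboard:

```{r echo=TRUE}
# A hedged sketch: 5-fold cross-validation of dist ~ speed on the cars data,
# reporting the average out-of-fold RMSE.
set.seed(123)
k <- 5
folds <- sample(rep(1:k, length.out = nrow(cars)))
cv_rmse <- sapply(1:k, function(i) {
  fit  <- lm(dist ~ speed, data = cars[folds != i, ])
  pred <- predict(fit, newdata = cars[folds == i, ])
  sqrt(mean((cars$dist[folds == i] - pred)^2))
})
mean(cv_rmse)
```
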
Try support vector regression when the relationship between the variables is non-linear.
Univariate Linear Regression Example {data-navmenu="MENU"}
===================================
Column {.tabset}
-----------------------------------------
### Top 10 Observations of the Dataset
```{r}
head(cars,10)
```
### Dimensions of the Dataset
```{r}
dim(cars)
```
### Summary of the Dataset
```{r}
summary(cars)
```
### Visualization
```{r}
scatter.smooth(x=cars$speed,
y=cars$dist,
main="Speed Vs Distance",
xlab = "Speed of the Car",
ylab="Distance", col="blue", lwd=2)
```
### Assumptions of Linear Regression Model
Assumption 1: The regression model is linear in parameters.

An example of a model equation that is linear in parameters:

$$Y = a + \beta_{1}X_{1} + \beta_{2}X_{2}$$

Assumption 2: The mean of the residuals is zero.

How to check? Compute the mean of the residuals; if it is zero (or very close to zero), this assumption holds for the model.

Assumption 3: Homoscedasticity of the residuals (equal variance).

Let us check these assumptions for the dataset `cars`.
### Linear Regression Model
```{r echo=TRUE}
model=lm(dist~speed, data=cars)
summary(model)
```
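
A minimal sketch of using the fitted model for prediction at new speed values (an illustrative addition):

```{r echo=TRUE}
# A hedged sketch: predicted stopping distances, with prediction intervals,
# for two new speed values.
predict(model, newdata = data.frame(speed = c(12, 20)), interval = "prediction")
```
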
### Checking the LM Assumptions:Statistical Methods
```{r echo=TRUE}
# Calculating mean of model residuals
mean(model$residuals)
```
The mean of the model residuals is very close to zero. Hence Assumption 2 is met.
```{r echo=TRUE}
# Checking for heteroscedasticity
# Ho: Variance of the model residuals is constant
# Ha: Variance of the model residuals is not constant
lmtest::bptest(model)
```
```{r echo=TRUE}
car::ncvTest(model)
```
The assumption of homoscedasticity is not met: although the studentized Breusch-Pagan test fails to reject the null hypothesis of constant variance at the 5% level (p = 0.073), the non-constant variance score test rejects it (p = 0.031).
### Checking the LM Assumptions:Graphical Methods
```{r echo=TRUE}
par(mfrow=c(2,2)) # init 4 charts in 1 panel
plot(model)
```
Multivariate Linear Regression Example {data-navmenu="MENU"}
===================================
Column {.tabset}
-----------------------------------------
### What is Multivariate Linear Regression?
Definition
Multivariate Regression is a method used to measure the degree to which more than one independent variable (predictor) and more than one dependent variable (response) are linearly related.
$$Y' = \beta_{0}+\beta_{1}x_{1}+\beta_{2}x_{2}+\beta_{3}x_{3}+\cdots+\beta_{n}x_{n}+\epsilon$$
In the above equation, $$\beta_{0}$$ is the intercept and $$\beta_{1}, \beta_{2}, \beta_{3}, \ldots, \beta_{n}$$ are the regression coefficients corresponding to the predictors $$x_{1}, x_{2}, x_{3}, \ldots, x_{n}$$. $$Y'$$ is the dependent variable.
### Top 10 Observations of the Dataset
```{r}
seeds=read.delim("C:\\Users\\ARUN SHARMA\\Desktop\\regression analysis\\seeds_dataset.txt",header=FALSE)
names(seeds)=c("Area","Perimeter","Compactness","length_of_kernel","width_of_kernel","asymmetry_coefficient","length_of_kernel_groove", "Wheat_Variety")
seeds$Wheat_Variety=as.factor(seeds$Wheat_Variety)
DT::datatable(seeds, filter="top")
```
### Dimensions of the Dataset
```{r}
dim(seeds)
```
The data contains the following parameters for seeds of three varieties of wheat (Kama = 1, Rosa = 2, and Canadian = 3):
1. area A,
2. perimeter P,
3. compactness C = 4*pi*A/P^2,
4. length of kernel,
5. width of kernel,
6. asymmetry coefficient
7. length of kernel groove
8. Wheat Variety
Source:M. Charytanowicz, J. Niewczas, P. Kulczycki, P.A. Kowalski, S. Lukasik, S. Zak, 'A Complete Gradient Clustering Algorithm for Features Analysis of X-ray Images', in: Information Technologies in Biomedicine, Ewa Pietka, Jacek Kawa (eds.), Springer-Verlag, Berlin-Heidelberg, 2010, pp. 15-24.
Contributors gratefully acknowledge support of their work by the Institute of Agrophysics of the Polish Academy of Sciences in Lublin.
### Summary of the Dataset
```{r}
summary(seeds)
```
### 2D Visualization
```{r}
library(ggplot2)
ggplot(seeds,aes(x=Area,
y=length_of_kernel_groove,
color=Wheat_Variety))+geom_point()+
ggtitle("Length of Kernel Groove Vs Perimeter of Seed")+ xlab("Perimeter of Seed")+
ylab("Length of Kernel Groove")
```
### 3D Visualization
```{r}
library(plot3Drgl)
library(plot3D)
scatter3D(x=seeds$length_of_kernel,
y=seeds$length_of_kernel_groove,
z=seeds$width_of_kernel, bty = "u",
col.panel ="lightyellow", expand =1,
col.grid = "black",
col.var = as.integer(seeds$Wheat_Variety),
col = c("#1B9E77", "#D95F02", "#7570B3"),
pch = 18, ticktype = "detailed",
colkey = list(at = c(2,3, 4), side = 4,
addlines = TRUE, length = 0.5, width = 0.5,
labels = c("Kami", "Rosa", "Canadian")), main ="Length Kernel Groove Vs Width and Length of Kernel")
```
### Assumptions of Linear Regression Model
Assumption 1: The regression model is linear in parameters.

An example of a model equation that is linear in parameters:

$$Y = a + \beta_{1}X_{1} + \beta_{2}X_{2}$$

Assumption 2: The mean of the residuals is zero.

How to check? Compute the mean of the residuals; if it is zero (or very close to zero), this assumption holds for the model.

Assumption 3: Homoscedasticity of the residuals (equal variance).

Let us check these assumptions for the dataset `seeds`.
### Linear Multivariate Regression Model
```{r echo=TRUE}
model=lm(length_of_kernel_groove~., data=seeds[,-8])
summary(model)
```
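
As an additional, illustrative check, variance inflation factors from the `car` package (already used below for `ncvTest`) can flag multicollinearity among the predictors:

```{r echo=TRUE}
# A hedged sketch: variance inflation factors for the multivariate model;
# values well above 10 usually indicate problematic multicollinearity.
car::vif(model)
```
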
### Checking the LM Assumptions:Statistical Methods
```{r echo=TRUE}
# Calculating mean of model residuals
mean(model$residuals)
```
The mean of the model residuals is very close to zero. Hence Assumption 2 is met.
```{r echo=TRUE}
# Checking for heteroscedasticity
# Ho: Variance of the model residuals is constant
# Ha: Variance of the model residuals is not constant
lmtest::bptest(model)
```
We can see that the p-value (0.2713) is much larger than the significance level of 0.05.
```{r echo=TRUE}
car::ncvTest(model)
```
The assumption of homoscedasticity is met, as both tests fail to reject the null hypothesis of constant variance.
So the model is robust: all three assumptions are met, and the model has an R-squared value above 0.90.
### Checking the LM Assumptions:Graphical Methods
```{r echo=FALSE}
par(mfrow=c(2,2)) # init 4 charts in 1 panel
plot(model)
```
Performance Metrics for Linear Regression {data-navmenu="MENU"}
============================================
### Performance Metrics for Linear Regression
The various metrics used to evaluate the results of the prediction are:

- Mean Squared Error (MSE)
- Root Mean Squared Error (RMSE)
- Mean Absolute Error (MAE)
- R² or Coefficient of Determination
- Adjusted R²

Mean Squared Error: MSE, or Mean Squared Error, is one of the most commonly used metrics for regression tasks. It is simply the average of the squared differences between the target values and the values predicted by the regression model. Because it squares the differences, it penalizes even small errors, which can overstate how bad the model is. It is often preferred over other metrics because it is differentiable and hence can be optimized more easily.
$$MSE = \frac{1}{N}\sum_{i=1}^{N}(y_{i} - \hat{y}_{i})^{2}$$
Root Mean Squared Error: RMSE is the most widely used metric for regression tasks and is the square root of the average squared difference between the target value and the value predicted by the model. It is preferred in some cases because the errors are squared before averaging, which places a high penalty on large errors. This makes RMSE useful when large errors are undesirable.
$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_{i} - \hat{y}_{i})^{2}}$$
Mean Absolute Error: MAE is the average of the absolute differences between the target values and the values predicted by the model. MAE is more robust to outliers and does not penalize errors as heavily as MSE. MAE is a linear score, which means all the individual differences are weighted equally. It is not suitable for applications where you want to pay more attention to outliers.
$$MAE = \frac{1}{N}\sum_{i=1}^{N}|y_{i} - \hat{y}_{i}|$$
R²: The coefficient of determination, or R², is another metric for evaluating the performance of a regression model. It compares the current model with a constant baseline and tells us how much better the model is. The constant baseline is obtained by taking the mean of the data and drawing a line at that mean. R² is a scale-free score: regardless of how large or small the values are, R² is always less than or equal to 1.
$$R^{2} = 1 - \frac{\sum_{i=1}^{N}(y_{i} - \hat{y}_{i})^{2}}{\sum_{i=1}^{N}(y_{i} - \bar{y})^{2}}$$
Adjusted R²: Adjusted R² conveys the same meaning as R² but improves on it. R² suffers from the problem that its value increases as more terms are added even when the model is not actually improving, which can mislead the researcher. Adjusted R² is always lower than R², as it adjusts for the number of predictors and increases only when there is a real improvement.
$$R^{2}_{adj} = 1 - \frac{(1 - R^{2})(N - 1)}{N - p - 1}$$
where N is the number of observations and p the number of predictors.
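
A minimal, self-contained sketch computing the metrics above for the univariate `dist ~ speed` model on the built-in `cars` data:

```{r echo=TRUE}
# A hedged sketch: MSE, RMSE, MAE, R-squared and adjusted R-squared computed
# by hand for the simple dist ~ speed model.
fit  <- lm(dist ~ speed, data = cars)
y    <- cars$dist
yhat <- fitted(fit)
n    <- length(y); p <- 1                      # one predictor
mse  <- mean((y - yhat)^2)
rmse <- sqrt(mse)
mae  <- mean(abs(y - yhat))
r2   <- 1 - sum((y - yhat)^2) / sum((y - mean(y))^2)
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
c(MSE = mse, RMSE = rmse, MAE = mae, R2 = r2, Adj_R2 = adj_r2)
```
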
Why is R² Negative?
There is a misconception that the R² score ranges from 0 to 1, but it actually ranges from -∞ to 1. Because of this misconception, people are sometimes alarmed when R² comes out negative, believing that to be impossible.
R² can be negative for the following reasons (a short sketch follows the list):
1. The model does not follow the trend of the data.
2. Because of a large number of outliers, the MSE of the model exceeds the MSE of the mean baseline.
3. The intercept is mistakenly omitted when building the model.
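
A minimal base-R sketch showing how R² measured against the mean baseline becomes negative when a model predicts worse than the mean (illustrative values only):

```{r echo=TRUE}
# A hedged sketch: a deliberately poor "model" (predicting zero everywhere)
# yields a strongly negative R-squared against the mean baseline.
set.seed(1)
y        <- rnorm(50, mean = 10)
bad_pred <- rep(0, length(y))
ss_res   <- sum((y - bad_pred)^2)
ss_tot   <- sum((y - mean(y))^2)
1 - ss_res / ss_tot
```
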
About Us {data-navmenu="MENU"}
============================================
Dr. Amita Sharma
Post Doc from Erasmus University, the Netherlands, PhD, MBA
Assistant Professor
Institute of Agri-Business Management
Swami Keshwanand Rajasthan Agricultural University
Bikaner (Raj) India
Visit the blog : www.thinkingai.in
Arun Kumar Sharma
Machine Learning Enthusiast, Hobbyist, writer, blogger and S&M Training Professional
Certified Business Analytics Professional
Certified in Predictive Analytics from IIMx Bangalore
Certified in Macroeconomic Forecasting from IMFx
Certified in Text Analytics from openSAP
Contact for How Machine Learning can Transform Your Business: 9468567418/aks10000@gmail.com