Introduction
Logistic Regression is a classification algorithm used when the response variable is binary, i.e., a two-level categorical variable. The logistic regression model belongs to the family of generalized linear models (GLMs). Typical examples of binary response variables are Yes/No, Male/Female, Cancer/No Cancer, and Approve/Deny; these responses are usually coded as 1 or 0. Even though the name contains "regression," logistic regression is used when the response variable is discrete, i.e., it is a classification algorithm.
Logistic regression need not be binary: when the response variable has more than two levels, the model is called multinomial logistic regression, which is beyond the scope of this article.
The logistic regression model relates the probability \(p_{i}\) that a response is a success to the predictors \(x_{1, i}, x_{2, i}, ..., x_{k, i}\) through a framework much like that of multiple regression:
\(logit(p_{i}) = log_{e}(\frac{p_{i}}{1-p_{i}}) = \beta_{0} + \beta_{1}x_{1,i} + \beta_{2}x_{2,i} + ... + \beta_{k}x_{k,i}\)
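To make the formula concrete, here is a minimal sketch in Python using scikit-learn on synthetic data; the coefficients and variable names are made up for illustration, not from a real study. Note that scikit-learn's LogisticRegression applies L2 regularization by default, so the recovered coefficients will be slightly shrunk toward zero.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic binary-response data: 500 observations, 2 predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))

# Generate y from a known logistic model:
# logit(p) = 0.5 + 1.2*x1 - 0.8*x2   (illustrative coefficients)
p = 1 / (1 + np.exp(-(0.5 + 1.2 * X[:, 0] - 0.8 * X[:, 1])))
y = rng.binomial(1, p)

model = LogisticRegression().fit(X, y)

# The fitted coefficients live on the logit (log-odds) scale.
print("intercept (beta_0):", model.intercept_[0])
print("slopes (beta_1, beta_2):", model.coef_[0])

# Predicted probability of success for a new observation.
print("P(y = 1):", model.predict_proba([[0.3, -1.0]])[0, 1])
```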
Assumptions for Logistic Regression
Each outcome of the response variable is independent of the other outcomes.
The response variable must follow a binomial distribution.
Each predictor \(x_{i}\) is linearly related to \(logit(p_{i})\) when the other predictors are held constant (a quick empirical check is sketched below).
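The third assumption can be eyeballed with an empirical-logit check: bin a predictor, compute the observed log-odds in each bin, and see whether they track the predictor roughly linearly. A minimal sketch on synthetic data (all names and numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=2000)
p = 1 / (1 + np.exp(-(0.4 + 1.0 * x)))   # true model is linear in the logit
y = rng.binomial(1, p)

# 10 equal-count bins over x; digitize against the interior edges.
edges = np.quantile(x, np.linspace(0, 1, 11))
bin_id = np.digitize(x, edges[1:-1])

for b in range(10):
    mask = bin_id == b
    # Add 0.5 pseudo-counts so the logit stays finite in extreme bins.
    p_hat = (y[mask].sum() + 0.5) / (mask.sum() + 1.0)
    print(f"bin {b}: mean x = {x[mask].mean():+.2f}, "
          f"empirical logit = {np.log(p_hat / (1 - p_hat)):+.2f}")
```

If the printed empirical logits increase (or decrease) roughly in proportion to the bin means of \(x\), the linearity assumption is plausible for that predictor.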
Evaluating the Logistic Regression Model
After fitting the logistic regression model, we use certain metrics to evaluate how well the model performs. Below are some metrics commonly used to evaluate the performance of a logistic regression model:

(Figure: a confusion matrix, whose cells are the true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). Source: https://www.debadityachakravorty.com/ai-ml/cmatrix/)
Which metric is appropriate depends on the situation.
Accuracy: This is the most common measure used to evaluate a classification model. Accuracy is the ratio of correctly classified observations to the total number of observations, i.e., the percentage of observations that the model classifies correctly.
\(accuracy = \frac{TP + TN}{TP+FP+TN+FN}\)
Accuracy works well when the classes are balanced and the costs of false positives and false negatives are similar.
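As a sketch, here is accuracy computed both by hand from the confusion-matrix counts and with scikit-learn; the ten labels below are made up purely for illustration:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]   # actual classes (illustrative)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]   # model's predictions

# For labels (0, 1), confusion_matrix returns [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print("accuracy (by hand):", (tp + tn) / (tp + fp + tn + fn))   # 7/10 = 0.7
print("accuracy (sklearn):", accuracy_score(y_true, y_pred))
```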
Precision: This is the percentage of positive classifications that are relevant. It is computed by dividing the number of true positives (TP) by all observations the model classified as positive (both TP and FP). Precision tells us what fraction of the model's positive classifications are actually positive: a precision of 90% means that of all the observations classified as positive by our model, 90% are actually positive.
\(precision = \frac{TP}{TP+FP}\)
We use precision when we want to be more confident about our true positives. For example, with spam filtering, you want to be sure an email really is spam before putting it in the spam folder.
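Continuing with the same illustrative labels as in the accuracy sketch, precision compares the true positives against everything the model flagged as positive:

```python
from sklearn.metrics import confusion_matrix, precision_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]   # same illustrative labels as above
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print("precision (by hand):", tp / (tp + fp))                  # 4/5 = 0.8
print("precision (sklearn):", precision_score(y_true, y_pred))
```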
Recall (Sensitivity): Recall, also known as sensitivity, is the proportion of the actual positive class that the model classifies correctly. It is computed by dividing the number of true positives (TP) by the total number of actually positive observations, whether the model classified them correctly or not (TP and FN). It answers questions like: what percentage of cancer patients did the model correctly classify as cancer patients? A recall of 95% means that of all the actually positive cases, 95% were correctly classified by the model.
\(recall = \frac{TP}{TP+FN}\)
We use recall when a false negative is far more costly than a false positive. For example, it is better to tell someone they have cancer when in fact they don't (a false positive) than to tell them they don't have cancer when in fact they do (a false negative): a false negative would be disastrous for a cancer patient, who could have acted sooner had they been informed. Prefer recall when the cost of false negatives is unacceptable, i.e., when a false positive is better than a false negative.
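With the same illustrative labels as before, recall compares the true positives against all observations that are actually positive:

```python
from sklearn.metrics import confusion_matrix, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]   # same illustrative labels as above
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print("recall (by hand):", tp / (tp + fn))               # 4/6 ≈ 0.67
print("recall (sklearn):", recall_score(y_true, y_pred))
```

Note that this toy model scores higher on precision (0.8) than on recall (≈0.67): it is conservative about flagging positives, so the positives it flags are usually right, but it misses some actual positives.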
F1 Score: This is the harmonic mean of the precision and recall. It is also called the F-score or F-measure.
\(F1 = \frac{2 \times precision \times recall}{precision + recall}\)
The F1 score is useful when the class distribution is uneven, and it can be used to compare different models.
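Putting it together with the same illustrative labels, the F1 score balances the precision and recall computed above:

```python
from sklearn.metrics import confusion_matrix, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]   # same illustrative labels as above
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
precision = tp / (tp + fp)   # 0.8
recall = tp / (tp + fn)      # ≈ 0.67

print("F1 (by hand):", 2 * precision * recall / (precision + recall))  # ≈ 0.73
print("F1 (sklearn):", f1_score(y_true, y_pred))
```

Because the harmonic mean punishes imbalance, the F1 score (≈0.73) sits below the arithmetic mean of precision and recall (≈0.73 vs. 0.73 here they are close, but the gap widens as precision and recall diverge).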