Logistic Regression
Hypothesis Mapping Function
In logistic regression, the hypothesis mapping function predicts the probability that an event (e.g., buying insurance) will occur, given input features such as income level and age.
The hypothesis function applies the sigmoid (logistic) function to a linear combination of the input features:
\[ h_\theta(X) = \frac{1}{1 + e^{-(\theta_0 + \theta_1 \cdot \text{Age} + \theta_2 \cdot \text{Income\_Level})}} \]
Components:
- \(h_\theta(X)\): The predicted probability that a person buys insurance, i.e., \(P(\text{BuyInsurance} = 1 \mid \text{Age}, \text{Income\_Level})\).
- \(\theta_0\): The intercept (bias term).
- \(\theta_1\): The weight or coefficient associated with the Age feature.
- \(\theta_2\): The weight or coefficient associated with the Income\_Level feature.
- \(X\): The input features (Age and Income\_Level).
- \(e\): The base of the natural logarithm (approximately 2.718).
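The components above can be put together in a few lines of Python. This is a minimal sketch rather than the notebook's own code: the function name `hypothesis` and the coefficient values are assumptions chosen only for illustration, and Income\_Level is assumed to be numerically encoded.

```python
import math

def hypothesis(age, income_level, theta0, theta1, theta2):
    """Return P(BuyInsurance = 1 | Age, Income_Level) under the logistic model."""
    z = theta0 + theta1 * age + theta2 * income_level  # linear combination of features
    return 1.0 / (1.0 + math.exp(-z))                  # sigmoid maps z into (0, 1)

# Hypothetical coefficients (not fitted values from the dataset).
theta0, theta1, theta2 = -6.0, 0.1, 1.0

p_young = hypothesis(25, 1, theta0, theta1, theta2)  # z = -2.5
p_older = hypothesis(55, 3, theta0, theta1, theta2)  # z = +2.5
print(f"{p_young:.3f} {p_older:.3f}")
```

Because the two inputs give z values of equal size and opposite sign, the two probabilities sum to 1, which reflects the symmetry of the sigmoid around 0.5. Note that `math.exp` raises `OverflowError` for very large negative z, so a more robust version would guard against that case.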
Hypothesis Explanation:
- The function \(h_\theta(X)\) outputs a value between 0 and 1, which is interpreted as the probability of the person buying insurance.
- If the output is close to 1, the person is likely to buy insurance.
- If the output is close to 0, the person is unlikely to buy insurance.
Prediction Rule:
Once the probability \(h_\theta(X)\) is computed, the final prediction is made by applying a threshold:
\[ \hat{y} = \begin{cases} 1 & \text{if } h_\theta(X) \geq 0.5 \quad (\text{person will buy insurance}) \\ 0 & \text{if } h_\theta(X) < 0.5 \quad (\text{person will not buy insurance}) \end{cases} \]
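The threshold rule can be sketched as a small helper. The name `predict_label` and the sample probabilities are illustrative assumptions; the 0.5 cut-off is the one stated in the rule.

```python
def predict_label(probability, threshold=0.5):
    """Map a predicted probability to a class label (1 = buys, 0 = does not buy)."""
    return 1 if probability >= threshold else 0

probabilities = [0.08, 0.5, 0.92]  # illustrative values, not model output
labels = [predict_label(p) for p in probabilities]
print(labels)  # [0, 1, 1]
```

A probability of exactly 0.5 is assigned to class 1, matching the \(\geq\) in the rule.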
Graph of Linear Regression vs Logistic Regression
Difference Between Linear and Logistic Models
| Feature | Linear Model | Logistic Model |
|---|---|---|
| Graph Shape | Straight line | S-shaped curve (sigmoid function) |
| Equation | \(\hat{y} = \theta_0 + \theta_1 \cdot X_1 + \theta_2 \cdot X_2 + \dots\) | \(h_\theta(X) = \frac{1}{1 + e^{-(\theta_0 + \theta_1 \cdot X_1 + \dots)}}\) |
| Output Range | \(-\infty\) to \(+\infty\) | 0 to 1 (probability) |
| Use Case | Regression (predicting continuous values) | Binary Classification (predicting probabilities) |
| Prediction Interpretation | Can predict values outside of 0 and 1, which is unsuitable for probabilities | Predicts probabilities, making it suitable for classification |
| Example | Predicting house prices, stock prices | Predicting whether a person buys insurance (yes/no) |
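To make the "Output Range" row concrete, the sketch below evaluates both equations with the same coefficients on a single feature. The coefficient values and function names are assumptions for illustration only.

```python
import math

def linear_output(x, theta0, theta1):
    """Linear model: unbounded output."""
    return theta0 + theta1 * x

def logistic_output(x, theta0, theta1):
    """Logistic model: the same linear term passed through the sigmoid."""
    return 1.0 / (1.0 + math.exp(-(theta0 + theta1 * x)))

theta0, theta1 = -2.0, 0.5  # hypothetical coefficients

for x in (-10, 0, 4, 20):
    lin = linear_output(x, theta0, theta1)
    log = logistic_output(x, theta0, theta1)
    print(f"x={x:>3}  linear={lin:6.2f}  logistic={log:.4f}")
```

The linear output leaves the interval from 0 to 1 for small and large x, while the logistic output stays strictly inside it.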
Import Necessary Libraries
Download the CSV file
https://docs.google.com/spreadsheets/d/1dzW--P-yVGQsmfQIbjrqa4g5vhM7nl-oEgNz2o8hHK4/edit?usp=sharing
Prepare data for modeling
Split the data into training and test sets (80% train, 20% test)
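An 80/20 split could look like the following sketch, assuming scikit-learn's `train_test_split` is used. The data here is a synthetic stand-in of 30 rows (not the spreadsheet's contents), the numeric encoding of Income\_Level is an assumption, and `random_state=42` is an arbitrary choice for reproducibility.

```python
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 30 rows of [Age, Income_Level] with a 0/1 label.
ages = list(range(21, 51))
X = [[age, age // 10 - 1] for age in ages]   # assumed numeric income level
y = [1 if age >= 40 else 0 for age in ages]  # assumed BuyInsurance label

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(len(X_train), len(X_test))  # 24 6
```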
Train the model
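The `LogisticRegression()` output suggests a scikit-learn estimator with default settings. A minimal training sketch under that assumption follows; the training rows are made up for illustration and are not the spreadsheet's data.

```python
from sklearn.linear_model import LogisticRegression

# Hypothetical training rows: [Age, Income_Level] and a 0/1 BuyInsurance label.
X_train = [[22, 1], [25, 1], [28, 1], [31, 2], [35, 2], [41, 3], [47, 3], [52, 3]]
y_train = [0, 0, 0, 0, 0, 1, 1, 1]

model = LogisticRegression()  # default settings
model.fit(X_train, y_train)

# intercept_ corresponds to theta_0; coef_ holds theta_1 (Age) and theta_2 (Income_Level).
print(model.intercept_, model.coef_)

# Column 1 of predict_proba is the estimated probability of class 1 (buys insurance).
proba = model.predict_proba([[50, 3]])[0, 1]
print(proba)
```

By default the estimator applies L2 regularization, so the fitted coefficients are shrunk slightly compared with an unregularized fit.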
LogisticRegression()
Evaluate the model
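The accuracy and confusion matrix reported below can be reproduced by hand, which shows what the numbers mean. This sketch uses plain Python instead of a metrics library; the helper names are assumptions, and the label lists are illustrative values consistent with the reported test set (four samples of class 0, two of class 1, all predicted correctly).

```python
def confusion_matrix_2x2(y_true, y_pred):
    """Rows are true classes (0, 1); columns are predicted classes (0, 1)."""
    matrix = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for actual, predicted in zip(y_true, y_pred) if actual == predicted)
    return correct / len(y_true)

y_true = [0, 0, 0, 0, 1, 1]  # illustrative test labels
y_pred = [0, 0, 0, 0, 1, 1]  # illustrative predictions

print(f"Accuracy: {accuracy(y_true, y_pred):.2f}")  # Accuracy: 1.00
print(confusion_matrix_2x2(y_true, y_pred))         # [[4, 0], [0, 2]]
```

The diagonal entries count correct predictions per class, and the off-diagonal entries count misclassifications, so a matrix with zeros off the diagonal corresponds to an accuracy of 1.00.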
Print the results
## Accuracy: 1.00
##
## Confusion Matrix:
## [[4 0]
## [0 2]]
##
## Classification Report:
## precision recall f1-score support
##
## 0 1.00 1.00 1.00 4
## 1 1.00 1.00 1.00 2
##
## accuracy 1.00 6
## macro avg 1.00 1.00 1.00 6
## weighted avg 1.00 1.00 1.00 6
Problem: Predict outcomes for the data in the given Excel file and save the list of predictions to an Excel file
https://docs.google.com/spreadsheets/d/1XfbCjxErZU4zENn0mkcVM8KlaNrS3TyX1jxWiYIVdQg/edit?usp=sharing