Logistic Regression

Hypothesis Mapping Function

In logistic regression, the hypothesis mapping function predicts the probability that an event (e.g., buying insurance) will occur, given input features such as income level and age.

The hypothesis function applies the sigmoid (logistic) function to a linear combination of the input features and is expressed as:

\[ h_\theta(X) = \frac{1}{1 + e^{-(\theta_0 + \theta_1 \cdot \text{Age} + \theta_2 \cdot \text{Income\_Level})}} \]

Components:

  • \(h_\theta(X)\): The predicted probability that a person buys insurance, i.e., \(P(\text{BuyInsurance} = 1 \mid \text{Age}, \text{Income\_Level})\).
  • \(\theta_0\): The intercept (bias term).
  • \(\theta_1\): The weight or coefficient associated with the Age feature.
  • \(\theta_2\): The weight or coefficient associated with the Income\_Level feature.
  • \(X\): The input features (Age and Income_Level).
  • \(e\): The base of the natural logarithm (approximately 2.718).

Hypothesis Explanation:

  • The function \(h_\theta(X)\) outputs a value between 0 and 1, which is interpreted as the probability of the person buying insurance (a numerical sketch follows below).
  • If the output is close to 1, the person is likely to buy insurance.
  • If the output is close to 0, the person is unlikely to buy insurance.
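
To make this concrete, here is a minimal sketch that evaluates the hypothesis for one hypothetical person. The coefficient values \(\theta_0 = -6\), \(\theta_1 = 0.05\), \(\theta_2 = 0.5\) and the person's feature values are made up purely for illustration; real values come from training.

import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical, untrained coefficients chosen only for illustration
theta_0, theta_1, theta_2 = -6.0, 0.05, 0.5

# One hypothetical person: 45 years old, income level 8 (arbitrary scale)
age, income_level = 45, 8

# h_theta(X) = sigmoid(theta_0 + theta_1 * Age + theta_2 * Income_Level)
probability = sigmoid(theta_0 + theta_1 * age + theta_2 * income_level)
print(probability)  # a value between 0 and 1; about 0.56 for these numbers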

Prediction Rule:

Once the probability \(h_\theta(X)\) is computed, the final prediction is made by applying a threshold:

\[ \hat{y} = \begin{cases} 1 & \text{if } h_\theta(X) \geq 0.5 \quad (\text{person will buy insurance}) \\ 0 & \text{if } h_\theta(X) < 0.5 \quad (\text{person will not buy insurance}) \end{cases} \]
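
As a sketch, the same rule in NumPy, assuming probs is an array of \(h_\theta(X)\) values:

import numpy as np

probs = np.array([0.91, 0.42, 0.50, 0.07])  # example probabilities
y_hat = (probs >= 0.5).astype(int)          # apply the 0.5 threshold
print(y_hat)                                # [1 0 1 0]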

Graph of Linear Regression vs Logistic Regression

Difference Between Linear and Logistic Models

| Feature | Linear Model | Logistic Model |
| --- | --- | --- |
| Graph Shape | Straight line | S-shaped curve (sigmoid function) |
| Equation | \(\hat{y} = \theta_0 + \theta_1 \cdot X_1 + \theta_2 \cdot X_2 + \dots\) | \(h_\theta(X) = \frac{1}{1 + e^{-(\theta_0 + \theta_1 \cdot X_1 + \dots)}}\) |
| Output Range | \(-\infty\) to \(+\infty\) | 0 to 1 (probability) |
| Use Case | Regression (predicting continuous values) | Binary classification (predicting probabilities) |
| Prediction Interpretation | Can predict values outside of 0 and 1, which is unsuitable for probabilities | Predicts probabilities, making it suitable for classification |
| Example | Predicting house prices, stock prices | Predicting whether a person buys insurance (yes/no) |
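
The difference in output range can be checked numerically. The sketch below uses arbitrary linear scores; the point is only that the linear output is unbounded while the sigmoid squashes it into (0, 1).

import numpy as np

z = np.array([-100.0, -2.0, 0.0, 2.0, 100.0])  # linear scores theta^T x
sigmoid = 1.0 / (1.0 + np.exp(-z))

print(z)        # unbounded: -100 ... 100
print(sigmoid)  # bounded:   ~0.0, 0.12, 0.5, 0.88, ~1.0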

Import Necessary Libraries

# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

Load the dataset from the CSV file

data = pd.read_csv("insurance_data.csv")
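
A quick sanity check on the loaded data never hurts (the column names Age, Income_Level, and Bought_Insurance are assumed from the modeling code below):

print(data.head())   # first few rows
print(data.shape)    # (n_rows, n_columns)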

Prepare data for modeling

X = data[['Age', 'Income_Level']]  # Features
y = data['Bought_Insurance']       # Target

Split the data into training and test sets (80% train, 20% test)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Create a logistic regression model

log_reg = LogisticRegression()

Train the model

log_reg.fit(X_train, y_train)
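After fitting, the learned parameters correspond directly to the \(\theta\) values in the hypothesis function: log_reg.intercept_ holds \(\theta_0\), and log_reg.coef_ holds \(\theta_1\) (Age) and \(\theta_2\) (Income_Level).

print(log_reg.intercept_)  # theta_0 (bias term)
print(log_reg.coef_)       # [theta_1, theta_2] for Age and Income_Level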

Make predictions on the test set

y_pred = log_reg.predict(X_test)
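
For binary logistic regression, predict is equivalent to thresholding the predicted probability at 0.5, exactly as in the prediction rule above. The probabilities themselves are available via predict_proba:

# Column 1 holds P(Bought_Insurance = 1) for each test sample
y_prob = log_reg.predict_proba(X_test)[:, 1]

# Reproduces log_reg.predict(X_test) by applying the 0.5 threshold
y_pred_manual = (y_prob >= 0.5).astype(int)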

Evaluate the model

accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)
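
To see the results, print the computed metrics:

print("Accuracy:", accuracy)
print("Confusion Matrix:\n", conf_matrix)
print("Classification Report:\n", class_report)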