Understanding Logistic Regression

We will compare linear regression with logistic regression on a cars dataset in which engine type is recorded as a binary variable. The data are introduced on the next slide.

The mtcars Dataset

We’ll be using the R dataset ‘mtcars’, keeping three columns: mpg (miles per gallon), wt (weight in 1000 lbs), and vs (engine shape: 0 = V-shaped, 1 = straight).

##                    mpg    wt vs
## Mazda RX4         21.0 2.620  0
## Mazda RX4 Wag     21.0 2.875  0
## Datsun 710        22.8 2.320  1
## Hornet 4 Drive    21.4 3.215  1
## Hornet Sportabout 18.7 3.440  0
## Valiant           18.1 3.460  1
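A preview like the one above could be generated along these lines (a sketch; the exact call used in the slides is not shown):

# First six rows of the three columns used in this deck:
# mpg (miles per gallon), wt (weight in 1000 lbs), vs (engine shape).
head(mtcars[, c("mpg", "wt", "vs")])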

Linear Regression for Binary Data

(Plot: linear regression line fitted to the binary vs values against mpg; geom_smooth warned that 20 rows with values outside the scale range were removed.)
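For comparison with the logistic fit later in the deck, a plot like this could be produced roughly as follows (a sketch, not the original slide code; the line color and y-axis limits are assumptions):

library(ggplot2)

# Straight-line (lm) fit over the binary outcome vs. Restricting the
# y scale to 0-1 drops the parts of the fitted line that fall outside
# that range, which is what triggers a warning like the one noted above.
ggplot(mtcars, aes(mpg, vs)) +
  geom_point(alpha = 0.5) +
  stat_smooth(method = "lm", se = FALSE, color = "red") +
  ylim(0, 1)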

Sigmoid Function

Here is the formula for the Sigmoid Function:

\[P(x) = \frac{1}{1 + e^{-x}}\]

The sigmoid maps the output of a linear regression onto an S-shaped curve whose values stay between 0 and 1.
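As a quick illustration (a sketch, not from the original slides), the sigmoid can be written and plotted directly in R:

# Sigmoid: maps any real-valued input to a probability in (0, 1).
sigmoid <- function(x) 1 / (1 + exp(-x))

sigmoid(c(-5, 0, 5))   # roughly 0.007, 0.5, 0.993

# Draw the S-shaped curve over a range of inputs.
curve(sigmoid, from = -6, to = 6, ylab = "P(x)")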

Logit Link Function
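The slide does not show the formula, but the logit link is the inverse of the sigmoid above: it maps a probability P to the log-odds, which logistic regression models as a linear function of the predictor:

\[\text{logit}(P) = \ln\left(\frac{P}{1 - P}\right) = \beta_0 + \beta_1 x\]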

Logistic Regression for Binary Data

This approach suits binary data better than linear regression, as the graph shows: the fitted curve is a sigmoid, so the predicted probabilities stay between 0 and 1.

(Plot: logistic regression curve fitted to vs against mpg.)

Now in R Code

This slide shows the R code used to fit the logistic regression model and visualize the fitted curve.

Fit model:

# Logistic regression: model the binary engine-shape indicator (vs) as a function of mpg
model <- glm(vs ~ mpg, data = mtcars, family = binomial)
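To go a step beyond the plot, the fitted model could be inspected like this (a sketch, not part of the original slides; the 25 mpg value is just an example input):

# Coefficient estimates, standard errors, and z-tests on the log-odds scale.
summary(model)

# Exponentiating the coefficients gives odds ratios, which are easier to read.
exp(coef(model))

# Predicted probability of a straight engine (vs = 1) for a car with 25 mpg.
predict(model, newdata = data.frame(mpg = 25), type = "response")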

Visualize:

library(ggplot2)

ggplot(mtcars, aes(mpg, vs)) + geom_point(alpha = 0.5) +
  stat_smooth(method = "glm", method.args = list(family = "binomial"), se = FALSE, color = "blue")

3D Scatter-Plot in Plotly

The 3-D scatter plot positions each car by its MPG, weight, and engine type (vs), showing how the two engine types are distributed across MPG and weight.
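A chart like this could be built with the plotly R package roughly as follows (a minimal sketch; the original slide's code and styling are not shown):

library(plotly)

# 3-D scatter: x = mpg, y = wt, z = vs (0 or 1), colored by engine shape.
plot_ly(mtcars,
        x = ~mpg, y = ~wt, z = ~vs,
        color = ~factor(vs, labels = c("V-shaped", "straight")),
        type = "scatter3d", mode = "markers")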