We will compare linear regression with logistic regression on a cars dataset that includes a binary engine-type variable. The next slide gives more detail.
We’ll be using the R dataset ‘mtcars’. The vs column encodes engine shape: 0 = V-shaped, 1 = straight.
##                    mpg    wt vs
## Mazda RX4         21.0 2.620  0
## Mazda RX4 Wag     21.0 2.875  0
## Datsun 710        22.8 2.320  1
## Hornet 4 Drive    21.4 3.215  1
## Hornet Sportabout 18.7 3.440  0
## Valiant           18.1 3.460  1
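A table like the one above can be reproduced with a short sketch such as the following. It assumes the slide simply printed the first six rows of the three columns shown; the name `cars_subset` is illustrative and does not come from the original slides.

```r
# mtcars ships with base R (datasets package), so nothing needs installing.
# `cars_subset` is an illustrative name, not one used in the slides.
cars_subset <- mtcars[, c("mpg", "wt", "vs")]

# First six rows, matching the layout of the table above.
print(head(cars_subset))

# Count how many cars fall in each engine-shape class (0 or 1).
print(table(cars_subset$vs))
```

The `table()` call is a quick check that `vs` really is binary before a logistic model is fitted to it.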
Here is the formula for the Sigmoid Function:
\[P(x) = \frac{1}{1 + e^{-x}}\]
This transforms the output of a linear equation into an S-shaped curve bounded between 0 and 1, so it can be read as a probability.
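As a minimal sketch of the formula above, the sigmoid can be written as a one-line R function. The helper name `sigmoid` and the sample inputs are illustrative; `plogis()` is the base R function that computes the same quantity.

```r
# Illustrative helper implementing P(x) = 1 / (1 + e^(-x)).
sigmoid <- function(x) 1 / (1 + exp(-x))

# A few sample inputs (chosen for illustration only).
x <- c(-5, -1, 0, 1, 5)
print(sigmoid(x))

# Base R's plogis() is the logistic CDF, which is the same function.
print(all.equal(sigmoid(x), plogis(x)))
```

Large negative inputs map close to 0, large positive inputs map close to 1, and an input of 0 maps to exactly 0.5.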
Now we define the logit function:
\[\text{logit}(p) = \ln\left(\frac{p}{1-p}\right)\]
This is the inverse of the sigmoid: it maps the S-shaped probabilities back to the log-odds, which follow a linear equation.
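The relationship between the two functions can be checked numerically. This is a small sketch with illustrative helper names (`logit`, `sigmoid`) and made-up probabilities; `qlogis()` is the base R equivalent of the logit.

```r
# Illustrative helpers for the two formulas on these slides.
logit   <- function(p) log(p / (1 - p))
sigmoid <- function(x) 1 / (1 + exp(-x))

# Sample probabilities (chosen for illustration only).
p <- c(0.1, 0.5, 0.9)
print(logit(p))

# Applying the sigmoid to the logit recovers the original probabilities.
print(all.equal(sigmoid(logit(p)), p))

# Base R's qlogis() computes the same log-odds.
print(all.equal(logit(p), qlogis(p)))
```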
Logistic regression suits binary data better than linear regression: as the graph shows, its fitted curve stays between 0 and 1, whereas a straight line can fall below 0 or rise above 1.
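One way to see the difference without a plot is to compare the fitted values of the two models. The sketch below assumes the same `vs ~ mpg` relationship used elsewhere in these slides; the object names `linear_fit` and `logistic_fit` are illustrative.

```r
# Fit both models to the same binary outcome (illustrative object names).
linear_fit   <- lm(vs ~ mpg, data = mtcars)
logistic_fit <- glm(vs ~ mpg, data = mtcars, family = binomial)

# Range of fitted values from the straight-line model.
print(range(fitted(linear_fit)))

# Range of fitted probabilities from the logistic model.
print(range(fitted(logistic_fit)))

# Flag any linear predictions that are not valid probabilities.
print(any(fitted(linear_fit) < 0 | fitted(linear_fit) > 1))
```

The logistic fitted values are probabilities by construction, while nothing constrains the straight line to the 0 to 1 interval.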
This slide shows the R code used to fit the model and visualize the data.
Fit model:
model <- glm(vs ~ mpg, data = mtcars, family = binomial)
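Once the model is fitted, it can be inspected and used for prediction. This is a sketch only: the `new_cars` data frame and its mpg values are hypothetical inputs chosen for illustration, not part of the original slides.

```r
# Same model as on the slide.
model <- glm(vs ~ mpg, data = mtcars, family = binomial)

# Coefficients are on the log-odds (logit) scale.
print(coef(model))

# Exponentiating the slope gives the odds ratio per extra mile per gallon.
print(exp(coef(model)["mpg"]))

# Hypothetical cars to score; type = "response" returns probabilities.
new_cars <- data.frame(mpg = c(15, 20, 25))
probs <- predict(model, newdata = new_cars, type = "response")
print(probs)
```

Using `type = "response"` applies the sigmoid to the linear predictor, so the results are probabilities rather than log-odds.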
Visualize:
library(ggplot2)
ggplot(mtcars, aes(mpg, vs)) + geom_point(alpha = 0.5) + stat_smooth(method = "glm", method.args = list(family = "binomial"), se = FALSE, color = "blue")
This plot gives a 3-D view of engine type (V-shaped or straight) against MPG and weight.