Discussion 15

Haiding Luo

2023-12-18

1. Implement a logistic regression on any dataset of your choice and interpret your coefficients. Tell us why you should not run a multivariate regression.

head(cars)
##   speed dist
## 1     4    2
## 2     4   10
## 3     7    4
## 4     7   22
## 5     8   16
## 6     9   10
str(cars)
## 'data.frame':    50 obs. of  2 variables:
##  $ speed: num  4 4 7 7 8 9 10 10 10 11 ...
##  $ dist : num  2 10 4 22 16 10 18 26 34 17 ...
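data(cars)  # reload the built-in dataset (speed in mph, dist in ft)
median_dist <- median(cars$dist)
# Binary outcome: 1 if the stopping distance exceeds the median, otherwise 0
cars$stop_long <- ifelse(cars$dist > median_dist, 1, 0)
# Logistic regression of the binary outcome on speed
model <- glm(stop_long ~ speed, family = binomial, data = cars)
summary(model)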
## 
## Call:
## glm(formula = stop_long ~ speed, family = binomial, data = cars)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -8.5971     2.3665  -3.633 0.000280 ***
## speed         0.5460     0.1489   3.666 0.000247 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 69.235  on 49  degrees of freedom
## Residual deviance: 35.469  on 48  degrees of freedom
## AIC: 39.469
## 
## Number of Fisher Scoring iterations: 6
exp(coef(model))
##  (Intercept)        speed 
## 0.0001846333 1.7262913012
exp(coef(model)["speed"])
##    speed 
## 1.726291

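The exponentiated speed coefficient, exp(0.5460) ≈ 1.726291, is an odds ratio. Each 1 mph increase in speed multiplies the odds that the car's stopping distance exceeds the median (36 ft) by about 1.73, an increase of approximately 72.63%. The intercept, -8.5971, is the log-odds of a long stop at a speed of 0 mph (odds ≈ 0.00018). The observed speeds run from 4 to 25 mph, so the intercept mainly anchors the fitted curve rather than describing a realistic situation. Both coefficients are highly significant (p < 0.001).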
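To make the odds-ratio interpretation concrete, the sketch below plugs the rounded estimates printed by `summary(model)` into the logistic function instead of refitting the model. The helper names `p_long` and `odds` are hypothetical, introduced only for this illustration.

```r
# Rounded estimates copied from summary(model) above
b0 <- -8.5971  # intercept: log-odds of a long stop at speed = 0
b1 <-  0.5460  # change in log-odds per 1 mph increase in speed

# Hypothetical helper: predicted probability that dist exceeds the median
p_long <- function(speed) plogis(b0 + b1 * speed)

# Convert a probability to odds
odds <- function(p) p / (1 - p)

# Odds ratio between two consecutive speeds; it equals exp(b1) at any speed
ratio <- odds(p_long(16)) / odds(p_long(15))
print(ratio)
print(exp(b1))

# Speed at which the predicted probability crosses 0.5 (log-odds = 0)
speed_50 <- -b0 / b1
print(speed_50)

# Predicted probabilities across the observed speed range (4 to 25 mph)
print(round(p_long(c(4, 10, 15, 20, 25)), 3))
```

The ratio is the same whichever pair of consecutive speeds you pick, which is why a single number summarises the speed effect. The 50% crossover falls at roughly 15.7 mph.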

Why not run a multivariate regression?

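A multivariate regression models several dependent (outcome) variables jointly. The cars dataset has only one predictor (speed) and, after the transformation above, a single outcome (stop_long). A model with one predictor and one outcome is a simple (univariate) regression, so a multivariate regression has nothing to add here. Because that single outcome is binary, logistic regression is the appropriate choice: it keeps predicted probabilities between 0 and 1, which a linear model does not guarantee.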
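For contrast, R fits a multivariate regression when `lm()` receives a matrix of responses built with `cbind()`. The result is an object of class `"mlm"` with one coefficient column per outcome. The sketch below uses the built-in `mtcars` data as an assumed example with two outcomes (`mpg` and `qsec`), since `cars` has no second outcome to play that role.

```r
# Multivariate regression: two response variables modelled jointly on one predictor
data(mtcars)
fit_mv <- lm(cbind(mpg, qsec) ~ wt, data = mtcars)
print(class(fit_mv))   # includes "mlm"
print(coef(fit_mv))    # rows: (Intercept), wt; one column per outcome

# cars offers only one outcome, so the equivalent call is an ordinary simple regression
data(cars)
fit_uni <- lm(dist ~ speed, data = cars)
print(class(fit_uni))  # plain "lm"
```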

2. Reflection

Reflecting on the past 14 weeks of data analysis coursework, I am amazed by the breadth and depth of the topics we have covered. The journey began with the basics of R and with the different types of variables and how they are measured, which built the essential statistical programming skills every data analyst needs. Over the semester, I feel that I have not only mastered the theoretical principles of data analysis but also acquired the practical skills to apply them to real-world problems in R. This course has been a comprehensive introduction to data analysis, and I am eager to keep building on this solid foundation.