data(mtcars)
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
log_model <- glm(vs ~ wt + hp + cyl, family = binomial, data = mtcars)
summary(log_model)
##
## Call:
## glm(formula = vs ~ wt + hp + cyl, family = binomial, data = mtcars)
##
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  9.95269    4.38923   2.268   0.0234 *
## wt           2.97179    1.90352   1.561   0.1185
## hp          -0.05270    0.03552  -1.484   0.1379
## cyl         -2.16258    1.50997  -1.432   0.1521
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
##     Null deviance: 43.860  on 31  degrees of freedom
## Residual deviance: 12.698  on 28  degrees of freedom
## AIC: 20.698
##
## Number of Fisher Scoring iterations: 7
#The weight (wt) coefficient of 2.97179 means that each one-unit increase in weight (1,000 lb, since mtcars records wt in units of 1,000 lb), holding horsepower and cylinder count constant, raises the log-odds of having a straight engine (vs = 1) by 2.97179. However, this effect is not statistically significant at the 5% level (p = 0.1185), so the data leave considerable uncertainty about this predictor’s impact.
#The horsepower (hp) coefficient of -0.05270 means that each additional horsepower lowers the log-odds of a straight engine by 0.05270, holding the other predictors constant, though this relationship is not statistically significant (p = 0.1379). Similarly, the cylinder (cyl) coefficient of -2.16258 means that each additional cylinder lowers the log-odds of a straight engine by 2.16258 (cyl enters the model as a numeric variable taking the values 4, 6 and 8, so moving from 4 to 6 cylinders lowers the log-odds by about 4.33). This relationship also lacks statistical significance (p = 0.1521).
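#As a sketch of how these log-odds coefficients can be put on a more intuitive scale, the block below exponentiates them into odds ratios with 95% confidence intervals, using only base R (stats) functions. It refits the same model so that it runs on its own. Using Wald intervals (confint.default) rather than profile-likelihood intervals is an assumption made for simplicity.

```r
# Refit the model from above so this block is self-contained.
data(mtcars)
log_model <- glm(vs ~ wt + hp + cyl, family = binomial, data = mtcars)

# exp() turns each log-odds coefficient into an odds ratio: the
# multiplicative change in the odds of a straight engine (vs = 1)
# for a one-unit increase in that predictor, holding the others fixed.
odds_ratios <- exp(coef(log_model))

# Wald 95% intervals (estimate +/- 1.96 * SE) on the log-odds scale,
# exponentiated to the odds scale. confint() on a glm would instead
# return profile-likelihood intervals.
or_ci <- exp(confint.default(log_model))

# One row per term: odds ratio, lower bound, upper bound.
round(cbind(odds_ratio = odds_ratios, or_ci), 3)
```

#On the odds scale, the wt estimate of 2.97179 corresponds to an odds ratio of about 19.5 per 1,000 lb. Because Wald intervals and Wald z-tests agree, the intervals for wt, hp and cyl all contain 1, consistent with their p-values above 0.05.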
#A larger multiple regression (adding more of the mtcars variables as predictors) is not recommended here, for three reasons. First, the sample is small: 32 cars, of which only 14 have straight engines, so adding predictors quickly leads to overfitting. Second, many mtcars predictors are strongly correlated with one another (heavier cars tend to have more cylinders and more horsepower), and this collinearity makes it hard to separate the individual effects and inflates the standard errors of the coefficient estimates. This may be why none of the three slopes above is individually significant even though the model reduces the deviance from 43.86 to 12.70. Third, it is usually more appropriate to choose predictors based on theory or prior research, especially when certain variables are already known to matter, so that the findings stay clear and relevant.
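#The collinearity and sample-size concerns can be checked directly. The sketch below (base R only) prints the pairwise correlations among wt, hp and cyl, computes a variance inflation factor for each predictor by regressing it on the other two, VIF = 1 / (1 - R²), and counts events per predictor for the binary outcome. The helper name vif_by_hand is illustrative and not from any package; car::vif() is a packaged alternative but requires the car package.

```r
data(mtcars)
predictors <- c("wt", "hp", "cyl")

# Pairwise correlations among the three predictors.
round(cor(mtcars[, predictors]), 2)

# VIF for each predictor: regress it on the other predictors with lm()
# and compute 1 / (1 - R^2). Values well above 1 signal collinearity.
vif_by_hand <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  fit <- lm(reformulate(others, response = p), data = mtcars)
  1 / (1 - summary(fit)$r.squared)
})
round(vif_by_hand, 2)

# Events per variable: size of the smaller outcome class (vs = 0 or 1)
# divided by the number of slope coefficients in the model.
min(table(mtcars$vs)) / length(predictors)
```

#A commonly cited rule of thumb asks for roughly 10 events per predictor in logistic regression; here the smaller class has 14 straight-engine cars, giving fewer than 5 per slope, which is one concrete reason to keep the model small.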
#Over the past 14 weeks, I have developed my data analysis skills, primarily by learning R programming and working with large datasets. Our coursework covered a wide range of statistical concepts, including probability, hypothesis testing, and regression analysis, which built our theoretical understanding. We also gained practical experience by applying these methods to real data, exploring data visualization, and interpreting complex patterns. The lessons on Bayesian statistics were especially informative, offering insight into its growing relevance in the field. I was initially hesitant about statistics, but the course’s hands-on approach has helped me appreciate the subject more deeply. Professor Arvind Sharma’s guidance was crucial in building our confidence in data analytics. Overall, this course has given me a strong foundation for further exploration in data science.