I. LOGISTIC

data(mtcars)
head(mtcars)

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

# vs is coded 0 = V-shaped, 1 = straight, so the model estimates the log-odds of a straight engine
log_model <- glm(vs ~ wt + hp + cyl, family = binomial, data = mtcars)

summary(log_model)

## 
## Call:
## glm(formula = vs ~ wt + hp + cyl, family = binomial, data = mtcars)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept)  9.95269    4.38923   2.268   0.0234 *
## wt           2.97179    1.90352   1.561   0.1185  
## hp          -0.05270    0.03552  -1.484   0.1379  
## cyl         -2.16258    1.50997  -1.432   0.1521  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 43.860  on 31  degrees of freedom
## Residual deviance: 12.698  on 28  degrees of freedom
## AIC: 20.698
## 
## Number of Fisher Scoring iterations: 7

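3

The intercept of 9.95269 is the estimated log-odds of having a straight engine (vs = 1; in mtcars, vs = 0 denotes a V-shaped engine) when all predictors (weight, horsepower, and cylinder count) are zero. It is statistically significant (p = 0.0234), but that only says it differs from zero. No car has zero weight, horsepower, or cylinders, so this value is an extrapolation far outside the observed data and has little direct practical meaning. Its main role is to anchor the fitted curve.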
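Because the intercept is on the log-odds scale, one way to see what it implies is to convert it to a probability with the inverse logit, `plogis()`, and to compare the "all predictors zero" point with the observed ranges. This is a minimal sketch; the names `b0` and `intercept_prob` are illustrative.

```r
data(mtcars)
log_model <- glm(vs ~ wt + hp + cyl, family = binomial, data = mtcars)

# Intercept on the log-odds scale, converted to a probability of vs = 1 (straight engine)
b0 <- coef(log_model)[["(Intercept)"]]
intercept_prob <- plogis(b0)  # inverse logit: 1 / (1 + exp(-b0))
intercept_prob

# Observed ranges: no car has wt, hp or cyl anywhere near zero,
# so the intercept describes an extrapolated, unrealistic car
sapply(mtcars[, c("wt", "hp", "cyl")], range)
```

With the estimate reported above this gives a probability of about 0.99995, which is not a sensible prediction for any real car and illustrates why the intercept should not be over-interpreted.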

The weight (wt) coefficient of 2.97179 means that each one-unit increase in weight (1,000 lb in mtcars) increases the log-odds of having a straight engine by about 2.97, holding horsepower and cylinder count constant. However, this effect is not statistically significant (p = 0.1185), so the data provide only weak evidence about this predictor's effect.
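Log-odds are hard to read directly, so a common step is to exponentiate the coefficients into odds ratios and attach Wald confidence intervals. A minimal sketch using base R's `exp()` and `confint.default()` (the names `odds_ratios` and `wald_ci` are illustrative):

```r
data(mtcars)
log_model <- glm(vs ~ wt + hp + cyl, family = binomial, data = mtcars)

# Exponentiated coefficients are odds ratios: the multiplicative change in the
# odds of a straight engine (vs = 1) for a one-unit increase in each predictor
odds_ratios <- exp(coef(log_model))
round(odds_ratios, 3)

# Wald 95% intervals (estimate +/- 1.96 * SE), moved to the odds-ratio scale;
# an interval that contains 1 is consistent with a non-significant z test
wald_ci <- exp(confint.default(log_model))
round(wald_ci, 3)
```

For wt the odds ratio is roughly 19.5 per 1,000 lb, but its interval is very wide and includes 1, which matches the non-significant p-value of 0.1185.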

The horsepower (hp) coefficient of -0.05270 suggests that each additional horsepower decreases the log-odds of a straight engine by 0.05270, holding the other predictors constant, though this relationship is not statistically significant (p = 0.1379). Similarly, the cylinder (cyl) coefficient of -2.16258 indicates that each additional cylinder decreases the log-odds of a straight engine by 2.16258, but this relationship also lacks statistical significance (p = 0.1521).
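To see what these slopes mean in practice, one can predict for hypothetical cars that differ only in one predictor. This minimal sketch holds wt and hp at their sample means and compares 4 vs 6 cylinders; the data frame `new_cars` is a made-up illustration, not an observed car.

```r
data(mtcars)
log_model <- glm(vs ~ wt + hp + cyl, family = binomial, data = mtcars)

# Hypothetical "typical" cars: wt and hp at their sample means, cyl varied
new_cars <- data.frame(
  wt  = mean(mtcars$wt),
  hp  = mean(mtcars$hp),
  cyl = c(4, 6)
)

# On the link (log-odds) scale, the gap equals 2 * the cyl coefficient
link <- predict(log_model, newdata = new_cars, type = "link")
diff(link)

# On the probability scale, the size of the change depends on the starting point
prob <- predict(log_model, newdata = new_cars, type = "response")
round(prob, 3)

# Odds multiplier for 10 extra horsepower
exp(10 * coef(log_model)[["hp"]])
```

The last line is roughly 0.59, i.e. about 41% lower odds of a straight engine per 10 hp, with the same caveat that the hp effect is not significant.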

A multivariate regression with many predictors is not recommended here for three reasons. First, the sample is small (32 cars, only 14 of them with straight engines), so a model with too many predictors relative to the available data can overfit. Second, high collinearity among predictors (weight, horsepower, and cylinder count are strongly correlated) makes it difficult to separate individual variable effects and can destabilize coefficient estimates; this may also help explain why none of the three slopes is significant on its own. Lastly, it is often more appropriate to choose predictors based on theory or prior research, especially when certain variables are known to matter more, so that the findings stay clear and relevant.
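The sample-size and collinearity concerns can be checked directly. This minimal base-R sketch computes pairwise correlations, a hand-rolled variance inflation factor (VIF) for each predictor, and the number of outcome events per predictor. The VIF here is an approximate diagnostic built from linear regressions among the predictors (the `car` package's `vif()` is a common alternative), and the often-cited "about 10 events per predictor" guideline is a heuristic, not a hard cutoff. The names `preds`, `vif`, and `events` are illustrative.

```r
data(mtcars)
preds <- c("wt", "hp", "cyl")

# Pairwise correlations among the predictors
round(cor(mtcars[, preds]), 2)

# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing
# predictor j on the other predictors
vif <- sapply(preds, function(p) {
  others <- setdiff(preds, p)
  r2 <- summary(lm(reformulate(others, response = p), data = mtcars))$r.squared
  1 / (1 - r2)
})
round(vif, 2)

# Events per variable: size of the rarer outcome class divided by the number of predictors
events <- min(table(mtcars$vs))
events / length(preds)
```

With 14 straight-engine cars and three predictors there are fewer than 5 events per predictor already, so adding more variables would make overfitting more likely.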

II. Please reflect on the last 14 weeks (maybe even skim the material we have seen) to consolidate the topics we have covered in class. What have you learned about data analysis, both theoretically and empirically?

Over the past 14 weeks, I have developed my data analysis skills, primarily by learning R programming and working with large datasets. Our coursework covered a wide range of statistical concepts, including probability, hypothesis testing, and regression analysis, which built our theoretical understanding. We also gained practical experience by applying these methods to real data, visualizing data, and interpreting complex patterns. The lessons on Bayesian statistics were especially informative and showed its growing relevance in the field. I was initially hesitant about statistics, but the hands-on approach of this course helped me appreciate the subject much more deeply. Professor Arvind Sharma's guidance was crucial in building our confidence in data analytics. Overall, this course has given me a strong foundation for further exploration in data science.

Discussion7D

ANDI XU

2024-05-01
