I.

A.

# Load the built-in mtcars data set and summarize each variable
data <- mtcars
summary(data)
##       mpg             cyl             disp             hp       
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0  
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
##       drat             wt             qsec             vs        
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
##        am              gear            carb      
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :0.0000   Median :4.000   Median :2.000  
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812  
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000
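# Create a binary outcome: 1 = high gas mileage (mpg >= 25), 0 = otherwise
mtcars$high_mpg <- ifelse(mtcars$mpg >= 25, 1, 0)

# Optional: standardize two continuous predictors (mean 0, sd 1).
# scale() returns a one-column matrix, so convert it back to a plain vector.
mtcars$disp <- as.numeric(scale(mtcars$disp))
mtcars$hp <- as.numeric(scale(mtcars$hp))

# Fit a logistic regression with the ten remaining predictors.
# maxit = 50 raises the iteration limit (the glm default is 25), but it cannot
# fix non-convergence caused by perfect separation of the outcome.
model <- glm(high_mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
             data = mtcars, family = "binomial", control = list(maxit = 50))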
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
# Summary of the model
summary(model)
## 
## Call:
## glm(formula = high_mpg ~ cyl + disp + hp + drat + wt + qsec + 
##     vs + am + gear + carb, family = "binomial", data = mtcars, 
##     control = list(maxit = 50))
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept) -8.287e+02  5.425e+06       0        1
## cyl          5.986e+01  3.354e+05       0        1
## disp        -6.163e+00  1.122e+06       0        1
## hp          -7.162e+01  5.753e+05       0        1
## drat         1.809e+01  7.784e+05       0        1
## wt          -3.173e+01  7.026e+05       0        1
## qsec         1.032e+01  1.043e+05       0        1
## vs           1.120e+01  3.913e+05       0        1
## am          -2.334e+00  5.967e+05       0        1
## gear         8.160e+01  7.720e+05       0        1
## carb        -1.961e+01  2.000e+05       0        1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 3.0885e+01  on 31  degrees of freedom
## Residual deviance: 5.1838e-10  on 21  degrees of freedom
## AIC: 22
## 
## Number of Fisher Scoring iterations: 26
# Exponentiating coefficients to get odds ratios
exp(coef(model))
##  (Intercept)          cyl         disp           hp         drat           wt 
## 0.000000e+00 9.936999e+25 2.106133e-03 7.884541e-32 7.221176e+07 1.662291e-14 
##         qsec           vs           am         gear         carb 
## 3.041118e+04 7.306324e+04 9.694556e-02 2.734170e+35 3.035191e-09
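The output above shows the typical signs of complete separation: the residual deviance is essentially zero (5.18e-10), the standard errors are in the hundreds of thousands, every p-value is 1, and the odds ratios run from 0 to about 1e35. When the outcome is perfectly separated, the maximum-likelihood estimates do not exist as finite numbers. The coefficients and odds ratios above should therefore not be interpreted, and raising `maxit` does not help. The base-R sketch below is not part of the original analysis. It checks whether a single predictor, weight (`wt`), already separates high- and low-mileage cars. The names `df`, `wt_ranges` and `separated` are illustrative.

```r
# Diagnostic sketch (base R only): does weight alone separate the two classes?
df <- datasets::mtcars                     # pristine copy of the built-in data
df$high_mpg <- as.integer(df$mpg >= 25)    # same outcome definition as above

# Range of weight (in 1000 lbs) within each outcome group: "0" = low, "1" = high
wt_ranges <- tapply(df$wt, df$high_mpg, range)
print(wt_ranges)

# Complete separation on wt: if every high-mileage car is lighter than every
# low-mileage car, a single cutoff on wt classifies all 32 cars correctly.
separated <- max(df$wt[df$high_mpg == 1]) < min(df$wt[df$high_mpg == 0])
print(separated)  # TRUE
```

If the check returns `TRUE`, common options include penalized estimation such as Firth's method (available, for example, in the CRAN package `logistf`), or reporting the separation itself instead of odds ratios.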

Why not run a multivariate regression?

Multivariate regression models two or more dependent variables jointly. In this analysis there is only one dependent variable, the binary high-mileage indicator `high_mpg`, so multivariate regression does not apply. The model above is already a multiple (multivariable) logistic regression: one outcome and ten predictors.
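To make the distinction concrete: base R fits a multivariate linear model when the left-hand side of `lm()` is a matrix built with `cbind()`. The sketch below is illustrative only. Using `mpg` and `qsec` as the two responses and `wt` and `hp` as predictors is an arbitrary choice and not part of the assignment.

```r
# Multiple regression: one response (mpg), several predictors.
fit_multiple <- lm(mpg ~ wt + hp, data = datasets::mtcars)

# Multivariate regression: two responses (mpg and qsec) modeled jointly.
# cbind() on the left-hand side makes lm() return an object of class "mlm".
fit_multivariate <- lm(cbind(mpg, qsec) ~ wt + hp, data = datasets::mtcars)

class(fit_multiple)       # "lm"
class(fit_multivariate)   # "mlm" "lm"

# One column of coefficients per response variable
coef(fit_multivariate)
```

Each column of `coef(fit_multivariate)` matches what a separate `lm()` call for that response would give. The multivariate form matters mainly for joint tests across responses (for example, `anova()` on an `"mlm"` object), and those require more than one dependent variable.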

EXTRA.

This code uses several packages: ‘Amelia’ for multiple imputation of missing data, ‘broom’ for converting statistical model objects into tidy data frames, ‘caret’ for classification and regression training, and ‘ggplot2’ for data visualization. Its main focus is an interactive map, built with the ‘leaflet’ package, that displays bankruptcy data by state. The map is color-coded, which gives a visual overview of how the data vary across states. The code also applies two financial distress models, Altman’s Z-score and Ohlson’s O-score. Both predict bankruptcy, but each uses a different set of financial ratios and variables.
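For reference, Altman’s original (1968) Z-score for publicly traded manufacturing firms is a weighted sum of five balance-sheet and income ratios. Its commonly cited cutoffs are about 1.81 (distress zone) and 2.99 (safe zone). The R sketch below does not come from the code described above. The function names `altman_z()` and `z_zone()` and the example figures are hypothetical, and later Altman variants for private or non-manufacturing firms use different weights and cutoffs.

```r
# Hypothetical helper: Altman (1968) Z-score for a public manufacturing firm.
# All inputs must share one currency unit; vector inputs work element-wise.
altman_z <- function(working_capital, retained_earnings, ebit,
                     market_value_equity, sales,
                     total_assets, total_liabilities) {
  x1 <- working_capital / total_assets           # liquidity
  x2 <- retained_earnings / total_assets         # cumulative profitability
  x3 <- ebit / total_assets                      # operating profitability
  x4 <- market_value_equity / total_liabilities  # market-based leverage
  x5 <- sales / total_assets                     # asset turnover
  1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5
}

# Hypothetical helper: map scores to the usual zones
# (intervals are right-closed here; boundary handling varies by source)
z_zone <- function(z) {
  cut(z, breaks = c(-Inf, 1.81, 2.99, Inf),
      labels = c("distress", "grey", "safe"))
}

# Made-up figures for one illustrative firm
z <- altman_z(working_capital = 20, retained_earnings = 30, ebit = 15,
              market_value_equity = 120, sales = 150,
              total_assets = 100, total_liabilities = 60)
z          # 3.855
z_zone(z)  # safe
```

Ohlson’s O-score follows the same idea (a score computed from financial ratios), but it is a logit model, so its output can be converted into an estimated probability of bankruptcy.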

II.

First and foremost, I want to talk about the programming language we used throughout the semester: R. R is a convenient and efficient language for data analysis tasks, and I appreciate its excellent packages, which helped me finish the difficult assignments.

Second, I want to mention what I learned about statistics. My undergraduate major was Computer Science, and my only previous statistics course was Probability Theory. In this Data Analysis course, I learned many new things, such as the Central Limit Theorem (CLT) and several kinds of regression, and applied them to real cases.

Overall, this course marks the start of data analysis in my career, and I think it will help me a lot in my future work.