data <- mtcars
summary(data)
## mpg cyl disp hp
## Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
## 1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
## Median :19.20 Median :6.000 Median :196.3 Median :123.0
## Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
## 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
## Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
## drat wt qsec vs
## Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000
## 1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000
## Median :3.695 Median :3.325 Median :17.71 Median :0.0000
## Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375
## 3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000
## Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000
## am gear carb
## Min. :0.0000 Min. :3.000 Min. :1.000
## 1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
## Median :0.0000 Median :4.000 Median :2.000
## Mean :0.4062 Mean :3.688 Mean :2.812
## 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :1.0000 Max. :5.000 Max. :8.000
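# Create a binary variable for high/low gas mileage (1 if mpg >= 25, else 0)
mtcars$high_mpg <- ifelse(mtcars$mpg >= 25, 1, 0)
# Optional: scale continuous predictors (example); as.numeric() drops the matrix attributes that scale() adds
mtcars$disp <- as.numeric(scale(mtcars$disp))
mtcars$hp <- as.numeric(scale(mtcars$hp))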
# Fit a logistic regression model, raising the iteration limit from the default 25 to 50
model <- glm(high_mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
data = mtcars, family = "binomial", control = list(maxit = 50))
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
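The warning means that the model predicts some cars with a probability of essentially 0 or 1. The following minimal sketch checks this by rebuilding the same model from a fresh copy of `mtcars`. The data frame name `mt`, the `1e-4` tolerance, and the name `near_boundary` are arbitrary choices for illustration.

```r
# Rebuild the data as above, starting from a fresh copy of the built-in dataset
mt <- datasets::mtcars
mt$high_mpg <- ifelse(mt$mpg >= 25, 1, 0)
mt$disp <- as.numeric(scale(mt$disp))
mt$hp <- as.numeric(scale(mt$hp))

# Refit the same logistic regression; the separation warning is expected, so silence it here
fit <- suppressWarnings(
  glm(high_mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
      data = mt, family = "binomial", control = list(maxit = 50))
)

# Fitted probabilities (response scale) for every car
p <- fitted(fit)

# Flag cars whose fitted probability lies within an arbitrary tolerance of 0 or 1
near_boundary <- p < 1e-4 | p > 1 - 1e-4
cat("Cars with fitted probability near 0 or 1:", sum(near_boundary), "of", length(p), "\n")
cat("Residual deviance:", deviance(fit), "\n")
```

If every car is flagged and the residual deviance is essentially zero, the model classifies the training data perfectly. That pattern is the signature of complete separation, not of a good fit.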
# Summary of the model
summary(model)
##
## Call:
## glm(formula = high_mpg ~ cyl + disp + hp + drat + wt + qsec +
## vs + am + gear + carb, family = "binomial", data = mtcars,
## control = list(maxit = 50))
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -8.287e+02 5.425e+06 0 1
## cyl 5.986e+01 3.354e+05 0 1
## disp -6.163e+00 1.122e+06 0 1
## hp -7.162e+01 5.753e+05 0 1
## drat 1.809e+01 7.784e+05 0 1
## wt -3.173e+01 7.026e+05 0 1
## qsec 1.032e+01 1.043e+05 0 1
## vs 1.120e+01 3.913e+05 0 1
## am -2.334e+00 5.967e+05 0 1
## gear 8.160e+01 7.720e+05 0 1
## carb -1.961e+01 2.000e+05 0 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 3.0885e+01 on 31 degrees of freedom
## Residual deviance: 5.1838e-10 on 21 degrees of freedom
## AIC: 22
##
## Number of Fisher Scoring iterations: 26
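# Exponentiate coefficients to get odds ratios. Caution: the separation warning, the near-zero residual deviance, and the huge standard errors above mean these estimates are not meaningful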
exp(coef(model))
## (Intercept) cyl disp hp drat wt
## 0.000000e+00 9.936999e+25 2.106133e-03 7.884541e-32 7.221176e+07 1.662291e-14
## qsec vs am gear carb
## 3.041118e+04 7.306324e+04 9.694556e-02 2.734170e+35 3.035191e-09
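The odds ratios above are not interpretable because the data are completely separated. For example, the odds ratio for `gear` is 2.7e+35. The following sketch shows one reason. Only six cars reach 25 mpg, and weight alone splits the two classes, because every high-mileage car is lighter than every other car. The block also fits a one-predictor model on `hp`. That predictor is chosen purely for illustration, because its classes overlap, so the maximum-likelihood estimate exists. Penalized (Firth) logistic regression, available for example in the `logistf` package, is another common remedy that keeps all predictors.

```r
mt <- datasets::mtcars
mt$high_mpg <- ifelse(mt$mpg >= 25, 1, 0)

# Class balance: only a handful of cars reach 25 mpg
print(table(mt$high_mpg))

# Weight range within each class; non-overlapping ranges mean wt alone separates them
print(tapply(mt$wt, mt$high_mpg, range))
separated_by_wt <- max(mt$wt[mt$high_mpg == 1]) < min(mt$wt[mt$high_mpg == 0])
cat("wt perfectly separates the classes:", separated_by_wt, "\n")

# Illustrative single-predictor model on a variable whose classes overlap
fit_hp <- glm(high_mpg ~ hp, data = mt, family = "binomial")
print(summary(fit_hp)$coefficients)
cat("Odds ratio per extra horsepower:", exp(coef(fit_hp)[["hp"]]), "\n")
```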
Why not run a multivariate regression?
Multivariate regression is used when there are multiple dependent variables. In this analysis of the “mtcars” dataset there is only one dependent variable (the binary high-mileage indicator), so multivariate regression does not apply. The model above is a multiple logistic regression: one outcome and several predictors.
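For contrast, here is a sketch of a multivariate regression in base R. Passing several responses through `cbind()` to `lm()` fits one linear model per response and returns an object of class `mlm`. The choice of `mpg` and `qsec` as responses and `wt` and `hp` as predictors is arbitrary and serves only as an illustration.

```r
mt <- datasets::mtcars

# Two dependent variables (mpg and qsec) modeled on the same predictors
mv_fit <- lm(cbind(mpg, qsec) ~ wt + hp, data = mt)

print(class(mv_fit))  # "mlm" "lm"
print(coef(mv_fit))   # one column of coefficients per response variable

# By contrast, the logistic model above has a single response (high_mpg)
```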
This code uses many packages. For example, it uses ‘Amelia’ for multiple imputation, ‘broom’ for converting statistical objects into tidy data frames, ‘caret’ for classification and regression training, and ‘ggplot2’ for data visualization. Its main focus is an interactive, color-coded map, built with the ‘leaflet’ package, that displays state-level bankruptcy data. The code also applies two financial distress models, Altman’s Z-score and Ohlson’s O-score. Both models predict bankruptcy, each from its own set of financial ratios and variables.
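The following minimal sketch shows how a ratio-based distress score works, using Altman's original (1968) Z-score for publicly traded manufacturing firms. It applies the commonly cited cut-offs of 1.81 and 2.99. The function names `altman_z` and `z_zone`, their argument names, and the two sample firms are hypothetical. The code described above may use a different variant of the model, such as the Z' or Z'' versions for private or non-manufacturing firms.

```r
# Hypothetical helper: original Altman (1968) Z-score
#   X1 = working capital / total assets
#   X2 = retained earnings / total assets
#   X3 = EBIT / total assets
#   X4 = market value of equity / total liabilities
#   X5 = sales / total assets
altman_z <- function(working_capital, retained_earnings, ebit,
                     market_equity, sales, total_assets, total_liabilities) {
  x1 <- working_capital / total_assets
  x2 <- retained_earnings / total_assets
  x3 <- ebit / total_assets
  x4 <- market_equity / total_liabilities
  x5 <- sales / total_assets
  1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5
}

# Hypothetical helper: map scores to the usual distress / grey / safe zones
z_zone <- function(z) {
  cut(z, breaks = c(-Inf, 1.81, 2.99, Inf),
      labels = c("distress", "grey", "safe"))
}

# Two made-up firms, all figures in the same currency unit
firms <- data.frame(
  working_capital   = c(100, -50),
  retained_earnings = c(300, -100),
  ebit              = c(150, 10),
  market_equity     = c(900, 100),
  sales             = c(1200, 400),
  total_assets      = c(1000, 1000),
  total_liabilities = c(400, 800)
)
firms$z <- with(firms, altman_z(working_capital, retained_earnings, ebit,
                                market_equity, sales, total_assets, total_liabilities))
firms$zone <- z_zone(firms$z)
print(firms[, c("z", "zone")])
```

The function is vectorized, so it can score a whole column of firms at once, for example before the scores are joined to state-level data for mapping.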
First and foremost, I want to talk about the programming language we used for the whole semester: R. R is a convenient and efficient language for data analysis tasks, and I especially like its excellent packages, which helped me finish the difficult assignments.
Second, I want to mention what I learned about statistics. My undergraduate major is Computer Science, and before this course I had studied statistics only in Probability Theory. In this Data Analysis course, I learned many new things, such as the central limit theorem (CLT) and several kinds of regression, and applied them to real cases.
Overall, this course marks the start of data analysis in my career, and I think it will help me a lot in my future work.