library(corrplot)
library(tidyverse)
In this homework assignment, we are asked to explore, analyze, and model a data set containing information on crime for various neighborhoods of a major city. Each record has a response variable indicating whether the crime rate is above the median crime rate (1) or not (0).
Our objective is to build a binary logistic regression model on the training data set to predict whether a neighborhood will be at risk for high crime levels. Below is a short description of the variables of interest in the data set (the definitions follow the well-known housing study this data appears to be drawn from):

- zn: proportion of residential land zoned for large lots
- indus: proportion of non-retail business acres per suburb
- chas: dummy variable for whether the suburb borders the river (1) or not (0)
- nox: nitrogen oxides concentration (parts per 10 million)
- rm: average number of rooms per dwelling
- age: proportion of owner-occupied units built prior to 1940
- dis: weighted mean of distances to five employment centers
- rad: index of accessibility to radial highways
- tax: full-value property-tax rate per $10,000
- ptratio: pupil-teacher ratio by town
- lstat: percentage of the population of lower socioeconomic status
- medv: median value of owner-occupied homes, in $1000s
- target: whether the crime rate is above the median (1) or not (0); the response variable
x <- "C:\\Users\\bloom\\Downloads\\crime-training-data_modified.csv"
train_set_full <- read.csv(x)
nrow(train_set_full)
## [1] 466
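Before summarizing, it is worth confirming there are no missing values; a minimal check (every column should come back 0, since the model fits below use all 466 rows):

# Count NAs per column before modeling
colSums(is.na(train_set_full))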
summary(train_set_full)
## zn indus chas nox
## Min. : 0.00 Min. : 0.460 Min. :0.00000 Min. :0.3890
## 1st Qu.: 0.00 1st Qu.: 5.145 1st Qu.:0.00000 1st Qu.:0.4480
## Median : 0.00 Median : 9.690 Median :0.00000 Median :0.5380
## Mean : 11.58 Mean :11.105 Mean :0.07082 Mean :0.5543
## 3rd Qu.: 16.25 3rd Qu.:18.100 3rd Qu.:0.00000 3rd Qu.:0.6240
## Max. :100.00 Max. :27.740 Max. :1.00000 Max. :0.8710
## rm age dis rad
## Min. :3.863 Min. : 2.90 Min. : 1.130 Min. : 1.00
## 1st Qu.:5.887 1st Qu.: 43.88 1st Qu.: 2.101 1st Qu.: 4.00
## Median :6.210 Median : 77.15 Median : 3.191 Median : 5.00
## Mean :6.291 Mean : 68.37 Mean : 3.796 Mean : 9.53
## 3rd Qu.:6.630 3rd Qu.: 94.10 3rd Qu.: 5.215 3rd Qu.:24.00
## Max. :8.780 Max. :100.00 Max. :12.127 Max. :24.00
## tax ptratio lstat medv
## Min. :187.0 Min. :12.6 Min. : 1.730 Min. : 5.00
## 1st Qu.:281.0 1st Qu.:16.9 1st Qu.: 7.043 1st Qu.:17.02
## Median :334.5 Median :18.9 Median :11.350 Median :21.20
## Mean :409.5 Mean :18.4 Mean :12.631 Mean :22.59
## 3rd Qu.:666.0 3rd Qu.:20.2 3rd Qu.:16.930 3rd Qu.:25.00
## Max. :711.0 Max. :22.0 Max. :37.970 Max. :50.00
## target
## Min. :0.0000
## 1st Qu.:0.0000
## Median :0.0000
## Mean :0.4914
## 3rd Qu.:1.0000
## Max. :1.0000
We see above that our categorical variables (chas and target) are treated as numeric. Let's adjust this:
train_set_full$chas <- as.factor(train_set_full$chas)
train_set_full$target <- as.factor(train_set_full$target)
summary(train_set_full)
## zn indus chas nox rm
## Min. : 0.00 Min. : 0.460 0:433 Min. :0.3890 Min. :3.863
## 1st Qu.: 0.00 1st Qu.: 5.145 1: 33 1st Qu.:0.4480 1st Qu.:5.887
## Median : 0.00 Median : 9.690 Median :0.5380 Median :6.210
## Mean : 11.58 Mean :11.105 Mean :0.5543 Mean :6.291
## 3rd Qu.: 16.25 3rd Qu.:18.100 3rd Qu.:0.6240 3rd Qu.:6.630
## Max. :100.00 Max. :27.740 Max. :0.8710 Max. :8.780
## age dis rad tax
## Min. : 2.90 Min. : 1.130 Min. : 1.00 Min. :187.0
## 1st Qu.: 43.88 1st Qu.: 2.101 1st Qu.: 4.00 1st Qu.:281.0
## Median : 77.15 Median : 3.191 Median : 5.00 Median :334.5
## Mean : 68.37 Mean : 3.796 Mean : 9.53 Mean :409.5
## 3rd Qu.: 94.10 3rd Qu.: 5.215 3rd Qu.:24.00 3rd Qu.:666.0
## Max. :100.00 Max. :12.127 Max. :24.00 Max. :711.0
## ptratio lstat medv target
## Min. :12.6 Min. : 1.730 Min. : 5.00 0:237
## 1st Qu.:16.9 1st Qu.: 7.043 1st Qu.:17.02 1:229
## Median :18.9 Median :11.350 Median :21.20
## Mean :18.4 Mean :12.631 Mean :22.59
## 3rd Qu.:20.2 3rd Qu.:16.930 3rd Qu.:25.00
## Max. :22.0 Max. :37.970 Max. :50.00
Let's start by seeing how correlated our numeric/continuous variables are:
train_set_full_0 <- train_set_full %>%
  # keep only the numeric predictors for the correlation matrix
  select(age, dis, indus, lstat, medv, nox, ptratio, rad, rm, tax, zn)
train_set_full_cor <- cor(train_set_full_0)
corrplot.mixed(train_set_full_cor)
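The correlation plot is easier to audit with exact values alongside it, so here is a minimal sketch listing the variable pairs whose absolute correlation exceeds 0.7 (the cutoff is an arbitrary illustrative choice):

# List variable pairs with |r| > 0.7 (upper triangle only, to avoid duplicates)
high_cor <- which(abs(train_set_full_cor) > 0.7 & upper.tri(train_set_full_cor),
                  arr.ind = TRUE)
data.frame(var1 = rownames(train_set_full_cor)[high_cor[, 1]],
           var2 = colnames(train_set_full_cor)[high_cor[, 2]],
           r = round(train_set_full_cor[high_cor], 2))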
Now, we can take a look at our distributions.
train_set_full_0 %>%
  pivot_longer(cols = everything(), names_to = "var", values_to = "val") %>%
  ggplot(aes(val)) + geom_density() + facet_wrap(~var, scales = "free")
We see some skewing above; let's see if we can address this. This new data frame will only be used for illustration purposes, as we will likely supply both the transformed and untransformed variables to our step-wise regression model.
train_set_full1 <- train_set_full_0
train_set_full1$age <- train_set_full1$age^2          # left-skewed: spread the upper tail
train_set_full1$ptratio <- train_set_full1$ptratio^2  # left-skewed: spread the upper tail
train_set_full1$dis <- log(train_set_full1$dis)       # right-skewed: compress the upper tail
train_set_full1$lstat <- log(train_set_full1$lstat)   # right-skewed: compress the upper tail
train_set_full1$nox <- log(train_set_full1$nox)       # right-skewed: compress the upper tail
train_set_full1$zn <- log(train_set_full1$zn + 1)     # mostly zeros: log(x + 1) shift
train_set_full1 %>%
  pivot_longer(cols = everything(), names_to = "var", values_to = "val") %>%
  ggplot(aes(val)) + geom_density() + facet_wrap(~var, scales = "free")
In general, we find that everything is a bit more normal after transformation. Our "zn" and "age" variables are now considerably bimodal, which is promising: a bimodal predictor whose modes line up with the two classes can be quite discriminative.
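To quantify the improvement rather than just eyeballing the densities, here is a minimal sketch comparing sample skewness before and after transformation (the skew helper is defined inline as the third standardized moment, to avoid depending on an extra package such as e1071):

# Sample skewness (third standardized moment); values near 0 are more symmetric
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3
round(rbind(before = sapply(train_set_full_0, skew),
            after = sapply(train_set_full1, skew)), 2)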
We will start with a "basic" logistic regression model on the untransformed variables, as a baseline.
Logit_Model1 <- glm(target ~ ., data = train_set_full, family = binomial(link = "logit"))
summary(Logit_Model1)
##
## Call:
## glm(formula = target ~ ., family = binomial(link = "logit"),
## data = train_set_full)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8464 -0.1445 -0.0017 0.0029 3.4665
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -40.822934 6.632913 -6.155 7.53e-10 ***
## zn -0.065946 0.034656 -1.903 0.05706 .
## indus -0.064614 0.047622 -1.357 0.17485
## chas1 0.910765 0.755546 1.205 0.22803
## nox 49.122297 7.931706 6.193 5.90e-10 ***
## rm -0.587488 0.722847 -0.813 0.41637
## age 0.034189 0.013814 2.475 0.01333 *
## dis 0.738660 0.230275 3.208 0.00134 **
## rad 0.666366 0.163152 4.084 4.42e-05 ***
## tax -0.006171 0.002955 -2.089 0.03674 *
## ptratio 0.402566 0.126627 3.179 0.00148 **
## lstat 0.045869 0.054049 0.849 0.39608
## medv 0.180824 0.068294 2.648 0.00810 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 645.88 on 465 degrees of freedom
## Residual deviance: 192.05 on 453 degrees of freedom
## AIC: 218.05
##
## Number of Fisher Scoring iterations: 9
With a residual deviance of about 192 against a null deviance of about 646, these variables appear fairly predictive.
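A likelihood-ratio test makes that comparison formal. Here is a minimal sketch computing the chi-squared p-value of the full model against the intercept-only null (given the deviance drop, it should be vanishingly small):

# Likelihood-ratio test: full model vs. intercept-only null
with(Logit_Model1, pchisq(null.deviance - deviance, df.null - df.residual,
                          lower.tail = FALSE))

We will now drop the predictors that were not significant above and refit a reduced model (a manual step-wise reduction):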
Logit_Model2 <- glm(target ~ nox + age + dis + rad + tax + ptratio + medv + zn, data = train_set_full, family = binomial(link = "logit"))
summary(Logit_Model2)
##
## Call:
## glm(formula = target ~ nox + age + dis + rad + tax + ptratio +
## medv + zn, family = binomial(link = "logit"), data = train_set_full)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8295 -0.1752 -0.0021 0.0032 3.4191
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -37.415922 6.035013 -6.200 5.65e-10 ***
## nox 42.807768 6.678692 6.410 1.46e-10 ***
## age 0.032950 0.010951 3.009 0.00262 **
## dis 0.654896 0.214050 3.060 0.00222 **
## rad 0.725109 0.149788 4.841 1.29e-06 ***
## tax -0.007756 0.002653 -2.924 0.00346 **
## ptratio 0.323628 0.111390 2.905 0.00367 **
## medv 0.110472 0.035445 3.117 0.00183 **
## zn -0.068648 0.032019 -2.144 0.03203 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 645.88 on 465 degrees of freedom
## Residual deviance: 197.32 on 457 degrees of freedom
## AIC: 215.32
##
## Number of Fisher Scoring iterations: 9
Our residual deviance increased slightly to 197, but the AIC dropped to 215, so the reduced model achieves a better fit-complexity trade-off; it is the more parsimonious choice.
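As a quick check, the two fits can be compared directly; a minimal sketch, where the likelihood-ratio test asks whether the four dropped predictors (indus, chas, rm, lstat) were jointly useful:

# Side-by-side AIC of the full and reduced logit models
AIC(Logit_Model1, Logit_Model2)
# Likelihood-ratio test: reduced vs. full model
anova(Logit_Model2, Logit_Model1, test = "Chisq")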
We will now add in our transformed variables and let a step-wise regression (backward elimination via step()) determine whether they add predictive value.
Logit_Model3 <- glm(target ~ zn + indus + chas + nox + rm + age + dis + rad + tax + ptratio + lstat + medv +
  age^2 +       # note: without I(), ^2 is formula crossing, not arithmetic;
  ptratio^2 +   # these two terms silently collapse into age and ptratio, so
  log(dis) +    # only the log transforms actually enter the model
  log(lstat) +
  log(nox) +
  log(zn + 1), data = train_set_full, family = binomial(link = "logit"))
summary(step(Logit_Model3))
## Start: AIC=202.64
## target ~ zn + indus + chas + nox + rm + age + dis + rad + tax +
## ptratio + lstat + medv + age^2 + ptratio^2 + log(dis) + log(lstat) +
## log(nox) + log(zn + 1)
##
## Df Deviance AIC
## - indus 1 168.64 200.64
## - log(zn + 1) 1 168.72 200.72
## - chas 1 168.82 200.82
## - zn 1 169.81 201.81
## - rm 1 169.90 201.90
## <none> 168.63 202.63
## - log(lstat) 1 172.26 204.26
## - lstat 1 173.04 205.04
## - tax 1 173.21 205.21
## - medv 1 173.53 205.53
## - log(nox) 1 177.89 209.89
## - age 1 178.55 210.55
## - dis 1 180.65 212.65
## - ptratio 1 180.73 212.73
## - nox 1 181.24 213.24
## - log(dis) 1 187.76 219.76
## - rad 1 218.84 250.84
##
## Step: AIC=200.64
## target ~ zn + chas + nox + rm + age + dis + rad + tax + ptratio +
## lstat + medv + log(dis) + log(lstat) + log(nox) + log(zn +
## 1)
##
## Df Deviance AIC
## - log(zn + 1) 1 168.72 198.72
## - chas 1 168.85 198.85
## - zn 1 169.82 199.82
## - rm 1 169.90 199.90
## <none> 168.64 200.64
## - log(lstat) 1 172.29 202.29
## - lstat 1 173.05 203.05
## - medv 1 173.53 203.53
## - tax 1 174.11 204.11
## - log(nox) 1 177.97 207.97
## - age 1 178.55 208.55
## - ptratio 1 180.89 210.89
## - nox 1 181.35 211.35
## - dis 1 182.25 212.25
## - log(dis) 1 190.73 220.73
## - rad 1 225.34 255.34
##
## Step: AIC=198.72
## target ~ zn + chas + nox + rm + age + dis + rad + tax + ptratio +
## lstat + medv + log(dis) + log(lstat) + log(nox)
##
## Df Deviance AIC
## - chas 1 168.91 196.91
## - rm 1 169.94 197.94
## <none> 168.72 198.72
## - log(lstat) 1 172.30 200.30
## - zn 1 172.56 200.56
## - lstat 1 173.18 201.18
## - medv 1 173.81 201.81
## - tax 1 174.14 202.14
## - log(nox) 1 177.97 205.97
## - age 1 178.71 206.71
## - nox 1 181.35 209.35
## - ptratio 1 181.88 209.88
## - dis 1 182.87 210.87
## - log(dis) 1 190.91 218.91
## - rad 1 225.35 253.35
##
## Step: AIC=196.91
## target ~ zn + nox + rm + age + dis + rad + tax + ptratio + lstat +
## medv + log(dis) + log(lstat) + log(nox)
##
## Df Deviance AIC
## - rm 1 170.24 196.24
## <none> 168.91 196.91
## - log(lstat) 1 172.61 198.61
## - zn 1 173.09 199.09
## - lstat 1 173.64 199.64
## - medv 1 174.13 200.13
## - tax 1 174.76 200.76
## - log(nox) 1 178.55 204.55
## - age 1 179.49 205.49
## - ptratio 1 181.89 207.89
## - nox 1 181.99 207.99
## - dis 1 183.61 209.61
## - log(dis) 1 191.61 217.61
## - rad 1 229.19 255.19
##
## Step: AIC=196.24
## target ~ zn + nox + age + dis + rad + tax + ptratio + lstat +
## medv + log(dis) + log(lstat) + log(nox)
##
## Df Deviance AIC
## <none> 170.24 196.24
## - log(lstat) 1 173.61 197.61
## - medv 1 174.61 198.61
## - zn 1 175.57 199.57
## - lstat 1 175.71 199.71
## - tax 1 176.51 200.51
## - age 1 179.88 203.88
## - log(nox) 1 180.09 204.09
## - ptratio 1 181.90 205.90
## - nox 1 183.53 207.53
## - dis 1 185.10 209.10
## - log(dis) 1 192.54 216.54
## - rad 1 229.40 253.40
##
## Call:
## glm(formula = target ~ zn + nox + age + dis + rad + tax + ptratio +
## lstat + medv + log(dis) + log(lstat) + log(nox), family = binomial(link = "logit"),
## data = train_set_full)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.9764 -0.2218 -0.0010 0.0003 3.2518
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.304e+02 9.399e+01 -3.516 0.000438 ***
## zn -6.446e-02 3.253e-02 -1.982 0.047525 *
## nox 3.794e+02 1.071e+02 3.541 0.000398 ***
## age 3.640e-02 1.250e-02 2.911 0.003598 **
## dis -3.365e+00 9.550e-01 -3.524 0.000425 ***
## rad 8.766e-01 1.742e-01 5.031 4.88e-07 ***
## tax -7.373e-03 3.100e-03 -2.378 0.017407 *
## ptratio 4.256e-01 1.301e-01 3.270 0.001075 **
## lstat 2.752e-01 1.211e-01 2.272 0.023084 *
## medv 1.092e-01 5.241e-02 2.084 0.037162 *
## log(dis) 1.672e+01 3.953e+00 4.229 2.34e-05 ***
## log(lstat) -2.889e+00 1.607e+00 -1.797 0.072323 .
## log(nox) -1.713e+02 5.477e+01 -3.127 0.001763 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 645.88 on 465 degrees of freedom
## Residual deviance: 170.24 on 453 degrees of freedom
## AIC: 196.24
##
## Number of Fisher Scoring iterations: 9
With a residual deviance of 170 and an AIC of 196, taking the log of "dis", "lstat", and "nox" does in fact add significant predictive value (the bare squared terms, as noted above, never actually entered the model).
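Before moving on, a rough sketch of in-sample classification performance for the step-selected model at a 0.5 cutoff (Logit_Step is a name introduced here to hold the step() result; in-sample accuracy is optimistic, so treat this as illustrative only):

# Re-run the selection quietly and keep the chosen model
Logit_Step <- step(Logit_Model3, trace = FALSE)
# Classify at a 0.5 probability cutoff and tabulate against the truth
pred_class <- ifelse(predict(Logit_Step, type = "response") > 0.5, 1, 0)
table(predicted = pred_class, actual = train_set_full$target)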
We will now repeat the above with a Probit Model.
Probit_Model1 <- glm(target ~ ., data = train_set_full, family = binomial(link = "probit"))
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(Probit_Model1)
##
## Call:
## glm(formula = target ~ ., family = binomial(link = "probit"),
## data = train_set_full)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8378 -0.1214 0.0000 0.0000 3.6075
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -21.747727 3.471508 -6.265 3.74e-10 ***
## zn -0.027575 0.016908 -1.631 0.10291
## indus -0.029877 0.027201 -1.098 0.27204
## chas1 0.492849 0.412977 1.193 0.23271
## nox 26.331909 4.166104 6.321 2.61e-10 ***
## rm -0.262020 0.393514 -0.666 0.50551
## age 0.014564 0.007182 2.028 0.04258 *
## dis 0.352634 0.119663 2.947 0.00321 **
## rad 0.365330 0.088570 4.125 3.71e-05 ***
## tax -0.003858 0.001670 -2.309 0.02092 *
## ptratio 0.215668 0.069623 3.098 0.00195 **
## lstat 0.036394 0.029595 1.230 0.21880
## medv 0.092848 0.036097 2.572 0.01011 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 645.88 on 465 degrees of freedom
## Residual deviance: 195.73 on 453 degrees of freedom
## AIC: 221.73
##
## Number of Fisher Scoring iterations: 10
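The glm.fit warning above ("fitted probabilities numerically 0 or 1 occurred") typically signals quasi-complete separation: some neighborhoods are predicted with near certainty. A minimal sketch to count how many fitted probabilities sit at the numerical boundary (the 1e-8 tolerance is an arbitrary illustrative choice):

# Count fitted probabilities that are numerically 0 or 1
p_hat <- fitted(Probit_Model1)
sum(p_hat < 1e-8 | p_hat > 1 - 1e-8)

The fit is still usable for comparison. As with the logit model, we now add the transformed variables and let step() choose: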
Probit_Model2 <- glm(target ~ zn + indus + chas + nox + rm + age + dis + rad + tax + ptratio + lstat + medv +
  age^2 +       # as with the logit model, these collapse into age and
  ptratio^2 +   # ptratio without I(); only the log terms enter the fit
  log(dis) +
  log(lstat) +
  log(nox) +
  log(zn + 1), data = train_set_full, family = binomial(link = "probit"))
summary(step(Probit_Model2))
## Start: AIC=206.26
## target ~ zn + indus + chas + nox + rm + age + dis + rad + tax +
## ptratio + lstat + medv + age^2 + ptratio^2 + log(dis) + log(lstat) +
## log(nox) + log(zn + 1)
##
## Df Deviance AIC
## - indus 1 172.28 204.28
## - chas 1 172.51 204.51
## - log(zn + 1) 1 172.67 204.67
## - rm 1 173.69 205.69
## <none> 172.26 206.26
## - zn 1 174.35 206.35
## - log(lstat) 1 175.68 207.68
## - tax 1 177.11 209.11
## - lstat 1 177.18 209.18
## - medv 1 177.25 209.25
## - age 1 180.83 212.83
## - log(nox) 1 182.55 214.55
## - dis 1 185.51 217.51
## - ptratio 1 185.52 217.52
## - nox 1 186.06 218.06
## - log(dis) 1 191.90 223.90
## - rad 1 221.59 253.59
##
## Step: AIC=204.28
## target ~ zn + chas + nox + rm + age + dis + rad + tax + ptratio +
## lstat + medv + log(dis) + log(lstat) + log(nox) + log(zn +
## 1)
##
## Df Deviance AIC
## - chas 1 172.59 202.59
## - log(zn + 1) 1 172.67 202.67
## - rm 1 173.69 203.69
## <none> 172.28 204.28
## - zn 1 174.35 204.35
## - log(lstat) 1 175.68 205.68
## - lstat 1 177.20 207.20
## - medv 1 177.25 207.25
## - tax 1 177.75 207.75
## - age 1 180.84 210.84
## - log(nox) 1 182.74 212.74
## - ptratio 1 185.70 215.70
## - nox 1 186.34 216.34
## - dis 1 186.76 216.76
## - log(dis) 1 194.00 224.00
## - rad 1 227.79 257.79
##
## Step: AIC=202.58
## target ~ zn + nox + rm + age + dis + rad + tax + ptratio + lstat +
## medv + log(dis) + log(lstat) + log(nox) + log(zn + 1)
##
## Df Deviance AIC
## - log(zn + 1) 1 172.92 200.92
## - rm 1 174.08 202.08
## <none> 172.59 202.59
## - zn 1 174.62 202.62
## - log(lstat) 1 176.04 204.04
## - medv 1 177.72 205.72
## - lstat 1 177.74 205.74
## - tax 1 178.43 206.43
## - age 1 181.81 209.81
## - log(nox) 1 183.46 211.46
## - ptratio 1 185.70 213.70
## - nox 1 187.10 215.10
## - dis 1 187.48 215.48
## - log(dis) 1 194.74 222.74
## - rad 1 231.15 259.15
##
## Step: AIC=200.92
## target ~ zn + nox + rm + age + dis + rad + tax + ptratio + lstat +
## medv + log(dis) + log(lstat) + log(nox)
##
## Df Deviance AIC
## - rm 1 174.31 200.31
## <none> 172.92 200.92
## - log(lstat) 1 176.13 202.13
## - zn 1 176.98 202.98
## - lstat 1 177.98 203.98
## - medv 1 178.28 204.28
## - tax 1 178.69 204.69
## - age 1 182.10 208.10
## - log(nox) 1 183.57 209.57
## - ptratio 1 186.26 212.26
## - nox 1 187.25 213.25
## - dis 1 187.76 213.76
## - log(dis) 1 194.75 220.75
## - rad 1 231.25 257.25
##
## Step: AIC=200.3
## target ~ zn + nox + age + dis + rad + tax + ptratio + lstat +
## medv + log(dis) + log(lstat) + log(nox)
##
## Df Deviance AIC
## <none> 174.31 200.31
## - log(lstat) 1 177.11 201.11
## - medv 1 178.99 202.99
## - zn 1 179.09 203.09
## - lstat 1 179.84 203.84
## - tax 1 180.71 204.71
## - age 1 182.37 206.37
## - log(nox) 1 184.97 208.97
## - ptratio 1 186.29 210.29
## - nox 1 188.60 212.60
## - dis 1 189.03 213.03
## - log(dis) 1 195.50 219.50
## - rad 1 231.40 255.40
##
## Call:
## glm(formula = target ~ zn + nox + age + dis + rad + tax + ptratio +
## lstat + medv + log(dis) + log(lstat) + log(nox), family = binomial(link = "probit"),
## data = train_set_full)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.0462 -0.2287 0.0000 0.0000 3.3557
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.785e+02 4.666e+01 -3.826 0.000130 ***
## zn -2.998e-02 1.539e-02 -1.947 0.051488 .
## nox 2.052e+02 5.324e+01 3.854 0.000116 ***
## age 1.667e-02 6.378e-03 2.613 0.008971 **
## dis -1.690e+00 4.661e-01 -3.625 0.000289 ***
## rad 4.833e-01 9.033e-02 5.350 8.79e-08 ***
## tax -4.343e-03 1.710e-03 -2.539 0.011108 *
## ptratio 2.344e-01 6.876e-02 3.408 0.000654 ***
## lstat 1.470e-01 6.366e-02 2.309 0.020928 *
## medv 5.913e-02 2.809e-02 2.105 0.035317 *
## log(dis) 8.541e+00 1.981e+00 4.311 1.63e-05 ***
## log(lstat) -1.430e+00 8.430e-01 -1.696 0.089818 .
## log(nox) -9.240e+01 2.713e+01 -3.406 0.000660 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 645.88 on 465 degrees of freedom
## Residual deviance: 174.30 on 453 degrees of freedom
## AIC: 200.3
##
## Number of Fisher Scoring iterations: 12
The logistic regression yields slightly better results than the probit: a lower AIC (196.24 vs. 200.31) and a lower residual deviance (170.24 vs. 174.31).
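That comparison can be reproduced directly; a minimal sketch, where Probit_Step (like Logit_Step above) is a name introduced here to hold the step() result:

# Side-by-side AIC of the step-selected logit and probit models
Probit_Step <- step(Probit_Model2, trace = FALSE)
AIC(Logit_Step, Probit_Step)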