Personal

This plot shows the missing values in the data set. Util.2016.2017 contains missing values, so I replace them with the median.
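The imputation step is not shown in this report, so the following is only a minimal sketch of how median replacement might look in base R. The data frame `pdata_toy` and its values are made up for illustration; only the column name `Util.2016.2017` comes from the model formula.

```r
# Toy stand-in for pdata (hypothetical values); only the column name
# is taken from the model formula in this report.
pdata_toy <- data.frame(
  Util.2016.2017 = c(0.80, NA, 0.55, 0.90, NA, 0.70)
)

# Count the gaps before imputing.
n_missing <- sum(is.na(pdata_toy$Util.2016.2017))

# Median of the observed values only (na.rm = TRUE drops the NAs).
util_median <- median(pdata_toy$Util.2016.2017, na.rm = TRUE)

# Overwrite only the missing entries; observed values are left untouched.
pdata_toy$Util.2016.2017[is.na(pdata_toy$Util.2016.2017)] <- util_median
```

The median is used here rather than the mean because it is less sensitive to a few extreme utilization values.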

glm.fit <-  glm(Y~Distance.bucket + X..of.Seats..17.18. + Spent.2017.2018 + Tenure + Util.2017.2018 + Util.2016.2017 + Phone.Calls + Emails + In.game.visits + Appointments, 
              family = "binomial", data = pdata)
summary(glm.fit)

Call:
glm(formula = Y ~ Distance.bucket + X..of.Seats..17.18. + Spent.2017.2018 + 
    Tenure + Util.2017.2018 + Util.2016.2017 + Phone.Calls + 
    Emails + In.game.visits + Appointments, family = "binomial", 
    data = pdata)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.2383  -1.1356   0.5665   1.1084   1.6510  

Coefficients:
                      Estimate Std. Error z value Pr(>|z|)    
(Intercept)         -0.8275465  0.3365040  -2.459 0.013923 *  
Distance.bucket2    -0.2837849  0.1375107  -2.064 0.039043 *  
Distance.bucket3     0.1474949  0.1805140   0.817 0.413881    
Distance.bucket4    -0.0764126  0.1955135  -0.391 0.695923    
Distance.bucket5    -0.4757407  0.3004673  -1.583 0.113345    
Distance.bucket6     0.1011682  0.1718981   0.589 0.556173    
X..of.Seats..17.18. -0.0045560  0.0476717  -0.096 0.923863    
Spent.2017.2018      0.0007661  0.0002028   3.777 0.000159 ***
Tenure               0.0780824  0.0084165   9.277  < 2e-16 ***
Util.2017.2018       0.9523278  0.3148605   3.025 0.002490 ** 
Util.2016.2017      -0.2418799  0.4341632  -0.557 0.577447    
Phone.Calls         -0.0100563  0.0121372  -0.829 0.407360    
Emails               0.6953346  0.9997363   0.696 0.486731    
In.game.visits      -0.7461192  1.0008320  -0.745 0.455970    
Appointments         0.5751476  0.9634874   0.597 0.550545    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2195.7  on 1598  degrees of freedom
Residual deviance: 2031.3  on 1584  degrees of freedom
AIC: 2061.3

Number of Fisher Scoring iterations: 4

Below is a ‘confusion matrix’, which cross-tabulates the model's predicted classes (rows, p.pred) against the actual outcomes (columns). The diagonal from top left to bottom right counts the correct predictions, and the off-diagonal cells count the wrong predictions.

      
p.pred   0   1
     0 383 285
     1 325 606
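The code that builds `p.pred` is not shown in this report. A table like the one above could be produced roughly as follows. This is a sketch on simulated data: the data frame `toy`, the model `toy.fit`, and the 0.5 cutoff are all assumptions, not the report's actual settings.

```r
set.seed(1)  # reproducible toy data, not the report's data

# Hypothetical stand-in for pdata: one predictor and a binary outcome.
toy <- data.frame(Tenure = runif(200, 0, 30))
toy$Y <- rbinom(200, 1, plogis(-1 + 0.08 * toy$Tenure))

toy.fit <- glm(Y ~ Tenure, family = "binomial", data = toy)

# Fitted probabilities of Y = 1, then a hard class label.
# The 0.5 cutoff is an assumption; the report does not state its threshold.
p.prob <- predict(toy.fit, type = "response")
p.pred <- ifelse(p.prob > 0.5, 1, 0)

# Rows are predictions, columns are observed outcomes.
cm <- table(p.pred, toy$Y)
print(cm)

# Share of predictions that match the observed outcome.
mean(p.pred == toy$Y)
```

Note that these are in-sample predictions: the model is scored on the same rows it was fitted to, so the accuracy is likely optimistic compared with held-out data.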

The value below is the model's accuracy, the share of predictions that match the actual outcome, which is ~61.9%.

mean(p.pred == pdata$Y)
## [1] 0.6185116
# anova(glm.fit,test="Chisq") ## come back to later
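As a cross-check, the same accuracy can be recovered from the confusion matrix counts alone: correct predictions on the diagonal divided by all predictions. The sketch below re-enters the four Personal counts by hand; the object name `cm` and the dimension label `actual` are chosen here for illustration.

```r
# Counts copied from the Personal confusion matrix above
# (rows: p.pred, columns: actual Y). matrix() fills column by column.
cm <- matrix(c(383, 325, 285, 606), nrow = 2,
             dimnames = list(p.pred = c("0", "1"), actual = c("0", "1")))

correct  <- sum(diag(cm))  # 383 + 606 correct predictions
total    <- sum(cm)        # all accounts in the table
accuracy <- correct / total

round(accuracy, 4)
```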

Business

Missing data values in Business data set

Business Model


Call:
glm(formula = Y ~ Distance.bucket + X..of.Seats..17.18. + Spent.2017.2018 + 
    Tenure + Util.2017.2018 + Util.2016.2017 + Phone.Calls + 
    Emails + In.game.visits + Appointments, family = "binomial", 
    data = pdata)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.7104  -1.1398   0.5378   0.8702   1.4978  

Coefficients:
                      Estimate Std. Error z value Pr(>|z|)    
(Intercept)         -6.113e-01  1.011e+00  -0.605 0.545308    
Distance.bucket2    -1.051e+00  4.179e-01  -2.514 0.011925 *  
Distance.bucket3    -6.343e-01  5.387e-01  -1.177 0.239028    
Distance.bucket4    -1.100e+00  6.129e-01  -1.795 0.072719 .  
Distance.bucket5    -2.451e-01  9.174e-01  -0.267 0.789327    
Distance.bucket6    -8.626e-01  4.477e-01  -1.927 0.054032 .  
X..of.Seats..17.18.  2.176e-01  1.358e-01   1.602 0.109087    
Spent.2017.2018      4.906e-04  3.703e-04   1.325 0.185187    
Tenure               8.120e-02  2.209e-02   3.676 0.000237 ***
Util.2017.2018      -2.945e-01  9.115e-01  -0.323 0.746643    
Util.2016.2017       5.623e-01  1.246e+00   0.451 0.651762    
Phone.Calls          5.694e-02  3.433e-02   1.659 0.097192 .  
Emails               1.433e+01  1.011e+03   0.014 0.988688    
In.game.visits      -1.465e+01  1.011e+03  -0.014 0.988441    
Appointments         1.457e+01  1.011e+03   0.014 0.988505    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 332.45  on 268  degrees of freedom
Residual deviance: 290.94  on 254  degrees of freedom
AIC: 320.94

Number of Fisher Scoring iterations: 14

Business Confusion Matrix

      
p.pred   0   1
     0  20  18
     1  63 168

Business Model Accuracy

mean(p.pred == pdata$Y)
## [1] 0.6988848
# anova(glm.fit,test="Chisq") ## come back to later
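Overall accuracy can hide how the model does on each outcome separately. The sketch below re-enters the Business confusion matrix counts and splits the accuracy by actual class. The names `cm_bus`, `per_class`, and `base_rate` are introduced here for illustration and do not appear in the original analysis.

```r
# Counts copied from the Business confusion matrix above
# (rows: p.pred, columns: actual Y). matrix() fills column by column.
cm_bus <- matrix(c(20, 63, 18, 168), nrow = 2,
                 dimnames = list(p.pred = c("0", "1"), actual = c("0", "1")))

# Overall accuracy: diagonal over total, same value as mean(p.pred == pdata$Y).
accuracy <- sum(diag(cm_bus)) / sum(cm_bus)

# Share of each actual class that was predicted correctly (column-wise).
per_class <- diag(cm_bus) / colSums(cm_bus)

# Share of accounts whose actual outcome is 1, for comparison with accuracy.
base_rate <- colSums(cm_bus)["1"] / sum(cm_bus)

round(accuracy, 4)
round(per_class, 3)
round(base_rate, 3)
```

Comparing `accuracy` with `base_rate` shows how much the model improves on always predicting the more common outcome.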