1 Introduction

Scotty has turned out to be a very popular service in Turkey! This is a good thing, but it also brings a new problem: “There are no drivers!”. In some regions and at some times, demand for Scotty has begun to exceed the number of available drivers. Fortunately, we can use a classification model to predict which regions and times are at risk of this “no drivers” problem.

Based on the “scotty-cl-cov/data/data-train.csv” data (up to Saturday, December 2nd 2017), we build a prediction model whose report will be evaluated on the next 7 days (Sunday, December 3rd 2017 to Saturday, December 9th 2017). The model predicts the coverage status for each hour and each area: “sufficient” or “insufficient” (`confirmed` and `nodrivers` in the confusion matrices below).

2 Reading Data and Preprocessing

2.2 Preprocessing

## pad applied on the interval: hour
## pad applied on the interval: hour
## pad applied on the interval: hour
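The `pad applied on the interval: hour` messages come from padding the timestamps to a complete hourly sequence, so that hours without any orders still appear as rows. The sketch below shows the idea in plain Python. It is not the padr implementation, and it assumes the data have been reduced to per-area hourly counts (the values are hypothetical). Each missing hour is inserted with a count of zero. In padr, `pad()` only inserts the missing rows; a separate fill step such as `fill_by_value()` supplies the value.

```python
from datetime import datetime, timedelta


def pad_hourly(counts, start, end):
    """Return (hour, count) pairs for every hour from start to end inclusive.

    counts maps an hour-truncated datetime to an observed count; hours that
    are missing from counts are inserted with a count of 0.
    """
    padded = []
    hour = start
    while hour <= end:
        padded.append((hour, counts.get(hour, 0)))
        hour += timedelta(hours=1)
    return padded


# Hypothetical observed counts for one area: 01:00 and 03:00 have no rows.
observed = {
    datetime(2017, 12, 2, 0): 5,
    datetime(2017, 12, 2, 2): 3,
}
padded = pad_hourly(observed, datetime(2017, 12, 2, 0), datetime(2017, 12, 2, 3))
for hour, count in padded:
    print(hour.strftime("%Y-%m-%d %H:%M"), count)
```

Applied to the evaluation week (December 3rd 00:00 to December 9th 23:00), the same padding gives 7 × 24 = 168 hourly rows per area.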

3 Data Exploration

4 Modeling

4.1 NN Model

4.1.1 Model Architecture

## Model: "model"
## ___________________________________________________________________________
## Layer (type)                     Output Shape                  Param #     
## ===========================================================================
## input (InputLayer)               [(None, 61)]                  0           
## ___________________________________________________________________________
## dense_1 (Dense)                  (None, 64)                    3968        
## ___________________________________________________________________________
## dense_1_act (LeakyReLU)          (None, 64)                    0           
## ___________________________________________________________________________
## dense_1_bn (BatchNormalization)  (None, 64)                    256         
## ___________________________________________________________________________
## dense_1_dp (Dropout)             (None, 64)                    0           
## ___________________________________________________________________________
## dense_2 (Dense)                  (None, 32)                    2080        
## ___________________________________________________________________________
## dense_2_act (LeakyReLU)          (None, 32)                    0           
## ___________________________________________________________________________
## dense_2_bn (BatchNormalization)  (None, 32)                    128         
## ___________________________________________________________________________
## dense_2_dp (Dropout)             (None, 32)                    0           
## ___________________________________________________________________________
## output (Dense)                   (None, 2)                     66          
## ___________________________________________________________________________
## output_bn (BatchNormalization)   (None, 2)                     8           
## ___________________________________________________________________________
## output_act (Activation)          (None, 2)                     0           
## ===========================================================================
## Total params: 6,506
## Trainable params: 6,310
## Non-trainable params: 196
## ___________________________________________________________________________
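The parameter counts in the summary follow directly from the layer sizes:

- A `Dense` layer with n inputs and m units has n × m weights plus m biases.
- A `BatchNormalization` layer keeps four values per unit: a trainable scale and shift, and a non-trainable moving mean and variance.
- The activation and dropout layers add no parameters.

A short Python check of this arithmetic for the 61 → 64 → 32 → 2 architecture:

```python
def dense_params(n_in, n_out):
    # weight matrix (n_in x n_out) plus one bias per unit
    return n_in * n_out + n_out


def batchnorm_params(units):
    # gamma and beta are trained; the moving mean and variance are not
    return 2 * units, 2 * units


layer_sizes = [61, 64, 32, 2]  # input, dense_1, dense_2, output

trainable = 0
non_trainable = 0
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    # every Dense layer in the summary is followed by a BatchNormalization layer
    trainable += dense_params(n_in, n_out)
    bn_trainable, bn_fixed = batchnorm_params(n_out)
    trainable += bn_trainable
    non_trainable += bn_fixed

total = trainable + non_trainable
print(f"Total params: {total:,}")                  # Total params: 6,506
print(f"Trainable params: {trainable:,}")          # Trainable params: 6,310
print(f"Non-trainable params: {non_trainable:,}")  # Non-trainable params: 196
```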

4.1.3 Model Evaluation

4.1.5 Confusion Matrix

## Confusion Matrix and Statistics
## 
##            Reference
## Prediction  confirmed nodrivers
##   confirmed        63        23
##   nodrivers        32        83
##                                           
##                Accuracy : 0.7264          
##                  95% CI : (0.6592, 0.7867)
##     No Information Rate : 0.5274          
##     P-Value [Acc > NIR] : 6.066e-09       
##                                           
##                   Kappa : 0.4484          
##                                           
##  Mcnemar's Test P-Value : 0.2807          
##                                           
##             Sensitivity : 0.6632          
##             Specificity : 0.7830          
##          Pos Pred Value : 0.7326          
##          Neg Pred Value : 0.7217          
##              Prevalence : 0.4726          
##          Detection Rate : 0.3134          
##    Detection Prevalence : 0.4279          
##       Balanced Accuracy : 0.7231          
##                                           
##        'Positive' Class : confirmed       
## 
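All of the statistics above are derived from the four cells of the table, with `confirmed` treated as the positive class. The sketch below computes the main ones for the neural network's table in plain Python, as an illustration rather than caret's `confusionMatrix()` itself:

```python
def binary_metrics(tp, fp, fn, tn):
    """Metrics for a 2x2 table whose rows are predictions and columns are reference.

    tp: predicted positive, actually positive
    fp: predicted positive, actually negative
    fn: predicted negative, actually positive
    tn: predicted negative, actually negative
    """
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Cohen's kappa: agreement beyond what the row and column totals give by chance
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return {
        "Accuracy": accuracy,
        "Kappa": (accuracy - expected) / (1 - expected),
        "Sensitivity": sensitivity,
        "Specificity": specificity,
        "Pos Pred Value": tp / (tp + fp),
        "Neg Pred Value": tn / (tn + fn),
        "Prevalence": (tp + fn) / n,
        "Balanced Accuracy": (sensitivity + specificity) / 2,
    }


# Neural network table, with confirmed as the positive class
nn = binary_metrics(tp=63, fp=23, fn=32, tn=83)
for name, value in nn.items():
    print(f"{name:>17} : {value:.4f}")
```

Passing the Naive Bayes counts (tp=63, fp=28, fn=32, tn=78) reproduces that model's table in the same way.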

4.2 Naive Bayes Model

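We also try a Naive Bayes model, since all of the predictors appear to be categorical (factors). Naive Bayes assumes that the predictors are conditionally independent of one another given the class, so each variable contributes its own likelihood to the prediction.

4.2.1 Modeling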
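Under the Naive Bayes assumption, the score for each class is its prior multiplied by the per-variable likelihoods, which are then normalised into probabilities. The sketch below is a minimal categorical Naive Bayes with Laplace smoothing, written in plain Python only to show the mechanics. It is not the implementation behind the results below, and the training rows, feature values and area names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def fit_naive_bayes(rows, labels, alpha=1.0):
    """Fit a categorical Naive Bayes model with Laplace smoothing alpha."""
    class_counts = Counter(labels)
    # value_counts[class][feature_index][value] -> count
    value_counts = defaultdict(lambda: defaultdict(Counter))
    levels = [set() for _ in rows[0]]
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts[label][j][value] += 1
            levels[j].add(value)
    return class_counts, value_counts, levels, alpha


def predict_proba(model, row):
    class_counts, value_counts, levels, alpha = model
    total = sum(class_counts.values())
    log_scores = {}
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for j, value in enumerate(row):
            # smoothed P(value | class); independence lets us simply add the logs
            numerator = value_counts[label][j][value] + alpha
            denominator = count + alpha * len(levels[j])
            score += math.log(numerator / denominator)
        log_scores[label] = score
    # turn log scores into probabilities that sum to 1
    top = max(log_scores.values())
    exp_scores = {k: math.exp(v - top) for k, v in log_scores.items()}
    norm = sum(exp_scores.values())
    return {k: v / norm for k, v in exp_scores.items()}


# Hypothetical training rows: (area, time of day)
rows = [
    ("area_a", "night"), ("area_a", "night"), ("area_a", "day"),
    ("area_b", "day"), ("area_b", "day"), ("area_b", "night"),
]
labels = ["nodrivers", "nodrivers", "confirmed", "confirmed", "confirmed", "nodrivers"]

model = fit_naive_bayes(rows, labels)
probs = predict_proba(model, ("area_a", "night"))
print({k: round(v, 3) for k, v in probs.items()})  # → {'nodrivers': 0.857, 'confirmed': 0.143}
```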

## Confusion Matrix and Statistics
## 
##            Reference
## Prediction  confirmed nodrivers
##   confirmed        63        28
##   nodrivers        32        78
##                                           
##                Accuracy : 0.7015          
##                  95% CI : (0.6331, 0.7638)
##     No Information Rate : 0.5274          
##     P-Value [Acc > NIR] : 3.632e-07       
##                                           
##                   Kappa : 0.3999          
##                                           
##  Mcnemar's Test P-Value : 0.6985          
##                                           
##             Sensitivity : 0.6632          
##             Specificity : 0.7358          
##          Pos Pred Value : 0.6923          
##          Neg Pred Value : 0.7091          
##              Prevalence : 0.4726          
##          Detection Rate : 0.3134          
##    Detection Prevalence : 0.4527          
##       Balanced Accuracy : 0.6995          
##                                           
##        'Positive' Class : confirmed       
## 
## Confusion Matrix and Statistics
## 
##            Reference
## Prediction  confirmed nodrivers
##   confirmed        63        28
##   nodrivers        32        78
##                                           
##                Accuracy : 0.7015          
##                  95% CI : (0.6331, 0.7638)
##     No Information Rate : 0.5274          
##     P-Value [Acc > NIR] : 3.632e-07       
##                                           
##                   Kappa : 0.3999          
##                                           
##  Mcnemar's Test P-Value : 0.6985          
##                                           
##             Sensitivity : 0.6632          
##             Specificity : 0.7358          
##          Pos Pred Value : 0.6923          
##          Neg Pred Value : 0.7091          
##              Prevalence : 0.4726          
##          Detection Rate : 0.3134          
##    Detection Prevalence : 0.4527          
##       Balanced Accuracy : 0.6995          
##                                           
##        'Positive' Class : confirmed       
## 

5 Comparing and Choosing Models

##  Accuracy 
## 0.7263682
##  Accuracy 
## 0.7014925
##  Accuracy 
## 0.7014925

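Based on the metrics above (accuracy, sensitivity, specificity and positive predictive value), we choose the neural network as the best model. It has the highest accuracy (0.7264, versus 0.7015 for both Naive Bayes results). It also has higher specificity (0.7830 versus 0.7358) and higher positive predictive value (0.7326 versus 0.6923). Sensitivity is the same for both models (0.6632).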

5.1 Choosing a Threshold to Maximize Metrics
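The class decision does not have to use a 0.5 cut-off on the predicted probability of `confirmed`. One simple way to choose a threshold is to sweep candidate values on held-out data and keep the one that maximizes a chosen metric. The sketch below assumes balanced accuracy as that metric and uses hypothetical probabilities and labels. The metric actually optimized in this report is not shown here.

```python
def balanced_accuracy(probs, labels, threshold, positive="confirmed"):
    """Mean of sensitivity and specificity when p >= threshold means positive."""
    tp = fn = tn = fp = 0
    for p, label in zip(probs, labels):
        predicted_positive = p >= threshold
        if label == positive:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return (sensitivity + specificity) / 2


def best_threshold(probs, labels):
    # every distinct predicted probability is a candidate cut-off;
    # candidates are sorted, so max() keeps the lowest one when scores tie
    candidates = sorted(set(probs))
    return max(candidates, key=lambda t: balanced_accuracy(probs, labels, t))


# Hypothetical predicted P(confirmed) and true labels
probs = [0.9, 0.8, 0.65, 0.55, 0.45, 0.4, 0.3, 0.1]
labels = ["confirmed", "confirmed", "nodrivers", "confirmed",
          "confirmed", "nodrivers", "nodrivers", "nodrivers"]

threshold = best_threshold(probs, labels)
print(threshold, balanced_accuracy(probs, labels, threshold))  # → 0.45 0.875
print(0.5, balanced_accuracy(probs, labels, 0.5))              # → 0.5 0.75
```

The threshold should be chosen on validation data, not on the evaluation week, so that the evaluation result stays an honest estimate.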

6 Predicting on the Evaluation Data

## Confusion Matrix and Statistics
## 
##            Reference
## Prediction  confirmed nodrivers
##   confirmed       226        32
##   nodrivers        53       193
##                                          
##                Accuracy : 0.8313         
##                  95% CI : (0.7957, 0.863)
##     No Information Rate : 0.5536         
##     P-Value [Acc > NIR] : < 2e-16        
##                                          
##                   Kappa : 0.6618         
##                                          
##  Mcnemar's Test P-Value : 0.03006        
##                                          
##             Sensitivity : 0.8100         
##             Specificity : 0.8578         
##          Pos Pred Value : 0.8760         
##          Neg Pred Value : 0.7846         
##              Prevalence : 0.5536         
##          Detection Rate : 0.4484         
##    Detection Prevalence : 0.5119         
##       Balanced Accuracy : 0.8339         
##                                          
##        'Positive' Class : confirmed      
##
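McNemar's test uses only the two off-diagonal cells: it checks whether the model errs more often in one direction than the other. The p-values reported in these tables match McNemar's test with a continuity correction, which is the default in R's `mcnemar.test()`. The plain-Python sketch below relies on the fact that a chi-squared variable with one degree of freedom is the square of a standard normal variable. It reproduces the p-value for the evaluation table.

```python
import math


def mcnemar_p_value(b, c, correct=True):
    """Two-sided McNemar test on the off-diagonal counts b and c."""
    diff = abs(b - c)
    if correct:
        # continuity correction, skipped when it would go below zero
        diff = max(diff - 1, 0)
    statistic = diff ** 2 / (b + c)
    # chi-squared (1 df) survival function: P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Evaluation table: 32 nodrivers predicted as confirmed, 53 confirmed predicted as nodrivers
statistic, p_value = mcnemar_p_value(32, 53)
print(f"McNemar's chi-squared = {statistic:.4f}, p-value = {p_value:.5f}")
```

Calling it with (23, 32) and (28, 32) gives the p-values reported for the neural network (0.2807) and Naive Bayes (0.6985) tables. The small p-value on the evaluation data therefore means the errors are lopsided: the model misses `confirmed` hours (53) more often than it wrongly flags `nodrivers` hours (32).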