Parkinson’s disease (PD) is a neurodegenerative disorder that predominantly affects dopamine-producing neurons in a specific area of the brain called the substantia nigra.

Symptoms generally develop slowly over years. Because the disease is diverse, the progression of symptoms often differs from one person to another. People with PD may experience:

Let’s turn to the dataset. It comes from the UCI Machine Learning Repository and consists of a range of biomedical voice measurements from 31 people, 23 of whom have Parkinson’s disease (PD). Each column in the table is a particular voice measure, and each row corresponds to one of 195 voice recordings from these individuals (identified by the “name” column). The main aim of the data is to discriminate healthy people from those with PD, according to the “status” column, which is set to 0 for healthy and 1 for PD.

Our task is to predict whether an individual is healthy or has PD.

library(ggplot2)
library(reshape2)
library(ggpubr)
library(caTools)

Let’s load the dataset first.

park=read.table("https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/parkinsons.data", sep = ",", header = TRUE, stringsAsFactors = FALSE)

Okay, we now have our dataset, called “park”. Let’s look at its first rows, dimensions, structure and summary.

head(park,6)
dim(park)
[1] 195  24
str(park)
'data.frame':   195 obs. of  24 variables:
 $ name            : chr  "phon_R01_S01_1" "phon_R01_S01_2" "phon_R01_S01_3" "phon_R01_S01_4" ...
 $ MDVP.Fo.Hz.     : num  120 122 117 117 116 ...
 $ MDVP.Fhi.Hz.    : num  157 149 131 138 142 ...
 $ MDVP.Flo.Hz.    : num  75 114 112 111 111 ...
 $ MDVP.Jitter...  : num  0.00784 0.00968 0.0105 0.00997 0.01284 ...
 $ MDVP.Jitter.Abs.: num  0.00007 0.00008 0.00009 0.00009 0.00011 0.00008 0.00003 0.00003 0.00006 0.00006 ...
 $ MDVP.RAP        : num  0.0037 0.00465 0.00544 0.00502 0.00655 0.00463 0.00155 0.00144 0.00293 0.00268 ...
 $ MDVP.PPQ        : num  0.00554 0.00696 0.00781 0.00698 0.00908 0.0075 0.00202 0.00182 0.00332 0.00332 ...
 $ Jitter.DDP      : num  0.0111 0.0139 0.0163 0.015 0.0197 ...
 $ MDVP.Shimmer    : num  0.0437 0.0613 0.0523 0.0549 0.0643 ...
 $ MDVP.Shimmer.dB.: num  0.426 0.626 0.482 0.517 0.584 0.456 0.14 0.134 0.191 0.255 ...
 $ Shimmer.APQ3    : num  0.0218 0.0313 0.0276 0.0292 0.0349 ...
 $ Shimmer.APQ5    : num  0.0313 0.0452 0.0386 0.0401 0.0483 ...
 $ MDVP.APQ        : num  0.0297 0.0437 0.0359 0.0377 0.0447 ...
 $ Shimmer.DDA     : num  0.0654 0.094 0.0827 0.0877 0.1047 ...
 $ NHR             : num  0.0221 0.0193 0.0131 0.0135 0.0177 ...
 $ HNR             : num  21 19.1 20.7 20.6 19.6 ...
 $ status          : int  1 1 1 1 1 1 1 1 1 1 ...
 $ RPDE            : num  0.415 0.458 0.43 0.435 0.417 ...
 $ DFA             : num  0.815 0.82 0.825 0.819 0.823 ...
 $ spread1         : num  -4.81 -4.08 -4.44 -4.12 -3.75 ...
 $ spread2         : num  0.266 0.336 0.311 0.334 0.235 ...
 $ D2              : num  2.3 2.49 2.34 2.41 2.33 ...
 $ PPE             : num  0.285 0.369 0.333 0.369 0.41 ...
park$status=as.factor(park$status)   # treat the 0/1 label as a categorical outcome
summary(park)
     name            MDVP.Fo.Hz.      MDVP.Fhi.Hz.    MDVP.Flo.Hz.    MDVP.Jitter...    
 Length:195         Min.   : 88.33   Min.   :102.1   Min.   : 65.48   Min.   :0.001680  
 Class :character   1st Qu.:117.57   1st Qu.:134.9   1st Qu.: 84.29   1st Qu.:0.003460  
 Mode  :character   Median :148.79   Median :175.8   Median :104.31   Median :0.004940  
                    Mean   :154.23   Mean   :197.1   Mean   :116.32   Mean   :0.006220  
                    3rd Qu.:182.77   3rd Qu.:224.2   3rd Qu.:140.02   3rd Qu.:0.007365  
                    Max.   :260.11   Max.   :592.0   Max.   :239.17   Max.   :0.033160  
 MDVP.Jitter.Abs.       MDVP.RAP           MDVP.PPQ          Jitter.DDP      
 Min.   :7.000e-06   Min.   :0.000680   Min.   :0.000920   Min.   :0.002040  
 1st Qu.:2.000e-05   1st Qu.:0.001660   1st Qu.:0.001860   1st Qu.:0.004985  
 Median :3.000e-05   Median :0.002500   Median :0.002690   Median :0.007490  
 Mean   :4.396e-05   Mean   :0.003306   Mean   :0.003446   Mean   :0.009920  
 3rd Qu.:6.000e-05   3rd Qu.:0.003835   3rd Qu.:0.003955   3rd Qu.:0.011505  
 Max.   :2.600e-04   Max.   :0.021440   Max.   :0.019580   Max.   :0.064330  
  MDVP.Shimmer     MDVP.Shimmer.dB.  Shimmer.APQ3       Shimmer.APQ5        MDVP.APQ      
 Min.   :0.00954   Min.   :0.0850   Min.   :0.004550   Min.   :0.00570   Min.   :0.00719  
 1st Qu.:0.01650   1st Qu.:0.1485   1st Qu.:0.008245   1st Qu.:0.00958   1st Qu.:0.01308  
 Median :0.02297   Median :0.2210   Median :0.012790   Median :0.01347   Median :0.01826  
 Mean   :0.02971   Mean   :0.2823   Mean   :0.015664   Mean   :0.01788   Mean   :0.02408  
 3rd Qu.:0.03789   3rd Qu.:0.3500   3rd Qu.:0.020265   3rd Qu.:0.02238   3rd Qu.:0.02940  
 Max.   :0.11908   Max.   :1.3020   Max.   :0.056470   Max.   :0.07940   Max.   :0.13778  
  Shimmer.DDA           NHR                HNR         status       RPDE       
 Min.   :0.01364   Min.   :0.000650   Min.   : 8.441   0: 48   Min.   :0.2566  
 1st Qu.:0.02474   1st Qu.:0.005925   1st Qu.:19.198   1:147   1st Qu.:0.4213  
 Median :0.03836   Median :0.011660   Median :22.085           Median :0.4960  
 Mean   :0.04699   Mean   :0.024847   Mean   :21.886           Mean   :0.4985  
 3rd Qu.:0.06080   3rd Qu.:0.025640   3rd Qu.:25.076           3rd Qu.:0.5876  
 Max.   :0.16942   Max.   :0.314820   Max.   :33.047           Max.   :0.6852  
      DFA            spread1          spread2               D2             PPE         
 Min.   :0.5743   Min.   :-7.965   Min.   :0.006274   Min.   :1.423   Min.   :0.04454  
 1st Qu.:0.6748   1st Qu.:-6.450   1st Qu.:0.174350   1st Qu.:2.099   1st Qu.:0.13745  
 Median :0.7223   Median :-5.721   Median :0.218885   Median :2.362   Median :0.19405  
 Mean   :0.7181   Mean   :-5.684   Mean   :0.226510   Mean   :2.382   Mean   :0.20655  
 3rd Qu.:0.7619   3rd Qu.:-5.046   3rd Qu.:0.279234   3rd Qu.:2.636   3rd Qu.:0.25298  
 Max.   :0.8253   Max.   :-2.434   Max.   :0.450493   Max.   :3.671   Max.   :0.52737  

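We should have some idea about the features. Based on the UCI description of the dataset, they fall into these groups:
- MDVP.Fo.Hz., MDVP.Fhi.Hz., MDVP.Flo.Hz.: average, maximum and minimum vocal fundamental frequency
- MDVP.Jitter..., MDVP.Jitter.Abs., MDVP.RAP, MDVP.PPQ, Jitter.DDP: measures of variation in fundamental frequency (jitter)
- MDVP.Shimmer, MDVP.Shimmer.dB., Shimmer.APQ3, Shimmer.APQ5, MDVP.APQ, Shimmer.DDA: measures of variation in amplitude (shimmer)
- NHR, HNR: two measures of the ratio of noise to tonal components in the voice
- RPDE, D2: two nonlinear dynamical complexity measures
- DFA: signal fractal scaling exponent
- spread1, spread2, PPE: three nonlinear measures of fundamental frequency variation
- status: health status of the subject (1 = PD, 0 = healthy)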

Anomaly detection and treatment:

na = colSums(is.na(park))
na
            name      MDVP.Fo.Hz.     MDVP.Fhi.Hz.     MDVP.Flo.Hz.   MDVP.Jitter... 
               0                0                0                0                0 
MDVP.Jitter.Abs.         MDVP.RAP         MDVP.PPQ       Jitter.DDP     MDVP.Shimmer 
               0                0                0                0                0 
MDVP.Shimmer.dB.     Shimmer.APQ3     Shimmer.APQ5         MDVP.APQ      Shimmer.DDA 
               0                0                0                0                0 
             NHR              HNR           status             RPDE              DFA 
               0                0                0                0                0 
         spread1          spread2               D2              PPE 
               0                0                0                0 

Great! There are no missing values.

So “status” is our variable of interest: we would like to predict the ‘status’ class of any individual, which makes this a classification problem. We are going to use various classification algorithms for it.

Getting to know the data is important for any analysis, so we will first do some exploratory data analysis (EDA). It always helps us understand the data better.

Exploratory Data Analysis

First, let’s get a basic idea of each variable with respect to the ‘status’ variable by drawing a box plot of each variable.

# Reshape to long format (one row per recording and feature), dropping the "name" column
park.m=melt(park[,-1],id.vars = "status")
# One box-plot panel per feature, split by status (0 = healthy, 1 = PD)
p <- ggplot(data = park.m, aes(x=variable, y=value)) + 
             geom_boxplot(aes(fill=status))
p + facet_wrap( ~ variable, scales="free")

Now we have a basic idea of how the variables are distributed across the two ‘status’ classes. The plots also show the outliers in each variable.

names(park)
 [1] "name"             "MDVP.Fo.Hz."      "MDVP.Fhi.Hz."     "MDVP.Flo.Hz."    
 [5] "MDVP.Jitter..."   "MDVP.Jitter.Abs." "MDVP.RAP"         "MDVP.PPQ"        
 [9] "Jitter.DDP"       "MDVP.Shimmer"     "MDVP.Shimmer.dB." "Shimmer.APQ3"    
[13] "Shimmer.APQ5"     "MDVP.APQ"         "Shimmer.DDA"      "NHR"             
[17] "HNR"              "status"           "RPDE"             "DFA"             
[21] "spread1"          "spread2"          "D2"               "PPE"             

We need to deal with these outliers. There are a good number of them in “MDVP.Fhi.Hz.”, “MDVP.Jitter...”, “MDVP.Jitter.Abs.”, “MDVP.RAP”, “MDVP.PPQ”, “Jitter.DDP” and “NHR”. We will replace them with the median of their variable, using a function called outlier_remove:

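# Replace values outside [Q1 - IQR, Q3 + IQR] with the median of x.
# Note: these fences use 1 x IQR, which is stricter than the
# 1.5 x IQR whisker rule used by boxplot().
outlier_remove=function(x){
    quantiles <- quantile( x, c(.25, .75 ) )
    iqr=IQR(x)
    med=median(x)                          # median of the original values
    x[ x < quantiles[1]-iqr ] <- med       # low outliers
    x[ x > quantiles[2]+ iqr] <- med       # high outliers
    x
  }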
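To see exactly what the 1 × IQR fences in outlier_remove do, here is a small self-contained sketch. replace_outliers_iqr() is a hypothetical generalization with a multiplier k (k = 1 reproduces outlier_remove, k = 1.5 matches the usual box-plot whisker rule), and the toy vector is made up for illustration.

```r
# Hypothetical generalization of outlier_remove(): values below Q1 - k*IQR or
# above Q3 + k*IQR are replaced by the median of the original vector.
replace_outliers_iqr <- function(x, k = 1) {
  q <- quantile(x, c(0.25, 0.75))
  fence <- k * IQR(x)
  med <- median(x)
  x[x < q[1] - fence | x > q[2] + fence] <- med
  x
}

x <- c(1:9, 13)   # toy data: Q1 = 3.25, Q3 = 7.75, IQR = 4.5, median = 5.5

replace_outliers_iqr(x, k = 1)    # upper fence 12.25, so 13 becomes 5.5
replace_outliers_iqr(x, k = 1.5)  # upper fence 14.5, so 13 is kept
boxplot.stats(x)$out              # boxplot() does not flag 13 either
```

The example shows that the 1 × IQR rule can replace values that boxplot() would not mark as outliers, which is worth keeping in mind when comparing the box plots before and after the replacement.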

Replacing the outliers in these variables:

park$MDVP.Fhi.Hz.= outlier_remove(park$MDVP.Fhi.Hz.)
park$MDVP.Jitter...= outlier_remove(park$MDVP.Jitter...)
park$MDVP.Jitter.Abs.= outlier_remove(park$MDVP.Jitter.Abs.)
park$MDVP.RAP= outlier_remove(park$MDVP.RAP)
park$MDVP.PPQ= outlier_remove(park$MDVP.PPQ)
park$Jitter.DDP= outlier_remove(park$Jitter.DDP)
park$NHR= outlier_remove(park$NHR)

Let’s redraw their box plots to check whether any outliers remain.

The new box plots show no remaining outliers.

Feature scaling:

# Standardize every predictor to mean 0 and SD 1 (column 1 is "name", column 18 is "status")
park[,-c(1,18)]=scale(park[,-c(1,18)])
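scale() standardizes each column as (x − mean) / sd. Here it is applied to the full dataset before the train/test split, so test-set statistics slightly influence the training features. A common alternative is to learn the mean and SD on the training rows only and reuse them for the held-out rows. The sketch below shows this on simulated data; the data frame df, its columns and the 16/4 row split are made up for illustration.

```r
set.seed(42)
# Simulated stand-in for two numeric predictors
df <- data.frame(f0 = rnorm(20, mean = 150, sd = 40),
                 hnr = rnorm(20, mean = 22, sd = 4))
train_rows <- 1:16

# Learn mean and SD from the training rows only
train_x <- scale(df[train_rows, ])

# Apply the training mean and SD to the held-out rows
test_x <- scale(df[-train_rows, ],
                center = attr(train_x, "scaled:center"),
                scale  = attr(train_x, "scaled:scale"))

round(colMeans(train_x), 10)  # 0 for each column
apply(train_x, 2, sd)         # 1 for each column
```

The test rows will generally not have mean exactly 0 and SD exactly 1, which is expected: they are transformed with the training statistics, just as new patients would be.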

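Now we are going to split our data into training (80%) and test (20%) sets:

# sample.split() keeps the 0/1 ratio of status similar in both sets.
# The split is random; call set.seed() beforehand to make it reproducible.
split=sample.split(park$status,SplitRatio = .80)
train=subset(park,split==TRUE)
test=subset(park,split==FALSE)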
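caTools::sample.split() draws the split so that the proportion of each status class is roughly the same in both sets. A minimal base-R sketch of the same idea is below; stratified_split() is a hypothetical helper, not the caTools implementation. With the 48 healthy and 147 PD recordings from summary(park), it puts 38 and 118 recordings in the training set, the class totals that appear in the training confusion matrix further below.

```r
# Hypothetical helper: returns a logical vector, TRUE for training rows,
# sampling the given ratio separately within each class of y.
stratified_split <- function(y, ratio = 0.8) {
  keep <- logical(length(y))
  for (idx in split(seq_along(y), y)) {        # row indices of one class
    n_train <- round(ratio * length(idx))
    keep[idx[sample.int(length(idx), n_train)]] <- TRUE
  }
  keep
}

set.seed(1)
status <- factor(c(rep(0, 48), rep(1, 147)))   # class counts from summary(park)
in_train <- stratified_split(status, ratio = 0.8)
table(status[in_train])    # 38 healthy, 118 PD
table(status[!in_train])   # 10 healthy, 29 PD
```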

Model fitting: Logistic regression is well suited to examining the relationship between a categorical response variable and one or more categorical or continuous predictor variables. The model is generally written as

$$\log(\text{odds}) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k,$$

where the $\beta$s are the parameters and the $x$s are the independent variables. The log-odds (the logit) is defined as $\ln[p/(1-p)]$: the natural logarithm of the ratio between the probability that the event occurs, $p = P(Y=1)$, and the probability that it does not. We are usually interested in the predicted probability of the event occurring, which is

$$p = \frac{1}{1 + e^{-z}}, \qquad z = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k.$$
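The logit and the probability formula are inverses of each other, and R's built-in qlogis() and plogis() compute them. The sketch below checks this and also shows, on simulated data (the coefficients -0.5 and 2 are made up), that predict(..., type = "response") on a binomial glm is the inverse logit of the linear predictor z returned by type = "link".

```r
logit     <- function(p) log(p / (1 - p))   # log-odds
inv_logit <- function(z) 1 / (1 + exp(-z))  # predicted probability

z <- c(-2, 0, 1.5)
p <- inv_logit(z)
all.equal(p, plogis(z))         # TRUE
all.equal(logit(p), qlogis(p))  # TRUE

# Simulated data: one predictor, true beta0 = -0.5, beta1 = 2
set.seed(7)
x <- rnorm(200)
y <- rbinom(200, size = 1, prob = inv_logit(-0.5 + 2 * x))
fit <- glm(y ~ x, family = "binomial")

z_hat <- predict(fit, type = "link")      # beta0_hat + beta1_hat * x
p_hat <- predict(fit, type = "response")  # probabilities
all.equal(p_hat, inv_logit(z_hat))        # TRUE
```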


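library(caret)   # provides confusionMatrix()
# Logistic regression on all predictors (column 1, "name", is dropped)
model.lg=glm(data=train[,-1],status~.,family = "binomial")
summary(model.lg)
# Backward elimination: drop one term at a time while the AIC decreases
step(model.lg,direction="backward")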
Start:  AIC=121.49
status ~ MDVP.Fo.Hz. + MDVP.Fhi.Hz. + MDVP.Flo.Hz. + MDVP.Jitter... + 
    MDVP.Jitter.Abs. + MDVP.RAP + MDVP.PPQ + Jitter.DDP + MDVP.Shimmer + 
    MDVP.Shimmer.dB. + Shimmer.APQ3 + Shimmer.APQ5 + MDVP.APQ + 
    Shimmer.DDA + NHR + HNR + RPDE + DFA + spread1 + spread2 + 
    D2 + PPE

                   Df Deviance    AIC
- spread1           1   75.507 119.51
- HNR               1   75.523 119.52
- MDVP.PPQ          1   75.528 119.53
- Jitter.DDP        1   75.536 119.54
- MDVP.RAP          1   75.542 119.54
- MDVP.Fo.Hz.       1   75.574 119.57
- MDVP.Shimmer.dB.  1   75.585 119.58
- MDVP.APQ          1   75.608 119.61
- NHR               1   75.649 119.65
- DFA               1   75.737 119.74
- MDVP.Flo.Hz.      1   75.846 119.85
- RPDE              1   75.928 119.93
- MDVP.Jitter...    1   76.035 120.03
- Shimmer.APQ3      1   76.180 120.18
- Shimmer.DDA       1   76.195 120.19
- PPE               1   76.615 120.61
- MDVP.Jitter.Abs.  1   76.784 120.78
- spread2           1   76.831 120.83
- Shimmer.APQ5      1   76.850 120.85
- MDVP.Shimmer      1   77.214 121.21
- MDVP.Fhi.Hz.      1   77.362 121.36
<none>                  75.488 121.49
- D2                1   77.706 121.71

Step:  AIC=119.51
status ~ MDVP.Fo.Hz. + MDVP.Fhi.Hz. + MDVP.Flo.Hz. + MDVP.Jitter... + 
    MDVP.Jitter.Abs. + MDVP.RAP + MDVP.PPQ + Jitter.DDP + MDVP.Shimmer + 
    MDVP.Shimmer.dB. + Shimmer.APQ3 + Shimmer.APQ5 + MDVP.APQ + 
    Shimmer.DDA + NHR + HNR + RPDE + DFA + spread2 + D2 + PPE

                   Df Deviance    AIC
- MDVP.PPQ          1   75.540 117.54
- HNR               1   75.547 117.55
- Jitter.DDP        1   75.566 117.57
- MDVP.RAP          1   75.574 117.57
- MDVP.Fo.Hz.       1   75.590 117.59
- MDVP.Shimmer.dB.  1   75.604 117.60
- MDVP.APQ          1   75.630 117.63
- NHR               1   75.747 117.75
- DFA               1   75.766 117.77
- MDVP.Flo.Hz.      1   75.852 117.85
- RPDE              1   75.969 117.97
- MDVP.Jitter...    1   76.048 118.05
- Shimmer.APQ3      1   76.197 118.20
- Shimmer.DDA       1   76.211 118.21
- MDVP.Jitter.Abs.  1   76.817 118.82
- spread2           1   76.834 118.83
- Shimmer.APQ5      1   76.852 118.85
- MDVP.Shimmer      1   77.246 119.25
- MDVP.Fhi.Hz.      1   77.365 119.36
<none>                  75.507 119.51
- D2                1   77.715 119.72
- PPE               1   79.179 121.18

Step:  AIC=117.54
status ~ MDVP.Fo.Hz. + MDVP.Fhi.Hz. + MDVP.Flo.Hz. + MDVP.Jitter... + 
    MDVP.Jitter.Abs. + MDVP.RAP + Jitter.DDP + MDVP.Shimmer + 
    MDVP.Shimmer.dB. + Shimmer.APQ3 + Shimmer.APQ5 + MDVP.APQ + 
    Shimmer.DDA + NHR + HNR + RPDE + DFA + spread2 + D2 + PPE

                   Df Deviance    AIC
- HNR               1   75.578 115.58
- MDVP.Fo.Hz.       1   75.615 115.61
- Jitter.DDP        1   75.627 115.63
- MDVP.RAP          1   75.637 115.64
- MDVP.Shimmer.dB.  1   75.656 115.66
- MDVP.APQ          1   75.700 115.70
- NHR               1   75.760 115.76
- DFA               1   75.837 115.84
- MDVP.Flo.Hz.      1   75.853 115.85
- RPDE              1   75.988 115.99
- MDVP.Jitter...    1   76.124 116.12
- Shimmer.APQ3      1   76.267 116.27
- Shimmer.DDA       1   76.283 116.28
- spread2           1   76.848 116.85
- Shimmer.APQ5      1   76.858 116.86
- MDVP.Jitter.Abs.  1   76.970 116.97
- MDVP.Fhi.Hz.      1   77.405 117.41
- MDVP.Shimmer      1   77.416 117.42
<none>                  75.540 117.54
- D2                1   77.758 117.76
- PPE               1   81.451 121.45

Step:  AIC=115.58
status ~ MDVP.Fo.Hz. + MDVP.Fhi.Hz. + MDVP.Flo.Hz. + MDVP.Jitter... + 
    MDVP.Jitter.Abs. + MDVP.RAP + Jitter.DDP + MDVP.Shimmer + 
    MDVP.Shimmer.dB. + Shimmer.APQ3 + Shimmer.APQ5 + MDVP.APQ + 
    Shimmer.DDA + NHR + RPDE + DFA + spread2 + D2 + PPE

                   Df Deviance    AIC
- Jitter.DDP        1   75.641 113.64
- MDVP.Fo.Hz.       1   75.648 113.65
- MDVP.RAP          1   75.649 113.65
- MDVP.Shimmer.dB.  1   75.664 113.66
- MDVP.APQ          1   75.715 113.72
- NHR               1   75.760 113.76
- DFA               1   75.846 113.85
- MDVP.Flo.Hz.      1   75.863 113.86
- RPDE              1   75.993 113.99
- MDVP.Jitter...    1   76.124 114.12
- Shimmer.APQ3      1   76.268 114.27
- Shimmer.DDA       1   76.284 114.28
- spread2           1   76.853 114.85
- Shimmer.APQ5      1   76.859 114.86
- MDVP.Jitter.Abs.  1   77.218 115.22
- MDVP.Shimmer      1   77.416 115.42
- MDVP.Fhi.Hz.      1   77.439 115.44
<none>                  75.578 115.58
- D2                1   77.992 115.99
- PPE               1   81.867 119.87

Step:  AIC=113.64
status ~ MDVP.Fo.Hz. + MDVP.Fhi.Hz. + MDVP.Flo.Hz. + MDVP.Jitter... + 
    MDVP.Jitter.Abs. + MDVP.RAP + MDVP.Shimmer + MDVP.Shimmer.dB. + 
    Shimmer.APQ3 + Shimmer.APQ5 + MDVP.APQ + Shimmer.DDA + NHR + 
    RPDE + DFA + spread2 + D2 + PPE

                   Df Deviance    AIC
- MDVP.Shimmer.dB.  1   75.701 111.70
- MDVP.Fo.Hz.       1   75.727 111.73
- MDVP.APQ          1   75.798 111.80
- NHR               1   75.818 111.82
- DFA               1   75.931 111.93
- MDVP.Flo.Hz.      1   75.946 111.95
- RPDE              1   76.007 112.01
- MDVP.Jitter...    1   76.195 112.19
- Shimmer.APQ3      1   76.305 112.31
- Shimmer.DDA       1   76.321 112.32
- Shimmer.APQ5      1   76.904 112.90
- spread2           1   76.974 112.97
- MDVP.Jitter.Abs.  1   77.258 113.26
- MDVP.Fhi.Hz.      1   77.551 113.55
- MDVP.Shimmer      1   77.634 113.63
<none>                  75.641 113.64
- D2                1   78.018 114.02
- MDVP.RAP          1   81.460 117.46
- PPE               1   82.101 118.10

Step:  AIC=111.7
status ~ MDVP.Fo.Hz. + MDVP.Fhi.Hz. + MDVP.Flo.Hz. + MDVP.Jitter... + 
    MDVP.Jitter.Abs. + MDVP.RAP + MDVP.Shimmer + Shimmer.APQ3 + 
    Shimmer.APQ5 + MDVP.APQ + Shimmer.DDA + NHR + RPDE + DFA + 
    spread2 + D2 + PPE

                   Df Deviance    AIC
- MDVP.Fo.Hz.       1   75.755 109.75
- MDVP.APQ          1   75.835 109.83
- NHR               1   75.892 109.89
- DFA               1   75.943 109.94
- MDVP.Flo.Hz.      1   76.020 110.02
- MDVP.Jitter...    1   76.199 110.20
- Shimmer.APQ3      1   76.318 110.32
- Shimmer.DDA       1   76.334 110.33
- RPDE              1   76.359 110.36
- Shimmer.APQ5      1   77.003 111.00
- spread2           1   77.190 111.19
- MDVP.Jitter.Abs.  1   77.366 111.37
- MDVP.Fhi.Hz.      1   77.563 111.56
- MDVP.Shimmer      1   77.673 111.67
<none>                  75.701 111.70
- D2                1   78.029 112.03
- MDVP.RAP          1   81.465 115.47
- PPE               1   82.369 116.37

Step:  AIC=109.76
status ~ MDVP.Fhi.Hz. + MDVP.Flo.Hz. + MDVP.Jitter... + MDVP.Jitter.Abs. + 
    MDVP.RAP + MDVP.Shimmer + Shimmer.APQ3 + Shimmer.APQ5 + MDVP.APQ + 
    Shimmer.DDA + NHR + RPDE + DFA + spread2 + D2 + PPE

                   Df Deviance    AIC
- DFA               1   75.952 107.95
- MDVP.APQ          1   75.957 107.96
- NHR               1   75.967 107.97
- MDVP.Flo.Hz.      1   76.021 108.02
- MDVP.Jitter...    1   76.199 108.20
- Shimmer.APQ3      1   76.335 108.33
- Shimmer.DDA       1   76.350 108.35
- RPDE              1   76.503 108.50
- Shimmer.APQ5      1   77.025 109.03
- spread2           1   77.359 109.36
<none>                  75.755 109.75
- MDVP.Shimmer      1   77.814 109.81
- D2                1   78.161 110.16
- MDVP.Jitter.Abs.  1   78.468 110.47
- MDVP.Fhi.Hz.      1   78.779 110.78
- MDVP.RAP          1   81.667 113.67
- PPE               1   82.377 114.38

Step:  AIC=107.95
status ~ MDVP.Fhi.Hz. + MDVP.Flo.Hz. + MDVP.Jitter... + MDVP.Jitter.Abs. + 
    MDVP.RAP + MDVP.Shimmer + Shimmer.APQ3 + Shimmer.APQ5 + MDVP.APQ + 
    Shimmer.DDA + NHR + RPDE + spread2 + D2 + PPE

                   Df Deviance    AIC
- MDVP.APQ          1   76.128 106.13
- MDVP.Flo.Hz.      1   76.188 106.19
- MDVP.Jitter...    1   76.289 106.29
- NHR               1   76.339 106.34
- Shimmer.APQ3      1   76.608 106.61
- Shimmer.DDA       1   76.622 106.62
- Shimmer.APQ5      1   77.025 107.03
- RPDE              1   77.199 107.20
- MDVP.Shimmer      1   77.816 107.82
<none>                  75.952 107.95
- spread2           1   78.234 108.23
- D2                1   78.254 108.25
- MDVP.Jitter.Abs.  1   78.492 108.49
- MDVP.Fhi.Hz.      1   80.774 110.77
- MDVP.RAP          1   81.806 111.81
- PPE               1   82.399 112.40

Step:  AIC=106.13
status ~ MDVP.Fhi.Hz. + MDVP.Flo.Hz. + MDVP.Jitter... + MDVP.Jitter.Abs. + 
    MDVP.RAP + MDVP.Shimmer + Shimmer.APQ3 + Shimmer.APQ5 + Shimmer.DDA + 
    NHR + RPDE + spread2 + D2 + PPE

                   Df Deviance    AIC
- MDVP.Flo.Hz.      1   76.375 104.38
- MDVP.Jitter...    1   76.429 104.43
- NHR               1   76.456 104.46
- Shimmer.APQ3      1   76.813 104.81
- Shimmer.DDA       1   76.824 104.82
- Shimmer.APQ5      1   77.161 105.16
- RPDE              1   77.714 105.71
<none>                  76.128 106.13
- spread2           1   78.234 106.23
- D2                1   78.577 106.58
- MDVP.Jitter.Abs.  1   78.749 106.75
- MDVP.Shimmer      1   79.167 107.17
- MDVP.Fhi.Hz.      1   80.885 108.89
- MDVP.RAP          1   81.806 109.81
- PPE               1   82.407 110.41

Step:  AIC=104.37
status ~ MDVP.Fhi.Hz. + MDVP.Jitter... + MDVP.Jitter.Abs. + MDVP.RAP + 
    MDVP.Shimmer + Shimmer.APQ3 + Shimmer.APQ5 + Shimmer.DDA + 
    NHR + RPDE + spread2 + D2 + PPE

                   Df Deviance    AIC
- NHR               1   76.534 102.53
- MDVP.Jitter...    1   76.724 102.72
- Shimmer.APQ3      1   77.125 103.12
- Shimmer.DDA       1   77.138 103.14
- Shimmer.APQ5      1   77.661 103.66
- RPDE              1   77.892 103.89
<none>                  76.375 104.38
- spread2           1   78.384 104.38
- D2                1   78.684 104.68
- MDVP.Jitter.Abs.  1   78.949 104.95
- MDVP.Shimmer      1   80.165 106.17
- MDVP.RAP          1   81.991 107.99
- MDVP.Fhi.Hz.      1   82.481 108.48
- PPE               1   84.006 110.01

Step:  AIC=102.53
status ~ MDVP.Fhi.Hz. + MDVP.Jitter... + MDVP.Jitter.Abs. + MDVP.RAP + 
    MDVP.Shimmer + Shimmer.APQ3 + Shimmer.APQ5 + Shimmer.DDA + 
    RPDE + spread2 + D2 + PPE

                   Df Deviance    AIC
- MDVP.Jitter...    1   76.982 100.98
- Shimmer.APQ3      1   77.307 101.31
- Shimmer.DDA       1   77.320 101.32
- Shimmer.APQ5      1   78.069 102.07
- spread2           1   78.501 102.50
<none>                  76.534 102.53
- RPDE              1   78.739 102.74
- D2                1   78.802 102.80
- MDVP.Jitter.Abs.  1   79.066 103.07
- MDVP.Shimmer      1   80.471 104.47
- MDVP.RAP          1   81.994 105.99
- MDVP.Fhi.Hz.      1   82.706 106.71
- PPE               1   84.961 108.96

Step:  AIC=100.98
status ~ MDVP.Fhi.Hz. + MDVP.Jitter.Abs. + MDVP.RAP + MDVP.Shimmer + 
    Shimmer.APQ3 + Shimmer.APQ5 + Shimmer.DDA + RPDE + spread2 + 
    D2 + PPE

                   Df Deviance     AIC
- Shimmer.APQ3      1   77.756  99.756
- Shimmer.DDA       1   77.768  99.768
- Shimmer.APQ5      1   78.151 100.151
<none>                  76.982 100.982
- RPDE              1   79.063 101.063
- spread2           1   79.164 101.164
- D2                1   79.399 101.399
- MDVP.Shimmer      1   80.626 102.626
- MDVP.Jitter.Abs.  1   81.642 103.642
- MDVP.Fhi.Hz.      1   84.706 106.706
- PPE               1   85.077 107.077
- MDVP.RAP          1   85.102 107.102

Step:  AIC=99.76
status ~ MDVP.Fhi.Hz. + MDVP.Jitter.Abs. + MDVP.RAP + MDVP.Shimmer + 
    Shimmer.APQ5 + Shimmer.DDA + RPDE + spread2 + D2 + PPE

                   Df Deviance     AIC
- Shimmer.APQ5      1   78.995  98.995
<none>                  77.756  99.756
- RPDE              1   79.934  99.934
- spread2           1   80.217 100.217
- D2                1   80.262 100.262
- Shimmer.DDA       1   80.696 100.696
- MDVP.Shimmer      1   81.235 101.235
- MDVP.Jitter.Abs.  1   82.264 102.264
- MDVP.Fhi.Hz.      1   85.078 105.078
- MDVP.RAP          1   85.387 105.387
- PPE               1   86.245 106.245

Step:  AIC=98.99
status ~ MDVP.Fhi.Hz. + MDVP.Jitter.Abs. + MDVP.RAP + MDVP.Shimmer + 
    Shimmer.DDA + RPDE + spread2 + D2 + PPE

                   Df Deviance     AIC
- Shimmer.DDA       1   80.894  98.894
<none>                  78.995  98.995
- RPDE              1   81.373  99.373
- MDVP.Shimmer      1   81.443  99.443
- D2                1   81.853  99.853
- spread2           1   82.235 100.235
- MDVP.Jitter.Abs.  1   82.288 100.288
- MDVP.RAP          1   85.415 103.415
- MDVP.Fhi.Hz.      1   86.242 104.242
- PPE               1   86.390 104.390

Step:  AIC=98.89
status ~ MDVP.Fhi.Hz. + MDVP.Jitter.Abs. + MDVP.RAP + MDVP.Shimmer + 
    RPDE + spread2 + D2 + PPE

                   Df Deviance     AIC
<none>                  80.894  98.894
- RPDE              1   83.187  99.187
- MDVP.Shimmer      1   83.201  99.201
- MDVP.Jitter.Abs.  1   84.181 100.181
- spread2           1   84.286 100.286
- D2                1   84.721 100.721
- MDVP.RAP          1   86.905 102.905
- MDVP.Fhi.Hz.      1   87.919 103.919
- PPE               1   90.571 106.571

Call:  glm(formula = status ~ MDVP.Fhi.Hz. + MDVP.Jitter.Abs. + MDVP.RAP + 
    MDVP.Shimmer + RPDE + spread2 + D2 + PPE, family = "binomial", 
    data = train[, -1])

Coefficients:
     (Intercept)      MDVP.Fhi.Hz.  MDVP.Jitter.Abs.          MDVP.RAP      MDVP.Shimmer  
          3.0768           -0.9632           -1.3110            1.3788            0.8969  
            RPDE           spread2                D2               PPE  
         -0.5853            0.7618            0.9286            2.2344  

Degrees of Freedom: 155 Total (i.e. Null);  147 Residual
Null Deviance:      173.2 
Residual Deviance: 80.89    AIC: 98.89
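For a logistic regression fitted to 0/1 outcomes, AIC = residual deviance + 2k, where k is the number of estimated coefficients (predictors plus intercept). This gives a quick check on the trace above: the full model has 22 predictors plus an intercept, and the final model has 8 plus an intercept. The second half of the sketch confirms the identity against R's AIC() on simulated data (made up for illustration).

```r
aic_from_deviance <- function(deviance, k) deviance + 2 * k

aic_from_deviance(75.488, k = 23)  # 121.488 (shown as 121.49 for the full model)
aic_from_deviance(80.894, k = 9)   # 98.894, the final model

# Check the identity on a simulated binary-outcome glm
set.seed(3)
x1 <- rnorm(100); x2 <- rnorm(100)
y  <- rbinom(100, size = 1, prob = plogis(0.3 + x1 - x2))
fit <- glm(y ~ x1 + x2, family = "binomial")
all.equal(AIC(fit), aic_from_deviance(deviance(fit), k = length(coef(fit))))  # TRUE
```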

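Backward elimination stops when removing any remaining term would raise the AIC, so the selected model is the one with the lowest AIC (98.89). We refit it as our final model:

final.model=glm(status ~ MDVP.Fhi.Hz. + MDVP.Jitter.Abs. + MDVP.RAP + MDVP.Shimmer + RPDE + spread2 + D2 + PPE, family = "binomial", data = train[,-1])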

summary(final.model)

Call:
glm(formula = status ~ MDVP.Fhi.Hz. + MDVP.Jitter.Abs. + MDVP.RAP + 
    MDVP.Shimmer + RPDE + spread2 + D2 + PPE, family = "binomial", 
    data = train[, -1])

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-2.17705   0.00159   0.09886   0.34277   1.79517  

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)    
(Intercept)        3.0768     0.5942   5.178 2.24e-07 ***
MDVP.Fhi.Hz.      -0.9632     0.3555  -2.710  0.00673 ** 
MDVP.Jitter.Abs.  -1.3110     0.7168  -1.829  0.06742 .  
MDVP.RAP           1.3788     0.5911   2.333  0.01967 *  
MDVP.Shimmer       0.8969     0.6816   1.316  0.18823    
RPDE              -0.5853     0.3930  -1.489  0.13636    
spread2            0.7618     0.4268   1.785  0.07428 .  
D2                 0.9286     0.5034   1.845  0.06506 .  
PPE                2.2344     0.7706   2.899  0.00374 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 173.217  on 155  degrees of freedom
Residual deviance:  80.894  on 147  degrees of freedom
AIC: 98.894

Number of Fisher Scoring iterations: 7
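Because the predictors were standardized, each coefficient is the change in the log-odds of PD for a one-standard-deviation increase in that feature, holding the others fixed, and exponentiating it gives an odds ratio. The sketch below applies this to the estimates and standard errors printed above, copied by hand; with the fitted object, exp(coef(final.model)) and exp(confint.default(final.model)) give the same kind of values directly. The intercept row is the baseline odds, not a ratio.

```r
# Estimates and standard errors copied from summary(final.model) above
est <- c("(Intercept)" = 3.0768, MDVP.Fhi.Hz. = -0.9632, MDVP.Jitter.Abs. = -1.3110,
         MDVP.RAP = 1.3788, MDVP.Shimmer = 0.8969, RPDE = -0.5853,
         spread2 = 0.7618, D2 = 0.9286, PPE = 2.2344)
se  <- c(0.5942, 0.3555, 0.7168, 0.5911, 0.6816, 0.3930, 0.4268, 0.5034, 0.7706)

odds_ratio <- exp(est)
# Wald 95% interval on the odds-ratio scale: exp(estimate +/- 1.96 * SE)
or_ci <- cbind(lower = exp(est - 1.96 * se), upper = exp(est + 1.96 * se))
round(cbind(odds_ratio, or_ci), 2)
```

For example, exp(2.2344) is about 9.34, so under this model a one-SD increase in PPE multiplies the odds of PD by roughly 9, while the negative MDVP.Fhi.Hz. coefficient corresponds to an odds ratio below 1.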

Now let’s check the model’s accuracy, starting with the training data. Recordings with a predicted probability above 0.45 are classified as PD (1):

pre <- as.numeric(predict(final.model,type="response")>0.45)
confusionMatrix(table(pre,train$status))
Confusion Matrix and Statistics

   
pre   0   1
  0  26   6
  1  12 112
                                          
               Accuracy : 0.8846          
                 95% CI : (0.8238, 0.9302)
    No Information Rate : 0.7564          
    P-Value [Acc > NIR] : 4.658e-05       
                                          
                  Kappa : 0.6692          
 Mcnemar's Test P-Value : 0.2386          
                                          
            Sensitivity : 0.6842          
            Specificity : 0.9492          
         Pos Pred Value : 0.8125          
         Neg Pred Value : 0.9032          
             Prevalence : 0.2436          
         Detection Rate : 0.1667          
   Detection Prevalence : 0.2051          
      Balanced Accuracy : 0.8167          
                                          
       'Positive' Class : 0               
                                          

On the training data, the model classifies 138 of 156 recordings correctly (accuracy 88.46%), with a balanced accuracy of 81.67%. Good performance here is expected, though, because the model was fitted on these same data.
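caret’s confusionMatrix() takes the first factor level, 0 (healthy), as the positive class, so “sensitivity” here is the share of healthy recordings classified as healthy. All the reported metrics can be reproduced from the four cells of the table; conf_metrics() below is a hypothetical helper written for this check, not part of caret.

```r
# Rows = predicted class, columns = actual class, both in the order 0, 1
conf_metrics <- function(tab) {
  n  <- sum(tab)
  tp <- tab[1, 1]; fp <- tab[1, 2]   # predicted 0 (healthy)
  fn <- tab[2, 1]; tn <- tab[2, 2]   # predicted 1 (PD)
  sens <- tp / (tp + fn)             # healthy recordings identified
  spec <- tn / (tn + fp)             # PD recordings identified
  acc  <- (tp + tn) / n
  # Cohen's kappa: agreement beyond what the row/column totals give by chance
  p_exp <- sum(rowSums(tab) * colSums(tab)) / n^2
  c(accuracy = acc, sensitivity = sens, specificity = spec,
    balanced = (sens + spec) / 2, kappa = (acc - p_exp) / (1 - p_exp))
}

train_tab <- matrix(c(26, 12, 6, 112), nrow = 2)  # filled column by column
round(conf_metrics(train_tab), 4)
```

The same call on the test table, conf_metrics(matrix(c(6, 4, 3, 26), nrow = 2)), reproduces the test-set figures shown next.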

For a fairer estimate, we evaluate the model on the test set. Note that this uses a cutoff of 0.5, whereas 0.45 was used on the training data:

pre <- as.numeric(predict(final.model,newdata=test[,-1],type="response")>0.5)
confusionMatrix(table(pre,test$status))
Confusion Matrix and Statistics

   
pre  0  1
  0  6  3
  1  4 26
                                          
               Accuracy : 0.8205          
                 95% CI : (0.6647, 0.9246)
    No Information Rate : 0.7436          
    P-Value [Acc > NIR] : 0.1808          
                                          
                  Kappa : 0.5134          
 Mcnemar's Test P-Value : 1.0000          
                                          
            Sensitivity : 0.6000          
            Specificity : 0.8966          
         Pos Pred Value : 0.6667          
         Neg Pred Value : 0.8667          
             Prevalence : 0.2564          
         Detection Rate : 0.1538          
   Detection Prevalence : 0.2308          
      Balanced Accuracy : 0.7483          
                                          
       'Positive' Class : 0               
                                          

Conclusion: On the test set, the model classifies 32 of 39 recordings correctly (accuracy 82.05%), with a balanced accuracy of 74.83%.

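The model performs reasonably on both classes. Because caret treats class 0 (healthy) as the ‘positive’ class, the sensitivity of 60% means that 6 of 10 healthy recordings are identified correctly, and the specificity of 89.66% means that 26 of 29 PD recordings are. Kappa is 0.51, which indicates moderate agreement beyond chance. With only 39 test recordings, however, the 95% confidence interval for accuracy is wide (66.47% to 92.46%), and the accuracy is not significantly higher than the no-information rate of 74.36% (p = 0.18), so these estimates should be interpreted with caution.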
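The training and test evaluations use different cutoffs (0.45 and 0.5). If the cutoff is to be tuned, one option is to sweep candidate values on the training predictions and pick one, for example by balanced accuracy, before touching the test set. The sketch below does this on simulated probabilities and labels (all made up); in this analysis, probs would be predict(final.model, type = "response") and labels the 0/1 training status. balanced_acc() is a hypothetical helper.

```r
# Balanced accuracy of a cutoff, treating class 0 (healthy) as "positive"
balanced_acc <- function(probs, labels, cutoff) {
  pred <- as.integer(probs > cutoff)     # 1 = predicted PD
  sens <- mean(pred[labels == 0] == 0)   # healthy recordings classified healthy
  spec <- mean(pred[labels == 1] == 1)   # PD recordings classified PD
  (sens + spec) / 2
}

set.seed(11)
labels <- c(rep(0, 40), rep(1, 120))
probs  <- plogis(rnorm(160, mean = ifelse(labels == 1, 1.5, -1)))

cutoffs <- seq(0.3, 0.8, by = 0.05)
scores  <- sapply(cutoffs, function(k) balanced_acc(probs, labels, k))
best    <- cutoffs[which.max(scores)]
round(data.frame(cutoff = cutoffs, balanced_accuracy = scores), 3)
best
```

Choosing the cutoff on training data and then applying it once to the test set keeps the test estimate honest; picking it by looking at test results would make the reported test performance optimistic.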

