Parkinson’s disease (PD) is a neurodegenerative disorder that predominantly affects dopamine-producing neurons in a specific area of the brain called the substantia nigra.
Symptoms generally develop slowly over years. Their progression often differs from one person to another because the disease is so varied. People with PD may experience:
Now for the dataset, which comes from the UCI Machine Learning Repository. It is composed of a range of biomedical voice measurements from 31 people, 23 with Parkinson’s disease (PD). Each column in the table is a particular voice measure, and each row corresponds to one of 195 voice recordings from these individuals (“name” column). The main aim of the data is to discriminate healthy people from those with PD, according to the “status” column, which is set to 0 for healthy and 1 for PD.
Our problem is to predict whether an individual is healthy or has PD.
library(ggplot2)
library(reshape2)
library(ggpubr)
library(caTools)
Let’s load the dataset first.
park=read.table("https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/parkinsons.data", sep = ",", header = TRUE, stringsAsFactors = FALSE)
Okay. We have our dataset, called “park”. Let’s get a view and summary of it.
head(park,6)
dim(park)
[1] 195 24
str(park)
'data.frame': 195 obs. of 24 variables:
$ name : chr "phon_R01_S01_1" "phon_R01_S01_2" "phon_R01_S01_3" "phon_R01_S01_4" ...
$ MDVP.Fo.Hz. : num 120 122 117 117 116 ...
$ MDVP.Fhi.Hz. : num 157 149 131 138 142 ...
$ MDVP.Flo.Hz. : num 75 114 112 111 111 ...
$ MDVP.Jitter... : num 0.00784 0.00968 0.0105 0.00997 0.01284 ...
$ MDVP.Jitter.Abs.: num 0.00007 0.00008 0.00009 0.00009 0.00011 0.00008 0.00003 0.00003 0.00006 0.00006 ...
$ MDVP.RAP : num 0.0037 0.00465 0.00544 0.00502 0.00655 0.00463 0.00155 0.00144 0.00293 0.00268 ...
$ MDVP.PPQ : num 0.00554 0.00696 0.00781 0.00698 0.00908 0.0075 0.00202 0.00182 0.00332 0.00332 ...
$ Jitter.DDP : num 0.0111 0.0139 0.0163 0.015 0.0197 ...
$ MDVP.Shimmer : num 0.0437 0.0613 0.0523 0.0549 0.0643 ...
$ MDVP.Shimmer.dB.: num 0.426 0.626 0.482 0.517 0.584 0.456 0.14 0.134 0.191 0.255 ...
$ Shimmer.APQ3 : num 0.0218 0.0313 0.0276 0.0292 0.0349 ...
$ Shimmer.APQ5 : num 0.0313 0.0452 0.0386 0.0401 0.0483 ...
$ MDVP.APQ : num 0.0297 0.0437 0.0359 0.0377 0.0447 ...
$ Shimmer.DDA : num 0.0654 0.094 0.0827 0.0877 0.1047 ...
$ NHR : num 0.0221 0.0193 0.0131 0.0135 0.0177 ...
$ HNR : num 21 19.1 20.7 20.6 19.6 ...
$ status : int 1 1 1 1 1 1 1 1 1 1 ...
$ RPDE : num 0.415 0.458 0.43 0.435 0.417 ...
$ DFA : num 0.815 0.82 0.825 0.819 0.823 ...
$ spread1 : num -4.81 -4.08 -4.44 -4.12 -3.75 ...
$ spread2 : num 0.266 0.336 0.311 0.334 0.235 ...
$ D2 : num 2.3 2.49 2.34 2.41 2.33 ...
$ PPE : num 0.285 0.369 0.333 0.369 0.41 ...
park$status=as.factor(park$status)
summary(park)
name MDVP.Fo.Hz. MDVP.Fhi.Hz. MDVP.Flo.Hz. MDVP.Jitter...
Length:195 Min. : 88.33 Min. :102.1 Min. : 65.48 Min. :0.001680
Class :character 1st Qu.:117.57 1st Qu.:134.9 1st Qu.: 84.29 1st Qu.:0.003460
Mode :character Median :148.79 Median :175.8 Median :104.31 Median :0.004940
Mean :154.23 Mean :197.1 Mean :116.32 Mean :0.006220
3rd Qu.:182.77 3rd Qu.:224.2 3rd Qu.:140.02 3rd Qu.:0.007365
Max. :260.11 Max. :592.0 Max. :239.17 Max. :0.033160
MDVP.Jitter.Abs. MDVP.RAP MDVP.PPQ Jitter.DDP
Min. :7.000e-06 Min. :0.000680 Min. :0.000920 Min. :0.002040
1st Qu.:2.000e-05 1st Qu.:0.001660 1st Qu.:0.001860 1st Qu.:0.004985
Median :3.000e-05 Median :0.002500 Median :0.002690 Median :0.007490
Mean :4.396e-05 Mean :0.003306 Mean :0.003446 Mean :0.009920
3rd Qu.:6.000e-05 3rd Qu.:0.003835 3rd Qu.:0.003955 3rd Qu.:0.011505
Max. :2.600e-04 Max. :0.021440 Max. :0.019580 Max. :0.064330
MDVP.Shimmer MDVP.Shimmer.dB. Shimmer.APQ3 Shimmer.APQ5 MDVP.APQ
Min. :0.00954 Min. :0.0850 Min. :0.004550 Min. :0.00570 Min. :0.00719
1st Qu.:0.01650 1st Qu.:0.1485 1st Qu.:0.008245 1st Qu.:0.00958 1st Qu.:0.01308
Median :0.02297 Median :0.2210 Median :0.012790 Median :0.01347 Median :0.01826
Mean :0.02971 Mean :0.2823 Mean :0.015664 Mean :0.01788 Mean :0.02408
3rd Qu.:0.03789 3rd Qu.:0.3500 3rd Qu.:0.020265 3rd Qu.:0.02238 3rd Qu.:0.02940
Max. :0.11908 Max. :1.3020 Max. :0.056470 Max. :0.07940 Max. :0.13778
Shimmer.DDA NHR HNR status RPDE
Min. :0.01364 Min. :0.000650 Min. : 8.441 0: 48 Min. :0.2566
1st Qu.:0.02474 1st Qu.:0.005925 1st Qu.:19.198 1:147 1st Qu.:0.4213
Median :0.03836 Median :0.011660 Median :22.085 Median :0.4960
Mean :0.04699 Mean :0.024847 Mean :21.886 Mean :0.4985
3rd Qu.:0.06080 3rd Qu.:0.025640 3rd Qu.:25.076 3rd Qu.:0.5876
Max. :0.16942 Max. :0.314820 Max. :33.047 Max. :0.6852
DFA spread1 spread2 D2 PPE
Min. :0.5743 Min. :-7.965 Min. :0.006274 Min. :1.423 Min. :0.04454
1st Qu.:0.6748 1st Qu.:-6.450 1st Qu.:0.174350 1st Qu.:2.099 1st Qu.:0.13745
Median :0.7223 Median :-5.721 Median :0.218885 Median :2.362 Median :0.19405
Mean :0.7181 Mean :-5.684 Mean :0.226510 Mean :2.382 Mean :0.20655
3rd Qu.:0.7619 3rd Qu.:-5.046 3rd Qu.:0.279234 3rd Qu.:2.636 3rd Qu.:0.25298
Max. :0.8253 Max. :-2.434 Max. :0.450493 Max. :3.671 Max. :0.52737
We should also check the features for missing values. Let’s have a look.
na = colSums(is.na(park))
na
name MDVP.Fo.Hz. MDVP.Fhi.Hz. MDVP.Flo.Hz. MDVP.Jitter...
0 0 0 0 0
MDVP.Jitter.Abs. MDVP.RAP MDVP.PPQ Jitter.DDP MDVP.Shimmer
0 0 0 0 0
MDVP.Shimmer.dB. Shimmer.APQ3 Shimmer.APQ5 MDVP.APQ Shimmer.DDA
0 0 0 0 0
NHR HNR status RPDE DFA
0 0 0 0 0
spread1 spread2 D2 PPE
0 0 0 0
Great! No missing values.
So “status” is our variable of interest: we would like to predict the ‘status’ class of any individual. This is a classification problem, and we are going to use various classification algorithms for it.
Getting to know the data is an important part of any analysis, so we will do some exploratory data analysis (EDA) first. That is always helpful for understanding the data better.
First, let’s get a basic idea of each variable with respect to the ‘status’ variable. We are going to draw a box plot of each variable, split by status.
attach(park)
The following objects are masked from park (pos = 10):
D2, DFA, HNR, Jitter.DDP, MDVP.APQ, MDVP.Fhi.Hz., MDVP.Flo.Hz., MDVP.Fo.Hz.,
MDVP.Jitter..., MDVP.Jitter.Abs., MDVP.PPQ, MDVP.RAP, MDVP.Shimmer,
MDVP.Shimmer.dB., name, NHR, PPE, RPDE, Shimmer.APQ3, Shimmer.APQ5,
Shimmer.DDA, spread1, spread2, status
park.m=melt(park[,-1],id.vars = "status")
p <- ggplot(data = park.m, aes(x=variable, y=value)) +
geom_boxplot(aes(fill=status))
p + facet_wrap( ~ variable, scales="free")
Okay! Now we have a basic idea of how the variables are distributed across ‘status’. We also get an idea of the outliers in each variable.
names(park)
[1] "name" "MDVP.Fo.Hz." "MDVP.Fhi.Hz." "MDVP.Flo.Hz."
[5] "MDVP.Jitter..." "MDVP.Jitter.Abs." "MDVP.RAP" "MDVP.PPQ"
[9] "Jitter.DDP" "MDVP.Shimmer" "MDVP.Shimmer.dB." "Shimmer.APQ3"
[13] "Shimmer.APQ5" "MDVP.APQ" "Shimmer.DDA" "NHR"
[17] "HNR" "status" "RPDE" "DFA"
[21] "spread1" "spread2" "D2" "PPE"
We need to deal with these outliers. There are a good number of them in “MDVP.Fhi.Hz.”, “MDVP.Jitter…”, “MDVP.Jitter.Abs.”, “MDVP.RAP”, “MDVP.PPQ”, “Jitter.DDP” and “NHR”. We will replace them with the median of their variable, using a function called outlier_remove. It treats any value more than one IQR below the first quartile or more than one IQR above the third quartile as an outlier.
outlier_remove=function(x){
# first and third quartiles
quantiles <- quantile( x, c(.25, .75 ) )
iqr=IQR(x)
med=median(x)
# replace values below Q1 - IQR or above Q3 + IQR with the median
x[ x < quantiles[1]-iqr ] <- med
x[ x > quantiles[2]+ iqr] <- med
x
}
park$MDVP.Fhi.Hz.= outlier_remove(park$MDVP.Fhi.Hz.)
park$MDVP.Jitter...= outlier_remove(park$MDVP.Jitter...)
park$MDVP.Jitter.Abs.= outlier_remove(park$MDVP.Jitter.Abs.)
park$MDVP.RAP= outlier_remove(park$MDVP.RAP)
park$MDVP.PPQ= outlier_remove(park$MDVP.PPQ)
park$Jitter.DDP= outlier_remove(park$Jitter.DDP)
park$NHR= outlier_remove(park$NHR)
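To see what the replacement rule does, here is a small worked example on a made-up vector (the `toy` values are an assumption for illustration, not taken from the dataset). The function is copied from above so that the block runs on its own.

```r
# Copy of the outlier_remove function from above, so this block is self-contained
outlier_remove <- function(x) {
  quantiles <- quantile(x, c(.25, .75))
  iqr <- IQR(x)
  med <- median(x)
  x[x < quantiles[1] - iqr] <- med
  x[x > quantiles[2] + iqr] <- med
  x
}

# Made-up values with one obvious outlier (100)
toy <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

# For this vector: Q1 = 3.25, Q3 = 7.75, IQR = 4.5, median = 5.5,
# so the fences are 3.25 - 4.5 = -1.25 and 7.75 + 4.5 = 12.25
cleaned <- outlier_remove(toy)
print(cleaned)
# 100 lies above 12.25, so it is replaced by 5.5; every other value is unchanged
```

Note that the fences here are one IQR wide, which is stricter than the 1.5 × IQR rule that box plots use by default.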
Okay! Let’s redraw their boxplots and see whether any outliers remain.
So, no outliers remain. Next, we standardize every numeric feature (all columns except “name” and “status”):
park[,-c(1,18)]=scale(park[,-c(1,18)])
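The scale() call above centers each column on its mean and divides it by its standard deviation. A minimal sketch on a made-up data frame (the `toy` values are an assumption, not from the dataset) shows the effect:

```r
# Toy data frame with made-up values, for illustration only
toy <- data.frame(a = c(1, 2, 3, 4), b = c(10, 20, 30, 50))

# scale() returns a matrix with centered and rescaled columns
toy_scaled <- scale(toy)

# After scaling, every column has mean 0 and standard deviation 1
col_means <- colMeans(toy_scaled)
col_sds <- apply(toy_scaled, 2, sd)
print(round(col_means, 10))
print(col_sds)
```

One thing to keep in mind: in the analysis above the scaling uses the mean and standard deviation of all 195 rows, so the rows that later end up in the test set also contribute to those statistics.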
Now we are going to split our data into training and test data:
split=sample.split(park$status,SplitRatio = .80)
train=subset(park,split==T)
test=subset(park,split==F)
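The split above is random, so the rows that land in each set change from run to run unless a seed is set first. The sketch below shows, in base R only, roughly what a stratified 80/20 split does. The helper `stratified_split` and the seed value are assumptions made for illustration; this is not the internal code of sample.split.

```r
set.seed(42)  # arbitrary seed, chosen only to make the sketch reproducible

# Stand-in for park$status: 48 healthy (0) and 147 PD (1) recordings
status <- factor(rep(c(0, 1), times = c(48, 147)))

# Hypothetical helper: draw the same fraction of rows from every class
stratified_split <- function(y, ratio) {
  in_train <- logical(length(y))
  for (cls in levels(y)) {
    idx <- which(y == cls)
    n_train <- round(length(idx) * ratio)
    in_train[idx[sample.int(length(idx), n_train)]] <- TRUE
  }
  in_train
}

split_flag <- stratified_split(status, 0.80)

# Rows are the classes, columns say whether the row went to training
print(table(status, split_flag))
```

With these class sizes the training set gets 38 healthy and 118 PD rows, which matches the column totals of the training confusion matrix further down.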
Model fitting: Logistic regression is a technique that is well suited for examining the relationship between a categorical response variable and one or more categorical or continuous predictor variables. The model is generally presented in the following format, where $\beta$ refers to the parameters and $x$ represents the independent variables: $\log(\text{odds}) = \beta_0 + \beta_1 x_1 + \dots$. The log(odds), or logit, is defined by $\ln[p/(1-p)]$ and expresses the natural logarithm of the ratio between the probability that an event will occur, $p(Y=1)$, and the probability that it will not occur. We are usually concerned with the predicted probability of an event occurring, which is defined by $p = 1/(1+e^{-z})$, where $z = \beta_0 + \beta_1 x_1 + \dots$.
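The link between the log-odds and the probability can be checked numerically. In this small sketch the coefficients `beta0` and `beta1` and the predictor values `x1` are made-up numbers, chosen only for illustration:

```r
# Hypothetical coefficients and predictor values, for illustration only
beta0 <- -1.0
beta1 <- 2.0
x1 <- c(-2, 0, 0.5, 2)

# Linear predictor z = beta0 + beta1 * x1 (the log-odds)
z <- beta0 + beta1 * x1

# Probability from the logistic function p = 1 / (1 + exp(-z))
p_manual <- 1 / (1 + exp(-z))

# Base R's plogis() computes the same logistic function
p_builtin <- plogis(z)

# Going back: the log of the odds p / (1 - p) recovers z
log_odds <- log(p_manual / (1 - p_manual))

print(cbind(x1, z, p = p_manual, log_odds))
```

When z = 0 the probability is exactly 0.5, which is why 0.5 is the usual default threshold for turning a predicted probability into a class.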
library(caret)
model.lg=glm(data=train[,-1],status~.,family = "binomial")
summary(model.lg)
step(model.lg,direction="backward")
Start: AIC=121.49
status ~ MDVP.Fo.Hz. + MDVP.Fhi.Hz. + MDVP.Flo.Hz. + MDVP.Jitter... +
MDVP.Jitter.Abs. + MDVP.RAP + MDVP.PPQ + Jitter.DDP + MDVP.Shimmer +
MDVP.Shimmer.dB. + Shimmer.APQ3 + Shimmer.APQ5 + MDVP.APQ +
Shimmer.DDA + NHR + HNR + RPDE + DFA + spread1 + spread2 +
D2 + PPE
Df Deviance AIC
- spread1 1 75.507 119.51
- HNR 1 75.523 119.52
- MDVP.PPQ 1 75.528 119.53
- Jitter.DDP 1 75.536 119.54
- MDVP.RAP 1 75.542 119.54
- MDVP.Fo.Hz. 1 75.574 119.57
- MDVP.Shimmer.dB. 1 75.585 119.58
- MDVP.APQ 1 75.608 119.61
- NHR 1 75.649 119.65
- DFA 1 75.737 119.74
- MDVP.Flo.Hz. 1 75.846 119.85
- RPDE 1 75.928 119.93
- MDVP.Jitter... 1 76.035 120.03
- Shimmer.APQ3 1 76.180 120.18
- Shimmer.DDA 1 76.195 120.19
- PPE 1 76.615 120.61
- MDVP.Jitter.Abs. 1 76.784 120.78
- spread2 1 76.831 120.83
- Shimmer.APQ5 1 76.850 120.85
- MDVP.Shimmer 1 77.214 121.21
- MDVP.Fhi.Hz. 1 77.362 121.36
<none> 75.488 121.49
- D2 1 77.706 121.71
Step: AIC=119.51
status ~ MDVP.Fo.Hz. + MDVP.Fhi.Hz. + MDVP.Flo.Hz. + MDVP.Jitter... +
MDVP.Jitter.Abs. + MDVP.RAP + MDVP.PPQ + Jitter.DDP + MDVP.Shimmer +
MDVP.Shimmer.dB. + Shimmer.APQ3 + Shimmer.APQ5 + MDVP.APQ +
Shimmer.DDA + NHR + HNR + RPDE + DFA + spread2 + D2 + PPE
Df Deviance AIC
- MDVP.PPQ 1 75.540 117.54
- HNR 1 75.547 117.55
- Jitter.DDP 1 75.566 117.57
- MDVP.RAP 1 75.574 117.57
- MDVP.Fo.Hz. 1 75.590 117.59
- MDVP.Shimmer.dB. 1 75.604 117.60
- MDVP.APQ 1 75.630 117.63
- NHR 1 75.747 117.75
- DFA 1 75.766 117.77
- MDVP.Flo.Hz. 1 75.852 117.85
- RPDE 1 75.969 117.97
- MDVP.Jitter... 1 76.048 118.05
- Shimmer.APQ3 1 76.197 118.20
- Shimmer.DDA 1 76.211 118.21
- MDVP.Jitter.Abs. 1 76.817 118.82
- spread2 1 76.834 118.83
- Shimmer.APQ5 1 76.852 118.85
- MDVP.Shimmer 1 77.246 119.25
- MDVP.Fhi.Hz. 1 77.365 119.36
<none> 75.507 119.51
- D2 1 77.715 119.72
- PPE 1 79.179 121.18
Step: AIC=117.54
status ~ MDVP.Fo.Hz. + MDVP.Fhi.Hz. + MDVP.Flo.Hz. + MDVP.Jitter... +
MDVP.Jitter.Abs. + MDVP.RAP + Jitter.DDP + MDVP.Shimmer +
MDVP.Shimmer.dB. + Shimmer.APQ3 + Shimmer.APQ5 + MDVP.APQ +
Shimmer.DDA + NHR + HNR + RPDE + DFA + spread2 + D2 + PPE
Df Deviance AIC
- HNR 1 75.578 115.58
- MDVP.Fo.Hz. 1 75.615 115.61
- Jitter.DDP 1 75.627 115.63
- MDVP.RAP 1 75.637 115.64
- MDVP.Shimmer.dB. 1 75.656 115.66
- MDVP.APQ 1 75.700 115.70
- NHR 1 75.760 115.76
- DFA 1 75.837 115.84
- MDVP.Flo.Hz. 1 75.853 115.85
- RPDE 1 75.988 115.99
- MDVP.Jitter... 1 76.124 116.12
- Shimmer.APQ3 1 76.267 116.27
- Shimmer.DDA 1 76.283 116.28
- spread2 1 76.848 116.85
- Shimmer.APQ5 1 76.858 116.86
- MDVP.Jitter.Abs. 1 76.970 116.97
- MDVP.Fhi.Hz. 1 77.405 117.41
- MDVP.Shimmer 1 77.416 117.42
<none> 75.540 117.54
- D2 1 77.758 117.76
- PPE 1 81.451 121.45
Step: AIC=115.58
status ~ MDVP.Fo.Hz. + MDVP.Fhi.Hz. + MDVP.Flo.Hz. + MDVP.Jitter... +
MDVP.Jitter.Abs. + MDVP.RAP + Jitter.DDP + MDVP.Shimmer +
MDVP.Shimmer.dB. + Shimmer.APQ3 + Shimmer.APQ5 + MDVP.APQ +
Shimmer.DDA + NHR + RPDE + DFA + spread2 + D2 + PPE
Df Deviance AIC
- Jitter.DDP 1 75.641 113.64
- MDVP.Fo.Hz. 1 75.648 113.65
- MDVP.RAP 1 75.649 113.65
- MDVP.Shimmer.dB. 1 75.664 113.66
- MDVP.APQ 1 75.715 113.72
- NHR 1 75.760 113.76
- DFA 1 75.846 113.85
- MDVP.Flo.Hz. 1 75.863 113.86
- RPDE 1 75.993 113.99
- MDVP.Jitter... 1 76.124 114.12
- Shimmer.APQ3 1 76.268 114.27
- Shimmer.DDA 1 76.284 114.28
- spread2 1 76.853 114.85
- Shimmer.APQ5 1 76.859 114.86
- MDVP.Jitter.Abs. 1 77.218 115.22
- MDVP.Shimmer 1 77.416 115.42
- MDVP.Fhi.Hz. 1 77.439 115.44
<none> 75.578 115.58
- D2 1 77.992 115.99
- PPE 1 81.867 119.87
Step: AIC=113.64
status ~ MDVP.Fo.Hz. + MDVP.Fhi.Hz. + MDVP.Flo.Hz. + MDVP.Jitter... +
MDVP.Jitter.Abs. + MDVP.RAP + MDVP.Shimmer + MDVP.Shimmer.dB. +
Shimmer.APQ3 + Shimmer.APQ5 + MDVP.APQ + Shimmer.DDA + NHR +
RPDE + DFA + spread2 + D2 + PPE
Df Deviance AIC
- MDVP.Shimmer.dB. 1 75.701 111.70
- MDVP.Fo.Hz. 1 75.727 111.73
- MDVP.APQ 1 75.798 111.80
- NHR 1 75.818 111.82
- DFA 1 75.931 111.93
- MDVP.Flo.Hz. 1 75.946 111.95
- RPDE 1 76.007 112.01
- MDVP.Jitter... 1 76.195 112.19
- Shimmer.APQ3 1 76.305 112.31
- Shimmer.DDA 1 76.321 112.32
- Shimmer.APQ5 1 76.904 112.90
- spread2 1 76.974 112.97
- MDVP.Jitter.Abs. 1 77.258 113.26
- MDVP.Fhi.Hz. 1 77.551 113.55
- MDVP.Shimmer 1 77.634 113.63
<none> 75.641 113.64
- D2 1 78.018 114.02
- MDVP.RAP 1 81.460 117.46
- PPE 1 82.101 118.10
Step: AIC=111.7
status ~ MDVP.Fo.Hz. + MDVP.Fhi.Hz. + MDVP.Flo.Hz. + MDVP.Jitter... +
MDVP.Jitter.Abs. + MDVP.RAP + MDVP.Shimmer + Shimmer.APQ3 +
Shimmer.APQ5 + MDVP.APQ + Shimmer.DDA + NHR + RPDE + DFA +
spread2 + D2 + PPE
Df Deviance AIC
- MDVP.Fo.Hz. 1 75.755 109.75
- MDVP.APQ 1 75.835 109.83
- NHR 1 75.892 109.89
- DFA 1 75.943 109.94
- MDVP.Flo.Hz. 1 76.020 110.02
- MDVP.Jitter... 1 76.199 110.20
- Shimmer.APQ3 1 76.318 110.32
- Shimmer.DDA 1 76.334 110.33
- RPDE 1 76.359 110.36
- Shimmer.APQ5 1 77.003 111.00
- spread2 1 77.190 111.19
- MDVP.Jitter.Abs. 1 77.366 111.37
- MDVP.Fhi.Hz. 1 77.563 111.56
- MDVP.Shimmer 1 77.673 111.67
<none> 75.701 111.70
- D2 1 78.029 112.03
- MDVP.RAP 1 81.465 115.47
- PPE 1 82.369 116.37
Step: AIC=109.76
status ~ MDVP.Fhi.Hz. + MDVP.Flo.Hz. + MDVP.Jitter... + MDVP.Jitter.Abs. +
MDVP.RAP + MDVP.Shimmer + Shimmer.APQ3 + Shimmer.APQ5 + MDVP.APQ +
Shimmer.DDA + NHR + RPDE + DFA + spread2 + D2 + PPE
Df Deviance AIC
- DFA 1 75.952 107.95
- MDVP.APQ 1 75.957 107.96
- NHR 1 75.967 107.97
- MDVP.Flo.Hz. 1 76.021 108.02
- MDVP.Jitter... 1 76.199 108.20
- Shimmer.APQ3 1 76.335 108.33
- Shimmer.DDA 1 76.350 108.35
- RPDE 1 76.503 108.50
- Shimmer.APQ5 1 77.025 109.03
- spread2 1 77.359 109.36
<none> 75.755 109.75
- MDVP.Shimmer 1 77.814 109.81
- D2 1 78.161 110.16
- MDVP.Jitter.Abs. 1 78.468 110.47
- MDVP.Fhi.Hz. 1 78.779 110.78
- MDVP.RAP 1 81.667 113.67
- PPE 1 82.377 114.38
Step: AIC=107.95
status ~ MDVP.Fhi.Hz. + MDVP.Flo.Hz. + MDVP.Jitter... + MDVP.Jitter.Abs. +
MDVP.RAP + MDVP.Shimmer + Shimmer.APQ3 + Shimmer.APQ5 + MDVP.APQ +
Shimmer.DDA + NHR + RPDE + spread2 + D2 + PPE
Df Deviance AIC
- MDVP.APQ 1 76.128 106.13
- MDVP.Flo.Hz. 1 76.188 106.19
- MDVP.Jitter... 1 76.289 106.29
- NHR 1 76.339 106.34
- Shimmer.APQ3 1 76.608 106.61
- Shimmer.DDA 1 76.622 106.62
- Shimmer.APQ5 1 77.025 107.03
- RPDE 1 77.199 107.20
- MDVP.Shimmer 1 77.816 107.82
<none> 75.952 107.95
- spread2 1 78.234 108.23
- D2 1 78.254 108.25
- MDVP.Jitter.Abs. 1 78.492 108.49
- MDVP.Fhi.Hz. 1 80.774 110.77
- MDVP.RAP 1 81.806 111.81
- PPE 1 82.399 112.40
Step: AIC=106.13
status ~ MDVP.Fhi.Hz. + MDVP.Flo.Hz. + MDVP.Jitter... + MDVP.Jitter.Abs. +
MDVP.RAP + MDVP.Shimmer + Shimmer.APQ3 + Shimmer.APQ5 + Shimmer.DDA +
NHR + RPDE + spread2 + D2 + PPE
Df Deviance AIC
- MDVP.Flo.Hz. 1 76.375 104.38
- MDVP.Jitter... 1 76.429 104.43
- NHR 1 76.456 104.46
- Shimmer.APQ3 1 76.813 104.81
- Shimmer.DDA 1 76.824 104.82
- Shimmer.APQ5 1 77.161 105.16
- RPDE 1 77.714 105.71
<none> 76.128 106.13
- spread2 1 78.234 106.23
- D2 1 78.577 106.58
- MDVP.Jitter.Abs. 1 78.749 106.75
- MDVP.Shimmer 1 79.167 107.17
- MDVP.Fhi.Hz. 1 80.885 108.89
- MDVP.RAP 1 81.806 109.81
- PPE 1 82.407 110.41
Step: AIC=104.37
status ~ MDVP.Fhi.Hz. + MDVP.Jitter... + MDVP.Jitter.Abs. + MDVP.RAP +
MDVP.Shimmer + Shimmer.APQ3 + Shimmer.APQ5 + Shimmer.DDA +
NHR + RPDE + spread2 + D2 + PPE
Df Deviance AIC
- NHR 1 76.534 102.53
- MDVP.Jitter... 1 76.724 102.72
- Shimmer.APQ3 1 77.125 103.12
- Shimmer.DDA 1 77.138 103.14
- Shimmer.APQ5 1 77.661 103.66
- RPDE 1 77.892 103.89
<none> 76.375 104.38
- spread2 1 78.384 104.38
- D2 1 78.684 104.68
- MDVP.Jitter.Abs. 1 78.949 104.95
- MDVP.Shimmer 1 80.165 106.17
- MDVP.RAP 1 81.991 107.99
- MDVP.Fhi.Hz. 1 82.481 108.48
- PPE 1 84.006 110.01
Step: AIC=102.53
status ~ MDVP.Fhi.Hz. + MDVP.Jitter... + MDVP.Jitter.Abs. + MDVP.RAP +
MDVP.Shimmer + Shimmer.APQ3 + Shimmer.APQ5 + Shimmer.DDA +
RPDE + spread2 + D2 + PPE
Df Deviance AIC
- MDVP.Jitter... 1 76.982 100.98
- Shimmer.APQ3 1 77.307 101.31
- Shimmer.DDA 1 77.320 101.32
- Shimmer.APQ5 1 78.069 102.07
- spread2 1 78.501 102.50
<none> 76.534 102.53
- RPDE 1 78.739 102.74
- D2 1 78.802 102.80
- MDVP.Jitter.Abs. 1 79.066 103.07
- MDVP.Shimmer 1 80.471 104.47
- MDVP.RAP 1 81.994 105.99
- MDVP.Fhi.Hz. 1 82.706 106.71
- PPE 1 84.961 108.96
Step: AIC=100.98
status ~ MDVP.Fhi.Hz. + MDVP.Jitter.Abs. + MDVP.RAP + MDVP.Shimmer +
Shimmer.APQ3 + Shimmer.APQ5 + Shimmer.DDA + RPDE + spread2 +
D2 + PPE
Df Deviance AIC
- Shimmer.APQ3 1 77.756 99.756
- Shimmer.DDA 1 77.768 99.768
- Shimmer.APQ5 1 78.151 100.151
<none> 76.982 100.982
- RPDE 1 79.063 101.063
- spread2 1 79.164 101.164
- D2 1 79.399 101.399
- MDVP.Shimmer 1 80.626 102.626
- MDVP.Jitter.Abs. 1 81.642 103.642
- MDVP.Fhi.Hz. 1 84.706 106.706
- PPE 1 85.077 107.077
- MDVP.RAP 1 85.102 107.102
Step: AIC=99.76
status ~ MDVP.Fhi.Hz. + MDVP.Jitter.Abs. + MDVP.RAP + MDVP.Shimmer +
Shimmer.APQ5 + Shimmer.DDA + RPDE + spread2 + D2 + PPE
Df Deviance AIC
- Shimmer.APQ5 1 78.995 98.995
<none> 77.756 99.756
- RPDE 1 79.934 99.934
- spread2 1 80.217 100.217
- D2 1 80.262 100.262
- Shimmer.DDA 1 80.696 100.696
- MDVP.Shimmer 1 81.235 101.235
- MDVP.Jitter.Abs. 1 82.264 102.264
- MDVP.Fhi.Hz. 1 85.078 105.078
- MDVP.RAP 1 85.387 105.387
- PPE 1 86.245 106.245
Step: AIC=98.99
status ~ MDVP.Fhi.Hz. + MDVP.Jitter.Abs. + MDVP.RAP + MDVP.Shimmer +
Shimmer.DDA + RPDE + spread2 + D2 + PPE
Df Deviance AIC
- Shimmer.DDA 1 80.894 98.894
<none> 78.995 98.995
- RPDE 1 81.373 99.373
- MDVP.Shimmer 1 81.443 99.443
- D2 1 81.853 99.853
- spread2 1 82.235 100.235
- MDVP.Jitter.Abs. 1 82.288 100.288
- MDVP.RAP 1 85.415 103.415
- MDVP.Fhi.Hz. 1 86.242 104.242
- PPE 1 86.390 104.390
Step: AIC=98.89
status ~ MDVP.Fhi.Hz. + MDVP.Jitter.Abs. + MDVP.RAP + MDVP.Shimmer +
RPDE + spread2 + D2 + PPE
Df Deviance AIC
<none> 80.894 98.894
- RPDE 1 83.187 99.187
- MDVP.Shimmer 1 83.201 99.201
- MDVP.Jitter.Abs. 1 84.181 100.181
- spread2 1 84.286 100.286
- D2 1 84.721 100.721
- MDVP.RAP 1 86.905 102.905
- MDVP.Fhi.Hz. 1 87.919 103.919
- PPE 1 90.571 106.571
Call: glm(formula = status ~ MDVP.Fhi.Hz. + MDVP.Jitter.Abs. + MDVP.RAP +
MDVP.Shimmer + RPDE + spread2 + D2 + PPE, family = "binomial",
data = train[, -1])
Coefficients:
(Intercept) MDVP.Fhi.Hz. MDVP.Jitter.Abs. MDVP.RAP MDVP.Shimmer
3.0768 -0.9632 -1.3110 1.3788 0.8969
RPDE spread2 D2 PPE
-0.5853 0.7618 0.9286 2.2344
Degrees of Freedom: 155 Total (i.e. Null); 147 Residual
Null Deviance: 173.2
Residual Deviance: 80.89 AIC: 98.89
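The AIC values in the trace above can be checked by hand. For a binary (0/1) response the residual deviance equals minus twice the log-likelihood, so AIC = deviance + 2k, where k is the number of estimated coefficients including the intercept. A short sketch using the deviances printed above (the helper name `aic_from_deviance` is made up for this example):

```r
# Hypothetical helper: AIC of a binary-response logistic model from its deviance
aic_from_deviance <- function(deviance, n_params) {
  deviance + 2 * n_params
}

# Full model: 22 predictors + intercept = 23 coefficients, deviance 75.488
aic_full <- aic_from_deviance(75.488, 23)

# Final model: 8 predictors + intercept = 9 coefficients, deviance 80.894
aic_final <- aic_from_deviance(80.894, 9)

print(c(full = aic_full, final = aic_final))
```

This also shows why backward elimination keeps going: dropping a variable lowers the AIC whenever the deviance rises by less than 2.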
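We select the model with the lowest AIC (98.89), so our final model will be:
status ~ MDVP.Fhi.Hz. + MDVP.Jitter.Abs. + MDVP.RAP + MDVP.Shimmer + RPDE + spread2 + D2 + PPE
final.model=glm(status ~ MDVP.Fhi.Hz. + MDVP.Jitter.Abs. + MDVP.RAP + MDVP.Shimmer + RPDE + spread2 + D2 + PPE, family = "binomial", data = train[, -1])
summary(final.model)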
Call:
glm(formula = status ~ MDVP.Fhi.Hz. + MDVP.Jitter.Abs. + MDVP.RAP +
MDVP.Shimmer + RPDE + spread2 + D2 + PPE, family = "binomial",
data = train[, -1])
Deviance Residuals:
Min 1Q Median 3Q Max
-2.17705 0.00159 0.09886 0.34277 1.79517
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.0768 0.5942 5.178 2.24e-07 ***
MDVP.Fhi.Hz. -0.9632 0.3555 -2.710 0.00673 **
MDVP.Jitter.Abs. -1.3110 0.7168 -1.829 0.06742 .
MDVP.RAP 1.3788 0.5911 2.333 0.01967 *
MDVP.Shimmer 0.8969 0.6816 1.316 0.18823
RPDE -0.5853 0.3930 -1.489 0.13636
spread2 0.7618 0.4268 1.785 0.07428 .
D2 0.9286 0.5034 1.845 0.06506 .
PPE 2.2344 0.7706 2.899 0.00374 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 173.217 on 155 degrees of freedom
Residual deviance: 80.894 on 147 degrees of freedom
AIC: 98.894
Number of Fisher Scoring iterations: 7
We will check our model’s accuracy now. First we will use our training data. Let’s see:
pre <- as.numeric(predict(final.model,type="response")>0.45)
confusionMatrix(table(pre,train$status))
Confusion Matrix and Statistics
pre 0 1
0 26 6
1 12 112
Accuracy : 0.8846
95% CI : (0.8238, 0.9302)
No Information Rate : 0.7564
P-Value [Acc > NIR] : 4.658e-05
Kappa : 0.6692
Mcnemar's Test P-Value : 0.2386
Sensitivity : 0.6842
Specificity : 0.9492
Pos Pred Value : 0.8125
Neg Pred Value : 0.9032
Prevalence : 0.2436
Detection Rate : 0.1667
Detection Prevalence : 0.2051
Balanced Accuracy : 0.8167
'Positive' Class : 0
So, our balanced accuracy is 81.67%. Our model predicted 138 of the 156 observations correctly. But it is to be expected that the model does well on the training data, since it was built on that data.
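The figures in the caret output can be reproduced directly from the four cells of the confusion matrix. The sketch below re-enters the training counts by hand. Note that caret reports class 0 (healthy) as the 'Positive' class, so sensitivity here is the share of healthy recordings that were classified as healthy.

```r
# Training confusion matrix from above: rows = predicted, columns = actual
cm <- matrix(c(26, 12, 6, 112), nrow = 2,
             dimnames = list(pre = c("0", "1"), actual = c("0", "1")))

# Accuracy: correct predictions (the diagonal) over all predictions
accuracy <- sum(diag(cm)) / sum(cm)

# Sensitivity: actual 0s predicted as 0 (class 0 is the positive class)
sensitivity <- cm["0", "0"] / sum(cm[, "0"])

# Specificity: actual 1s predicted as 1
specificity <- cm["1", "1"] / sum(cm[, "1"])

# Balanced accuracy: the mean of sensitivity and specificity
balanced_accuracy <- (sensitivity + specificity) / 2

print(round(c(accuracy = accuracy, sensitivity = sensitivity,
              specificity = specificity,
              balanced_accuracy = balanced_accuracy), 4))
```

Balanced accuracy is the more informative number here because the classes are unequal: about three quarters of the recordings come from people with PD.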
So we will now test our model on the test data set. Note that the classification threshold used here is 0.5, whereas 0.45 was used on the training data.
pre <- as.numeric(predict(final.model,newdata=test[,-1],type="response")>0.5)
confusionMatrix(table(pre,test$status))
Confusion Matrix and Statistics
pre 0 1
0 6 3
1 4 26
Accuracy : 0.8205
95% CI : (0.6647, 0.9246)
No Information Rate : 0.7436
P-Value [Acc > NIR] : 0.1808
Kappa : 0.5134
Mcnemar's Test P-Value : 1.0000
Sensitivity : 0.6000
Specificity : 0.8966
Pos Pred Value : 0.6667
Neg Pred Value : 0.8667
Prevalence : 0.2564
Detection Rate : 0.1538
Detection Prevalence : 0.2308
Balanced Accuracy : 0.7483
'Positive' Class : 0
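The model also performs reasonably well on the test data: accuracy is 82.05%, specificity is 89.66%, and sensitivity is 60.00%. The Kappa statistic is 0.51, just above 0.5. With only 39 test recordings, though, the 95% confidence interval for accuracy is wide (0.66 to 0.92).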
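The Kappa statistic compares the observed agreement between predictions and true labels with the agreement expected by chance. As a minimal sketch, it can be recomputed from the test confusion matrix above (the counts are re-entered by hand; the variable names are made up for this example):

```r
# Test confusion matrix from above: rows = predicted, columns = actual
cm_test <- matrix(c(6, 4, 3, 26), nrow = 2,
                  dimnames = list(pre = c("0", "1"), actual = c("0", "1")))
n <- sum(cm_test)

# Observed agreement: share of predictions on the diagonal
p_observed <- sum(diag(cm_test)) / n

# Chance agreement: product of matching row and column totals, summed over classes
p_expected <- sum(rowSums(cm_test) * colSums(cm_test)) / n^2

# Cohen's kappa: agreement beyond chance, relative to the maximum possible
kappa <- (p_observed - p_expected) / (1 - p_expected)

print(round(c(p_observed = p_observed, p_expected = p_expected, kappa = kappa), 4))
```

Because a model that always predicts PD would already be right for about 74% of the test rows, Kappa gives a more cautious picture than raw accuracy.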