AIM: Develop a suitable machine learning model for the assigned dataset and compare the results of at least two machine learning algorithms.

Dataset : Parkinsons Disease Data Set

Link : https://archive.ics.uci.edu/ml/datasets/parkinsons

LITERATURE SURVEY OF PARKINSONS DATASET


Title: Parkinsons Disease Data Set
Abstract: Oxford Parkinson’s Disease Detection Dataset

Data Set Characteristics: Multivariate
Number of Instances: 197 (as listed on the UCI page; the data file itself contains 195 recordings)
Area: Life
Attribute Characteristics: Real
Number of Attributes: 23
Date Donated: 2008-06-26
Associated Tasks: Classification
Missing Values? N/A

Source:

The dataset was created by Max Little of the University of Oxford, in collaboration with the National Centre for Voice and Speech, Denver, Colorado, who recorded the speech signals. The original study published the feature extraction methods for general voice disorders.

Data Set Information:

This dataset is composed of a range of biomedical voice measurements from 31 people, 23 of whom have Parkinson’s disease (PD). Each column in the table is a particular voice measure, and each row corresponds to one of 195 voice recordings from these individuals (“name” column). The main aim of the data is to discriminate healthy people from those with PD, according to the “status” column, which is set to 0 for healthy and 1 for PD.

The data is in ASCII CSV format. Each row of the CSV file contains an instance corresponding to one voice recording. There are around six recordings per patient, and the name of the patient is given in the first column. For further information or to pass on comments, please contact Max Little (littlem ‘@’ robots.ox.ac.uk).
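
Because each patient contributes several recordings, it can help to recover a per-patient identifier from the “name” column. The following is a minimal sketch on made-up names; the pattern "phon_R01_S<subject>_<take>" and the variable names are assumptions for illustration, not something confirmed by this report.

```r
# Hypothetical recording names; the "phon_R01_S<subject>_<take>" pattern is an assumption.
rec_names <- c("phon_R01_S01_1", "phon_R01_S01_2", "phon_R01_S01_3",
               "phon_R01_S02_1", "phon_R01_S02_2",
               "phon_R01_S04_1")

# Drop the trailing "_<take>" suffix so that all recordings of one person share an ID.
subject_id <- sub("_[0-9]+$", "", rec_names)

# Count the recordings available for each subject.
recordings_per_subject <- table(subject_id)
print(recordings_per_subject)
```

With the real file, rec_names would presumably be the name column of the imported data frame. Recordings of the same person are likely to be similar, so a row-wise random split can place the same patient in both the training and the test set.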

ALGORITHM 1

KNN (K-Nearest Neighbours)

importing dataset from the link

my_url="https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/parkinsons.data"

reading the comma-separated file into dataset2, then reordering the columns so that status (column 18) becomes the last column

dataset2=read.csv(my_url,header=TRUE,sep=",")
dataset=dataset2[, c(1:17,19:24,18)]
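
The positional reorder above depends on status being column 18. A name-based variant is sketched below on a toy data frame (the columns f1 and f2 are placeholders, not columns of the real file); it should give the same layout without hard-coded positions.

```r
# Toy data frame standing in for dataset2; f1 and f2 are placeholder features.
toy <- data.frame(name = c("a", "b"),
                  f1 = c(1, 2),
                  status = c(0, 1),
                  f2 = c(3, 4))

# Keep every column except "status" in its current order, then append "status".
reordered <- toy[, c(setdiff(names(toy), "status"), "status")]
print(names(reordered))
```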

removing column 1 (name) and converting status to a factor

dataset=dataset[,-1]
dataset$status=factor(dataset$status,levels=c(0,1))
library(caTools)
set.seed(123)

splitting dataset into train_set and test_set

split=sample.split(Y=dataset$status,SplitRatio=2/3)
train_set=subset(x=dataset,split==T)
test_set=subset(x=dataset,split==F)
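
sample.split() keeps the class ratio of status roughly the same in both parts. The sketch below is a rough base-R equivalent of that idea, not the caTools implementation; the helper stratified_split is hypothetical, and the class counts (48 healthy, 147 PD) are the ones reported by summary(dataset) in this report.

```r
set.seed(123)

# Labels with the class counts reported for this dataset: 48 healthy (0), 147 PD (1).
status <- factor(c(rep(0, 48), rep(1, 147)), levels = c(0, 1))

# Hypothetical helper: sample the requested share of rows separately within each class.
stratified_split <- function(y, ratio) {
  in_train <- logical(length(y))
  for (lev in levels(y)) {
    idx <- which(y == lev)
    n_train <- round(length(idx) * ratio)
    in_train[idx[sample.int(length(idx), n_train)]] <- TRUE
  }
  in_train
}

split <- stratified_split(status, 2/3)
print(table(status, split))
```

With a ratio of 2/3 this yields 130 training rows (32 healthy, 98 PD) and 65 test rows.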

feature scaling

train_set[-23]=scale(train_set[-23])
test_set[-23]=scale(test_set[-23])
library(class)
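
The scaling step above standardises the test set with its own means and standard deviations. A common alternative, sketched here on random toy matrices (train_x and test_x are placeholders for the feature columns), is to reuse the training statistics so that both sets go through the same transformation. This is a suggestion, not what produced the results in this report.

```r
set.seed(1)

# Toy numeric features standing in for train_set[-23] and test_set[-23].
train_x <- matrix(rnorm(40, mean = 10, sd = 2), ncol = 2)
test_x  <- matrix(rnorm(10, mean = 10, sd = 2), ncol = 2)

# Standardise the training features; scale() stores the statistics as attributes.
train_scaled <- scale(train_x)

# Apply the training means and standard deviations to the test features.
test_scaled <- scale(test_x,
                     center = attr(train_scaled, "scaled:center"),
                     scale  = attr(train_scaled, "scaled:scale"))
```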

prediction

y_pred=knn(train_set[,-23],test=test_set[,-23],cl=train_set[,23],k=5)
y_pred
##  [1] 1 1 1 0 1 1 0 1 1 1 1 1 0 0 1 1 1 1 0 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1
## [39] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 0 0 0 1 1 1 1 1 1
## Levels: 0 1
cm=table(test_set[,23],y_pred)
cm
##    y_pred
##      0  1
##   0 10  6
##   1  3 46
accuracy=sum(diag(cm))/sum(cm)
accuracy
## [1] 0.8615385
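
Accuracy alone hides how the errors are distributed between the two classes. As a small sketch, the confusion matrix printed above can be re-entered by hand to derive sensitivity, specificity and the majority-class baseline; the variable names are illustrative.

```r
# KNN confusion matrix from the output above (rows = actual, columns = predicted).
cm <- matrix(c(10, 3, 6, 46), nrow = 2,
             dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))

accuracy    <- sum(diag(cm)) / sum(cm)        # share of correct predictions
sensitivity <- cm["1", "1"] / sum(cm["1", ])  # PD recordings correctly flagged
specificity <- cm["0", "0"] / sum(cm["0", ])  # healthy recordings correctly cleared
baseline    <- max(rowSums(cm)) / sum(cm)     # accuracy of always predicting the larger class

print(round(c(accuracy = accuracy, sensitivity = sensitivity,
              specificity = specificity, baseline = baseline), 4))
```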

Summarization

summary(dataset)
##   MDVP.Fo.Hz.      MDVP.Fhi.Hz.    MDVP.Flo.Hz.    MDVP.Jitter...    
##  Min.   : 88.33   Min.   :102.1   Min.   : 65.48   Min.   :0.001680  
##  1st Qu.:117.57   1st Qu.:134.9   1st Qu.: 84.29   1st Qu.:0.003460  
##  Median :148.79   Median :175.8   Median :104.31   Median :0.004940  
##  Mean   :154.23   Mean   :197.1   Mean   :116.32   Mean   :0.006220  
##  3rd Qu.:182.77   3rd Qu.:224.2   3rd Qu.:140.02   3rd Qu.:0.007365  
##  Max.   :260.11   Max.   :592.0   Max.   :239.17   Max.   :0.033160  
##  MDVP.Jitter.Abs.       MDVP.RAP           MDVP.PPQ          Jitter.DDP      
##  Min.   :7.000e-06   Min.   :0.000680   Min.   :0.000920   Min.   :0.002040  
##  1st Qu.:2.000e-05   1st Qu.:0.001660   1st Qu.:0.001860   1st Qu.:0.004985  
##  Median :3.000e-05   Median :0.002500   Median :0.002690   Median :0.007490  
##  Mean   :4.396e-05   Mean   :0.003306   Mean   :0.003446   Mean   :0.009920  
##  3rd Qu.:6.000e-05   3rd Qu.:0.003835   3rd Qu.:0.003955   3rd Qu.:0.011505  
##  Max.   :2.600e-04   Max.   :0.021440   Max.   :0.019580   Max.   :0.064330  
##   MDVP.Shimmer     MDVP.Shimmer.dB.  Shimmer.APQ3       Shimmer.APQ5    
##  Min.   :0.00954   Min.   :0.0850   Min.   :0.004550   Min.   :0.00570  
##  1st Qu.:0.01650   1st Qu.:0.1485   1st Qu.:0.008245   1st Qu.:0.00958  
##  Median :0.02297   Median :0.2210   Median :0.012790   Median :0.01347  
##  Mean   :0.02971   Mean   :0.2823   Mean   :0.015664   Mean   :0.01788  
##  3rd Qu.:0.03789   3rd Qu.:0.3500   3rd Qu.:0.020265   3rd Qu.:0.02238  
##  Max.   :0.11908   Max.   :1.3020   Max.   :0.056470   Max.   :0.07940  
##     MDVP.APQ        Shimmer.DDA           NHR                HNR        
##  Min.   :0.00719   Min.   :0.01364   Min.   :0.000650   Min.   : 8.441  
##  1st Qu.:0.01308   1st Qu.:0.02474   1st Qu.:0.005925   1st Qu.:19.198  
##  Median :0.01826   Median :0.03836   Median :0.011660   Median :22.085  
##  Mean   :0.02408   Mean   :0.04699   Mean   :0.024847   Mean   :21.886  
##  3rd Qu.:0.02940   3rd Qu.:0.06080   3rd Qu.:0.025640   3rd Qu.:25.076  
##  Max.   :0.13778   Max.   :0.16942   Max.   :0.314820   Max.   :33.047  
##       RPDE             DFA            spread1          spread2        
##  Min.   :0.2566   Min.   :0.5743   Min.   :-7.965   Min.   :0.006274  
##  1st Qu.:0.4213   1st Qu.:0.6748   1st Qu.:-6.450   1st Qu.:0.174350  
##  Median :0.4960   Median :0.7223   Median :-5.721   Median :0.218885  
##  Mean   :0.4985   Mean   :0.7181   Mean   :-5.684   Mean   :0.226510  
##  3rd Qu.:0.5876   3rd Qu.:0.7619   3rd Qu.:-5.046   3rd Qu.:0.279234  
##  Max.   :0.6852   Max.   :0.8253   Max.   :-2.434   Max.   :0.450493  
##        D2             PPE          status 
##  Min.   :1.423   Min.   :0.04454   0: 48  
##  1st Qu.:2.099   1st Qu.:0.13745   1:147  
##  Median :2.362   Median :0.19405          
##  Mean   :2.382   Mean   :0.20655          
##  3rd Qu.:2.636   3rd Qu.:0.25298          
##  Max.   :3.671   Max.   :0.52737
summary(y_pred)
##  0  1 
## 13 52
summary(cm)
## Number of cases in table: 65 
## Number of factors: 2 
## Test for independence of all factors:
##  Chisq = 23.96, df = 1, p-value = 9.833e-07
##  Chi-squared approximation may be incorrect
summary(accuracy)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.8615  0.8615  0.8615  0.8615  0.8615  0.8615
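
The note "Chi-squared approximation may be incorrect" in summary(cm) typically appears when some expected cell count is small. The sketch below re-enters the KNN confusion matrix by hand, computes the expected counts under independence, and runs Fisher's exact test from the stats package as one possible alternative for a small 2x2 table.

```r
# KNN confusion matrix from the output above (rows = actual, columns = predicted).
cm <- matrix(c(10, 3, 6, 46), nrow = 2,
             dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))

# Expected counts under independence: row total * column total / grand total.
expected <- outer(rowSums(cm), colSums(cm)) / sum(cm)
print(expected)

# Fisher's exact test does not rely on the large-sample approximation.
fisher_p <- fisher.test(cm)$p.value
print(fisher_p)
```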

Visualization (KNN has no model to plot, so the predicted class counts are shown as a bar plot)

plot(y_pred,main="y_pred ANNEM SHIVAJI(20MIC0091)")

ANALYSIS:

We discriminated healthy people from those with PD according to the “status” column (0 for healthy, 1 for PD) using the KNN algorithm. We split the dataset so that 2/3 of the rows form the training set and the remaining 1/3 the test set, applied feature scaling, and predicted with the knn() function from the class library using k = 5. The accuracy of the model is 0.8615385 (56 of 65 test recordings correct). Since 49 of the 65 test recordings are PD cases, always predicting PD would already score 49/65 ≈ 0.754, so KNN improves on that baseline. Most of its errors are healthy recordings classified as PD (6 of 16). KNN is therefore a reasonable algorithm for this dataset.
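
To make the prediction step concrete, here is a hand-written sketch of the nearest-neighbour vote on toy points. It is not the implementation inside class::knn(); the helper knn_predict is hypothetical, it uses Euclidean distance, and it resolves tied votes by taking the first class level, whereas knn() breaks ties at random.

```r
# Hypothetical helper: classify each test row by majority vote among its k nearest training rows.
knn_predict <- function(train_x, train_y, test_x, k) {
  apply(test_x, 1, function(row) {
    # Euclidean distance from this test row to every training row.
    d <- sqrt(rowSums(sweep(train_x, 2, row)^2))
    nearest <- order(d)[seq_len(k)]
    votes <- table(train_y[nearest])
    names(votes)[which.max(votes)]  # the first level wins a tie
  })
}

# Toy data: two well-separated clusters.
train_x <- rbind(c(0, 0), c(0, 1), c(1, 0), c(5, 5), c(5, 6), c(6, 5))
train_y <- factor(c("0", "0", "0", "1", "1", "1"))
test_x  <- rbind(c(0.5, 0.5), c(5.5, 5.5))

pred <- knn_predict(train_x, train_y, test_x, k = 3)
print(pred)
```

Because the vote depends on distances, features on larger scales would dominate without the scaling step.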

ALGORITHM 2

Naive Bayes

importing the dataset

#importing the dataset
my_url="https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/parkinsons.data"
dataset2=read.csv(my_url,header=TRUE,sep=",")
View(dataset2)
dataset=dataset2[, c(1:17,19:24,18)]
View(dataset)
dataset=dataset[2:24]
dataset$status=factor(dataset$status,levels=c(0,1))
library(caTools)
set.seed(123)

splitting the dataset into test_set and train_set

split=sample.split(Y=dataset$status,SplitRatio =.75)
train_set=subset(x=dataset,split==T)
test_set=subset(x=dataset,split==F)

feature scaling

train_set[-23]=scale(train_set[-23])
test_set[-23]=scale(test_set[-23])
library(e1071)
classifier=naiveBayes(x=train_set[,-23],y=train_set$status)

prediction

y_pred=predict(object=classifier,newdata=test_set[,-23])
cm=table(test_set$status,y_pred)
cm
##    y_pred
##      0  1
##   0  8  4
##   1 12 25
accuracy=sum(diag(cm))/sum(cm)
accuracy
## [1] 0.6734694

Summarization

summary(classifier)
##           Length Class  Mode     
## apriori    2     table  numeric  
## tables    22     -none- list     
## levels     2     -none- character
## isnumeric 22     -none- logical  
## call       3     -none- call
summary(y_pred)
##  0  1 
## 20 29
summary(cm)
## Number of cases in table: 49 
## Number of factors: 2 
## Test for independence of all factors:
##  Chisq = 4.396, df = 1, p-value = 0.03602
##  Chi-squared approximation may be incorrect
summary(accuracy)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.6735  0.6735  0.6735  0.6735  0.6735  0.6735

Visualization (Naive Bayes has no model plot here, so the predicted class counts are shown as a bar plot)

plot(y_pred,main="y_pred ANNEM SHIVAJI(20MIC0091)")

ANALYSIS:

We discriminated healthy people from those with PD according to the “status” column (0 for healthy, 1 for PD) using the Naive Bayes algorithm. We split the dataset so that 3/4 of the rows form the training set and the remaining 1/4 the test set, applied feature scaling, fitted the classifier with the naiveBayes() function from the e1071 library, and predicted with predict(). The accuracy of the model is 0.6734694 (33 of 49 test recordings correct). This is above 0.5 but below the 37/49 ≈ 0.755 obtained by always predicting PD, so on this split Naive Bayes is clearly weaker than KNN. The two models were evaluated on different splits (2/3 vs 3/4), so the comparison is only approximate.
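
For numeric predictors, naiveBayes() in e1071 assumes a Gaussian distribution within each class. The sketch below hand-rolls that idea on toy data to show what the classifier stores and how it scores a new row. The helpers gnb_fit and gnb_predict are hypothetical, and this is not the e1071 code.

```r
# Hypothetical helper: per-class feature means, standard deviations and prior probability.
gnb_fit <- function(x, y) {
  lapply(split(as.data.frame(x), y), function(g) {
    list(mean = colMeans(g), sd = apply(g, 2, sd), prior = nrow(g) / nrow(x))
  })
}

# Hypothetical helper: pick the class with the largest log prior plus summed log densities.
gnb_predict <- function(model, newx) {
  scores <- sapply(model, function(m) {
    apply(newx, 1, function(row) {
      log(m$prior) + sum(dnorm(row, m$mean, m$sd, log = TRUE))
    })
  })
  scores <- matrix(scores, nrow = nrow(newx), dimnames = list(NULL, names(model)))
  colnames(scores)[max.col(scores, ties.method = "first")]
}

# Toy data: two well-separated classes with two features each.
x <- rbind(c(0, 0), c(1, 0), c(0, 1), c(1, 1),
           c(5, 5), c(6, 5), c(5, 6), c(6, 6))
y <- factor(rep(c("0", "1"), each = 4))

model <- gnb_fit(x, y)
pred <- gnb_predict(model, rbind(c(0.5, 0.5), c(5.5, 5.5)))
print(pred)
```

The independence assumption is visible in the sum of per-feature log densities. Many of the voice measures here are closely related (the jitter and shimmer families, for example), which may be one reason Naive Bayes does less well on this data.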

ALGORITHM 3

Five decision trees in total, each fitted on a different set of predictors:

Decision Tree-1

importing the dataset

my_url="https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/parkinsons.data"
dataset2=read.csv(my_url,header=TRUE,sep=",")
dataset=dataset2[, c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,24,19,20,21,22,23,18)]
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Cleaning dataset

clean_dataset=dataset%>%select(MDVP.Fo.Hz.:status)%>%mutate(status=factor(status,levels=c(0,1),labels=c("No","Yes")))%>%na.omit()
library(caTools)
set.seed(123)

Splitting dataset

split=sample.split(Y=clean_dataset$status,SplitRatio=2/3)
train_set=subset(x=clean_dataset,split==T)
test_set=subset(x=clean_dataset,split==F)
library(rpart)
fit=rpart(formula=status~(HNR+DFA+NHR+RPDE+MDVP.Fo.Hz.+MDVP.Fhi.Hz.+MDVP.Flo.Hz.+MDVP.Jitter...+
           MDVP.Jitter.Abs.+MDVP.RAP+MDVP.PPQ+Jitter.DDP+MDVP.Shimmer+
            MDVP.Shimmer.dB.+Shimmer.APQ3+Shimmer.APQ5+MDVP.APQ+Shimmer.DDA+PPE+spread1+spread2+D2)
            ,data=train_set,method="class")
library(rpart.plot)
rpart.plot(fit)

Prediction

predict_unseen=predict(object=fit,newdata=test_set,type="class")
cm=table(test_set$status,predict_unseen)
cm
##      predict_unseen
##       No Yes
##   No  12   4
##   Yes  4  45
accuracy=sum(diag(cm))/sum(cm)
accuracy
## [1] 0.8769231

Summarization

summary(dataset)
##      name            MDVP.Fo.Hz.      MDVP.Fhi.Hz.    MDVP.Flo.Hz.   
##  Length:195         Min.   : 88.33   Min.   :102.1   Min.   : 65.48  
##  Class :character   1st Qu.:117.57   1st Qu.:134.9   1st Qu.: 84.29  
##  Mode  :character   Median :148.79   Median :175.8   Median :104.31  
##                     Mean   :154.23   Mean   :197.1   Mean   :116.32  
##                     3rd Qu.:182.77   3rd Qu.:224.2   3rd Qu.:140.02  
##                     Max.   :260.11   Max.   :592.0   Max.   :239.17  
##  MDVP.Jitter...     MDVP.Jitter.Abs.       MDVP.RAP           MDVP.PPQ       
##  Min.   :0.001680   Min.   :7.000e-06   Min.   :0.000680   Min.   :0.000920  
##  1st Qu.:0.003460   1st Qu.:2.000e-05   1st Qu.:0.001660   1st Qu.:0.001860  
##  Median :0.004940   Median :3.000e-05   Median :0.002500   Median :0.002690  
##  Mean   :0.006220   Mean   :4.396e-05   Mean   :0.003306   Mean   :0.003446  
##  3rd Qu.:0.007365   3rd Qu.:6.000e-05   3rd Qu.:0.003835   3rd Qu.:0.003955  
##  Max.   :0.033160   Max.   :2.600e-04   Max.   :0.021440   Max.   :0.019580  
##    Jitter.DDP        MDVP.Shimmer     MDVP.Shimmer.dB.  Shimmer.APQ3     
##  Min.   :0.002040   Min.   :0.00954   Min.   :0.0850   Min.   :0.004550  
##  1st Qu.:0.004985   1st Qu.:0.01650   1st Qu.:0.1485   1st Qu.:0.008245  
##  Median :0.007490   Median :0.02297   Median :0.2210   Median :0.012790  
##  Mean   :0.009920   Mean   :0.02971   Mean   :0.2823   Mean   :0.015664  
##  3rd Qu.:0.011505   3rd Qu.:0.03789   3rd Qu.:0.3500   3rd Qu.:0.020265  
##  Max.   :0.064330   Max.   :0.11908   Max.   :1.3020   Max.   :0.056470  
##   Shimmer.APQ5        MDVP.APQ        Shimmer.DDA           NHR          
##  Min.   :0.00570   Min.   :0.00719   Min.   :0.01364   Min.   :0.000650  
##  1st Qu.:0.00958   1st Qu.:0.01308   1st Qu.:0.02474   1st Qu.:0.005925  
##  Median :0.01347   Median :0.01826   Median :0.03836   Median :0.011660  
##  Mean   :0.01788   Mean   :0.02408   Mean   :0.04699   Mean   :0.024847  
##  3rd Qu.:0.02238   3rd Qu.:0.02940   3rd Qu.:0.06080   3rd Qu.:0.025640  
##  Max.   :0.07940   Max.   :0.13778   Max.   :0.16942   Max.   :0.314820  
##       HNR              PPE               RPDE             DFA        
##  Min.   : 8.441   Min.   :0.04454   Min.   :0.2566   Min.   :0.5743  
##  1st Qu.:19.198   1st Qu.:0.13745   1st Qu.:0.4213   1st Qu.:0.6748  
##  Median :22.085   Median :0.19405   Median :0.4960   Median :0.7223  
##  Mean   :21.886   Mean   :0.20655   Mean   :0.4985   Mean   :0.7181  
##  3rd Qu.:25.076   3rd Qu.:0.25298   3rd Qu.:0.5876   3rd Qu.:0.7619  
##  Max.   :33.047   Max.   :0.52737   Max.   :0.6852   Max.   :0.8253  
##     spread1          spread2               D2            status      
##  Min.   :-7.965   Min.   :0.006274   Min.   :1.423   Min.   :0.0000  
##  1st Qu.:-6.450   1st Qu.:0.174350   1st Qu.:2.099   1st Qu.:1.0000  
##  Median :-5.721   Median :0.218885   Median :2.362   Median :1.0000  
##  Mean   :-5.684   Mean   :0.226510   Mean   :2.382   Mean   :0.7538  
##  3rd Qu.:-5.046   3rd Qu.:0.279234   3rd Qu.:2.636   3rd Qu.:1.0000  
##  Max.   :-2.434   Max.   :0.450493   Max.   :3.671   Max.   :1.0000
summary(clean_dataset)
##   MDVP.Fo.Hz.      MDVP.Fhi.Hz.    MDVP.Flo.Hz.    MDVP.Jitter...    
##  Min.   : 88.33   Min.   :102.1   Min.   : 65.48   Min.   :0.001680  
##  1st Qu.:117.57   1st Qu.:134.9   1st Qu.: 84.29   1st Qu.:0.003460  
##  Median :148.79   Median :175.8   Median :104.31   Median :0.004940  
##  Mean   :154.23   Mean   :197.1   Mean   :116.32   Mean   :0.006220  
##  3rd Qu.:182.77   3rd Qu.:224.2   3rd Qu.:140.02   3rd Qu.:0.007365  
##  Max.   :260.11   Max.   :592.0   Max.   :239.17   Max.   :0.033160  
##  MDVP.Jitter.Abs.       MDVP.RAP           MDVP.PPQ          Jitter.DDP      
##  Min.   :7.000e-06   Min.   :0.000680   Min.   :0.000920   Min.   :0.002040  
##  1st Qu.:2.000e-05   1st Qu.:0.001660   1st Qu.:0.001860   1st Qu.:0.004985  
##  Median :3.000e-05   Median :0.002500   Median :0.002690   Median :0.007490  
##  Mean   :4.396e-05   Mean   :0.003306   Mean   :0.003446   Mean   :0.009920  
##  3rd Qu.:6.000e-05   3rd Qu.:0.003835   3rd Qu.:0.003955   3rd Qu.:0.011505  
##  Max.   :2.600e-04   Max.   :0.021440   Max.   :0.019580   Max.   :0.064330  
##   MDVP.Shimmer     MDVP.Shimmer.dB.  Shimmer.APQ3       Shimmer.APQ5    
##  Min.   :0.00954   Min.   :0.0850   Min.   :0.004550   Min.   :0.00570  
##  1st Qu.:0.01650   1st Qu.:0.1485   1st Qu.:0.008245   1st Qu.:0.00958  
##  Median :0.02297   Median :0.2210   Median :0.012790   Median :0.01347  
##  Mean   :0.02971   Mean   :0.2823   Mean   :0.015664   Mean   :0.01788  
##  3rd Qu.:0.03789   3rd Qu.:0.3500   3rd Qu.:0.020265   3rd Qu.:0.02238  
##  Max.   :0.11908   Max.   :1.3020   Max.   :0.056470   Max.   :0.07940  
##     MDVP.APQ        Shimmer.DDA           NHR                HNR        
##  Min.   :0.00719   Min.   :0.01364   Min.   :0.000650   Min.   : 8.441  
##  1st Qu.:0.01308   1st Qu.:0.02474   1st Qu.:0.005925   1st Qu.:19.198  
##  Median :0.01826   Median :0.03836   Median :0.011660   Median :22.085  
##  Mean   :0.02408   Mean   :0.04699   Mean   :0.024847   Mean   :21.886  
##  3rd Qu.:0.02940   3rd Qu.:0.06080   3rd Qu.:0.025640   3rd Qu.:25.076  
##  Max.   :0.13778   Max.   :0.16942   Max.   :0.314820   Max.   :33.047  
##       PPE               RPDE             DFA            spread1      
##  Min.   :0.04454   Min.   :0.2566   Min.   :0.5743   Min.   :-7.965  
##  1st Qu.:0.13745   1st Qu.:0.4213   1st Qu.:0.6748   1st Qu.:-6.450  
##  Median :0.19405   Median :0.4960   Median :0.7223   Median :-5.721  
##  Mean   :0.20655   Mean   :0.4985   Mean   :0.7181   Mean   :-5.684  
##  3rd Qu.:0.25298   3rd Qu.:0.5876   3rd Qu.:0.7619   3rd Qu.:-5.046  
##  Max.   :0.52737   Max.   :0.6852   Max.   :0.8253   Max.   :-2.434  
##     spread2               D2        status   
##  Min.   :0.006274   Min.   :1.423   No : 48  
##  1st Qu.:0.174350   1st Qu.:2.099   Yes:147  
##  Median :0.218885   Median :2.362            
##  Mean   :0.226510   Mean   :2.382            
##  3rd Qu.:0.279234   3rd Qu.:2.636            
##  Max.   :0.450493   Max.   :3.671
summary(fit)
## Call:
## rpart(formula = status ~ (HNR + DFA + NHR + RPDE + MDVP.Fo.Hz. + 
##     MDVP.Fhi.Hz. + MDVP.Flo.Hz. + MDVP.Jitter... + MDVP.Jitter.Abs. + 
##     MDVP.RAP + MDVP.PPQ + Jitter.DDP + MDVP.Shimmer + MDVP.Shimmer.dB. + 
##     Shimmer.APQ3 + Shimmer.APQ5 + MDVP.APQ + Shimmer.DDA + PPE + 
##     spread1 + spread2 + D2), data = train_set, method = "class")
##   n= 130 
## 
##         CP nsplit rel error  xerror      xstd
## 1 0.406250      0   1.00000 1.00000 0.1534852
## 2 0.156250      1   0.59375 0.78125 0.1404245
## 3 0.046875      3   0.28125 0.62500 0.1285552
## 4 0.010000      5   0.18750 0.62500 0.1285552
## 
## Variable importance
##              PPE          spread1         MDVP.PPQ              DFA 
##               14               13                8                8 
##   MDVP.Jitter...       Jitter.DDP         MDVP.RAP     Shimmer.APQ3 
##                7                7                7                5 
##      Shimmer.DDA     MDVP.Flo.Hz.     MDVP.Fhi.Hz.     Shimmer.APQ5 
##                5                5                4                4 
##          spread2      MDVP.Fo.Hz.              NHR     MDVP.Shimmer 
##                3                3                2                2 
##              HNR MDVP.Shimmer.dB. 
##                2                2 
## 
## Node number 1: 130 observations,    complexity param=0.40625
##   predicted class=Yes  expected loss=0.2461538  P(node) =1
##     class counts:    32    98
##    probabilities: 0.246 0.754 
##   left son=2 (39 obs) right son=3 (91 obs)
##   Primary splits:
##       spread1      < -6.317759 to the left,  improve=19.70403, (0 missing)
##       PPE          < 0.1051815 to the left,  improve=18.89477, (0 missing)
##       MDVP.Flo.Hz. < 188.6565  to the right, improve=17.83236, (0 missing)
##       MDVP.Fo.Hz.  < 226.0965  to the right, improve=13.65792, (0 missing)
##       MDVP.RAP     < 0.001775  to the left,  improve=13.17687, (0 missing)
##   Surrogate splits:
##       PPE            < 0.14169   to the left,  agree=0.977, adj=0.923, (0 split)
##       MDVP.PPQ       < 0.001845  to the left,  agree=0.892, adj=0.641, (0 split)
##       MDVP.Jitter... < 0.003315  to the left,  agree=0.869, adj=0.564, (0 split)
##       Jitter.DDP     < 0.005075  to the left,  agree=0.862, adj=0.538, (0 split)
##       MDVP.RAP       < 0.001685  to the left,  agree=0.854, adj=0.513, (0 split)
## 
## Node number 2: 39 observations,    complexity param=0.15625
##   predicted class=No   expected loss=0.3333333  P(node) =0.3
##     class counts:    26    13
##    probabilities: 0.667 0.333 
##   left son=4 (16 obs) right son=5 (23 obs)
##   Primary splits:
##       MDVP.Fhi.Hz. < 229.1805  to the right, improve=6.028986, (0 missing)
##       MDVP.Flo.Hz. < 180.185   to the right, improve=5.416667, (0 missing)
##       MDVP.Fo.Hz.  < 191.6195  to the right, improve=5.158730, (0 missing)
##       PPE          < 0.1051815 to the left,  improve=4.541889, (0 missing)
##       DFA          < 0.674851  to the left,  improve=4.333333, (0 missing)
##   Surrogate splits:
##       DFA          < 0.674851  to the left,  agree=0.923, adj=0.813, (0 split)
##       MDVP.Fo.Hz.  < 207.7355  to the right, agree=0.923, adj=0.813, (0 split)
##       MDVP.Flo.Hz. < 200.8275  to the right, agree=0.846, adj=0.625, (0 split)
##       spread2      < 0.136211  to the left,  agree=0.821, adj=0.562, (0 split)
##       PPE          < 0.1033925 to the left,  agree=0.744, adj=0.375, (0 split)
## 
## Node number 3: 91 observations,    complexity param=0.046875
##   predicted class=Yes  expected loss=0.06593407  P(node) =0.7
##     class counts:     6    85
##    probabilities: 0.066 0.934 
##   left son=6 (27 obs) right son=7 (64 obs)
##   Primary splits:
##       spread2          < 0.2100605 to the left,  improve=1.875458, (0 missing)
##       Shimmer.APQ3     < 0.008735  to the left,  improve=1.868148, (0 missing)
##       Shimmer.DDA      < 0.02621   to the left,  improve=1.868148, (0 missing)
##       MDVP.Shimmer.dB. < 0.1375    to the left,  improve=1.675659, (0 missing)
##       MDVP.Jitter...   < 0.003505  to the left,  improve=1.428303, (0 missing)
##   Surrogate splits:
##       PPE            < 0.184526  to the left,  agree=0.791, adj=0.296, (0 split)
##       D2             < 2.2965    to the left,  agree=0.791, adj=0.296, (0 split)
##       spread1        < -5.921087 to the left,  agree=0.780, adj=0.259, (0 split)
##       RPDE           < 0.389095  to the left,  agree=0.736, adj=0.111, (0 split)
##       MDVP.Jitter... < 0.003325  to the left,  agree=0.736, adj=0.111, (0 split)
## 
## Node number 4: 16 observations
##   predicted class=No   expected loss=0  P(node) =0.1230769
##     class counts:    16     0
##    probabilities: 1.000 0.000 
## 
## Node number 5: 23 observations,    complexity param=0.15625
##   predicted class=Yes  expected loss=0.4347826  P(node) =0.1769231
##     class counts:    10    13
##    probabilities: 0.435 0.565 
##   left son=10 (13 obs) right son=11 (10 obs)
##   Primary splits:
##       DFA         < 0.728728  to the right, improve=6.688963, (0 missing)
##       RPDE        < 0.3371195 to the right, improve=3.804348, (0 missing)
##       NHR         < 0.004895  to the left,  improve=3.097999, (0 missing)
##       D2          < 2.23273   to the left,  improve=3.097999, (0 missing)
##       MDVP.Fo.Hz. < 139.797   to the left,  improve=2.437681, (0 missing)
##   Surrogate splits:
##       NHR          < 0.004895  to the left,  agree=0.783, adj=0.5, (0 split)
##       MDVP.Flo.Hz. < 112.256   to the right, agree=0.783, adj=0.5, (0 split)
##       Shimmer.APQ3 < 0.005505  to the right, agree=0.739, adj=0.4, (0 split)
##       Shimmer.APQ5 < 0.006365  to the right, agree=0.739, adj=0.4, (0 split)
##       Shimmer.DDA  < 0.016515  to the right, agree=0.739, adj=0.4, (0 split)
## 
## Node number 6: 27 observations,    complexity param=0.046875
##   predicted class=Yes  expected loss=0.2222222  P(node) =0.2076923
##     class counts:     6    21
##    probabilities: 0.222 0.778 
##   left son=12 (7 obs) right son=13 (20 obs)
##   Primary splits:
##       Shimmer.APQ3     < 0.00925   to the left,  improve=4.576190, (0 missing)
##       Shimmer.DDA      < 0.02776   to the left,  improve=4.576190, (0 missing)
##       MDVP.Shimmer     < 0.01914   to the left,  improve=3.688596, (0 missing)
##       MDVP.Shimmer.dB. < 0.1845    to the left,  improve=3.688596, (0 missing)
##       MDVP.Flo.Hz.     < 89.1605   to the right, improve=3.333333, (0 missing)
##   Surrogate splits:
##       Shimmer.DDA      < 0.02776   to the left,  agree=1.000, adj=1.000, (0 split)
##       MDVP.Shimmer     < 0.02032   to the left,  agree=0.926, adj=0.714, (0 split)
##       Shimmer.APQ5     < 0.01274   to the left,  agree=0.926, adj=0.714, (0 split)
##       HNR              < 23.0435   to the right, agree=0.889, adj=0.571, (0 split)
##       MDVP.Shimmer.dB. < 0.1485    to the left,  agree=0.889, adj=0.571, (0 split)
## 
## Node number 7: 64 observations
##   predicted class=Yes  expected loss=0  P(node) =0.4923077
##     class counts:     0    64
##    probabilities: 0.000 1.000 
## 
## Node number 10: 13 observations
##   predicted class=No   expected loss=0.2307692  P(node) =0.1
##     class counts:    10     3
##    probabilities: 0.769 0.231 
## 
## Node number 11: 10 observations
##   predicted class=Yes  expected loss=0  P(node) =0.07692308
##     class counts:     0    10
##    probabilities: 0.000 1.000 
## 
## Node number 12: 7 observations
##   predicted class=No   expected loss=0.2857143  P(node) =0.05384615
##     class counts:     5     2
##    probabilities: 0.714 0.286 
## 
## Node number 13: 20 observations
##   predicted class=Yes  expected loss=0.05  P(node) =0.1538462
##     class counts:     1    19
##    probabilities: 0.050 0.950
summary(predict_unseen)
##  No Yes 
##  16  49
summary(cm)
## Number of cases in table: 65 
## Number of factors: 2 
## Test for independence of all factors:
##  Chisq = 29.036, df = 1, p-value = 7.103e-08
##  Chi-squared approximation may be incorrect
summary(accuracy)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.8769  0.8769  0.8769  0.8769  0.8769  0.8769
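
The improve value that summary(fit) reports for the root split on spread1 can be reproduced from the class counts printed for nodes 1, 2 and 3. This sketch assumes the Gini index as the impurity measure, taking the improvement as the node size times the impurity decrease; the helper name is hypothetical.

```r
# Hypothetical helper: node size times Gini impurity, from a vector of class counts.
gini_weighted <- function(counts) {
  n <- sum(counts)
  n * (1 - sum((counts / n)^2))
}

# Class counts (No, Yes) copied from the summary(fit) output above.
parent <- c(32, 98)  # node 1: 130 observations
left   <- c(26, 13)  # node 2: spread1 < -6.317759
right  <- c(6, 85)   # node 3: the remaining observations

improve <- gini_weighted(parent) - gini_weighted(left) - gini_weighted(right)
print(round(improve, 5))
```

The result agrees with the improve=19.70403 shown for the spread1 split.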

Visualization

plot(fit,main="(ANNEM SHIVAJI 20MIC0091)")
text(fit)
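
The CP table in summary(fit) can also guide pruning. The sketch below copies the table values by hand and applies two common heuristics: the minimum cross-validated error and the one-standard-error rule. The variable names are illustrative, and the choice of heuristic is an assumption, not part of the original analysis.

```r
# CP table copied from the summary(fit) output above.
cp_table <- data.frame(
  CP        = c(0.406250, 0.156250, 0.046875, 0.010000),
  nsplit    = c(0, 1, 3, 5),
  rel_error = c(1.00000, 0.59375, 0.28125, 0.18750),
  xerror    = c(1.00000, 0.78125, 0.62500, 0.62500),
  xstd      = c(0.1534852, 0.1404245, 0.1285552, 0.1285552)
)

# Row with the lowest cross-validated error (the first one if several tie).
best <- which.min(cp_table$xerror)

# One-standard-error rule: smallest tree whose xerror is within one xstd of the best.
threshold <- cp_table$xerror[best] + cp_table$xstd[best]
one_se <- min(which(cp_table$xerror <= threshold))

# rel_error is relative to the root node, which misclassifies 32 of 130 training rows.
train_errors <- cp_table$rel_error * 32

print(cp_table[one_se, ])
print(train_errors)
```

Both heuristics point to the three-split tree (CP = 0.046875). With the fitted object available, prune() from rpart could be called with that cp value; whether the smaller tree does better on the test set would need to be checked.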

Decision Tree-2 (several measures of variation in fundamental frequency)

my_url="https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/parkinsons.data"
dataset2=read.csv(my_url,header=TRUE,sep=",")
dataset=dataset2[, c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,24,19,20,21,22,23,18)]
library(dplyr)
clean_dataset=dataset%>%select(MDVP.Jitter...:status)%>%mutate(status=factor(status,levels=c(0,1),labels=c("No","Yes")))%>%na.omit()
library(caTools)
set.seed(123)
split=sample.split(Y=clean_dataset$status,SplitRatio=2/3)
train_set=subset(x=clean_dataset,split==T)
test_set=subset(x=clean_dataset,split==F)
library(rpart)
fit=rpart(formula=status~(MDVP.Jitter...+MDVP.Jitter.Abs.+MDVP.RAP+MDVP.PPQ+Jitter.DDP)
          ,data=train_set,method="class")
library(rpart.plot)
rpart.plot(fit)

predict_unseen=predict(object=fit,newdata=test_set,type="class")
cm=table(test_set$status,predict_unseen)
cm
##      predict_unseen
##       No Yes
##   No   9   7
##   Yes  2  47
sum(diag(cm))/sum(cm)
## [1] 0.8615385
plot(fit,main="Several measures of variation in fundamental frequency (ANNEM SHIVAJI 20MIC0091)")
text(fit)
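
To compare the models reported so far on equal terms, the four confusion matrices can be collected and summarised with accuracy and balanced accuracy (the mean of the per-class recalls). This is a sketch with hand-copied counts and hypothetical names. The Naive Bayes matrix comes from a 3/4 split (49 test rows) while the others use a 2/3 split (65 test rows), so the figures are only roughly comparable.

```r
# Hypothetical helper: build a 2x2 confusion matrix (rows = actual, columns = predicted).
make_cm <- function(tn, fp, fn, tp) {
  matrix(c(tn, fn, fp, tp), nrow = 2,
         dimnames = list(actual = c("healthy", "PD"), predicted = c("healthy", "PD")))
}

# Counts copied from the confusion matrices printed in this report.
models <- list(
  KNN        = make_cm(10, 6, 3, 46),
  NaiveBayes = make_cm(8, 4, 12, 25),
  Tree1      = make_cm(12, 4, 4, 45),
  Tree2      = make_cm(9, 7, 2, 47)
)

comparison <- data.frame(
  model             = names(models),
  accuracy          = sapply(models, function(cm) sum(diag(cm)) / sum(cm)),
  balanced_accuracy = sapply(models, function(cm) mean(diag(cm) / rowSums(cm))),
  row.names = NULL
)
print(comparison)
```

KNN and Decision Tree-2 have the same accuracy, but the tree restricted to the jitter measures misses more healthy recordings, which the balanced accuracy reflects.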

summary(clean_dataset)
##  MDVP.Jitter...     MDVP.Jitter.Abs.       MDVP.RAP           MDVP.PPQ       
##  Min.   :0.001680   Min.   :7.000e-06   Min.   :0.000680   Min.   :0.000920  
##  1st Qu.:0.003460   1st Qu.:2.000e-05   1st Qu.:0.001660   1st Qu.:0.001860  
##  Median :0.004940   Median :3.000e-05   Median :0.002500   Median :0.002690  
##  Mean   :0.006220   Mean   :4.396e-05   Mean   :0.003306   Mean   :0.003446  
##  3rd Qu.:0.007365   3rd Qu.:6.000e-05   3rd Qu.:0.003835   3rd Qu.:0.003955  
##  Max.   :0.033160   Max.   :2.600e-04   Max.   :0.021440   Max.   :0.019580  
##    Jitter.DDP        MDVP.Shimmer     MDVP.Shimmer.dB.  Shimmer.APQ3     
##  Min.   :0.002040   Min.   :0.00954   Min.   :0.0850   Min.   :0.004550  
##  1st Qu.:0.004985   1st Qu.:0.01650   1st Qu.:0.1485   1st Qu.:0.008245  
##  Median :0.007490   Median :0.02297   Median :0.2210   Median :0.012790  
##  Mean   :0.009920   Mean   :0.02971   Mean   :0.2823   Mean   :0.015664  
##  3rd Qu.:0.011505   3rd Qu.:0.03789   3rd Qu.:0.3500   3rd Qu.:0.020265  
##  Max.   :0.064330   Max.   :0.11908   Max.   :1.3020   Max.   :0.056470  
##   Shimmer.APQ5        MDVP.APQ        Shimmer.DDA           NHR          
##  Min.   :0.00570   Min.   :0.00719   Min.   :0.01364   Min.   :0.000650  
##  1st Qu.:0.00958   1st Qu.:0.01308   1st Qu.:0.02474   1st Qu.:0.005925  
##  Median :0.01347   Median :0.01826   Median :0.03836   Median :0.011660  
##  Mean   :0.01788   Mean   :0.02408   Mean   :0.04699   Mean   :0.024847  
##  3rd Qu.:0.02238   3rd Qu.:0.02940   3rd Qu.:0.06080   3rd Qu.:0.025640  
##  Max.   :0.07940   Max.   :0.13778   Max.   :0.16942   Max.   :0.314820  
##       HNR              PPE               RPDE             DFA        
##  Min.   : 8.441   Min.   :0.04454   Min.   :0.2566   Min.   :0.5743  
##  1st Qu.:19.198   1st Qu.:0.13745   1st Qu.:0.4213   1st Qu.:0.6748  
##  Median :22.085   Median :0.19405   Median :0.4960   Median :0.7223  
##  Mean   :21.886   Mean   :0.20655   Mean   :0.4985   Mean   :0.7181  
##  3rd Qu.:25.076   3rd Qu.:0.25298   3rd Qu.:0.5876   3rd Qu.:0.7619  
##  Max.   :33.047   Max.   :0.52737   Max.   :0.6852   Max.   :0.8253  
##     spread1          spread2               D2        status   
##  Min.   :-7.965   Min.   :0.006274   Min.   :1.423   No : 48  
##  1st Qu.:-6.450   1st Qu.:0.174350   1st Qu.:2.099   Yes:147  
##  Median :-5.721   Median :0.218885   Median :2.362            
##  Mean   :-5.684   Mean   :0.226510   Mean   :2.382            
##  3rd Qu.:-5.046   3rd Qu.:0.279234   3rd Qu.:2.636            
##  Max.   :-2.434   Max.   :0.450493   Max.   :3.671
summary(fit)
## Call:
## rpart(formula = status ~ (MDVP.Jitter... + MDVP.Jitter.Abs. + 
##     MDVP.RAP + MDVP.PPQ + Jitter.DDP), data = train_set, method = "class")
##   n= 130 
## 
##        CP nsplit rel error  xerror      xstd
## 1 0.15625      0   1.00000 1.00000 0.1534852
## 2 0.12500      2   0.68750 1.15625 0.1607758
## 3 0.03125      3   0.56250 0.81250 0.1425219
## 4 0.01000      4   0.53125 0.75000 0.1382410
## 
## Variable importance
##       Jitter.DDP         MDVP.RAP   MDVP.Jitter...         MDVP.PPQ 
##               21               21               21               21 
## MDVP.Jitter.Abs. 
##               16 
## 
## Node number 1: 130 observations,    complexity param=0.15625
##   predicted class=Yes  expected loss=0.2461538  P(node) =1
##     class counts:    32    98
##    probabilities: 0.246 0.754 
##   left son=2 (45 obs) right son=3 (85 obs)
##   Primary splits:
##       MDVP.RAP         < 0.001775 to the left,  improve=13.17687, (0 missing)
##       Jitter.DDP       < 0.00533  to the left,  improve=13.17687, (0 missing)
##       MDVP.PPQ         < 0.00204  to the left,  improve=12.58673, (0 missing)
##       MDVP.Jitter.Abs. < 1.5e-05  to the left,  improve=12.00070, (0 missing)
##       MDVP.Jitter...   < 0.003035 to the left,  improve=10.40968, (0 missing)
##   Surrogate splits:
##       Jitter.DDP       < 0.00533  to the left,  agree=1.000, adj=1.000, (0 split)
##       MDVP.PPQ         < 0.00204  to the left,  agree=0.977, adj=0.933, (0 split)
##       MDVP.Jitter...   < 0.00362  to the left,  agree=0.938, adj=0.822, (0 split)
##       MDVP.Jitter.Abs. < 2.5e-05  to the left,  agree=0.838, adj=0.533, (0 split)
## 
## Node number 2: 45 observations,    complexity param=0.15625
##   predicted class=No   expected loss=0.4444444  P(node) =0.3461538
##     class counts:    25    20
##    probabilities: 0.556 0.444 
##   left son=4 (20 obs) right son=5 (25 obs)
##   Primary splits:
##       MDVP.Jitter.Abs. < 1.5e-05  to the left,  improve=2.7222220, (0 missing)
##       MDVP.Jitter...   < 0.002915 to the left,  improve=1.5022220, (0 missing)
##       MDVP.PPQ         < 0.00169  to the left,  improve=1.1898340, (0 missing)
##       MDVP.RAP         < 0.00107  to the right, improve=0.6343844, (0 missing)
##       Jitter.DDP       < 0.00321  to the right, improve=0.6343844, (0 missing)
##   Surrogate splits:
##       MDVP.Jitter... < 0.002915 to the left,  agree=0.911, adj=0.8, (0 split)
##       MDVP.PPQ       < 0.00154  to the left,  agree=0.822, adj=0.6, (0 split)
##       MDVP.RAP       < 0.001145 to the left,  agree=0.733, adj=0.4, (0 split)
##       Jitter.DDP     < 0.003435 to the left,  agree=0.733, adj=0.4, (0 split)
## 
## Node number 3: 85 observations
##   predicted class=Yes  expected loss=0.08235294  P(node) =0.6538462
##     class counts:     7    78
##    probabilities: 0.082 0.918 
## 
## Node number 4: 20 observations,    complexity param=0.03125
##   predicted class=No   expected loss=0.25  P(node) =0.1538462
##     class counts:    15     5
##    probabilities: 0.750 0.250 
##   left son=8 (13 obs) right son=9 (7 obs)
##   Primary splits:
##       MDVP.RAP       < 0.00107  to the right, improve=2.225275, (0 missing)
##       Jitter.DDP     < 0.003205 to the right, improve=2.225275, (0 missing)
##       MDVP.PPQ       < 0.00135  to the right, improve=1.666667, (0 missing)
##       MDVP.Jitter... < 0.00262  to the right, improve=1.346154, (0 missing)
##   Surrogate splits:
##       Jitter.DDP     < 0.003205 to the right, agree=1.00, adj=1.000, (0 split)
##       MDVP.Jitter... < 0.002015 to the right, agree=0.95, adj=0.857, (0 split)
##       MDVP.PPQ       < 0.00135  to the right, agree=0.95, adj=0.857, (0 split)
## 
## Node number 5: 25 observations,    complexity param=0.125
##   predicted class=Yes  expected loss=0.4  P(node) =0.1923077
##     class counts:    10    15
##    probabilities: 0.400 0.600 
##   left son=10 (14 obs) right son=11 (11 obs)
##   Primary splits:
##       MDVP.Jitter.Abs. < 2.5e-05  to the right, improve=3.7532470, (0 missing)
##       MDVP.Jitter...   < 0.003475 to the right, improve=2.1948050, (0 missing)
##       MDVP.RAP         < 0.001465 to the left,  improve=0.6805556, (0 missing)
##       Jitter.DDP       < 0.00439  to the left,  improve=0.6805556, (0 missing)
##       MDVP.PPQ         < 0.00189  to the right, improve=0.3333333, (0 missing)
##   Surrogate splits:
##       MDVP.Jitter... < 0.00326  to the right, agree=0.84, adj=0.636, (0 split)
##       MDVP.PPQ       < 0.00152  to the right, agree=0.72, adj=0.364, (0 split)
##       MDVP.RAP       < 0.001225 to the right, agree=0.68, adj=0.273, (0 split)
##       Jitter.DDP     < 0.003685 to the right, agree=0.68, adj=0.273, (0 split)
## 
## Node number 8: 13 observations
##   predicted class=No   expected loss=0.07692308  P(node) =0.1
##     class counts:    12     1
##    probabilities: 0.923 0.077 
## 
## Node number 9: 7 observations
##   predicted class=Yes  expected loss=0.4285714  P(node) =0.05384615
##     class counts:     3     4
##    probabilities: 0.429 0.571 
## 
## Node number 10: 14 observations
##   predicted class=No   expected loss=0.3571429  P(node) =0.1076923
##     class counts:     9     5
##    probabilities: 0.643 0.357 
## 
## Node number 11: 11 observations
##   predicted class=Yes  expected loss=0.09090909  P(node) =0.08461538
##     class counts:     1    10
##    probabilities: 0.091 0.909
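# Note: y_pred holds 49 predictions, but the confusion matrix summarised below has 65 cases,
# so this output comes from an earlier model rather than from this tree.
summary(y_pred)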
##  0  1 
## 20 29
summary(cm)
## Number of cases in table: 65 
## Number of factors: 2 
## Test for independence of all factors:
##  Chisq = 23.348, df = 1, p-value = 1.352e-06
##  Chi-squared approximation may be incorrect
summary(accuracy)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.8769  0.8769  0.8769  0.8769  0.8769  0.8769

DECISION TREE-3 (Several measures of variation in amplitude)

# Load the data and move status (column 18) to the end; PPE (column 24) takes its place.
my_url="https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/parkinsons.data"
dataset2=read.csv(my_url,header=TRUE,sep=",")
dataset=dataset2[, c(1:17,24,19:23,18)]
library(dplyr)
# Keep MDVP.Shimmer through status and recode status as a No/Yes factor.
clean_dataset=dataset%>%select(MDVP.Shimmer:status)%>%mutate(status=factor(status,levels=c(0,1),labels=c("No","Yes")))%>%na.omit()
library(caTools)
# Split 2/3 train, 1/3 test, preserving the No/Yes ratio of status.
set.seed(123)
split=sample.split(Y=clean_dataset$status,SplitRatio=2/3)
train_set=subset(x=clean_dataset,split==TRUE)
test_set=subset(x=clean_dataset,split==FALSE)
library(rpart)
# Classification tree on the six shimmer (amplitude variation) measures only.
fit=rpart(formula=status~(MDVP.Shimmer+MDVP.Shimmer.dB.+Shimmer.APQ3+Shimmer.APQ5+MDVP.APQ+Shimmer.DDA)
          ,data=train_set,method="class")
library(rpart.plot)
rpart.plot(fit)

predict_unseen=predict(object=fit,newdata=test_set,type="class")
cm=table(test_set$status,predict_unseen)
cm
##      predict_unseen
##       No Yes
##   No   8   8
##   Yes  4  45
sum(diag(cm))/sum(cm)
## [1] 0.8153846
plot(fit,main="Several measures of variation in amplitude (ANNEM SHIVAJI 20MIC0091)")
text(fit)
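
Accuracy alone does not show how the errors are spread over the two classes. As a minimal sketch, the confusion matrix printed above can be re-entered by hand and per-class rates derived from it. The names `cm_tree3`, `acc_tree3`, `sensitivity` and `specificity` are illustrative and not part of the original script.

```r
# Confusion matrix copied from the output above (rows = actual, columns = predicted).
cm_tree3 <- matrix(c(8, 4, 8, 45), nrow = 2,
                   dimnames = list(actual = c("No", "Yes"),
                                   predicted = c("No", "Yes")))

# Share of all test recordings classified correctly.
acc_tree3 <- sum(diag(cm_tree3)) / sum(cm_tree3)
# Share of PD recordings ("Yes") that the tree flags as PD.
sensitivity <- cm_tree3["Yes", "Yes"] / sum(cm_tree3["Yes", ])
# Share of healthy recordings ("No") that the tree labels as healthy.
specificity <- cm_tree3["No", "No"] / sum(cm_tree3["No", ])

round(c(accuracy = acc_tree3, sensitivity = sensitivity, specificity = specificity), 3)
```

With these counts the tree recovers 45 of 49 PD recordings but only 8 of 16 healthy ones, so the 0.815 accuracy is driven mostly by the majority class.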

summary(clean_dataset)
##   MDVP.Shimmer     MDVP.Shimmer.dB.  Shimmer.APQ3       Shimmer.APQ5    
##  Min.   :0.00954   Min.   :0.0850   Min.   :0.004550   Min.   :0.00570  
##  1st Qu.:0.01650   1st Qu.:0.1485   1st Qu.:0.008245   1st Qu.:0.00958  
##  Median :0.02297   Median :0.2210   Median :0.012790   Median :0.01347  
##  Mean   :0.02971   Mean   :0.2823   Mean   :0.015664   Mean   :0.01788  
##  3rd Qu.:0.03789   3rd Qu.:0.3500   3rd Qu.:0.020265   3rd Qu.:0.02238  
##  Max.   :0.11908   Max.   :1.3020   Max.   :0.056470   Max.   :0.07940  
##     MDVP.APQ        Shimmer.DDA           NHR                HNR        
##  Min.   :0.00719   Min.   :0.01364   Min.   :0.000650   Min.   : 8.441  
##  1st Qu.:0.01308   1st Qu.:0.02474   1st Qu.:0.005925   1st Qu.:19.198  
##  Median :0.01826   Median :0.03836   Median :0.011660   Median :22.085  
##  Mean   :0.02408   Mean   :0.04699   Mean   :0.024847   Mean   :21.886  
##  3rd Qu.:0.02940   3rd Qu.:0.06080   3rd Qu.:0.025640   3rd Qu.:25.076  
##  Max.   :0.13778   Max.   :0.16942   Max.   :0.314820   Max.   :33.047  
##       PPE               RPDE             DFA            spread1      
##  Min.   :0.04454   Min.   :0.2566   Min.   :0.5743   Min.   :-7.965  
##  1st Qu.:0.13745   1st Qu.:0.4213   1st Qu.:0.6748   1st Qu.:-6.450  
##  Median :0.19405   Median :0.4960   Median :0.7223   Median :-5.721  
##  Mean   :0.20655   Mean   :0.4985   Mean   :0.7181   Mean   :-5.684  
##  3rd Qu.:0.25298   3rd Qu.:0.5876   3rd Qu.:0.7619   3rd Qu.:-5.046  
##  Max.   :0.52737   Max.   :0.6852   Max.   :0.8253   Max.   :-2.434  
##     spread2               D2        status   
##  Min.   :0.006274   Min.   :1.423   No : 48  
##  1st Qu.:0.174350   1st Qu.:2.099   Yes:147  
##  Median :0.218885   Median :2.362            
##  Mean   :0.226510   Mean   :2.382            
##  3rd Qu.:0.279234   3rd Qu.:2.636            
##  Max.   :0.450493   Max.   :3.671
summary(fit)
## Call:
## rpart(formula = status ~ (MDVP.Shimmer + MDVP.Shimmer.dB. + Shimmer.APQ3 + 
##     Shimmer.APQ5 + MDVP.APQ + Shimmer.DDA), data = train_set, 
##     method = "class")
##   n= 130 
## 
##         CP nsplit rel error  xerror      xstd
## 1 0.093750      0   1.00000 1.00000 0.1534852
## 2 0.046875      2   0.81250 1.00000 0.1534852
## 3 0.031250      4   0.71875 1.03125 0.1550676
## 4 0.010000      5   0.68750 1.06250 0.1565862
## 
## Variable importance
##         MDVP.APQ     MDVP.Shimmer MDVP.Shimmer.dB.     Shimmer.APQ5 
##               17               17               17               17 
##     Shimmer.APQ3      Shimmer.DDA 
##               16               16 
## 
## Node number 1: 130 observations,    complexity param=0.09375
##   predicted class=Yes  expected loss=0.2461538  P(node) =1
##     class counts:    32    98
##    probabilities: 0.246 0.754 
##   left son=2 (64 obs) right son=3 (66 obs)
##   Primary splits:
##       Shimmer.APQ5     < 0.012675 to the left,  improve=10.800130, (0 missing)
##       MDVP.APQ         < 0.019775 to the left,  improve= 9.909184, (0 missing)
##       MDVP.Shimmer.dB. < 0.219    to the left,  improve= 9.271771, (0 missing)
##       MDVP.Shimmer     < 0.025005 to the left,  improve= 9.044039, (0 missing)
##       Shimmer.APQ3     < 0.01417  to the left,  improve= 8.078513, (0 missing)
##   Surrogate splits:
##       MDVP.Shimmer     < 0.0207   to the left,  agree=0.962, adj=0.922, (0 split)
##       MDVP.Shimmer.dB. < 0.214    to the left,  agree=0.954, adj=0.906, (0 split)
##       Shimmer.APQ3     < 0.010685 to the left,  agree=0.938, adj=0.875, (0 split)
##       Shimmer.DDA      < 0.032045 to the left,  agree=0.938, adj=0.875, (0 split)
##       MDVP.APQ         < 0.018285 to the left,  agree=0.923, adj=0.844, (0 split)
## 
## Node number 2: 64 observations,    complexity param=0.09375
##   predicted class=Yes  expected loss=0.453125  P(node) =0.4923077
##     class counts:    29    35
##    probabilities: 0.453 0.547 
##   left son=4 (30 obs) right son=5 (34 obs)
##   Primary splits:
##       Shimmer.APQ3 < 0.00803  to the right, improve=2.4363970, (0 missing)
##       Shimmer.DDA  < 0.02409  to the right, improve=2.4363970, (0 missing)
##       MDVP.APQ     < 0.01215  to the left,  improve=2.3057950, (0 missing)
##       Shimmer.APQ5 < 0.006365 to the right, improve=1.9687500, (0 missing)
##       MDVP.Shimmer < 0.012995 to the right, improve=0.5622874, (0 missing)
##   Surrogate splits:
##       Shimmer.DDA      < 0.02409  to the right, agree=1.000, adj=1.000, (0 split)
##       MDVP.Shimmer     < 0.01569  to the right, agree=0.859, adj=0.700, (0 split)
##       MDVP.Shimmer.dB. < 0.1405   to the right, agree=0.859, adj=0.700, (0 split)
##       Shimmer.APQ5     < 0.009715 to the right, agree=0.859, adj=0.700, (0 split)
##       MDVP.APQ         < 0.0124   to the right, agree=0.734, adj=0.433, (0 split)
## 
## Node number 3: 66 observations
##   predicted class=Yes  expected loss=0.04545455  P(node) =0.5076923
##     class counts:     3    63
##    probabilities: 0.045 0.955 
## 
## Node number 4: 30 observations,    complexity param=0.046875
##   predicted class=No   expected loss=0.4  P(node) =0.2307692
##     class counts:    18    12
##    probabilities: 0.600 0.400 
##   left son=8 (9 obs) right son=9 (21 obs)
##   Primary splits:
##       MDVP.APQ         < 0.01278  to the left,  improve=2.146032, (0 missing)
##       MDVP.Shimmer     < 0.0167   to the left,  improve=1.650000, (0 missing)
##       MDVP.Shimmer.dB. < 0.147    to the left,  improve=1.207453, (0 missing)
##       Shimmer.APQ3     < 0.008825 to the left,  improve=1.200000, (0 missing)
##       Shimmer.DDA      < 0.026485 to the left,  improve=1.200000, (0 missing)
##   Surrogate splits:
##       MDVP.Shimmer     < 0.01652  to the left,  agree=0.8, adj=0.333, (0 split)
##       MDVP.Shimmer.dB. < 0.1415   to the left,  agree=0.8, adj=0.333, (0 split)
## 
## Node number 5: 34 observations,    complexity param=0.03125
##   predicted class=Yes  expected loss=0.3235294  P(node) =0.2615385
##     class counts:    11    23
##    probabilities: 0.324 0.676 
##   left son=10 (19 obs) right son=11 (15 obs)
##   Primary splits:
##       MDVP.APQ         < 0.011495 to the left,  improve=3.542002, (0 missing)
##       MDVP.Shimmer.dB. < 0.137    to the left,  improve=2.562353, (0 missing)
##       MDVP.Shimmer     < 0.01586  to the left,  improve=1.845316, (0 missing)
##       Shimmer.APQ3     < 0.00673  to the left,  improve=1.118464, (0 missing)
##       Shimmer.DDA      < 0.020195 to the left,  improve=1.118464, (0 missing)
##   Surrogate splits:
##       MDVP.Shimmer.dB. < 0.13     to the left,  agree=0.941, adj=0.867, (0 split)
##       MDVP.Shimmer     < 0.014345 to the left,  agree=0.912, adj=0.800, (0 split)
##       Shimmer.APQ5     < 0.00802  to the left,  agree=0.853, adj=0.667, (0 split)
##       Shimmer.APQ3     < 0.006655 to the left,  agree=0.824, adj=0.600, (0 split)
##       Shimmer.DDA      < 0.019965 to the left,  agree=0.824, adj=0.600, (0 split)
## 
## Node number 8: 9 observations
##   predicted class=No   expected loss=0.1111111  P(node) =0.06923077
##     class counts:     8     1
##    probabilities: 0.889 0.111 
## 
## Node number 9: 21 observations,    complexity param=0.046875
##   predicted class=Yes  expected loss=0.4761905  P(node) =0.1615385
##     class counts:    10    11
##    probabilities: 0.476 0.524 
##   left son=18 (8 obs) right son=19 (13 obs)
##   Primary splits:
##       Shimmer.APQ3     < 0.009745 to the right, improve=0.5723443, (0 missing)
##       Shimmer.DDA      < 0.02923  to the right, improve=0.5723443, (0 missing)
##       MDVP.Shimmer     < 0.01914  to the left,  improve=0.2216450, (0 missing)
##       MDVP.Shimmer.dB. < 0.1755   to the left,  improve=0.2216450, (0 missing)
##       Shimmer.APQ5     < 0.010875 to the left,  improve=0.1984127, (0 missing)
##   Surrogate splits:
##       Shimmer.DDA      < 0.02923  to the right, agree=1.000, adj=1.000, (0 split)
##       MDVP.Shimmer     < 0.019815 to the right, agree=0.905, adj=0.750, (0 split)
##       MDVP.Shimmer.dB. < 0.1945   to the right, agree=0.857, adj=0.625, (0 split)
##       Shimmer.APQ5     < 0.01173  to the right, agree=0.810, adj=0.500, (0 split)
## 
## Node number 10: 19 observations
##   predicted class=No   expected loss=0.4736842  P(node) =0.1461538
##     class counts:    10     9
##    probabilities: 0.526 0.474 
## 
## Node number 11: 15 observations
##   predicted class=Yes  expected loss=0.06666667  P(node) =0.1153846
##     class counts:     1    14
##    probabilities: 0.067 0.933 
## 
## Node number 18: 8 observations
##   predicted class=No   expected loss=0.375  P(node) =0.06153846
##     class counts:     5     3
##    probabilities: 0.625 0.375 
## 
## Node number 19: 13 observations
##   predicted class=Yes  expected loss=0.3846154  P(node) =0.1
##     class counts:     5     8
##    probabilities: 0.385 0.615
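
In the CP table above, the cross-validated error (xerror) does not drop below 1.0 as splits are added (1.00000, 1.00000, 1.03125, 1.06250), which hints that the extra splits may not generalise. One common response is to prune the tree at the CP value with the lowest xerror. The sketch below shows that step with `rpart::prune`. It assumes the `kyphosis` data shipped with rpart as a stand-in so that it runs without a download; `demo_fit`, `best_cp` and `pruned` are illustrative names, and in this report `fit` would take the place of `demo_fit`.

```r
library(rpart)

set.seed(123)  # xerror depends on the random cross-validation folds
# Stand-in model (assumption): kyphosis ships with rpart.
demo_fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

cp_table <- demo_fit$cptable
# Row with the lowest cross-validated error (first one if there are ties).
best_row <- which.min(cp_table[, "xerror"])
best_cp  <- cp_table[best_row, "CP"]

# Remove the splits whose complexity parameter is below best_cp.
pruned <- prune(demo_fit, cp = best_cp)
printcp(pruned)
```

If the lowest xerror sits in the first row, pruning collapses the tree to its root, which is itself a signal that the chosen features add little beyond the class proportions.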
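# Note: y_pred is not created in this section; the 49 predictions below come from an earlier model.
# This tree's 65 test predictions are stored in predict_unseen.
summary(y_pred)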
##  0  1 
## 20 29
summary(cm)
## Number of cases in table: 65 
## Number of factors: 2 
## Test for independence of all factors:
##  Chisq = 14.025, df = 1, p-value = 0.0001804
##  Chi-squared approximation may be incorrect
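# Note: accuracy is not assigned in this section, so 0.8769 is carried over from an earlier model.
# This tree's test accuracy is the 0.8153846 computed from cm above.
summary(accuracy)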
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.8769  0.8769  0.8769  0.8769  0.8769  0.8769

DECISION TREE-4 (Two measures of ratio of noise to tonal components in the voice)

# Load the data and move status (column 18) to the end; PPE (column 24) takes its place.
my_url="https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/parkinsons.data"
dataset2=read.csv(my_url,header=TRUE,sep=",")
dataset=dataset2[, c(1:17,24,19:23,18)]
library(dplyr)
# Keep NHR through status and recode status as a No/Yes factor.
clean_dataset=dataset%>%select(NHR:status)%>%mutate(status=factor(status,levels=c(0,1),labels=c("No","Yes")))%>%na.omit()
library(caTools)
# Split 2/3 train, 1/3 test, preserving the No/Yes ratio of status.
set.seed(123)
split=sample.split(Y=clean_dataset$status,SplitRatio=2/3)
train_set=subset(x=clean_dataset,split==TRUE)
test_set=subset(x=clean_dataset,split==FALSE)
library(rpart)
# Classification tree on the two noise-to-tonal ratio measures only.
fit=rpart(formula=status~(HNR+NHR),data=train_set,method="class")
library(rpart.plot)
rpart.plot(fit)

predict_unseen=predict(object=fit,newdata=test_set,type="class")
cm=table(test_set$status,predict_unseen)
cm
##      predict_unseen
##       No Yes
##   No   5  11
##   Yes  2  47
sum(diag(cm))/sum(cm)
## [1] 0.8
plot(fit,main="Two measures of ratio of noise to tonal components in the voice (ANNEM SHIVAJI 20MIC0091)")
text(fit)
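
Each decision tree section repeats the same fit, predict and tabulate steps and changes only the feature list. As a rough sketch, those steps could be wrapped in one function. `fit_feature_group` is a hypothetical helper, not part of the original script, and the demo uses the `kyphosis` data shipped with rpart (an assumption, so the block runs offline) with a plain random split instead of `sample.split`.

```r
library(rpart)

# Hypothetical helper: fit a classification tree on one feature group and
# return the tree, the test confusion matrix and the test accuracy.
fit_feature_group <- function(train, test, features, target = "status") {
  tree_formula <- reformulate(features, response = target)  # target ~ f1 + f2 + ...
  fit <- rpart(tree_formula, data = train, method = "class")
  pred <- predict(fit, newdata = test, type = "class")
  cm <- table(actual = test[[target]], predicted = pred)
  list(fit = fit, cm = cm, accuracy = sum(diag(cm)) / sum(cm))
}

# Demo on stand-in data: random 2/3 train, 1/3 test split.
set.seed(123)
idx <- sample(nrow(kyphosis), size = round(2 / 3 * nrow(kyphosis)))
res <- fit_feature_group(kyphosis[idx, ], kyphosis[-idx, ],
                         features = c("Age", "Start"), target = "Kyphosis")
res$cm
res$accuracy
```

For this report the call would look like `fit_feature_group(train_set, test_set, c("HNR", "NHR"))`, with `train_set` and `test_set` built as in the sections above.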

summary(clean_dataset)
##       NHR                HNR              PPE               RPDE       
##  Min.   :0.000650   Min.   : 8.441   Min.   :0.04454   Min.   :0.2566  
##  1st Qu.:0.005925   1st Qu.:19.198   1st Qu.:0.13745   1st Qu.:0.4213  
##  Median :0.011660   Median :22.085   Median :0.19405   Median :0.4960  
##  Mean   :0.024847   Mean   :21.886   Mean   :0.20655   Mean   :0.4985  
##  3rd Qu.:0.025640   3rd Qu.:25.076   3rd Qu.:0.25298   3rd Qu.:0.5876  
##  Max.   :0.314820   Max.   :33.047   Max.   :0.52737   Max.   :0.6852  
##       DFA            spread1          spread2               D2        status   
##  Min.   :0.5743   Min.   :-7.965   Min.   :0.006274   Min.   :1.423   No : 48  
##  1st Qu.:0.6748   1st Qu.:-6.450   1st Qu.:0.174350   1st Qu.:2.099   Yes:147  
##  Median :0.7223   Median :-5.721   Median :0.218885   Median :2.362            
##  Mean   :0.7181   Mean   :-5.684   Mean   :0.226510   Mean   :2.382            
##  3rd Qu.:0.7619   3rd Qu.:-5.046   3rd Qu.:0.279234   3rd Qu.:2.636            
##  Max.   :0.8253   Max.   :-2.434   Max.   :0.450493   Max.   :3.671
summary(fit)
## Call:
## rpart(formula = status ~ (HNR + NHR), data = train_set, method = "class")
##   n= 130 
## 
##        CP nsplit rel error xerror      xstd
## 1 0.28125      0   1.00000 1.0000 0.1534852
## 2 0.06250      1   0.71875 0.9375 0.1501201
## 3 0.01000      2   0.65625 0.9375 0.1501201
## 
## Variable importance
## NHR HNR 
##  66  34 
## 
## Node number 1: 130 observations,    complexity param=0.28125
##   predicted class=Yes  expected loss=0.2461538  P(node) =1
##     class counts:    32    98
##    probabilities: 0.246 0.754 
##   left son=2 (27 obs) right son=3 (103 obs)
##   Primary splits:
##       NHR < 0.00486  to the left,  improve=12.051980, (0 missing)
##       HNR < 23.0435  to the right, improve= 8.556499, (0 missing)
##   Surrogate splits:
##       HNR < 25.0275  to the right, agree=0.885, adj=0.444, (0 split)
## 
## Node number 2: 27 observations,    complexity param=0.0625
##   predicted class=No   expected loss=0.3333333  P(node) =0.2076923
##     class counts:    18     9
##    probabilities: 0.667 0.333 
##   left son=4 (19 obs) right son=5 (8 obs)
##   Primary splits:
##       NHR < 0.002785 to the right, improve=1.934211, (0 missing)
##       HNR < 26.8135  to the left,  improve=1.071429, (0 missing)
##   Surrogate splits:
##       HNR < 26.8135  to the left,  agree=0.963, adj=0.875, (0 split)
## 
## Node number 3: 103 observations
##   predicted class=Yes  expected loss=0.1359223  P(node) =0.7923077
##     class counts:    14    89
##    probabilities: 0.136 0.864 
## 
## Node number 4: 19 observations
##   predicted class=No   expected loss=0.2105263  P(node) =0.1461538
##     class counts:    15     4
##    probabilities: 0.789 0.211 
## 
## Node number 5: 8 observations
##   predicted class=Yes  expected loss=0.375  P(node) =0.06153846
##     class counts:     3     5
##    probabilities: 0.375 0.625
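# Note: y_pred is not created in this section; the 49 predictions below come from an earlier model.
# This tree's 65 test predictions are stored in predict_unseen.
summary(y_pred)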
##  0  1 
## 20 29
summary(cm)
## Number of cases in table: 65 
## Number of factors: 2 
## Test for independence of all factors:
##  Chisq = 9.265, df = 1, p-value = 0.002336
##  Chi-squared approximation may be incorrect
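# Note: accuracy is not assigned in this section, so 0.8769 is carried over from an earlier model.
# This tree's test accuracy is the 0.8 computed from cm above.
summary(accuracy)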
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.8769  0.8769  0.8769  0.8769  0.8769  0.8769

DECISION TREE-5 (Two nonlinear dynamical complexity measures)

# Load the data and move status (column 18) to the end; PPE (column 24) takes its place.
my_url="https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/parkinsons.data"
dataset2=read.csv(my_url,header=TRUE,sep=",")
dataset=dataset2[, c(1:17,24,19:23,18)]
library(dplyr)
# Keep RPDE through status and recode status as a No/Yes factor.
clean_dataset=dataset%>%select(RPDE:status)%>%mutate(status=factor(status,levels=c(0,1),labels=c("No","Yes")))%>%na.omit()
library(caTools)
# Split 2/3 train, 1/3 test, preserving the No/Yes ratio of status.
set.seed(123)
split=sample.split(Y=clean_dataset$status,SplitRatio=2/3)
train_set=subset(x=clean_dataset,split==TRUE)
test_set=subset(x=clean_dataset,split==FALSE)
library(rpart)
# Classification tree on RPDE and D2, the two nonlinear dynamical complexity measures.
fit=rpart(formula=status~(RPDE+D2)
          ,data=train_set,method="class")
library(rpart.plot)
rpart.plot(fit)

predict_unseen=predict(object=fit,newdata=test_set,type="class")
cm=table(test_set$status,predict_unseen)
cm
##      predict_unseen
##       No Yes
##   No   0  16
##   Yes 12  37
sum(diag(cm))/sum(cm)
## [1] 0.5692308
plot(fit,main="Two nonlinear dynamical complexity measures(ANNEM SHIVAJI 20MIC0091)")
text(fit)
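
An accuracy of 0.569 is easier to judge next to a trivial baseline. The sketch below re-enters the confusion matrix printed above and compares the tree with a rule that always predicts the larger class in the test set. `cm_tree5`, `tree_accuracy` and `baseline_accuracy` are illustrative names, not part of the original script.

```r
# Confusion matrix copied from the output above (rows = actual, columns = predicted).
cm_tree5 <- matrix(c(0, 12, 16, 37), nrow = 2,
                   dimnames = list(actual = c("No", "Yes"),
                                   predicted = c("No", "Yes")))

# Accuracy of the tree: correct predictions on the diagonal, 37 of 65.
tree_accuracy <- sum(diag(cm_tree5)) / sum(cm_tree5)
# Accuracy of always predicting the larger actual class ("Yes"), 49 of 65.
baseline_accuracy <- max(rowSums(cm_tree5)) / sum(cm_tree5)

round(c(tree = tree_accuracy, baseline = baseline_accuracy), 3)
tree_accuracy < baseline_accuracy
```

Here the tree (37/65) falls below the majority-class baseline (49/65) and labels none of the 16 healthy recordings correctly, so RPDE and D2 alone do not separate the classes on this split.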

summary(dataset)
##      name            MDVP.Fo.Hz.      MDVP.Fhi.Hz.    MDVP.Flo.Hz.   
##  Length:195         Min.   : 88.33   Min.   :102.1   Min.   : 65.48  
##  Class :character   1st Qu.:117.57   1st Qu.:134.9   1st Qu.: 84.29  
##  Mode  :character   Median :148.79   Median :175.8   Median :104.31  
##                     Mean   :154.23   Mean   :197.1   Mean   :116.32  
##                     3rd Qu.:182.77   3rd Qu.:224.2   3rd Qu.:140.02  
##                     Max.   :260.11   Max.   :592.0   Max.   :239.17  
##  MDVP.Jitter...     MDVP.Jitter.Abs.       MDVP.RAP           MDVP.PPQ       
##  Min.   :0.001680   Min.   :7.000e-06   Min.   :0.000680   Min.   :0.000920  
##  1st Qu.:0.003460   1st Qu.:2.000e-05   1st Qu.:0.001660   1st Qu.:0.001860  
##  Median :0.004940   Median :3.000e-05   Median :0.002500   Median :0.002690  
##  Mean   :0.006220   Mean   :4.396e-05   Mean   :0.003306   Mean   :0.003446  
##  3rd Qu.:0.007365   3rd Qu.:6.000e-05   3rd Qu.:0.003835   3rd Qu.:0.003955  
##  Max.   :0.033160   Max.   :2.600e-04   Max.   :0.021440   Max.   :0.019580  
##    Jitter.DDP        MDVP.Shimmer     MDVP.Shimmer.dB.  Shimmer.APQ3     
##  Min.   :0.002040   Min.   :0.00954   Min.   :0.0850   Min.   :0.004550  
##  1st Qu.:0.004985   1st Qu.:0.01650   1st Qu.:0.1485   1st Qu.:0.008245  
##  Median :0.007490   Median :0.02297   Median :0.2210   Median :0.012790  
##  Mean   :0.009920   Mean   :0.02971   Mean   :0.2823   Mean   :0.015664  
##  3rd Qu.:0.011505   3rd Qu.:0.03789   3rd Qu.:0.3500   3rd Qu.:0.020265  
##  Max.   :0.064330   Max.   :0.11908   Max.   :1.3020   Max.   :0.056470  
##   Shimmer.APQ5        MDVP.APQ        Shimmer.DDA           NHR          
##  Min.   :0.00570   Min.   :0.00719   Min.   :0.01364   Min.   :0.000650  
##  1st Qu.:0.00958   1st Qu.:0.01308   1st Qu.:0.02474   1st Qu.:0.005925  
##  Median :0.01347   Median :0.01826   Median :0.03836   Median :0.011660  
##  Mean   :0.01788   Mean   :0.02408   Mean   :0.04699   Mean   :0.024847  
##  3rd Qu.:0.02238   3rd Qu.:0.02940   3rd Qu.:0.06080   3rd Qu.:0.025640  
##  Max.   :0.07940   Max.   :0.13778   Max.   :0.16942   Max.   :0.314820  
##       HNR              PPE               RPDE             DFA        
##  Min.   : 8.441   Min.   :0.04454   Min.   :0.2566   Min.   :0.5743  
##  1st Qu.:19.198   1st Qu.:0.13745   1st Qu.:0.4213   1st Qu.:0.6748  
##  Median :22.085   Median :0.19405   Median :0.4960   Median :0.7223  
##  Mean   :21.886   Mean   :0.20655   Mean   :0.4985   Mean   :0.7181  
##  3rd Qu.:25.076   3rd Qu.:0.25298   3rd Qu.:0.5876   3rd Qu.:0.7619  
##  Max.   :33.047   Max.   :0.52737   Max.   :0.6852   Max.   :0.8253  
##     spread1          spread2               D2            status      
##  Min.   :-7.965   Min.   :0.006274   Min.   :1.423   Min.   :0.0000  
##  1st Qu.:-6.450   1st Qu.:0.174350   1st Qu.:2.099   1st Qu.:1.0000  
##  Median :-5.721   Median :0.218885   Median :2.362   Median :1.0000  
##  Mean   :-5.684   Mean   :0.226510   Mean   :2.382   Mean   :0.7538  
##  3rd Qu.:-5.046   3rd Qu.:0.279234   3rd Qu.:2.636   3rd Qu.:1.0000  
##  Max.   :-2.434   Max.   :0.450493   Max.   :3.671   Max.   :1.0000
summary(clean_dataset)
##       RPDE             DFA            spread1          spread2        
##  Min.   :0.2566   Min.   :0.5743   Min.   :-7.965   Min.   :0.006274  
##  1st Qu.:0.4213   1st Qu.:0.6748   1st Qu.:-6.450   1st Qu.:0.174350  
##  Median :0.4960   Median :0.7223   Median :-5.721   Median :0.218885  
##  Mean   :0.4985   Mean   :0.7181   Mean   :-5.684   Mean   :0.226510  
##  3rd Qu.:0.5876   3rd Qu.:0.7619   3rd Qu.:-5.046   3rd Qu.:0.279234  
##  Max.   :0.6852   Max.   :0.8253   Max.   :-2.434   Max.   :0.450493  
##        D2        status   
##  Min.   :1.423   No : 48  
##  1st Qu.:2.099   Yes:147  
##  Median :2.362            
##  Mean   :2.382            
##  3rd Qu.:2.636            
##  Max.   :3.671
summary(fit)
## Call:
## rpart(formula = status ~ (RPDE + D2), data = train_set, method = "class")
##   n= 130 
## 
##        CP nsplit rel error xerror      xstd
## 1 0.03125      0      1.00 1.0000 0.1534852
## 2 0.01000      7      0.75 1.4375 0.1703715
## 
## Variable importance
##   D2 RPDE 
##   79   21 
## 
## Node number 1: 130 observations,    complexity param=0.03125
##   predicted class=Yes  expected loss=0.2461538  P(node) =1
##     class counts:    32    98
##    probabilities: 0.246 0.754 
##   left son=2 (85 obs) right son=3 (45 obs)
##   Primary splits:
##       D2   < 2.507272  to the left,  improve=5.600402, (0 missing)
##       RPDE < 0.470175  to the left,  improve=4.512821, (0 missing)
##   Surrogate splits:
##       RPDE < 0.6253385 to the left,  agree=0.692, adj=0.111, (0 split)
## 
## Node number 2: 85 observations,    complexity param=0.03125
##   predicted class=Yes  expected loss=0.3529412  P(node) =0.6538462
##     class counts:    30    55
##    probabilities: 0.353 0.647 
##   left son=4 (42 obs) right son=5 (43 obs)
##   Primary splits:
##       RPDE < 0.470203  to the left,  improve=2.5223110, (0 missing)
##       D2   < 2.285122  to the left,  improve=0.8849329, (0 missing)
##   Surrogate splits:
##       D2 < 2.180933  to the right, agree=0.671, adj=0.333, (0 split)
## 
## Node number 3: 45 observations
##   predicted class=Yes  expected loss=0.04444444  P(node) =0.3461538
##     class counts:     2    43
##    probabilities: 0.044 0.956 
## 
## Node number 4: 42 observations,    complexity param=0.03125
##   predicted class=Yes  expected loss=0.4761905  P(node) =0.3230769
##     class counts:    20    22
##    probabilities: 0.476 0.524 
##   left son=8 (7 obs) right son=9 (35 obs)
##   Primary splits:
##       D2   < 2.383098  to the right, improve=0.952381, (0 missing)
##       RPDE < 0.3371195 to the right, improve=0.814881, (0 missing)
## 
## Node number 5: 43 observations,    complexity param=0.03125
##   predicted class=Yes  expected loss=0.2325581  P(node) =0.3307692
##     class counts:    10    33
##    probabilities: 0.233 0.767 
##   left son=10 (28 obs) right son=11 (15 obs)
##   Primary splits:
##       D2   < 2.168121  to the left,  improve=2.491694, (0 missing)
##       RPDE < 0.5180245 to the right, improve=1.063123, (0 missing)
##   Surrogate splits:
##       RPDE < 0.505685  to the right, agree=0.674, adj=0.067, (0 split)
## 
## Node number 8: 7 observations
##   predicted class=No   expected loss=0.2857143  P(node) =0.05384615
##     class counts:     5     2
##    probabilities: 0.714 0.286 
## 
## Node number 9: 35 observations,    complexity param=0.03125
##   predicted class=Yes  expected loss=0.4285714  P(node) =0.2692308
##     class counts:    15    20
##    probabilities: 0.429 0.571 
##   left son=18 (24 obs) right son=19 (11 obs)
##   Primary splits:
##       D2   < 2.28778   to the left,  improve=1.953463, (0 missing)
##       RPDE < 0.3371195 to the right, improve=1.911376, (0 missing)
## 
## Node number 10: 28 observations,    complexity param=0.03125
##   predicted class=Yes  expected loss=0.3571429  P(node) =0.2153846
##     class counts:    10    18
##    probabilities: 0.357 0.643 
##   left son=20 (16 obs) right son=21 (12 obs)
##   Primary splits:
##       D2   < 2.02441   to the right, improve=3.1488100, (0 missing)
##       RPDE < 0.562505  to the right, improve=0.5289377, (0 missing)
##   Surrogate splits:
##       RPDE < 0.5393685 to the left,  agree=0.679, adj=0.25, (0 split)
## 
## Node number 11: 15 observations
##   predicted class=Yes  expected loss=0  P(node) =0.1153846
##     class counts:     0    15
##    probabilities: 0.000 1.000 
## 
## Node number 18: 24 observations,    complexity param=0.03125
##   predicted class=No   expected loss=0.4583333  P(node) =0.1846154
##     class counts:    13    11
##    probabilities: 0.542 0.458 
##   left son=36 (7 obs) right son=37 (17 obs)
##   Primary splits:
##       D2   < 2.21637   to the right, improve=0.5889356, (0 missing)
##       RPDE < 0.425007  to the left,  improve=0.2722222, (0 missing)
## 
## Node number 19: 11 observations
##   predicted class=Yes  expected loss=0.1818182  P(node) =0.08461538
##     class counts:     2     9
##    probabilities: 0.182 0.818 
## 
## Node number 20: 16 observations
##   predicted class=No   expected loss=0.4375  P(node) =0.1230769
##     class counts:     9     7
##    probabilities: 0.562 0.437 
## 
## Node number 21: 12 observations
##   predicted class=Yes  expected loss=0.08333333  P(node) =0.09230769
##     class counts:     1    11
##    probabilities: 0.083 0.917 
## 
## Node number 36: 7 observations
##   predicted class=No   expected loss=0.2857143  P(node) =0.05384615
##     class counts:     5     2
##    probabilities: 0.714 0.286 
## 
## Node number 37: 17 observations
##   predicted class=Yes  expected loss=0.4705882  P(node) =0.1307692
##     class counts:     8     9
##    probabilities: 0.471 0.529
summary(y_pred)
##  0  1 
## 20 29
summary(cm)
## Number of cases in table: 65 
## Number of factors: 2 
## Test for independence of all factors:
##  Chisq = 4.806, df = 1, p-value = 0.02837
##  Chi-squared approximation may be incorrect
summary(accuracy)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.8769  0.8769  0.8769  0.8769  0.8769  0.8769

ANALYSIS:

RELATED ATTRIBUTES OF THE DATASET ON WHICH STATUS DEPENDS ARE:

Decision tree 1: MDVP:Jitter(%), MDVP:Jitter(Abs), MDVP:RAP, MDVP:PPQ, Jitter:DDP - Several measures of variation in fundamental frequency

Decision tree 2: MDVP:Shimmer, MDVP:Shimmer(dB), Shimmer:APQ3, Shimmer:APQ5, MDVP:APQ, Shimmer:DDA - Several measures of variation in amplitude

Decision tree 3: NHR, HNR - Two measures of ratio of noise to tonal components in the voice

Decision tree 4: RPDE, D2 - Two nonlinear dynamical complexity measures

Decision tree 5: spread1, spread2, PPE - Three nonlinear measures of fundamental frequency variation
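
One way to organise these five attribute groups in code is a named list from which one model formula per tree is built. This is only a sketch: the list `feature_groups` and its group labels are hypothetical names, the numbering follows the list above, and the column names assume the form that read.csv() gives them (as printed by summary(dataset)), where ":", "(", "%" and ")" are replaced by dots.

```r
# Hypothetical grouping of the predictors; names follow read.csv()'s column names
feature_groups <- list(
  tree1_jitter     = c("MDVP.Jitter...", "MDVP.Jitter.Abs.", "MDVP.RAP",
                       "MDVP.PPQ", "Jitter.DDP"),
  tree2_shimmer    = c("MDVP.Shimmer", "MDVP.Shimmer.dB.", "Shimmer.APQ3",
                       "Shimmer.APQ5", "MDVP.APQ", "Shimmer.DDA"),
  tree3_noise      = c("NHR", "HNR"),
  tree4_complexity = c("RPDE", "D2"),
  tree5_f0_spread  = c("spread1", "spread2", "PPE")
)

# Build one formula per group with status as the response
formulas <- lapply(feature_groups, reformulate, response = "status")

print(deparse(formulas$tree4_complexity))  # "status ~ RPDE + D2"
```

Each formula could then be passed to rpart(..., method="class"), provided the training data still contains the columns of that group.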

Here we used the decision tree machine learning algorithm to discriminate healthy people from those with PD according to the “status” column, which is set to 0 for healthy and 1 for PD. We split the dataset into a training set and a test set with a split ratio of 2/3, so that two thirds of the recordings are used for training. Then we did feature scaling and performed the fit and the prediction using the rpart() method from the rpart library. The accuracy of decision tree 1 (whole) is 0.8769231, of decision tree 2 is 0.8615385, of decision tree 3 is 0.8153846, of decision tree 4 is 0.8, and of decision tree 5 is 0.5692308.
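
The 2/3 split can be illustrated without caTools. The sketch below is a base-R approximation of a split that preserves the class proportions, not the sample.split() implementation itself. It uses a stand-in status vector with the class counts shown by summary(clean_dataset) (48 "No", 147 "Yes"), and the helper `stratified_split` is a hypothetical name.

```r
# Stand-in for clean_dataset$status with the class counts from the summary
status <- factor(rep(c("No", "Yes"), times = c(48, 147)))

# Hypothetical helper: mark about `ratio` of each class as training rows
stratified_split <- function(y, ratio) {
  in_train <- logical(length(y))
  for (level in levels(y)) {
    idx <- which(y == level)
    n_train <- round(ratio * length(idx))
    # Sample within the class so that the class proportions are preserved
    in_train[idx[sample.int(length(idx), n_train)]] <- TRUE
  }
  in_train
}

set.seed(123)
in_train <- stratified_split(status, ratio = 2 / 3)

print(table(status[in_train]))   # training set: 32 No, 98 Yes
print(table(status[!in_train]))  # test set: 16 No, 49 Yes
```

These counts agree with the 130 training observations (32 healthy, 98 PD) reported by summary(fit) and with the 65 test cases in the confusion matrix.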

The accuracy of every decision tree is greater than 0.5, so the decision tree is also a reasonable machine learning algorithm for this dataset.
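
A useful reference point for these accuracies is the score of a classifier that always predicts the majority class. The sketch below computes it from the class counts in summary(clean_dataset) and compares it with the five decision tree accuracies quoted above. The variable names are illustrative.

```r
# Class counts from summary(clean_dataset)
class_counts <- c(No = 48, Yes = 147)

# Accuracy of always predicting the majority class ("Yes")
baseline <- max(class_counts) / sum(class_counts)

# Decision tree accuracies as reported in the analysis
tree_accuracy <- c(tree1 = 0.8769231, tree2 = 0.8615385, tree3 = 0.8153846,
                   tree4 = 0.8, tree5 = 0.5692308)

print(round(baseline, 7))        # 0.7538462
print(tree_accuracy > baseline)  # which trees beat the baseline
```

Under this comparison, trees 1 to 4 exceed the baseline of about 0.754, while tree 5 falls below it.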

Comparative Statements of Algorithms

The accuracy of decision tree 1 (whole), in which status is the dependent variable and depends on all other attributes, is 0.8769231. It has the highest accuracy of all our models. Note that here we considered all attributes, not only the related ones.

Next, the accuracy of the KNN model is 0.8615385 (decision tree 2 has the same accuracy). This is higher than that of every remaining model, including Naïve Bayes and the other decision trees.

The accuracy of the Naïve Bayes model is only 0.6734694.
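
The accuracies quoted in this comparison can be collected in one small table and ranked. This is a sketch that only re-enters the figures reported above. The data frame `results` and its column names are assumptions.

```r
# Accuracies as reported in the text above
results <- data.frame(
  model = c("Decision tree 1 (all attributes)", "KNN", "Decision tree 2",
            "Decision tree 3", "Decision tree 4", "Naive Bayes",
            "Decision tree 5"),
  accuracy = c(0.8769231, 0.8615385, 0.8615385, 0.8153846, 0.8,
               0.6734694, 0.5692308)
)

# Rank the models from most to least accurate
ranked <- results[order(results$accuracy, decreasing = TRUE), ]
print(ranked, row.names = FALSE)
```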

Result

For this dataset, it is better to trust the KNN machine learning model. Decision tree 1 has a higher accuracy (0.8769231 against 0.8615385), but it uses all attributes, so among the remaining models KNN has the highest accuracy.