AIM: Develop a suitable machine learning model for the assigned dataset and compare the results of at least two machine learning algorithms.
Dataset : Parkinsons Disease Data Set
Link : https://archive.ics.uci.edu/ml/datasets/parkinsons
Title: Parkinsons Disease Data Set
Abstract: Oxford Parkinson’s Disease Detection Dataset
| Data Set Characteristics: Multivariate |
| Number of Instances: 197 |
| Area: Life |
| Attribute Characteristics: Real |
| Number of Attributes: 23 |
| Date Donated: 2008-06-26 |
| Associated Tasks: Classification |
| Missing Values? N/A |
Source:
The dataset was created by Max Little of the University of Oxford, in collaboration with the National Centre for Voice and Speech, Denver, Colorado, who recorded the speech signals. The original study published the feature extraction methods for general voice disorders.
Data Set Information:
This dataset is composed of a range of biomedical voice measurements from 31 people, 23 with Parkinson’s disease (PD). Each column in the table is a particular voice measure, and each row corresponds to one of 195 voice recordings from these individuals (“name” column). The main aim of the data is to discriminate healthy people from those with PD, according to the “status” column, which is set to 0 for healthy and 1 for PD.
The data is in ASCII CSV format. Each row of the CSV file contains an instance corresponding to one voice recording. There are around six recordings per patient; the name of the patient is identified in the first column. For further information or to pass on comments, please contact Max Little (littlem ‘@’ robots.ox.ac.uk).
importing dataset from the link
my_url="https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/parkinsons.data"
reading the CSV file into dataset2 (the first row is the header, fields are separated by commas)
dataset2=read.csv(my_url,header=TRUE,sep=",")
# move status (column 18) to the last position
dataset=dataset2[, c(1:17,19:24,18)]
remove column 1 (name)
dataset=dataset[,-1]
dataset$status=factor(dataset$status,levels=c(0,1))
library(caTools)
set.seed(123)
splitting dataset into train_set and test_set
split=sample.split(Y=dataset$status,SplitRatio=2/3)
train_set=subset(x=dataset,split==T)
test_set=subset(x=dataset,split==F)
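The split above is made per recording, while the dataset description says there are around six recordings per patient, so recordings of the same person can end up in both train_set and test_set. A minimal sketch of a split by subject is given here. The toy data frame and the name pattern "<subject>_<recording>" are assumptions made only for this illustration, and the sketch does not stratify by status the way sample.split() does.

```r
# Toy stand-in: 6 subjects with 3 recordings each.
# The name pattern "<subject>_<recording>" is an assumption for this sketch.
set.seed(123)
toy <- data.frame(
  name = paste0("S", rep(1:6, each = 3), "_", rep(1:3, times = 6)),
  status = rep(c(0, 1, 1, 0, 1, 1), each = 3)
)

# derive a subject id by stripping the recording suffix
toy$subject <- sub("_[0-9]+$", "", toy$name)

# sample whole subjects (not single recordings) for the training set
subjects <- unique(toy$subject)
train_subjects <- sample(subjects, size = round(2/3 * length(subjects)))

train_toy <- toy[toy$subject %in% train_subjects, ]
test_toy <- toy[!toy$subject %in% train_subjects, ]

# number of subjects present in both sets
length(intersect(train_toy$subject, test_toy$subject))
```

With this kind of split, the test accuracy measures how well the model works on people it has not seen before.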
feature scaling
train_set[-23]=scale(train_set[-23])
test_set[-23]=scale(test_set[-23])
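In the code above the test set is scaled with its own mean and standard deviation. A common alternative is to reuse the mean and standard deviation of the training set, so that both sets are transformed in the same way. A minimal sketch with toy numbers follows (train_toy and test_toy are made-up stand-ins, not the Parkinson's data):

```r
# Toy stand-in data with two features on very different scales
set.seed(123)
train_toy <- data.frame(a = rnorm(20, mean = 150, sd = 40),
                        b = rnorm(20, mean = 0.005, sd = 0.002))
test_toy <- data.frame(a = rnorm(10, mean = 150, sd = 40),
                       b = rnorm(10, mean = 0.005, sd = 0.002))

# scale the training set and keep its centering and scaling values
train_scaled <- scale(train_toy)
train_center <- attr(train_scaled, "scaled:center")
train_scale <- attr(train_scaled, "scaled:scale")

# apply the training-set values to the test set
test_scaled <- scale(test_toy, center = train_center, scale = train_scale)

round(colMeans(train_scaled), 10)  # training columns are centered at 0
```

The test columns will usually not have mean exactly 0 after this step, which is expected because they are measured against the training distribution.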
library(class)
prediction
y_pred=knn(train_set[,-23],test=test_set[,-23],cl=train_set[,23],k=5)
y_pred
## [1] 1 1 1 0 1 1 0 1 1 1 1 1 0 0 1 1 1 1 0 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1
## [39] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 0 0 0 1 1 1 1 1 1
## Levels: 0 1
cm=table(test_set[,23],y_pred)
cm
## y_pred
## 0 1
## 0 10 6
## 1 3 46
accuracy=sum(diag(cm))/sum(cm)
accuracy
## [1] 0.8615385
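The value k=5 above is chosen by hand. A minimal sketch of how several values of k could be compared with leave-one-out cross-validation, using knn.cv() from the class library, is shown here. The built-in iris data (two species only) is used as a stand-in so that the block runs without a download, and the candidate values in ks are an assumption.

```r
library(class)

# Stand-in data: two classes of the built-in iris data set
data(iris)
two <- droplevels(subset(iris, Species != "setosa"))
x <- scale(two[, 1:4])   # scaled once on all rows, which is a simplification

# candidate values of k (assumed for this sketch)
ks <- c(1, 3, 5, 7, 9)

# leave-one-out accuracy for each k
set.seed(123)
cv_acc <- sapply(ks, function(k) {
  pred <- knn.cv(train = x, cl = two$Species, k = k)
  mean(pred == two$Species)
})
names(cv_acc) <- ks
cv_acc

# k with the highest cross-validated accuracy
best_k <- ks[which.max(cv_acc)]
best_k
```

On the Parkinson's data the same loop could be run on train_set[,-23] with cl=train_set[,23], keeping test_set untouched for the final check.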
Summarization
summary(dataset)
## MDVP.Fo.Hz. MDVP.Fhi.Hz. MDVP.Flo.Hz. MDVP.Jitter...
## Min. : 88.33 Min. :102.1 Min. : 65.48 Min. :0.001680
## 1st Qu.:117.57 1st Qu.:134.9 1st Qu.: 84.29 1st Qu.:0.003460
## Median :148.79 Median :175.8 Median :104.31 Median :0.004940
## Mean :154.23 Mean :197.1 Mean :116.32 Mean :0.006220
## 3rd Qu.:182.77 3rd Qu.:224.2 3rd Qu.:140.02 3rd Qu.:0.007365
## Max. :260.11 Max. :592.0 Max. :239.17 Max. :0.033160
## MDVP.Jitter.Abs. MDVP.RAP MDVP.PPQ Jitter.DDP
## Min. :7.000e-06 Min. :0.000680 Min. :0.000920 Min. :0.002040
## 1st Qu.:2.000e-05 1st Qu.:0.001660 1st Qu.:0.001860 1st Qu.:0.004985
## Median :3.000e-05 Median :0.002500 Median :0.002690 Median :0.007490
## Mean :4.396e-05 Mean :0.003306 Mean :0.003446 Mean :0.009920
## 3rd Qu.:6.000e-05 3rd Qu.:0.003835 3rd Qu.:0.003955 3rd Qu.:0.011505
## Max. :2.600e-04 Max. :0.021440 Max. :0.019580 Max. :0.064330
## MDVP.Shimmer MDVP.Shimmer.dB. Shimmer.APQ3 Shimmer.APQ5
## Min. :0.00954 Min. :0.0850 Min. :0.004550 Min. :0.00570
## 1st Qu.:0.01650 1st Qu.:0.1485 1st Qu.:0.008245 1st Qu.:0.00958
## Median :0.02297 Median :0.2210 Median :0.012790 Median :0.01347
## Mean :0.02971 Mean :0.2823 Mean :0.015664 Mean :0.01788
## 3rd Qu.:0.03789 3rd Qu.:0.3500 3rd Qu.:0.020265 3rd Qu.:0.02238
## Max. :0.11908 Max. :1.3020 Max. :0.056470 Max. :0.07940
## MDVP.APQ Shimmer.DDA NHR HNR
## Min. :0.00719 Min. :0.01364 Min. :0.000650 Min. : 8.441
## 1st Qu.:0.01308 1st Qu.:0.02474 1st Qu.:0.005925 1st Qu.:19.198
## Median :0.01826 Median :0.03836 Median :0.011660 Median :22.085
## Mean :0.02408 Mean :0.04699 Mean :0.024847 Mean :21.886
## 3rd Qu.:0.02940 3rd Qu.:0.06080 3rd Qu.:0.025640 3rd Qu.:25.076
## Max. :0.13778 Max. :0.16942 Max. :0.314820 Max. :33.047
## RPDE DFA spread1 spread2
## Min. :0.2566 Min. :0.5743 Min. :-7.965 Min. :0.006274
## 1st Qu.:0.4213 1st Qu.:0.6748 1st Qu.:-6.450 1st Qu.:0.174350
## Median :0.4960 Median :0.7223 Median :-5.721 Median :0.218885
## Mean :0.4985 Mean :0.7181 Mean :-5.684 Mean :0.226510
## 3rd Qu.:0.5876 3rd Qu.:0.7619 3rd Qu.:-5.046 3rd Qu.:0.279234
## Max. :0.6852 Max. :0.8253 Max. :-2.434 Max. :0.450493
## D2 PPE status
## Min. :1.423 Min. :0.04454 0: 48
## 1st Qu.:2.099 1st Qu.:0.13745 1:147
## Median :2.362 Median :0.19405
## Mean :2.382 Mean :0.20655
## 3rd Qu.:2.636 3rd Qu.:0.25298
## Max. :3.671 Max. :0.52737
summary(y_pred)
## 0 1
## 13 52
summary(cm)
## Number of cases in table: 65
## Number of factors: 2
## Test for independence of all factors:
## Chisq = 23.96, df = 1, p-value = 9.833e-07
## Chi-squared approximation may be incorrect
summary(accuracy)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.8615 0.8615 0.8615 0.8615 0.8615 0.8615
Visualization (KNN has no model plot; bar plot of the predicted classes instead)
plot(y_pred,main="y_pred ANNEM SHIVAJI(20MIC0091)")
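We discriminated healthy people from those with PD according to the “status” column (0 for healthy, 1 for PD) using the KNN machine learning algorithm. We split the dataset so that 2/3 of the recordings form the train set and 1/3 form the test set. Then we did feature scaling and performed prediction using the knn() method from the class library with k=5. The accuracy of the model is 0.8615385 (56 of the 65 test recordings are classified correctly). Because 49 of the 65 test recordings belong to the PD class, always predicting PD would already give 49/65 ≈ 0.754, so this baseline, rather than 0.5, is the relevant comparison. The KNN accuracy is above this baseline, so KNN is a reasonable machine learning algorithm for this dataset.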
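Accuracy alone hides how the two classes are treated. As a small worked example, the sensitivity and specificity can be computed from the KNN confusion matrix reported above (the numbers are copied from that table; the variable names are chosen for this sketch):

```r
# KNN confusion matrix from the report: rows = actual status, columns = predicted
cm_knn <- matrix(c(10, 3, 6, 46), nrow = 2,
                 dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))

# share of PD recordings (status 1) that are predicted as PD
sensitivity <- cm_knn["1", "1"] / sum(cm_knn["1", ])

# share of healthy recordings (status 0) that are predicted as healthy
specificity <- cm_knn["0", "0"] / sum(cm_knn["0", ])

# mean of the two, which is not dominated by the larger class
balanced_accuracy <- (sensitivity + specificity) / 2

round(c(sensitivity = sensitivity,
        specificity = specificity,
        balanced_accuracy = balanced_accuracy), 3)
```

This gives a sensitivity of about 0.939 (46 of 49) and a specificity of 0.625 (10 of 16), so the model recognises PD recordings much better than healthy ones.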
importing the dataset
#importing the dataset
my_url="https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/parkinsons.data"
dataset2=read.csv(my_url,header=TRUE,sep=",")
View(dataset2)
dataset=dataset2[, c(1:17,19:24,18)]
View(dataset)
dataset=dataset[2:24]
dataset$status=factor(dataset$status,levels=c(0,1))
library(caTools)
set.seed(123)
splitting the dataset into test_set and train_set
split=sample.split(Y=dataset$status,SplitRatio =.75)
train_set=subset(x=dataset,split==T)
test_set=subset(x=dataset,split==F)
feature scaling
train_set[-23]=scale(train_set[-23])
test_set[-23]=scale(test_set[-23])
library(e1071)
classifier=naiveBayes(x=train_set[,-23],y=train_set$status)
prediction
y_pred=predict(object=classifier,newdata=test_set[,-23])
cm=table(test_set$status,y_pred)
cm
## y_pred
## 0 1
## 0 8 4
## 1 12 25
accuracy=sum(diag(cm))/sum(cm)
accuracy
## [1] 0.6734694
Summarization
summary(dataset)
## MDVP.Fo.Hz. MDVP.Fhi.Hz. MDVP.Flo.Hz. MDVP.Jitter...
## Min. : 88.33 Min. :102.1 Min. : 65.48 Min. :0.001680
## 1st Qu.:117.57 1st Qu.:134.9 1st Qu.: 84.29 1st Qu.:0.003460
## Median :148.79 Median :175.8 Median :104.31 Median :0.004940
## Mean :154.23 Mean :197.1 Mean :116.32 Mean :0.006220
## 3rd Qu.:182.77 3rd Qu.:224.2 3rd Qu.:140.02 3rd Qu.:0.007365
## Max. :260.11 Max. :592.0 Max. :239.17 Max. :0.033160
## MDVP.Jitter.Abs. MDVP.RAP MDVP.PPQ Jitter.DDP
## Min. :7.000e-06 Min. :0.000680 Min. :0.000920 Min. :0.002040
## 1st Qu.:2.000e-05 1st Qu.:0.001660 1st Qu.:0.001860 1st Qu.:0.004985
## Median :3.000e-05 Median :0.002500 Median :0.002690 Median :0.007490
## Mean :4.396e-05 Mean :0.003306 Mean :0.003446 Mean :0.009920
## 3rd Qu.:6.000e-05 3rd Qu.:0.003835 3rd Qu.:0.003955 3rd Qu.:0.011505
## Max. :2.600e-04 Max. :0.021440 Max. :0.019580 Max. :0.064330
## MDVP.Shimmer MDVP.Shimmer.dB. Shimmer.APQ3 Shimmer.APQ5
## Min. :0.00954 Min. :0.0850 Min. :0.004550 Min. :0.00570
## 1st Qu.:0.01650 1st Qu.:0.1485 1st Qu.:0.008245 1st Qu.:0.00958
## Median :0.02297 Median :0.2210 Median :0.012790 Median :0.01347
## Mean :0.02971 Mean :0.2823 Mean :0.015664 Mean :0.01788
## 3rd Qu.:0.03789 3rd Qu.:0.3500 3rd Qu.:0.020265 3rd Qu.:0.02238
## Max. :0.11908 Max. :1.3020 Max. :0.056470 Max. :0.07940
## MDVP.APQ Shimmer.DDA NHR HNR
## Min. :0.00719 Min. :0.01364 Min. :0.000650 Min. : 8.441
## 1st Qu.:0.01308 1st Qu.:0.02474 1st Qu.:0.005925 1st Qu.:19.198
## Median :0.01826 Median :0.03836 Median :0.011660 Median :22.085
## Mean :0.02408 Mean :0.04699 Mean :0.024847 Mean :21.886
## 3rd Qu.:0.02940 3rd Qu.:0.06080 3rd Qu.:0.025640 3rd Qu.:25.076
## Max. :0.13778 Max. :0.16942 Max. :0.314820 Max. :33.047
## RPDE DFA spread1 spread2
## Min. :0.2566 Min. :0.5743 Min. :-7.965 Min. :0.006274
## 1st Qu.:0.4213 1st Qu.:0.6748 1st Qu.:-6.450 1st Qu.:0.174350
## Median :0.4960 Median :0.7223 Median :-5.721 Median :0.218885
## Mean :0.4985 Mean :0.7181 Mean :-5.684 Mean :0.226510
## 3rd Qu.:0.5876 3rd Qu.:0.7619 3rd Qu.:-5.046 3rd Qu.:0.279234
## Max. :0.6852 Max. :0.8253 Max. :-2.434 Max. :0.450493
## D2 PPE status
## Min. :1.423 Min. :0.04454 0: 48
## 1st Qu.:2.099 1st Qu.:0.13745 1:147
## Median :2.362 Median :0.19405
## Mean :2.382 Mean :0.20655
## 3rd Qu.:2.636 3rd Qu.:0.25298
## Max. :3.671 Max. :0.52737
summary(classifier)
## Length Class Mode
## apriori 2 table numeric
## tables 22 -none- list
## levels 2 -none- character
## isnumeric 22 -none- logical
## call 3 -none- call
summary(y_pred)
## 0 1
## 20 29
summary(cm)
## Number of cases in table: 49
## Number of factors: 2
## Test for independence of all factors:
## Chisq = 4.396, df = 1, p-value = 0.03602
## Chi-squared approximation may be incorrect
summary(accuracy)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.6735 0.6735 0.6735 0.6735 0.6735 0.6735
Visualization (Naive Bayes has no model plot; bar plot of the predicted classes instead)
plot(y_pred,main="y_pred ANNEM SHIVAJI(20MIC0091)")
We discriminated healthy people from those with PD according to the “status” column (0 for healthy, 1 for PD) using the naiveBayes machine learning algorithm. We split the dataset so that 3/4 of the recordings form the train set and 1/4 form the test set. Then we did feature scaling, built the classifier using the naiveBayes() method from the e1071 library, and performed prediction with predict(). The accuracy of the model is 0.6734694 (33 of the 49 test recordings are classified correctly). Although this is greater than 0.5, 37 of the 49 test recordings belong to the PD class, so always predicting PD would give 37/49 ≈ 0.755. Naive Bayes falls below this baseline here, so it is the weaker of the two algorithms for this dataset.
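To make clearer what a naive Bayes classifier does with numeric features, here is a minimal base-R sketch of the Gaussian version: one mean and standard deviation per feature and class, combined with the class priors. It is an illustration only, not the e1071 implementation. The helper names fit_gnb and predict_gnb are hypothetical, and the built-in iris data (two species) is a stand-in for the Parkinson's data.

```r
# Stand-in data: two classes of the built-in iris data set
data(iris)
two <- droplevels(subset(iris, Species != "setosa"))
set.seed(123)
idx <- sample(nrow(two), size = 70)
train <- two[idx, ]
test <- two[-idx, ]

# fit: class priors plus per-class mean and sd of every feature
fit_gnb <- function(x, y) {
  classes <- levels(y)
  list(
    classes = classes,
    prior = as.numeric(table(y)[classes]) / length(y),
    mean = sapply(classes, function(cl) colMeans(x[y == cl, , drop = FALSE])),
    sd = sapply(classes, function(cl) apply(x[y == cl, , drop = FALSE], 2, sd))
  )
}

# predict: log prior + sum of per-feature Gaussian log densities, pick the larger
predict_gnb <- function(model, x) {
  x <- as.matrix(x)
  log_post <- sapply(seq_along(model$classes), function(j) {
    ll <- sapply(seq_len(ncol(x)), function(i) {
      dnorm(x[, i], mean = model$mean[i, j], sd = model$sd[i, j], log = TRUE)
    })
    log(model$prior[j]) + rowSums(ll)
  })
  factor(model$classes[max.col(log_post, ties.method = "first")],
         levels = model$classes)
}

model <- fit_gnb(train[, 1:4], train$Species)
pred <- predict_gnb(model, test[, 1:4])
cm_demo <- table(test$Species, pred)
acc_demo <- sum(diag(cm_demo)) / sum(cm_demo)
acc_demo
```

The "naive" part is the rowSums() step: the log densities of the features are simply added, which treats the features as independent within each class. Many of the voice measures in this dataset are strongly related to each other, which may be one reason for the lower accuracy of this method.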
importing the dataset
my_url="https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/parkinsons.data"
dataset2=read.csv(my_url,header=TRUE,sep=",")
dataset=dataset2[, c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,24,19,20,21,22,23,18)]
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
Cleaning dataset
clean_dataset=dataset%>%select(MDVP.Fo.Hz.:status)%>%mutate(status=factor(status,levels=c(0,1),labels=c("No","Yes")))%>%na.omit()
library(caTools)
set.seed(123)
Splitting dataset
split=sample.split(Y=clean_dataset$status,SplitRatio=2/3)
train_set=subset(x=clean_dataset,split==T)
test_set=subset(x=clean_dataset,split==F)
library(rpart)
fit=rpart(formula=status~(HNR+DFA+NHR+RPDE+MDVP.Fo.Hz.+MDVP.Fhi.Hz.+MDVP.Flo.Hz.+MDVP.Jitter...+
MDVP.Jitter.Abs.+MDVP.RAP+MDVP.PPQ+Jitter.DDP+MDVP.Shimmer+
MDVP.Shimmer.dB.+Shimmer.APQ3+Shimmer.APQ5+MDVP.APQ+Shimmer.DDA+PPE+spread1+spread2+D2)
,data=train_set,method="class")
library(rpart.plot)
rpart.plot(fit)
Prediction
predict_unseen=predict(object=fit,newdata=test_set,type="class")
cm=table(test_set$status,predict_unseen)
cm
## predict_unseen
## No Yes
## No 12 4
## Yes 4 45
accuracy=sum(diag(cm))/sum(cm)
accuracy
## [1] 0.8769231
Summarization
summary(dataset)
## name MDVP.Fo.Hz. MDVP.Fhi.Hz. MDVP.Flo.Hz.
## Length:195 Min. : 88.33 Min. :102.1 Min. : 65.48
## Class :character 1st Qu.:117.57 1st Qu.:134.9 1st Qu.: 84.29
## Mode :character Median :148.79 Median :175.8 Median :104.31
## Mean :154.23 Mean :197.1 Mean :116.32
## 3rd Qu.:182.77 3rd Qu.:224.2 3rd Qu.:140.02
## Max. :260.11 Max. :592.0 Max. :239.17
## MDVP.Jitter... MDVP.Jitter.Abs. MDVP.RAP MDVP.PPQ
## Min. :0.001680 Min. :7.000e-06 Min. :0.000680 Min. :0.000920
## 1st Qu.:0.003460 1st Qu.:2.000e-05 1st Qu.:0.001660 1st Qu.:0.001860
## Median :0.004940 Median :3.000e-05 Median :0.002500 Median :0.002690
## Mean :0.006220 Mean :4.396e-05 Mean :0.003306 Mean :0.003446
## 3rd Qu.:0.007365 3rd Qu.:6.000e-05 3rd Qu.:0.003835 3rd Qu.:0.003955
## Max. :0.033160 Max. :2.600e-04 Max. :0.021440 Max. :0.019580
## Jitter.DDP MDVP.Shimmer MDVP.Shimmer.dB. Shimmer.APQ3
## Min. :0.002040 Min. :0.00954 Min. :0.0850 Min. :0.004550
## 1st Qu.:0.004985 1st Qu.:0.01650 1st Qu.:0.1485 1st Qu.:0.008245
## Median :0.007490 Median :0.02297 Median :0.2210 Median :0.012790
## Mean :0.009920 Mean :0.02971 Mean :0.2823 Mean :0.015664
## 3rd Qu.:0.011505 3rd Qu.:0.03789 3rd Qu.:0.3500 3rd Qu.:0.020265
## Max. :0.064330 Max. :0.11908 Max. :1.3020 Max. :0.056470
## Shimmer.APQ5 MDVP.APQ Shimmer.DDA NHR
## Min. :0.00570 Min. :0.00719 Min. :0.01364 Min. :0.000650
## 1st Qu.:0.00958 1st Qu.:0.01308 1st Qu.:0.02474 1st Qu.:0.005925
## Median :0.01347 Median :0.01826 Median :0.03836 Median :0.011660
## Mean :0.01788 Mean :0.02408 Mean :0.04699 Mean :0.024847
## 3rd Qu.:0.02238 3rd Qu.:0.02940 3rd Qu.:0.06080 3rd Qu.:0.025640
## Max. :0.07940 Max. :0.13778 Max. :0.16942 Max. :0.314820
## HNR PPE RPDE DFA
## Min. : 8.441 Min. :0.04454 Min. :0.2566 Min. :0.5743
## 1st Qu.:19.198 1st Qu.:0.13745 1st Qu.:0.4213 1st Qu.:0.6748
## Median :22.085 Median :0.19405 Median :0.4960 Median :0.7223
## Mean :21.886 Mean :0.20655 Mean :0.4985 Mean :0.7181
## 3rd Qu.:25.076 3rd Qu.:0.25298 3rd Qu.:0.5876 3rd Qu.:0.7619
## Max. :33.047 Max. :0.52737 Max. :0.6852 Max. :0.8253
## spread1 spread2 D2 status
## Min. :-7.965 Min. :0.006274 Min. :1.423 Min. :0.0000
## 1st Qu.:-6.450 1st Qu.:0.174350 1st Qu.:2.099 1st Qu.:1.0000
## Median :-5.721 Median :0.218885 Median :2.362 Median :1.0000
## Mean :-5.684 Mean :0.226510 Mean :2.382 Mean :0.7538
## 3rd Qu.:-5.046 3rd Qu.:0.279234 3rd Qu.:2.636 3rd Qu.:1.0000
## Max. :-2.434 Max. :0.450493 Max. :3.671 Max. :1.0000
summary(clean_dataset)
## MDVP.Fo.Hz. MDVP.Fhi.Hz. MDVP.Flo.Hz. MDVP.Jitter...
## Min. : 88.33 Min. :102.1 Min. : 65.48 Min. :0.001680
## 1st Qu.:117.57 1st Qu.:134.9 1st Qu.: 84.29 1st Qu.:0.003460
## Median :148.79 Median :175.8 Median :104.31 Median :0.004940
## Mean :154.23 Mean :197.1 Mean :116.32 Mean :0.006220
## 3rd Qu.:182.77 3rd Qu.:224.2 3rd Qu.:140.02 3rd Qu.:0.007365
## Max. :260.11 Max. :592.0 Max. :239.17 Max. :0.033160
## MDVP.Jitter.Abs. MDVP.RAP MDVP.PPQ Jitter.DDP
## Min. :7.000e-06 Min. :0.000680 Min. :0.000920 Min. :0.002040
## 1st Qu.:2.000e-05 1st Qu.:0.001660 1st Qu.:0.001860 1st Qu.:0.004985
## Median :3.000e-05 Median :0.002500 Median :0.002690 Median :0.007490
## Mean :4.396e-05 Mean :0.003306 Mean :0.003446 Mean :0.009920
## 3rd Qu.:6.000e-05 3rd Qu.:0.003835 3rd Qu.:0.003955 3rd Qu.:0.011505
## Max. :2.600e-04 Max. :0.021440 Max. :0.019580 Max. :0.064330
## MDVP.Shimmer MDVP.Shimmer.dB. Shimmer.APQ3 Shimmer.APQ5
## Min. :0.00954 Min. :0.0850 Min. :0.004550 Min. :0.00570
## 1st Qu.:0.01650 1st Qu.:0.1485 1st Qu.:0.008245 1st Qu.:0.00958
## Median :0.02297 Median :0.2210 Median :0.012790 Median :0.01347
## Mean :0.02971 Mean :0.2823 Mean :0.015664 Mean :0.01788
## 3rd Qu.:0.03789 3rd Qu.:0.3500 3rd Qu.:0.020265 3rd Qu.:0.02238
## Max. :0.11908 Max. :1.3020 Max. :0.056470 Max. :0.07940
## MDVP.APQ Shimmer.DDA NHR HNR
## Min. :0.00719 Min. :0.01364 Min. :0.000650 Min. : 8.441
## 1st Qu.:0.01308 1st Qu.:0.02474 1st Qu.:0.005925 1st Qu.:19.198
## Median :0.01826 Median :0.03836 Median :0.011660 Median :22.085
## Mean :0.02408 Mean :0.04699 Mean :0.024847 Mean :21.886
## 3rd Qu.:0.02940 3rd Qu.:0.06080 3rd Qu.:0.025640 3rd Qu.:25.076
## Max. :0.13778 Max. :0.16942 Max. :0.314820 Max. :33.047
## PPE RPDE DFA spread1
## Min. :0.04454 Min. :0.2566 Min. :0.5743 Min. :-7.965
## 1st Qu.:0.13745 1st Qu.:0.4213 1st Qu.:0.6748 1st Qu.:-6.450
## Median :0.19405 Median :0.4960 Median :0.7223 Median :-5.721
## Mean :0.20655 Mean :0.4985 Mean :0.7181 Mean :-5.684
## 3rd Qu.:0.25298 3rd Qu.:0.5876 3rd Qu.:0.7619 3rd Qu.:-5.046
## Max. :0.52737 Max. :0.6852 Max. :0.8253 Max. :-2.434
## spread2 D2 status
## Min. :0.006274 Min. :1.423 No : 48
## 1st Qu.:0.174350 1st Qu.:2.099 Yes:147
## Median :0.218885 Median :2.362
## Mean :0.226510 Mean :2.382
## 3rd Qu.:0.279234 3rd Qu.:2.636
## Max. :0.450493 Max. :3.671
summary(fit)
## Call:
## rpart(formula = status ~ (HNR + DFA + NHR + RPDE + MDVP.Fo.Hz. +
## MDVP.Fhi.Hz. + MDVP.Flo.Hz. + MDVP.Jitter... + MDVP.Jitter.Abs. +
## MDVP.RAP + MDVP.PPQ + Jitter.DDP + MDVP.Shimmer + MDVP.Shimmer.dB. +
## Shimmer.APQ3 + Shimmer.APQ5 + MDVP.APQ + Shimmer.DDA + PPE +
## spread1 + spread2 + D2), data = train_set, method = "class")
## n= 130
##
## CP nsplit rel error xerror xstd
## 1 0.406250 0 1.00000 1.00000 0.1534852
## 2 0.156250 1 0.59375 0.78125 0.1404245
## 3 0.046875 3 0.28125 0.62500 0.1285552
## 4 0.010000 5 0.18750 0.62500 0.1285552
##
## Variable importance
## PPE spread1 MDVP.PPQ DFA
## 14 13 8 8
## MDVP.Jitter... Jitter.DDP MDVP.RAP Shimmer.APQ3
## 7 7 7 5
## Shimmer.DDA MDVP.Flo.Hz. MDVP.Fhi.Hz. Shimmer.APQ5
## 5 5 4 4
## spread2 MDVP.Fo.Hz. NHR MDVP.Shimmer
## 3 3 2 2
## HNR MDVP.Shimmer.dB.
## 2 2
##
## Node number 1: 130 observations, complexity param=0.40625
## predicted class=Yes expected loss=0.2461538 P(node) =1
## class counts: 32 98
## probabilities: 0.246 0.754
## left son=2 (39 obs) right son=3 (91 obs)
## Primary splits:
## spread1 < -6.317759 to the left, improve=19.70403, (0 missing)
## PPE < 0.1051815 to the left, improve=18.89477, (0 missing)
## MDVP.Flo.Hz. < 188.6565 to the right, improve=17.83236, (0 missing)
## MDVP.Fo.Hz. < 226.0965 to the right, improve=13.65792, (0 missing)
## MDVP.RAP < 0.001775 to the left, improve=13.17687, (0 missing)
## Surrogate splits:
## PPE < 0.14169 to the left, agree=0.977, adj=0.923, (0 split)
## MDVP.PPQ < 0.001845 to the left, agree=0.892, adj=0.641, (0 split)
## MDVP.Jitter... < 0.003315 to the left, agree=0.869, adj=0.564, (0 split)
## Jitter.DDP < 0.005075 to the left, agree=0.862, adj=0.538, (0 split)
## MDVP.RAP < 0.001685 to the left, agree=0.854, adj=0.513, (0 split)
##
## Node number 2: 39 observations, complexity param=0.15625
## predicted class=No expected loss=0.3333333 P(node) =0.3
## class counts: 26 13
## probabilities: 0.667 0.333
## left son=4 (16 obs) right son=5 (23 obs)
## Primary splits:
## MDVP.Fhi.Hz. < 229.1805 to the right, improve=6.028986, (0 missing)
## MDVP.Flo.Hz. < 180.185 to the right, improve=5.416667, (0 missing)
## MDVP.Fo.Hz. < 191.6195 to the right, improve=5.158730, (0 missing)
## PPE < 0.1051815 to the left, improve=4.541889, (0 missing)
## DFA < 0.674851 to the left, improve=4.333333, (0 missing)
## Surrogate splits:
## DFA < 0.674851 to the left, agree=0.923, adj=0.813, (0 split)
## MDVP.Fo.Hz. < 207.7355 to the right, agree=0.923, adj=0.813, (0 split)
## MDVP.Flo.Hz. < 200.8275 to the right, agree=0.846, adj=0.625, (0 split)
## spread2 < 0.136211 to the left, agree=0.821, adj=0.562, (0 split)
## PPE < 0.1033925 to the left, agree=0.744, adj=0.375, (0 split)
##
## Node number 3: 91 observations, complexity param=0.046875
## predicted class=Yes expected loss=0.06593407 P(node) =0.7
## class counts: 6 85
## probabilities: 0.066 0.934
## left son=6 (27 obs) right son=7 (64 obs)
## Primary splits:
## spread2 < 0.2100605 to the left, improve=1.875458, (0 missing)
## Shimmer.APQ3 < 0.008735 to the left, improve=1.868148, (0 missing)
## Shimmer.DDA < 0.02621 to the left, improve=1.868148, (0 missing)
## MDVP.Shimmer.dB. < 0.1375 to the left, improve=1.675659, (0 missing)
## MDVP.Jitter... < 0.003505 to the left, improve=1.428303, (0 missing)
## Surrogate splits:
## PPE < 0.184526 to the left, agree=0.791, adj=0.296, (0 split)
## D2 < 2.2965 to the left, agree=0.791, adj=0.296, (0 split)
## spread1 < -5.921087 to the left, agree=0.780, adj=0.259, (0 split)
## RPDE < 0.389095 to the left, agree=0.736, adj=0.111, (0 split)
## MDVP.Jitter... < 0.003325 to the left, agree=0.736, adj=0.111, (0 split)
##
## Node number 4: 16 observations
## predicted class=No expected loss=0 P(node) =0.1230769
## class counts: 16 0
## probabilities: 1.000 0.000
##
## Node number 5: 23 observations, complexity param=0.15625
## predicted class=Yes expected loss=0.4347826 P(node) =0.1769231
## class counts: 10 13
## probabilities: 0.435 0.565
## left son=10 (13 obs) right son=11 (10 obs)
## Primary splits:
## DFA < 0.728728 to the right, improve=6.688963, (0 missing)
## RPDE < 0.3371195 to the right, improve=3.804348, (0 missing)
## NHR < 0.004895 to the left, improve=3.097999, (0 missing)
## D2 < 2.23273 to the left, improve=3.097999, (0 missing)
## MDVP.Fo.Hz. < 139.797 to the left, improve=2.437681, (0 missing)
## Surrogate splits:
## NHR < 0.004895 to the left, agree=0.783, adj=0.5, (0 split)
## MDVP.Flo.Hz. < 112.256 to the right, agree=0.783, adj=0.5, (0 split)
## Shimmer.APQ3 < 0.005505 to the right, agree=0.739, adj=0.4, (0 split)
## Shimmer.APQ5 < 0.006365 to the right, agree=0.739, adj=0.4, (0 split)
## Shimmer.DDA < 0.016515 to the right, agree=0.739, adj=0.4, (0 split)
##
## Node number 6: 27 observations, complexity param=0.046875
## predicted class=Yes expected loss=0.2222222 P(node) =0.2076923
## class counts: 6 21
## probabilities: 0.222 0.778
## left son=12 (7 obs) right son=13 (20 obs)
## Primary splits:
## Shimmer.APQ3 < 0.00925 to the left, improve=4.576190, (0 missing)
## Shimmer.DDA < 0.02776 to the left, improve=4.576190, (0 missing)
## MDVP.Shimmer < 0.01914 to the left, improve=3.688596, (0 missing)
## MDVP.Shimmer.dB. < 0.1845 to the left, improve=3.688596, (0 missing)
## MDVP.Flo.Hz. < 89.1605 to the right, improve=3.333333, (0 missing)
## Surrogate splits:
## Shimmer.DDA < 0.02776 to the left, agree=1.000, adj=1.000, (0 split)
## MDVP.Shimmer < 0.02032 to the left, agree=0.926, adj=0.714, (0 split)
## Shimmer.APQ5 < 0.01274 to the left, agree=0.926, adj=0.714, (0 split)
## HNR < 23.0435 to the right, agree=0.889, adj=0.571, (0 split)
## MDVP.Shimmer.dB. < 0.1485 to the left, agree=0.889, adj=0.571, (0 split)
##
## Node number 7: 64 observations
## predicted class=Yes expected loss=0 P(node) =0.4923077
## class counts: 0 64
## probabilities: 0.000 1.000
##
## Node number 10: 13 observations
## predicted class=No expected loss=0.2307692 P(node) =0.1
## class counts: 10 3
## probabilities: 0.769 0.231
##
## Node number 11: 10 observations
## predicted class=Yes expected loss=0 P(node) =0.07692308
## class counts: 0 10
## probabilities: 0.000 1.000
##
## Node number 12: 7 observations
## predicted class=No expected loss=0.2857143 P(node) =0.05384615
## class counts: 5 2
## probabilities: 0.714 0.286
##
## Node number 13: 20 observations
## predicted class=Yes expected loss=0.05 P(node) =0.1538462
## class counts: 1 19
## probabilities: 0.050 0.950
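The CP table at the top of the summary above lists the cross-validated error (xerror) for each tree size. A minimal sketch of how that table could be used to prune a tree is given here. The kyphosis data that ships with rpart is used as a stand-in so that the block runs without a download, and picking the row with the smallest xerror is one common rule, not the only one.

```r
library(rpart)

# Stand-in data shipped with rpart
set.seed(123)
fit_demo <- rpart(Kyphosis ~ Age + Number + Start,
                  data = kyphosis, method = "class")

# complexity table: CP, nsplit, rel error, xerror, xstd
fit_demo$cptable

# CP value of the row with the smallest cross-validated error
best_row <- which.min(fit_demo$cptable[, "xerror"])
best_cp <- fit_demo$cptable[best_row, "CP"]

# prune the tree back to that complexity
pruned_demo <- prune(fit_demo, cp = best_cp)

# number of nodes before and after pruning
c(before = nrow(fit_demo$frame), after = nrow(pruned_demo$frame))
```

For the tree in this report the same steps would be applied to fit, and the pruned tree would then be checked on test_set.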
summary(predict_unseen)
## No Yes
## 16 49
summary(cm)
## Number of cases in table: 65
## Number of factors: 2
## Test for independence of all factors:
## Chisq = 29.036, df = 1, p-value = 7.103e-08
## Chi-squared approximation may be incorrect
summary(accuracy)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.8769 0.8769 0.8769 0.8769 0.8769 0.8769
Visualization
plot(fit,main="(ANNEM SHIVAJI 20MIC0091)")
text(fit)
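Decision tree using only the jitter measures (several measures of variation in fundamental frequency)
my_url="https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/parkinsons.data"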
dataset2=read.csv(my_url,header=TRUE,sep=",")
dataset=dataset2[, c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,24,19,20,21,22,23,18)]
library(dplyr)
clean_dataset=dataset%>%select(MDVP.Jitter...:status)%>%mutate(status=factor(status,levels=c(0,1),labels=c("No","Yes")))%>%na.omit()
library(caTools)
set.seed(123)
split=sample.split(Y=clean_dataset$status,SplitRatio=2/3)
train_set=subset(x=clean_dataset,split==T)
test_set=subset(x=clean_dataset,split==F)
library(rpart)
fit=rpart(formula=status~(MDVP.Jitter...+MDVP.Jitter.Abs.+MDVP.RAP+MDVP.PPQ+Jitter.DDP)
,data=train_set,method="class")
library(rpart.plot)
rpart.plot(fit)
predict_unseen=predict(object=fit,newdata=test_set,type="class")
cm=table(test_set$status,predict_unseen)
cm
## predict_unseen
## No Yes
## No 9 7
## Yes 2 47
sum(diag(cm))/sum(cm)
## [1] 0.8615385
plot(fit,main="Several measures of variation in fundamental frequency (ANNEM SHIVAJI 20MIC0091)")
text(fit)
summary(dataset)
## name MDVP.Fo.Hz. MDVP.Fhi.Hz. MDVP.Flo.Hz.
## Length:195 Min. : 88.33 Min. :102.1 Min. : 65.48
## Class :character 1st Qu.:117.57 1st Qu.:134.9 1st Qu.: 84.29
## Mode :character Median :148.79 Median :175.8 Median :104.31
## Mean :154.23 Mean :197.1 Mean :116.32
## 3rd Qu.:182.77 3rd Qu.:224.2 3rd Qu.:140.02
## Max. :260.11 Max. :592.0 Max. :239.17
## MDVP.Jitter... MDVP.Jitter.Abs. MDVP.RAP MDVP.PPQ
## Min. :0.001680 Min. :7.000e-06 Min. :0.000680 Min. :0.000920
## 1st Qu.:0.003460 1st Qu.:2.000e-05 1st Qu.:0.001660 1st Qu.:0.001860
## Median :0.004940 Median :3.000e-05 Median :0.002500 Median :0.002690
## Mean :0.006220 Mean :4.396e-05 Mean :0.003306 Mean :0.003446
## 3rd Qu.:0.007365 3rd Qu.:6.000e-05 3rd Qu.:0.003835 3rd Qu.:0.003955
## Max. :0.033160 Max. :2.600e-04 Max. :0.021440 Max. :0.019580
## Jitter.DDP MDVP.Shimmer MDVP.Shimmer.dB. Shimmer.APQ3
## Min. :0.002040 Min. :0.00954 Min. :0.0850 Min. :0.004550
## 1st Qu.:0.004985 1st Qu.:0.01650 1st Qu.:0.1485 1st Qu.:0.008245
## Median :0.007490 Median :0.02297 Median :0.2210 Median :0.012790
## Mean :0.009920 Mean :0.02971 Mean :0.2823 Mean :0.015664
## 3rd Qu.:0.011505 3rd Qu.:0.03789 3rd Qu.:0.3500 3rd Qu.:0.020265
## Max. :0.064330 Max. :0.11908 Max. :1.3020 Max. :0.056470
## Shimmer.APQ5 MDVP.APQ Shimmer.DDA NHR
## Min. :0.00570 Min. :0.00719 Min. :0.01364 Min. :0.000650
## 1st Qu.:0.00958 1st Qu.:0.01308 1st Qu.:0.02474 1st Qu.:0.005925
## Median :0.01347 Median :0.01826 Median :0.03836 Median :0.011660
## Mean :0.01788 Mean :0.02408 Mean :0.04699 Mean :0.024847
## 3rd Qu.:0.02238 3rd Qu.:0.02940 3rd Qu.:0.06080 3rd Qu.:0.025640
## Max. :0.07940 Max. :0.13778 Max. :0.16942 Max. :0.314820
## HNR PPE RPDE DFA
## Min. : 8.441 Min. :0.04454 Min. :0.2566 Min. :0.5743
## 1st Qu.:19.198 1st Qu.:0.13745 1st Qu.:0.4213 1st Qu.:0.6748
## Median :22.085 Median :0.19405 Median :0.4960 Median :0.7223
## Mean :21.886 Mean :0.20655 Mean :0.4985 Mean :0.7181
## 3rd Qu.:25.076 3rd Qu.:0.25298 3rd Qu.:0.5876 3rd Qu.:0.7619
## Max. :33.047 Max. :0.52737 Max. :0.6852 Max. :0.8253
## spread1 spread2 D2 status
## Min. :-7.965 Min. :0.006274 Min. :1.423 Min. :0.0000
## 1st Qu.:-6.450 1st Qu.:0.174350 1st Qu.:2.099 1st Qu.:1.0000
## Median :-5.721 Median :0.218885 Median :2.362 Median :1.0000
## Mean :-5.684 Mean :0.226510 Mean :2.382 Mean :0.7538
## 3rd Qu.:-5.046 3rd Qu.:0.279234 3rd Qu.:2.636 3rd Qu.:1.0000
## Max. :-2.434 Max. :0.450493 Max. :3.671 Max. :1.0000
summary(clean_dataset)
## MDVP.Jitter... MDVP.Jitter.Abs. MDVP.RAP MDVP.PPQ
## Min. :0.001680 Min. :7.000e-06 Min. :0.000680 Min. :0.000920
## 1st Qu.:0.003460 1st Qu.:2.000e-05 1st Qu.:0.001660 1st Qu.:0.001860
## Median :0.004940 Median :3.000e-05 Median :0.002500 Median :0.002690
## Mean :0.006220 Mean :4.396e-05 Mean :0.003306 Mean :0.003446
## 3rd Qu.:0.007365 3rd Qu.:6.000e-05 3rd Qu.:0.003835 3rd Qu.:0.003955
## Max. :0.033160 Max. :2.600e-04 Max. :0.021440 Max. :0.019580
## Jitter.DDP MDVP.Shimmer MDVP.Shimmer.dB. Shimmer.APQ3
## Min. :0.002040 Min. :0.00954 Min. :0.0850 Min. :0.004550
## 1st Qu.:0.004985 1st Qu.:0.01650 1st Qu.:0.1485 1st Qu.:0.008245
## Median :0.007490 Median :0.02297 Median :0.2210 Median :0.012790
## Mean :0.009920 Mean :0.02971 Mean :0.2823 Mean :0.015664
## 3rd Qu.:0.011505 3rd Qu.:0.03789 3rd Qu.:0.3500 3rd Qu.:0.020265
## Max. :0.064330 Max. :0.11908 Max. :1.3020 Max. :0.056470
## Shimmer.APQ5 MDVP.APQ Shimmer.DDA NHR
## Min. :0.00570 Min. :0.00719 Min. :0.01364 Min. :0.000650
## 1st Qu.:0.00958 1st Qu.:0.01308 1st Qu.:0.02474 1st Qu.:0.005925
## Median :0.01347 Median :0.01826 Median :0.03836 Median :0.011660
## Mean :0.01788 Mean :0.02408 Mean :0.04699 Mean :0.024847
## 3rd Qu.:0.02238 3rd Qu.:0.02940 3rd Qu.:0.06080 3rd Qu.:0.025640
## Max. :0.07940 Max. :0.13778 Max. :0.16942 Max. :0.314820
## HNR PPE RPDE DFA
## Min. : 8.441 Min. :0.04454 Min. :0.2566 Min. :0.5743
## 1st Qu.:19.198 1st Qu.:0.13745 1st Qu.:0.4213 1st Qu.:0.6748
## Median :22.085 Median :0.19405 Median :0.4960 Median :0.7223
## Mean :21.886 Mean :0.20655 Mean :0.4985 Mean :0.7181
## 3rd Qu.:25.076 3rd Qu.:0.25298 3rd Qu.:0.5876 3rd Qu.:0.7619
## Max. :33.047 Max. :0.52737 Max. :0.6852 Max. :0.8253
## spread1 spread2 D2 status
## Min. :-7.965 Min. :0.006274 Min. :1.423 No : 48
## 1st Qu.:-6.450 1st Qu.:0.174350 1st Qu.:2.099 Yes:147
## Median :-5.721 Median :0.218885 Median :2.362
## Mean :-5.684 Mean :0.226510 Mean :2.382
## 3rd Qu.:-5.046 3rd Qu.:0.279234 3rd Qu.:2.636
## Max. :-2.434 Max. :0.450493 Max. :3.671
summary(fit)
## Call:
## rpart(formula = status ~ (MDVP.Jitter... + MDVP.Jitter.Abs. +
## MDVP.RAP + MDVP.PPQ + Jitter.DDP), data = train_set, method = "class")
## n= 130
##
## CP nsplit rel error xerror xstd
## 1 0.15625 0 1.00000 1.00000 0.1534852
## 2 0.12500 2 0.68750 1.15625 0.1607758
## 3 0.03125 3 0.56250 0.81250 0.1425219
## 4 0.01000 4 0.53125 0.75000 0.1382410
##
## Variable importance
## Jitter.DDP MDVP.RAP MDVP.Jitter... MDVP.PPQ
## 21 21 21 21
## MDVP.Jitter.Abs.
## 16
##
## Node number 1: 130 observations, complexity param=0.15625
## predicted class=Yes expected loss=0.2461538 P(node) =1
## class counts: 32 98
## probabilities: 0.246 0.754
## left son=2 (45 obs) right son=3 (85 obs)
## Primary splits:
## MDVP.RAP < 0.001775 to the left, improve=13.17687, (0 missing)
## Jitter.DDP < 0.00533 to the left, improve=13.17687, (0 missing)
## MDVP.PPQ < 0.00204 to the left, improve=12.58673, (0 missing)
## MDVP.Jitter.Abs. < 1.5e-05 to the left, improve=12.00070, (0 missing)
## MDVP.Jitter... < 0.003035 to the left, improve=10.40968, (0 missing)
## Surrogate splits:
## Jitter.DDP < 0.00533 to the left, agree=1.000, adj=1.000, (0 split)
## MDVP.PPQ < 0.00204 to the left, agree=0.977, adj=0.933, (0 split)
## MDVP.Jitter... < 0.00362 to the left, agree=0.938, adj=0.822, (0 split)
## MDVP.Jitter.Abs. < 2.5e-05 to the left, agree=0.838, adj=0.533, (0 split)
##
## Node number 2: 45 observations, complexity param=0.15625
## predicted class=No expected loss=0.4444444 P(node) =0.3461538
## class counts: 25 20
## probabilities: 0.556 0.444
## left son=4 (20 obs) right son=5 (25 obs)
## Primary splits:
## MDVP.Jitter.Abs. < 1.5e-05 to the left, improve=2.7222220, (0 missing)
## MDVP.Jitter... < 0.002915 to the left, improve=1.5022220, (0 missing)
## MDVP.PPQ < 0.00169 to the left, improve=1.1898340, (0 missing)
## MDVP.RAP < 0.00107 to the right, improve=0.6343844, (0 missing)
## Jitter.DDP < 0.00321 to the right, improve=0.6343844, (0 missing)
## Surrogate splits:
## MDVP.Jitter... < 0.002915 to the left, agree=0.911, adj=0.8, (0 split)
## MDVP.PPQ < 0.00154 to the left, agree=0.822, adj=0.6, (0 split)
## MDVP.RAP < 0.001145 to the left, agree=0.733, adj=0.4, (0 split)
## Jitter.DDP < 0.003435 to the left, agree=0.733, adj=0.4, (0 split)
##
## Node number 3: 85 observations
## predicted class=Yes expected loss=0.08235294 P(node) =0.6538462
## class counts: 7 78
## probabilities: 0.082 0.918
##
## Node number 4: 20 observations, complexity param=0.03125
## predicted class=No expected loss=0.25 P(node) =0.1538462
## class counts: 15 5
## probabilities: 0.750 0.250
## left son=8 (13 obs) right son=9 (7 obs)
## Primary splits:
## MDVP.RAP < 0.00107 to the right, improve=2.225275, (0 missing)
## Jitter.DDP < 0.003205 to the right, improve=2.225275, (0 missing)
## MDVP.PPQ < 0.00135 to the right, improve=1.666667, (0 missing)
## MDVP.Jitter... < 0.00262 to the right, improve=1.346154, (0 missing)
## Surrogate splits:
## Jitter.DDP < 0.003205 to the right, agree=1.00, adj=1.000, (0 split)
## MDVP.Jitter... < 0.002015 to the right, agree=0.95, adj=0.857, (0 split)
## MDVP.PPQ < 0.00135 to the right, agree=0.95, adj=0.857, (0 split)
##
## Node number 5: 25 observations, complexity param=0.125
## predicted class=Yes expected loss=0.4 P(node) =0.1923077
## class counts: 10 15
## probabilities: 0.400 0.600
## left son=10 (14 obs) right son=11 (11 obs)
## Primary splits:
## MDVP.Jitter.Abs. < 2.5e-05 to the right, improve=3.7532470, (0 missing)
## MDVP.Jitter... < 0.003475 to the right, improve=2.1948050, (0 missing)
## MDVP.RAP < 0.001465 to the left, improve=0.6805556, (0 missing)
## Jitter.DDP < 0.00439 to the left, improve=0.6805556, (0 missing)
## MDVP.PPQ < 0.00189 to the right, improve=0.3333333, (0 missing)
## Surrogate splits:
## MDVP.Jitter... < 0.00326 to the right, agree=0.84, adj=0.636, (0 split)
## MDVP.PPQ < 0.00152 to the right, agree=0.72, adj=0.364, (0 split)
## MDVP.RAP < 0.001225 to the right, agree=0.68, adj=0.273, (0 split)
## Jitter.DDP < 0.003685 to the right, agree=0.68, adj=0.273, (0 split)
##
## Node number 8: 13 observations
## predicted class=No expected loss=0.07692308 P(node) =0.1
## class counts: 12 1
## probabilities: 0.923 0.077
##
## Node number 9: 7 observations
## predicted class=Yes expected loss=0.4285714 P(node) =0.05384615
## class counts: 3 4
## probabilities: 0.429 0.571
##
## Node number 10: 14 observations
## predicted class=No expected loss=0.3571429 P(node) =0.1076923
## class counts: 9 5
## probabilities: 0.643 0.357
##
## Node number 11: 11 observations
## predicted class=Yes expected loss=0.09090909 P(node) =0.08461538
## class counts: 1 10
## probabilities: 0.091 0.909
summary(y_pred)
## 0 1
## 20 29
summary(cm)
## Number of cases in table: 65
## Number of factors: 2
## Test for independence of all factors:
## Chisq = 23.348, df = 1, p-value = 1.352e-06
## Chi-squared approximation may be incorrect
summary(accuracy)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.8769 0.8769 0.8769 0.8769 0.8769 0.8769
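# Reload the data and move PPE (column 24) ahead of RPDE so that status (column 18) is last
my_url = "https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/parkinsons.data"
dataset2 = read.csv(my_url, header = TRUE, sep = ",")
dataset = dataset2[, c(1:17, 24, 19:23, 18)]
library(dplyr)
# Keep the columns from MDVP.Shimmer through status and label the classes
clean_dataset = dataset %>%
  select(MDVP.Shimmer:status) %>%
  mutate(status = factor(status, levels = c(0, 1), labels = c("No", "Yes"))) %>%
  na.omit()
library(caTools)
set.seed(123)
# Stratified split: two thirds for training, one third for testing
split = sample.split(Y = clean_dataset$status, SplitRatio = 2/3)
train_set = subset(x = clean_dataset, split == TRUE)
test_set = subset(x = clean_dataset, split == FALSE)
library(rpart)
# Classification tree on the six shimmer (amplitude variation) measures
fit = rpart(formula = status ~ (MDVP.Shimmer + MDVP.Shimmer.dB. + Shimmer.APQ3 + Shimmer.APQ5 + MDVP.APQ + Shimmer.DDA),
            data = train_set, method = "class")
library(rpart.plot)
rpart.plot(fit)
# Predict on the held-out set and tabulate actual (rows) against predicted (columns)
predict_unseen = predict(object = fit, newdata = test_set, type = "class")
cm = table(test_set$status, predict_unseen)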
cm
## predict_unseen
## No Yes
## No 8 8
## Yes 4 45
sum(diag(cm))/sum(cm)
## [1] 0.8153846
plot(fit,main="Several measures of variation in amplitude (ANNEM SHIVAJI 20MIC0091)")
text(fit)
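The single accuracy figure above does not show how each class fares. As a rough sketch, the per-class rates can be read off the same confusion matrix. The counts below are copied from the cm output above, and the names cm_shimmer, acc_shimmer, sensitivity and specificity are introduced here only for illustration.

```r
# Counts copied from the cm output above: rows = actual status, columns = predicted
cm_shimmer = matrix(c(8, 4, 8, 45), nrow = 2,
                    dimnames = list(actual = c("No", "Yes"),
                                    predicted = c("No", "Yes")))

acc_shimmer = sum(diag(cm_shimmer)) / sum(cm_shimmer)              # 53 / 65
sensitivity = cm_shimmer["Yes", "Yes"] / sum(cm_shimmer["Yes", ])  # 45 / 49
specificity = cm_shimmer["No", "No"] / sum(cm_shimmer["No", ])     # 8 / 16

round(c(accuracy = acc_shimmer, sensitivity = sensitivity, specificity = specificity), 4)
```

On these counts the tree finds 45 of the 49 PD recordings but only 8 of the 16 healthy ones, so most of its errors are healthy recordings labelled as PD.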
summary(dataset)
## name MDVP.Fo.Hz. MDVP.Fhi.Hz. MDVP.Flo.Hz.
## Length:195 Min. : 88.33 Min. :102.1 Min. : 65.48
## Class :character 1st Qu.:117.57 1st Qu.:134.9 1st Qu.: 84.29
## Mode :character Median :148.79 Median :175.8 Median :104.31
## Mean :154.23 Mean :197.1 Mean :116.32
## 3rd Qu.:182.77 3rd Qu.:224.2 3rd Qu.:140.02
## Max. :260.11 Max. :592.0 Max. :239.17
## MDVP.Jitter... MDVP.Jitter.Abs. MDVP.RAP MDVP.PPQ
## Min. :0.001680 Min. :7.000e-06 Min. :0.000680 Min. :0.000920
## 1st Qu.:0.003460 1st Qu.:2.000e-05 1st Qu.:0.001660 1st Qu.:0.001860
## Median :0.004940 Median :3.000e-05 Median :0.002500 Median :0.002690
## Mean :0.006220 Mean :4.396e-05 Mean :0.003306 Mean :0.003446
## 3rd Qu.:0.007365 3rd Qu.:6.000e-05 3rd Qu.:0.003835 3rd Qu.:0.003955
## Max. :0.033160 Max. :2.600e-04 Max. :0.021440 Max. :0.019580
## Jitter.DDP MDVP.Shimmer MDVP.Shimmer.dB. Shimmer.APQ3
## Min. :0.002040 Min. :0.00954 Min. :0.0850 Min. :0.004550
## 1st Qu.:0.004985 1st Qu.:0.01650 1st Qu.:0.1485 1st Qu.:0.008245
## Median :0.007490 Median :0.02297 Median :0.2210 Median :0.012790
## Mean :0.009920 Mean :0.02971 Mean :0.2823 Mean :0.015664
## 3rd Qu.:0.011505 3rd Qu.:0.03789 3rd Qu.:0.3500 3rd Qu.:0.020265
## Max. :0.064330 Max. :0.11908 Max. :1.3020 Max. :0.056470
## Shimmer.APQ5 MDVP.APQ Shimmer.DDA NHR
## Min. :0.00570 Min. :0.00719 Min. :0.01364 Min. :0.000650
## 1st Qu.:0.00958 1st Qu.:0.01308 1st Qu.:0.02474 1st Qu.:0.005925
## Median :0.01347 Median :0.01826 Median :0.03836 Median :0.011660
## Mean :0.01788 Mean :0.02408 Mean :0.04699 Mean :0.024847
## 3rd Qu.:0.02238 3rd Qu.:0.02940 3rd Qu.:0.06080 3rd Qu.:0.025640
## Max. :0.07940 Max. :0.13778 Max. :0.16942 Max. :0.314820
## HNR PPE RPDE DFA
## Min. : 8.441 Min. :0.04454 Min. :0.2566 Min. :0.5743
## 1st Qu.:19.198 1st Qu.:0.13745 1st Qu.:0.4213 1st Qu.:0.6748
## Median :22.085 Median :0.19405 Median :0.4960 Median :0.7223
## Mean :21.886 Mean :0.20655 Mean :0.4985 Mean :0.7181
## 3rd Qu.:25.076 3rd Qu.:0.25298 3rd Qu.:0.5876 3rd Qu.:0.7619
## Max. :33.047 Max. :0.52737 Max. :0.6852 Max. :0.8253
## spread1 spread2 D2 status
## Min. :-7.965 Min. :0.006274 Min. :1.423 Min. :0.0000
## 1st Qu.:-6.450 1st Qu.:0.174350 1st Qu.:2.099 1st Qu.:1.0000
## Median :-5.721 Median :0.218885 Median :2.362 Median :1.0000
## Mean :-5.684 Mean :0.226510 Mean :2.382 Mean :0.7538
## 3rd Qu.:-5.046 3rd Qu.:0.279234 3rd Qu.:2.636 3rd Qu.:1.0000
## Max. :-2.434 Max. :0.450493 Max. :3.671 Max. :1.0000
summary(clean_dataset)
## MDVP.Shimmer MDVP.Shimmer.dB. Shimmer.APQ3 Shimmer.APQ5
## Min. :0.00954 Min. :0.0850 Min. :0.004550 Min. :0.00570
## 1st Qu.:0.01650 1st Qu.:0.1485 1st Qu.:0.008245 1st Qu.:0.00958
## Median :0.02297 Median :0.2210 Median :0.012790 Median :0.01347
## Mean :0.02971 Mean :0.2823 Mean :0.015664 Mean :0.01788
## 3rd Qu.:0.03789 3rd Qu.:0.3500 3rd Qu.:0.020265 3rd Qu.:0.02238
## Max. :0.11908 Max. :1.3020 Max. :0.056470 Max. :0.07940
## MDVP.APQ Shimmer.DDA NHR HNR
## Min. :0.00719 Min. :0.01364 Min. :0.000650 Min. : 8.441
## 1st Qu.:0.01308 1st Qu.:0.02474 1st Qu.:0.005925 1st Qu.:19.198
## Median :0.01826 Median :0.03836 Median :0.011660 Median :22.085
## Mean :0.02408 Mean :0.04699 Mean :0.024847 Mean :21.886
## 3rd Qu.:0.02940 3rd Qu.:0.06080 3rd Qu.:0.025640 3rd Qu.:25.076
## Max. :0.13778 Max. :0.16942 Max. :0.314820 Max. :33.047
## PPE RPDE DFA spread1
## Min. :0.04454 Min. :0.2566 Min. :0.5743 Min. :-7.965
## 1st Qu.:0.13745 1st Qu.:0.4213 1st Qu.:0.6748 1st Qu.:-6.450
## Median :0.19405 Median :0.4960 Median :0.7223 Median :-5.721
## Mean :0.20655 Mean :0.4985 Mean :0.7181 Mean :-5.684
## 3rd Qu.:0.25298 3rd Qu.:0.5876 3rd Qu.:0.7619 3rd Qu.:-5.046
## Max. :0.52737 Max. :0.6852 Max. :0.8253 Max. :-2.434
## spread2 D2 status
## Min. :0.006274 Min. :1.423 No : 48
## 1st Qu.:0.174350 1st Qu.:2.099 Yes:147
## Median :0.218885 Median :2.362
## Mean :0.226510 Mean :2.382
## 3rd Qu.:0.279234 3rd Qu.:2.636
## Max. :0.450493 Max. :3.671
summary(fit)
## Call:
## rpart(formula = status ~ (MDVP.Shimmer + MDVP.Shimmer.dB. + Shimmer.APQ3 +
## Shimmer.APQ5 + MDVP.APQ + Shimmer.DDA), data = train_set,
## method = "class")
## n= 130
##
## CP nsplit rel error xerror xstd
## 1 0.093750 0 1.00000 1.00000 0.1534852
## 2 0.046875 2 0.81250 1.00000 0.1534852
## 3 0.031250 4 0.71875 1.03125 0.1550676
## 4 0.010000 5 0.68750 1.06250 0.1565862
##
## Variable importance
## MDVP.APQ MDVP.Shimmer MDVP.Shimmer.dB. Shimmer.APQ5
## 17 17 17 17
## Shimmer.APQ3 Shimmer.DDA
## 16 16
##
## Node number 1: 130 observations, complexity param=0.09375
## predicted class=Yes expected loss=0.2461538 P(node) =1
## class counts: 32 98
## probabilities: 0.246 0.754
## left son=2 (64 obs) right son=3 (66 obs)
## Primary splits:
## Shimmer.APQ5 < 0.012675 to the left, improve=10.800130, (0 missing)
## MDVP.APQ < 0.019775 to the left, improve= 9.909184, (0 missing)
## MDVP.Shimmer.dB. < 0.219 to the left, improve= 9.271771, (0 missing)
## MDVP.Shimmer < 0.025005 to the left, improve= 9.044039, (0 missing)
## Shimmer.APQ3 < 0.01417 to the left, improve= 8.078513, (0 missing)
## Surrogate splits:
## MDVP.Shimmer < 0.0207 to the left, agree=0.962, adj=0.922, (0 split)
## MDVP.Shimmer.dB. < 0.214 to the left, agree=0.954, adj=0.906, (0 split)
## Shimmer.APQ3 < 0.010685 to the left, agree=0.938, adj=0.875, (0 split)
## Shimmer.DDA < 0.032045 to the left, agree=0.938, adj=0.875, (0 split)
## MDVP.APQ < 0.018285 to the left, agree=0.923, adj=0.844, (0 split)
##
## Node number 2: 64 observations, complexity param=0.09375
## predicted class=Yes expected loss=0.453125 P(node) =0.4923077
## class counts: 29 35
## probabilities: 0.453 0.547
## left son=4 (30 obs) right son=5 (34 obs)
## Primary splits:
## Shimmer.APQ3 < 0.00803 to the right, improve=2.4363970, (0 missing)
## Shimmer.DDA < 0.02409 to the right, improve=2.4363970, (0 missing)
## MDVP.APQ < 0.01215 to the left, improve=2.3057950, (0 missing)
## Shimmer.APQ5 < 0.006365 to the right, improve=1.9687500, (0 missing)
## MDVP.Shimmer < 0.012995 to the right, improve=0.5622874, (0 missing)
## Surrogate splits:
## Shimmer.DDA < 0.02409 to the right, agree=1.000, adj=1.000, (0 split)
## MDVP.Shimmer < 0.01569 to the right, agree=0.859, adj=0.700, (0 split)
## MDVP.Shimmer.dB. < 0.1405 to the right, agree=0.859, adj=0.700, (0 split)
## Shimmer.APQ5 < 0.009715 to the right, agree=0.859, adj=0.700, (0 split)
## MDVP.APQ < 0.0124 to the right, agree=0.734, adj=0.433, (0 split)
##
## Node number 3: 66 observations
## predicted class=Yes expected loss=0.04545455 P(node) =0.5076923
## class counts: 3 63
## probabilities: 0.045 0.955
##
## Node number 4: 30 observations, complexity param=0.046875
## predicted class=No expected loss=0.4 P(node) =0.2307692
## class counts: 18 12
## probabilities: 0.600 0.400
## left son=8 (9 obs) right son=9 (21 obs)
## Primary splits:
## MDVP.APQ < 0.01278 to the left, improve=2.146032, (0 missing)
## MDVP.Shimmer < 0.0167 to the left, improve=1.650000, (0 missing)
## MDVP.Shimmer.dB. < 0.147 to the left, improve=1.207453, (0 missing)
## Shimmer.APQ3 < 0.008825 to the left, improve=1.200000, (0 missing)
## Shimmer.DDA < 0.026485 to the left, improve=1.200000, (0 missing)
## Surrogate splits:
## MDVP.Shimmer < 0.01652 to the left, agree=0.8, adj=0.333, (0 split)
## MDVP.Shimmer.dB. < 0.1415 to the left, agree=0.8, adj=0.333, (0 split)
##
## Node number 5: 34 observations, complexity param=0.03125
## predicted class=Yes expected loss=0.3235294 P(node) =0.2615385
## class counts: 11 23
## probabilities: 0.324 0.676
## left son=10 (19 obs) right son=11 (15 obs)
## Primary splits:
## MDVP.APQ < 0.011495 to the left, improve=3.542002, (0 missing)
## MDVP.Shimmer.dB. < 0.137 to the left, improve=2.562353, (0 missing)
## MDVP.Shimmer < 0.01586 to the left, improve=1.845316, (0 missing)
## Shimmer.APQ3 < 0.00673 to the left, improve=1.118464, (0 missing)
## Shimmer.DDA < 0.020195 to the left, improve=1.118464, (0 missing)
## Surrogate splits:
## MDVP.Shimmer.dB. < 0.13 to the left, agree=0.941, adj=0.867, (0 split)
## MDVP.Shimmer < 0.014345 to the left, agree=0.912, adj=0.800, (0 split)
## Shimmer.APQ5 < 0.00802 to the left, agree=0.853, adj=0.667, (0 split)
## Shimmer.APQ3 < 0.006655 to the left, agree=0.824, adj=0.600, (0 split)
## Shimmer.DDA < 0.019965 to the left, agree=0.824, adj=0.600, (0 split)
##
## Node number 8: 9 observations
## predicted class=No expected loss=0.1111111 P(node) =0.06923077
## class counts: 8 1
## probabilities: 0.889 0.111
##
## Node number 9: 21 observations, complexity param=0.046875
## predicted class=Yes expected loss=0.4761905 P(node) =0.1615385
## class counts: 10 11
## probabilities: 0.476 0.524
## left son=18 (8 obs) right son=19 (13 obs)
## Primary splits:
## Shimmer.APQ3 < 0.009745 to the right, improve=0.5723443, (0 missing)
## Shimmer.DDA < 0.02923 to the right, improve=0.5723443, (0 missing)
## MDVP.Shimmer < 0.01914 to the left, improve=0.2216450, (0 missing)
## MDVP.Shimmer.dB. < 0.1755 to the left, improve=0.2216450, (0 missing)
## Shimmer.APQ5 < 0.010875 to the left, improve=0.1984127, (0 missing)
## Surrogate splits:
## Shimmer.DDA < 0.02923 to the right, agree=1.000, adj=1.000, (0 split)
## MDVP.Shimmer < 0.019815 to the right, agree=0.905, adj=0.750, (0 split)
## MDVP.Shimmer.dB. < 0.1945 to the right, agree=0.857, adj=0.625, (0 split)
## Shimmer.APQ5 < 0.01173 to the right, agree=0.810, adj=0.500, (0 split)
##
## Node number 10: 19 observations
## predicted class=No expected loss=0.4736842 P(node) =0.1461538
## class counts: 10 9
## probabilities: 0.526 0.474
##
## Node number 11: 15 observations
## predicted class=Yes expected loss=0.06666667 P(node) =0.1153846
## class counts: 1 14
## probabilities: 0.067 0.933
##
## Node number 18: 8 observations
## predicted class=No expected loss=0.375 P(node) =0.06153846
## class counts: 5 3
## probabilities: 0.625 0.375
##
## Node number 19: 13 observations
## predicted class=Yes expected loss=0.3846154 P(node) =0.1
## class counts: 5 8
## probabilities: 0.385 0.615
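The CP table at the top of this summary lists the cross-validated error (xerror) for each tree size. A minimal sketch of how it might be used to pick a tree size is given below. The values are copied from the output above, and the variable names are illustrative. With the fitted tree in memory, the same table is stored in fit$cptable, and rpart's prune() takes the chosen cp.

```r
# CP table copied from the summary(fit) output of the shimmer tree
cp_table = data.frame(
  CP        = c(0.093750, 0.046875, 0.031250, 0.010000),
  nsplit    = c(0, 2, 4, 5),
  rel_error = c(1.00000, 0.81250, 0.71875, 0.68750),
  xerror    = c(1.00000, 1.00000, 1.03125, 1.06250),
  xstd      = c(0.1534852, 0.1534852, 0.1550676, 0.1565862)
)

# which.min() returns the first row with the smallest cross-validated error
best_row    = which.min(cp_table$xerror)
best_cp     = cp_table$CP[best_row]
best_nsplit = cp_table$nsplit[best_row]

# With the fitted tree available, the pruned tree would be: prune(fit, cp = best_cp)
c(best_cp = best_cp, best_nsplit = best_nsplit)
```

Here the smallest xerror already occurs at nsplit 0, so by this criterion none of the shimmer splits improves on the root node under cross-validation. The xerror column depends on the random folds, so this reading is only indicative.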
summary(cm)
## Number of cases in table: 65
## Number of factors: 2
## Test for independence of all factors:
## Chisq = 14.025, df = 1, p-value = 0.0001804
## Chi-squared approximation may be incorrect
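# Reload the data and move PPE (column 24) ahead of RPDE so that status (column 18) is last
my_url = "https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/parkinsons.data"
dataset2 = read.csv(my_url, header = TRUE, sep = ",")
dataset = dataset2[, c(1:17, 24, 19:23, 18)]
library(dplyr)
# Keep the columns from NHR through status and label the classes
clean_dataset = dataset %>%
  select(NHR:status) %>%
  mutate(status = factor(status, levels = c(0, 1), labels = c("No", "Yes"))) %>%
  na.omit()
library(caTools)
set.seed(123)
# Stratified split: two thirds for training, one third for testing
split = sample.split(Y = clean_dataset$status, SplitRatio = 2/3)
train_set = subset(x = clean_dataset, split == TRUE)
test_set = subset(x = clean_dataset, split == FALSE)
library(rpart)
# Classification tree on the two noise-to-tonal ratio measures
fit = rpart(formula = status ~ (HNR + NHR), data = train_set, method = "class")
library(rpart.plot)
rpart.plot(fit)
# Predict on the held-out set and tabulate actual (rows) against predicted (columns)
predict_unseen = predict(object = fit, newdata = test_set, type = "class")
cm = table(test_set$status, predict_unseen)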
cm
## predict_unseen
## No Yes
## No 5 11
## Yes 2 47
sum(diag(cm))/sum(cm)
## [1] 0.8
plot(fit,main="Two measures of ratio of noise to tonal components in the voice (ANNEM SHIVAJI 20MIC0091)")
text(fit)
summary(clean_dataset)
## NHR HNR PPE RPDE
## Min. :0.000650 Min. : 8.441 Min. :0.04454 Min. :0.2566
## 1st Qu.:0.005925 1st Qu.:19.198 1st Qu.:0.13745 1st Qu.:0.4213
## Median :0.011660 Median :22.085 Median :0.19405 Median :0.4960
## Mean :0.024847 Mean :21.886 Mean :0.20655 Mean :0.4985
## 3rd Qu.:0.025640 3rd Qu.:25.076 3rd Qu.:0.25298 3rd Qu.:0.5876
## Max. :0.314820 Max. :33.047 Max. :0.52737 Max. :0.6852
## DFA spread1 spread2 D2 status
## Min. :0.5743 Min. :-7.965 Min. :0.006274 Min. :1.423 No : 48
## 1st Qu.:0.6748 1st Qu.:-6.450 1st Qu.:0.174350 1st Qu.:2.099 Yes:147
## Median :0.7223 Median :-5.721 Median :0.218885 Median :2.362
## Mean :0.7181 Mean :-5.684 Mean :0.226510 Mean :2.382
## 3rd Qu.:0.7619 3rd Qu.:-5.046 3rd Qu.:0.279234 3rd Qu.:2.636
## Max. :0.8253 Max. :-2.434 Max. :0.450493 Max. :3.671
summary(fit)
## Call:
## rpart(formula = status ~ (HNR + NHR), data = train_set, method = "class")
## n= 130
##
## CP nsplit rel error xerror xstd
## 1 0.28125 0 1.00000 1.0000 0.1534852
## 2 0.06250 1 0.71875 0.9375 0.1501201
## 3 0.01000 2 0.65625 0.9375 0.1501201
##
## Variable importance
## NHR HNR
## 66 34
##
## Node number 1: 130 observations, complexity param=0.28125
## predicted class=Yes expected loss=0.2461538 P(node) =1
## class counts: 32 98
## probabilities: 0.246 0.754
## left son=2 (27 obs) right son=3 (103 obs)
## Primary splits:
## NHR < 0.00486 to the left, improve=12.051980, (0 missing)
## HNR < 23.0435 to the right, improve= 8.556499, (0 missing)
## Surrogate splits:
## HNR < 25.0275 to the right, agree=0.885, adj=0.444, (0 split)
##
## Node number 2: 27 observations, complexity param=0.0625
## predicted class=No expected loss=0.3333333 P(node) =0.2076923
## class counts: 18 9
## probabilities: 0.667 0.333
## left son=4 (19 obs) right son=5 (8 obs)
## Primary splits:
## NHR < 0.002785 to the right, improve=1.934211, (0 missing)
## HNR < 26.8135 to the left, improve=1.071429, (0 missing)
## Surrogate splits:
## HNR < 26.8135 to the left, agree=0.963, adj=0.875, (0 split)
##
## Node number 3: 103 observations
## predicted class=Yes expected loss=0.1359223 P(node) =0.7923077
## class counts: 14 89
## probabilities: 0.136 0.864
##
## Node number 4: 19 observations
## predicted class=No expected loss=0.2105263 P(node) =0.1461538
## class counts: 15 4
## probabilities: 0.789 0.211
##
## Node number 5: 8 observations
## predicted class=Yes expected loss=0.375 P(node) =0.06153846
## class counts: 3 5
## probabilities: 0.375 0.625
summary(cm)
## Number of cases in table: 65
## Number of factors: 2
## Test for independence of all factors:
## Chisq = 9.265, df = 1, p-value = 0.002336
## Chi-squared approximation may be incorrect
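# Reload the data and move PPE (column 24) ahead of RPDE so that status (column 18) is last
my_url = "https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/parkinsons.data"
dataset2 = read.csv(my_url, header = TRUE, sep = ",")
dataset = dataset2[, c(1:17, 24, 19:23, 18)]
library(dplyr)
# Keep the columns from RPDE through status and label the classes
clean_dataset = dataset %>%
  select(RPDE:status) %>%
  mutate(status = factor(status, levels = c(0, 1), labels = c("No", "Yes"))) %>%
  na.omit()
library(caTools)
set.seed(123)
# Stratified split: two thirds for training, one third for testing
split = sample.split(Y = clean_dataset$status, SplitRatio = 2/3)
train_set = subset(x = clean_dataset, split == TRUE)
test_set = subset(x = clean_dataset, split == FALSE)
library(rpart)
# Classification tree on the two nonlinear dynamical complexity measures
fit = rpart(formula = status ~ (RPDE + D2), data = train_set, method = "class")
library(rpart.plot)
rpart.plot(fit)
# Predict on the held-out set and tabulate actual (rows) against predicted (columns)
predict_unseen = predict(object = fit, newdata = test_set, type = "class")
cm = table(test_set$status, predict_unseen)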
cm
## predict_unseen
## No Yes
## No 0 16
## Yes 12 37
sum(diag(cm))/sum(cm)
## [1] 0.5692308
plot(fit,main="Two nonlinear dynamical complexity measures(ANNEM SHIVAJI 20MIC0091)")
text(fit)
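For this tree it helps to set the accuracy against a trivial baseline. The sketch below uses the counts copied from the cm output above and computes the accuracy of always predicting "Yes", the majority class. The names cm_rpde, acc_rpde and baseline are illustrative.

```r
# Counts copied from the cm output above: rows = actual status, columns = predicted
cm_rpde = matrix(c(0, 12, 16, 37), nrow = 2,
                 dimnames = list(actual = c("No", "Yes"),
                                 predicted = c("No", "Yes")))

acc_rpde = sum(diag(cm_rpde)) / sum(cm_rpde)      # 37 / 65
baseline = max(rowSums(cm_rpde)) / sum(cm_rpde)   # 49 / 65, always predicting "Yes"

round(c(tree = acc_rpde, majority_baseline = baseline), 4)
```

The tree scores 0.5692 against a baseline of 0.7538, and none of its 12 "No" predictions is correct, so on this split the RPDE and D2 tree does worse than ignoring the features altogether.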
summary(clean_dataset)
## RPDE DFA spread1 spread2
## Min. :0.2566 Min. :0.5743 Min. :-7.965 Min. :0.006274
## 1st Qu.:0.4213 1st Qu.:0.6748 1st Qu.:-6.450 1st Qu.:0.174350
## Median :0.4960 Median :0.7223 Median :-5.721 Median :0.218885
## Mean :0.4985 Mean :0.7181 Mean :-5.684 Mean :0.226510
## 3rd Qu.:0.5876 3rd Qu.:0.7619 3rd Qu.:-5.046 3rd Qu.:0.279234
## Max. :0.6852 Max. :0.8253 Max. :-2.434 Max. :0.450493
## D2 status
## Min. :1.423 No : 48
## 1st Qu.:2.099 Yes:147
## Median :2.362
## Mean :2.382
## 3rd Qu.:2.636
## Max. :3.671
summary(fit)
## Call:
## rpart(formula = status ~ (RPDE + D2), data = train_set, method = "class")
## n= 130
##
## CP nsplit rel error xerror xstd
## 1 0.03125 0 1.00 1.0000 0.1534852
## 2 0.01000 7 0.75 1.4375 0.1703715
##
## Variable importance
## D2 RPDE
## 79 21
##
## Node number 1: 130 observations, complexity param=0.03125
## predicted class=Yes expected loss=0.2461538 P(node) =1
## class counts: 32 98
## probabilities: 0.246 0.754
## left son=2 (85 obs) right son=3 (45 obs)
## Primary splits:
## D2 < 2.507272 to the left, improve=5.600402, (0 missing)
## RPDE < 0.470175 to the left, improve=4.512821, (0 missing)
## Surrogate splits:
## RPDE < 0.6253385 to the left, agree=0.692, adj=0.111, (0 split)
##
## Node number 2: 85 observations, complexity param=0.03125
## predicted class=Yes expected loss=0.3529412 P(node) =0.6538462
## class counts: 30 55
## probabilities: 0.353 0.647
## left son=4 (42 obs) right son=5 (43 obs)
## Primary splits:
## RPDE < 0.470203 to the left, improve=2.5223110, (0 missing)
## D2 < 2.285122 to the left, improve=0.8849329, (0 missing)
## Surrogate splits:
## D2 < 2.180933 to the right, agree=0.671, adj=0.333, (0 split)
##
## Node number 3: 45 observations
## predicted class=Yes expected loss=0.04444444 P(node) =0.3461538
## class counts: 2 43
## probabilities: 0.044 0.956
##
## Node number 4: 42 observations, complexity param=0.03125
## predicted class=Yes expected loss=0.4761905 P(node) =0.3230769
## class counts: 20 22
## probabilities: 0.476 0.524
## left son=8 (7 obs) right son=9 (35 obs)
## Primary splits:
## D2 < 2.383098 to the right, improve=0.952381, (0 missing)
## RPDE < 0.3371195 to the right, improve=0.814881, (0 missing)
##
## Node number 5: 43 observations, complexity param=0.03125
## predicted class=Yes expected loss=0.2325581 P(node) =0.3307692
## class counts: 10 33
## probabilities: 0.233 0.767
## left son=10 (28 obs) right son=11 (15 obs)
## Primary splits:
## D2 < 2.168121 to the left, improve=2.491694, (0 missing)
## RPDE < 0.5180245 to the right, improve=1.063123, (0 missing)
## Surrogate splits:
## RPDE < 0.505685 to the right, agree=0.674, adj=0.067, (0 split)
##
## Node number 8: 7 observations
## predicted class=No expected loss=0.2857143 P(node) =0.05384615
## class counts: 5 2
## probabilities: 0.714 0.286
##
## Node number 9: 35 observations, complexity param=0.03125
## predicted class=Yes expected loss=0.4285714 P(node) =0.2692308
## class counts: 15 20
## probabilities: 0.429 0.571
## left son=18 (24 obs) right son=19 (11 obs)
## Primary splits:
## D2 < 2.28778 to the left, improve=1.953463, (0 missing)
## RPDE < 0.3371195 to the right, improve=1.911376, (0 missing)
##
## Node number 10: 28 observations, complexity param=0.03125
## predicted class=Yes expected loss=0.3571429 P(node) =0.2153846
## class counts: 10 18
## probabilities: 0.357 0.643
## left son=20 (16 obs) right son=21 (12 obs)
## Primary splits:
## D2 < 2.02441 to the right, improve=3.1488100, (0 missing)
## RPDE < 0.562505 to the right, improve=0.5289377, (0 missing)
## Surrogate splits:
## RPDE < 0.5393685 to the left, agree=0.679, adj=0.25, (0 split)
##
## Node number 11: 15 observations
## predicted class=Yes expected loss=0 P(node) =0.1153846
## class counts: 0 15
## probabilities: 0.000 1.000
##
## Node number 18: 24 observations, complexity param=0.03125
## predicted class=No expected loss=0.4583333 P(node) =0.1846154
## class counts: 13 11
## probabilities: 0.542 0.458
## left son=36 (7 obs) right son=37 (17 obs)
## Primary splits:
## D2 < 2.21637 to the right, improve=0.5889356, (0 missing)
## RPDE < 0.425007 to the left, improve=0.2722222, (0 missing)
##
## Node number 19: 11 observations
## predicted class=Yes expected loss=0.1818182 P(node) =0.08461538
## class counts: 2 9
## probabilities: 0.182 0.818
##
## Node number 20: 16 observations
## predicted class=No expected loss=0.4375 P(node) =0.1230769
## class counts: 9 7
## probabilities: 0.562 0.437
##
## Node number 21: 12 observations
## predicted class=Yes expected loss=0.08333333 P(node) =0.09230769
## class counts: 1 11
## probabilities: 0.083 0.917
##
## Node number 36: 7 observations
## predicted class=No expected loss=0.2857143 P(node) =0.05384615
## class counts: 5 2
## probabilities: 0.714 0.286
##
## Node number 37: 17 observations
## predicted class=Yes expected loss=0.4705882 P(node) =0.1307692
## class counts: 8 9
## probabilities: 0.471 0.529
summary(cm)
## Number of cases in table: 65
## Number of factors: 2
## Test for independence of all factors:
## Chisq = 4.806, df = 1, p-value = 0.02837
## Chi-squared approximation may be incorrect
RELATED ATTRIBUTES OF THE DATASET ON WHICH STATUS DEPENDS ARE:
#Decision tree 2 MDVP:Jitter(%),MDVP:Jitter(Abs),MDVP:RAP,MDVP:PPQ,Jitter:DDP - Several measures of variation in fundamental frequency
#Decision tree 3 MDVP:Shimmer,MDVP:Shimmer(dB),Shimmer:APQ3,Shimmer:APQ5,MDVP:APQ,Shimmer:DDA - Several measures of variation in amplitude
#Decision tree 4 NHR,HNR - Two measures of ratio of noise to tonal components in the voice
#Decision tree 5 RPDE,D2 - Two nonlinear dynamical complexity measures
#spread1,spread2,PPE - Three nonlinear measures of fundamental frequency variation (no accuracy is reported for this group)
We discriminated healthy people from those with PD, according to the “status” column, which is set to 0 for healthy and 1 for PD, using the DECISION TREE machine learning algorithm. We split the dataset with a ratio of 2/3, giving 130 observations for training and 65 for testing. Then we did feature scaling, fit each tree using the rpart() method from the rpart library, and predicted the test set. Decision tree 1 (whole) uses all attributes and has an accuracy of 0.8769231. The accuracy of decision tree 2 is 0.8615385. The accuracy of decision tree 3 is 0.8153846. The accuracy of decision tree 4 is 0.8. The accuracy of decision tree 5 is 0.5692308.
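The one-tree-per-attribute-group procedure described above can be written more compactly. What follows is an illustrative outline, not the code that produced the results: the groups list and the fit_group_tree helper are hypothetical names, the column names are those read.csv produces for this file, and train_set is assumed to be a training split that still contains every listed column.

```r
# Attribute groups, using the column names that read.csv() creates for this file
groups = list(
  jitter       = c("MDVP.Jitter...", "MDVP.Jitter.Abs.", "MDVP.RAP", "MDVP.PPQ", "Jitter.DDP"),
  shimmer      = c("MDVP.Shimmer", "MDVP.Shimmer.dB.", "Shimmer.APQ3",
                   "Shimmer.APQ5", "MDVP.APQ", "Shimmer.DDA"),
  noise        = c("NHR", "HNR"),
  complexity   = c("RPDE", "D2"),
  f0_variation = c("spread1", "spread2", "PPE")
)

# One formula per group, e.g. status ~ NHR + HNR
formulas = lapply(groups, function(vars) reformulate(vars, response = "status"))

# Hypothetical helper: fit a classification tree for one formula
fit_group_tree = function(formula, train_set) {
  rpart::rpart(formula = formula, data = train_set, method = "class")
}

# Assuming train_set exists, all trees could then be fitted with:
# fits = lapply(formulas, fit_group_tree, train_set = train_set)
```

Building the formulas from a list keeps the split, the seed and the fitting call identical across groups, so that only the predictors differ between trees.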
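The accuracy of every decision tree is greater than 0.5, and four of the five exceed 0.7538, the share of PD cases in the 65-case test set (49 of 65); only decision tree 5 (0.5692308) falls below it. So the decision tree is also a good machine learning algorithm for this dataset.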
Decision tree 1 (whole), in which status is the dependent variable and depends on all other attributes, has an accuracy of 0.8769231, the highest of all our models. Note that it uses every attribute, not only a group of related ones.
Next, the accuracy of the KNN model is 0.8615385 (decision tree 2 has the same accuracy). This is greater than that of every remaining model, including Naïve Bayes and the other decision trees.
The accuracy of the Naïve Bayes model is only 0.6734694.
For this dataset, it is better to trust the KNN machine learning model. Decision tree 1 is slightly more accurate, but it relies on all attributes, so KNN is the better choice.
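The comparison can also be laid out side by side. The sketch below only tabulates the accuracies quoted in this section and sorts them; no model is refitted, and the data frame and its column names are illustrative.

```r
# Accuracies as quoted in the text above (no model is refitted here)
results = data.frame(
  model = c("Decision tree 1 (all attributes)", "Decision tree 2", "Decision tree 3",
            "Decision tree 4", "Decision tree 5", "KNN", "Naive Bayes"),
  accuracy = c(0.8769231, 0.8615385, 0.8153846, 0.8, 0.5692308, 0.8615385, 0.6734694)
)

# Sort from most to least accurate
results = results[order(results$accuracy, decreasing = TRUE), ]
print(results, row.names = FALSE)
```

Sorted this way, decision tree 1 comes first, KNN and decision tree 2 tie for second place, and decision tree 5 comes last.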