# clear-up the environment
rm(list = ls())
# chunk options
knitr::opts_chunk$set(
  message = FALSE,
  warning = FALSE,
  fig.align = "center",
  comment = "#>"
)

This article was written to complete the assignment for the Algoritma Machine Learning Course, Classification 2, Theia Batch 2022.
In this article, we will try to predict whether someone is likely to have Parkinson's disease based on a number of variables extracted from their speech recordings. The dataset is the output of voice-frequency analysis of people who do and do not have Parkinson's.
Parkinson’s
According to MayoClinic.org, Parkinson's disease is a progressive disorder that affects the nervous system and the parts of the body controlled by the nerves. Symptoms start slowly. The first symptom may be a barely noticeable tremor in just one hand. Tremors are common, but the disorder may also cause stiffness or slowing of movement.
In the early stages of Parkinson’s disease, your face may show little or no expression. Your arms may not swing when you walk. Your speech may become soft or slurred. Parkinson’s disease symptoms worsen as your condition progresses over time.
One of the main symptoms, as described above, is a change in speech: there are slight differences in the speech of people who are developing Parkinson's. These differences can be used to predict that someone may have Parkinson's before more symptoms appear, so that therapy can start earlier and slow the progression of the disease.
One example of a person with Parkinson's is my favorite actor, Michael J. Fox, who played Marty in the Back to the Future films. You can see how he speaks in this video.
Michael J. Fox as Marty in Back to the Future
Feel bad for him though :(
The data used in this study were gathered from 188 patients with PD (107 men and 81 women) with ages ranging from 33 to 87 (65.1±10.9) at the Department of Neurology in Cerrahpaşa Faculty of Medicine, Istanbul University. The control group consists of 64 healthy individuals (23 men and 41 women) with ages varying between 41 and 82 (61.1±8.9). During the data collection process, the microphone was set to 44.1 kHz and, following the physician's examination, the sustained phonation of the vowel /a/ was collected from each subject with three repetitions.
Various speech signal processing algorithms, including Time Frequency Features, Mel Frequency Cepstral Coefficients (MFCCs), Wavelet Transform based Features, Vocal Fold Features and TQWT features, have been applied to the speech recordings of Parkinson's Disease (PD) patients to extract clinically useful information for PD assessment.
options(scipen = 99) # turn off scientific notation (e.g., 1e-1)
library(tidyverse)   # core data wrangling and plotting packages
library(dplyr)       # grammar of data manipulation
library(ggplot2)     # data visualization
library(scales)      # axis and scale helpers
library(GGally)      # correlation plots (ggcorr)

# load dataset
parkinsons <- read.csv("data-input/parkinsons-data.csv")
glimpse(parkinsons)
#> Rows: 195
#> Columns: 24
#> $ name <chr> "phon_R01_S01_1", "phon_R01_S01_2", "phon_R01_S01_3",…
#> $ MDVP.Fo.Hz. <dbl> 119.992, 122.400, 116.682, 116.676, 116.014, 120.552,…
#> $ MDVP.Fhi.Hz. <dbl> 157.302, 148.650, 131.111, 137.871, 141.781, 131.162,…
#> $ MDVP.Flo.Hz. <dbl> 74.997, 113.819, 111.555, 111.366, 110.655, 113.787, …
#> $ MDVP.Jitter... <dbl> 0.00784, 0.00968, 0.01050, 0.00997, 0.01284, 0.00968,…
#> $ MDVP.Jitter.Abs. <dbl> 0.00007, 0.00008, 0.00009, 0.00009, 0.00011, 0.00008,…
#> $ MDVP.RAP <dbl> 0.00370, 0.00465, 0.00544, 0.00502, 0.00655, 0.00463,…
#> $ MDVP.PPQ <dbl> 0.00554, 0.00696, 0.00781, 0.00698, 0.00908, 0.00750,…
#> $ Jitter.DDP <dbl> 0.01109, 0.01394, 0.01633, 0.01505, 0.01966, 0.01388,…
#> $ MDVP.Shimmer <dbl> 0.04374, 0.06134, 0.05233, 0.05492, 0.06425, 0.04701,…
#> $ MDVP.Shimmer.dB. <dbl> 0.426, 0.626, 0.482, 0.517, 0.584, 0.456, 0.140, 0.13…
#> $ Shimmer.APQ3 <dbl> 0.02182, 0.03134, 0.02757, 0.02924, 0.03490, 0.02328,…
#> $ Shimmer.APQ5 <dbl> 0.03130, 0.04518, 0.03858, 0.04005, 0.04825, 0.03526,…
#> $ MDVP.APQ <dbl> 0.02971, 0.04368, 0.03590, 0.03772, 0.04465, 0.03243,…
#> $ Shimmer.DDA <dbl> 0.06545, 0.09403, 0.08270, 0.08771, 0.10470, 0.06985,…
#> $ NHR <dbl> 0.02211, 0.01929, 0.01309, 0.01353, 0.01767, 0.01222,…
#> $ HNR <dbl> 21.033, 19.085, 20.651, 20.644, 19.649, 21.378, 24.88…
#> $ status <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ RPDE <dbl> 0.414783, 0.458359, 0.429895, 0.434969, 0.417356, 0.4…
#> $ DFA <dbl> 0.815285, 0.819521, 0.825288, 0.819235, 0.823484, 0.8…
#> $ spread1 <dbl> -4.813031, -4.075192, -4.443179, -4.117501, -3.747787…
#> $ spread2 <dbl> 0.266482, 0.335590, 0.311173, 0.334147, 0.234513, 0.2…
#> $ D2 <dbl> 2.301442, 2.486855, 2.342259, 2.405554, 2.332180, 2.1…
#> $ PPE <dbl> 0.284654, 0.368674, 0.332634, 0.368975, 0.410335, 0.3…
Matrix column entries (attributes):
- Name - ASCII subject name and recording number
- MDVP:Fo(Hz) - Average vocal fundamental frequency
- MDVP:Fhi(Hz) - Maximum vocal fundamental frequency
- MDVP:Flo(Hz) - Minimum vocal fundamental frequency
- MDVP:Jitter(%), MDVP:Jitter(Abs), MDVP:RAP, MDVP:PPQ, Jitter:DDP - Several measures of variation in fundamental frequency
- MDVP:Shimmer, MDVP:Shimmer(dB), Shimmer:APQ3, Shimmer:APQ5, MDVP:APQ, Shimmer:DDA - Several measures of variation in amplitude
- NHR, HNR - Two measures of the ratio of noise to tonal components in the voice
- Status - Health status of the subject: one = Parkinson's, zero = healthy
- RPDE, D2 - Two nonlinear dynamical complexity measures
- DFA - Signal fractal scaling exponent
- spread1, spread2, PPE - Three nonlinear measures of fundamental frequency variation

summary(parkinsons)
#>     name           MDVP.Fo.Hz.     MDVP.Fhi.Hz.    MDVP.Flo.Hz.
#> Length:195 Min. : 88.33 Min. :102.1 Min. : 65.48
#> Class :character 1st Qu.:117.57 1st Qu.:134.9 1st Qu.: 84.29
#> Mode :character Median :148.79 Median :175.8 Median :104.31
#> Mean :154.23 Mean :197.1 Mean :116.32
#> 3rd Qu.:182.77 3rd Qu.:224.2 3rd Qu.:140.02
#> Max. :260.11 Max. :592.0 Max. :239.17
#> MDVP.Jitter... MDVP.Jitter.Abs. MDVP.RAP MDVP.PPQ
#> Min. :0.001680 Min. :0.00000700 Min. :0.000680 Min. :0.000920
#> 1st Qu.:0.003460 1st Qu.:0.00002000 1st Qu.:0.001660 1st Qu.:0.001860
#> Median :0.004940 Median :0.00003000 Median :0.002500 Median :0.002690
#> Mean :0.006220 Mean :0.00004396 Mean :0.003306 Mean :0.003446
#> 3rd Qu.:0.007365 3rd Qu.:0.00006000 3rd Qu.:0.003835 3rd Qu.:0.003955
#> Max. :0.033160 Max. :0.00026000 Max. :0.021440 Max. :0.019580
#> Jitter.DDP MDVP.Shimmer MDVP.Shimmer.dB. Shimmer.APQ3
#> Min. :0.002040 Min. :0.00954 Min. :0.0850 Min. :0.004550
#> 1st Qu.:0.004985 1st Qu.:0.01650 1st Qu.:0.1485 1st Qu.:0.008245
#> Median :0.007490 Median :0.02297 Median :0.2210 Median :0.012790
#> Mean :0.009920 Mean :0.02971 Mean :0.2823 Mean :0.015664
#> 3rd Qu.:0.011505 3rd Qu.:0.03789 3rd Qu.:0.3500 3rd Qu.:0.020265
#> Max. :0.064330 Max. :0.11908 Max. :1.3020 Max. :0.056470
#> Shimmer.APQ5 MDVP.APQ Shimmer.DDA NHR
#> Min. :0.00570 Min. :0.00719 Min. :0.01364 Min. :0.000650
#> 1st Qu.:0.00958 1st Qu.:0.01308 1st Qu.:0.02474 1st Qu.:0.005925
#> Median :0.01347 Median :0.01826 Median :0.03836 Median :0.011660
#> Mean :0.01788 Mean :0.02408 Mean :0.04699 Mean :0.024847
#> 3rd Qu.:0.02238 3rd Qu.:0.02940 3rd Qu.:0.06080 3rd Qu.:0.025640
#> Max. :0.07940 Max. :0.13778 Max. :0.16942 Max. :0.314820
#> HNR status RPDE DFA
#> Min. : 8.441 Min. :0.0000 Min. :0.2566 Min. :0.5743
#> 1st Qu.:19.198 1st Qu.:1.0000 1st Qu.:0.4213 1st Qu.:0.6748
#> Median :22.085 Median :1.0000 Median :0.4960 Median :0.7223
#> Mean :21.886 Mean :0.7538 Mean :0.4985 Mean :0.7181
#> 3rd Qu.:25.076 3rd Qu.:1.0000 3rd Qu.:0.5876 3rd Qu.:0.7619
#> Max. :33.047 Max. :1.0000 Max. :0.6852 Max. :0.8253
#> spread1 spread2 D2 PPE
#> Min. :-7.965 Min. :0.006274 Min. :1.423 Min. :0.04454
#> 1st Qu.:-6.450 1st Qu.:0.174350 1st Qu.:2.099 1st Qu.:0.13745
#> Median :-5.721 Median :0.218885 Median :2.362 Median :0.19405
#> Mean :-5.684 Mean :0.226510 Mean :2.382 Mean :0.20655
#> 3rd Qu.:-5.046 3rd Qu.:0.279234 3rd Qu.:2.636 3rd Qu.:0.25298
#> Max. :-2.434 Max. :0.450493 Max. :3.671 Max. :0.52737
Based on the summary above, several columns look like they may contain outliers. However, since these values are the output of speech-signal analysis, we do not know what a reasonable range actually is, so we keep all observations rather than removing potentially useful data points.
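As a quick visual check (a minimal sketch; the three columns picked here are only examples), we can draw boxplots of a few numeric features to see how extreme those values look:

# sketch: boxplots of a few example features to eyeball potential outliers
parkinsons %>%
  select(MDVP.Fhi.Hz., NHR, MDVP.Shimmer) %>%   # example columns only
  pivot_longer(everything(), names_to = "feature", values_to = "value") %>%
  ggplot(aes(x = feature, y = value)) +
  geom_boxplot() +
  facet_wrap(~feature, scales = "free") +
  labs(title = "Distribution of selected features", x = NULL, y = NULL)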
We will now look at how the columns correlate with each other and with our target variable status.
ggcorr(parkinsons, hjust = 1, layout.exp = 3, label = TRUE, label_size = 2.5)

parkinsons_dataset <- parkinsons %>%
  select(-c(MDVP.Jitter..., MDVP.Jitter.Abs., MDVP.RAP, MDVP.PPQ, MDVP.Shimmer, MDVP.Shimmer.dB., Shimmer.APQ3, Shimmer.APQ5, MDVP.APQ, Jitter.DDP, Shimmer.DDA, spread1, DFA, HNR))

We remove columns that are highly correlated with another column (correlation above 0.7 or below -0.7) to reduce the potential for multicollinearity in the data.
ggcorr(parkinsons_dataset, hjust = 1, layout.exp = 3, label = TRUE, label_size = 2.5)
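The columns above were chosen by reading the correlation plot. As an automated alternative (a sketch only; the flagged columns may differ slightly from the manual selection), caret::findCorrelation can list predictors whose pairwise correlation exceeds the 0.7 threshold:

# sketch: flag highly correlated numeric predictors automatically
library(caret)

num_cols <- parkinsons %>%
  select(where(is.numeric)) %>%   # numeric columns only
  select(-status)                 # drop the target variable

cor_mat  <- cor(num_cols)
high_cor <- findCorrelation(cor_mat, cutoff = 0.7) # indices of columns with |correlation| > 0.7
colnames(cor_mat)[high_cor]                        # names of the flagged predictors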
Next, we look at the class proportions in the data.
hist(parkinsons_dataset$status)
We can see that the classes are not balanced; we will deal with that later.
# change column status into a factor and remove the name column
parkinsons_dataset <- parkinsons_dataset %>%
  mutate(status = as.factor(ifelse(status == 1, "parkinson", "healthy"))) %>%
  select(-name)

# check for missing values
colSums(is.na(parkinsons_dataset))
#> MDVP.Fo.Hz. MDVP.Fhi.Hz. MDVP.Flo.Hz. NHR status RPDE
#> 0 0 0 0 0 0
#> spread2 D2 PPE
#> 0 0 0
Split the data: 75% for training, 25% for testing.
RNGkind(sample.kind = "Rounding")
set.seed(100)
library(rsample)
# Stratified Random Sampling
index_n <- initial_split(data = parkinsons_dataset, # dataset that will be split
                         prop = 0.75,               # data train proportion
                         strata = status)           # target variable
parkinsons_train <- training(index_n) # training function from rsample
parkinsons_test  <- testing(index_n)

prop.table(table(parkinsons_train$status))
#>
#> healthy parkinson
#> 0.2465753 0.7534247
prop.table(table(parkinsons_test$status))
#>
#> healthy parkinson
#> 0.244898 0.755102
Because our data is not balanced, we will upsample the training set. Why upsample rather than downsample? There are far fewer samples of healthy people than of people with Parkinson's, so upsampling the minority class balances the data without throwing away any information.
# upsampling
RNGkind(sample.kind = "Rounding")
set.seed(100)
library(caret)
parkinsons_train_n <- upSample(x = parkinsons_train %>% select(-status), # predictor variables only
                               y = parkinsons_train$status,              # target variable only
                               yname = "status")                         # name of the target variable column

prop.table(table(parkinsons_train_n$status))
#>
#> healthy parkinson
#> 0.5 0.5
We also create a copy of the test set without the target variable, since the models only need the predictors when making predictions.
# create dataset without target variable
parkinsons_test_n <- parkinsons_test %>% select(-status)

library(e1071)

# create model
naive_model <- naiveBayes(status ~ ., data = parkinsons_train_n)

# create predictions
naive_pred <- predict(naive_model, newdata = parkinsons_test_n, type = "class") # predicted class
naive_prob <- predict(naive_model, newdata = parkinsons_test_n, type = "raw")   # class probabilities

# confusion matrix and interpretation of the model
naive_conf <- table(naive_pred, parkinsons_test$status)
confusionMatrix(naive_conf)
#> Confusion Matrix and Statistics
#>
#>
#> naive_pred healthy parkinson
#> healthy 9 11
#> parkinson 3 26
#>
#> Accuracy : 0.7143
#> 95% CI : (0.5674, 0.8342)
#> No Information Rate : 0.7551
#> P-Value [Acc > NIR] : 0.79940
#>
#> Kappa : 0.3695
#>
#> Mcnemar's Test P-Value : 0.06137
#>
#> Sensitivity : 0.7500
#> Specificity : 0.7027
#> Pos Pred Value : 0.4500
#> Neg Pred Value : 0.8966
#> Prevalence : 0.2449
#> Detection Rate : 0.1837
#> Detection Prevalence : 0.4082
#> Balanced Accuracy : 0.7264
#>
#> 'Positive' Class : healthy
#>
It seems our model is not accurate enough if we set our target at 80% accuracy (it reaches only about 71%). It also has low precision for the positive class, with a Positive Predictive Value of only 45%.
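To make the link between the confusion matrix and these numbers explicit, here is a minimal sketch that recomputes accuracy, sensitivity, and precision by hand from naive_conf (with healthy as the positive class, as in the output above):

# sketch: recompute the headline metrics from the confusion matrix by hand
# rows of naive_conf are predictions, columns are the actual status
TP <- naive_conf["healthy", "healthy"]     # predicted healthy, actually healthy
FP <- naive_conf["healthy", "parkinson"]   # predicted healthy, actually parkinson
FN <- naive_conf["parkinson", "healthy"]   # predicted parkinson, actually healthy
TN <- naive_conf["parkinson", "parkinson"] # predicted parkinson, actually parkinson

(TP + TN) / (TP + TN + FP + FN) # accuracy, 0.7143 above
TP / (TP + FN)                  # sensitivity (recall), 0.75 above
TP / (TP + FP)                  # precision (Pos Pred Value), 0.45 above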
# convert the actual labels to 1/0, with parkinson as the positive class
parkinsons_test$actual <- ifelse(parkinsons_test$status == "parkinson", 1, 0)

library(ROCR)

# prediction object
naive_roc <- prediction(predictions = naive_prob[,1],      # probability of one class (column 1 is the first factor level, "healthy")
                        labels = parkinsons_test$actual)   # actual labels converted to 1 and 0

# ROC curve
plot(performance(prediction.obj = naive_roc, # naive Bayes prediction object
                 measure = "tpr",            # y axis
                 x.measure = "fpr"))         # x axis
abline(0, 1, lty = 2) # diagonal line

# AUC
naive_auc <- performance(prediction.obj = naive_roc,
                         measure = "auc")
naive_auc@y.values
#> [[1]]
#> [1] 0.1891892
Oops! Based on the plot above, the AUC is far below 0.5, so the model looks extremely bad at predicting the correct answer. What this really tells us is that the probability column we scored (column 1, the probability of healthy) was paired with the opposite positive label (parkinson): flipped around, the curve is actually a reasonably good classifier. So from this point on, we set the positive class to healthy, i.e., we recode healthy to "1".

# change the actual labels: healthy is now the positive class ("1")
parkinsons_test$actual <- ifelse(parkinsons_test$status == "healthy", 1, 0)

naive_roc <- prediction(predictions = naive_prob[,1],    # probability of "healthy"
                        labels = parkinsons_test$actual) # healthy coded as 1
plot(performance(prediction.obj = naive_roc,
                 measure = "tpr",
                 x.measure = "fpr"))
abline(0, 1, lty = 2)

# AUC
naive_auc <- performance(prediction.obj = naive_roc,
                         measure = "auc")
naive_auc@y.values
#> [[1]]
#> [1] 0.8108108
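A safer way to build the ROC object (a small sketch, not part of the original workflow) is to select the probability column by class name rather than by position, so the score and the positive label cannot get out of sync again:

# sketch: pick the probability column by its class name instead of by position
naive_roc <- prediction(predictions = naive_prob[, "healthy"], # P(healthy), matching healthy = 1
                        labels = parkinsons_test$actual)
performance(naive_roc, measure = "auc")@y.values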
We will try another model for comparison.
library(partykit)
dt_model <- ctree(status ~ .,
                  data = parkinsons_train_n,
                  control = ctree_control(mincriterion = 0.95)) # a node splits only if the split's p-value is < 0.05 (1 - mincriterion)

plot(dt_model, type = "simple")

# model interpretation
dt_model
#>
#> Model formula:
#> status ~ MDVP.Fo.Hz. + MDVP.Fhi.Hz. + MDVP.Flo.Hz. + NHR + RPDE +
#> spread2 + D2 + PPE
#>
#> Fitted party:
#> [1] root
#> | [2] PPE <= 0.13387
#> | | [3] RPDE <= 0.39661
#> | | | [4] MDVP.Fo.Hz. <= 186.163: parkinson (n = 9, err = 0.0%)
#> | | | [5] MDVP.Fo.Hz. > 186.163: healthy (n = 24, err = 0.0%)
#> | | [6] RPDE > 0.39661: healthy (n = 61, err = 1.6%)
#> | [7] PPE > 0.13387
#> | | [8] spread2 <= 0.21888: parkinson (n = 62, err = 41.9%)
#> | | [9] spread2 > 0.21888: parkinson (n = 64, err = 0.0%)
#>
#> Number of inner nodes: 4
#> Number of terminal nodes: 5
# create predictions
dt_pred <- predict(dt_model, parkinsons_test_n, type = "response") # predicted class
dt_prob <- predict(dt_model, parkinsons_test_n, type = "prob")     # class probabilities

# confusion matrix
dt_conf <- table(dt_pred, parkinsons_test$status)
confusionMatrix(dt_conf)
#> Confusion Matrix and Statistics
#>
#>
#> dt_pred healthy parkinson
#> healthy 7 2
#> parkinson 5 35
#>
#> Accuracy : 0.8571
#> 95% CI : (0.7276, 0.9406)
#> No Information Rate : 0.7551
#> P-Value [Acc > NIR] : 0.06174
#>
#> Kappa : 0.5781
#>
#> Mcnemar's Test P-Value : 0.44969
#>
#> Sensitivity : 0.5833
#> Specificity : 0.9459
#> Pos Pred Value : 0.7778
#> Neg Pred Value : 0.8750
#> Prevalence : 0.2449
#> Detection Rate : 0.1429
#> Detection Prevalence : 0.1837
#> Balanced Accuracy : 0.7646
#>
#> 'Positive' Class : healthy
#>
It seems the Decision Tree model performs better: accuracy improved to about 86%.
# prediction object
dt_roc <- prediction(predictions = dt_prob[,1],        # probability of the positive class 'healthy'
                     labels = parkinsons_test$actual)  # actual labels converted to 0 and 1

# ROC curve
plot(performance(prediction.obj = dt_roc, # ROC prediction object
                 measure = "tpr",         # y axis
                 x.measure = "fpr"))      # x axis
abline(0, 1, lty = 2) # diagonal line

# AUC
dt_auc <- performance(prediction.obj = dt_roc,
                      measure = "auc")
dt_auc@y.values
#> [[1]]
#> [1] 0.8153153
However, the AUC does not improve much with this model.
# create random forest model
set.seed(150)
ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 3) # 5-fold CV, repeated 3 times
rf_model <- train(status ~ ., data = parkinsons_train_n, method = "rf", trControl = ctrl)
saveRDS(rf_model, file = "rf_model.rds") # save the model so it does not have to be retrained later

# reload the saved random forest model
rf_model <- readRDS("rf_model.rds")

# create predictions
rf_pred <- predict(rf_model, parkinsons_test_n, type = "raw")  # predicted class
rf_prob <- predict(rf_model, parkinsons_test_n, type = "prob") # class probabilities

rf_conf <- table(rf_pred, parkinsons_test$status)
confusionMatrix(rf_conf)
#> Confusion Matrix and Statistics
#>
#>
#> rf_pred healthy parkinson
#> healthy 9 3
#> parkinson 3 34
#>
#> Accuracy : 0.8776
#> 95% CI : (0.7523, 0.9537)
#> No Information Rate : 0.7551
#> P-Value [Acc > NIR] : 0.02761
#>
#> Kappa : 0.6689
#>
#> Mcnemar's Test P-Value : 1.00000
#>
#> Sensitivity : 0.7500
#> Specificity : 0.9189
#> Pos Pred Value : 0.7500
#> Neg Pred Value : 0.9189
#> Prevalence : 0.2449
#> Detection Rate : 0.1837
#> Detection Prevalence : 0.2449
#> Balanced Accuracy : 0.8345
#>
#> 'Positive' Class : healthy
#>
Wow, this model produces significantly better accuracy than the other models: about 88%, above our target of 80%.
# prediction object
rf_roc <- prediction(predictions = rf_prob[,1],        # probability of the positive class 'healthy'
                     labels = parkinsons_test$actual)

# ROC curve
plot(performance(prediction.obj = rf_roc, # ROC prediction object
                 measure = "tpr",         # y axis
                 x.measure = "fpr"))      # x axis
abline(0, 1, lty = 2) # diagonal line

# AUC
rf_auc <- performance(prediction.obj = rf_roc,
                      measure = "auc")
rf_auc@y.values
#> [[1]]
#> [1] 0.9594595
Based on the plot above, this model produces a better ROC curve and AUC value, noticeably higher than the other models.
varImp(rf_model)
#> rf variable importance
#>
#> Overall
#> PPE 100.000
#> NHR 53.145
#> MDVP.Fo.Hz. 52.737
#> MDVP.Fhi.Hz. 28.459
#> spread2 23.590
#> MDVP.Flo.Hz. 5.487
#> D2 3.325
#> RPDE 0.000
Based on the varImp output above, PPE is the most important predictor and RPDE is the least important one. PPE is one of the three nonlinear measures of fundamental frequency variation, and it separates healthy and Parkinson's patients well in this model. Do not rely on it as a single diagnostic measure, though, because abnormalities in speech can have many causes.
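For a quicker read of that ranking, a small optional sketch: the varImp object from caret can be plotted directly, which sorts the predictors by importance.

# sketch: visualize the variable importance ranking
plot(varImp(rf_model))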
plot(rf_model$finalModel)
legend("topright", colnames(rf_model$finalModel$err.rate),
col=1:6,cex=0.8,fill=1:6)
Evaluating the Random Forest model, we can see that the error rate stops improving noticeably after around 100 trees; the plot also suggests that roughly 300 trees gives the smallest error rate for this model.
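To read the exact tree count off the error-rate matrix instead of the plot, a minimal sketch (the final model from method = "rf" is a randomForest object, whose err.rate matrix has an OOB column):

# sketch: find the number of trees with the lowest out-of-bag (OOB) error
oob_err <- rf_model$finalModel$err.rate[, "OOB"] # OOB error after each tree is added
which.min(oob_err)                               # number of trees with the smallest OOB error
min(oob_err)                                     # the corresponding error rate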
Because we built this model with healthy as the positive class, it predicts whether a recording should be classified as healthy. In other words, this model is not designed to diagnose Parkinson's, but to act as a screening test: if someone passes the test, they are predicted to be healthy, and if they do not, they should go through further procedures for a closer look.
The best result comes from the Random Forest model. It tends to perform better because it aggregates many trees (and was tuned with repeated cross-validation), whereas Naive Bayes and a single Decision Tree are much simpler models; in this case that extra work produces a clearly better result than the other two.
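As a recap (a small sketch reusing the objects created above; the numbers should match the confusionMatrix and AUC outputs reported earlier), we can collect the test-set accuracy and AUC of the three models into one data frame:

# sketch: summarize test-set accuracy and AUC for the three models
model_summary <- data.frame(
  model    = c("Naive Bayes", "Decision Tree", "Random Forest"),
  accuracy = c(confusionMatrix(naive_conf)$overall["Accuracy"],
               confusionMatrix(dt_conf)$overall["Accuracy"],
               confusionMatrix(rf_conf)$overall["Accuracy"]),
  auc      = c(naive_auc@y.values[[1]],
               dt_auc@y.values[[1]],
               rf_auc@y.values[[1]])
)
model_summary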
The best accuracy we achieved is about 88%, which still means more than 12% of predictions can be wrong. That is why, in medicine, a disease is usually diagnosed with several tests: combining tests significantly increases the chance of getting the diagnosis right.
Take a look at the HOUSE MD series: he never tests a patient with only one method, and he always doubts the result of a test (because the chance of a wrong result is high when the test procedure is not carried out correctly).