Of all the applications of machine learning, diagnosing any serious disease using a black box is always going to be a hard sell. If the output from a model is a particular course of treatment (potentially with side effects), or surgery, or the absence of treatment, people are going to want to know why.
This dataset gives a number of variables along with a target condition of having or not having heart disease.
heartDf<-read.csv("heart.csv")
summary(heartDf)
#> age sex cp trestbps
#> Min. :29.00 Min. :0.0000 Min. :0.000 Min. : 94.0
#> 1st Qu.:47.50 1st Qu.:0.0000 1st Qu.:0.000 1st Qu.:120.0
#> Median :55.00 Median :1.0000 Median :1.000 Median :130.0
#> Mean :54.37 Mean :0.6832 Mean :0.967 Mean :131.6
#> 3rd Qu.:61.00 3rd Qu.:1.0000 3rd Qu.:2.000 3rd Qu.:140.0
#> Max. :77.00 Max. :1.0000 Max. :3.000 Max. :200.0
#> chol fbs restecg thalach
#> Min. :126.0 Min. :0.0000 Min. :0.0000 Min. : 71.0
#> 1st Qu.:211.0 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:133.5
#> Median :240.0 Median :0.0000 Median :1.0000 Median :153.0
#> Mean :246.3 Mean :0.1485 Mean :0.5281 Mean :149.6
#> 3rd Qu.:274.5 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:166.0
#> Max. :564.0 Max. :1.0000 Max. :2.0000 Max. :202.0
#> exang oldpeak slope ca
#> Min. :0.0000 Min. :0.00 Min. :0.000 Min. :0.0000
#> 1st Qu.:0.0000 1st Qu.:0.00 1st Qu.:1.000 1st Qu.:0.0000
#> Median :0.0000 Median :0.80 Median :1.000 Median :0.0000
#> Mean :0.3267 Mean :1.04 Mean :1.399 Mean :0.7294
#> 3rd Qu.:1.0000 3rd Qu.:1.60 3rd Qu.:2.000 3rd Qu.:1.0000
#> Max. :1.0000 Max. :6.20 Max. :2.000 Max. :4.0000
#> thal target
#> Min. :0.000 Min. :0.0000
#> 1st Qu.:2.000 1st Qu.:0.0000
#> Median :2.000 Median :1.0000
#> Mean :2.314 Mean :0.5446
#> 3rd Qu.:3.000 3rd Qu.:1.0000
#> Max. :3.000 Max. :1.0000
Meaning of column names:
Let us try to understand how each variable affects diagnosis of heart disease first and then try to solve the case:
Reading about heart disease risk factors led me to the following: high cholesterol, high blood pressure, diabetes, weight, family history and smoking [3]. According to another source [4], the major factors that can’t be changed are increasing age, male gender and heredity. Note that thalassemia, one of the variables in this dataset, is hereditary. Major factors that can be modified are smoking, high cholesterol, high blood pressure, physical inactivity, being overweight and having diabetes. Other factors include stress, alcohol and poor diet/nutrition.
I can see no reference to the ‘number of major vessels’, but given that the definition of heart disease is “…what happens when your heart’s blood supply is blocked or interrupted by a build-up of fatty substances in the coronary arteries”, it seems logical that having more major vessels is a good thing and will therefore reduce the probability of heart disease.
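# Rename the columns to the descriptive names used in the rest of the post
colnames(heartDf) <- c("age", "sex", "chest_pain_type", "resting_blood_pressure",
                       "cholesterol", "fasting_blood_sugar", "rest_ecg",
                       "max_heart_rate_achieved", "exercise_induced_angina",
                       "st_depression", "st_slope", "num_major_vessels",
                       "thalassemia", "target")
heartDf$sex[heartDf$sex == 0]<-"F"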
heartDf$sex[heartDf$sex == 1]<-"M"
heartDf$chest_pain_type[heartDf$chest_pain_type == 0] <- NA
heartDf$chest_pain_type[heartDf$chest_pain_type == 1] <- 'Typical Angina'
heartDf$chest_pain_type[heartDf$chest_pain_type == 2] <- 'Atypical Angina'
heartDf$chest_pain_type[heartDf$chest_pain_type == 3] <- 'Non-Anginal Pain'
heartDf$chest_pain_type[heartDf$chest_pain_type == 4] <- 'Asymptomatic'
heartDf$fasting_blood_sugar[heartDf$fasting_blood_sugar == 0] <- 'Lower than 120mg/ml'
heartDf$fasting_blood_sugar[heartDf$fasting_blood_sugar == 1] <- 'Greater than 120mg/ml'
heartDf$rest_ecg[heartDf$rest_ecg == 0] <- 'Normal'
heartDf$rest_ecg[heartDf$rest_ecg == 1] <- 'ST-T wave abnormality'
heartDf$rest_ecg[heartDf$rest_ecg == 2] <- 'Left Ventricular Hypertrophy'
heartDf$exercise_induced_angina[heartDf$exercise_induced_angina == 0] <- 'No'
heartDf$exercise_induced_angina[heartDf$exercise_induced_angina == 1] <- 'Yes'
heartDf$st_slope[heartDf$st_slope == 0] <- NA
heartDf$st_slope[heartDf$st_slope == 1] <- 'Upsloping'
heartDf$st_slope[heartDf$st_slope == 2] <- 'Flat'
heartDf$st_slope[heartDf$st_slope == 3] <- 'Downsloping'
heartDf$thalassemia[heartDf$thalassemia == 0] <- NA
heartDf$thalassemia[heartDf$thalassemia == 1] <- 'Normal'
heartDf$thalassemia[heartDf$thalassemia == 2] <- 'Fixed Defect'
heartDf$thalassemia[heartDf$thalassemia == 3] <- 'Reversable Defect'
head(heartDf)
#> age sex chest_pain_type resting_blood_pressure cholesterol
#> 1 63 M Non-Anginal Pain 145 233
#> 2 37 M Atypical Angina 130 250
#> 3 41 F Typical Angina 130 204
#> 4 56 M Typical Angina 120 236
#> 5 57 F <NA> 120 354
#> 6 57 M <NA> 140 192
#> fasting_blood_sugar rest_ecg max_heart_rate_achieved
#> 1 Greater than 120mg/ml Normal 150
#> 2 Lower than 120mg/ml ST-T wave abnormality 187
#> 3 Lower than 120mg/ml Normal 172
#> 4 Lower than 120mg/ml ST-T wave abnormality 178
#> 5 Lower than 120mg/ml ST-T wave abnormality 163
#> 6 Lower than 120mg/ml ST-T wave abnormality 148
#> exercise_induced_angina st_depression st_slope num_major_vessels
#> 1 No 2.3 <NA> 0
#> 2 No 3.5 <NA> 0
#> 3 No 1.4 Flat 0
#> 4 No 0.8 Flat 0
#> 5 Yes 0.6 Flat 0
#> 6 No 0.4 Upsloping 0
#> thalassemia target
#> 1 Normal 1
#> 2 Fixed Defect 1
#> 3 Fixed Defect 1
#> 4 Fixed Defect 1
#> 5 Fixed Defect 1
#> 6 Normal 1
Our aim is to distinguish the presence of heart disease (values 1, 2, 3, 4) from the absence of heart disease (value 0). Therefore, we replace all labels greater than 1 by 1.
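heartDf$target[heartDf$target > 1] <- 1
# Assumed step: drop the rows where the recoding above produced NA (the outputs below show 149 rows)
heartDf <- na.omit(heartDf)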
summary(heartDf)
#> age sex chest_pain_type resting_blood_pressure
#> Min. :29.00 Length:149 Length:149 Min. : 94
#> 1st Qu.:45.00 Class :character Class :character 1st Qu.:120
#> Median :54.00 Mode :character Mode :character Median :130
#> Mean :53.23 Mean :131
#> 3rd Qu.:60.00 3rd Qu.:140
#> Max. :76.00 Max. :192
#> cholesterol fasting_blood_sugar rest_ecg max_heart_rate_achieved
#> Min. :126.0 Length:149 Length:149 Min. : 96
#> 1st Qu.:211.0 Class :character Class :character 1st Qu.:149
#> Median :235.0 Mode :character Mode :character Median :162
#> Mean :243.6 Mean :158
#> 3rd Qu.:269.0 3rd Qu.:172
#> Max. :564.0 Max. :202
#> exercise_induced_angina st_depression st_slope num_major_vessels
#> Length:149 Min. :0.0000 Length:149 Min. :0.0000
#> Class :character 1st Qu.:0.0000 Class :character 1st Qu.:0.0000
#> Mode :character Median :0.2000 Mode :character Median :0.0000
#> Mean :0.6691 Mean :0.5503
#> 3rd Qu.:1.2000 3rd Qu.:1.0000
#> Max. :3.8000 Max. :4.0000
#> thalassemia target
#> Length:149 Min. :0.0000
#> Class :character 1st Qu.:1.0000
#> Mode :character Median :1.0000
#> Mean :0.7785
#> 3rd Qu.:1.0000
#> Max. :1.0000
sapply(heartDf,class)
#> age sex chest_pain_type
#> "integer" "character" "character"
#> resting_blood_pressure cholesterol fasting_blood_sugar
#> "integer" "integer" "character"
#> rest_ecg max_heart_rate_achieved exercise_induced_angina
#> "character" "integer" "character"
#> st_depression st_slope num_major_vessels
#> "numeric" "character" "integer"
#> thalassemia target
#> "character" "numeric"
In R, a categorical variable (a variable that takes on a finite number of values) is represented as a factor. As the output shows, the categorical columns are not factors yet: sex, for example, is stored as a character vector, and target as a number, even though each can only take two values. We can use the transform() function to change the built-in type of each feature.
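As a small illustration of why both class and typeof are inspected after the conversion, the sketch below uses a made-up data frame (`toyDf` and its values are hypothetical, not taken from heart.csv). It shows that a factor reports the class "factor" while its values are stored as integer codes that index into the levels.

```r
# Toy data frame (hypothetical values, not taken from heart.csv)
toyDf <- data.frame(sex = c("F", "M", "M", "F"),
                    age = c(41L, 63L, 37L, 57L),
                    stringsAsFactors = FALSE)

# Convert the character column to a factor with transform()
toyDfTrans <- transform(toyDf, sex = as.factor(sex))

class(toyDfTrans$sex)       # "factor": how R treats the column in models
typeof(toyDfTrans$sex)      # "integer": how the values are stored internally
levels(toyDfTrans$sex)      # the labels the integer codes map to
as.integer(toyDfTrans$sex)  # the integer codes themselves
```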
heartDfTrans <- transform(
heartDf,
age=as.integer(age),
sex=as.factor(sex),
chest_pain_type=as.factor(chest_pain_type),
resting_blood_pressure=as.integer(resting_blood_pressure),
cholesterol=as.integer(cholesterol),
fasting_blood_sugar=as.factor(fasting_blood_sugar),
rest_ecg=as.factor(rest_ecg),
max_heart_rate_achieved=as.integer(max_heart_rate_achieved),
exercise_induced_angina=as.factor(exercise_induced_angina),
st_depression=as.numeric(st_depression),
st_slope=as.factor(st_slope),
num_major_vessels=as.factor(num_major_vessels),
thalassemia=as.factor(thalassemia),
target=as.factor(target)
)
sapply(heartDfTrans, class)
#> age sex chest_pain_type
#> "integer" "factor" "factor"
#> resting_blood_pressure cholesterol fasting_blood_sugar
#> "integer" "integer" "factor"
#> rest_ecg max_heart_rate_achieved exercise_induced_angina
#> "factor" "integer" "factor"
#> st_depression st_slope num_major_vessels
#> "numeric" "factor" "factor"
#> thalassemia target
#> "factor" "factor"
sapply(heartDfTrans, typeof)
#> age sex chest_pain_type
#> "integer" "integer" "integer"
#> resting_blood_pressure cholesterol fasting_blood_sugar
#> "integer" "integer" "integer"
#> rest_ecg max_heart_rate_achieved exercise_induced_angina
#> "integer" "integer" "integer"
#> st_depression st_slope num_major_vessels
#> "double" "integer" "integer"
#> thalassemia target
#> "integer" "integer"
summary(heartDfTrans)
#> age sex chest_pain_type resting_blood_pressure
#> Min. :29.00 F:54 Atypical Angina :81 Min. : 94
#> 1st Qu.:45.00 M:95 Non-Anginal Pain:20 1st Qu.:120
#> Median :54.00 Typical Angina :48 Median :130
#> Mean :53.23 Mean :131
#> 3rd Qu.:60.00 3rd Qu.:140
#> Max. :76.00 Max. :192
#> cholesterol fasting_blood_sugar
#> Min. :126.0 Greater than 120mg/ml: 24
#> 1st Qu.:211.0 Lower than 120mg/ml :125
#> Median :235.0
#> Mean :243.6
#> 3rd Qu.:269.0
#> Max. :564.0
#> rest_ecg max_heart_rate_achieved
#> Left Ventricular Hypertrophy: 1 Min. : 96
#> Normal :63 1st Qu.:149
#> ST-T wave abnormality :85 Median :162
#> Mean :158
#> 3rd Qu.:172
#> Max. :202
#> exercise_induced_angina st_depression st_slope num_major_vessels
#> No :131 Min. :0.0000 Flat :93 0:100
#> Yes: 18 1st Qu.:0.0000 Upsloping:56 1: 30
#> Median :0.2000 2: 9
#> Mean :0.6691 3: 6
#> 3rd Qu.:1.2000 4: 4
#> Max. :3.8000
#> thalassemia target
#> Fixed Defect :108 0: 33
#> Normal : 5 1:116
#> Reversable Defect: 36
#>
#>
If the training data is biased, the whole model can carry that bias forward. To avoid this, it is important to identify the bias and build the training data accordingly.
The raw data is close to balanced (the mean of target is 0.5446), but the cleaned data above has 116 positives against 33 negatives. It is therefore a good idea to split the data so that both target classes are represented in the same proportion in the training and test sets.
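The per-class split can also be written as one small function. The sketch below is only an illustration: `stratifiedTrainIndex` is a hypothetical helper, and `toyTarget` is a made-up vector that merely reuses the class counts reported above (116 ones and 33 zeros).

```r
# Hypothetical helper: sample a fixed fraction of row indices within each class
stratifiedTrainIndex <- function(target, fraction = 0.7) {
  idxByClass <- split(seq_along(target), target)
  picked <- lapply(idxByClass, function(idx) {
    # Index into idx explicitly so a class with a single row is handled correctly
    idx[sample.int(length(idx), floor(fraction * length(idx)))]
  })
  sort(unlist(picked, use.names = FALSE))
}

set.seed(100)
# Toy target with the same class counts as the cleaned data
toyTarget <- factor(c(rep(1, 116), rep(0, 33)))
trainIdx <- stratifiedTrainIndex(toyTarget)

table(toyTarget[trainIdx])   # class counts in the training part
table(toyTarget[-trainIdx])  # class counts in the test part
```

Sampling inside each class keeps the ratio of 1's to 0's roughly the same in both parts, which a single sample over all rows does not guarantee.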
# Get subset of dataframe with all the 1's
heartDfTransOnes<-subset(heartDfTrans,heartDfTrans$target==1)
dim(heartDfTransOnes)
#> [1] 116 14
# Get subset of dataframe with all the 0's
heartDfTransZeros<-subset(heartDfTrans,heartDfTrans$target==0)
dim(heartDfTransZeros)
#> [1] 33 14
# Fix the random seed so that the sampled split is reproducible
set.seed(100)
heartDfTransOnesTrainingSet<-sample(1:nrow(heartDfTransOnes),0.7*nrow(heartDfTransOnes))
heartDfTransZerosTrainingSet<-sample(1:nrow(heartDfTransZeros),0.7*nrow(heartDfTransZeros))
trainingDataOnes<-heartDfTransOnes[heartDfTransOnesTrainingSet,]
dim(trainingDataOnes)
#> [1] 81 14
trainingDataZeros<-heartDfTransZeros[heartDfTransZerosTrainingSet,]
dim(trainingDataZeros)
#> [1] 23 14
trainingData<-rbind(trainingDataOnes,trainingDataZeros)
dim(trainingData)
#> [1] 104 14
testDataOnes<-heartDfTransOnes[-heartDfTransOnesTrainingSet,]
dim(testDataOnes)
#> [1] 35 14
testDataZeros<-heartDfTransZeros[-heartDfTransZerosTrainingSet,]
dim(testDataZeros)
#> [1] 10 14
testData<-rbind(testDataOnes,testDataZeros)
dim(testData)
#> [1] 45 14
# The data is now divided into training and test sets in a roughly 70% / 30% ratio
One more important point concerns the "factor has new levels" error that predict() raises when the test data contains a factor level the model was not trained on. To avoid this, make sure of two things:
- The factor levels in the training and test data frames are exactly the same.
- The factor columns have no missing values.
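One way to enforce both points is to rebuild each test factor with the training levels. The sketch below uses two made-up data frames (`trainToy` and `testToy` are hypothetical) and is meant as an illustration, not as the procedure used in this post.

```r
# Toy training and test frames (hypothetical values)
trainToy <- data.frame(st_slope = factor(c("Flat", "Upsloping", "Flat")))
testToy  <- data.frame(st_slope = c("Upsloping", "Downsloping"),
                       stringsAsFactors = FALSE)

# Re-create the test factor with the training levels;
# values outside those levels become NA instead of new levels
testToy$st_slope <- factor(testToy$st_slope, levels = levels(trainToy$st_slope))

identical(levels(trainToy$st_slope), levels(testToy$st_slope))  # levels now match
unseen <- is.na(testToy$st_slope)                  # rows whose level was never seen in training
testToyClean <- testToy[!unseen, , drop = FALSE]   # keep only rows the model can score
```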
We use the glm() function with the binomial family to fit a logistic regression model. Once the model is fitted, the predict() function returns the log(odds) of the Y variable, which in our case is target. Log-odds are unbounded, however, and we want values between 0 and 1. So, to convert them into predicted probability scores bounded between 0 and 1, we use plogis().
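The relationship between log(odds) and probabilities can be checked on a small model. The sketch below uses R's built-in mtcars data as a stand-in for the heart data (an assumption made only so the example runs on its own; `toyModel` is a hypothetical name) and compares plogis() on the log-odds with the probabilities predict() returns directly.

```r
# Built-in mtcars data stands in for the heart data here (am is a 0/1 outcome)
toyModel <- glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))

logOdds <- predict(toyModel, newdata = mtcars)   # default type = "link": log(odds)
probs   <- plogis(logOdds)                       # inverse logit: 1 / (1 + exp(-x))
direct  <- predict(toyModel, newdata = mtcars, type = "response")

range(probs)              # always within [0, 1]
all.equal(probs, direct)  # both routes give the same probabilities
```

Either route works; plogis() on the link-scale prediction is simply the inverse of the logit link used when fitting the model.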
library(InformationValue)
logisticRegressionModel<-glm(target ~ age+
sex+
chest_pain_type+
resting_blood_pressure+
cholesterol+
fasting_blood_sugar+
rest_ecg+
max_heart_rate_achieved+
exercise_induced_angina+
st_depression+
st_slope+
num_major_vessels+
thalassemia, data=trainingData, family=binomial(link="logit"))
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
sapply(trainingData, levels)
#> $age
#> NULL
#>
#> $sex
#> [1] "F" "M"
#>
#> $chest_pain_type
#> [1] "Atypical Angina" "Non-Anginal Pain" "Typical Angina"
#>
#> $resting_blood_pressure
#> NULL
#>
#> $cholesterol
#> NULL
#>
#> $fasting_blood_sugar
#> [1] "Greater than 120mg/ml" "Lower than 120mg/ml"
#>
#> $rest_ecg
#> [1] "Left Ventricular Hypertrophy" "Normal"
#> [3] "ST-T wave abnormality"
#>
#> $max_heart_rate_achieved
#> NULL
#>
#> $exercise_induced_angina
#> [1] "No" "Yes"
#>
#> $st_depression
#> NULL
#>
#> $st_slope
#> [1] "Flat" "Upsloping"
#>
#> $num_major_vessels
#> [1] "0" "1" "2" "3" "4"
#>
#> $thalassemia
#> [1] "Fixed Defect" "Normal" "Reversable Defect"
#>
#> $target
#> [1] "0" "1"
sapply(testData, levels)
#> $age
#> NULL
#>
#> $sex
#> [1] "F" "M"
#>
#> $chest_pain_type
#> [1] "Atypical Angina" "Non-Anginal Pain" "Typical Angina"
#>
#> $resting_blood_pressure
#> NULL
#>
#> $cholesterol
#> NULL
#>
#> $fasting_blood_sugar
#> [1] "Greater than 120mg/ml" "Lower than 120mg/ml"
#>
#> $rest_ecg
#> [1] "Left Ventricular Hypertrophy" "Normal"
#> [3] "ST-T wave abnormality"
#>
#> $max_heart_rate_achieved
#> NULL
#>
#> $exercise_induced_angina
#> [1] "No" "Yes"
#>
#> $st_depression
#> NULL
#>
#> $st_slope
#> [1] "Flat" "Upsloping"
#>
#> $num_major_vessels
#> [1] "0" "1" "2" "3" "4"
#>
#> $thalassemia
#> [1] "Fixed Defect" "Normal" "Reversable Defect"
#>
#> $target
#> [1] "0" "1"
The default cutoff prediction probability score is 0.5 or the ratio of 1’s and 0’s in the training data. But sometimes, tuning the probability cutoff can improve the accuracy in both the development and validation samples. The InformationValue::optimalCutoff function provides ways to find the optimal cutoff to improve the prediction of 1’s, 0’s, or both 1’s and 0’s, and to reduce the misclassification error. Let’s compute the optimal score that minimizes the misclassification error for the above model.
# Assumed definitions: predicted and optCutOff are used below but are not defined anywhere in this post
predicted <- plogis(predict(logisticRegressionModel, testData))
optCutOff <- optimalCutoff(testData$target, predicted)[1]
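To make the idea of a cutoff search concrete, here is a minimal base R sketch. It is not the InformationValue implementation: `findCutoff` is a hypothetical helper that simply tries a grid of cutoffs and keeps the one with the lowest misclassification error, and the labels and scores are made up.

```r
# Hypothetical helper, not the InformationValue implementation:
# try a grid of cutoffs and return the one with the lowest misclassification error
findCutoff <- function(actuals, scores, grid = seq(0.01, 0.99, by = 0.01)) {
  errors <- sapply(grid, function(cutoff) mean((scores >= cutoff) != (actuals == 1)))
  grid[which.min(errors)]
}

# Toy labels and scores (made up for illustration)
toyActuals <- c(0, 0, 0, 1, 0, 1, 1, 1)
toyScores  <- c(0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.80, 0.90)

toyCutoff <- findCutoff(toyActuals, toyScores)
errorAtCutoff  <- mean((toyScores >= toyCutoff) != (toyActuals == 1))  # error at the chosen cutoff
errorAtDefault <- mean((toyScores >= 0.5) != (toyActuals == 1))        # error at the default 0.5
c(chosen = errorAtCutoff, default = errorAtDefault)
```

On this toy data the searched cutoff misclassifies one observation out of eight, while the default of 0.5 misclassifies two.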
Sensitivity measures how often a test correctly generates a positive result for people who have the condition that’s being tested for (also known as the “true positive” rate). A test that’s highly sensitive will flag almost everyone who has the disease and not generate many false-negative results. (Example: a test with 90% sensitivity will correctly return a positive result for 90% of people who have the disease, but will return a negative result — a false-negative — for 10% of the people who have the disease and should have tested positive.)
Specificity measures a test’s ability to correctly generate a negative result for people who don’t have the condition that’s being tested for (also known as the “true negative” rate). A high-specificity test will correctly rule out almost everyone who doesn’t have the disease and won’t generate many false-positive results. (Example: a test with 90% specificity will correctly return a negative result for 90% of people who don’t have the disease, but will return a positive result — a false-positive — for 10% of the people who don’t have the disease and should have tested negative.)
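Both measures can be computed by hand from a confusion matrix. The sketch below plugs in the counts from the confusion matrix printed later in this post, where columns are the actual classes and rows the predicted ones; the variable names are hypothetical.

```r
# Counts taken from the confusion matrix printed later in this post
# (columns are actual classes, rows are predicted classes)
truePositives  <- 33  # actual 1, predicted 1
falseNegatives <- 2   # actual 1, predicted 0
trueNegatives  <- 3   # actual 0, predicted 0
falsePositives <- 7   # actual 0, predicted 1

sensitivityValue <- truePositives / (truePositives + falseNegatives)
specificityValue <- trueNegatives / (trueNegatives + falsePositives)

round(sensitivityValue, 7)  # 0.9428571
specificityValue            # 0.3
```

These match the values reported by sensitivity() and specificity() further down: the model finds most of the patients with heart disease but correctly rules out only 3 of the 10 without it.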
A receiver operating characteristic curve, or ROC curve, is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied. The ROC curve is created by plotting the true positive rate against the false positive rate at various threshold settings.
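The construction of the curve can be sketched in a few lines of base R. The helper `rocPoints` below is hypothetical (it is not what plotROC does internally), and the labels and scores are made up; it computes the true and false positive rates at each distinct score and then the area under the curve with the trapezoidal rule.

```r
# Hypothetical helper: true and false positive rates over a set of thresholds
rocPoints <- function(actuals, scores) {
  thresholds <- c(Inf, sort(unique(scores), decreasing = TRUE), -Inf)
  tpr <- sapply(thresholds, function(th) mean(scores[actuals == 1] >= th))
  fpr <- sapply(thresholds, function(th) mean(scores[actuals == 0] >= th))
  data.frame(threshold = thresholds, fpr = fpr, tpr = tpr)
}

# Toy labels and scores (made up for illustration)
toyActuals <- c(0, 0, 0, 1, 0, 1, 1, 1)
toyScores  <- c(0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.80, 0.90)

roc <- rocPoints(toyActuals, toyScores)

# Area under the curve by the trapezoidal rule
auc <- sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)
auc  # 0.9375 for this toy data
```

Calling plot(roc$fpr, roc$tpr, type = "l") on the result draws the curve; a curve that hugs the top-left corner corresponds to an area close to 1.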
summary(logisticRegressionModel)
#>
#> Call:
#> glm(formula = target ~ age + sex + chest_pain_type + resting_blood_pressure +
#> cholesterol + fasting_blood_sugar + rest_ecg + max_heart_rate_achieved +
#> exercise_induced_angina + st_depression + st_slope + num_major_vessels +
#> thalassemia, family = binomial(link = "logit"), data = trainingData)
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -4.0765 0.0003 0.0329 0.2864 0.9101
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 1.933e+01 6.523e+03 0.003 0.99764
#> age 4.529e-02 8.402e-02 0.539 0.58987
#> sexM -4.799e+00 1.972e+00 -2.433 0.01496
#> chest_pain_typeNon-Anginal Pain 3.819e+00 1.820e+00 2.099 0.03585
#> chest_pain_typeTypical Angina -4.660e+00 1.765e+00 -2.639 0.00830
#> resting_blood_pressure -4.304e-02 3.573e-02 -1.205 0.22833
#> cholesterol -5.212e-03 1.087e-02 -0.479 0.63168
#> fasting_blood_sugarLower than 120mg/ml -1.320e+00 1.417e+00 -0.931 0.35167
#> rest_ecgNormal -1.443e+01 6.523e+03 -0.002 0.99823
#> rest_ecgST-T wave abnormality -1.583e+01 6.523e+03 -0.002 0.99806
#> max_heart_rate_achieved 9.684e-02 4.477e-02 2.163 0.03053
#> exercise_induced_anginaYes 1.916e+00 1.501e+00 1.277 0.20154
#> st_depression -2.684e+00 8.902e-01 -3.015 0.00257
#> st_slopeUpsloping -4.112e+00 1.670e+00 -2.462 0.01383
#> num_major_vessels1 -6.260e+00 2.112e+00 -2.963 0.00304
#> num_major_vessels2 -9.741e+00 3.222e+00 -3.023 0.00250
#> num_major_vessels3 -8.785e+00 5.097e+00 -1.724 0.08479
#> num_major_vessels4 2.228e+01 2.644e+03 0.008 0.99328
#> thalassemiaNormal 6.059e+00 3.692e+00 1.641 0.10078
#> thalassemiaReversable Defect -3.096e+00 1.457e+00 -2.125 0.03357
#>
#> (Intercept)
#> age
#> sexM *
#> chest_pain_typeNon-Anginal Pain *
#> chest_pain_typeTypical Angina **
#> resting_blood_pressure
#> cholesterol
#> fasting_blood_sugarLower than 120mg/ml
#> rest_ecgNormal
#> rest_ecgST-T wave abnormality
#> max_heart_rate_achieved *
#> exercise_induced_anginaYes
#> st_depression **
#> st_slopeUpsloping *
#> num_major_vessels1 **
#> num_major_vessels2 **
#> num_major_vessels3 .
#> num_major_vessels4
#> thalassemiaNormal
#> thalassemiaReversable Defect *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 109.900 on 103 degrees of freedom
#> Residual deviance: 34.318 on 84 degrees of freedom
#> AIC: 74.318
#>
#> Number of Fisher Scoring iterations: 17
plotROC(testData$target, predicted)
Concordance(testData$target, predicted)
#> $Concordance
#> [1] 0.7571429
#>
#> $Discordance
#> [1] 0.2428571
#>
#> $Tied
#> [1] 2.775558e-17
#>
#> $Pairs
#> [1] 350
sensitivity(testData$target, predicted, threshold = optCutOff)
#> [1] 0.9428571
specificity(testData$target, predicted, threshold = optCutOff)
#> [1] 0.3
confusionMatrix(testData$target, predicted, threshold = optCutOff)
#> 0 1
#> 0 3 2
#> 1 7 33
head(testData)
#> age sex chest_pain_type resting_blood_pressure cholesterol
#> 3 41 F Typical Angina 130 204
#> 10 57 M Atypical Angina 150 168
#> 13 49 M Typical Angina 130 266
#> 15 58 F Non-Anginal Pain 150 283
#> 16 50 F Atypical Angina 120 219
#> 20 69 F Non-Anginal Pain 140 239
#> fasting_blood_sugar rest_ecg max_heart_rate_achieved
#> 3 Lower than 120mg/ml Normal 172
#> 10 Lower than 120mg/ml ST-T wave abnormality 174
#> 13 Lower than 120mg/ml ST-T wave abnormality 171
#> 15 Greater than 120mg/ml Normal 162
#> 16 Lower than 120mg/ml ST-T wave abnormality 158
#> 20 Lower than 120mg/ml ST-T wave abnormality 151
#> exercise_induced_angina st_depression st_slope num_major_vessels
#> 3 No 1.4 Flat 0
#> 10 No 1.6 Flat 0
#> 13 No 0.6 Flat 0
#> 15 No 1.0 Flat 0
#> 16 No 1.6 Upsloping 0
#> 20 No 1.8 Flat 2
#> thalassemia target
#> 3 Fixed Defect 1
#> 10 Fixed Defect 1
#> 13 Fixed Defect 1
#> 15 Fixed Defect 1
#> 16 Fixed Defect 1
#> 20 Fixed Defect 1
colnames(testData)
#> [1] "age" "sex"
#> [3] "chest_pain_type" "resting_blood_pressure"
#> [5] "cholesterol" "fasting_blood_sugar"
#> [7] "rest_ecg" "max_heart_rate_achieved"
#> [9] "exercise_induced_angina" "st_depression"
#> [11] "st_slope" "num_major_vessels"
#> [13] "thalassemia" "target"
predicted_case <- plogis(predict(logisticRegressionModel, data.frame(
age=as.integer(30),
sex=as.factor('F'),
chest_pain_type=as.factor('Atypical Angina'),
resting_blood_pressure=as.integer(100),
cholesterol=as.integer(30),
fasting_blood_sugar=as.factor('Lower than 120mg/ml'),
rest_ecg=as.factor('Normal'),
max_heart_rate_achieved=as.integer(120),
exercise_induced_angina=as.factor('No'),
st_depression=as.numeric(1.2),
st_slope=as.factor('Flat'),
num_major_vessels=as.factor('0'),
thalassemia=as.factor('Fixed Defect')
)))
sprintf("The chances of patient being diagnosed with a heart disease is %0.2f %%", predicted_case*100)
#> [1] "The chances of patient being diagnosed with a heart disease is 99.99 %"
… To be continued