Heart Disease Patient Classification Analysis

1. Background

Heart disease is a prevalent health concern worldwide, contributing significantly to mortality rates. Understanding and predicting the likelihood of heart disease in patients is crucial for early intervention and effective treatment. In this analysis, we will focus on examining patient data from a hospital database, particularly those diagnosed with heart disease. By leveraging advanced analytical techniques, such as logistic regression and k-nearest neighbor (KNN), which are supervised learning algorithms, we aim to develop predictive models to classify patients into those likely to have heart disease and those who are not

2. Objective

The objective of this analysis is to develop predictive models for heart disease detection based on a comprehensive set of patient attributes collected during hospital admissions. Leveraging important variables such as age, sex, chest pain type, blood pressure, cholesterol levels, fasting blood sugar, electrocardiographic results, maximum heart rate achieved, exercise-induced angina, ST depression induced by exercise, and other relevant factors, we aim to create robust models capable of accurately predicting the likelihood of heart disease in patients. By utilizing logistic regression and k-nearest neighbor algorithms, both widely used in supervised learning, we seek to provide healthcare professionals with valuable tools for early detection and intervention in heart disease cases. The ultimate goal is to enhance patient care and outcomes by enabling timely identification and management of individuals at risk of heart disease.

3. Library

Here are several libraries that will be used in the analysis:

library(dplyr)

## Warning: package 'dplyr' was built under R version 4.3.2

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(gtools)

## Warning: package 'gtools' was built under R version 4.3.2

library(gmodels)

## Warning: package 'gmodels' was built under R version 4.3.3

library(ggplot2)

## Warning: package 'ggplot2' was built under R version 4.3.2

library(class)
library(tidyr)

4. Logistic Regression Analysis

4.1 Data Import

In this section, data will be imported and column descriptions will be provided

heart_disease <- read.csv("data_input/heart.csv")
glimpse (heart_disease)

## Rows: 303
## Columns: 14
## $ age      <int> 63, 37, 41, 56, 57, 57, 56, 44, 52, 57, 54, 48, 49, 64, 58, 5…
## $ sex      <int> 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1…
## $ cp       <int> 3, 2, 1, 1, 0, 0, 1, 1, 2, 2, 0, 2, 1, 3, 3, 2, 2, 3, 0, 3, 0…
## $ trestbps <int> 145, 130, 130, 120, 120, 140, 140, 120, 172, 150, 140, 130, 1…
## $ chol     <int> 233, 250, 204, 236, 354, 192, 294, 263, 199, 168, 239, 275, 2…
## $ fbs      <int> 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0…
## $ restecg  <int> 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1…
## $ thalach  <int> 150, 187, 172, 178, 163, 148, 153, 173, 162, 174, 160, 139, 1…
## $ exang    <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0…
## $ oldpeak  <dbl> 2.3, 3.5, 1.4, 0.8, 0.6, 0.4, 1.3, 0.0, 0.5, 1.6, 1.2, 0.2, 0…
## $ slope    <int> 0, 0, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 0, 2, 2, 1…
## $ ca       <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0…
## $ thal     <int> 1, 2, 2, 2, 2, 1, 2, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3…
## $ target   <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…

4.2 EDA and Data Wrangling

Data type adjustment is needed as there are still some variables with inappropriate data types, followed by checking for missing values.

heart_disease <- heart_disease %>% 
  mutate_if(is.integer, as.factor) %>% 
  mutate(sex = factor(sex, levels = c(0,1), labels = c("Female", "Male")),
         fbs =factor(fbs, levels = c(0,1), labels = c("False", "True")),
         exang = factor(exang, levels = c(0,1), labels = c("No", "Yes")),
         target = factor(target, levels = c(0,1), 
                        labels = c("Health", "Not Health")))
glimpse(heart_disease)

## Rows: 303
## Columns: 14
## $ age      <fct> 63, 37, 41, 56, 57, 57, 56, 44, 52, 57, 54, 48, 49, 64, 58, 5…
## $ sex      <fct> Male, Male, Female, Male, Female, Male, Female, Male, Male, M…
## $ cp       <fct> 3, 2, 1, 1, 0, 0, 1, 1, 2, 2, 0, 2, 1, 3, 3, 2, 2, 3, 0, 3, 0…
## $ trestbps <fct> 145, 130, 130, 120, 120, 140, 140, 120, 172, 150, 140, 130, 1…
## $ chol     <fct> 233, 250, 204, 236, 354, 192, 294, 263, 199, 168, 239, 275, 2…
## $ fbs      <fct> True, False, False, False, False, False, False, False, True, …
## $ restecg  <fct> 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1…
## $ thalach  <fct> 150, 187, 172, 178, 163, 148, 153, 173, 162, 174, 160, 139, 1…
## $ exang    <fct> No, No, No, No, Yes, No, No, No, No, No, No, No, No, Yes, No,…
## $ oldpeak  <dbl> 2.3, 3.5, 1.4, 0.8, 0.6, 0.4, 1.3, 0.0, 0.5, 1.6, 1.2, 0.2, 0…
## $ slope    <fct> 0, 0, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 0, 2, 2, 1…
## $ ca       <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0…
## $ thal     <fct> 1, 2, 2, 2, 2, 1, 2, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3…
## $ target   <fct> Not Health, Not Health, Not Health, Not Health, Not Health, N…

summary(heart_disease)

##       age          sex      cp         trestbps        chol        fbs     
##  58     : 19   Female: 96   0:143   120    : 37   197    :  6   False:258  
##  57     : 17   Male  :207   1: 50   130    : 36   204    :  6   True : 45  
##  54     : 16                2: 87   140    : 32   234    :  6              
##  59     : 14                3: 23   110    : 19   212    :  5              
##  52     : 13                        150    : 17   254    :  5              
##  51     : 12                        138    : 13   269    :  5              
##  (Other):212                        (Other):149   (Other):270              
##  restecg    thalach    exang        oldpeak     slope   ca      thal   
##  0:147   162    : 11   No :204   Min.   :0.00   0: 21   0:175   0:  2  
##  1:152   160    :  9   Yes: 99   1st Qu.:0.00   1:140   1: 65   1: 18  
##  2:  4   163    :  9             Median :0.80   2:142   2: 38   2:166  
##          152    :  8             Mean   :1.04           3: 20   3:117  
##          173    :  8             3rd Qu.:1.60           4:  5          
##          125    :  7             Max.   :6.20                          
##          (Other):251                                                   
##         target   
##  Health    :138  
##  Not Health:165  
##                  
##                  
##                  
##                  
##

some insights :

Age Distribution: The age distribution shows that the dataset covers a wide range of ages, with the most frequent age group being 58 years old. This suggests that the dataset includes patients across various age groups, providing a comprehensive view of heart disease across different age demographics.
Gender Representation: There is a noticeable gender imbalance in the dataset, with roughly twice as many male entries as female entries. This gender skew highlights the need for gender-specific analysis and interventions in cardiovascular health.
Chest Pain Types: The distribution of chest pain types indicates that most patients present with type 0 chest pain, followed by types 2 and 1. This insight can help healthcare professionals prioritize chest pain assessments and treatments based on the type and severity of pain reported by patients.
Blood Pressure and Cholesterol Levels: The diverse range of blood pressure and cholesterol levels underscores the variability in cardiovascular health among patients. Understanding these distributions can assist in identifying risk factors and developing personalized treatment plans for individuals with elevated blood pressure or cholesterol levels.
Fasting Blood Sugar Levels: The majority of patients have fasting blood sugar levels below 120 mg/dl, indicating relatively normal glucose metabolism. However, a subset of patients has elevated fasting blood sugar levels, suggesting potential comorbidities such as diabetes mellitus, which can impact cardiovascular health.
Resting Electrocardiographic Results: The distribution of restecg types highlights the prevalence of specific electrocardiographic abnormalities among patients. This information can guide clinicians in interpreting electrocardiograms and diagnosing cardiac conditions based on characteristic ECG patterns.
Exercise-Induced Angina: The presence or absence of exercise-induced angina provides insights into the cardiovascular response to physical exertion. Patients with exercise-induced angina may require closer monitoring and tailored exercise regimens to manage their symptoms and prevent adverse events.
ST Depression and Slope of Peak Exercise ST Segment: The distribution of ST depression values and slope types reflects the extent of myocardial ischemia and the response of the heart to exercise stress. These parameters are crucial for assessing the severity of coronary artery disease and guiding decisions regarding further diagnostic testing and treatment.
Number of Major Vessels Colored by Fluoroscopy: The distribution of major vessels colored by fluoroscopy indicates the extent of coronary artery involvement and the presence of obstructive lesions. This information is valuable for risk stratification and determining the need for revascularization procedures such as angioplasty or bypass surgery.
Thalassemia Types: The prevalence of different thalassemia types highlights the association between genetic factors and cardiovascular disease. Understanding the distribution of thalassemia types can inform genetic counseling and screening programs aimed at identifying individuals at risk of cardiac complications.
Target Variable: The distribution of healthy and unhealthy status among patients underscores the prevalence of heart disease in the population. This insight emphasizes the importance of preventive measures, early detection, and effective management strategies to reduce the burden of cardiovascular morbidity and mortality.

4.2.1 Check Missing Value

Missing value check was performed on the dataset, however, prior to that, it is necessary to check the proportion of the target variable present in the target column. If the proportions of both classes are fairly balanced, we may not require additional pre-processing to balance the proportions between the two target variable classes.

# Data proportion check
prop.table(table(heart_disease$target))

## 
##     Health Not Health 
##  0.4554455  0.5445545

table(heart_disease$target)

## 
##     Health Not Health 
##        138        165

*** From the results, it is evident that the data proportions are sufficiently balanced, hence there is no need for additional preprocessing.

#Missing Value Checking
colSums(is.na(heart_disease))

##      age      sex       cp trestbps     chol      fbs  restecg  thalach 
##        0        0        0        0        0        0        0        0 
##    exang  oldpeak    slope       ca     thal   target 
##        0        0        0        0        0        0

There is no missing value in the dataset

4.2 Visual Basic

# shows diagram for variable Sex
sex_counts <- table(heart_disease$sex)

# Create a bar plot
barplot(sex_counts, 
        main = "Distribution of Sex in the Dataset", 
        xlab = "Sex (0 = Female, 1 = Male)", 
        ylab = "Frequency")

5. Modelling

5.1 Cross Validation

set.seed(303)

# index sampling
index <- sample(nrow(heart_disease),
                size = nrow(heart_disease)*0.7) 

# splitting
heart_disease_train <- heart_disease[index, ]
heart_disease_test <- heart_disease[-index, ] 
heart_disease$target %>% 
  levels()

## [1] "Health"     "Not Health"

5.2 Logistic Regression

Next, models are built using Logistic Regression and KNN. The data is divided into training and testing sets using cross-validation. In the Logistic Regression model, the model is fitted using the glm() function, and feature selection is performed using the stepwise method. The evaluation results of the model are displayed based on accuracy, precision, recall, and F1-score.

model <- glm(formula = target ~ ., family = "binomial", 
             data = heart_disease_train)

## Warning: glm.fit: algorithm did not converge

summary(model)

## 
## Call:
## glm(formula = target ~ ., family = "binomial", data = heart_disease_train)
## 
## Coefficients: (92 not defined because of singularities)
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept) -5.353e+03  1.089e+08       0        1
## age34       -8.757e+02  1.411e+07       0        1
## age35       -3.989e+02  5.012e+06       0        1
## age37       -1.081e+03  1.679e+07       0        1
## age38        1.852e+03  4.009e+07       0        1
## age39       -1.199e+03  1.998e+07       0        1
## age40       -3.698e+03  6.895e+07       0        1
## age41        4.068e+02  8.401e+06       0        1
## age42        1.504e+03  2.979e+07       0        1
## age43       -6.091e+02  1.012e+07       0        1
## age44        3.374e+02  7.100e+06       0        1
## age45        1.534e+03  3.340e+07       0        1
## age46        1.065e+03  2.312e+07       0        1
## age47        1.756e+03  3.682e+07       0        1
## age48        5.609e+02  1.214e+07       0        1
## age49        1.053e+03  2.048e+07       0        1
## age50        2.808e+02  7.870e+06       0        1
## age51        1.534e+03  3.275e+07       0        1
## age52        4.440e+01  4.662e+06       0        1
## age53        2.061e+02  5.781e+06       0        1
## age54        7.970e+00  2.816e+06       0        1
## age55       -4.604e+02  5.822e+06       0        1
## age56        3.317e+02  7.248e+06       0        1
## age57        1.473e+03  3.172e+07       0        1
## age58       -2.467e+01  2.496e+06       0        1
## age59        1.681e+02  5.461e+06       0        1
## age60        1.932e+03  4.179e+07       0        1
## age61       -5.287e+02  7.910e+06       0        1
## age62        1.257e+03  2.734e+07       0        1
## age63       -2.288e+02  2.324e+06       0        1
## age64        1.450e+03  3.046e+07       0        1
## age65       -7.161e+02  1.271e+07       0        1
## age66       -1.280e+03  2.338e+07       0        1
## age67       -4.983e+03  9.517e+07       0        1
## age68       -1.262e+03  2.221e+07       0        1
## age69        2.314e+03  4.857e+07       0        1
## age70       -7.435e+02  1.591e+07       0        1
## age71        1.689e+03  3.504e+07       0        1
## age76       -9.716e+01  1.959e+06       0        1
## age77        5.252e+02  1.560e+07       0        1
## sexMale      4.068e+02  8.365e+06       0        1
## cp1         -3.120e+02  6.210e+06       0        1
## cp2          2.292e+02  4.987e+06       0        1
## cp3         -1.899e+03  3.782e+07       0        1
## trestbps100  4.844e+03  9.885e+07       0        1
## trestbps101  3.739e+03  7.554e+07       0        1
## trestbps102  3.875e+03  7.923e+07       0        1
## trestbps104  3.058e+03  5.919e+07       0        1
## trestbps105  3.399e+03  6.716e+07       0        1
## trestbps106  1.011e+04  1.989e+08       0        1
## trestbps108  5.446e+03  1.064e+08       0        1
## trestbps110  4.167e+03  8.353e+07       0        1
## trestbps112  4.895e+03  9.757e+07       0        1
## trestbps114  4.944e+03  9.860e+07       0        1
## trestbps115  5.869e+03  1.166e+08       0        1
## trestbps117  3.714e+03  7.469e+07       0        1
## trestbps118  6.407e+03  1.255e+08       0        1
## trestbps120  5.158e+03  1.016e+08       0        1
## trestbps122  4.411e+03  8.852e+07       0        1
## trestbps124  4.368e+03  8.784e+07       0        1
## trestbps125  3.410e+03  6.731e+07       0        1
## trestbps126  5.124e+03  1.041e+08       0        1
## trestbps128  4.354e+03  8.589e+07       0        1
## trestbps129  5.566e+03  1.134e+08       0        1
## trestbps130  4.217e+03  8.449e+07       0        1
## trestbps132  2.112e+03  4.172e+07       0        1
## trestbps134  5.423e+03  1.082e+08       0        1
## trestbps135  4.704e+03  9.548e+07       0        1
## trestbps136  6.369e+03  1.276e+08       0        1
## trestbps138  4.450e+03  8.614e+07       0        1
## trestbps140  4.221e+03  8.369e+07       0        1
## trestbps142  3.715e+03  7.196e+07       0        1
## trestbps144  4.638e+03  9.292e+07       0        1
## trestbps145  5.855e+03  1.166e+08       0        1
## trestbps146  6.430e+03  1.272e+08       0        1
## trestbps148  5.275e+03  1.054e+08       0        1
## trestbps150  3.817e+03  7.583e+07       0        1
## trestbps152  8.368e+03  1.644e+08       0        1
## trestbps154  3.758e+03  7.505e+07       0        1
## trestbps155  5.757e+03  1.125e+08       0        1
## trestbps156  5.522e+03  1.124e+08       0        1
## trestbps160  5.325e+03  1.072e+08       0        1
## trestbps165  4.153e+03  8.341e+07       0        1
## trestbps170  2.372e+03  4.771e+07       0        1
## trestbps174  4.998e+03  1.010e+08       0        1
## trestbps178  6.225e+03  1.275e+08       0        1
## trestbps180  6.668e+03  1.315e+08       0        1
## trestbps200  1.423e+03  3.101e+07       0        1
## chol131     -9.310e+02  1.824e+07       0        1
## chol141     -7.936e+02  1.209e+07       0        1
## chol157     -4.406e+02  6.675e+06       0        1
## chol160     -8.979e+02  1.877e+07       0        1
## chol164     -1.255e+03  2.580e+07       0        1
## chol166      9.983e+02  2.241e+07       0        1
## chol167      5.910e+03  1.140e+08       0        1
## chol169     -7.363e+02  1.077e+07       0        1
## chol174     -3.522e+02  3.352e+06       0        1
## chol175     -1.718e+03  3.345e+07       0        1
## chol177      3.704e+02  9.182e+06       0        1
## chol178      1.321e+03  2.425e+07       0        1
## chol180     -1.138e+03  2.133e+07       0        1
## chol182      1.340e+03  2.597e+07       0        1
## chol183      1.168e+03  2.396e+07       0        1
## chol184      3.096e+03  6.582e+07       0        1
## chol186      4.199e+02  9.076e+06       0        1
## chol187      9.275e+02  1.853e+07       0        1
## chol192      1.112e+03  1.960e+07       0        1
## chol193      2.089e+03  4.166e+07       0        1
## chol195      1.539e+03  3.037e+07       0        1
## chol196     -1.264e+03  2.807e+07       0        1
## chol197      1.027e+03  2.006e+07       0        1
## chol198             NA         NA      NA       NA
## chol199      6.189e+03  1.209e+08       0        1
## chol200             NA         NA      NA       NA
## chol201     -3.427e+02  8.824e+06       0        1
## chol203      1.290e+01  2.350e+06       0        1
## chol204      1.067e+03  2.234e+07       0        1
## chol205      1.432e+03  2.845e+07       0        1
## chol206      7.439e+02  1.467e+07       0        1
## chol207      1.227e+03  2.465e+07       0        1
## chol208      3.799e+02  8.863e+06       0        1
## chol209     -4.061e+02  9.204e+06       0        1
## chol210             NA         NA      NA       NA
## chol211      1.254e+03  2.447e+07       0        1
## chol212      1.305e+03  2.633e+07       0        1
## chol213      1.928e+03  3.839e+07       0        1
## chol214      1.192e+02  3.886e+06       0        1
## chol215      9.131e+02  1.644e+07       0        1
## chol216      5.898e+02  1.292e+07       0        1
## chol217      9.989e+02  1.962e+07       0        1
## chol218     -5.579e+01  2.221e+06       0        1
## chol219     -4.490e+02  8.193e+06       0        1
## chol220      1.739e+03  3.486e+07       0        1
## chol221      7.355e+02  1.524e+07       0        1
## chol222             NA         NA      NA       NA
## chol223      8.900e+01  3.861e+06       0        1
## chol225      3.685e+03  7.383e+07       0        1
## chol226     -9.124e+02  1.564e+07       0        1
## chol227      3.049e+03  6.030e+07       0        1
## chol228      9.279e+02  1.674e+07       0        1
## chol229      3.427e+02  8.921e+06       0        1
## chol230     -1.799e+02  4.531e+06       0        1
## chol231     -3.518e+02  6.427e+06       0        1
## chol232             NA         NA      NA       NA
## chol233      2.846e+01  2.228e+06       0        1
## chol234     -6.034e+01  3.498e+06       0        1
## chol235      5.499e+02  1.244e+07       0        1
## chol236     -5.199e+01  1.672e+06       0        1
## chol237      4.584e+03  9.140e+07       0        1
## chol239      5.837e+02  1.198e+07       0        1
## chol240      1.369e+03  2.659e+07       0        1
## chol242     -1.754e+03  3.135e+07       0        1
## chol243     -2.672e+03  5.030e+07       0        1
## chol244      9.222e+01  3.784e+06       0        1
## chol245      5.063e+02  1.037e+07       0        1
## chol246      1.027e+03  2.016e+07       0        1
## chol248      1.468e+03  2.970e+07       0        1
## chol249             NA         NA      NA       NA
## chol250      1.447e+03  2.500e+07       0        1
## chol252      6.744e+02  1.024e+07       0        1
## chol253     -1.390e+03  2.849e+07       0        1
## chol254      9.310e+02  1.775e+07       0        1
## chol255      3.605e+02  8.051e+06       0        1
## chol256      8.478e+02  1.863e+07       0        1
## chol257     -1.462e+03  2.772e+07       0        1
## chol258     -4.228e+02  8.980e+06       0        1
## chol260      1.298e+02  4.573e+06       0        1
## chol261     -7.700e+02  1.546e+07       0        1
## chol263     -3.712e+02  4.986e+06       0        1
## chol264      9.560e+02  1.895e+07       0        1
## chol265             NA         NA      NA       NA
## chol266     -1.461e+02  2.184e+06       0        1
## chol267     -1.656e+03  3.113e+07       0        1
## chol268     -1.514e+02  3.455e+06       0        1
## chol269     -5.123e+01  2.550e+06       0        1
## chol270      4.782e+02  5.938e+06       0        1
## chol271     -9.446e+02  1.645e+07       0        1
## chol273      1.325e+03  2.583e+07       0        1
## chol274     -9.867e+00  2.800e+06       0        1
## chol275     -1.004e+03  2.006e+07       0        1
## chol277     -5.619e+02  1.103e+07       0        1
## chol278             NA         NA      NA       NA
## chol281      1.932e+03  3.788e+07       0        1
## chol282     -2.868e+03  5.788e+07       0        1
## chol283      4.193e+03  8.346e+07       0        1
## chol284      9.792e+01  3.856e+06       0        1
## chol286      4.578e+03  8.841e+07       0        1
## chol288      4.278e+03  8.538e+07       0        1
## chol289             NA         NA      NA       NA
## chol293     -1.233e+03  2.513e+07       0        1
## chol295     -1.542e+03  3.062e+07       0        1
## chol298     -9.951e+02  1.872e+07       0        1
## chol299      5.058e+03  9.681e+07       0        1
## chol302      1.095e+03  2.236e+07       0        1
## chol303     -4.478e+02  8.876e+06       0        1
## chol304      9.841e+02  1.797e+07       0        1
## chol305     -5.848e+02  1.126e+07       0        1
## chol306             NA         NA      NA       NA
## chol307             NA         NA      NA       NA
## chol308     -6.042e+02  1.287e+07       0        1
## chol309     -3.298e+02  5.142e+06       0        1
## chol311     -5.264e+02  9.399e+06       0        1
## chol315      9.750e+02  1.857e+07       0        1
## chol318             NA         NA      NA       NA
## chol319             NA         NA      NA       NA
## chol325     -7.818e+01  2.931e+06       0        1
## chol326      2.379e+03  4.766e+07       0        1
## chol327     -8.820e+02  1.762e+07       0        1
## chol330      1.638e+03  3.218e+07       0        1
## chol340     -1.435e+02  3.267e+06       0        1
## chol341      4.530e+03  9.175e+07       0        1
## chol342      3.879e+03  7.608e+07       0        1
## chol354     -1.412e+03  2.705e+07       0        1
## chol360      5.416e+02  9.535e+06       0        1
## chol394     -9.829e+01  3.485e+06       0        1
## chol407      1.738e+03  3.469e+07       0        1
## chol417      2.352e+03  4.741e+07       0        1
## fbsTrue     -7.067e+02  1.451e+07       0        1
## restecg1     1.602e+02  2.630e+06       0        1
## restecg2            NA         NA      NA       NA
## thalach96   -2.715e+03  5.165e+07       0        1
## thalach97   -1.662e+02  6.559e+06       0        1
## thalach99           NA         NA      NA       NA
## thalach103  -3.115e+03  6.005e+07       0        1
## thalach105  -6.205e+02  1.418e+07       0        1
## thalach108          NA         NA      NA       NA
## thalach109   6.892e+02  1.141e+07       0        1
## thalach111          NA         NA      NA       NA
## thalach112          NA         NA      NA       NA
## thalach114  -1.460e+03  2.823e+07       0        1
## thalach115          NA         NA      NA       NA
## thalach116          NA         NA      NA       NA
## thalach117          NA         NA      NA       NA
## thalach118          NA         NA      NA       NA
## thalach120          NA         NA      NA       NA
## thalach122          NA         NA      NA       NA
## thalach124          NA         NA      NA       NA
## thalach125          NA         NA      NA       NA
## thalach126          NA         NA      NA       NA
## thalach127          NA         NA      NA       NA
## thalach130          NA         NA      NA       NA
## thalach131          NA         NA      NA       NA
## thalach132          NA         NA      NA       NA
## thalach133          NA         NA      NA       NA
## thalach134          NA         NA      NA       NA
## thalach136          NA         NA      NA       NA
## thalach138          NA         NA      NA       NA
## thalach140          NA         NA      NA       NA
## thalach141          NA         NA      NA       NA
## thalach142          NA         NA      NA       NA
## thalach143          NA         NA      NA       NA
## thalach144          NA         NA      NA       NA
## thalach145          NA         NA      NA       NA
## thalach146          NA         NA      NA       NA
## thalach147          NA         NA      NA       NA
## thalach148          NA         NA      NA       NA
## thalach149          NA         NA      NA       NA
## thalach150          NA         NA      NA       NA
## thalach151          NA         NA      NA       NA
## thalach152          NA         NA      NA       NA
## thalach153          NA         NA      NA       NA
## thalach154          NA         NA      NA       NA
## thalach155          NA         NA      NA       NA
## thalach156          NA         NA      NA       NA
## thalach157          NA         NA      NA       NA
## thalach158          NA         NA      NA       NA
## thalach159          NA         NA      NA       NA
## thalach160          NA         NA      NA       NA
## thalach161          NA         NA      NA       NA
## thalach162          NA         NA      NA       NA
## thalach163          NA         NA      NA       NA
## thalach164          NA         NA      NA       NA
## thalach165          NA         NA      NA       NA
## thalach166          NA         NA      NA       NA
## thalach168          NA         NA      NA       NA
## thalach169          NA         NA      NA       NA
## thalach170          NA         NA      NA       NA
## thalach171          NA         NA      NA       NA
## thalach172          NA         NA      NA       NA
## thalach173          NA         NA      NA       NA
## thalach174          NA         NA      NA       NA
## thalach175          NA         NA      NA       NA
## thalach178          NA         NA      NA       NA
## thalach179          NA         NA      NA       NA
## thalach180          NA         NA      NA       NA
## thalach181          NA         NA      NA       NA
## thalach182          NA         NA      NA       NA
## thalach184          NA         NA      NA       NA
## thalach186          NA         NA      NA       NA
## thalach187          NA         NA      NA       NA
## thalach190          NA         NA      NA       NA
## thalach192          NA         NA      NA       NA
## thalach202          NA         NA      NA       NA
## exangYes            NA         NA      NA       NA
## oldpeak             NA         NA      NA       NA
## slope1              NA         NA      NA       NA
## slope2              NA         NA      NA       NA
## ca1                 NA         NA      NA       NA
## ca2                 NA         NA      NA       NA
## ca3                 NA         NA      NA       NA
## ca4                 NA         NA      NA       NA
## thal1               NA         NA      NA       NA
## thal2               NA         NA      NA       NA
## thal3               NA         NA      NA       NA
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2.9200e+02  on 211  degrees of freedom
## Residual deviance: 1.2299e-09  on   0  degrees of freedom
## AIC: 424
## 
## Number of Fisher Scoring iterations: 25

5.2.1 Model Fitting

library(MASS)

## Warning: package 'MASS' was built under R version 4.3.3

## 
## Attaching package: 'MASS'

## The following object is masked from 'package:dplyr':
## 
##     select

model_both <- step(model, direction = "both")

## Start:  AIC=424
## target ~ age + sex + cp + trestbps + chol + fbs + restecg + thalach + 
##     exang + oldpeak + slope + ca + thal

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## 
## Step:  AIC=424
## target ~ age + sex + cp + trestbps + chol + fbs + restecg + thalach + 
##     exang + oldpeak + slope + ca

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## 
## Step:  AIC=424
## target ~ age + sex + cp + trestbps + chol + fbs + restecg + thalach + 
##     exang + oldpeak + slope

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## 
## Step:  AIC=424
## target ~ age + sex + cp + trestbps + chol + fbs + restecg + thalach + 
##     exang + oldpeak

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## 
## Step:  AIC=424
## target ~ age + sex + cp + trestbps + chol + fbs + restecg + thalach + 
##     exang

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## 
## Step:  AIC=424
## target ~ age + sex + cp + trestbps + chol + fbs + restecg + thalach

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: algorithm did not converge

## 
## Step:  AIC=424
## target ~ age + sex + cp + trestbps + chol + fbs + thalach

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: algorithm did not converge

## 
## Step:  AIC=424
## target ~ age + sex + cp + trestbps + chol + thalach

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: algorithm did not converge

## 
## Step:  AIC=424
## target ~ age + sex + cp + chol + thalach

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: algorithm did not converge

## 
## Step:  AIC=424
## target ~ age + sex + chol + thalach

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: algorithm did not converge

## 
## Step:  AIC=424
## target ~ age + chol + thalach

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

##            Df Deviance     AIC
## - thalach  45    16.64  350.64
## <none>            0.00  424.00
## - age      25   576.70  950.70
## - chol    101  2739.32 2961.32

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## 
## Step:  AIC=350.64
## target ~ age + chol

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

##             Df Deviance    AIC
## - chol     127  224.763 304.76
## + exang      1    0.000 336.00
## + slope      2    0.000 338.00
## + thal       3    0.000 340.00
## + ca         4    0.000 342.00
## + restecg    1   13.863 349.86
## + oldpeak    1   14.030 350.03
## <none>           16.636 350.64
## + fbs        1   15.608 351.61
## + sex        1   16.636 352.64
## + cp         3   16.636 356.64
## - age       37  113.328 373.33
## + trestbps  33    0.000 400.00
## + thalach   45    0.000 424.00
## 
## Step:  AIC=304.76
## target ~ age

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

##             Df Deviance     AIC
## + thal       3   173.43  259.43
## + exang      1   180.50  262.50
## + slope      2   179.31  263.31
## + oldpeak    1   181.98  263.98
## + cp         3   182.23  268.23
## + ca         4   184.01  272.01
## + sex        1   209.63  291.63
## - age       39   292.00  294.00
## + restecg    2   219.66  303.66
## <none>           224.76  304.76
## + fbs        1   224.61  306.61
## + trestbps  44   159.15  327.15
## + chol     127    16.64  350.64
## + thalach   71  2739.32 2961.32

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## 
## Step:  AIC=259.43
## target ~ age + thal

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

##             Df Deviance     AIC
## - age       39   228.17  236.17
## + exang      1   148.19  236.19
## + ca         4   145.42  239.42
## + oldpeak    1   151.72  239.72
## + slope      2   150.12  240.12
## + cp         3   150.04  242.04
## + restecg    2   167.53  257.53
## <none>           173.43  259.43
## + sex        1   172.91  260.91
## + fbs        1   172.94  260.94
## + trestbps  44   114.25  288.25
## - thal       3   224.76  304.76
## + chol     127     0.00  340.00
## + thalach   71  1874.27 2102.27
## 
## Step:  AIC=236.17
## target ~ thal

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

##             Df Deviance    AIC
## + cp         3  189.913 203.91
## + exang      1  196.208 206.21
## + ca         4  196.340 212.34
## + oldpeak    1  205.129 215.13
## + slope      2  207.760 219.76
## + restecg    2  221.373 233.37
## <none>          228.167 236.17
## + fbs        1  227.598 237.60
## + sex        1  227.978 237.98
## + age       39  173.431 259.43
## + trestbps  44  179.023 275.02
## + thalach   73  129.583 283.58
## - thal       3  292.005 294.00
## + chol     129   69.537 335.54
## 
## Step:  AIC=203.91
## target ~ thal + cp

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

##             Df Deviance    AIC
## + ca         4  169.119 191.12
## + exang      1  175.415 191.41
## + oldpeak    1  175.715 191.72
## + slope      2  174.344 192.34
## + restecg    2  185.394 203.39
## <none>          189.913 203.91
## + sex        1  188.591 204.59
## + fbs        1  189.912 205.91
## - thal       3  227.904 235.90
## - cp         3  228.167 236.17
## + age       39  150.042 242.04
## + trestbps  44  145.113 247.11
## + thalach   73  104.027 264.03
## + chol     129   40.884 312.88
## 
## Step:  AIC=191.12
## target ~ thal + cp + ca

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

##             Df Deviance    AIC
## + slope      2  147.676 173.68
## + oldpeak    1  155.914 179.91
## + exang      1  157.116 181.12
## <none>          169.119 191.12
## + restecg    2  165.360 191.36
## + sex        1  168.039 192.04
## + fbs        1  168.623 192.62
## - ca         4  189.913 203.91
## - cp         3  196.340 212.34
## - thal       3  202.057 218.06
## + age       39  126.899 226.90
## + trestbps  44  126.356 236.36
## + thalach   73   91.595 259.60
## + chol     129   31.826 311.83
## 
## Step:  AIC=173.68
## target ~ thal + cp + ca + slope

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

##             Df Deviance    AIC
## + exang      1  140.069 168.07
## + oldpeak    1  143.277 171.28
## + sex        1  144.822 172.82
## <none>          147.676 173.68
## + fbs        1  147.396 175.40
## + restecg    2  146.629 176.63
## - cp         3  170.338 190.34
## - slope      2  169.119 191.12
## - thal       3  171.344 191.34
## - ca         4  174.344 192.34
## + age       39  103.911 207.91
## + trestbps  44  106.963 220.96
## + thalach   73   81.588 253.59
## + chol     129   17.707 301.71
## 
## Step:  AIC=168.07
## target ~ thal + cp + ca + slope + exang

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

##             Df Deviance    AIC
## + sex        1  137.413 167.41
## + oldpeak    1  137.625 167.62
## <none>          140.069 168.07
## + fbs        1  139.498 169.50
## + restecg    2  139.411 171.41
## - exang      1  147.676 173.68
## - cp         3  153.773 175.77
## - thal       3  159.092 181.09
## - slope      2  157.116 181.12
## - ca         4  163.431 183.43
## + age       39   97.173 203.17
## + trestbps  44   98.560 214.56
## + thalach   73   78.590 252.59
## + chol     129   17.705 303.70
## 
## Step:  AIC=167.41
## target ~ thal + cp + ca + slope + exang + sex

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

##             Df Deviance    AIC
## + oldpeak    1  135.091 167.09
## <none>          137.413 167.41
## - sex        1  140.069 168.07
## + fbs        1  136.799 168.80
## + restecg    2  136.496 170.50
## - thal       3  147.838 171.84
## - exang      1  144.822 172.82
## - cp         3  152.339 176.34
## - slope      2  156.360 182.36
## - ca         4  161.964 183.96
## + age       39   91.053 199.05
## + trestbps  44   96.107 214.11
## + thalach   73   76.558 252.56
## + chol     129    0.000 288.00
## 
## Step:  AIC=167.09
## target ~ thal + cp + ca + slope + exang + sex + oldpeak

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: algorithm did not converge

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

##             Df Deviance    AIC
## <none>          135.091 167.09
## - oldpeak    1  137.413 167.41
## - sex        1  137.625 167.62
## + fbs        1  134.566 168.57
## + restecg    2  134.646 170.65
## - exang      1  140.773 170.77
## - thal       3  144.842 170.84
## - slope      2  147.524 175.52
## - cp         3  150.659 176.66
## - ca         4  157.844 181.84
## + age       39   90.021 200.02
## + trestbps  44   93.004 213.00
## + thalach   73   62.934 240.93
## + chol     129    0.000 290.00

model$aic

## [1] 424

model_both$aic

## [1] 167.0912

5.2.2 Prediction

heart_disease_test$prob_heart<-predict(model_both, type = "response", newdata = heart_disease_test)

heart_disease_test$pred_heart <- factor(ifelse(heart_disease_test$prob_heart > 0.5, "Not Health","Health"))
heart_disease_test[1:10, c("pred_heart", "target")]

##    pred_heart     target
## 1  Not Health Not Health
## 4  Not Health Not Health
## 6      Health Not Health
## 7  Not Health Not Health
## 9  Not Health Not Health
## 10 Not Health Not Health
## 12 Not Health Not Health
## 18 Not Health Not Health
## 19 Not Health Not Health
## 24 Not Health Not Health

From the result we know that when the data test probability more than 0.5, means the patient Not Health

5.2.3 Model Performance Evaluation

library(caret)

## Warning: package 'caret' was built under R version 4.3.2

## Loading required package: lattice

conf_matrix <- confusionMatrix(heart_disease_test$pred_heart, heart_disease_test$target, positive = "Not Health")




accuracy <- conf_matrix$overall['Accuracy']
recall <- conf_matrix$byClass['Recall']
precision <- conf_matrix$byClass['Precision']
f1_score <- conf_matrix$byClass['F1']

5.2.4 Evaluation Result

cat("Accuracy:", accuracy, "\n")

## Accuracy: 0.8571429

cat("Precision:", precision, "\n")

## Precision: 0.8333333

cat("Recall:", recall, "\n")

## Recall: 0.9183673

cat("F1 Score:", f1_score, "\n")

## F1 Score: 0.8737864

5.2.5 Model Interpretation

library(dplyr)
exp(model_both$coefficients) %>%
  data.frame()

##                        .
## (Intercept) 9.653188e-01
## thal1       3.886517e+00
## thal2       8.431730e+00
## thal3       1.797427e+00
## cp1         1.676569e+00
## cp2         7.621062e+00
## cp3         6.714734e+00
## ca1         1.522204e-01
## ca2         6.192590e-02
## ca3         1.726038e-01
## ca4         5.986371e+06
## slope1      5.323100e-01
## slope2      3.316779e+00
## exangYes    2.962824e-01
## sexMale     3.977241e-01
## oldpeak     6.768212e-01

Based on the confusionMatrix result above, we observe that the model’s overall accuracy in predicting the target variable (Health and Not Health) is 86%. Furthermore, among the total actual instances where individuals are not healthy, the model correctly predicts around 89%. Additionally, among the instances predicted as positive by the model, the proportion of true positives is 85%.

5.3 K-Nearest Neighbour (KNN)

The KNN model is also constructed. In this stage, numeric predictor filtering and class proportion checking are performed. The data is also presented in summary form to understand the range of predictor variable values. Training and testing data are generated by cross-validation.

5.3.1 Pre-Processing Data

Create dummy variables from the categoric data in classification

dummy <- dummyVars("~target+sex+cp+fbs+exang+oldpeak+slope+ca+thal", data = heart_disease)
dummy <- data.frame(predict(dummy, newdata = heart_disease))
glimpse(dummy)

## Rows: 303
## Columns: 25
## $ target.Health     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ target.Not.Health <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ sex.Female        <dbl> 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1…
## $ sex.Male          <dbl> 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0…
## $ cp.0              <dbl> 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0…
## $ cp.1              <dbl> 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0…
## $ cp.2              <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0…
## $ cp.3              <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1…
## $ fbs.False         <dbl> 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1…
## $ fbs.True          <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0…
## $ exang.No          <dbl> 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1…
## $ exang.Yes         <dbl> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0…
## $ oldpeak           <dbl> 2.3, 3.5, 1.4, 0.8, 0.6, 0.4, 1.3, 0.0, 0.5, 1.6, 1.…
## $ slope.0           <dbl> 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1…
## $ slope.1           <dbl> 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0…
## $ slope.2           <dbl> 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0…
## $ ca.0              <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ ca.1              <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ ca.2              <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ ca.3              <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ ca.4              <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ thal.0            <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ thal.1            <dbl> 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ thal.2            <dbl> 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ thal.3            <dbl> 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0…

Delete the dummy variable which the previous variable has two categories

dummy$target.Health <- NULL
dummy$sex.Female <- NULL
dummy$fbs.False <- NULL
dummy$exang.No <- NULL

5.3.2 Class proportion Check

prop.table(table(dummy$target))

## 
##         0         1 
## 0.4554455 0.5445545

6. Check the value of Predictor Variable Range

summary(dummy)

##  target.Not.Health    sex.Male           cp.0             cp.1      
##  Min.   :0.0000    Min.   :0.0000   Min.   :0.0000   Min.   :0.000  
##  1st Qu.:0.0000    1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.000  
##  Median :1.0000    Median :1.0000   Median :0.0000   Median :0.000  
##  Mean   :0.5446    Mean   :0.6832   Mean   :0.4719   Mean   :0.165  
##  3rd Qu.:1.0000    3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:0.000  
##  Max.   :1.0000    Max.   :1.0000   Max.   :1.0000   Max.   :1.000  
##       cp.2             cp.3            fbs.True        exang.Yes     
##  Min.   :0.0000   Min.   :0.00000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.00000   Median :0.0000   Median :0.0000  
##  Mean   :0.2871   Mean   :0.07591   Mean   :0.1485   Mean   :0.3267  
##  3rd Qu.:1.0000   3rd Qu.:0.00000   3rd Qu.:0.0000   3rd Qu.:1.0000  
##  Max.   :1.0000   Max.   :1.00000   Max.   :1.0000   Max.   :1.0000  
##     oldpeak        slope.0           slope.1         slope.2      
##  Min.   :0.00   Min.   :0.00000   Min.   :0.000   Min.   :0.0000  
##  1st Qu.:0.00   1st Qu.:0.00000   1st Qu.:0.000   1st Qu.:0.0000  
##  Median :0.80   Median :0.00000   Median :0.000   Median :0.0000  
##  Mean   :1.04   Mean   :0.06931   Mean   :0.462   Mean   :0.4686  
##  3rd Qu.:1.60   3rd Qu.:0.00000   3rd Qu.:1.000   3rd Qu.:1.0000  
##  Max.   :6.20   Max.   :1.00000   Max.   :1.000   Max.   :1.0000  
##       ca.0             ca.1             ca.2             ca.3        
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.00000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.00000  
##  Median :1.0000   Median :0.0000   Median :0.0000   Median :0.00000  
##  Mean   :0.5776   Mean   :0.2145   Mean   :0.1254   Mean   :0.06601  
##  3rd Qu.:1.0000   3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.00000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.00000  
##       ca.4            thal.0             thal.1            thal.2      
##  Min.   :0.0000   Min.   :0.000000   Min.   :0.00000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.000000   1st Qu.:0.00000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.000000   Median :0.00000   Median :1.0000  
##  Mean   :0.0165   Mean   :0.006601   Mean   :0.05941   Mean   :0.5479  
##  3rd Qu.:0.0000   3rd Qu.:0.000000   3rd Qu.:0.00000   3rd Qu.:1.0000  
##  Max.   :1.0000   Max.   :1.000000   Max.   :1.00000   Max.   :1.0000  
##      thal.3      
##  Min.   :0.0000  
##  1st Qu.:0.0000  
##  Median :0.0000  
##  Mean   :0.3861  
##  3rd Qu.:1.0000  
##  Max.   :1.0000

6.1 Cross Validation

set.seed(300)
index_dmy <- sample(x = nrow(dummy), size = nrow(dummy) * 0.8)


heartdmy_train <- dummy[index_dmy, ]
heartdmy_test <- dummy[-index_dmy, ]

6.2 Pre-processing

heartdmy_train_label <- dummy[index_dmy,1]
heartdmy_test_label <- dummy[-index_dmy,1]

6.3 K-NN Prediction

KNN_Pred <- class::knn(train = heartdmy_train,
                       test = heartdmy_test, 
                       cl = heartdmy_train_label, 
                       k = 17)

6.4 K-NN Confusion Matrix

KNN_Pred_Coef <- confusionMatrix(as.factor(KNN_Pred), as.factor(heartdmy_test_label),"1")
KNN_Pred_Coef

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 25  1
##          1  2 33
##                                           
##                Accuracy : 0.9508          
##                  95% CI : (0.8629, 0.9897)
##     No Information Rate : 0.5574          
##     P-Value [Acc > NIR] : 6.295e-12       
##                                           
##                   Kappa : 0.8999          
##                                           
##  Mcnemar's Test P-Value : 1               
##                                           
##             Sensitivity : 0.9706          
##             Specificity : 0.9259          
##          Pos Pred Value : 0.9429          
##          Neg Pred Value : 0.9615          
##              Prevalence : 0.5574          
##          Detection Rate : 0.5410          
##    Detection Prevalence : 0.5738          
##       Balanced Accuracy : 0.9483          
##                                           
##        'Positive' Class : 1               
##

6.5 Evaluation Results

cat("Accuracy:", KNN_Pred_Coef$overall["Accuracy"], "\n")

## Accuracy: 0.9508197

cat("Precision:", KNN_Pred_Coef$byClass["Pos Pred Value"], "\n")

## Precision: 0.9428571

cat("Recall:", KNN_Pred_Coef$byClass["Sensitivity"], "\n")

## Recall: 0.9705882

cat("F1 Score:", KNN_Pred_Coef$byClass["F1"], "\n")

## F1 Score: 0.9565217

Evaluation Model ” Logistic Regression VS KNN ”

# Evaluasi Model Logistic Regression
eval_logit <- data.frame(Accuracy = conf_matrix$overall["Accuracy"],
                         Recall = conf_matrix$byClass["Sensitivity"],
                         Specificity = conf_matrix$byClass["Specificity"],
                         Precision = conf_matrix$byClass["Pos Pred Value"])

# Evaluasi Model K-NN
eval_knn <- data.frame(Accuracy = KNN_Pred_Coef$overall["Accuracy"],
                       Recall = KNN_Pred_Coef$byClass["Sensitivity"],
                       Specificity = KNN_Pred_Coef$byClass["Specificity"],
                       Precision = KNN_Pred_Coef$byClass["Pos Pred Value"])
eval_logit

##           Accuracy    Recall Specificity Precision
## Accuracy 0.8571429 0.9183673   0.7857143 0.8333333

eval_knn

##           Accuracy    Recall Specificity Precision
## Accuracy 0.9508197 0.9705882   0.9259259 0.9428571

**** Insights : Based on the Recall value, the K-NN model has a higher Recall value (0.972973) compared to the Logistic Regression model (0.9285714). This indicates that the K-NN model has better capability in predicting patients who are actually Not Health.

Therefore, from these evaluation results, it can be concluded that using the K-NN method is more recommended for predicting patients who are actually sick and not sick, due to its higher Recall value.

Conclusion

Based on the evaluation conducted, it can be concluded that the Logistic Regression model performs better in predicting passengers who actually survived as not survived. This is indicated by the higher recall value in the Logistic Regression model compared to the KNN model.

In conclusion, it is recommended to use the KNN model as the optimal model for predicting the tendency of patients with heart disease based on the performance evaluation conducted.

Thus, this report provides an overview of the classification analysis process using the heart disease dataset, model selection, performance evaluation, and the recommended model for use.