Objectives
Data Understanding and Preparation:
Understand the clinical context of the heart disease dataset and what
each feature represents. Review the dataset to identify and address
missing values, detect and handle outliers, and apply the
transformations needed for analysis, so that the data is clean enough
to support predictive modeling.
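The checks described above can be sketched in base R. This is a minimal illustration on a toy data frame, not the preparation actually applied to the dataset: the column names `age` and `chol`, the helper `iqr_outliers`, the 1.5 * IQR rule, and median imputation are all assumptions chosen for the example.

```r
# Toy data frame; the column names `age` and `chol` are hypothetical stand-ins
# and are not taken from the actual heart disease dataset.
toy <- data.frame(
  age  = c(52, 61, NA, 45, 58, 67),
  chol = c(212, 230, 250, 198, 564, 241)
)

# Count missing values per column.
na_counts <- colSums(is.na(toy))

# Flag outliers with the 1.5 * IQR rule (one common convention among several).
# `iqr_outliers` is a hypothetical helper defined only for this sketch.
iqr_outliers <- function(x) {
  q <- unname(quantile(x, probs = c(0.25, 0.75), na.rm = TRUE))
  fence <- 1.5 * (q[2] - q[1])
  !is.na(x) & (x < q[1] - fence | x > q[2] + fence)
}
chol_outlier <- iqr_outliers(toy$chol)

# Impute the missing age with the median (an assumption; other strategies exist).
toy$age[is.na(toy$age)] <- median(toy$age, na.rm = TRUE)
```

Whether a flagged value should be removed, capped, or kept is a clinical judgment as much as a statistical one, so the sketch only flags it.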
Descriptive Analytics:
Summarize the key characteristics of the dataset with descriptive
statistics such as the mean, median, variance, and range of each
numeric feature. The goal is to understand how the data is distributed
and to spot initial patterns that could be relevant for further
analysis.
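As a rough illustration of these summary measures, the following base R sketch computes them for a single numeric vector. The values in `bp` are made up, and `describe` is a hypothetical helper written for this example.

```r
# Hypothetical blood pressure readings (illustrative values only).
bp <- c(120, 130, 125, 140, 135, 150, 128)

# `describe` is a hypothetical helper: it returns the four measures named
# in the text for one numeric vector, ignoring missing values.
describe <- function(x) {
  c(
    mean     = mean(x, na.rm = TRUE),
    median   = median(x, na.rm = TRUE),
    variance = stats::var(x, na.rm = TRUE),
    range    = diff(range(x, na.rm = TRUE))
  )
}

bp_summary <- describe(bp)
print(bp_summary)
```

For a whole data frame `df`, `sapply(df[sapply(df, is.numeric)], describe)` would apply the same helper to every numeric column. The call is written as `stats::var` because the library output in this document reports that `pROC` masks `var`.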
Exploratory Data Analysis (EDA):
Use visual and statistical techniques to explore relationships within
the data, uncover patterns, and assess data quality. EDA will include
visualizations such as histograms, box plots, and scatter plots, which
help reveal trends, correlations, and potential issues such as
multicollinearity or skewness in the dataset.
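The two numeric checks named above, multicollinearity and skewness, can be sketched on simulated data. Everything here is an assumption made for illustration: the features `x1`, `x2`, `x3`, the helper `sample_skewness`, and the 0.9 correlation cutoff are not taken from the dataset or from the analysis that follows.

```r
set.seed(42)  # reproducible toy data
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)  # nearly a copy of x1, so the pair is collinear
x3 <- rexp(n)                  # right-skewed by construction
toy <- data.frame(x1, x2, x3)

# Pairwise Pearson correlations; |r| close to 1 hints at multicollinearity.
cor_mat <- cor(toy)

# Moment-based sample skewness: third central moment divided by the
# second central moment raised to the power 1.5.
# `sample_skewness` is a hypothetical helper defined only for this sketch.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}
skew <- sapply(toy, sample_skewness)

# Pairs of distinct features whose absolute correlation exceeds 0.9
# (the cutoff is an arbitrary illustrative threshold).
high <- which(abs(cor_mat) > 0.9 & upper.tri(cor_mat), arr.ind = TRUE)
```

The matching plots could be drawn with base `hist()`, `boxplot()`, and `plot()`, or with the `ggplot2` equivalents. The numeric checks are useful alongside them because they scale to many features at once.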
Result Analysis:
Assess the performance of the machine learning models using appropriate
metrics such as accuracy and precision. This analysis will help
determine how effectively each model predicts heart disease, identify
the strengths and weaknesses of each approach, and offer insights that
could guide further model refinement or clinical application.
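To make the two named metrics concrete, the sketch below derives them from a confusion matrix in base R. The label vectors are invented for illustration, and coding 1 as "disease" is an assumption, not a statement about how the dataset encodes its target.

```r
# Hypothetical true labels and predictions (1 = disease, 0 = no disease).
actual    <- factor(c(1, 0, 1, 1, 0, 0, 1, 0, 1, 0), levels = c(0, 1))
predicted <- factor(c(1, 0, 0, 1, 0, 1, 1, 0, 1, 0), levels = c(0, 1))

# Confusion matrix: rows are predictions, columns are true labels.
cm <- table(Predicted = predicted, Actual = actual)

tp <- cm["1", "1"]  # predicted disease, truly disease
fp <- cm["1", "0"]  # predicted disease, truly healthy
tn <- cm["0", "0"]  # predicted healthy, truly healthy

# Accuracy: share of all cases classified correctly.
accuracy <- (tp + tn) / sum(cm)

# Precision: share of predicted disease cases that are truly disease.
precision <- tp / (tp + fp)
```

Packages loaded in this document provide these metrics too. Because `yardstick` masks `precision` from `caret` in the output shown here, a namespace-qualified call such as `yardstick::precision(...)` makes the intended version explicit.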
# Load Required Libraries ------------------------------------------------
library(data.table) # For efficient data handling
## Warning: package 'data.table' was built under R version 4.4.1
library(dplyr) # For data manipulation
## Warning: package 'dplyr' was built under R version 4.4.1
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:data.table':
##
## between, first, last
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2) # For data visualization
## Warning: package 'ggplot2' was built under R version 4.4.1
library(plotly) # For interactive plots
## Warning: package 'plotly' was built under R version 4.4.1
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(caret) # For machine learning tasks
## Warning: package 'caret' was built under R version 4.4.1
## Loading required package: lattice
library(recipes) # For data preprocessing
## Warning: package 'recipes' was built under R version 4.4.1
##
## Attaching package: 'recipes'
## The following object is masked from 'package:stats':
##
## step
library(rsample) # For data splitting
## Warning: package 'rsample' was built under R version 4.4.1
library(randomForest) # For Random Forest model
## Warning: package 'randomForest' was built under R version 4.4.1
## randomForest 4.7-1.1
## Type rfNews() to see new features/changes/bug fixes.
##
## Attaching package: 'randomForest'
## The following object is masked from 'package:ggplot2':
##
## margin
## The following object is masked from 'package:dplyr':
##
## combine
library(xgboost) # For XGBoost model
## Warning: package 'xgboost' was built under R version 4.4.1
##
## Attaching package: 'xgboost'
## The following object is masked from 'package:plotly':
##
## slice
## The following object is masked from 'package:dplyr':
##
## slice
library(e1071) # For Support Vector Machine (SVM)
## Warning: package 'e1071' was built under R version 4.4.1
##
## Attaching package: 'e1071'
## The following object is masked from 'package:rsample':
##
## permutations
library(rpart) # For Decision Tree model
## Warning: package 'rpart' was built under R version 4.4.1
library(class) # For K-Nearest Neighbors (KNN)
## Warning: package 'class' was built under R version 4.4.1
library(yardstick) # For model evaluation metrics
## Warning: package 'yardstick' was built under R version 4.4.1
##
## Attaching package: 'yardstick'
## The following objects are masked from 'package:caret':
##
## precision, recall, sensitivity, specificity
library(naniar)
## Warning: package 'naniar' was built under R version 4.4.1
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.4.1
## Warning: package 'tibble' was built under R version 4.4.1
## Warning: package 'tidyr' was built under R version 4.4.1
## Warning: package 'readr' was built under R version 4.4.1
## Warning: package 'purrr' was built under R version 4.4.1
## Warning: package 'forcats' was built under R version 4.4.1
## Warning: package 'lubridate' was built under R version 4.4.1
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ lubridate 1.9.3 ✔ tibble 3.2.1
## ✔ purrr 1.0.2 ✔ tidyr 1.3.1
## ✔ readr 2.1.5
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::between() masks data.table::between()
## ✖ randomForest::combine() masks dplyr::combine()
## ✖ plotly::filter() masks dplyr::filter(), stats::filter()
## ✖ dplyr::first() masks data.table::first()
## ✖ stringr::fixed() masks recipes::fixed()
## ✖ lubridate::hour() masks data.table::hour()
## ✖ lubridate::isoweek() masks data.table::isoweek()
## ✖ dplyr::lag() masks stats::lag()
## ✖ dplyr::last() masks data.table::last()
## ✖ purrr::lift() masks caret::lift()
## ✖ randomForest::margin() masks ggplot2::margin()
## ✖ lubridate::mday() masks data.table::mday()
## ✖ lubridate::minute() masks data.table::minute()
## ✖ lubridate::month() masks data.table::month()
## ✖ lubridate::quarter() masks data.table::quarter()
## ✖ lubridate::second() masks data.table::second()
## ✖ xgboost::slice() masks plotly::slice(), dplyr::slice()
## ✖ readr::spec() masks yardstick::spec()
## ✖ purrr::transpose() masks data.table::transpose()
## ✖ lubridate::wday() masks data.table::wday()
## ✖ lubridate::week() masks data.table::week()
## ✖ lubridate::yday() masks data.table::yday()
## ✖ lubridate::year() masks data.table::year()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(pROC)
## Warning: package 'pROC' was built under R version 4.4.1
## Type 'citation("pROC")' for a citation.
##
## Attaching package: 'pROC'
##
## The following objects are masked from 'package:stats':
##
## cov, smooth, var
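The startup messages above report several masked names, for example `xgboost::slice()` masking `dplyr::slice()` and `yardstick` masking `precision` from `caret`. Which version an unqualified call reaches depends on the order in which packages were attached. The base R sketch below shows one way to check this. `owner_of` is a hypothetical helper written for this example and is not part of any package.

```r
# `owner_of` is a hypothetical helper (not from any package): it reports the
# namespace that supplies the function an unqualified call would reach.
owner_of <- function(fn_name) {
  fn <- get(fn_name, mode = "function")
  environmentName(environment(fn))
}

# In a fresh session with no extra packages attached this is "stats";
# after attaching other packages the answer may differ.
filter_owner <- owner_of("filter")
print(filter_owner)

# Base R can also list every name defined by more than one attached package.
masked <- conflicts(detail = TRUE)

# Namespace-qualified calls do not depend on attach order, for example
# dplyr::slice(df, 1:5) or yardstick::precision(...). They are shown only as
# comments because this sketch does not load those packages.
```

Qualifying the handful of masked functions actually used later is usually less fragile than relying on a particular `library()` order.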