Eswar Phani Paruchuri (s3798488) | Madisetty Sujay Kamal (s3794983) | Kuntachikanahalli Srinivasa Reddy Harshitha Reddy (s3797186)
Last updated: 27 October, 2019
Heart disease is increasingly a major risk factor for heart attack and stroke, which are among the major causes of death worldwide. We therefore look at people with and without heart disease and analyse which factors are most strongly associated with it.
Our initial understanding is that people with a high maximum heart rate and high cholesterol levels are more vulnerable to heart-related diseases.
The analysis is carried out on the following parameters:
1. chol - Cholesterol level
2. thalach - Max heart rate
This public health dataset was provided by David Lapp on Kaggle; the reference link is: https://www.kaggle.com/johnsmith88/heart-disease-dataset/metadata
This open dataset dates from 1988 and was last updated in 2019. It is the second version and consists of more than 1000 observations from four databases: Cleveland, Hungary, Switzerland, and Long Beach V.
The current dataset is a subset of 5 of the 14 attributes present, used for this analysis. The “target” field indicates the presence of heart disease in the patient. It is integer-valued: 0 = no disease and 1 = disease.
#Loading the packages used below:
library(readr)
library(ggplot2)
#Reading the data from csv file:
Heart_Incidences <- read_csv("C:/Users/Dick Smith/Desktop/Intro 2 Stat/Assignment 3 Data Sets/heart.csv")
#Summarising the Data:
summary(Heart_Incidences)
##       age             sex              chol        thalach
##  Min.   :29.00   Min.   :0.0000   Min.   :126   Min.   : 71.0
##  1st Qu.:48.00   1st Qu.:0.0000   1st Qu.:211   1st Qu.:132.0
##  Median :56.00   Median :1.0000   Median :240   Median :152.0
##  Mean   :54.43   Mean   :0.6956   Mean   :246   Mean   :149.1
##  3rd Qu.:61.00   3rd Qu.:1.0000   3rd Qu.:275   3rd Qu.:166.0
##  Max.   :77.00   Max.   :1.0000   Max.   :564   Max.   :202.0
##      target
##  Min.   :0.0000
##  1st Qu.:0.0000
##  Median :1.0000
##  Mean   :0.5132
##  3rd Qu.:1.0000
##  Max.   :1.0000
## [1] 1025 5
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 1025 obs. of 5 variables:
## $ age : num 52 53 70 61 62 58 58 55 46 54 ...
## $ sex : num 1 1 1 1 0 0 1 1 1 1 ...
## $ chol : num 212 203 174 203 294 248 318 289 249 286 ...
## $ thalach: num 168 155 125 161 106 122 140 145 144 116 ...
## $ target : num 0 0 0 0 0 1 0 0 0 0 ...
## - attr(*, "spec")=
## .. cols(
## .. age = col_double(),
## .. sex = col_double(),
## .. chol = col_double(),
## .. thalach = col_double(),
## .. target = col_double()
## .. )
#Sex From Number to Factor:
Heart_Incidences$sex <- factor(Heart_Incidences$sex,levels = c(0,1),labels=c("Female","Male"))
#Target From Number to Factor:
Heart_Incidences$target<-factor(Heart_Incidences$target,levels=c(0,1),labels=c("NO","YES"))
#Checking if there are any missing values in the data set:
sum(is.na(Heart_Incidences))
## [1] 0
The “Heart Disease vulnerability with respect to Max Heart Rate” graph shows that people with a higher max heart rate are more vulnerable to heart disease.
The “Heart Disease vulnerability with respect to Cholestrol” graph suggests that cholesterol level is also associated with heart disease.
The outliers visible in the boxplots above are then removed with the code shown further below.
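As a minimal sketch of what `boxplot(..., plot = FALSE)$out` flags: by default, base R treats a value as an outlier when it lies more than 1.5 times the box length beyond the hinges. The vector `x` below is made-up illustration data, not taken from heart.csv.

```r
# Toy values (hypothetical, not from heart.csv); 564 is an extreme value.
x <- c(200, 210, 220, 230, 240, 250, 260, 270, 564)

# Values that base R's boxplot would draw as outlier points.
out <- boxplot(x, plot = FALSE)$out

# The same rule by hand: hinges from fivenum(), fences at 1.5 * box length.
h <- fivenum(x)
lower <- h[2] - 1.5 * (h[4] - h[2])
upper <- h[4] + 1.5 * (h[4] - h[2])
by_hand <- x[x < lower | x > upper]

# Keep only non-outliers with a logical mask,
# which also behaves correctly when `out` is empty.
x_clean <- x[!(x %in% out)]
print(out)
print(x_clean)
```

Filtering with a logical mask rather than negative indices is a deliberate choice here: a negative index built from an empty `which()` result would select zero rows instead of all rows.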
##
## Female Male
## 0.3043902 0.6956098
##
## NO YES
## 0.4868293 0.5131707
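The code that produced these proportions is not shown in the report. One way to obtain output of this shape, offered as an assumption rather than as the authors' actual code, is `prop.table(table(...))`. The `toy` data frame below is hypothetical.

```r
# Hypothetical toy data with the same factor coding as the report.
toy <- data.frame(
  sex    = factor(c(0, 1, 1, 1), levels = c(0, 1), labels = c("Female", "Male")),
  target = factor(c(0, 1, 1, 0), levels = c(0, 1), labels = c("NO", "YES"))
)

# Share of observations in each level; each table sums to 1.
sex_share    <- prop.table(table(toy$sex))
target_share <- prop.table(table(toy$target))
print(sex_share)
print(target_share)
```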
plot_sex_bp <- ggplot(data = Heart_Incidences, aes(x = target, y = chol)) + geom_boxplot(aes(fill = target))
plot_sex_bp + ggtitle("Cholesterol variation with Heart Disease") + xlab("Heart Disease(Yes or No)") + ylab("Cholesterol")
plot_sex_bp <- ggplot(data = Heart_Incidences, aes(x = target, y = thalach)) + geom_boxplot(aes(fill = target))
plot_sex_bp + ggtitle("Max Heart Rate variation with Heart Disease") + xlab("Heart Disease(Yes or No)") + ylab("Max Heart Rate")
#Code used to remove outliers from the boxplots:
outliers <- boxplot(Heart_Incidences$chol, plot=FALSE)$out
#Logical mask instead of -which(), which would drop every row if no outliers were found:
Heart_Incidences <- Heart_Incidences[!(Heart_Incidences$chol %in% outliers),]
plot_sex_bp <- ggplot(data = Heart_Incidences, aes(x = target, y = chol)) + geom_boxplot(aes(fill = target))
plot_sex_bp + ggtitle("Cholesterol variation with Heart Disease") + xlab("Heart Disease(Yes or No)") + ylab("Cholesterol")
outliers <- boxplot(Heart_Incidences$thalach, plot=FALSE)$out
Heart_Incidences <- Heart_Incidences[!(Heart_Incidences$thalach %in% outliers),]
plot_sex_bp <- ggplot(data = Heart_Incidences, aes(x = target, y = thalach)) + geom_boxplot(aes(fill = target))
plot_sex_bp + ggtitle("Max Heart Rate variation with Heart Disease") + xlab("Heart Disease(Yes or No)") + ylab("Max Heart Rate")
## [1] 1921.851
## [1] 2152.996
t.test(Heart_Incidences$chol~Heart_Incidences$target,
       paired = FALSE,
       var.equal = FALSE,
       alternative = "two.sided")
##
## Welch Two Sample t-test
##
## data: Heart_Incidences$chol by Heart_Incidences$target
## t = 4.1978, df = 990.04, p-value = 2.938e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 6.373539 17.563614
## sample estimates:
## mean in group NO mean in group YES
## 249.1639 237.1954
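To make the reported statistic less of a black box, the following sketch shows how Welch's t and its degrees of freedom follow from the group means, variances, and sizes. The two groups are simulated (hypothetical) data, so the numbers will not match the output above; the names `no`, `yes`, `t_hand`, and `df_hand` are introduced only for illustration.

```r
set.seed(1)  # simulated, hypothetical data; not heart.csv
no  <- rnorm(40, mean = 250, sd = 45)
yes <- rnorm(50, mean = 237, sd = 45)

# Per-group variance divided by group size.
v_no  <- var(no) / length(no)
v_yes <- var(yes) / length(yes)

# Welch t statistic: difference in means over the combined standard error.
t_hand <- (mean(no) - mean(yes)) / sqrt(v_no + v_yes)

# Welch-Satterthwaite approximation for the degrees of freedom.
df_hand <- (v_no + v_yes)^2 /
  (v_no^2 / (length(no) - 1) + v_yes^2 / (length(yes) - 1))

# Compare with R's built-in Welch test.
fit <- t.test(no, yes, paired = FALSE, var.equal = FALSE)
print(c(t_hand = t_hand, t_builtin = unname(fit$statistic)))
print(c(df_hand = df_hand, df_builtin = unname(fit$parameter)))
```

Because the statistic is computed as the first group minus the second, a positive t means the first group (here `no`) has the larger mean, which matches the direction reported in the output above.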
## [1] 371.0126
## [1] 479.885
t.test(Heart_Incidences$thalach~Heart_Incidences$target,
       paired = FALSE,
       var.equal = FALSE,
       alternative = "two.sided")
##
## Welch Two Sample t-test
##
## data: Heart_Incidences$thalach by Heart_Incidences$target
## t = -14.641, df = 969.73, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -21.65517 -16.53632
## sample estimates:
## mean in group NO mean in group YES
## 139.5000 158.5957
- External Reference on Heart Rates: https://www.health.harvard.edu/heart-health/what-your-heart-rate-is-telling-you