Denis Bharatbhai Vaghasia - s3858391
Last updated: 28 May, 2021
library(dplyr)  # provides %>%, mutate(), group_by() and summarise()

# Load the patient data and recode the categorical variables as factors
heart_patient_data <- read.csv('heart_disease_patients.csv')
heart_patient_data <- heart_patient_data %>% mutate(
  sex = factor(sex, levels = c(1, 0), labels = c('Male', 'Female')),
  fbs = factor(fbs, levels = c(1, 0), labels = c('Yes', 'No')),
  exang = factor(exang, levels = c(1, 0), labels = c('Yes', 'No')),
  cp = factor(cp, levels = c(1, 2, 3, 4), labels = c(1, 2, 3, 4), ordered = TRUE))
str(heart_patient_data)
## 'data.frame': 303 obs. of 12 variables:
## $ id : int 1 2 3 4 5 6 7 8 9 10 ...
## $ age : int 63 67 67 37 41 56 62 57 63 53 ...
## $ sex : Factor w/ 2 levels "Male","Female": 1 1 1 1 2 1 2 2 1 1 ...
## $ cp : Ord.factor w/ 4 levels "1"<"2"<"3"<"4": 1 4 4 3 2 2 4 4 4 4 ...
## $ trestbps: int 145 160 120 130 130 120 140 120 130 140 ...
## $ chol : int 233 286 229 250 204 236 268 354 254 203 ...
## $ fbs : Factor w/ 2 levels "Yes","No": 1 2 2 2 2 2 2 2 2 1 ...
## $ restecg : int 2 2 2 0 2 0 2 0 2 2 ...
## $ thalach : int 150 108 129 187 172 178 160 163 147 155 ...
## $ exang : Factor w/ 2 levels "Yes","No": 2 1 1 2 2 2 2 1 2 1 ...
## $ oldpeak : num 2.3 1.5 2.6 3.5 1.4 0.8 3.6 0.6 1.4 3.1 ...
## $ slope : int 3 2 2 3 1 1 3 1 2 3 ...
## [1] 0
# Summary statistics of cholesterol by sex (Q1 and Q3 are the 25th and 75th percentiles)
heart_patient_data %>% group_by(sex) %>% summarise(
  Min = min(chol, na.rm = TRUE),
  Q1 = quantile(chol, probs = 0.25, na.rm = TRUE),
  Median = median(chol, na.rm = TRUE),
  Q3 = quantile(chol, probs = 0.75, na.rm = TRUE),
  Max = max(chol, na.rm = TRUE),
  Mean = mean(chol, na.rm = TRUE),
  SD = sd(chol, na.rm = TRUE),
  n = n(),
  Missing = sum(is.na(chol))) -> table_chol
knitr::kable(table_chol)

| sex | Min | Q1 | Median | Q3 | Max | Mean | SD | n | Missing |
|---|---|---|---|---|---|---|---|---|---|
| Male | 126 | 208.75 | 235 | 268.5 | 353 | 239.6019 | 42.64976 | 206 | 0 |
| Female | 141 | 215.00 | 254 | 302.0 | 564 | 261.7526 | 64.90089 | 97 | 0 |
heart_patient_data %>% boxplot(chol ~ sex, data = ., main = 'Box Plot of Patient Cholesterol by Sex', ylab = 'Cholesterol Level', xlab = 'Sex', col = '#1ABC9C')

# Summary statistics of resting blood pressure for all patients
heart_patient_data %>% summarise(
  Min = min(trestbps, na.rm = TRUE),
  Q1 = quantile(trestbps, probs = 0.25, na.rm = TRUE),
  Median = median(trestbps, na.rm = TRUE),
  Q3 = quantile(trestbps, probs = 0.75, na.rm = TRUE),
  Max = max(trestbps, na.rm = TRUE),
  Mean = mean(trestbps, na.rm = TRUE),
  SD = sd(trestbps, na.rm = TRUE),
  n = n(),
  Missing = sum(is.na(trestbps))) -> table_trestbps
knitr::kable(table_trestbps)

| Min | Q1 | Median | Q3 | Max | Mean | SD | n | Missing |
|---|---|---|---|---|---|---|---|---|
| 94 | 120 | 130 | 140 | 200 | 131.6898 | 17.59975 | 303 | 0 |
heart_patient_data %>% plot(trestbps ~ age, data = ., xlab = 'Patient Age', ylab = 'Resting Blood Pressure (mm Hg)')

# Summary statistics of maximum heart rate achieved for all patients
heart_patient_data %>% summarise(
  Min = min(thalach, na.rm = TRUE),
  Q1 = quantile(thalach, probs = 0.25, na.rm = TRUE),
  Median = median(thalach, na.rm = TRUE),
  Q3 = quantile(thalach, probs = 0.75, na.rm = TRUE),
  Max = max(thalach, na.rm = TRUE),
  Mean = mean(thalach, na.rm = TRUE),
  SD = sd(thalach, na.rm = TRUE),
  n = n(),
  Missing = sum(is.na(thalach))) -> table_thalach
knitr::kable(table_thalach)

| Min | Q1 | Median | Q3 | Max | Mean | SD | n | Missing |
|---|---|---|---|---|---|---|---|---|
| 71 | 133.5 | 153 | 166 | 202 | 149.6073 | 22.875 | 303 | 0 |
heart_patient_data$thalach %>% hist(col = '#F39C12', xlim=c(50,250), xlab="Maximum heart rate achieved", main= "Histogram of Heart Rate achieved")
heart_patient_data$thalach %>% mean() %>% abline(v = ., col = "#1C2833", lwd = 2, lty = 5)

- We examined three attributes: cholesterol level compared by sex, the relationship between resting blood pressure and age, and the distribution of patients' maximum heart rate.
- Female patients show higher cholesterol levels than male patients, and patients older than 40 tend to have higher resting blood pressure. This suggests that people over 40 have a higher chance of heart disease and that high cholesterol is more of a concern for females. A maximum heart rate between 120 and 200 is a further warning sign that the person may have the disease.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 126.0 211.0 241.0 246.7 275.0 564.0
## [1] 153 49
\[H_0: \mu = 240 \qquad H_A: \mu \neq 240\]

- We use the confidence interval approach. Its advantage is that the interval from a one-sample t-test also serves as a two-tailed hypothesis test of H0.
- We calculate the 95% CI around the sample mean of 246.7. Because the population standard deviation is unknown, we estimate the standard error with s/sqrt(n) and use the t distribution. The resulting 95% CI is shown in the output below.
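For reference, the interval described here takes the standard t-based form. The symbols \(\bar{x}\) (sample mean), \(s\) (sample standard deviation) and \(t^{*}\) (critical value) are notation introduced for this sketch and do not appear in the original code:

\[
\bar{x} \pm t^{*}_{n-1}\,\frac{s}{\sqrt{n}}, \qquad n = 303,\quad \bar{x} \approx 246.7
\]

Here \(t^{*}_{n-1}\) is the 0.975 quantile of the t distribution with \(n - 1 = 302\) degrees of freedom, which is about 1.97.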
## [1] 240.8397 252.5465
## attr(,"conf.level")
## [1] 0.95
##
## One Sample t-test
##
## data: heart_patient_data$chol
## t = 2.2501, df = 302, p-value = 0.02516
## alternative hypothesis: true mean is not equal to 240
## 95 percent confidence interval:
## 240.8397 252.5465
## sample estimates:
## mean of x
## 246.6931
Under the confidence interval rule, we reject H0 when the 95% CI does not capture the hypothesised value.
Here the hypothesised mean is 240, which lies outside the 95% CI [240.8397, 252.5465]. We therefore reject H0 at the 0.05 level: the result is statistically significant (t = 2.2501, df = 302, p-value = 0.02516).
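The decision rule above can be sketched in R as follows. This is a minimal illustration, not the code that produced the report's output: `ci_mean` is a hypothetical helper, and `chol_demo` is synthetic stand-in data rather than the patient records.

```r
# Hypothetical helper: t-based confidence interval for a mean.
ci_mean <- function(x, conf_level = 0.95) {
  x <- x[!is.na(x)]                       # drop missing values
  n <- length(x)
  se <- sd(x) / sqrt(n)                   # estimated standard error
  t_crit <- qt(1 - (1 - conf_level) / 2, df = n - 1)
  mean(x) + c(-1, 1) * t_crit * se        # lower and upper limits
}

# Synthetic stand-in values (assumption), NOT the patient data.
set.seed(1)
chol_demo <- rnorm(50, mean = 245, sd = 50)

ci_manual <- ci_mean(chol_demo)
test_result <- t.test(chol_demo, mu = 240,
                      alternative = "two.sided", conf.level = 0.95)

# The manual interval should agree with the one reported by t.test().
print(all.equal(ci_manual, as.numeric(test_result$conf.int)))

# Decision rule: reject H0 when 240 lies outside the interval.
reject_h0 <- 240 < ci_manual[1] || 240 > ci_manual[2]
print(reject_h0)
```

With the real data, the same call would presumably be `t.test(heart_patient_data$chol, mu = 240)`, which matches the `data:` line and the two-sided alternative shown in the output above.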
To conclude, we analysed several indicators of heart disease. Mean cholesterol is fairly close for female and male patients (261.75 versus 239.60), but the higher female values indicate that female patients carry a greater cholesterol-related risk than male patients.
Resting blood pressure also plays an important role in indicating heart disease. A normal blood pressure lies between 90 and 120 mm Hg, so a patient above this range is at risk. Patients above age 45 mostly have higher blood pressure and therefore more risk.
As blood pressure problems arise, heart rate also changes. The histogram shows that the average maximum heart rate of the patients is approximately 150. More than half of the patients have a rate between 130 and 170, which is a serious concern.
We then tested our assumption about cholesterol with a one-sample t-test at the 0.05 level of significance. The mean cholesterol level was statistically significantly higher than the hypothesised value of 240.
In summary, a person should keep his or her heart rate below 120, and an average person's blood pressure should stay between 90 and 120 mm Hg. Patients above age 40 in particular should monitor their blood pressure.