library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
HA <- read.csv("/Users/rupeshswarnakar/Desktop/heart_attack_prediction_dataset.csv")
The Heart Attack Prediction dataset contains many valuable columns. One of the most significant is Stress Level. This column is important because stress affects the brain and, consequently, the heart health of patients. Recent research points to mental stress as one of the contributing factors in patient mortality from heart failure and many other non-communicable diseases. Stress has also been identified as a factor that affects diabetes, and excessive release of the stress hormone (cortisol) is thought to harm otherwise healthy organs.
Let’s assume our response variable from the dataset is Stress level. To observe whether stress differs between male and female patients, we consider gender (the Sex column) as an explanatory variable.
Null hypothesis (H0): The mean Stress level is the same for male and female patients.
Alternative hypothesis (H1): The mean Stress level differs between male and female patients.
Let’s test the above hypotheses using a one-way ANOVA test as follows.
m <- aov(Stress.Level ~ Sex, data = HA)
summary(m)
## Df Sum Sq Mean Sq F value Pr(>F)
## Sex 1 34 34.16 4.179 0.041 *
## Residuals 8761 71617 8.17
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From the above results, we can see that the p-value of 0.041 is less than 0.05. This means we can reject the null hypothesis at the 5% significance level. In other words, there is enough evidence to support the alternative hypothesis that the mean Stress level differs between male and female patients. Note that the test only indicates that a difference exists. It does not show how large the difference is, and because the dataset records a Stress level score, it does not show how cortisol acts on the body.
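Statistical significance with 8,763 observations can coexist with a very small effect. As a rough check, the sketch below computes eta squared (the share of total variation associated with Sex) directly from the numbers printed in the ANOVA table, and rebuilds the F statistic and p-value for consistency. The variable names are illustrative, and the sum of squares for Sex is taken from the Mean Sq column, which equals Sum Sq when Df is 1.

```r
# Values copied from the ANOVA table above
ss_sex   <- 34.16   # sum of squares for Sex (Mean Sq, since Df = 1)
ss_resid <- 71617   # residual sum of squares
df_sex   <- 1
df_resid <- 8761

# Eta squared: share of total variation in Stress.Level associated with Sex
eta_sq <- ss_sex / (ss_sex + ss_resid)

# Rebuild the F statistic and its p-value from the table as a consistency check
f_value <- (ss_sex / df_sex) / (ss_resid / df_resid)
p_value <- pf(f_value, df_sex, df_resid, lower.tail = FALSE)

round(c(eta_sq = eta_sq, F = f_value, p = p_value), 5)
```

Eta squared comes out at roughly 0.0005, so Sex is associated with about 0.05% of the variation in Stress level, even though the test is significant.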
The result from the ANOVA test seems plausible, because men and women are often reported to experience and cope with stress differently. However, the ANOVA table alone does not show which group has the higher mean Stress level, and it does not identify a cause. Claims about differences between men and women in coping mechanisms, anxiety, or depression cannot be drawn from this test and would need supporting evidence beyond this dataset.
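To see which group has the higher mean, the group means need to be inspected alongside the test. The sketch below uses simulated data, not the real dataset: the data frame `sim`, the labels "Female" and "Male", and the 1 to 10 stress scale are all assumptions made for illustration. It also shows that, with only two groups, a one-way ANOVA gives the same result as an equal-variance two-sample t-test (F equals t squared).

```r
set.seed(1)  # reproducible simulated data, NOT the real dataset

# Hypothetical stand-in for HA with the same two column names
sim <- data.frame(
  Sex = rep(c("Female", "Male"), times = c(300, 700)),
  Stress.Level = sample(1:10, size = 1000, replace = TRUE)
)

# Group means show the direction of any difference
group_means <- tapply(sim$Stress.Level, sim$Sex, mean)
print(group_means)

# With two groups, one-way ANOVA matches the equal-variance t-test (F = t^2)
anova_f <- summary(aov(Stress.Level ~ Sex, data = sim))[[1]][["F value"]][1]
t_stat  <- t.test(Stress.Level ~ Sex, data = sim, var.equal = TRUE)$statistic
all.equal(anova_f, unname(t_stat)^2)
```

Running the same `tapply()` call on `HA` would show the direction of the difference in the real data.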
Let’s consider Cholesterol as the continuous explanatory column and build a linear regression model of Stress level on Cholesterol.
First, let’s create a scatter plot to show the relationship between Stress level and Cholesterol as follows.
HA |>
  ggplot(mapping = aes(x = Cholesterol, y = Stress.Level)) +
  geom_point(size = 2) +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "Effect of Cholesterol on Stress level",
       x = "Cholesterol level",
       y = "Stress level") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
From the above plot, we can see a very weak negative relationship between stress and cholesterol: the fitted line slopes slightly downward, so Stress level tends to decrease slightly as Cholesterol increases. The plot alone does not quantify this relationship, so it calls for further investigation. Furthermore, Stress is not solely influenced by cholesterol, which is consistent with the relationship being weak.
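One way to put a number on the strength of a relationship like this is the Pearson correlation coefficient. The sketch below uses simulated data, not the real dataset: the data frame `sim`, the cholesterol range of 120 to 400, and the 1 to 10 stress scale are assumptions for illustration. It also checks that, in a simple linear regression, R-squared equals the squared correlation.

```r
set.seed(2)  # simulated data for illustration only, NOT the real dataset

# Hypothetical stand-in: cholesterol values and an unrelated 1-10 stress score
sim <- data.frame(
  Cholesterol = runif(500, min = 120, max = 400),
  Stress.Level = sample(1:10, size = 500, replace = TRUE)
)

# Pearson correlation, with a confidence interval and a test of r = 0
ct <- cor.test(sim$Cholesterol, sim$Stress.Level)
r  <- unname(ct$estimate)

# In simple linear regression, R-squared equals the squared correlation
r_squared <- summary(lm(Stress.Level ~ Cholesterol, data = sim))$r.squared

c(r = r, r_squared = r_squared)
```

Applied to `HA`, `cor.test(HA$Cholesterol, HA$Stress.Level)` would report the correlation and its confidence interval for the real data.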
model <- lm(Stress.Level ~ Cholesterol, data = HA)
summary(model)
##
## Call:
## lm(formula = Stress.Level ~ Cholesterol, data = HA)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.5908 -2.4765 -0.3579 2.5148 4.6516
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.6947439 0.1027978 55.398 <2e-16 ***
## Cholesterol -0.0008660 0.0003777 -2.293 0.0219 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.859 on 8761 degrees of freedom
## Multiple R-squared: 0.0005996, Adjusted R-squared: 0.0004855
## F-statistic: 5.256 on 1 and 8761 DF, p-value: 0.02189
From the above results, we can read off several values that are useful for the conclusion, as presented below.
The intercept of 5.6947439 indicates that the predicted stress level is about 5.69 units when cholesterol is zero. Practically, this value is not useful on its own, since a living human body always has some level of cholesterol. A cholesterol level of zero is far from any realistic value.
The slope of -0.0008660 indicates that for each one-unit increase in cholesterol, the predicted stress level decreases by 0.000866 units. This shows a negative, but very small, relationship between stress level and cholesterol.
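The p-value of 0.0219 is below 0.05, so the slope is statistically significantly different from zero at the 5% level. The p-value does not measure how strong the relationship is. It only indicates that a slope this far from zero would be unlikely if there were no linear relationship.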
The Multiple R-squared of 0.0005996 indicates that cholesterol explains only about 0.06% of the variation in stress level. This is a very small share, so almost all of the variation in stress is left unexplained by the model.
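The F-statistic of 5.256 (p-value 0.02189) tests whether the model explains more variation than a model with no predictor. With a single predictor, it is equivalent to the t-test on the slope, so it confirms that the slope is statistically significant. It does not show that cholesterol makes a large contribution to the stress level.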
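The quantities above can be tied together with a little arithmetic on the printed output. The sketch below is a rough check, not part of the original analysis: it uses only the numbers shown in the regression summary, and the variable names are illustrative. It computes an approximate 95% confidence interval for the slope, the expected change in stress for a 100-unit rise in cholesterol, the correlation implied by R-squared, and the F statistic implied by the t value.

```r
# Values copied from the regression output above
slope    <- -0.0008660
se_slope <-  0.0003777
t_value  <- -2.293
r_sq     <-  0.0005996
df_resid <-  8761

# Approximate 95% confidence interval for the slope
t_crit   <- qt(0.975, df_resid)
ci_slope <- slope + c(-1, 1) * t_crit * se_slope

# Expected change in stress for a 100-unit rise in cholesterol
change_per_100 <- 100 * slope

# Correlation implied by R-squared (sign taken from the slope)
r <- sign(slope) * sqrt(r_sq)

# With one predictor, the F statistic is the squared t statistic
f_from_t <- t_value^2

list(ci_slope = ci_slope, change_per_100 = change_per_100,
     r = r, f_from_t = f_from_t)
```

The implied correlation is about -0.024, and a 100-unit rise in cholesterol corresponds to a predicted drop of less than 0.1 stress units. Both point to a significant but very weak association.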
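The linear regression model suggests a statistically significant but very weak negative relationship between Stress level and Cholesterol level. Specifically, higher cholesterol is associated with a slightly lower Stress level. The model describes an association and does not show that raising cholesterol lowers stress.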
Generally, the above conclusion might sound counter-intuitive. One possible explanation, which this model cannot confirm, lies in the composition of cholesterol. Total cholesterol includes HDL (good cholesterol) and LDL (bad cholesterol). If higher HDL is associated with slightly lower stress while higher LDL is associated with higher stress and a greater risk of heart failure, the two effects could partly offset each other in a total Cholesterol measure. Testing this idea would require separate HDL and LDL measurements.
Hence, the above linear regression model opens the door to investigating cholesterol at a more detailed level. If such an analysis supported the HDL explanation, it could inform a recommendation to the general population that eating appropriate amounts of healthy fats such as avocado, nuts, healthy oils, and seeds can help increase HDL levels. The current model, with an R-squared of about 0.06%, is not strong enough on its own to support dietary recommendations for reducing stress or the risk of heart attacks.