Mortality Outcomes for Myocardial Infarction

Dataset: “monica” Found at: http://vincentarelbundock.github.io/Rdatasets/doc/DAAG/monica.html

Question - How Are Heart Attacks affected by Factors such as: Age, Gender, and Smoking?

We will investigate how both the event of myocardial infarction (commonly known as a ‘heart attack’) and the resultant death are affected by age, gender and smoking.

Section 1 - Data Exploration

First, let’s take a look at some high-level summary statistics of the monica dataset:

library(DAAG)  # the DAAG package provides the monica dataset
summary(monica)
##  outcome     sex           age           yronset      premi     smstat   
##  live:3525   m:4605   Min.   :35.00   Min.   :85.00   y :1511   c :2051  
##  dead:2842   f:1762   1st Qu.:55.00   1st Qu.:87.00   n :4122   x :1938  
##                       Median :61.00   Median :89.00   nk: 734   n :1460  
##                       Mean   :59.42   Mean   :88.75             nk: 918  
##                       3rd Qu.:66.00   3rd Qu.:91.00                      
##                       Max.   :69.00   Max.   :93.00                      
##  diabetes  highbp    hichol    angina    stroke    hosp    
##  y : 818   y :2877   y :1840   y :1919   y : 560   y:4442  
##  n :4664   n :2542   n :3294   n :3473   n :4881   n:1925  
##  nk: 885   nk: 948   nk:1233   nk: 975   nk: 926           
##                                                            
##                                                            
## 

Some of these column names are a bit unintuitive, so let’s rename them:

colnames(monica)
##  [1] "outcome"  "sex"      "age"      "yronset"  "premi"    "smstat"  
##  [7] "diabetes" "highbp"   "hichol"   "angina"   "stroke"   "hosp"
newnames <- c("outcome", "gender", "age", "year_of_event", "previous_event", "smoker", "diabetes", "high_blood_pressure", "high_cholesterol", "angina", "stroke", "hospitalized")

monica_clean <- monica

colnames(monica_clean) <- newnames
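Renaming by position works here because newnames lists all twelve columns in the original order, but it would silently mislabel data if the column order ever changed. A minimal sketch of a name-based alternative in base R follows; the toy data frame and the lookup vector are hypothetical stand-ins, not part of the original analysis.

```r
# Toy data frame standing in for monica (hypothetical values, real column names)
toy <- data.frame(outcome = c("live", "dead"),
                  sex     = c("m", "f"),
                  age     = c(61, 58))

# Hypothetical lookup table: old name -> new name
lookup <- c(sex = "gender", yronset = "year_of_event")

# Rename only the columns that appear in both the data and the lookup
hits <- names(toy) %in% names(lookup)
names(toy)[hits] <- unname(lookup[names(toy)[hits]])

names(toy)
```

Columns missing from the lookup keep their original names, and lookup entries with no matching column are ignored.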

Grouping by Gender

Now, let’s take another look at our data, this time grouping by some of the variables we are trying to investigate. Let’s start small with a grouping by gender only:

library(dplyr)

monica_groups <- group_by(monica_clean, gender)

# Use `=` (not `<-`) inside summarize() so the output columns get clean names
summarize(monica_groups, mean_age = mean(age), median_age = median(age), count = n())
## # A tibble: 2 x 4
##   gender mean_age median_age count
##   <fct>     <dbl>      <dbl> <int>
## 1 m          58.8         61  4605
## 2 f          61.0         63  1762

Mean and median ages show that men have heart attacks at lower ages than women.

We can also see from the “count” column that men are vastly more likely to have a heart attack than women, to the tune of 4605/1762, or 2.6 times as likely.

Findings

#1 Men have heart attacks on average 2.2 years younger than women.

#2 Men are 2.6 times more likely to have a heart attack than women.
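The two figures behind these findings can be reproduced directly from the grouped table. This is a small sketch with the counts and mean ages typed in by hand from the output above; the variable names are illustrative only.

```r
# Event counts by gender, copied from the grouped summary table
events <- c(m = 4605, f = 1762)

# Mean ages by gender, copied from the same table
mean_age <- c(m = 58.8, f = 61.0)

# How many male events there are per female event
male_to_female_ratio <- events[["m"]] / events[["f"]]

# How much older women are, on average, at the time of the event
age_gap_years <- mean_age[["f"]] - mean_age[["m"]]

round(male_to_female_ratio, 1)
round(age_gap_years, 1)
```

Note that the ratio compares event counts within the dataset. Turning it into a per-person risk would need the number of men and women in the underlying population, which the columns shown here do not include.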

Grouping by ‘Outcome’

Now let’s take a look at “Outcome”, i.e. whether the heart attack resulted in life or death.

monica_groups <- group_by(monica_clean, gender, outcome)

# Use `=` (not `<-`) inside summarize() so the output columns get clean names
summarize(monica_groups, mean_age = mean(age), median_age = median(age), count = n())
## # A tibble: 4 x 5
## # Groups:   gender [?]
##   gender outcome mean_age median_age count
##   <fct>  <fct>      <dbl>      <dbl> <int>
## 1 m      live        57.5         59  2550
## 2 m      dead        60.5         62  2055
## 3 f      live        60.4         62   975
## 4 f      dead        61.7         64   787

Are women more or less susceptible to death than men? We can peek at the counts in the table above, or we can use the following R code to calculate the proportions:

male_dead_proportion <- nrow(subset(monica_clean, gender == 'm' & outcome == 'dead')) /  nrow(subset(monica_clean, gender == 'm'))

male_dead_proportion
## [1] 0.4462541
female_dead_proportion <- nrow(subset(monica_clean, gender == 'f' & outcome == 'dead')) /  nrow(subset(monica_clean, gender == 'f'))

female_dead_proportion
## [1] 0.4466515

The death rates are remarkably similar, at 44.63 percent for men and 44.67 percent for women.

Findings

#3 While women are less likely to have a heart attack, if they do in fact have a heart attack, they have essentially the same risk of dying as men.
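One way to check that the two death rates really are indistinguishable is a two-sample test of proportions. The sketch below uses base R's prop.test with the counts typed in from the grouped table; it is an optional extra check, not part of the original analysis.

```r
# Deaths and total events by gender, copied from the grouped table
deaths <- c(m = 2055, f = 787)
events <- c(m = 4605, f = 1762)

# Two-sample test for equality of proportions (stats package, loaded by default)
result <- prop.test(x = deaths, n = events)

result$estimate  # the two observed death proportions
result$p.value   # a large p-value means no evidence of a difference
```

With proportions this close, the test should give no evidence of a difference in death rate between the genders, which agrees with the finding above.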

Grouping by Smoking Status

How does smoking status affect the occurrence and outcome of heart attacks?

First, let’s look at the overall proportions of the dataset. What proportion of the entire dataset are either ex-smokers or current smokers?

nrow(subset(monica_clean, smoker == 'c' | smoker == 'x'))/nrow(monica_clean)
## [1] 0.6265117

So nearly 63% of all heart attack sufferers were smokers or ex-smokers.
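For the full breakdown rather than just the combined figure, the per-status counts from the summary output can be turned into shares. A minimal sketch, with the counts typed in by hand:

```r
# Counts per smoking status, copied from the summary() output
# c = current, x = ex-smoker, n = non-smoker, nk = not known
smoker_counts <- c(c = 2051, x = 1938, n = 1460, nk = 918)

# Share of all recorded events in each group
shares <- prop.table(smoker_counts)
round(shares, 3)

# Combined share of current and ex-smokers
sum(shares[c("c", "x")])
```

The "nk" group is sizeable, so the combined share is better read as a lower bound on the share of events among people who have ever smoked.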

monica_groups <- group_by(monica_clean, smoker, outcome)

# Use `=` (not `<-`) inside summarize() so the output columns get clean names
summarize(monica_groups, mean_age = mean(age), median_age = median(age), count = n())
## # A tibble: 8 x 5
## # Groups:   smoker [?]
##   smoker outcome mean_age median_age count
##   <fct>  <fct>      <dbl>      <dbl> <int>
## 1 c      live        55.3         57  1337
## 2 c      dead        59.4         61   714
## 3 x      live        60.0         62  1186
## 4 x      dead        61.8         64   752
## 5 n      live        60.2         62   929
## 6 n      dead        62.1         64   531
## 7 nk     live        61.6         63    73
## 8 nk     dead        60.2         63   845

Taking a closer look at the counts across these groups, we can see whether smoking status leads to a greater chance of dying given a heart attack event.

Manually checking the proportions (deaths divided by all events in each group, live plus dead):

#Current Smokers
714/(1337 + 714)
## [1] 0.3481229
#Ex-Smokers
752/(1186 + 752)
## [1] 0.3880289
#Non-Smokers
531/(929 + 531)
## [1] 0.3636986

The data is certainly not intuitive. Current smokers are actually the least likely to die if they have a heart attack, compared with ex-smokers and non-smokers.
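A death rate within each smoking group is the number of deaths divided by all events in that group (live plus dead). A minimal sketch that tabulates this for all four groups at once, with the counts typed in from the grouped table above; the object names are illustrative.

```r
# Live/dead counts per smoking status, copied from the grouped table
# Rows: c = current, x = ex-smoker, n = non-smoker, nk = not known
counts <- matrix(
  c(1337, 714,
    1186, 752,
     929, 531,
      73, 845),
  ncol = 2, byrow = TRUE,
  dimnames = list(smoker = c("c", "x", "n", "nk"),
                  outcome = c("live", "dead"))
)

# Deaths as a share of all events in each group
death_rate <- counts[, "dead"] / rowSums(counts)
round(death_rate, 3)

# Group with the lowest death rate
names(which.min(death_rate))
```

Dividing by the row total matters: dividing deaths by survivors instead gives the odds of dying, which is a different quantity from the proportion.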

We can maybe follow the same logic as the gender analysis above, i.e. despite being more likely to have a heart attack, the chance of dying once having a heart attack isn’t largely affected by smoking status or gender. Of course, this is just a hypothesis.

Section 2 - Let’s Get Visual

Let’s look at some of the original analyses above, and some additional analyses, but this time from a visual perspective.

Gender

This simple bar chart shows us the skew of gender in the data:

library(ggplot2)

# Refer to the column by bare name inside aes(); ggplot looks it up in the data argument
ggplot(monica_clean, aes(x = gender)) + geom_bar() + labs(x = "Gender", y = "# of Heart Attacks")

Smoking

The same type of chart can show the skew across smoking status:

ggplot(monica_clean, aes(x = smoker)) + geom_bar() + labs(x = "Smoking Status (Current, Ex, Non, or Not Known)", y = "# of Heart Attacks")

Age

How are heart attacks distributed across ages? Let’s use a scatterplot to take a look:

monica_groups <- group_by(monica_clean, age)
ggplot(summarise(monica_groups, n = n()), aes(x = age, y = n)) + geom_point() + labs(x = "Age", y = "Count of Heart Attacks", title = "Heart Attacks by Age - Scatterplot")

So we can see that the number of heart attacks increases as age increases. This is pretty intuitive. Let’s look at how this data compares across genders, using a histogram:

ggplot(monica_clean, aes(age, fill = gender)) + geom_histogram(alpha = 0.5, position = 'identity', binwidth = 1) + labs(x = "Age", y = "Count of Heart Attacks", title = "Heart Attacks by Age - Male and Female - Histogram by Count")

We saw the skew in magnitude in the earlier data (i.e. more men have heart attacks than women). But how would this look if we normalized the data? Switching the histogram to relative density shows each age’s share of the events within its own gender, so we can see whether the age distributions differ across genders:

# after_stat(density) replaces the older ..density.. notation (ggplot2 3.3.0 and later); the y axis is now a density, not a count
ggplot(monica_clean, aes(age, fill = gender)) + geom_histogram(aes(y = after_stat(density)), alpha = 0.5, position = 'identity', binwidth = 1) + labs(x = "Age", y = "Density", title = "Heart Attacks by Age - Male and Female - Histogram by Density")

As we can see, women tend to have a higher proportion of their heart attacks later in life, while men have a higher proportion earlier in life.
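To make the normalization concrete: a density histogram divides each bin's count by the group's total count times the bin width, so the bars of each group sum to 1 regardless of group size. The sketch below checks this by hand in base R; the two age vectors are made-up toy data, not values from monica.

```r
# Hypothetical ages for two groups of different size
ages_a <- c(50, 51, 51, 52, 52, 52, 53, 53)  # 8 events
ages_b <- c(51, 52, 52, 53)                  # 4 events

binwidth <- 1
breaks <- seq(49.5, 53.5, by = binwidth)     # one-year bins centred on each age

# Density per bin: count / (total count * bin width)
density_by_hand <- function(x) {
  counts <- hist(x, breaks = breaks, plot = FALSE)$counts
  counts / (sum(counts) * binwidth)
}

density_by_hand(ages_a)
density_by_hand(ages_b)

# Each group's bars cover a total area of 1, so shapes are directly comparable
sum(density_by_hand(ages_a) * binwidth)
sum(density_by_hand(ages_b) * binwidth)
```

Because both groups are rescaled to the same total area, the smaller group is no longer dwarfed by the larger one and only the shape of the age distribution is compared.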

While we are at it, let’s look at the same type of histogram, only now analyzing the different types of smoker:

# after_stat(density) replaces the older ..density.. notation (ggplot2 3.3.0 and later); the y axis is now a density, not a count
ggplot(monica_clean, aes(age, fill = smoker)) + geom_histogram(aes(y = after_stat(density)), alpha = 0.4, position = 'identity', binwidth = 1) + labs(x = "Age", y = "Density", title = "Heart Attacks by Age - Smoking Status - Density Histogram")

While three of the four smoker groups show the same type of pattern, “c”, i.e. current smokers, tend to reach their peak heart attack proportion around age 60, while the others continue to rise and peak in their late 60s.

Finally, we can take a look at the age variable through its quartiles. We saw this earlier with a text summary, but a box-plot can present it visually.

ggplot(monica_clean, aes(y = age, x = gender)) + geom_boxplot() + labs(title = "Age by Gender - Boxplot")

This confirms the slightly higher age of women that we saw in the text summaries above.

While we are at it, let’s look at boxplots across smoker status.

ggplot(monica_clean, aes(y = age, x = smoker)) + geom_boxplot() + labs(title = "Age by Smoker Status - Boxplot", x = "Smoker Status (Current, Ex, Non, or Not Known)")

Here we see a rather strong difference in age across smoker statuses: current smokers have a median age below 60, while the other groups are above 60. We can add that to our findings.

Findings

#4 Current smokers suffer from heart attacks at an earlier age than ex-smokers or non-smokers.

Conclusion(s)

Our initial question for this analysis was “How Are Heart Attacks affected by Factors such as: Age, Gender, and Smoking?”. During the course of the analysis we made the following findings:

#1 Men have heart attacks on average 2.2 years younger than women.

#2 Men are 2.6 times more likely to have a heart attack than women.

#3 While women are less likely to have a heart attack, if they do in fact have a heart attack, they have essentially the same risk of dying as men.

#4 Current smokers suffer from heart attacks at an earlier age than ex-smokers or non-smokers.