1 What is stress echocardiography?

A stress echocardiography, also known as an echocardiography stress test or stress echo, is a procedure that determines how well your heart and blood vessels are working.

Your doctor may ask you to take a stress echo test if you have chest pain that they think is due to coronary artery disease or a myocardial infarction, which is a heart attack. This test also provides information on how much exercise you can safely tolerate if you’re in cardiac rehabilitation. The test also helps doctor to determine how well the treatments such as bypass grafting, angioplasty, and anti-anginal or antiarrhythmic medications are working.

During a stress echocardiography, you’ll exercise on a treadmill or stationary bike while your doctor monitors your blood pressure and heart rhythm. When your heart rate reaches peak levels, your doctor will take ultrasound images of your heart to determine whether your heart muscles are getting enough blood and oxygen while you exercise.

I have chosen stress echo data originally published by UCLA, Department of Physiology.

2 General Explanation of the Study

This data is from a study that was trying to determine if a drug called “dobutamine” could be used effectively in a test for measuring a patient’s risk of having a heart attack, or “cardiac event.” For younger patients, a typical test of this risk is called “Stress Echocardiography.” It involves raising the patient’s heart rate by exercise–often by having the patient run on a treadmill–and then taking various measurements, such as heart rate and blood pressure, as well as more complicated measurements of the heart. The problem with this test is that it often cannot be used on older patients whose bodies can’t take the stress of hard exercise. The key to assessing risk, however, is putting stress on the heart before taking the relevant measurements. While exercise can’t be used to create this stress for older patients, the drug dobutamine can. This study, then, was partly an attempt to see if the stress echocardiography test was still effective in predicting cardiac events when the stress on the heart was produced by dobutamine instead of exercise. More specifically, though, the study sought to pinpoint which measurements taken during the stress echocardiography test were most helpful in predicting whether or not a patient suffered a cardiac event over the next year.

3 Data Preparation

# load data
library(tidyverse)
library(psych)
library(knitr)
library(ggplot2)

stressEcho.data <- read.csv('https://raw.githubusercontent.com/niteen11/MSDS/master/DATA606/ProjectProposal/dataset/stressEcho.csv')

#Research Question 1
rq1.heartRate <- select(stressEcho.data,age,maxhr,newMI,newPTCA,newCABG,death)

#Research Question 2
rq2.stressEcho <- select(stressEcho.data,X,posSE,newMI,newPTCA,newCABG,death)
rq2.stressEcho <- rename(rq2.stressEcho,id=X)
#per data file: posse   stress echocardiogram was positive (1  = yes)
stress.positive <- filter(rq2.stressEcho,posSE==1)
stress.negative <- filter(rq2.stressEcho,posSE==0)

#Research Question 3
rq3.restWMA<- select(stressEcho.data,X,restwma,newMI,newPTCA,newCABG,death)
rq3.restWMA <- rename(rq3.restWMA,id=X)
#per data file: restwma cardiologist sees wall motion anamoly on echocardiogram (1 = yes)
restWMA.positive <- filter(rq3.restWMA,restwma==1)
restWMA.negative <- filter(rq3.restWMA,restwma==0)

#Research Question 4
rq4.age.ecg<- select(stressEcho.data,age,maxhr,basebp,ecg,newMI,newPTCA,newCABG,death)
# there are 3 categories of ecw diagnosis: equivocal, MI and Normal
ecg.equivocal <- filter(rq4.age.ecg,ecg=='equivocal')
ecg.MI <- filter(rq4.age.ecg,ecg=='MI')
ecg.normal <- filter(rq4.age.ecg,ecg=='normal')

4 Research question

You should phrase your research question in a way that matches up with the scope of inference your dataset allows for.

  • Does a positive ‘Stress Echocardiography’ cause higher ‘Cardiac Events’ such as MI, PTCA, CABG or cardiac death?
  • Does Resting wall motion abnormality on echocardiogram a better prdeictor/ estimator for future cardiac events?
  • Do Age, blood pressure, higher heart rate and ecg (Baseline electrocardiogram diagnosis) have any influence in cardiac events outcome?

5 Cases

What are the cases, and how many are there?

dim(stressEcho.data)
## [1] 558  32
summary(stressEcho.data$gender)
## female   male 
##    338    220

6 Data collection

The accompanying data file contains the complete data for the final study population, which included 220 men and 338 women. The data collected on each subject is explained below.

1. The actual data file is a comma delimited text file. The first row contains the abbreviations for the information recorded on each subject. The remaining rows each represent a single patient’s corresponding information.

2. For the purposes of the study, the “cardiac events” that the “Dobutamine Stress Echocardiography” was attempting to predict were broken down into four categories:

CARDIAC EVENTS:

  • myocardial infarction (MI)
  • revascularization by percutaneous transluminal coronary angioplasty (PTCA)
  • coronary artery bypass grafting surgery (CABG)
  • cardiac death

Note : that several of the original variables have been renamed and recoded for the S datasets. Event variables that originally were coded 0=yes 1=no have been recoded 0=no 1=yes. equivecg and posecg are combined into a new variable ecg. The original hxofcig variable had values 0 .5 1 assumed to represent heavy, moderate, and none. %mphr(b) has been renamed pctMphr.

7 Type of study

What type of study is this (observational/experiment)? This was an experimental study.

8 Data Source

Vanderbilt link : http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/stressEcho.html

UCLA link : http://www.stat.ucla.edu/projects/datasets/cardiac-explanation.html

NCBI (National Center for Biotechnology Information) link: https://www.ncbi.nlm.nih.gov/pubmed/10080472

Prognostic value of dobutamine stress echocardiography in predicting cardiac events in patients with known or suspected coronary artery disease.

Keywords for Dataset: Medical, Biology, Physiology
Data Provided By: Alan Garfinkel, PhD, UCLA, Department of Physiology

The complete citation for the journal in which the results of the study were published is as follows:
Garfinkel, Alan, et. al. “Prognostic Value of Dobutamine Stress Echocardiography in Predicting Cardiac Events in Patients With Known or Suspected Coronary Artery Disease.” Journal of the American College of Cardiology 33.3 (1999) 708-16.

9 Response

What is the response variable, and what type is it (numerical/categorical)?

Note: “0” means that the patient DID NOT suffer the corresponding cardiac event, and a “1” means that he DID.

  • Research Question#1: The response variable is:(MI, PTCA, CABG or death) and is categorical
  • Research Question#2: The response variable is:(MI, PTCA, CABG or death) and is categorical
  • Research Question#3: The response variable is:(MI, PTCA, CABG or death) and is categorical

10 Explanatory

What is the explanatory variable, and what type is it (numerical/categorical)? The Explanatory variables are:

  • Research Question#1: The stress test positive is a categorical value (1=yes).
  • Research Question#2: cardiologist sees wall motion anamoly on echocardiogram (1 = yes)
  • Research Question#3: Age, maxhr,base bp is numeric and ecg is categorical (equivocal,MI, Normal)

11 Relevant summary statistics

Provide summary statistics relevant to your research question. For example, if you’re comparing means across groups provide means, SDs, sample sizes of each group. This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.

12 Research Question#1

Does a positive ‘Stress Echocardiography’ cause higher ‘Cardiac Events’ such as MI, PTCA, CABG or cardiac death?

summary(stressEcho.data)
##        X              bhr             basebp          basedp     
##  Min.   :  1.0   Min.   : 42.00   Min.   : 85.0   Min.   : 5000  
##  1st Qu.:140.2   1st Qu.: 64.00   1st Qu.:120.0   1st Qu.: 8400  
##  Median :279.5   Median : 74.00   Median :133.0   Median : 9792  
##  Mean   :279.5   Mean   : 75.29   Mean   :135.3   Mean   :10181  
##  3rd Qu.:418.8   3rd Qu.: 84.00   3rd Qu.:150.0   3rd Qu.:11663  
##  Max.   :558.0   Max.   :210.00   Max.   :203.0   Max.   :27300  
##       pkhr            sbp              dp             dose      
##  Min.   : 52.0   Min.   : 40.0   Min.   : 5100   Min.   :10.00  
##  1st Qu.:106.2   1st Qu.:120.0   1st Qu.:14033   1st Qu.:30.00  
##  Median :122.0   Median :141.0   Median :17060   Median :40.00  
##  Mean   :120.6   Mean   :146.9   Mean   :17634   Mean   :33.75  
##  3rd Qu.:135.0   3rd Qu.:170.0   3rd Qu.:20645   3rd Qu.:40.00  
##  Max.   :210.0   Max.   :309.0   Max.   :45114   Max.   :40.00  
##      maxhr          pctMphr            mbp           dpmaxdo     
##  Min.   : 58.0   Min.   : 38.00   Min.   : 84.0   Min.   : 7130  
##  1st Qu.:104.2   1st Qu.: 69.00   1st Qu.:133.2   1st Qu.:15260  
##  Median :120.0   Median : 78.00   Median :150.0   Median :18118  
##  Mean   :119.4   Mean   : 78.57   Mean   :156.0   Mean   :18550  
##  3rd Qu.:133.0   3rd Qu.: 88.00   3rd Qu.:175.8   3rd Qu.:21239  
##  Max.   :200.0   Max.   :133.00   Max.   :309.0   Max.   :45114  
##     dobdose           age           gender        baseEF    
##  Min.   : 5.00   Min.   :26.00   female:338   Min.   :20.0  
##  1st Qu.:20.00   1st Qu.:60.00   male  :220   1st Qu.:52.0  
##  Median :30.00   Median :69.00                Median :57.0  
##  Mean   :30.24   Mean   :67.34                Mean   :55.6  
##  3rd Qu.:40.00   3rd Qu.:75.00                3rd Qu.:62.0  
##  Max.   :40.00   Max.   :93.00                Max.   :83.0  
##      dobEF         chestpain         restwma           posSE       
##  Min.   :23.00   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:62.00   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :67.00   Median :0.0000   Median :0.0000   Median :0.0000  
##  Mean   :65.24   Mean   :0.3082   Mean   :0.4606   Mean   :0.2437  
##  3rd Qu.:73.00   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:0.0000  
##  Max.   :94.00   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##      newMI            newPTCA           newCABG            death        
##  Min.   :0.00000   Min.   :0.00000   Min.   :0.00000   Min.   :0.00000  
##  1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.00000  
##  Median :0.00000   Median :0.00000   Median :0.00000   Median :0.00000  
##  Mean   :0.05018   Mean   :0.04839   Mean   :0.05914   Mean   :0.04301  
##  3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.00000  
##  Max.   :1.00000   Max.   :1.00000   Max.   :1.00000   Max.   :1.00000  
##      hxofHT           hxofDM             hxofCig        hxofMI     
##  Min.   :0.0000   Min.   :0.0000   heavy     :122   Min.   :0.000  
##  1st Qu.:0.0000   1st Qu.:0.0000   moderate  :138   1st Qu.:0.000  
##  Median :1.0000   Median :0.0000   non-smoker:298   Median :0.000  
##  Mean   :0.7043   Mean   :0.3692                    Mean   :0.276  
##  3rd Qu.:1.0000   3rd Qu.:1.0000                    3rd Qu.:1.000  
##  Max.   :1.0000   Max.   :1.0000                    Max.   :1.000  
##     hxofPTCA          hxofCABG        any.event             ecg     
##  Min.   :0.00000   Min.   :0.0000   Min.   :0.0000   equivocal:176  
##  1st Qu.:0.00000   1st Qu.:0.0000   1st Qu.:0.0000   MI       : 71  
##  Median :0.00000   Median :0.0000   Median :0.0000   normal   :311  
##  Mean   :0.07348   Mean   :0.1577   Mean   :0.1595                  
##  3rd Qu.:0.00000   3rd Qu.:0.0000   3rd Qu.:0.0000                  
##  Max.   :1.00000   Max.   :1.0000   Max.   :1.0000
kable(describe(rq2.stressEcho))
vars n mean sd median trimmed mad min max range skew kurtosis se
id 1 558 279.5000000 161.2249981 279.5 279.5000000 206.8227 1 558 557 0.000000 -1.2064535 6.8251984
posSE 2 558 0.2437276 0.4297155 0.0 0.1808036 0.0000 0 1 1 1.190616 -0.5834687 0.0181913
newMI 3 558 0.0501792 0.2185105 0.0 0.0000000 0.0000 0 1 1 4.109777 14.9170099 0.0092503
newPTCA 4 558 0.0483871 0.2147754 0.0 0.0000000 0.0000 0 1 1 4.197908 15.6504864 0.0090922
newCABG 5 558 0.0591398 0.2360978 0.0 0.0000000 0.0000 0 1 1 3.727863 11.9183332 0.0099948
death 6 558 0.0430108 0.2030634 0.0 0.0000000 0.0000 0 1 1 4.492886 18.2186862 0.0085964
kable(describe(stress.positive))
vars n mean sd median trimmed mad min max range skew kurtosis se
id 1 136 314.2205882 137.6966441 294.5 318.2000000 128.2449 57 555 498 -0.0665529 -0.8505038 11.8073898
posSE 2 136 1.0000000 0.0000000 1.0 1.0000000 0.0000 1 1 0 NaN NaN 0.0000000
newMI 3 136 0.1029412 0.3050054 0.0 0.0090909 0.0000 0 1 1 2.5844737 4.7143302 0.0261540
newPTCA 4 136 0.0955882 0.2951127 0.0 0.0000000 0.0000 0 1 1 2.7205753 5.4417040 0.0253057
newCABG 5 136 0.1544118 0.3626788 0.0 0.0727273 0.0000 0 1 1 1.8917406 1.5905393 0.0310995
death 6 136 0.0808824 0.2736623 0.0 0.0000000 0.0000 0 1 1 3.0405055 7.2985009 0.0234664
kable(describe(stress.negative))
vars n mean sd median trimmed mad min max range skew kurtosis se
id 1 422 268.3104265 166.7200872 277.5 266.5296 222.39 1 558 557 0.0740092 -1.302758 8.1158069
posSE 2 422 0.0000000 0.0000000 0.0 0.0000 0.00 0 0 0 NaN NaN 0.0000000
newMI 3 422 0.0331754 0.1793068 0.0 0.0000 0.00 0 1 1 5.1946536 25.043788 0.0087285
newPTCA 4 422 0.0331754 0.1793068 0.0 0.0000 0.00 0 1 1 5.1946536 25.043788 0.0087285
newCABG 5 422 0.0284360 0.1664122 0.0 0.0000 0.00 0 1 1 5.6539894 30.038795 0.0081008
death 6 422 0.0308057 0.1729960 0.0 0.0000 0.00 0 1 1 5.4114833 27.348976 0.0084213
rq2.stressEcho.tidy <- gather(rq2.stressEcho,CardiacEvent,EventStatus,newMI:death) %>%
                      arrange(id)

rq2.stressEcho.tidy.df <- rq2.stressEcho.tidy %>%
    group_by(posSE,CardiacEvent, EventStatus) %>%
    summarise(count=n()) 
kable(rq2.stressEcho.tidy.df)
posSE CardiacEvent EventStatus count
0 death 0 409
0 death 1 13
0 newCABG 0 410
0 newCABG 1 12
0 newMI 0 408
0 newMI 1 14
0 newPTCA 0 408
0 newPTCA 1 14
1 death 0 125
1 death 1 11
1 newCABG 0 115
1 newCABG 1 21
1 newMI 0 122
1 newMI 1 14
1 newPTCA 0 123
1 newPTCA 1 13
ggplot(data=filter(rq2.stressEcho.tidy,EventStatus==1), aes(posSE))+
      geom_bar(aes(fill=CardiacEvent))+
      geom_text(stat='count', aes(label=..count..), vjust=-0.2)+
      scale_x_continuous(breaks = c(0, 1))+
      xlab('Positive Stress Echocardiography (0 = No, 1 = yes) ')+
      facet_wrap(~CardiacEvent)+
      ggtitle('Stress Echocardiography Test : Suffer Cardiac Event')+
      theme_bw()

ggplot(data=filter(rq2.stressEcho.tidy,EventStatus==0), aes(posSE))+
      geom_bar(aes(fill=CardiacEvent))+
      scale_x_continuous(breaks = c(0, 1))+
      xlab('Positive Stress Echocardiography (0 = No, 1 = yes) ')+
      geom_text(stat='count', aes(label=..count..), vjust=-0.2)+
      facet_wrap(~CardiacEvent)+
      ggtitle('Stress Echocardiography Test : Did Not Suffer Cardiac Event')+
      theme_bw()

stress.positive %>%
  group_by(death) %>%
  summarise(count=n())
## # A tibble: 2 x 2
##   death count
##   <int> <int>
## 1     0   125
## 2     1    11
stress.negative %>%
  group_by(death) %>%
  summarise(count=n())
## # A tibble: 2 x 2
##   death count
##   <int> <int>
## 1     0   409
## 2     1    13
barplot(table(rq2.stressEcho$posSE), col=c("#ADD8E6","#ff7373"), main = 'Stress Echo Test (0=No, 1=Yes) ')

mosaicplot(data = rq2.stressEcho, ~posSE+death, color=c("#ADD8E6","#ff7373"), main ="Stress Test Vs Death" )

mosaicplot(data = rq2.stressEcho, ~posSE+newMI, color=c("#ADD8E6","#ff7373"), main ="Stress Test Vs Myocardial Infarctions" )

mosaicplot(data = rq2.stressEcho, ~posSE+newPTCA, color=c("#ADD8E6","#ff7373"), main ="Stress Test Vs PTCA" )

mosaicplot(data = rq2.stressEcho, ~posSE+newCABG, color=c("#ADD8E6","#ff7373"), main ="Stress Test Vs CABG" )

In the given population of 558, there are 422 patients with negative stress test a 136 patients with a positive stress test. Let’s create a contingency table, with Chi Square test values, in order to perform conditional probability and determine what is the likliehood of dying with positive stress test.

library(gmodels)
## Warning: package 'gmodels' was built under R version 3.4.4
CrossTable(rq2.stressEcho$posSE, rq2.stressEcho$death, chisq = TRUE)
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## | Chi-square contribution |
## |           N / Row Total |
## |           N / Col Total |
## |         N / Table Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  558 
## 
##  
##                      | rq2.stressEcho$death 
## rq2.stressEcho$posSE |         0 |         1 | Row Total | 
## ---------------------|-----------|-----------|-----------|
##                    0 |       409 |        13 |       422 | 
##                      |     0.066 |     1.462 |           | 
##                      |     0.969 |     0.031 |     0.756 | 
##                      |     0.766 |     0.542 |           | 
##                      |     0.733 |     0.023 |           | 
## ---------------------|-----------|-----------|-----------|
##                    1 |       125 |        11 |       136 | 
##                      |     0.204 |     4.535 |           | 
##                      |     0.919 |     0.081 |     0.244 | 
##                      |     0.234 |     0.458 |           | 
##                      |     0.224 |     0.020 |           | 
## ---------------------|-----------|-----------|-----------|
##         Column Total |       534 |        24 |       558 | 
##                      |     0.957 |     0.043 |           | 
## ---------------------|-----------|-----------|-----------|
## 
##  
## Statistics for All Table Factors
## 
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 =  6.266194     d.f. =  1     p =  0.01230632 
## 
## Pearson's Chi-squared test with Yates' continuity correction 
## ------------------------------------------------------------
## Chi^2 =  5.108637     d.f. =  1     p =  0.02380701 
## 
## 

H0 - Null Hypothesis: There is NO difference in adverse outcomes between patients with a positive stress test and patients with a negative stress test.

HA - Alternate Hypothesis: There IS a statistical difference in adverse outcomes between patients with a positive stress test and a negative stress test.

lm.eacho.death <-lm(rq2.stressEcho$death~rq2.stressEcho$posSE)
summary(lm.eacho.death)
## 
## Call:
## lm(formula = rq2.stressEcho$death ~ rq2.stressEcho$posSE)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.08088 -0.03081 -0.03081 -0.03081  0.96919 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)   
## (Intercept)          0.030806   0.009838   3.131  0.00183 **
## rq2.stressEcho$posSE 0.050077   0.019928   2.513  0.01226 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2021 on 556 degrees of freedom
## Multiple R-squared:  0.01123,    Adjusted R-squared:  0.009451 
## F-statistic: 6.315 on 1 and 556 DF,  p-value: 0.01226
lm1 <-lm(stress.negative$death~stress.negative$posSE)
summary(lm1)
## 
## Call:
## lm(formula = stress.negative$death ~ stress.negative$posSE)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.03081 -0.03081 -0.03081 -0.03081  0.96919 
## 
## Coefficients: (1 not defined because of singularities)
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           0.030806   0.008421   3.658 0.000286 ***
## stress.negative$posSE       NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.173 on 421 degrees of freedom
confint(lm1)
##                            2.5 %     97.5 %
## (Intercept)           0.01425263 0.04735875
## stress.negative$posSE         NA         NA
lm2 <-lm(stress.positive$death~stress.positive$posSE)
summary(lm2)
## 
## Call:
## lm(formula = stress.positive$death ~ stress.positive$posSE)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.08088 -0.08088 -0.08088 -0.08088  0.91912 
## 
## Coefficients: (1 not defined because of singularities)
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            0.08088    0.02347   3.447 0.000757 ***
## stress.positive$posSE       NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2737 on 135 degrees of freedom
confint(lm2)
##                            2.5 %    97.5 %
## (Intercept)           0.03447313 0.1272916
## stress.positive$posSE         NA        NA
# Calculate the Z score
Z  <- (0.08088 - 0.030806)/0.008421
paste("Z score: ", round(Z, 4))
## [1] "Z score:  5.9463"

The above value is clearly statistically significant.

# Calculate the two sided p score
paste("Two sided P value: ", 2* pnorm(5.9463, lower.tail = FALSE))
## [1] "Two sided P value:  2.74271280474635e-09"
lm.stress.echo <-lm(rq2.stressEcho$death~rq2.stressEcho$posSE)
summary(lm.stress.echo)
## 
## Call:
## lm(formula = rq2.stressEcho$death ~ rq2.stressEcho$posSE)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.08088 -0.03081 -0.03081 -0.03081  0.96919 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)   
## (Intercept)          0.030806   0.009838   3.131  0.00183 **
## rq2.stressEcho$posSE 0.050077   0.019928   2.513  0.01226 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2021 on 556 degrees of freedom
## Multiple R-squared:  0.01123,    Adjusted R-squared:  0.009451 
## F-statistic: 6.315 on 1 and 556 DF,  p-value: 0.01226
plot(lm.stress.echo)

There is very clear evidence that there is a considerable statiscal differnce between patients with a positive stress test and a negative stress test.

13 Research Question#2

Does Resting wall motion abnormality on echocardiogram a better prdeictor/ estimator for future cardiac events?

kable(describe(rq3.restWMA))
vars n mean sd median trimmed mad min max range skew kurtosis se
id 1 558 279.5000000 161.2249981 279.5 279.5000000 206.8227 1 558 557 0.0000000 -1.206454 6.8251984
restwma 2 558 0.4605735 0.4988904 0.0 0.4508929 0.0000 0 1 1 0.1577736 -1.978644 0.0211197
newMI 3 558 0.0501792 0.2185105 0.0 0.0000000 0.0000 0 1 1 4.1097770 14.917010 0.0092503
newPTCA 4 558 0.0483871 0.2147754 0.0 0.0000000 0.0000 0 1 1 4.1979077 15.650486 0.0090922
newCABG 5 558 0.0591398 0.2360978 0.0 0.0000000 0.0000 0 1 1 3.7278633 11.918333 0.0099948
death 6 558 0.0430108 0.2030634 0.0 0.0000000 0.0000 0 1 1 4.4928862 18.218686 0.0085964
rq3.restWMA.tidy <- gather(rq3.restWMA,CardiacEvent,EventStatus,newMI:death) %>%
                    arrange(id)

rq3.restWMA.tidy.df <- rq3.restWMA.tidy %>%
    group_by(restwma,CardiacEvent, EventStatus) %>%
    summarise(count=n())
kable(rq3.restWMA.tidy.df)
restwma CardiacEvent EventStatus count
0 death 0 283
0 death 1 18
0 newCABG 0 271
0 newCABG 1 30
0 newMI 0 277
0 newMI 1 24
0 newPTCA 0 279
0 newPTCA 1 22
1 death 0 251
1 death 1 6
1 newCABG 0 254
1 newCABG 1 3
1 newMI 0 253
1 newMI 1 4
1 newPTCA 0 252
1 newPTCA 1 5
ggplot(data=filter(rq3.restWMA.tidy), aes(restwma))+
      geom_bar(aes(fill=CardiacEvent))+
      scale_x_continuous(breaks = c(0, 1))+
      xlab('Wall motion anamoly on echocardiogram (0 = No, 1 = yes) ')+
      geom_text(stat='count', aes(label=..count..), vjust=0.2)+
      facet_wrap(~CardiacEvent~EventStatus,ncol = 2)

CrossTable(rq3.restWMA$restwma,rq3.restWMA$death,chisq = TRUE)
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## | Chi-square contribution |
## |           N / Row Total |
## |           N / Col Total |
## |         N / Table Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  558 
## 
##  
##                     | rq3.restWMA$death 
## rq3.restWMA$restwma |         0 |         1 | Row Total | 
## --------------------|-----------|-----------|-----------|
##                   0 |       283 |        18 |       301 | 
##                     |     0.089 |     1.973 |           | 
##                     |     0.940 |     0.060 |     0.539 | 
##                     |     0.530 |     0.750 |           | 
##                     |     0.507 |     0.032 |           | 
## --------------------|-----------|-----------|-----------|
##                   1 |       251 |         6 |       257 | 
##                     |     0.104 |     2.311 |           | 
##                     |     0.977 |     0.023 |     0.461 | 
##                     |     0.470 |     0.250 |           | 
##                     |     0.450 |     0.011 |           | 
## --------------------|-----------|-----------|-----------|
##        Column Total |       534 |        24 |       558 | 
##                     |     0.957 |     0.043 |           | 
## --------------------|-----------|-----------|-----------|
## 
##  
## Statistics for All Table Factors
## 
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 =  4.475899     d.f. =  1     p =  0.03437611 
## 
## Pearson's Chi-squared test with Yates' continuity correction 
## ------------------------------------------------------------
## Chi^2 =  3.634054     d.f. =  1     p =  0.05660876 
## 
## 
lm.restWMA <-lm(rq3.restWMA$death~rq3.restWMA$restwma)
summary(lm.restWMA)
## 
## Call:
## lm(formula = rq3.restWMA$death ~ rq3.restWMA$restwma)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.05980 -0.05980 -0.05980 -0.02335  0.97665 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          0.05980    0.01167   5.125 4.11e-07 ***
## rq3.restWMA$restwma -0.03645    0.01719  -2.120   0.0344 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2024 on 556 degrees of freedom
## Multiple R-squared:  0.008021,   Adjusted R-squared:  0.006237 
## F-statistic: 4.496 on 1 and 556 DF,  p-value: 0.03442
plot(lm.restWMA)

The rest wall motion clearly has very high statistcal significance and aligns with the original research thesis conlcusion also.

14 Research Question# 3

Do Age, higher heart rate and ecg (Baseline electrocardiogram diagnosis) have any influence in cardiac events outcome?

describe(rq4.age.ecg)
##         vars   n   mean    sd median trimmed   mad min max range  skew
## age        1 558  67.34 12.05     69   67.99 10.38  26  93    67 -0.58
## maxhr      2 558 119.37 21.91    120  119.58 20.76  58 200   142 -0.03
## basebp     3 558 135.32 20.77    133  134.43 19.27  85 203   118  0.41
## ecg*       4 558   2.24  0.90      3    2.30  0.00   1   3     2 -0.49
## newMI      5 558   0.05  0.22      0    0.00  0.00   0   1     1  4.11
## newPTCA    6 558   0.05  0.21      0    0.00  0.00   0   1     1  4.20
## newCABG    7 558   0.06  0.24      0    0.00  0.00   0   1     1  3.73
## death      8 558   0.04  0.20      0    0.00  0.00   0   1     1  4.49
##         kurtosis   se
## age         0.41 0.51
## maxhr       0.12 0.93
## basebp      0.03 0.88
## ecg*       -1.59 0.04
## newMI      14.92 0.01
## newPTCA    15.65 0.01
## newCABG    11.92 0.01
## death      18.22 0.01
summary(rq4.age.ecg$age)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   26.00   60.00   69.00   67.34   75.00   93.00
summary(rq4.age.ecg$ecg)
## equivocal        MI    normal 
##       176        71       311
ggplot(rq4.age.ecg,aes(rq4.age.ecg$age))+
  geom_bar(aes(fill=ecg))+
  ggtitle('Age vs Electrocardiogram Diagnosis')+
  xlab('Age')+
  theme_bw()

summary(rq4.age.ecg$maxhr)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    58.0   104.2   120.0   119.4   133.0   200.0
qqnorm(rq4.age.ecg$maxhr)
qqline(rq1.heartRate$maxhr)

summary(rq4.age.ecg$age)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   26.00   60.00   69.00   67.34   75.00   93.00
qqnorm(rq4.age.ecg$age)
qqline(rq1.heartRate$age)

Let’s plot a distribution curve for age

hist(rq4.age.ecg$age, probability = TRUE, main = "Histogram of Age", xlab = "Age")
x <- 20:100
y <- dnorm(x = x, mean = mean(rq4.age.ecg$age), sd = sd(rq4.age.ecg$age))
lines(x = x, y = y, col = "blue")

let’s create a contigency table and view statistical factors

CrossTable(rq4.age.ecg$ecg,rq4.age.ecg$death,prop.r = TRUE,prop.c = TRUE,prop.t = TRUE)
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## | Chi-square contribution |
## |           N / Row Total |
## |           N / Col Total |
## |         N / Table Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  558 
## 
##  
##                 | rq4.age.ecg$death 
## rq4.age.ecg$ecg |         0 |         1 | Row Total | 
## ----------------|-----------|-----------|-----------|
##       equivocal |       169 |         7 |       176 | 
##                 |     0.002 |     0.043 |           | 
##                 |     0.960 |     0.040 |     0.315 | 
##                 |     0.316 |     0.292 |           | 
##                 |     0.303 |     0.013 |           | 
## ----------------|-----------|-----------|-----------|
##              MI |        69 |         2 |        71 | 
##                 |     0.016 |     0.364 |           | 
##                 |     0.972 |     0.028 |     0.127 | 
##                 |     0.129 |     0.083 |           | 
##                 |     0.124 |     0.004 |           | 
## ----------------|-----------|-----------|-----------|
##          normal |       296 |        15 |       311 | 
##                 |     0.009 |     0.197 |           | 
##                 |     0.952 |     0.048 |     0.557 | 
##                 |     0.554 |     0.625 |           | 
##                 |     0.530 |     0.027 |           | 
## ----------------|-----------|-----------|-----------|
##    Column Total |       534 |        24 |       558 | 
##                 |     0.957 |     0.043 |           | 
## ----------------|-----------|-----------|-----------|
## 
## 
lm.ecg <- lm(rq4.age.ecg$death ~ rq4.age.ecg$ecg)
summary(lm.ecg)
## 
## Call:
## lm(formula = rq4.age.ecg$death ~ rq4.age.ecg$ecg)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.04823 -0.04823 -0.04823 -0.03977  0.97183 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)   
## (Intercept)            0.039773   0.015325   2.595   0.0097 **
## rq4.age.ecg$ecgMI     -0.011604   0.028584  -0.406   0.6849   
## rq4.age.ecg$ecgnormal  0.008459   0.019178   0.441   0.6593   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2033 on 555 degrees of freedom
## Multiple R-squared:  0.00113,    Adjusted R-squared:  -0.002469 
## F-statistic: 0.314 on 2 and 555 DF,  p-value: 0.7306
confint(lm.ecg)
##                              2.5 %     97.5 %
## (Intercept)            0.009669924 0.06987553
## rq4.age.ecg$ecgMI     -0.067750673 0.04454325
## rq4.age.ecg$ecgnormal -0.029210835 0.04612840
par(mfrow=c(1,3))
boxplot(rq4.age.ecg$basebp, main = "Base Blood Pressure",col = "#ADD8E6")
boxplot(rq4.age.ecg$maxhr, main = "Maximum HR", col="#ff7373")
boxplot(rq4.age.ecg$age, main = "Age", col="#98FB98")

# Age vs Max HR
plot(rq4.age.ecg$maxhr ~ rq4.age.ecg$age, main = "Age vs. Maximum HR", xlab = "Age", ylab = "HR")

Building linear models for age, bp and max hr for finding probability of death cardiac event

lm.age.ecg.hr.bp <- lm(death ~ age+basebp+maxhr, data = rq4.age.ecg)
summary(lm.age.ecg.hr.bp)
## 
## Call:
## lm(formula = death ~ age + basebp + maxhr, data = rq4.age.ecg)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.10663 -0.05401 -0.04178 -0.02945  0.98895 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.1403649  0.0912742  -1.538    0.125
## age          0.0010233  0.0007225   1.416    0.157
## basebp       0.0005981  0.0004197   1.425    0.155
## maxhr        0.0002809  0.0003972   0.707    0.480
## 
## Residual standard error: 0.2028 on 554 degrees of freedom
## Multiple R-squared:  0.008422,   Adjusted R-squared:  0.003052 
## F-statistic: 1.568 on 3 and 554 DF,  p-value: 0.196
plot(lm.age.ecg.hr.bp)

# Calculating linear models for age and base bp
lm.bbp.age <- lm(basebp ~ age, data = rq4.age.ecg)
summary(lm.bbp.age)
## 
## Call:
## lm(formula = basebp ~ age, data = rq4.age.ecg)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -50.248 -15.032  -2.138  13.090  68.637 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 120.4151     4.9597  24.279  < 2e-16 ***
## age           0.2214     0.0725   3.054  0.00237 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 20.62 on 556 degrees of freedom
## Multiple R-squared:  0.0165, Adjusted R-squared:  0.01473 
## F-statistic: 9.325 on 1 and 556 DF,  p-value: 0.002368
plot(lm.bbp.age)

Now check linear model for Age and Max Hr

# Now to maximum heart rate
lm.mhr.age <- lm(maxhr ~ age, data = rq4.age.ecg)
summary(lm.mhr.age)
## 
## Call:
## lm(formula = maxhr ~ age, data = rq4.age.ecg)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -61.233 -14.085   0.521  13.317  80.767 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 133.3542     5.2404  25.448  < 2e-16 ***
## age          -0.2077     0.0766  -2.711  0.00692 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 21.78 on 556 degrees of freedom
## Multiple R-squared:  0.01305,    Adjusted R-squared:  0.01127 
## F-statistic:  7.35 on 1 and 556 DF,  p-value: 0.006915
plot(lm.mhr.age)

15 Conclusion

  • Based upon the statistcal analysis of the data there appears to be higher cardiac events for patients with positive stress test. For older patients the results are tend to have worse outcomes. The Chi Square test contingency tables provides staitiscally significant factors for that.
  • The rest wall motion clearly has very high statistcal significance for all 4 cardiac events and aligns with the original research thesis conlcusion also.
  • The linear regression models provide some great insights and there is significant staistical difference between age - base blood pressure and age-max heart rate. Equivocal ecg appears to have more presence between age 60 and 90. Myocardial infarction (heart attack) also tend to be on higher side after age 60.