A stress echocardiography, also known as an echocardiography stress test or stress echo, is a procedure that determines how well your heart and blood vessels are working.
Your doctor may ask you to take a stress echo test if you have chest pain that they think is due to coronary artery disease or a myocardial infarction, which is a heart attack. This test also provides information on how much exercise you can safely tolerate if you’re in cardiac rehabilitation. The test also helps doctor to determine how well the treatments such as bypass grafting, angioplasty, and anti-anginal or antiarrhythmic medications are working.
During a stress echocardiography, you’ll exercise on a treadmill or stationary bike while your doctor monitors your blood pressure and heart rhythm. When your heart rate reaches peak levels, your doctor will take ultrasound images of your heart to determine whether your heart muscles are getting enough blood and oxygen while you exercise.
I have chosen stress echo data originally published by UCLA, Department of Physiology.
This data is from a study that was trying to determine if a drug called “dobutamine” could be used effectively in a test for measuring a patient’s risk of having a heart attack, or “cardiac event.” For younger patients, a typical test of this risk is called “Stress Echocardiography.” It involves raising the patient’s heart rate by exercise–often by having the patient run on a treadmill–and then taking various measurements, such as heart rate and blood pressure, as well as more complicated measurements of the heart. The problem with this test is that it often cannot be used on older patients whose bodies can’t take the stress of hard exercise. The key to assessing risk, however, is putting stress on the heart before taking the relevant measurements. While exercise can’t be used to create this stress for older patients, the drug dobutamine can. This study, then, was partly an attempt to see if the stress echocardiography test was still effective in predicting cardiac events when the stress on the heart was produced by dobutamine instead of exercise. More specifically, though, the study sought to pinpoint which measurements taken during the stress echocardiography test were most helpful in predicting whether or not a patient suffered a cardiac event over the next year.
# load data
library(tidyverse)
library(psych)
library(knitr)
library(ggplot2)
stressEcho.data <- read.csv('https://raw.githubusercontent.com/niteen11/MSDS/master/DATA606/ProjectProposal/dataset/stressEcho.csv')
#Research Question 1
rq1.heartRate <- select(stressEcho.data,age,maxhr,newMI,newPTCA,newCABG,death)
#Research Question 2
rq2.stressEcho <- select(stressEcho.data,X,posSE,newMI,newPTCA,newCABG,death)
rq2.stressEcho <- rename(rq2.stressEcho,id=X)
#per data file: posse stress echocardiogram was positive (1 = yes)
stress.positive <- filter(rq2.stressEcho,posSE==1)
stress.negative <- filter(rq2.stressEcho,posSE==0)
#Research Question 3
rq3.restWMA<- select(stressEcho.data,X,restwma,newMI,newPTCA,newCABG,death)
rq3.restWMA <- rename(rq3.restWMA,id=X)
#per data file: restwma cardiologist sees wall motion anamoly on echocardiogram (1 = yes)
restWMA.positive <- filter(rq3.restWMA,restwma==1)
restWMA.negative <- filter(rq3.restWMA,restwma==0)
#Research Question 4
rq4.age.ecg<- select(stressEcho.data,age,maxhr,basebp,ecg,newMI,newPTCA,newCABG,death)
# there are 3 categories of ecw diagnosis: equivocal, MI and Normal
ecg.equivocal <- filter(rq4.age.ecg,ecg=='equivocal')
ecg.MI <- filter(rq4.age.ecg,ecg=='MI')
ecg.normal <- filter(rq4.age.ecg,ecg=='normal')
You should phrase your research question in a way that matches up with the scope of inference your dataset allows for.
What are the cases, and how many are there?
dim(stressEcho.data)
## [1] 558 32
summary(stressEcho.data$gender)
## female male
## 338 220
The accompanying data file contains the complete data for the final study population, which included 220 men and 338 women. The data collected on each subject is explained below.
1. The actual data file is a comma delimited text file. The first row contains the abbreviations for the information recorded on each subject. The remaining rows each represent a single patient’s corresponding information.
2. For the purposes of the study, the “cardiac events” that the “Dobutamine Stress Echocardiography” was attempting to predict were broken down into four categories:
CARDIAC EVENTS:
Note : that several of the original variables have been renamed and recoded for the S datasets. Event variables that originally were coded 0=yes 1=no have been recoded 0=no 1=yes. equivecg and posecg are combined into a new variable ecg. The original hxofcig variable had values 0 .5 1 assumed to represent heavy, moderate, and none. %mphr(b) has been renamed pctMphr.
What type of study is this (observational/experiment)? This was an experimental study.
Vanderbilt link : http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/stressEcho.html
UCLA link : http://www.stat.ucla.edu/projects/datasets/cardiac-explanation.html
NCBI (National Center for Biotechnology Information) link: https://www.ncbi.nlm.nih.gov/pubmed/10080472
Prognostic value of dobutamine stress echocardiography in predicting cardiac events in patients with known or suspected coronary artery disease.
Keywords for Dataset: Medical, Biology, Physiology
Data Provided By: Alan Garfinkel, PhD, UCLA, Department of Physiology
The complete citation for the journal in which the results of the study were published is as follows:
Garfinkel, Alan, et. al. “Prognostic Value of Dobutamine Stress Echocardiography in Predicting Cardiac Events in Patients With Known or Suspected Coronary Artery Disease.” Journal of the American College of Cardiology 33.3 (1999) 708-16.
What is the response variable, and what type is it (numerical/categorical)?
Note: “0” means that the patient DID NOT suffer the corresponding cardiac event, and a “1” means that he DID.
What is the explanatory variable, and what type is it (numerical/categorical)? The Explanatory variables are:
Provide summary statistics relevant to your research question. For example, if you’re comparing means across groups provide means, SDs, sample sizes of each group. This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.
Does a positive ‘Stress Echocardiography’ cause higher ‘Cardiac Events’ such as MI, PTCA, CABG or cardiac death?
summary(stressEcho.data)
## X bhr basebp basedp
## Min. : 1.0 Min. : 42.00 Min. : 85.0 Min. : 5000
## 1st Qu.:140.2 1st Qu.: 64.00 1st Qu.:120.0 1st Qu.: 8400
## Median :279.5 Median : 74.00 Median :133.0 Median : 9792
## Mean :279.5 Mean : 75.29 Mean :135.3 Mean :10181
## 3rd Qu.:418.8 3rd Qu.: 84.00 3rd Qu.:150.0 3rd Qu.:11663
## Max. :558.0 Max. :210.00 Max. :203.0 Max. :27300
## pkhr sbp dp dose
## Min. : 52.0 Min. : 40.0 Min. : 5100 Min. :10.00
## 1st Qu.:106.2 1st Qu.:120.0 1st Qu.:14033 1st Qu.:30.00
## Median :122.0 Median :141.0 Median :17060 Median :40.00
## Mean :120.6 Mean :146.9 Mean :17634 Mean :33.75
## 3rd Qu.:135.0 3rd Qu.:170.0 3rd Qu.:20645 3rd Qu.:40.00
## Max. :210.0 Max. :309.0 Max. :45114 Max. :40.00
## maxhr pctMphr mbp dpmaxdo
## Min. : 58.0 Min. : 38.00 Min. : 84.0 Min. : 7130
## 1st Qu.:104.2 1st Qu.: 69.00 1st Qu.:133.2 1st Qu.:15260
## Median :120.0 Median : 78.00 Median :150.0 Median :18118
## Mean :119.4 Mean : 78.57 Mean :156.0 Mean :18550
## 3rd Qu.:133.0 3rd Qu.: 88.00 3rd Qu.:175.8 3rd Qu.:21239
## Max. :200.0 Max. :133.00 Max. :309.0 Max. :45114
## dobdose age gender baseEF
## Min. : 5.00 Min. :26.00 female:338 Min. :20.0
## 1st Qu.:20.00 1st Qu.:60.00 male :220 1st Qu.:52.0
## Median :30.00 Median :69.00 Median :57.0
## Mean :30.24 Mean :67.34 Mean :55.6
## 3rd Qu.:40.00 3rd Qu.:75.00 3rd Qu.:62.0
## Max. :40.00 Max. :93.00 Max. :83.0
## dobEF chestpain restwma posSE
## Min. :23.00 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:62.00 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :67.00 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :65.24 Mean :0.3082 Mean :0.4606 Mean :0.2437
## 3rd Qu.:73.00 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:0.0000
## Max. :94.00 Max. :1.0000 Max. :1.0000 Max. :1.0000
## newMI newPTCA newCABG death
## Min. :0.00000 Min. :0.00000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.00000 Median :0.00000 Median :0.00000 Median :0.00000
## Mean :0.05018 Mean :0.04839 Mean :0.05914 Mean :0.04301
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.00000 Max. :1.00000 Max. :1.00000
## hxofHT hxofDM hxofCig hxofMI
## Min. :0.0000 Min. :0.0000 heavy :122 Min. :0.000
## 1st Qu.:0.0000 1st Qu.:0.0000 moderate :138 1st Qu.:0.000
## Median :1.0000 Median :0.0000 non-smoker:298 Median :0.000
## Mean :0.7043 Mean :0.3692 Mean :0.276
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.000
## Max. :1.0000 Max. :1.0000 Max. :1.000
## hxofPTCA hxofCABG any.event ecg
## Min. :0.00000 Min. :0.0000 Min. :0.0000 equivocal:176
## 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.0000 MI : 71
## Median :0.00000 Median :0.0000 Median :0.0000 normal :311
## Mean :0.07348 Mean :0.1577 Mean :0.1595
## 3rd Qu.:0.00000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :1.00000 Max. :1.0000 Max. :1.0000
kable(describe(rq2.stressEcho))
vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | 1 | 558 | 279.5000000 | 161.2249981 | 279.5 | 279.5000000 | 206.8227 | 1 | 558 | 557 | 0.000000 | -1.2064535 | 6.8251984 |
posSE | 2 | 558 | 0.2437276 | 0.4297155 | 0.0 | 0.1808036 | 0.0000 | 0 | 1 | 1 | 1.190616 | -0.5834687 | 0.0181913 |
newMI | 3 | 558 | 0.0501792 | 0.2185105 | 0.0 | 0.0000000 | 0.0000 | 0 | 1 | 1 | 4.109777 | 14.9170099 | 0.0092503 |
newPTCA | 4 | 558 | 0.0483871 | 0.2147754 | 0.0 | 0.0000000 | 0.0000 | 0 | 1 | 1 | 4.197908 | 15.6504864 | 0.0090922 |
newCABG | 5 | 558 | 0.0591398 | 0.2360978 | 0.0 | 0.0000000 | 0.0000 | 0 | 1 | 1 | 3.727863 | 11.9183332 | 0.0099948 |
death | 6 | 558 | 0.0430108 | 0.2030634 | 0.0 | 0.0000000 | 0.0000 | 0 | 1 | 1 | 4.492886 | 18.2186862 | 0.0085964 |
kable(describe(stress.positive))
vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | 1 | 136 | 314.2205882 | 137.6966441 | 294.5 | 318.2000000 | 128.2449 | 57 | 555 | 498 | -0.0665529 | -0.8505038 | 11.8073898 |
posSE | 2 | 136 | 1.0000000 | 0.0000000 | 1.0 | 1.0000000 | 0.0000 | 1 | 1 | 0 | NaN | NaN | 0.0000000 |
newMI | 3 | 136 | 0.1029412 | 0.3050054 | 0.0 | 0.0090909 | 0.0000 | 0 | 1 | 1 | 2.5844737 | 4.7143302 | 0.0261540 |
newPTCA | 4 | 136 | 0.0955882 | 0.2951127 | 0.0 | 0.0000000 | 0.0000 | 0 | 1 | 1 | 2.7205753 | 5.4417040 | 0.0253057 |
newCABG | 5 | 136 | 0.1544118 | 0.3626788 | 0.0 | 0.0727273 | 0.0000 | 0 | 1 | 1 | 1.8917406 | 1.5905393 | 0.0310995 |
death | 6 | 136 | 0.0808824 | 0.2736623 | 0.0 | 0.0000000 | 0.0000 | 0 | 1 | 1 | 3.0405055 | 7.2985009 | 0.0234664 |
kable(describe(stress.negative))
vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | 1 | 422 | 268.3104265 | 166.7200872 | 277.5 | 266.5296 | 222.39 | 1 | 558 | 557 | 0.0740092 | -1.302758 | 8.1158069 |
posSE | 2 | 422 | 0.0000000 | 0.0000000 | 0.0 | 0.0000 | 0.00 | 0 | 0 | 0 | NaN | NaN | 0.0000000 |
newMI | 3 | 422 | 0.0331754 | 0.1793068 | 0.0 | 0.0000 | 0.00 | 0 | 1 | 1 | 5.1946536 | 25.043788 | 0.0087285 |
newPTCA | 4 | 422 | 0.0331754 | 0.1793068 | 0.0 | 0.0000 | 0.00 | 0 | 1 | 1 | 5.1946536 | 25.043788 | 0.0087285 |
newCABG | 5 | 422 | 0.0284360 | 0.1664122 | 0.0 | 0.0000 | 0.00 | 0 | 1 | 1 | 5.6539894 | 30.038795 | 0.0081008 |
death | 6 | 422 | 0.0308057 | 0.1729960 | 0.0 | 0.0000 | 0.00 | 0 | 1 | 1 | 5.4114833 | 27.348976 | 0.0084213 |
rq2.stressEcho.tidy <- gather(rq2.stressEcho,CardiacEvent,EventStatus,newMI:death) %>%
arrange(id)
rq2.stressEcho.tidy.df <- rq2.stressEcho.tidy %>%
group_by(posSE,CardiacEvent, EventStatus) %>%
summarise(count=n())
kable(rq2.stressEcho.tidy.df)
posSE | CardiacEvent | EventStatus | count |
---|---|---|---|
0 | death | 0 | 409 |
0 | death | 1 | 13 |
0 | newCABG | 0 | 410 |
0 | newCABG | 1 | 12 |
0 | newMI | 0 | 408 |
0 | newMI | 1 | 14 |
0 | newPTCA | 0 | 408 |
0 | newPTCA | 1 | 14 |
1 | death | 0 | 125 |
1 | death | 1 | 11 |
1 | newCABG | 0 | 115 |
1 | newCABG | 1 | 21 |
1 | newMI | 0 | 122 |
1 | newMI | 1 | 14 |
1 | newPTCA | 0 | 123 |
1 | newPTCA | 1 | 13 |
ggplot(data=filter(rq2.stressEcho.tidy,EventStatus==1), aes(posSE))+
geom_bar(aes(fill=CardiacEvent))+
geom_text(stat='count', aes(label=..count..), vjust=-0.2)+
scale_x_continuous(breaks = c(0, 1))+
xlab('Positive Stress Echocardiography (0 = No, 1 = yes) ')+
facet_wrap(~CardiacEvent)+
ggtitle('Stress Echocardiography Test : Suffer Cardiac Event')+
theme_bw()
ggplot(data=filter(rq2.stressEcho.tidy,EventStatus==0), aes(posSE))+
geom_bar(aes(fill=CardiacEvent))+
scale_x_continuous(breaks = c(0, 1))+
xlab('Positive Stress Echocardiography (0 = No, 1 = yes) ')+
geom_text(stat='count', aes(label=..count..), vjust=-0.2)+
facet_wrap(~CardiacEvent)+
ggtitle('Stress Echocardiography Test : Did Not Suffer Cardiac Event')+
theme_bw()
stress.positive %>%
group_by(death) %>%
summarise(count=n())
## # A tibble: 2 x 2
## death count
## <int> <int>
## 1 0 125
## 2 1 11
stress.negative %>%
group_by(death) %>%
summarise(count=n())
## # A tibble: 2 x 2
## death count
## <int> <int>
## 1 0 409
## 2 1 13
barplot(table(rq2.stressEcho$posSE), col=c("#ADD8E6","#ff7373"), main = 'Stress Echo Test (0=No, 1=Yes) ')
mosaicplot(data = rq2.stressEcho, ~posSE+death, color=c("#ADD8E6","#ff7373"), main ="Stress Test Vs Death" )
mosaicplot(data = rq2.stressEcho, ~posSE+newMI, color=c("#ADD8E6","#ff7373"), main ="Stress Test Vs Myocardial Infarctions" )
mosaicplot(data = rq2.stressEcho, ~posSE+newPTCA, color=c("#ADD8E6","#ff7373"), main ="Stress Test Vs PTCA" )
mosaicplot(data = rq2.stressEcho, ~posSE+newCABG, color=c("#ADD8E6","#ff7373"), main ="Stress Test Vs CABG" )
In the given population of 558, there are 422 patients with negative stress test a 136 patients with a positive stress test. Let’s create a contingency table, with Chi Square test values, in order to perform conditional probability and determine what is the likliehood of dying with positive stress test.
library(gmodels)
## Warning: package 'gmodels' was built under R version 3.4.4
CrossTable(rq2.stressEcho$posSE, rq2.stressEcho$death, chisq = TRUE)
##
##
## Cell Contents
## |-------------------------|
## | N |
## | Chi-square contribution |
## | N / Row Total |
## | N / Col Total |
## | N / Table Total |
## |-------------------------|
##
##
## Total Observations in Table: 558
##
##
## | rq2.stressEcho$death
## rq2.stressEcho$posSE | 0 | 1 | Row Total |
## ---------------------|-----------|-----------|-----------|
## 0 | 409 | 13 | 422 |
## | 0.066 | 1.462 | |
## | 0.969 | 0.031 | 0.756 |
## | 0.766 | 0.542 | |
## | 0.733 | 0.023 | |
## ---------------------|-----------|-----------|-----------|
## 1 | 125 | 11 | 136 |
## | 0.204 | 4.535 | |
## | 0.919 | 0.081 | 0.244 |
## | 0.234 | 0.458 | |
## | 0.224 | 0.020 | |
## ---------------------|-----------|-----------|-----------|
## Column Total | 534 | 24 | 558 |
## | 0.957 | 0.043 | |
## ---------------------|-----------|-----------|-----------|
##
##
## Statistics for All Table Factors
##
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 6.266194 d.f. = 1 p = 0.01230632
##
## Pearson's Chi-squared test with Yates' continuity correction
## ------------------------------------------------------------
## Chi^2 = 5.108637 d.f. = 1 p = 0.02380701
##
##
H0 - Null Hypothesis: There is NO difference in adverse outcomes between patients with a positive stress test and patients with a negative stress test.
HA - Alternate Hypothesis: There IS a statistical difference in adverse outcomes between patients with a positive stress test and a negative stress test.
lm.eacho.death <-lm(rq2.stressEcho$death~rq2.stressEcho$posSE)
summary(lm.eacho.death)
##
## Call:
## lm(formula = rq2.stressEcho$death ~ rq2.stressEcho$posSE)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.08088 -0.03081 -0.03081 -0.03081 0.96919
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.030806 0.009838 3.131 0.00183 **
## rq2.stressEcho$posSE 0.050077 0.019928 2.513 0.01226 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2021 on 556 degrees of freedom
## Multiple R-squared: 0.01123, Adjusted R-squared: 0.009451
## F-statistic: 6.315 on 1 and 556 DF, p-value: 0.01226
lm1 <-lm(stress.negative$death~stress.negative$posSE)
summary(lm1)
##
## Call:
## lm(formula = stress.negative$death ~ stress.negative$posSE)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.03081 -0.03081 -0.03081 -0.03081 0.96919
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.030806 0.008421 3.658 0.000286 ***
## stress.negative$posSE NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.173 on 421 degrees of freedom
confint(lm1)
## 2.5 % 97.5 %
## (Intercept) 0.01425263 0.04735875
## stress.negative$posSE NA NA
lm2 <-lm(stress.positive$death~stress.positive$posSE)
summary(lm2)
##
## Call:
## lm(formula = stress.positive$death ~ stress.positive$posSE)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.08088 -0.08088 -0.08088 -0.08088 0.91912
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.08088 0.02347 3.447 0.000757 ***
## stress.positive$posSE NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2737 on 135 degrees of freedom
confint(lm2)
## 2.5 % 97.5 %
## (Intercept) 0.03447313 0.1272916
## stress.positive$posSE NA NA
# Calculate the Z score
Z <- (0.08088 - 0.030806)/0.008421
paste("Z score: ", round(Z, 4))
## [1] "Z score: 5.9463"
The above value is clearly statistically significant.
# Calculate the two sided p score
paste("Two sided P value: ", 2* pnorm(5.9463, lower.tail = FALSE))
## [1] "Two sided P value: 2.74271280474635e-09"
lm.stress.echo <-lm(rq2.stressEcho$death~rq2.stressEcho$posSE)
summary(lm.stress.echo)
##
## Call:
## lm(formula = rq2.stressEcho$death ~ rq2.stressEcho$posSE)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.08088 -0.03081 -0.03081 -0.03081 0.96919
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.030806 0.009838 3.131 0.00183 **
## rq2.stressEcho$posSE 0.050077 0.019928 2.513 0.01226 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2021 on 556 degrees of freedom
## Multiple R-squared: 0.01123, Adjusted R-squared: 0.009451
## F-statistic: 6.315 on 1 and 556 DF, p-value: 0.01226
plot(lm.stress.echo)
There is very clear evidence that there is a considerable statiscal differnce between patients with a positive stress test and a negative stress test.
Does Resting wall motion abnormality on echocardiogram a better prdeictor/ estimator for future cardiac events?
kable(describe(rq3.restWMA))
vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | 1 | 558 | 279.5000000 | 161.2249981 | 279.5 | 279.5000000 | 206.8227 | 1 | 558 | 557 | 0.0000000 | -1.206454 | 6.8251984 |
restwma | 2 | 558 | 0.4605735 | 0.4988904 | 0.0 | 0.4508929 | 0.0000 | 0 | 1 | 1 | 0.1577736 | -1.978644 | 0.0211197 |
newMI | 3 | 558 | 0.0501792 | 0.2185105 | 0.0 | 0.0000000 | 0.0000 | 0 | 1 | 1 | 4.1097770 | 14.917010 | 0.0092503 |
newPTCA | 4 | 558 | 0.0483871 | 0.2147754 | 0.0 | 0.0000000 | 0.0000 | 0 | 1 | 1 | 4.1979077 | 15.650486 | 0.0090922 |
newCABG | 5 | 558 | 0.0591398 | 0.2360978 | 0.0 | 0.0000000 | 0.0000 | 0 | 1 | 1 | 3.7278633 | 11.918333 | 0.0099948 |
death | 6 | 558 | 0.0430108 | 0.2030634 | 0.0 | 0.0000000 | 0.0000 | 0 | 1 | 1 | 4.4928862 | 18.218686 | 0.0085964 |
rq3.restWMA.tidy <- gather(rq3.restWMA,CardiacEvent,EventStatus,newMI:death) %>%
arrange(id)
rq3.restWMA.tidy.df <- rq3.restWMA.tidy %>%
group_by(restwma,CardiacEvent, EventStatus) %>%
summarise(count=n())
kable(rq3.restWMA.tidy.df)
restwma | CardiacEvent | EventStatus | count |
---|---|---|---|
0 | death | 0 | 283 |
0 | death | 1 | 18 |
0 | newCABG | 0 | 271 |
0 | newCABG | 1 | 30 |
0 | newMI | 0 | 277 |
0 | newMI | 1 | 24 |
0 | newPTCA | 0 | 279 |
0 | newPTCA | 1 | 22 |
1 | death | 0 | 251 |
1 | death | 1 | 6 |
1 | newCABG | 0 | 254 |
1 | newCABG | 1 | 3 |
1 | newMI | 0 | 253 |
1 | newMI | 1 | 4 |
1 | newPTCA | 0 | 252 |
1 | newPTCA | 1 | 5 |
ggplot(data=filter(rq3.restWMA.tidy), aes(restwma))+
geom_bar(aes(fill=CardiacEvent))+
scale_x_continuous(breaks = c(0, 1))+
xlab('Wall motion anamoly on echocardiogram (0 = No, 1 = yes) ')+
geom_text(stat='count', aes(label=..count..), vjust=0.2)+
facet_wrap(~CardiacEvent~EventStatus,ncol = 2)
CrossTable(rq3.restWMA$restwma,rq3.restWMA$death,chisq = TRUE)
##
##
## Cell Contents
## |-------------------------|
## | N |
## | Chi-square contribution |
## | N / Row Total |
## | N / Col Total |
## | N / Table Total |
## |-------------------------|
##
##
## Total Observations in Table: 558
##
##
## | rq3.restWMA$death
## rq3.restWMA$restwma | 0 | 1 | Row Total |
## --------------------|-----------|-----------|-----------|
## 0 | 283 | 18 | 301 |
## | 0.089 | 1.973 | |
## | 0.940 | 0.060 | 0.539 |
## | 0.530 | 0.750 | |
## | 0.507 | 0.032 | |
## --------------------|-----------|-----------|-----------|
## 1 | 251 | 6 | 257 |
## | 0.104 | 2.311 | |
## | 0.977 | 0.023 | 0.461 |
## | 0.470 | 0.250 | |
## | 0.450 | 0.011 | |
## --------------------|-----------|-----------|-----------|
## Column Total | 534 | 24 | 558 |
## | 0.957 | 0.043 | |
## --------------------|-----------|-----------|-----------|
##
##
## Statistics for All Table Factors
##
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 4.475899 d.f. = 1 p = 0.03437611
##
## Pearson's Chi-squared test with Yates' continuity correction
## ------------------------------------------------------------
## Chi^2 = 3.634054 d.f. = 1 p = 0.05660876
##
##
lm.restWMA <-lm(rq3.restWMA$death~rq3.restWMA$restwma)
summary(lm.restWMA)
##
## Call:
## lm(formula = rq3.restWMA$death ~ rq3.restWMA$restwma)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.05980 -0.05980 -0.05980 -0.02335 0.97665
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.05980 0.01167 5.125 4.11e-07 ***
## rq3.restWMA$restwma -0.03645 0.01719 -2.120 0.0344 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2024 on 556 degrees of freedom
## Multiple R-squared: 0.008021, Adjusted R-squared: 0.006237
## F-statistic: 4.496 on 1 and 556 DF, p-value: 0.03442
plot(lm.restWMA)
The rest wall motion clearly has very high statistcal significance and aligns with the original research thesis conlcusion also.
Do Age, higher heart rate and ecg (Baseline electrocardiogram diagnosis) have any influence in cardiac events outcome?
describe(rq4.age.ecg)
## vars n mean sd median trimmed mad min max range skew
## age 1 558 67.34 12.05 69 67.99 10.38 26 93 67 -0.58
## maxhr 2 558 119.37 21.91 120 119.58 20.76 58 200 142 -0.03
## basebp 3 558 135.32 20.77 133 134.43 19.27 85 203 118 0.41
## ecg* 4 558 2.24 0.90 3 2.30 0.00 1 3 2 -0.49
## newMI 5 558 0.05 0.22 0 0.00 0.00 0 1 1 4.11
## newPTCA 6 558 0.05 0.21 0 0.00 0.00 0 1 1 4.20
## newCABG 7 558 0.06 0.24 0 0.00 0.00 0 1 1 3.73
## death 8 558 0.04 0.20 0 0.00 0.00 0 1 1 4.49
## kurtosis se
## age 0.41 0.51
## maxhr 0.12 0.93
## basebp 0.03 0.88
## ecg* -1.59 0.04
## newMI 14.92 0.01
## newPTCA 15.65 0.01
## newCABG 11.92 0.01
## death 18.22 0.01
summary(rq4.age.ecg$age)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 26.00 60.00 69.00 67.34 75.00 93.00
summary(rq4.age.ecg$ecg)
## equivocal MI normal
## 176 71 311
ggplot(rq4.age.ecg,aes(rq4.age.ecg$age))+
geom_bar(aes(fill=ecg))+
ggtitle('Age vs Electrocardiogram Diagnosis')+
xlab('Age')+
theme_bw()
summary(rq4.age.ecg$maxhr)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 58.0 104.2 120.0 119.4 133.0 200.0
qqnorm(rq4.age.ecg$maxhr)
qqline(rq1.heartRate$maxhr)
summary(rq4.age.ecg$age)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 26.00 60.00 69.00 67.34 75.00 93.00
qqnorm(rq4.age.ecg$age)
qqline(rq1.heartRate$age)
Let’s plot a distribution curve for age
hist(rq4.age.ecg$age, probability = TRUE, main = "Histogram of Age", xlab = "Age")
x <- 20:100
y <- dnorm(x = x, mean = mean(rq4.age.ecg$age), sd = sd(rq4.age.ecg$age))
lines(x = x, y = y, col = "blue")
let’s create a contigency table and view statistical factors
CrossTable(rq4.age.ecg$ecg,rq4.age.ecg$death,prop.r = TRUE,prop.c = TRUE,prop.t = TRUE)
##
##
## Cell Contents
## |-------------------------|
## | N |
## | Chi-square contribution |
## | N / Row Total |
## | N / Col Total |
## | N / Table Total |
## |-------------------------|
##
##
## Total Observations in Table: 558
##
##
## | rq4.age.ecg$death
## rq4.age.ecg$ecg | 0 | 1 | Row Total |
## ----------------|-----------|-----------|-----------|
## equivocal | 169 | 7 | 176 |
## | 0.002 | 0.043 | |
## | 0.960 | 0.040 | 0.315 |
## | 0.316 | 0.292 | |
## | 0.303 | 0.013 | |
## ----------------|-----------|-----------|-----------|
## MI | 69 | 2 | 71 |
## | 0.016 | 0.364 | |
## | 0.972 | 0.028 | 0.127 |
## | 0.129 | 0.083 | |
## | 0.124 | 0.004 | |
## ----------------|-----------|-----------|-----------|
## normal | 296 | 15 | 311 |
## | 0.009 | 0.197 | |
## | 0.952 | 0.048 | 0.557 |
## | 0.554 | 0.625 | |
## | 0.530 | 0.027 | |
## ----------------|-----------|-----------|-----------|
## Column Total | 534 | 24 | 558 |
## | 0.957 | 0.043 | |
## ----------------|-----------|-----------|-----------|
##
##
lm.ecg <- lm(rq4.age.ecg$death ~ rq4.age.ecg$ecg)
summary(lm.ecg)
##
## Call:
## lm(formula = rq4.age.ecg$death ~ rq4.age.ecg$ecg)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.04823 -0.04823 -0.04823 -0.03977 0.97183
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.039773 0.015325 2.595 0.0097 **
## rq4.age.ecg$ecgMI -0.011604 0.028584 -0.406 0.6849
## rq4.age.ecg$ecgnormal 0.008459 0.019178 0.441 0.6593
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2033 on 555 degrees of freedom
## Multiple R-squared: 0.00113, Adjusted R-squared: -0.002469
## F-statistic: 0.314 on 2 and 555 DF, p-value: 0.7306
confint(lm.ecg)
## 2.5 % 97.5 %
## (Intercept) 0.009669924 0.06987553
## rq4.age.ecg$ecgMI -0.067750673 0.04454325
## rq4.age.ecg$ecgnormal -0.029210835 0.04612840
par(mfrow=c(1,3))
boxplot(rq4.age.ecg$basebp, main = "Base Blood Pressure",col = "#ADD8E6")
boxplot(rq4.age.ecg$maxhr, main = "Maximum HR", col="#ff7373")
boxplot(rq4.age.ecg$age, main = "Age", col="#98FB98")
# Age vs Max HR
plot(rq4.age.ecg$maxhr ~ rq4.age.ecg$age, main = "Age vs. Maximum HR", xlab = "Age", ylab = "HR")
Building linear models for age, bp and max hr for finding probability of death cardiac event
lm.age.ecg.hr.bp <- lm(death ~ age+basebp+maxhr, data = rq4.age.ecg)
summary(lm.age.ecg.hr.bp)
##
## Call:
## lm(formula = death ~ age + basebp + maxhr, data = rq4.age.ecg)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.10663 -0.05401 -0.04178 -0.02945 0.98895
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.1403649 0.0912742 -1.538 0.125
## age 0.0010233 0.0007225 1.416 0.157
## basebp 0.0005981 0.0004197 1.425 0.155
## maxhr 0.0002809 0.0003972 0.707 0.480
##
## Residual standard error: 0.2028 on 554 degrees of freedom
## Multiple R-squared: 0.008422, Adjusted R-squared: 0.003052
## F-statistic: 1.568 on 3 and 554 DF, p-value: 0.196
plot(lm.age.ecg.hr.bp)
# Calculating linear models for age and base bp
lm.bbp.age <- lm(basebp ~ age, data = rq4.age.ecg)
summary(lm.bbp.age)
##
## Call:
## lm(formula = basebp ~ age, data = rq4.age.ecg)
##
## Residuals:
## Min 1Q Median 3Q Max
## -50.248 -15.032 -2.138 13.090 68.637
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 120.4151 4.9597 24.279 < 2e-16 ***
## age 0.2214 0.0725 3.054 0.00237 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 20.62 on 556 degrees of freedom
## Multiple R-squared: 0.0165, Adjusted R-squared: 0.01473
## F-statistic: 9.325 on 1 and 556 DF, p-value: 0.002368
plot(lm.bbp.age)
Now check linear model for Age and Max Hr
# Now to maximum heart rate
lm.mhr.age <- lm(maxhr ~ age, data = rq4.age.ecg)
summary(lm.mhr.age)
##
## Call:
## lm(formula = maxhr ~ age, data = rq4.age.ecg)
##
## Residuals:
## Min 1Q Median 3Q Max
## -61.233 -14.085 0.521 13.317 80.767
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 133.3542 5.2404 25.448 < 2e-16 ***
## age -0.2077 0.0766 -2.711 0.00692 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 21.78 on 556 degrees of freedom
## Multiple R-squared: 0.01305, Adjusted R-squared: 0.01127
## F-statistic: 7.35 on 1 and 556 DF, p-value: 0.006915
plot(lm.mhr.age)