Determining whether a patient's chest pain is caused by a dangerous disease process such as acute coronary syndrome can be difficult for physicians. Often, when a patient arrives at the emergency department, the physician must decide whether the chest pain stems from a dangerous acute pathology or from a benign one, such as heartburn or a muscle cramp. In the medical literature, the consensus is that the features of the complaint itself (its character, intensity, duration, and location) can be unreliable predictors of adverse cardiac outcomes. It does seem intuitive, however, that a patient who smokes, has poor eating habits, or has a history of hypertension, diabetes, or high cholesterol will have a higher pretest probability of underlying coronary artery disease.
Ultimately, if the chest pain is deemed likely to be caused by atherosclerotic coronary arteries, a cardiologist must determine the severity of the heart disease. A very common method is the stress test. The test is designed so that as a patient walks on a treadmill, his heart rate rises toward a target heart rate (e.g., 150 beats per minute). Once he reaches that target, the cardiologist uses ultrasound to look for abnormal movement of the heart walls. Abnormal wall motion makes the stress test “positive”, which calls for further testing and intervention (such as cardiac catheterization).
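The example target of 150 beats per minute varies with the patient's age in practice. One widely used convention, assumed here rather than taken from the study, sets the target at 85% of the age-predicted maximum heart rate, estimated as 220 minus age. A minimal R sketch (the helper name `target_hr` is hypothetical):

```r
# Hypothetical helper: target heart rate for a stress test, assuming the
# common convention of 85% of the age-predicted maximum (220 - age).
target_hr <- function(age, fraction = 0.85) {
  stopifnot(is.numeric(age), all(age > 0))
  fraction * (220 - age)
}

# Example: targets for patients aged 40, 60, and 80
target_hr(c(40, 60, 80))
```

Under this convention a 44-year-old's target is roughly 150 beats per minute, while an 80-year-old's is about 119.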
However, many patients are simply too debilitated to walk on a treadmill. Comorbidities such as arthritis or obesity can keep a patient from reaching the target heart rate, which limits how the exam can be interpreted. An alternative is a pharmacologic stress test, such as the dobutamine stress echocardiogram studied here. Instead of walking on a treadmill, the patient is given intravenous dobutamine, a medication that artificially raises the heart rate to the target level. The literature suggests that this pharmacologic test is more sensitive for coronary artery disease than the treadmill stress test. Again, any wall motion abnormality makes the stress test positive and calls for further testing and/or intervention.
Ultimately, what physicians care about is whether significant morbidity follows a positive (or perhaps even a negative) stress test. In this study, published in the “Journal of the American College of Cardiology” (1999), data were collected on each patient's demographics, medical history, stress test result, and any “bad” outcomes over the following year. A “bad” outcome was defined as any of: myocardial infarction (heart attack); the need for an invasive cardiac procedure, either percutaneous transluminal coronary angioplasty (PTCA: this implies that one of the patient's coronary arteries was blocked, so it can serve as a rough proxy for a heart attack) or coronary artery bypass grafting (CABG: a major cardiac surgery that creates new routes for blood flow because the native coronary arteries are blocked or badly damaged); or death.
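The composite “bad” outcome is a logical OR across the four event indicators. A minimal sketch in base R, assuming the 0/1 columns `newMI`, `newPTCA`, `newCABG`, and `death` that are selected from the dataset later in this analysis; the toy rows and the helper name `any_adverse` are hypothetical:

```r
# Hypothetical helper: flag a patient as having an adverse outcome if any
# of the four 0/1 event indicators equals 1.
any_adverse <- function(df, events = c("newMI", "newPTCA", "newCABG", "death")) {
  as.integer(rowSums(df[, events] == 1) > 0)
}

# Toy data (hypothetical, not from the study)
toy <- data.frame(
  newMI   = c(0, 1, 0, 0),
  newPTCA = c(0, 1, 0, 0),
  newCABG = c(0, 0, 1, 0),
  death   = c(0, 0, 0, 0)
)
toy$adverse <- any_adverse(toy)
toy$adverse  # one flag per patient
```

Counting each patient once matters because the same patient can have, for example, both an MI and a PTCA; adding the separate event rates would count that patient twice.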
However, one must remember that a positive stress test does not by itself mean that the patient will have one of those adverse outcomes. When a stress test is positive but no medical consequence follows, the medical community calls the finding “clinically insignificant”.
Load libraries.
library(RCurl)
library(psych)
library(tidyr)
library(dplyr)
library(ggplot2)
Import data.
# Import the data from GitHub.
# Specifically, import the stress echo dataset:
# Garfinkel, Alan, et al. "Prognostic Value of Dobutamine Stress Echocardiography in Predicting Cardiac Events in Patients With Known or Suspected Coronary Artery Disease." Journal of the American College of Cardiology 33.3 (1999): 708-16.
stress.echo <- read.csv("https://raw.githubusercontent.com/jcp9010/MSDA/master/stressEcho.csv", header = TRUE)
# We will now subset our data to answer the 3 questions listed below.
# For the 1st question:
stress1 <- subset(stress.echo, select = c(posSE, newMI, newPTCA, newCABG, death))
colnames(stress1) <- c("Stress.Test", "New.MI", "New.PTCA", "New.CABG", "Death")
# Split this subset again into patients with a positive stress test and patients with a negative stress test.
stress.test.positive <- stress1 %>% filter(Stress.Test == 1)
stress.test.negative <- stress1 %>% filter(Stress.Test == 0)
# For the 2nd question:
stress2 <- subset(stress.echo, select = c(hxofMI, newMI, newPTCA, newCABG, death))
colnames(stress2) <- c("Hx.MI", "New.MI", "New.PTCA", "New.CABG", "Death")
# Subset this data into patients with history of MI and patients without history of MI.
With.MI <- stress2 %>% filter(Hx.MI == 1)
Without.MI <- stress2 %>% filter(Hx.MI == 0)
# For the 3rd question:
stress3 <- subset(stress.echo, select = c(age, newMI, newPTCA, newCABG, death))
colnames(stress3) <- c("Age", "New.MI", "New.PTCA", "New.CABG", "Death")
# The validated HEART score for risk-stratifying chest pain patients (Six AJ, Backus BE, Kelder JC. Chest pain in the emergency room: value of the HEART score. Netherlands Heart Journal. 2008;16(6):191-196) groups patients by age. Following that approach, I split the patients into < 45 years old, 45 to 65 years old, and > 65 years old.
Age.45 <- stress3 %>% filter(Age < 45)
Age.45.65 <- stress3 %>% filter(Age >= 45 & Age <= 65)
Age.65 <- stress3 %>% filter(Age > 65)
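The three filter() calls can also be expressed as a single grouping variable, which keeps all patients in one data frame and makes cross-tabulation easier. A base-R sketch with the same cut points as above (age 65 falls in the middle band); the helper name `age_band` is hypothetical:

```r
# Hypothetical helper: assign each age to the same bands as the filters above
# (< 45, 45 to 65 inclusive, > 65).
age_band <- function(age) {
  band <- ifelse(age < 45, "<45", ifelse(age <= 65, "45-65", ">65"))
  factor(band, levels = c("<45", "45-65", ">65"))
}

# Toy ages (hypothetical) including the boundary values 45 and 65
ages <- c(30, 44, 45, 60, 65, 66, 90)
table(age_band(ages))
```

With the real data, `table(age_band(stress3$Age))` should reproduce the group sizes that describe() reports for the three subsets (22, 193, and 343).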
The important questions for emergency medicine physicians are:
Is a positive stress test associated with a higher probability of an adverse outcome, such as MI, CABG/PTCA, or death, within 1 year?
Does a history of a previous MI put a patient at higher risk of adverse outcomes within 1 year of the stress test?
Are older patients more likely than younger patients to have an adverse outcome within 1 year of the stress test?
dim(stress.echo)
## [1] 558 32
Each observation represents a patient who had undergone a stress test. There are 558 observations in this dataset.
These data were obtained via the UCLA Statistics website. The original dataset comes from a medical study:
Garfinkel, Alan, et al. “Prognostic Value of Dobutamine Stress Echocardiography in Predicting Cardiac Events in Patients With Known or Suspected Coronary Artery Disease.” Journal of the American College of Cardiology 33.3 (1999): 708-16.
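This is an observational study, not an experiment: the explanatory variables (stress test result, history of MI, and age) were observed rather than assigned, so the analysis can show associations but not causation.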
The data file was obtained from the Department of Biostatistics at Vanderbilt University. The .csv file was downloaded and uploaded to GitHub, and it is read into R directly from its GitHub URL with read.csv().
Question 1: The response variable is an adverse outcome (MI, CABG/PTCA, or death within 1 year), which is categorical (binary: yes/no).
Question 2: The response variable is an adverse outcome (MI, CABG/PTCA, or death within 1 year), which is categorical (binary: yes/no).
Question 3: The response variable is an adverse outcome (MI, CABG/PTCA, or death within 1 year), which is categorical (binary: yes/no).
Question 1: The explanatory variable is whether the stress test is positive or negative (categorical).
Question 2: The explanatory variable is a history of a previous MI (categorical).
Question 3: The explanatory variable is age (numerical), which is also grouped into three age bands for the analysis.
# Question 1
# This question looks at the stress test itself. A positive stress test means that the echocardiogram showed abnormal findings during the stress test, which suggests that the patient has some form of heart disease, such as coronary artery disease. The question is whether a positive stress test is associated with a higher, statistically significant rate of adverse outcomes (as defined in this study).
# We'll start with the describe() function from the psych package to summarize this subset.
describe(stress1)
## vars n mean sd median trimmed mad min max range skew
## Stress.Test 1 558 0.24 0.43 0 0.18 0 0 1 1 1.19
## New.MI 2 558 0.05 0.22 0 0.00 0 0 1 1 4.11
## New.PTCA 3 558 0.05 0.21 0 0.00 0 0 1 1 4.20
## New.CABG 4 558 0.06 0.24 0 0.00 0 0 1 1 3.73
## Death 5 558 0.04 0.20 0 0.00 0 0 1 1 4.49
## kurtosis se
## Stress.Test -0.58 0.02
## New.MI 14.92 0.01
## New.PTCA 15.65 0.01
## New.CABG 11.92 0.01
## Death 18.22 0.01
# This gives a general picture of the data. Because each outcome is coded 0/1, its mean is the proportion of patients with that outcome: overall, about 5% of patients had a new MI, 5% a new PTCA, 6% a new CABG, and 4% died, regardless of whether their stress test was positive or negative.
# Let's split this further and compare the positive stress test group with the negative stress test group.
describe(stress.test.positive)
## vars n mean sd median trimmed mad min max range skew
## Stress.Test 1 136 1.00 0.00 1 1.00 0 1 1 0 NaN
## New.MI 2 136 0.10 0.31 0 0.01 0 0 1 1 2.58
## New.PTCA 3 136 0.10 0.30 0 0.00 0 0 1 1 2.72
## New.CABG 4 136 0.15 0.36 0 0.07 0 0 1 1 1.89
## Death 5 136 0.08 0.27 0 0.00 0 0 1 1 3.04
## kurtosis se
## Stress.Test NaN 0.00
## New.MI 4.71 0.03
## New.PTCA 5.44 0.03
## New.CABG 1.59 0.03
## Death 7.30 0.02
describe(stress.test.negative)
## vars n mean sd median trimmed mad min max range skew
## Stress.Test 1 422 0.00 0.00 0 0 0 0 0 0 NaN
## New.MI 2 422 0.03 0.18 0 0 0 0 1 1 5.19
## New.PTCA 3 422 0.03 0.18 0 0 0 0 1 1 5.19
## New.CABG 4 422 0.03 0.17 0 0 0 0 1 1 5.65
## Death 5 422 0.03 0.17 0 0 0 0 1 1 5.41
## kurtosis se
## Stress.Test NaN 0.00
## New.MI 25.04 0.01
## New.PTCA 25.04 0.01
## New.CABG 30.04 0.01
## Death 27.35 0.01
# table() counts, within each stress test group, the patients without a new MI (0) and with a new MI (1).
pos.stress.MI <- table(stress.test.positive$New.MI)
neg.stress.MI <- table(stress.test.negative$New.MI)
pos.stress.MI
##
## 0 1
## 122 14
neg.stress.MI
##
## 0 1
## 408 14
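The raw counts are equal (14 MIs in each group), but the groups differ in size (136 vs. 422 patients), so proportions are the fairer comparison. A sketch of a two-sample test of proportions with base R's prop.test(), using the counts printed above; whether this test is the right choice (versus, say, Fisher's exact test when counts are small) is a judgment call, so both are shown:

```r
# New-MI counts from the tables above
mi_events  <- c(positive = 14, negative = 14)   # patients with a new MI
n_patients <- c(positive = 136, negative = 422) # group sizes

# Two-sample test of equal proportions (with continuity correction)
mi_test <- prop.test(mi_events, n_patients)
mi_test$estimate  # proportion with a new MI in each group
mi_test$p.value

# Fisher's exact test on the same 2 x 2 table
mi_table <- rbind(events = mi_events, no_events = n_patients - mi_events)
fisher.test(mi_table)$p.value
```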
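# Bar plots of the number of MIs in each group (same y-axis scale so the two plots can be compared).
# The y-axis must extend past 408, the largest count, or that bar is clipped.
barplot(pos.stress.MI, main = "Myocardial Infarctions in 1 year with a Positive Stress Test", ylim = c(0, 450), names.arg = c("No MI", "MI"), col = c("darkblue", "red"))
barplot(neg.stress.MI, main = "Myocardial Infarctions in 1 year with a Negative Stress Test", ylim = c(0, 450), names.arg = c("No MI", "MI"), col = c("darkblue", "red"))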
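Because the two groups differ in size, bar heights based on counts are hard to compare. One alternative, sketched here, plots the proportion with a new MI in each group side by side; it reuses the counts printed above, and the colors and labels are arbitrary choices:

```r
# Proportion of patients with a new MI in each stress test group
mi_prop <- c(Positive = 14 / 136, Negative = 14 / 422)

# Bar plot of proportions so that groups of different sizes are comparable
mids <- barplot(mi_prop,
                main = "Proportion with a New MI within 1 Year",
                ylab = "Proportion of patients",
                ylim = c(0, 0.15),
                col = c("red", "darkblue"))
# Label each bar with its percentage
text(mids, mi_prop, labels = sprintf("%.1f%%", 100 * mi_prop), pos = 3)
```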
# Let's do the same for PTCA, CABG, and Death
pos.stress.PTCA <- table(stress.test.positive$New.PTCA)
neg.stress.PTCA <- table(stress.test.negative$New.PTCA)
pos.stress.PTCA
##
## 0 1
## 123 13
neg.stress.PTCA
##
## 0 1
## 408 14
pos.stress.CABG <- table(stress.test.positive$New.CABG)
neg.stress.CABG <- table(stress.test.negative$New.CABG)
pos.stress.CABG
##
## 0 1
## 115 21
neg.stress.CABG
##
## 0 1
## 410 12
pos.stress.Death <- table(stress.test.positive$Death)
neg.stress.Death <- table(stress.test.negative$Death)
pos.stress.Death
##
## 0 1
## 125 11
neg.stress.Death
##
## 0 1
## 409 13
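The four pairs of tables can be collected into one summary. This sketch computes, for each outcome, the event rate in each group, the risk difference and risk ratio (positive over negative), and a prop.test() p-value; the counts are copied from the tables above and the column names are hypothetical:

```r
# Event counts copied from the tables above (1 = event within 1 year)
outcomes <- data.frame(
  outcome    = c("MI", "PTCA", "CABG", "Death"),
  pos_events = c(14, 13, 21, 11),
  neg_events = c(14, 14, 12, 13)
)
n_pos <- 136  # patients with a positive stress test
n_neg <- 422  # patients with a negative stress test

# Event rates in each group; the positive-group rate is also the
# positive predictive value of the stress test for that outcome
outcomes$rate_pos   <- outcomes$pos_events / n_pos
outcomes$rate_neg   <- outcomes$neg_events / n_neg
outcomes$risk_diff  <- outcomes$rate_pos - outcomes$rate_neg
outcomes$risk_ratio <- outcomes$rate_pos / outcomes$rate_neg

# Two-sample proportion test p-value for each outcome
outcomes$p_value <- mapply(
  function(a, b) prop.test(c(a, b), c(n_pos, n_neg))$p.value,
  outcomes$pos_events, outcomes$neg_events
)
print(outcomes, digits = 3)
```

Reading the rates this way makes the point of the introduction concrete: for each individual outcome, most patients with a positive test (roughly 85% to 92%) did not experience that event within the year.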
# Question 2
# Some risk factors are thought to be associated with a higher risk of adverse outcomes. In this subset, we examine a prior history of MI as a risk factor.
describe(stress2)
## vars n mean sd median trimmed mad min max range skew kurtosis
## Hx.MI 1 558 0.28 0.45 0 0.22 0 0 1 1 1.00 -1.00
## New.MI 2 558 0.05 0.22 0 0.00 0 0 1 1 4.11 14.92
## New.PTCA 3 558 0.05 0.21 0 0.00 0 0 1 1 4.20 15.65
## New.CABG 4 558 0.06 0.24 0 0.00 0 0 1 1 3.73 11.92
## Death 5 558 0.04 0.20 0 0.00 0 0 1 1 4.49 18.22
## se
## Hx.MI 0.02
## New.MI 0.01
## New.PTCA 0.01
## New.CABG 0.01
## Death 0.01
# As in question 1, we split the patients into those with and without a history of MI and describe() each subset.
describe(With.MI)
## vars n mean sd median trimmed mad min max range skew kurtosis
## Hx.MI 1 154 1.00 0.00 1 1.00 0 1 1 0 NaN NaN
## New.MI 2 154 0.09 0.29 0 0.00 0 0 1 1 2.82 5.98
## New.PTCA 3 154 0.11 0.31 0 0.02 0 0 1 1 2.46 4.09
## New.CABG 4 154 0.08 0.27 0 0.00 0 0 1 1 3.12 7.78
## Death 5 154 0.06 0.24 0 0.00 0 0 1 1 3.73 11.98
## se
## Hx.MI 0.00
## New.MI 0.02
## New.PTCA 0.03
## New.CABG 0.02
## Death 0.02
describe(Without.MI)
## vars n mean sd median trimmed mad min max range skew kurtosis
## Hx.MI 1 404 0.00 0.00 0 0 0 0 0 0 NaN NaN
## New.MI 2 404 0.03 0.18 0 0 0 0 1 1 5.07 23.76
## New.PTCA 3 404 0.02 0.16 0 0 0 0 1 1 6.09 35.24
## New.CABG 4 404 0.05 0.22 0 0 0 0 1 1 4.02 14.21
## Death 5 404 0.04 0.19 0 0 0 0 1 1 4.88 21.85
## se
## Hx.MI 0.00
## New.MI 0.01
## New.PTCA 0.01
## New.CABG 0.01
## Death 0.01
# As in question 1, we make a table for each adverse outcome in each group.
With.HxMI.NewMI <- table(With.MI$New.MI)
Without.HxMI.NewMI <- table(Without.MI$New.MI)
With.HxMI.NewMI
##
## 0 1
## 140 14
Without.HxMI.NewMI
##
## 0 1
## 390 14
With.HxMI.NewPTCA <- table(With.MI$New.PTCA)
Without.HxMI.NewPTCA <- table(Without.MI$New.PTCA)
With.HxMI.NewPTCA
##
## 0 1
## 137 17
Without.HxMI.NewPTCA
##
## 0 1
## 394 10
With.HxMI.NewCABG <- table(With.MI$New.CABG)
Without.HxMI.NewCABG <- table(Without.MI$New.CABG)
With.HxMI.NewCABG
##
## 0 1
## 142 12
Without.HxMI.NewCABG
##
## 0 1
## 383 21
With.HxMI.Death <- table(With.MI$Death)
Without.HxMI.Death <- table(Without.MI$Death)
With.HxMI.Death
##
## 0 1
## 145 9
Without.HxMI.Death
##
## 0 1
## 389 15
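To put a number on “higher risk”, the counts above can be turned into risk ratios (history of MI vs. no history). This sketch uses the log-scale normal approximation (Katz method) for a 95% confidence interval, a standard but approximate method that is less reliable when event counts are small; the helper name `rr_ci` is hypothetical:

```r
# Event counts copied from the tables above (history of MI vs. no history)
hx_counts <- data.frame(
  outcome    = c("MI", "PTCA", "CABG", "Death"),
  with_hx    = c(14, 17, 12, 9),   # events among 154 patients with a prior MI
  without_hx = c(14, 10, 21, 15)   # events among 404 patients without a prior MI
)
n_with <- 154
n_without <- 404

# Hypothetical helper: risk ratio (group 1 vs. group 2) with a confidence
# interval from the normal approximation on the log scale (Katz method).
rr_ci <- function(a, n1, b, n2, level = 0.95) {
  rr <- (a / n1) / (b / n2)
  se_log <- sqrt(1 / a - 1 / n1 + 1 / b - 1 / n2)
  z <- qnorm(1 - (1 - level) / 2)
  c(rr = rr, lower = rr * exp(-z * se_log), upper = rr * exp(z * se_log))
}

# One row per outcome: point estimate and interval bounds
rr_table <- t(mapply(rr_ci, hx_counts$with_hx, n_with,
                     hx_counts$without_hx, n_without))
rownames(rr_table) <- hx_counts$outcome
round(rr_table, 2)
```

An interval that lies entirely above 1 suggests a higher risk among patients with a prior MI at roughly the 5% level; an interval that includes 1 does not.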
# Question 3
# It seems intuitive that older patients are at higher risk of adverse outcomes. Again, let's describe() each of the three age subsets.
describe(Age.45)
## vars n mean sd median trimmed mad min max range skew
## Age 1 22 36.45 5.41 38 36.72 5.93 26 44 18 -0.43
## New.MI 2 22 0.00 0.00 0 0.00 0.00 0 0 0 NaN
## New.PTCA 3 22 0.00 0.00 0 0.00 0.00 0 0 0 NaN
## New.CABG 4 22 0.05 0.21 0 0.00 0.00 0 1 1 4.07
## Death 5 22 0.00 0.00 0 0.00 0.00 0 0 0 NaN
## kurtosis se
## Age -1.20 1.15
## New.MI NaN 0.00
## New.PTCA NaN 0.00
## New.CABG 15.27 0.05
## Death NaN 0.00
describe(Age.45.65)
## vars n mean sd median trimmed mad min max range skew
## Age 1 193 57.47 5.83 59 57.87 7.41 45 65 20 -0.48
## New.MI 2 193 0.04 0.20 0 0.00 0.00 0 1 1 4.57
## New.PTCA 3 193 0.04 0.20 0 0.00 0.00 0 1 1 4.57
## New.CABG 4 193 0.03 0.17 0 0.00 0.00 0 1 1 5.36
## Death 5 193 0.03 0.16 0 0.00 0.00 0 1 1 5.92
## kurtosis se
## Age -0.96 0.42
## New.MI 18.94 0.01
## New.PTCA 18.94 0.01
## New.CABG 26.89 0.01
## Death 33.25 0.01
describe(Age.65)
## vars n mean sd median trimmed mad min max range skew
## Age 1 343 74.88 6.49 74 74.32 7.41 66 93 27 0.66
## New.MI 2 343 0.06 0.23 0 0.00 0.00 0 1 1 3.75
## New.PTCA 3 343 0.06 0.23 0 0.00 0.00 0 1 1 3.87
## New.CABG 4 343 0.08 0.27 0 0.00 0.00 0 1 1 3.19
## Death 5 343 0.06 0.23 0 0.00 0.00 0 1 1 3.87
## kurtosis se
## Age -0.42 0.35
## New.MI 12.12 0.01
## New.PTCA 13.02 0.01
## New.CABG 8.21 0.01
## Death 13.02 0.01
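Because the three age groups are ordered, a test for trend in proportions can say more than pairwise comparisons. Base R provides prop.trend.test(), which takes event counts and group sizes. The sketch below rebuilds the same age bands as the subsets above, assumes every band contains at least one patient, and runs on hypothetical toy data; the helper name `age_trend_test` is hypothetical:

```r
# Hypothetical helper: chi-squared test for a linear trend in an event rate
# across the ordered age bands used above (< 45, 45 to 65 inclusive, > 65).
age_trend_test <- function(age, event) {
  band <- factor(ifelse(age < 45, "<45", ifelse(age <= 65, "45-65", ">65")),
                 levels = c("<45", "45-65", ">65"))
  events <- tapply(event, band, sum)     # events per age band
  totals <- tapply(event, band, length)  # patients per age band
  list(events = events, totals = totals,
       test = prop.trend.test(as.numeric(events), as.numeric(totals)))
}

# Hypothetical toy data (not from the study): 120 patients in three bands
toy_age   <- c(rep(40, 20), rep(55, 40), rep(75, 60))
toy_event <- c(rep(0, 20), rep(c(0, 1), c(38, 2)), rep(c(0, 1), c(52, 8)))
result <- age_trend_test(toy_age, toy_event)
result$events
result$test
```

With the real data, a call such as `age_trend_test(stress3$Age, stress3$Death)` would apply the same test to deaths. The youngest band has only 22 patients and no recorded MIs, PTCAs, or deaths, so any trend estimate rests mainly on the two older groups.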
# Graphics describing the age distribution of the patients in this sample.
# The distribution appears left-skewed, with a longer tail toward younger ages.
ggplot(stress3, aes(x = Age)) + geom_dotplot(binwidth = 1) + labs(title = "Age of Patients")
qqnorm(stress3$Age)
qqline(stress3$Age)