Park-Proposal

Determining whether a patient’s chest pain is the result of a disease process such as acute coronary syndrome can be difficult for physicians. When a patient arrives at the emergency department, a physician is responsible for determining whether the chest pain stems from a dangerous acute pathology or a benign one, such as heartburn or muscle cramps. In the medical literature, the consensus is that the patient’s description of the pain (its character, intensity, duration, and location) can be an unreliable predictor of bad cardiac outcomes. It does seem intuitive, however, that a patient who smokes, has poor eating habits, or has a history of hypertension, diabetes, or high cholesterol will have a higher pretest probability of underlying coronary artery disease.

Ultimately, if the chest pain is deemed likely to be secondary to atherosclerotic coronary arteries, a cardiologist must determine the severity of the heart disease. A very common method is the stress test. In this test, the patient walks on a treadmill until reaching a target heart rate (e.g., 150 beats per minute). Once the patient reaches that target, the cardiologist uses ultrasound to look for abnormal movements of the heart walls. Abnormal cardiac wall motion is considered a “positive” stress test and requires further testing and intervention (such as cardiac catheterization).

However, many patients are too debilitated to walk on the treadmill. Comorbidities such as arthritis or obesity can keep a patient from reaching the target heart rate, which limits how the exam can be interpreted. An alternative is the pharmacologic stress test, in this case dobutamine stress echocardiography. Instead of walking on a treadmill, the patient is injected with a medication, dobutamine, which artificially raises the heart rate to the target. On review of the literature, this test is reported to be more sensitive for coronary artery disease than the treadmill stress test. Again, any cardiac wall motion abnormality is considered a positive stress test and requires further testing and/or intervention.

Ultimately, what physicians care about is whether significant morbidity will follow a positive (or perhaps even a negative) stress test. In the study analyzed here, published in the “Journal of the American College of Cardiology” (1999), data were collected on patient demographics, medical history, whether the stress test was positive, and whether the patient sustained any “bad” outcome over the course of 1 year. A “bad” outcome was defined as any of the following: myocardial infarction (heart attack); the need for an invasive cardiac procedure, either percutaneous transluminal coronary angioplasty (PTCA, which implies that one of the patient’s coronary arteries was blocked and can more or less serve as a proxy for a heart attack) or coronary artery bypass grafting (CABG, a major cardiac surgery that “creates” new coronary arteries because the native ones are blocked or badly damaged); or death.

However, a positive stress test does not imply that a patient will have one of those adverse outcomes. If a stress test is positive but no medical consequence results from it, the medical community calls the result “clinically insignificant”.

Data Preparation

Load libraries.

library(RCurl)
library(psych)
library(tidyr)
library(dplyr)
library(ggplot2)

Import data.

# Import data in from Github
# In this case, import in the Stress Echo dataset.
# Garfinkel, Alan, et al. "Prognostic Value of Dobutamine Stress Echocardiography in Predicting Cardiac Events in Patients With Known or Suspected Coronary Artery Disease." Journal of the American College of Cardiology 33.3 (1999) 708-16.

stress.echo <- read.csv("https://raw.githubusercontent.com/jcp9010/MSDA/master/stressEcho.csv", header = TRUE)

# We will now subset our data to answer the 3 questions listed below.

# For the 1st question:
stress1 <- subset(stress.echo, select = c(posSE, newMI, newPTCA, newCABG, death))
colnames(stress1) <- c("Stress.Test", "New.MI", "New.PTCA", "New.CABG", "Death")
# We will need to subset these data sets again into positive stress test vs. negative stress test
stress.test.positive <- stress1 %>% filter(Stress.Test == 1)
stress.test.negative <- stress1 %>% filter(Stress.Test == 0)

# For the 2nd question:
stress2 <- subset(stress.echo, select = c(hxofMI, newMI, newPTCA, newCABG, death))
colnames(stress2) <- c("Hx.MI", "New.MI", "New.PTCA", "New.CABG", "Death")
# Subset this data into patients with history of MI and patients without history of MI.
With.MI <- stress2 %>% filter(Hx.MI == 1)
Without.MI <- stress2 %>% filter(Hx.MI == 0)

# For the 3rd question:
stress3 <- subset(stress.echo, select = c(age, newMI, newPTCA, newCABG, death))
colnames(stress3) <- c("Age", "New.MI", "New.PTCA", "New.CABG", "Death")
# A validated scoring system for risk-stratifying cardiac patients (Six AJ, Backus BE, Kelder JC. Chest pain in the emergency room: value of the HEART score. Netherlands Heart Journal. 2008;16(6):191-196) groups patients as < 45 years old, 45 to 65 years old, and > 65 years old. The same groups are used here.
Age.45 <- stress3 %>% filter(Age < 45)
Age.45.65 <- stress3 %>% filter(Age >= 45 & Age <= 65)
Age.65 <- stress3 %>% filter(Age > 65)
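
The three age filters above can also be expressed as a single grouping factor, which keeps every patient in one data frame. The following is a minimal sketch on made-up ages; the vector `toy.ages` and the names `age.group` and `age.counts` are hypothetical and are not part of the dataset.

```r
# Hypothetical ages for illustration only; not taken from stressEcho.csv.
toy.ages <- c(26, 44, 45, 59, 65, 66, 93)

# Same cut points as the filters above: < 45, 45 to 65 inclusive, > 65.
age.group <- ifelse(toy.ages < 45, "<45",
                    ifelse(toy.ages <= 65, "45-65", ">65"))
age.group <- factor(age.group, levels = c("<45", "45-65", ">65"))

# Count how many toy patients fall into each group.
age.counts <- table(age.group)
print(age.counts)
```

Applied to the real data, the same expression with `stress3$Age` in place of `toy.ages` should reproduce the three subsets as levels of one column.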

Research question

The important questions for an emergency medicine physician are:

1. Does a positive stress test lead to a higher probability of an adverse outcome, such as MI, CABG/PTCA, or death, within 1 year?
2. Does a history of a previous MI put a patient at higher risk for adverse outcomes within 1 year of the stress test?
3. Are older patients more likely than younger patients to have an adverse outcome within 1 year of the stress test?

Cases

dim(stress.echo)

## [1] 558  32

Each observation represents a patient who had undergone a stress test. There are 558 observations in this dataset.

Data collection

This data was obtained via the UCLA Statistics Web Site. The original data set was from a medical study.

Garfinkel, Alan, et al. “Prognostic Value of Dobutamine Stress Echocardiography in Predicting Cardiac Events in Patients With Known or Suspected Coronary Artery Disease.” Journal of the American College of Cardiology 33.3 (1999) 708-16.

Type of study

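This was an observational study. Patients were not randomly assigned to a stress test result or a risk factor; they underwent the stress test and their outcomes were recorded over the following year.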

Data Source

The data set was obtained from the Department of Biostatistics at Vanderbilt University. The .csv file was downloaded and uploaded to GitHub. The file is subsequently read into R from GitHub with read.csv(); the ‘RCurl’ package is loaded as well.

http://biostat.mc.vanderbilt.edu/wiki/Main/DataSets

Response

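Question 1: The response variable is an adverse outcome (MI, CABG/PTCA, or death within 1 year). Each outcome is recorded as 0 (did not occur) or 1 (occurred), so it is categorical (binary).

Question 2: The response variable is an adverse outcome (MI, CABG/PTCA, or death within 1 year). Each outcome is recorded as 0 (did not occur) or 1 (occurred), so it is categorical (binary).

Question 3: The response variable is an adverse outcome (MI, CABG/PTCA, or death within 1 year). Each outcome is recorded as 0 (did not occur) or 1 (occurred), so it is categorical (binary).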
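
All three questions share the same response, an adverse outcome within 1 year. One possible way to code it is as a single 0/1 indicator that is 1 when any of the four events occurred. The sketch below uses a small made-up data frame that only borrows the dataset's outcome column names (newMI, newPTCA, newCABG, death); the rows and the names `toy`, `any.adverse`, and `any.adverse.f` are hypothetical.

```r
# Hypothetical rows that reuse the dataset's outcome column names.
toy <- data.frame(
  newMI   = c(0, 1, 0, 0),
  newPTCA = c(0, 0, 0, 1),
  newCABG = c(0, 0, 0, 1),
  death   = c(0, 0, 0, 0)
)

# A patient has an adverse outcome if any of the four events occurred.
toy$any.adverse <- as.integer(
  toy$newMI == 1 | toy$newPTCA == 1 | toy$newCABG == 1 | toy$death == 1
)

# Treat the 0/1 indicator as a two-level categorical variable.
toy$any.adverse.f <- factor(toy$any.adverse, levels = c(0, 1),
                            labels = c("No event", "Event"))
print(table(toy$any.adverse.f))
```

A combined indicator avoids counting a patient twice when more than one event occurred, as in the last toy row.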

Explanatory

Question 1: The explanatory variable is: whether the stress test is positive or negative (categorical).

Question 2: The explanatory variable is: history of a previous MI (categorical).

Question 3: The explanatory variable is: age (numerical), grouped into three age categories for this analysis.

Relevant summary statistics


# Question 1
# This question looks at the stress test itself. A positive stress test means the echocardiogram showed abnormal findings during the test, which suggests some form of heart disease, such as coronary artery disease. The question is whether a positive stress test is associated with a higher rate of adverse outcomes (as defined in this study).

# We'll do a basic describe() function using the psych package to look at this particular subset.
describe(stress1)

##             vars   n mean   sd median trimmed mad min max range skew
## Stress.Test    1 558 0.24 0.43      0    0.18   0   0   1     1 1.19
## New.MI         2 558 0.05 0.22      0    0.00   0   0   1     1 4.11
## New.PTCA       3 558 0.05 0.21      0    0.00   0   0   1     1 4.20
## New.CABG       4 558 0.06 0.24      0    0.00   0   0   1     1 3.73
## Death          5 558 0.04 0.20      0    0.00   0   0   1     1 4.49
##             kurtosis   se
## Stress.Test    -0.58 0.02
## New.MI         14.92 0.01
## New.PTCA       15.65 0.01
## New.CABG       11.92 0.01
## Death          18.22 0.01

# This gives a general idea of what the data look like. Because each outcome is coded 0/1, the mean is the proportion of patients with that outcome: overall, about 5% had a new MI, 5% a new PTCA, 6% a new CABG, and 4% died, regardless of whether the stress test was positive or negative.

# Let's subset further and compare the positive stress test group with the negative stress test group.
describe(stress.test.positive)

##             vars   n mean   sd median trimmed mad min max range skew
## Stress.Test    1 136 1.00 0.00      1    1.00   0   1   1     0  NaN
## New.MI         2 136 0.10 0.31      0    0.01   0   0   1     1 2.58
## New.PTCA       3 136 0.10 0.30      0    0.00   0   0   1     1 2.72
## New.CABG       4 136 0.15 0.36      0    0.07   0   0   1     1 1.89
## Death          5 136 0.08 0.27      0    0.00   0   0   1     1 3.04
##             kurtosis   se
## Stress.Test      NaN 0.00
## New.MI          4.71 0.03
## New.PTCA        5.44 0.03
## New.CABG        1.59 0.03
## Death           7.30 0.02

describe(stress.test.negative)

##             vars   n mean   sd median trimmed mad min max range skew
## Stress.Test    1 422 0.00 0.00      0       0   0   0   0     0  NaN
## New.MI         2 422 0.03 0.18      0       0   0   0   1     1 5.19
## New.PTCA       3 422 0.03 0.18      0       0   0   0   1     1 5.19
## New.CABG       4 422 0.03 0.17      0       0   0   0   1     1 5.65
## Death          5 422 0.03 0.17      0       0   0   0   1     1 5.41
##             kurtosis   se
## Stress.Test      NaN 0.00
## New.MI         25.04 0.01
## New.PTCA       25.04 0.01
## New.CABG       30.04 0.01
## Death          27.35 0.01

# With the table() function, we can count the patients in each stress test group without (0) and with (1) a new MI.
pos.stress.MI <- table(stress.test.positive$New.MI)
neg.stress.MI <- table(stress.test.negative$New.MI)
pos.stress.MI

## 
##   0   1 
## 122  14

neg.stress.MI

## 
##   0   1 
## 408  14

# Let's use bar plots to see the number of MIs in each group (0 = no new MI, 1 = new MI).
# ylim is set to c(0, 450) so the axis covers the tallest bar (408 patients).
barplot(pos.stress.MI, main = "Myocardial Infarctions in 1 year with a Positive Stress Test", ylim = c(0, 450), col = c("darkblue", "red"))

barplot(neg.stress.MI, main = "Myocardial Infarctions in 1 year with a Negative Stress Test", ylim = c(0, 450), col = c("darkblue", "red"))
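
Because the two groups differ in size (136 vs. 422 patients), raw counts are hard to compare across the two plots. As a rough sketch, the same information can be shown as proportions in one plot, using the MI counts from the tables above; the name `mi.prop` and the axis limits are choices made for this illustration.

```r
# Counts from the tables above: patients with a new MI / group size.
mi.prop <- c(Positive = 14 / 136, Negative = 14 / 422)

# One plot, one bar per stress test group, on a proportion scale.
barplot(mi.prop,
        main = "Proportion with a New MI within 1 Year, by Stress Test Result",
        ylab = "Proportion of patients",
        ylim = c(0, 0.15),
        col = c("red", "darkblue"))
```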

# Let's do the same for PTCA, CABG, and Death
pos.stress.PTCA <- table(stress.test.positive$New.PTCA)
neg.stress.PTCA <- table(stress.test.negative$New.PTCA)
pos.stress.PTCA

## 
##   0   1 
## 123  13

neg.stress.PTCA

## 
##   0   1 
## 408  14

pos.stress.CABG <- table(stress.test.positive$New.CABG)
neg.stress.CABG <- table(stress.test.negative$New.CABG)
pos.stress.CABG

## 
##   0   1 
## 115  21

neg.stress.CABG

## 
##   0   1 
## 410  12

pos.stress.Death <- table(stress.test.positive$Death)
neg.stress.Death <- table(stress.test.negative$Death)
pos.stress.Death

## 
##   0   1 
## 125  11

neg.stress.Death

## 
##   0   1 
## 409  13
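
The counts above can be collected into one table to compare the two groups directly. The sketch below is one possible approach rather than the study's own method: it computes the proportion of patients with each outcome per group, the ratio of those proportions, and a two-sample test of equal proportions for new MI. The object names (`events`, `group.n`, `risk`, `risk.ratio`, `mi.test`) are introduced here for illustration. A patient may appear in more than one row, since the events are not mutually exclusive.

```r
# Event counts within 1 year, copied from the tables above.
events <- matrix(c(14, 14,
                   13, 14,
                   21, 12,
                   11, 13),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(Outcome = c("MI", "PTCA", "CABG", "Death"),
                                 Stress.Test = c("Positive", "Negative")))

# Group sizes from describe(): 136 positive and 422 negative stress tests.
group.n <- c(Positive = 136, Negative = 422)

# Divide each column by its group size to get the proportion with each outcome.
risk <- sweep(events, 2, group.n, "/")
print(round(risk, 3))

# Ratio of the positive-group proportion to the negative-group proportion.
risk.ratio <- risk[, "Positive"] / risk[, "Negative"]
print(round(risk.ratio, 2))

# Two-sample test of equal proportions for new MI.
mi.test <- prop.test(x = events["MI", ], n = group.n)
print(mi.test$p.value)
```

prop.test() applies a continuity correction by default; with event counts this small, an exact test may be worth considering as well.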

# Question 2
# Some risk factors are thought to be associated with higher risk of adverse outcomes. In this particular subset, we will be examining prior history of MI as a risk factor.
describe(stress2)

##          vars   n mean   sd median trimmed mad min max range skew kurtosis
## Hx.MI       1 558 0.28 0.45      0    0.22   0   0   1     1 1.00    -1.00
## New.MI      2 558 0.05 0.22      0    0.00   0   0   1     1 4.11    14.92
## New.PTCA    3 558 0.05 0.21      0    0.00   0   0   1     1 4.20    15.65
## New.CABG    4 558 0.06 0.24      0    0.00   0   0   1     1 3.73    11.92
## Death       5 558 0.04 0.20      0    0.00   0   0   1     1 4.49    18.22
##            se
## Hx.MI    0.02
## New.MI   0.01
## New.PTCA 0.01
## New.CABG 0.01
## Death    0.01

# Again, very similar to question 1. We will subset patients with history of MI versus patients WITHOUT history of MI and describe() the subset.
describe(With.MI)

##          vars   n mean   sd median trimmed mad min max range skew kurtosis
## Hx.MI       1 154 1.00 0.00      1    1.00   0   1   1     0  NaN      NaN
## New.MI      2 154 0.09 0.29      0    0.00   0   0   1     1 2.82     5.98
## New.PTCA    3 154 0.11 0.31      0    0.02   0   0   1     1 2.46     4.09
## New.CABG    4 154 0.08 0.27      0    0.00   0   0   1     1 3.12     7.78
## Death       5 154 0.06 0.24      0    0.00   0   0   1     1 3.73    11.98
##            se
## Hx.MI    0.00
## New.MI   0.02
## New.PTCA 0.03
## New.CABG 0.02
## Death    0.02

describe(Without.MI)

##          vars   n mean   sd median trimmed mad min max range skew kurtosis
## Hx.MI       1 404 0.00 0.00      0       0   0   0   0     0  NaN      NaN
## New.MI      2 404 0.03 0.18      0       0   0   0   1     1 5.07    23.76
## New.PTCA    3 404 0.02 0.16      0       0   0   0   1     1 6.09    35.24
## New.CABG    4 404 0.05 0.22      0       0   0   0   1     1 4.02    14.21
## Death       5 404 0.04 0.19      0       0   0   0   1     1 4.88    21.85
##            se
## Hx.MI    0.00
## New.MI   0.01
## New.PTCA 0.01
## New.CABG 0.01
## Death    0.01

# Similar to question 1, multiple tables can be made for each adverse outcome.
With.HxMI.NewMI <- table(With.MI$New.MI)
Without.HxMI.NewMI <- table(Without.MI$New.MI)
With.HxMI.NewMI

## 
##   0   1 
## 140  14

Without.HxMI.NewMI

## 
##   0   1 
## 390  14

With.HxMI.NewPTCA <- table(With.MI$New.PTCA)
Without.HxMI.NewPTCA <- table(Without.MI$New.PTCA)
With.HxMI.NewPTCA

## 
##   0   1 
## 137  17

Without.HxMI.NewPTCA

## 
##   0   1 
## 394  10

With.HxMI.NewCABG <- table(With.MI$New.CABG)
Without.HxMI.NewCABG <- table(Without.MI$New.CABG)
With.HxMI.NewCABG

## 
##   0   1 
## 142  12

Without.HxMI.NewCABG

## 
##   0   1 
## 383  21

With.HxMI.Death <- table(With.MI$Death)
Without.HxMI.Death <- table(Without.MI$Death)
With.HxMI.Death

## 
##   0   1 
## 145   9

Without.HxMI.Death

## 
##   0   1 
## 389  15
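
For Question 2, the With.MI and Without.MI counts for new MI form a 2x2 table. As a hedged sketch of how the association could be examined (not an analysis taken from the study), the code below rebuilds that table from the counts above, computes row proportions and the sample odds ratio, and runs Fisher's exact test. The names `hx.mi.table`, `odds.ratio`, and `hx.mi.fisher` are introduced for this example.

```r
# 2x2 table from the counts above: rows = history of MI, columns = new MI.
hx.mi.table <- matrix(c(140, 14,
                        390, 14),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(Hx.MI = c("Yes", "No"),
                                      New.MI = c("No", "Yes")))
print(hx.mi.table)

# Row proportions: share of each history group without and with a new MI.
print(round(prop.table(hx.mi.table, margin = 1), 3))

# Sample odds ratio of a new MI, history of MI vs. no history.
odds.ratio <- (hx.mi.table["Yes", "Yes"] / hx.mi.table["Yes", "No"]) /
              (hx.mi.table["No", "Yes"]  / hx.mi.table["No", "No"])
print(round(odds.ratio, 2))

# Fisher's exact test of association between history of MI and new MI.
hx.mi.fisher <- fisher.test(hx.mi.table)
print(hx.mi.fisher$p.value)
```

The same pattern can be repeated for PTCA, CABG, and death by swapping in the corresponding counts.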

# Question 3
# It seems intuitive that older patients would be at higher risk for adverse outcomes. Again, let's describe() the data for each of the three age groups.
describe(Age.45)

##          vars  n  mean   sd median trimmed  mad min max range  skew
## Age         1 22 36.45 5.41     38   36.72 5.93  26  44    18 -0.43
## New.MI      2 22  0.00 0.00      0    0.00 0.00   0   0     0   NaN
## New.PTCA    3 22  0.00 0.00      0    0.00 0.00   0   0     0   NaN
## New.CABG    4 22  0.05 0.21      0    0.00 0.00   0   1     1  4.07
## Death       5 22  0.00 0.00      0    0.00 0.00   0   0     0   NaN
##          kurtosis   se
## Age         -1.20 1.15
## New.MI        NaN 0.00
## New.PTCA      NaN 0.00
## New.CABG    15.27 0.05
## Death         NaN 0.00

describe(Age.45.65)

##          vars   n  mean   sd median trimmed  mad min max range  skew
## Age         1 193 57.47 5.83     59   57.87 7.41  45  65    20 -0.48
## New.MI      2 193  0.04 0.20      0    0.00 0.00   0   1     1  4.57
## New.PTCA    3 193  0.04 0.20      0    0.00 0.00   0   1     1  4.57
## New.CABG    4 193  0.03 0.17      0    0.00 0.00   0   1     1  5.36
## Death       5 193  0.03 0.16      0    0.00 0.00   0   1     1  5.92
##          kurtosis   se
## Age         -0.96 0.42
## New.MI      18.94 0.01
## New.PTCA    18.94 0.01
## New.CABG    26.89 0.01
## Death       33.25 0.01

describe(Age.65)

##          vars   n  mean   sd median trimmed  mad min max range skew
## Age         1 343 74.88 6.49     74   74.32 7.41  66  93    27 0.66
## New.MI      2 343  0.06 0.23      0    0.00 0.00   0   1     1 3.75
## New.PTCA    3 343  0.06 0.23      0    0.00 0.00   0   1     1 3.87
## New.CABG    4 343  0.08 0.27      0    0.00 0.00   0   1     1 3.19
## Death       5 343  0.06 0.23      0    0.00 0.00   0   1     1 3.87
##          kurtosis   se
## Age         -0.42 0.35
## New.MI      12.12 0.01
## New.PTCA    13.02 0.01
## New.CABG     8.21 0.01
## Death       13.02 0.01

# Graphics describing the distribution of the age of the patients in this sample.
# There appears to be a left skew to this data.
ggplot(stress3, aes(x = Age)) + geom_dotplot(binwidth = 1) + labs(title = "Age of Patients")

qqnorm(stress3$Age)
qqline(stress3$Age)