I use the UC Irvine Machine Learning Repository, an online collection of many datasets. One of them is the Heart Disease dataset, http://archive.ics.uci.edu/ml/datasets/Heart+Disease, which contains four data sets collected at four different locations.
I choose the reprocessed Hungarian Heart Disease Data Set for this assignment. The link to the dataset is http://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/reprocessed.hungarian.data. The data was donated to the repository in 1988 and was last modified on July 23, 1996.
I download the dataset and read it into a table in R. I also check how much data I have by counting the columns and rows.
library(RCurl)
## Loading required package: bitops
x <- getURL("http://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/reprocessed.hungarian.data")
y <- read.table(text = x, header = FALSE)
head(y, 10)
## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14
## 1 40 1 2 140 289 0 0 172 0 0.0 -9 -9 -9 0
## 2 49 0 3 160 180 0 0 156 0 1.0 2 -9 -9 1
## 3 37 1 2 130 283 0 1 98 0 0.0 -9 -9 -9 0
## 4 48 0 4 138 214 0 0 108 1 1.5 2 -9 -9 3
## 5 54 1 3 150 -9 0 0 122 0 0.0 -9 -9 -9 0
## 6 39 1 3 120 339 0 0 170 0 0.0 -9 -9 -9 0
## 7 45 0 2 130 237 0 0 170 0 0.0 -9 -9 -9 0
## 8 54 1 2 110 208 0 0 142 0 0.0 -9 -9 -9 0
## 9 37 1 4 140 207 0 0 130 1 1.5 2 -9 -9 1
## 10 48 0 2 120 284 0 0 120 0 0.0 -9 -9 -9 0
ncol(y)
## [1] 14
nrow(y)
## [1] 294
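The rows printed above use -9 as a placeholder for missing values. As a minimal sketch, assuming the placeholder always appears in the file as exactly `-9`, `read.table()` can convert it to `NA` on import through its `na.strings` argument. The names `sample_text`, `y_na`, and `missing_per_column` are made up for this illustration, and the five sample rows are copied from the `head(y, 10)` output.

```r
# First five rows of the file, copied from the head(y, 10) output above.
sample_text <- "40 1 2 140 289 0 0 172 0 0.0 -9 -9 -9 0
49 0 3 160 180 0 0 156 0 1.0 2 -9 -9 1
37 1 2 130 283 0 1 98 0 0.0 -9 -9 -9 0
48 0 4 138 214 0 0 108 1 1.5 2 -9 -9 3
54 1 3 150 -9 0 0 122 0 0.0 -9 -9 -9 0"

# na.strings = "-9" tells read.table() to read the -9 placeholder as NA.
y_na <- read.table(text = sample_text, header = FALSE, na.strings = "-9")

# Count the missing values in each column.
missing_per_column <- colSums(is.na(y_na))
print(missing_per_column)
```

Reading the placeholder as `NA` would keep it out of numeric summaries such as a mean of V5, where a literal -9 would otherwise be treated as a real measurement.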
Next I create a subset of the data, named HHD, that keeps only the columns I want to work with. I also name the columns according to the data dictionary. The data dictionary for the “Heart Disease Data Set” can be found at http://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/heart-disease.names.
HHD <- data.frame(y[c(1:4, 7, 9)])
names(HHD) <- c("Age", "Sex", "Chest_Pain_Type", "Resting_Blood_Pressure", "Resting_EKG", "Exercise_Induced_Angina")
head(HHD, 3)
## Age Sex Chest_Pain_Type Resting_Blood_Pressure Resting_EKG
## 1 40 1 2 140 0
## 2 49 0 3 160 0
## 3 37 1 2 130 1
## Exercise_Induced_Angina
## 1 0
## 2 0
## 3 0
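The column selection and renaming above can also be written with a single lookup vector, which keeps each new name next to the position it came from. This is only a sketch: `y_demo`, `col_map`, and `HHD_demo` are hypothetical names, and `y_demo` is a one-row stand-in built from the first row of the output shown earlier.

```r
# Toy stand-in for y: one row with the default column names V1..V14.
y_demo <- as.data.frame(matrix(
  c(40, 1, 2, 140, 289, 0, 0, 172, 0, 0.0, -9, -9, -9, 0),
  nrow = 1
))

# Hypothetical lookup: new column name -> position in the raw file.
col_map <- c(Age = 1, Sex = 2, Chest_Pain_Type = 3,
             Resting_Blood_Pressure = 4, Resting_EKG = 7,
             Exercise_Induced_Angina = 9)

# Select the columns by position, then rename them in one step.
HHD_demo <- setNames(y_demo[col_map], names(col_map))
print(HHD_demo)
```

With this layout, adding or dropping a column means editing one entry instead of keeping two separate vectors in sync.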
I then recode the numeric codes in HHD, the subset of the Hungarian “Heart Disease Data Set”, into descriptive labels. The value -9 marks a missing entry.
HHD$Sex <- ifelse(HHD$Sex=="0", "female",
  ifelse(HHD$Sex=="1", "male",
    ifelse(HHD$Sex=="-9", "missing", "N/A")
  ))
HHD$Chest_Pain_Type <- ifelse(HHD$Chest_Pain_Type=="1", "typical angina",
  ifelse(HHD$Chest_Pain_Type=="2", "atypical angina",
    ifelse(HHD$Chest_Pain_Type=="3", "non-anginal pain",
      ifelse(HHD$Chest_Pain_Type=="4", "asymptomatic",
        ifelse(HHD$Chest_Pain_Type=="-9", "missing", "N/A")
      ))))
HHD$Resting_EKG <- ifelse(HHD$Resting_EKG=="0", "normal",
  ifelse(HHD$Resting_EKG=="1", "ST-T wave abnormality",
    ifelse(HHD$Resting_EKG=="2", "probable or definite left ventricular hypertrophy",
      ifelse(HHD$Resting_EKG=="-9", "missing", "N/A")
    )))
HHD$Exercise_Induced_Angina <- ifelse(HHD$Exercise_Induced_Angina=="0", "no",
  ifelse(HHD$Exercise_Induced_Angina=="1", "yes",
    ifelse(HHD$Exercise_Induced_Angina=="-9", "missing", "N/A")
  ))
head(HHD, 10)
## Age Sex Chest_Pain_Type Resting_Blood_Pressure
## 1 40 male atypical angina 140
## 2 49 female non-anginal pain 160
## 3 37 male atypical angina 130
## 4 48 female asymptomatic 138
## 5 54 male non-anginal pain 150
## 6 39 male non-anginal pain 120
## 7 45 female atypical angina 130
## 8 54 male atypical angina 110
## 9 37 male asymptomatic 140
## 10 48 female atypical angina 120
## Resting_EKG Exercise_Induced_Angina
## 1 normal no
## 2 normal no
## 3 ST-T wave abnormality no
## 4 normal yes
## 5 normal no
## 6 normal no
## 7 normal no
## 8 normal no
## 9 normal yes
## 10 normal no
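A possible alternative to the nested `ifelse()` calls is `factor()`, which maps codes to labels in one call. The sketch below uses a toy vector, `codes`, rather than the real column, and the name `chest_pain` is made up for the example. Note one difference from the code above: a code that is not listed in `levels` becomes `NA` rather than the string "N/A".

```r
# Toy vector of chest pain codes, including the -9 missing marker.
codes <- c(2, 3, 4, 1, -9)

# Each entry in `levels` is paired with the label at the same position.
chest_pain <- factor(
  codes,
  levels = c(1, 2, 3, 4, -9),
  labels = c("typical angina", "atypical angina",
             "non-anginal pain", "asymptomatic", "missing")
)

# Convert back to plain text to match the character columns used above.
print(as.character(chest_pain))
```

Keeping the codes and labels side by side in two short vectors may be easier to check against the data dictionary than several levels of nesting.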
I want to look at patients who had an abnormal EKG at rest and had angina (chest pain) with exercise. So, I create another subset, named SubHHD, which contains only the rows that meet both conditions.
SubHHD <- subset(HHD, Resting_EKG != "normal" & Resting_EKG !="missing" & Exercise_Induced_Angina == "yes" )
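The same filter can be sketched with plain logical indexing, which makes the two conditions explicit. The data frame `demo` and the names `keep` and `sub_demo` are invented for this illustration and are not part of the real data.

```r
# Toy data frame with the two columns the filter depends on.
demo <- data.frame(
  Resting_EKG = c("normal", "ST-T wave abnormality", "missing",
                  "ST-T wave abnormality"),
  Exercise_Induced_Angina = c("yes", "yes", "yes", "no"),
  stringsAsFactors = FALSE
)

# TRUE only where the EKG is neither normal nor missing AND angina is "yes".
keep <- !(demo$Resting_EKG %in% c("normal", "missing")) &
  demo$Exercise_Induced_Angina == "yes"

sub_demo <- demo[keep, ]
print(sub_demo)
```

Only the second toy row satisfies both conditions, so `sub_demo` has a single row.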
I reset the row names carried over from HHD so that the rows are renumbered from 1, and render the SubHHD table as an R Markdown table using the knitr package.
rownames(SubHHD) <- NULL
library(knitr)
kable(SubHHD, align = "c", caption = "Table 1: List of patients who had an abnormal EKG at rest and had angina with exercise.")
| Age | Sex | Chest_Pain_Type | Resting_Blood_Pressure | Resting_EKG | Exercise_Induced_Angina |
|---|---|---|---|---|---|
| 58 | male | atypical angina | 136 | ST-T wave abnormality | yes |
| 53 | male | asymptomatic | 124 | ST-T wave abnormality | yes |
| 54 | female | non-anginal pain | 130 | ST-T wave abnormality | yes |
| 52 | male | asymptomatic | 112 | ST-T wave abnormality | yes |
| 52 | male | asymptomatic | 160 | ST-T wave abnormality | yes |
| 57 | male | atypical angina | 140 | ST-T wave abnormality | yes |
| 65 | male | asymptomatic | 130 | ST-T wave abnormality | yes |
| 59 | female | asymptomatic | 130 | ST-T wave abnormality | yes |
| 56 | male | asymptomatic | 170 | ST-T wave abnormality | yes |
| 56 | male | asymptomatic | 150 | ST-T wave abnormality | yes |
| 61 | female | asymptomatic | 130 | ST-T wave abnormality | yes |
| 50 | male | asymptomatic | 140 | ST-T wave abnormality | yes |
| 47 | male | asymptomatic | 160 | ST-T wave abnormality | yes |
| 59 | male | asymptomatic | 140 | probable or definite left ventricular hypertrophy | yes |
| 50 | male | asymptomatic | 140 | ST-T wave abnormality | yes |
| 46 | male | asymptomatic | 110 | ST-T wave abnormality | yes |
| 53 | male | asymptomatic | 180 | ST-T wave abnormality | yes |
| 48 | male | asymptomatic | 122 | ST-T wave abnormality | yes |
| 45 | male | asymptomatic | 130 | ST-T wave abnormality | yes |
| 61 | male | asymptomatic | 125 | ST-T wave abnormality | yes |
| 57 | female | asymptomatic | 180 | ST-T wave abnormality | yes |
Let us see how many male and how many female patients in the Hungarian data had an abnormal EKG at rest and had angina (chest pain) with exercise. Using the plyr package, I see that only 4 female patients and 17 male patients meet my criteria.
library(plyr)
SubHHD2 <- count(SubHHD, 'Sex')
kable(SubHHD2, align = "c", caption = "Table 2: Number of patients by sex in table 1.")
| Sex | freq |
|---|---|
| female | 4 |
| male | 17 |
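The same tally can be reproduced in base R without plyr. This is a sketch, not the method used above: the vector `sex` is the Sex column of Table 1 copied row by row, and `sex_counts` is a name chosen for the example.

```r
# Sex column of Table 1, copied row by row (21 patients).
sex <- c("male", "male", "female", "male", "male", "male", "male",
         "female", "male", "male", "female", "male", "male", "male",
         "male", "male", "male", "male", "male", "male", "female")

# table() tallies each distinct value; as.data.frame() turns the tally
# into a data frame with one row per value.
sex_counts <- as.data.frame(table(Sex = sex), responseName = "freq")
print(sex_counts)
```

The resulting data frame has the columns `Sex` and `freq`, so it could be passed to `kable()` in the same way as SubHHD2.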