W2Exercise6

MoDS Week 2 Exercise 6

To complete this exercise, you will use a selection of data from the Duke cardiac catheterization coronary artery disease diagnostic dataset provided below (Department of Biostatistics, 2020).

Sex	Age	Duration symptoms coronary artery disease (days)	Cholesterol level mg	Significant coronary disease	Three vessel or left main disease
F	63	medium	192	no	no
M	62	medium	222	yes	yes
M	56	long	224	yes	yes
F	59	short	286	no	no
M	38	medium	275	yes	no
M	52	medium	204	yes	no
M	74	long	285	yes	yes
M	44	short	159	yes	no
M	65	long	205	yes	yes
F	63	short	312	no	no

Table 2.1.1. Selection of data from the Duke cardiac catheterization coronary artery disease diagnostic dataset (Department of Biostatistics, 2020).

Copy and paste your R code for each of the following:

6a Enter a data frame containing the above data in R. (Hint: use read.csv and import the file Duke_Cardiac_Data.csv)

Duke_Cardiac_Data <- read.csv("Duke_Cardiac_Data.csv")
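If Duke_Cardiac_Data.csv is not at hand, the same data frame can be built by typing the values from Table 2.1.1 into data.frame(). This is a minimal sketch. The object name Duke_manual is hypothetical, and the column names are chosen to match the names read.csv() creates from the file header, as listed by str() in 6d.

```r
# Build the data frame by hand from Table 2.1.1 (assumes the CSV is unavailable).
# Column names mirror those read.csv() produces from the CSV header.
Duke_manual <- data.frame(
  Sex = c("F", "M", "M", "F", "M", "M", "M", "M", "M", "F"),
  Age = c(63L, 62L, 56L, 59L, 38L, 52L, 74L, 44L, 65L, 63L),
  Duration.symptoms.coronary.artery.disease..days. =
    c("medium", "medium", "long", "short", "medium",
      "medium", "long", "short", "long", "short"),
  Cholesterol.level.mg = c(192L, 222L, 224L, 286L, 275L, 204L, 285L, 159L, 205L, 312L),
  Significant.coronary.disease = c("no", "yes", "yes", "no", "yes",
                                   "yes", "yes", "yes", "yes", "no"),
  Three.vessel.or.left.main.disease = c("no", "yes", "yes", "no", "no",
                                        "no", "yes", "no", "yes", "no"),
  stringsAsFactors = FALSE   # keep text columns as character, like read.csv() in R >= 4.0
)
str(Duke_manual)
```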

6b What is the sample size of the dataset?

sample_size <- nrow(Duke_Cardiac_Data)  # number of rows = number of observations
sample_size

## [1] 10

6c How many variables are in the dataset?

sample_vars <- ncol(Duke_Cardiac_Data)  # number of columns = number of variables
sample_vars

## [1] 6

6d List the variables that are quantitative.

# str() shows the structure of the data frame. Quantitative variables are the ones stored as numbers (type num or int):
str(Duke_Cardiac_Data)

## 'data.frame':    10 obs. of  6 variables:
##  $ Sex                                             : chr  "F" "M" "M" "F" ...
##  $ Age                                             : int  63 62 56 59 38 52 74 44 65 63
##  $ Duration.symptoms.coronary.artery.disease..days.: chr  "medium" "medium" "long" "short" ...
##  $ Cholesterol.level.mg                            : int  192 222 224 286 275 204 285 159 205 312
##  $ Significant.coronary.disease                    : chr  "no" "yes" "yes" "no" ...
##  $ Three.vessel.or.left.main.disease               : chr  "no" "yes" "yes" "no" ...

# Two variables are quantitative: Age and Cholesterol.level.mg (both int)
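Instead of reading the types off str() by eye, the numeric columns can be picked out with base R's Filter() and is.numeric(). This sketch assumes, as str() shows, that read.csv() stores the quantitative columns as int. The small data frame duke_subset is a hypothetical stand-in built from the first three rows of Table 2.1.1, with two columns left out so the snippet runs on its own.

```r
# Hypothetical stand-in: first three rows of Table 2.1.1 (two columns omitted for brevity)
duke_subset <- data.frame(
  Sex = c("F", "M", "M"),
  Age = c(63L, 62L, 56L),
  Cholesterol.level.mg = c(192L, 222L, 224L),
  Significant.coronary.disease = c("no", "yes", "yes"),
  stringsAsFactors = FALSE
)

# Filter() keeps the columns for which is.numeric() is TRUE; names() returns their names
quantitative_vars <- names(Filter(is.numeric, duke_subset))
quantitative_vars   # returns c("Age", "Cholesterol.level.mg")
```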

6e List the variables that are categorical. Further, define any categorical variables as ordinal or nominal.

# Sex; nominal
# Duration.symptoms.coronary.artery.disease..days.; ordinal (short < medium < long)
# Significant.coronary.disease; nominal
# Three.vessel.or.left.main.disease; nominal
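To make R treat these variables as categorical, they can be converted with factor(). In this minimal sketch, the level order short < medium < long is an assumption read from the category labels. The vectors are the Duration and Sex columns of Table 2.1.1.

```r
# Duration and Sex columns from Table 2.1.1
duration <- c("medium", "medium", "long", "short", "medium",
              "medium", "long", "short", "long", "short")
sex <- c("F", "M", "M", "F", "M", "M", "M", "M", "M", "F")

# Ordinal: an ordered factor with an explicit level order (assumed short < medium < long)
duration_f <- factor(duration, levels = c("short", "medium", "long"), ordered = TRUE)

# Nominal: an unordered factor, so the level order carries no meaning
sex_f <- factor(sex)

duration_f[1] > duration_f[4]   # compares "medium" with "short"
# [1] TRUE
table(duration_f)               # counts per level, listed in the declared order
```

Only the ordered factor supports comparisons such as `>`; applying the same operator to sex_f gives NA with a warning, which is how R enforces the nominal/ordinal distinction.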

6f Compute the median, mean, standard deviation and coefficient of variation of cholesterol levels.

median_Duke <- median(Duke_Cardiac_Data$Cholesterol.level.mg)
mean_Duke <- mean(Duke_Cardiac_Data$Cholesterol.level.mg)
sd_Duke <- sd(Duke_Cardiac_Data$Cholesterol.level.mg)  # sample SD (n - 1 denominator)
CV_Duke <- sd_Duke / mean_Duke * 100                   # coefficient of variation, in percent
median_Duke

## [1] 223

mean_Duke

## [1] 236.4

sd_Duke

## [1] 49.87362

CV_Duke

## [1] 21.09713
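The four statistics can also be bundled into one small function, which helps when the same summary is needed for several columns. This is a minimal sketch. The helper name describe_numeric() is hypothetical, and the cholesterol values are typed in from Table 2.1.1 so the block runs on its own. It should reproduce the median, mean, SD and CV printed above.

```r
# Cholesterol levels (mg) from Table 2.1.1
chol <- c(192, 222, 224, 286, 275, 204, 285, 159, 205, 312)

# Hypothetical helper: median, mean, sample SD and CV (%) as one named vector
describe_numeric <- function(x) {
  m <- mean(x)
  s <- sd(x)  # sample standard deviation (n - 1 denominator)
  c(median = median(x), mean = m, sd = s, cv_percent = s / m * 100)
}

describe_numeric(chol)
```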

6g Compute the mode of the duration, significant coronary disease and three vessel or left main disease variables.

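# Given solution (I did not find a working approach myself). Base R has no built-in
# function for the statistical mode (mode() returns an object's storage mode), so the
# code counts how often each distinct value occurs and returns the most frequent one.
vals <- unique(Duke_Cardiac_Data$Duration.symptoms.coronary.artery.disease..days.)
vals[which.max(tabulate(match(Duke_Cardiac_Data$Duration.symptoms.coronary.artery.disease..days., vals)))]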

## [1] "medium"

vals <- unique(Duke_Cardiac_Data$Significant.coronary.disease)
vals[which.max(tabulate(match(Duke_Cardiac_Data$Significant.coronary.disease, vals)))]

## [1] "yes"

vals <- unique(Duke_Cardiac_Data$Three.vessel.or.left.main.disease)
vals[which.max(tabulate(match(Duke_Cardiac_Data$Three.vessel.or.left.main.disease, vals)))]

## [1] "no"
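The same counting idea can be wrapped in a small reusable function built on table() and which.max(). This is a sketch with a hypothetical helper name, stat_mode(). It differs from the given solution in one respect: table() sorts the values alphabetically, so a tie goes to the alphabetically first value rather than to the one that appears first in the data.

```r
# Hypothetical helper: most frequent value of a vector (first of any ties, in table() order)
stat_mode <- function(x) {
  counts <- table(x)                 # frequency of each distinct value, sorted by value
  names(counts)[which.max(counts)]   # name of the first maximum
}

# Significant coronary disease column from Table 2.1.1
sig_disease <- c("no", "yes", "yes", "no", "yes", "yes", "yes", "yes", "yes", "no")
stat_mode(sig_disease)
# [1] "yes"
```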

6h Compute the z-scores for the cholesterol level values.

# Subtract the mean first, then divide by the standard deviation
zscores <- (Duke_Cardiac_Data$Cholesterol.level.mg -
              mean(Duke_Cardiac_Data$Cholesterol.level.mg)) /
  sd(Duke_Cardiac_Data$Cholesterol.level.mg)
round(zscores, 3)

##  [1] -0.890 -0.289 -0.249  0.995  0.774 -0.650  0.974 -1.552 -0.630  1.516
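Base R's scale() performs the same centring and division by the sample standard deviation, so it gives a quick cross-check on the manual formula. This minimal sketch uses the cholesterol values from Table 2.1.1.

```r
# Cholesterol levels (mg) from Table 2.1.1
chol <- c(192, 222, 224, 286, 275, 204, 285, 159, 205, 312)

# Manual z-scores: subtract the mean first, then divide by the SD
z_manual <- (chol - mean(chol)) / sd(chol)

# scale() returns a one-column matrix with attributes; as.vector() drops them
z_scale <- as.vector(scale(chol))

all.equal(z_manual, z_scale)
# [1] TRUE
```

A correctly computed set of z-scores has mean 0 and standard deviation 1. Checking those two values is a fast way to catch an operator-precedence slip such as dividing only the mean by the SD.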

6i Estimate the standard error of the mean for cholesterol level.

SE_Duke <- sd_Duke/sqrt(sample_size)
SE_Duke

## [1] 15.77142
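As a worked check of the formula, the estimate divides the sample standard deviation from 6f by the square root of the sample size from 6b:

```latex
\mathrm{SE}(\bar{x}) = \frac{s}{\sqrt{n}} = \frac{49.87362}{\sqrt{10}} \approx \frac{49.87362}{3.16228} \approx 15.77
```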

6j Use R to produce a scatterplot using the cholesterol level and age data.

plot(Cholesterol.level.mg ~ Age, data = Duke_Cardiac_Data,
     xlab = "Age (years)", ylab = "Cholesterol level (mg)")
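A slightly fuller sketch of the same scatterplot uses the formula interface with data =, labels both axes, and colours the points by sex. The colours, point symbol and legend position are arbitrary choices. The values are typed in from Table 2.1.1 so the snippet runs without the CSV, and the "mg" axis label simply mirrors the table header.

```r
# Age, cholesterol and sex columns from Table 2.1.1
duke_plot <- data.frame(
  Age = c(63, 62, 56, 59, 38, 52, 74, 44, 65, 63),
  Cholesterol.level.mg = c(192, 222, 224, 286, 275, 204, 285, 159, 205, 312),
  Sex = factor(c("F", "M", "M", "F", "M", "M", "M", "M", "M", "F"))
)

point_cols <- c(F = "firebrick", M = "steelblue")  # arbitrary colour per sex

# Scatterplot of cholesterol against age, one colour per sex
plot(Cholesterol.level.mg ~ Age, data = duke_plot,
     col = point_cols[as.character(duke_plot$Sex)], pch = 19,
     xlab = "Age (years)", ylab = "Cholesterol level (mg)",
     main = "Cholesterol level vs age")
legend("topleft", legend = names(point_cols), col = point_cols, pch = 19,
       title = "Sex")
```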

Ben Hermann

19/03/2021