- Read the data and inspect it.
- Read ClevelandHeart.csv into R and store it in a variable named
clev.
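The read step itself is not shown. A minimal sketch, assuming ClevelandHeart.csv sits in the current working directory; `read_clev` is a hypothetical helper name, not part of the assignment:

```r
# Hypothetical helper: read the CSV into a data frame.
# The default path is an assumption (file in the current working directory).
read_clev <- function(path = "ClevelandHeart.csv") {
  # stringsAsFactors = FALSE keeps text columns as character,
  # which matches the chr columns reported by str(clev).
  read.csv(path, stringsAsFactors = FALSE)
}

# Usage, once the file is in place:
# clev <- read_clev()
```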
- Display the first few rows and the structure of the dataset.
head(clev)
str(clev)
'data.frame': 303 obs. of 12 variables:
$ Age : int 63 67 67 37 41 56 62 57 63 53 ...
$ Sex : chr "male" "male" "male" "male" ...
$ ChestPain : chr "typical" "asymptomatic" "asymptomatic" "nonanginal" ...
$ RestBP : int 145 160 120 130 130 120 140 120 130 140 ...
$ Chol : int 233 286 229 250 204 236 268 354 254 203 ...
$ Fbs : logi TRUE FALSE FALSE FALSE FALSE FALSE ...
$ RestECG : int 2 2 2 0 2 0 2 0 2 2 ...
$ MaxHR : int 150 108 129 187 172 178 160 163 147 155 ...
$ ExAng : chr "no" "yes" "yes" "no" ...
$ Fluoroscopy: int 0 3 2 0 0 0 2 0 1 0 ...
$ Thal : chr "fixed" "normal" "reversable" "normal" ...
$ AHD : chr "No" "Yes" "Yes" "No" ...
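- State which variables are categorical and which are numerical.
Integer columns (Age, RestBP, Chol, MaxHR, Fluoroscopy) are numerical.
Character columns (Sex, ChestPain, ExAng, Thal, AHD) are categorical.
Fbs is logical, so it is also categorical. RestECG is stored as an
integer but shows only coded values (0 and 2 in the rows printed), so it
is better treated as categorical.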
sapply(clev, class)
Age Sex ChestPain RestBP Chol Fbs
"integer" "character" "character" "integer" "integer" "logical"
RestECG MaxHR ExAng Fluoroscopy Thal AHD
"integer" "integer" "character" "integer" "character" "character"
- Nonparametric tests.
- Use the Wilcoxon rank-sum test to compare Age between males and
females.
wilcox.test(Age ~ Sex, data = clev)
Wilcoxon rank sum test with continuity correction
data: Age by Sex
W = 11222, p-value = 0.08352
alternative hypothesis: true location shift is not equal to 0
- Use binom.test to test whether the proportion of AHD = Yes is equal
to 0.5.
yes_count <- sum(clev$AHD == "Yes")
binom.test(x = yes_count, n = nrow(clev), p = 0.5)
Exact binomial test
data: yes_count and nrow(clev)
number of successes = 139, number of trials = 303, p-value =
0.1679
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.4016348 0.5166714
sample estimates:
probability of success
0.4587459
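The exact test can be rerun from the two counts in the output alone (139 successes in 303 trials), which is a quick check on the reported estimate and p-value. A sketch, where `ahd_test` and `alpha` are illustrative names and 0.05 is an assumed significance level:

```r
# Counts copied from the binom.test output above.
yes_count <- 139
n_total   <- 303

ahd_test <- binom.test(x = yes_count, n = n_total, p = 0.5)

# Sample proportion of AHD = Yes, i.e. 139 / 303
unname(ahd_test$estimate)

# Assumed significance level; reject H0: p = 0.5 only if the p-value is below it.
alpha <- 0.05
ahd_test$p.value < alpha
```

With the reported p-value of 0.1679 the comparison is FALSE, so the data do not show at the 5% level that the proportion of AHD = Yes differs from 0.5.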
- Chi-squared tests.
- Use table to make a contingency table for Sex and AHD.
contingency_table <- table(clev$Sex, clev$AHD)
contingency_table
- Use chisq.test on the contingency table and state the
conclusion.
chi_test <- chisq.test(contingency_table)
chi_test
- Manually create the same 2 × 2 table using matrix, then run
chisq.test(m, correct = F).
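# Placeholder counts: replace 10, 15, 20, 25 with the four counts printed by
# contingency_table, entered row by row, so the matrix matches that table.
manual_matrix <- matrix(c(10, 15, 20, 25), nrow = 2, byrow = TRUE)
# table() sorts rows alphabetically, so the rows are female first, then male.
dimnames(manual_matrix) <- list(Sex = c("female", "male"), AHD = c("No", "Yes"))
chisq.test(manual_matrix, correct = FALSE)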
- Use chisq.test for a goodness-of-fit test on the counts of AHD if
the null model is 50%-50%
ahd_counts <- table(clev$AHD)
chisq.test(ahd_counts, p = c(0.5, 0.5))
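The goodness-of-fit statistic can be worked by hand from the counts: 139 Yes (from the binomial test output) and therefore 303 - 139 = 164 No, against 151.5 expected in each cell. A sketch of that arithmetic next to chisq.test; the variable names are illustrative:

```r
# Observed counts: 139 Yes is reported above; No = 303 - 139 = 164.
observed <- c(No = 164, Yes = 139)
expected <- sum(observed) * c(0.5, 0.5)   # 151.5 per cell under the 50%-50% null

# Pearson statistic: sum of (observed - expected)^2 / expected
x2_by_hand <- sum((observed - expected)^2 / expected)

# chisq.test applies no continuity correction to a one-way table,
# so its statistic should agree with the hand calculation.
gof <- chisq.test(observed, p = c(0.5, 0.5))
c(by_hand = x2_by_hand,
  chisq_test = unname(gof$statistic),
  df = unname(gof$parameter))
```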
- One-way ANOVA.
- Use ChestPain as the categorical variable and Chol as the numerical
response.
- Make a side-by-side boxplot with boxplot(Chol ~ ChestPain,
data = clev).
boxplot(Chol ~ ChestPain, data = clev)
- Fit a one-way ANOVA model using lm and anova.
model <- lm(Chol ~ ChestPain, data = clev)
anova(model)
- Check the ANOVA conditions by making a QQ plot of the residuals and
a plot of residuals versus fitted values.
qqnorm(residuals(model))
qqline(residuals(model))
plot(fitted(model), residuals(model))
- State your conclusion in context.
Since the p-value (0.6007) is well above the 0.05 significance level, we
fail to reject the null hypothesis that mean cholesterol is the same for
every chest pain type.
The data give no statistically significant evidence that mean cholesterol
differs across chest pain types. In simpler terms, average cholesterol
appears similar regardless of the type of chest pain a person reports.
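The decision rule used in this conclusion can be written as a small function. This is a sketch only: `anova_decision` is a hypothetical helper, and because the heart data are not bundled here it is demonstrated on R's built-in PlantGrowth data as a stand-in.

```r
# Hypothetical helper: take the first-row p-value from anova() and compare it to alpha.
anova_decision <- function(fit, alpha = 0.05) {
  p_value <- anova(fit)[["Pr(>F)"]][1]
  list(
    p_value  = p_value,
    decision = if (p_value < alpha) "reject H0" else "fail to reject H0"
  )
}

# Stand-in example on built-in data (not the heart data).
demo_fit <- lm(weight ~ group, data = PlantGrowth)
anova_decision(demo_fit)

# With the heart data loaded, the call would be: anova_decision(model)
```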
- Two-way ANOVA.
- Use Sex and AHD as the two categorical variables, and use Chol as
the response.
- Make an interaction boxplot using boxplot(Chol ~ Sex + AHD,
data = clev).
boxplot(Chol ~ Sex + AHD, data = clev)

- Fit the model with interaction using lm(Chol ~ Sex * AHD,
data = clev).
model_with_interaction <- lm(Chol ~ Sex * AHD, data = clev)
anova(model_with_interaction)
Analysis of Variance Table
Response: Chol
Df Sum Sq Mean Sq F value Pr(>F)
Sex 1 32357 32357 12.7363 0.0004176 ***
AHD 1 17309 17309 6.8132 0.0095044 **
Sex:AHD 1 333 333 0.1310 0.7176778
Residuals 299 759618 2541
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
- Fit the model without interaction using lm(Chol ~ Sex + AHD,
data = clev).
model_without_interaction <- lm(Chol ~ Sex + AHD, data = clev)
anova(model_without_interaction)
Analysis of Variance Table
Response: Chol
Df Sum Sq Mean Sq F value Pr(>F)
Sex 1 32357 32357 12.773 0.0004095 ***
AHD 1 17309 17309 6.833 0.0094005 **
Residuals 300 759950 2533
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
- Compare the two ANOVA tables and determine whether the interaction
term is significant.
The p-value for the Sex:AHD interaction term in the model with
interaction is 0.7177. Since this is well above 0.05, the interaction
term is not statistically significant. Dropping it leaves the Sex and AHD
rows almost unchanged (F = 12.77 and 6.83), and both main effects remain
significant.
This means there is no evidence that the effect of Sex on Chol depends on
AHD, or vice versa: the difference in mean Chol between the sexes is
about the same in both AHD groups, and the difference between AHD groups
is about the same for both sexes.
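The comparison can also be made numerically from the two printed tables. A sketch using only the rounded values shown in the output; the variable names are illustrative:

```r
# Residual sums of squares and degrees of freedom, copied from the two tables.
rss_with    <- 759618; df_with    <- 299   # Chol ~ Sex * AHD
rss_without <- 759950; df_without <- 300   # Chol ~ Sex + AHD

# Dropping the interaction moves its sum of squares into the residuals.
# The difference is 332, against 333 printed for Sex:AHD (rounding in the tables).
ss_interaction <- rss_without - rss_with

# F statistic for the interaction and its p-value on (1, 299) degrees of freedom.
df_diff <- df_without - df_with
f_interaction <- (ss_interaction / df_diff) / (rss_with / df_with)
p_interaction <- pf(f_interaction, df_diff, df_with, lower.tail = FALSE)
c(F = f_interaction, p = p_interaction)
```

With clev loaded, anova(model_without_interaction, model_with_interaction) performs the same comparison directly from the fitted models.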
- Check the residual plots for the model without interaction. Given
that the interaction is not significant, the model without interaction
(model_without_interaction) is generally preferred as it is simpler and
explains the data almost as well.
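The residual plots themselves are not drawn in this step. A minimal sketch of how they could be produced, following the same two checks used for the one-way ANOVA; `plot_residual_checks` is a hypothetical helper name:

```r
# Hypothetical helper: draw the two standard residual checks for a fitted lm.
plot_residual_checks <- function(fit) {
  res <- residuals(fit)
  old_par <- par(mfrow = c(1, 2))   # two panels side by side
  on.exit(par(old_par))             # restore the previous layout on exit

  # Residuals vs fitted values: look for roughly equal spread around zero.
  plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
  abline(h = 0, lty = 2)

  # Normal QQ plot: points close to the line support the normality condition.
  qqnorm(res)
  qqline(res)

  invisible(res)
}

# With the heart data loaded:
# plot_residual_checks(model_without_interaction)
```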
- Correlation.
- Compute the correlation between Age and MaxHR.
correlation_result <- cor.test(clev$Age, clev$MaxHR)
- Report the correlation coefficient, confidence interval, and
p-value.
correlation_result
Pearson's product-moment correlation
data: clev$Age and clev$MaxHR
t = -7.4329, df = 301, p-value = 1.109e-12
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.4849644 -0.2941816
sample estimates:
cor
-0.3938058
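The reported t statistic, confidence interval, and p-value all follow from the correlation coefficient and the sample size. A sketch that reproduces them from r = -0.3938058 and n = 303, using the Fisher z-transformation for the interval; the variable names are illustrative:

```r
r <- -0.3938058   # sample correlation from the output above
n <- 303          # number of observations, so df = n - 2 = 301

# t statistic for H0: correlation = 0
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

# 95% confidence interval via the Fisher z-transformation
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

round(c(t = t_stat, lower = ci[1], upper = ci[2]), 4)
p_value
```

The interval lies entirely below zero, so the negative association between Age and MaxHR is statistically significant, in line with the very small p-value.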
- Least squares regression.
- Use Age as the explanatory variable and MaxHR as the response
variable.
- Make a scatterplot and add the least squares line using abline.
plot(clev$Age, clev$MaxHR, xlab = "Age", ylab = "MaxHR")
abline(lm(MaxHR ~ Age, data = clev), col = "pink")

- Fit the regression line using lm.
model <- lm(MaxHR ~ Age, data = clev)
- Display the regression summary using summary.
summary(model)
Call:
lm(formula = MaxHR ~ Age, data = clev)
Residuals:
Min 1Q Median 3Q Max
-66.088 -12.040 3.965 15.937 44.955
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 203.8634 7.3991 27.553 < 2e-16 ***
Age -0.9966 0.1341 -7.433 1.11e-12 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 21.06 on 301 degrees of freedom
Multiple R-squared: 0.1551, Adjusted R-squared: 0.1523
F-statistic: 55.25 on 1 and 301 DF, p-value: 1.109e-12
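To make the fitted line concrete, the coefficients can be plugged in directly. A sketch using the rounded estimates from the summary; `predict_maxhr` is a hypothetical helper and the ages are arbitrary illustrative values:

```r
# Coefficients copied from the summary output (rounded to four decimals).
intercept <- 203.8634
slope_age <- -0.9966

# Hypothetical helper: predicted MaxHR for a given age under the fitted line.
predict_maxhr <- function(age) intercept + slope_age * age

# Each extra year of age lowers the predicted MaxHR by about 1.
predict_maxhr(c(40, 50, 60))

# Share of the variation in MaxHR explained by Age: the squared correlation,
# which agrees with the Multiple R-squared of 0.1551.
(-0.3938058)^2
```

With clev loaded, predict(model, newdata = data.frame(Age = 50)) gives the same kind of prediction from the unrounded coefficients.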
- Compute 95% confidence intervals for the regression coefficients using
confint.
confint(model)
2.5 % 97.5 %
(Intercept) 189.302961 218.4238178
Age -1.260505 -0.7327788
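These intervals are the estimate plus or minus a t critical value times the standard error. A sketch that rebuilds them from the rounded values in the regression summary, so the results match confint only to about three decimals; the variable names are illustrative:

```r
# Estimates and standard errors copied from the summary output.
est <- c("(Intercept)" = 203.8634, Age = -0.9966)
se  <- c("(Intercept)" = 7.3991,   Age = 0.1341)
df_resid <- 301

# 95% interval: estimate +/- t critical value * standard error
t_crit <- qt(0.975, df = df_resid)
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
round(ci, 3)
```

The interval for Age excludes 0, which is consistent with the small p-value for the slope.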
- Check the residual plots for the regression model.
par(mfrow = c(1, 2))
plot(clev$Age, residuals(model), xlab = "Age", ylab = "Residuals")
abline(h = 0, col = "orange")

qqnorm(residuals(model))
qqline(residuals(model), col = "purple")
par(mfrow = c(1, 1))
