R project 3

Read the data and inspect it.

Read ClevelandHeart.csv into R and store it in a variable named clev.
Display the first few rows and the structure of the dataset.
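
The notebook output does not show the read step itself. A minimal sketch, assuming the file sits in the working directory and has a header row, would be `clev <- read.csv("ClevelandHeart.csv")`. The block below runs the same call on a small stand-in file so that it is runnable anywhere; the three rows and the names `demo_path` and `demo` are hypothetical and are not taken from the real data.

```r
# Hypothetical three-row stand-in for ClevelandHeart.csv, written to a
# temporary file so the example runs without the real data set.
demo_path <- tempfile(fileext = ".csv")
writeLines(c("Age,Sex,Chol,AHD",
             "63,male,233,No",
             "67,male,286,Yes",
             "41,female,204,No"),
           demo_path)

# Same call as for the project file: clev <- read.csv("ClevelandHeart.csv")
demo <- read.csv(demo_path)

head(demo)   # first rows
str(demo)    # column names and storage types
```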

head(clev)

str(clev)

'data.frame':   303 obs. of  12 variables:
 $ Age        : int  63 67 67 37 41 56 62 57 63 53 ...
 $ Sex        : chr  "male" "male" "male" "male" ...
 $ ChestPain  : chr  "typical" "asymptomatic" "asymptomatic" "nonanginal" ...
 $ RestBP     : int  145 160 120 130 130 120 140 120 130 140 ...
 $ Chol       : int  233 286 229 250 204 236 268 354 254 203 ...
 $ Fbs        : logi  TRUE FALSE FALSE FALSE FALSE FALSE ...
 $ RestECG    : int  2 2 2 0 2 0 2 0 2 2 ...
 $ MaxHR      : int  150 108 129 187 172 178 160 163 147 155 ...
 $ ExAng      : chr  "no" "yes" "yes" "no" ...
 $ Fluoroscopy: int  0 3 2 0 0 0 2 0 1 0 ...
 $ Thal       : chr  "fixed" "normal" "reversable" "normal" ...
 $ AHD        : chr  "No" "Yes" "Yes" "No" ...

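State which variables are categorical and which are numerical.

The integer columns (Age, RestBP, Chol, RestECG, MaxHR, Fluoroscopy) are stored as numerical variables. The character columns (Sex, ChestPain, ExAng, Thal, AHD) are categorical variables, and so is the logical column Fbs.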
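
The split by storage type can also be done in code rather than by eye. The sketch below uses a hypothetical three-row data frame (`demo`) with the same kinds of columns as clev; replacing `demo` with `clev` should give the grouping for the real data. Storage class is only a first pass: an integer column that holds a small set of codes may still be categorical in meaning.

```r
# Hypothetical mini data frame with the same storage types as clev.
demo <- data.frame(
  Age = c(63L, 67L, 37L),
  Sex = c("male", "male", "female"),
  Fbs = c(TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

classes <- sapply(demo, class)

# Character, logical, and factor columns are treated as categorical;
# integer and double columns are treated as numerical.
categorical <- names(classes)[classes %in% c("character", "logical", "factor")]
numerical   <- names(classes)[classes %in% c("integer", "numeric")]

categorical
numerical
```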

sapply(clev, class)

        Age         Sex   ChestPain      RestBP        Chol         Fbs 
  "integer" "character" "character"   "integer"   "integer"   "logical" 
    RestECG       MaxHR       ExAng Fluoroscopy        Thal         AHD 
  "integer"   "integer" "character"   "integer" "character" "character"

Nonparametric tests.

Use the Wilcoxon rank-sum test to compare Age between males and females.

wilcox.test(Age ~ Sex, data = clev)


    Wilcoxon rank sum test with continuity correction

data:  Age by Sex
W = 11222, p-value = 0.08352
alternative hypothesis: true location shift is not equal to 0
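
With p-value = 0.08352, which is above 0.05, the test does not give significant evidence at the 5% level of a location shift in Age between the sexes. To show where the W statistic comes from, the sketch below recomputes it by hand on two small hypothetical samples (`x` and `y` are made-up values, not the Age data): W is the rank sum of the first group minus its smallest possible value. With the formula interface, the first group is the first level of the grouping variable in sorted order.

```r
# Hypothetical values, chosen without ties so the exact test applies.
x <- c(1.1, 2.3, 4.0)        # first group
y <- c(3.2, 5.5, 6.1, 7.4)   # second group

ranks <- rank(c(x, y))                    # ranks in the pooled sample
rank_sum_x <- sum(ranks[seq_along(x)])    # rank sum of the first group

# W = rank sum of the first group minus n1 * (n1 + 1) / 2
w_by_hand <- rank_sum_x - length(x) * (length(x) + 1) / 2

res <- wilcox.test(x, y)
w_by_hand               # statistic computed by hand
unname(res$statistic)   # statistic reported by wilcox.test
res$p.value
```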

Use binom.test to test whether the proportion of AHD = Yes is equal to 0.5.

yes_count <- sum(clev$AHD == "Yes")
binom.test(x = yes_count, n = nrow(clev), p = 0.5)


    Exact binomial test

data:  yes_count and nrow(clev)
number of successes = 139, number of trials = 303, p-value =
0.1679
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.4016348 0.5166714
sample estimates:
probability of success 
             0.4587459
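
The numbers in this output can be checked by hand from the two counts it reports (139 successes in 303 trials). The sketch below is only an illustration of where the estimate and the p-value come from. It relies on the null value being 0.5, where the binomial distribution is symmetric, and on the observed count lying below n/2.

```r
n <- 303   # number of trials, from the output above
x <- 139   # number of AHD == "Yes", from the output above

p_hat <- x / n   # sample proportion

# Under p = 0.5 the distribution is symmetric, so for x < n / 2 the
# two-sided exact p-value is twice the lower tail P(X <= x).
p_by_hand <- 2 * pbinom(x, n, 0.5)

res <- binom.test(x, n, p = 0.5)
p_hat
p_by_hand
res$p.value
```

Since the p-value is above 0.05, the data do not contradict a 50% rate of AHD at the 5% level, which agrees with the confidence interval containing 0.5.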

Chi-squared tests.

Use table to make a contingency table for Sex and AHD.

contingency_table <- table(clev$Sex, clev$AHD)
contingency_table

Use chisq.test on the contingency table and state the conclusion.

chi_test <- chisq.test(contingency_table)
chi_test

Manually create the same 2 × 2 table using matrix, then run chisq.test(m, correct = F).

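# Placeholder counts: replace 10, 15, 20, 25 with the four counts printed by
# contingency_table (row by row) so that this really is the same table.
manual_matrix <- matrix(c(10, 15, 20, 25), nrow = 2, byrow = TRUE)
# table() orders rows alphabetically, so label the rows to match the counts entered.
dimnames(manual_matrix) <- list(Sex = c("Male", "Female"), AHD = c("No", "Yes"))
chisq.test(manual_matrix, correct = FALSE)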
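
To make the uncorrected test less of a black box, the sketch below recomputes its statistic from expected counts for the 2 × 2 matrix entered in the code above (counts 10, 15, 20, 25). The names `m`, `expected`, and `x2_by_hand` are introduced here for illustration only.

```r
m <- matrix(c(10, 15, 20, 25), nrow = 2, byrow = TRUE)

# Expected count in each cell: row total * column total / grand total
expected <- outer(rowSums(m), colSums(m)) / sum(m)

# Pearson statistic without continuity correction, on (2 - 1) * (2 - 1) = 1 df
x2_by_hand <- sum((m - expected)^2 / expected)
p_by_hand  <- pchisq(x2_by_hand, df = 1, lower.tail = FALSE)

res <- chisq.test(m, correct = FALSE)
x2_by_hand
unname(res$statistic)
p_by_hand
```

With `correct = TRUE` (the default for 2 × 2 tables) chisq.test applies Yates' continuity correction, so its statistic would be somewhat smaller than this hand calculation.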

Use chisq.test for a goodness-of-fit test on the counts of AHD under a 50%-50% null model.

ahd_counts <- table(clev$AHD)
chisq.test(ahd_counts, p = c(0.5, 0.5))
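
The goodness-of-fit statistic can be reproduced by hand from the counts reported by the binomial test earlier (139 "Yes" out of 303). The sketch below assumes AHD takes only the values "No" and "Yes" with no missing entries, so the "No" count is 303 - 139; the vector `observed` is built on that assumption.

```r
# Assumption: AHD has only "No" and "Yes", so No = 303 - 139.
observed <- c(No = 303 - 139, Yes = 139)
expected <- sum(observed) * c(0.5, 0.5)   # 50%-50% null model

x2_by_hand <- sum((observed - expected)^2 / expected)
p_by_hand  <- pchisq(x2_by_hand, df = length(observed) - 1, lower.tail = FALSE)

res <- chisq.test(observed, p = c(0.5, 0.5))
x2_by_hand
unname(res$statistic)
p_by_hand
```

This is the large-sample counterpart of the exact binomial test on the same counts, so the two p-values should be close but not identical.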

One-way ANOVA.

Use ChestPain as the categorical variable and Chol as the numerical response.
Make a side-by-side boxplot with boxplot(Chol ~ ChestPain, data=clev).

boxplot(Chol ~ ChestPain, data = clev)

Fit a one-way ANOVA model using lm and anova.

model <- lm(Chol ~ ChestPain, data = clev)
anova(model)

Check the ANOVA conditions by making a QQ plot of the residuals and a plot of residuals versus fitted values.

qqnorm(residuals(model))
qqline(residuals(model))

plot(fitted(model), residuals(model))

State your conclusion in context.

Since the p-value (0.6007) is well above the usual 0.05 significance level, we fail to reject the null hypothesis that mean cholesterol is the same for every chest pain type.

The data therefore give no statistically significant evidence that mean cholesterol differs across chest pain types. This is not proof that the means are equal; it only says that the differences seen in this sample are consistent with chance variation.

Two-way ANOVA.

Use Sex and AHD as the two categorical variables, and use Chol as the response.
Make an interaction boxplot using boxplot(Chol ~ Sex+AHD, data=clev).

boxplot(Chol ~ Sex + AHD, data = clev)

Fit the model with interaction using lm(Chol ~ Sex*AHD, data=clev).

model_with_interaction <- lm(Chol ~ Sex * AHD, data = clev)
anova(model_with_interaction)

Analysis of Variance Table

Response: Chol
           Df Sum Sq Mean Sq F value    Pr(>F)    
Sex         1  32357   32357 12.7363 0.0004176 ***
AHD         1  17309   17309  6.8132 0.0095044 ** 
Sex:AHD     1    333     333  0.1310 0.7176778    
Residuals 299 759618    2541                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Fit the model without interaction using lm(Chol ~ Sex+AHD, data=clev).

model_without_interaction <- lm(Chol ~ Sex + AHD, data = clev)
anova(model_without_interaction)

Analysis of Variance Table

Response: Chol
           Df Sum Sq Mean Sq F value    Pr(>F)    
Sex         1  32357   32357  12.773 0.0004095 ***
AHD         1  17309   17309   6.833 0.0094005 ** 
Residuals 300 759950    2533                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Compare the two ANOVA tables and determine whether the interaction term is significant.

The p-value for the Sex:AHD interaction term in the model with interaction is 0.7177. Since this is much greater than 0.05, the interaction term is not statistically significant. The Sex and AHD rows are nearly unchanged between the two tables, which is consistent with the interaction adding very little.

This means there is no evidence that the effect of Sex on Chol depends on AHD, or vice versa. In other words, the data are consistent with the Sex difference in Chol being the same in both AHD groups, and the AHD difference in Chol being the same for both sexes.
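
The comparison of the two tables can be made explicit with a nested-model F test built from the printed residual rows. The sketch below uses only the rounded numbers from the two ANOVA tables, so it matches the printed F value (0.1310) up to rounding. On the fitted models themselves, `anova(model_without_interaction, model_with_interaction)` performs this same comparison.

```r
# Residual sums of squares and degrees of freedom, copied from the tables.
rss_add <- 759950; df_add <- 300   # Chol ~ Sex + AHD
rss_int <- 759618; df_int <- 299   # Chol ~ Sex * AHD

# F = (drop in RSS per extra parameter) / (residual mean square of larger model)
f_stat <- ((rss_add - rss_int) / (df_add - df_int)) / (rss_int / df_int)
p_val  <- pf(f_stat, df_add - df_int, df_int, lower.tail = FALSE)

f_stat
p_val
```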

Check the residual plots for the model without interaction.

Because the interaction is not significant, the model without interaction (model_without_interaction) is preferred: it is simpler, and its residual sum of squares (759950) is almost the same as that of the interaction model (759618).
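
No residual plots are shown for this part. A minimal sketch of how they could be drawn follows, using simulated data so that it runs on its own: the data frame `sim`, its effect sizes, and the model name `fit_add` are hypothetical and say nothing about the Cleveland data. For the project, the same plotting calls would be applied to model_without_interaction.

```r
set.seed(1)   # simulated, hypothetical data; not the Cleveland values
sim <- data.frame(
  Sex = rep(c("female", "male"), each = 40),
  AHD = rep(c("No", "Yes"), times = 40)
)
sim$Chol <- 240 + 10 * (sim$Sex == "female") + 8 * (sim$AHD == "Yes") +
  rnorm(80, sd = 50)

fit_add <- lm(Chol ~ Sex + AHD, data = sim)   # additive two-way model

op <- par(mfrow = c(1, 2))                    # two plots side by side
plot(fitted(fit_add), residuals(fit_add),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)                        # reference line at zero
qqnorm(residuals(fit_add))
qqline(residuals(fit_add))
par(op)                                       # restore the previous layout
```

In the first plot, look for roughly equal vertical spread at each fitted value; in the QQ plot, look for points close to the line.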

Correlation.

Compute the correlation between Age and MaxHR.

correlation_result <- cor.test(clev$Age, clev$MaxHR)

Report the correlation coefficient, confidence interval, and p-value.

correlation_result


    Pearson's product-moment correlation

data:  clev$Age and clev$MaxHR
t = -7.4329, df = 301, p-value = 1.109e-12
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.4849644 -0.2941816
sample estimates:
       cor 
-0.3938058
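
The reported values are r = -0.394, 95% CI (-0.485, -0.294), and p-value = 1.109e-12, so there is a moderate negative linear association between Age and MaxHR. As a check on how these pieces fit together, the sketch below recomputes the t statistic and the interval from r and the sample size alone, using the standard t transformation and Fisher z interval for a Pearson correlation. The variable names are introduced here for illustration.

```r
r <- -0.3938058   # sample correlation, from the output above
n <- 303          # number of observations (df = n - 2 = 301)

# t statistic for H0: correlation = 0
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_val  <- 2 * pt(-abs(t_stat), df = n - 2)

# 95% interval via Fisher's z: transform, add the margin, transform back
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

t_stat
p_val
ci
```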

Least squares regression.

Use Age as the explanatory variable and MaxHR as the response variable.
Make a scatterplot and add the least squares line using abline.

plot(clev$Age, clev$MaxHR, xlab = "Age", ylab = "MaxHR")
abline(lm(MaxHR ~ Age, data = clev), col = "pink")

Fit the regression line using lm.

model <- lm(MaxHR ~ Age, data = clev)

Display the regression summary using summary.

summary(model)


Call:
lm(formula = MaxHR ~ Age, data = clev)

Residuals:
    Min      1Q  Median      3Q     Max 
-66.088 -12.040   3.965  15.937  44.955 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 203.8634     7.3991  27.553  < 2e-16 ***
Age          -0.9966     0.1341  -7.433 1.11e-12 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 21.06 on 301 degrees of freedom
Multiple R-squared:  0.1551,    Adjusted R-squared:  0.1523 
F-statistic: 55.25 on 1 and 301 DF,  p-value: 1.109e-12

Compute 95% confidence intervals for the regression coefficients using confint.

confint(model)

                 2.5 %      97.5 %
(Intercept) 189.302961 218.4238178
Age          -1.260505  -0.7327788
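
The slope interval says that each additional year of age is associated with a drop in MaxHR of roughly 0.73 to 1.26 beats per minute on average. The sketch below rebuilds that interval from the rounded estimate and standard error in the regression summary, and evaluates the fitted line at one age. The age of 50 is an arbitrary illustrative value inside the observed range, and the results agree with confint only up to rounding of the inputs.

```r
# Rounded values copied from the regression summary.
intercept <- 203.8634
slope     <- -0.9966
se_slope  <- 0.1341
df_resid  <- 301

# 95% interval: estimate +/- t quantile * standard error
ci_slope <- slope + c(-1, 1) * qt(0.975, df_resid) * se_slope

# Fitted MaxHR at a hypothetical age of 50
pred_50 <- intercept + slope * 50

ci_slope
pred_50
```

With the fitted model object, `predict(model, newdata = data.frame(Age = 50))` gives the same fitted value without retyping coefficients.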

Check the residual plots for the regression model.

par(mfrow = c(1, 2))
plot(clev$Age, residuals(model), xlab = "Age", ylab = "Residuals")
abline(h = 0, col = "orange")

qqnorm(residuals(model))
qqline(residuals(model), col = "purple")

par(mfrow = c(1, 1))
