Introduction

In this project, we analyze the Cereal dataset from the Stat2 Data Page, which contains nutritional information for 36 breakfast cereals. The purpose of this analysis is to examine the relationship between Calories and Sugar content using classical and Bayesian correlation methods.

The objectives of this project are to:

  1. Create a scatterplot of Calories versus Sugar and compute the correlation coefficient.

  2. Conduct a hypothesis test to determine whether Calories and Sugar are correlated in the population and report a 95% confidence interval.

  3. Perform a Bayesian correlation analysis to estimate the posterior distribution of the true correlation and assess the associated uncertainty.

  4. Compare the frequentist 95% confidence interval with the Bayesian 95% credible interval and discuss how their interpretations differ.

This project applies statistical techniques from Chapters 1 and 2, including correlation, hypothesis testing, confidence intervals, and Bayesian inference.

Analysis

We explore the relationship between Calories and Sugar using graphical methods, classical correlation analysis, and Bayesian approaches.

Cereal <- read.csv("https://www.stat2.org/datasets/Cereal.csv")
head(Cereal)
##                  Cereal Calories Sugar Fiber
## 1 Common Sense Oat Bran      100     6     3
## 2            Product 19      100     3     1
## 3   All Bran Xtra Fiber       50     0    14
## 4            Just Right      140     9     2
## 5     Original Oat Bran       70     5    10
## 6             Heartwise       90     5     6

Question a

Make a scatterplot of Calories versus Sugar and calculate the correlation coefficient.

plot(Cereal$Sugar, Cereal$Calories,
     xlab = "Sugar (grams per serving)",
     ylab = "Calories (per serving)",
     main = "Scatterplot of Calories vs Sugar")

cor(Cereal$Calories, Cereal$Sugar)
## [1] 0.5154008

The scatterplot shows a positive linear association between sugar content and calories. The sample correlation coefficient, r = 0.515, indicates a moderate positive relationship: cereals with more sugar per serving tend to have more calories per serving.
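As a check on what cor() computes, Pearson's r can be written out directly from its definition, r = sum((x - xbar)(y - ybar)) / sqrt(sum((x - xbar)^2) * sum((y - ybar)^2)). The sketch below uses a hypothetical helper, pearson_r(), applied only to the six cereals printed by head(Cereal) above. It is an illustration, not part of the original analysis, and its result is not the full-sample correlation.

```r
# Hypothetical helper: Pearson's r computed from its definition
pearson_r <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2)
  dx <- x - mean(x)  # deviations of x from its mean
  dy <- y - mean(y)  # deviations of y from its mean
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

# The six rows shown by head(Cereal); illustration only, not all 36 cereals
calories <- c(100, 100, 50, 140, 70, 90)
sugar    <- c(6, 3, 0, 9, 5, 5)

pearson_r(calories, sugar)
cor(calories, sugar)  # should agree with the hand-coded version
```

Writing the formula out makes clear that r depends only on the centered values, so it is unchanged if either variable is shifted or rescaled by a positive constant.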

Question b

Conduct a test of the null hypothesis that there is no correlation between Calories and Sugar in the population. Report the P-value and the 95% confidence interval.

cor.test(Cereal$Calories, Cereal$Sugar)
## 
##  Pearson's product-moment correlation
## 
## data:  Cereal$Calories and Cereal$Sugar
## t = 3.5069, df = 34, p-value = 0.001296
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2249563 0.7217280
## sample estimates:
##       cor 
## 0.5154008

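To test whether Calories and Sugar are correlated in the population, we conduct a Pearson correlation test of the null hypothesis that the population correlation is zero (H0: ρ = 0) against the two-sided alternative (Ha: ρ ≠ 0).

The test gives t = 3.51 with 34 degrees of freedom and a P-value of 0.0013. Because 0.0013 < 0.05, we reject the null hypothesis at the 0.05 significance level and conclude that Calories and Sugar are positively correlated in the population. The 95% confidence interval for the population correlation, (0.225, 0.722), lies entirely above zero and gives a range of plausible values for the true correlation.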
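The quantities reported by cor.test() can be reproduced from r and n alone, which shows where they come from: the test statistic is t = r sqrt(n - 2) / sqrt(1 - r^2) on n - 2 degrees of freedom, and the confidence interval is built on the Fisher z scale, atanh(r) ± 1.96 / sqrt(n - 3), then transformed back with tanh(). The function name cor_test_by_hand() is hypothetical; this is a sketch of the standard formulas, not a replacement for cor.test().

```r
# Hypothetical helper: recompute cor.test()'s t statistic, two-sided
# P-value, and Fisher-z confidence interval from r and n
cor_test_by_hand <- function(r, n, conf.level = 0.95) {
  df <- n - 2
  t_stat <- r * sqrt(df) / sqrt(1 - r^2)   # t = r sqrt(n - 2) / sqrt(1 - r^2)
  p_value <- 2 * pt(-abs(t_stat), df)      # two-sided P-value
  z <- atanh(r)                            # Fisher z-transform of r
  se <- 1 / sqrt(n - 3)                    # approximate standard error of z
  crit <- qnorm((1 + conf.level) / 2)      # about 1.96 for a 95% interval
  ci <- tanh(z + c(-1, 1) * crit * se)     # back-transform to the correlation scale
  list(t = t_stat, df = df, p.value = p_value, conf.int = ci)
}

# r and n as reported for the Cereal data
cor_test_by_hand(r = 0.5154008, n = 36)
```

Because the interval is symmetric on the z scale but not on the r scale, it is slightly asymmetric around 0.515, stretching further toward zero than toward one.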

Question c

Perform a Bayesian correlation analysis and compare the results to those from the classical correlation analysis.

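# BayesianFirstAid is not on CRAN; install it with
# remotes::install_github("rasmusab/bayesian_first_aid"). It requires JAGS.
library(BayesianFirstAid)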
## Loading required package: rjags
## Loading required package: coda
## Linked to JAGS 4.3.2
## Loaded modules: basemod,bugs
BayesCorTestCereal <- bayes.cor.test(
  Cereal$Sugar,
  Cereal$Calories
)

BayesCorTestCereal
## 
##  Bayesian First Aid Pearson's Correlation Coefficient Test
## 
## data: Cereal$Sugar and Cereal$Calories (n = 36)
## Estimated correlation:
##   0.49 
## 95% credible interval:
##   0.21 0.73 
## The correlation is more than 0 by a probability of 0.999 
## and less than 0 by a probability of 0.001
plot(BayesCorTestCereal)

The Bayesian correlation analysis estimates the posterior distribution of the true correlation between Calories and Sugar. The estimated correlation is 0.49, slightly below the sample correlation of 0.52, and the 95% credible interval, (0.21, 0.73), lies entirely above zero. The posterior probability that the true correlation is positive is 0.999. These results agree with the correlation test in part (b), which also found a statistically significant positive association.
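To see how a posterior for the correlation turns into a credible interval and a probability of being positive, the sketch below uses a simple grid approximation. It assumes (i) a uniform prior on ρ over (-1, 1) and (ii) the Fisher-z normal approximation as the likelihood, atanh(r) ~ Normal(atanh(ρ), 1/sqrt(n - 3)). This is a swapped-in, simplified model, not the JAGS model that bayes.cor.test() fits, so its numbers will be close to, but not identical to, the output above. The function name grid_posterior_rho() is hypothetical.

```r
# Hypothetical helper: grid approximation to the posterior of rho.
# Assumes a flat prior on rho and the Fisher-z normal approximation
# as the likelihood; NOT the model used by bayes.cor.test().
grid_posterior_rho <- function(r, n, level = 0.95, grid_size = 4001) {
  rho <- seq(-0.999, 0.999, length.out = grid_size)  # grid of candidate correlations
  prior <- rep(1, grid_size)                          # uniform prior on rho
  lik <- dnorm(atanh(r), mean = atanh(rho), sd = 1 / sqrt(n - 3))
  post <- prior * lik
  post <- post / sum(post)                            # normalise over the grid
  cdf <- cumsum(post)
  alpha <- (1 - level) / 2
  list(
    median     = rho[which(cdf >= 0.5)[1]],
    interval   = c(rho[which(cdf >= alpha)[1]], rho[which(cdf >= 1 - alpha)[1]]),
    p_positive = sum(post[rho > 0])                   # posterior P(rho > 0)
  )
}

# r and n as reported for the Cereal data
grid_posterior_rho(r = 0.5154008, n = 36)
```

The uniform prior on ρ puts less weight near ±1 than a flat prior on the z scale would, which pulls the posterior slightly toward zero; this is one reason a Bayesian estimate can sit a little below the sample correlation.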

Question d

The 95% confidence interval from the correlation test in part (b), (0.22, 0.72), and the 95% Bayesian credible interval, (0.21, 0.73), are numerically similar and both lie entirely above zero, indicating a positive association between Calories and Sugar. However, the two intervals have different interpretations. The confidence interval comes from a procedure that, over repeated samples, would capture the true correlation 95% of the time; any single computed interval either contains the true correlation or does not. The Bayesian credible interval, by contrast, is a range that contains the true correlation with 95% posterior probability, given the observed data and the model's prior. Despite these differences, both intervals lead to the same conclusion about the relationship between Calories and Sugar.
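The repeated-sampling meaning of a confidence interval can be checked by simulation. The sketch below assumes a bivariate normal population with a chosen true correlation (ρ = 0.5, picked only for illustration), draws many samples of size 36, computes the cor.test() interval for each, and records how often the interval contains ρ. The function name ci_coverage() is hypothetical.

```r
# Hypothetical helper: estimate the coverage of cor.test()'s 95% interval
# by repeated sampling from a bivariate normal population with correlation rho
ci_coverage <- function(rho, n, reps = 2000) {
  hits <- replicate(reps, {
    x <- rnorm(n)
    y <- rho * x + sqrt(1 - rho^2) * rnorm(n)  # population correlation of x and y is rho
    ci <- cor.test(x, y)$conf.int
    ci[1] <= rho && rho <= ci[2]               # did this interval capture rho?
  })
  mean(hits)                                   # proportion of intervals that captured rho
}

set.seed(1)
ci_coverage(rho = 0.5, n = 36)  # should come out close to 0.95
```

The 95% describes the long-run behaviour of the procedure across samples, not the probability that one particular interval, such as (0.22, 0.72), contains ρ.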

Appendix

The following R code was used for data analysis and visualization in this report.

Cereal <- read.csv("https://www.stat2.org/datasets/Cereal.csv")
head(Cereal)
##                  Cereal Calories Sugar Fiber
## 1 Common Sense Oat Bran      100     6     3
## 2            Product 19      100     3     1
## 3   All Bran Xtra Fiber       50     0    14
## 4            Just Right      140     9     2
## 5     Original Oat Bran       70     5    10
## 6             Heartwise       90     5     6
plot(Cereal$Sugar, Cereal$Calories,
     xlab = "Sugar (grams per serving)",
     ylab = "Calories (per serving)",
     main = "Scatterplot of Calories vs Sugar")

cor(Cereal$Calories, Cereal$Sugar)
## [1] 0.5154008
cor.test(Cereal$Calories, Cereal$Sugar)
## 
##  Pearson's product-moment correlation
## 
## data:  Cereal$Calories and Cereal$Sugar
## t = 3.5069, df = 34, p-value = 0.001296
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2249563 0.7217280
## sample estimates:
##       cor 
## 0.5154008
library(BayesianFirstAid)

BayesCorTestCereal <- bayes.cor.test(
  Cereal$Sugar,
  Cereal$Calories
)

BayesCorTestCereal
## 
##  Bayesian First Aid Pearson's Correlation Coefficient Test
## 
## data: Cereal$Sugar and Cereal$Calories (n = 36)
## Estimated correlation:
##   0.49 
## 95% credible interval:
##   0.22 0.73 
## The correlation is more than 0 by a probability of 0.999 
## and less than 0 by a probability of 0.001
plot(BayesCorTestCereal)