Calories and Sugar in Breakfast Cereals: Frequentist and Bayesian Analysis

Author

Sandeep

Introduction

Breakfast cereals vary widely in nutritional content. Sugar is often added to improve taste, but cereals with more sugar may also contain more calories. This project examines the relationship between Sugar (grams per serving) and Calories (per serving) using the Cereal dataset. We use both frequentist and Bayesian methods to answer:

Research Question:

Do cereals with higher sugar content tend to have higher calories?

library(tidyverse)
Cereal <- read.csv("Cereal (2).csv")
head(Cereal)
                 Cereal Calories Sugar Fiber
1 Common Sense Oat Bran      100     6     3
2            Product 19      100     3     1
3   All Bran Xtra Fiber       50     0    14
4            Just Right      140     9     2
5     Original Oat Bran       70     5    10
6             Heartwise       90     5     6

The preview shows cereal names along with their nutritional values. The variable Cereal is a character variable, while Calories, Sugar, and Fiber are numeric. Even in these six rows, calories range from 50 to 140 and sugar from 0 to 9 grams, so both variables vary enough across cereals to examine how they relate.

summary(Cereal$Calories)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   50.0    90.0   104.0   101.6   110.0   160.0 
summary(Cereal$Sugar)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.000   1.750   5.000   5.714   9.075  15.000 
sd(Cereal$Calories)
[1] 22.16394
sd(Cereal$Sugar)
[1] 4.604666

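Calories and sugar values show substantial variability. Calories range from 50 to 160 per serving (mean = 101.6, SD = 22.16), while sugar ranges from 0 to 15 grams per serving (mean = 5.71, SD = 4.60). For calories, the mean (101.6) is close to the median (104.0), suggesting an approximately symmetric distribution. For sugar, the mean (5.71) exceeds the median (5.00), suggesting a slight right skew.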
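As a rough check on the symmetry claims, the mean, median, and SD reported above can be plugged into Pearson's second skewness coefficient, 3 × (mean − median) / SD. This is only a sketch based on the rounded summary output, not on the raw data; the data frame `stats` and the column `skew2` are names introduced here for illustration.

```r
# Summary statistics copied from the summary() and sd() output above
stats <- data.frame(
  variable = c("Calories", "Sugar"),
  mean     = c(101.6, 5.714),
  median   = c(104.0, 5.000),
  sd       = c(22.16394, 4.604666)
)

# Pearson's second skewness coefficient: 3 * (mean - median) / sd
# Negative values hint at left skew, positive values at right skew
stats$skew2 <- 3 * (stats$mean - stats$median) / stats$sd

print(stats)
```

Values near zero are consistent with rough symmetry; the sign of the sugar coefficient is what supports the "slightly right-skewed" reading.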

ggplot(Cereal, aes(x = Sugar, y = Calories)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(title = "Scatterplot of Sugar vs Calories")
`geom_smooth()` using formula = 'y ~ x'

The plot shows a positive linear trend, indicating that cereals with higher sugar content generally tend to have higher calorie values. Although some variability is present, the upward-sloping regression line suggests a moderate positive association. This visual pattern supports the use of correlation analysis and simple linear regression.

cor.test(Cereal$Calories, Cereal$Sugar)

    Pearson's product-moment correlation

data:  Cereal$Calories and Cereal$Sugar
t = 3.5069, df = 34, p-value = 0.001296
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.2249563 0.7217280
sample estimates:
      cor 
0.5154008 

The Pearson correlation test showed a moderate positive correlation between sugar and calories (r = 0.515, 95% CI [0.225, 0.722]), which was statistically significant (t(34) = 3.51, p = 0.0013).
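The t statistic and confidence interval in the output above can be reproduced by hand from r and n alone, which makes it clearer where the numbers come from. The sketch below assumes n = 36 (implied by df = 34) and uses the Fisher z-transformation for the interval; the variable names are illustrative.

```r
r <- 0.5154008  # sample correlation reported by cor.test()
n <- 36         # df = 34 implies n = 36 observations

# Test statistic for H0: rho = 0
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# 95% confidence interval via the Fisher z-transformation
z  <- atanh(r)          # transform r to the z scale
se <- 1 / sqrt(n - 3)   # approximate standard error on the z scale
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)  # back-transform to the r scale

round(c(t = t_stat, lower = ci[1], upper = ci[2]), 4)
```

The results should agree with the cor.test() output up to rounding.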

model <- lm(Calories ~ Sugar, data = Cereal)
summary(model)

Call:
lm(formula = Calories ~ Sugar, data = Cereal)

Residuals:
    Min      1Q  Median      3Q     Max 
-37.428  -9.832   0.245   8.909  40.322 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  87.4277     5.1627  16.935   <2e-16 ***
Sugar         2.4808     0.7074   3.507   0.0013 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 19.27 on 34 degrees of freedom
Multiple R-squared:  0.2656,    Adjusted R-squared:  0.244 
F-statistic:  12.3 on 1 and 34 DF,  p-value: 0.001296

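The regression analysis showed that sugar is a statistically significant predictor of calories (slope = 2.48, SE = 0.71, p = 0.0013). Each additional gram of sugar per serving is associated with an average increase of approximately 2.48 calories per serving, and a cereal with no sugar is predicted to have about 87.4 calories. Sugar explains about 26.6% of the variation in calories (R² = 0.266), so most of the variation is attributable to other factors.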
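In simple linear regression the slope equals r × (SD of y / SD of x) and R² equals r², so the fitted coefficients can be cross-checked against the descriptive statistics. The sketch below does that and also forms a 95% confidence interval for the slope from the reported estimate and standard error. It works from rounded output rather than the model object, so small discrepancies in the last digits are expected; all variable names are illustrative.

```r
# Values copied from the output above
r      <- 0.5154008
sd_y   <- 22.16394   # SD of Calories
sd_x   <- 4.604666   # SD of Sugar
mean_y <- 101.6      # mean of Calories (rounded in the output)
mean_x <- 5.714      # mean of Sugar

# Slope and intercept implied by the summary statistics
slope     <- r * sd_y / sd_x
intercept <- mean_y - slope * mean_x
r2        <- r^2

# 95% CI for the slope from the reported estimate and standard error
b    <- 2.4808
se_b <- 0.7074
df   <- 34
slope_ci <- b + c(-1, 1) * qt(0.975, df) * se_b

round(c(slope = slope, intercept = intercept, r2 = r2), 4)
round(slope_ci, 3)
```

With the fitted object available, `confint(model)` would give the same slope interval directly.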

par(mfrow = c(1,1))
plot(model)

Residuals vs Fitted:

The residuals appear randomly scattered around zero, suggesting that the linearity assumption is reasonable and no strong patterns are present.

Normal Q-Q Plot:

The points fall approximately along the reference line, indicating that the residuals are approximately normally distributed.

Scale-Location Plot:

The spread of residuals is relatively constant across fitted values, supporting the assumption of homoscedasticity (constant variance).

Residuals vs Leverage:

Most observations fall within the Cook’s distance boundaries, suggesting no extreme influential outliers that unduly affect the model.

The diagnostic plots indicate that regression assumptions (linearity, normality, constant variance, and absence of influential outliers) are reasonably satisfied.
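The visual reading of the Residuals vs Leverage plot can be complemented with numeric leverage and Cook's distance values. The sketch below is self-contained only because it refits the model on the six rows shown in the head(Cereal) preview, so its numbers do not describe the full 36-cereal model; with the full data, the same calls would be applied to `model`. The 4/n cutoff is a common rule of thumb, used here as an assumption and not as a definitive threshold.

```r
# Six preview rows copied from the head(Cereal) output (not the full dataset)
preview <- data.frame(
  Calories = c(100, 100, 50, 140, 70, 90),
  Sugar    = c(6, 3, 0, 9, 5, 5)
)

fit <- lm(Calories ~ Sugar, data = preview)

lev   <- hatvalues(fit)        # leverage of each observation
cooks <- cooks.distance(fit)   # influence of each observation on the fit

# Rule-of-thumb cutoff for Cook's distance (an assumption, not a strict rule)
threshold <- 4 / nrow(preview)

data.frame(
  preview,
  leverage = round(lev, 3),
  cooks_d  = round(cooks, 3),
  flagged  = cooks > threshold
)
```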

library(BayesianFirstAid)
Loading required package: rjags
Loading required package: coda
Linked to JAGS 4.3.2
Loaded modules: basemod,bugs
bayes_cor <- bayes.cor.test(Cereal$Calories, Cereal$Sugar)
bayes_cor

    Bayesian First Aid Pearson's Correlation Coefficient Test

data: Cereal$Calories and Cereal$Sugar (n = 36)
Estimated correlation:
  0.49 
95% credible interval:
  0.21 0.73 
The correlation is more than 0 by a probability of 0.998 
and less than 0 by a probability of 0.002 

The Bayesian analysis estimated a moderate positive correlation (posterior estimate = 0.49, 95% credible interval [0.21, 0.73]), with a posterior probability of 0.998 that the true correlation is positive.
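To build intuition for the "probability that the correlation is positive", the posterior can be roughly approximated without MCMC by treating atanh(ρ) as normally distributed around atanh(r) with standard deviation 1/√(n − 3). This is a simplified stand-in and not the model that bayes.cor.test() fits (which, as far as I understand, uses a more robust bivariate t likelihood estimated with JAGS), so the numbers are expected to differ somewhat from the output above. The variable names and the seed are arbitrary choices.

```r
set.seed(1)  # arbitrary seed so the draws are reproducible

r <- 0.5154008  # sample correlation from the frequentist output
n <- 36

# Approximate posterior on the Fisher z scale (flat-prior normal approximation)
z_hat <- atanh(r)
z_sd  <- 1 / sqrt(n - 3)

# Draw from the approximate posterior and transform back to the correlation scale
rho_draws <- tanh(rnorm(10000, mean = z_hat, sd = z_sd))

post_median <- median(rho_draws)
post_ci     <- quantile(rho_draws, c(0.025, 0.975))
p_positive  <- mean(rho_draws > 0)

round(c(median = post_median, post_ci, p_positive = p_positive), 3)
```

Summarizing draws by their median, central 95% quantiles, and the share above zero mirrors how MCMC output is typically summarized.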

test <- cor.test(Cereal$Calories, Cereal$Sugar, conf.level = 0.95)

test

    Pearson's product-moment correlation

data:  Cereal$Calories and Cereal$Sugar
t = 3.5069, df = 34, p-value = 0.001296
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.2249563 0.7217280
sample estimates:
      cor 
0.5154008 
round(test$conf.int, 3)
[1] 0.225 0.722
attr(,"conf.level")
[1] 0.95

Hypotheses

H₀: There is no linear correlation between Sugar and Calories (ρ = 0)

H₁: There is a linear correlation between Sugar and Calories (ρ ≠ 0)

p = 0.0013 < 0.05

Since the p-value (0.0013) is less than α = 0.05, we reject the null hypothesis. This indicates that there is a statistically significant linear correlation between sugar content and calorie values among breakfast cereals.
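The decision rule can be traced from the reported t statistic: the two-sided p-value is twice the upper-tail area of a t distribution with 34 degrees of freedom, and rejecting at α = 0.05 is equivalent to |t| exceeding the critical value. The sketch below uses the rounded t value from the output, so the p-value matches only up to rounding; the variable names are illustrative.

```r
t_stat <- 3.5069  # t statistic reported by cor.test()
df     <- 34      # degrees of freedom (n - 2)
alpha  <- 0.05    # significance level

# Two-sided p-value from the t distribution
p_value <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)

# Equivalent critical-value form of the decision rule
t_crit <- qt(1 - alpha / 2, df)
reject <- p_value < alpha

c(p_value = signif(p_value, 4), t_crit = round(t_crit, 3), reject = reject)
```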

Conclusion

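Both frequentist and Bayesian analyses indicate a moderate positive relationship between Sugar and Calories in breakfast cereals. The Pearson correlation test showed a statistically significant association (r = 0.515, p = 0.0013), with a 95% confidence interval of [0.225, 0.722]. The Bayesian analysis gave a similar estimate (0.49, 95% credible interval [0.21, 0.73]) and a 0.998 posterior probability that the correlation is positive. The regression model further showed that each additional gram of sugar is associated with about 2.48 more calories per serving, with sugar explaining about 26.6% of the variation in calories. Taken together, these findings suggest that cereals with higher sugar content tend to contain more calories.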