This report explores the relationship between the sugar content and the calorie count of various cereals. The analysis includes a visual exploration through a scatter plot, a statistical correlation test, and a comparison between frequentist and Bayesian approaches to hypothesis testing. By combining these methods, we can gain an understanding of the nature and strength of the relationship between sugar and calories in cereal products.
To explore the relationship between sugar and calories, a
scatter plot can be examined along with the correlation coefficient. The
scatter plot for Y = Calories and X = Sugar is shown below.
We can also calculate the correlation coefficient to quantify the relationship seen in the scatter plot. The result is shown below.
## [1] 0.5154008
The correlation coefficient of about 0.52 suggests a moderate positive linear relationship between sugar and calories. This means that as sugar content increases, calorie content tends to increase as well, although the relationship is not perfect.
A hypothesis test can be conducted to gain a further understanding of the relationship between calories and sugar. The null hypothesis is that there is no correlation between the two. The results of the correlation test are shown below.
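One way to make "moderate" more concrete is to square the coefficient. The short sketch below uses only the value printed above; the variable names are illustrative and are not part of the original analysis.

```r
# Sample correlation copied from the output above
r <- 0.5154008

# Squared correlation: the share of the variance in calories that is
# linearly associated with sugar (and vice versa)
r_squared <- r^2
round(r_squared, 3)
```

Under this reading, roughly a quarter of the variation in calories lines up with sugar content, which leaves most of it to other factors.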
##
## Pearson's product-moment correlation
##
## data: Cereal$Calories and Cereal$Sugar
## t = 3.5069, df = 34, p-value = 0.001296
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2249563 0.7217280
## sample estimates:
## cor
## 0.5154008
The test gives a p-value of 0.001296, which is much smaller than 0.05. This means we can reject the null hypothesis that there is no correlation between sugar and calories. The 95% confidence interval is [0.2249563, 0.7217280]. The interval does not include 0, which supports the conclusion that the correlation is different from zero. Since both ends of the interval are positive, this suggests that the correlation between sugar and calories is positive. The interval is wide, however, so the data are consistent with anything from a weak to a fairly strong correlation.
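As a rough check on where these numbers come from, the test statistic, p-value, and confidence interval can be rebuilt by hand from the reported correlation and degrees of freedom. This is a minimal sketch that assumes the standard formulas (a t statistic for the test and the Fisher z-transformation for the interval); the variable names are illustrative.

```r
r  <- 0.5154008   # sample correlation from the output above
df <- 34          # degrees of freedom from the output above
n  <- df + 2      # implied number of cereals, since df = n - 2

# t statistic for the null hypothesis of zero correlation
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)

# 95% confidence interval via the Fisher z-transformation
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

round(c(t = t_stat, p = p_value, lower = ci[1], upper = ci[2]), 4)
```

The values should agree with the test output above up to rounding, which also shows that the sample contains 36 cereals.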
Examining a Bayesian analysis can give further insight into the
correlation between calories and sugar.
The estimated correlation is 0.48, which suggests a moderate positive correlation. The 95% credible interval means that, given the model and the data, there is a 95% probability that the true correlation lies between 0.21 and 0.72. Because the whole interval is positive, this suggests that the correlation between the two is positive. There is also a 99.8% probability that the correlation between sugar and calories is greater than zero, which further supports the claim that there is a positive correlation between the two.
The frequentist p-value from earlier was 0.001296, while the Bayesian probability that the correlation is below zero is 0.002 (1 - 0.998). These two numbers answer different questions. The p-value is the probability of seeing data at least this extreme if there were no correlation, so a small value is strong evidence against the null hypothesis. The Bayesian probability says directly that there is only a very small chance that the true correlation is negative. Both point to a positive correlation between sugar and calories. Similarly, the confidence interval was 0.225 to 0.722, while the Bayesian credible interval was 0.21 to 0.72. The two intervals are very similar, and both suggest a positive correlation between sugar and calories.
Code for part A
Cereal <- read.csv("Cereal.csv")
plot(Cereal$Sugar, Cereal$Calories,
     xlab = "Sugar (g)",
     ylab = "Calories",
     main = "Scatter Plot of Calories vs Sugar",
     pch = 19, col = "blue")
# Compute the correlation coefficient
cor(Cereal$Calories, Cereal$Sugar)
Code for part B
# Hypothesis test for the correlation
cor_test_result <- cor.test(Cereal$Calories, Cereal$Sugar)
# Display the result
cor_test_result
Code for part C
library(BayesianFirstAid)
# Perform the Bayesian correlation test
BayesCorTestCereal <- bayes.cor.test(Cereal$Sugar, Cereal$Calories)
# Plot the posterior distribution of the correlation
plot(BayesCorTestCereal)