library(tidyverse)
library(Stat2Data)
library(skimr)
data(Cereal)
ggplot(Cereal) + geom_point(aes(x=Sugar, y=Calories))
The scatterplot shows a generally positive linear relationship between sugar content and calories.
mod <- lm(Calories ~ Sugar, data = Cereal)
summary(mod)
##
## Call:
## lm(formula = Calories ~ Sugar, data = Cereal)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -37.428  -9.832   0.245   8.909  40.322
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  87.4277     5.1627  16.935   <2e-16 ***
## Sugar         2.4808     0.7074   3.507   0.0013 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 19.27 on 34 degrees of freedom
## Multiple R-squared: 0.2656, Adjusted R-squared: 0.244
## F-statistic: 12.3 on 1 and 34 DF, p-value: 0.001296
\[ \widehat{\texttt{Calories}} = 87.4277 + 2.4808\cdot \texttt{Sugar} \]
(c) For each additional gram of sugar, the predicted calories per serving increase by 2.4808 on average. When a cereal has no sugar (that is, when Sugar = 0), the model predicts 87.4277 calories per serving.
The fitted model predicts 112.2357 calories per serving for a cereal that has 10 grams of sugar, since 87.4277 + 2.4808 × 10 = 112.2357.
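The prediction above can be reproduced by plugging the value into the fitted equation. The following is a minimal sketch that uses the rounded coefficients copied from the `summary(mod)` output; the variable names are illustrative and not part of the original analysis.

```r
# Coefficients copied from the summary(mod) output (rounded to 4 decimals)
intercept <- 87.4277
slope <- 2.4808

# Predicted calories per serving for a cereal with 10 grams of sugar
sugar <- 10
predicted_calories <- intercept + slope * sugar
predicted_calories
```

With the fitted `mod` object still in the session, `predict(mod, newdata = data.frame(Sugar = 10))` should give essentially the same value, differing only through rounding of the coefficients.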
\[ \text{Residual} = \texttt{Calories}_{\text{Cheerios}} - \widehat{\texttt{Calories}}_{\text{Cheerios}} \]
\[ \widehat{\texttt{Calories}}_{\text{Cheerios}} = 87.4277 + 2.4808\cdot 1 = 89.9085 \]
\[ \text{Residual} = 110 - 89.9085 = 20.0915 \]
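The same residual can be checked numerically. This sketch assumes the values used in the calculation above (1 gram of sugar and 110 observed calories for Cheerios) and the rounded coefficients from `summary(mod)`; the variable names are illustrative.

```r
# Coefficients copied from the summary(mod) output (rounded to 4 decimals)
intercept <- 87.4277
slope <- 2.4808

# Values for Cheerios as used in the hand calculation
cheerios_sugar <- 1       # grams of sugar per serving
cheerios_calories <- 110  # observed calories per serving

# Residual = observed value minus fitted value
fitted_cheerios <- intercept + slope * cheerios_sugar
residual_cheerios <- cheerios_calories - fitted_cheerios
residual_cheerios
```

If the fitted `mod` object is available, `residuals(mod)` returns the residual for every cereal in the data set, so the Cheerios value can be read from there instead.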
This linear regression model is useful only to a degree. The p-value for the slope is small (0.0013), so there is strong evidence that sugar content is linearly associated with the calories in cereals. However, from (b) we see that the prediction for Cheerios is off by about 20 calories per serving, and the model explains only about 27% of the variation in calories (R-squared = 0.2656). For someone who is on a diet or has dietary restrictions, this would not be a good model for predicting the calories in a breakfast cereal from its sugar content.
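To put the size of the Cheerios residual in context, it can be compared with the residual standard error, and R-squared can be converted to a correlation. This is a rough sketch using values copied from the `summary(mod)` output and the residual calculation; the variable names are illustrative.

```r
# Values copied from the summary(mod) output and the residual calculation
residual_cheerios <- 20.0915  # observed minus predicted calories for Cheerios
resid_se <- 19.27             # residual standard error (34 degrees of freedom)
r_squared <- 0.2656           # multiple R-squared

# Size of the Cheerios residual in units of the residual standard error
residual_ratio <- residual_cheerios / resid_se
residual_ratio  # about 1.04

# Correlation between Sugar and Calories; positive because the slope is positive
correlation <- sqrt(r_squared)
correlation     # about 0.515
```

A ratio close to 1 suggests that an error of roughly 20 calories is typical for this model rather than unusual, which is consistent with the caution expressed above.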