library(tidyverse)
library(Stat2Data)
library(skimr)

Exercise 0.7

  1. The response variable in the model is WineQuality. It is quantitative.
  2. The explanatory variables in the model are WinterRain, AverageTemp, and HarvestRain. All of the explanatory variables are quantitative.
  3. Higher wine quality is associated with more WinterRain since the coefficient for WinterRain is positive.
  4. Higher wine quality is associated with less HarvestRain since the coefficient for HarvestRain is negative.
  5. Higher wine quality is associated with a higher average growing-season temperature since the coefficient for AverageTemp is positive.
  6. The data are observational because the experimenter does not control any of the explanatory variables; rainfall and temperature are simply recorded as they occur.

Exercise 0.9

  1. The population of interest to the registrar at the college is the members of the entering class.
  2. They are parameters because they describe the entire population of interest (all members of the entering class), not a sample drawn from it.
  3. The population of interest to the Mathematics Department is all students who want to take mathematics and need to identify the appropriate course.
  4. They are statistics because the average and standard deviation are calculated from a sample of students rather than from the whole population.

Exercise 1.19

data(Cereal)
ggplot(Cereal) + geom_point(aes(x=Sugar, y=Calories))

The scatterplot shows a generally positive, roughly linear relationship between sugar content and calories.

mod <- lm(Calories ~ Sugar, data = Cereal)
summary(mod)
## 
## Call:
## lm(formula = Calories ~ Sugar, data = Cereal)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -37.428  -9.832   0.245   8.909  40.322 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  87.4277     5.1627  16.935   <2e-16 ***
## Sugar         2.4808     0.7074   3.507   0.0013 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 19.27 on 34 degrees of freedom
## Multiple R-squared:  0.2656, Adjusted R-squared:  0.244 
## F-statistic:  12.3 on 1 and 34 DF,  p-value: 0.001296

\[ \widehat{\texttt{Calories}} = 87.4277 + 2.4808\cdot \texttt{Sugar} \]

(c) For every one-gram increase in sugar, the predicted calories per serving increase by 2.4808. When a cereal has no sugar (that is, when Sugar = 0), the model predicts 87.4277 calories per serving.
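The slope and intercept interpretations can be checked numerically. The sketch below assumes the rounded coefficients printed in the summary output; `fitted_calories` is a hypothetical helper introduced only for illustration, not part of the original analysis.

```r
# Coefficients copied from the summary(mod) output (rounded to 4 decimals).
b0 <- 87.4277  # intercept: predicted calories when Sugar = 0
b1 <- 2.4808   # slope: change in predicted calories per extra gram of sugar

# Hypothetical helper that evaluates the fitted line at a given sugar content.
fitted_calories <- function(sugar) b0 + b1 * sugar

# At Sugar = 0 the prediction is the intercept.
fitted_calories(0)

# Predictions one gram apart differ by the slope, whatever the starting value.
fitted_calories(6) - fitted_calories(5)
```

With the fitted object available, `coef(mod)` should return the same two values without rounding.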

Exercise 1.21

  1. \[ \widehat{\texttt{Calories}} = 87.4277 + 2.4808\cdot \texttt{Sugar} \] \[ \widehat{\texttt{Calories}} = 87.4277 + 2.4808\cdot \texttt{10} \] \[ \widehat{\texttt{Calories}} = 112.2357 \]

The fitted model predicts 112.2357 calories for a cereal that has 10 grams of sugar.

  2. \[ \text{Residual} = \texttt{Calories}_{Cheerios} - \widehat{\texttt{Calories}}_{Cheerios} \] \[ \widehat{\texttt{Calories}}_{Cheerios} = 87.4277 + 2.4808\cdot \texttt{1} \] \[ \widehat{\texttt{Calories}}_{Cheerios} = 89.9085 \] \[ \text{Residual} = 110 - 89.9085 \] \[ \text{Residual} = 20.0915 \]

  3. This linear regression model is accurate only to a limited degree. The p-value for the Sugar coefficient is small (0.0013), so sugar content is a statistically significant predictor of calories. However, the model explains only about 27% of the variation in calories (R-squared = 0.2656), and the Cheerios residual in part 2 shows the prediction missing by about 20 calories per serving. For someone who is on a diet or has dietary restrictions, this would not be a good model for the relationship between the calories and sugar content of breakfast cereals.
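The arithmetic in this exercise can be reproduced in a few lines of base R. This is a minimal sketch that assumes the rounded coefficients and residual standard error from the summary output, and the Cheerios values used in the worked solution (1 gram of sugar, 110 calories); the variable names are illustrative.

```r
# Values copied from the summary(mod) output (rounded).
b0 <- 87.4277        # intercept
b1 <- 2.4808         # slope for Sugar
sigma_hat <- 19.27   # residual standard error

# Part 1: predicted calories for a cereal with 10 grams of sugar.
predicted_10 <- b0 + b1 * 10

# Part 2: residual for Cheerios (observed minus fitted).
cheerios_fitted <- b0 + b1 * 1
cheerios_resid <- 110 - cheerios_fitted

round(predicted_10, 4)
round(cheerios_resid, 4)

# Part 3: the Cheerios miss relative to the model's typical error.
# A ratio near 1 means this residual is about as large as a typical one.
cheerios_resid / sigma_hat
```

With the fitted object available, `predict(mod, newdata = data.frame(Sugar = 10))` and `resid(mod)` should give the same prediction and residuals up to rounding. A ratio close to 1 suggests the Cheerios miss is ordinary for this model rather than an unusual case, which supports the conclusion that the predictions are imprecise in general.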