DATA 606 Question 7.24

Load Libraries

The following libraries are used in this RMD file.

openintro

lattice

knitr::opts_chunk$set(echo = TRUE)

# Clear the console
cat("\014")

if(!require('openintro')) {
  install.packages('openintro')
  library(openintro)
}

## Loading required package: openintro

## Please visit openintro.org for free statistics materials

## 
## Attaching package: 'openintro'

## The following objects are masked from 'package:datasets':
## 
##     cars, trees

if(!require('lattice')) {
  install.packages('lattice')
  library(lattice)
}

## Loading required package: lattice

## 
## Attaching package: 'lattice'

## The following object is masked from 'package:openintro':
## 
##     lsegments

Starbucks Data

The Starbucks data is part of the “openintro” package. It is loaded into a data frame below.

starbucksDF <- starbucks

head(starbucksDF)

##                          item calories fat carb fiber protein   type
## 1                8-Grain Roll      350   8   67     5      10 bakery
## 2           Apple Bran Muffin      350   9   64     7       6 bakery
## 3               Apple Fritter      420  20   59     0       5 bakery
## 4             Banana Nut Loaf      490  19   75     4       7 bakery
## 5 Birthday Cake Mini Doughnut      130   6   17     0       0 bakery
## 6           Blueberry Oat Bar      370  14   47     5       6 bakery

Linear Regression

starbucks.lm <- lm(starbucksDF$carb ~ starbucksDF$calories, data=starbucksDF) 
summary(starbucks.lm)

## 
## Call:
## lm(formula = starbucksDF$carb ~ starbucksDF$calories, data = starbucksDF)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -31.477  -7.476  -1.029  10.127  28.644 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           8.94356    4.74600   1.884   0.0634 .  
## starbucksDF$calories  0.10603    0.01338   7.923 1.67e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12.29 on 75 degrees of freedom
## Multiple R-squared:  0.4556, Adjusted R-squared:  0.4484 
## F-statistic: 62.77 on 1 and 75 DF,  p-value: 1.673e-11

coeffs <- coefficients(starbucks.lm)
coeffs

##          (Intercept) starbucksDF$calories 
##            8.9435605            0.1060309
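
Written out with the estimates above (rounded), the fitted least squares line is:

$$\widehat{\text{carb}} = 8.94 + 0.106 \times \text{calories}$$

Read this way, each additional 100 calories corresponds to about 10.6 g more carbohydrates on average under this fit. The intercept (8.94 g at zero calories) is mainly a fitting constant and should be interpreted with caution.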

# Scatterplot of carbs against calories; the fitted line is added below
plot(starbucksDF$calories, starbucksDF$carb,
     xlab = 'Calories', ylab = 'Carbs (in grams)',
     main = 'Starbucks data Linear Regression')
abline(starbucks.lm)

# Residuals
starbucks.res = resid(starbucks.lm)

plot(starbucksDF$calories, starbucks.res, 
     ylab="Residuals", xlab="Calories", 
     main="Starbucks Calories Residuals") 
abline(0, 0)

# Predict the carbs (in grams) for a 420-calorie item
carbForCalorie420 = coeffs[1] + (coeffs[2] * 420)
paste0("Estimated carbs for 420 calories from Linear Equation is: ", round(carbForCalorie420, 2), " and the actual carbs is: 59")

## [1] "Estimated carbs for 420 calories from Linear Equation is: 53.48 and the actual carbs is: 59"
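
The gap between the observed and predicted values is the residual for that item. As a minimal sketch, the calculation can be reproduced from the printed coefficients alone. The names `b0`, `b1`, `carb_hat`, and `resid_420` are introduced here for illustration and are not part of the original report; the observed value of 59 g is the Apple Fritter row shown in `head(starbucksDF)`.

```r
# Coefficients copied from the coefficients(starbucks.lm) output above
b0 <- 8.9435605   # intercept
b1 <- 0.1060309   # slope: grams of carbs per calorie

calories_new <- 420   # calories of the item of interest
carb_actual  <- 59    # observed carbs for the 420-calorie Apple Fritter

carb_hat  <- b0 + b1 * calories_new   # fitted (predicted) carbs
resid_420 <- carb_actual - carb_hat   # residual = observed - fitted

round(c(fitted = carb_hat, residual = resid_420), 2)
```

The residual is positive (about 5.52 g), so the line underestimates the carbs for this particular item. If the model were refit with the formula `carb ~ calories` and `data = starbucksDF`, then `predict()` with `newdata = data.frame(calories = 420)` should give the same fitted value.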

Question 7.24

Nutrition at Starbucks, Part I. The scatterplot below shows the relationship between the number of calories and amount of carbohydrates (in grams) Starbucks food menu items contain. Since Starbucks only lists the number of calories on the display items, we are interested in predicting the amount of carbs a menu item has based on its calorie content.

Question 7.24 (a)

7.24 (a) Describe the relationship between number of calories and amount of carbohydrates (in grams) that Starbucks food menu items contain.

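The relationship is positive, roughly linear, and moderately strong: menu items with more calories tend to contain more carbohydrates. The fitted slope is about 0.106, so each additional calorie is associated with roughly 0.106 g more carbs, and R-squared is 0.4556, meaning calories explain about 46% of the variability in carbs. The residuals are spread fairly evenly on the positive and negative sides of zero (median -1.03, range -31.48 to 28.64).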
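
The strength of the linear relationship can also be summarized by the correlation coefficient. For a simple linear regression it can be recovered from the regression summary above, as in the sketch below. The values are copied from that summary, and the variable names are introduced here for illustration only.

```r
# Values copied from summary(starbucks.lm)
r_squared <- 0.4556    # Multiple R-squared
slope     <- 0.10603   # estimated coefficient for calories

# In simple linear regression, r has the sign of the slope and
# magnitude sqrt(R-squared)
r <- sign(slope) * sqrt(r_squared)
round(r, 2)
```

This gives a correlation of about 0.67, a moderately strong positive association. With the data frame loaded, `cor(starbucksDF$calories, starbucksDF$carb)` should agree up to rounding.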

Question 7.24 (b)

7.24 (b) In this scenario, what are the explanatory and response variables?

The explanatory variable is calories and the response variable is carbohydrates (in grams).

Question 7.24 (c)

7.24 (c) Why might we want to fit a regression line to these data?

A regression line lets us predict the amount of carbs in a food menu item from its calorie count, which is the only value Starbucks lists on the display items.

Question 7.24 (d)

7.24 (d) Do these data meet the conditions required for fitting a least squares line?

Fitting a least squares line generally requires:

Linearity: The data should show a linear trend. The Starbucks scatterplot shows a linear trend.

Nearly normal residuals: The residuals should be nearly normal. For this data set, the residuals appear to be nearly normal.

Constant variability: The variability of points around the least squares line should remain roughly constant. Based on the residual plot, the variability appears to be roughly constant.

Independent observations: The observations should be independent of one another. Each row of the Starbucks data set is a separate menu item and the data are not a time series, so independence appears reasonable.
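
These conditions are usually checked visually. The sketch below shows one way to draw the relevant plots. `check_ls_conditions` is a hypothetical helper, not part of the original report, and the `demo` data frame contains only the six rows printed by `head(starbucksDF)`, so its plots are a stand-in and say nothing about the full data set.

```r
# Hypothetical helper: draws three diagnostic plots for an lm fit
check_ls_conditions <- function(fit) {
  res <- resid(fit)
  old_par <- par(mfrow = c(1, 3))
  on.exit(par(old_par))

  # Linearity and constant variability: look for curvature or a fan shape
  plot(fitted(fit), res,
       xlab = "Fitted values", ylab = "Residuals",
       main = "Residuals vs fitted")
  abline(h = 0, lty = 2)

  # Nearly normal residuals: histogram and normal Q-Q plot
  hist(res, xlab = "Residuals", main = "Histogram of residuals")
  qqnorm(res)
  qqline(res)

  invisible(res)
}

# Stand-in data: the six rows shown by head(starbucksDF), for illustration only
demo <- data.frame(
  calories = c(350, 350, 420, 490, 130, 370),
  carb     = c(67, 64, 59, 75, 17, 47)
)
demo_res <- check_ls_conditions(lm(carb ~ calories, data = demo))
```

In the report itself the helper would be called as `check_ls_conditions(starbucks.lm)`. A fan shape in the first panel would suggest that the variability is not constant, and strong skew in the histogram or Q-Q plot would count against nearly normal residuals.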