The following libraries are used in this RMD file.
openintro
lattice
knitr::opts_chunk$set(echo = TRUE)
# Clear the console
cat("\014")
if(!require('openintro')) {
install.packages('openintro')
library(openintro)
}
## Loading required package: openintro
## Please visit openintro.org for free statistics materials
##
## Attaching package: 'openintro'
## The following objects are masked from 'package:datasets':
##
## cars, trees
if(!require('lattice')) {
install.packages('lattice')
library(lattice)
}
## Loading required package: lattice
##
## Attaching package: 'lattice'
## The following object is masked from 'package:openintro':
##
## lsegments
The Starbucks data is part of the “openintro” package. It is loaded into a data frame below.
starbucksDF <- starbucks
head(starbucksDF)
## item calories fat carb fiber protein type
## 1 8-Grain Roll 350 8 67 5 10 bakery
## 2 Apple Bran Muffin 350 9 64 7 6 bakery
## 3 Apple Fritter 420 20 59 0 5 bakery
## 4 Banana Nut Loaf 490 19 75 4 7 bakery
## 5 Birthday Cake Mini Doughnut 130 6 17 0 0 bakery
## 6 Blueberry Oat Bar 370 14 47 5 6 bakery
starbucks.lm <- lm(starbucksDF$carb ~ starbucksDF$calories, data=starbucksDF)
summary(starbucks.lm)
##
## Call:
## lm(formula = starbucksDF$carb ~ starbucksDF$calories, data = starbucksDF)
##
## Residuals:
## Min 1Q Median 3Q Max
## -31.477 -7.476 -1.029 10.127 28.644
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.94356 4.74600 1.884 0.0634 .
## starbucksDF$calories 0.10603 0.01338 7.923 1.67e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.29 on 75 degrees of freedom
## Multiple R-squared: 0.4556, Adjusted R-squared: 0.4484
## F-statistic: 62.77 on 1 and 75 DF, p-value: 1.673e-11
coeffs = coefficients(starbucks.lm);
coeffs
## (Intercept) starbucksDF$calories
## 8.9435605 0.1060309
# with(starbucksDF, plot(starbucksDF$calories, starbucksDF$carb))
with(starbucksDF, plot(calories, carb,
xlab='Calories', ylab='Carbs (in grams)',
main='Starbucks data Linear Regression'))
abline(starbucks.lm)
# Residuals
starbucks.res = resid(starbucks.lm)
plot(starbucksDF$calories, starbucks.res,
ylab="Residuals", xlab="Calories",
main="Starbucks Calories Residuals")
abline(0, 0)
# Predict the carbs for an item with 420 calories (the Apple Fritter, actual carbs: 59 g)
carbForCalorie420 = coeffs[1] + (coeffs[2] * 420)
paste0("Estimated carbs for 420 calories from Linear Equation is: ", round(carbForCalorie420, 2), " and the actual carbs is: 59")
## [1] "Estimated carbs for 420 calories from Linear Equation is: 53.48 and the actual carbs is: 59"
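The same prediction can also be obtained with predict() instead of hand arithmetic on the coefficients. The sketch below is illustrative only: it assumes a toy data frame (toy) built from just the six rows printed by head(starbucksDF) above, so its numbers will not match the full 77-item fit. It also assumes the model is written with bare column names (carb ~ calories), since predict() matches newdata by column name and a formula written with starbucksDF$... terms would not pick up the new value.

```r
# Toy data: only the six rows printed by head(starbucksDF) above,
# NOT the full data set, so the fit differs from starbucks.lm.
toy <- data.frame(
  calories = c(350, 350, 420, 490, 130, 370),
  carb     = c(67, 64, 59, 75, 17, 47)
)

# Bare column names in the formula let predict() match newdata by name.
toy_fit <- lm(carb ~ calories, data = toy)

# Predicted carbs for a hypothetical item with 420 calories.
pred <- predict(toy_fit, newdata = data.frame(calories = 420))

# The same value computed by hand from the coefficients, for comparison.
by_hand <- coef(toy_fit)[["(Intercept)"]] + coef(toy_fit)[["calories"]] * 420

print(c(predict = unname(pred), by_hand = by_hand))
```

With the full data, the equivalent would be lm(carb ~ calories, data = starbucksDF) followed by the same predict() call.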
Nutrition at Starbucks, Part I. The scatterplot below shows the relationship between the number of calories and amount of carbohydrates (in grams) Starbucks food menu items contain. Since Starbucks only lists the number of calories on the display items, we are interested in predicting the amount of carbs a menu item has based on its calorie content.
7.24 (a) Describe the relationship between number of calories and amount of carbohydrates (in grams) that Starbucks food menu items contain.
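There is a positive, moderately strong linear relationship: menu items with more calories tend to contain more carbohydrates. The fitted slope is about 0.106 g of carbs per additional calorie, and calories explain about 46% of the variation in carbs (R-squared = 0.4556). The residuals are spread fairly evenly on the positive and negative sides of zero.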
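As a rough check on the strength of the relationship, the correlation can be recovered from the regression summary. This is a small sketch that only reuses the R-squared and slope values printed by summary(starbucks.lm) above; the variable names are illustrative.

```r
# Values copied from the summary(starbucks.lm) output above.
r_squared <- 0.4556
slope     <- 0.10603

# In simple linear regression, the correlation has the sign of the slope
# and a magnitude equal to the square root of R-squared.
r <- sign(slope) * sqrt(r_squared)
print(round(r, 3))  # 0.675
```

The same figure should come out of cor(starbucksDF$calories, starbucksDF$carb), up to rounding of the printed summary values.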
7.24 (b) In this scenario, what are the explanatory and response variables?
The explanatory variable is calories and the response variable is carbohydrates (in grams).
7.24 (c) Why might we want to fit a regression line to these data?
A regression line lets us predict the amount of carbs in a menu item from its calorie count, which is the only nutrition information Starbucks lists on the display items.
7.24 (d) Do these data meet the conditions required for fitting a least squares line?
Fitting a least squares line generally requires:
Linearity: The data should show a linear trend. The scatterplot of the Starbucks data shows a roughly linear trend.
Nearly normal residuals: The residuals should be nearly normal. In the model summary the residual quartiles are roughly symmetric around zero (median -1.03, 1Q -7.48, 3Q 10.13), so this condition appears reasonable.
Constant variability: The variability of points around the least squares line should remain roughly constant. Based on the residual plot, the variability appears to be roughly constant.
Independent observations: The observations are separate menu items rather than time series data, so treating them as independent is reasonable.
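The residual-based conditions above can be checked a little more directly. The sketch below assumes a hypothetical helper, check_residuals(), that draws a histogram and normal Q-Q plot of the residuals (only in an interactive session) and compares the residual spread below and above the median fitted value. To keep the block runnable on its own, it is demonstrated on the built-in datasets::cars data as a stand-in, not on the Starbucks data.

```r
# Hypothetical helper (not part of the original analysis): summarise the
# residual-based conditions for a simple linear regression fit.
check_residuals <- function(fit, show_plots = interactive()) {
  res <- resid(fit)
  fitted_vals <- fitted(fit)

  if (show_plots) {
    # Nearly normal residuals: look for a roughly bell-shaped histogram
    # and points close to the reference line in the Q-Q plot.
    hist(res, main = "Histogram of residuals", xlab = "Residuals")
    qqnorm(res)
    qqline(res)
  }

  # Constant variability: compare residual spread below and above the
  # median fitted value. A ratio far from 1 hints at changing spread.
  upper <- fitted_vals > median(fitted_vals)
  c(sd_lower = sd(res[!upper]),
    sd_upper = sd(res[upper]),
    ratio    = sd(res[upper]) / sd(res[!upper]))
}

# Stand-in data: the built-in cars data set, NOT the Starbucks data.
demo_fit <- lm(dist ~ speed, data = datasets::cars)
spread <- check_residuals(demo_fit)
print(round(spread, 2))
```

In this document the call would be check_residuals(starbucks.lm); a ratio well above 1 would suggest that the spread grows with calories and that the constant variability condition deserves a closer look.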