Nutrition at Starbucks, Part I. (8.22, p. 326) The scatterplot below shows the relationship between the number of calories and amount of carbohydrates (in grams) Starbucks food menu items contain. Since Starbucks only lists the number of calories on the display items, we are interested in predicting the amount of carbs a menu item has based on its calorie content.
Answer The p-value of the test is 1.673 × 10^{-11}, far below the significance level alpha = 0.05. We conclude that calories and carbs are significantly correlated, with a correlation coefficient of 0.675.
summary(m_carb_cals)
##
## Call:
## lm(formula = carb ~ calories, data = starbucks)
##
## Residuals:
## Min 1Q Median 3Q Max
## -31.477 -7.476 -1.029 10.127 28.644
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.94356 4.74600 1.884 0.0634 .
## calories 0.10603 0.01338 7.923 1.67e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.29 on 75 degrees of freedom
## Multiple R-squared: 0.4556, Adjusted R-squared: 0.4484
## F-statistic: 62.77 on 1 and 75 DF, p-value: 1.673e-11
res <- cor.test(starbucks$calories, starbucks$carb,
method = "pearson")
res
##
## Pearson's product-moment correlation
##
## data: starbucks$calories and starbucks$carb
## t = 7.9229, df = 75, p-value = 1.673e-11
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.5313531 0.7809149
## sample estimates:
## cor
## 0.674999
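The regression summary and the correlation test report the same t statistic and p-value because, in simple linear regression, the slope test is equivalent to the test of zero correlation, with t = r * sqrt(n - 2) / sqrt(1 - r^2). A minimal sketch that reproduces both numbers from r and the degrees of freedom alone (the variable names are illustrative, not part of the original analysis):

```r
r  <- 0.674999   # sample correlation from the cor.test output
df <- 75         # residual degrees of freedom, n - 2

# t statistic for H0: true correlation (equivalently, the slope) is 0
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution with df degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df)

t_stat
p_value
```

Up to rounding of r, the results should agree with the t value and p-value printed above.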
Answer
Explanatory variable: Calories on the x axis.
Response variable: Carbohydrates on the y axis.
Answer We are interested in predicting the amount of carbs a menu item has based on its calorie content.
Answer The following four conditions were checked with the diagnostic plots, and each is reasonably satisfied.
Linearity: The trend appears to be linear.
Nearly normal residuals: The residuals fall around the line with no obvious outliers.
Constant variability: The variance of the residuals is roughly constant.
Independent observations: The menu items are not time series observations, so independence is reasonable to assume.
par(mfrow=c(2,2))
plot(m_carb_cals)
#vif(m_carb_cals)
#perform Durbin-Watson test
#dwtest(m_carb_cals)
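The diagnostic plots can be complemented with a few numeric checks. The sketch below uses simulated stand-in data (an assumption, since the Starbucks data set itself is not reproduced here). It compares the residual spread in the lower and upper halves of the fitted values and runs a Shapiro-Wilk test on the residuals. The names `sim_fit`, `resids` and `spread_ratio` are hypothetical.

```r
# Simulated stand-in data (hypothetical; NOT the Starbucks data set)
set.seed(1)
calories <- runif(77, 80, 500)
carb <- 9 + 0.1 * calories + rnorm(77, sd = 12)
sim_fit <- lm(carb ~ calories)

resids <- resid(sim_fit)
fitted_vals <- fitted(sim_fit)

# Constant variability: residual spread below vs. above the median fitted value
lower <- resids[fitted_vals <= median(fitted_vals)]
upper <- resids[fitted_vals > median(fitted_vals)]
spread_ratio <- sd(upper) / sd(lower)   # values near 1 suggest similar spread

# Nearly normal residuals: Shapiro-Wilk test as a complement to the Q-Q plot
normality <- shapiro.test(resids)

spread_ratio
normality$p.value
```

A spread ratio far from 1, or a very small Shapiro-Wilk p-value, would be a reason to look at the plots more closely; neither check replaces them.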
Body measurements, Part I. (8.13, p. 316) Researchers studying anthropometry collected body girth measurements and skeletal diameter measurements, as well as age, weight, height and gender for 507 physically active individuals. The scatterplot below shows the relationship between height and shoulder girth (over deltoid muscles), both measured in centimeters.
Answer
The relationship between shoulder girth and height is positive, roughly linear, and moderately strong: as shoulder girth increases, height tends to increase as well, so people with a larger shoulder girth tend to be taller.
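Answer If shoulder girth were measured in inches while height remained in centimeters, the form, direction, and strength of the relationship would remain the same, because a change of units does not affect the correlation. Only the scale of the horizontal axis, and therefore the numeric slope of a fitted line, would change.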
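The invariance of the correlation under a change of units can be checked numerically. This is a minimal sketch on simulated data (an assumption, since the body measurement data are not reproduced here); all variable names are illustrative.

```r
# Simulated stand-in data (hypothetical; NOT the body measurements data set)
set.seed(42)
girth_cm  <- rnorm(100, mean = 107.2, sd = 10.37)
height_cm <- 106 + 0.61 * girth_cm + rnorm(100, sd = 7)

# Convert shoulder girth to inches; height stays in centimeters
girth_in <- girth_cm / 2.54

r_cm <- cor(girth_cm, height_cm)
r_in <- cor(girth_in, height_cm)

# The slope, unlike the correlation, depends on the units of x
slope_cm <- unname(coef(lm(height_cm ~ girth_cm))[2])
slope_in <- unname(coef(lm(height_cm ~ girth_in))[2])

c(r_cm = r_cm, r_in = r_in)
c(slope_cm = slope_cm, slope_in = slope_in)
```

The two correlations agree, while the slope per inch is 2.54 times the slope per centimeter.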
Body measurements, Part III. (8.24, p. 326) Exercise above introduces data on shoulder girth and height of a group of individuals. The mean shoulder girth is 107.20 cm with a standard deviation of 10.37 cm. The mean height is 171.14 cm with a standard deviation of 9.41 cm. The correlation between height and shoulder girth is 0.67.
Answer
# General equation: y = a + b*x, where a is the intercept and b is the slope; the line passes through (shoulder_girth_mean, height_mean)
shoulder_girth_mean <- 107.20
shoulder_girth_sd <- 10.37
height_mean <- 171.14
height_sd <- 9.41
correlation <- 0.67 #this is r value
#b is slope which is calculated by b=r*Sy/Sx
slope <- correlation * (height_sd / shoulder_girth_sd)
slope
## [1] 0.6079749
intercept <- height_mean - slope * shoulder_girth_mean
intercept
## [1] 105.9651
cat('The equation of the regression line for predicting height is height = 105.9651 + .61 * shoulder girth ')
## The equation of the regression line for predicting height is height = 105.9651 + .61 * shoulder girth
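The same calculation can be wrapped in a small reusable function. This is a sketch only; `line_from_summary` is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: least squares line from summary statistics
line_from_summary <- function(x_mean, x_sd, y_mean, y_sd, r) {
  slope <- r * y_sd / x_sd                 # b = r * Sy / Sx
  intercept <- y_mean - slope * x_mean     # line passes through the point of means
  c(intercept = intercept, slope = slope)
}

fit <- line_from_summary(x_mean = 107.20, x_sd = 10.37,
                         y_mean = 171.14, y_sd = 9.41, r = 0.67)
fit

# Sanity check: the prediction at the mean shoulder girth is the mean height
unname(fit["intercept"] + fit["slope"] * 107.20)
```

The final line should return the mean height, 171.14 cm, since a least squares line always passes through the point of means.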
Answer
Slope: For every 1 cm increase in shoulder girth, the model predicts an additional 0.61 cm of height, on average.
Intercept: The predicted height, in centimeters, at a shoulder girth of 0 cm (105.97 cm). A shoulder girth of 0 cm is not physically possible, so the intercept has no practical meaning here; it only sets the height of the line.
r_squared <- correlation^2
r_squared
## [1] 0.4489
cat('In the context of data it means that this linear model explains',r_squared*100, '% of the variation of the height data.')
## In the context of data it means that this linear model explains 44.89 % of the variation of the height data.
randomstudentheight = intercept + slope * 100
randomstudentheight
## [1] 166.7626
cat('height of randomly selected student is : ' ,randomstudentheight)
## height of randomly selected student is : 166.7626
Answer The residual is calculated as observed minus predicted, \(e_i = y_i - \hat{y}_i\). A negative residual means that the predicted value is higher than the observed value.
residual_error<-160 - randomstudentheight
cat('Since the residual is negative, the actual data point is below the regression line and the model overestimates the height by', abs(residual_error), 'cm')
## Since the residual is negative, the actual data point is below the regression line and the model overestimates the height by 6.762581 cm
Answer In the original data set, the explanatory variable (shoulder girth) takes values between roughly 85 and 135 cm. A shoulder girth of 56 cm is well outside this range, so it would not be appropriate to use this linear model for the prediction. If we extrapolate, we are making an unreliable bet that the approximately linear relationship remains valid in a region where it has not been observed.
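One way to make the extrapolation rule explicit in code is to refuse predictions outside the observed range. This is a sketch under stated assumptions: `predict_height` is a hypothetical helper, and the 85 to 135 cm range is the approximate range read off the scatterplot, not an exact figure from the data.

```r
# Hypothetical helper: predict height only inside the observed girth range.
# The default range (85-135 cm) is approximate, read off the scatterplot.
predict_height <- function(girth, intercept = 105.9651, slope = 0.6079749,
                           observed_range = c(85, 135)) {
  outside <- girth < observed_range[1] | girth > observed_range[2]
  if (any(outside)) {
    warning("Some shoulder girth values lie outside the observed range; returning NA for them.")
  }
  # Inside the range: ordinary prediction; outside: NA instead of an extrapolation
  ifelse(outside, NA_real_, intercept + slope * girth)
}

predict_height(100)                    # inside the range
suppressWarnings(predict_height(56))   # outside the range, returns NA
```

Returning NA with a warning is a design choice for this sketch; stopping with an error would be equally reasonable.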
Cats, Part I. (8.26, p. 327) The following regression output is for predicting the heart weight (in g) of cats from their body weight (in kg). The coefficients are estimated using a dataset of 144 domestic cats.
# Since y = b0 + b1 * x
# and b1 = (Sy / Sx) * R,
# the estimates of b0 and b1 are read from the "Estimate" column of the regression output, which gives approximately: heart weight = -0.357 + 4 * body weight
Answer
Answer The intercept is -0.357 g. It is not meaningful on its own: it says that the predicted heart weight is -0.357 g when the body weight is 0 kg, but a cat cannot have a body weight of 0 kg.
Answer The slope is approximately 4 g per kg. For each additional 1 kg of body weight, the predicted heart weight increases by about 4 g.
par(mar = c(3.7, 3.7, 0.5, 0.5), las = 1, mgp = c(2.5, 0.7, 0),
cex.lab = 1.5, cex.axis = 1.5)
plot(cats$Hwt ~ cats$Bwt,
xlab = "Body weight (kg)", ylab = "Heart weight (g)",
pch = 19, col = COL[1,2],
xlim = c(2,4), ylim = c(5, 20.5), axes = FALSE)
axis(1, at = seq(2, 4, 0.5))
axis(2, at = seq(5, 20, 5))
box()
abline(m_cats_hwt_bwt)
Interpret \(R^2\). Answer \(R^2\) = 64.66%. This means that 64.66% of the variability in heart weight is explained by body weight.
Calculate the correlation coefficient.
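Answer The correlation coefficient is the square root of \(R^2\), taken with the sign of the slope. The slope here is positive, so the correlation is positive. (\(R^2\) itself tells us what percent of the prediction error in the y variable is eliminated when we use least-squares regression on the x variable.)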
ccof <- sqrt(0.6466)
cat('The correlation coefficient is:', ccof)
## The correlation coefficient is: 0.8041144
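Taking the square root of R-squared gives only the magnitude of the correlation; the sign has to come from the slope. A minimal sketch on simulated data with a negative trend (an assumption chosen to make the sign matter; the cats data have a positive slope):

```r
# Simulated data with a negative trend (hypothetical; NOT the cats data set)
set.seed(7)
x <- runif(50, 0, 10)
y <- 20 - 1.5 * x + rnorm(50, sd = 2)

fit <- lm(y ~ x)
r_squared <- summary(fit)$r.squared
slope <- unname(coef(fit)["x"])

# Recover r from R^2 by attaching the sign of the slope
r_from_r2 <- sign(slope) * sqrt(r_squared)

c(r_from_r2 = r_from_r2, r_direct = cor(x, y))
```

The two values agree, and both are negative, whereas `sqrt(r_squared)` alone would have been positive.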
Rate my professor. (8.44, p. 340) Many college courses conclude by giving students the opportunity to evaluate the course and the instructor anonymously. However, the use of these student evaluations as an indicator of course quality and teaching effectiveness is often criticized because these measures may reflect the influence of non-teaching related characteristics, such as the physical appearance of the instructor. Researchers at University of Texas, Austin collected data on teaching evaluation score (higher score means better) and standardized beauty score (a score of 0 means average, negative score means below average, and a positive score means above average) for a sample of 463 professors. The scatterplot below shows the relationship between these variables, and also provided is a regression output for predicting teaching evaluation score from beauty score.
Answer
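# The least squares line passes through the point of means, so y_bar = b0 + b1 * x_bar
b0 <- 4.010   # intercept from the regression output
x <- -0.0883  # mean standardized beauty score
y <- 3.9983   # mean teaching evaluation score
b1 <- (y - b0)/x   # solve y_bar = b0 + b1 * x_bar for b1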
cat('the slope is ', b1)
## the slope is 0.1325028
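Answer A positive point estimate is not, by itself, convincing evidence. The conclusion rests on the hypothesis test for the slope, H0: β1 = 0 versus HA: β1 > 0. The p-value reported for beauty in the regression output is essentially 0, well below 0.05, so we reject H0: these data provide convincing evidence that the slope of the relationship between teaching evaluation and beauty is positive.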
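Whether a positive slope is statistically convincing is judged with a t test on the slope, t = b1 / SE(b1), on n - 2 degrees of freedom. The sketch below assumes a standard error of 0.0322, the value listed for beauty in the textbook's regression table. That table did not survive in this text, so treat the number as an assumption and substitute the value from your own output.

```r
b1    <- 0.1325028   # slope computed from the point of means
se_b1 <- 0.0322      # ASSUMPTION: standard error from the textbook's regression table
n     <- 463         # number of professors in the sample
df    <- n - 2

# Test statistic for H0: beta1 = 0 against HA: beta1 > 0
t_stat <- b1 / se_b1

# One-sided p-value (upper tail of the t distribution)
p_one_sided <- pt(t_stat, df, lower.tail = FALSE)

t_stat
p_one_sided
```

Under that assumed standard error the t statistic is a little above 4 and the one-sided p-value is far below 0.05.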
Answer
Linearity: The trend appears to be linear.
Nearly normal residuals: As shown in the residuals distribution and Q-Q plot, they are in fact nearly normal.
Constant variability: The scatterplot of the residuals does appear to have constant variability.
Independent observations: Nothing in the study description suggests dependence between professors, and the 463 sampled professors are likely fewer than 10% of all professors nationwide, so independence is reasonable to assume.