Nutrition at Starbucks, Part I. (8.22, p. 326) The scatterplot below shows the relationship between the number of calories and amount of carbohydrates (in grams) Starbucks food menu items contain. Since Starbucks only lists the number of calories on the display items, we are interested in predicting the amount of carbs a menu item has based on its calorie content.
Describe the relationship between number of calories and amount of carbohydrates (in grams) that Starbucks food menu items contain. The relationship is positive and roughly linear: items with more calories tend to contain more carbohydrates. The scatter around the regression line is not uniform, however; it is larger at higher calorie counts, where points fall well above and below the line.
In this scenario, what are the explanatory and response variables? The explanatory variable is calories (x-axis), and the response variable is carbohydrates (y-axis).
Why might we want to fit a regression line to these data? A regression line summarizes the relationship between the two variables and lets us predict the carbohydrate content of a menu item from its listed calorie count.
Do these data meet the conditions required for fitting a least squares line? Not entirely:
Nearly normal residuals: The histogram of residuals appears approximately normal.
Independent observations: We assume the menu items are independent of one another.
Linearity: The scatterplot shows a roughly linear relationship between carbs and calories.
Constant variability: This condition is questionable. As noted above, the points spread farther from the line at higher calorie counts, so the variability of the residuals appears to increase with calories.
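The constant-variability condition can also be checked numerically by comparing the spread of residuals at low and high values of the predictor. The sketch below does this on simulated data (not the Starbucks data, which are not reproduced here) in which the noise is deliberately made to grow with calories; the helper `spread_ratio` is a hypothetical name used only for illustration.

```r
# Simulated illustration only: these are NOT the Starbucks menu data.
set.seed(42)
calories <- runif(200, min = 100, max = 500)
# Noise SD grows with calories, deliberately violating constant variability
carbs <- 0.1 * calories + rnorm(200, mean = 0, sd = 0.02 * calories)

fit <- lm(carbs ~ calories)
res <- resid(fit)

# Hypothetical helper: residual SD in the upper half of x divided by the
# residual SD in the lower half; values well above 1 suggest a fan shape
spread_ratio <- function(x, residuals) {
  upper <- x >= median(x)
  sd(residuals[upper]) / sd(residuals[!upper])
}

ratio <- spread_ratio(calories, res)
ratio
```

A ratio near 1 is consistent with constant variability; a residual plot remains the primary check, and this number is only a rough supplement to it.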
Body measurements, Part I. (8.13, p. 316) Researchers studying anthropometry collected body girth measurements and skeletal diameter measurements, as well as age, weight, height and gender for 507 physically active individuals. The scatterplot below shows the relationship between height and shoulder girth (over deltoid muscles), both measured in centimeters.
Describe the relationship between shoulder girth and height. The scatterplot shows a positive, roughly linear relationship of moderate strength: individuals with larger shoulder girth tend to be taller.
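How would the relationship change if shoulder girth was measured in inches while the units of height remained in centimeters? The form, direction, and strength of the relationship would not change, because correlation does not depend on the units of measurement. Only the numeric slope would change: since 1 inch = 2.54 cm, a one-inch increase in shoulder girth corresponds to 2.54 times the change in predicted height associated with a one-centimeter increase, so the slope would be 2.54 times larger.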
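To see that a change of units affects only the slope, the sketch below uses a handful of made-up girth and height values (hypothetical, not from the 507-person study), converts girth from centimeters to inches, and refits the line.

```r
# Hypothetical toy values for illustration; not the study's data
girth_cm  <- c(95, 100, 104, 110, 115, 121)
height_cm <- c(160, 168, 165, 175, 172, 183)

girth_in <- girth_cm / 2.54   # 1 inch = 2.54 cm

fit_cm <- lm(height_cm ~ girth_cm)
fit_in <- lm(height_cm ~ girth_in)

# Correlation is unit-free, so it is identical in both fits
r_cm <- cor(girth_cm, height_cm)
r_in <- cor(girth_in, height_cm)

# Slope per inch divided by slope per cm should be 2.54
slope_ratio <- coef(fit_in)[["girth_in"]] / coef(fit_cm)[["girth_cm"]]
c(r_cm = r_cm, r_in = r_in, slope_ratio = slope_ratio)
```

The intercepts of the two fits are also identical, since a girth of 0 cm and 0 inches is the same point.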
Body measurements, Part III. (8.24, p. 326) Exercise above introduces data on shoulder girth and height of a group of individuals. The mean shoulder girth is 107.20 cm with a standard deviation of 10.37 cm. The mean height is 171.14 cm with a standard deviation of 9.41 cm. The correlation between height and shoulder girth is 0.67.
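sx  <- 10.37    # SD of shoulder girth (cm)
sy  <- 9.41     # SD of height (cm)
r   <- 0.67     # correlation between shoulder girth and height
mnx <- 107.20   # mean shoulder girth (cm)
mny <- 171.14   # mean height (cm)
m <- (sy / sx) * r     # slope: b1 = r * sy / sx
b <- mny - (m * mnx)   # intercept: the line passes through (mean x, mean y)
cat(b, m)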
## 105.9651 0.6079749
\(\widehat{\text{height}} = 105.9651 + 0.6079749 \times \text{shoulder girth}\)
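The intercept says that an individual with a shoulder girth of 0 cm would be predicted to be 105.97 cm tall. A shoulder girth of 0 cm is impossible and far outside the range of the data, so the intercept has no practical meaning here; it only sets the vertical position of the line. The slope says that each additional centimeter of shoulder girth is associated with an increase of about 0.61 cm in predicted height, on average.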
Calculate \(R^2\) of the regression line for predicting height from shoulder girth, and interpret it in the context of the application. \(R^2 = r^2 = 0.67^2 = 0.4489\). About 44.89% of the variability in height is explained by the linear model with shoulder girth as the predictor.
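For a simple linear regression, \(R^2\) is just the square of the correlation coefficient, so it can be computed directly from the correlation of 0.67 given in the exercise:

```r
r <- 0.67          # correlation between shoulder girth and height
r_squared <- r^2   # R^2 for a simple linear regression equals r^2
r_squared
## [1] 0.4489
```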
A randomly selected student from your class has a shoulder girth of 100 cm. Predict the height of this student using the model. The predicted height of a student with a shoulder girth of 100 cm is about 166.76 cm.
m * 100 + b   # predicted height (cm) at a shoulder girth of 100 cm
## [1] 166.7626
Cats, Part I. (8.26, p. 327) The following regression output is for predicting the heart weight (in g) of cats from their body weight (in kg). The coefficients are estimated using a dataset of 144 domestic cats.
##
## Call:
## lm(formula = cats$Hwt ~ cats$Bwt)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.5694 -0.9634 -0.0921 1.0426 5.1238
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.3567 0.6923 -0.515 0.607
## cats$Bwt 4.0341 0.2503 16.119 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.452 on 142 degrees of freedom
## Multiple R-squared: 0.6466, Adjusted R-squared: 0.6441
## F-statistic: 259.8 on 1 and 142 DF, p-value: < 2.2e-16
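Write out the linear model. \(\widehat{\text{heart weight}} = -0.357 + 4.034 \times \text{body weight}\), with heart weight in grams and body weight in kilograms.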
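Interpret the intercept. The intercept of -0.357 means that a cat with a body weight of 0 kg would be predicted to have a heart weight of -0.357 g. Neither a body weight of 0 kg nor a negative heart weight is possible, so the intercept has no meaningful interpretation here; it only anchors the line.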
Interpret the slope. The slope of 4.034 means that for each additional kilogram of body weight, we expect heart weight to be about 4.034 g higher, on average.
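The slope interpretation can be checked directly: predictions from the fitted line for two cats whose body weights differ by 1 kg differ by exactly the slope. The helper `predict_hwt` and the body weights 2.5 and 3.5 kg are illustrative choices, not part of the exercise.

```r
# Coefficients copied from the regression output above
b0 <- -0.3567   # intercept (g)
b1 <- 4.0341    # slope (g of heart weight per kg of body weight)

# Hypothetical helper: predicted heart weight (g) from body weight (kg)
predict_hwt <- function(bwt_kg) b0 + b1 * bwt_kg

preds <- predict_hwt(c(2.5, 3.5))
preds
diff(preds)   # equals the slope, 4.0341
```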
Interpret \(R^2\). \(R^2 = 0.6466\), so about 64.66% of the variability in heart weight is explained by the linear model with body weight as the predictor.
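Calculate the correlation coefficient. Because the slope is positive, \(r\) is the positive square root of \(R^2\):
sqrt(0.6466)   # r = +sqrt(R^2) since the slope is positive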
## [1] 0.8041144
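Taking a square root of \(R^2\) always returns a non-negative number, so in general the sign has to come from the slope: \(r = \operatorname{sign}(b_1)\sqrt{R^2}\). A small sketch, where `r_from_fit` is a hypothetical helper name:

```r
# Hypothetical helper: recover r from R^2 and the sign of the fitted slope
r_from_fit <- function(r_squared, slope) sign(slope) * sqrt(r_squared)

r_cats <- r_from_fit(0.6466, 4.0341)   # cats model: positive slope
r_cats
## [1] 0.8041144
```

With a negative slope the same \(R^2\) would give \(r \approx -0.804\).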
Rate my professor. (8.44, p. 340) Many college courses conclude by giving students the opportunity to evaluate the course and the instructor anonymously. However, the use of these student evaluations as an indicator of course quality and teaching effectiveness is often criticized because these measures may reflect the influence of non-teaching related characteristics, such as the physical appearance of the instructor. Researchers at University of Texas, Austin collected data on teaching evaluation score (higher score means better) and standardized beauty score (a score of 0 means average, negative score means below average, and a positive score means above average) for a sample of 463 professors. The scatterplot below shows the relationship between these variables, and also provided is a regression output for predicting teaching evaluation score from beauty score.
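The slope is about 0.133. The least squares line passes through the point of means \((\bar{x}, \bar{y}) = (-0.0883, 3.9983)\), so with the intercept \(b_0 = 4.010\) from the output below, the slope is \(b_1 = (\bar{y} - b_0)/\bar{x}\):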
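mnx <- -0.0883   # mean standardized beauty score
mny <- 3.9983    # mean teaching evaluation score
summary(m_eval_beauty)   # model fitted earlier as lm(eval ~ beauty)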
##
## Call:
## lm(formula = eval ~ beauty)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.80015 -0.36304 0.07254 0.40207 1.10373
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.01002 0.02551 157.205 < 2e-16 ***
## beauty 0.13300 0.03218 4.133 4.25e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5455 on 461 degrees of freedom
## Multiple R-squared: 0.03574, Adjusted R-squared: 0.03364
## F-statistic: 17.08 on 1 and 461 DF, p-value: 4.247e-05
m <- (mny - 4.010) / mnx   # b1 = (mean y - b0) / mean x, with b0 = 4.010
m
## [1] 0.1325028
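Do these data provide convincing evidence that the slope of the relationship between teaching evaluation and beauty is positive? Explain your reasoning. Yes. Test \(H_0: \beta_1 = 0\) against \(H_A: \beta_1 > 0\). The output gives \(t = 4.133\) with 461 degrees of freedom and a two-sided p-value of \(4.25 \times 10^{-5}\); because the estimate is positive, the one-sided p-value is half of that, about \(2.1 \times 10^{-5}\), far below 0.05. We reject \(H_0\) and conclude that the slope is positive.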
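The test statistic and the one-sided p-value for \(H_A: \beta_1 > 0\) can be reproduced from the estimate and standard error in the output, using base R's t distribution function `pt()`:

```r
est <- 0.13300   # beauty slope estimate
se  <- 0.03218   # its standard error
dof <- 461       # residual degrees of freedom

t_stat <- est / se
# One-sided p-value for H_A: slope > 0 (upper tail of the t distribution)
p_one_sided <- pt(t_stat, df = dof, lower.tail = FALSE)
c(t = t_stat, p = p_one_sided)
```

Doubling `p_one_sided` recovers the two-sided p-value printed in the summary.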
List the conditions required for linear regression and check if each one is satisfied for this model based on the following diagnostic plots.
Constant variability: The plot of residuals against fitted values shows a roughly constant spread around zero across the range of fitted values.
Independent observations: We assume the observations are independent; nothing in the description suggests that one professor's scores depend on another's.
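Nearly normal residuals: The histogram of residuals is nearly normal, with a slight left skew (the minimum residual, -1.80, lies farther from zero than the maximum, 1.10); with 463 observations this mild skew is not a serious concern.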
Linearity: The scatterplot shows a roughly linear, though weak, relationship between beauty and teaching evaluation, with no apparent curvature.
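The residual quartiles printed in the model summary give a quick numerical check on skewness. The sketch below computes Bowley's quartile skewness coefficient, \((Q_3 + Q_1 - 2\,\text{median})/(Q_3 - Q_1)\), from those printed values; a negative value indicates left skew. This is a supplementary check, not one of the regression conditions listed in the exercise.

```r
# Residual quartiles copied from summary(m_eval_beauty)
q1  <- -0.36304
med <-  0.07254
q3  <-  0.40207

# Bowley (quartile) skewness: negative means left skew
bowley <- (q3 + q1 - 2 * med) / (q3 - q1)
bowley   # about -0.139: mild left skew
```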