Nutrition at Starbucks, Part I. (8.22, p. 326) The scatterplot below shows the relationship between the number of calories and amount of carbohydrates (in grams) Starbucks food menu items contain. Since Starbucks only lists the number of calories on the display items, we are interested in predicting the amount of carbs a menu item has based on its calorie content.
The relationship between calories and carbs appears positive overall, but the points scatter more widely around the regression line at higher calorie counts.
The explanatory variable is calories (x-axis) and the response variable is carbs in grams (y-axis).
A regression line helps us understand the relationship between the two variables and gives an estimate of the amount of carbs based on the number of calories.
Body measurements, Part I. (8.13, p. 316) Researchers studying anthropometry collected body girth measurements and skeletal diameter measurements, as well as age, weight, height and gender for 507 physically active individuals. The scatterplot below shows the relationship between height and shoulder girth (over deltoid muscles), both measured in centimeters.
Body measurements, Part III. (8.24, p. 326) The exercise above introduces data on shoulder girth and height of a group of individuals. The mean shoulder girth is 107.20 cm with a standard deviation of 10.37 cm. The mean height is 171.14 cm with a standard deviation of 9.41 cm. The correlation between height and shoulder girth is 0.67.
mean_shoulder <- 107.20
sd_shoulder <- 10.37
mean_height <- 171.14
sd_height <- 9.41
r_sh_h <- 0.67
b_1 <- r_sh_h * (sd_height / sd_shoulder)
b_1
## [1] 0.6079749
b_0 <- mean_height - b_1 * mean_shoulder
b_0
## [1] 105.9651
# The equation for the regression line, with shoulder girth (cm) as the explanatory variable, is:
# height = b_0 + b_1 * shoulder_girth
# height = 105.97 + 0.608 * shoulder_girth
r_squared <- r_sh_h^2
r_squared
## [1] 0.4489
# R-squared is 0.449.
# About 44.9% of the variation in height is explained by the linear model on shoulder girth.
x_shoulder <- 100
height_student <- b_0 + b_1*x_shoulder
height_student
## [1] 166.7626
# The predicted height of the student is 166.76 cm.
real_height <- 160
res_height <- real_height - height_student
res_height
## [1] -6.762581
# The residual is -6.76 cm, which means that the model overestimates the height of this student.
The graph shows that every data point has a shoulder girth above 80 cm. A shoulder girth of 56 cm falls outside the observed range, so using this model to predict the height of a child of that size would be an extrapolation and is not appropriate.
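The calculations above can be collected into a small helper, sketched here under stated assumptions. The function names `line_from_summary` and `predict_height` are hypothetical, and the 80 cm lower bound is only the approximate minimum shoulder girth read off the scatterplot, not a value reported in the exercise.

```r
# Least-squares line from summary statistics:
#   slope = r * (sd_y / sd_x),  intercept = mean_y - slope * mean_x
line_from_summary <- function(mean_x, sd_x, mean_y, sd_y, r) {
  slope <- r * (sd_y / sd_x)
  intercept <- mean_y - slope * mean_x
  list(intercept = intercept, slope = slope)
}

fit <- line_from_summary(mean_x = 107.20, sd_x = 10.37,
                         mean_y = 171.14, sd_y = 9.41, r = 0.67)

# Predict height (cm) from shoulder girth (cm). Returns NA with a warning
# when the input falls below the assumed lower edge of the observed data
# (min_observed = 80 is an assumption read off the plot).
predict_height <- function(shoulder_cm, fit, min_observed = 80) {
  if (shoulder_cm < min_observed) {
    warning("shoulder girth is below the observed range; this would be an extrapolation")
    return(NA_real_)
  }
  fit$intercept + fit$slope * shoulder_cm
}

predicted <- predict_height(100, fit)  # student with a 100 cm shoulder girth
residual  <- 160 - predicted           # observed height minus predicted height
```

Calling `predict_height(56, fit)` returns `NA` with a warning instead of a number, which mirrors the reasoning about the child with a 56 cm shoulder girth.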
Cats, Part I. (8.26, p. 327) The following regression output is for predicting the heart weight (in g) of cats from their body weight (in kg). The coefficients are estimated using a dataset of 144 domestic cats.
Based on the information in the table provided, the linear model is as follows:
heart_weight = -0.357 + 4.034 * body_weight
If the body weight is 0 kg, the model predicts a heart weight of -0.357 g. This value is not realistic: a cat cannot have a body weight of 0 kg, so the intercept only serves to set the height of the line.
For every 1 kg increase in body weight, the heart weight is expected to increase by 4.034 g on average.
Body weight explains 64.66% of the variation in heart weight.
# The correlation coefficient or R is calculated as follows:
r_sq_weight <- .6466
r_weight <- round(sqrt(r_sq_weight),3)
r_weight
## [1] 0.804
# The correlation coefficient is 0.804 (positive, because the slope is positive), which indicates
# a strong positive linear relationship. The share of variance explained is R-squared, 64.66%.
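As a minimal sketch of how the cat model can be used, the snippet below takes the signed square root of R-squared and predicts one heart weight. The variable names are hypothetical, and the 3 kg body weight is an illustrative value assumed to lie within the observed range of the data; it does not come from the exercise.

```r
# Coefficients taken from the linear model stated above.
b_0_cat  <- -0.357   # intercept (g)
b_1_cat  <- 4.034    # slope (g of heart weight per kg of body weight)
r_sq_cat <- 0.6466   # R-squared from the regression output

# The correlation has the same sign as the slope, so take the signed root.
r_cat <- sign(b_1_cat) * sqrt(r_sq_cat)

# Predicted heart weight (g) for a hypothetical 3 kg cat.
body_weight_kg <- 3
heart_weight_g <- b_0_cat + b_1_cat * body_weight_kg
```

Using `sign()` matters when the slope is negative: `sqrt()` alone always returns a positive value and would report the wrong direction for the correlation.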
Rate my professor. (8.44, p. 340) Many college courses conclude by giving students the opportunity to evaluate the course and the instructor anonymously. However, the use of these student evaluations as an indicator of course quality and teaching effectiveness is often criticized because these measures may reflect the influence of non-teaching related characteristics, such as the physical appearance of the instructor. Researchers at University of Texas, Austin collected data on teaching evaluation score (higher score means better) and standardized beauty score (a score of 0 means average, negative score means below average, and a positive score means above average) for a sample of 463 professors. The scatterplot below shows the relationship between these variables, and also provided is a regression output for predicting teaching evaluation score from beauty score.
# From the given information:
# Mean teaching evaluation score (y-bar) = 3.9983
Y_teach <- 3.9983
# b_0 = 4.010
b_0_t <- 4.010
# Mean beauty score (x-bar) = -0.0883
x_eval <- -0.0883
# slope or b_1 = ?
# The least-squares line always passes through the point of means (x-bar, y-bar), so:
# Y_teach = b_0_t + b_1_t * x_eval
# Solving this equation for the slope:
b_1_t <- (Y_teach - b_0_t) / x_eval
b_1_t
## [1] 0.1325028
# The slope is 0.133.
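A quick sanity check on this slope, sketched with hypothetical variable names: plugging the mean beauty score back into the fitted line should return the mean evaluation score. The beauty score of 1 used for the prediction is an illustrative value, not one taken from the exercise.

```r
y_bar_prof <- 3.9983   # mean teaching evaluation score
x_bar_prof <- -0.0883  # mean beauty score
b_0_prof   <- 4.010    # intercept from the regression output

# Slope recovered from the point of means (x-bar, y-bar).
b_1_prof <- (y_bar_prof - b_0_prof) / x_bar_prof

# The fitted line should reproduce y-bar when evaluated at x-bar.
stopifnot(isTRUE(all.equal(b_0_prof + b_1_prof * x_bar_prof, y_bar_prof)))

# Predicted evaluation score for a hypothetical beauty score of 1.
pred_score <- b_0_prof + b_1_prof * 1
```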
While the plot suggests only a very slight positive relationship, the positive slope of 0.133 indicates a small positive association between instructors' beauty scores and their teaching evaluation scores.
The conditions required for linear regression are as follows:
Linearity: The scatterplot shows no evident curved pattern, so a straight line is a reasonable description of the trend and this condition appears to be met.
Nearly normal residuals: The histogram of the residuals is slightly skewed to the left but otherwise close to normal, so this condition is reasonably met.
Constant variability: The variability of the residuals appears roughly constant in the plot, with somewhat more variance at the extremes of the data.
Independence: The observations appear to be independent of each other, since each one is a different professor. Independence depends on how the sample was collected rather than on the sample size.
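The first three conditions are usually checked with residual plots. The sketch below shows one way to draw them in base R. Because the professor data are not included here, it fits a model to synthetic data that is made up purely for illustration; `residual_diagnostics`, `beauty_demo`, `score_demo`, and `fit_demo` are hypothetical names.

```r
# Synthetic data for illustration only; NOT the professor evaluation data.
set.seed(1)
beauty_demo <- rnorm(200)
score_demo  <- 4 + 0.13 * beauty_demo + rnorm(200, sd = 0.5)
fit_demo    <- lm(score_demo ~ beauty_demo)

# Draw three diagnostic panels for a fitted lm object and
# return the residuals invisibly.
residual_diagnostics <- function(fit) {
  res <- resid(fit)
  old_par <- par(mfrow = c(1, 3))
  on.exit(par(old_par))
  # Linearity and constant variability: look for curvature or a fan shape.
  plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
  abline(h = 0, lty = 2)
  # Nearly normal residuals: histogram and normal Q-Q plot.
  hist(res, main = "", xlab = "Residuals")
  qqnorm(res, main = "")
  qqline(res)
  invisible(res)
}
```

Calling `residual_diagnostics(fit_demo)` draws the three panels. Independence cannot be read from these plots and has to be judged from how the data were collected.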