The scatterplot below shows the relationship between the number of calories and the amount of carbohydrates (in grams) that Starbucks food menu items contain. Since Starbucks only lists the number of calories on the display items, we are interested in predicting the amount of carbs a menu item has based on its calorie content.
Answer: The scatterplot shows a strong positive linear relationship between the two variables, calories and carbs, in Starbucks food items.
Answer: In this case, calories is the explanatory variable and carbs is the response variable.
Answer: Since there is a strong positive relationship between the two variables, we would like to fit a regression line to them. The line will help us predict the amount of carbs in a food item from its calorie content.
Answer: For the least squares line, we check how the 4 conditions fare in this scenario:
1) Linearity: The relationship is linear, as we can see from the scatterplot of the two variables.
2) Nearly normal residuals: The residual plot shows that the distribution of the residuals is not close to normal.
3) Constant variability: The residual plot shows outliers on either side of zero, so the variability does not look constant.
4) Independent observations: The observations are independent, as they are for different products.
Conditions 1 and 4 are met, but conditions 2 and 3 are far enough off the mark that we cannot say this scenario meets the conditions for a least squares line.
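The two residual checks above can also be done numerically. The following is only a minimal sketch on simulated stand-in data, not the Starbucks data: the variable values, the sample size, and the noise model are all assumptions made for illustration. It fits a line with `lm()`, compares the residual spread for low and high calorie items, and runs a Shapiro-Wilk test on the residuals.

```r
# Simulated stand-in data (NOT the Starbucks data): noise grows with calories
set.seed(1)
n <- 100
calories <- runif(n, 80, 500)
carbs <- 8 + 0.1 * calories + rnorm(n, sd = 0.03 * calories)

# Fit the least squares line and extract the residuals
fit <- lm(carbs ~ calories)
res <- resid(fit)

# Constant variability: compare residual spread for low vs. high calories
low <- calories <= median(calories)
sd_low <- sd(res[low])
sd_high <- sd(res[!low])
c(sd_low = sd_low, sd_high = sd_high)

# Nearly normal residuals: Shapiro-Wilk p-value (small values suggest non-normality)
shapiro.test(res)$p.value
```

In practice these numbers are usually read alongside `plot(calories, res)` and `hist(res)`, which show the same information graphically.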
Exercise 7.15 introduces data on shoulder girth and height of a group of individuals. The mean shoulder girth is 107.20 cm with a standard deviation of 10.37 cm. The mean height is 171.14 cm with a standard deviation of 9.41 cm. The correlation between height and shoulder girth is 0.67.
Answer: Here shoulder girth is the explanatory variable, x, and height is the response variable, y.
x_bar = 107.2 s_x = 10.37
y_bar = 171.14 s_y = 9.41
R = 0.67
slope, b1 = (s_y / s_x) * R
x_bar_7.26 <- 107.2
s_x_7.26 <- 10.37
y_bar_7.26 <- 171.14
s_y_7.26 <- 9.41
R_7.26 <- 0.67
# slope
b1_7.26 <- (s_y_7.26 / s_x_7.26) * R_7.26
b1_7.26
## [1] 0.6079749
Equation would be:
y - y_bar_7.26 = b1_7.26 * (x - x_bar_7.26)
y - 171.14 = 0.608 * (x - 107.2)
y = 105.96 + 0.608 x
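The intercept in the equation above comes from the fact that the least squares line passes through (x_bar, y_bar). As a small sketch, the helper below (`ls_line` is a hypothetical name, not part of the original solution) computes both coefficients from the summary statistics given in the exercise. Because it uses the unrounded slope, the intercept it returns differs slightly in the second decimal from the hand calculation with 0.608.

```r
# Least squares line from summary statistics (values from the exercise).
# ls_line is a hypothetical helper introduced only for this illustration.
ls_line <- function(x_bar, s_x, y_bar, s_y, R) {
  b1 <- (s_y / s_x) * R      # slope
  b0 <- y_bar - b1 * x_bar   # intercept: the line passes through (x_bar, y_bar)
  c(intercept = b0, slope = b1)
}

coefs <- ls_line(x_bar = 107.2, s_x = 10.37, y_bar = 171.14, s_y = 9.41, R = 0.67)
round(coefs, 3)
```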
Answer : Intercept value = 105.96. This means that when shoulder girth x = 0, the predicted height y is 105.96 cm. A shoulder girth of 0 cm is not a realistic scenario, so the intercept has no practical meaning here; it only sets the height of the line.
Slope = 0.608 means that for every additional 1 cm of shoulder girth, the predicted height increases by 0.608 cm on average.
Answer:
(R_7.26)^2
## [1] 0.4489
That means around 44.9% of the variability in height is explained by shoulder girth. The closer R^2 (or |R|) is to 1, the more tightly the points follow a straight line.
Answer:
y_7.26_d <- 105.96 + 0.608 * 100
y_7.26_d
## [1] 166.76
Answer:
y_7.26_observed <- 160
residual_7.26 <- y_7.26_observed - y_7.26_d
residual_7.26
## [1] -6.76
The residual is -6.76 cm: the observed height is less than the height predicted by the model, so the model over-predicted by 6.76 cm.
Answer: No, it would not be appropriate to use the model here. A shoulder girth of 56 cm is outside the observed range of the data, so the prediction would be an extrapolation.
The following regression output is for predicting the heart weight (in g) of cats from their body weight (in kg). The coefficients are estimated using a dataset of 144 domestic cats.
Answer: y = -0.357 + 4.034 x
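The coefficients in this equation are read off the regression output. As a hedged sketch, they can be reproduced in R under the assumption that the exercise data correspond to the `cats` data set shipped with the MASS package (144 cats, with body weight `Bwt` in kg and heart weight `Hwt` in g). The exercise itself does not name the data source, so treat that match as an assumption.

```r
# Assumption: the exercise data match MASS::cats (Bwt in kg, Hwt in g)
library(MASS)

# Regress heart weight on body weight
fit_cats <- lm(Hwt ~ Bwt, data = cats)

round(coef(fit_cats), 3)                # intercept and slope
round(summary(fit_cats)$r.squared, 4)   # R^2
```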
Answer: When body weight x = 0 kg, the predicted heart weight y is -0.357 g. That is not a valid scenario, so the intercept has no practical meaning here; it only sets the height of the linear regression line.
Answer: For every additional 1 kg of body weight, the predicted heart weight increases by 4.034 g on average.
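One way to see the slope interpretation is to compare predictions for two cats whose body weights differ by exactly 1 kg. This is only an illustration: the function name `heart_hat` and the body weights 2 kg and 3 kg are hypothetical choices, while the coefficients are the ones from the fitted equation.

```r
# Fitted line from the answer: predicted heart weight (g) from body weight (kg)
heart_hat <- function(body_kg) -0.357 + 4.034 * body_kg

# Hypothetical body weights, chosen only for illustration
slope_check <- heart_hat(3) - heart_hat(2)
slope_check   # equals the slope, up to floating-point rounding
```

The difference is the same for any pair of body weights 1 kg apart, which is what a constant slope means.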
Answer: Body weight explains 64.66% of the variability in heart weight.
Answer:
R2_7.30 <- 0.6466
R_7.30 <- sqrt(R2_7.30)
R_7.30
## [1] 0.8041144
Correlation coefficient R = 0.804. The positive square root is taken because the slope (4.034) is positive.
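Since `sqrt()` only returns the magnitude, the sign of R has to come from the slope. A minimal sketch that makes this explicit follows; the names `b1_7.30` and `R_signed_7.30` are introduced here for illustration and are not in the original solution.

```r
# R has the same sign as the slope; sqrt(R^2) only gives its magnitude
b1_7.30 <- 4.034      # slope from the fitted equation
R2_7.30 <- 0.6466     # R^2 from the regression output

R_signed_7.30 <- sign(b1_7.30) * sqrt(R2_7.30)
round(R_signed_7.30, 3)
```

With a negative slope, the same expression would return a negative correlation.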