The scatterplot below shows the relationship between the number of calories and amount of carbohydrates (in grams) Starbucks food menu items contain. Since Starbucks only lists the number of calories on the display items, we are interested in predicting the amount of carbs a menu item has based on its calorie content.
Exercise 7.24
Ans: The scatterplot shows a roughly linear trend between calories and carbohydrates, although the correlation is not very strong.
The explanatory variable is the number of calories and the response variable is the amount of carbohydrates (in grams).
Because the scatterplot shows a roughly linear relationship, we can fit a regression line to predict the amount of carbohydrates for a new menu item from its calorie count. Drawing the line over the scatterplot also shows visually how well it fits the data.
Linear trend: The first plot shows a linear trend, so this condition is met. Nearly normal residuals: The histogram of the residuals is nearly normal, so this condition is met.
Exercise 7.15 introduces data on shoulder girth and height of a group of individuals. The mean shoulder girth is 107.20 cm with a standard deviation of 10.37 cm. The mean height is 171.14 cm with a standard deviation of 9.41 cm. The correlation between height and shoulder girth is 0.67.
Exercise 7.15
Let y be the height and x be the shoulder girth, both in cm.
corr <- 0.67
sd_y <- 9.41
sd_x <- 10.37
m <- corr * (sd_y/sd_x)
m
## [1] 0.6079749
Now we have the slope. Let's write the equation.
y = mx + c
We still need to find c. The regression line passes through the point of means (mean of x, mean of y), so c follows from the slope and the two means.
m_y <- 171.14
m_x <- 107.20
c <- m_y - m * m_x
c
## [1] 105.9651
So, the final equation is
y = 0.61 * x + 105.97
The slope tells us that for each additional cm of shoulder girth, the model predicts an average increase of 0.61 cm in height.
R2 <- round(corr^2,3)
R2
## [1] 0.449
About 44.9% of the variability in height is explained by shoulder girth.
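We will use the regression equation, with the intercept c = 105.97 computed earlier.
y = 0.61 * x + 105.97
y_pred <- 0.61 * 100 + 105.97
y_pred
## [1] 166.97
The model overestimates this student's height by 6.97 cm, so the residual is negative.
y_actual <- 160
residual <- y_actual - y_pred
residual
## [1] -6.97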
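As a cross-check, the prediction and residual can be recomputed directly from the summary statistics without rounding the coefficients. This is only a sketch: the helper name `predict_height` is made up for illustration, and the 100 cm girth and 160 cm height are the values used in this exercise.

```r
# Summary statistics quoted in the exercise
r     <- 0.67     # correlation between height and shoulder girth
s_x   <- 10.37    # sd of shoulder girth (cm)
s_y   <- 9.41     # sd of height (cm)
x_bar <- 107.20   # mean shoulder girth (cm)
y_bar <- 171.14   # mean height (cm)

# Least-squares slope and intercept from summary statistics
slope     <- r * s_y / s_x
intercept <- y_bar - slope * x_bar

# Hypothetical helper: predicted height (cm) for a given shoulder girth (cm)
predict_height <- function(girth_cm) intercept + slope * girth_cm

y_hat     <- predict_height(100)  # student with a 100 cm shoulder girth
resid_100 <- 160 - y_hat          # observed height minus predicted height

round(c(prediction = y_hat, residual = resid_100), 2)
# prediction is about 166.76 and the residual about -6.76
```

A negative residual means the observed height lies below the line, that is, the model overestimates. The small difference from a calculation with coefficients rounded to two decimals comes only from that rounding.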
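A shoulder girth of 56 cm lies far outside the range of the data used to fit the model: it is almost 5 standard deviations below the mean of 107.20 cm (sd = 10.37 cm), well beyond 2 standard deviations. Predicting height for this value would be extrapolation, so it would not be appropriate to use this linear model here.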
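To see how far a shoulder girth of 56 cm sits from the rest of the data, its distance from the mean can be expressed in standard deviations. The snippet below is a small sketch using only the mean and sd given in the exercise; the variable names are chosen here for illustration.

```r
girth_new <- 56      # shoulder girth in question (cm)
x_bar     <- 107.20  # mean shoulder girth (cm)
s_x       <- 10.37   # sd of shoulder girth (cm)

# Number of standard deviations between the new value and the mean
z <- (girth_new - x_bar) / s_x
round(z, 2)   # about -4.94, i.e. roughly 5 sd below the mean

# TRUE when the value is more than 2 sd away from the mean
abs(z) > 2
```

The 2 sd cutoff is only a rule of thumb; the point is that the model was never fitted to individuals anywhere near this value.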
The following regression output is for predicting the heart weight (in g) of cats from their body weight (in kg). The coefficients are estimated using a dataset of 144 domestic cats.
Exercise 7.30
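Write out the linear model. y = 4.034 * x - 0.357, where y is the heart weight (in g) and x is the body weight (in kg).
Interpret the intercept.
The intercept is -0.357. The model predicts a heart weight of -0.357 g for a cat with a body weight of 0 kg. This value is not meaningful in itself; the intercept only adjusts the height of the line.
The slope tells us that for each additional kg of body weight, the model predicts an average increase of 4.034 g in heart weight.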
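The intercept and slope from the regression output can be tried out on a concrete value. The following sketch assumes a cat with a body weight of 3 kg, which is a made-up illustration and not a value from the exercise; `predict_heart` is likewise a hypothetical helper name.

```r
b0 <- -0.357  # intercept from the regression output (g)
b1 <- 4.034   # slope from the regression output (g per kg)

# Hypothetical helper: predicted heart weight (g) for a body weight (kg)
predict_heart <- function(body_kg) b0 + b1 * body_kg

predict_heart(3)                     # about 11.745 g for an assumed 3 kg cat
predict_heart(4) - predict_heart(3)  # one extra kg adds the slope, about 4.034 g
predict_heart(0)                     # the intercept, a negative heart weight
```

The last line shows why the intercept has no practical meaning here: a body weight of 0 kg is outside anything the data could contain.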
This means that about 64.66% of the variability in heart weight is explained by body weight.
The correlation is about 0.80, positive because the slope is positive, which indicates a strong linear relationship.
correlation <- sqrt(.64)
correlation
## [1] 0.8
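The square root alone only gives the magnitude of the correlation; its sign has to come from the slope. A minimal sketch, using the unrounded R-squared of 64.66% and the slope of 4.034 quoted above (the variable names are illustrative):

```r
r_squared <- 0.6466  # R-squared from the regression output
slope     <- 4.034   # slope from the regression output

# The correlation has the same sign as the slope
r <- sign(slope) * sqrt(r_squared)
round(r, 3)   # about 0.804
```

With a negative slope the same R-squared would correspond to a correlation of about -0.804.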