The scatterplot below shows the relationship between the number of calories and the amount of carbohydrates (in grams) that Starbucks food menu items contain. Since Starbucks only lists the number of calories on the display items, we are interested in predicting the amount of carbs a menu item has based on its calorie content.
The plot shows a positive linear relationship: the points trend upward from left to right, so the amount of carbs increases as the number of calories increases. The relationship between calories and carbs is moderately linear but not perfectly linear, because most of the points deviate from the regression line. The plot also contains some outliers (points that deviate from the imaginary regression line more than the others).
The explanatory variable is calories while the response variable is carbohydrates.
The closer the data are to the regression line, the better the prediction. The regression line helps to predict the amount of carbohydrates from the number of calories in an item.
The first condition is linearity. There does appear to be some linearity between carbs and calories, because the points are concentrated near the regression line and follow the upward trend. However, the residual plot supports only a moderately linear relationship, since the residuals are not quite evenly and randomly scattered around the horizontal axis.
The second condition is nearly normal residuals. The histogram shows that the distribution of the residuals is unimodal (one clear peak) and bell-shaped, but slightly left-skewed.
The third condition is constant variability. The residual plot shows that the variability of the residuals is not constant. Items with fewer calories have residuals close to zero, while items with more calories have residuals that spread farther from zero.
The fourth condition is independent observations. Since the exercise gives no information about how the items were sampled, we can only assume that the menu items are independent of each other.
The third condition is not satisfied, whereas the other three conditions are reasonably well satisfied.
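The constant variability check can also be made numerically. The sketch below is only an illustration on simulated data, since the Starbucks data set itself is not reproduced here; the sample size, coefficients, noise model and variable names are all assumptions chosen to mimic a fan-shaped residual plot.

```r
# Simulated stand-in for the menu data (NOT the real Starbucks data set)
set.seed(1)
n <- 100
calories <- runif(n, min = 80, max = 500)
# Assumed noise model: the spread grows with calories (fan shape)
carbs <- 8 + 0.1 * calories + rnorm(n, sd = 0.03 * calories)

fit <- lm(carbs ~ calories)
res <- residuals(fit)

# Compare the spread of the residuals for low- and high-calorie items
low     <- calories < median(calories)
sd_low  <- sd(res[low])
sd_high <- sd(res[!low])
c(low = sd_low, high = sd_high)
```

A clearly larger spread in the high-calorie half is the numerical counterpart of the fan shape seen in a residual plot.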
Exercise 7.15 introduces data on shoulder girth and height of a group of individuals. The mean shoulder girth is 107.20 cm with a standard deviation of 10.37 cm. The mean height is 171.14 cm with a standard deviation of 9.41 cm. The correlation between height and shoulder girth is 0.67.
The equation for the regression line is
height = B0 + B1*girth, where B0 is the intercept and B1 is the slope
sh_girth_mean <- 107.20
sh_girth_sd <- 10.37
height_mean <- 171.14
height_sd <- 9.41
cor <- 0.67
#calculate slope
B1 <- (height_sd/sh_girth_sd)*cor
B1
## [1] 0.6079749
#calculate intercept
B0 <- height_mean - B1* sh_girth_mean
B0
## [1] 105.9651
The regression line for the linear model of shoulder girth and height is
height = 105.9651 + 0.6079749*girth
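The intercept of 105.9651 represents the predicted height in centimeters at a shoulder girth of 0 cm. It only sets the height of the line and has no practical meaning, since no one has a shoulder girth of 0 cm.
The slope of 0.6079749 means that for each additional centimeter of shoulder girth, the model predicts an additional 0.6079749 cm of height, on average.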
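A quick sanity check on these estimates: a least squares line always passes through the point of means. The sketch below recomputes the slope and intercept from the summary statistics given in the exercise and confirms that the predicted height at the mean shoulder girth equals the mean height. The name r for the correlation is an illustrative choice that avoids masking R's built-in cor() function.

```r
sh_girth_mean <- 107.20
sh_girth_sd   <- 10.37
height_mean   <- 171.14
height_sd     <- 9.41
r             <- 0.67  # correlation between height and shoulder girth

B1 <- (height_sd / sh_girth_sd) * r     # slope
B0 <- height_mean - B1 * sh_girth_mean  # intercept

# The predicted height at the mean girth should reproduce the mean height
height_at_mean <- B0 + B1 * sh_girth_mean
all.equal(height_at_mean, height_mean)
```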
r_squared <- cor^2
r_squared
## [1] 0.4489
R-squared measures how close the data are to the fitted regression line. A value of 44.89% indicates that the model explains 44.89% of the variability in height around its mean.
girth <- 100
predicted_height <- B0 + B1*girth
predicted_height
## [1] 166.7626
The predicted height is 166.76 cm.
actual_height <- 160
actual_height - predicted_height
## [1] -6.762581
The residual equals -6.762581 cm. It is negative, which means the model overestimates the height of this individual.
A shoulder girth of 56 cm is outside the range of the data in the sample. It would not be appropriate to use the linear model to predict the height of an individual who falls outside that range. The height of a one-year-old child can be affected by other factors related to child development that were not accounted for in the linear model.
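One way to see how far outside the sample this value lies is to express it in standard deviations from the mean shoulder girth. The sketch below uses only the summary statistics from the exercise; the actual minimum and maximum of the sample are not given, so the z-score is used here as a rough proxy for "outside the data range". The name girth_child is an illustrative choice.

```r
sh_girth_mean <- 107.20
sh_girth_sd   <- 10.37
girth_child   <- 56

# Number of standard deviations between 56 cm and the sample mean
z <- (girth_child - sh_girth_mean) / sh_girth_sd
z

# Extrapolated prediction from the fitted line, shown only to illustrate the problem
B0 <- 105.9651
B1 <- 0.6079749
extrapolated_height <- B0 + B1 * girth_child
extrapolated_height
```

A girth almost five standard deviations below the mean is far from anything the model was fitted on, and the extrapolated height of roughly 140 cm is not plausible for a one-year-old.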
The following regression output is for predicting the heart weight (in g) of cats from their body weight (in kg). The coefficients are estimated using a data set of 144 domestic cats.
The linear model equation is
Heart_Weight = B0 + B1 * Body_Weight, where B0 is the intercept and B1 is the slope.
Substituting the estimates from the regression output for the intercept and the slope gives
Heart_Weight = -0.357 + 4.034 * Body_Weight
The intercept of -0.357 represents the predicted heart weight in grams at a body weight of 0 kg. Since heart weight can't be negative, and no cat weighs 0 kg, the intercept of -0.357 is meaningless in this particular case.
The slope of 4.034 means that for each additional kilogram of body weight, the model predicts an additional 4.034 g of heart weight, on average.
R-squared measures how close the data are to the fitted regression line. A value of 64.66% indicates that the model explains 64.66% of the variability in heart weight around its mean.
r_squared <- 0.6466
cor <- sqrt(r_squared)
cor
## [1] 0.8041144
The correlation coefficient of 0.8041144 indicates a strong (because it's greater than 0.5) positive correlation.
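Taking the square root of R-squared only gives the magnitude of the correlation; the sign has to come from the slope. A minimal sketch, using the slope and R-squared from the regression output (the names slope and r are illustrative choices):

```r
r_squared <- 0.6466
slope     <- 4.034  # estimated slope for body weight

# The correlation has the same sign as the slope
r <- sign(slope) * sqrt(r_squared)
r
```

Because the slope is positive here, the result matches the positive square root; with a negative slope the correlation would be negative.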
Many college courses conclude by giving students the opportunity to evaluate the course and the instructor anonymously. However, the use of these student evaluations as an indicator of course quality and teaching effectiveness is often criticized because these measures may reflect the influence of non-teaching related characteristics, such as the physical appearance of the instructor. Researchers at University of Texas, Austin collected data on teaching evaluation score (higher score means better) and standardized beauty score (a score of 0 means average, negative score means below average, and a positive score means above average) for a sample of 463 professors. The scatterplot below shows the relationship between these variables, and also provided is a regression output for predicting teaching evaluation score from beauty score.
(a) Given that the average standardized beauty score is -0.0883 and average teaching evaluation score is 3.9983, calculate the slope. Alternatively, the slope may be computed using just the information provided in the model summary table.
The linear model equation is
Avg_Teaching_Evaluation = B0 + B1*Avg_Beauty, where B0 is the intercept and B1 is the slope.
B0 <- 4.01 #provided in the table
Avg_Teaching_Evaluation <- 3.9983
Avg_Beauty <- -0.0883
#calculate slope
B1 <- (Avg_Teaching_Evaluation - B0)/(Avg_Beauty)
B1
## [1] 0.1325028
The slope of 0.1325028 means that for each one-unit increase in standardized beauty score, the model predicts an increase of 0.1325028 points in teaching evaluation score, on average.
First, the predictor variable beauty is significant, since its p-value is less than the common significance level of 0.05. Second, the scatterplot has a slight upward trend, which shows a small rate of increase. I believe that the data provide convincing evidence that the slope is positive; however, the slope is very small.
In order to apply the least squares method the linear regression should meet four conditions.
The first condition is linearity. There does appear to be some linearity, although the trend is weak enough that an almost horizontal line would pass roughly through the middle of the data points. Moreover, the residual plot suggests that the relationship between teaching evaluation score and beauty score is linear, since the residuals are fairly evenly and randomly scattered around the horizontal axis.
The second condition is nearly normal residuals. The histogram shows that the distribution of the residuals is unimodal (one clear peak) and bell-shaped, but slightly left-skewed. However, the normal probability plot shows that most of the points don't deviate much from the line. Moreover, the distribution doesn't contain outliers. We can therefore conclude that the distribution of the residuals is close to normal, which means that the nearly normal residuals condition appears to be met.
The third condition is constant variability. The constant variability condition appears to be met, as the variability of the points around the least squares line is reasonably constant, and the variability of the residuals around the zero line looks reasonably constant as well.
The fourth condition is independent observations. The plot of residuals against the order of data collection shows no pattern that would suggest that the observations depend on each other. So, most likely, the observations are independent.