HW7

7.24 Nutrition at Starbucks, Part I.

The scatterplot below shows the relationship between the number of calories and amount of carbohydrates (in grams) Starbucks food menu items contain. Since Starbucks only lists the number of calories on the display items, we are interested in predicting the amount of carbs a menu item has based on its calorie content.

Describe the relationship between number of calories and amount of carbohydrates (in grams) that Starbucks food menu items contain.

The plot shows a positive linear relationship: the points trend upward from left to right, so the amount of carbs increases as the number of calories increases. The relationship is moderately linear rather than perfectly linear, because most of the dots deviate from the regression line. The plot also contains some positive outliers (the dots that lie farther above an imaginary regression line than the others).

In this scenario, what are the explanatory and response variables?

The explanatory variable is calories while the response variable is carbohydrates.

Why might we want to fit a regression line to these data?

A regression line lets us predict the amount of carbohydrates in an item from its number of calories. The closer the data lie to the regression line, the better the prediction.

Do these data meet the conditions required for fitting a least squares line?

The first condition is linearity. There does appear to be some linearity between carbs and calories, because the dots are concentrated near the regression line and follow the upward trend. However, in the residual plot the residuals are not quite equally and randomly spaced around the horizontal axis, so the linear fit is moderate rather than perfect.

The second condition is nearly normal residuals. The histogram shows that the distribution of the residuals is unimodal (one clear peak) and bell-shaped, but slightly left skewed.

The third condition is constant variability. The residual plot shows that the variability of the residuals is not constant: items with fewer calories have smaller residuals, while items with more calories have larger residuals.

The fourth condition is independent observations. Since the exercise gives no information about independence, we can assume that the menu items are independent of each other.

The third condition is not satisfied, whereas the other three conditions are reasonably well satisfied.
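The constant-variability check can also be done numerically. The following is a minimal sketch on simulated data, not the Starbucks data: the sample size, the coefficients, and the noise model are all assumptions chosen only to reproduce a fan-shaped residual plot.

```r
# Simulated (hypothetical) data: the noise grows with the predictor,
# which mimics a fan-shaped residual plot.
set.seed(1)
calories <- runif(200, min = 100, max = 500)
carbs <- 0.1 * calories + rnorm(200, mean = 0, sd = 0.02 * calories)

fit <- lm(carbs ~ calories)
res <- resid(fit)

# Compare the residual spread for low- and high-calorie items.
low <- calories <= median(calories)
sd_low <- sd(res[low])
sd_high <- sd(res[!low])
c(sd_low = sd_low, sd_high = sd_high)
```

A clearly larger standard deviation in the high-calorie half is the numerical counterpart of residuals that fan out to the right.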

7.26 Body measurements, Part III.

Exercise 7.15 introduces data on shoulder girth and height of a group of individuals. The mean shoulder girth is 107.20 cm with a standard deviation of 10.37 cm. The mean height is 171.14 cm with a standard deviation of 9.41 cm. The correlation between height and shoulder girth is 0.67.

Write the equation of the regression line for predicting height.

The equation for the regression line is

height = B0 + B1*girth, where B0 is the intercept and B1 is the slope

sh_girth_mean <- 107.20
sh_girth_sd <- 10.37 

height_mean <- 171.14
height_sd <- 9.41

cor <- 0.67

#calculate slope
B1 <- (height_sd/sh_girth_sd)*cor
B1

## [1] 0.6079749

#calculate intercept
B0 <- height_mean - B1* sh_girth_mean
B0

## [1] 105.9651

The regression line for the linear model of shoulder girth and height is

height = 105.9651 + 0.6079749*girth
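The same calculation can be wrapped in a small function, which also makes it easy to confirm that the fitted line passes through the point of means. This is only a sketch; `ls_line()` is a hypothetical helper name, not a base R function.

```r
# Least squares line from summary statistics.
# ls_line() is a hypothetical helper, not part of base R.
ls_line <- function(x_mean, x_sd, y_mean, y_sd, r) {
  slope <- r * y_sd / x_sd
  intercept <- y_mean - slope * x_mean
  c(intercept = intercept, slope = slope)
}

coefs <- ls_line(x_mean = 107.20, x_sd = 10.37,
                 y_mean = 171.14, y_sd = 9.41, r = 0.67)
coefs

# The fitted line evaluated at the mean shoulder girth
# returns the mean height.
height_at_mean_girth <- unname(coefs["intercept"] + coefs["slope"] * 107.20)
height_at_mean_girth
```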

Interpret the slope and the intercept in this context.

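The intercept of 105.9651 is the predicted height in centimeters at a shoulder girth of 0 cm. A shoulder girth of 0 cm is not possible, so the intercept only sets the height of the line and has no practical meaning on its own.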

The slope of 0.6079749 means that for each additional centimeter of shoulder girth, the model predicts an additional 0.6079749 cm of height, on average.

Calculate R^2 of the regression line for predicting height from shoulder girth, and interpret it in the context of the application.

r_squared <- cor^2
r_squared

## [1] 0.4489

R-squared measures how close the data are to the fitted regression line. An R-squared of 44.89% means that shoulder girth explains 44.89% of the variability in height.

A randomly selected student from your class has a shoulder girth of 100 cm. Predict the height of this student using the model.

girth <- 100
predicted_height <- B0 + B1*girth
predicted_height

## [1] 166.7626

The predicted height is 166.76 cm.

The student from part (d) is 160 cm tall. Calculate the residual, and explain what this residual means.

actual_height <- 160
actual_height - predicted_height

## [1] -6.762581

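The residual equals -6.762581. A negative residual means that the model overestimates this student's height: the student is about 6.76 cm shorter than the model predicts.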

A one year old has a shoulder girth of 56 cm. Would it be appropriate to use this linear model to predict the height of this child?

The value of 56 cm is far outside the range of shoulder girths in the sample. It would not be appropriate to use the linear model to predict the height of an individual who falls outside the range of the sample, because that would be extrapolation. The height of a one year old can also be affected by other factors related to child development that were not accounted for in the linear model.
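One way to see how far outside the data this child falls is to express 56 cm in standard deviation units, using the mean and standard deviation given in the exercise. This is a rough sketch: the exercise does not give the actual minimum girth in the sample, so the z-score is used here only as a stand-in for the observed range.

```r
# How far is a 56 cm shoulder girth from the sample mean, in SD units?
sh_girth_mean <- 107.20
sh_girth_sd <- 10.37
child_girth <- 56

z <- (child_girth - sh_girth_mean) / sh_girth_sd
z
```

The result is close to -5, so a girth of 56 cm lies almost five standard deviations below the sample mean.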

7.30 Cats, Part I.

The following regression output is for predicting the heart weight (in g) of cats from their body weight (in kg). The coefficients are estimated using a data set of 144 domestic cats.

Write out the linear model.

The linear model equation is

Heart_Weight = B0 + B1 * Body_Weight, where B0 is the intercept and B1 is the slope.

Replacing the intercept and the slope with the estimates from the regression output gives

Heart_Weight = -0.357 + 4.034 * Body_Weight

Interpret the intercept.

The intercept of -0.357 is the predicted heart weight in grams at a body weight of 0 kg. Since heart weight cannot be negative and a cat cannot weigh 0 kg, the intercept is meaningless in this particular case.

Interpret the slope.

The slope of 4.034 means that for each additional kilogram of body weight, the model predicts an additional 4.034 g of heart weight, on average.
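The slope interpretation can be illustrated by comparing predictions for two cats whose body weights differ by exactly 1 kg. This is a small sketch: `predict_heart_weight()` is a hypothetical helper, and the body weights of 2 kg and 3 kg are made-up inputs, not values from the data set.

```r
# Fitted model from the regression output.
# predict_heart_weight() is a hypothetical helper name.
predict_heart_weight <- function(body_weight_kg) {
  -0.357 + 4.034 * body_weight_kg
}

# Two hypothetical cats whose body weights differ by exactly 1 kg.
hw_2kg <- predict_heart_weight(2)
hw_3kg <- predict_heart_weight(3)

# The difference between the two predictions equals the slope.
hw_3kg - hw_2kg
```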

Interpret R^2.

R-squared measures how close the data are to the fitted regression line. An R-squared of 64.66% means that body weight explains 64.66% of the variability in heart weight.

Calculate the correlation coefficient.

r_squared <- 0.6466
cor <- sqrt(r_squared)
cor

## [1] 0.8041144

The correlation coefficient of 0.8041144 indicates a strong positive correlation: it is greater than 0.5, and it is positive because the slope of the regression line is positive.
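Taking the square root of R-squared only gives the magnitude of the correlation, so the sign has to come from the slope. A minimal sketch of that step, using the slope and R-squared from the regression output (the variable names are chosen here for illustration):

```r
# R^2 alone does not carry the sign of the correlation;
# the sign is taken from the slope estimate.
r_squared <- 0.6466
slope <- 4.034

r <- sign(slope) * sqrt(r_squared)
r
```

With a negative slope, the same R-squared would give a correlation of the same size but with a negative sign.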

7.40 Rate my professor.

Many college courses conclude by giving students the opportunity to evaluate the course and the instructor anonymously. However, the use of these student evaluations as an indicator of course quality and teaching effectiveness is often criticized because these measures may reflect the influence of non-teaching related characteristics, such as the physical appearance of the instructor. Researchers at University of Texas, Austin collected data on teaching evaluation score (higher score means better) and standardized beauty score (a score of 0 means average, negative score means below average, and a positive score means above average) for a sample of 463 professors. The scatterplot below shows the relationship between these variables, and also provided is a regression output for predicting teaching evaluation score from beauty score.

Given that the average standardized beauty score is -0.0883 and average teaching evaluation score is 3.9983, calculate the slope. Alternatively, the slope may be computed using just the information provided in the model summary table.

The least squares line always passes through the point of averages, so the two means satisfy the linear model equation

Avg_Teaching_Evaluation = B0 + B1*Avg_Beauty, where B0 is the intercept and B1 is the slope.

B0 <- 4.01 #provided in the table
Avg_Teaching_Evaluation <- 3.9983
Avg_Beauty <- -0.0883

#calculate slope
B1 <- (Avg_Teaching_Evaluation - B0)/(Avg_Beauty)
B1

## [1] 0.1325028

The slope of 0.1325028 means that for each one-point increase in standardized beauty score, the model predicts an increase of 0.1325028 in teaching evaluation score, on average.
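Because the mean beauty score is very close to zero, the slope obtained from the point of means is sensitive to rounding in the intercept. The sketch below shows this, under the assumption (not stated in the exercise) that the intercept of 4.01 is only accurate to two decimal places; `slope_from_means()` is a hypothetical helper name.

```r
# Slope from the point of means, for a given intercept.
# slope_from_means() is a hypothetical helper name.
slope_from_means <- function(b0, mean_x = -0.0883, mean_y = 3.9983) {
  (mean_y - b0) / mean_x
}

B1 <- slope_from_means(4.01)   # intercept as used in the calculation

# Assumption: 4.01 is rounded to two decimals, so the unrounded
# intercept could lie anywhere between 4.005 and 4.015.
B1_range <- slope_from_means(c(4.005, 4.015))
B1
B1_range
```

Under that assumption the slope could be anywhere from roughly 0.08 to 0.19, so the value from this shortcut is best read as approximate.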

Do these data provide convincing evidence that the slope of the relationship between teaching evaluation and beauty is positive? Explain your reasoning.

First, the predictor variable beauty is significant, since its p-value is less than the common significance level of 0.05. Second, the scatterplot has a slight upward trend, which shows a small rate of increase. I believe that the data provide convincing evidence that the slope is positive; however, the slope is very small.

List the conditions required for linear regression and check if each one is satisfied for this model based on the following diagnostic plots.

In order to apply the least squares method the linear regression should meet four conditions.

The first condition is linearity. There does appear to be some linearity, although the trend is weak enough that a nearly horizontal line runs roughly through the middle of the data points. Moreover, in the residual plot the residuals are spread fairly equally and randomly around the horizontal axis, which supports a linear relationship between teaching evaluation and beauty score.

The second condition is nearly normal residuals. The histogram shows that the distribution of the residuals is unimodal (one clear peak) and bell-shaped, but slightly left skewed. However, in the normal probability plot most of the dots do not deviate much from the line, and the distribution does not contain outliers. We can therefore conclude that the distribution of the residuals is close to normal, so the nearly normal residuals condition appears to be met.

The third condition is constant variability. This condition appears to be met: the variability of the points around the least squares line is reasonably constant, and so is the variability of the residuals around the zero line.

The fourth condition is independent observations. The plot of residuals against the order of data collection shows no pattern that would suggest the observations depend on each other, so the observations are most likely independent.
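The four diagnostic plots used for these checks can be produced in R along the following lines. This is a sketch on simulated data, not the professors data: only the sample size of 463 comes from the exercise, while the coefficients, the noise level, and the variable names are assumptions.

```r
# Simulated (hypothetical) data standing in for beauty and evaluation scores.
set.seed(42)
beauty <- rnorm(463)
evaluation <- 4 + 0.13 * beauty + rnorm(463, sd = 0.5)

fit <- lm(evaluation ~ beauty)
res <- resid(fit)

par(mfrow = c(2, 2))
# Linearity and constant variability: residuals against the predictor.
plot(beauty, res, main = "Residuals vs. beauty")
abline(h = 0, lty = 2)
# Nearly normal residuals: histogram and normal probability plot.
hist(res, main = "Histogram of residuals")
qqnorm(res)
qqline(res)
# Independence: residuals in the order of data collection.
plot(res, main = "Residuals in data order")
abline(h = 0, lty = 2)
```

Each panel corresponds to one or two of the conditions, so the same four plots can be reused for any simple linear regression fitted with `lm()`.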

Olya Fomicheva

11/12/2017
