Presentation 7.25 Coast Starlight Part II.

  1. Write the equation of the regression line:
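```r
# Summary statistics: travel time (minutes) and distance (miles)
sigma_time = 113
mean_time = 129

sigma_dist = 99
mean_dist = 108

correl    = .636

# Slope of the least squares line: R * s_y / s_x
(beta = correl * sigma_time / sigma_dist)
## [1] 0.7259394
```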

The linear model is

\[T - \mu_{time} = \beta (D - \mu_{dist})\]

Rearranging the equation, we see that the constant term \(\mu_{time} - \beta\,\mu_{dist}\) is the intercept:

```r
# Intercept: the line passes through (mean_dist, mean_time)
(intercept = -mean_dist * beta +  mean_time)
## [1] 50.59855
```

Thus, we conclude the equation of the regression line is:

\[T = .7259D + 50.59855\]
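The two steps above, the slope from the correlation and standard deviations and then the intercept from the point of means, can be wrapped in a small R function and reused whenever a problem gives only summary statistics. This is a sketch; the helper name `ls_line_from_summary` is hypothetical and not part of the original solution.

```r
# Hypothetical helper: least squares line from summary statistics.
# slope = r * s_y / s_x; the line passes through (mean_x, mean_y).
ls_line_from_summary <- function(mean_x, sd_x, mean_y, sd_y, r) {
  slope <- r * sd_y / sd_x
  intercept <- mean_y - slope * mean_x
  c(intercept = intercept, slope = slope)
}

# Coast Starlight: x = distance (miles), y = travel time (minutes)
coast <- ls_line_from_summary(mean_x = 108, sd_x = 99,
                              mean_y = 129, sd_y = 113, r = 0.636)
round(coast, 4)
```

The same helper reproduces the body measurements line in 7.26 with `ls_line_from_summary(107.2, 10.37, 171.14, 9.41, 0.67)`.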

  2. The slope means that each additional mile of distance is associated with an increase of about 0.7259 minutes in predicted travel time. The intercept is the predicted travel time when the train travels zero distance. Since a zero-mile trip between stations is implausible, we should not give the intercept a physical interpretation.

  3. \(R^2\) is the square of the correlation, here 40.45%. We interpret \(R^2\) to mean that 40.45% of the variation in travel time is explained by the regression line.

```r
(r_squared = correl^2)
## [1] 0.404496
```
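For simple linear regression with a single predictor and an intercept, the \(R^2\) reported by a fitted model equals the squared correlation. The check below uses simulated data, not the Amtrak data, which are not reproduced in this section; the coefficients in the simulation are arbitrary.

```r
set.seed(1)
# Simulated data for illustration only
x <- runif(50, 0, 300)
y <- 50 + 0.7 * x + rnorm(50, sd = 80)

fit <- lm(y ~ x)
r2_from_fit <- summary(fit)$r.squared
r2_from_cor <- cor(x, y)^2

all.equal(r2_from_fit, r2_from_cor)  # TRUE up to floating-point tolerance
```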
  4. Estimate the travel time between Santa Barbara and Los Angeles if the distance is 103 miles.

\[ T = .7259 ( 103) + 50.59855\]

```r
(time = .7259 * 103 + 50.59855)
## [1] 125.3663
```

The estimated travel time from Santa Barbara to Los Angeles is about 125.37 minutes based on the regression line.

  5. Calculate the residual and explain its meaning. The actual travel time from Santa Barbara to Los Angeles is 168 minutes.
```r
# Residual = observed - predicted
(residual = 168 - time)
## [1] 42.63375
```

Because the residual (observed minus predicted) is positive, the model underestimates the travel time by about 42.6 minutes.
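The prediction and residual steps follow the same pattern for any observation: the residual is observed minus predicted, so a positive residual means the line underestimates and a negative one means it overestimates. A small R sketch with hypothetical helper names (`predict_line`, `describe_residual`), applied to the Santa Barbara numbers and to the girth example in 7.26:

```r
# Hypothetical helpers for prediction and residual interpretation
predict_line <- function(x, intercept, slope) intercept + slope * x

describe_residual <- function(observed, predicted) {
  e <- observed - predicted  # residual = observed - predicted
  # Positive residual: the line falls below the point (underestimate);
  # otherwise the line falls on or above it (reported as overestimate)
  direction <- ifelse(e > 0, "underestimates", "overestimates")
  sprintf("residual = %.2f: the model %s by %.2f", e, direction, abs(e))
}

# Santa Barbara to Los Angeles: 103 miles, observed 168 minutes
pred_sb <- predict_line(103, intercept = 50.59855, slope = 0.7259)
describe_residual(168, pred_sb)

# Body measurements: girth 100 cm, observed height 160 cm
describe_residual(160, predict_line(100, intercept = 105.9651, slope = 0.6079749))
```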

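  6. Using the linear model to predict travel time for a stop 500 miles away is inadvisable: that distance would be an extrapolation beyond the range of distances used to fit the line, and with \(R^2\) of only about 40% the fit is modest even within that range.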
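In code, one simple guard against this kind of extrapolation is to compare a new predictor value with the range of the data used to fit the model. The sketch below uses simulated distances, because the original Amtrak data are not reproduced here; `predict_in_range` is a hypothetical helper, and the fitted coefficients will not match the model above.

```r
# Hypothetical helper: predict from a simple lm fit, warning when the new
# predictor value lies outside the range of the data used to fit it
predict_in_range <- function(fit, x_new, x_obs) {
  rng <- range(x_obs)
  outside <- x_new < rng[1] | x_new > rng[2]
  if (any(outside)) {
    warning(sprintf(
      "x = %s outside observed range [%.1f, %.1f]: prediction is an extrapolation",
      paste(x_new[outside], collapse = ", "), rng[1], rng[2]))
  }
  b <- coef(fit)
  unname(b[1] + b[2] * x_new)
}

set.seed(2)
# Simulated stop-to-stop distances (miles) and travel times (minutes)
distance <- seq(20, 340, length.out = 30)
travel_time <- 50 + 0.73 * distance + rnorm(30, sd = 80)
fit <- lm(travel_time ~ distance)

predict_in_range(fit, 103, distance)  # inside the simulated range: no warning
predict_in_range(fit, 500, distance)  # outside the range: warns, then predicts
```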

7.24 Nutrition at Starbucks, Part I

  1. There is a positive relationship between the number of calories and the amount of carbohydrates in Starbucks food items. As the number of calories increases, the amount of carbohydrates tends to increase.

  2. The explanatory variable is the number of calories. The response variable is the amount of carbohydrates.

  3. A regression line summarizes how carbohydrate content changes with calories and lets us predict the carbohydrates in an item from its calorie count, which is useful for nutritional menu planning.

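  4. The conditions for fitting a least squares line are linearity (which appears to hold), nearly normal residuals (also satisfied, based on the histogram of residuals), constant variability (not satisfied, since the residuals show greater variation at higher calorie counts), and independent observations (which appears plausible). Because constant variability fails, these data do not fully meet the conditions for least squares regression.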
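The constant-variability judgment above is made by eye from the residual plot. A rough numeric companion, sketched here in R, compares the spread of residuals below and above the median of the predictor. The data are simulated with noise that grows with calories to mimic a fan shape (they are not the Starbucks data), and the half-split comparison is an informal heuristic rather than a formal test.

```r
set.seed(3)
# Simulated items: noise standard deviation grows with calories (fan shape)
calories <- runif(200, 80, 500)
carbs <- 10 + 0.1 * calories + rnorm(200, sd = 0.03 * calories)
fit <- lm(carbs ~ calories)
res <- resid(fit)

# Compare residual spread below and above the median number of calories
low <- calories <= median(calories)
sd_low <- sd(res[low])
sd_high <- sd(res[!low])
c(sd_low = sd_low, sd_high = sd_high, ratio = sd_high / sd_low)
```

A ratio well above 1 points the same way as a fan-shaped residual plot; a ratio near 1 is consistent with constant variability.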

7.26 Body measurements, Part III

Let the correlation be \(c = .67\), the standard deviation of girth \(\sigma_G = 10.37\) cm, and the standard deviation of height \(\sigma_H = 9.41\) cm. The means are \(\bar{G} = 107.2\) cm and \(\bar{H} = 171.14\) cm.

  1. The regression equation can be solved using the fact that the slope \(b\) satisfies:

\[ b = \frac{\sigma_H}{\sigma_G}c\]

The equation of the line satisfies:

\[ b( G - \bar{G}) = H - \bar{H}\] \[bG - b\bar{G} + \bar{H} = H\]

Solving for these coefficients gives:

```r
# Summary statistics (cm)
sigma_h = 9.41
sigma_g = 10.37
mean_h = 171.14
mean_g = 107.2
c = .67

# Slope and intercept of the least squares line
(b = (sigma_h / sigma_g) * c)
## [1] 0.6079749
(intercept = -b *mean_g + mean_h )
## [1] 105.9651
```

We conclude that the regression line equation is:

\[ H = .6079749 G + 105.9651 \]

  2. The slope means that each additional 1 cm of girth is associated with an increase of about 0.608 cm in predicted height. The intercept is the predicted height of a person with zero girth. Since people do not have zero girth, \(G=0\) is outside the range of sensible values and the intercept has no physical meaning.

  3. \(R^2\) equals the square of the correlation: \(R^2=.4489\). Thus, 44.9% of the variation in height is explained by the least squares line.

```r
(r_squared = c^2)
## [1] 0.4489
```
  4. The predicted height for a girth of 100 cm is about 166.8 cm.
```r
(predicted_height = .6079749 * 100 + 105.9651)
## [1] 166.7626
```
  5. For this individual, whose actual height is 160 cm, the residual is -6.76 cm. The negative residual means the model overestimates the individual's height by 6.76 cm.
```r
(residual = 160 - predicted_height)
## [1] -6.76259
```
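  6. The predicted height for a one-year-old with a girth of 56 cm is about 140 cm. This is inconsistent with actual data, which suggest one-year-olds are about 75 cm tall on average according to google.com. Because the model was fit to adult measurements, a girth of 56 cm lies far outside the range of the data, so this prediction is an extrapolation: the physical proportions of adults and babies differ, and babies are shorter than the model predicts from their girths.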
```r
(baby_height = .6079749 * 56 + 105.9651)
## [1] 140.0117
```

7.30 Cats, Part I.

  1. The linear model is \(y = 4.034 x - 0.357\), where \(x\) is the body weight in kg and \(y\) is the heart weight in grams.

  2. The intercept means that if a cat's body weight were 0 kg, its predicted heart weight would be -0.357 grams. Neither value is sensible, so we treat the intercept only as a parameter that positions the best-fit line within the observed range of body weights.

  3. For each additional kilogram of body weight, a cat's heart weight is expected to be 4.034 grams heavier on average.

  4. The \(R^2\) means that 64.66% of the variation in heart weight is explained by the least squares line.

  5. The correlation coefficient is \(R = \sqrt{0.6466} \approx 0.8041144\), positive because the slope is positive.
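Because \(R^2\) in simple regression is the squared correlation, the correlation is \(\pm\sqrt{R^2}\), with the sign taken from the slope. A one-line R sketch (the helper name `r_from_r2` is hypothetical):

```r
# |R| = sqrt(R^2); the sign of R matches the sign of the slope
r_from_r2 <- function(r2, slope) sign(slope) * sqrt(r2)

r_cats <- r_from_r2(r2 = 0.6466, slope = 4.034)
r_cats
```

Passing the slope rather than a separate sign flag keeps the helper from returning a positive correlation for a decreasing line.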

7.40 Rate my professor

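  1. Since the t-statistic is the estimate divided by its standard error, the slope can be recovered from the summary table as \(t \times SE = 4.13 \times 0.0322 \approx 0.133\).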
```r
# t = estimate / SE, so estimate = t * SE
t_stat = 4.13
std_error = 0.0322
(slope = t_stat * std_error )
## [1] 0.132986
```

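Alternatively, because the least squares line passes through the point of means \((\bar{x}, \bar{y})\), we can compute the slope of the segment connecting the intercept \((0, 4.010)\) to that point.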

```r
# Point of means (x_avg, y_avg) and the intercept point (x_int, y_int)
x_avg = -0.0883
y_avg = 3.9983
x_int = 0
y_int = 4.010

(slope = (y_avg - y_int)/ (x_avg - x_int) )
## [1] 0.1325028
```

The two answers differ only because of rounding in the summary table; both round to 0.133, which we use as the slope in any further calculations.

  2. Yes. The slope is positive and statistically significant: its large t-statistic of 4.13 and very small p-value give convincing evidence that the true slope is greater than zero.

  3. The four diagnostic plots suggest some skew in the residuals.
    The conditions for fitting a least squares line are:

Linearity - this appears to be satisfied, although the data form a dispersed cloud with only a slight pattern.

Near normality of the residuals - somewhat satisfied, based on the histogram of residuals, which shows some skew.

Constant variability - there is more variability at low values of beauty than at high values, i.e., professors with high beauty scores show less variation in evaluations than professors with low beauty scores, so this condition is questionable.

Independent observations - supported by the lack of a trend in the plot of residuals against the order of data collection.
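Each condition above corresponds to a standard residual plot. A minimal R sketch of drawing all four for a fitted `lm` model follows; it uses simulated scores because the evaluation data are not reproduced here, and it draws to a null PDF device (`pdf(NULL)`) so it runs without a display or output file. Substituting the real data frame would reproduce the kind of plots discussed.

```r
set.seed(4)
# Simulated stand-in for the beauty/evaluation data (illustration only)
beauty <- rnorm(100)
score <- 4 + 0.13 * beauty + rnorm(100, sd = 0.5)
fit <- lm(score ~ beauty)
res <- resid(fit)

pdf(NULL)  # null graphics device: plots are drawn but not saved
par(mfrow = c(2, 2))
plot(beauty, score, main = "Linearity: data and fitted line")
abline(fit)
hist(res, main = "Near normality: residuals")
plot(fitted(fit), res, main = "Constant variability: residuals vs fitted")
abline(h = 0, lty = 2)
plot(seq_along(res), res, type = "b", main = "Independence: residuals vs order")
abline(h = 0, lty = 2)
invisible(dev.off())
```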