Problem 7.24

  1. There is a positive relationship, since the amount of carbs increases with the number of calories. The relationship also appears to be linear, because there is no visible curvature. I would describe it as weak: the points cluster fairly tightly for low-calorie items, but spread far apart as calories increase.

  2. The explanatory variable is the number of calories, and the response variable is the number of carbs.

  3. If we fit a regression line, we can predict the number of carbs if we know the number of calories.

  4. We can assume that independence is satisfied, since the food items in the graph appear to be a random sample. The relationship appears to be linear. The residuals are approximately normal, since their histogram is symmetric and unimodal. However, equal variance is not satisfied, because the spread is much larger at higher calorie values. Therefore, not all conditions are met, and we should not fit a least squares line to this data.
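
The equal-variance concern in part 4 can be illustrated with a short simulation. This is only a sketch under stated assumptions: the data are simulated (they are not the Starbucks values), and the intercept, slope, and noise model are made up so that the spread grows with calories, as described above.

```r
# Simulated data for illustration only; these are NOT the Starbucks values.
set.seed(1)
n <- 200
calories <- runif(n, min = 80, max = 500)

# Assumed model: noise standard deviation grows in proportion to calories.
carbs <- 8 + 0.1 * calories + rnorm(n, mean = 0, sd = 0.03 * calories)

fit <- lm(carbs ~ calories)
res <- resid(fit)

# Compare residual spread for low-calorie and high-calorie items.
low <- calories <= median(calories)
sd_low <- sd(res[low])
sd_high <- sd(res[!low])
c(low = sd_low, high = sd_high)
```

A clearly larger residual standard deviation in the high-calorie half is the numerical counterpart of the fan shape seen in a residual plot.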

Problem 7.26

The givens for this problem are:

\(\bar{y} = 171.14, s_{y} = 9.41, \bar{x} = 107.20, s_{x} = 10.37, r = 0.67\)

where \(y\) represents height, and \(x\) represents shoulder girth.

  1. The equation for linear regression is:

    \(\hat{y} = \beta_{0} + \beta_{1}x\)

    We can estimate the slope, \(\beta_{1}\), by calculating the point estimate, \(b_{1}\), using the following equation:

    \(b_{1} = \frac{s_{y}}{s_{x}}r\)

    We can then estimate the y-intercept, \(\beta_{0}\), by calculating the point estimate, \(b_{0}\), using the following equation:

    \(b_{0} = \bar{y} - b_{1}\bar{x}\)

    We can use R to calculate the variables:

    ybar <- 171.14
    sy <- 9.41
    xbar <- 107.20
    sx <- 10.37
    r <- 0.67
    b1 <- (sy/sx)*r
    b0 <- ybar - (b1*xbar)
    cat("The slope is ",b1,"and the y-intercept is ",b0)
    The slope is  0.6079749 and the y-intercept is  105.9651

    Therefore, the linear regression equation is:

    \(\hat{y} = 105.9651 + 0.6079749x\)

  2. The slope is the predicted change in \(y\) for each 1-unit increase in \(x\). Here, predicted height increases by 0.6079749 cm, on average, for every 1 cm increase in shoulder girth. The y-intercept is the predicted value of \(y\) when \(x\) is 0: for a shoulder girth of 0 cm, the predicted height is 105.9651 cm. A shoulder girth of 0 cm is not physically meaningful, so the intercept serves only to position the line.

  3. \(R^{2}\) is simply the correlation squared (\(r^{2}\)).

    R2 <- r*r
    R2
    [1] 0.4489

    \(R^{2}\) is the proportion of variation in the response that is explained by the least squares regression line. In this case, 44.89% of the variation in height is explained by the linear regression line between shoulder girth and height.

  4. We simply need to substitute 100 for x:

    x <- 100
    est <- b1*x + b0
    est
    [1] 166.7626

    The estimated height for someone with a shoulder girth of 100 cm is 166.7626 cm.

  5. The residual measures the difference between the observed value and our estimate:

    \(e_{i} = y_{i} - \hat{y}_{i}\)

    y <- 160
    res <- y - est
    res
    [1] -6.762581

    Therefore, we overestimated the height by 6.762581 cm.

  6. No, because the data only cover shoulder girths from roughly 85 cm to 135 cm, which does not include a shoulder girth of 56 cm. Using the model for this value would be extrapolation.
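
The calculations in parts 1 to 6 can be collected into a small helper. The following is a minimal sketch, not part of the original solution: the function names `fit_from_summary` and `predict_height` are hypothetical, and the 85 to 135 cm range is the approximate range read from the scatterplot.

```r
# Hypothetical helper: least squares coefficients from summary statistics.
fit_from_summary <- function(xbar, sx, ybar, sy, r) {
  b1 <- (sy / sx) * r       # slope
  b0 <- ybar - b1 * xbar    # intercept: the line passes through (xbar, ybar)
  list(b0 = b0, b1 = b1, r2 = r^2)
}

fit <- fit_from_summary(xbar = 107.20, sx = 10.37,
                        ybar = 171.14, sy = 9.41, r = 0.67)

# Hypothetical helper: predict height and warn when x falls outside the
# approximate range of shoulder girths seen in the plot (an assumption).
predict_height <- function(fit, x, x_range = c(85, 135)) {
  if (x < x_range[1] || x > x_range[2]) {
    warning("x is outside the observed range; prediction is an extrapolation")
  }
  fit$b0 + fit$b1 * x
}

est <- predict_height(fit, 100)   # predicted height at 100 cm shoulder girth
res <- 160 - est                  # residual: observed minus predicted
c(estimate = est, residual = res)
```

Calling `predict_height(fit, 56)` still returns a number, but it also raises the warning, which mirrors the reasoning in part 6.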

Problem 7.30

  1. From the table, we can see that \(b_{1} = 4.034\) and \(b_{0} = -0.357\). We can substitute those values for \(\beta_{1}\) and \(\beta_{0}\) in the linear regression model to get the final equation:

    \(\hat{y} = -0.357 + 4.034x\)

  2. When a cat’s body weight is 0 kg, its predicted heart weight is -0.357 g. This isn’t realistic, but this is what the mathematical model describes.

  3. For every 1 kg increase in a cat’s body weight, its heart weight is predicted to increase by 4.034 g, on average.

  4. The \(R^{2}\) value is 64.66%. Therefore, 64.66% of the variation in heart weight is explained by the linear regression line between the body weight and heart weight.

  5. The correlation coefficient can be calculated as the square root of \(R^2\):

    R2 <- 0.6466
    r <- sqrt(R2)
    r
    [1] 0.8041144

    This means the value can either be +0.8041144 or -0.8041144. In this case, it is positive since the slope is positive.
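
The sign choice in part 5 can be made explicit by taking the sign from the slope. The sketch below also evaluates the fitted line at one point; the 3 kg body weight is a hypothetical value chosen only for illustration and is not taken from the exercise.

```r
b0 <- -0.357   # intercept from the regression table
b1 <- 4.034    # slope from the regression table
R2 <- 0.6466   # R-squared from the regression table

# The correlation has the same sign as the slope.
r <- sign(b1) * sqrt(R2)
r

# Predicted heart weight (g) for a hypothetical 3 kg cat.
heart_hat <- b0 + b1 * 3
heart_hat
```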

Problem 7.40

  1. From the table, we know that \(b_{0}\) is 4.010. In the problem we are given \(\bar{y}\) and \(\bar{x}\), which are 3.9983 and -0.0883, respectively. We need to calculate the slope, \(b_{1}\). We can do this with the following equation, which uses the fact that the least squares line passes through \((\bar{x}, \bar{y})\):

    \(\bar{y} = b_{0} + b_{1}\bar{x} \rightarrow b_{1}\bar{x} = \bar{y} - b_{0} \rightarrow b_{1} = \frac{\bar{y} - b_{0}}{\bar{x}}\)

    ybar <- 3.9983
    xbar <- -0.0883
    b0 <- 4.010
    b1 <- (ybar - b0)/xbar
    b1
    [1] 0.1325028

    Therefore, the estimate of the slope is 0.1325028.

  2. Our hypotheses are:

    \(H_{0}: \beta_{1} = 0\)

    \(H_{a}: \beta_{1} > 0\)

    The \(t\) value is given in the table to be 4.13, and was calculated using the following equation:

    \(t = \frac{b_{1}-\beta_{1}}{SE_{b_{1}}}\)

    The \(P\)-value was also found to be approximately 0. If we assume a significance level of 0.05, then we have sufficient evidence to reject the null hypothesis, and we can support the claim that the slope of the relationship between beauty and teaching evaluation is positive.

  3. The observations are independent, since the professors were selected at random. The relationship is linear, since the residuals are scattered around 0 with no apparent pattern. The variance is roughly constant, since the spread of the residuals looks about the same across the plot. Lastly, the residuals are approximately normal, since their histogram is symmetric and unimodal.
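
The test in part 2 can be checked numerically. This is a rough sketch under stated assumptions: the standard error is backed out from the slope and the reported \(t\) value rather than read from the table, and the degrees of freedom assume a sample of 463 professors, which is not stated in the solution above and should be verified against the exercise.

```r
b1 <- 0.1325028   # slope estimate from part 1
t_stat <- 4.13    # t value reported in the regression table

# Standard error implied by t = (b1 - 0) / SE under the null hypothesis.
se_b1 <- b1 / t_stat
se_b1

# One-sided p-value for H_a: beta_1 > 0.
# ASSUMPTION: n = 463 professors, so df = n - 2 = 461.
df <- 463 - 2
p_value <- pt(t_stat, df = df, lower.tail = FALSE)
p_value
```

Any reasonably large number of degrees of freedom gives a p-value far below 0.05 for \(t = 4.13\), so the conclusion in part 2 does not hinge on the exact sample size.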