7.5 Exercises

Exercise 2

Trends in the residuals. Shown below are two plots of residuals remaining after fitting a linear model to two different sets of data. For each plot, describe important features and determine if a linear model would be appropriate for these data. Explain your reasoning.

For the left plot, the residuals are not randomly scattered around the line, so a linear model would not be appropriate.

For the right plot, most of the residuals are scattered evenly around the line, except for a few at the beginning, which suggests that a linear model would be appropriate.

Exercise 3

Identify relationships, I. For each of the six plots, identify the strength of the relationship (e.g., weak, moderate, or strong) in the data and whether fitting a linear model would be reasonable.

  1. Strong, but the data follow a parabola, so a linear model would not be reasonable.
  2. Strong, with a positive trend; a linear model would be reasonable.
  3. Moderate, with a slight upward trend; a linear model could work.
  4. Moderate, but the data follow a parabola, so a linear model would not be a good choice.
  5. Strong, with a downward trend; a linear model would be reasonable.
  6. Weak, with no real trend, so a linear model would not be useful.

Exercise 4

Identify relationships, II. For each of the six plots, identify the strength of the relationship (e.g., weak, moderate, or strong) in the data and whether fitting a linear model would be reasonable.

  1. Strong, but a linear model would not be reasonable: the data follow a cubic-shaped curve, and a straight line would miss that structure.
  2. Strong, but a linear model would not be reasonable: the data follow a parabola.
  3. Strong; a linear model would be reasonable.
  4. Weak, with a slight upward trend; because the relationship is weak, a linear model might not be accurate enough.
  5. Moderate; a linear model would be reasonable.
  6. Moderate; a linear model would be reasonable.

Exercise 7

Match the correlation, I. Match each correlation to the corresponding scatterplot.

  1. a r = 0.45
  2. d r = 0.92
  3. c r = 0.06
  4. b r = -0.7

Exercise 9

Body measurements, correlation. Researchers studying anthropometry collected body and skeletal diameter measurements, as well as age, weight, height and sex for 507 physically active individuals. The scatterplot below shows the relationship between height and shoulder girth (circumference of shoulders measured over deltoid muscles), both measured in centimeters.

  1. There is a positive correlation between shoulder girth and height: as shoulder girth increases, height tends to increase as well.

  2. Measuring shoulder girth in a larger unit would make its values smaller, so the points would sit closer together horizontally. This only rescales the axis: the form, direction, and strength of the relationship would stay the same, and the correlation coefficient r would not change, because correlation does not depend on the units of measurement.
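
As a quick check on how a change of units affects correlation, here is a minimal R sketch. The data are simulated for illustration only (they are not the study's measurements), and the conversion from centimeters to inches is an assumed example of a unit change.

```r
# Simulated data for illustration only; NOT the study's measurements.
set.seed(1)
girth_cm  <- rnorm(100, mean = 108, sd = 10)             # hypothetical shoulder girths (cm)
height_cm <- 120 + 0.5 * girth_cm + rnorm(100, sd = 7)   # hypothetical heights (cm)

# Same girths expressed in inches (assumed example of a unit change)
girth_in <- girth_cm / 2.54

r_cm <- cor(girth_cm, height_cm)
r_in <- cor(girth_in, height_cm)

# The two correlations agree up to floating-point rounding
all.equal(r_cm, r_in)
```

Dividing every girth value by the same positive constant shrinks the horizontal spread of the points but leaves r as it was.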

Exercise 19

Starbucks, calories, and protein. The scatterplot below shows the relationship between the number of calories and amount of protein (in grams) Starbucks food menu items contain. Since Starbucks only lists the number of calories on the display items, we might be interested in predicting the amount of protein a menu item has based on its calorie content.

  1. The relationship between calories and protein (in grams) is positive: as the number of calories increases, the amount of protein tends to increase as well.

  2. The predictor variable is the number of calories and the outcome variable is protein (in grams).

  3. To see whether there really is a relationship between the two variables and how strong it is, and to predict the amount of protein in an item from its calorie count.

  4. There is higher variability for items with higher predicted protein than for those with lower predicted protein.

Exercise 23

Poverty and unemployment. The following scatterplot shows the relationship between percent of population below the poverty level (poverty) and unemployment rate among those ages 20-64 (unemployment_rate) in counties in the US, as provided by data from the 2019 American Community Survey. The regression output for the model for predicting poverty from unemployment_rate is also provided.

Write out the linear model.

Using the form y = ax + b, with y the predicted poverty and x the unemployment_rate: y = 2.05x + 4.60

Interpret the intercept.

When the unemployment rate is 0%, the model predicts that 4.6% of the population is below the poverty line.

Interpret the slope.

For each 1 percentage point increase in the unemployment rate, the model predicts that the percent below the poverty line is higher by 2.05 percentage points, on average.
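
The intercept and slope interpretations can be checked with a small R sketch. The function name `predict_poverty` is a hypothetical helper introduced here for illustration; the coefficients are the ones reported in the answer above.

```r
# Coefficients as reported in the answer above
intercept <- 4.60
slope     <- 2.05

# Hypothetical helper: predicted poverty (%) for a given unemployment rate (%)
predict_poverty <- function(unemployment_rate) {
  intercept + slope * unemployment_rate
}

predict_poverty(0)                        # prediction at 0% unemployment: the intercept
predict_poverty(6) - predict_poverty(5)   # change per 1 percentage point: the slope
predict_poverty(5)                        # 4.60 + 2.05 * 5 = 14.85
```

The values 5 and 6 are arbitrary example inputs; any two rates one percentage point apart give the same difference.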

For this model R^2 is 46%. Interpret this value.

About 46% of the variability in the percent below the poverty line across counties is explained by this model, that is, by the unemployment rate.

Calculate the correlation coefficient.

R^2 = 0.46, so r = sqrt(0.46) or -sqrt(0.46). The slope is positive, so r is the positive root: the correlation coefficient is about 0.68.

sqrt(.46)
[1] 0.678233
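
Since R^2 alone does not give the sign of r, a slightly more careful version of the calculation takes the sign from the slope. This is only a sketch; the variable names are chosen here for illustration.

```r
r_squared <- 0.46
slope     <- 2.05   # slope from the fitted model above

# R^2 gives only the magnitude of r; the sign of the slope gives its direction
r <- sign(slope) * sqrt(r_squared)
r
```

With a positive slope this reproduces the value from sqrt(.46); with a negative slope the same code would return the negative root.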

Exercise 26

Identify the outliers in the scatterplots shown below and determine what type of outliers they are. Explain your reasoning.

  1. The outlier is at the top left of the scatterplot, and it heavily influences the line compared with the primary cloud of points (influential point).

  2. The outlier is at the bottom left of the scatterplot, and it has a little influence on the line (influential point).

  3. The outlier is at the top middle of the scatterplot and does not seem to have any effect on the line (possibly a low leverage point).