OpenIntro Statistics, Chapter 7, Problem 39

Example

7.39

Part A

The correlation is negative because the best-fit line slopes downward: home ownership tends to fall as the percentage of urban population rises. The relationship is moderately strong. The points form a cloud with some high-leverage points, that is, points that lie horizontally far from the center of the cloud.

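R-squared in a linear model describes how much of the variation in the response is explained by the linear fit. An R-squared of 0.28 indicates that the linear model accounts for 28% of the variability in the response; put another way, the variability of the residuals is 28% smaller than the variability of the original data.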
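As a rough illustration of this reading of R-squared, the sketch below fits a least squares line to a small made-up data set and computes R-squared as one minus the ratio of leftover variation to total variation. The numbers are hypothetical, not the textbook's data, and the function name `r_squared` is simply a label chosen for this example.

```python
def r_squared(x, y):
    """Return R^2 for a simple least squares line fitted to x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Least squares slope and intercept.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Variation left over after the fit versus variation before it.
    ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_residual / ss_total


# Hypothetical data: percent urban population (x), percent home ownership (y).
urban = [40, 55, 60, 70, 75, 85, 90, 95]
owners = [72, 70, 74, 66, 71, 68, 60, 63]

print(f"R-squared: {r_squared(urban, owners):.2f}")
```

A value near 1 would mean the line removes almost all of the variability; a value like 0.28 means most of it remains in the residuals.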

The correlation coefficient is the square root of R-squared with the sign of the slope attached. The square root of 0.28 is about 0.53, so in our case the correlation is about -0.53 because the relationship is negative.
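The arithmetic can be checked with a short sketch. The helper name `correlation_from_r_squared` is made up for this example, and the slope value passed in is only a placeholder: its sign matters, its size does not.

```python
import math


def correlation_from_r_squared(r_squared, slope):
    """Recover r from R^2, taking the sign from the fitted slope."""
    magnitude = math.sqrt(r_squared)
    return math.copysign(magnitude, slope)


# Placeholder slope of -1.0: only the negative sign is used.
r = correlation_from_r_squared(0.28, slope=-1.0)
print(round(r, 2))  # -0.53
```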

Part B

The residual plot shows a number of large negative residuals at high percentages of urban population. Residuals are the variation left over after fitting the line, so these are data points that fall well below the linear model.

The best line would have small residuals, scattered around zero with no obvious pattern.

To determine whether a least squares fit is appropriate for these data, we have to check whether they meet the four conditions:

  1. Linearity - Yes
  2. Nearly normal residuals - Judging by the residual plot, the residuals seem to be left-skewed, not normal.
  3. Constant variability - Is the variability of the points around the line constant? Here it does not seem to be.
  4. Independent observations - Is there an underlying structure or other correlations within the data? Possibly, because there are many factors influencing home ownership in urban areas.
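The residual-based conditions above can also be examined numerically. The following is a minimal sketch under stated assumptions: the data are made up (not the textbook's), the function names are hypothetical, and the two summaries used here (moment-based skewness and a comparison of residual spread between the lower and upper half of x) are rough stand-ins for reading the residual plot, not formal tests. Independence cannot be judged from residuals alone and is not covered.

```python
import math


def fit_residuals(x, y):
    """Fit a least squares line and return residuals (observed - predicted)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def skewness(values):
    """Moment-based skewness; negative values suggest a left skew."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def spread_by_half(x, residuals):
    """Standard deviation of residuals for the lower and upper half of x."""
    ordered = [r for _, r in sorted(zip(x, residuals))]
    half = len(ordered) // 2

    def sd(vals):
        m = sum(vals) / len(vals)
        return math.sqrt(sum((v - m) ** 2 for v in vals) / len(vals))

    return sd(ordered[:half]), sd(ordered[half:])


# Hypothetical data: percent urban population (x), percent home ownership (y).
urban = [40, 55, 60, 70, 75, 85, 90, 95]
owners = [72, 70, 74, 66, 71, 68, 60, 63]

res = fit_residuals(urban, owners)
low_sd, high_sd = spread_by_half(urban, res)

print(f"Residual skewness: {skewness(res):.2f}")
print(f"Residual SD, lower half of x: {low_sd:.2f}")
print(f"Residual SD, upper half of x: {high_sd:.2f}")
```

A clearly negative skewness would agree with the left skew noted in condition 2, and a large gap between the two standard deviations would point to non-constant variability as in condition 3.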

Comments

To better understand the outliers, we would want to look into the data sources, the collection methods, and other influencing factors.

  1. City life. Why are there more outliers with high leverage at the high end of the percentage of urban population?
  2. Home prices. Is home ownership lower because home prices are higher in urban areas, or because personal wealth is lower due to the high cost of living, or both?
  3. Own vs. rent. These data could also be compared with scatterplots for renters.
  4. Historical records. Further insight could come from a time series collection of this data.
  5. Economic indicators. Home prices may be closely tied to the state of the economy and political factors influencing lending policy.