Step 1. Load the data.

EXERCISE 1 Relationship of Two Variables: Quick Look

  1. Plot runs vs at_bats.

  2. Find the correlation coefficient between runs and at_bats. [1] 0.610627

  3. Do you think the relationship looks linear? Do you think they are highly correlated? In other words, is there a strong relationship between the two? Why or why not? I think the relationship does appear linear and positive, but it is not very strong. With r ≈ 0.61 the correlation is only moderate, and many teams lie fairly far from the overall linear trend.
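Pearson's correlation coefficient is the sum of cross-products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. The sketch below writes that formula out in Python. Python is only an illustration choice here, since the lab itself works in R. It runs the formula on hypothetical toy numbers, not the mlb11 data. It also checks the reported r against the Multiple R-squared of 0.3729 from the model summary in Exercise 2: with a single predictor, r² equals R², so √0.3729 should be close to 0.610627.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 each")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy numbers, NOT the mlb11 data; they only exercise the function.
toy_at_bats = [5400, 5450, 5500, 5550, 5600]
toy_runs = [600, 640, 610, 700, 690]
r_toy = pearson_r(toy_at_bats, toy_runs)

# With one predictor, r squared equals Multiple R-squared (0.3729); the
# positive square root applies because the fitted slope is positive.
r_from_r2 = math.sqrt(0.3729)
print(round(r_from_r2, 4))  # 0.6107
```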

EXERCISE 2 Linear Regression Model

  1. Create the linear model, and then plot the best-fit line on the plot of our data.

  2. Plot the residuals based on the best-fit line from the model.

  3. How do the residuals compare to our runs data? Are they about the same magnitude, a fairly large fraction of the magnitude of the runs data, significantly smaller, significantly larger, etc.? In comparison to the runs data, the residuals are significantly smaller. They range from about -126 to 177 (residual standard error 66.47), whereas a team scores several hundred runs in a season.

  4. Obtain the information for our linear model using the summary function.

       Call:
       lm(formula = runs ~ at_bats, data = mlb11)

       Residuals:
           Min       1Q   Median       3Q      Max
       -125.58   -47.05   -16.59    54.40   176.87

       Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
       (Intercept) -2789.2429   853.6957  -3.267 0.002871 **
       at_bats         0.6305     0.1545   4.080 0.000339 ***
       ---
       Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

       Residual standard error: 66.47 on 28 degrees of freedom
       Multiple R-squared:  0.3729,  Adjusted R-squared:  0.3505
       F-statistic: 16.65 on 1 and 28 DF,  p-value: 0.0003388
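The numbers in this summary are linked: each t value is the estimate divided by its standard error, with one predictor the F-statistic is the slope's t value squared, and adjusted R² rescales R² by the degrees of freedom. The Python sketch below recomputes those quantities from the printed estimates. Python is used only for illustration, and because the inputs are the rounded values shown, the results match only to the printed precision. The number of teams is inferred from the 28 residual degrees of freedom plus the two estimated coefficients.

```python
# Values copied from the summary(lm(runs ~ at_bats)) output above.
intercept, se_intercept = -2789.2429, 853.6957
slope, se_slope = 0.6305, 0.1545
r_squared = 0.3729
df_resid = 28        # residual degrees of freedom
n = df_resid + 2     # two estimated coefficients, so n = 30 teams
p = 1                # one predictor (at_bats)

# Each t value is the estimate divided by its standard error.
t_intercept = intercept / se_intercept
t_slope = slope / se_slope

# With a single predictor, the F-statistic equals the slope's t value squared.
f_stat = t_slope ** 2

# Adjusted R-squared penalizes R-squared for the number of predictors.
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

print(round(t_intercept, 3))    # -3.267
print(round(t_slope, 2))        # 4.08
print(round(f_stat, 2))         # 16.65
print(round(adj_r_squared, 4))  # 0.3505
```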

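  5. If a team manager saw the least squares regression line and not the actual data, how many runs would he or she predict for a team with 5,500 at-bats? You can use R as a calculator for this question. Using the coefficients from the summary, -2789.2429 + 0.6305 * 5500 gives [1] 678.5071, so the manager would predict about 679 runs.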
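The model in Exercise 2 is an ordinary least squares fit: the slope is the sum of cross-products of deviations divided by the sum of squared deviations of x, the intercept makes the line pass through the point of means, and a residual is an observed value minus its predicted value. The Python sketch below writes these steps out; `least_squares`, `predict` and `residuals` are hypothetical helper names, and the lab itself fits the model with R's lm(). It then plugs 5,500 at-bats into the rounded coefficients printed by summary(), so the result can differ slightly from R's predict() on the fitted model.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the ordinary least squares line y = b0 + b1*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # line passes through (mean_x, mean_y)
    return intercept, slope


def predict(intercept, slope, x):
    """Predicted y on the fitted line at x."""
    return intercept + slope * x


def residuals(xs, ys, intercept, slope):
    """Observed minus predicted value for each point."""
    return [y - predict(intercept, slope, x) for x, y in zip(xs, ys)]


# Prediction for 5,500 at-bats using the rounded coefficients from summary().
runs_hat = predict(-2789.2429, 0.6305, 5500)
print(round(runs_hat, 4))  # 678.5071
```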

EXERCISE 3 Model Diagnostics

  1. Plot the residuals again against at_bats, with a horizontal line through zero.

  2. Is there any apparent pattern in the residuals plot? Do the residuals appear to be randomly scattered about the horizontal line? What does this indicate about the linearity of the relationship between runs and at_bats? No, there is no apparent pattern in the residuals plot; the residuals appear randomly scattered about the horizontal line at zero. This indicates that the linearity condition is met, so a linear model is appropriate for the relationship between runs and at-bats.

  3. Plot the histogram of the residuals and the normal probability plot of the residuals.

  4. Based on the histogram and the normal probability plot, does the nearly normal residuals condition appear to be met? Yes, judging from the histogram and the normal probability plot, the nearly normal residuals condition appears to be met.

  5. Create the plot used to check the constant variability condition.

  6. Would you say that the constant variability condition is met, and why? Yes, I would say the constant variability condition is met, because the vertical spread of the residuals stays roughly constant across the range of fitted values.
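A normal probability plot pairs each sorted residual with a standard-normal quantile, and the constant variability check asks whether the residual spread changes across the fitted values. The Python sketch below illustrates both under stated assumptions. `normal_scores` uses plotting positions (i - 0.5)/n, which is what R's ppoints() uses when n > 10, as here with 30 teams. `spread_ratio` is a hypothetical rough numeric companion to eyeballing the residuals-versus-fitted plot, not a formal test. In R these plots are typically drawn with qqnorm() and qqline() and a plot of residuals against fitted values.

```python
from statistics import NormalDist, pstdev


def normal_scores(values):
    """Pair each sorted value with a theoretical standard-normal quantile.

    Plotting positions are (i - 0.5) / n; plotting the pairs gives a
    normal probability plot (points near a straight line suggest normality).
    """
    n = len(values)
    z = NormalDist()  # standard normal: mean 0, sd 1
    return [(z.inv_cdf((i - 0.5) / n), v)
            for i, v in enumerate(sorted(values), start=1)]


def spread_ratio(fitted, resid):
    """Hypothetical helper: SD of residuals in the upper half of fitted values
    divided by the SD in the lower half. Values far from 1 hint at
    non-constant variability."""
    pairs = sorted(zip(fitted, resid))  # order residuals by fitted value
    half = len(pairs) // 2
    low_sd = pstdev([r for _, r in pairs[:half]])
    high_sd = pstdev([r for _, r in pairs[half:]])
    if low_sd == 0:
        raise ValueError("lower half has zero spread; ratio undefined")
    return high_sd / low_sd
```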