DATA605 Assignment 11

Answer:

Let's start replicating the analysis in the reading by taking a look at the data. (I like using ggplot, so the output will differ in aesthetics from the reading.)
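The original plotting code is not shown, so here is a minimal base-R sketch of the same first look. It assumes the built-in `cars` data frame from the `datasets` package, with speed in mph and stopping distance in ft. It uses `plot()` instead of ggplot2, and the variable name `r` is only illustrative.

```r
# Scatter plot of stopping distance against speed for the built-in `cars` data
plot(dist ~ speed, data = cars,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
     main = "cars: stopping distance vs. speed")

# Pearson correlation as a single-number summary of the linear association
r <- cor(cars$speed, cars$dist)
print(r)
```

A correlation of about 0.81 supports reading the scatter plot as a fairly strong positive linear trend.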

So, there does appear to be a positive relationship between the variables: higher speeds tend to go with longer stopping distances. Now, let's create a linear model to see what we can determine.
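The output below comes from a simple linear regression of `dist` on `speed`. Here is a minimal sketch of the call that produces it. The object name `cars_lm` is a hypothetical choice, since the original chunk is not shown.

```r
# Fit a simple linear regression: stopping distance as a function of speed
cars_lm <- lm(dist ~ speed, data = cars)

# Printing the fitted model shows the call and the two estimated coefficients
print(cars_lm)
```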

## 
## Call:
## lm(formula = dist ~ speed, data = cars)
## 
## Coefficients:
## (Intercept)        speed  
##     -17.579        3.932

We can generate some summary statistics to determine how well the model fits the data.
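The summary below can be sketched with `summary()` on the fitted model. The model is refit here so the snippet runs on its own, and `cars_summary` is a hypothetical name. The last lines show how the individual fit statistics can be pulled out of the summary object instead of being read off the printout.

```r
cars_lm <- lm(dist ~ speed, data = cars)
cars_summary <- summary(cars_lm)
print(cars_summary)

# Individual fit statistics stored in the summary.lm object
print(c(r_squared     = cars_summary$r.squared,      # multiple R-squared
        adj_r_squared = cars_summary$adj.r.squared,  # adjusted R-squared
        resid_se      = cars_summary$sigma))         # residual standard error
```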

## 
## Call:
## lm(formula = dist ~ speed, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.069  -9.525  -2.272   9.215  43.201 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *  
## speed         3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12

The residuals appear to be roughly balanced around zero (median -2.272, with first and third quartiles of -9.525 and 9.215). However, the maximum (43.201) has a larger magnitude than the minimum (-29.069). The intercept estimate is about 2.6 times its standard error (\(|t| = 2.601\)), and the speed estimate is about 9.46 times its standard error (\(t = 9.464\)). Both coefficients are significant at the 0.05 level. The intercept falls in the range \(0.01 \le p \le 0.05\) (\(p = 0.0123\)), and speed has \(p < 0.001\) (\(p = 1.49 \times 10^{-12}\)). The \({R}^{2}\) value of 0.651 means that the model explains 65.1 percent of the variation in stopping distance.
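The "times its standard error" comparison is exactly the t value that `summary()` reports. The following sketch recomputes it from the coefficient table and checks each p-value against 0.05. The names `coef_table` and `ratio` are illustrative.

```r
cars_lm <- lm(dist ~ speed, data = cars)

# Coefficient matrix with columns: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(summary(cars_lm))

# Estimate divided by its standard error = t value
# (how many standard errors the estimate lies from zero)
ratio <- coef_table[, "Estimate"] / coef_table[, "Std. Error"]
print(cbind(ratio, t_value = coef_table[, "t value"]))

# TRUE where the coefficient is significant at the 0.05 level
print(coef_table[, "Pr(>|t|)"] < 0.05)
```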

Now, we can plot the residuals against the fitted values.
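Here is a base-R sketch of the residuals vs. fitted values plot. The author's own version used ggplot and is not shown. The dashed line at zero is an added visual reference.

```r
cars_lm <- lm(dist ~ speed, data = cars)

# Residuals vs. fitted values: an even horizontal band around zero
# is consistent with constant variance
plot(fitted(cars_lm), resid(cars_lm),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs. fitted")
abline(h = 0, lty = 2)  # dashed reference line at zero
```

Because the model includes an intercept, the least-squares residuals average to zero. So the question to ask of this plot concerns the spread around that line, not its center.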

The residuals in this plot appear to have roughly constant variability, although there may be a slight increase in spread as the fitted values increase. We should monitor that carefully, because non-constant variance would suggest a more complex relationship between these variables. We should also look at the distribution of the residuals.
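A common way to check the distribution of the residuals is a normal Q-Q plot. The following is a minimal base-R sketch, assuming the plot discussed next is a Q-Q plot of the residuals. The name `res` is illustrative.

```r
cars_lm <- lm(dist ~ speed, data = cars)
res <- resid(cars_lm)

# Normal Q-Q plot: points near the reference line suggest approximately
# normal residuals; a convex (U-shaped) bend suggests right skew
qqnorm(res)
qqline(res)
```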

From this plot, the points seem to follow a slightly convex (U-shaped) curve, so there may be some right skew in the residuals, but this appears to be minor. Additionally, here is a histogram of the residuals to confirm our findings.
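Here is a base-R sketch of the histogram. The `breaks = 10` setting is an arbitrary choice. The snippet also compares the mean and median of the residuals: with right skew, the median typically sits below the mean, which is about zero for least-squares residuals.

```r
cars_lm <- lm(dist ~ speed, data = cars)
res <- resid(cars_lm)

# Histogram of the residuals
hist(res, breaks = 10, xlab = "Residual", main = "Histogram of residuals")

# Median below the (near-zero) mean is consistent with right skew
print(c(mean = mean(res), median = median(res)))
```

The median printed here matches the -2.272 shown in the residual summary above.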

Overall, following the steps provided in the reading, I would say the model does a reasonable job of predicting the response variable. It's obviously not perfect, but there do not appear to be any extreme issues with using speed as the predictor. As previously mentioned, we should keep an eye on the residuals vs. fitted values to make sure the variability stays constant.


John Grando

February 17, 2018