Suppose that you want to build a regression model that predicts the price of cars using a data set named cars.
When describing the relationship between price and weight, interpret both its direction and its magnitude. Keep in mind that correlation (or regression) coefficients show association, not causation.
Create scatterplots
## 'data.frame': 54 obs. of 6 variables:
## $ type : Factor w/ 3 levels "large","midsize",..: 3 2 2 2 2 1 1 2 1 2 ...
## $ price : num 15.9 33.9 37.7 30 15.7 20.8 23.7 26.3 34.7 40.1 ...
## $ mpgCity : int 25 18 19 22 22 19 16 19 16 16 ...
## $ driveTrain: Factor w/ 3 levels "4WD","front",..: 2 2 2 3 2 2 3 2 2 2 ...
## $ passengers: int 5 5 6 4 6 6 6 5 6 5 ...
## $ weight : int 2705 3560 3405 3640 2880 3470 4105 3495 3620 3935 ...
## [1] 0.758112
Interpretation
There is a strong, positive correlation between price and weight (r ≈ 0.76): heavier cars tend to be more expensive. This is an association only; it does not show that added weight itself raises a car's price.
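The value 0.758112 printed above is Pearson's correlation coefficient r (presumably produced with R's cor() on price and weight). As a minimal sketch of the formula, the Python below computes r as the covariance divided by the product of the standard deviations. It uses only the first ten price and weight values shown by str(cars), so its result is illustrative and will not equal the full-sample 0.758; the function name pearson_r is hypothetical.

```python
import math

# First ten rows of price (in $1000s) and weight (in pounds), as printed by str(cars).
# This is only a subset of the 54 cars, so r here will NOT match the full-sample 0.758.
price = [15.9, 33.9, 37.7, 30, 15.7, 20.8, 23.7, 26.3, 34.7, 40.1]
weight = [2705, 3560, 3405, 3640, 2880, 3470, 4105, 3495, 3620, 3935]


def pearson_r(x, y):
    """Pearson correlation: sum of cross-deviations over sqrt of the product of sums of squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(weight, price)
# The sign of r gives the direction; its absolute value gives the strength.
print(f"r on the first ten rows: {r:.3f}")
```

Because r is symmetric in its two arguments, pearson_r(price, weight) returns the same value.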
Run a regression model for price with one explanatory variable, weight, and answer Q2 through Q5.
##
## Call:
## lm(formula = price ~ weight, data = cars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.767 -3.766 -1.155 2.568 35.440
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -20.295205 4.915159 -4.129 0.000132 ***
## weight 0.013264 0.001582 8.383 3.17e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.575 on 52 degrees of freedom
## Multiple R-squared: 0.5747, Adjusted R-squared: 0.5666
## F-statistic: 70.28 on 1 and 52 DF, p-value: 3.173e-11
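The coefficient on weight (0.013264) is statistically significant at the 5% level: its p-value (3.17e-11) is far below 0.05. Because price is recorded in thousands of dollars and weight in pounds, each additional pound is associated with an average price increase of about $13.26, or roughly $13,264 per additional 1,000 pounds.
## Q3. What price does the model predict for a car that weighs 4000 pounds? Hint: Check the units of the variables in the openintro manual.
The predicted price is -20.295205 + 0.013264 × 4000 ≈ 32.76 thousand dollars, so a car that weighs 4,000 pounds is predicted to cost about $32,760.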
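The significance check and the Q3 prediction can both be reproduced from the printed coefficients. The minimal Python sketch below (an illustration, not the R code behind this report) recomputes the t value as estimate divided by standard error and evaluates the fitted line at 4,000 pounds. It assumes price is in thousands of dollars and weight in pounds; the helper name predict_price is hypothetical. In R, predict() on the fitted lm object with newdata = data.frame(weight = 4000) should give the same prediction up to rounding.

```python
# Coefficients copied from the lm(price ~ weight) summary above.
intercept = -20.295205
slope = 0.013264      # thousands of dollars per pound (assumed units)
slope_se = 0.001582   # standard error of the slope

# t value = estimate / standard error. The printed 8.383 differs slightly
# because the summary shows rounded estimates and standard errors.
t_value = slope / slope_se


def predict_price(weight_lbs):
    """Predicted price, in thousands of dollars, for a car of the given weight in pounds."""
    return intercept + slope * weight_lbs


pred = predict_price(4000)
print(f"t value for weight: {t_value:.3f}")
print(f"predicted price at 4,000 lbs: {pred:.3f} thousand dollars (about ${pred * 1000:,.0f})")
```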
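The reported residual standard error is 7.575. It is measured in the units of the response, price, so it is about $7,575: roughly the typical difference between a car's actual price and the price the model predicts from its weight.
## Q5. What is the reported adjusted R squared? What does it mean?
The adjusted R squared is 0.5666, meaning that, after adjusting for the number of predictors, weight explains about 57% of the variability in car price (the unadjusted R squared is 0.5747).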
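The adjusted R squared and the overall F statistic are both simple functions of R squared, the sample size n, and the number of predictors p. The sketch below uses the standard formulas (function names are hypothetical) with the printed values R squared = 0.5747, n = 54, p = 1. Small gaps from the printed 0.5666 and 70.28 come from R squared being rounded to four decimals in the summary.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (not counting the intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def f_from_r2(r2, n, p):
    """Overall F statistic implied by R^2, on p and n - p - 1 degrees of freedom."""
    return (r2 / p) / ((1 - r2) / (n - p - 1))


# Values from the price ~ weight summary: R^2 = 0.5747, 54 cars, 1 predictor.
r2, n, p = 0.5747, 54, 1
adj = adjusted_r2(r2, n, p)
f_stat = f_from_r2(r2, n, p)
print(f"adjusted R^2: {adj:.4f}")
print(f"F statistic:  {f_stat:.2f}")
```

Adjusted R squared is always slightly below R squared when p ≥ 1, because it penalizes each added predictor.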
Run a second regression model for price with two explanatory variables: weight and passengers, and answer Q6.
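In R, a model with both explanatory variables is written lm(price ~ weight + passengers, data = cars). To show what fitting two predictors involves, here is a minimal pure-Python sketch of ordinary least squares via the normal equations X'X b = X'y. It is an assumption-laden illustration: it uses only the ten rows printed by str(cars), so its coefficients will differ from a fit on all 54 cars, and the helpers solve and ols are hypothetical names, not part of any library.

```python
# First ten rows from str(cars): price ($1000s), weight (lbs), passengers.
price = [15.9, 33.9, 37.7, 30, 15.7, 20.8, 23.7, 26.3, 34.7, 40.1]
weight = [2705, 3560, 3405, 3640, 2880, 3470, 4105, 3495, 3620, 3935]
passengers = [5, 5, 6, 4, 6, 6, 6, 5, 6, 5]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(y, *predictors):
    """OLS coefficients [intercept, b1, b2, ...] from the normal equations."""
    rows = [[1.0, *vals] for vals in zip(*predictors)]  # design matrix with intercept column
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


coef = ols(price, weight, passengers)
print("intercept, weight, passengers:", [round(b, 4) for b in coef])
```

A useful property for checking any such fit: with an intercept, the residuals sum to zero and are uncorrelated with each predictor.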
Build regression model
##
## Call:
## lm(formula = price ~ passengers, data = cars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.422 -7.522 -4.189 6.244 42.478
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.244 11.322 -0.552 0.5836
## passengers 5.133 2.195 2.338 0.0233 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.05 on 52 degrees of freedom
## Multiple R-squared: 0.09513, Adjusted R-squared: 0.07773
## F-statistic: 5.467 on 1 and 52 DF, p-value: 0.02326
Interpretation
Linear model 1 (price ~ weight) fits the data better than the passengers-only model (price ~ passengers): its residual standard error is lower (7.575 vs. 11.05) and its adjusted R squared is much higher (0.5666 vs. 0.0777).
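The two criteria point in opposite directions: a better model has a lower residual standard error but a higher adjusted R squared. The short sketch below makes that explicit using the summary values copied from the two lm() outputs above; the dictionary layout is just an illustration. Both models share the same response and the same 54 cars, so their residual standard errors are directly comparable.

```python
# Summary statistics copied from the two lm() outputs above.
models = {
    "price ~ weight": {"rse": 7.575, "adj_r2": 0.5666},
    "price ~ passengers": {"rse": 11.05, "adj_r2": 0.07773},
}

# Lower residual standard error is better; higher adjusted R^2 is better.
best_by_rse = min(models, key=lambda name: models[name]["rse"])
best_by_adj_r2 = max(models, key=lambda name: models[name]["adj_r2"])

print("lowest RSE:", best_by_rse)
print("highest adjusted R^2:", best_by_adj_r2)
```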