Suppose that you want to build a regression model that predicts the price of cars using a data set named cars.
Make sure to interpret the direction and the magnitude of the relationship between price and weight. In addition, keep in mind that correlation (or regression) coefficients do not show causation but only association.
There is a strong positive correlation (r = 0.76) between the price and weight of the cars. Lighter cars tend to be cheaper and heavier cars tend to be more expensive, as with sedans compared to trucks.
Create scatterplots
## 'data.frame': 54 obs. of 6 variables:
## $ type : Factor w/ 3 levels "large","midsize",..: 3 2 2 2 2 1 1 2 1 2 ...
## $ price : num 15.9 33.9 37.7 30 15.7 20.8 23.7 26.3 34.7 40.1 ...
## $ mpgCity : int 25 18 19 22 22 19 16 19 16 16 ...
## $ driveTrain: Factor w/ 3 levels "4WD","front",..: 2 2 2 3 2 2 3 2 2 2 ...
## $ passengers: int 5 5 6 4 6 6 6 5 6 5 ...
## $ weight : int 2705 3560 3405 3640 2880 3470 4105 3495 3620 3935 ...
## [1] 0.758112
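The structure summary, scatterplot, and correlation above could be produced along the following lines. This is a minimal sketch, not the original code chunk: `cars_excerpt` is a hypothetical data frame holding only the first ten rows visible in the `str()` output, so its correlation will not match the 0.758 reported for all 54 cars. The axis units are assumptions based on how the report reads the prices.

```r
# Hypothetical 10-row excerpt copied from the str() output above;
# the full data set has 54 observations.
cars_excerpt <- data.frame(
  price  = c(15.9, 33.9, 37.7, 30.0, 15.7, 20.8, 23.7, 26.3, 34.7, 40.1),
  weight = c(2705, 3560, 3405, 3640, 2880, 3470, 4105, 3495, 3620, 3935)
)

# Inspect variable types and the first few values
str(cars_excerpt)

# Scatterplot of price against weight (units are assumed)
plot(price ~ weight, data = cars_excerpt,
     xlab = "Weight (lb)", ylab = "Price ($1000s)")

# Pearson correlation between price and weight
r <- cor(cars_excerpt$price, cars_excerpt$weight)
print(r)
```

With the full data set loaded as `cars`, the same calls would use `cars` in place of `cars_excerpt`.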
Interpretation
Run a regression model for price with one explanatory variable, weight, and answer Q2 through Q5.
Yes. The coefficient of weight is statistically significant at the 5% level (p = 3.17e-11).
Hint: Check the units of the variables in the openintro manual.
The regression line predicts that a 4,000 lb car would cost about $32,800 on average (-20.295 + 0.013264 × 4000 ≈ 32.76, with price in thousands of dollars). However, the only data point in the 4,000 lb range was purchased at around $48,000, well above the prediction.
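The prediction for a 4,000 lb car can be checked by plugging the reported coefficients into the fitted line. The sketch below assumes that price is recorded in thousands of dollars, as the report's reading of the numbers implies, and takes the observed price of about 48 from the text rather than from the data.

```r
# Coefficients as reported in the summary output for price ~ weight
intercept <- -20.295205
slope     <- 0.013264    # change in price per additional pound

# Predicted price for a 4000 lb car
weight_lb     <- 4000
predicted     <- intercept + slope * weight_lb
predicted_usd <- predicted * 1000   # assumes price is in $1000s

# Residual for the car the text places at about $48,000
observed <- 48                      # approximate value taken from the text
residual <- observed - predicted

print(c(predicted = predicted,
        predicted_usd = predicted_usd,
        residual = residual))
```

Under the same unit assumption, the slope says that each additional pound is associated with a price about $13 higher on average.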
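The reported residual standard error is 7.575 on 52 degrees of freedom. It measures the typical size of a residual: the prices predicted by the line of best fit miss the actual prices by roughly 7.575 price units, or about $7,575 with price in thousands of dollars.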
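The residual standard error is the square root of the residual sum of squares divided by the residual degrees of freedom. The sketch below verifies this by hand against `summary()`. It uses a hypothetical ten-row excerpt (`cars_excerpt`, copied from the `str()` output), so the value it prints will differ from the 7.575 reported for the full 54 cars.

```r
# Hypothetical 10-row excerpt of the data; the full set has 54 rows
cars_excerpt <- data.frame(
  price  = c(15.9, 33.9, 37.7, 30.0, 15.7, 20.8, 23.7, 26.3, 34.7, 40.1),
  weight = c(2705, 3560, 3405, 3640, 2880, 3470, 4105, 3495, 3620, 3935)
)

fit <- lm(price ~ weight, data = cars_excerpt)

# Residual sum of squares and residual degrees of freedom (n - 2)
rss    <- sum(residuals(fit)^2)
df_res <- df.residual(fit)

# Residual standard error computed by hand and as reported by summary()
rse_by_hand  <- sqrt(rss / df_res)
rse_reported <- summary(fit)$sigma

print(c(by_hand = rse_by_hand, reported = rse_reported))
```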
The reported adjusted R-squared is 0.5666, or about 56.7%. This means that weight, the independent variable, explains about 57% of the variability in price, the dependent variable, after adjusting for the number of predictors in the model.
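Adjusted R-squared can be recovered from the multiple R-squared, the sample size n, and the number of predictors p through 1 - (1 - R²)(n - 1)/(n - p - 1). The sketch below applies that standard formula to the values in the two summary outputs; `adj_r2` is a helper name introduced here for illustration, and n = 54 follows from the 52 residual degrees of freedom of the one-predictor model.

```r
# Standard adjusted R-squared formula; adj_r2 is an illustrative helper
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Model 1: price ~ weight (multiple R-squared 0.5747, one predictor)
adj_one <- adj_r2(r2 = 0.5747, n = 54, p = 1)

# Model 2: price ~ weight + passengers (multiple R-squared 0.6127, two predictors)
adj_two <- adj_r2(r2 = 0.6127, n = 54, p = 2)

print(c(model_1 = adj_one, model_2 = adj_two))
```

Because the inputs are already rounded to four decimals, the results agree with the reported 0.5666 and 0.5975 only to about three decimal places.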
Run a second regression model for price with two explanatory variables: weight and passengers, and answer Q6.
The second model fits the data better: it has a smaller residual standard error (7.3 versus 7.575) and a larger adjusted R-squared (0.5975 versus 0.5666), and the added variable, passengers, is significant at the 5% level (p = 0.0297).
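One way to set the two models side by side is to pull the residual standard error and adjusted R-squared out of each `summary()` object. The sketch below does this on a hypothetical ten-row excerpt (`cars_excerpt`, copied from the `str()` output), so its numbers, and possibly even which model looks better, will differ from the full-data results quoted in this report.

```r
# Hypothetical 10-row excerpt of the data; the full set has 54 rows
cars_excerpt <- data.frame(
  price      = c(15.9, 33.9, 37.7, 30.0, 15.7, 20.8, 23.7, 26.3, 34.7, 40.1),
  weight     = c(2705, 3560, 3405, 3640, 2880, 3470, 4105, 3495, 3620, 3935),
  passengers = c(5, 5, 6, 4, 6, 6, 6, 5, 6, 5)
)

# Fit the one-predictor and two-predictor models
fit_one <- lm(price ~ weight, data = cars_excerpt)
fit_two <- lm(price ~ weight + passengers, data = cars_excerpt)

# Collect the two fit statistics discussed in the text
comparison <- data.frame(
  model  = c("weight", "weight + passengers"),
  rse    = c(summary(fit_one)$sigma, summary(fit_two)$sigma),
  adj_r2 = c(summary(fit_one)$adj.r.squared, summary(fit_two)$adj.r.squared)
)
print(comparison)
```

Adjusted R-squared is the more appropriate yardstick here because the plain multiple R-squared can never decrease when a predictor is added.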
Build regression model
##
## Call:
## lm(formula = price ~ weight, data = cars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -12.767  -3.766  -1.155   2.568  35.440
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -20.295205   4.915159  -4.129 0.000132 ***
## weight        0.013264   0.001582   8.383 3.17e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.575 on 52 degrees of freedom
## Multiple R-squared: 0.5747, Adjusted R-squared: 0.5666
## F-statistic: 70.28 on 1 and 52 DF, p-value: 3.173e-11
##
## Call:
## lm(formula = price ~ weight + passengers, data = cars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -14.647  -3.688  -1.134   2.677  33.704
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7.348709   7.480301  -0.982   0.3305
## weight       0.015891   0.001925   8.256  5.8e-11 ***
## passengers  -4.094465   1.831085  -2.236   0.0297 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.3 on 51 degrees of freedom
## Multiple R-squared: 0.6127, Adjusted R-squared: 0.5975
## F-statistic: 40.34 on 2 and 51 DF, p-value: 3.127e-11
Interpretation