Suppose that you want to build a regression model that predicts the price of cars using a data set named cars.
Describe the relationship between price and weight. Make sure to interpret the direction and the magnitude of the relationship. In addition, keep in mind that correlation (or regression) coefficients do not show causation but only association.
Heavier cars tend to be more expensive than lighter cars. The scatterplot slopes upward: price rises as weight increases. The correlation coefficient of 0.758 indicates a strong positive linear association between price and weight. This is an association only and does not show that weight causes a higher price.
Create scatterplots
## 'data.frame': 54 obs. of 6 variables:
## $ type : Factor w/ 3 levels "large","midsize",..: 3 2 2 2 2 1 1 2 1 2 ...
## $ price : num 15.9 33.9 37.7 30 15.7 20.8 23.7 26.3 34.7 40.1 ...
## $ mpgCity : int 25 18 19 22 22 19 16 19 16 16 ...
## $ driveTrain: Factor w/ 3 levels "4WD","front",..: 2 2 2 3 2 2 3 2 2 2 ...
## $ passengers: int 5 5 6 4 6 6 6 5 6 5 ...
## $ weight : int 2705 3560 3405 3640 2880 3470 4105 3495 3620 3935 ...
## [1] 0.758112
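The code that produced the scatterplot and the correlation is not shown in the output above. A minimal sketch of those two steps follows. It assumes only the first ten price and weight values printed by `str()`, stored in a hypothetical data frame named `cars_head`, so its correlation will not equal the 0.758112 reported for all 54 cars. The axis label for price assumes the unit is thousands of dollars.

```r
# First ten rows only, copied from the str() output above.
# The full data set has 54 cars, so these results are illustrative.
cars_head <- data.frame(
  price  = c(15.9, 33.9, 37.7, 30, 15.7, 20.8, 23.7, 26.3, 34.7, 40.1),
  weight = c(2705, 3560, 3405, 3640, 2880, 3470, 4105, 3495, 3620, 3935)
)

# Scatterplot with the response (price) on the vertical axis.
plot(price ~ weight, data = cars_head,
     xlab = "Weight (pounds)",
     ylab = "Price (thousands of dollars, assumed unit)")

# Pearson correlation between the two variables.
r <- cor(cars_head$price, cars_head$weight)
r
```

With the full data set in place of `cars_head`, the same two calls should reproduce the plot and the correlation reported above.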
Interpretation
Run a regression model for price with one explanatory variable, weight, and answer Q2 through Q5.
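Yes. The coefficient of weight is statistically significant at the 5% level (p-value = 3.17e-11). The coefficient is 0.013264, and because price is measured in thousands of dollars, each additional pound of weight is associated with an increase in predicted price of about 13 dollars.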
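The unit conversion and the significance check can be written out as a short sketch. The slope and p-value are copied from the weight row of the first regression output; the variable names are illustrative, and the factor of 1000 assumes price is recorded in thousands of dollars.

```r
# Values copied from the weight row of the first model's coefficient table.
slope   <- 0.013264   # change in price (thousands of dollars) per pound
p_value <- 3.17e-11

# Assumption: price is in thousands of dollars, so multiply by 1000.
slope_dollars <- slope * 1000
slope_dollars         # dollars per additional pound

# Significant at the 5% level when the p-value is below 0.05.
significant <- p_value < 0.05
significant
```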
Hint: Check the units of the variables in the openintro manual. The model predicts a price of about 32,800 dollars for a car that weighs 4000 pounds. However, the actual price is close to 50,000 dollars.
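The predicted price comes from plugging 4000 pounds into the fitted line. A small sketch of that arithmetic, using the intercept and slope from the first regression output and assuming price is in thousands of dollars:

```r
# Coefficients copied from the first model's output (price ~ weight).
intercept <- -20.295205
slope     <- 0.013264

# Predicted price for a 4000-pound car, in thousands of dollars.
weight_lb <- 4000
predicted <- intercept + slope * weight_lb
predicted

# Same prediction in dollars (assumes price is in thousands).
predicted * 1000
```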
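The reported residual standard error is 7.575 on 52 degrees of freedom. Because price is measured in thousands of dollars, the model's predictions typically miss the actual price by about 7,575 dollars. The 52 degrees of freedom are the 54 observations minus the two estimated coefficients (the intercept and the slope of weight).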
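As a quick check on where these numbers come from, the degrees of freedom and the dollar value of the residual standard error can be recomputed from the reported output. The variable names are illustrative, and the conversion to dollars assumes price is in thousands.

```r
n_obs  <- 54   # number of observations, from the str() output
n_coef <- 2    # estimated coefficients: intercept and weight slope

# Residual degrees of freedom = observations minus estimated coefficients.
df_resid <- n_obs - n_coef
df_resid

# Reported residual standard error, converted to dollars
# (assumes price is recorded in thousands of dollars).
rse <- 7.575
rse_dollars <- rse * 1000
rse_dollars
```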
The reported adjusted R-squared is 0.567. This means that weight explains about 57% of the variation in price, after adjusting for the number of explanatory variables in the model. In this model price is the dependent (response) variable and weight is the independent (explanatory) variable.
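In a regression with a single explanatory variable, the multiple R-squared equals the square of the correlation between the two variables. A short sketch, using the correlation reported earlier, shows that the two outputs agree:

```r
# Correlation between price and weight, copied from the cor() output.
r <- 0.758112

# With one explanatory variable, multiple R-squared is r squared.
r2 <- r^2
round(r2, 4)   # compare with the Multiple R-squared in the first model's output
```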
Run a second regression model for price with two explanatory variables: weight and passengers, and answer Q6.
The second model fits better than the first. Its residual standard error is lower (7.3 versus 7.575), and its adjusted R-squared is higher (0.5975 versus 0.5666). Adjusted R-squared is the fairer comparison here because multiple R-squared never decreases when an explanatory variable is added.
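The comparison can be laid out side by side. The sketch below recomputes adjusted R-squared from the reported multiple R-squared values using the standard formula 1 - (1 - R²)(n - 1)/(n - k - 1), where k is the number of explanatory variables. The helper `adj_r2` and the data frame `models` are illustrative names; results match the reported values only up to rounding of the inputs.

```r
# Standard adjusted R-squared formula; k = number of explanatory variables.
adj_r2 <- function(r2, n, k) 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Fit statistics copied from the two regression outputs.
models <- data.frame(
  model = c("price ~ weight", "price ~ weight + passengers"),
  rse   = c(7.575, 7.3),
  r2    = c(0.5747, 0.6127),
  k     = c(1, 2),
  stringsAsFactors = FALSE
)

# Recompute adjusted R-squared for both models (n = 54 cars).
models$adj_r2 <- adj_r2(models$r2, n = 54, k = models$k)
models

# The preferred model has the highest adjusted R-squared.
best <- models$model[which.max(models$adj_r2)]
best
```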
Build regression model
##
## Call:
## lm(formula = price ~ weight, data = cars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -12.767  -3.766  -1.155   2.568  35.440
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -20.295205   4.915159  -4.129 0.000132 ***
## weight        0.013264   0.001582   8.383 3.17e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.575 on 52 degrees of freedom
## Multiple R-squared: 0.5747, Adjusted R-squared: 0.5666
## F-statistic: 70.28 on 1 and 52 DF, p-value: 3.173e-11
##
## Call:
## lm(formula = price ~ weight + passengers, data = cars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -14.647  -3.688  -1.134   2.677  33.704
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7.348709   7.480301  -0.982   0.3305
## weight       0.015891   0.001925   8.256  5.8e-11 ***
## passengers  -4.094465   1.831085  -2.236   0.0297 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.3 on 51 degrees of freedom
## Multiple R-squared: 0.6127, Adjusted R-squared: 0.5975
## F-statistic: 40.34 on 2 and 51 DF, p-value: 3.127e-11
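The output above shows the two `lm()` calls but not the surrounding code. A minimal sketch of how such summaries are produced follows. It assumes only the first ten rows printed by `str()`, stored in a hypothetical data frame `cars_head`, so the estimates will differ from the full 54-car results shown above.

```r
# First ten rows only, copied from the str() output; illustrative subset.
cars_head <- data.frame(
  price      = c(15.9, 33.9, 37.7, 30, 15.7, 20.8, 23.7, 26.3, 34.7, 40.1),
  weight     = c(2705, 3560, 3405, 3640, 2880, 3470, 4105, 3495, 3620, 3935),
  passengers = c(5, 5, 6, 4, 6, 6, 6, 5, 6, 5)
)

# Model 1: price explained by weight alone.
fit1 <- lm(price ~ weight, data = cars_head)

# Model 2: price explained by weight and passengers.
fit2 <- lm(price ~ weight + passengers, data = cars_head)

# summary() prints the coefficient table, residual standard error,
# and R-squared values in the same layout as the output above.
summary(fit1)
summary(fit2)
```

Replacing `cars_head` with the full cars data set should reproduce the two summaries shown above.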
Interpretation