Suppose that you want to build a regression model that predicts the price of cars using a data set named cars.

Q1. Per the scatter plot and the computed correlation coefficient, describe the relationship between the two variables, price and weight.

Make sure to interpret the direction and the magnitude of the relationship. In addition, keep in mind that correlation (or regression) coefficients do not show causation but only association.

Heavier cars tend to be more expensive than lighter cars. The scatterplot shows an upward trend: as weight increases, price tends to increase. The correlation coefficient of 0.758 indicates a strong positive linear association between price and weight. This is an association only; it does not show that added weight causes a higher price.

Create scatterplots

## 'data.frame':    54 obs. of  6 variables:
##  $ type      : Factor w/ 3 levels "large","midsize",..: 3 2 2 2 2 1 1 2 1 2 ...
##  $ price     : num  15.9 33.9 37.7 30 15.7 20.8 23.7 26.3 34.7 40.1 ...
##  $ mpgCity   : int  25 18 19 22 22 19 16 19 16 16 ...
##  $ driveTrain: Factor w/ 3 levels "4WD","front",..: 2 2 2 3 2 2 3 2 2 2 ...
##  $ passengers: int  5 5 6 4 6 6 6 5 6 5 ...
##  $ weight    : int  2705 3560 3405 3640 2880 3470 4105 3495 3620 3935 ...

## [1] 0.758112

Interpretation

Run a regression model for price with one explanatory variable, weight, and answer Q2 through Q5.

Q2. Is the coefficient of weight statistically significant at 5%? Interpret the coefficient.

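Yes. The coefficient of weight is statistically significant at the 5% level: its p-value (3.17e-11) is far below 0.05. The coefficient is 0.013264, and price is measured in thousands of dollars, so each additional pound of weight is associated with an increase of about $13 in the predicted price.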
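
As a rough check on the Q2 answer, the t statistic and p-value for weight can be recomputed from the reported estimate and standard error. This is a minimal sketch in base R: the variable names are illustrative, the numbers are copied from the summary output, and the conversion to dollars assumes that price is recorded in thousands of dollars. Small rounding differences from the printed output are expected.

```r
# Values copied from the summary output for price ~ weight
estimate  <- 0.013264   # change in price per pound of weight
std_error <- 0.001582
df_resid  <- 52         # residual degrees of freedom

# t statistic and two-sided p-value from the t distribution
t_value <- estimate / std_error
p_value <- 2 * pt(-abs(t_value), df = df_resid)

# Assumption: price is in $1000s, so convert the slope to dollars per pound
dollars_per_pound <- estimate * 1000

print(round(t_value, 2))
print(p_value < 0.05)
print(dollars_per_pound)
```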

Q3. What price does the model predict for a car that weighs 4000 pounds?

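Hint: Check the units of the variables in the openintro manual.

The model predicts a price of about $32,800 for a car that weighs 4000 pounds (-20.295 + 0.013264 * 4000 = 32.76, in thousands of dollars). However, the actual price of a car near that weight in the data is almost $50,000, so the prediction can miss by a wide margin.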
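
The prediction in Q3 is the regression equation evaluated at 4000 pounds. The sketch below repeats that arithmetic in base R with the coefficients copied from the summary output; the variable names are illustrative, and the final conversion assumes price is recorded in thousands of dollars.

```r
# Coefficients copied from the summary output for price ~ weight
intercept <- -20.295205
slope     <- 0.013264
weight_lb <- 4000

# Predicted price in the model's units (assumed to be $1000s)
predicted_thousands <- intercept + slope * weight_lb

# Convert to dollars for reporting
predicted_dollars <- predicted_thousands * 1000

print(predicted_thousands)
print(predicted_dollars)
```

With a fitted model object available (here called `fit`, a hypothetical name), `predict(fit, newdata = data.frame(weight = 4000))` should give the same value up to rounding.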

Q4. What is the reported residual standard error? What does it mean?

The reported residual standard error is 7.575 on 52 degrees of freedom. Because price is measured in thousands of dollars, this means the model's predictions typically miss the actual price by about $7,575. The 52 degrees of freedom are the 54 observations minus the 2 estimated coefficients (intercept and slope).
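
To make the pieces of the Q4 answer concrete, the sketch below shows where the 52 degrees of freedom come from and how the residual standard error relates to the residual sum of squares. The numbers are copied from the output; the variable names are illustrative, and the dollar conversion assumes price is in thousands of dollars.

```r
# Values taken from the str() and summary() output
n_obs  <- 54      # observations in the data set
n_coef <- 2       # intercept and slope
rse    <- 7.575   # residual standard error

# Residual degrees of freedom: observations minus estimated coefficients
df_resid <- n_obs - n_coef

# RSE is sqrt(RSS / df), so the residual sum of squares can be recovered
rss <- rse^2 * df_resid

# Typical prediction error in dollars (assumes price is in $1000s)
rse_dollars <- rse * 1000

print(df_resid)
print(rss)
print(rse_dollars)
```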

Q5. What is the reported adjusted R squared? What does it mean?

The reported adjusted R squared is 0.5666. This means that weight explains about 57% of the variability in price, after adjusting for the number of explanatory variables in the model. Here price is the dependent (response) variable and weight is the independent (explanatory) variable.
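
Two relationships help in reading the R squared values. In a regression with one explanatory variable, R squared equals the squared correlation coefficient, and adjusted R squared shrinks R squared according to the sample size and number of predictors. The sketch below checks both against the reported output; `adjusted_r2` is a helper written for this illustration, not a built-in function, and the inputs are rounded values copied from the output.

```r
# Helper written for this illustration: standard adjusted R-squared formula
# r2 = multiple R-squared, n = observations, p = number of predictors
adjusted_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Squared correlation should match Multiple R-squared (0.5747)
r <- 0.758112
r2_from_cor <- r^2

# Adjusted R-squared for price ~ weight (reported as 0.5666)
adj_r2_model1 <- adjusted_r2(0.5747, n = 54, p = 1)

print(round(r2_from_cor, 4))
print(round(adj_r2_model1, 3))
```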

Run a second regression model for price with two explanatory variables: weight and passengers, and answer Q6.

Q6. Which of the two models better fits the data? Discuss your answer by comparing the residual standard error and the adjusted R squared between the two models.

The second model fits the data better. Its residual standard error is smaller (7.3 versus 7.575 for the first model), so its predictions are typically closer to the actual prices. Its adjusted R squared is larger (0.5975 versus 0.5666), so it explains more of the variability in price even after accounting for the extra explanatory variable.
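
The Q6 comparison can be laid out side by side. The sketch below collects the two reported statistics in a small data frame and picks the preferred model under each criterion; the data frame and its column names are illustrative, with the values copied from the two summary outputs.

```r
# Fit statistics copied from the two summary outputs
models <- data.frame(
  model  = c("price ~ weight", "price ~ weight + passengers"),
  rse    = c(7.575, 7.3),
  adj_r2 = c(0.5666, 0.5975),
  stringsAsFactors = FALSE
)

# Lower residual standard error is better; higher adjusted R-squared is better
best_by_rse    <- models$model[which.min(models$rse)]
best_by_adj_r2 <- models$model[which.max(models$adj_r2)]

print(models)
print(best_by_rse)
print(best_by_adj_r2)
```

Both criteria point to the same model here. When they disagree, adjusted R squared is usually the safer guide for comparing models with different numbers of predictors.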

Build regression model

## 
## Call:
## lm(formula = price ~ weight, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -12.767  -3.766  -1.155   2.568  35.440 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -20.295205   4.915159  -4.129 0.000132 ***
## weight        0.013264   0.001582   8.383 3.17e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.575 on 52 degrees of freedom
## Multiple R-squared:  0.5747, Adjusted R-squared:  0.5666 
## F-statistic: 70.28 on 1 and 52 DF,  p-value: 3.173e-11
## 
## Call:
## lm(formula = price ~ weight + passengers, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -14.647  -3.688  -1.134   2.677  33.704 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -7.348709   7.480301  -0.982   0.3305    
## weight       0.015891   0.001925   8.256  5.8e-11 ***
## passengers  -4.094465   1.831085  -2.236   0.0297 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.3 on 51 degrees of freedom
## Multiple R-squared:  0.6127, Adjusted R-squared:  0.5975 
## F-statistic: 40.34 on 2 and 51 DF,  p-value: 3.127e-11

Interpretation