Suppose that you want to build a regression model that predicts the price of cars using a data set named cars.

Q1. Per the scatter plot and the computed correlation coefficient, describe the relationship between the two variables, price and weight.

Make sure to interpret the direction and the magnitude of the relationship. In addition, keep in mind that correlation (or regression) coefficients do not show causation but only association.

- According to the scatter plot and the computed correlation coefficient (r ≈ 0.76), the relationship between the two variables "price" and "weight" is positive and fairly strong: the points trend upward from left to right, so heavier cars tend to have higher prices. This is an association only, not evidence that weight causes a higher price.

Create scatterplots

## 'data.frame':    54 obs. of  6 variables:
##  $ type      : Factor w/ 3 levels "large","midsize",..: 3 2 2 2 2 1 1 2 1 2 ...
##  $ price     : num  15.9 33.9 37.7 30 15.7 20.8 23.7 26.3 34.7 40.1 ...
##  $ mpgCity   : int  25 18 19 22 22 19 16 19 16 16 ...
##  $ driveTrain: Factor w/ 3 levels "4WD","front",..: 2 2 2 3 2 2 3 2 2 2 ...
##  $ passengers: int  5 5 6 4 6 6 6 5 6 5 ...
##  $ weight    : int  2705 3560 3405 3640 2880 3470 4105 3495 3620 3935 ...

## [1] 0.758112
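The structure listing and correlation above could be produced along the following lines. This is only a sketch: the report's actual code is not shown, and `cars_head` is a hypothetical stand-in built from the first ten price and weight values printed by `str()` above, so its correlation will differ from the 0.758112 computed on all 54 rows.

```r
# Hypothetical stand-in: the first ten rows as printed by str(cars) above.
# The full data set has 54 observations.
cars_head <- data.frame(
  price  = c(15.9, 33.9, 37.7, 30, 15.7, 20.8, 23.7, 26.3, 34.7, 40.1),
  weight = c(2705, 3560, 3405, 3640, 2880, 3470, 4105, 3495, 3620, 3935)
)

# Inspect the structure of the data frame.
str(cars_head)

# Scatter plot with weight on the x-axis and price on the y-axis.
plot(cars_head$weight, cars_head$price,
     xlab = "weight", ylab = "price",
     main = "price vs. weight (first ten rows only)")

# Pearson correlation coefficient between weight and price.
r <- cor(cars_head$weight, cars_head$price)
print(r)
```

With the full data set, replacing `cars_head` with `cars` would give the scatter plot and the coefficient that Q1 asks about.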

Interpretation

Run a regression model for price with one explanatory variable, weight, and answer Q2 through Q5.

Q2. Is the coefficient of weight statistically significant at the 5% level? Interpret the coefficient.

Q4. What is the reported residual standard error? What does it mean?

Q6. Which of the two models better fits the data? Discuss your answer by comparing the residual standard error and the adjusted R squared between the two models.
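For reference when answering Q4 and Q6, the two summary statistics can be written out as below. The notation is an assumption of this note, not the assignment's: $n$ is the number of observations, $p$ the number of explanatory variables, and $e_i$ the residuals.

```latex
% Residual standard error: typical size of a residual, in units of the response
\mathrm{RSE} = \sqrt{\frac{\sum_{i=1}^{n} e_i^{2}}{n - p - 1}}

% Adjusted R-squared: R-squared penalized for the number of predictors
R^{2}_{\mathrm{adj}} = 1 - \left(1 - R^{2}\right)\frac{n - 1}{n - p - 1}

% Check against the one-predictor summary (n = 54, p = 1, R^2 = 0.5747)
R^{2}_{\mathrm{adj}} \approx 1 - (1 - 0.5747)\cdot\frac{53}{52} \approx 0.57
```

With $n = 54$, the denominator $n - p - 1$ equals 52 for one predictor and 51 for two, which matches the degrees of freedom in the printed summaries. A lower residual standard error and a higher adjusted R-squared both point to the better-fitting model.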

Build regression model

## 
## Call:
## lm(formula = weight ~ price, data = cars)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1328.29  -228.09    10.92   258.19   924.27 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 2171.113    118.956  18.251  < 2e-16 ***
## price         43.331      5.169   8.383 3.17e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 433 on 52 degrees of freedom
## Multiple R-squared:  0.5747, Adjusted R-squared:  0.5666 
## F-statistic: 70.28 on 1 and 52 DF,  p-value: 3.173e-11
## 
## Call:
## lm(formula = weight ~ price + passengers, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -976.81 -201.56    6.13  151.33  799.88 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   294.25     356.98   0.824    0.414    
## price          35.99       4.36   8.256 5.80e-11 ***
## passengers    395.91      72.56   5.456 1.44e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 347.4 on 51 degrees of freedom
## Multiple R-squared:  0.7315, Adjusted R-squared:  0.7209 
## F-statistic: 69.46 on 2 and 51 DF,  p-value: 2.748e-15
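The summaries above use weight as the response (`weight ~ price`), whereas the assignment text asks for a model of price explained by weight. A minimal sketch of that model, and of the comparison Q6 asks for, might look like the following. It assumes the second model simply adds `passengers`, and it uses `cars_head`, a hypothetical stand-in holding only the first ten rows printed by `str()`, so its numbers will not match the output above.

```r
# Hypothetical stand-in: the first ten rows as printed by str(cars) above.
cars_head <- data.frame(
  price      = c(15.9, 33.9, 37.7, 30, 15.7, 20.8, 23.7, 26.3, 34.7, 40.1),
  passengers = c(5, 5, 6, 4, 6, 6, 6, 5, 6, 5),
  weight     = c(2705, 3560, 3405, 3640, 2880, 3470, 4105, 3495, 3620, 3935)
)

# Price as the response, as the assignment text describes.
fit1 <- lm(price ~ weight, data = cars_head)
# Assumed second model: the same response with passengers added.
fit2 <- lm(price ~ weight + passengers, data = cars_head)

# Q2: p-value of the weight coefficient, compared with the 5% level.
p_weight <- coef(summary(fit1))["weight", "Pr(>|t|)"]
print(p_weight < 0.05)

# Q4 and Q6: residual standard error and adjusted R-squared, side by side.
compare <- data.frame(
  model  = c("price ~ weight", "price ~ weight + passengers"),
  rse    = c(summary(fit1)$sigma, summary(fit2)$sigma),
  adj_r2 = c(summary(fit1)$adj.r.squared, summary(fit2)$adj.r.squared)
)
print(compare)
```

Running the same calls with `data = cars` would give the figures to cite in the interpretation; the model with the lower `rse` and higher `adj_r2` is the better fit.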

Interpretation