Suppose that you want to build a regression model that predicts the price of cars using a data set named cars.

Q1. Per the scatter plot and the computed correlation coefficient, describe relationships between the two variables - price and weight.

There is a strong positive linear relationship between price and weight (correlation coefficient of about 0.76). Lighter cars tend to be less expensive, while heavier cars tend to be more expensive.

Create scatterplots

## 'data.frame':    54 obs. of  6 variables:
##  $ type      : Factor w/ 3 levels "large","midsize",..: 3 2 2 2 2 1 1 2 1 2 ...
##  $ price     : num  15.9 33.9 37.7 30 15.7 20.8 23.7 26.3 34.7 40.1 ...
##  $ mpgCity   : int  25 18 19 22 22 19 16 19 16 16 ...
##  $ driveTrain: Factor w/ 3 levels "4WD","front",..: 2 2 2 3 2 2 3 2 2 2 ...
##  $ passengers: int  5 5 6 4 6 6 6 5 6 5 ...
##  $ weight    : int  2705 3560 3405 3640 2880 3470 4105 3495 3620 3935 ...

## [1] 0.758112
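The scatter plot and correlation could be produced along the following lines. This is a minimal sketch, not the code that generated the output above: it rebuilds only the first ten rows shown in the str() listing into a hypothetical data frame named cars_head, so its correlation will differ from the full-data value of 0.758.

```r
# First ten price/weight values copied from the str() output above.
# cars_head is a hypothetical name; the full cars data set has 54 rows.
cars_head <- data.frame(
  price  = c(15.9, 33.9, 37.7, 30, 15.7, 20.8, 23.7, 26.3, 34.7, 40.1),
  weight = c(2705, 3560, 3405, 3640, 2880, 3470, 4105, 3495, 3620, 3935)
)

# Scatter plot of price against weight
plot(price ~ weight, data = cars_head,
     xlab = "Weight (pounds)", ylab = "Price",
     main = "Price vs. weight (first ten cars only)")

# Pearson correlation coefficient for this subset
r_head <- cor(cars_head$price, cars_head$weight)
r_head
```

With the full data set loaded as cars, the same two calls would be applied to cars instead of cars_head.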

Interpretation

Run a regression model for price with one explanatory variable, weight, and answer Q2 through Q5.

Q2. Is the coefficient of weight statistically significant at 5%? Interpret the coefficient.

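Yes. The coefficient of weight is statistically significant at the 5% level: its p-value is 3.17e-11, far below 0.05. The estimate of 0.013264 means that each additional pound of weight is associated with an increase of about 0.013 in predicted price, in the units of the price variable (about $13 if price is recorded in thousands of dollars).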
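As a rough check on the significance claim, the t value, p-value, and a 95% confidence interval can be recomputed from the estimate and standard error reported in the regression output for price ~ weight. This is a sketch that uses the rounded values printed in the summary, so the results match the reported ones only approximately; the variable names are illustrative.

```r
# Values copied from the reported summary of lm(price ~ weight)
estimate  <- 0.013264   # coefficient of weight
std_error <- 0.001582   # its standard error
df_resid  <- 52         # residual degrees of freedom

# t statistic: estimate divided by its standard error
t_value <- estimate / std_error

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_value), df = df_resid)

# 95% confidence interval for the weight coefficient
ci_95 <- estimate + c(-1, 1) * qt(0.975, df = df_resid) * std_error

t_value
p_value
ci_95
```

A p-value below 0.05 and a confidence interval that excludes zero both point to the same conclusion.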

Q3. What price does the model predict for a car that weighs 4000 pounds?

The model predicts a price of about 32.76, or roughly $32,800, for a car that weighs 4,000 pounds (-20.295 + 0.013264 × 4000). However, the only observed data given to us shows that a car was purchased for $48,000.
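The prediction follows directly from the fitted line, price = intercept + slope × weight. The sketch below plugs in the coefficients reported in the regression output for price ~ weight; the conversion to dollars assumes price is recorded in thousands of dollars.

```r
# Coefficients copied from the reported summary of lm(price ~ weight)
intercept <- -20.295205
slope     <- 0.013264

# Predicted price for a car weighing 4000 pounds
weight_new      <- 4000
predicted_price <- intercept + slope * weight_new
predicted_price

# Assumption: price is in thousands of dollars
predicted_dollars <- predicted_price * 1000
predicted_dollars
```

With the fitted model object available, predict() with newdata = data.frame(weight = 4000) would give the same value without rounding the coefficients by hand.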

Q4. What is the reported residual standard error? What does it mean?

The reported residual standard error is 7.575 on 52 degrees of freedom. It measures the typical size of the residuals: the model's predicted prices miss the actual prices by about 7.575 (roughly $7,600 if price is in thousands of dollars) on average.
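The residual standard error is the square root of the residual sum of squares divided by the residual degrees of freedom. The sketch below illustrates the calculation on a hypothetical ten-row data frame, cars_head, built from the first ten rows shown in the str() listing; it is not the full data set, so the value will not equal 7.575.

```r
# First ten price/weight values from the str() output (illustration only)
cars_head <- data.frame(
  price  = c(15.9, 33.9, 37.7, 30, 15.7, 20.8, 23.7, 26.3, 34.7, 40.1),
  weight = c(2705, 3560, 3405, 3640, 2880, 3470, 4105, 3495, 3620, 3935)
)

fit_head <- lm(price ~ weight, data = cars_head)

# Residual sum of squares and residual degrees of freedom (n - 2 here)
rss      <- sum(residuals(fit_head)^2)
df_resid <- df.residual(fit_head)

# Residual standard error computed by hand
rse_manual <- sqrt(rss / df_resid)

# The same quantity as reported by summary()
rse_summary <- summary(fit_head)$sigma

c(manual = rse_manual, summary = rse_summary)
```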

Q5. What is the reported adjusted R squared? What does it mean?

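The adjusted R-squared is 0.5666, or about 56.7%. This means that roughly 57% of the variation in price is explained by weight, after adjusting for the number of explanatory variables in the model.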
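The adjusted value can be recovered from the multiple R-squared with the usual formula 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of explanatory variables. The helper adjusted_r2 below is a hypothetical name used for illustration, and the inputs are the rounded figures from the reported output, so the result agrees with 0.5666 only up to rounding.

```r
# Adjusted R-squared from R-squared, sample size n, and p predictors
adjusted_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Model 1: price ~ weight, with 54 observations and 1 predictor
adj_model1 <- adjusted_r2(r2 = 0.5747, n = 54, p = 1)
adj_model1
```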

Run a second regression model for price with two explanatory variables: weight and passengers, and answer Q6.

Q6. Which of the two models better fits the data? Discuss your answer by comparing the residual standard error and the adjusted R squared between the two models.

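Model 2 fits the data better. Its residual standard error is smaller (7.3 versus 7.575), so its predictions miss the actual prices by less on average. Its adjusted R-squared is also larger (0.5975 versus 0.5666), so it explains more of the variation in price even after the penalty for the extra explanatory variable.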
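To make the comparison concrete, the two reported statistics can be placed side by side. The sketch below only rearranges numbers taken from the two regression summaries; the object and column names are illustrative.

```r
# Fit statistics copied from the two reported regression summaries
comparison <- data.frame(
  model  = c("price ~ weight", "price ~ weight + passengers"),
  rse    = c(7.575, 7.3),
  adj_r2 = c(0.5666, 0.5975)
)
comparison

# Percentage reduction in residual standard error from model 1 to model 2
rse_drop_pct <- 100 * (comparison$rse[1] - comparison$rse[2]) / comparison$rse[1]

# Gain in adjusted R-squared from model 1 to model 2
adj_r2_gain <- comparison$adj_r2[2] - comparison$adj_r2[1]

rse_drop_pct
adj_r2_gain
```

Both measures favor the second model, although the improvement is modest.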

Build regression model

## 
## Call:
## lm(formula = price ~ weight, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -12.767  -3.766  -1.155   2.568  35.440 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -20.295205   4.915159  -4.129 0.000132 ***
## weight        0.013264   0.001582   8.383 3.17e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.575 on 52 degrees of freedom
## Multiple R-squared:  0.5747, Adjusted R-squared:  0.5666 
## F-statistic: 70.28 on 1 and 52 DF,  p-value: 3.173e-11
## 
## Call:
## lm(formula = price ~ weight + passengers, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -14.647  -3.688  -1.134   2.677  33.704 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -7.348709   7.480301  -0.982   0.3305    
## weight       0.015891   0.001925   8.256  5.8e-11 ***
## passengers  -4.094465   1.831085  -2.236   0.0297 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.3 on 51 degrees of freedom
## Multiple R-squared:  0.6127, Adjusted R-squared:  0.5975 
## F-statistic: 40.34 on 2 and 51 DF,  p-value: 3.127e-11
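The Call lines in the output above show the two model formulas. The sketch below applies the same formulas to a hypothetical ten-row data frame, cars_head, built from the first ten rows of the str() listing, so that it runs without the original data; its estimates will not match the 54-row results.

```r
# First ten rows of the relevant columns, copied from the str() output
cars_head <- data.frame(
  price      = c(15.9, 33.9, 37.7, 30, 15.7, 20.8, 23.7, 26.3, 34.7, 40.1),
  weight     = c(2705, 3560, 3405, 3640, 2880, 3470, 4105, 3495, 3620, 3935),
  passengers = c(5, 5, 6, 4, 6, 6, 6, 5, 6, 5)
)

# Model 1: price explained by weight alone
fit1 <- lm(price ~ weight, data = cars_head)

# Model 2: price explained by weight and passengers
fit2 <- lm(price ~ weight + passengers, data = cars_head)

# Coefficient tables, residual standard error, and R-squared values
summary(fit1)
summary(fit2)
```

With the full data set available, replacing cars_head with cars reproduces the calls shown in the output.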

Interpretation