Suppose that you want to build a regression model that predicts the price of bdims using a data set named bdims.

Q1. Per the scatter plot and the computed correlation coefficient, describe relationships between the two variables - price and weight.

Make sure to interpret the direction and the magnitude of the relationship. In addition, keep in mind that correlation (or regression) coefficients do not show causation but only association

Price and weight have a strong positive that dictates the the price of the car. If the car is smaller it tends to be cheaper and the bigger the car gets the price tends to be higher.

Create scatterplots

## 'data.frame':    507 obs. of  25 variables:
##  $ bia.di: num  42.9 43.7 40.1 44.3 42.5 43.3 43.5 44.4 43.5 42 ...
##  $ bii.di: num  26 28.5 28.2 29.9 29.9 27 30 29.8 26.5 28 ...
##  $ bit.di: num  31.5 33.5 33.3 34 34 31.5 34 33.2 32.1 34 ...
##  $ che.de: num  17.7 16.9 20.9 18.4 21.5 19.6 21.9 21.8 15.5 22.5 ...
##  $ che.di: num  28 30.8 31.7 28.2 29.4 31.3 31.7 28.8 27.5 28 ...
##  $ elb.di: num  13.1 14 13.9 13.9 15.2 14 16.1 15.1 14.1 15.6 ...
##  $ wri.di: num  10.4 11.8 10.9 11.2 11.6 11.5 12.5 11.9 11.2 12 ...
##  $ kne.di: num  18.8 20.6 19.7 20.9 20.7 18.8 20.8 21 18.9 21.1 ...
##  $ ank.di: num  14.1 15.1 14.1 15 14.9 13.9 15.6 14.6 13.2 15 ...
##  $ sho.gi: num  106 110 115 104 108 ...
##  $ che.gi: num  89.5 97 97.5 97 97.5 ...
##  $ wai.gi: num  71.5 79 83.2 77.8 80 82.5 82 76.8 68.5 77.5 ...
##  $ nav.gi: num  74.5 86.5 82.9 78.8 82.5 80.1 84 80.5 69 81.5 ...
##  $ hip.gi: num  93.5 94.8 95 94 98.5 95.3 101 98 89.5 99.8 ...
##  $ thi.gi: num  51.5 51.5 57.3 53 55.4 57.5 60.9 56 50 59.8 ...
##  $ bic.gi: num  32.5 34.4 33.4 31 32 33 42.4 34.1 33 36.5 ...
##  $ for.gi: num  26 28 28.8 26.2 28.4 28 32.3 28 26 29.2 ...
##  $ kne.gi: num  34.5 36.5 37 37 37.7 36.6 40.1 39.2 35.5 38.3 ...
##  $ cal.gi: num  36.5 37.5 37.3 34.8 38.6 36.1 40.3 36.7 35 38.6 ...
##  $ ank.gi: num  23.5 24.5 21.9 23 24.4 23.5 23.6 22.5 22 22.2 ...
##  $ wri.gi: num  16.5 17 16.9 16.6 18 16.9 18.8 18 16.5 16.9 ...
##  $ age   : int  21 23 28 23 22 21 26 27 23 21 ...
##  $ wgt   : num  65.6 71.8 80.7 72.6 78.8 74.8 86.4 78.4 62 81.6 ...
##  $ hgt   : num  174 175 194 186 187 ...
##  $ sex   : int  1 1 1 1 1 1 1 1 1 1 ...

## [1] 0.7173011

Interpretation

Run a regression model for price with one explanatory variable, weight, and answer Q2 through Q5.

Q2. Is the coefficient of weight statistically significant at 5%? Interpret the coefficient.

Yes, the coefficient of weight is statistically significant at 5% and the price increases to roughly 13 dollars per pound.

Q3. What price does the model predict for a car that weighs 4000 pounds?

Hint: Check the units of the variables in the openintro manual.

If a car weighs 4000 pounds it would cost about 32,000 dollars.

Q4. What is the reported residual standard error? What does it mean?

The residual standard error is 7.575. This means that the line of best fit will always be around 7.575.

Q5. What is the reported adjusted R squared? What does it mean?

The reported adjusted R squared is 0.5666. This means the variability in weight can be explained by height.

Q6. Which of the two models better fits the data? Discuss your answer by comparing the residual standard error and the adjusted R squared between the two models.

The second model because it fits better with the data and it has a smaller residual error.

Build regression model

## 
## Call:
## lm(formula = wgt ~ hgt, data = bdims)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -18.743  -6.402  -1.231   5.059  41.103 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -105.01125    7.53941  -13.93   <2e-16 ***
## hgt            1.01762    0.04399   23.14   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.308 on 505 degrees of freedom
## Multiple R-squared:  0.5145, Adjusted R-squared:  0.5136 
## F-statistic: 535.2 on 1 and 505 DF,  p-value: < 2.2e-16
## 
## Call:
## lm(formula = wgt ~ hgt + sex, data = bdims)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -20.184  -5.978  -1.356   4.709  43.337 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -56.94949    9.42444  -6.043 2.95e-09 ***
## hgt           0.71298    0.05707  12.494  < 2e-16 ***
## sex           8.36599    1.07296   7.797 3.66e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.802 on 504 degrees of freedom
## Multiple R-squared:  0.5668, Adjusted R-squared:  0.5651 
## F-statistic: 329.7 on 2 and 504 DF,  p-value: < 2.2e-16

Interpretation