Suppose that you want to build a regression model that predicts the price of bdims using a data set named bdims.
Describe the relationship between price and weight. Make sure to interpret the direction and the magnitude of the relationship. In addition, keep in mind that correlation (or regression) coefficients do not show causation but only association.
Price and weight have a strong positive relationship. Lighter cars tend to be cheaper, and heavier cars tend to be more expensive. This is an association only; it does not show that weight determines price.
Create scatterplots
## 'data.frame': 507 obs. of 25 variables:
## $ bia.di: num 42.9 43.7 40.1 44.3 42.5 43.3 43.5 44.4 43.5 42 ...
## $ bii.di: num 26 28.5 28.2 29.9 29.9 27 30 29.8 26.5 28 ...
## $ bit.di: num 31.5 33.5 33.3 34 34 31.5 34 33.2 32.1 34 ...
## $ che.de: num 17.7 16.9 20.9 18.4 21.5 19.6 21.9 21.8 15.5 22.5 ...
## $ che.di: num 28 30.8 31.7 28.2 29.4 31.3 31.7 28.8 27.5 28 ...
## $ elb.di: num 13.1 14 13.9 13.9 15.2 14 16.1 15.1 14.1 15.6 ...
## $ wri.di: num 10.4 11.8 10.9 11.2 11.6 11.5 12.5 11.9 11.2 12 ...
## $ kne.di: num 18.8 20.6 19.7 20.9 20.7 18.8 20.8 21 18.9 21.1 ...
## $ ank.di: num 14.1 15.1 14.1 15 14.9 13.9 15.6 14.6 13.2 15 ...
## $ sho.gi: num 106 110 115 104 108 ...
## $ che.gi: num 89.5 97 97.5 97 97.5 ...
## $ wai.gi: num 71.5 79 83.2 77.8 80 82.5 82 76.8 68.5 77.5 ...
## $ nav.gi: num 74.5 86.5 82.9 78.8 82.5 80.1 84 80.5 69 81.5 ...
## $ hip.gi: num 93.5 94.8 95 94 98.5 95.3 101 98 89.5 99.8 ...
## $ thi.gi: num 51.5 51.5 57.3 53 55.4 57.5 60.9 56 50 59.8 ...
## $ bic.gi: num 32.5 34.4 33.4 31 32 33 42.4 34.1 33 36.5 ...
## $ for.gi: num 26 28 28.8 26.2 28.4 28 32.3 28 26 29.2 ...
## $ kne.gi: num 34.5 36.5 37 37 37.7 36.6 40.1 39.2 35.5 38.3 ...
## $ cal.gi: num 36.5 37.5 37.3 34.8 38.6 36.1 40.3 36.7 35 38.6 ...
## $ ank.gi: num 23.5 24.5 21.9 23 24.4 23.5 23.6 22.5 22 22.2 ...
## $ wri.gi: num 16.5 17 16.9 16.6 18 16.9 18.8 18 16.5 16.9 ...
## $ age : int 21 23 28 23 22 21 26 27 23 21 ...
## $ wgt : num 65.6 71.8 80.7 72.6 78.8 74.8 86.4 78.4 62 81.6 ...
## $ hgt : num 174 175 194 186 187 ...
## $ sex : int 1 1 1 1 1 1 1 1 1 1 ...
## [1] 0.7173011
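The code chunk that produced the output above is not shown. A minimal sketch that would give output of this shape follows, assuming `bdims` is loaded from the `openintro` package and that the correlation was computed between `wgt` and `hgt`. The second assumption is supported by the fact that 0.7173011 squared is about 0.5145, which matches the Multiple R-squared reported for `wgt ~ hgt`. The axis labels are illustrative choices.

```r
# Assumption: the bdims data set is provided by the openintro package
library(openintro)

# Inspect the structure of the data (variable names and types)
str(bdims)

# Scatterplot of weight against height
plot(wgt ~ hgt, data = bdims,
     xlab = "Height (hgt)", ylab = "Weight (wgt)")

# Correlation coefficient between weight and height
cor(bdims$wgt, bdims$hgt)
```

Depending on the installed version of `openintro`, `str()` may describe the object as a tibble rather than a `data.frame`.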
Interpretation
Run a regression model for price with one explanatory variable, weight, and answer Q2 through Q5.
Yes, the coefficient of weight is statistically significant at the 5% level. Price increases by roughly 13 dollars for each additional pound of weight.
Hint: Check the units of the variables in the openintro manual.
The model predicts that a car weighing 4,000 pounds would cost about 32,000 dollars.
The residual standard error is 7.575. This means that the model's predictions typically miss the actual value by about 7.575, measured in the units of the response variable.
The reported adjusted R squared is 0.5666. This means that about 57% of the variability in weight is explained by height.
The second model, because it fits the data better: it has a higher adjusted R-squared (0.5651 versus 0.5136) and a smaller residual standard error (8.802 versus 9.308).
Build regression model
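The two summaries in this section can be reproduced with calls along the following lines. The formulas are taken from the `Call:` lines of the output; the object names `model1` and `model2` are hypothetical, and `bdims` is assumed to come from the `openintro` package.

```r
# Assumption: the bdims data set is provided by the openintro package
library(openintro)

# Model 1: weight explained by height only
model1 <- lm(wgt ~ hgt, data = bdims)
summary(model1)

# Model 2: weight explained by height and sex
model2 <- lm(wgt ~ hgt + sex, data = bdims)
summary(model2)
```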
##
## Call:
## lm(formula = wgt ~ hgt, data = bdims)
##
## Residuals:
## Min 1Q Median 3Q Max
## -18.743 -6.402 -1.231 5.059 41.103
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -105.01125 7.53941 -13.93 <2e-16 ***
## hgt 1.01762 0.04399 23.14 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.308 on 505 degrees of freedom
## Multiple R-squared: 0.5145, Adjusted R-squared: 0.5136
## F-statistic: 535.2 on 1 and 505 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = wgt ~ hgt + sex, data = bdims)
##
## Residuals:
## Min 1Q Median 3Q Max
## -20.184 -5.978 -1.356 4.709 43.337
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -56.94949 9.42444 -6.043 2.95e-09 ***
## hgt 0.71298 0.05707 12.494 < 2e-16 ***
## sex 8.36599 1.07296 7.797 3.66e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.802 on 504 degrees of freedom
## Multiple R-squared: 0.5668, Adjusted R-squared: 0.5651
## F-statistic: 329.7 on 2 and 504 DF, p-value: < 2.2e-16
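To make the coefficients concrete, the two fitted lines can be written out and evaluated at an illustrative height. The value hgt = 180 and the choice sex = 1 are hypothetical inputs picked only for the arithmetic; the coefficients are copied from the summaries above. The units are an assumption based on the openintro documentation for `bdims` (weight in kilograms, height in centimeters).

```latex
\begin{align*}
\text{Model 1:}\quad \widehat{\text{wgt}} &= -105.01125 + 1.01762 \cdot \text{hgt} \\
\text{hgt} = 180:\quad \widehat{\text{wgt}} &= -105.01125 + 183.1716 = 78.16035 \approx 78.2 \\[6pt]
\text{Model 2:}\quad \widehat{\text{wgt}} &= -56.94949 + 0.71298 \cdot \text{hgt} + 8.36599 \cdot \text{sex} \\
\text{hgt} = 180,\ \text{sex} = 1:\quad \widehat{\text{wgt}} &= -56.94949 + 128.3364 + 8.36599 = 79.7529 \approx 79.8
\end{align*}
```

Read this way, the slope in Model 1 says that each additional unit of height is associated with about 1.02 more units of weight on average. In Model 2 the height slope drops to about 0.71 once sex is held fixed, and observations with sex = 1 are predicted to weigh about 8.37 units more than observations with sex = 0 at the same height.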
Interpretation