Suppose that you want to build a regression model that predicts the price of cars using a data set named cars.
Q1: the relationship between price and weight. Make sure to interpret the direction and the magnitude of the relationship. Keep in mind that correlation (or regression) coefficients show only association, not causation.
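ANS: There appears to be a strong positive relationship between weight and price. The correlation coefficient is about 0.76 (r = 0.758): its absolute value is above 0.6, and its sign is positive. Heavier cars tend to be more expensive, although this association alone does not show that weight causes a higher price.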
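The reasoning in the answer (sign gives the direction, absolute value compared with 0.6 gives the strength) can be written as a small R helper. This is a minimal sketch: `describe_correlation` is a hypothetical name, and the 0.6 cutoff is only the rule of thumb used in the answer, not a statistical test.

```r
# Hypothetical helper: classify a correlation coefficient by direction and strength.
# The 0.6 cutoff mirrors the rule of thumb used in the answer; it is a convention.
describe_correlation <- function(r, strong = 0.6) {
  direction <- if (r > 0) "positive" else if (r < 0) "negative" else "zero"
  strength <- if (abs(r) > strong) "strong" else "weak-to-moderate"
  paste(strength, direction)
}

# Correlation between price and weight as printed in this section
describe_correlation(0.758112)  # "strong positive"
```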
Create scatterplots
## 'data.frame': 54 obs. of 6 variables:
## $ type : Factor w/ 3 levels "large","midsize",..: 3 2 2 2 2 1 1 2 1 2 ...
## $ price : num 15.9 33.9 37.7 30 15.7 20.8 23.7 26.3 34.7 40.1 ...
## $ mpgCity : int 25 18 19 22 22 19 16 19 16 16 ...
## $ driveTrain: Factor w/ 3 levels "4WD","front",..: 2 2 2 3 2 2 3 2 2 2 ...
## $ passengers: int 5 5 6 4 6 6 6 5 6 5 ...
## $ weight : int 2705 3560 3405 3640 2880 3470 4105 3495 3620 3935 ...
## [1] 0.758112
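A sketch of how the scatterplot and correlation could be produced in R. Because the full data set is not reproduced here, the example assumes a small data frame, `cars10` (a hypothetical name), built from the first ten cars shown in the `str()` output; its correlation will therefore differ from the 0.758 computed on all 54 cars. Note that base R also ships a data set called `cars` (speed and stopping distance), so the openintro version must be loaded before calling something like `cor(cars$price, cars$weight)`.

```r
# First ten cars, copied from the str() output (a subset of the 54 cars)
cars10 <- data.frame(
  price  = c(15.9, 33.9, 37.7, 30, 15.7, 20.8, 23.7, 26.3, 34.7, 40.1),
  weight = c(2705, 3560, 3405, 3640, 2880, 3470, 4105, 3495, 3620, 3935)
)

# Scatterplot with price (the variable to predict) on the y-axis
plot(price ~ weight, data = cars10,
     xlab = "Weight (lbs)", ylab = "Price (thousands of USD, assumed)")

# Pearson correlation; cor() is symmetric, so argument order does not matter
r10 <- cor(cars10$price, cars10$weight)
r10
```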
Interpretation
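Run a regression model for price with one explanatory variable, weight, and answer Q2 through Q5. Note that the output in this section comes from lm(weight ~ price, data = cars), which treats weight as the response and price as the explanatory variable, the reverse of what the question asks; the answers interpret that fitted model.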
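The direction of the formula matters. A minimal R sketch, again assuming the hypothetical ten-car subset `cars10` taken from the `str()` output, fits both the model the question asks for (`price ~ weight`) and the one shown in the output (`weight ~ price`). The two slopes are not reciprocals of each other; in simple linear regression their product equals r squared.

```r
cars10 <- data.frame(
  price  = c(15.9, 33.9, 37.7, 30, 15.7, 20.8, 23.7, 26.3, 34.7, 40.1),
  weight = c(2705, 3560, 3405, 3640, 2880, 3470, 4105, 3495, 3620, 3935)
)

# Model requested in the question: price is the response
fit_price <- lm(price ~ weight, data = cars10)
# Model shown in the output: weight is the response
fit_weight <- lm(weight ~ price, data = cars10)

coef(fit_price)
coef(fit_weight)

# Product of the two slopes equals the squared correlation
slope_product <- coef(fit_price)[["weight"]] * coef(fit_weight)[["price"]]
all.equal(slope_product, cor(cars10$price, cars10$weight)^2)
```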
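ANS: The price coefficient is marked with three stars (***), which means its p-value (3.17e-11) is below 0.001. The relationship between weight and price is therefore statistically significant at the 5% level (and even at the 0.1% level). The slope is positive (43.331): higher prices are associated with higher weights. Price appears to be recorded in thousands of dollars, so each additional $1,000 of price corresponds to about 43 lbs more weight on average.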
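The stars come from comparing each p-value with the cutpoints listed in the "Signif. codes" line. The sketch below uses base R's `symnum()` with the same cutpoints and symbols that `printCoefmat()` uses when printing a model summary; the p-value is the one reported for price. For a fitted model `fit`, the p-value could instead be read with `coef(summary(fit))["price", "Pr(>|t|)"]`.

```r
# p-value of the price coefficient reported for lm(weight ~ price)
p_price <- 3.17e-11

# Same cutpoints/symbols as the "Signif. codes" legend in summary() output
stars <- symnum(p_price, corr = FALSE, na = FALSE,
                cutpoints = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                symbols = c("***", "**", "*", ".", " "))

as.character(stars)  # "***"
p_price < 0.05       # TRUE: significant at the 5% level
```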
Hint: Check the units of the variables in the openintro manual.
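ANS: Solving the fitted equation W = 2171.113 + 43.331P for P with W = 4000 lbs gives P = (4000 - 2171.113) / 43.331 ≈ 42.2. Because price is recorded in thousands of dollars, the predicted price is about $42,200. Inverting the weight-on-price equation is only an approximation: a model with price as the response would generally give a different prediction.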
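The arithmetic above can be checked in R using the coefficients reported for `lm(weight ~ price)`. The second part is a sketch of the more direct approach, assuming the hypothetical ten-car subset `cars10` from the `str()` output: fit `price ~ weight` and call `predict()` with `newdata`. Its value comes from only ten cars, so it will not match a prediction from the full data.

```r
# Coefficients reported for lm(weight ~ price): weight = b0 + b1 * price
b0 <- 2171.113
b1 <- 43.331

# Solve 4000 = b0 + b1 * price for price (thousands of dollars, assumed)
price_hat <- (4000 - b0) / b1
price_hat                    # about 42.21
round(price_hat * 1000, -2)  # 42200 dollars

# Direct approach: regress price on weight and predict at 4000 lbs
cars10 <- data.frame(
  price  = c(15.9, 33.9, 37.7, 30, 15.7, 20.8, 23.7, 26.3, 34.7, 40.1),
  weight = c(2705, 3560, 3405, 3640, 2880, 3470, 4105, 3495, 3620, 3935)
)
fit_price <- lm(price ~ weight, data = cars10)
pred_4000 <- predict(fit_price, newdata = data.frame(weight = 4000))
pred_4000
```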
ANS: The residual standard error is 433 lbs (on 52 degrees of freedom). It measures the typical size of a residual: the model's predicted weights typically miss the actual weights by about 433 lbs.
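The residual standard error is the square root of the residual sum of squares divided by the residual degrees of freedom (n minus the number of coefficients, so 54 - 2 = 52 here). A short R sketch on the hypothetical ten-car subset `cars10` confirms that this formula matches the value `summary()` prints.

```r
cars10 <- data.frame(
  price  = c(15.9, 33.9, 37.7, 30, 15.7, 20.8, 23.7, 26.3, 34.7, 40.1),
  weight = c(2705, 3560, 3405, 3640, 2880, 3470, 4105, 3495, 3620, 3935)
)
fit_weight <- lm(weight ~ price, data = cars10)

# sqrt(residual sum of squares / residual degrees of freedom)
rse_manual <- sqrt(sum(residuals(fit_weight)^2) / df.residual(fit_weight))
rse_manual
summary(fit_weight)$sigma  # same value, as printed by summary()
```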
Run a second regression model for price with two explanatory variables: weight and passengers, and answer Q6.
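ANS: The adjusted R-squared is 0.7209. The multiple R-squared of 0.7315 means that about 73% of the variability in weight is explained by the model with price and passengers; the adjusted value (about 72%) is slightly lower because it penalizes the additional predictor.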
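Adjusted R-squared rescales the unexplained share by the degrees of freedom: 1 - (1 - R²)(n - 1)/(n - k - 1), with n observations and k predictors (intercept not counted). Plugging in the reported R-squared values with n = 54 reproduces both adjusted values up to rounding; `adj_r2` is a hypothetical helper name.

```r
# Adjusted R^2 from R^2, sample size n, and number of predictors k
adj_r2 <- function(r2, n, k) 1 - (1 - r2) * (n - 1) / (n - k - 1)

adj_r2(0.5747, n = 54, k = 1)  # about 0.5665 (reported: 0.5666; R^2 is rounded)
adj_r2(0.7315, n = 54, k = 2)  # about 0.7210 (reported: 0.7209)
```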
ANS: The second model fits the data better: its residual standard error is smaller (347.4 lbs versus 433 lbs for the first model), and its adjusted R-squared is higher (0.7209 versus 0.5666).
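One way to run this comparison in R, sketched on the hypothetical ten-car subset `cars10` (including passengers from the `str()` output) and keeping the `weight ~ ...` direction used in the output: collect the residual standard errors and adjusted R-squared values, and use `anova()` for an F test of whether adding passengers improves the nested model. Results on ten cars will differ from the full-data output.

```r
cars10 <- data.frame(
  price      = c(15.9, 33.9, 37.7, 30, 15.7, 20.8, 23.7, 26.3, 34.7, 40.1),
  weight     = c(2705, 3560, 3405, 3640, 2880, 3470, 4105, 3495, 3620, 3935),
  passengers = c(5, 5, 6, 4, 6, 6, 6, 5, 6, 5)
)

fit1 <- lm(weight ~ price, data = cars10)
fit2 <- lm(weight ~ price + passengers, data = cars10)

# Smaller residual standard error and larger adjusted R^2 indicate a better fit
c(sigma1 = summary(fit1)$sigma, sigma2 = summary(fit2)$sigma)
c(adj1 = summary(fit1)$adj.r.squared, adj2 = summary(fit2)$adj.r.squared)

# F test for the nested models: does adding passengers reduce the RSS enough?
cmp <- anova(fit1, fit2)
cmp
```

Adding a predictor can never increase the residual sum of squares, which is why adjusted R-squared (or the F test) is a fairer basis for the comparison than multiple R-squared alone.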
Build regression model
##
## Call:
## lm(formula = weight ~ price, data = cars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1328.29 -228.09 10.92 258.19 924.27
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2171.113 118.956 18.251 < 2e-16 ***
## price 43.331 5.169 8.383 3.17e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 433 on 52 degrees of freedom
## Multiple R-squared: 0.5747, Adjusted R-squared: 0.5666
## F-statistic: 70.28 on 1 and 52 DF, p-value: 3.173e-11
##
## Call:
## lm(formula = weight ~ price + passengers, data = cars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -976.81 -201.56 6.13 151.33 799.88
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 294.25 356.98 0.824 0.414
## price 35.99 4.36 8.256 5.80e-11 ***
## passengers 395.91 72.56 5.456 1.44e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 347.4 on 51 degrees of freedom
## Multiple R-squared: 0.7315, Adjusted R-squared: 0.7209
## F-statistic: 69.46 on 2 and 51 DF, p-value: 2.748e-15
Interpretation