Suppose that you want to build a regression model that predicts the price of cars using a data set named cars.
price and weight. Make sure to interpret the direction and the magnitude of the relationship. In addition, keep in mind that correlation (or regression) coefficients do not show causation but only association.
As the weight of cars increases, the price also tends to increase. The correlation coefficient of 0.758 indicates a positive and moderately strong linear association, although the points do not follow an exact line. The clearest pattern is that the cheapest cars in this data set are also the lightest.
Create scatterplots
## 'data.frame': 54 obs. of 6 variables:
## $ type : Factor w/ 3 levels "large","midsize",..: 3 2 2 2 2 1 1 2 1 2 ...
## $ price : num 15.9 33.9 37.7 30 15.7 20.8 23.7 26.3 34.7 40.1 ...
## $ mpgCity : int 25 18 19 22 22 19 16 19 16 16 ...
## $ driveTrain: Factor w/ 3 levels "4WD","front",..: 2 2 2 3 2 2 3 2 2 2 ...
## $ passengers: int 5 5 6 4 6 6 6 5 6 5 ...
## $ weight : int 2705 3560 3405 3640 2880 3470 4105 3495 3620 3935 ...
## [1] 0.758112
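The output above comes from inspecting the data and computing a correlation coefficient. A minimal sketch of those steps follows, assuming only the first ten rows of `price` and `weight` that the `str()` output displays. The name `cars_subset` is hypothetical, the axis units (thousands of US dollars, pounds) are an assumption, and a ten-row subset will not reproduce the full-data value of 0.758112.

```r
# First ten rows of price and weight, copied from the str() output above.
# cars_subset is a hypothetical name; the full data set has 54 rows.
cars_subset <- data.frame(
  price  = c(15.9, 33.9, 37.7, 30, 15.7, 20.8, 23.7, 26.3, 34.7, 40.1),
  weight = c(2705, 3560, 3405, 3640, 2880, 3470, 4105, 3495, 3620, 3935)
)

# Inspect the structure of the data frame.
str(cars_subset)

# Scatterplot with the response (price) on the vertical axis.
# Assumption: price is in thousands of US dollars and weight is in pounds.
plot(price ~ weight, data = cars_subset,
     xlab = "Weight (lb)", ylab = "Price (thousands of USD)")

# Pearson correlation coefficient between price and weight.
r <- cor(cars_subset$price, cars_subset$weight)
print(r)
```

Running the same calls on the full `cars` data frame should give the structure listing and the correlation shown above.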
Interpretation
Run a regression model for price with one explanatory variable, weight, and answer Q2 through Q5.
Yes, the coefficient of weight is statistically significant at the 5% level (the slope's p-value is 3.17e-11). For each additional pound of weight, the predicted price increases by approximately $13.26.
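The printed summary in this report was fitted as `weight ~ price`, so the $13.26 figure does not appear in it directly. One way to reconcile the two, sketched here, uses the fact that in simple linear regression the slope of y on x times the slope of x on y equals R-squared. The variable names are hypothetical, and the conversion assumes price is recorded in thousands of US dollars, as the hint about units suggests.

```r
# Values copied from the printed summary of lm(weight ~ price).
slope_weight_on_price <- 43.331   # pounds per unit of price
r_squared <- 0.5747               # Multiple R-squared

# In simple linear regression the two slopes multiply to R-squared,
# so the slope of price on weight is R-squared divided by the other slope.
slope_price_on_weight <- r_squared / slope_weight_on_price

# Assumption: price is recorded in thousands of US dollars.
usd_per_pound <- slope_price_on_weight * 1000
print(round(usd_per_pound, 2))
```

The result rounds to 13.26, which agrees with the dollar amount per pound stated in the answer.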
Hint: Check the units of the variables in the openintro manual.
$37,000 USD
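433 on 52 degrees of freedom. The residual standard error is the typical size of a residual, so the observed values fall roughly 433 units of the response variable away from the fitted line on average. Because the printed model was fitted as weight ~ price, those units are pounds of weight rather than price. The line of best fit is the line that minimizes the sum of squared vertical distances between the data points and the line.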
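To make the definition concrete, the residual standard error can be computed by hand as the square root of the residual sum of squares divided by the residual degrees of freedom. The sketch below assumes only the first ten rows shown in the `str()` output (`cars_subset` is a hypothetical name), so its value will differ from the 433 reported for the full data.

```r
# First ten rows copied from the str() output; the full data has 54 rows.
cars_subset <- data.frame(
  price  = c(15.9, 33.9, 37.7, 30, 15.7, 20.8, 23.7, 26.3, 34.7, 40.1),
  weight = c(2705, 3560, 3405, 3640, 2880, 3470, 4105, 3495, 3620, 3935)
)

# Simple regression with price as the response.
fit <- lm(price ~ weight, data = cars_subset)

# Residual sum of squares and residual degrees of freedom (n - 2).
rss <- sum(residuals(fit)^2)
resid_df <- df.residual(fit)

# Residual standard error by hand, next to the value summary() reports.
rse_manual <- sqrt(rss / resid_df)
rse_reported <- summary(fit)$sigma
print(c(manual = rse_manual, reported = rse_reported))
```

The two numbers agree, which shows that the reported figure is a measure of residual spread in the units of the response.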
0.5666. The adjusted R-squared is a modified version of R-squared that has been adjusted for the number of predictors in the model, which makes it suitable for comparing the explanatory power of regression models that contain different numbers of predictors. It decreases when a predictor improves the model by less than expected by chance. The adjusted R-squared can be negative, but it usually is not. It is never higher than the R-squared.
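The adjustment can be checked with the standard formula, adjusted R-squared = 1 - (1 - R-squared)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. The helper name below is hypothetical; the inputs are the rounded values printed in the two model summaries, so the results match the reported figures only up to rounding.

```r
# Adjusted R-squared from R-squared, sample size n, and predictor count p.
# adjusted_r_squared is a hypothetical helper, not part of base R.
adjusted_r_squared <- function(r_squared, n, p) {
  1 - (1 - r_squared) * (n - 1) / (n - p - 1)
}

# One-predictor model: R-squared 0.5747, 54 observations.
adj_one <- adjusted_r_squared(0.5747, n = 54, p = 1)

# Two-predictor model: R-squared 0.7315, 54 observations.
adj_two <- adjusted_r_squared(0.7315, n = 54, p = 2)

print(round(c(one_predictor = adj_one, two_predictors = adj_two), 3))
```

Both values land within rounding error of the 0.5666 and 0.7209 shown in the summaries, and each is below its unadjusted R-squared.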
Run a second regression model for price with two explanatory variables: weight and passengers, and answer Q6.
Build regression model
Model 1: residual standard error is 433 on 52 degrees of freedom, adjusted R-squared 0.5666. Model 2: residual standard error is 347.4 on 51 degrees of freedom, adjusted R-squared 0.7209. Model 2 has the higher adjusted R-squared and the lower residual standard error, so it fits the data better.
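The comparison can be sketched in code as follows, with price as the response as the questions ask (the printed summaries in this report use weight as the response instead). This is only an illustration on the first ten rows shown in the `str()` output, under the hypothetical name `cars_subset`, so its numbers will not match the full-data summaries.

```r
# First ten rows copied from the str() output; the full data has 54 rows.
cars_subset <- data.frame(
  price      = c(15.9, 33.9, 37.7, 30, 15.7, 20.8, 23.7, 26.3, 34.7, 40.1),
  weight     = c(2705, 3560, 3405, 3640, 2880, 3470, 4105, 3495, 3620, 3935),
  passengers = c(5, 5, 6, 4, 6, 6, 6, 5, 6, 5)
)

# Model 1: one explanatory variable. Model 2: two explanatory variables.
model_one <- lm(price ~ weight, data = cars_subset)
model_two <- lm(price ~ weight + passengers, data = cars_subset)

# Collect the two fit statistics used in the comparison.
comparison <- data.frame(
  model         = c("price ~ weight", "price ~ weight + passengers"),
  residual_se   = c(summary(model_one)$sigma, summary(model_two)$sigma),
  adj_r_squared = c(summary(model_one)$adj.r.squared,
                    summary(model_two)$adj.r.squared)
)
print(comparison)
```

On the full data, the model with the higher `adj_r_squared` and lower `residual_se` would be the preferred one.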
##
## Call:
## lm(formula = weight ~ price, data = cars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1328.29 -228.09 10.92 258.19 924.27
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2171.113 118.956 18.251 < 2e-16 ***
## price 43.331 5.169 8.383 3.17e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 433 on 52 degrees of freedom
## Multiple R-squared: 0.5747, Adjusted R-squared: 0.5666
## F-statistic: 70.28 on 1 and 52 DF, p-value: 3.173e-11
##
## Call:
## lm(formula = weight ~ price + passengers, data = cars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -976.81 -201.56 6.13 151.33 799.88
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 294.25 356.98 0.824 0.414
## price 35.99 4.36 8.256 5.80e-11 ***
## passengers 395.91 72.56 5.456 1.44e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 347.4 on 51 degrees of freedom
## Multiple R-squared: 0.7315, Adjusted R-squared: 0.7209
## F-statistic: 69.46 on 2 and 51 DF, p-value: 2.748e-15
Interpretation