Suppose that you want to build a regression model that predicts the price of cars using a data set named cars.
There is a positive correlation between price and weight. Lighter cars tend to be cheaper than heavier cars, most likely because of the cost of materials.
Create scatterplots
# Load the packages
library(openintro)
library(ggplot2)
str(cars)
## 'data.frame': 54 obs. of 6 variables:
## $ type : Factor w/ 3 levels "large","midsize",..: 3 2 2 2 2 1 1 2 1 2 ...
## $ price : num 15.9 33.9 37.7 30 15.7 20.8 23.7 26.3 34.7 40.1 ...
## $ mpgCity : int 25 18 19 22 22 19 16 19 16 16 ...
## $ driveTrain: Factor w/ 3 levels "4WD","front",..: 2 2 2 3 2 2 3 2 2 2 ...
## $ passengers: int 5 5 6 4 6 6 6 5 6 5 ...
## $ weight : int 2705 3560 3405 3640 2880 3470 4105 3495 3620 3935 ...
# Relationship between weight and price
ggplot(data = cars, aes(x = weight, y = price)) + # cars data set is from the openintro R package
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
# Compute correlation coefficient
cor(cars$price, cars$weight, use = "pairwise.complete.obs")
## [1] 0.758112
Interpretation
Run a regression model for price with one explanatory variable, weight, and answer Q2 through Q5.
Yes, weight is significant at the 5% level. The coefficient is positive, so the predicted price rises with each additional pound of weight.
## Q3. What price does the model predict for a car that weighs 4000 pounds?
The price would be about $20,000.
Hint: Check the units of the variables in the openintro manual.
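The prediction step in Q3 can be sketched with `predict()`. This is a minimal sketch, assuming only the first ten price and weight values printed by `str(cars)`; the names `cars_head`, `fit_head`, and `pred_4000` are hypothetical. The full data set has 54 rows, so the number it prints will not match the full model.

```r
# Hypothetical subset: the first ten price/weight values printed by str(cars).
# The full data set has 54 rows, so these estimates differ from the full model.
cars_head <- data.frame(
  price  = c(15.9, 33.9, 37.7, 30, 15.7, 20.8, 23.7, 26.3, 34.7, 40.1),
  weight = c(2705, 3560, 3405, 3640, 2880, 3470, 4105, 3495, 3620, 3935)
)

# Simple linear regression of price on weight
fit_head <- lm(price ~ weight, data = cars_head)

# Predicted price for a 4000-pound car, in the units of the price column
pred_4000 <- predict(fit_head, newdata = data.frame(weight = 4000))
pred_4000
```

The result is in the same units as the price column, which is why the hint about checking units matters when converting the prediction to dollars.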
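The RSE is 7.575 on 52 degrees of freedom. The best-fit line is the one that follows the trend in the data points with the smallest squared residuals overall.
## Q5. What is the reported adjusted R squared? What does it mean?
0.566. About 56.6% of the variability in price is explained by weight, after adjusting for the number of predictors.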
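The reported adjusted R squared of 0.566 can be cross-checked against the correlation computed earlier. This is a sketch under two assumptions: the model is the simple regression of price on weight, so R squared equals the squared correlation, and all n = 54 observations are used, which matches the 52 degrees of freedom.

```latex
\begin{align*}
R^2 &= r^2 = 0.758112^2 \approx 0.5747 \\
R^2_{\text{adj}} &= 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
  = 1 - (1 - 0.5747)\cdot\frac{53}{52} \approx 0.5666
\end{align*}
```

Here p = 1 is the number of explanatory variables. The small gap between 0.5747 and 0.5666 is the penalty for using one predictor.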
mod_1 <- lm(passengers ~ weight, data = cars)
summary(mod_1)
##
## Call:
## lm(formula = passengers ~ weight, data = cars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.4978 -0.4208 0.1407 0.3773 0.9899
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.1619504 0.3587028 8.815 6.72e-12 ***
## weight 0.0006417 0.0001155 5.558 9.53e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5528 on 52 degrees of freedom
## Multiple R-squared: 0.3726, Adjusted R-squared: 0.3606
## F-statistic: 30.89 on 1 and 52 DF, p-value: 9.531e-07
Model 1 has the smaller RSE, so model 1 fits the data better.
Build regression model
# Create a linear model 1
mod_1 <- lm(price ~ weight + passengers, data = cars)
# View summary of model 1
summary(mod_1)
##
## Call:
## lm(formula = price ~ weight + passengers, data = cars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.647 -3.688 -1.134 2.677 33.704
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7.348709 7.480301 -0.982 0.3305
## weight 0.015891 0.001925 8.256 5.8e-11 ***
## passengers -4.094465 1.831085 -2.236 0.0297 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.3 on 51 degrees of freedom
## Multiple R-squared: 0.6127, Adjusted R-squared: 0.5975
## F-statistic: 40.34 on 2 and 51 DF, p-value: 3.127e-11
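To show how the coefficient table turns into a prediction, here is a minimal sketch that plugs values into the fitted equation by hand. The coefficients are copied from the summary output above. The example car (4000 pounds, 5 passengers) and the variable names are assumptions chosen for illustration.

```r
# Coefficients copied from the summary output for price ~ weight + passengers
b0           <- -7.348709
b_weight     <- 0.015891
b_passengers <- -4.094465

# Hypothetical car chosen for illustration: 4000 pounds, 5 passengers
weight     <- 4000
passengers <- 5

# Fitted equation: price = b0 + b_weight * weight + b_passengers * passengers
predicted_price <- b0 + b_weight * weight + b_passengers * passengers
predicted_price
```

This gives about 35.74 in the units of the price column. Holding weight fixed, each additional passenger lowers the predicted price by about 4.09 units.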
# Create a linear model 2
mod_2 <- lm(price ~ weight + passengers, data = cars)
# View summary of model 2
summary(mod_2)
##
## Call:
## lm(formula = price ~ weight + passengers, data = cars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.647 -3.688 -1.134 2.677 33.704
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7.348709 7.480301 -0.982 0.3305
## weight 0.015891 0.001925 8.256 5.8e-11 ***
## passengers -4.094465 1.831085 -2.236 0.0297 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.3 on 51 degrees of freedom
## Multiple R-squared: 0.6127, Adjusted R-squared: 0.5975
## F-statistic: 40.34 on 2 and 51 DF, p-value: 3.127e-11
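One way to compare a one-predictor model with a two-predictor model is to put their RSE and adjusted R squared side by side. This is a minimal sketch, assuming only the first ten rows printed by `str(cars)`; the names `cars_head`, `fit_small`, `fit_large`, and `comparison` are hypothetical, and the values will differ from those of the full 54-row data set.

```r
# Hypothetical subset: the first ten rows printed by str(cars)
cars_head <- data.frame(
  price      = c(15.9, 33.9, 37.7, 30, 15.7, 20.8, 23.7, 26.3, 34.7, 40.1),
  weight     = c(2705, 3560, 3405, 3640, 2880, 3470, 4105, 3495, 3620, 3935),
  passengers = c(5, 5, 6, 4, 6, 6, 6, 5, 6, 5)
)

# Nested models: the second adds passengers to the first
fit_small <- lm(price ~ weight, data = cars_head)
fit_large <- lm(price ~ weight + passengers, data = cars_head)

# Residual standard error and adjusted R squared for each model
comparison <- data.frame(
  model         = c("price ~ weight", "price ~ weight + passengers"),
  rse           = c(sigma(fit_small), sigma(fit_large)),
  adj_r_squared = c(summary(fit_small)$adj.r.squared,
                    summary(fit_large)$adj.r.squared)
)
comparison
```

A lower RSE and a higher adjusted R squared both point to the better-fitting model. Unlike the residual sum of squares, which never increases when a predictor is added, these two measures account for the extra degree of freedom.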
Interpretation