Suppose that you want to build a regression model that predicts the price of cars using a data set named cars.
-There is a positive association between price and weight. The cars are also distinguished by type, and larger cars are usually more expensive than smaller ones.
Create scatterplots
# Load the packages
library(openintro)
library(ggplot2)
str(cars)
## 'data.frame': 54 obs. of 6 variables:
## $ type : Factor w/ 3 levels "large","midsize",..: 3 2 2 2 2 1 1 2 1 2 ...
## $ price : num 15.9 33.9 37.7 30 15.7 20.8 23.7 26.3 34.7 40.1 ...
## $ mpgCity : int 25 18 19 22 22 19 16 19 16 16 ...
## $ driveTrain: Factor w/ 3 levels "4WD","front",..: 2 2 2 3 2 2 3 2 2 2 ...
## $ passengers: int 5 5 6 4 6 6 6 5 6 5 ...
## $ weight : int 2705 3560 3405 3640 2880 3470 4105 3495 3620 3935 ...
# Relationship between weight and price
ggplot(data = cars, aes(x = weight, y = price)) + # cars data set is from the openintro R package
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
# Compute correlation coefficient
cor(cars$price, cars$weight, use = "pairwise.complete.obs")
## [1] 0.758112
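In a simple regression with one explanatory variable, R squared equals the squared correlation, so the value just computed already determines how much of the variability in price a weight-only model can explain. The sketch below works through that arithmetic. It assumes all 54 cars listed by str(cars) enter the fit; the variable names are illustrative only.

```r
# Values copied from the output above
r <- 0.758112   # correlation between price and weight
n <- 54         # number of cars reported by str(cars); assumed to have no missing values

# With a single explanatory variable, R-squared is the squared correlation
r_squared <- r^2

# Adjusted R-squared applies a penalty for the number of predictors (p = 1 here)
p <- 1
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

round(c(r_squared = r_squared, adj_r_squared = adj_r_squared), 3)
```

With only one predictor and 54 observations, the adjustment is small, so the two values sit close together.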
Interpretation
Run a regression model for price with one explanatory variable, weight, and answer Q2 through Q5.
-Yes. Price increases with weight: each additional pound is associated with a higher predicted price.
## Q3. What price does the model predict for a car that weighs 4000 pounds?
-The model predicts a price of approximately $32,171 for a car that weighs 4000 pounds.
-The residual standard error is 7.575 on 52 degrees of freedom. This means the prices predicted by the fitted line typically miss the observed prices by about 7.575, in the units of price. The fitted line is the best fit in the sense that it minimizes the squared distances between the line and the data points.
## Q5. What is the reported adjusted R squared? What does it mean?
-The reported adjusted R squared is 0.566, meaning about 56.6% of the variability in price is explained by weight.
# Simple linear model with weight as the only explanatory variable (response here: passengers)
mod <- lm(passengers ~ weight, data = cars)
summary(mod)
##
## Call:
## lm(formula = passengers ~ weight, data = cars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.4978 -0.4208 0.1407 0.3773 0.9899
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.1619504 0.3587028 8.815 6.72e-12 ***
## weight 0.0006417 0.0001155 5.558 9.53e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5528 on 52 degrees of freedom
## Multiple R-squared: 0.3726, Adjusted R-squared: 0.3606
## F-statistic: 30.89 on 1 and 52 DF, p-value: 9.531e-07
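Q3 asks for a predicted price at 4000 pounds, which needs a model with price as the response. A minimal sketch of that workflow follows. To stay self-contained it uses only the ten price and weight values visible in the str(cars) output, so its numbers will differ from a fit on all 54 cars. The names cars_sample, mod_price, pred_4000 and fit_summary are hypothetical.

```r
# First ten rows of price and weight, copied from the str(cars) output
cars_sample <- data.frame(
  price  = c(15.9, 33.9, 37.7, 30, 15.7, 20.8, 23.7, 26.3, 34.7, 40.1),
  weight = c(2705, 3560, 3405, 3640, 2880, 3470, 4105, 3495, 3620, 3935)
)

# Simple linear regression with price as the response
mod_price <- lm(price ~ weight, data = cars_sample)

# Predicted price for a car that weighs 4000 pounds
pred_4000 <- predict(mod_price, newdata = data.frame(weight = 4000))
pred_4000

# Residual standard error and adjusted R-squared, as asked in Q4 and Q5
fit_summary <- summary(mod_price)
fit_summary$sigma
fit_summary$adj.r.squared
```

Replacing cars_sample with the full cars data frame should give the figures to interpret in Q3 through Q5.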
# Create a linear model 1
mod_1 <- lm(price ~ weight + passengers, data = cars)
# View summary of model 1
summary(mod_1)
##
## Call:
## lm(formula = price ~ weight + passengers, data = cars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.647 -3.688 -1.134 2.677 33.704
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7.348709 7.480301 -0.982 0.3305
## weight 0.015891 0.001925 8.256 5.8e-11 ***
## passengers -4.094465 1.831085 -2.236 0.0297 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.3 on 51 degrees of freedom
## Multiple R-squared: 0.6127, Adjusted R-squared: 0.5975
## F-statistic: 40.34 on 2 and 51 DF, p-value: 3.127e-11
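To see how the two coefficients combine, the sketch below plugs a hypothetical car (4000 pounds, 5 passengers, chosen only for illustration) into the fitted equation, using the estimates printed above. It also rechecks the adjusted R-squared from the multiple R-squared, assuming n = 54 as reported by str(cars).

```r
# Coefficient estimates copied from the summary(mod_1) output
b_intercept  <- -7.348709
b_weight     <-  0.015891
b_passengers <- -4.094465

# Hypothetical car, chosen only for illustration
weight     <- 4000
passengers <- 5

# Fitted equation: price = intercept + b_weight * weight + b_passengers * passengers
pred_price <- b_intercept + b_weight * weight + b_passengers * passengers
pred_price  # about 35.74, in the same units as price

# Adjusted R-squared from the multiple R-squared, with n = 54 cars and p = 2 predictors
n <- 54
p <- 2
adj_r_squared <- 1 - (1 - 0.6127) * (n - 1) / (n - p - 1)
round(adj_r_squared, 4)  # 0.5975, matching the summary output
```

The negative passengers coefficient means that, for cars of the same weight, each additional seat lowers the predicted price by about 4.09 price units.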
# Create a linear model 2
mod_2 <- lm(price ~ weight + passengers, data = cars)
# View summary of model 2
summary(mod_2)
##
## Call:
## lm(formula = price ~ weight + passengers, data = cars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.647 -3.688 -1.134 2.677 33.704
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7.348709 7.480301 -0.982 0.3305
## weight 0.015891 0.001925 8.256 5.8e-11 ***
## passengers -4.094465 1.831085 -2.236 0.0297 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.3 on 51 degrees of freedom
## Multiple R-squared: 0.6127, Adjusted R-squared: 0.5975
## F-statistic: 40.34 on 2 and 51 DF, p-value: 3.127e-11
Interpretation