Using R, build a multiple regression model for data that interests you. Include in this model at least one quadratic term, one dichotomous term, and one dichotomous vs. quantitative interaction term. Interpret all coefficients. Conduct residual analysis. Was the linear model appropriate? Why or why not?
For this exercise I picked the ThreeCars dataset.
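library(Stat2Data) # assumed source of the ThreeCars data
library(dplyr)     # %>% and mutate()
library(ggplot2)   # qplot() and geom_abline()
data(ThreeCars)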
ThreeCars <- ThreeCars %>%
#select(CarType, Price, Mileage) %>%
mutate(CarType = as.character(CarType))
options(digits = 5, scipen = 20, width = 90)
#Displaying the ThreeCars dataset contents:
str(ThreeCars)
## 'data.frame': 90 obs. of 8 variables:
## $ CarType: chr "Porsche" "Porsche" "Porsche" "Porsche" ...
## $ Price : num 69.4 56.9 49.9 47.4 42.9 36.9 83 72.9 69.9 67.9 ...
## $ Age : int 3 3 2 4 4 6 0 0 2 0 ...
## $ Mileage: num 21.5 43 19.9 36 44 49.8 1.3 0.67 13.4 9.7 ...
## $ Car : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Porsche: int 1 1 1 1 1 1 1 1 1 1 ...
## $ Jaguar : int 0 0 0 0 0 0 0 0 0 0 ...
## $ BMW : int 0 0 0 0 0 0 0 0 0 0 ...
#Showing stats with Summary function:
summary(ThreeCars)
## CarType Price Age Mileage Car
## Length:90 Min. :12.0 Min. : 0.00 Min. : 0.67 Min. :0
## Class :character 1st Qu.:23.9 1st Qu.: 3.25 1st Qu.: 20.75 1st Qu.:0
## Mode :character Median :33.7 Median : 5.00 Median : 42.85 Median :1
## Mean :37.6 Mean : 5.66 Mean : 41.32 Mean :1
## 3rd Qu.:50.0 3rd Qu.: 7.00 3rd Qu.: 59.83 3rd Qu.:2
## Max. :83.0 Max. :22.00 Max. :100.70 Max. :2
## Porsche Jaguar BMW
## Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000
## Median :0.000 Median :0.000 Median :0.000
## Mean :0.333 Mean :0.333 Mean :0.333
## 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000
## Max. :1.000 Max. :1.000 Max. :1.000
The scatterplot below shows the relationship between mileage, price, and car type.
qplot(x = Mileage, y = Price, color = CarType, data = ThreeCars, geom = "point")
The scatterplot below shows the relationship between age, price, and car type.
qplot(x = Age, y = Price, color = CarType, data = ThreeCars, geom = "point")
Fitting two models with CarType interactions, one using Mileage and one using Age as the quantitative predictor:
car_mult_lm <- lm(Price ~ Mileage + CarType + Mileage:CarType, data = ThreeCars)
car_aget_lm <- lm(Price ~ Age + CarType + Age:CarType, data = ThreeCars)
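# Standardized residuals vs. fitted values for each model
qplot(x = .fitted, y = .stdresid, data = car_mult_lm)
qplot(x = .fitted, y = .stdresid, data = car_aget_lm)
# Normal Q-Q plots of the standardized residuals; the blue line is y = x
qplot(sample = .stdresid, data = car_mult_lm) +
  geom_abline(color = "blue")
qplot(sample = .stdresid, data = car_aget_lm) +
  geom_abline(color = "blue")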
summary(car_mult_lm)
##
## Call:
## lm(formula = Price ~ Mileage + CarType + Mileage:CarType, data = ThreeCars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -24.327 -4.832 -0.285 4.423 18.812
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 56.2901 4.1551 13.55 < 0.0000000000000002 ***
## Mileage -0.4899 0.0723 -6.78 0.0000000016 ***
## CarTypeJaguar -2.0626 5.2358 -0.39 0.6946
## CarTypePorsche 14.8004 5.0415 2.94 0.0043 **
## Mileage:CarTypeJaguar -0.1304 0.1057 -1.23 0.2206
## Mileage:CarTypePorsche -0.0995 0.0994 -1.00 0.3196
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.64 on 84 degrees of freedom
## Multiple R-squared: 0.774, Adjusted R-squared: 0.76
## F-statistic: 57.4 on 5 and 84 DF, p-value: <0.0000000000000002
summary(car_aget_lm)
##
## Call:
## lm(formula = Price ~ Age + CarType + Age:CarType, data = ThreeCars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.889 -6.804 -0.925 6.090 23.091
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 58.227 4.706 12.37 < 0.0000000000000002 ***
## Age -4.826 0.752 -6.42 0.0000000079 ***
## CarTypeJaguar -1.238 6.274 -0.20 0.8440
## CarTypePorsche 5.148 5.370 0.96 0.3405
## Age:CarTypeJaguar -0.213 1.067 -0.20 0.8419
## Age:CarTypePorsche 2.756 0.812 3.39 0.0011 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.66 on 84 degrees of freedom
## Multiple R-squared: 0.717, Adjusted R-squared: 0.7
## F-statistic: 42.6 on 5 and 84 DF, p-value: <0.0000000000000002
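To see why the summaries list CarTypeJaguar and CarTypePorsche but no BMW term, it can help to look at the design matrix R builds from the formula. The sketch below uses a small hypothetical data frame (`toy`, with made-up mileages, not rows from ThreeCars) and base R's `model.matrix()`; it only illustrates the coding and is not part of the original analysis.

```r
# Hypothetical three-row data frame, one car of each make (values are made up)
toy <- data.frame(
  Mileage = c(10, 20, 30),
  CarType = c("BMW", "Jaguar", "Porsche")
)

# Same right-hand side as car_mult_lm; character columns are treated as
# factors with alphabetically ordered levels, so BMW is the reference level
X <- model.matrix(~ Mileage + CarType + Mileage:CarType, data = toy)

print(colnames(X))
print(X)
```

The BMW row has zeros in every CarType column, so the intercept and the Mileage slope describe BMWs, and the remaining coefficients are differences from BMW.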
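Interpretation of the Mileage model (Price is in thousands of dollars, Mileage in thousands of miles; BMW is the reference level):
For a BMW, each additional 1,000 miles is associated with an expected decrease in Price of 0.48988 thousand dollars ($489.88).
At zero mileage, we predict Jaguars to cost $2,062.61 less than BMWs (not statistically significant, p = 0.69) and Porsches to cost $14,800.37 more than BMWs (p = 0.0043). Because the model includes Mileage:CarType interactions, these gaps apply only at zero mileage and change as mileage increases.
The interaction coefficients adjust the BMW slope for the other makes: per 1,000 miles, Jaguars lose an additional $130.40 (total slope -0.6203) and Porsches an additional $99.50 (total slope -0.5894). Neither adjustment is statistically significant (p = 0.22 and p = 0.32), so the data do not show that depreciation per mile differs by make.
We expect a BMW with zero miles to have a price of $56,290.07.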
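The intercept and slope implied for each make by the Mileage model can be worked out directly from the printed coefficients. The sketch below hard-codes the rounded estimates from `summary(car_mult_lm)`; the names `b` and `lines_by_make` and the 50-thousand-mile example are illustrative assumptions, not part of the original analysis.

```r
# Estimates copied from summary(car_mult_lm), rounded as printed
b <- c(intercept = 56.2901, mileage = -0.4899,
       jaguar = -2.0626, porsche = 14.8004,
       mileage_jaguar = -0.1304, mileage_porsche = -0.0995)

# BMW is the reference level; the other makes add their offsets
lines_by_make <- data.frame(
  CarType   = c("BMW", "Jaguar", "Porsche"),
  intercept = c(b[["intercept"]],
                b[["intercept"]] + b[["jaguar"]],
                b[["intercept"]] + b[["porsche"]]),
  slope     = c(b[["mileage"]],
                b[["mileage"]] + b[["mileage_jaguar"]],
                b[["mileage"]] + b[["mileage_porsche"]])
)

# Hypothetical example: predicted price (in $1,000s) at 50 thousand miles
lines_by_make$price_at_50k <- lines_by_make$intercept + 50 * lines_by_make$slope

print(lines_by_make)
```

With the fitted model object available, `predict(car_mult_lm, newdata = ...)` would give the same kind of prediction without rounding.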
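Interpretation of the Age model (Price is in thousands of dollars, Age in years; BMW is the reference level):
For a BMW, each additional year of age is associated with an expected decrease in Price of 4.826 thousand dollars ($4,826.00).
At age zero, we predict Jaguars to cost $1,238.00 less than BMWs and Porsches to cost $5,148.00 more than BMWs; neither difference is statistically significant (p = 0.84 and p = 0.34). Because the model includes Age:CarType interactions, these gaps apply only at age zero.
The interaction coefficients adjust the BMW slope for the other makes: Jaguars lose an additional $213.00 per year (total slope -5.039, p = 0.84), while Porsches lose $2,756.00 less per year than BMWs (total slope -2.070, p = 0.0011), so Porsches hold their value with age noticeably better.
We expect a BMW of age zero to have a price of $58,227.00.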
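Neither model above contains the quadratic term the assignment asks for. One possible way to add it, offered only as a sketch, is to use the Porsche indicator already in the data as the dichotomous term together with `I(Mileage^2)`. The names `quad_formula` and `car_quad_lm` are hypothetical, the choice of Porsche and Mileage is an assumption, and no results are shown because this model was not fitted in the original analysis.

```r
# Quadratic term, dichotomous term, and dichotomous x quantitative interaction
quad_formula <- Price ~ Mileage + I(Mileage^2) + Porsche + Mileage:Porsche

# Terms the model would estimate (besides the intercept)
print(attr(terms(quad_formula), "term.labels"))

# Fit only if the Stat2Data package (assumed source of ThreeCars) is installed
if (requireNamespace("Stat2Data", quietly = TRUE)) {
  data("ThreeCars", package = "Stat2Data")
  car_quad_lm <- lm(quad_formula, data = ThreeCars)
  print(summary(car_quad_lm))

  # Residual checks: standardized residuals vs. fitted values, then a Q-Q plot
  plot(fitted(car_quad_lm), rstandard(car_quad_lm),
       xlab = "Fitted price ($1,000s)", ylab = "Standardized residual")
  abline(h = 0, lty = 2)
  qqnorm(rstandard(car_quad_lm))
  qqline(rstandard(car_quad_lm))
}
```

A curved pattern or a funnel shape in the residual plot would suggest the linear specification is inadequate; a roughly even band around zero and points close to the Q-Q line would support it.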