Discussion 13

Using R, build a multiple regression model for data that interests you. Include in this model at least one quadratic term, one dichotomous term, and one dichotomous vs. quantitative interaction term. Interpret all coefficients. Conduct residual analysis. Was the linear model appropriate? Why or why not?

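For this exercise I have picked the ThreeCars dataset from the Stat2Data package. It lists prices, ages, and mileages of 90 used cars (Porsches, Jaguars, and BMWs).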

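library(Stat2Data)  # provides the ThreeCars dataset
library(dplyr)      # %>% and mutate()
library(ggplot2)    # plotting functions used below

data(ThreeCars)
ThreeCars <- ThreeCars %>%
  mutate(CarType = as.character(CarType))  # lm() re-sorts levels alphabetically, so BMW is the baseline
options(digits = 5, scipen = 20, width = 90)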

# Display the structure of the ThreeCars dataset:
str(ThreeCars)
## 'data.frame':    90 obs. of  8 variables:
##  $ CarType: chr  "Porsche" "Porsche" "Porsche" "Porsche" ...
##  $ Price  : num  69.4 56.9 49.9 47.4 42.9 36.9 83 72.9 69.9 67.9 ...
##  $ Age    : int  3 3 2 4 4 6 0 0 2 0 ...
##  $ Mileage: num  21.5 43 19.9 36 44 49.8 1.3 0.67 13.4 9.7 ...
##  $ Car    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Porsche: int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Jaguar : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ BMW    : int  0 0 0 0 0 0 0 0 0 0 ...
# Summary statistics for each variable:
summary(ThreeCars)
##    CarType              Price           Age           Mileage            Car   
##  Length:90          Min.   :12.0   Min.   : 0.00   Min.   :  0.67   Min.   :0  
##  Class :character   1st Qu.:23.9   1st Qu.: 3.25   1st Qu.: 20.75   1st Qu.:0  
##  Mode  :character   Median :33.7   Median : 5.00   Median : 42.85   Median :1  
##                     Mean   :37.6   Mean   : 5.66   Mean   : 41.32   Mean   :1  
##                     3rd Qu.:50.0   3rd Qu.: 7.00   3rd Qu.: 59.83   3rd Qu.:2  
##                     Max.   :83.0   Max.   :22.00   Max.   :100.70   Max.   :2  
##     Porsche          Jaguar           BMW       
##  Min.   :0.000   Min.   :0.000   Min.   :0.000  
##  1st Qu.:0.000   1st Qu.:0.000   1st Qu.:0.000  
##  Median :0.000   Median :0.000   Median :0.000  
##  Mean   :0.333   Mean   :0.333   Mean   :0.333  
##  3rd Qu.:1.000   3rd Qu.:1.000   3rd Qu.:1.000  
##  Max.   :1.000   Max.   :1.000   Max.   :1.000
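lm() builds dummy variables from CarType by turning it into a factor with alphabetically sorted levels, so BMW (the first level) becomes the reference category. That is why the model output later reports CarTypeJaguar and CarTypePorsche terms but no BMW term. A minimal base-R check, assuming the Stat2Data package is installed (`car_type` is just an illustrative name):

```r
library(Stat2Data)  # assumed installed; provides ThreeCars

data(ThreeCars)
car_type <- factor(as.character(ThreeCars$CarType))

# Alphabetical levels: the first one (BMW) is the reference category in lm()
levels(car_type)

# Number of listings per car type
table(car_type)

# The Porsche/Jaguar/BMW columns are 0/1 indicators carrying the same information
all(ThreeCars$Porsche == (car_type == "Porsche"))
all(ThreeCars$Porsche + ThreeCars$Jaguar + ThreeCars$BMW == 1)
```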

The scatterplot below shows the relationship between mileage, price, and car type.

ggplot(ThreeCars, aes(x = Mileage, y = Price, color = CarType)) + geom_point()

The scatterplot below shows the relationship between age, price, and car type.

ggplot(ThreeCars, aes(x = Age, y = Price, color = CarType)) + geom_point()
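Because the interaction models give each car type its own intercept and slope, their fitted lines are the same as three separate least-squares lines, one per car type. A sketch of that picture for Mileage, assuming ggplot2 and Stat2Data are installed (axis units: prices in $1,000s, mileage in 1,000s of miles):

```r
library(Stat2Data)  # assumed installed; provides ThreeCars
library(ggplot2)

data(ThreeCars)

# One least-squares line per car type: the separate-slopes picture
# that the CarType interaction models fit
p <- ggplot(ThreeCars, aes(x = Mileage, y = Price, color = CarType)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Mileage (1,000s of miles)", y = "Price ($1,000s)")
print(p)
```

Swapping `Mileage` for `Age` in `aes()` gives the corresponding picture for the Age model.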

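Two interaction models, one with Mileage and one with Age as the quantitative predictor, each including CarType and its interaction with that predictor. Because CarType is stored as character, lm() sorts its levels alphabetically, so BMW is the baseline category: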

car_mult_lm <- lm(Price ~ Mileage + CarType + Mileage:CarType, data = ThreeCars)
car_aget_lm <- lm(Price ~ Age + CarType + Age:CarType, data = ThreeCars)
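R's formula syntax has a shorthand for these models: `Mileage * CarType` expands to `Mileage + CarType + Mileage:CarType`. The sketch below, assuming Stat2Data is installed, checks that the two formulas give identical coefficients, and that the baseline (BMW) intercept and slope equal a simple regression fitted to the BMW rows alone (`long_form`, `short_form`, and `bmw_only` are illustrative names):

```r
library(Stat2Data)  # assumed installed; provides ThreeCars

data(ThreeCars)
ThreeCars$CarType <- as.character(ThreeCars$CarType)  # BMW becomes the baseline

# Mileage * CarType expands to Mileage + CarType + Mileage:CarType
long_form  <- lm(Price ~ Mileage + CarType + Mileage:CarType, data = ThreeCars)
short_form <- lm(Price ~ Mileage * CarType, data = ThreeCars)
all.equal(coef(long_form), coef(short_form))

# Each car type has its own intercept and slope, so the BMW part of the
# interaction model matches a simple regression on the BMW rows only
bmw_only <- lm(Price ~ Mileage, data = subset(ThreeCars, CarType == "BMW"))
coef(bmw_only)
coef(long_form)[c("(Intercept)", "Mileage")]
```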

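Residual analysis: for each model, standardized residuals against fitted values (looking for curvature or a funnel shape), then a normal Q-Q plot of the standardized residuals (points close to the blue y = x line suggest roughly normal errors).

# Standardized residuals vs. fitted values: Mileage model
qplot(x = .fitted, y = .stdresid, data = car_mult_lm)

# Standardized residuals vs. fitted values: Age model
qplot(x = .fitted, y = .stdresid, data = car_aget_lm)

# Normal Q-Q plot of standardized residuals: Mileage model
qplot(sample = .stdresid, data = car_mult_lm) + 
  geom_abline(color = "blue")

# Normal Q-Q plot of standardized residuals: Age model
qplot(sample = .stdresid, data = car_aget_lm) + 
  geom_abline(color = "blue")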
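qplot() is deprecated in recent ggplot2 releases, and these plots rely on ggplot2 converting the lm object into a data frame behind the scenes. A base-R sketch of the same residual checks for the Mileage model, computing the standardized residuals directly with rstandard() (the quantity behind `.stdresid`); it assumes Stat2Data is installed, and `std_res`, `fits`, and `large_resid` are illustrative names:

```r
library(Stat2Data)  # assumed installed; provides ThreeCars

data(ThreeCars)
ThreeCars$CarType <- as.character(ThreeCars$CarType)
car_mult_lm <- lm(Price ~ Mileage * CarType, data = ThreeCars)

std_res <- rstandard(car_mult_lm)  # standardized residuals
fits    <- fitted(car_mult_lm)     # fitted values

op <- par(mfrow = c(1, 2))
plot(fits, std_res, xlab = "Fitted price", ylab = "Standardized residual",
     main = "Residuals vs. fitted: Mileage model")
abline(h = 0, lty = 2)
qqnorm(std_res, main = "Normal Q-Q: Mileage model")
abline(0, 1, col = "blue")  # y = x reference line, as in the qplot() version
par(op)

# Listings whose standardized residual exceeds 2 in absolute value
large_resid <- which(abs(std_res) > 2)
large_resid
```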

summary(car_mult_lm)
## 
## Call:
## lm(formula = Price ~ Mileage + CarType + Mileage:CarType, data = ThreeCars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -24.327  -4.832  -0.285   4.423  18.812 
## 
## Coefficients:
##                        Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)             56.2901     4.1551   13.55 < 0.0000000000000002 ***
## Mileage                 -0.4899     0.0723   -6.78         0.0000000016 ***
## CarTypeJaguar           -2.0626     5.2358   -0.39               0.6946    
## CarTypePorsche          14.8004     5.0415    2.94               0.0043 ** 
## Mileage:CarTypeJaguar   -0.1304     0.1057   -1.23               0.2206    
## Mileage:CarTypePorsche  -0.0995     0.0994   -1.00               0.3196    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.64 on 84 degrees of freedom
## Multiple R-squared:  0.774,  Adjusted R-squared:  0.76 
## F-statistic: 57.4 on 5 and 84 DF,  p-value: <0.0000000000000002
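With BMW as the baseline, each car type's fitted line is the BMW intercept and slope plus that type's adjustment terms. A sketch that assembles the three Mileage lines from the coefficients (assumes Stat2Data is installed; `b` and `mileage_lines` are illustrative names):

```r
library(Stat2Data)  # assumed installed; provides ThreeCars

data(ThreeCars)
ThreeCars$CarType <- as.character(ThreeCars$CarType)  # BMW becomes the baseline
car_mult_lm <- lm(Price ~ Mileage * CarType, data = ThreeCars)
b <- coef(car_mult_lm)

# Intercept and slope per car type = BMW value + that type's adjustment
mileage_lines <- data.frame(
  CarType   = c("BMW", "Jaguar", "Porsche"),
  intercept = unname(b["(Intercept)"] +
                       c(0, b["CarTypeJaguar"], b["CarTypePorsche"])),
  slope     = unname(b["Mileage"] +
                       c(0, b["Mileage:CarTypeJaguar"], b["Mileage:CarTypePorsche"]))
)
mileage_lines
```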
summary(car_aget_lm)
## 
## Call:
## lm(formula = Price ~ Age + CarType + Age:CarType, data = ThreeCars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -17.889  -6.804  -0.925   6.090  23.091 
## 
## Coefficients:
##                    Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)          58.227      4.706   12.37 < 0.0000000000000002 ***
## Age                  -4.826      0.752   -6.42         0.0000000079 ***
## CarTypeJaguar        -1.238      6.274   -0.20               0.8440    
## CarTypePorsche        5.148      5.370    0.96               0.3405    
## Age:CarTypeJaguar    -0.213      1.067   -0.20               0.8419    
## Age:CarTypePorsche    2.756      0.812    3.39               0.0011 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.66 on 84 degrees of freedom
## Multiple R-squared:  0.717,  Adjusted R-squared:   0.7 
## F-statistic: 42.6 on 5 and 84 DF,  p-value: <0.0000000000000002
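The group-specific slopes can also be read off predictions instead of coefficients. A sketch using predict() on the Age model at ages 0 and 5 for each car type; the per-year change in predicted price is that type's Age slope (assumes Stat2Data is installed; `new_cars` and `age_slopes` are illustrative names):

```r
library(Stat2Data)  # assumed installed; provides ThreeCars

data(ThreeCars)
ThreeCars$CarType <- as.character(ThreeCars$CarType)  # BMW becomes the baseline
car_aget_lm <- lm(Price ~ Age * CarType, data = ThreeCars)

# Ages 0 and 5 for every car type (Age varies fastest within each type)
new_cars <- expand.grid(Age = c(0, 5), CarType = c("BMW", "Jaguar", "Porsche"),
                        stringsAsFactors = FALSE)
new_cars$pred_price <- predict(car_aget_lm, newdata = new_cars)
new_cars

# Change in predicted price per year of age, by car type
age_slopes <- tapply(new_cars$pred_price, new_cars$CarType,
                     function(p) (p[2] - p[1]) / 5)
age_slopes
```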

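Based on the Mileage model (Price in $1,000s, Mileage in 1,000s of miles, BMW as the baseline):

The intercept (56.29) is the predicted price of a BMW with zero miles, about $56,300. The Mileage coefficient (-0.4899) is the BMW slope: each additional 1,000 miles lowers a BMW's predicted price by about $490. CarTypeJaguar (-2.06) and CarTypePorsche (14.80) shift the intercept: at zero miles a Jaguar is predicted to cost about $2,060 less than a BMW (not significant, p = 0.69), and a Porsche about $14,800 more (p = 0.0043). The interaction terms adjust the slope: the Jaguar slope is -0.4899 - 0.1304 = -0.620 and the Porsche slope is -0.4899 - 0.0995 = -0.589 per 1,000 miles, but neither differs significantly from the BMW slope (p = 0.22 and p = 0.32). The model explains 77.4% of the variation in price (adjusted R-squared 0.76, residual standard error 8.64, i.e. about $8,640).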
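The interaction p-values in the Mileage model (0.22 and 0.32) each test one slope difference on its own. A nested F-test asks whether both Mileage:CarType terms can be dropped together, i.e. whether parallel lines fit as well. A sketch, assuming Stat2Data is installed (`parallel_lm` and `interaction_test` are illustrative names); its result is not shown here:

```r
library(Stat2Data)  # assumed installed; provides ThreeCars

data(ThreeCars)
ThreeCars$CarType <- as.character(ThreeCars$CarType)

# Reduced model: parallel lines (one common Mileage slope)
parallel_lm <- lm(Price ~ Mileage + CarType, data = ThreeCars)
# Full model: a separate Mileage slope for each car type
car_mult_lm <- lm(Price ~ Mileage * CarType, data = ThreeCars)

# Nested F-test of H0: both Mileage:CarType coefficients are zero
interaction_test <- anova(parallel_lm, car_mult_lm)
interaction_test
```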

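Based on the Age model (Age in years, BMW as the baseline):

The intercept (58.23) is the predicted price of a new (age 0) BMW, about $58,200. The Age coefficient (-4.826) is the BMW slope: each additional year lowers a BMW's predicted price by about $4,830. The intercept shifts for Jaguar (-1.24, p = 0.84) and Porsche (5.15, p = 0.34) are not significant. Age:CarTypeJaguar (-0.213, p = 0.84) gives a Jaguar slope of -4.826 - 0.213 = -5.039 per year, not significantly different from the BMW slope. Age:CarTypePorsche (2.756, p = 0.0011) is significant: the Porsche slope is -4.826 + 2.756 = -2.070, so Porsches lose value with age at less than half the BMW rate. This model explains 71.7% of the variation in price (adjusted R-squared 0.70, residual standard error 9.66), less than the Mileage model.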
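The assignment asks for a quadratic term and a dichotomous (two-level) term, while CarType has three levels and neither model above has a squared term. One possible specification that meets all three requirements uses the Porsche 0/1 indicator already in the data, a Mileage^2 term, and a Mileage:Porsche interaction. This is a sketch of that hypothetical model, assuming Stat2Data is installed (`quad_lm` is an illustrative name); its output is not shown here:

```r
library(Stat2Data)  # assumed installed; provides ThreeCars

data(ThreeCars)

# Hypothetical model: quadratic in Mileage, Porsche (0/1) as the dichotomous
# term, and a dichotomous-by-quantitative interaction Mileage:Porsche
quad_lm <- lm(Price ~ Mileage + I(Mileage^2) + Porsche + Mileage:Porsche,
              data = ThreeCars)
summary(quad_lm)

# Same residual checks as for the other models
op <- par(mfrow = c(1, 2))
plot(fitted(quad_lm), rstandard(quad_lm),
     xlab = "Fitted price", ylab = "Standardized residual")
abline(h = 0, lty = 2)
qqnorm(rstandard(quad_lm))
abline(0, 1, col = "blue")  # y = x reference line
par(op)
```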