title: "HW11 605 042421"
author: "Lisa Szydziak"
date: "4/24/2021"
output:
  pdf_document: default
  word_document: default
  html_document: default

DISCUSSION:
Using R, build a multiple regression model for data that interests you. Include in this model at least one quadratic term, one dichotomous term, and one dichotomous vs. quantitative interaction term. Interpret all coefficients. Conduct residual analysis. Was the linear model appropriate? Why or why not?
Motor Trend Car Road Tests

Description
The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).

Usage
mtcars

Format
A data frame with 32 observations on 11 (numeric) variables.

[, 1] mpg   Miles/(US) gallon
[, 2] cyl   Number of cylinders
[, 3] disp  Displacement (cu.in.)
[, 4] hp    Gross horsepower
[, 5] drat  Rear axle ratio
[, 6] wt    Weight (1000 lbs)
[, 7] qsec  1/4 mile time
[, 8] vs    Engine (0 = V-shaped, 1 = straight)
[, 9] am    Transmission (0 = automatic, 1 = manual)
[,10] gear  Number of forward gears
[,11] carb  Number of carburetors
We will build a model to predict mpg with predictor variables: vs (dichotomous), carb2 (quadratic; the square of carb), drat, disp, and vs*disp (interaction).
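Written out, the model described above would take roughly the following form. The symbols β0 through β5 and the error term ε are notation introduced here for illustration; they do not appear in the assignment.

```latex
\[
\mathrm{mpg} = \beta_0 + \beta_1\,\mathrm{vs} + \beta_2\,\mathrm{drat}
  + \beta_3\,\mathrm{carb}^2 + \beta_4\,\mathrm{disp}
  + \beta_5\,(\mathrm{vs}\times\mathrm{disp}) + \varepsilon
\]
```

Under this notation, β4 is the disp slope when vs = 0 and β4 + β5 is the disp slope when vs = 1.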
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3 v purrr 0.3.4
## v tibble 3.0.6 v stringr 1.4.0
## v tidyr 1.1.2 v forcats 0.5.1
## v readr 1.4.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(broom)
library(ggfortify)
## Warning: package 'ggfortify' was built under R version 4.0.5
attach(mtcars)
## The following object is masked from package:ggplot2:
##
## mpg
dim(mtcars)
## [1] 32 11
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
class(mtcars)
## [1] "data.frame"
cars<-mtcars
cars$carb2<- cars$carb^2
#linear regression
model <- lm(mpg ~ vs + drat + carb2+ (vs*disp), data = cars)
model
##
## Call:
## lm(formula = mpg ~ vs + drat + carb2 + (vs * disp), data = cars)
##
## Coefficients:
## (Intercept) vs drat carb2 disp vs:disp
## 21.93796 5.98336 1.30442 -0.09324 -0.02709 -0.03334
summary(model)
##
## Call:
## lm(formula = mpg ~ vs + drat + carb2 + (vs * disp), data = cars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.1966 -1.6758 -0.4554 1.9272 5.1472
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 21.937957 6.714557 3.267 0.00305 **
## vs 5.983358 3.222723 1.857 0.07473 .
## drat 1.304421 1.549859 0.842 0.40767
## carb2 -0.093239 0.047428 -1.966 0.06007 .
## disp -0.027093 0.007657 -3.538 0.00154 **
## vs:disp -0.033344 0.017137 -1.946 0.06256 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.868 on 26 degrees of freedom
## Multiple R-squared: 0.8101, Adjusted R-squared: 0.7736
## F-statistic: 22.18 on 5 and 26 DF, p-value: 1.263e-08
Displacement (cu.in.) is the only predictor significant at the 0.05 level in this model (p = 0.0015).
I am not a car person. What is Disp?
Here is a definition: Engine displacement is the measure of the cylinder volume swept by all of the pistons of a piston engine, excluding the combustion chambers. It is commonly used as an expression of an engine’s size, and by extension as a loose indicator of the power an engine might be capable of producing and the amount of fuel it should be expected to consume. For this reason displacement is one of the measures often used in advertising, as well as regulating, motor vehicles.
So, there you have it. A loose indicator of the amount of fuel it should be expected to consume.
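The coefficient for disp is -0.027. Because the model includes the vs:disp interaction, this is the slope for V-shaped engines (vs = 0): each additional cubic inch of displacement is associated with about 0.027 fewer mpg, holding drat and carb2 fixed. In other words, the larger the displacement, the smaller the mpg.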
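The predictor vs: Engine (0 = V-shaped, 1 = straight), with p = 0.07, is close to being significant. Its coefficient of 5.98 is the estimated mpg advantage of a straight engine over a V-shaped one at a displacement of zero, so it has to be read together with the interaction term rather than as a flat 6 mpg gain.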
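The quadratic term carb2 (the number of carburetors, squared) has a coefficient of -0.093 and p = 0.06, close to significant. Interpret as: more carburetors means less mpg, and because the term is squared, the predicted drop grows faster as carburetors are added.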
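The interaction term vs:disp is close to being significant, with coefficient -0.033 and p = 0.06. It says the disp slope is 0.033 mpg per cubic inch more negative for straight engines (about -0.060 in total), so the straight-engine advantage shrinks as displacement grows and disappears near 179 cu. in.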
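The way the vs, disp, and vs:disp coefficients combine can be checked directly from the fitted model. The following is a minimal sketch that refits the same formula on mtcars; the helper name `vs_gap` and the displacement values 100, 200, and 300 are hypothetical choices made here for illustration.

```r
# Refit the model from this document (mtcars ships with base R).
cars <- mtcars
cars$carb2 <- cars$carb^2
model <- lm(mpg ~ vs + drat + carb2 + vs * disp, data = cars)
b <- coef(model)

# Slope of mpg on disp for each engine type.
slope_v        <- unname(b["disp"])                 # vs = 0 (V-shaped)
slope_straight <- unname(b["disp"] + b["vs:disp"])  # vs = 1 (straight)

# Straight-minus-V difference in predicted mpg at displacement d,
# holding drat and carb2 fixed. (vs_gap is a hypothetical helper.)
vs_gap <- function(d) unname(b["vs"] + b["vs:disp"] * d)

round(c(V_shaped = slope_v, straight = slope_straight), 3)
round(vs_gap(c(100, 200, 300)), 2)
```

The sign of `vs_gap(d)` shows whether the straight or the V-shaped engine is predicted to do better at a given displacement.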
The adjusted R^2 is 0.77, which says the model accounts for about 77% of the variability in mpg after adjusting for the number of predictors (the unadjusted R^2 is 0.81).
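As a quick check on how the adjusted value relates to the multiple R^2 in the summary, the standard adjustment formula can be applied by hand. This is a sketch that refits the same model; the variable names `n`, `p`, and `adj_manual` are introduced here.

```r
cars <- mtcars
cars$carb2 <- cars$carb^2
model <- lm(mpg ~ vs + drat + carb2 + vs * disp, data = cars)
s <- summary(model)

n <- nrow(cars)                 # number of observations
p <- length(coef(model)) - 1    # number of predictors, excluding the intercept

# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
adj_manual <- 1 - (1 - s$r.squared) * (n - 1) / (n - p - 1)

c(r_squared = s$r.squared,
  adj_manual = adj_manual,
  adj_reported = s$adj.r.squared)
```

The hand-computed value should agree with the `Adjusted R-squared` line printed by `summary(model)`.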
plot(fitted(model),resid(model))
#residuals seem uniformly scattered
qqnorm(resid(model))
#QQ plot is ok
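The visual reading of the Q-Q plot could be backed up with a reference line and a formal normality test. The sketch below uses `qqline()` and `shapiro.test()` from base R; choosing the Shapiro-Wilk test is an assumption made here, not something the original analysis ran.

```r
cars <- mtcars
cars$carb2 <- cars$carb^2
model <- lm(mpg ~ vs + drat + carb2 + vs * disp, data = cars)

# Add a reference line to the Q-Q plot so departures are easier to see.
qqnorm(resid(model))
qqline(resid(model))

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal.
sw <- shapiro.test(resid(model))
sw
```

A large p-value would be consistent with the "QQ plot is ok" reading; with only 32 observations the test has limited power, so it complements the plot rather than replacing it.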
Here is some code that gives more diagnostic plots.
model.diag.metrics <- augment(model)
head(model.diag.metrics)
## # A tibble: 6 x 12
## .rownames mpg vs drat carb2 disp .fitted .resid .hat .sigma .cooksd
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Mazda RX4 21 0 3.9 16 160 21.2 -0.198 0.172 2.92 2.01e-4
## 2 Mazda RX~ 21 0 3.9 16 160 21.2 -0.198 0.172 2.92 2.01e-4
## 3 Datsun 7~ 22.8 1 3.85 1 108 26.3 -3.52 0.0955 2.83 2.94e-2
## 4 Hornet 4~ 21.4 1 3.08 1 258 16.3 5.15 0.462 2.57 8.55e-1
## 5 Hornet S~ 18.7 0 3.15 4 360 15.9 2.78 0.0993 2.87 1.92e-2
## 6 Valiant 18.1 1 2.76 1 225 17.8 0.270 0.328 2.92 1.08e-3
## # ... with 1 more variable: .std.resid <dbl>
autoplot(model)
## Warning: `arrange_()` is deprecated as of dplyr 0.7.0.
## Please use `arrange()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
#the residuals vs fitted plot shows a slight pattern, may need another quadratic term
#normal QQ plot is ok
#scale-location plot checks for heteroscedasticity, may need some more analysis
#Leverage - the Maserati Bora is a high-leverage point
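The leverage reading from the plot can also be checked numerically. The sketch below lists cars whose hat values exceed a cutoff of twice the number of coefficients divided by the sample size; that cutoff is a common rule of thumb adopted here as an assumption, not a threshold taken from the original analysis.

```r
cars <- mtcars
cars$carb2 <- cars$carb^2
model <- lm(mpg ~ vs + drat + carb2 + vs * disp, data = cars)

h <- hatvalues(model)           # leverage of each car
k <- length(coef(model))        # number of estimated coefficients
n <- nrow(cars)                 # number of observations
cutoff <- 2 * k / n             # rule-of-thumb threshold (an assumption)

# Cars with leverage above the cutoff, largest first.
high_leverage <- sort(h[h > cutoff], decreasing = TRUE)
high_leverage

# Cook's distance combines leverage and residual size.
head(sort(cooks.distance(model), decreasing = TRUE), 3)
```

High leverage alone does not make a point a problem; the Cook's distance values indicate which of those cars actually pull the fit.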