title: "HW11 605 042421"
author: "Lisa Szydziak"
date: "4/24/2021"
output:
  pdf_document: default
  word_document: default
  html_document: default

DISCUSSION:

Using R, build a multiple regression model for data that interests you. Include in this model at least one quadratic term, one dichotomous term, and one dichotomous vs. quantitative interaction term. Interpret all coefficients. Conduct residual analysis. Was the linear model appropriate? Why or why not?

Motor Trend Car Road Tests

Description

The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).

Usage

mtcars

Format

A data frame with 32 observations on 11 (numeric) variables.

[, 1] mpg   Miles/(US) gallon
[, 2] cyl   Number of cylinders
[, 3] disp  Displacement (cu.in.)
[, 4] hp    Gross horsepower
[, 5] drat  Rear axle ratio
[, 6] wt    Weight (1000 lbs)
[, 7] qsec  1/4 mile time
[, 8] vs    Engine (0 = V-shaped, 1 = straight)
[, 9] am    Transmission (0 = automatic, 1 = manual)
[,10] gear  Number of forward gears
[,11] carb  Number of carburetors

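We will build a model to predict mpg with these predictor variables: vs (dichotomous), carb2 = carb^2 (quadratic), drat, disp, and vs*disp (dichotomous vs. quantitative interaction). Note that carb enters the model only through its square.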
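As a quick check on what the formula asks R to fit, the sketch below expands it with `terms()`. The name `term_labels` is only illustrative. In an R formula `vs * disp` is shorthand for `vs + disp + vs:disp`, so the model gets the `disp` main effect and the interaction even though `disp` is not listed on its own.

```r
# Expand the model formula to list the terms R will fit.
# terms() works on the formula alone, so no data is needed here.
f <- mpg ~ vs + drat + carb2 + (vs * disp)
term_labels <- attr(terms(f), "term.labels")
print(term_labels)
```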

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.0.6     v stringr 1.4.0
## v tidyr   1.1.2     v forcats 0.5.1
## v readr   1.4.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(broom)
library(ggfortify)
## Warning: package 'ggfortify' was built under R version 4.0.5
attach(mtcars)
## The following object is masked from package:ggplot2:
## 
##     mpg
dim(mtcars)
## [1] 32 11
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
class(mtcars)
## [1] "data.frame"
cars <- mtcars
# quadratic term: squared number of carburetors
cars$carb2 <- cars$carb^2
# linear regression
model <- lm(mpg ~ vs + drat + carb2 + (vs * disp), data = cars)
model
## 
## Call:
## lm(formula = mpg ~ vs + drat + carb2 + (vs * disp), data = cars)
## 
## Coefficients:
## (Intercept)           vs         drat        carb2         disp      vs:disp  
##    21.93796      5.98336      1.30442     -0.09324     -0.02709     -0.03334
summary(model)   
## 
## Call:
## lm(formula = mpg ~ vs + drat + carb2 + (vs * disp), data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.1966 -1.6758 -0.4554  1.9272  5.1472 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 21.937957   6.714557   3.267  0.00305 **
## vs           5.983358   3.222723   1.857  0.07473 . 
## drat         1.304421   1.549859   0.842  0.40767   
## carb2       -0.093239   0.047428  -1.966  0.06007 . 
## disp        -0.027093   0.007657  -3.538  0.00154 **
## vs:disp     -0.033344   0.017137  -1.946  0.06256 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.868 on 26 degrees of freedom
## Multiple R-squared:  0.8101, Adjusted R-squared:  0.7736 
## F-statistic: 22.18 on 5 and 26 DF,  p-value: 1.263e-08

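Displacement (disp, cu.in.) is the only predictor of mpg that is significant at the 0.05 level in this model (p = 0.0015). vs, carb2, and the vs:disp interaction have p-values between 0.06 and 0.075, and drat is far from significant (p = 0.41).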

I am not a car person. What is Disp?

Here is a definition: Engine displacement is the measure of the cylinder volume swept by all of the pistons of a piston engine, excluding the combustion chambers. It is commonly used as an expression of an engine’s size, and by extension as a loose indicator of the power an engine might be capable of producing and the amount of fuel it should be expected to consume. For this reason displacement is one of the measures often used in advertising, as well as regulating, motor vehicles.

So, there you have it: displacement is a loose indicator of the amount of fuel an engine should be expected to consume.

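The coefficient for disp is -0.027. Because the model includes the vs:disp interaction, this is the slope for V-shaped engines (vs = 0): holding drat and carb2 fixed, each additional cubic inch of displacement is associated with about 0.027 fewer mpg. In other words, the larger the displacement, the smaller the mpg.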

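The predictor vs: Engine (0 = V-shaped, 1 = straight), with p = 0.07, is close to being significant. Its coefficient of 5.98 is the predicted mpg advantage of the straight engine at disp = 0, which lies outside the data. Because of the vs:disp interaction, that advantage shrinks by about 0.033 mpg per cubic inch and reaches zero near 179 cu.in., so the model predicts better gas mileage for the straight engine only at smaller displacements.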
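A minimal sketch of where the straight engine's predicted advantage runs out, assuming the same model as fitted in this report. The predicted straight-minus-V difference at displacement d is b_vs + b_vs:disp * d, so it crosses zero at d = -b_vs / b_vs:disp. The names `b` and `crossover` are only illustrative.

```r
# Refit the report's model so this block runs on its own.
cars <- mtcars
cars$carb2 <- cars$carb^2
model <- lm(mpg ~ vs + drat + carb2 + vs * disp, data = cars)

b <- coef(model)

# Displacement at which the predicted mpg difference between
# straight (vs = 1) and V-shaped (vs = 0) engines is zero.
crossover <- -b[["vs"]] / b[["vs:disp"]]
print(crossover)

# For context: the displacements actually observed for each engine type.
print(tapply(cars$disp, cars$vs, range))
```

Comparing the crossover with the observed ranges shows how much of each group's data lies on either side of it.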

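The predictor carb2, the squared number of carburetors, has a coefficient of -0.093 and p = 0.06, close to significant. Interpret as more carburetors means less mpg, with the drop growing as carburetors are added: going from 1 to 2 carburetors raises carb2 by 3 (about -0.28 mpg), while going from 3 to 4 raises it by 7 (about -0.65 mpg).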
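To make the effect of the squared term concrete, the sketch below computes the predicted change in mpg for one more carburetor, holding the other predictors fixed. It assumes the same model as in this report. `carb_from` and `change` are illustrative names.

```r
# Refit the report's model so this block runs on its own.
cars <- mtcars
cars$carb2 <- cars$carb^2
model <- lm(mpg ~ vs + drat + carb2 + vs * disp, data = cars)

b_carb2 <- coef(model)[["carb2"]]

# Going from k to k + 1 carburetors changes carb2 by (k + 1)^2 - k^2 = 2k + 1.
carb_from <- 1:3
change <- b_carb2 * ((carb_from + 1)^2 - carb_from^2)
print(data.frame(from = carb_from, to = carb_from + 1, mpg_change = change))
```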

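The interaction term vs:disp is close to being significant, with coefficient -0.033 and p = 0.06. It is the difference in the disp slope between the two engine types: for straight engines (vs = 1) the slope is -0.027 + (-0.033) = -0.060 mpg per cubic inch, so mpg falls off roughly twice as fast with displacement as it does for V-shaped engines.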
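The slope of mpg on disp for each engine type can be read straight off the fitted coefficients. This is a small sketch assuming the same model as in this report. `slope_v` and `slope_straight` are illustrative names.

```r
# Refit the report's model so this block runs on its own.
cars <- mtcars
cars$carb2 <- cars$carb^2
model <- lm(mpg ~ vs + drat + carb2 + vs * disp, data = cars)

b <- coef(model)

# With vs = 0 the interaction term drops out, so the slope is just b_disp.
slope_v <- b[["disp"]]
# With vs = 1 the interaction coefficient is added to the disp slope.
slope_straight <- b[["disp"]] + b[["vs:disp"]]

print(c(V_shaped = slope_v, straight = slope_straight))
```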

The multiple R^2 is 0.81, which says the model accounts for 81% of the variability in mpg in this sample. The adjusted R^2, which penalizes for the five predictors, is 0.77.
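For reference, both R^2 values can be recomputed by hand from the residual and total sums of squares. The sketch assumes the same model as in this report. The variable names are illustrative.

```r
# Refit the report's model so this block runs on its own.
cars <- mtcars
cars$carb2 <- cars$carb^2
model <- lm(mpg ~ vs + drat + carb2 + vs * disp, data = cars)

rss <- sum(resid(model)^2)                   # residual sum of squares
tss <- sum((cars$mpg - mean(cars$mpg))^2)    # total sum of squares
n <- nrow(cars)                              # number of observations
p <- length(coef(model)) - 1                 # predictors, excluding intercept

r2 <- 1 - rss / tss
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(c(r2 = r2, adj_r2 = adj_r2))
```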

plot(fitted(model), resid(model))

# residuals seem roughly evenly scattered around zero
qqnorm(resid(model))

# Q-Q plot is ok
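A numerical companion to the Q-Q plot is the Shapiro-Wilk test on the residuals. This is one possible check, not something the original analysis ran, and with only 32 observations it has limited power, so it is best read alongside the plot. `sw` is an illustrative name.

```r
# Refit the report's model so this block runs on its own.
cars <- mtcars
cars$carb2 <- cars$carb^2
model <- lm(mpg ~ vs + drat + carb2 + vs * disp, data = cars)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal.
# A small p-value would be evidence against normality.
sw <- shapiro.test(resid(model))
print(sw)
```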

Here is some code that gives more diagnostic plots.

model.diag.metrics <- augment(model)
head(model.diag.metrics)
## # A tibble: 6 x 12
##   .rownames   mpg    vs  drat carb2  disp .fitted .resid   .hat .sigma .cooksd
##   <chr>     <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>  <dbl>  <dbl>  <dbl>   <dbl>
## 1 Mazda RX4  21       0  3.9     16   160    21.2 -0.198 0.172    2.92 2.01e-4
## 2 Mazda RX~  21       0  3.9     16   160    21.2 -0.198 0.172    2.92 2.01e-4
## 3 Datsun 7~  22.8     1  3.85     1   108    26.3 -3.52  0.0955   2.83 2.94e-2
## 4 Hornet 4~  21.4     1  3.08     1   258    16.3  5.15  0.462    2.57 8.55e-1
## 5 Hornet S~  18.7     0  3.15     4   360    15.9  2.78  0.0993   2.87 1.92e-2
## 6 Valiant    18.1     1  2.76     1   225    17.8  0.270 0.328    2.92 1.08e-3
## # ... with 1 more variable: .std.resid <dbl>
autoplot(model)
## Warning: `arrange_()` is deprecated as of dplyr 0.7.0.
## Please use `arrange()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.

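# residuals vs fitted shows a slight pattern; the model may need another quadratic term
# normal Q-Q plot is ok
# scale-location plot addresses heteroscedasticity; it may need some more analysis
# residuals vs leverage: the Maserati is a leverage point
# Hornet 4 Drive also stands out in the augment() output above (.hat = 0.46, Cook's distance = 0.86)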
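The leverage reading can be backed up with numbers. The sketch below lists the hat values and Cook's distances for the same model and flags cars above two common rule-of-thumb cutoffs. The cutoffs (2k/n for leverage and 4/n for Cook's distance) are conventions assumed here, not part of the original analysis, and the variable names are illustrative.

```r
# Refit the report's model so this block runs on its own.
cars <- mtcars
cars$carb2 <- cars$carb^2
model <- lm(mpg ~ vs + drat + carb2 + vs * disp, data = cars)

h <- hatvalues(model)        # leverage of each car
d <- cooks.distance(model)   # influence of each car on the fit

n <- nrow(cars)
k <- length(coef(model))     # number of coefficients, including the intercept

# Rule-of-thumb cutoffs (assumed conventions): 2k/n and 4/n.
high_leverage <- names(h)[h > 2 * k / n]
influential <- names(d)[d > 4 / n]

print(round(sort(h, decreasing = TRUE)[1:5], 3))
print(round(sort(d, decreasing = TRUE)[1:5], 3))
print(high_leverage)
print(influential)
```

Refitting the model without the flagged cars and comparing coefficients is one way to see how much each of them drives the results.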