Harold Nelson
3/29/2021
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3 ✓ purrr 0.3.4
## ✓ tibble 3.0.6 ✓ dplyr 1.0.4
## ✓ tidyr 1.1.2 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
Examine the dataframe mpg (from ggplot2).
## Rows: 234
## Columns: 11
## $ manufacturer <chr> "audi", "audi", "audi", "audi", "audi", "audi", "audi", …
## $ model <chr> "a4", "a4", "a4", "a4", "a4", "a4", "a4", "a4 quattro", …
## $ displ <dbl> 1.8, 1.8, 2.0, 2.0, 2.8, 2.8, 3.1, 1.8, 1.8, 2.0, 2.0, 2…
## $ year <int> 1999, 1999, 2008, 2008, 1999, 1999, 2008, 1999, 1999, 20…
## $ cyl <int> 4, 4, 4, 4, 6, 6, 6, 4, 4, 4, 4, 6, 6, 6, 6, 6, 6, 8, 8,…
## $ trans <chr> "auto(l5)", "manual(m5)", "manual(m6)", "auto(av)", "aut…
## $ drv <chr> "f", "f", "f", "f", "f", "f", "f", "4", "4", "4", "4", "…
## $ cty <int> 18, 21, 20, 21, 16, 18, 18, 18, 16, 20, 19, 15, 17, 17, …
## $ hwy <int> 29, 29, 31, 30, 26, 26, 27, 26, 25, 28, 27, 25, 25, 25, …
## $ fl <chr> "p", "p", "p", "p", "p", "p", "p", "p", "p", "p", "p", "…
## $ class <chr> "compact", "compact", "compact", "compact", "compact", "…
Build a model to predict hwy (highway mpg)
Our first model is an equation of the form
\[hwy = intercept + slope * displ\] Use lm() to create model1.
Now use summary to look at model1.
##
## Call:
## lm(formula = hwy ~ displ, data = mpg)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.1039 -2.1646 -0.2242 2.0589 15.0105
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 35.6977 0.7204 49.55 <2e-16 ***
## displ -3.5306 0.1945 -18.15 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.836 on 232 degrees of freedom
## Multiple R-squared: 0.5868, Adjusted R-squared: 0.585
## F-statistic: 329.5 on 1 and 232 DF, p-value: < 2.2e-16
What are the values of intercept and slope?
intercept = 35.6977
slope = -3.5306
Write down the equation
hwy = 35.6977 - 3.5306 * displ
Use this equation to calculate the value of hwy when displ = 2. call this hwy2.
## [1] 28.6365
Do the same for an engine with 3 liters displacement.
## [1] -3.5306
Note the interpretation.
Increasing the displ by 1 “adds” the value of slope to hwy
The data does not lie exactly on the line we have estimated.
Show this graphically using a scatterplot of hwy and displ.
mpg %>%
ggplot(aes(x = displ, y = hwy)) +
geom_point(color = "blue") +
geom_smooth(method = "lm", color = "red")
## `geom_smooth()` using formula 'y ~ x'
We summarize the relationship in general.
Actual = Predicted + Residual
Look at the contents of model1 using str().
## List of 12
## $ coefficients : Named num [1:2] 35.7 -3.53
## ..- attr(*, "names")= chr [1:2] "(Intercept)" "displ"
## $ residuals : Named num [1:234] -0.343 -0.343 2.364 1.364 0.188 ...
## ..- attr(*, "names")= chr [1:234] "1" "2" "3" "4" ...
## $ effects : Named num [1:234] -358.566 -69.626 2.405 1.405 0.218 ...
## ..- attr(*, "names")= chr [1:234] "(Intercept)" "displ" "" "" ...
## $ rank : int 2
## $ fitted.values: Named num [1:234] 29.3 29.3 28.6 28.6 25.8 ...
## ..- attr(*, "names")= chr [1:234] "1" "2" "3" "4" ...
## $ assign : int [1:2] 0 1
## $ qr :List of 5
## ..$ qr : num [1:234, 1:2] -15.2971 0.0654 0.0654 0.0654 0.0654 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:234] "1" "2" "3" "4" ...
## .. .. ..$ : chr [1:2] "(Intercept)" "displ"
## .. ..- attr(*, "assign")= int [1:2] 0 1
## ..$ qraux: num [1:2] 1.07 1.08
## ..$ pivot: int [1:2] 1 2
## ..$ tol : num 1e-07
## ..$ rank : int 2
## ..- attr(*, "class")= chr "qr"
## $ df.residual : int 232
## $ xlevels : Named list()
## $ call : language lm(formula = hwy ~ displ, data = mpg)
## $ terms :Classes 'terms', 'formula' language hwy ~ displ
## .. ..- attr(*, "variables")= language list(hwy, displ)
## .. ..- attr(*, "factors")= int [1:2, 1] 0 1
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:2] "hwy" "displ"
## .. .. .. ..$ : chr "displ"
## .. ..- attr(*, "term.labels")= chr "displ"
## .. ..- attr(*, "order")= int 1
## .. ..- attr(*, "intercept")= int 1
## .. ..- attr(*, "response")= int 1
## .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
## .. ..- attr(*, "predvars")= language list(hwy, displ)
## .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
## .. .. ..- attr(*, "names")= chr [1:2] "hwy" "displ"
## $ model :'data.frame': 234 obs. of 2 variables:
## ..$ hwy : int [1:234] 29 29 31 30 26 26 27 26 25 28 ...
## ..$ displ: num [1:234] 1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
## ..- attr(*, "terms")=Classes 'terms', 'formula' language hwy ~ displ
## .. .. ..- attr(*, "variables")= language list(hwy, displ)
## .. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
## .. .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. .. ..$ : chr [1:2] "hwy" "displ"
## .. .. .. .. ..$ : chr "displ"
## .. .. ..- attr(*, "term.labels")= chr "displ"
## .. .. ..- attr(*, "order")= int 1
## .. .. ..- attr(*, "intercept")= int 1
## .. .. ..- attr(*, "response")= int 1
## .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
## .. .. ..- attr(*, "predvars")= language list(hwy, displ)
## .. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
## .. .. .. ..- attr(*, "names")= chr [1:2] "hwy" "displ"
## - attr(*, "class")= chr "lm"
Extract the item residuals from the model and see what it is using str().
## Named num [1:234] -0.343 -0.343 2.364 1.364 0.188 ...
## - attr(*, "names")= chr [1:234] "1" "2" "3" "4" ...
Create mpg_augment by adding the vector of residuals using cbind.