The library() function is used to load packages (groups of functions and datasets).
library(MASS)
library(ISLR)
attach(Boston)
names(Boston)
## [1] "crim" "zn" "indus" "chas" "nox" "rm" "age"
## [8] "dis" "rad" "tax" "ptratio" "black" "lstat" "medv"
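Use lm(y~x, data) to fit a simple linear regression of y on x, where data is the data frame holding both variables.
lm.fit = lm(medv~lstat, data=Boston) # OR...
lm.fit = lm(medv~lstat) # Works without data= only because Boston is attached.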
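Use summary(lm.fit) for more detailed output: coefficient estimates with their standard errors and p-values, plus the \(R^2\) and F-statistic for the model.
Use names(lm.fit) to find out what information is stored in lm.fit.
Extractor functions such as coef(lm.fit) or confint(lm.fit) can be used to access these quantities.
The predict() function evaluates the response for a given input value (or list of values). It can also produce the associated confidence intervals or prediction intervals; the interval argument specifies which.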
predict(lm.fit, data.frame(lstat=c(5, 10, 15)), interval="confidence")
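The two interval types answer different questions, which a side-by-side call makes visible. The following is a minimal sketch, not part of the original lab; the names fit, new, ci and pr are illustrative. It requests both intervals for the same inputs and compares their widths.

```r
library(MASS)  # provides the Boston data set

fit <- lm(medv ~ lstat, data = Boston)
new <- data.frame(lstat = c(5, 10, 15))

# Interval for the mean response at each lstat value.
ci <- predict(fit, new, interval = "confidence")
# Interval for a single new observation at each lstat value.
pr <- predict(fit, new, interval = "prediction")

# Both share the same point estimate (the "fit" column) ...
all.equal(ci[, "fit"], pr[, "fit"])
# ... but the prediction interval is wider at every input.
(pr[, "upr"] - pr[, "lwr"]) > (ci[, "upr"] - ci[, "lwr"])
```

The prediction interval is wider because it also accounts for the noise in an individual observation, not only the uncertainty in the fitted line.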
Plot medv and lstat along with the least squares regression line using plot() and abline().
plot(lstat, medv) # Plots data points only.
abline(lm.fit) # Plots regression line.
abline() can be used to draw any line, not just a regression line: abline(a, b) draws the line with intercept \(a\) and slope \(b\).
lwd=3 (in plot() and abline()) makes the line three times as wide as the default.
pch="+" selects the plotting symbol, here a '+' marker. It applies to points, so it is relevant to plot() rather than to abline().
abline(lm.fit, lwd=3, col="red")
plot(lstat, medv, pch=20) # 20 == small circles
plot(lstat, medv, pch=1:20) # Uses 20 different symbols.
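As a quick check on the argument order of abline(), the sketch below draws the same regression line twice: once from the fitted model and once from its coefficients, intercept first. The object names fit and b are illustrative, and the pdf(NULL) line is only there so the code also runs as a script without opening a plot window.

```r
library(MASS)
if (!interactive()) pdf(NULL)  # discard graphics when not run interactively

fit <- lm(medv ~ lstat, data = Boston)
b <- coef(fit)  # named vector: "(Intercept)" and "lstat"

plot(Boston$lstat, Boston$medv, pch = 20)
abline(fit, lwd = 3, col = "red")                  # line taken from the fitted model
abline(b[["(Intercept)"]], b[["lstat"]],           # same line: intercept, then slope
       lty = 2, col = "blue")
abline(h = mean(Boston$medv), lty = 3)             # horizontal reference at mean(medv)
```

The dashed blue line should lie exactly on top of the red one.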
par(mfrow=c(2,2))
plot(lm.fit)
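Calling plot() on an lm object produces four diagnostic plots, and par(mfrow=c(2,2)) splits the plotting region into a 2 x 2 grid so they appear together. A minimal sketch of the same idea follows, with two additions that are not in the notes above: it restores the previous layout afterwards, and it pulls out the residuals and leverage values that the plots are based on. The names fit, old, res and lev are illustrative.

```r
library(MASS)
if (!interactive()) pdf(NULL)  # discard graphics when not run interactively

fit <- lm(medv ~ lstat, data = Boston)

old <- par(mfrow = c(2, 2))  # 2 x 2 grid; par() returns the previous settings
plot(fit)                    # four diagnostic plots for the fit
par(old)                     # restore the original layout

res <- residuals(fit)        # raw residuals, one per observation
lev <- hatvalues(fit)        # leverage statistics, one per observation
which.max(lev)               # index of the observation with the highest leverage
```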
Use lm(y~x1 + x2 + x3) to fit a model with three predictors \(x_1\), \(x_2\), \(x_3\). The set of predictors can be specified in various ways:
lm.fit = lm(medv~lstat+age, data=Boston) # Two predictors.
lm.fit = lm(medv~., data=Boston) # All predictors.
lm.fit = lm(medv~.-age, data=Boston) # All EXCEPT age, method 1.
lm.fit1 = update(lm.fit, ~.-age) # All EXCEPT age, method 2.
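The two ways of dropping a predictor should give the same model. The sketch below checks this, using fresh object names (full, no_age, no_age2) that are illustrative and chosen so that nothing is overwritten.

```r
library(MASS)

full    <- lm(medv ~ ., data = Boston)        # all predictors
no_age  <- lm(medv ~ . - age, data = Boston)  # method 1: exclude age in the formula
no_age2 <- update(full, ~ . - age)            # method 2: modify the existing fit

all.equal(coef(no_age), coef(no_age2))        # same coefficients either way
"age" %in% names(coef(no_age))                # age is no longer in the model
```

update() is convenient when a model has already been fitted and only a small change to the formula is needed.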
summary(lm.fit)
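Individual components of the summary object can be accessed by name; type ?summary.lm to see what is available.
For example, summary(lm.fit)$r.sq gives the \(R^2\) and summary(lm.fit)$sigma gives the RSE.
Need to load the car package to get variance inflation factors (VIF). Use the vif() function.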
library(car)
vif(lm.fit)
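To see what vif() measures, the VIF can also be computed with base R alone. The sketch below is an illustration rather than the car implementation: for each predictor it regresses that predictor on all the others and evaluates \(1 / (1 - R_j^2)\). The names predictors and vif_by_hand are hypothetical. For a model with only numeric predictors, such as medv ~ . on Boston, the values should agree with those from car.

```r
library(MASS)

predictors <- setdiff(names(Boston), "medv")

# VIF for one predictor: regress it on all the other predictors and
# plug the resulting R^2 into 1 / (1 - R^2).
vif_by_hand <- sapply(predictors, function(p) {
  others <- Boston[, setdiff(predictors, p)]
  r2 <- summary(lm(Boston[[p]] ~ ., data = others))$r.squared
  1 / (1 - r2)
})

round(vif_by_hand, 2)
```

A value near 1 means the predictor is nearly uncorrelated with the rest; large values indicate collinearity.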
Can easily include interaction terms as well (i.e. multiplying predictors together).
lm(medv~lstat*age, data=Boston)
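In a formula, lstat*age is shorthand for the two main effects plus their interaction, lstat + age + lstat:age. The sketch below fits both spellings and confirms they are the same model; the names short and long are illustrative.

```r
library(MASS)

short <- lm(medv ~ lstat * age, data = Boston)
long  <- lm(medv ~ lstat + age + lstat:age, data = Boston)

names(coef(short))                  # intercept, lstat, age, lstat:age
all.equal(coef(short), coef(long))  # identical fits
```

Writing lstat:age on its own would include only the interaction term, without the main effects.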
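To apply a nonlinear transformation to a predictor, wrap it in I(), e.g. I(X^2); the wrapper is needed because ^ has a special meaning inside a formula.
lm.fit = lm(medv ~ lstat)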
lm.fit2 = lm(medv ~ lstat + I(lstat^2))
anova(lm.fit, lm.fit2)
## Analysis of Variance Table
##
## Model 1: medv ~ lstat
## Model 2: medv ~ lstat + I(lstat^2)
##   Res.Df   RSS Df Sum of Sq     F    Pr(>F)
## 1    504 19472
## 2    503 15347  1    4125.1 135.2 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
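The anova() call tests whether the larger model fits significantly better than the smaller one. When the two models differ by a single term, as here, the F-statistic equals the square of that term's t-statistic in the larger model. The sketch below checks this relationship; fit1, fit2, f_stat and t_stat are illustrative names.

```r
library(MASS)

fit1 <- lm(medv ~ lstat, data = Boston)
fit2 <- lm(medv ~ lstat + I(lstat^2), data = Boston)

a <- anova(fit1, fit2)
f_stat <- a$F[2]                                        # F-statistic from the second row
t_stat <- coef(summary(fit2))["I(lstat^2)", "t value"]  # t-statistic of the quadratic term

all.equal(f_stat, t_stat^2)                             # the two tests are equivalent
```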
Use the poly() function to create polynomial terms inside lm(), e.g. lm(y~poly(x, 5)) for a fifth-order polynomial.
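One detail worth knowing is that poly() builds orthogonal polynomials by default, so its coefficients differ from those of a model written with I(), even though the fitted values are the same. Passing raw = TRUE gives the plain powers instead. The sketch below compares the three forms for a quadratic fit; the names orth, raw and byhand are illustrative.

```r
library(MASS)

orth   <- lm(medv ~ poly(lstat, 2), data = Boston)              # orthogonal basis (default)
raw    <- lm(medv ~ poly(lstat, 2, raw = TRUE), data = Boston)  # raw powers lstat, lstat^2
byhand <- lm(medv ~ lstat + I(lstat^2), data = Boston)          # powers written out with I()

all.equal(fitted(orth), fitted(byhand))             # same fitted values
all.equal(unname(coef(raw)), unname(coef(byhand)))  # same coefficients with raw = TRUE
```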