Elements of Statistical Learning

Chapter 2

quantitive outputs - continuous output (regression) qualitative outputs - categorical/discrete output (classification), classed as \(\{0,1\}\) or \(\{-1,1\}\) multiclass targets are defined by vectors of binaries \(\{0, 0, 1, 0, 0\}\)

Input features can be expressed with \(X\). \(X_j\) refers to the \(j\) element of vector \(X\). Output labels can be expressed with \(Y\) if the output is quantative, \(G\) if the output is qualitative. If \(G\) has output {0, 1} then \(G\) = \(Y\) and \(Y >= 0 | Y <= 1\) and is classified positive where \(Y > 0.5\)

Linear Models - least squares

The objective of least squares is to minimize the error defined by:

\[\hat{Y} = \hat{\beta}_0 + \sum_{j=0}^pX_j\beta_j\] which can be rewritten as:

\[\hat{Y} = X^T\hat{\beta}\] We fit the set of coefficients to minimize the residual sum of squares:

\[RSS(\beta) = \sum_{i = 1}^N(y_i - x_i^T)^2\] or in matrix notation:

\[RSS(\beta) = (y - X\beta)^T(y - X\beta)\]

Simple R example

alligator <- data.frame(
  lnLength = c(3.87, 3.61, 4.33, 3.43, 3.81, 3.83, 3.46, 3.76,
    3.50, 3.58, 4.19, 3.78, 3.71, 3.73, 3.78),
  lnWeight = c(4.87, 3.93, 6.46, 3.33, 4.38, 4.70, 3.50, 4.50,
    3.58, 3.64, 5.90, 4.43, 4.38, 4.42, 4.25)
)

And if we plot length against weight:

fit <- lm(lnWeight ~ lnLength, data = alligator)
summary(fit)

## 
## Call:
## lm(formula = lnWeight ~ lnLength, data = alligator)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.24348 -0.03186  0.03740  0.07727  0.12669 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -8.4761     0.5007  -16.93 3.08e-10 ***
## lnLength      3.4311     0.1330   25.80 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1229 on 13 degrees of freedom
## Multiple R-squared:  0.9808, Adjusted R-squared:  0.9794 
## F-statistic: 665.8 on 1 and 13 DF,  p-value: 1.495e-12

Linear Models - nearest neighbours

Nearest-neighbor methods use those observations in the training set T closest in input space to x to form ˆ Y . Specifically, the k-nearest neighbor fit for ˆ Y is defined as follows:

\[\hat{Y}(x) = \frac1k\sum_{x_i\in{N}_k(x)}y_i\]

require(class)

## Loading required package: class

library(ElemStatLearn)
require(class)
x <- mixture.example$x
g <- mixture.example$y
xnew <- mixture.example$xnew
mod15 <- knn(x, xnew, g, k=15, prob=TRUE)
prob <- attr(mod15, "prob")
prob <- ifelse(mod15=="1", prob, 1-prob)
px1 <- mixture.example$px1
px2 <- mixture.example$px2
prob15 <- matrix(prob, length(px1), length(px2))
par(mar=rep(2,4))
contour(px1, px2, prob15, levels=0.5, labels="", xlab="", ylab="", main=
        "15-nearest neighbour", axes=FALSE)
points(x, col=ifelse(g==1, "coral", "cornflowerblue"))
gd <- expand.grid(x=px1, y=px2)
points(gd, pch=".", cex=1.2, col=ifelse(prob15>0.5, "coral", "cornflowerblue"))
box()