Taken from “Deep Learning” by Goodfellow, Bengio, and Courville.
A performance measure quantifies how well a task is performed.
There are many hyperparameters that must be set to make these methods work. There are no general rules for choosing them; good settings are often application specific, and success or failure can be hard to explain.
Data: We are given a training set \((x_i, y_i)\), \(i = 1, \dots, n\), where \(x_i \in \mathbb{R}^p\) are the input variables and \(y_i\) are the labels. Typically the label is one of \(K \geq 2\) classes.
Task: Given an input \(x\), predict the correct label.
The basic idea in Linear Discriminant Analysis (LDA) is that the samples from each class are drawn from a multivariate normal distribution with class-specific mean \(\mu_k\) and a common covariance \(\Sigma\): if \(X\) comes from class \(k\), then
\[ X \sim N(\mu_k, \Sigma). \]
We use the training data to estimate \(\pi_k\), \(\mu_k\), and \(\Sigma\):
\[\hat{\pi}_k = N_k/N\]
\[\hat{\mu}_k = \frac{1}{N_k}\sum_{i:\,y_i=k} x_i\]
\[\hat{\Sigma} = \frac{1}{N-K}\sum_{k=1}^K \sum_{i:\,y_i=k} (x_i - \hat{\mu}_k)(x_i-\hat{\mu}_k)^T\]
where \(N_k\) is the number of training points in class \(k\) and \(N\) is the total number of training points.
Linear Discriminant Functions
\[ \delta_k(x) = x^T\Sigma^{-1}\mu_k-\frac{1}{2} \mu_k^T \Sigma^{-1} \mu_k + \log \pi_k \]
Method: Given \(x\), we compute \(\delta_k(x)\) for each class and assign the class \(k\) that maximizes it. This rule is based on computing the posterior probability \(P(Y=k \mid X=x)\).
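Where do these discriminant functions come from? By Bayes’ rule, with class-conditional density \(f_k(x) = N(x; \mu_k, \Sigma)\),
\[ P(Y=k \mid X=x) = \frac{\pi_k f_k(x)}{\sum_{l=1}^K \pi_l f_l(x)}. \]
Taking logs and dropping every term that is the same for all classes (the denominator, the Gaussian normalizing constant, and the quadratic term \(-\frac{1}{2}x^T\Sigma^{-1}x\), which cancels because \(\Sigma\) is shared) leaves exactly \(\delta_k(x)\), so maximizing \(\delta_k(x)\) is equivalent to maximizing the posterior.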
The decision boundary between class \(k\) and class \(k'\) is the set \(\{ x : \delta_k(x) = \delta_{k'}(x)\}\). Since each \(\delta_k\) is linear in \(x\), these boundaries are hyperplanes.
library(ISLR)      # Default data set
library(MASS)      # lda(), qda()
library(ggplot2)
library(dplyr)
library(class)     # knn()
# 70/30 train/test split of the Default data
smp_size <- floor(0.7 * nrow(Default))
set.seed(123)
train_ind <- sample(seq_len(nrow(Default)), size = smp_size)
train <- Default[train_ind, ]
test <- Default[-train_ind, ]
# visualize the training set
ggplot(train, aes(x = income, y = balance, color = default)) +
  geom_point(size = 1, shape = 3)
LDA on training set
# fit LDA to the training data
lda.fit <- lda(default ~ income + balance, data = train)
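As a sanity check on the estimation formulas above, here is a minimal sketch that computes \(\hat{\pi}_k\), \(\hat{\mu}_k\), \(\hat{\Sigma}\), and \(\delta_k(x)\) by hand. All names here (pi.hat, mu.hat, Sigma.hat, delta) are ours, not part of MASS; the resulting classification should agree with lda().
# hand-rolled LDA on the two predictors, following the formulas above
X <- as.matrix(train[, c("income", "balance")])
y <- train$default
cls <- levels(y)                 # "No", "Yes"
N <- nrow(X)
pi.hat <- c(table(y)) / N        # class priors as a named vector
mu.hat <- sapply(cls, function(k) colMeans(X[y == k, , drop = FALSE]))
# pooled within-class covariance, divided by N - K
Sigma.hat <- Reduce(`+`, lapply(cls, function(k) {
  Xc <- scale(X[y == k, , drop = FALSE], center = mu.hat[, k], scale = FALSE)
  t(Xc) %*% Xc
})) / (N - length(cls))
Sigma.inv <- solve(Sigma.hat)
delta <- function(x) sapply(cls, function(k)
  t(x) %*% Sigma.inv %*% mu.hat[, k] -
    0.5 * t(mu.hat[, k]) %*% Sigma.inv %*% mu.hat[, k] + log(pi.hat[k]))
cls[which.max(delta(X[1, ]))]        # our classification of training point 1
predict(lda.fit, train[1, ])$class   # should agree with lda()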
Plot decision boundary (from michael.hahsler.net)
# decision-boundary plotting helper, adapted from michael.hahsler.net
decisionplot <- function(model, data, class = NULL, predict_type = "class",
                         resolution = 100, showgrid = TRUE, ...) {
  if(!is.null(class)) cl <- data[, class] else cl <- 1
  data <- data[, 1:2]            # the two predictors must be in columns 1-2
  k <- length(unique(cl))
  plot(data, col = as.integer(cl) + 1L, pch = as.integer(cl) + 1L, ...)
  # make a resolution x resolution grid covering the predictor ranges
  r <- sapply(data, range, na.rm = TRUE)
  xs <- seq(r[1, 1], r[2, 1], length.out = resolution)
  ys <- seq(r[1, 2], r[2, 2], length.out = resolution)
  g <- cbind(rep(xs, each = resolution), rep(ys, times = resolution))
  colnames(g) <- colnames(r)
  g <- as.data.frame(g)
  ### guess how to get class labels from predict
  ### (unfortunately not very consistent between models)
  p <- predict(model, g, type = predict_type)
  if(is.list(p)) p <- p$class
  p <- as.factor(p)
  if(showgrid) points(g, col = as.integer(p) + 1L, pch = ".")
  # draw the boundary as contour lines between adjacent class codes
  z <- matrix(as.integer(p), nrow = resolution, byrow = TRUE)
  contour(xs, ys, z, add = TRUE, drawlabels = FALSE,
          lwd = 2, levels = (1:(k - 1)) + .5)
  invisible(z)
}
# train has exactly 7000 rows; decisionplot() needs the two predictors first
q <- train[1:7000, c("income", "balance", "default")]
decisionplot(lda.fit, q, class = "default")
Test LDA model on test data
lda.pred <- predict(lda.fit, test)
# confusion matrix: rows = predicted class, columns = true class
table(lda.pred$class, test$default)
##
## No Yes
## No 2906 71
## Yes 7 16
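From this confusion matrix, the overall test error rate is the off-diagonal fraction, \((71 + 7)/3000 \approx 0.026\); one line reproduces it:
mean(lda.pred$class != test$default)   # overall LDA test error rate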
Homework: Learn about the ROC curve for assessing classifiers.
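As a starting point for the homework: predict.lda also returns posterior probabilities, and sweeping a threshold over \(P(\text{Yes} \mid x)\) traces out the ROC curve. A minimal sketch, assuming the pROC package (not used elsewhere in this document) is installed:
library(pROC)
roc.lda <- roc(test$default, lda.pred$posterior[, "Yes"])
plot(roc.lda)   # ROC curve: sensitivity vs. specificity
auc(roc.lda)    # area under the curve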
Quadratic Discriminant Analysis (QDA) assumes that each class has its own mean and its own covariance matrix: \[X \sim N(\mu_k,\Sigma_k)\] The discriminant function used to decide which class to assign to \(x\) is \[\delta_k(x) = -\frac{1}{2}(x-\mu_k)^T\Sigma_k^{-1}(x-\mu_k) -\frac{1}{2}\log|\Sigma_k| + \log \pi_k \] Note that this formula is quadratic in \(x\): because \(\Sigma_k\) varies with \(k\), the quadratic term no longer cancels, so the decision boundaries are quadratic curves rather than hyperplanes.
Try QDA on the Default data.
qda.fit <- qda(default ~ income + balance, data = train)
decisionplot(qda.fit, q, class = "default")
qda.pred <- predict(qda.fit, test)
table(qda.pred$class, test$default)
##
## No Yes
## No 2902 68
## Yes 11 19
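Computed the same way as before, the overall test error is \((71+7)/3000 \approx 0.0260\) for LDA versus \((68+11)/3000 \approx 0.0263\) for QDA, so the quadratic boundary buys essentially nothing on this split:
mean(lda.pred$class != test$default)   # LDA test error
mean(qda.pred$class != test$default)   # QDA test error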
We are again given labeled data \((x_i, y_i)\). Note that KNN can also be used for regression tasks. The idea behind K-nearest neighbors (KNN) is simple: you specify the number of neighbors \(k\) to check; given a test point \(x\), you find the \(k\) nearest neighbors of \(x\) among the training inputs \(\{x_i\}\), read off their labels, and assign the majority label.
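To see how little machinery this requires, here is a from-scratch sketch for classifying a single point; it is illustrative only (brute-force distances, no tie-breaking), and knn.one is our own name, not a library function:
# classify one point x0 by majority vote among its k nearest training points
knn.one <- function(x0, X, y, k = 5) {
  d <- sqrt(colSums((t(X) - x0)^2))   # Euclidean distance to each row of X
  nbrs <- order(d)[1:k]               # indices of the k closest points
  names(which.max(table(y[nbrs])))    # most frequent label among them
}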
library(caret)
# knn3() from caret fits a KNN classifier with a formula interface
knn.fit <- knn3(default ~ balance + income, data = train, k = 3)
decisionplot(knn.fit, q, class = "default")
# class::knn() takes raw predictor matrices rather than a formula
attach(Default)
train.X <- cbind(balance, income)[train_ind, ]
test.X <- cbind(balance, income)[-train_ind, ]
train.default <- default[train_ind]
knn.pred <- knn(train.X, test.X, train.default, k = 1)
table(knn.pred, default[-train_ind])
##
## knn.pred No Yes
## No 2847 65
## Yes 66 22
knn.pred <- knn(train.X, test.X, train.default, k = 5)
table(knn.pred, default[-train_ind])
##
## knn.pred No Yes
## No 2891 75
## Yes 22 12
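One caveat before tuning \(k\) further: knn() uses Euclidean distance on the raw features, and income is on a much larger scale than balance, so income dominates the distance. A common fix is to standardize the predictors (using training-set means and SDs) before running KNN; a sketch that also tries several values of \(k\):
# standardize both predictors with training-set statistics, then sweep k
ctr <- colMeans(train.X)
sds <- apply(train.X, 2, sd)
train.Xs <- scale(train.X, center = ctr, scale = sds)
test.Xs <- scale(test.X, center = ctr, scale = sds)
for (k in c(1, 5, 10, 50)) {
  pred <- knn(train.Xs, test.Xs, train.default, k = k)
  cat("k =", k, " test error =", mean(pred != default[-train_ind]), "\n")
}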
Homework (easy): Do ISLR 4.6 “Lab: Logistic Regression, LDA, QDA, and KNN”
Homework (more challenging): Problem 10 on page 171 of ISLR (the Lab above will be helpful)