No one method will dominate the others in every situation.

  1. When the true decision boundaries are linear, then the LDA and logistic regression approaches will tend to perform well.
  2. When the boundaries are moderately non-linear, QDA may give better results.
  3. Finally, for much more complicated decision boundaries, a non-parametric approach such as KNN can be superior.

Logistic Regression

Unlike linear regression, which predicts a numeric value, logistic regression predicts the class (category) of the dependent variable from the independent variables. It does so by modelling the probability that an observation belongs to a given class.

Let us use the stock market dataset ‘Smarket’ from the ISLR library to predict the direction of the stock market from the other variables in the dataset.

library(ISLR)
attach(Smarket)
head(Smarket, 10)
##    Year   Lag1   Lag2   Lag3   Lag4   Lag5 Volume  Today Direction
## 1  2001  0.381 -0.192 -2.624 -1.055  5.010 1.1913  0.959        Up
## 2  2001  0.959  0.381 -0.192 -2.624 -1.055 1.2965  1.032        Up
## 3  2001  1.032  0.959  0.381 -0.192 -2.624 1.4112 -0.623      Down
## 4  2001 -0.623  1.032  0.959  0.381 -0.192 1.2760  0.614        Up
## 5  2001  0.614 -0.623  1.032  0.959  0.381 1.2057  0.213        Up
## 6  2001  0.213  0.614 -0.623  1.032  0.959 1.3491  1.392        Up
## 7  2001  1.392  0.213  0.614 -0.623  1.032 1.4450 -0.403      Down
## 8  2001 -0.403  1.392  0.213  0.614 -0.623 1.4078  0.027        Up
## 9  2001  0.027 -0.403  1.392  0.213  0.614 1.1640  1.303        Up
## 10 2001  1.303  0.027 -0.403  1.392  0.213 1.2326  0.287        Up
names(Smarket)
## [1] "Year"      "Lag1"      "Lag2"      "Lag3"      "Lag4"      "Lag5"     
## [7] "Volume"    "Today"     "Direction"

Now let us create a logistic regression model to predict the ‘Direction’ of the market from the five lag variables and ‘Volume’. ‘Year’ and ‘Today’ are left out of the formula.

glm.fit=glm(Direction ~ Lag1+Lag2+Lag3+Lag4+Lag5+Volume, data=Smarket, family =binomial )

Let us have a look at the model using the ‘summary’ function.

summary(glm.fit)
## 
## Call:
## glm(formula = Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + 
##     Volume, family = binomial, data = Smarket)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.446  -1.203   1.065   1.145   1.326  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.126000   0.240736  -0.523    0.601
## Lag1        -0.073074   0.050167  -1.457    0.145
## Lag2        -0.042301   0.050086  -0.845    0.398
## Lag3         0.011085   0.049939   0.222    0.824
## Lag4         0.009359   0.049974   0.187    0.851
## Lag5         0.010313   0.049511   0.208    0.835
## Volume       0.135441   0.158360   0.855    0.392
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1731.2  on 1249  degrees of freedom
## Residual deviance: 1727.6  on 1243  degrees of freedom
## AIC: 1741.6
## 
## Number of Fisher Scoring iterations: 3

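The significance stars work as in linear regression: a coefficient with a p-value below 0.05 is marked with stars, and more stars mean a smaller p-value. No stars appear in the output above because none of the p-values is below 0.05; the smallest is 0.145, for Lag1. So in this model no variable shows a statistically significant association with ‘Direction’.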
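
The ‘z value’ and ‘Pr(>|z|)’ columns of the summary can be reproduced from an estimate and its standard error. Here is a minimal sketch for the Lag1 row, with the two numbers copied from the table above. The variable names are my own, and the results should match the table only up to rounding.

```r
# Estimate and standard error for Lag1, copied from the summary(glm.fit) table
estimate  <- -0.073074
std_error <-  0.050167

z_value <- estimate / std_error        # Wald z statistic
p_value <- 2 * pnorm(-abs(z_value))    # two-sided tail probability under N(0, 1)

round(z_value, 3)
round(p_value, 3)
```

The p-value comes out at roughly 0.145, well above 0.05, which is why the Lag1 row carries no star.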

As ‘Direction’ is a categorical variable, R encodes it with a dummy variable. To see how the levels are coded, we use the ‘contrasts’ function.

contrasts(Direction)
##      Up
## Down  0
## Up    1

Making Predictions

Now, using our fitted model ‘glm.fit’, we will make predictions. As I don't have any separate test data, I will use the first 10 rows of the same data. These rows were also used to fit the model, so the accuracy figures below are optimistic and rest on only 10 observations.

test <- head(Smarket, 10)

Now I will give this ‘test’ set to the ‘predict’ function and obtain the probability P(Y = 1 | x), which by the coding above is the probability that the market goes ‘Up’. We pass ‘type = "response"’ to obtain probabilities instead of log-odds.

predictions <- predict(glm.fit, newdata = test, type = "response")
predictions
##         1         2         3         4         5         6         7 
## 0.5070841 0.4814679 0.4811388 0.5152224 0.5107812 0.5069565 0.4926509 
##         8         9        10 
## 0.5092292 0.5176135 0.4888378
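
To see what ‘predict’ is doing, the first probability can be rebuilt by hand: multiply each coefficient by the matching predictor value, add them up to get the log-odds, and pass the result through the inverse logit. This is only a sketch. The coefficients are the rounded values from the summary output, the predictor values are row 1 of the data shown earlier, and the variable names are my own.

```r
# Coefficients copied from the summary(glm.fit) output (rounded to 6 decimals)
beta <- c(intercept = -0.126000, Lag1 = -0.073074, Lag2 = -0.042301,
          Lag3 = 0.011085, Lag4 = 0.009359, Lag5 = 0.010313, Volume = 0.135441)

# Row 1 of Smarket: Lag1..Lag5 and Volume; the leading 1 multiplies the intercept
x_row1 <- c(1, 0.381, -0.192, -2.624, -1.055, 5.010, 1.1913)

log_odds <- sum(beta * x_row1)   # linear predictor
p_up     <- plogis(log_odds)     # inverse logit: 1 / (1 + exp(-log_odds))
p_up
```

The result should agree with the first entry of ‘predictions’ (about 0.507), up to the rounding of the coefficients.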

Now let us convert these probabilities into a categorical variable by assigning ‘Up’ to probabilities above 0.5 and ‘Down’ otherwise.

direction <- ifelse(predictions > 0.5, 'Up', 'Down')
direction
##      1      2      3      4      5      6      7      8      9     10 
##   "Up" "Down" "Down"   "Up"   "Up"   "Up" "Down"   "Up"   "Up" "Down"

Let us see how our predictions fared against the actual directions using the ‘table’ function. Rows are the actual classes and columns are the predicted classes.

table(test$Direction, direction)
##       direction
##        Down Up
##   Down    2  0
##   Up      2  6

So our accuracy is (2 + 6) / 10 = 0.8.
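
Accuracy is the sum of the diagonal of the confusion matrix divided by the total count. Here is a small sketch with the counts typed in from the table above. The names ‘conf_mat’ and ‘accuracy’ are my own.

```r
# Confusion matrix from above: rows = actual class, columns = predicted class.
# matrix() fills column by column, so the first two numbers are the "Down" column.
conf_mat <- matrix(c(2, 2, 0, 6), nrow = 2,
                   dimnames = list(actual    = c("Down", "Up"),
                                   predicted = c("Down", "Up")))

accuracy <- sum(diag(conf_mat)) / sum(conf_mat)   # correct predictions / all predictions
accuracy
```

With the objects created earlier, ‘mean(direction == test$Direction)’ should give the same number without building the table first.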

Linear Discriminant Analysis

Logistic regression, as used above, separates two classes. Linear Discriminant Analysis (LDA) handles two or more classes. It uses Bayes' theorem and assumes that the predictors in each class follow a normal distribution, with the same variance (covariance matrix) in every class. We use the ‘lda’ function from the MASS package to fit it.

We use the same ‘Smarket’ data to understand LDA.

library(MASS)
lda.fit=lda(Direction ~ Lag1+Lag2 ,data=Smarket)
lda.fit
## Call:
## lda(Direction ~ Lag1 + Lag2, data = Smarket)
## 
## Prior probabilities of groups:
##   Down     Up 
## 0.4816 0.5184 
## 
## Group means:
##             Lag1        Lag2
## Down  0.05068605  0.03229734
## Up   -0.03969136 -0.02244444
## 
## Coefficients of linear discriminants:
##             LD1
## Lag1 -0.7567605
## Lag2 -0.4707872

The output gives the ‘prior probabilities’ and ‘group means’ (the class means) that Bayes' theorem uses, along with the coefficients of the linear discriminant. We used only the Lag1 and Lag2 variables in this model, just for simplicity.

Making Predictions

We can use the ‘predict’ function to predict the class.

new_pred <- predict(lda.fit, newdata = test)
new_pred
## $class
##  [1] Up   Down Down Up   Up   Up   Down Up   Up   Down
## Levels: Down Up
## 
## $posterior
##         Down        Up
## 1  0.4861024 0.5138976
## 2  0.5027466 0.4972534
## 3  0.5104516 0.4895484
## 4  0.4817860 0.5182140
## 5  0.4854771 0.5145229
## 6  0.4920394 0.5079606
## 7  0.5085978 0.4914022
## 8  0.4896886 0.5103114
## 9  0.4774690 0.5225310
## 10 0.5049515 0.4950485
## 
## $x
##             LD1
## 1  -0.193187790
## 2  -0.900356413
## 3  -1.227714911
## 4  -0.009643717
## 5  -0.166603724
## 6  -0.445506476
## 7  -1.148941474
## 8  -0.345614409
## 9   0.174041524
## 10 -0.994023376

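The prediction has three components: ‘class’, the class our model assigns to each observation; ‘posterior’, the posterior probability of each class; and ‘x’, the linear discriminant score (LD1) of each observation.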
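
The LD1 scores under ‘$x’ can be rebuilt from the printed ‘lda.fit’ output. As far as I can tell, ‘predict’ centres the predictors at the prior-weighted average of the group means and then multiplies by the discriminant coefficients. Treat that as my reading of the MASS behaviour rather than a guarantee. The sketch below does this for row 1, with all numbers copied from the output above and variable names of my own.

```r
# Values copied from the printed lda.fit output
prior   <- c(Down = 0.4816, Up = 0.5184)
means   <- rbind(Down = c(Lag1 =  0.05068605, Lag2 =  0.03229734),
                 Up   = c(Lag1 = -0.03969136, Lag2 = -0.02244444))
scaling <- c(Lag1 = -0.7567605, Lag2 = -0.4707872)

# Prior-weighted overall mean of each predictor (each row of `means` is
# multiplied by its class prior before the columns are summed)
centre <- colSums(prior * means)

# Row 1 of Smarket
x_row1 <- c(Lag1 = 0.381, Lag2 = -0.192)

ld1 <- sum((x_row1 - centre) * scaling)   # linear discriminant score
ld1
```

This gives about -0.193, matching the first LD1 value in the output.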

Let us look at the confusion matrix, to verify our predictions.

table(test$Direction, new_pred$class)
##       
##        Down Up
##   Down    2  0
##   Up      2  6

Our accuracy is the same as for logistic regression, i.e. 80%.

Quadratic Discriminant Analysis (QDA)

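The difference between LDA and QDA is that QDA lets each class have its own variance (covariance matrix) instead of assuming one shared by all classes. We fit it with the ‘qda’ function, also from the MASS package.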
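
In the usual textbook formulation, each method assigns an observation x to the class k with the largest discriminant score. The notation below is my own assumption and does not come from the R output: π_k is the prior probability of class k, μ_k its mean vector, Σ the covariance matrix shared by all classes in LDA, and Σ_k the class-specific covariance matrix in QDA.

```latex
\begin{align}
\delta_k^{\mathrm{LDA}}(x) &= x^{\top}\Sigma^{-1}\mu_k
  - \tfrac{1}{2}\,\mu_k^{\top}\Sigma^{-1}\mu_k + \log \pi_k \\
\delta_k^{\mathrm{QDA}}(x) &= -\tfrac{1}{2}\,(x-\mu_k)^{\top}\Sigma_k^{-1}(x-\mu_k)
  - \tfrac{1}{2}\log\lvert\Sigma_k\rvert + \log \pi_k
\end{align}
```

The LDA score is linear in x, so its decision boundaries are straight lines (or planes). The QDA score contains a term that is quadratic in x, so its boundaries can curve.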

qda.fit=qda(Direction ~ Lag1+Lag2 ,data=Smarket)
qda.fit
## Call:
## qda(Direction ~ Lag1 + Lag2, data = Smarket)
## 
## Prior probabilities of groups:
##   Down     Up 
## 0.4816 0.5184 
## 
## Group means:
##             Lag1        Lag2
## Down  0.05068605  0.03229734
## Up   -0.03969136 -0.02244444

As with LDA, the output gives the ‘prior probabilities’ and ‘group means’ used by Bayes' theorem. Again we used only the Lag1 and Lag2 variables, just for simplicity.

Making Predictions

We can use the ‘predict’ function to predict the class.

qda_pred <- predict(qda.fit, newdata = test)
qda_pred
## $class
##  [1] Up   Up   Down Up   Up   Up   Down Up   Up   Up  
## Levels: Down Up
## 
## $posterior
##         Down        Up
## 1  0.4754475 0.5245525
## 2  0.4944981 0.5055019
## 3  0.5083068 0.4916932
## 4  0.4780731 0.5219269
## 5  0.4773961 0.5226039
## 6  0.4836726 0.5163274
## 7  0.5013701 0.4986299
## 8  0.4916379 0.5083621
## 9  0.4675238 0.5324762
## 10 0.4967838 0.5032162

The prediction has two components: ‘class’, the class our model assigns to each observation, and ‘posterior’, the posterior probability of each class.

Let us look at the confusion matrix, to verify our predictions.

table(test$Direction, qda_pred$class)
##       
##        Down Up
##   Down    2  0
##   Up      0  8

And now our accuracy is 100% on these 10 rows.

k-Nearest Neighbours (kNN)

kNN does not use a parametric approach. It finds the k training points nearest to a test point and assigns the test point to the class that occurs most often among those k neighbours.
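
To make the idea concrete, here is a minimal base-R sketch of kNN for a single test point. It is an illustration only: the toy data are made up, ‘knn_one’ is a hypothetical helper of my own, and it is not how the ‘class’ package implements knn().

```r
# Toy training set (made-up numbers, not taken from Smarket)
train_x <- rbind(c(1.0, 1.0), c(1.2, 0.8), c(0.9, 1.1),
                 c(3.0, 3.0), c(3.2, 2.9), c(2.8, 3.1))
train_y <- c("Down", "Down", "Down", "Up", "Up", "Up")

# Hypothetical helper: classify one point by majority vote among its k nearest rows
knn_one <- function(point, train_x, train_y, k = 3) {
  # Euclidean distance from the point to every training row
  dists   <- sqrt(rowSums(sweep(train_x, 2, point)^2))
  nearest <- order(dists)[seq_len(k)]    # indices of the k closest rows
  votes   <- table(train_y[nearest])     # class counts among those rows
  names(votes)[which.max(votes)]         # most frequent class (first one on a tie)
}

knn_one(c(1.1, 0.9), train_x, train_y)
knn_one(c(2.9, 3.0), train_x, train_y)
```

The first point sits among the ‘Down’ rows and the second among the ‘Up’ rows, so they are classified accordingly. On a tied vote this sketch simply takes the first class, which is a simplification.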

We will now perform kNN using the knn() function, which is part of the ‘class’ library. This function works rather differently from the other model-fitting functions that we have encountered thus far. Rather than a two-step approach in which we first fit the model and then use the model to make predictions, knn() forms predictions using a single command. The function requires four inputs.

  1. A matrix containing the predictors associated with the training data, labeled ‘train_knn’
  2. A matrix containing the predictors associated with the data for which we wish to make predictions - ‘test_knn’
  3. A vector containing the class labels for the training observations, labeled ‘direction’
  4. A value for K, the number of nearest neighbors to be used by the classifier.

library(class)
train_knn <- Smarket[,2:3]
test_knn <- test[,2:3]
direction <- Smarket$Direction

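We have our inputs, so let us run kNN, which makes the predictions directly. Note that the 10 test rows are also part of the training data, so each test point counts itself among its 3 nearest neighbours.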

knn.pred = knn(train_knn, test_knn, direction, k = 3)

Now let us verify the accuracy of our model using table function to create confusion matrix.

table(test$Direction, knn.pred)
##       knn.pred
##        Down Up
##   Down    1  1
##   Up      0  8

Our accuracy is (1 + 8) / 10 = 90%.
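
All the accuracy figures above come from 10 rows that the models had already seen during fitting. A fairer check is to hold some data back. Here is a minimal sketch, assuming we train on the years before 2005 and test on 2005. The split and the variable names are my own choice, and it is shown for logistic regression with Lag1 and Lag2 only.

```r
library(ISLR)

# Hold out the final year (2005) as test data; fit on the earlier years only
is_train  <- Smarket$Year < 2005
train_set <- Smarket[is_train, ]
held_out  <- Smarket[!is_train, ]

fit     <- glm(Direction ~ Lag1 + Lag2, data = train_set, family = binomial)
prob_up <- predict(fit, newdata = held_out, type = "response")
pred    <- ifelse(prob_up > 0.5, "Up", "Down")

# Share of held-out days classified correctly
held_out_accuracy <- mean(pred == held_out$Direction)
held_out_accuracy
```

The same split can be reused with ‘lda’, ‘qda’ and ‘knn’ by passing ‘train_set’ for fitting and ‘held_out’ for prediction.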