Classification algorithms:

In this blog, I would like to focus on algorithms that are very good at classification. Some of them, like KNN and SVM, can also be used for regression, but they are at their best when predicting categorical values.

What is classification:

If our dependent variable is categorical in nature, i.e., something that takes a discrete set of values or binary values, then the problem is a classification problem. For example: predicting whether it will rain as Yes or No, the result of a coin flip, or categorizing a transaction as fraudulent or genuine.

Types of classification

Within classification, we can broadly subdivide the types into two:

1. Binary classification - where we have only 2 expected result values. Eg: Yes or No, Heads or Tails
2. Multiclass classification - where our dependent variable can contain more than 2 discrete values. Eg: flavours of ice cream (Chocolate, Vanilla, Pistachio etc.)

Logistic regression

Logistic regression is one of the simplest models available. It models the log-odds of the positive class as a linear combination of the features; if the resulting probability is above 50%, the observation is categorized as a positive case, else as a negative one. The formula used for logistic regression is below:

P(y = 1) = 1 / (1 + e^-(b0 + b1*x1 + b2*x2 + ... + bn*xn))

Sample R code (since the iris dataset has three species rather than two, this example uses multinomial logistic regression via nnet::multinom):

library(datasets) # provides the built-in iris dataset
library(nnet)     # provides multinom() for multinomial logistic regression
str(iris)         # inspect the structure of the data
## 'data.frame':  150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
levels(iris$Species) # the three classes we want to predict
## [1] "setosa"     "versicolor" "virginica"
iris$speciesRelevel <- relevel(iris$Species, ref = "setosa") # set "setosa" as the reference (baseline) level
multinom(speciesRelevel ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data = iris) # fit the model on all four features
## # weights:  18 (10 variable)
## initial  value 164.791843 
## iter  10 value 16.177348
## iter  20 value 7.111438
## iter  30 value 6.182999
## iter  40 value 5.984028
## iter  50 value 5.961278
## iter  60 value 5.954900
## iter  70 value 5.951851
## iter  80 value 5.950343
## iter  90 value 5.949904
## iter 100 value 5.949867
## final  value 5.949867 
## stopped after 100 iterations
## Call:
## multinom(formula = speciesRelevel ~ Sepal.Length + Sepal.Width + 
##     Petal.Length + Petal.Width, data = iris)
## 
## Coefficients:
##            (Intercept) Sepal.Length Sepal.Width Petal.Length Petal.Width
## versicolor    18.69037    -5.458424   -8.707401     14.24477   -3.097684
## virginica    -23.83628    -7.923634  -15.370769     23.65978   15.135301
## 
## Residual Deviance: 11.89973 
## AIC: 31.89973
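
The multinom() call above handles all three species at once. For the binary case described earlier, base R's glm() with family = binomial fits a plain logistic regression. Below is a minimal sketch; the isVersicolor column is a helper created just for this illustration:

# Binary logistic regression: versicolor vs. the other two species
iris$isVersicolor <- as.integer(iris$Species == "versicolor")
binFit <- glm(isVersicolor ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
              data = iris, family = binomial)
# Probabilities above 50% are classified as the positive case
predicted <- ifelse(predict(binFit, type = "response") > 0.5, "versicolor", "other")
table(predicted, iris$Species)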

Other classification algorithms include

Naive Bayes

The Naive Bayes algorithm works by calculating the probability of each class and assigning the most probable class to the dependent variable. It is also called a generative algorithm, because it models how the data was generated, i.e., the joint probability of the features and the class, and uses that to calculate the class probabilities. Naive Bayes is very good at NLP tasks, as it works really well with text data.

The formula used for Naive Bayes is below

P(B|A) = ( P(A|B) * P(B) ) / (P(A))
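
As a minimal sketch, assuming the e1071 package is installed (it is not used elsewhere in this post), its naiveBayes() function can be applied to the same iris data:

library(e1071)                                  # provides naiveBayes()
nbModel <- naiveBayes(Species ~ ., data = iris) # estimates per-class feature probabilities
predict(nbModel, head(iris))                    # assigns each row the most probable class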

SVM

Support Vector Machines are considered among the best algorithms for classification, since they are fast at prediction time and deal with outliers well: the decision boundary depends only on the support vectors, the points closest to it. They work on the principle of using the data to construct a hyperplane that separates the data points with the widest possible margin. Any new observation is then checked to see on which side of the hyperplane it lands, and is classified accordingly.
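
Here is a minimal sketch using the svm() function, again assuming the e1071 package:

library(e1071)                            # provides svm()
svmModel <- svm(Species ~ ., data = iris) # constructs the separating boundaries (radial kernel by default)
predict(svmModel, head(iris))             # classifies by which side of the boundary each point falls on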

Decision Trees

Decision Trees work by organising the data into a tree-based structure. Using concepts such as Entropy, Gini impurity and Information Gain, the Decision Tree algorithm evaluates the features and constructs a tree whose root node splits on the feature that best separates the classes, with child nodes splitting on the remaining features, until the leaf nodes represent the dependent variable classes.
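
A minimal sketch, assuming the rpart package, for fitting a classification tree on the same data:

library(rpart)                             # provides rpart() for recursive partitioning
treeModel <- rpart(Species ~ ., data = iris, method = "class")
print(treeModel)                           # shows the splits from the root node down to the leaves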

KNN

KNN works on the nearest-neighbours model. It projects the training data into the feature space, and for any new observation it receives, it calculates the k nearest neighbours using the Euclidean distance formula. Once it identifies the neighbours, it takes a majority vote among them and assigns the winning class to the target variable of the observation.
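
A minimal sketch using knn() from the class package (an assumed dependency); note that knn() predicts directly from the training data, with no separate model-fitting step:

library(class)                                 # provides knn()
set.seed(42)                                   # make the random split reproducible
trainIdx <- sample(1:150, 100)                 # hold out 50 rows as "new" observations
knnPred <- knn(train = iris[trainIdx, 1:4],    # training features
               test  = iris[-trainIdx, 1:4],   # new observations to classify
               cl    = iris$Species[trainIdx], # known classes of the training rows
               k     = 5)                      # majority vote among the 5 nearest neighbours
table(knnPred, iris$Species[-trainIdx])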