In this blog, I would like to focus on algorithms that are very good at classification. Some of them, like KNN and SVM, can also be used for regression, but they are best at predicting categorical values.
If our dependent variable is categorical in nature, i.e., it takes a discrete set of values or just binary values, then we are dealing with a classification problem. For example, predicting whether it will rain as Yes or No, the result of a coin flip, or categorizing a transaction as fraudulent or genuine.
Within classification, we can subdivide the types into two at a broad level:
1. Binary classification: where we have only 2 expected result values, e.g., Yes or No, Heads or Tails.
2. Multi-class classification: here our dependent variable can contain more than 2 discrete values, e.g., flavours of ice cream (Chocolate, Vanilla, Pistachio, etc.).
Logistic regression is one of the simplest models available. It models the log-odds of the positive class as a linear function of the features; if the resulting probability is beyond 50%, the observation is categorized as a positive case, else a negative one. The formula used for logistic regression is below:
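P(y=1) = 1 / (1 + e^(-(b0 + b1*x1 + ... + bn*xn)))
Here b0 is the intercept and b1..bn are the coefficients learned for the features x1..xn; equivalently, the log-odds log(P/(1-P)) equals the linear part b0 + b1*x1 + ... + bn*xn.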
Sample R code (iris has three species, so we fit a multinomial logistic regression with nnet::multinom):
library(datasets)
library(nnet)  # provides multinom() for multinomial logistic regression
str(iris)      # inspect the structure of the iris dataset
## 'data.frame': 150 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
head(iris)  # preview the first six rows
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
levels(iris$Species)  # the three species classes
## [1] "setosa" "versicolor" "virginica"
# make "setosa" the reference level for the outcome
iris$speciesRelevel <- relevel(iris$Species, ref = "setosa")
# fit a multinomial logistic regression on all four features
multinom(speciesRelevel ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data = iris)
## # weights: 18 (10 variable)
## initial value 164.791843
## iter 10 value 16.177348
## iter 20 value 7.111438
## iter 30 value 6.182999
## iter 40 value 5.984028
## iter 50 value 5.961278
## iter 60 value 5.954900
## iter 70 value 5.951851
## iter 80 value 5.950343
## iter 90 value 5.949904
## iter 100 value 5.949867
## final value 5.949867
## stopped after 100 iterations
## Call:
## multinom(formula = speciesRelevel ~ Sepal.Length + Sepal.Width +
## Petal.Length + Petal.Width, data = iris)
##
## Coefficients:
## (Intercept) Sepal.Length Sepal.Width Petal.Length Petal.Width
## versicolor 18.69037 -5.458424 -8.707401 14.24477 -3.097684
## virginica -23.83628 -7.923634 -15.370769 23.65978 15.135301
##
## Residual Deviance: 11.89973
## AIC: 31.89973
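To actually use the fitted model, we can store it in a variable and call predict(). A quick sketch (refitting the same model as above) is below:

fit <- multinom(speciesRelevel ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data = iris)
predict(fit, newdata = head(iris), type = "class")  # predicted class labels
predict(fit, newdata = head(iris), type = "probs")  # class membership probabilities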
The Naive Bayes algorithm works by calculating the probability of each class given the observed features and assigning the most probable class to the dependent variable. It is also called a generative algorithm because, rather than learning the decision boundary directly, it models how the data is generated: it learns P(features|class) and P(class) from the training data and combines them via Bayes' theorem. Naive Bayes is very good at NLP, as it works really well with high-dimensional text data such as word counts.
The formula used for Naive Bayes is Bayes' theorem, where B is the class and A is the observed data:
P(B|A) = ( P(A|B) * P(B) ) / P(A)
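A minimal sketch of Naive Bayes in R on the same iris data, assuming the e1071 package is installed (that package and its naiveBayes() function are the only additions beyond what we used above):

library(e1071)  # provides naiveBayes()
nb_model <- naiveBayes(Species ~ ., data = iris)  # learns P(features|class) and P(class)
predict(nb_model, newdata = head(iris))           # assigns the most probable class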
Support Vector Machines (SVM) is considered one of the best algorithms for classification: it is fast at prediction time and deals with outliers well, since only the support vectors (the points closest to the boundary) determine the decision surface. It works on the principle of using the data to construct a hyperplane that separates the data points with the maximum possible margin. Any new observation is then checked to see on which side of the hyperplane it lands, and it is classified accordingly.
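A minimal SVM sketch on the iris data, again assuming the e1071 package is available:

library(e1071)  # provides svm()
svm_model <- svm(Species ~ ., data = iris)  # default radial kernel constructs the separating surface
predict(svm_model, newdata = head(iris))    # classify by which side of the boundary each point lands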
Decision Trees work by organising the data into a tree-based structure. Using concepts such as Entropy, Gini impurity and Information Gain, the Decision Tree algorithm evaluates the features and constructs a tree whose root node is the split that best separates the classes, with child nodes splitting on further features, until the leaf nodes represent the dependent variable classes.
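A minimal decision tree sketch, assuming the rpart package is available (for classification it chooses splits using the Gini index by default):

library(rpart)  # provides rpart()
tree_model <- rpart(Species ~ ., data = iris, method = "class")  # grow a classification tree
print(tree_model)  # shows the root split and child nodes down to the leaf classes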
KNN works on the nearest-neighbours model. It treats each observation as a point in feature space; when a new observation is received, it calculates the distance to every training point, typically using the Euclidean distance formula, and identifies the k nearest neighbours. It then takes the majority class among those neighbours and assigns that value to the target variable of the new observation.
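A minimal KNN sketch using the class package (assumed installed); the 100-row training split and k = 3 are arbitrary choices for illustration:

library(class)  # provides knn()
set.seed(42)                           # reproducible train/test split
train_idx <- sample(nrow(iris), 100)
train <- iris[train_idx, 1:4]          # feature columns only
test  <- iris[-train_idx, 1:4]
knn(train, test, cl = iris$Species[train_idx], k = 3)  # majority vote of the 3 nearest neighbours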