Naive Bayes Vs Logistic Regression Vs Support Vector Machine: Python + Rstudio
- 1 Import Python packages
- 2 Naive Bayes Theorem: Basic Idea in natural language processing (NLP) problems.
- 3 A Practical Example
- 4 Text Before Cleansing
- 5 Text After Cleansing
- 6 Split Dataset into Train-set and Test-set
- 7 Naive Bayes Classifier for Multinomial Models
- 8 Logistic Regression
- 9 Linear Support Vector Machine
1 Import Python packages
import nltk
import numpy as np
import scipy as sc
import matplotlib.pyplot as plt
from nltk.corpus import stopwords                                    # stop word lists
from nltk.corpus import movie_reviews                                # labelled movie review corpus
from nltk.tokenize import word_tokenize, sent_tokenize               # word and sentence tokenizers
from sklearn.svm import SVC, LinearSVC, NuSVC                        # support vector machine classifiers
from sklearn.naive_bayes import MultinomialNB, BernoulliNB           # Naive Bayes classifiers
from sklearn.linear_model import LogisticRegression, SGDClassifier   # linear classifiers
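The NLTK resources imported above are distributed separately from the package itself. If they are not already installed locally, a one-time download step along the following lines is typically needed (the resource names are the standard NLTK identifiers):
# One-time download of the NLTK data used later in this post
nltk.download('stopwords')      # stop word lists for nltk.corpus.stopwords
nltk.download('movie_reviews')  # labelled movie review corpus
nltk.download('punkt')          # tokenizer models used by word_tokenize / sent_tokenize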
2 Naive Bayes Theorem: Basic Idea in natural language processing (NLP) problems.
\[\begin{equation} \textbf{P(Tag}|\textbf{Sentence)} = \textbf{P(Tag)} \frac{\textbf{P(Sentence} |\textbf{Tag})}{\textbf{P(Sentence)}} \end{equation}\]
\[\mbox{Posterior}= \mbox{Prior}\,\frac{\mbox{Likelihood}}{\mbox{Evidence}}\]
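To see how the pieces of the formula fit together, here is a minimal numerical sketch for a two-tag sentiment problem; the prior and likelihood values are invented purely for illustration.
# Hypothetical quantities: prior P(Tag), likelihood P(Sentence|Tag), evidence P(Sentence)
p_tag = {'pos': 0.5, 'neg': 0.5}                      # prior P(Tag)
p_sentence_given_tag = {'pos': 0.012, 'neg': 0.003}   # likelihood P(Sentence|Tag)

# Evidence P(Sentence) = sum over tags of P(Sentence|Tag) * P(Tag)
p_sentence = sum(p_sentence_given_tag[t] * p_tag[t] for t in p_tag)

# Posterior P(Tag|Sentence) = P(Tag) * P(Sentence|Tag) / P(Sentence)
posterior = {t: p_tag[t] * p_sentence_given_tag[t] / p_sentence for t in p_tag}
print(posterior)   # {'pos': 0.8, 'neg': 0.2} for these made-up numbers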
2.1 Bayes’ Theorem for Naive Bayes Algorithm
The basic idea of using the Naive Bayes algorithm for a machine learning classification problem is as follows. Suppose that a business problem has multiple classes, say \(C_1, C_2, \ldots, C_h\). The Naive Bayes algorithm computes the conditional probability that an object with feature vector \(x_1, x_2,\ldots, x_m\) belongs to a particular class \(C_i\),
\[\displaystyle P(C_i|x_1, x_2,\ldots, x_m)=\frac{P(x_1, x_2,\ldots, x_m|C_i)\cdot P(C_i)}{P(x_1, x_2,\ldots, x_m)}\]
The main assumption of the Naive Bayes algorithm is that the features are conditionally independent given the class. Therefore, the conditional probability term \(P(x_j|x_{j+1},\ldots, x_m, C_i)\) reduces to \(P(x_j|C_i)\). Then
\[\displaystyle P(C_i|x_1, x_2,\ldots, x_m)=\left(\prod_{j=1}^{m}P(x_j|C_i)\right)\cdot\frac{P(C_i)}{P(x_1, x_2,\ldots, x_m)}\]
Since \(P(x_1, x_2,\ldots, x_m)\) is the same scaling factor for every class, the above expression simplifies to \[\displaystyle P(C_i|x_1, x_2,\ldots, x_m)\propto\left(\prod_{j=1}^{m}P(x_j|C_i)\right)\cdot P(C_i)\] for \(1\leq i\leq h\).
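As a small illustration of this proportionality, the sketch below scores each class by \(\log P(C_i)+\sum_{j} \log P(x_j|C_i)\), i.e. the log of the product, which avoids numerical underflow, for a toy three-feature example; all probabilities are hypothetical.
import math

# Hypothetical class priors P(C_i) and per-feature likelihoods P(x_j|C_i)
priors = {'C1': 0.6, 'C2': 0.4}
likelihoods = {
    'C1': [0.20, 0.05, 0.10],   # P(x_1|C1), P(x_2|C1), P(x_3|C1)
    'C2': [0.10, 0.15, 0.05],   # P(x_1|C2), P(x_2|C2), P(x_3|C2)
}

# Score each class by log P(C_i) + sum_j log P(x_j|C_i); the evidence term
# P(x_1,...,x_m) is dropped because it is the same for every class
scores = {c: math.log(priors[c]) + sum(math.log(p) for p in likelihoods[c])
          for c in priors}
predicted = max(scores, key=scores.get)
print(predicted)   # the class with the highest unnormalised posterior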