2 Naive Bayes Theorem: Basic Idea

Bayes' theorem can be used for classification tasks in natural language processing (NLP) problems, for example, to compute the probability that a tag applies to a given sentence:

\[\begin{equation} \textbf{P(Tag}|\textbf{Sentence)} = \textbf{P(Tag)} \frac{\textbf{P(Sentence} |\textbf{Tag})}{\textbf{P(Sentence)}} \end{equation}\]

\[\mbox{Posterior}= \mbox{Proposition prior probability}\;\frac{\mbox{Likelihood}}{\mbox{Evidence prior probability}}\]
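As a quick worked example with purely hypothetical numbers, suppose \(P(\mbox{Tag}) = 0.2\), \(P(\mbox{Sentence}|\mbox{Tag}) = 0.05\) and \(P(\mbox{Sentence}) = 0.02\). Then

\[P(\mbox{Tag}|\mbox{Sentence}) = 0.2\times\frac{0.05}{0.02} = 0.5\]

so the tag would apply to that sentence with posterior probability 0.5.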

2.1 Bayes’ Theorem for Naive Bayes Algorithm

The basic idea of how the Naive Bayes algorithm is used for a machine learning classification problem is as follows. Suppose that a business problem has multiple classes, say \(C_1, C_2, \ldots, C_h\). The Naive Bayes algorithm is used to compute the conditional probability that an object with feature vector \((x_1, x_2,\ldots, x_m)\) belongs to a particular class \(C_i\):

\[\displaystyle P(C_i|x_1, x_2,\ldots, x_m)=\frac{P(x_1, x_2,\ldots, x_m|C_i)\cdot P(C_i)}{P(x_1, x_2,\ldots, x_m)}\]
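Before applying the Naive Bayes assumption, note that the class-conditional likelihood in the numerator can always be expanded by the chain rule of probability:

\[P(x_1, x_2,\ldots, x_m|C_i)=\prod_{j=1}^{m}P(x_j|x_{j+1},\ldots, x_m, C_i)\]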

The main assumption of the Naive Bayes algorithm is that the features are mutually independent given the class. Under this assumption, each conditional term \(P(x_j|x_{j+1},\ldots, x_m, C_i)\) in the expansion above reduces to \(P(x_j|C_i)\). Then

\[\displaystyle P(C_i|x_1, x_2,\ldots, x_m)=\left(\prod_{j=1}^{m}P(x_j|C_i)\right)\cdot\frac{P(C_i)}{P(x_1, x_2,\ldots, x_m)}\]

Because \(P(x_1, x_2,\ldots, x_m)\) acts as the same scaling constant for every class, the above expression simplifies to \[\displaystyle P(C_i|x_1, x_2,\ldots, x_m)\propto\left(\prod_{j=1}^{m}P(x_j|C_i)\right)\cdot P(C_i)\] for \(1\leq i\leq h\).
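In practice, an object is then assigned to the class \(C_i\) for which this product is largest. The sketch below is a minimal Python illustration of that rule, assuming a tiny hypothetical categorical data set and simple relative-frequency estimates of \(P(C_i)\) and \(P(x_j|C_i)\); it is not a production implementation (no smoothing, no log probabilities).

```python
from collections import Counter, defaultdict

def train_naive_bayes(X, y):
    """Estimate class priors P(C_i) and per-feature likelihoods P(x_j | C_i)
    from categorical data using relative-frequency counts (no smoothing)."""
    n = len(y)
    priors = {c: count / n for c, count in Counter(y).items()}
    # likelihoods[c][j][v] will hold P(x_j = v | C_i = c)
    likelihoods = defaultdict(lambda: defaultdict(Counter))
    for xs, c in zip(X, y):
        for j, v in enumerate(xs):
            likelihoods[c][j][v] += 1
    for c in likelihoods:
        for j in likelihoods[c]:
            total = sum(likelihoods[c][j].values())
            for v in likelihoods[c][j]:
                likelihoods[c][j][v] /= total
    return priors, likelihoods

def predict(priors, likelihoods, xs):
    """Return the class maximizing P(C_i) * prod_j P(x_j | C_i); the common
    denominator P(x_1, ..., x_m) is dropped, exactly as in the text."""
    best_class, best_score = None, -1.0
    for c, prior in priors.items():
        score = prior
        for j, v in enumerate(xs):
            score *= likelihoods[c][j].get(v, 0.0)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Hypothetical toy data: each row is a feature vector (x_1, x_2) with its class label.
X = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "hot")]
y = ["no", "yes", "yes", "no"]

priors, likelihoods = train_naive_bayes(X, y)
print(predict(priors, likelihoods, ("sunny", "mild")))  # -> "yes"
```

With many features, multiplying many small probabilities can underflow; summing log-probabilities instead of multiplying raw probabilities is the usual remedy.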