This note summarizes the theory of Naive Bayes, a classic classification algorithm. Please check the original sources for the details: Source 1; Source 2; Source 3; Source 4.

Usage of classification

Basic concepts

  1. Tokenization: the process of chopping up sentences into smaller pieces (words or tokens)
  1. Normalization: reduce each word to its base/stem form
  1. Bag-of-words: treat each word as an individual feature, ignoring word order, and give each word a specific subjectivity score.

  2. Classifier evaluation: to determine the accuracy of a single classifier, or to compare the results of different classifiers, the F-score and accuracy are usually used.

F = 2pr / (p + r)

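where p is the precision and r is the recall. Precision indicates how many selected items are relevant: p = TP / (TP + FP). Recall indicates how many relevant items are selected: r = TP / (TP + FN). Accuracy = (TP + TN) / (TP + FP + FN + TN). Here TP, FP, FN and TN are the numbers of true positives, false positives, false negatives and true negatives.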
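As a quick check of the formulas above, here is a minimal Python sketch that computes the four metrics from confusion-matrix counts. The function name `classification_metrics` and the example counts are made up for illustration, and returning 0.0 when a denominator is zero is a convention chosen here, not something the sources prescribe.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (precision, recall, F-score, accuracy) from confusion-matrix counts."""
    # Guard against empty denominators; 0.0 is an assumed convention.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_score = 2 * precision * recall / (precision + recall)
    else:
        f_score = 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return precision, recall, f_score, accuracy


# Hypothetical counts, chosen only to exercise the formulas.
p, r, f, acc = classification_metrics(tp=8, fp=2, fn=4, tn=6)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f} accuracy={acc:.3f}")
```

The F-score is the harmonic mean of precision and recall, so it stays low unless both are reasonably high.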

Naive Bayes

In reality, we have to predict an outcome given multiple features, and the math gets very complicated. To get around that complication, one approach is to ‘uncouple’ the features and treat each one as independent. The underlying ("naive") assumption is that all features are independent of each other given the class.
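Written out, the independence assumption turns Bayes' rule into a simple product. The symbols are introduced here for illustration: c is a class and x_1, ..., x_n are the feature values.

```latex
P(c \mid x_1, \dots, x_n)
  = \frac{P(c)\, P(x_1, \dots, x_n \mid c)}{P(x_1, \dots, x_n)}
  \;\propto\; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

The denominator is the same for every class, so it can be dropped when the goal is only to pick the most probable class.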

Pros:

Cons:

Solutions:

Evaluate the correlation of attributes pairwise with each other using a correlation matrix and remove those features that are the most highly correlated.
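The pairwise check could be sketched as follows. This is only one possible reading of the idea: the helper names `pearson` and `select_uncorrelated`, the 0.9 threshold, and the toy columns are all assumptions made for illustration. It uses the Pearson correlation and keeps a feature only if it is not strongly correlated with any feature already kept; constant columns (zero variance) are not handled.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)  # assumes neither column is constant


def select_uncorrelated(columns, threshold=0.9):
    """Return the feature names to keep and the full correlation matrix."""
    names = list(columns)
    # Correlation matrix stored as a dict keyed by (feature, feature).
    corr = {(p, q): pearson(columns[p], columns[q]) for p in names for q in names}
    kept = []
    for name in names:
        # Drop a feature if it is highly correlated with one already kept.
        if all(abs(corr[(name, k)]) < threshold for k in kept):
            kept.append(name)
    return kept, corr


# Toy data: "b" is almost exactly 2 * "a", while "c" is unrelated to both.
features = {
    "a": [1, 2, 3, 4, 5, 6],
    "b": [2.0, 4.1, 5.9, 8.2, 9.9, 12.1],
    "c": [5, 1, 4, 2, 6, 3],
}
kept, corr = select_uncorrelated(features, threshold=0.9)
print(kept)
```

Which member of a correlated pair gets dropped depends on the column order here; in practice that choice may deserve more thought.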

An example:

Summary: the posterior probability of a class is proportional to the prior probability of that class multiplied by the probability of each feature given that class.
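To make the summary concrete, here is a minimal from-scratch sketch of a word-count Naive Bayes text classifier. It is not the code linked from this note. The class name `TinyNaiveBayes`, the crude regex tokenizer, the toy sentences, and the use of add-one (Laplace) smoothing are all assumptions added for illustration. Log-probabilities are summed instead of multiplying probabilities, to avoid numerical underflow.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens (a very crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


class TinyNaiveBayes:
    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # number of documents per class
        self.word_counts = defaultdict(Counter)  # bag-of-words counts per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def log_scores(self, doc):
        """Unnormalized log posterior: log prior + sum of log likelihoods."""
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior of the class
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(doc):
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word]
                # Add-one smoothing (an assumption) avoids zero probabilities.
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            scores[label] = score
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


# Toy training data, invented for this sketch.
docs = [
    "great movie loved it",
    "wonderful acting great plot",
    "terrible movie hated it",
    "boring plot awful acting",
]
labels = ["pos", "pos", "neg", "neg"]

model = TinyNaiveBayes().fit(docs, labels)
print(model.predict("great wonderful movie"))
print(model.predict("boring awful plot"))
```

The scores are only proportional to the posterior, which is enough for choosing the most likely class.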

Implement Naive Bayes

Please check the code here