This note summarizes the theory of Naive Bayes, a classic classification algorithm. Please check the original sources for the details: source 1; source 2; source 3; source 4
Uses of classification
Basic concepts
Bag-of-words: take individual features (here, words) into account and give each word a specific subjectivity score (see the sketch after this list).
Classifier evaluation: to determine the accuracy of a single classifier, or to compare the results of different classifiers, the F-score and accuracy are usually used.
F = 2pr / (p + r)
where p is the precision and r is the recall. p indicates how many selected items are relevant: p = TP / (TP + FP). r indicates how many relevant items are selected: r = TP / (TP + FN). Accuracy = (TP + TN) / (TP + FP + FN + TN).
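As a quick sanity check of these formulas, here is a minimal sketch in Python; the confusion-matrix counts are hypothetical:

```python
# Hypothetical confusion-matrix counts for a single classifier.
TP, FP, FN, TN = 40, 10, 5, 45

precision = TP / (TP + FP)   # how many selected items are relevant
recall = TP / (TP + FN)      # how many relevant items are selected
f_score = 2 * precision * recall / (precision + recall)
accuracy = (TP + TN) / (TP + FP + FN + TN)

print(f"p={precision:.3f} r={recall:.3f} F={f_score:.3f} accuracy={accuracy:.3f}")
```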
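And the bag-of-words sketch promised above: the representation simply counts each word and discards word order (the sentence is made up for illustration):

```python
import re
from collections import Counter

def bag_of_words(text):
    # Lowercase and keep word characters only; word order is discarded.
    return Counter(re.findall(r"[a-z']+", text.lower()))

print(bag_of_words("A good movie. A really, really good movie!"))
# Counter({'a': 2, 'good': 2, 'movie': 2, 'really': 2})
```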
Naive Bayes
In reality, we have to predict an outcome given multiple features, and the math for the joint probability quickly gets complicated. To get around that complication, one approach is to 'uncouple' the features and treat each one as independent of the others. This conditional-independence assumption is what makes the method 'naive'.
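Written out with Bayes' rule, for a class c and features x_1, …, x_n, this assumption gives:

```latex
P(c \mid x_1, \dots, x_n)
  \propto P(c)\, P(x_1, \dots, x_n \mid c)
  = P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

The denominator P(x_1, …, x_n) is the same for every class, so it can be dropped when comparing classes; the last equality is exactly the independence assumption.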
Pros: simple and fast to train, and it works reasonably well even with small training sets and high-dimensional features such as bag-of-words counts.
Cons: the independence assumption rarely holds in practice; strongly correlated features are effectively counted more than once, which skews the estimated probabilities.
Solutions:
Evaluate the correlation of attributes pairwise using a correlation matrix and remove the features that are the most highly correlated, as in the sketch below.
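A minimal sketch of that pruning step with pandas; the 0.9 threshold is an assumption for illustration:

```python
import numpy as np
import pandas as pd

def drop_correlated(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop one column from every pair whose absolute correlation exceeds threshold."""
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)
```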
An example:
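Suppose (with hypothetical numbers) a spam filter has learned P(spam) = P(ham) = 0.5, P(free | spam) = 0.3, P(free | ham) = 0.01, P(meeting | spam) = 0.02, and P(meeting | ham) = 0.2. For a message containing both "free" and "meeting":

```latex
\begin{aligned}
P(\text{spam}) \, P(\text{free} \mid \text{spam}) \, P(\text{meeting} \mid \text{spam})
  &= 0.5 \times 0.3 \times 0.02 = 0.003 \\
P(\text{ham}) \, P(\text{free} \mid \text{ham}) \, P(\text{meeting} \mid \text{ham})
  &= 0.5 \times 0.01 \times 0.2 = 0.001
\end{aligned}
```

Since 0.003 > 0.001, the message is classified as spam.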
Summary: the posterior probability of a class is proportional to the prior probability of the class multiplied by the probability of each feature given that class.
Implement Naive Bayes
Please check the code here
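The linked code is not reproduced here; as a rough stand-in, here is a minimal multinomial Naive Bayes sketch with Laplace smoothing (class and function names are my own, and the toy data is hypothetical):

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Minimal multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, documents, labels):
        self.word_counts = defaultdict(Counter)   # word counts per class
        self.class_counts = Counter(labels)       # document counts per class
        for words, label in zip(documents, labels):
            self.word_counts[label].update(words)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, words):
        n_docs = sum(self.class_counts.values())
        scores = {}
        for label, count in self.class_counts.items():
            # Log prior, then one log likelihood per word (the 'naive' product).
            score = math.log(count / n_docs)
            total = sum(self.word_counts[label].values())
            for w in words:
                # Laplace smoothing avoids zero probabilities for unseen words.
                score += math.log((self.word_counts[label][w] + 1) /
                                  (total + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)

# Hypothetical toy data.
docs = [["free", "money"], ["meeting", "schedule"], ["free", "offer"]]
labels = ["spam", "ham", "spam"]
model = NaiveBayes().fit(docs, labels)
print(model.predict(["free", "meeting"]))  # -> "spam"
```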