This note summarizes the theory of Naive Bayes, a classic classification algorithm. Please check the original sources for the details: source 1; source 2; source 3; source 4
Uses of classification
Basic concepts
Bag-of-words: take individual features (here, words) into account and give each word a specific subjectivity score (see the sketch after this list).
Classifier evaluation: to determine the accuracy of a single classifier, or to compare the results of different classifiers, the F-score and accuracy are usually used.
F = 2pr / (p + r)
where p is the precision and r is the recall. p indicates how many selected items are relevant: p = TP / (TP + FP). r indicates how many relevant items are selected: r = TP / (TP + FN). Accuracy = (TP + TN) / (TP + FP + FN + TN).
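As a quick sanity check of these formulas, here is a minimal sketch in Python; the confusion-matrix counts are hypothetical:

```python
# Hypothetical confusion-matrix counts for a single classifier.
TP, FP, FN, TN = 40, 10, 5, 45

precision = TP / (TP + FP)   # how many selected items are relevant
recall = TP / (TP + FN)      # how many relevant items are selected
f_score = 2 * precision * recall / (precision + recall)
accuracy = (TP + TN) / (TP + FP + FN + TN)

print(f"p={precision:.3f} r={recall:.3f} F={f_score:.3f} accuracy={accuracy:.3f}")
```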
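And the bag-of-words sketch promised above: the representation simply counts each word and discards word order (the sentence is made up for illustration):

```python
import re
from collections import Counter

def bag_of_words(text):
    # Lowercase and keep word characters only; word order is discarded.
    return Counter(re.findall(r"[a-z']+", text.lower()))

print(bag_of_words("A good movie. A really, really good movie!"))
# Counter({'a': 2, 'good': 2, 'movie': 2, 'really': 2})
```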
Naive Bayes
In reality, we have to predict an outcome given multiple features, and the math for the joint probability quickly gets complicated. To get around that complication, one approach is to 'uncouple' the features and treat each one as independent of the others. This conditional-independence assumption is what makes the method 'naive'.
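Written out with Bayes' rule, for a class c and features x_1, …, x_n, this assumption gives:

```latex
P(c \mid x_1, \dots, x_n)
  \propto P(c)\, P(x_1, \dots, x_n \mid c)
  = P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

The denominator P(x_1, …, x_n) is the same for every class, so it can be dropped when comparing classes; the last equality is exactly the independence assumption.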
Pros: simple and fast to train, and it works reasonably well even with small training sets and high-dimensional features such as bag-of-words counts.
Cons: the independence assumption rarely holds in practice; strongly correlated features are effectively counted more than once, which skews the estimated probabilities.
Solutions:
Evaluate the correlation of attributes pairwise using a correlation matrix and remove the features that are the most highly correlated, as in the sketch below.
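A minimal sketch of that pruning step with pandas; the 0.9 threshold is an assumption for illustration:

```python
import numpy as np
import pandas as pd

def drop_correlated(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop one column from every pair whose absolute correlation exceeds threshold."""
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)
```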
An example:
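Suppose (with hypothetical numbers) a spam filter has learned P(spam) = P(ham) = 0.5, P(free | spam) = 0.3, P(free | ham) = 0.01, P(meeting | spam) = 0.02, and P(meeting | ham) = 0.2. For a message containing both "free" and "meeting":

```latex
\begin{aligned}
P(\text{spam}) \, P(\text{free} \mid \text{spam}) \, P(\text{meeting} \mid \text{spam})
  &= 0.5 \times 0.3 \times 0.02 = 0.003 \\
P(\text{ham}) \, P(\text{free} \mid \text{ham}) \, P(\text{meeting} \mid \text{ham})
  &= 0.5 \times 0.01 \times 0.2 = 0.001
\end{aligned}
```

Since 0.003 > 0.001, the message is classified as spam.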
Summary: the posterior probability of a class is proportional to the prior probability of the class multiplied by the probability of each feature given that class.
Implement Naive Bayes
Please check the code here
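The linked code is not reproduced here; as a rough stand-in, here is a minimal multinomial Naive Bayes sketch with Laplace smoothing (class and function names are my own, and the toy data is hypothetical):

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Minimal multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, documents, labels):
        self.word_counts = defaultdict(Counter)   # word counts per class
        self.class_counts = Counter(labels)       # document counts per class
        for words, label in zip(documents, labels):
            self.word_counts[label].update(words)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, words):
        n_docs = sum(self.class_counts.values())
        scores = {}
        for label, count in self.class_counts.items():
            # Log prior, then one log likelihood per word (the 'naive' product).
            score = math.log(count / n_docs)
            total = sum(self.word_counts[label].values())
            for w in words:
                # Laplace smoothing avoids zero probabilities for unseen words.
                score += math.log((self.word_counts[label][w] + 1) /
                                  (total + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)

# Hypothetical toy data.
docs = [["free", "money"], ["meeting", "schedule"], ["free", "offer"]]
labels = ["spam", "ham", "spam"]
model = NaiveBayes().fit(docs, labels)
print(model.predict(["free", "meeting"]))  # -> "spam"
```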