My Capstone Project

Fernando Crema
April 2015

The Problem

The goal of this project is to complete sentences by predicting the next word from the words that precede it.

  1. We need an incomplete sentence as input.
  2. Where do we learn?

    • Input from Twitter, Blogs and News provided by the instructors.
  3. Used library(e1071) for the Naive Bayes classifier.

  4. Natural Language Processing:

    • Used the tm library for corpus handling and text cleaning.
    • Used RWeka to tokenize the input and generate n-grams (see the sketch after this list).
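
A minimal sketch of this preprocessing step is shown below. It assumes the three corpus files sit in the working directory; the file names, the 1% sample size, and the cleaning steps are illustrative assumptions, not details taken from the slides.

    library(tm)
    library(RWeka)

    set.seed(2015)

    # Read a small random sample of each file; the full corpus is too large.
    read_sample <- function(path, fraction = 0.01) {
      lines <- readLines(path, encoding = "UTF-8", skipNul = TRUE)
      sample(lines, size = floor(length(lines) * fraction))
    }

    corpus_text <- c(read_sample("en_US.twitter.txt"),
                     read_sample("en_US.blogs.txt"),
                     read_sample("en_US.news.txt"))

    # Basic cleaning with tm, then 2-gram tokenization with RWeka.
    corpus <- VCorpus(VectorSource(corpus_text))
    corpus <- tm_map(corpus, content_transformer(tolower))
    corpus <- tm_map(corpus, removePunctuation)
    corpus <- tm_map(corpus, removeNumbers)
    corpus <- tm_map(corpus, stripWhitespace)

    bigram_tokenizer <- function(x) NGramTokenizer(x, Weka_control(min = 2, max = 2))
    bigrams <- unlist(lapply(corpus, function(doc) bigram_tokenizer(content(doc))))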

The Algorithm: Naive Bayes Classifier

We need to calculate the probability of the n-th word given the n-1 previous words, so from the corpus we build a set of n-grams; the 2-gram case conditions on just the single previous word.

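As a reference, the 2-gram case can be written as the standard maximum-likelihood estimate below; the formula is stated here for clarity and is not copied from the original slides.

    P(w_n \mid w_{n-1}) = \frac{\mathrm{count}(w_{n-1}\, w_n)}{\mathrm{count}(w_{n-1})}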

  1. Once we have the n-grams, we build a table of frequencies.
  2. Using the frequencies, we can obtain all the conditional probabilities we need.
  3. Among all candidate words, we choose the one with the maximum conditional probability given the preceding words (see the sketch after this list).
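
The sketch below shows how the frequency table, the conditional probabilities and the argmax step could look in R. The bigrams vector is assumed to come from the earlier preprocessing sketch, and the helper name top_candidates is hypothetical.

    # Split each bigram "w1 w2" into its two words.
    parts <- strsplit(bigrams, " ", fixed = TRUE)
    bigram_df <- data.frame(
      w1 = vapply(parts, `[`, character(1), 1),
      w2 = vapply(parts, `[`, character(1), 2),
      stringsAsFactors = FALSE
    )

    # Frequency table of (w1, w2) pairs and of w1 alone.
    pair_counts <- aggregate(list(n = rep(1, nrow(bigram_df))),
                             by = bigram_df[c("w1", "w2")], FUN = sum)
    w1_counts <- aggregate(list(total = pair_counts$n),
                           by = pair_counts["w1"], FUN = sum)

    # Conditional probability P(w2 | w1) = count(w1 w2) / count(w1).
    probs <- merge(pair_counts, w1_counts, by = "w1")
    probs$p <- probs$n / probs$total

    # For a given previous word, pick the candidates with the highest probability.
    top_candidates <- function(prev, k = 5) {
      rows <- probs[probs$w1 == prev, ]
      head(rows[order(-rows$p), c("w2", "p")], k)
    }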


The Flow


  1. We use data from Twitter, Blogs and News.
  2. We sample the data because the full corpus is too large.
  3. We apply Natural Language Processing techniques such as n-gram tokenization.
  4. We run Naive Bayes classification using the n-grams as input (the sketch below ties these steps together).
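
The whole flow can be wrapped in one prediction function. This is a sketch under the assumption that probs and top_candidates exist as in the previous sketch; predict_next_word is a hypothetical helper, not the function used in the project.

    # Clean the input sentence the same way as the corpus, take its last
    # word, and look up the most likely next words.
    predict_next_word <- function(sentence, k = 5) {
      cleaned <- tolower(gsub("[[:punct:][:digit:]]", "", sentence))
      words <- strsplit(trimws(cleaned), "\\s+")[[1]]
      if (length(words) == 0) return(NULL)
      top_candidates(words[length(words)], k)
    }

    predict_next_word("I would like to")   # top 5 candidate next words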

The APP

http://belgrades.shinyapps.io/data_product/

  1. Basic input of a sentence.
  2. A Submit button to run the prediction.
  3. On the left, an explanation of the problem and the algorithm.
  4. An image of the flow used to build the whole application.
  5. Output: the top 5 probabilities given the input (a minimal interface sketch follows this list).
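
A minimal Shiny sketch of how such an interface could be wired up is shown below; the layout and the predict_next_word helper are assumptions for illustration, not the code behind the deployed app.

    library(shiny)

    ui <- fluidPage(
      sidebarLayout(
        sidebarPanel(
          helpText("Type an incomplete sentence and press Submit to predict the next word.")
        ),
        mainPanel(
          textInput("sentence", "Sentence:"),
          actionButton("submit", "Submit"),
          tableOutput("top5")
        )
      )
    )

    server <- function(input, output) {
      # Recompute the top 5 candidates only when the Submit button is pressed.
      output$top5 <- renderTable({
        input$submit
        isolate(predict_next_word(input$sentence, k = 5))
      })
    }

    shinyApp(ui = ui, server = server)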
