Coursera Capstone Project
Claudia V
August 2015
The objective of this project is to build a text-prediction application from a corpus of news articles, Twitter posts and blogs. To make it work, we created frequency files for the unigrams, bigrams, trigrams and quadrigrams that appear in the corpus.
An example of a bigram file looks like:
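As an illustration only, the sketch below counts bigrams in a made-up three-sentence corpus. The sentences, the function name `ngram_counts` and the lowercase/whitespace tokenization are assumptions chosen for the example; they are not taken from the project's corpus or code.

```python
from collections import Counter

def ngram_counts(sentences, n):
    """Count every run of n consecutive words across the sentences."""
    counts = Counter()
    for sentence in sentences:
        # Assumed tokenization: lowercase, split on whitespace.
        words = sentence.lower().split()
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts

# Hypothetical toy corpus, invented for this example.
toy_corpus = [
    "i love new york",
    "i love new music",
    "new york is big",
]

bigrams = ngram_counts(toy_corpus, 2)
for gram, freq in bigrams.most_common():
    print(" ".join(gram), freq)
```

Each printed row pairs a bigram with its frequency. Calling the same function with n set to 1, 3 or 4 would give the unigram, trigram and quadrigram tables.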
Once we have all the words and their frequencies, we can calculate each word's probability as a Maximum Likelihood Estimate (MLE). This estimate is used only for unigrams.
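For a unigram, the MLE is the word's count divided by the total number of words: P(w) = count(w) / N. A minimal sketch follows, assuming a made-up word list; the name `unigram_mle` is hypothetical.

```python
from collections import Counter

def unigram_mle(words):
    """Return P(w) = count(w) / N for every distinct word."""
    counts = Counter(words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

# Hypothetical toy data, invented for this example.
toy_words = "the cat sat on the mat".split()
probs = unigram_mle(toy_words)
print(probs["the"])  # "the" appears 2 times out of 6 words
```

The probabilities sum to 1 over the vocabulary, but any word that never appears in the corpus gets probability 0.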
MLE is a good approach but fails to deliver the best results. To improve them, we use linear interpolation to calculate each n-gram probability and then add that probability to a lookup table.
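Linear interpolation mixes the MLEs of several n-gram orders into one weighted sum. The sketch below shows the idea for orders up to trigrams on a made-up word list. The weights in `lambdas`, the helper names and the toy data are all assumptions; the weights actually used in the project are not stated here.

```python
from collections import Counter

def build_counts(words, max_n):
    """counts[n] maps each n-word tuple to its frequency, for n = 1..max_n."""
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            counts[n][tuple(words[i:i + n])] += 1
    return counts

def mle(counts, total_words, context, word):
    """MLE of P(word | context); an empty context gives the unigram MLE."""
    if not context:
        return counts[1][(word,)] / total_words
    context_count = counts[len(context)][context]
    if context_count == 0:
        return 0.0
    return counts[len(context) + 1][context + (word,)] / context_count

def interpolated(counts, total_words, context, word, lambdas):
    """Weighted sum of MLEs; lambdas[k] weights the estimate using k context words.

    Assumes len(context) >= len(lambdas) - 1 and that the weights sum to 1.
    """
    prob = 0.0
    for k, lam in enumerate(lambdas):
        ctx = context[-k:] if k else ()
        prob += lam * mle(counts, total_words, ctx, word)
    return prob

# Hypothetical toy data and weights, invented for this example.
words = "the cat sat on the mat the cat ran".split()
counts = build_counts(words, 3)
lambdas = (0.1, 0.3, 0.6)  # unigram, bigram, trigram weights (assumed)

p = interpolated(counts, len(words), ("the", "cat"), "sat", lambdas)
lookup = {("the", "cat", "sat"): p}  # one entry of a lookup table
print(lookup)
```

Because the unigram term is never zero for a word in the vocabulary, an interpolated probability stays positive even when the longer n-gram was never observed.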

To find the best words we use stupid backoff and return the three best results for the query.
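Stupid backoff scores a candidate word with the longest context in which it was seen. Each time it has to fall back to a shorter context, it multiplies the score by a fixed penalty. The sketch below is one possible reading of that idea. It scores from raw counts on a made-up word list, and the penalty `ALPHA = 0.4` is the commonly cited default rather than a value stated by this project. The function names are hypothetical.

```python
from collections import Counter

ALPHA = 0.4  # back-off penalty (assumed; commonly cited default)

def build_counts(words, max_n):
    """counts[n] maps each n-word tuple to its frequency, for n = 1..max_n."""
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            counts[n][tuple(words[i:i + n])] += 1
    return counts

def top_predictions(counts, text, max_n, k=3):
    """Return the k highest-scoring next words for the query text."""
    words = text.lower().split()
    total = sum(counts[1].values())
    scores = {}
    penalty = 1.0
    # Try the longest usable context first, then drop one word at a time.
    for ctx_len in range(min(max_n - 1, len(words)), -1, -1):
        context = tuple(words[len(words) - ctx_len:])
        denom = counts[ctx_len][context] if ctx_len else total
        if denom:
            for gram, count in counts[ctx_len + 1].items():
                # Keep only the score from the longest matching context.
                if gram[:-1] == context and gram[-1] not in scores:
                    scores[gram[-1]] = penalty * count / denom
        penalty *= ALPHA
    # Highest score first; ties broken alphabetically for a stable result.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]

# Hypothetical toy data, invented for this example.
corpus = "the cat sat on the mat the cat ran on the road".split()
counts = build_counts(corpus, 3)
print(top_predictions(counts, "sat on the", 3))
```

The scores are not normalized probabilities, which is acceptable here because only the ranking of the candidates matters.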

Type the sentence whose next word you want predicted and hit: Predict!
The three most likely words to follow your text will be displayed under: “Your prediction”.
In “Prediction data” you will see the data from the corpus.