Leandro Guerra
August 2015
The main idea behind text prediction is to estimate the next character or word given the preceding input. This can help correct mistyped words and suggest the word the user is most likely to type next.
The objective of this project is to develop a text prediction algorithm built from large data sets drawn from different sources, such as blogs, Twitter, and news.
To start, the main technique used is the n-gram approach, where an n-gram is a contiguous sequence of n items from a given sequence of text or speech.
An n-gram of size 1 is referred to as a “unigram”; size 2 is a “bigram”; size 3 is a “trigram”. Larger sizes are sometimes referred to by the value of n, e.g., “four-gram”, “five-gram”, and so on.
These larger sizes are not used in this project.
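To make the n-gram idea concrete, here is a minimal sketch in Python, chosen only for illustration; it is not the project's actual implementation. The function names (ngrams, build_bigram_model, predict_next) and the toy sentence are hypothetical. It extracts unigrams, bigrams, and trigrams from a token list, then uses bigram counts to suggest the most frequent next word.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_bigram_model(tokens):
    """Map each word to a Counter of the words that follow it."""
    model = defaultdict(Counter)
    for first, second in ngrams(tokens, 2):
        model[first][second] += 1
    return model


def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = model.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Toy input (an assumption for illustration, not project data).
tokens = "the cat sat on the mat and the cat slept".split()

unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

model = build_bigram_model(tokens)
suggestion = predict_next(model, "the")
```

In this toy sentence, "the" is followed by "cat" twice and "mat" once, so the bigram counts suggest "cat". A real model would need tokenization, cleaning, and some strategy for words that never appear in the training data.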