Andrew Martinez-Novoa
25/02/2018
The objective of the Capstone project was to develop an algorithm that predicts the next word in a sentence and to implement it as a Shiny application.
The problem was solved with the well-known N-gram model from natural language processing.
The final model was a Stupid Backoff model built on unigrams, bigrams, and trigrams.
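For illustration only, the sketch below shows how such n-gram frequency tables can be built from raw text in base R; the project's actual preprocessing code is not shown in this pitch, and the build_ngrams helper is hypothetical.

```r
# Illustrative n-gram counting in base R (not the project's actual code).
build_ngrams <- function(text, n) {
  tokens <- unlist(strsplit(tolower(text), "[^a-z']+"))  # crude tokenizer
  tokens <- tokens[tokens != ""]
  if (length(tokens) < n) return(table(character(0)))    # guard for short input
  grams <- vapply(seq_len(length(tokens) - n + 1),
                  function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
                  character(1))
  sort(table(grams), decreasing = TRUE)                  # counts, descending
}

build_ngrams("the cat sat on the mat and the cat slept", 2)  # bigram counts
```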
The implementation of a Stupid Backoff model has two key advantages over other models:
Inexpensive: It requires fewer resources than other models such as the Katz Backoff model.
Accurate: Its accuracy approaches that of Kneser-Ney smoothing given enough training data.
In the Stupid Backoff model, the backoff factor alpha is set heuristically to a fixed value (0.4) rather than estimated from the data, which keeps the model simple and inexpensive to compute.
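As a minimal sketch of the scoring rule, assuming toy count tables and a hypothetical freq() lookup (neither is from the actual project), the recursive Stupid Backoff score can be written as:

```r
# Toy counts and a hypothetical freq() lookup, for illustration only.
counts <- c("the house" = 1, "the" = 3, "house" = 1)
total_tokens <- 10
freq <- function(ngram) {
  key <- paste(ngram, collapse = " ")
  if (key %in% names(counts)) counts[[key]] else 0
}

alpha <- 0.4  # fixed backoff factor

sb_score <- function(ngram) {
  if (length(ngram) == 1) {
    return(freq(ngram) / total_tokens)           # base case: unigram frequency
  }
  if (freq(ngram) > 0) {
    freq(ngram) / freq(ngram[-length(ngram)])    # observed: relative frequency
  } else {
    alpha * sb_score(ngram[-1])                  # unseen: back off with alpha
  }
}

sb_score(c("the", "house"))    # observed bigram: 1/3
sb_score(c("green", "house"))  # unseen bigram: 0.4 * (1/10) = 0.04
```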
In our implementation, the n-gram (unigram, bigram, and trigram) frequency tables were generated in the standard way from the corpus; the backoff factor alpha was then applied within the prediction R script.
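The script itself is not reproduced in this pitch; the following is a hypothetical sketch of that prediction step, assuming candidates from all three tables are ranked together with lower-order scores discounted by alpha (table names, columns, and the predict_word helper are all illustrative).

```r
# Illustrative tables: a context 'prefix', a candidate 'word', and its 'count'.
trigrams <- data.frame(prefix = "thanks for", word = "the", count = 5)
bigrams  <- data.frame(prefix = "for",        word = "the", count = 9)
unigrams <- data.frame(word = c("the", "a"),  count = c(50, 30))

predict_word <- function(w1, w2, alpha = 0.4) {
  lookup <- function(tab, pref, discount) {
    hits <- tab[tab$prefix == pref, ]
    data.frame(word  = hits$word,
               score = discount * hits$count / max(sum(hits$count), 1))
  }
  cand <- rbind(
    lookup(trigrams, paste(w1, w2), 1),      # trigram candidates, undiscounted
    lookup(bigrams,  w2,            alpha),  # bigram candidates, one backoff
    data.frame(word  = unigrams$word,        # unigram fallback, two backoffs
               score = alpha^2 * unigrams$count / sum(unigrams$count))
  )
  cand$word[which.max(cand$score)]           # highest-scoring candidate
}

predict_word("thanks", "for")  # "the"
```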
The Word Prediction app can be found here: https://vradek.shinyapps.io/en_US/
It is simple and intuitive to use: just type a sentence and click the Predict button, and the predicted next word will appear.