Myriam Ragni - April 2020
The objective of this final project was to develop a text prediction algorithm using Natural Language Processing, along with a Shiny application that takes a phrase as input and outputs a prediction of the next word.
The following diagram depicts the different phases of the development of the Predictive Text Product.
After N-gram tokenization, unigram, bigram, trigram and quadgram term-frequency matrices were created. These are the foundation for the frequency dictionaries, which store the smoothed probability of each N-gram, calculated using the Kneser-Ney smoothing method.
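To make the counting and smoothing step concrete, here is a minimal Python sketch of N-gram counting followed by interpolated Kneser-Ney smoothing for the bigram case only. It is an illustration under stated assumptions, not the project's actual code: the function names `ngram_counts` and `kneser_ney_bigram` and the discount value of 0.75 are hypothetical choices, and the real dictionaries also cover trigrams and quadgrams.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens (the term-frequency table)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def kneser_ney_bigram(tokens, discount=0.75):
    """Return prob(v, w), an interpolated Kneser-Ney estimate of P(w | v).

    The discount of 0.75 is an assumed, commonly used default.
    """
    bigrams = ngram_counts(tokens, 2)
    context_total = Counter()  # total count of bigrams that start with v
    followers = Counter()      # number of distinct words seen after v
    preceders = Counter()      # number of distinct words seen before w
    for (v, w), count in bigrams.items():
        context_total[v] += count
        followers[v] += 1
        preceders[w] += 1
    bigram_types = len(bigrams)

    def prob(v, w):
        # Continuation probability: in how many distinct contexts does w appear?
        p_continuation = preceders[w] / bigram_types
        if context_total[v] == 0:
            # Unseen context: fall back to the continuation probability alone.
            return p_continuation
        discounted = max(bigrams[(v, w)] - discount, 0) / context_total[v]
        # Probability mass freed by discounting, redistributed over all words.
        interpolation_weight = discount * followers[v] / context_total[v]
        return discounted + interpolation_weight * p_continuation

    return prob
```

For a given context, the estimates over every word that has at least one predecessor in the corpus sum to 1, so the discounted mass is redistributed rather than lost.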
The flow below shows the logic the Shiny app uses to predict the words most likely to follow a sentence provided by the user. It is based on the Katz Back-Off technique.
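The back-off lookup can be sketched in Python as follows. This is a simplified illustration, not the app's actual code: it tries the longest available context first and falls back to shorter ones, but it omits the back-off weights that a full Katz implementation applies when moving to a lower order. The function name `predict_next`, the `tables` layout, and the `top_k` parameter are all hypothetical.

```python
def predict_next(phrase, tables, top_k=3):
    """Suggest up to top_k next words for phrase.

    tables maps an N-gram order n to {context tuple: {next word: probability}}.
    The unigram table (n = 1) uses the empty tuple () as its only context.
    This layout is an assumption made for the sketch.
    """
    words = phrase.lower().split()
    # Try the highest order first (e.g. quadgrams), then back off.
    for n in sorted(tables, reverse=True):
        context_length = n - 1
        if len(words) < context_length:
            continue  # phrase too short to form a context of this order
        context = tuple(words[len(words) - context_length:])
        candidates = tables[n].get(context)
        if candidates:
            ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
            return [word for word, _ in ranked[:top_k]]
    return []  # nothing matched at any order
```

With a toy table such as `{1: {(): {"the": 0.5, "a": 0.3}}, 2: {("to",): {"be": 0.6, "go": 0.4}}}`, the phrase "need to" matches the bigram context and the phrase "hello" falls back to the most probable unigrams.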
Detailed instructions and a description of the output are available in the 'About' tab of the application.