A Word Prediction Application Using Fundamental NLP Processes

Yamina Touhami
12/14/2014

The model we present here is based on a consolidated series of data sets made up of US Twitter, news, and blog files provided by the capstone team leaders.
The model first processes the files and creates a corpus of several datasets using an n-gram classification methodology.
Using the n-gram classification, a total of 500,000 sentences and expressions extracted from the Twitter, blog, and news files were deployed as a database, or dictionary, from which the predictor algorithm constructs its principal sentences.
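
As an illustration of the n-gram step described above, here is a minimal sketch in Python. It is not the project's own code: the function names (`tokenize`, `ngrams`, `build_ngram_counts`), the simple regular-expression tokenizer, and the two sample sentences are all assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens (assumed, simplified rule)."""
    return re.findall(r"[a-z']+", sentence.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_ngram_counts(sentences, n):
    """Count how often each n-gram occurs across all sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(ngrams(tokenize(sentence), n))
    return counts


# Hypothetical sample data standing in for the Twitter, news, and blog files.
sample = ["I love new york", "I love new music"]
trigram_counts = build_ngram_counts(sample, 3)
```

In this sketch the trigram ("i", "love", "new") is counted twice, once per sample sentence, which is the kind of frequency information a dictionary of n-grams would store.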

In each token sample set, the word frequencies were computed to optimize the likelihood of the output word predictions.
Profane words were removed using a standard Google profanity dictionary. The prediction process could be biased as a result, since eliminating these words might introduce some aberrations in the input sentences.
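
The kind of aberration mentioned above can be shown with a small Python sketch. The function name `remove_profanity` and the one-word placeholder list are assumptions for illustration only; the actual dictionary used in the project is not reproduced here.

```python
def remove_profanity(tokens, banned):
    """Drop every token that appears in the banned-word set."""
    return [token for token in tokens if token not in banned]


# Placeholder list (assumption); the real profanity dictionary is much larger.
banned = {"darn"}

tokens = ["that", "darn", "cat"]
clean = remove_profanity(tokens, banned)

# After filtering, "that" and "cat" become neighbours even though they were
# never adjacent in the original sentence, so a bigram such as
# ("that", "cat") would be counted without ever having been written.
```

One possible alternative, not described in the source, would be to discard any sentence that contains a banned word, which avoids creating these artificial neighbours at the cost of losing data.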

A simple yet practical application has been created for this project.
To use the app, enter a partial sentence, omitting the last word, in the input field on the left side of the app page.
After you click the “Submit” button, the code runs as quickly as possible and provides the user with the 5 most likely words to finish the entered partial sentence.
The app is available on the shinyapps.io website at https://tensoriel.shinyapps.io/final/
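
To make the prediction step concrete, the following Python sketch shows one way a list of up to 5 candidate words could be produced from n-gram frequencies. It is written under several assumptions and is not the app's actual implementation: the names `build_model` and `predict`, the whitespace tokenizer, the maximum n-gram size of 3, and the fallback to a shorter context when the longer one is unseen are all choices made for this example.

```python
from collections import Counter, defaultdict


def build_model(sentences, max_n=3):
    """Map each context (1 to max_n - 1 words) to a Counter of the words that follow it."""
    model = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for n in range(2, max_n + 1):
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])
                next_word = tokens[i + n - 1]
                model[context][next_word] += 1
    return model


def predict(model, text, k=5, max_n=3):
    """Return up to k next-word candidates, most frequent first."""
    tokens = text.lower().split()
    # Try the longest context first, then fall back to shorter ones (assumed strategy).
    for size in range(max_n - 1, 0, -1):
        if len(tokens) < size:
            continue
        context = tuple(tokens[-size:])
        if context in model:
            return [word for word, _ in model[context].most_common(k)]
    return []


# Hypothetical sample data standing in for the 500,000-sentence dictionary.
sentences = ["i love new york", "i love new music", "i love new york"]
model = build_model(sentences)
suggestions = predict(model, "I love new")
```

With this sample data, "york" follows "love new" twice and "music" once, so "york" is ranked first. An input whose last words were never seen in any context yields an empty list in this sketch.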