Using NLP to Read Minds

Jeff Spoelstra
10/07/2016

Project Goals:
- The goal is to create a model for look-ahead prediction/suggestion of a likely next word given a sequence of words entered by a user.
- The target environment for running the model is a hand-held device such as a smartphone or tablet.
- The model must be fast and use as little memory as possible while remaining highly accurate.

Model Design:
- The model is a combination of 2-gram, 3-gram, and 4-gram frequency lists along with R script code to determine the correct n-gram list to use to make a prediction.
- For each n-gram size, the 10 most probable next words (in descending order of frequency of occurrence in the training text) were saved in the model.
- 256,000 lines of data (ranging from a few words to several sentences each) were taken from samples of online news story text and blog post text to build the n-gram lists (i.e., to train the model).
- Twitter text was not used because the grammar and vocabulary differed radically from the news/blog text.
- Total size of all the n-gram data used by the model: 32,017,752 bytes.
- The runtime version of the model does not require any R packages.
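
The design above can be illustrated with a minimal base-R sketch. It is not the author's actual script: the toy training text, the function names `build_table` and `predict_next`, and the simple "longest matching context first" back-off rule are all assumptions made for illustration.

```r
# Toy training text (an assumption); the real model used about 256,000 lines
# of news and blog text.
train <- c("the cat sat on the mat",
           "the cat sat on the sofa",
           "the dog sat on the mat")

# Build a lookup table for one n-gram size: each (n-1)-word prefix maps to
# its most frequent next words, in descending order of frequency.
build_table <- function(lines, n, top_k = 10) {
  grams <- unlist(lapply(strsplit(tolower(lines), "\\s+"), function(w) {
    if (length(w) < n) return(character(0))
    vapply(seq_len(length(w) - n + 1),
           function(i) paste(w[i:(i + n - 1)], collapse = " "),
           character(1))
  }))
  counts <- sort(table(grams), decreasing = TRUE)
  full   <- names(counts)
  prefix <- sub(" [^ ]+$", "", full)  # everything before the last word
  nxt    <- sub("^.* ", "", full)     # the last word only
  lapply(split(nxt, prefix), head, n = top_k)
}

# One table per n-gram size, named "2", "3", and "4".
tables <- lapply(c("2" = 2, "3" = 3, "4" = 4),
                 function(n) build_table(train, n))

# Try the longest context first (3 words for the 4-gram table), then back off
# to shorter contexts until one of the tables has a match.
predict_next <- function(fragment, tables) {
  words <- strsplit(tolower(trimws(fragment)), "\\s+")[[1]]
  for (n in sort(as.integer(names(tables)), decreasing = TRUE)) {
    k <- n - 1
    if (length(words) < k) next
    key <- paste(tail(words, k), collapse = " ")
    hit <- tables[[as.character(n)]][[key]]
    if (!is.null(hit)) return(hit)
  }
  character(0)  # no n-gram list matched the fragment
}

predict_next("I sat on the", tables)
```

In this sketch the first element of the returned vector is the prediction and the remaining elements are the alternative suggestions. Like the runtime model described above, it uses no R packages.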

Future Improvements:
- Improve prediction accuracy by creating a separate model for each text source (news, blogs, Twitter, etc.) to account for the grammar and vocabulary differences between sources.
- Separate models may each require less memory for data as well.
- Store a larger list of probable next words in the model and use character-by-character lookup as the user types to drill down to the most probable next word.
- Optimize the model algorithm and possibly recode it in C to make it faster.
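
The character-by-character lookup idea could look roughly like the base-R sketch below. The candidate list and the function name `narrow` are hypothetical; the sketch only assumes that the candidates are already ordered from most to least probable.

```r
# Hypothetical candidate next words, ordered from most to least probable.
candidates <- c("the", "to", "that", "this", "time", "a", "their")

# Keep only the candidates that start with what the user has typed so far.
# The original ordering is preserved, so the first element is still the
# most probable remaining word.
narrow <- function(candidates, typed) {
  candidates[startsWith(candidates, tolower(typed))]
}

narrow(candidates, "t")    # drops "a"
narrow(candidates, "th")   # "the" "that" "this" "their"
narrow(candidates, "thi")  # "this"
```

Each keystroke shrinks the list, which is why storing more than 10 candidates per n-gram could pay off: a word ranked well down the list can still surface after one or two typed characters.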

Full disclosure: the app doesn't really read minds. It's a cute context for showing the model in action.
App URL: https://jeffspoelstra.shinyapps.io/PredictText/
Enter a word sequence in the Sentence Fragment box and click on the Read My Mind button.
The model's prediction and alternative suggestions will appear below the button.
