Using NLP to Read Minds
Jeff Spoelstra
10/07/2016
Project Goals & Model Design
Project Goals:
- Create a model for look-ahead prediction/suggestion of the likely next word given a sequence of words entered by a user.
- The target environment for running the model is a hand-held device such as a smartphone or tablet.
- The model must be fast and use as little memory as possible while still being highly accurate.
Model Design:
- The model is a combination of 2-gram, 3-gram, and 4-gram frequency lists plus R script code that determines the correct n-gram list to use to make a prediction (sketched after this list).
- For each n-gram size, the 10 most probable next words (in order of frequency of occurrence in the training text) were saved in the model.
- 256,000 lines of data (ranging from a few words to several sentences each) were taken from samples of online news story text and blog post text to build the n-gram lists (i.e., to train the model).
- Twitter text was not used because its grammar and vocabulary differ radically from those of the news/blog text.
- Total size of all the n-gram data used by the model: 32,017,752 bytes (about 32 MB).
- The runtime version of the model does not require any R packages.
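Below is a minimal base-R sketch of how the "pick the right n-gram list" lookup could work, falling back from the 4-gram list to smaller ones when no match is found. The table layout (one data frame per n-gram size with prefix, word, and freq columns, pre-sorted by frequency and trimmed to 10 words) and the function name predict_next are illustrative assumptions, not the actual model code.

```r
# Illustrative stand-ins for the 4-, 3-, and 2-gram frequency lists, each
# sorted by descending frequency and trimmed to the 10 most probable words.
ngram4 <- data.frame(prefix = "one of the", word = c("most", "best"),
                     freq = c(120, 45), stringsAsFactors = FALSE)
ngram3 <- data.frame(prefix = "of the", word = c("most", "world", "best"),
                     freq = c(300, 150, 90), stringsAsFactors = FALSE)
ngram2 <- data.frame(prefix = "the", word = c("first", "most", "same"),
                     freq = c(900, 600, 400), stringsAsFactors = FALSE)

predict_next <- function(text, max_suggestions = 10) {
    # Normalize and tokenize the input fragment.
    words <- strsplit(tolower(trimws(text)), "\\s+")[[1]]

    # Try the largest usable n-gram first, then back off to smaller ones.
    for (n in 4:2) {
        if (length(words) < n - 1) next
        prefix <- paste(tail(words, n - 1), collapse = " ")
        tbl <- switch(as.character(n), "4" = ngram4, "3" = ngram3, "2" = ngram2)
        hits <- tbl[tbl$prefix == prefix, ]
        if (nrow(hits) > 0) return(head(hits$word, max_suggestions))
    }
    character(0)  # no prediction available
}

predict_next("It was one of the")  # e.g. "most" "best"
```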
Model Performance
- Results compiled from a test run using 250,000 lines of blog post text:
- 13.6% exact predictions: the True Next Word (TNW) was the predicted word.
- 16.0% close predictions: the TNW was one of the 9 suggested alternative words.
- 50.2% of the close predictions ranked the TNW among the top 2 alternative words.
- 48.0% of the time the model recognized a predictable sequence of 2 or more words.
- Average prediction rate: 4 predictions per second.
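For context, the exact/close rates above could be tallied roughly as sketched below. The evaluate helper, the reuse of predict_next from the earlier sketch, and the counting rules (e.g., skipping fragments with no prediction) are assumptions, not the actual test harness.

```r
# Hypothetical evaluation loop: walk each test line word by word, predict the
# next word from the preceding fragment, and count exact vs. close hits.
evaluate <- function(test_lines) {
    exact <- 0; close <- 0; total <- 0
    for (line in test_lines) {
        words <- strsplit(tolower(trimws(line)), "\\s+")[[1]]
        if (length(words) < 2) next
        for (i in seq_len(length(words) - 1)) {
            fragment <- paste(words[1:i], collapse = " ")
            tnw <- words[i + 1]                    # the True Next Word
            suggestions <- predict_next(fragment)  # top word + alternatives
            if (length(suggestions) == 0) next
            total <- total + 1
            if (suggestions[1] == tnw) {
                exact <- exact + 1                 # exact prediction
            } else if (tnw %in% suggestions[-1]) {
                close <- close + 1                 # close prediction
            }
        }
    }
    c(exact_rate = exact / total, close_rate = close / total)
}
```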
Recommendations
- Improve prediction accuracy by creating a separate model for each text source (news text, blog text, Twitter text, etc.) to overcome the grammar and vocabulary differences between sources.
- Separate models may each require less memory for data as well.
- Use a larger list of probable next words in the model and apply character-by-character lookup as the user types to drill down to the most probable next word (see the sketch below).
- Optimize the model algorithm and possibly re-code it in C to make it faster.
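The character-by-character drill-down recommended above could look something like this sketch; the candidate vector and the drill_down helper are hypothetical and only illustrate filtering a larger suggestion list by the letters typed so far.

```r
# Hypothetical drill-down: as the user types the next word, narrow a larger,
# frequency-ordered candidate list to those starting with the typed prefix.
candidates <- c("most", "more", "morning", "money", "best", "better")

drill_down <- function(candidates, typed_so_far) {
    candidates[startsWith(candidates, tolower(typed_so_far))]
}

drill_down(candidates, "mo")   # "most" "more" "morning" "money"
drill_down(candidates, "mor")  # "more" "morning"
```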
Using the Sample App
- Full disclosure: the app doesn't really read minds. It's a cute context for showing the model in action.
- App URL: https://jeffspoelstra.shinyapps.io/PredictText/
- Enter a word sequence in the Sentence Fragment box and click on the Read My Mind button.
- The model's prediction and alternative suggestions will appear below the button.