NLP-WordPrediction
Michael Kamfonas
January 16, 2016
Single Word Prediction Application
- Type or paste a partial sentence in the input text box and click the button to predict the next word
- A single next-word prediction is returned
- Application is under 1GB and loads with default settings
- A delay is experienced if the server needs to be restarted
- If the input textbox is empty, the user sees an error message.
- To manage space:
  - Only the top-ranked prediction for each N-gram is retained in the model
  - N-grams that yield the same prediction as the next lower-order (N-1)-gram were eliminated, since backing off gives the same result
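The two space-saving steps above can be sketched as follows. This is a minimal illustration, not the application's actual code: the names `prune` and `lookup`, the dictionary layout (context tuple mapped to next-word counts) and the alphabetical tie-break are all assumptions made for the example.

```python
def lookup(model, context):
    """Back off from the given context to shorter ones until a match is found."""
    while context:
        if context in model:
            return model[context]
        context = context[1:]  # drop the oldest word and try the lower order
    return None


def prune(counts_by_context):
    """Keep one prediction per context and drop contexts that back-off already covers.

    counts_by_context is a hypothetical layout: {context tuple: {next word: count}}.
    """
    model = {}
    # Shorter contexts first, so lower-order predictions are final when compared.
    for context in sorted(counts_by_context, key=len):
        counts = counts_by_context[context]
        # Top-1 prediction; ties are broken alphabetically (an assumption).
        best = min(counts, key=lambda w: (-counts[w], w))
        if lookup(model, context[1:]) == best:
            continue  # the lower-order model predicts the same word
        model[context] = best
    return model


counts = {
    ("of",): {"the": 10, "a": 3},
    ("one", "of"): {"the": 5, "my": 1},
    ("because", "of"): {"you": 4, "the": 2},
    ("all", "because", "of"): {"you": 3},
}
model = prune(counts)
```

In this toy data, `("one", "of")` is dropped because the bigram context `("of",)` already predicts "the", and `lookup` still returns the same word for it through back-off.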
Dynamic List Prediction Application
- Predictions are generated reactively as text is typed.
- If the last character is a space, the next word is predicted.
- Otherwise, predictions are filtered after each character typed so that they match the prefix of the word in progress.
- A prioritized list of predictions is returned. The user can control its length or show all predictions by setting the length to zero.
- Ranks and scores of the predictions are included.
- The server takes over a minute to start, but predictions are fast enough to keep up with typing.
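The behaviour described above (next-word prediction after a space, prefix filtering otherwise, and a ranked list whose length the user controls) might look roughly like this. It is a sketch under assumptions: `suggest`, `candidates_for`, the three-word context window and the toy scores are all hypothetical and not taken from the application.

```python
def suggest(text, candidates_for, limit=5):
    """Return (rank, word, score) tuples for the next word or the word in progress.

    candidates_for is a hypothetical callable: context tuple -> {word: score}.
    A limit of zero returns all predictions.
    """
    words = text.lower().split()
    if text == "" or text[-1].isspace():
        context, prefix = words, ""              # predict the next word
    else:
        context, prefix = words[:-1], words[-1]  # complete the partial word
    scored = candidates_for(tuple(context[-3:]))
    matches = [(w, s) for w, s in scored.items() if w.startswith(prefix)]
    matches.sort(key=lambda ws: (-ws[1], ws[0]))  # best score first
    if limit > 0:
        matches = matches[:limit]
    return [(rank, w, s) for rank, (w, s) in enumerate(matches, start=1)]


# Toy scores, invented for the example.
TOY = {("thank",): {"you": 0.8, "god": 0.1, "goodness": 0.05}}


def toy_candidates(context):
    return TOY.get(context[-1:], {})


after_space = suggest("thank ", toy_candidates)
with_prefix = suggest("thank g", toy_candidates, limit=0)
```

Here `after_space` lists all three candidates in score order, while `with_prefix` keeps only the words starting with "g".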
The Model Used
- Based on sample data from blogs, tweets and news
- The text was preprocessed as follows:
  - Break down into sentences
  - Eliminate white space and convert to lower case
  - Expand contractions (such as it’s, hasn’t)
  - Replace sentence starts, numerics, URLs, e-mail addresses, etc. with special generic tokens
- N-grams are generated and conditional probabilities calculated from counts.
- A simple back-off model is used, from 4-grams down to bigrams, until a match is found
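The pipeline above can be sketched end to end in a few lines. This is an illustration under stated assumptions, not the model's actual implementation: the token names (`<s>`, `<num>`, `<url>`, `<email>`), the two-entry contraction table, the regular expressions and the function names are all invented for the example, and the probability shown is a plain relative frequency computed from counts.

```python
import re
from collections import Counter, defaultdict

# Hypothetical contraction table and start token; the real ones are not given.
CONTRACTIONS = {"it's": "it is", "hasn't": "has not"}
START = "<s>"


def preprocess(text):
    """Split text into sentences and return one list of normalized tokens per sentence."""
    text = text.replace("\u2019", "'")  # treat curly apostrophes like straight ones
    sentences = []
    for raw in re.split(r"(?<=[.!?])\s+", text.strip()):
        tokens = [START]
        for tok in raw.lower().split():
            tok = tok.strip(".,!?;:\"()")
            if not tok:
                continue
            if tok.startswith(("http://", "https://", "www.")):
                tokens.append("<url>")
            elif re.fullmatch(r"[^@\s]+@[^@\s]+\.[a-z]+", tok):
                tokens.append("<email>")
            elif re.fullmatch(r"\d+([.,]\d+)*", tok):
                tokens.append("<num>")
            else:
                tokens.extend(CONTRACTIONS.get(tok, tok).split())
        sentences.append(tokens)
    return sentences


def count_ngrams(sentences, max_order=4):
    """Count next words for every context of 1 to max_order - 1 words."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        for n in range(2, max_order + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n - 1])][tokens[i + n - 1]] += 1
    return counts


def predict(counts, words, max_order=4):
    """Back off from 4-grams to bigrams; return (word, probability, order) or None."""
    context = tuple(words[-(max_order - 1):])
    while context:
        if context in counts:
            following = counts[context]
            word, count = following.most_common(1)[0]
            return word, count / sum(following.values()), len(context) + 1
        context = context[1:]  # no match: drop the oldest word
    return None


sentences = preprocess("It\u2019s a dog. It's a cat. It's a dog!")
counts = count_ngrams(sentences)
full_match = predict(counts, ["<s>", "it", "is", "a"])
backed_off = predict(counts, ["xyz", "a"])
```

With this toy corpus, `full_match` is found at the 4-gram level, while `backed_off` falls through to the bigram context `("a",)` because "xyz a" was never seen.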
Test: Matching order of the actual next word and the N-gram it was derived from