NLP-WordPrediction

Michael Kamfonas
January 16, 2016

Two Shiny applications:

Single Word Prediction Application

  • Type or paste a partial sentence in the input text box and click the button to predict the next word.
  • A single next-word prediction is returned.
  • The application is under 1 GB and loads with default settings.
  • Expect a delay if the server needs to be restarted.
  • If the input text box is empty, the user sees an error message.
  • To manage space:
    • Only the top-ranked prediction for each N-gram is retained in the model.
    • N-grams whose prediction is the same as that of the next lower-order (N-1)-gram were eliminated.
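
The space-saving rule above can be sketched in a few lines. This is an illustrative Python sketch, not the application's actual code; the function name `prune_redundant`, the dictionary layout, and the sample entries are all assumptions.

```python
def prune_redundant(model):
    """Drop entries that the next lower-order entry already covers.

    `model` (hypothetical layout) maps a context tuple to its single
    top-ranked next word. An entry is dropped when the context without
    its first word is also in the model and predicts the same word,
    because backing off would return the same answer anyway.
    """
    pruned = {}
    for context, word in model.items():
        shorter = context[1:]
        if len(context) > 1 and model.get(shorter) == word:
            continue  # the lower-order entry gives the same prediction
        pruned[context] = word
    return pruned


# Made-up entries, for illustration only.
model = {
    ("thanks", "for", "the"): "follow",
    ("for", "the"): "follow",
    ("the",): "same",
    ("one", "of", "the"): "best",
    ("of", "the"): "year",
}
small = prune_redundant(model)
```

Here only `("thanks", "for", "the")` is removed, since `("for", "the")` already predicts the same word. Entries whose shorter context is missing from the model are kept, which is the conservative choice.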

Dynamic List Prediction Application

  • Predictions are generated reactively as text is typed.
  • If the last character is a space, the next word is predicted.
  • Predictions are filtered so they match the prefix after each character typed.
  • A prioritized list of predictions is returned. The user can control its length or show all predictions by setting the length to zero.
  • Ranks and scores of the predictions are included.
  • The server takes over a minute to start, but predictions are fast enough to keep up with typing.
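
The behaviour described above (trailing space means "predict a new word", otherwise filter by the typed prefix, then return a ranked list) might look roughly like this. It is a Python sketch under assumed names; `split_input`, `ranked_predictions`, and the candidate scores are hypothetical and not taken from the application.

```python
def split_input(text):
    """Return (context_words, prefix) for the text typed so far."""
    words = text.lower().split()
    if text.endswith(" ") or not words:
        return words, ""            # a new word is about to start
    return words[:-1], words[-1]    # the last token is a partial word


def ranked_predictions(candidates, prefix="", limit=5):
    """Filter (word, score) pairs by prefix and rank them by score.

    limit=0 returns every match, mirroring the "show all" setting.
    """
    matches = [(w, s) for w, s in candidates if w.startswith(prefix)]
    matches.sort(key=lambda pair: pair[1], reverse=True)
    if limit > 0:
        matches = matches[:limit]
    return [(rank, w, s) for rank, (w, s) in enumerate(matches, start=1)]


# Made-up candidates for the context "have a nice".
candidates = [("day", 0.30), ("time", 0.25), ("date", 0.10), ("world", 0.05)]
context, prefix = split_input("have a nice da")
shown = ranked_predictions(candidates, prefix, limit=0)
```

With the input `"have a nice da"`, the prefix is `"da"` and the list narrows to `day` and `date`, each with its rank and score.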

The Model Used

  • Based on sample data from blogs, tweets, and news
  • The text was preprocessed as follows:
    • Break the text down into sentences
    • Eliminate extra white space and convert to lower case
    • Expand contractions (such as it’s, hasn’t, etc.)
    • Replace sentence starts, numbers, URLs, e-mail addresses, etc. with special generic tokens
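
As a rough illustration of these preprocessing steps, here is a minimal Python sketch. The token names (`<s>`, `<num>`, `<url>`, `<email>`), the two-entry contraction table, and the regular expressions are assumptions made for the example; the original pipeline may differ in all of them.

```python
import re

# Hypothetical, deliberately tiny contraction table.
CONTRACTIONS = {"it's": "it is", "hasn't": "has not"}


def preprocess(text):
    """Return one token list per sentence, each starting with <s>."""
    # Naive sentence split: whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    result = []
    for s in sentences:
        s = s.lower().replace("\u2019", "'")   # normalise curly apostrophes
        s = re.sub(r"https?://\S+", " <url> ", s)
        s = re.sub(r"\S+@\S+\.\S+", " <email> ", s)
        for short, full in CONTRACTIONS.items():
            s = re.sub(r"\b" + re.escape(short) + r"\b", full, s)
        s = re.sub(r"\d+(?:[.,]\d+)*", " <num> ", s)
        # Keep generic tokens and plain words; this also drops
        # punctuation and collapses white space.
        tokens = re.findall(r"<\w+>|[a-z']+", s)
        if tokens:
            result.append(["<s>"] + tokens)
    return result


cleaned = preprocess("It\u2019s 5 pm. Visit http://example.com now!")
```

URLs and e-mail addresses are replaced before numbers so that digits inside them are not tokenised separately.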

  • N-grams are generated, and conditional probabilities are calculated from their counts.
  • A simple back-off model is used, stepping down from 4-grams to bigrams until a match is found.
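
A simple back-off lookup of this kind can be sketched as follows. This is a Python illustration under stated assumptions: the probability is taken as the plain count ratio count(context, word) / count(context), and the names `train` and `predict` and the toy sentences are hypothetical.

```python
from collections import Counter, defaultdict


def train(sentences, max_n=4):
    """Count next words for every context of 1 to max_n-1 preceding tokens."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i in range(1, len(tokens)):
            for k in range(1, max_n):
                if i - k < 0:
                    break
                counts[tuple(tokens[i - k:i])][tokens[i]] += 1
    return counts


def predict(counts, words, max_n=4):
    """Try the longest context first and back off until one matches.

    Returns (word, conditional probability, order of the matching
    N-gram), or None when even the bigram context is unseen.
    """
    for k in range(max_n - 1, 0, -1):
        if len(words) < k:
            continue
        context = tuple(words[-k:])
        if context in counts:
            following = counts[context]
            word, count = following.most_common(1)[0]
            return word, count / sum(following.values()), k + 1
    return None


# Toy corpus, for illustration only.
sentences = [
    ["<s>", "i", "love", "you"],
    ["<s>", "i", "love", "you"],
    ["<s>", "we", "love", "it"],
]
counts = train(sentences)
full_match = predict(counts, ["<s>", "i", "love"])
backed_off = predict(counts, ["they", "love"])
```

The first lookup matches a 4-gram directly. The second has no trigram match for "they love", so it falls back to the bigram context "love".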

Test: Matching order of the actual next word and the N-gram it was derived from
