10/28/2021

The Word Predictor App uses natural language processing to predict the next word in a sentence.

To Use the App

  • Type a word or phrase into the text box

    “Now I”

  • The app uses up to the last 4 words of the phrase to predict the next word
  • The most likely match to the input phrase is then displayed in the main panel

    “have”
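As an illustration of the input step, here is a minimal sketch of how a typed phrase might be reduced to its last 4 words before lookup. It is written in Python rather than the app's own code, and the function name `last_words` and the tokenizing rule are assumptions made for this example.

```python
import re

def last_words(phrase, max_words=4):
    """Return up to the last `max_words` lowercase word tokens of `phrase`.

    Hypothetical helper: the tokenizing rule (letters and apostrophes only)
    is an assumption, not the app's actual cleaning logic.
    """
    tokens = re.findall(r"[a-z']+", phrase.lower())
    return tokens[-max_words:]

print(last_words("Now I"))                     # ['now', 'i']
print(last_words("It was the best of times"))  # ['the', 'best', 'of', 'times']
```

A phrase shorter than 4 words is passed through whole, so “Now I” is matched as a 2-word context.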

Model Building Steps

  1. Raw data is collected from blogs, news articles, and Twitter for model training
  2. The data set is cleaned by removing unwanted words and characters, including profanity, non-English words, numbers, and punctuation
  3. Words are grouped into 2-, 3-, 4-, and 5-word phrases called N-grams and saved as tibbles
  4. Tibbles are sorted by frequency and saved in repos
  5. A “Back-Off” prediction model predicts the next word: it looks for a match on the longest N-gram first and backs off to shorter N-grams when no match is found
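The steps above can be sketched end to end in a few lines. This is a simplified Python illustration, not the app's code: the app stores its N-grams as tibbles, while this sketch uses plain dictionaries. The names `clean`, `build_tables`, `predict`, and the `PROFANITY` placeholder list are hypothetical, the non-English filter is omitted, and the back-off shown is the simplest form (take the most frequent follower from the longest matching context, with no probability discounting).

```python
import re
from collections import Counter, defaultdict

# Placeholder word list for this example only (assumption).
PROFANITY = {"darn"}

def clean(text):
    """Lowercase, drop numbers and punctuation, and remove listed words."""
    text = re.sub(r"[^a-z' ]+", " ", text.lower())
    return [w for w in text.split() if w not in PROFANITY]

def build_tables(lines, max_n=5):
    """Count, for every N from 2 to max_n, which word follows each (N-1)-word context."""
    tables = {n: defaultdict(Counter) for n in range(2, max_n + 1)}
    for line in lines:
        words = clean(line)
        for n in range(2, max_n + 1):
            for i in range(len(words) - n + 1):
                context = tuple(words[i:i + n - 1])
                tables[n][context][words[i + n - 1]] += 1
    return tables

def predict(tables, phrase, max_n=5):
    """Try the longest context first, then back off to shorter ones."""
    words = clean(phrase)
    for n in range(max_n, 1, -1):
        context = tuple(words[-(n - 1):])
        if len(context) < n - 1:
            continue  # the phrase is too short for this N-gram order
        followers = tables[n].get(context)
        if followers:
            return followers.most_common(1)[0][0]
    return None  # no N-gram of any order matched

corpus = ["Now I have a plan", "Now I have time", "Now I see"]
tables = build_tables(corpus)
print(predict(tables, "Now I"))  # prints: have
```

With this toy corpus, “Now I” is too short for the 5- and 4-gram tables, so the lookup lands on the 3-gram table, where “have” follows “now i” twice and “see” once.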

Benefits of the Chosen Model

  • Code is easy to read
  • Training data is processed quickly
  • A large portion of the corpus can be sampled for more accurate results
  • Requires only a small amount of memory for the saved N-gram tables

Resources and Links