Word Predictor App

by Manish Gyawali

15 July 2021



Incremental Slides!

Press the forward/back button to move incrementally through the slides

  • Basic idea behind the app: the Stupid Backoff method (SBM)
    • first use complex patterns (longer sequences that include more distant words) for prediction
    • if they give no prediction, use simpler patterns (shorter sequences of near words)
  • SBM is based on Markov's probability rule (MPR), applied to textual patterns.
  • MPR: Past events are useless for making future predictions; only the present matters.
  • In practice, close past events may have some utility
  • By analogy to MPR, in SBM, events are either words or phrases
  • There is some utility in using the word immediately preceding the word or phrase from which we want to make predictions.
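
The backoff idea above can be sketched in a few lines. This is an illustration in Python, not the app's own R code: the toy corpus, the function names and the discount ALPHA = 0.4 (a value commonly used with Stupid Backoff) are all assumptions, since the slides do not state them.

```python
from collections import Counter

ALPHA = 0.4  # assumed back-off discount; the slides do not give a value


def ngram_counts(tokens, max_n=3):
    """Count every n-gram of length 1..max_n in a list of tokens."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def backoff_score(word, context, counts, total_words):
    """Score `word` after `context`, shortening the context on each miss."""
    penalty = 1.0
    context = tuple(context)
    while context:
        seen = counts[context + (word,)]
        if seen > 0:
            # The longest matching pattern wins: relative frequency, discounted.
            return penalty * seen / counts[context]
        context = context[1:]  # drop the most distant word
        penalty *= ALPHA       # each back-off step is penalised
    # No context left: fall back to the plain word frequency.
    return penalty * counts[(word,)] / total_words


# Toy corpus, invented for this sketch.
tokens = "the cat sat on the mat the cat ate".split()
counts = ngram_counts(tokens)

print(backoff_score("sat", ["the", "cat"], counts, len(tokens)))  # phrase "the cat sat" was seen
print(backoff_score("mat", ["big", "the"], counts, len(tokens)))  # backs off to "the mat"
print(backoff_score("ate", ["big", "dog"], counts, len(tokens)))  # backs off to the single word
```

The scores are not true probabilities (they need not sum to one), which is what keeps the method cheap: no normalisation is done after backing off.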

Additional R libraries used

  • qs (fast loading of data)
  • quanteda (textual data analysis)
  • stringi (string manipulation)
  • data.table (fast table manipulation)
  • ggplot2 (graphical display)
  • shiny (interactive presentation)

User Options

  • Two separate output formats

    • Graphical: presentation using the ggplot2 package
    • Tabular: presentation

Drawbacks

  • Only single words or two-word phrases are accepted as input
  • May not work effectively with punctuation

How to use the app:

  • select the type of representation (graphical/tabular) and predict!

App results:

  • Graphical output illustrated for normal (no backoff) prediction of one word

  • Depending on the input, the outputs could differ

Backoff predictions

  • SBM/MPR was used to predict the next word from the most recent word, “trump”, because the phrase “dona trump” did not occur in the data

  • no matter how many words are entered, only the last two are used

  • if there is no match, the app uses ML to list the most frequent words in the data (not illustrated)
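
The three behaviours listed above (keep only the last two words, back off to the last word, fall back to the most frequent words) could look roughly like this. It is a Python sketch under stated assumptions, not the app's R code: `build_tables`, `predict` and the toy corpus are hypothetical, and plain frequency ranking stands in for whatever the app does in its final fallback.

```python
from collections import Counter, defaultdict


def build_tables(tokens):
    """Map each one- and two-word context to a Counter of the words that follow it."""
    following = defaultdict(Counter)
    for i, word in enumerate(tokens):
        if i >= 1:
            following[(tokens[i - 1],)][word] += 1
        if i >= 2:
            following[(tokens[i - 2], tokens[i - 1])][word] += 1
    return following, Counter(tokens)


def predict(text, following, unigrams, k=3):
    """Return up to k candidate next words and the context that produced them."""
    context = tuple(text.lower().split()[-2:])  # only the last two words are used
    while context:
        if context in following:
            ranked = following[context].most_common(k)
            return [w for w, _ in ranked], context
        context = context[1:]  # back off: keep only the most recent word
    # Nothing matched: list the most frequent words in the data.
    return [w for w, _ in unigrams.most_common(k)], ()


# Toy corpus, invented for this sketch.
tokens = "donald trump said that donald trump won and trump said no".split()
following, unigrams = build_tables(tokens)

print(predict("dona trump", following, unigrams))       # backs off to the context ("trump",)
print(predict("they say donald trump", following, unigrams))  # matches ("donald", "trump")
print(predict("xyz qqq", following, unigrams))          # no match: most frequent words
```

Returning the matched context alongside the candidates makes it easy to show the user whether a prediction came from the full phrase, from the last word only, or from the frequency fallback.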