Capstone: Text Prediction

Ray Jones
24th April 2016

Why?

Do you get impatient with all that typing?

  • Is it tricky to find all those tiny keys with your big fingers?

  • All that spelling: do you really have to remember it all?

  • Have you ever asked yourself, “Why does it have to be this hard?”

  • Why can't my computer / phone / iPad do it all for me?

  • It couldn't, because you didn't have this shiny app!

  • Well, now you do … so it can!

What?

  • Problem: predict the next word from the text already typed
  • A 25% random sample of the en_US news, blogs and Twitter data was processed to:
    • Build a corpus and convert all text to lower case
    • Remove numbers, punctuation and profanities
    • Tokenize the text and extract all 1-, 2-, 3-, 4- and 5-grams
  • N-grams were extracted using the make.ngrams() function from the {stylo} package
  • The n-grams were then saved in .RData format
  • The shiny app uses these n-grams for text prediction (next slide)
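The preprocessing steps above can be sketched in base R as follows. This is a hedged illustration, not the author's code: the original used make.ngrams() from {stylo}, whereas the helper functions sample_lines(), clean_tokens(), make_ngrams() and ngram_counts() here are hypothetical names, the profanity list is an assumed user-supplied character vector, and n-grams are built line by line so that they never span two documents.

```r
# Base-R sketch of the corpus preparation (assumption: the original analysis
# used stylo::make.ngrams(), not these hypothetical helper functions).

# Keep a random fraction of the input lines (25% in the original analysis)
sample_lines <- function(lines, fraction = 0.25) {
  lines[sample(length(lines), size = ceiling(fraction * length(lines)))]
}

# Lower-case, remove digits and punctuation, split into words and drop any
# words found in `profanities` (a hypothetical, user-supplied word list)
clean_tokens <- function(text, profanities = character(0)) {
  text <- tolower(text)
  text <- gsub("[[:digit:]]+", " ", text)
  text <- gsub("[[:punct:]]+", " ", text)
  words <- unlist(strsplit(text, "[[:space:]]+"))
  words <- words[nzchar(words)]
  words[!words %in% profanities]
}

# All consecutive n-word sequences in a token vector, joined by single spaces
make_ngrams <- function(words, n) {
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts, function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Count distinct n-grams across all lines (n-grams never span two lines),
# returned as a named numeric vector sorted by decreasing frequency
ngram_counts <- function(lines, n, profanities = character(0)) {
  grams <- as.character(unlist(lapply(lines, function(line) {
    make_ngrams(clean_tokens(line, profanities), n)
  })))
  counts <- table(grams)
  sort(setNames(as.numeric(counts), names(counts)), decreasing = TRUE)
}
```

For example, ngram_counts(sample_lines(lines), 3), where lines might come from readLines(), would give trigram counts for a 25% sample, which could then be stored with base R's save(), e.g. save(trigrams, file = "ngrams.RData") (the object and file names are illustrative).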

How?

  • The shiny app parses the input text into words
  • It then identifies every corpus 2-, 3-, 4- and 5-gram that begins with the last 1, 2, 3 and 4 input words, respectively
  • Each candidate next word is scored by the frequency of its corpus n-gram relative to the total frequency of all n-grams matching the same input words, using the highest-order n-gram available:
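# f5g: count of the matching 5-gram; fall5gs: total count of all 5-grams that start with the same 4 input words (likewise for 4-, 3- and 2-grams)
score <- if (exists_5g) 1.000 * f5g / fall5gs else
         if (exists_4g) 0.400 * f4g / fall4gs else
         if (exists_3g) 0.160 * f3g / fall3gs else
                        0.064 * f2g / fall2gs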
  • Once all candidates are scored, they are sorted in descending order of score and the highest-scoring word is selected
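The candidate look-up and back-off scoring described above can be sketched in base R. This is a minimal sketch under stated assumptions rather than the app's actual code: ngrams is assumed to be a list whose n-th element is a named numeric vector of n-gram counts keyed by space-separated words, and continuations() and predict_next() are hypothetical names. The weights 1, 0.4, 0.16 and 0.064 are successive powers of 0.4, so the scheme closely resembles "stupid backoff": each candidate keeps the score from the longest matching n-gram it appears in.

```r
# Assumption: ngrams[[n]] is a named numeric vector of n-gram counts,
# e.g. c("the cat" = 3, "the dog" = 1) for n = 2.

# Hypothetical helper: counts of all n-grams that start with `prefix`
# (n - 1 space-separated words), re-keyed by their final word
continuations <- function(counts, prefix) {
  if (is.null(counts)) return(numeric(0))
  hits <- counts[startsWith(names(counts), paste0(prefix, " "))]
  names(hits) <- sub(".* ", "", names(hits))
  hits
}

# Score candidate next words with the back-off weights from the slide;
# each candidate is scored at the highest n-gram order where it appears
predict_next <- function(text, ngrams,
                         weights = c("5" = 1, "4" = 0.4, "3" = 0.16, "2" = 0.064)) {
  words <- unlist(strsplit(tolower(text), "[^a-z]+"))
  words <- words[nzchar(words)]
  scores <- numeric(0)                        # named by candidate word
  for (n in 5:2) {                            # try 5-grams first, then back off
    if (length(words) < n - 1 || length(ngrams) < n) next  # no context/table
    prefix <- paste(utils::tail(words, n - 1), collapse = " ")
    cont <- continuations(ngrams[[n]], prefix)
    if (length(cont) == 0) next
    # relative frequency among n-grams sharing this prefix, times the weight
    rel <- weights[[as.character(n)]] * cont / sum(cont)
    new <- setdiff(names(rel), names(scores)) # keep higher-order scores only
    scores[new] <- rel[new]
  }
  sort(scores, decreasing = TRUE)             # best candidate first
}
```

Calling head(predict_next(text, ngrams), 5) would give the five highest-scoring candidates, the kind of data a top-5 bar chart could display (for example with base R's barplot()).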

What do I have to do...?

  • It's easy: just launch the shiny app

  • Wait for the n-gram data to load (“the” is predicted once loading is complete)

  • Then enter your text in the box on the left-hand side

  • The predicted next word will appear in the box on the right-hand side

  • Below it, you can choose to show a bar chart of the scores of the top 5 candidates; use it to judge how confident the prediction is

  • … enjoy!