Word Prediction with R and Shiny

Sean Rossouw
2019

Description of the Program


A demonstration of next-word prediction using R and Shiny

  • Corpus built from text scraped from blogs, news and Twitter (832 MB)

  • 50% of the corpus used as training data, then cleaned and tokenised

  • Unigrams, bigrams and trigrams generated from the training data and trimmed to create lookup tables
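
The steps above can be sketched in base R. This is a minimal illustration and not the code used in the app: the toy corpus, the helper names (tokenise, ngrams, build_table) and the min_count trimming threshold are all assumptions made for the example.

```r
# Toy stand-in for the training sample (the real corpus is not reproduced here).
corpus <- c("Thanks for the follow!",
            "Thanks for the help.",
            "I went to the store")

# Clean and tokenise: lower-case, replace anything that is not a letter,
# apostrophe or space with a space, then split on whitespace.
tokenise <- function(line) {
  line <- tolower(line)
  line <- gsub("[^a-z' ]", " ", line)
  words <- strsplit(line, "\\s+")[[1]]
  words[words != ""]
}

# All n-grams of one token vector, as space-separated strings.
ngrams <- function(words, n) {
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts,
         function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Count n-grams over all lines and trim those seen fewer than min_count times
# (min_count is a hypothetical trimming rule chosen for this sketch).
build_table <- function(lines, n, min_count = 1) {
  grams <- unlist(lapply(lines, function(l) ngrams(tokenise(l), n)))
  counts <- table(grams)
  counts <- counts[counts >= min_count]
  data.frame(ngram = names(counts),
             count = as.integer(counts),
             stringsAsFactors = FALSE)
}

unigrams <- build_table(corpus, 1)
bigrams  <- build_table(corpus, 2)
trigrams <- build_table(corpus, 3, min_count = 2)  # keeps only "thanks for the"
```

Trimming rare n-grams keeps the lookup tables small enough to load quickly, at the cost of losing predictions for uncommon phrases.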

The Prediction Algorithm


  • The prediction function takes a string and an integer n as input

  • The last and second-to-last words of the string are used to search for matches in the lookup tables

  • Matches from each table are added together with adjustable weights, and the resulting score is used as a measure of how likely each match is

  • The prediction returns a list of the n highest-scoring predictions for the input string
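
The scoring described above can be sketched as a weighted sum over the three tables. This is an illustration under stated assumptions, not the app's actual code: the table layout, the example frequencies, the function name predict_next and the default weights are all hypothetical.

```r
# Hypothetical lookup tables: context word(s) -> candidate next word with a
# relative frequency. The values are made up for illustration.
unigrams <- data.frame(word = c("the", "to", "and"),
                       freq = c(0.5, 0.3, 0.2),
                       stringsAsFactors = FALSE)
bigrams <- data.frame(w1 = c("of", "of", "of"),
                      word = c("the", "course", "a"),
                      freq = c(0.6, 0.1, 0.3),
                      stringsAsFactors = FALSE)
trigrams <- data.frame(w1 = c("one", "one"),
                       w2 = c("of", "of"),
                       word = c("the", "my"),
                       freq = c(0.7, 0.3),
                       stringsAsFactors = FALSE)

# Return the n highest-scoring next words for `text`.
# The weights are assumed defaults; higher-order matches count for more.
predict_next <- function(text, n = 3,
                         weights = c(uni = 0.1, bi = 0.3, tri = 0.6)) {
  words <- strsplit(tolower(trimws(text)), "\\s+")[[1]]
  words <- words[words != ""]
  last <- if (length(words) >= 1) words[length(words)] else NA_character_
  prev <- if (length(words) >= 2) words[length(words) - 1] else NA_character_

  # Look up matches; a missing context word (NA) matches nothing.
  bi  <- bigrams[bigrams$w1 %in% last, ]
  tri <- trigrams[trigrams$w1 %in% prev & trigrams$w2 %in% last, ]

  # Weight each table's frequencies, then add up the scores per candidate word.
  candidates <- rbind(
    data.frame(word = unigrams$word, score = weights[["uni"]] * unigrams$freq,
               stringsAsFactors = FALSE),
    data.frame(word = bi$word, score = weights[["bi"]] * bi$freq,
               stringsAsFactors = FALSE),
    data.frame(word = tri$word, score = weights[["tri"]] * tri$freq,
               stringsAsFactors = FALSE)
  )
  totals <- aggregate(score ~ word, data = candidates, FUN = sum)
  totals <- totals[order(-totals$score), ]
  head(totals$word, n)
}

predict_next("one of", 3)  # "the" "my" "a"
```

With these weights a word found in the trigram table outranks one found only in the bigram or unigram table, and the unigram table acts as a fallback when the context has no match.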

How to use the program


  • Go to the Shiny app

  • Enter your text and select how many results you want returned with the slider

  • Click the Submit button

  • Try changing the context, or use the first two words of a common phrase, to see how trigram results are weighted above bigram and unigram results

More information


Thanks to Jeff Leek, Roger Peng and Brian Caffo