May 22, 2018

Text Prediction App

This is a simple app that uses text analytics to predict the next word of the text you enter.

Unigram, bigram, trigram and four-gram dictionaries were created from a sample of three English text files (blogs, Twitter and news) from HC Corpora. The model is based on the Markov assumption:

"The future is independent of the past given the present"

We therefore rely only on the last few words of the input. For instance, for a bigram model,

\(P(\text{the} \mid \text{its water is so transparent that})\)
is approximately the same as

\(P(\text{the} \mid \text{that})\)
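
Written in general form, the approximation above can be stated as follows. The symbols \(w_i\) (the \(i\)-th word of the input) and \(N\) (the order of the n-gram model) are notation introduced here for illustration and do not appear in the app itself:

\[
P(w_n \mid w_1, \ldots, w_{n-1}) \approx P(w_n \mid w_{n-N+1}, \ldots, w_{n-1})
\]

For a bigram model, \(N = 2\) and the right-hand side reduces to \(P(w_n \mid w_{n-1})\), which is the "the | that" case shown above. For a four-gram model, \(N = 4\) and the last three words are kept as context.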

Building of N-gram Dictionaries

Text transformation is performed to:
  • Remove special characters found in foreign-language text
  • Expand contractions into full words for better predictive power and so that n-grams are tokenized correctly
  • Convert all text to lower case for ease of analysis
  • Remove URLs, hashtags, Twitter handles and profanity
  • Remove numbers, punctuation (while preserving intra-word dashes) and extra whitespace
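
The transformations listed above could be sketched roughly as follows. This is an illustrative Python sketch, not the app's actual code: the function name `clean_text`, the tiny `CONTRACTIONS` and `PROFANITY` tables, and the order of the steps are all assumptions made for this example.

```python
import re

# Hypothetical, deliberately tiny lookup tables (assumptions for this sketch).
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "it's": "it is",
                "i'm": "i am", "don't": "do not"}
PROFANITY = {"darn"}  # placeholder word list


def clean_text(text: str) -> str:
    """Apply the listed transformations in an order that suits this sketch."""
    text = text.lower()
    # Normalise curly apostrophes so contractions can be matched.
    text = text.replace("\u2019", "'")
    # Remove URLs, hashtags and Twitter handles before touching punctuation.
    text = re.sub(r"(https?://|www\.)\S+", " ", text)
    text = re.sub(r"[#@]\w+", " ", text)
    # Expand contractions into full words.
    for short, full in CONTRACTIONS.items():
        text = re.sub(r"\b" + re.escape(short) + r"\b", full, text)
    # Drop non-ASCII characters (accented letters are removed, not transliterated).
    text = text.encode("ascii", "ignore").decode("ascii")
    # Remove numbers.
    text = re.sub(r"\d+", " ", text)
    # Remove punctuation, then any dash that is not between two letters.
    text = re.sub(r"[^a-z\s-]", " ", text)
    text = re.sub(r"(?<![a-z])-|-(?![a-z])", " ", text)
    # Remove profanity and collapse extra whitespace.
    return " ".join(w for w in text.split() if w not in PROFANITY)
```

With these assumed tables, `clean_text("I can't believe it's 2018! Check https://example.com #wow @user state-of-the-art -- darn")` returns `"i cannot believe it is check state-of-the-art"`.
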
N-gram dictionaries are built using NGramTokenizer:
  • Only words occurring more than 3 times are stored, to speed up searches and save memory
  • Saved as text files (in order of frequency) for reference checks
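
As a rough illustration of the dictionary-building step, the Python sketch below counts n-grams, applies the "more than 3 times" cut-off and returns the entries in order of frequency. It does not use NGramTokenizer; the function name `build_ngram_dictionary` and the `threshold` parameter are hypothetical, and the input is assumed to be already cleaned, whitespace-separated text.

```python
from collections import Counter


def build_ngram_dictionary(sentences, n, threshold=3):
    """Count n-grams and keep those seen more than `threshold` times.

    Returns a list of (ngram, count) pairs, most frequent first.
    """
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        # Slide a window of n tokens across the sentence.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    # most_common() yields entries in descending order of frequency.
    return [(gram, c) for gram, c in counts.most_common() if c > threshold]
```

For example, given four copies of "i am here" and two copies of "i am there", the bigram dictionary keeps "i am" (6 occurrences) and "am here" (4 occurrences) and drops "am there" (2 occurrences).
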

Description of the Prediction Algorithm

  1. Reads the user's input and applies text transformations similar to those used when building the n-gram dictionaries.
  2. Checks the last one to three words of the input text against the unigram, bigram, trigram and four-gram dictionaries, where applicable.
  3. Predicts possible next words based on the word frequencies in the dictionaries. Because the n-gram dictionaries are small, the main purpose is to list the predicted words rather than to compare probabilities, so maximum likelihood estimates are not necessary.
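
One way the lookup in the steps above might work is sketched below in Python. The text does not say how the dictionaries are combined, so the strategy used here (try the longest available context first and fall back to shorter ones) is an assumption, as are the function name `predict_next_words`, the `top_k` parameter and the layout of `dictionaries` (a mapping from n to `{ngram: frequency}`). Input cleaning is reduced to lower-casing and splitting to keep the sketch short.

```python
def predict_next_words(text, dictionaries, top_k=5):
    """Return up to top_k (word, frequency) pairs, most frequent first.

    `dictionaries` maps n (1 to 4) to a dict of {ngram string: frequency}.
    """
    words = text.lower().split()
    # Try a context of three words first, then two, one and finally none.
    for context_len in range(min(3, len(words)), -1, -1):
        prefix = " ".join(words[len(words) - context_len:])
        table = dictionaries.get(context_len + 1, {})
        candidates = {}
        for gram, count in table.items():
            # Split each stored n-gram into its context and its last word.
            head, _, last = gram.rpartition(" ")
            if head == prefix:
                candidates[last] = count
        if candidates:
            ranked = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))
            return ranked[:top_k]
    return []
```

The result is ranked by raw frequency only, in line with the note above that the aim is to list candidate words rather than to compare probabilities.
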

Text Prediction App Features

  • Predicts the next word and displays the corresponding frequency
  • Facilitates searching the prediction list for the word you are looking for
  • Filters the list of words by frequency or by the letter(s) they contain
  • Provides a visualisation of all the predicted words with a word cloud

Please try my Text Prediction App at https://ruthleeyl.shinyapps.io/textpredictionapp/.