WordChain

V. Boodram
August 23rd, 2015

WordChain is a word prediction applet written for the JHU/Coursera Data Science Capstone Project. It accepts one or more English words as input and returns up to three of the most likely sentences given the input text. It may be viewed at https://budotron.shinyapps.io/App-1.

Usage

WordChain was designed to require as few keystrokes as possible, which improves both speed and ease of use. A slider sets the number of output sentences. Words are entered into a text box, and the output sentences update dynamically. Instructions are provided when the app is launched.


Algorithm

The number of input words is determined by counting the blank spaces between them. Prediction then proceeds as follows:

  • if the input contains three or more words, the last three words are used to predict the next word
  • if the input contains exactly two words, these two words are used to predict the next word
  • if the input contains a single word, that word is used to predict the next word
  • if the final three words were not seen during training, the algorithm compensates by treating the last word as a noun and predicting the appropriate form of the verb “to be”
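The selection rules above can be sketched in Python as follows. This is a minimal illustration and not WordChain's actual code, which runs as a Shiny app. The lookup table `NEXT_WORDS` and its entries are made up and stand in for the rankings produced by the trained models. The helper `be_form` is also hypothetical: it picks “are” for words ending in “s” and “is” otherwise, as a stand-in for the unspecified rule that chooses the appropriate form of “to be”. The sketch assumes words are separated by single spaces. Because the text does not say what happens for unseen one- or two-word inputs, the sketch returns no prediction in that case.

```python
# Hypothetical lookup table: maps a context (tuple of 1 to 3 preceding words)
# to candidate next words, most likely first. In WordChain these rankings
# would come from the trained models; the entries here are made up.
NEXT_WORDS = {
    ("i",): ["am", "have", "will"],
    ("thank", "you"): ["for", "so", "very"],
    ("one", "of", "the"): ["best", "most", "first"],
}


def count_words(text: str) -> int:
    """Count words by counting blank spaces (assumes single spaces between words)."""
    stripped = text.strip()
    if not stripped:
        return 0
    return stripped.count(" ") + 1


def be_form(noun: str) -> str:
    """Hypothetical heuristic: choose a form of "to be" for a word treated as a noun."""
    return "are" if noun.endswith("s") else "is"


def predict_next(text: str, k: int = 3) -> list[str]:
    """Return up to k predicted next words for the input text."""
    n = count_words(text)
    if n == 0:
        return []
    words = text.strip().lower().split(" ")
    if n >= 3:
        context = tuple(words[-3:])
        if context not in NEXT_WORDS:
            # Unseen trigram: treat the last word as a noun and predict "to be".
            return [be_form(words[-1])]
    else:
        # One or two words: use all of them as the context.
        # (Assumption: unseen short contexts simply yield no prediction.)
        context = tuple(words)
    return NEXT_WORDS.get(context, [])[:k]
```

For example, `predict_next("He is one of the")` looks up the context ("one", "of", "the"). By contrast, `predict_next("my favourite dogs")` falls back to `["are"]` because that trigram is not in the table.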

Model and Accuracy

Each prediction is made with a Naive Bayes model. Three models were fit, with feature spaces of one, two and three preceding words respectively, matching the three cases above. These models were tested to determine how often the first, second and third predicted words were correct.
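The following sketch shows what a Naive Bayes next-word model of this kind, together with the per-rank test, might look like. Each of the n preceding words is treated as a separate feature, tagged with its offset from the target word. Each candidate word w is then scored by log P(w) plus the sum of log P(feature | w). Several choices here are assumptions for illustration: the class name `NaiveBayesNextWord`, the function `rank_accuracy`, the lowercasing and whitespace tokenization, and the add-one (Laplace) smoothing. The text does not say how the WordChain models were built or smoothed.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesNextWord:
    """Naive Bayes next-word classifier using the n preceding words as features.

    Each feature is an (offset, word) pair: (1, "the") means "the" appears
    immediately before the target word. Add-one smoothing is an assumption.
    """

    def __init__(self, n_context: int):
        self.n = n_context
        self.class_counts = Counter()               # target word -> count
        self.feature_counts = defaultdict(Counter)  # target -> (offset, word) counts
        self.vocab = set()

    def fit(self, sentences):
        for sentence in sentences:
            words = sentence.lower().split()
            self.vocab.update(words)
            for i in range(self.n, len(words)):
                target = words[i]
                self.class_counts[target] += 1
                for offset in range(1, self.n + 1):
                    self.feature_counts[target][(offset, words[i - offset])] += 1
        return self

    def predict(self, context, k=3):
        """Return the k most probable next words, most likely first."""
        total = sum(self.class_counts.values())
        if total == 0:
            return []
        context = [w.lower() for w in context[-self.n:]]
        v = len(self.vocab)
        scores = {}
        for target, count in self.class_counts.items():
            score = math.log(count / total)  # log prior P(target)
            for offset, word in enumerate(reversed(context), start=1):
                # Smoothed log likelihood P(word at this offset | target).
                hits = self.feature_counts[target][(offset, word)]
                score += math.log((hits + 1) / (count + v))
            scores[target] = score
        return sorted(scores, key=scores.get, reverse=True)[:k]


def rank_accuracy(model, sentences, k=3):
    """Fraction of test positions where the r-th ranked prediction was correct."""
    hits = [0] * k
    total = 0
    for sentence in sentences:
        words = sentence.lower().split()
        for i in range(model.n, len(words)):
            total += 1
            predictions = model.predict(words[i - model.n:i], k)
            for rank, word in enumerate(predictions):
                if word == words[i]:
                    hits[rank] += 1
    return [h / total if total else 0.0 for h in hits]
```

Fitting `NaiveBayesNextWord(n)` for n = 1, 2 and 3 gives one model per case in the algorithm. `rank_accuracy(model, test_sentences)` then returns the fraction of test positions at which the first, second and third predictions were correct, where `test_sentences` is a hypothetical held-out set.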

[Plot: test accuracy of the first, second and third predictions]

The test results indicate that the app works best with shorter inputs.

Benefits and further work

This app offers two main benefits over traditional typing:

  • fewer keystrokes, which reduces typing time and may help people with disabilities enter text
  • spelling fidelity

The current models can be improved by adding more words to the feature space.

References and acknowledgements

Eng J, Eisner JM. “Radiology Report Entry with Automatic Phrase Completion Driven by Language Modeling.” Radiographics, 2004.

Thanks:

  • Shuhei Fuijiwara and Derick Cornwald, for their suggestions on building this app
  • The DS team at JHU/Coursera, for providing such a great series of courses