Data Science CapStone: Predict the Next Word

angelayuan
Aug 4th 2015

Synopsis

People now spend a lot of time on their mobile devices across a whole range of activities, but typing on those devices can be a serious pain. The purpose of this Data Science CapStone project is to develop a Shiny application that predicts the next word based on the text being entered.

The key steps to implement the capstone project are

Prediction Algorithm

  • NGram and Backoff Model

After preprocessing the corpus (converting to lowercase; removing punctuation, numbers, and profanity; tokenizing; etc.), I build n-grams and combine them with a backoff model to predict the next word.
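The pipeline described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the app's actual code (the app itself is a Shiny application): the function names `tokenize`, `build_ngrams`, and `predict`, the placeholder `PROFANITY` set, and the use of plain relative frequencies with no discounting are all assumptions made for the example.

```python
import re
from collections import Counter, defaultdict

# Hypothetical placeholder; a real profanity list would be much longer.
PROFANITY = {"darn"}


def tokenize(text):
    """Lowercase the text, drop punctuation and numbers, and filter profanity."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return [w for w in words if w not in PROFANITY]


def build_ngrams(sentences, max_n=4):
    """Map each context of 1..max_n-1 words to a Counter of the words that follow it."""
    table = defaultdict(Counter)
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i, word in enumerate(tokens):
            for k in range(1, max_n):
                if i - k < 0:
                    break  # not enough preceding words for a longer context
                table[tuple(tokens[i - k:i])][word] += 1
    return table


def predict(table, context, max_context=3):
    """Try the longest context first, then back off to shorter ones."""
    context = tuple(context)[-max_context:]
    while context:
        if context in table:
            counts = table[context]
            total = sum(counts.values())
            # Relative frequency of each candidate given this context.
            return {w: c / total for w, c in counts.items()}
        context = context[1:]  # drop the oldest word and retry
    return {}


corpus = ["I love data science", "I love data analysis", "You love cats"]
table = build_ngrams(corpus)
print(predict(table, ["i", "love"]))   # context seen as a bigram
print(predict(table, ["we", "love"]))  # unseen bigram, backs off to "love"
```

With `max_n=4` the table holds contexts of up to three words, which matches a model that looks at the last 1 to 3 words of the input.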

User Guide

  • After the app is launched, you will see the following GUI. [Screenshot: app GUI]
  • Input: Enter non-profane English words in the input box in the left panel and click the Go button to run the prediction algorithm.
  • The program uses the last 1 to 3 words of the input to predict the next word.
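As a rough illustration of how the last 1 to 3 words might be pulled from the input box, the sketch below lists the contexts a backoff lookup would try, longest first. The function name `candidate_contexts` and the cleaning regex are assumptions for this example, not the app's own code.

```python
import re


def candidate_contexts(user_input, max_words=3):
    """Return the contexts to try, from the last max_words words down to the last word."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", user_input.lower())
    tail = words[-max_words:]
    return [tuple(tail[i:]) for i in range(len(tail))]


print(candidate_contexts("See you next Week!"))
print(candidate_contexts("Hello"))
```

An input with fewer than three words simply yields fewer contexts, and an empty input yields none.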

User Guide

  • Output: You will see what you entered, along with the predicted words listed in order of decreasing probability in a table in the right panel. If the algorithm returns more than 10 words, only the top 10 are shown.
  • More information: Click the “About” tab on the navigation bar.
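The ordering of the output table can be sketched as follows. This is a minimal example assuming the predictions arrive as a word-to-probability mapping; the name `top_predictions` and the alphabetical tie-break are assumptions, not something the app is known to do.

```python
def top_predictions(scores, limit=10):
    """Sort candidates by decreasing probability and keep at most `limit` of them."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


scores = {"day": 0.1, "time": 0.5, "week": 0.4}
for word, prob in top_predictions(scores):
    print(f"{word}\t{prob:.2f}")
```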
