Capstone Project - Word Prediction

Hawk
2016.01.24


Introduction

This is a very simple application that lists the most likely prediction for the next word of the user's input.

The application is hosted on shinyapps.io and its URL is: https://yourwanghao.shinyapps.io/wordPrediction2/

Usage

The user only needs to type an English sentence into the text input box, and the most likely next word is listed on the page automatically.

UI

What's Behind the Magic?

The application is based on modern natural language processing techniques, using the CMU-Cambridge Statistical Language Modeling toolkit (http://svr-www.eng.cam.ac.uk/~prc14/toolkit.html):

  1. preprocess the corpus
  2. create n-grams (3-grams in my application)
  3. apply smoothing and discounting
  4. create prediction model
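
Steps 1 and 2 can be illustrated with a minimal Python sketch. It does not use the CMU-Cambridge toolkit: the `tokenize` and `count_ngrams` helpers and the toy corpus are hypothetical, and smoothing and discounting (step 3) are left out for brevity.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep only runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def count_ngrams(sentences, n):
    """Count every n-gram (as a tuple of words) across the sentences."""
    counts = Counter()
    for sentence in sentences:
        words = tokenize(sentence)
        # Slide a window of n words over the sentence.
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts


# Toy corpus, for illustration only.
corpus = ["I like green tea.", "I like green apples.", "I like green tea a lot."]
trigrams = count_ngrams(corpus, 3)
print(trigrams[("i", "like", "green")])    # 3
print(trigrams[("like", "green", "tea")])  # 2
```

N-grams are counted within each sentence here, so no n-gram spans a sentence boundary; that is a design choice of this sketch rather than something the slides specify.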

For more technical information, please refer to http://svr-www.eng.cam.ac.uk/~prc14/eurospeech97.ps

My Prediction Model

  1. Preload the unigram, bigram, and trigram tables.

  2. Read the user's input string, normalize it, and take its last two words (the final 2-gram).

  3. Search for this 2-gram in the trigram table. If found, return the most probable next word from the trigram table.

  4. If not found, search for the first word of the 2-gram in the bigram table. If found, return the most probable next word from the bigram table.

  5. If still not found, return the most probable word in the unigram table.
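
The lookup order above can be sketched in Python as follows. This is an illustration under assumptions, not the application's code: the table layout (dictionaries mapping a context to candidate words with scores), the `normalize`, `best`, and `predict` helpers, and all toy values are hypothetical. The sketch follows the steps as written, keys the bigram lookup on the first word of the 2-gram, and assumes the bigram fallback returns a word from the bigram table.

```python
import re

# Hypothetical toy tables: context -> {candidate next word: score}.
TRIGRAMS = {("thank", "you"): {"very": 0.4, "for": 0.5}}
BIGRAMS = {"thank": {"you": 0.9, "god": 0.1}}
UNIGRAMS = {"the": 0.05, "to": 0.03}


def normalize(text):
    """Lowercase the input and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


def best(candidates):
    """Return the candidate word with the highest score."""
    return max(candidates, key=candidates.get)


def predict(text):
    """Predict the next word using trigram, then bigram, then unigram lookup."""
    words = normalize(text)
    # Step 3: look up the last two words in the trigram table.
    if len(words) >= 2 and tuple(words[-2:]) in TRIGRAMS:
        return best(TRIGRAMS[tuple(words[-2:])])
    # Step 4: fall back to the bigram table, keyed on the first word
    # of the 2-gram (assumption: one-word input uses that single word).
    if words:
        key = words[-2] if len(words) >= 2 else words[-1]
        if key in BIGRAMS:
            return best(BIGRAMS[key])
    # Step 5: fall back to the highest-scoring unigram.
    return best(UNIGRAMS)


print(predict("Thank you"))       # for
print(predict("thank goodness"))  # you
print(predict("xyz abc"))         # the
```

The second call shows the effect of keying step 4 on the first word of the 2-gram: the prediction depends on "thank" and ignores "goodness".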