Capstone Project - Word Prediction

Hawk
2016.01.24


Introduction

This is a very simple application that lists the three most likely next words for the user's input.

The application is hosted on shinyapps.io and its URL is: https://yourwanghao.shinyapps.io/wordPrediction2/

Usage

Type an English sentence into the text input box, and the three most likely next words are listed on the page automatically.

UI

What's Behind the Magic?

The application applies standard statistical natural language processing techniques, based on the CMU-Cambridge Statistical Language Modeling Toolkit (http://svr-www.eng.cam.ac.uk/~prc14/toolkit.html):

  1. Preprocess the corpus
  2. Create n-grams (3-grams in my application)
  3. Apply smoothing and discounting
  4. Create the prediction model
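As a rough illustration of steps 1 to 3, here is a minimal Python sketch. It is not the toolkit's own code. The cleaning rule (lowercase, keep letters and apostrophes), the helper names `normalize`, `ngram_counts` and `discounted_prob`, and the choice of absolute discounting with d = 0.5 are all assumptions for illustration; the toolkit supports several discounting schemes, and this shows only one of them.

```python
import re
from collections import Counter

def normalize(text):
    # Assumed cleaning rule: lowercase, keep only letters and apostrophes.
    # Sentence boundaries are ignored in this sketch.
    return re.findall(r"[a-z']+", text.lower())

def ngram_counts(tokens, n):
    # Count every n-gram (as a tuple of words) in the token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def discounted_prob(trigrams, w1, w2, w3, d=0.5):
    # Absolute discounting (one scheme among several): subtract d from the
    # trigram count. The mass taken away would be redistributed to lower-order
    # models in a full backoff model; here we only return the discounted estimate.
    context = sum(c for (a, b, _), c in trigrams.items() if (a, b) == (w1, w2))
    if context == 0:
        return 0.0
    return max(trigrams[(w1, w2, w3)] - d, 0) / context

if __name__ == "__main__":
    tokens = normalize("I love you. I love cats. I love you too.")
    tri = ngram_counts(tokens, 3)
    print(tri[("i", "love", "you")])                 # 2
    print(discounted_prob(tri, "i", "love", "you"))  # 0.5
```

Here "i love" is followed by "you" twice and "cats" once, so the discounted estimate for "you" is (2 - 0.5) / 3 = 0.5 rather than the raw 2/3.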

For more technical information, please refer to http://svr-www.eng.cam.ac.uk/~prc14/eurospeech97.ps

My Prediction Model

  1. Preload the unigram, bigram, and trigram tables.

  2. Read the user's input string, normalize it, and take its last two words (the final 2-gram).

  3. Look up this 2-gram as a prefix in the trigram table. If found, return the three most probable next words from the trigram table.

  4. If not found, look up the last word of the 2-gram in the bigram table. If found, return the three most probable next words from the bigram table.

  5. If still not found, return the three most frequent words in the unigram table.
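The five steps above amount to a simple backoff lookup. The Python sketch below shows one way to express it. The table layout (a dictionary from context to a `Counter` of following words) and the names `normalize`, `build_tables` and `predict` are hypothetical, and candidates are ranked by raw counts rather than by the smoothed probabilities a full model would store.

```python
import re
from collections import Counter, defaultdict

def normalize(text):
    # Assumed cleaning rule: lowercase, keep only letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def build_tables(tokens):
    # Unigram counts, plus for each 1-word and 2-word context a Counter
    # of the words that follow it (hypothetical layout).
    uni = Counter(tokens)
    bi = defaultdict(Counter)
    tri = defaultdict(Counter)
    for i in range(len(tokens) - 1):
        bi[tokens[i]][tokens[i + 1]] += 1
    for i in range(len(tokens) - 2):
        tri[(tokens[i], tokens[i + 1])][tokens[i + 2]] += 1
    return uni, bi, tri

def predict(text, uni, bi, tri, k=3):
    words = normalize(text)
    # Step 3: try the last two words as a trigram prefix.
    if len(words) >= 2 and (words[-2], words[-1]) in tri:
        return [w for w, _ in tri[(words[-2], words[-1])].most_common(k)]
    # Step 4: back off to the last word in the bigram table.
    if words and words[-1] in bi:
        return [w for w, _ in bi[words[-1]].most_common(k)]
    # Step 5: fall back to the most frequent words overall.
    return [w for w, _ in uni.most_common(k)]

if __name__ == "__main__":
    uni, bi, tri = build_tables(normalize("I love you. I love cats. I love you too."))
    print(predict("I love", uni, bi, tri))       # ['you', 'cats']
    print(predict("we cats", uni, bi, tri))      # ['i']
    print(predict("hello world", uni, bi, tri))  # ['i', 'love', 'you']
```

Note that this sketch returns fewer than three words when a context has fewer than three recorded continuations; padding the list from the lower-order tables is one possible refinement.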