Capstone Project - Word Prediction

Hawk
2016.01.24

Introduction

This is a simple application that lists 3 possible predictions for the next word of the user's input.

The application is hosted on shinyapps.io and its URL is: https://www.shinyapps.io/admin/#/application/79822

The user only needs to type an English sentence into the text input box, and the 3 most likely next words are automatically listed on the page.

What's Behind the Magic?

The application is based on standard natural language processing techniques and follows this procedure:

  1. preprocess the corpus
  2. create n-grams (up to 3-grams in my application)
  3. apply smoothing and discounting
  4. create the prediction model
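
As a rough illustration of steps 1 and 2, the sketch below normalizes a tiny made-up corpus and counts its n-grams. It is written in Python for illustration only; the application itself relies on the toolkit listed under Tools Used. The function names, the tokenizing rule, and the sample sentence are all assumptions, and smoothing and discounting (step 3) are left out.

```python
import re
from collections import Counter

def normalize(text):
    """Lowercase the text and keep only runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def count_ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Hypothetical toy corpus; a real corpus would be far larger.
corpus = "The cat sat on the mat, and the cat sat down."
tokens = normalize(corpus)

unigram_counts = count_ngrams(tokens, 1)
bigram_counts = count_ngrams(tokens, 2)
trigram_counts = count_ngrams(tokens, 3)

print(trigram_counts[("the", "cat", "sat")])
```

Raw counts like these are the input to the smoothing and discounting step, which moves some probability mass from seen n-grams to unseen ones.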

For more technical information, please refer to http://svr-www.eng.cam.ac.uk/~prc14/eurospeech97.ps

My Prediction Model

  1. Preload the unigram, bigram, and trigram tables.

  2. Read the user's input string, normalize it, and take its last two words (a 2-gram).

  3. Search for this 2-gram in the trigram table. If found, return the 3 most probable next words from the trigram table.

  4. If not found, search for the last word of the 2-gram in the bigram table. If found, return the 3 most probable next words from the bigram table.

  5. If still not found, return the 3 most probable words from the unigram table.
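
The back-off lookup described in these steps could be sketched roughly as follows. This is a minimal Python illustration, not the application's actual code: the table layout, the function names, and the toy corpus are assumptions, it ranks candidates by raw counts rather than smoothed probabilities, and it assumes the bigram lookup uses the most recent word of the input.

```python
import re
from collections import Counter, defaultdict

def normalize(text):
    """Lowercase the text and keep only runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def build_tables(tokens):
    """Build the three lookup tables: context -> Counter of next words."""
    unigrams = Counter(tokens)
    bigrams = defaultdict(Counter)
    trigrams = defaultdict(Counter)
    for a, b in zip(tokens, tokens[1:]):
        bigrams[a][b] += 1
    for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
        trigrams[(a, b)][c] += 1
    return unigrams, bigrams, trigrams

def predict(text, unigrams, bigrams, trigrams, k=3):
    """Return up to k candidate next words, backing off 3-gram -> 2-gram -> 1-gram."""
    words = normalize(text)
    context = tuple(words[-2:])
    if len(words) >= 2 and context in trigrams:
        table = trigrams[context]      # last two words seen as a context
    elif words and words[-1] in bigrams:
        table = bigrams[words[-1]]     # back off to the most recent word
    else:
        table = unigrams               # fall back to overall word frequency
    return [word for word, _ in table.most_common(k)]

# Hypothetical toy corpus standing in for the preloaded tables.
tokens = normalize("The cat sat on the mat, and the cat sat down.")
unigrams, bigrams, trigrams = build_tables(tokens)

print(predict("I saw the cat", unigrams, bigrams, trigrams))
print(predict("my cat", unigrams, bigrams, trigrams))
print(predict("hello zebra", unigrams, bigrams, trigrams))
```

In this sketch a lookup can return fewer than 3 words when the matched context has fewer than 3 recorded continuations; a fuller version might top up the list from the next lower-order table.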

Data Stats

  1. My trigram object size is:
  2. My bigram object size is:
  3. My unigram object size is:

Tools Used:

The CMU-Cambridge Statistical Language Modeling toolkit: http://svr-www.eng.cam.ac.uk/~prc14/toolkit.html